Combining All Available Clinical Outcomes on Cervical Disc Arthroplasty: A Systematic Review and Meta-Analysis

Background: Reviews of total disc arthroplasty (TDA) performance have focused on prospective randomized controlled trials (RCTs), excluding potentially important clinical information reported by others. The goal of the present study was to perform a comprehensive review, including both RCTs and non-randomized cohorts with more than five years of clinical outcome. We further explored the differences in outcome between prospective RCTs and non-randomized studies, including retrospective studies. Methods: A systematic literature review was performed following PRISMA guidelines. Inclusion criteria were: clinical follow-up ≥ 5 years with quantitative clinical and radiographic outcome. All studies that met these criteria, including retrospective and non-randomized studies, were included, for a total of 62 studies. As anterior cervical discectomy and fusion (ACDF) was included as a control group in the majority of the studies, comparisons between TDA and ACDF were conducted.


Introduction
Total disc arthroplasty (TDA) for the cervical spine was introduced with the promise of preservation of motion and alleviation of pain, while minimizing the likelihood of developing adjacent segment degeneration, a common complication following anterior cervical discectomy and fusion [1][2][3]. Given that TDA is still a relatively new technology, long-term outcome studies are necessary to understand the overall clinical performance. Several reviews have reported short-term success for a variety of cervical TDA devices [4][5][6][7]. Further, some recent studies have presented the combined findings for longer outcomes, ranging from 4 to 10 years; however, these studies have included only prospective randomized controlled trials (RCTs), typically funded by industry, excluding data from thousands of patients in dozens of articles reported in retrospective and non-randomized studies [8][9][10][11].
While randomized controlled trials are generally considered to be the most objective way to evaluate an intervention, relying only on these studies may severely compromise, if not bias, a systematic review 12. Further, as most RCTs are conducted for regulatory approval, patient selection and inclusion tend to be carefully monitored. This is due largely to the fact that RCTs are costly, limiting clinical trials to large academic centers, typically with substantial industry support. Consequently, the largest of the previously published systematic reviews included data from only eleven centers, while clinical use, particularly in the global setting, has become far more widespread 9.
In the present study, we provide a comprehensive overview of all available quantitative outcome data for cervical TDA patients with more than five years of follow-up. The goals were to 1) compare outcomes between randomized and non-randomized studies and 2) combine the outcomes of all studies, regardless of whether they were randomized. Accordingly, we expanded the inclusion criteria used previously in other studies by including non-randomized prospective studies, retrospective radiographic reviews, and registry data, to gain a more balanced global perspective on the experience to date with cervical arthroplasty. Outcome variables included: reoperation rates, adjacent segment degeneration, heterotopic ossification, range of motion, and clinical outcome scores.

Literature Search and Selection Criteria
Two of the authors (C.J.B. and J.M.W.) systematically searched electronic databases following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for this study between June and September 2020 13 . A comprehensive search of the PubMed, Google Scholar, and Medline databases was conducted for studies related to TDA. The keyword search terms used were "cervical disc replacement/arthroplasty," "long-term outcome," "radiographic," "reoperation," "heterotopic ossification," or "adjacent segment degeneration". As adjacent segment degeneration and heterotopic ossification have been heavily studied and commonly reported in current TDA literature, these were included in the search terms, as well.
To be eligible for the systematic review the articles had to: 1) have follow-up data at ≥ 5 years for TDAs, 2) have data for reoperation rates, and partial or complete data for the following: range of motion in flexion/extension (ROM), adjacent segment degeneration (ASD), heterotopic ossification (HO), and/or clinical outcome scores, 3) use radiographic images to quantify ROM, ASD, and/or HO.
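The three eligibility criteria can be read as a simple screening predicate. The sketch below is illustrative only: the `Study` record, its field names, and the example entries are hypothetical and are not taken from the review's actual workflow. It also assumes that criterion 3 requires radiographic quantification for every included study.

```python
from dataclasses import dataclass, field

# Outcome categories named in criterion 2 (in addition to reoperation rates).
SECONDARY_OUTCOMES = {"ROM", "ASD", "HO", "clinical_scores"}


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative."""
    name: str
    follow_up_years: float
    reports_reoperation: bool
    outcomes: set = field(default_factory=set)
    radiographic_quantification: bool = False


def is_eligible(study: Study) -> bool:
    # Criterion 1: follow-up of at least 5 years.
    long_enough = study.follow_up_years >= 5
    # Criterion 2: reoperation data plus at least one other outcome category.
    has_outcomes = study.reports_reoperation and bool(
        study.outcomes & SECONDARY_OUTCOMES
    )
    # Criterion 3 (assumed reading): radiographs were used for quantification.
    return long_enough and has_outcomes and study.radiographic_quantification


# Made-up candidates, used only to exercise the predicate.
candidates = [
    Study("A", 7.0, True, {"ROM", "HO"}, True),
    Study("B", 2.0, True, {"ASD"}, True),               # follow-up too short
    Study("C", 10.0, False, {"ROM"}, True),             # no reoperation data
    Study("D", 5.0, True, {"clinical_scores"}, False),  # no radiographic data
]
eligible = [s.name for s in candidates if is_eligible(s)]
```

Encoding the criteria this way makes each exclusion reason explicit, which is useful when tallying the counts for a PRISMA flow diagram.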

Data Extraction
The following categories of data were extracted from each article that met the criteria: 1) general information such as author, date, type of study, number of participants, follow-up rate, device type, and distribution of surgical level, 2) data on experimental design such as key methods and inclusion/exclusion criteria, 3) overall outcome such as reoperation rates, ROM, ASD, HO, as well as any clinical outcome scores, adverse events, and histopathology.
Single-level only studies generally reported index-level and full cervical spine ROM (cROM). Multiple/unspecified level studies reported superior and inferior level ROM as well as cROM. The majority of studies classified HO according to the McAfee classification which uses a scale from 0-4 with grade 0 being no HO and grade 4 being extreme with a spontaneous fusion and complete loss of mobility 14 .

Statistical Analysis
Incidence rates for dichotomous variables such as adjacent segment degeneration and reoperation were calculated using the patient population size of each study as a fixed variable in the JBI System for Unified Management, Assessment and Review of Information Software Version 5.0 (JBI, Adelaide, Australia). Odds ratios and 95% confidence intervals were also calculated for these variables using the Mantel-Haenszel statistical method in the JBI software.
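For readers unfamiliar with the Mantel-Haenszel method, the following is a minimal sketch of a fixed-effect pooled odds ratio with a 95% confidence interval based on the Robins-Breslow-Greenland variance estimator. It is not the JBI software's implementation, whose internal choices are not described here. The function name, the `strata` layout, and the example counts are assumptions made for illustration.

```python
import math


def mantel_haenszel_or(strata, z=1.96):
    """Pooled odds ratio across 2x2 tables.

    Each stratum is (a, b, c, d):
      a = events in treatment arm,  b = non-events in treatment arm,
      c = events in control arm,    d = non-events in control arm.
    Returns (odds_ratio, (ci_low, ci_high)).
    """
    R = S = PR = PSQR = QS = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        r, s = a * d / n, b * c / n          # per-study MH numerator/denominator
        p, q = (a + d) / n, (b + c) / n
        R += r
        S += s
        PR += p * r
        PSQR += p * s + q * r
        QS += q * s
    or_mh = R / S
    # Robins-Breslow-Greenland variance of log(OR_MH).
    var_log = PR / (2 * R * R) + PSQR / (2 * R * S) + QS / (2 * S * S)
    half_width = z * math.sqrt(var_log)
    return or_mh, (or_mh * math.exp(-half_width), or_mh * math.exp(half_width))


# Hypothetical studies (not data from this review).
example_strata = [(10, 90, 20, 80), (5, 45, 10, 40)]
pooled_or, pooled_ci = mantel_haenszel_or(example_strata)
```

An odds ratio below 1 with a confidence interval that excludes 1 indicates fewer events in the treatment arm than in the control arm.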
For remaining comparisons, preoperative and postoperative weighted averages were calculated using the number of patients reported and their respective average value, then combining those values and taking the overall average with the total patients, using SPSS Version 19.0 (IBM, Inc., Houston, Texas). Preoperative averages used the number of patients at the beginning of the experiment while postoperative values were taken using the number of patients at the final follow-up based on the follow-up rate reported. Duplicate studies were noted and the study with the longest follow-up time was included in the data analyses, excluding the duplicate. All studies that included a control used ACDF patients as a control group, so this was included throughout the analyses. Comparisons in this review were analyzed using either a Fisher exact test for categorical variables or a t-test for continuous variables.
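The patient-weighted averaging described above can be sketched as follows. This is an illustration rather than the SPSS procedure used in the study: the function name, the tuple layout, the rounding of the follow-up count, and the example values are all assumptions.

```python
def weighted_mean(values_and_weights):
    """Patient-weighted mean: sum(n_i * mean_i) / sum(n_i)."""
    total_n = sum(n for _, n in values_and_weights)
    return sum(mean * n for mean, n in values_and_weights) / total_n


# Hypothetical studies: (patients enrolled, follow-up rate, preop mean, postop mean).
studies = [
    (100, 0.90, 40.0, 20.0),
    (50, 0.80, 50.0, 30.0),
]

# Preoperative means are weighted by the number of patients enrolled.
preop = weighted_mean([(pre, n) for n, _, pre, _ in studies])

# Postoperative means are weighted by the patients remaining at final
# follow-up, estimated here as enrolled * follow-up rate (rounded; assumed).
postop = weighted_mean(
    [(post, round(n * rate)) for n, rate, _, post in studies]
)
```

Weighting by the number of patients actually followed prevents a small study with heavy attrition from contributing as much to the pooled postoperative value as a large, well-followed one.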
identified as possible clinical TDA studies. From these, 164 were removed due to having follow-up periods shorter than 5 years. Then, 21 articles were removed due to lack of data on quantitative outcomes such as adjacent segment degeneration, heterotopic ossification, reoperation rates, or being a case study. This left a total of 62 articles to be reviewed in the present study (Figure 1).

Study characteristics
The literature identified included prospective randomized and nonrandomized controlled trials, comparative studies, retrospective studies, and blinded and unblinded studies (Table 1). Common patient inclusion criteria included degenerative disc disease, radiculopathy, myelopathy, and failed response to non-operative treatment. Common exclusion criteria were multi-level surgery, immobility, or prior cervical spine surgery.
All articles reported on one or more of the following outcome variables: reoperation rates, ROM, ASD, HO, or clinical outcome scores. All other articles were long-term radiographic reviews, with follow-up from 5 to 30 years. The age in individual studies ranged from 35 to 57. The combined mean age was 45.2 ± 5.3 for the TDA group and 48.4 ± 3.5 for the ACDF group, with the majority of the studies age-matched.
Overall, 7,910 patients who received a TDA and 8,353 patients who received an ACDF were included in this analysis. Twenty-eight of the included studies were prospective RCTs and the remaining forty were not randomized and included retrospective and non-randomized studies (Table 1). For brevity and clarity, all prospective RCT studies will be referred to as randomized studies and all other studies will be referred to as non-randomized studies throughout the remainder of this paper. All 62 studies included data on TDA 1,2,15-65 and 33 studies included data on ACDF (Table 1). A total of 50 articles specified the level operated on. The most common level for both TDA and ACDF was C5/C6, at 51% for TDA and 50% for ACDF. The second most commonly treated level was C6/C7, at 34% for TDA and 35% for ACDF. Therefore, the majority of data presented is known to pertain to treatment at those two levels.

Reoperation
Secondary procedures were reported as: reoperation for any reason, reoperation at the index level, reoperation at the adjacent level, removal of the device, revision of the device, or supplemental fixation. All secondary procedure values were statistically different between the randomized studies and the non-randomized studies. Overall secondary surgery was performed in 5.4% of patients in randomized studies (132/2,129) and 7.5% of patients in non-randomized studies (74/754) (P<0.01). Reoperation at the adjacent level was 4.3% in randomized studies and 6.1% in non-randomized studies (P<0.001). Reoperation at the index level was 2.6% in randomized studies and 4.4% in non-randomized studies (P<0.001) (Figure 2a).

Table notes: * = nonrandomized cohort if reported separately; ** = randomized cohort if reported separately; TDR, total disc replacement group; ACDF, anterior cervical discectomy and fusion group; N/A, not available; RCT, randomized controlled trial; SD, standard deviation.

The combined rate of reoperation for any reason was 5.6% for TDA patients and 7.8% for ACDF patients (P=0.06; OR=0.48; CI=0.39, 0.60) (Figure 3). Reoperation was defined as any procedure at the index level or adjacent level that does not remove, modify, or add to the original implant. Removal surgery removed one or all components of the original implant. Revision involved the modification of the original implant without removal. Supplemental fixation was performed when nonunion occurred, typically as an additional posterior fusion. All of these secondary procedure rates were reported for TDA and ACDF surgeries (Table 2 and Appendix).

Preservation of Motion
The combined average index level preoperative range of motion (ROM) in flexion/extension for randomized studies was 8. . As expected, ACDF patients had a preoperative ROM of 7.8° with a postoperative reduction to 0.8° (P<0.001) (Table 3). Of the articles that reported treatment level and were included in the ROM calculations, levels C5/C6 and C6/C7 were the most frequent index levels. For TDA patients, 50% had a C5/C6 arthroplasty and 35% had a C6/C7 arthroplasty. These results were mirrored in ACDF patients: 49% had a C5/C6 fusion and 36% had a C6/C7 fusion. The full cervical spine ROM for TDA patients was 43.7°, increasing slightly after surgery to 45.1°. For ACDF patients, the full cervical spine ROM was 39.2°, decreasing postoperatively to 32.2° (P<0.001).

Adjacent Segment Degeneration
Twenty-six studies included data on adjacent segment degeneration (ASD) 17. Among these studies, the proportion of patients with ASD differed significantly between randomized and non-randomized studies (P<0.02). Specifically, randomized studies reported the presence of ASD in 17.1% of patients (227/1,167) and non-randomized studies reported the presence of ASD in 24.2% of patients (265/1,128) (Figure 2c).
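A comparison of this kind (patients with and without ASD, by study design) is a 2x2 table, which is the setting for the Fisher exact test named in the Statistical Analysis section. The sketch below is a plain-Python, two-sided implementation applied to the patient counts quoted above. It is illustrative and not the authors' SPSS procedure. The function name and the two-sided rule (summing all tables no more probable than the observed one) are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)  # small tolerance for float ties
    )
    return min(1.0, total)


# Counts as reported in the text: ASD present / absent.
randomized = (227, 1167 - 227)
non_randomized = (265, 1128 - 265)
p_asd = fisher_exact_two_sided(*randomized, *non_randomized)
```

The resulting `p_asd` can be compared against the significance level reported for this comparison.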
Among these studies, TDA was also compared to ACDF. Overall, the reported incidence of ASD in patients with TDA was 26.2%, and in patients with ACDF was 43.9% (P<0.01; OR=0.35; CI=0.39, 0.64) (Figure 4). While some studies specified the location of ASD as superior and/or inferior, in the present analysis, the location of ASD did not differ widely among TDA patients (superior = 30.6% v. inferior = 30.5%) or among ACDF patients (superior = 68.4% v. inferior = 62.2%) (Table 4).

Heterotopic Ossification
Heterotopic ossification was reported in TDA patients as a grade (0 through 4) according to the McAfee classification system or as absent versus present. When randomized and non-randomized studies were compared, the presence of heterotopic ossification was significantly different (P<0.01). In randomized studies, the absence of HO was reported in 35 of 100 patients (35%). In non-randomized studies, the absence of HO was reported in 244 of 1,133 patients (21.5%). In randomized studies, grade 1 HO was reported in 26.5% of patients, grade 2 in 33.8% of patients, grade 3 in 16.8% of patients and grade 4 in 11.3% of patients. In non-randomized studies, grade 1 was reported in 22% of patients, grade 2 in 29.5% of patients, grade 3 in 16.6% of patients and grade 4 in 12.5% of patients (Figure 2d). Grade 1 and grade 2 are considered not clinically relevant, while grade 4 is a severe, symptomatic presentation of HO.
Including all studies, a total of 2,762 TDA patients had HO reported as absent or present. The HO absence rate was 56.0% (1,548/2,761), meaning a majority of patients
Both randomized and non-randomized studies reported improved clinical outcome scores for NDI, VAS neck and/or arm pain, and SF-36 PCS or MCS (P<0.001). No randomized studies reported JOA scores; therefore, this comparison was not included. Interestingly, all non-randomized studies had lower post-operative scores for NDI and VAS arm/neck than randomized studies (Figure 5). All clinical outcomes improved significantly from baseline in both TDA and ACDF groups (P<0.001) (Table 5).

Discussion
In the present study, the findings from 62 peer-reviewed manuscripts that reported quantitative data with a minimum follow-up of five years were reviewed and evaluated to assess the overall performance of cervical disc arthroplasty to date. In previous systematic reviews of cervical TDA outcome, only randomized controlled trials were included, resulting in a limited and potentially biased scope of investigation. In contrast, in the present study, by including retrospective and non-randomized studies, we were able to include an additional 57 publications and five thousand additional patients. A number of articles in the orthopaedic literature as well as other medical subspecialties have addressed the potential limitations and shortcomings of including only prospective randomized studies when making evidence-based conclusions 12,[77][78][79].
While the results of the present study do not directly contradict previous systematic reviews comparing TDA and ACDF, our study provides original findings in four different aspects of TDA outcome. First, we were able to compare results of the included prospective RCT studies and the remaining non-randomized studies. From this, we showed the importance of utilizing all available data to understand the clinical outcomes of the general population. Second, as we intended, we were able to assess TDA outcome at a longer follow-up than previous systematic reviews and meta-analyses. Third, our results show a narrower margin of difference in the outcome of patients who were eligible for TDAs, but received either a disc arthroplasty or fusion. Specifically, the rate of secondary surgeries at the index level does not show a significant difference between ACDF and TDA patients. Overall, the success rates in this systematic review show very different results than those of the randomized controlled trials, further validating the need to examine all possible data to gain a broad understanding of implant success in the general population 9. Finally, compared to previous publications, the present study provided a more thorough analysis of the specific complications involved in TDA, such as breaking down reoperations into categories and reporting adjacent segment degeneration by the level affected.

Comparison of RCT to other studies
Most outcomes were significantly different between the reported patient averages of randomized studies and non-randomized studies, with major outcomes showing better success in randomized studies. Specifically, the variables of most interest to this review that showed differences favoring randomized studies were reoperation rates, adjacent segment degeneration, and heterotopic ossification. Differences in individual HO grades were not uniformly significant; however, randomized studies reported more patients with an absence of any HO or with non-clinically relevant HO (grades 1 and 2) and significantly fewer patients with severe HO (grade 4) than non-randomized studies. Further, all secondary surgery rates and the incidence of ASD were significantly lower in randomized studies. These data further support the need for comprehensive analysis of all available studies to gain a broad understanding of potential complications. The use of only prospective, randomized controlled trials may bias the literature and lead to large complications not being further addressed. The variables of most interest from all included studies are further discussed below.

Reoperation Rate
Overall, the combined rates of reoperation for any reason for TDA and ACDF were 5.6% and 7.8%, respectively (P=0.06). However, while many studies included in this systematic review reported significantly lower TDA secondary surgery rates 1,19,21,30,41-43,60,65, many also reported lower rates in ACDF, or nonsignificant differences between the two groups 16,20,23,27,33,43,53,55. This may be due to differences in follow-up times, patient inclusion criteria, or limited ACDF patient data for comparison to TDA. Reoperation rates at the adjacent level were similar between patients with TDA and patients with ACDF (4.8% v. 5.8%, P=0.67) (Table 2). Several of the studies included in the present analysis reported significantly fewer adjacent level surgeries for TDA patients, as compared to ACDF patients 2,29,30,35,41,42,57. In contrast, others reported that there was no difference in adjacent level surgeries 1,16,17,21,23,59. This suggests that the motion-preserving quality of TDA may not reduce the need for adjacent level surgeries, as intended. However, the difference in removal rate at the index level between TDA and ACDF was statistically significant (P=0.04). Further, the revision and supplemental fixation rates were also significantly different between ACDF and TDA, favoring TDA patients (Table 2). This indicates that all additional surgical intervention categories should be compared and assessed when comparing overall outcome of TDA. Accordingly, TDA patients appeared to have an overall favorable reoperation outcome when compared to ACDF patients.
There was some question of validity for reoperation rates as a significant long-term efficacy metric. The decision to operate could be considered highly subjective and dependent on the surgeon. However, this point is often refuted using the fact that reoperation rate is a dichotomous variable that requires significant symptomatic signs to move forward with surgery 33 . To demonstrate the efficacy of reoperation as a metric more studies should be done outside of the context of FDA IDE approval trials to determine the influence of surgical bias.

Preservation of Motion
As expected, range of motion, both at the index level and for the cervical spine as a whole, was larger for patients with TDA when compared to fusion. As C5/C6 and C6/C7 made up over 80% of the data reported, the results of the present review may be more representative of those levels, and range of motion at other levels could differ. Intuitively, fusion surgeries restricted motion at the index level, while TDAs retained almost all preoperative motion. The biomechanical and pathological implications of ROM are still largely unclear, but if it is a priority for the patient to regain full range of motion following surgery, TDA is clearly the better option.

Adjacent Segment Degeneration
There was a significant difference in the incidence of ASD for TDA patients and ACDF patients (26.2% v. 43.9%, P<0.001). This indicated that, as intended, disc arthroplasty appeared to reduce ASD, while fusions tended to increase stresses on adjacent levels. There were some inconsistencies among the included studies regarding the way in which ASD was quantified and reported. Some authors defined ASD as the need for surgical intervention, while others considered it an umbrella term for any postoperative new symptoms which developed at the adjacent level 16,80 . This demonstrated the need for more objective criteria for the evaluation and quantification of adjacent level disorders that develop postoperatively.

Heterotopic Ossification and Bone Adaptation
The overall incidence of HO of any grade was 43.89% for TDA; however, the rate of motion-limiting HO (grade 4) was much lower at 13.84%. This was consistent with findings reported by other investigators 26,30,36,40. Although HO is a common complication of TDA, the impact that it has on clinical outcome is still largely unclear 35. The present review indicated high rates of HO in studies with more than five years of clinical follow-up.
Although difficult to quantify, the preservation of motion using TDA may allow the body to maintain a natural kinematic state after surgery. In contrast, fusion may place constraints on the spine, which could result in an overall lower clinical outcome rating for measures evaluating perceived health and functionality.

Limitations
There were several limitations in the present study. First, the only outcome data included for ACDF came from studies that included ACDF patients as their control group. This may not be representative of the general population of fusion patients, which may include patients who are not candidates for TDA surgery. However, our study was focused on TDA performance, so the use of ACDF patients who were eligible for a TDA may be more appropriate. Additionally, while most studies reported similar categories of data, the way in which data were reported was not entirely consistent. For example, some authors reported overall reoperation rates, while others specified the location and extent of additional surgical procedures. Lack of cervical level-specific outcome in many of the studies is another potential weakness; however, more than 80% of the studies specified that the levels treated were C5/C6 and C6/C7, the most widely indicated levels for TDA treatment. The majority of studies matched treated levels for comparison between ACDF and TDA patients; thus, we can assume most ACDF surgeries were for C5/C6 or C6/C7 as well. Since we did not have the raw data from each study, reported means and standard deviations were used, with the inherent assumption that the general population is normally distributed.

Conclusion
The results of this study demonstrate the importance of including all possible studies and accounting for the potential of financial bias in reported outcomes. By reviewing all mid-to long-term data on cervical disc arthroplasty, this study provided a comprehensive overview of the performance of cervical disc arthroplasty. The results of this study suggest that TDA was successful in the general population at preserving motion, reducing adjacent segment degeneration, and improving overall quality of life, using standardized metrics for reporting.