Introduction

The natural history of a condition represents its evolution over time when untreated [1]. Understanding the natural history is essential to inform the prescription of any intervention aiming to modify the course of the condition while avoiding unnecessary treatments [2].

Scoliosis is a three-dimensional torsional deformity of the spine and trunk [3]. In 80% of cases, it is idiopathic (IS), meaning it cannot be attributed to a specific cause. Adolescent IS is the most common type (prevalence of 2–3%) [3,4,5]. IS progresses with growth, especially during puberty [6]. Not all patients progress at the same rate, and studies report inconsistent results [7,8,9]. A meta-analysis [10] confirmed the heterogeneity of natural history studies, which precludes reliable estimates of progression during growth.

The Bracing in Adolescent IS Trial (BrAIST) [11] was stopped early for ethical reasons, given the clear superiority of bracing over natural history. This result also makes further prospective observation of untreated IS cohorts during growth unlikely. We previously proposed a novel approach to collecting natural history information based on retrospective observation of prospectively collected radiographs [12]. In that paper [12], we described a general progression model covering the whole growth period, which predicted 47% of the observed values with adequate precision (± 5°). The prediction curve showed progressively slower progression over time, consistent with our understanding of natural history after puberty but not with IS progression in other growth periods. To improve on this model, we considered the studies of Duval-Beaupère et al. [13], whose hypothesis is confirmed by clinical experience. They recognised 3 phases of scoliosis progression: a first slow period from discovery to the start of puberty (Point P), a second rapid period up to Risser 3, and a final period of gradual stabilisation until the end of growth. We hypothesised that splitting our data and developing specific models for 3 phases similar to those described by Duval-Beaupère et al. [13] could improve the predictive validity of our earlier model. Therefore, this secondary analysis of the previous data [12] aims to describe the natural history of IS in different growth phases and to test whether a 3-stage model predicts the Cobb angle more accurately than the single-stage model we previously developed [12]. Due to the characteristics of the prospective database used for the study, we studied only the available data: frontal spine radiographic information (Risser, curve type, triradiate cartilage, Cobb angle) and demographic parameters (sex, age).
These prediction models of future curve severity could have significant clinical value in informing treatment decisions and supporting shared decision-making with patients and families.

Materials and methods

Design

We performed a retrospective analysis of the clinical records of the tertiary care clinics of our Institute, which specialises in scoliosis. To obtain natural history information, we evaluated the radiographs taken before the children came to our clinics, provided they had not received any treatment. The local Ethical Committee approved the study. All patients (or their parents, if minors) provided informed consent to a retrospective anonymised analysis of their clinical data. The 2018 Research Grant of the Scoliosis Research Society (SRS) funded the study. This paper presents a secondary analysis of the data reported in another study [12].

Participants

We selected consecutive eligible participants from all the children’s data available in our Institute’s clinical files. We checked whether they had complete treatment history and available radiographic images (collected since October 2008). To increase the number of participants, we recruited new patients prospectively throughout the research project from 2017 to April 2020. We also asked all children to bring the radiographs taken before coming to our Institute and systematically checked their treatment history.

We recruited patients with IS who had at least two standing full-spine coronal radiographs before any treatment: one at our first consultation and at least one other, taken before they came to our Institute while they were still growing (up to European [EU] Risser 4). At their first Institute visit, participants could be still growing or skeletally mature (when the curve is likely stable) because we were interested in predicting curve severity until the end of growth. We set an upper age limit of 25 years at the first consultation to avoid including participants with curve progression after skeletal maturity.

To analyse data according to Duval-Beaupère et al.'s hypothesis [13], we needed to identify the pubertal growth spurt, which was impossible in our retrospective radiographic sample. Consequently, we used skeletal maturity as the best available proxy. We defined three groups: (1) BEFORE: before the growth spurt; (2) AT: from the onset of the growth spurt to EU Risser 2 (corresponding to US Risser 3 [14]); because the pubertal growth spurt occurs at different ages in the two sexes, we split this group into (2a) AT-females and (2b) AT-males; (3) AFTER: from EU Risser 3 to the end of growth. We used the EU Risser score because it was the only skeletal maturity sign visible in the available radiographs. Radiographs make it easy to apply the well-established EU Risser 3 boundary to identify AFTER puberty [11, 14, 15], whereas no radiographic threshold exists to identify BEFORE puberty. According to Duval-Beaupère et al. [13], the first stage of the EU Risser sign appears in the middle of the pubertal growth spurt, which usually starts a few months earlier. For this reason, and because females experience the growth spurt before males, we searched for the age that minimised the number of patients with EU Risser 1 in the female cohort. This age served as the boundary between BEFORE and AT. Of all participants in the previous study [12], we included only those with two radiographs within the timeframe of one of the three groups.
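The grouping rules above can be sketched in Python to make the logic explicit. This is a minimal illustration under stated assumptions, not the study's code: the helper names (`choose_age_boundary`, `growth_group`, `pair_group`) are hypothetical, the 5% prevalence cut-off is an illustrative value (the study chose the boundary by inspecting the prevalence at each age, not by a fixed cut-off), and the age boundary is passed in as a parameter because the text does not state how the BEFORE/AT boundary was set for males.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Radiograph:
    age: float      # age in years when the radiograph was taken
    eu_risser: int  # European Risser stage (0-5)
    sex: str        # "F" or "M"


def choose_age_boundary(risser1_prevalence: dict, cutoff: float) -> int:
    """Return the oldest age reached before EU Risser 1 prevalence first hits `cutoff`.

    `cutoff` is an illustrative assumption, not a value stated in the study.
    """
    boundary = None
    for age in sorted(risser1_prevalence):
        if risser1_prevalence[age] >= cutoff:
            break
        boundary = age
    if boundary is None:
        raise ValueError("EU Risser 1 prevalence exceeds the cut-off at every age")
    return boundary


def growth_group(rx: Radiograph, age_boundary: int) -> str:
    """Assign a radiograph to BEFORE, AT-females, AT-males or AFTER (simplified rules)."""
    if rx.eu_risser >= 3:
        return "AFTER"                 # EU Risser 3 onwards
    if int(rx.age) <= age_boundary:
        return "BEFORE"                # e.g. age 10 or younger when the boundary is 10
    return "AT-females" if rx.sex == "F" else "AT-males"


def pair_group(first: Radiograph, second: Radiograph, age_boundary: int) -> Optional[str]:
    """A pair is usable only if both radiographs fall in the same group."""
    g1 = growth_group(first, age_boundary)
    g2 = growth_group(second, age_boundary)
    return g1 if g1 == g2 else None


# Female EU Risser 1 prevalence by age, as reported in the Results (1,3%, 3,2%, 10,2%).
prevalence = {9: 0.013, 10: 0.032, 11: 0.102}
boundary = choose_age_boundary(prevalence, cutoff=0.05)
print(boundary)  # 10
print(pair_group(Radiograph(10.2, 0, "F"), Radiograph(11.4, 0, "F"), boundary))  # None
```

With the reported prevalences, the sketch places age 10 in BEFORE and age 11 in AT, and a pair straddling the boundary is discarded, as described above.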

Exclusion criteria were prior brace or Physiotherapeutic Scoliosis Specific Exercises (PSSE) treatment, previous injuries, surgery, or pathologies that could have affected the scoliosis appearance or progression between the considered radiographs. We did not exclude patients according to the Cobb angle, provided they had at least one x-ray showing a curve above 10°.

Quality check

Two expert clinicians verified the clinical records of all eligible patients to exclude all those previously treated before they accessed our Institute. We asked for information from the treating physician and then from the patient in case of any doubt. We excluded all patients with uncertain prior treatments. Two evaluators blindly re-measured a random sample of 100 radiographs, confirming adequate measurement accuracy. This result allowed the use of measurements from the clinical records.

Variables

We included all pairs of radiographs from eligible individuals that fell within each group's timeframe, and we considered the earliest radiograph as the baseline. The continuous independent variables were the Cobb angle in degrees (at each observation) and the observation time between the radiographs in years (variable "Time"), also entered as squared and cubic terms. The categorical independent variables, measured at baseline, were EU Risser score, sex, curve type (single thoracic, single lumbar/thoracolumbar, double, and others), and triradiate cartilage (open/closed). We also tested a continuous variable capturing curve severity above the clinically significant 30° threshold [16], computed as the Cobb angle minus 30° and set to 0 for curves of 30° or less.
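The time terms and the above-30° severity variable can be illustrated with a small helper that builds one row of predictors for a radiograph pair. This is a hypothetical sketch: the function and column names are ours, and computing the above-30° term from the baseline Cobb angle is an assumption about how the variable was defined.

```python
def predictor_row(baseline_cobb: float, years: float, eu_risser: int, sex: str,
                  threshold: float = 30.0) -> dict:
    """Build one row of predictors for a radiograph pair (hypothetical column names)."""
    return {
        "cobb0": baseline_cobb,            # baseline Cobb angle in degrees
        "time": years,                     # years between the two radiographs ("Time")
        "time2": years ** 2,               # squared "Time"
        "time3": years ** 3,               # cubic "Time"
        # Degrees above the 30° threshold, floored at 0 (assumed to use the baseline angle)
        "cobb_over_30": max(baseline_cobb - threshold, 0.0),
        "eu_risser": eu_risser,            # baseline EU Risser score (categorical)
        "female": 1 if sex == "F" else 0,  # sex as a 0/1 indicator
    }


print(predictor_row(32.0, 2.0, 0, "F"))
```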

Model derivation and validation

Given the growth pattern of the curve severity data (slow progression, rapid progression, and stabilisation), we derived a prediction model of future Cobb angles for each puberty subgroup. We used linear mixed-effects models with random effects and a variance components covariance structure. Independent variables were chosen based on their clinical and statistical significance. For each subgroup, we selected as the final model the one with the smallest Akaike (AIC) and Bayesian (BIC) Information Criteria.
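Model selection by AIC and BIC relies on the standard formulas AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the maximised log-likelihood. The sketch below ranks candidate models by each criterion using hypothetical log-likelihoods; it does not reproduce the study's software or models. With mixed-effects models, candidates that differ in their fixed effects should be compared using maximum-likelihood rather than REML fits, because REML likelihoods are not comparable across different fixed-effect structures.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def best_models(candidates: dict, n_obs: int) -> tuple:
    """Return the names of the candidates with the smallest AIC and the smallest BIC.

    `candidates` maps a model name to (log_likelihood, n_params).
    """
    best_aic = min(candidates, key=lambda name: aic(*candidates[name]))
    best_bic = min(candidates, key=lambda name: bic(*candidates[name], n_obs))
    return best_aic, best_bic


# Hypothetical fits: extra predictors raise the log-likelihood but cost parameters.
fits = {"model_a": (-1000.0, 5), "model_b": (-995.0, 7), "model_c": (-994.5, 9)}
print(best_models(fits, n_obs=500))  # ('model_b', 'model_a')
```

When the two criteria disagree, as in this toy example, the heavier BIC penalty favours the smaller model; the text does not specify how such disagreements were resolved, so the sketch simply reports both.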

We assessed the prediction accuracy of each model by internal cross-validation, aiming for 5-fold cross-validation. When a group had fewer than 500 participants, we performed as many cross-validation runs as possible with approximately 100 patients per run.
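Because each participant can contribute several radiograph pairs, cross-validation splits are often made at the participant level so that pairs from the same person never appear in both the derivation and the test data. The sketch below assumes such a participant-level split (the text does not specify it) and takes the number of folds as a parameter, since the text fixes it per group (five in AT-females, three in BEFORE and AFTER). The function names are hypothetical.

```python
import random


def participant_folds(participant_ids, n_folds: int, seed: int = 0) -> list:
    """Split unique participant IDs into `n_folds` disjoint, near-equal folds."""
    ids = sorted(set(participant_ids))  # one entry per participant
    rng = random.Random(seed)           # fixed seed for reproducibility
    rng.shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]


def train_test_runs(participant_ids, n_folds: int, seed: int = 0):
    """Yield (training IDs, test IDs) for each cross-validation run."""
    folds = participant_folds(participant_ids, n_folds, seed)
    for i, test in enumerate(folds):
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train, test


# Participant 3 contributes two radiograph pairs but appears in a single fold.
pairs_by_participant = [1, 2, 3, 3, 4, 5, 6, 7, 8, 9]
for train, test in train_test_runs(pairs_by_participant, n_folds=3):
    print(len(train), len(test))
```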

To assess prediction accuracy, we used standard prediction intervals. For clinical purposes, we also used intervals of specified width centred on the values predicted by the model. We set the first interval at the recognised radiographic measurement error (± 5°), the threshold used to define a significant change [14]; we then progressively widened it by ± 5° until the interval contained 95% of the observed values during the cross-validations.
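The interval-widening procedure can be sketched as follows: starting from ± 5°, the half-width grows in 5° steps until the share of observed Cobb angles falling within the interval centred on the prediction reaches the 95% target. The function names and the toy values are hypothetical; in the study, this check was run on the cross-validation predictions.

```python
def coverage(observed, predicted, half_width: float) -> float:
    """Share of observations within ± half_width degrees of their prediction."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equally sized, non-empty sequences")
    hits = sum(abs(o - p) <= half_width for o, p in zip(observed, predicted))
    return hits / len(observed)


def smallest_interval(observed, predicted, target: float = 0.95,
                      step: float = 5.0, max_half_width: float = 60.0):
    """Widen the interval by `step` degrees until coverage reaches `target`.

    `max_half_width` is an arbitrary safety limit for this sketch.
    """
    half_width = step
    while half_width <= max_half_width:
        share = coverage(observed, predicted, half_width)
        if share >= target:
            return half_width, share
        half_width += step
    raise ValueError("target coverage not reached within max_half_width")


observed = [20.0, 25.0, 31.0, 40.0]
predicted = [22.0, 33.0, 30.0, 28.0]
print(coverage(observed, predicted, 5.0))      # 0.5
print(smallest_interval(observed, predicted))  # (15.0, 1.0)
```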

Results

Group selection and sample description

To determine the age boundary between BEFORE and AT, we found 77, 246, and 548 female participants at ages 9, 10, and 11, with an EU Risser 1 prevalence of 1,3%, 3,2%, and 10,2%, respectively. Consequently, we included age 10 in BEFORE and age 11 in AT. The final sample comprised 275 participants (246 females) in BEFORE, 782 in AT-females, 190 in AT-males, and 378 (318 females) in AFTER. Table 1 reports the participants' characteristics in the three subgroups. The average time between radiographs was shorter in AT-females (1,1 ± 0,6 years) than in the other groups (1,4 ± 0,8, 1,3 ± 0,7, and 1,5 ± 1,2 years for BEFORE, AT-males, and AFTER, respectively) (Table 2). Cobb angles increased in all subgroups from baseline to the end of observation (Table 2). Considering individual patients, 48%, 62%, and 49% worsened by 5° Cobb or more in BEFORE, AT, and AFTER, respectively. Some patients improved by 5° Cobb or more: 2% in BEFORE and AFTER and 1% in AT; one patient each in BEFORE and AFTER, and four in AT, improved by 10° Cobb or more.
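The worsening and improvement percentages above classify each patient's change between baseline and end of observation against the 5° measurement-error threshold (and 10° for larger improvements). A minimal sketch with toy values, assuming one baseline and one end-of-observation Cobb angle per patient; the function name is hypothetical.

```python
def change_summary(cobb_pairs) -> dict:
    """Classify per-patient changes (end minus baseline, in degrees)."""
    deltas = [end - baseline for baseline, end in cobb_pairs]
    n = len(deltas)
    return {
        "worsened_5_or_more": sum(d >= 5 for d in deltas) / n,   # share of patients
        "improved_5_or_more": sum(d <= -5 for d in deltas) / n,  # share of patients
        "improved_10_or_more": sum(d <= -10 for d in deltas),    # number of patients
    }


toy = [(20.0, 27.0), (15.0, 14.0), (30.0, 19.0), (25.0, 30.0)]
print(change_summary(toy))
```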

Table 1 Characteristics of the subgroups for sex, European Risser, curve type and Cobb degrees
Table 2 Baseline and end of observation data of the main variables studied to develop the models

Prediction models

The predictors of future Cobb angle included in the best predictive models were similar across groups (Table 3). They included the baseline Cobb angle, the observation time between baseline and prediction (variable "Time", entered as linear, squared, and cubic terms), and the EU Risser score (in AT and AFTER). Sex was a predictor only in AFTER. Predictors negatively associated with predicted curve severity were a higher EU Risser score (in AT and AFTER), squared "Time" (in AT and AFTER), cubic "Time" (in BEFORE), and female sex (in AFTER). The linear "Time" coefficient was largest in AT (6,88 versus 0,82 and 1,78 for BEFORE and AFTER, respectively). Neither curve type nor triradiate cartilage was retained in the final models.

Table 3 Prediction models in the three groups

Prediction accuracy in internal cross-validations

Subgroup size allowed three cross-validation runs in BEFORE and AFTER and five in AT-females. We tested the final AT-females model on the AT-males data. The median prediction error ranged from 2,9 to 4,0 in BEFORE, AT-females, and AFTER but reached 14,7 when the AT-females model was applied to males (Table 4). Considering the classical radiographic measurement error of ± 5°, the accuracy of the final models (the proportion of predictions within 5° of the observed values) was 74% (BEFORE), 72% (AFTER), 68% (AT-females), and 60% (AT-males). We reached the target of almost 95% of observed values within ± 15° of the prediction for the groups with faster progression (BEFORE, AT-females, and AT-males) and within ± 10° for the slowest-progressing group (AFTER).

Table 4 Accuracy of future Cobb angle prediction using the models developed in the three puberty groups. 25th = 25th percentile; 75th = 75th percentile
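Table 4 summarises prediction errors by their median and 25th and 75th percentiles. Assuming the error is the absolute difference between observed and predicted Cobb angles (the text does not state whether signed or absolute errors were used), these summaries can be computed with Python's standard `statistics` module; the inclusive quantile method used here is one of several conventions and may differ from the study's software. The function name is hypothetical.

```python
import statistics


def error_summary(observed, predicted) -> dict:
    """Median and quartiles of absolute prediction errors, in degrees."""
    errors = [abs(o - p) for o, p in zip(observed, predicted)]
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return {"25th": q1, "median": median, "75th": q3}


print(error_summary([11.0, 22.0, 33.0, 44.0, 55.0], [10.0, 20.0, 30.0, 40.0, 50.0]))
```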

Discussion

Understanding the natural history of idiopathic scoliosis is essential for selecting interventions that modify the course of the disease while avoiding unnecessary treatments. There is general agreement that the risk of progression changes with the growth phase. Duval-Beaupère et al. [13] hypothesised three different curve progression slopes during growth, as confirmed by clinical experience. We aimed to verify whether, following this assumption, we could build IS curve progression prediction models more precise than the overall model we developed previously [12]. Our new models proved more accurate: the prediction accuracy within the accepted radiographic measurement error of ± 5° improved from 47% (OVERALL) to 74% (BEFORE), 72% (AFTER), 68% (AT-females), and 60% (AT-males). These results also suggest that a substantial portion of the variance in scoliosis progression remains individual and depends on non-radiographic factors.

As expected, AT puberty was the period with the largest increases in curve severity. In AFTER, EU Risser 3 did not correspond to the end of progression, as usually proposed [15]; this result could be particularly relevant for severe curves in clinical practice. Notably, we studied AFTER patients from EU Risser 3 [17], which corresponds to US Risser 4; consequently, progression is possible even after what is usually considered skeletal maturity [15]. Although curve type has been related to progression [9], the final models did not retain the topographical curve classification. A sex difference was present only in AFTER, with females progressing less than males. These results need confirmation in further studies. Nevertheless, given the higher prevalence of IS in females, they could explain the general assumption that scoliosis progression stops at EU Risser 3. In AT, we tested the accuracy of the model developed for females on male data. The results suggest that pubertal progression in males is delayed but follows a similarly predictable pattern. The lower accuracy nevertheless calls for caution in applying the model to males, and future studies should focus on the evolution of scoliosis in males AT puberty.

An interesting and surprising result concerns radiographic improvements over time exceeding the classical measurement error threshold of 5°. Older publications, from a time when repeating radiographs at short intervals raised no ethical concerns, showed results consistent with ours [18]. This stresses the need for caution when interpreting radiographic measures and for complementing them with repeatable clinical data collection to allow proper decision-making.

Comparison with previous publications

Di Felice [10] performed a meta-analysis of the literature on the natural history of idiopathic scoliosis published until 2018, covering 16 papers and 4083 patients. The meta-analysis found that infantile IS curves between 15 and 35°, with 3 to 12 years of follow-up, progressed in 49% of cases (95% Confidence Interval (95CI) 1–97%). No studies reported on juvenile IS alone, only mixed with adolescent IS: curves below 25° progressed in 49% of cases (95CI: 19–79%) by 2,2° to 9,6° per year over 1 to 5,5 years. Finally, adolescents with IS aged 11,1 to 13,8 years and curves from 19° to 30° progressed in 42% of cases (95CI: 11–73%) over follow-ups of 7 months to 4 years. The high heterogeneity suggests caution and the need for additional studies. We obtained our results from a large sample of 1563 untreated patients, nearly 40% of all patients previously studied in the literature. Compared with Di Felice [10], we (1) did not have data on infantile IS; (2) described juvenile IS in detail for the first time (even though we included age 10 in this group); and (3) split adolescent IS into two growth phases with entirely different progression rates. Of note, instead of age 10 as the classical starting age for AIS [3], we used age 11 after a specific check. This decision increased the numbers in BEFORE and improved the validation process in this age group.

Lonstein and Carlson [9] reported one of the most widely cited models. They retrospectively observed 727 patients with juvenile and adolescent IS aged 12,5 years (< 10 to 19) for 25 months (12–88) and found progression in 23,2%. They followed the patients until progression or the end of growth. Unfortunately, their model was never internally or externally validated. They also produced a regression equation for patients likely to progress by more than 5°, including the most important parameters: Cobb angle, EU Risser score, and age. These predictors are consistent with ours, although we also found an association with sex in AFTER only.

Many studies have reported that single lumbar (and thoracolumbar) curves have a lower risk of progression [10]. Another study described the expected curve progression by curve type, in decreasing order: triple 9,1°/yr, thoracic 8,7°/yr, double 6,8°/yr, lumbar 3,8°/yr, and thoracolumbar 3,7°/yr [13]. Similarly, we found that single lumbar and thoracolumbar curves progressed less than the others, although the final models did not retain curve type as a predictor because it did not improve our results.

Recently, Dolan [19] developed a model on a subgroup of 115 untreated adolescents from the BrAIST study (curves 20–40°, age 10–12, EU Risser 0–2 at the start). The authors validated the model in a mixed group, including some BrAIST patients braced less than 6 h per day and two samples recruited at other times from Institutes that participated in the BrAIST study. They used logistic regression to predict a different outcome: curve progression to 45° or surgery before EU Risser 4. Their final model included the Cobb angle, curve type, and skeletal maturity evaluated according to Sanders [20]; they assessed but did not retain age, sex, and triradiate cartilage. Because Dolan followed an untreated group until the end of growth, the authors could consider the probability of reaching the surgical threshold, one of the primary treatment outcomes [15]. Our prediction models, with the Cobb angle as the outcome measure, address a different but highly relevant need [14]: to determine the risk of progression at a specific growth phase.

In our previous study [12], we derived and validated a prediction model from the whole sample of 2317 patients, compared with 1563 in this secondary analysis. The difference is due to the exclusion of subjects whose pairs of radiographs spanned different puberty groups. While reducing heterogeneity, the subgrouping also reduced the sample size, which could affect the reliability of the analysis, given that we kept the same number of predictors. The increased accuracy of the current models could be due to the decrease in heterogeneity or to the reduced number of cross-validations, with the associated risk of overfitting [12]. Moreover, our growth-phase models cover a shorter prediction timespan than the overall model: 2–3 years for the puberty subgroups instead of 5 years for the OVERALL model (Fig. 1). Therefore, although this strategy enhanced accuracy and clinical usefulness, we are aware of the potential risk of overfitting. External validation will help determine the clinical value of the equations.

Fig. 1

Comparison between the three curve severity prediction models developed in this study from specific puberty periods and the overall model developed in the same research project from patients across the whole growth spectrum. Hypothetical case starting at 25°. The length of each model's prediction line corresponds to the mean interval plus one standard deviation between the included radiographs. The general model slightly underestimates pubertal progression and overestimates progression at all other ages, possibly because of the higher prevalence of pubertal cases. Nevertheless, the overall model gives a longer prediction because of the longer intervals between the included radiographs

The differences among the three growth periods are consistent with clinical experience and previous research. Scoliosis manifests with various curve types, curve magnitudes, and clinical characteristics, but the present design could consider only radiographic parameters, age, and sex. The high variance we found reflects the complexity of scoliosis and highlights the need for further studies accounting for additional features that may predict future scoliosis severity, ideally with models developed in larger cohorts and subjected to internal and external validation.

Clinical and educational use of the models

The proposed models can be helpful clinically and for educational purposes. Clinicians can estimate the progression anticipated for each individual with IS. For example, for a girl, 12 years of age, with EU Risser 0 and a 32° primary curve, the expected Cobb angles after 1 and 2 years would be:

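  • AT regression equation: $\widehat{\mathrm{Cobb}} = -1{,}01 + 1{,}12\,\mathrm{Cobb}_0 + 6{,}88\,t - 1{,}81\,t^2 + 0{,}35\,t^3 - 1{,}85\,R_1 - 2{,}53\,R_2$, where $\mathrm{Cobb}_0$ is the baseline Cobb angle (°), $t$ the years between observations, and $R_1$ ($R_2$) equals 1 if the baseline EU Risser score is 1 (2) and 0 otherwise.

  • Prediction at 1 year: $-1{,}01 + (32 \times 1{,}12) + (1 \times 6{,}88) - (1^2 \times 1{,}81) + (1^3 \times 0{,}35) - (1{,}85 \times 0) - (2{,}53 \times 0) = -1{,}01 + 35{,}84 + 6{,}88 - 1{,}81 + 0{,}35 = 40{,}25^\circ$.

  • Prediction at 2 years: $-1{,}01 + (32 \times 1{,}12) + (2 \times 6{,}88) - (2^2 \times 1{,}81) + (2^3 \times 0{,}35) - (1{,}85 \times 0) - (2{,}53 \times 0) = -1{,}01 + 35{,}84 + 13{,}76 - 7{,}24 + 2{,}80 = 44{,}15^\circ$.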
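The AT regression equation above translates directly into a small function, which is convenient for checking the worked example. This is a sketch: the function name is hypothetical, the coefficients are those of the AT equation reported here (written with decimal points), and the input checks encode the EU Risser 0–2 range of the AT group and the two-year limit recommended in this section for AT predictions.

```python
def predict_at_cobb(baseline_cobb: float, years: float, eu_risser: int) -> float:
    """Predicted Cobb angle (degrees) with the AT regression equation.

    The AT group spans the growth spurt up to EU Risser 2; the model was derived
    in females (its accuracy was lower when applied to males).
    """
    if eu_risser not in (0, 1, 2):
        raise ValueError("the AT model covers EU Risser 0-2 only")
    if not 0 < years <= 2:
        raise ValueError("AT predictions should be limited to 2 years")
    risser_term = {0: 0.0, 1: -1.85, 2: -2.53}[eu_risser]
    return (-1.01
            + 1.12 * baseline_cobb
            + 6.88 * years
            - 1.81 * years ** 2
            + 0.35 * years ** 3
            + risser_term)


# Worked example: 12-year-old girl, EU Risser 0, 32° primary curve.
for t in (1, 2):
    print(t, round(predict_at_cobb(32.0, t, 0), 2))  # prints "1 40.25", then "2 44.15"
```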

The probability that the prediction is correct depends on the accuracy reported in Table 4 for each growth group. Table 5 reports the probability of progression for this hypothetical case; it also shows how these data could be presented to patients and their families to help them understand the risk of progression in their specific case.

Table 5 Probability of progression and expected curve severity after 1 and 2 years for a 12-year-old girl with European Risser 0 and a 32° primary curve

Clinicians could base the discussion with the patient and parents on Table 5 in the appendix. In this specific case, both predicted future severities are substantial and would help the clinician counsel the family to consider effective, aggressive conservative treatments [3, 21]. Moreover, because this prediction model and the BrAIST one [19] focus on different outcomes, clinicians could combine them to support counselling and final informed decisions. Clinicians should limit individual predictions based on our models to two years for BEFORE and AT and three years for AFTER, for two main reasons: these limits are based on the average time between the pairs of radiographs of the participants included in the development of each model (Table 2), and longer horizons would bring the prediction outside the puberty phase covered by each model.

Another important use of our results is developing educational tools to help patients, families, and students understand how scoliosis severity evolves over time. Fig. 2 shows the potential evolution throughout growth of three curves identified at age seven and of three cases starting with the same Cobb angle at different ages. We built these graphs from a series of predictions in which the end of one prediction became the starting point of the next. In line with the results, we used projections over two years for BEFORE and AFTER and over one year for AT. This approach necessarily amplifies the prediction error, and confidence in the results decreases as the projection extends further. Consequently, such graphs should not be used to predict single-patient outcomes. Still, they can serve as an educational tool to explain the natural history of IS and improve our overall understanding.
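The chaining used for these graphs, where the end of each prediction becomes the next starting point, can be sketched as below. For illustration, the sketch chains one-year steps of the AT regression equation only and keeps EU Risser fixed at 0; both are simplifying assumptions (the graphs switch between the BEFORE, AT, and AFTER models, whose coefficients are in Table 3, and skeletal maturity advances over time). The function names are hypothetical.

```python
def at_one_year_step(cobb: float) -> float:
    """One-year AT prediction with EU Risser held at 0 (simplifying assumption)."""
    t = 1.0
    return -1.01 + 1.12 * cobb + 6.88 * t - 1.81 * t ** 2 + 0.35 * t ** 3


def chain_predictions(start_cobb: float, n_steps: int, step) -> list:
    """Feed each prediction back in as the next baseline."""
    trajectory = [start_cobb]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1]))
    return trajectory


path = chain_predictions(32.0, 2, at_one_year_step)
print([round(c, 2) for c in path])  # [32.0, 40.25, 49.49]
```

Two chained one-year steps give about 49,49°, whereas the direct two-year AT prediction in the worked example is 44,15°. Because the baseline coefficient (1,12) exceeds 1, each step amplifies the previous estimate together with its error, which is one way to see why chained curves suit education rather than individual prediction.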

Fig. 2

Two hypothetical overall curve severity evolution models for idiopathic scoliosis until the end of growth. A: three patients with scoliosis identified at age seven with curves of 15°, 25° or 35°. B: three other patients with a scoliosis curve of 30° identified at age 6, 11 (EU Risser 0) or 14 (EU Risser 3 - female)

Strengths and limitations

This study represents the largest cohort for prediction within specific age subgroups. We performed a rigorous case selection process with quality checks. We proposed a new approach to studying the natural history of IS because a prospective follow-up of untreated patients would be unethical: it would require withholding treatment, now proven effective, even when it became indicated [3, 21]. This type of study would benefit from collaborative data collection in the future, i.e., multiple centres collecting radiographs of patients who arrive at their observation without previous treatment. Other possibilities include patients refusing treatment (acknowledging they may have characteristics leading to poorer outcomes) or populations without access to treatment.

There are some limitations. The retrospective observational data collection did not allow gathering information other than radiographic data. Particularly important is the lack of data on patients' actual growth spurt and height when the radiographs were taken: the typical growth velocities of the three phases would have further justified the three skeletal maturation stages we employed. The study examined only radiographic predictors, while other essential predictors [9] were unavailable in our database. Many patients had short observation times in each model, which was unavoidable given the design; consequently, the predictions are also limited to short time horizons. Splitting the population according to growth period reduced the number of participants in each model, increasing the risk of overfitting. Because we included only patients who attended a second observation, we may have overestimated the probability of increases in curve severity. Nevertheless, this also reflects clinical reality, since scoliosis is frequently not diagnosed until it reaches a visible threshold. Moreover, not all the patients we observed progressed during the observation period.

Conclusion

Understanding the prognosis of scoliosis is essential for clinical decision-making. Our study produced accurate curve severity prediction models tailored to growth spurt periods, easy to use in clinics, and useful for education and research. Future collaborative efforts, possibly combining clinical and radiographic data, could allow the derivation and validation of more accurate clinical prediction rules to improve prediction accuracy further and explain a larger portion of the variance.