Introduction

An estimated 15 million infants are born preterm (less than 37 weeks of completed gestation) each year worldwide, of whom approximately 1 million do not survive the first year of life1,2,3. Complications of preterm birth (PTB) can lead to many short- and long-term medical conditions in the child, including respiratory distress syndrome, chronic lung disease, cardiovascular disorders, asthma, and loss of hearing and vision4. Consequently, the economic costs and public health burden of PTB are substantial. Yearly costs have been estimated at £2.946 billion in the UK5 and $8 billion CAD in Canada6, where PTB has been linked to two-thirds of infant deaths7. Compared to the UK, Canada, and Western European countries, the US has a higher rate of PTB (10.02% in 2018)8,9, with annual costs of $26.2 billion10.

In light of the potential to reduce infant and childhood morbidity and mortality, reduction of PTB is a major public health priority11. Accurate and reliable prediction of PTB would enable preventative interventions to diminish morbidity and mortality12, thus benefitting pregnant women, neonates, healthcare systems, and society as a whole. However, recent studies have shown that prediction of PTB is a challenging task, with prediction models to date yielding an area under the receiver operating characteristic curve (AUC) of around 70%13.

Machine learning techniques present the potential for improved prediction of PTB. Use of machine learning has grown rapidly in a wide range of medical disciplines including cancer prediction, cardiovascular diagnostics, image analysis14,15, and maternal postpartum complications16, and machine learning was highlighted as one of the most important advances of 2019 for prenatal diagnosis17. The main advantage of machine learning methods over other methods such as linear regression is the possibility of relaxing assumptions regarding multicollinearity, additivity, and distribution18.

Machine learning can be conducted using multiple types of learning algorithms, among which decision trees, random forests, and artificial neural networks are commonly used in medical disciplines19. Decision trees constitute the most straightforward algorithm and provide a visual representation of the relation between predictors and outcome variables. However, individual decision trees can yield high-variance results; in some cases this can be improved by using random forests, which aggregate the results of randomly generated decision trees to produce a more accurate model20. Artificial neural networks present a third option that has been broadly used in medical studies21 and performs well when complex and non-linear associations exist between variables22. Briefly, artificial neural networks take predictors as inputs and connect them through one or more hidden layers, assigning suitable weights to the connections, to predict the outcome23.
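To make the description of artificial neural networks concrete, the following minimal Python sketch computes the output of a network with a single hidden layer. The function names, the logistic (sigmoid) activation, the layer sizes, and all weights are illustrative assumptions and do not reflect the architecture fitted in this study.

```python
import math


def sigmoid(x):
    """Logistic activation mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Return a predicted probability from one hidden layer of sigmoid units."""
    # Each hidden unit forms a weighted sum of all predictors plus a bias.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output unit combines the hidden activations in the same way.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical example: two predictors, two hidden units, arbitrary weights.
probability = forward(
    inputs=[1.0, 0.0],
    hidden_weights=[[0.5, -0.2], [0.1, 0.8]],
    hidden_biases=[0.0, -0.1],
    output_weights=[1.2, -0.7],
    output_bias=-0.3,
)
```

In practice the weights are not chosen by hand but are estimated from the training data; the sketch only shows how fixed weights turn predictors into a predicted probability.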

This study identified predictors of PTB in a large cohort of multiparous women using multiple state-of-the-art machine learning algorithms and compared results obtained from these algorithms with those from traditional logistic regression analyses. Predictors of PTB that have not been commonly considered in prior prediction models, such as proteins used for screening for placental diseases and Down Syndrome, were included. In addition, the predictive models were developed using training datasets and then tested using validation data, as recommended by guidelines24,25.

Methods

Data and population

A population-based retrospective cohort study was performed using data from Ontario’s Better Outcomes Registry and Network (BORN)26. A description of the reliability of BORN to provide accurate data is provided by Miao et al.26 In the present study, multiparous women with a singleton birth at 20–42 weeks’ gestation delivered in an Ontario hospital between April 1, 2012, and March 31, 2014, were included. The primary outcome, PTB, was defined as gestational age at birth less than 37 weeks. Spontaneous PTB was considered as a secondary outcome and was defined as a preterm birth not categorized as “induced”, “caesarean section”, or “augmented labor”27.

The potential predictors of PTB available in the BORN database were identified based on knowledge to date regarding the etiology of PTB11,28. Potential predictors considered for first-trimester models included maternal age, height, pre-pregnancy body mass index (BMI), pre-existing physical health conditions (Table S1), pre-existing mental health conditions (Table S2), socioeconomic variables (Table S3), smoking status, alcohol consumption during pregnancy, folic acid use, gravidity, number of prior abortions (including miscarriages), number of prior PTBs, number of prior term births, number of prior vaginal births, number of prior caesarean births, history of vaginal birth following caesarean section, history of stillbirth, gestational weight gain during the first trimester, diabetes, use of assisted reproductive technologies, and antenatal health care provider. Pregnancy-associated plasma protein-A concentration and the free beta-subunit of human chorionic gonadotropin, which are potential markers of placental diseases including preeclampsia29, were also considered, as were nuchal translucencies30.

In second-trimester models, all of the variables from the first trimester were considered. In addition, the protein concentrations of dimeric inhibin-A32, unconjugated estriol33, human chorionic gonadotropin34, and alpha-fetoprotein35 were added. Intention to breastfeed, attendance at first-trimester appointments, hypertensive disorder of pregnancy, gestational diabetes, infection, medication exposure, fetal sex, and pregnancy complications31 were also included as potential predictors of PTB during the second trimester. Complications of pregnancy contained more than 600 categories, which were combined into three groups based on the expertise of our in-house maternal–fetal specialist: no complications, moderate complications, and severe complications (including hypertensive disorders, placental abnormalities, and maternal problems during the pregnancy, such as antepartum hemorrhage). Biomarker concentrations and nuchal translucencies were also categorized into three groups: normal, abnormal, and missing (see Table S4 for cutoffs).

Descriptive analyses

The associations between predictors and outcome variables were measured by applying chi-square and t-tests for categorical and continuous predictors respectively, with statistical significance defined as p < 0.05. Then crude odds ratios (ORs) between individual predictors and the primary and secondary outcomes were estimated using univariate logistic regression.
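For a single binary predictor, the crude OR from univariate logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, and a Wald confidence interval follows from the standard error of the log OR. The Python sketch below illustrates this calculation; the function name and the counts in the usage example are hypothetical and are not taken from the BORN data.

```python
import math


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2 x 2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2 x 2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed women deliver preterm.
odds_ratio, lower, upper = crude_odds_ratio(20, 80, 10, 90)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

All four cells must be non-zero for this calculation; predictors with more than two categories or continuous predictors require fitting the regression model itself.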

Model selection and evaluation

First, the whole cleaned data set was divided into two portions: two-thirds of the data were used for model building (training), while the remaining one-third were used for model testing (validation). For each of the algorithms, tenfold cross-validation was applied to the balanced training data (balanced by random oversampling36,37) to identify the optimal model, defined as the one producing the highest area under the receiver operating characteristic curve (AUC). Finally, model performance was assessed in the validation data using sensitivity, specificity, positive predictive value, negative predictive value, and AUC38.
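The partitioning and balancing steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the code used in the study: the function name, the simple (non-stratified) random split, the seed, and the example labels are all hypothetical. The key point it shows is that only the training portion is oversampled, so the validation data keep the original class distribution.

```python
import random


def split_and_oversample(labels, train_fraction=2 / 3, seed=0):
    """Return (balanced training indices, validation indices).

    labels: list of 0/1 outcomes (1 = preterm birth).
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    # Hold out the validation portion before any balancing.
    n_train = round(len(indices) * train_fraction)
    train, valid = indices[:n_train], indices[n_train:]

    positives = [i for i in train if labels[i] == 1]
    negatives = [i for i in train if labels[i] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    if not minority:
        raise ValueError("training portion contains only one class")

    # Random oversampling: resample minority cases with replacement
    # until both classes are equally frequent in the training data.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = train + extra
    rng.shuffle(balanced)
    return balanced, valid


# Hypothetical outcome vector with a 6% event rate.
example_labels = [1] * 60 + [0] * 940
balanced_train, validation = split_and_oversample(example_labels)
```

Because duplicated minority cases can fall into different cross-validation folds, performance estimated within the balanced training data tends to be optimistic, which is one reason the untouched validation portion is needed.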

Machine learning algorithms were executed using R software (version 3.5.2) and the caret39 package. The receiver operating characteristic (ROC) curves and AUCs were obtained using the pROC package in R. Variable selection for the machine learning models was performed using the Boruta package40. Boruta computes the importance of each variable as a measure of predictive ability: the higher the importance of a variable, the more predictive power it contributes to the model41. For missing values of prediction variables, ten imputations were produced by multiple imputation by chained equations42 using the R package MICE43, and missing values were substituted with the average and the mode for continuous and categorical variables, respectively. Variable selection for the logistic regression analyses was performed using the stepAIC function in the MASS package.
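The AUC can be interpreted as the probability that a randomly chosen preterm case receives a higher predicted score than a randomly chosen term case, with ties counted as one half. The Python sketch below computes the AUC directly from this definition; it is an illustration of the quantity, not the pROC implementation, and the function name and example scores are hypothetical.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: predicted risks; labels: 0/1 outcomes (1 = event).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute the AUC")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # event ranked above non-event
            elif p == n:
                concordant += 0.5   # ties contribute one half
    return concordant / (len(positives) * len(negatives))


# Hypothetical scores: three of the four event/non-event pairs are ranked correctly.
example_auc = auc_from_scores([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 0.75
```

The pairwise loop scales with the product of the two class sizes, so rank-based implementations are preferable for cohorts of the size analyzed here.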

Consent to participate

Informed consent was obtained from all subjects and/or their legal guardian(s).

Ethics approval

This study followed the guidelines for the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis25. The Hamilton Integrated Research Ethics Board approved the study before its commencement (approval # 14-714-C).

Results

The cohort included 145,846 births, of which 8125 (5.57%) were preterm (Table 1). Nearly 30% of the study population was over 35 years of age. The mean maternal pre-pregnancy BMI was 25.9 ± 6.6 kg/m², and the mean maternal height was 163.6 ± 7.3 cm. Twenty-one percent of women had at least one medical condition and approximately 40% had at least one previous abortion (including miscarriages) (Table S5). Of those patients who had experienced prior abortions (including miscarriages), 4.9% had more than three. Overall, 2.0% of pregnant women had a history of stillbirth, while 8.6% had a history of PTB and 24.8% had a previous caesarean section.

Table 1 Distribution of preterm birth and maternal baseline demographic and clinical variables in a population-based cohort study for prediction of preterm birth in multiparous women.

Univariable analysis

Twenty-nine potential first-trimester predictors of PTB were analyzed (Table S6). Patients who were older than 35 years of age, shorter than 160 cm, obese prior to pregnancy, smokers, patients who conceived using ovulation induction including IVF, and those who had prior medical conditions including diabetes or low pregnancy-associated plasma protein-A concentrations were more likely to experience PTB than patients without these conditions. Furthermore, patients with a history of abortion, PTB, or caesarean section had a higher likelihood of experiencing PTB than patients who did not have a history of such conditions. Results of the univariate analysis for second-trimester predictors are shown in Table S7. Patients who did not attend their first trimester clinical evaluation, failed to enroll in prenatal education programs, or had abnormal protein (pregnancy-associated plasma protein-A, dimeric inhibin-A, unconjugated estriol, human chorionic gonadotropin) levels were more likely to experience PTB than a term birth. Furthermore, patients pregnant with a male baby, having a hypertensive disorder, or experiencing severe complications during pregnancy were at a higher risk of PTB than term birth.

Multivariable analysis

Stepwise logistic regression identified 18 significant first-trimester predictors of PTB (Fig. S1). Among these, the highest adjusted OR (aOR) was seen for prior PTB (aOR for one previous PTB: 3.91; 95% confidence interval (CI) 3.52–4.33; aOR for two previous PTBs: 4.33; 95% CI 3.59–5.20; aOR for three or more previous PTBs: 6.37; 95% CI 4.48–8.94). Diabetes and abnormal pregnancy-associated plasma protein-A concentrations had the second highest aORs. Overall, women were at increased risk of PTB if they were shorter than 160 cm, were underweight (< 18.5 kg/m²), had a lower education level, had a history of health concerns, miscarriage, or caesarean section, or experienced excess gestational weight gain. In contrast, patients with a history of term birth were less likely to give birth preterm.

In the second-trimester model, stepwise logistic regression identified 23 significant predictors of PTB (Figs. S1, S2), including all 18 predictors identified in the first-trimester model. The five additional predictors identified were fetal sex, attendance at the first-trimester clinical evaluation, pregnancy complications, medication exposure, and abnormal alpha-fetoprotein concentration. Severe complications during pregnancy exhibited the highest adjusted OR (12.37; 95% CI 11.61–13.18) in the second-trimester model. Patients who were pregnant with a male fetus were also at a higher risk of experiencing PTB. Conversely, patients exposed to medications, including vitamins and herbal supplements, were less likely to experience PTB.

Machine learning (Boruta method) identified 24 predictors of PTB during the first trimester (Table S8). As with logistic regression analyses, machine learning methods found that a history of PTB was the strongest predictor of subsequent PTB. However, in second-trimester models, machine learning methods did not identify fetal sex, prior abortions (including miscarriages), or maternal height as important predictors of PTB (Table S9). As with logistic regression analyses, complications during pregnancy, hypertensive disorder of pregnancy, and prior PTB were the most important predictors identified by machine learning.

Performance of prediction models in the validation set

Among first-trimester models, the random forests model had the highest sensitivity, specificity, and AUC in training samples (Table S10). However, in the testing data, artificial neural networks yielded the highest AUC (68.8%, 95% CI 67.6–70.1%), sensitivity (54.5%, 95% CI 52.4–56.7%) and negative predictive value (96.6%, 95% CI 96.3–96.8%) (Fig. 1, Table 2).
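For reference, the four threshold-based measures reported for the testing data are all derived from the confusion matrix of predicted versus observed outcomes. The Python sketch below shows the definitions; the function name and the counts in the example are hypothetical and do not correspond to the values in Table 2.

```python
def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts.

    tp: true positives    fp: false positives
    fn: false negatives   tn: true negatives
    """
    return {
        "sensitivity": tp / (tp + fn),  # events correctly flagged
        "specificity": tn / (tn + fp),  # non-events correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are events
        "npv": tn / (tn + fn),          # cleared cases that are non-events
    }


# Hypothetical testing data: 100 events and 900 non-events.
metrics = classification_metrics(tp=50, fp=200, fn=50, tn=700)
```

Unlike sensitivity and specificity, the positive and negative predictive values depend on how common the outcome is in the data being evaluated, so they are best compared between populations with similar event rates.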

Figure 1

Comparison of prediction models for preterm birth in multiparous women during the first and second trimesters.

Table 2 Predictive power and 95% confidence intervals of models for preterm birth in multiparous women during the first and second trimesters in testing data.

The AUC for both machine learning methods and logistic regression models rose substantially with inclusion of second-trimester predictors, with logistic regression yielding the highest AUC (80.5%, 95% CI 79.6–81.5%). The notable rise in AUC between first- and second-trimester models stemmed largely from the addition of pregnancy complications. Accordingly, a sensitivity analysis without this variable was performed. This analysis yielded an AUC of 72.1% (95% CI 71.1–73.1%) (Fig. 2). Among the second-trimester models, artificial neural networks yielded the highest AUC, while also maintaining a high negative predictive value of 97.4% (95% CI 97.3–97.6%). However, both artificial neural networks and random forests exhibited overfitting (high performance in training data but low performance in testing data).

Figure 2

Comparison of prediction models for preterm birth in multiparous women during the second trimester, without inclusion of data on pregnancy complications.

Prediction of spontaneous PTB

In first-trimester models predicting spontaneous PTB, logistic regression and artificial neural networks yielded similar AUCs (71.0% and 70.9%, respectively) (Fig. S3). However, in second-trimester models, the AUC was slightly higher for artificial neural networks (73.8%, 95% CI 72.3–75.3%) than for logistic regression (70.8%, 95% CI 69.1–72.4%) and other methods (Fig. S4). Artificial neural networks and logistic regression both yielded negative predictive values for spontaneous PTB of approximately 97% in first- and second-trimester models (Table S11).

Discussion

Main findings

Using data from a population-based retrospective cohort of multiparous women, a set of prediction models for PTB was developed and validated. These models demonstrated moderate predictive power44,45, with an AUC of 68.8% in the first trimester. This study identified 18 first-trimester predictor variables, among which history of PTB, diabetes, and abnormal pregnancy-associated plasma protein-A concentrations were the strongest predictors. In second-trimester models, four additional significant predictors were identified (infant sex, antenatal care in the first trimester, medication exposure, and abnormal alpha-fetoprotein concentrations), resulting in a maximum AUC of 72.1%, which is considered an acceptable prediction accuracy38,45. Inclusion of data on complications during pregnancy yielded an AUC of 80.5%, which is indicative of an acceptable prediction model39,46.

For both the first and second trimesters, our models for overall and spontaneous PTB yielded negative predictive values higher than that of fetal fibronectin, whose negative predictive value was identified as 93% in a recent systematic review46. Moreover, our second-trimester models generated using artificial neural networks yielded sensitivity and specificity similar to the overall sensitivity (58%) and specificity (84%) of fetal fibronectin46.

Strengths and limitations

This study has several important strengths. First, logistic regression and three of the most commonly used machine learning approaches were applied to predict PTB. Such an approach contrasts with that used in previous studies, which have primarily focused on one or two machine learning methods47,48,49,50,51,52,53. Moreover, new prediction variables from both the first and second trimesters were considered, which have not been consistently included in previous studies8,9,10,11,12,13,14,48,49,50,51,54,55,56,57,58,59. The proposed models yielded higher negative predictive values than fetal fibronectin for the prediction of PTB46. Accordingly, eventual use of these models in clinical practice could reduce non-essential hospitalization and other interventions, in turn reducing costs and hospital staff burden60.

An additional strength of this study is the use of a relatively large cohort compared to previous studies8,9,10,11,12,13,14. In addition, prediction variables that have not been considered by previous studies, including plasma proteins and first-trimester gestational weight gain, were examined. As suggested by Courtney et al.61, prenatal factors such as maternal health behaviors and medical history were also included in our study. Finally, previous studies provided little information on how the issue of imbalanced classes was addressed8,9,10,11,12,13,14; in this study, class imbalance was handled using random oversampling.

Despite using state-of-the-art machine learning algorithms, the predictive power of the proposed models was limited, and the AUCs varied from 69 to 73% when information on pregnancy complications was not included. The inclusion of information on fetal fibronectin and ultrasonography (cervical length), which are important predictors of PTB62, as well as data on preventative measures used by patients with high-risk pregnancies63,64, may improve predictive power. However, it should be noted that progesterone was rarely administered in Ontario during the timeframe of the data used in our study65.

Conclusion

In a large cohort of multiparous women, machine learning (artificial neural networks) and logistic regression methods yielded generally similar accuracy for the prediction of PTB. However, for spontaneous PTB, artificial neural networks provided slightly better results than logistic regression. For overall and spontaneous PTB, both first- and second-trimester models provided very high negative predictive values, higher than that of fetal fibronectin.