Introduction

Randomized clinical trials tend to exclude vulnerable populations such as pregnant women and neonates (children under 30 days of age) for ethical reasons [1, 2]. As a result, observational studies using prospective cohorts or real-world data are the cornerstone of perinatal drug safety and effectiveness assessments. Studies that assess maternal and/or neonatal health outcomes commonly rely on administrative data sources, including Birth Certificates (BCs) and claims data [3,4,5,6,7,8,9].

BCs have been used widely for public health surveillance in maternal and neonatal epidemiology because they collect detailed information on delivery and the neonate’s condition in addition to information on the course of pregnancy and maternal characteristics. Claims data, on the other hand, provide rich longitudinal information on healthcare utilization that allows inferences about maternal clinical history, drug exposure before and during pregnancy, and outcomes in mothers and infants after delivery. Because neither data source is collected for research purposes, data quality is a concern. For studies that aim to make causal inferences about drug exposure and outcomes, outcome misclassification, especially low specificity of outcome definitions [10], can bias the estimated exposure-outcome association. Accordingly, some studies have evaluated the quality of a number of variables in both BCs and claims data for multiple states [11,12,13,14,15], while others have used variables in BCs and/or claims data with unknown sensitivity (the ability to identify true cases) or specificity (the ability to identify true non-cases) [16,17,18].

Studies that have evaluated the validity of either BCs or claims data, usually against medical records, have centered on maternal variables including method of delivery, diabetes, hypertension, and gestational age [19, 20], and on major birth defects such as cleft palate or heart malformations [13, 15, 21]. Limited information is available about the data quality of neonatal critical conditions such as birth injury or respiratory distress syndrome. This study aimed to evaluate the concordance between BCs and claims data on several neonatal critical conditions and to quantify the extent of possible false positive cases in each data source.

Methods

Data Sources

Texas and Florida BC data from 1999–2010 were deterministically linked to Medicaid claims data. BC data contain information on neonates’ demographics, date of birth, gestational age, weight, complications during labor, conditions and maternal data. The Medicaid Analytic eXtract (MAX) files include inpatient and outpatient encounter claims for neonates and their mothers in addition to pharmacy claims of dispensed prescriptions. We required that both neonates and their mothers be continuously enrolled in Medicaid for at least 30 days postpartum to capture neonatal health information in claims data.

BC and MAX linkage

Mothers and neonates in MAX and BCs were linked using a two-step deterministic linkage procedure [22]. In brief, if both the mother’s and the neonate’s Social Security Numbers (SSNs) were available for a mother-neonate pair in MAX and the BC, mothers and neonates were linked directly using exact SSN matching. If only the mother’s or the neonate’s SSN was available, we employed an established linkage algorithm for both Texas and Florida [23]. We first linked neonates and their mothers in MAX using a Medicaid family identifier (i.e., the case ID) and delivery claim dates. Each neonate’s birth date was then matched to the delivery date. An assessment of the established algorithm for Texas and Florida showed a positive predictive value (the proportion of true cases among all identified cases) of > 99% [24]. After creating the mother-neonate pairs in MAX, the available SSN was used to link the mother-neonate pairs in MAX to the BC (Fig. 1).

Fig. 1 MAX and BC Linkage Flowchart
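The two-step logic can be illustrated with a minimal Python sketch. It is not the study's SAS implementation: the `Pair` record, the `link_max_to_bc` function, and the use of birth date to disambiguate a single-SSN match are assumptions made for illustration. The sketch also assumes that mother-neonate pairs in MAX have already been formed from the family identifier and delivery claim dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Pair:
    """Hypothetical mother-neonate pair; an SSN is None when unavailable."""
    pair_id: str
    birth_date: date
    mother_ssn: Optional[str] = None
    neonate_ssn: Optional[str] = None


def link_max_to_bc(max_pairs, bc_pairs):
    """Return {MAX pair_id: BC pair_id} using exact (deterministic) matching."""
    by_both, by_neonate, by_mother = {}, {}, {}
    for bc in bc_pairs:
        if bc.mother_ssn and bc.neonate_ssn:
            by_both[(bc.mother_ssn, bc.neonate_ssn)] = bc.pair_id
        if bc.neonate_ssn:
            by_neonate.setdefault((bc.neonate_ssn, bc.birth_date), []).append(bc.pair_id)
        if bc.mother_ssn:
            by_mother.setdefault((bc.mother_ssn, bc.birth_date), []).append(bc.pair_id)

    links = {}
    for mx in max_pairs:
        # Step 1: both SSNs available on both sides, exact match on the SSN pair.
        if mx.mother_ssn and mx.neonate_ssn:
            hit = by_both.get((mx.mother_ssn, mx.neonate_ssn))
            if hit is not None:
                links[mx.pair_id] = hit
                continue
        # Step 2: fall back to whichever single SSN is available.
        # Assumption: birth date is added to the key so that a mother with
        # several deliveries is not matched to the wrong birth certificate.
        candidates = []
        if mx.neonate_ssn:
            candidates = by_neonate.get((mx.neonate_ssn, mx.birth_date), [])
        if not candidates and mx.mother_ssn:
            candidates = by_mother.get((mx.mother_ssn, mx.birth_date), [])
        if len(candidates) == 1:  # ambiguous matches (e.g., twins) stay unlinked
            links[mx.pair_id] = candidates[0]
    return links


# Made-up records, for illustration only.
max_pairs = [
    Pair("m1", date(2005, 1, 2), "111", "222"),
    Pair("m2", date(2006, 3, 4), "333", None),
    Pair("m3", date(2007, 5, 6), None, None),
]
bc_pairs = [
    Pair("b1", date(2005, 1, 2), "111", "222"),
    Pair("b2", date(2006, 3, 4), "333", "444"),
    Pair("b3", date(2001, 1, 1), "333", "555"),
]
links = link_max_to_bc(max_pairs, bc_pairs)
```

Leaving ambiguous candidates unlinked keeps the sketch deterministic; a real linkage would need an explicit rule for multiple births.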

Measurement of variables in MAX/BC

We measured the occurrence of birth injury, assisted ventilation (AV), seizure, respiratory distress syndrome (RDS) and neonatal intensive care unit (NICU) admission for each eligible neonate in both databases. In MAX data, cases were ascertained based on the presence of at least one inpatient or outpatient encounter with a condition-specific code in any diagnosis field within 30 days of birth on either the neonate’s or the mother’s record, using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) or Current Procedural Terminology (CPT) codes (Table 1). The claims codes to identify NICU admission, RDS, AV and seizures have been previously validated against medical records and have shown high sensitivity and specificity or positive predictive value [14,15,16, 25, 26]. In BCs, clinical measurements are recorded manually on an official form, which is subsequently entered into the birth certificate database. Certified registrars, midwives and healthcare providers obtain the clinical information, according to the institution’s policy [27, 28]. A condition was therefore flagged for a neonate if it was checked on the BC form, i.e., given the value 1 in the database.

Table 1 ICD-9 & CPT Codes Used in MAX.
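As an illustration of the claims-based case definition, the following Python sketch flags a condition when at least one encounter within the window carries a matching code. The function name, the data layout, and the two ICD-9-CM codes shown (769 for RDS and 779.0 for neonatal convulsions) are illustrative assumptions and do not reproduce the code lists in Table 1. Treating day 30 as inside the window is also an assumption.

```python
from datetime import date, timedelta

# Illustrative codes only; the study's actual code lists are given in Table 1.
CONDITION_CODES = {
    "RDS": {"769"},
    "seizure": {"779.0"},
}


def flag_conditions(birth_date, claims, window_days=30):
    """Flag each condition with >= 1 matching code within the window.

    claims: iterable of (service_date, codes) tuples pooled from the
    neonate's and the mother's inpatient and outpatient encounters.
    """
    window_end = birth_date + timedelta(days=window_days)
    flags = {name: False for name in CONDITION_CODES}
    for service_date, codes in claims:
        if not (birth_date <= service_date <= window_end):
            continue  # encounter falls outside the ascertainment window
        for name, code_set in CONDITION_CODES.items():
            if code_set.intersection(codes):  # any diagnosis field may match
                flags[name] = True
    return flags


# Made-up encounters, for illustration only.
birth = date(2005, 1, 1)
claims = [
    (date(2005, 1, 10), ["V30.00", "769"]),  # within 30 days of birth
    (date(2005, 3, 1), ["779.0"]),           # outside the window, ignored
]
flags = flag_conditions(birth, claims)
```

Note that the absence of a code yields False here, which mirrors how a missing diagnosis code in claims is interpreted as absence of the condition.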

Statistical analysis

Characteristics of the linked mother-neonate pairs in both states were examined in addition to the prevalence of each condition in each data source. We determined the level of agreement between data sources by calculating the crude Kappa statistic. Unlike MAX, where a missing diagnosis code is interpreted as absence of a condition, the BCs explicitly specify whether a condition was present or not, i.e., the BC forces selection of yes or no for a condition. Thus, we excluded BC records with a missing determination of neonatal critical conditions, which occurred for less than 2% of records. Following usual convention, Kappa was categorized as high (> 80%), substantial (61–80%), moderate (41–60%), fair (21–40%) and poor (≤ 20%) [29]. We also calculated the sensitivity of each data source, treating the other source as the gold standard, to quantify the ability of each source to capture cases identified by the other. For NICU admission, the analysis was restricted to 2004–2010 because this data field was added after 2003 as a result of a new BC form [30]. All analyses were performed using SAS 9.4 statistical software (SAS, Cary, NC). The study was approved by the University of Florida, Centers for Medicare and Medicaid Services, and Florida and Texas Departments of Health Institutional Review and Privacy Boards.
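For readers who want to reproduce these measures, the sketch below computes the crude Kappa and the two cross-source sensitivities from a 2×2 table. It is written in Python rather than SAS; the function names and the counts are hypothetical. Mapping the published integer categories onto continuous cut points (> 80, > 60, > 40, > 20) is an assumption.

```python
def agreement_stats(both, max_only, bc_only, neither):
    """Crude Cohen's Kappa and cross-source sensitivities from 2x2 counts."""
    n = both + max_only + bc_only + neither
    p_observed = (both + neither) / n
    p_max = (both + max_only) / n  # prevalence according to MAX
    p_bc = (both + bc_only) / n    # prevalence according to the BC
    # Agreement expected by chance if the two sources were independent.
    p_expected = p_max * p_bc + (1 - p_max) * (1 - p_bc)
    return {
        "kappa": (p_observed - p_expected) / (1 - p_expected),
        # Sensitivity of MAX when the BC is treated as the gold standard.
        "sens_max_given_bc": both / (both + bc_only),
        # Sensitivity of the BC when MAX is treated as the gold standard.
        "sens_bc_given_max": both / (both + max_only),
    }


def kappa_category(kappa):
    """Map Kappa (as a proportion) to the categories used in the text."""
    percent = 100 * kappa
    if percent > 80:
        return "high"
    if percent > 60:
        return "substantial"
    if percent > 40:
        return "moderate"
    if percent > 20:
        return "fair"
    return "poor"


# Hypothetical counts, not study data.
stats = agreement_stats(both=40, max_only=10, bc_only=20, neither=930)
category = kappa_category(stats["kappa"])
print(stats, category)
```

Because neither source is a true gold standard, these sensitivities describe only how well one source recovers the other's cases, not how well either recovers true cases.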

Results

The study sample comprised 1,539,344 mother-neonate pairs after successful linkage of MAX and BCs: 558,224 pairs for Florida and 981,120 for Texas. The median age of the mothers was 23.7 years in Florida and 22.9 years in Texas (Table 2). Half of the mothers who gave birth in Texas had at most graduated from secondary school. In contrast, almost three quarters of mothers in Florida had at least a high school diploma. About 97% of deliveries in Florida occurred in medical facilities, which was lower than in Texas (99.9%). The median number of prenatal visits was eleven in Florida, slightly higher than in Texas. Preterm deliveries comprised 10.2% of linked pairs in Florida and 11.4% in Texas. Correspondingly, we found a higher prevalence of NICU admissions, RDS and AV in Texas than in Florida regardless of data source (Table 3).

Table 2 Characteristics of mothers and deliveries in the Birth Certificate and Medicaid Analytic eXtract linked cohort
Table 3 Prevalence of neonatal critical conditions

The prevalence of neonatal critical conditions was consistently higher in MAX except for AV, where the BC captured more than twice as many cases as MAX (Table 3). RDS showed a more than fourfold higher prevalence in MAX than in the BC, and seizures were hardly ever captured in the BC. The largest absolute difference in prevalence between MAX and BC was for RDS in Texas (4%). Figure 2 depicts the dissimilarity between MAX and BCs, indicating that the majority of cases were identified by one data source but not the other. Except for NICU admission, more than 80% of all ascertained cases were captured by either MAX or BC but not both.

Fig. 2 All cases distributed by data sources in Texas and Florida
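The breakdown shown in Fig. 2 can be expressed as shares of all ascertained cases, i.e., cases flagged by at least one source. The short Python sketch below uses a hypothetical function name and made-up counts, not the study data.

```python
def case_distribution(both, max_only, bc_only):
    """Share of all ascertained cases by the source(s) that captured them."""
    total = both + max_only + bc_only  # cases flagged by at least one source
    return {
        "max_only": max_only / total,
        "bc_only": bc_only / total,
        "both": both / total,
        # Cases captured by exactly one of the two sources.
        "single_source": (max_only + bc_only) / total,
    }


# Hypothetical counts, not study data.
shares = case_distribution(both=10, max_only=70, bc_only=20)
```

Neonates flagged by neither source do not enter this calculation, which is why a large single-source share can coexist with high overall percent agreement.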

Agreement was moderate for NICU admission in Florida (Kappa = 56%) and reached the substantial level in Texas (63%). For RDS and AV, Kappa showed poor agreement between BC and MAX in Texas (19% and 16%) and Florida (16% and 15%). Agreement between MAX and BC for seizures was 8% to 10% in the two states (Table 4). Birth injury showed extremely low agreement (Kappa 1.5%). Kappa values were generally slightly lower for Florida than for Texas: in Florida, MAX identified fewer of the cases captured by the BC, whereas the BCs of both states showed similar sensitivity for capturing cases in MAX.

Table 4 Estimated Kappa and sensitivity

Assuming that neither data source captured false positives, we found generally better sensitivity for MAX to capture BC cases than vice versa: MAX captured > 80% of NICU admissions recorded on the BC in Texas and 64% in Florida, while the BC captured only about half of all NICU admissions in MAX (Table 4). The only exception was AV, with about 70% of all identified AV cases recorded on the BC and better capture of MAX cases in the BC (about 30%) than vice versa (about 10%).

Discussion

Our comparison of birth certificate and Medicaid claims data in measuring neonatal critical conditions in two large states found large variation in concordance between the two data sources. Capture of NICU admission showed moderate to substantial agreement between BCs and MAX, while variables such as birth injury and seizures had extremely low agreement. Importantly, except for NICU admission, 70% or more of cases in MAX were not captured by BCs in either state. MAX found higher prevalences for all examined conditions except need for assisted ventilation, for which more than twice as many cases were identified in the BC.

The overall higher prevalences captured by MAX may be explained by either lower sensitivity of data capture in BCs or false positive cases captured in MAX. Considering our use of previously validated claims-based algorithms, which suggest PPV ≥ 86% for all evaluated conditions, low specificity of MAX-derived cases is unlikely to explain the lower prevalences reported in BCs. This conclusion is supported by a recent evaluation of BC data by the Centers for Disease Control and Prevention (CDC), which found quality to vary across states and hospitals [31]. The study, limited to two unidentified states and only 8 hospitals, reported agreement for NICU admissions and need for AV to be substantial or high for one state and low to extremely low for the other. The study confirmed that the sensitivity of BCs in particular is low, which reiterates previous findings that highlight underreporting of medical conditions as a key limitation of BCs [28, 32,33,34]. Specific to seizures, one research group studied neonatal seizure by evaluating capture on the BC [14] against hospital discharge and Medicaid claims [35]. Consistent with our finding, the authors reported low Kappa estimates (9–12%), which they attributed to BC underreporting. Of note, the timing of BC completion relative to the occurrence or discovery of neonatal conditions may further reduce BC sensitivity.

Failure of MAX data to capture cases detected by the BC, on the other hand, may be explained by erroneous data on the BC or limited sensitivity of MAX. The previously cited CDC study found in one state that the proportion of false positives on the BC varied between 15% and 30% for AV (depending on definition) and was 11% for NICU admission. A more recent report from five hospitals in New York City shows a similar pattern of false-positive AV cases [36], suggesting that BCs may not only lack sensitivity but also have some specificity issues in correctly identifying neonatal conditions [26]. Similarly, Zollinger et al [11] compared Indiana BC data to medical records for a variety of variables including RDS, AV, seizure and birth injury. The corresponding results showed extremely low specificity of BCs in identifying RDS, AV, seizure and birth injury cases. Administrative claims databases, in contrast, have been compared to medical records for some variables and have shown overall high specificity. Two studies assessed ICD-9-CM codes for neonatal mechanical ventilation against medical records and reported specificity of 99.7% and 97.1% in Canadian and Australian discharge databases, respectively [25, 26]. The same Australian discharge database showed 92.4% specificity for RDS. In addition, Bateman et al [37] validated neonatal seizure codes in MAX data and found a high positive predictive value of 86%. Although some of these validation studies occurred outside the U.S. and may not be generalizable to local coding conventions, the similarity in findings regarding specificity issues in BCs raises concerns about bias when BCs are used in inferential analyses. It should be noted that the focus of claims data on billing of medical services may result in varying sensitivity by prioritizing those diagnoses that are relevant for reimbursement decisions.

Our findings, taken together, support the assumption that MAX data may offer superior sensitivity and specificity in capturing the evaluated neonatal conditions. An exception to this observation appears to be the measurement of need for assisted ventilation, which may reflect coding practices for billing purposes. For example, short-term need for assisted ventilation immediately after delivery, which is captured on the BC, may be incorporated in capitated billing arrangements and thus not appear itemized in billing data.

Motivation or incentives as well as training may influence the quality of data and discrepancy between BC and MAX. Both CDC and the American College of Obstetricians and Gynecologists have also pointed to lack of standardization of obstetric clinical data definitions [31, 38]. Ongoing efforts to improve BC documentation via enhanced training and standardization warrant future studies as more recent MAX and BC data are made available.

Although our results cover only two states and are therefore not nationally representative, Texas and Florida supply a large portion of infant information to the national surveillance systems. Thus, national trends of neonatal critical conditions such as seizures that are estimated from BCs ought to be interpreted with caution. Moreover, causal inference research that uses BC neonatal critical conditions should interpret the results with extreme caution given the high false positive rates of the BC data. Claims data sources appear to be better suited for causal inference research on neonatal critical conditions.

In conclusion, we compared the extent of agreement between BCs and claims data regarding capture of neonatal critical conditions in two large states and found low agreement. Both data sources captured cases that the other one did not, presumably due to underreporting or capture of false positives. Future research ought to examine reasons for discrepancies between the two data sources.