Introduction

Schizophrenia is a severe mental health disorder marked by cognitive and emotional disturbances that significantly impact an individual’s social functioning1. Estimates of lifetime prevalence of schizophrenia have varied widely due to the heterogeneity of the disorder, as well as the study methods used, including the design, prevalence period type, data source and data quality2. Estimates of the prevalence of schizophrenia in the United States vary from 0.3 to 1.6% in community-based samples, 1.02% in uninsured samples, and 0.13% in commercial insurance samples to 1.66–5.11% in Medicaid3,4,5. Prevalence estimates have often relied upon community-based data estimates from over two decades ago3, while more recent studies have shown considerable promise leveraging pooled datasets4.

Surveillance methods in the United States face several challenges. The United States does not have a national health registry database, and national studies of community samples are costly. Moreover, occupational loss, institutionalization, and downward socioeconomic trajectories have greatly complicated naturalistic prospective longitudinal observational studies of individuals with schizophrenia. Researchers and governments have relied on large administrative claims databases, such as Medicaid, to conduct surveillance and epidemiological studies. Medicaid includes a large proportion of individuals with schizophrenia; however, it may underestimate incidence and prevalence when used alone4. Accordingly, using multiple independent databases may expand and diversify the population observed, allow for comparison of the prevalence and incidence by insurance type, and support testing and refinement of incidence measures to increase generalizability of assessment methods.

Given the chronic nature of schizophrenia, it is also important to examine prevalence and incidence across the lifespan for age-related differences to avoid underestimating prevalence and incidence. Accuracy and ongoing monitoring of prevalence and incidence are important to understand the scope of the burden of schizophrenia across the lifespan and to optimize interventions at specific developmental ages and within defined subgroups most in need of resources.

The current study compares the prevalence, incidence, and specificity and predictive value of incidence measures with varying lookback periods using two administrative databases in the United States. We hypothesized that a national commercial insurance database of employed individuals and their families (MarketScan) would exhibit a lower prevalence and higher proportion of incident to prevalent cases, due to an expected disproportionate loss of employer provided commercial insurance for the schizophrenia population. We also hypothesized that the Medicaid population would have higher prevalence rates as individuals with schizophrenia have a higher burden of socioeconomic risk factors, in addition to the tendency to transition from commercial insurance to Medicaid after developing schizophrenia. We examined the specificity and positive predictive value of incidence measures in these two databases across the lifespan, with the goal of addressing a critical gap to inform future schizophrenia research and policy development using large disparate databases.

Methods

Data sources

This cross-sectional study used two databases: (1) MarketScan, a proprietary deidentified national research database that comprises largely individuals covered under employer-sponsored commercial insurance, including employed individuals, their spouses, and dependents; and (2) New York State Medicaid, a large state public insurance program.

Study populations

The total population included all individuals in the MarketScan and Medicaid databases in 2019 with 12 months of continuous insurance (allowing for a 45 day gap) and one or more health services (inpatient or outpatient) during 2019. The 10 year continuously insured cohort was the subset of the total population that had continuous insurance in the 10 years prior to 2019 (1/1/2009-12/31/2018), with 12 months of coverage in each year and no more than a 45 day gap in any year. To ensure that individuals had an opportunity to be assessed, one or more health service visits per year was an additional criterion for study inclusion.
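The enrollment criteria above can be illustrated with a minimal sketch. It assumes the 45 day allowance applies to the total number of uncovered days within each calendar year, which is one reading of the text; the function names and the input layout (inclusive start/end date pairs and a set of years with at least one service) are hypothetical and are not taken from the study's code.

```python
from datetime import date, timedelta

MAX_GAP_DAYS = 45  # allowance stated in the text; applied per calendar year (assumption)


def uncovered_days(spans, year):
    """Count days in `year` not covered by any (start, end) enrollment span.

    `spans` is an iterable of (start_date, end_date) pairs with inclusive ends.
    """
    year_start, year_end = date(year, 1, 1), date(year, 12, 31)
    # Keep only spans that touch the year, clipped to its boundaries.
    clipped = sorted(
        (max(start, year_start), min(end, year_end))
        for start, end in spans
        if start <= year_end and end >= year_start
    )
    uncovered = 0
    cursor = year_start  # first day not yet known to be covered
    for start, end in clipped:
        if start > cursor:
            uncovered += (start - cursor).days
        cursor = max(cursor, end + timedelta(days=1))
    if cursor <= year_end:
        uncovered += (year_end - cursor).days + 1
    return uncovered


def continuously_insured(spans, year, max_gap_days=MAX_GAP_DAYS):
    """True if the uncovered days in `year` do not exceed the allowance."""
    return uncovered_days(spans, year) <= max_gap_days


def in_ten_year_cohort(spans, service_years, index_year=2019, lookback_years=10):
    """True if every year from index_year - lookback_years through index_year
    is continuously insured and has at least one health service."""
    years = range(index_year - lookback_years, index_year + 1)
    return all(continuously_insured(spans, y) and y in service_years for y in years)
```

Under a stricter reading, in which the allowance applies to the single longest gap rather than to the yearly total, only `uncovered_days` would need to change.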

Definitions and measures

Demographic characteristics included age (as of 12/31/2019), constructed age cohort groups (0–10, 11–15, 16–20, 21–25, 26–30, 31–35, 36–40, 41–45, 46–50, 51–55, 56–60, 61–64, 65+ years of age), a ≤35 years of age group, and sex (male/female). Race and ethnicity (available only in Medicaid database) were categorized as Hispanic or Latinx (any race), or non-Hispanic White, Black, Asian, Native American, Multiracial, or Unknown. Prevalent cases of schizophrenia include all individuals with a diagnosis of schizophrenia during the year (both new and pre-existing). In the present study prevalent cases were defined as those with two or more diagnoses of schizophrenia or schizoaffective disorder (ICD-9: 295.x; ICD-10: F20, F25; excluding schizophreniform: 295.4x, F20.81) in 2019. Incident cases of schizophrenia are the subset of prevalent cases who have a new onset of schizophrenia during the year. In this study incidence of schizophrenia was examined only in the 10 year continuously insured cohort, and was defined as the individuals with a diagnosis of schizophrenia in 2019 who had no prior diagnosis of schizophrenia in the preceding 10 years (1/1/2009-12/31/2018).
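The case definition can be sketched in code as follows. The code ranges are those given in the text; the function names, the tuple layout of a diagnosis record, and the choice to count diagnoses on distinct service dates are illustrative assumptions, since the text specifies only "two or more diagnoses".

```python
from datetime import date


def is_schizophrenia_code(code, icd_version):
    """True for schizophrenia or schizoaffective codes, excluding schizophreniform."""
    c = code.strip().upper().replace(".", "")
    if icd_version == 9:
        # ICD-9 295.x, excluding 295.4x (schizophreniform)
        return c.startswith("295") and not c.startswith("2954")
    if icd_version == 10:
        # ICD-10 F20 and F25, excluding F20.81 (schizophreniform)
        return (c.startswith("F20") or c.startswith("F25")) and c != "F2081"
    raise ValueError(f"unsupported ICD version: {icd_version}")


def is_prevalent_case(diagnoses, year=2019, min_diagnoses=2):
    """`diagnoses` is an iterable of (service_date, code, icd_version) tuples.

    Assumption: diagnoses are counted on distinct service dates.
    """
    qualifying_dates = {
        service_date
        for service_date, code, version in diagnoses
        if service_date.year == year and is_schizophrenia_code(code, version)
    }
    return len(qualifying_dates) >= min_diagnoses
```

For example, an individual with `F20.9` recorded in March 2019 and `F25.0` recorded in September 2019 would count as a prevalent case, whereas a single 2019 diagnosis, or any number of `F20.81` diagnoses, would not.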

Analysis

The prevalence of schizophrenia in 2019 was calculated for the total population, and for the 10 year continuously insured cohorts in MarketScan and in Medicaid. The incidence of schizophrenia in 2019 was calculated for the 10 year continuously insured cohorts only. To identify new onset cases of schizophrenia, ideally an individual’s lifelong medical record would be reviewed, but no such database exists in the United States. We used a 10 year lookback period without any prior evidence of schizophrenia as the reference standard in Medicaid and MarketScan for defining incident cases of schizophrenia. Any case of schizophrenia diagnosed in 2019 that had no evidence of a schizophrenia diagnosis in the decade prior (1/1/2009-12/31/2018) was considered an incident case.

We examined the accuracy of using shorter lookback periods to identify incident cases of schizophrenia, since using a 10 year lookback period greatly reduces the size of the study population to those with 10 years of continuous observation in the database, and a 10 year lookback may not be feasible for some studies or databases. A variety of methods have been reported in the literature to assess incidence, ranging from lifetime review to a period as short as one year prior to the index schizophrenia date4, but few data are available on the accuracy of these estimates to inform methodological decisions. Schizophrenia is a chronic condition, and it may be expected that for those engaged in care a single year or two of lookback may capture nearly all of the cases that would be identified in a much longer or lifelong review. However, if there is diagnostic uncertainty in the early years of illness, or if some individuals are diagnosed but have long gaps in engagement in treatment, then shorter lookback periods will miss prior diagnoses of schizophrenia. We examined the accuracy of measures of new onset of schizophrenia, testing lookback periods of “x” duration (1, 2, 3, 4, 5, 6, 7, 8 and 9 years), compared to a reference standard of a 10 year lookback period prior to the first observed diagnosis of schizophrenia in 2019 (Fig. 1). We calculated the positive predictive value (PPV) and specificity6 of incidence measures of new onset schizophrenia using sequentially longer lookback periods. Sensitivity and negative predictive value testing were not required since both were 100% for all values of “x” years of lookback due to the absence of false negatives: any prior diagnosis of schizophrenia identified within “x” years will also be identified with a longer lookback period.

Fig. 1: Definitions of positive predictive value and specificity—adapting accuracy testing to incidence measures of new onset schizophrenia.

We examine the positive predictive value and specificity of measures of new onset schizophrenia among individuals that were identified as having schizophrenia in 2019, and who had 10 years or more of continuous observation prior to 2019 in Medicaid or MarketScan. We test the positive predictive value and specificity of alternative measures of new onset schizophrenia using sequentially longer lookback periods “x” (1, 2, 3, 4, 5, 6, 7, 8, and 9 years), compared to our reference standard for new onset schizophrenia (10 years lookback with no prior evidence of schizophrenia). Test Positive cases are those defined as having new onset of schizophrenia using a specific lookback period of “x” years prior to 2019 (no prior schizophrenia diagnosis observed in “x” years lookback period). Test Negative cases are those where the measure finds a prior diagnosis of schizophrenia using a lookback period of “x”. True Positive and True Negative tests are those cases where using a lookback period of “x” provided an accurate result (consistent with the reference standard using a 10-year lookback period). As the lookback period “x” increases, more previous diagnoses of schizophrenia are identified, False Positives go down and True Negatives go up, leading to increases in Positive Predictive Value and Specificity.
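The definitions in Fig. 1 can be expressed as a short sketch. It assumes lookback windows are whole calendar years before 2019 (so that x = 1 covers 2018 and the 10 year reference covers 2009–2018); the function name and the input layout, one set of prior diagnosis years per 2019 case, are hypothetical.

```python
def accuracy_by_lookback(prior_dx_years_per_case, index_year=2019, reference_years=10):
    """Compute PPV and specificity of x-year lookback incidence measures.

    `prior_dx_years_per_case` is a list with one set per 2019 case, holding the
    calendar years before `index_year` in which that person had a schizophrenia
    diagnosis. Returns {x: {"TP", "FP", "TN", "PPV", "specificity"}}.
    """
    results = {}
    for x in range(1, reference_years):
        tp = fp = tn = 0
        for years in prior_dx_years_per_case:
            seen_in_x = any(index_year - x <= y < index_year for y in years)
            seen_in_ref = any(
                index_year - reference_years <= y < index_year for y in years
            )
            if seen_in_x:
                # Test negative: a prior diagnosis found within x years is also
                # found by the reference, so this is always a true negative.
                tn += 1
            elif seen_in_ref:
                # Test positive, but the reference finds an earlier diagnosis.
                fp += 1
            else:
                # Test positive and confirmed as new onset by the reference.
                tp += 1
        results[x] = {
            "TP": tp,
            "FP": fp,
            "TN": tn,
            "PPV": tp / (tp + fp) if tp + fp else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        }
    return results
```

No false-negative branch is needed because the x-year window is contained in the reference window, which is why sensitivity and negative predictive value are fixed at 100%.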

The New York State Office of Mental Health IRB approved the study protocol, and the University of Chicago IRB approved their use of MarketScan data as IRB exempt.

Results

Population characteristics

The characteristics of the 2019 total population and 10 year continuously insured cohorts for MarketScan (N = 16,365,997 and n = 951,173, respectively) and Medicaid (N = 4,414,153 and n = 785,088, respectively) are summarized in Table 1. The mean age (±SD) was older in MarketScan (38.1 ± 20.2 years) than in Medicaid (33.5 ± 24.3 years).

Table 1 Characteristics of insured individuals in MarketScan and Medicaid in 2019 for total population and 10 years continuously insured cohorts.

The proportion of females was similar in both databases (MarketScan 54.14%, 95% CI: 54.12–54.17; Medicaid 55.68%, 95% CI: 55.64–55.73), and increased in the 10 year insured cohort (MarketScan 62.02%, 95% CI: 61.92–62.12; Medicaid 61.11%, 95% CI: 61.10–61.16). In the Medicaid 2019 total population, the Multiracial group was the largest (31.53%, 95% CI: 31.49–31.58), followed by White (26.45%, 95% CI: 26.41–26.49), Black (15.78%, 95% CI: 15.75–15.81), Hispanic or Latinx (12.17%, 95% CI: 12.14–12.20), Asian (9.63%, 95% CI: 9.60–9.66), and Native American (0.31%, 95% CI: 0.30–0.31) race and ethnicity groups. In comparison, in the Medicaid 10 year insured cohort the White group was the largest (31.19%, 95% CI: 31.11–31.30), followed by Multiracial (21.83%, 95% CI: 21.74–21.92), Hispanic or Latinx (19.10%, 95% CI: 19.01–19.19), Black (15.76%, 95% CI: 15.68–15.84), Asian (9.63%, 95% CI: 9.60–9.69), and Native American (0.29%, 95% CI: 0.28–0.30) race and ethnicity groups.

Prevalence

The prevalence of schizophrenia in 2019 for the total MarketScan and Medicaid populations and the 10 year continuously insured cohorts is summarized in Table 2. The prevalence of schizophrenia in the total population was higher in Medicaid (2.13%, 95% CI: 2.12–2.15; n = 94,153 of 4,414,153) than in MarketScan (0.134%, 95% CI: 0.132–0.135; n = 21,963 of 16,365,997). In both databases the prevalence was higher among the subsample of individuals with 10 years of continuous insurance prior to 2019, but this increase was more marked in Medicaid (5.74%, 95% CI: 5.69–5.79 vs. 2.13%) than in MarketScan (0.159%, 95% CI: 0.152–0.168 vs. 0.134%). Males had a higher prevalence than females (Medicaid: 2.80%, 95% CI: 2.78–2.83 vs. 1.60%, 95% CI: 1.58–1.62; MarketScan: 0.135%, 95% CI: 0.133–0.138 vs. 0.133%, 95% CI: 0.131–0.136) in the total population, as well as in the 10 year continuously insured cohort (Medicaid: 7.89%, 95% CI: 7.80–7.99 vs. 4.37%, 95% CI: 4.31–4.43; MarketScan: 0.183%, 95% CI: 0.170–0.198 vs. 0.145%, 95% CI: 0.136–0.155).
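For readers who wish to compute comparable estimates, the following sketch derives a prevalence proportion and a 95% confidence interval from case and population counts. The Wilson score interval is used here as an assumption; the text does not state which interval method was applied, so the bounds from this sketch may differ slightly from those reported.

```python
import math


def wilson_interval(cases, population, z=1.959963984540054):
    """Return (proportion, lower, upper) using the Wilson score interval.

    `z` defaults to the standard normal quantile for a two-sided 95% interval.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    p = cases / population
    denom = 1 + z**2 / population
    centre = (p + z**2 / (2 * population)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / population + z**2 / (4 * population**2)) / denom
    )
    return p, centre - half_width, centre + half_width


# Medicaid total population counts reported in the text.
prevalence, lower, upper = wilson_interval(94_153, 4_414_153)
print(f"{100 * prevalence:.2f}% ({100 * lower:.2f}-{100 * upper:.2f})")
```

With populations of this size the choice between common interval methods changes the bounds only in the later decimal places.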

Table 2 Prevalence [95% CI] of schizophrenia in MarketScan and Medicaid (2019).

The prevalence of schizophrenia by age group for the total population and 10 year continuously insured cohort is presented in Fig. 2. The prevalence of schizophrenia in the Medicaid population increased over the course of the lifespan up to age 56–60 years (total population: 5.0%; 10 year continuously insured cohort: 11.3%). However, in MarketScan the prevalence peaks among young adults (0.25% for the total population at 21–25 years old, and 0.78% among those with 10 years of continuous insurance at 26–30 years old).

Fig. 2: Prevalence of schizophrenia in Medicaid and MarketScan for total population and for those with 10 years continuous insurance by age group over the lifespan (2019).

We examined the prevalence of schizophrenia in 2019, by age group, in the total population, and in the 10-year cohort (the subset of the total population with 10 years of continuous insurance prior to 2019) in the Medicaid and MarketScan study populations. In 2019, and for each year of the 10-year cohort, individuals were required to have continuous insurance with a maximum gap of 45 days per year, and one or more health service per year to ensure an opportunity to be observed. Prevalent cases were defined as those with two or more diagnoses of schizophrenia or schizoaffective disorder (ICD-9: 295.x; ICD-10: F20, F25; excluding schizophreniform: 295.4x, F20.81) in 2019.

Incidence

The incidence of schizophrenia diagnoses in 2019 for the 10 year continuously insured cohort is presented in Table 3. Incidence was higher in Medicaid (0.19%, 95% CI: 0.18–0.20; n = 1,489 newly diagnosed in 2019) than in MarketScan (0.07%, 95% CI: 0.07–0.08; n = 712), and for males than females (Medicaid: 0.21%, 95% CI: 0.20–0.23 vs. 0.18%, 95% CI: 0.16–0.19; MarketScan: 0.08%, 95% CI: 0.07–0.09 vs. 0.07%, 95% CI: 0.07–0.08). While incident cases made up only a small proportion of the prevalent cases in the 10 year continuously insured cohort in Medicaid in 2019 (3.3%, 95% CI: 3.1–3.5), incident cases constituted nearly half of prevalent cases in MarketScan (46.9%, 95% CI: 44.4–49.4).

Table 3 Incidence of schizophrenia in Medicaid and MarketScan (2019 10 years continuously insured).

Incidence by age group is presented in Fig. 3. The pattern is similar in both databases, with incidence rising rapidly in late adolescence and peaking in young adulthood (26–30 years of age, Medicaid: 0.40%; MarketScan: 0.16%). New cases of schizophrenia continue to emerge over the lifespan in both populations, but at a lower frequency.

Fig. 3: Incidence of schizophrenia in 2019 in Medicaid and MarketScan by age group.

We examined new onset (incidence) of schizophrenia in 2019 among individuals with 10 years of continuous insurance prior to 2019 in Medicaid and MarketScan. New onset of schizophrenia in 2019 was defined as those with two or more diagnoses of schizophrenia or schizoaffective disorder in 2019 (ICD-9: 295.x; ICD-10: F20, F25; excluding schizophreniform: 295.4x, F20.81), and no diagnoses of schizophrenia observed in the 10 years prior to 2019. Individuals were required to have continuous insurance with a maximum gap in insurance of 45 days per year, and one or more health services per year, during 2019 and each of the 10 years prior, to ensure an opportunity to observe a prior diagnosis of schizophrenia.

Accuracy of incidence measures

Table 4 summarizes the positive predictive value (PPV) for schizophrenia incidence measures with different periods of observation prior to the first schizophrenia diagnosis in 2019, using a 10 year reference standard. One year of observation prior to the index schizophrenia diagnosis yielded a PPV of 30% (95% CI: 29–31) for the total Medicaid population, and 51% (95% CI: 48–55) for those ≤35 years of age. The one year prior period incidence measure performed better in MarketScan, with a PPV of 81% (95% CI: 79–84) in the total MarketScan population, and 79% (95% CI: 73–84) for those ≤35 years of age. Increasing the period of observation prior to the index schizophrenia diagnosis increased the PPVs of the incidence measures in both databases. A PPV of ≥95% (96%, 95% CI: 93–97) was achieved in Medicaid for individuals ≤35 years of age using a minimum of seven years of prior observation, while for MarketScan a minimum period of only two years of prior observation (95%, 95% CI: 90–97) was required. For both Medicaid and MarketScan, PPVs of the incidence measures tested were generally higher for females than for males.

Table 4 Positive predictive value (PPV) of incidence measures of schizophrenia by duration of observation prior to the index schizophrenia diagnosis in Medicaid and MarketScan populations (2019 10 years continuously insured).

Figure 4 presents the PPVs of schizophrenia incidence measures by age group and by years of observation prior to the index schizophrenia diagnosis in 2019. In general, PPVs for new onset schizophrenia measures increase with increasing years of observation prior to the index diagnosis, and are higher for younger age groups. In Medicaid, incidence measures had the highest PPVs for individuals 11–15 years of age, followed by 16–20 years, 21–25 years, 26–30 years, and 31–35 years, with the lowest for individuals 65+ years of age. In MarketScan, the PPVs of incidence measures were generally higher than in Medicaid for all age groups, but the younger age groups performed relatively less well: the lowest PPV was for individuals 26–30 years of age, for whom the measure achieved a 91% PPV by three years and required seven years of prior observation to achieve a ≥95% PPV.

Fig. 4: Positive predictive value of schizophrenia incidence measures by duration of observation prior to index schizophrenia diagnosis for different age groups.

We examine the positive predictive value of measures of new onset schizophrenia among individuals that were identified as having schizophrenia in 2019, and who had 10 years or more of continuous observation prior to 2019 in Medicaid or MarketScan. We test the positive predictive value of alternative measures of new onset schizophrenia using sequentially longer lookback periods “x” (1, 2, 3, 4, 5, 6, 7, 8, and 9 years), compared to our reference standard for new onset schizophrenia (10 years lookback with no prior evidence of schizophrenia). The Positive Predictive Value = True Positive tests / Test Positive cases. True Positive tests are those cases where using a lookback period of “x” provided an accurate result (consistent with the reference standard using a 10-year lookback period). Test Positive cases are those defined as having new onset of schizophrenia using a specific lookback period of “x” years prior to 2019 (no prior schizophrenia diagnosis observed in “x” years lookback period).

The specificity of measures of new onset schizophrenia was high in both Medicaid and MarketScan, with ≥95% specificity achieved in both databases using two or more years of observation prior to the index schizophrenia diagnosis (Table 5). Specificity was generally higher in Medicaid than in MarketScan (92.10%, 95% CI: 91.84–92.35 vs. 79.93%, 95% CI: 76.99–82.64 for the total population using one year of observation prior to the index schizophrenia diagnosis).

Table 5 Specificity of incidence measures of schizophrenia by duration of observation prior to index schizophrenia diagnosis in Medicaid and MarketScan populations (2019 10 years continuously insured).

Discussion

This study examined the prevalence and incidence rates of schizophrenia using two administrative databases in the United States and tested the accuracy of incidence measures using varying periods of observation prior to first observed diagnosis by age group. We found a higher prevalence (over 10-fold difference) and incidence (over two-fold difference) of schizophrenia in Medicaid compared to MarketScan. Moreover, the prevalence and incidence were higher in both databases for males compared to females, and among individuals with 10 years or more of continuous insurance compared to those with fewer years of continuous insurance. Accuracy testing indicated that well defined and valid cohorts of individuals with newly diagnosed schizophrenia can be created with high (≥95%) positive predictive value and specificity in both MarketScan and Medicaid by adjusting the lookback period and age group studied.

One implication of these findings is that depending on the outcome of interest and database used, it may be necessary to restrict the analysis to defined subcohorts, by age and duration of continuous insurance prior to first observed diagnosis of schizophrenia. For example, for individuals ≤35 years of age, a positive predictive value of ≥95% required only two years for MarketScan but seven years for Medicaid prior to the first documented schizophrenia diagnosis, while for both Medicaid and MarketScan only a two year period of observation was required to achieve a specificity ≥95% in this age group. MarketScan excels at identifying incident cases, even among older adults, as evidenced by the ≥95% positive predictive value for each age group examined over 45 years of age using only a two year period of observation prior to first schizophrenia diagnosis. In contrast, Medicaid required a restriction to younger ages (individuals ≤20 years old) to achieve a similarly high PPV of ≥90% when using a two year period.

The prevalence findings also highlight the differences between Medicaid and the MarketScan commercial insurance population. A chronic condition would be expected to rise in prevalence over the course of the lifespan, as was observed in the Medicaid population. By contrast, in MarketScan the prevalence peaks in young adulthood, and we observed a high proportion of incident to prevalent cases. This pattern is consistent with a tendency for individuals who develop schizophrenia to lose commercial insurance when they can no longer be covered on their parents’ policy (United States Affordable Care Act, maximum age 26 years old). The loss of commercial insurance coverage in young adulthood among individuals with schizophrenia mirrors the same trend in the general population7, but is greatly exacerbated among individuals with schizophrenia due to loss of employment in the typical young adult onset of psychosis with progressive cognitive impairment8. Additional studies are needed to better understand the impact of a new schizophrenia diagnosis on retention in the workforce and employer-based commercial insurance, and the transition to public insurance or loss of health insurance coverage, for example through survival analysis of retention in commercial insurance for young adults with and without new onset schizophrenia.

The incidence of schizophrenia varies over the lifespan with a similar pattern in Medicaid and MarketScan but is higher at all ages in the Medicaid population. Both populations have the lowest incidence rates in childhood, peak in young adulthood, dip in middle age, rise again in the 5th decade of life, and then decline again in older adulthood. Previous reports have identified higher prevalence of schizophrenia in Medicaid populations than other insurance types or the general population3,4,5, but we are not aware of large studies reporting on differences in incidence between populations. As proposed by Simeone et al.2, examination of prevalence and incidence rates over the lifespan provides a more comprehensive understanding that might otherwise be overlooked in surveillance studies that exclusively focus on young adulthood.

These findings have implications for future research on incidence and outcomes of schizophrenia using large health datasets. Population health is currently limited in understanding the benefit/risk assessment of treatments for new onset and prevalent cases of schizophrenia across the lifespan. Large-scale simple trials could be conducted on administrative health datasets using these well-defined cohorts to help address this understanding gap. The measures of new onset schizophrenia suggested here are particularly promising and could be used to test predictive models for identifying individuals at risk for developing schizophrenia. Identifying individuals at increased risk of schizophrenia could allow for early intervention and risk mitigation.

Limitations

This study has several limitations. Both MarketScan and Medicaid data are administrative claims and encounter databases; any services not invoiced are not observed. Moreover, administrative health databases rely upon diagnoses assigned in routine clinical practice, not research diagnostic assessments. Notably, however, a meta-analysis of the accuracy of mental health diagnoses in administrative data found the highest positive predictive values for the broad category of psychotic illness (PPV ≥ 80%, with most studies ≥90%) and schizophrenia (PPV ≥ 75% for the majority of studies), with diagnostic errors occurring primarily at the level of the clinician, rather than errors in administrative data processing9. MarketScan is a national database of health services delivered to predominantly employed individuals and their families, while the Medicaid database includes services delivered to individuals enrolled in a large state program with high rates of unemployment, poverty, and disability, and findings may not generalize to other databases. Incorporating these two disparate databases in our analyses may increase the generalizability of our findings, but further study is needed.

Conclusion

Incident cases of new diagnoses of schizophrenia can be identified in administrative claims databases with excellent validity for individuals aged 35 years and younger, with positive predictive values and specificity exceeding 95%. Combining multiple databases that encompass both Medicaid and commercial insurance provides a more comprehensive view of prevalence and incidence of schizophrenia. Further work is needed to leverage these findings to develop robust clinical outcome predictors for diagnosis, prevention, treatment, and prognosis of schizophrenia and other related mental health conditions.