Introduction

Unplanned readmission after discharge is an indicator of quality of care, often suggests problems with patient care during previous hospitalizations, and is also a key indicator of the adequacy of a patient's treatment plan. Broadly defined, a hospital readmission involves the readmission of a patient who had been discharged from a hospital, to the same or another hospital within a specified time frame. The original hospital stay is often called the "index admission" and the subsequent hospital stay is called the "readmission." Different time frames have been used for research purposes, the most common being 30-day, 90-day, and 1-year readmissions. In the United States, it has been reported that 20% of Medicare patients are readmitted within 30 days of discharge, and that unplanned readmissions cost an estimated $17.4 billion. In the United States, as part of the Affordable Care Act at the national level, institutional mechanisms have been established, such as providing incentives for each hospital based on the readmissions within 30 days of discharge. Since then, various studies have been conducted on unplanned hospital readmissions within 30 days, and many countries are using it as an indicator of quality of care. Considering this, our study also aims to develop a system for predicting readmission within 30 days, which can be applied to actual clinical practice.

The relationship among climate, human health, and diseases has been established through multiple studies in the past1, and various risk factors for hospital admission have been studied based on demographic, environmental, and clinical factors2,3,4,5,6,7,8,9. Temperature variation due to global warming has been linked to hospital admission rates10,11,12,13. Humidity has been noted as another important health factor, while pollutants, including carbon monoxide and fine air particulates, have been found to be associated with increased admissions for multiple conditions13,14,15. Further investigation of the association between the ambient climate condition and hospital admission will help healthcare stakeholders understand the severity of the effect of weather change and implement healthcare-resource and patient-care plans.

The common data model (CDM) is a healthcare data model based on standard terminology. An example is the CDM developed by the Observational Medical Outcomes Partnership (OMOP) and maintained by the Observational Health Data Sciences and Informatics (OHDSI)16,17,18. A system developed by converting data into a CDM is easily applicable through the distribution of the source code of the program without the need for installing the software on a specific institution’s system19. CDM is a data model based on common standard terms. Therefore, it guarantees standardized content from the data model and exhibits high extensibility.

In the present study, we developed and validated four prediction models for hospital readmission within 30 days of discharge using the OMOP CDM as well as weather and air quality factors. In addition, the model performance was externally validated to examine its extensibility. To the best of our knowledge, the present study is the first to create a patient-level prediction model for hospital readmission within 30 days using OMOP CDM and ambient weather data. A predictive model that combines weather and environmental data with a patient’s residence information is expected to enhance clinical decision making at the individual patient level. More specifically, the W-score of an individual patient was obtained by adding up the forecast values for each weather element for 7 days from the date of discharge, which enabled the use of the weather forecast data of the Korea Meteorological Administration to predict the re-hospitalization of this patient at the time of discharge. The model was designed with a view to using the weather forecast data for the next 7 days for the patient's address for clinical decision-making.

Results

Of the 61,922 index hospitalizations from the Seoul National University Hospital (SNUH) data included in our cohort research, 5794 resulted in a 30-day readmission through emergency-room visits (Table 1). The mean age of the readmitted individuals was 75.2 years, and more than half of the readmitted patients were males. The average length of stay was 2.5 days for the readmitted group and 0.2 days for the non-readmitted group.

Table 1 Basic characteristics of study data for each visit type.

Table 2 presents the number of patient visits and readmission incidence rate in different disease groups. The internal and external validation results of the proposed readmission prediction models are presented in Table 3, where we can observe the differences in model performance among different diseases. The external validation results indicate that the proposed models show significantly improved performance for the musculoskeletal disease group. The main purpose of external validation is to verify how generalized and interpretable the developed model can be for performance evaluation of the developed model. In this study, it is expected that the model performance improved in the external validation experiment due to the difference between the size of the data used for the external validation and the size of the data for which the model was developed (and internal validation was performed). Since the results were better in the verification process of the model for larger data, we are confident that the model created in this study is robust enough for generalization. Furthermore, supplementary Table S3-S6 shows the top 20 predictors of each model in this study. According to Table 4, PM10, rainfall, and maximum temperature were the weather and air quality variables that most impacted the model among the disease groups.

Table 2 Number of visits in each disease group and outcome incidence rate in our research cohorts.
Table 3 Comparison of disease-specific performance in each model based on the area under the receiver operating characteristic curve.
Table 4 Weather and air quality predictors in W-score.

Table 5 shows the details of the hyperparameter values used in this study.

Table 5 Summary of parameter values in each model.

The receiver operating characteristic curves in Fig. 1 reflect the predictive model performances for the internal and external validation of the models based on clinical covariates and W-score in patients with diseases of the musculoskeletal system and connective tissue, respectively. The clinical covariate and W-score model exhibited the greatest AUC for both the internal and external validations in the musculoskeletal disease group.

Figure 1
figure 1

ROC curves for the validation of the Adaboost and decision tree models.

Discussion

We developed a 30-day unplanned hospital readmission prediction model based on OMOP-CDM transformed patient medical records and meteorological public data. We also obtained the weather and air quality records for the patients’ residence localities. Furthermore, we established a W-score for individual visits based on the Korean weather warning issuance criteria. In addition, we developed a model that can predict patient readmission when discharged by using weather forecast data directly from the clinical setting.

Many epidemiological studies have established an association between environmental factors and hospital readmissions11,12,20,21. However, few studies have examined the impact of environmental factors, such as ambient air pollution or climate, on hospital readmissions and the result of the health outcome using predictive analysis.

We developed a model to predict hospital readmission at the time of discharge based on patient-level clinical diagnosis and drug prescription data before discharge as well as the weather and air quality records for the patient’s residence locality. The variables used in the proposed model were designed based on diagnosis and drug information to make the model extensible, considering the standard term mapping issues that may arise in the process of converting electronic health record (EHR) data to OMOP CDM. This is because diagnostic and drug terminology do not differ significantly from the terminology used by most many hospitals.

The Korea Meteorological Administration (KMA) provides weather forecast information for a period from 3 to 10 days from the forecast date. If the KMA weather forecast and the hospital system are linked in the future, so if short-term weather forecast data for 7 days from the patient's discharge date are input to the developed readmission prediction model, the actual patient's readmission forecast information will be used for clinical decision making.

The performance of the proposed model for the respiratory disease cohort was lower than expected. Moreover, the performance of the proposed model for the musculoskeletal disease cohort demonstrated good scalability. These results are presumed to be due to the occurrence of readmission for acute events that require post-operative management, rather than hospitalization due to the occurrence of chronic diseases in tertiary hospitals. Many patients who needed trauma management after surgery were not hospitalized for a sufficient period. The results of a disease-specific predictive model can be observed in further studies based on our research.

We could not externally validate the proposed model across multiple organizations. However, the proposed model can be easily reintegrated when migrating to a different EHR, either as an embedded frame in the EHR or as a standalone CDM application. Furthermore, the proposed model can perform better using a sophisticated weather data function approach. Our research provides a basis for future applications of the proposed model to clinical settings, to manage visiting patients based on clinical and weather data.

In summary, providing a clinical basis for a patient’s future risk of readmission at the time of discharge will assist hospitals in developing a patient care plan in advance. We developed a model for predicting hospital readmission based on environmental factors. External verification of the model demonstrated that a high-accuracy model can be developed based on weather and air quality factors. Improving the accuracy of the readmission prediction model will help in establishing patient care plans and making clinical decisions at the time of discharge.

Methods

Study population and clinical data description

Our retrospective cohort study was conducted using OMOP-CDM-converted EHR data between January 1, 2017 and December 31, 2018 from SNUH and the Seoul National University Bundang Hospital (SNUBH) in the Seoul metropolitan area, South Korea. These hospitals have converted the EHR data over a 15-year period into the OMOP CDM.

We considered consecutive hospitalizations among adults over 65 years who were discharged alive and underwent at least one hospitalization or emergency-room visit during our study period. We focused on patients living in the Seoul metropolitan area, including the Gyeonggi Province in South Korea, to create prediction models that consider weather and environmental variables during the study period.

In addition, we categorized patients into subgroups based on weather- and environment-related diseases studied previously20,22,23,24,25,26,27. Patients diagnosed with mental and behavioral disorders (F00–F99), circulatory system diseases (I00–I99), respiratory diseases (J00–J99), and musculoskeletal system and connective tissue–related diseases (M00–M99) at discharge were included together in subgroups based on the International Classification of Diseases, 10th Revision.

The primary outcome of this study was 30-day unplanned hospital readmission. We referred to the Hospital-Wide All-Cause Unplanned Readmission (HWR) measure from Centers for Medicare & Medicaid Services (CMS)28. According to the HWR measure, CMS classified the planned readmissions into planned disease or treatment groups, including chemotherapy, organ transplant, and rehabilitation. All admissions other than the scheduled admissions were considered to be unscheduled visits.

Figure 2 illustrates the study cohort design derived using SNUH data, which are mainly used as the training dataset in our research. Figure 3 shows the overall study process in this research.

Figure 2
figure 2

Study cohort design.

Figure 3
figure 3

Overall methodology of the study.

Clinical features, such as the gender of the patient, age of the subject on the index date, diagnosis conditions, drug exposures for patient medications, and the Charlson comorbidity index (Romano adaptation), were obtained using all conditions prior to the end of the readmission interval.

Diagnosis and drug prescription were used as clinical variables for individual patients. Moreover, each variable was extracted from the standardized CONDITION_ERA and DRUG_ERA in the CDM table as a higher concept of individual diagnosis and drugs. In OMOP CDM, a CONDITON_ERA data table is defined as the duration in which the patient is assumed to have a given condition29. The CONDITION_ERA table provided a chronological period of diagnosis. DRUG_ERA is defined as the duration in which the patient is assumed to be exposed to a particular active drug ingredient. The DRUG_ERA table provided successive periods of individual drug prescriptions combined following certain rules to produce continuous eras.

Weather and air quality data

Weather and air quality data were derived from KMA’s weather data open portal (https://data.kma.go.kr) and the official website of the Korean Ministry Of Environment (MOE) (https://www.airkorea.or.kr/eng) 30,31.

Records of daily mean temperature (ºC), daily mean relative humidity (RH) percentage (%), and daily rainfall (mm) during the study period were obtained from the KMA website. The daily mean concentration of ambient particulate matter (PM in μg/m3), sulfur dioxide (SO2 in μg/m3), nitrogen dioxide (NO2 in μg/m3), and ozone (O3 in μg/m3) from all general monitoring stations were collected from the Air Korea website for the study period. The daily median was averaged across the data for any missing record on a particular day. KMA and Air Korea data needed to be preprocessed into postal zip codes owing to the varying levels of location information granularity. LOCATION_ ID in CDM DB has an address identifier based on the postal code address system. For example, LOCATION_ ID for Jongno-gu, Seoul does not match SNUH LOCATION_ ID and SNUBH LOCATION_ ID value. Therefore, it is necessary to first check the details of the LOCATION table of each institution CDM DB. Meteorological data of KMA and air environment data of Air Korea are recorded at each measuring station across the country. KMA data is divided into cities/metropolitan cities/provinces, and Air Korea data is based on a smaller unit, that is, the street address. Therefore, we first integrated KMA data and Air Korea data with the same granularity, and performed preprocessing by finding the postal code for the integrated address and matching it with the patient's residence address.

W-score: weather and air quality scores for individual visits

We calculated a patient-level W-score based on weather and air quality data for each patient visit based on the patient’s residence locality. The score was derived using the KMA’s standards for special weather reports32. A special weather report refers to a forecast that calls attention to or warns against a serious disaster that is expected to occur because of a weather phenomenon. An “advisory” is issued if a disaster is expected because of a specific weather phenomenon, and a “warning” is issued if significant damage is expected. KMA issues weather reports on strong winds, wind waves, heavy rains, heavy snow, dry weather, storm tidal waves, earthquakes, cold waves, typhoons, yellow dust, and heat waves (Supplementary Table S1 and S2). Data such as the daily average particulate matter (PM10), maximum temperature, minimum temperature, relative humidity, and precipitation were used. Only PM10 was used among various atmospheric data, such as PM10, PM2.5, SO2, NO2, and O3, for calculating the W-score because there are many missing values of PM2.5 in the source data, and PM10 and PM2.5 have a multicollinear relationship.

W-scores of individual patient visits were calculated using weather conditions, such as fine dust warning, heat wave, cold wave, dryness, and heavy rain. The meteorological warning issuance criteria of the KMA were used for calculating W-scores for each element. We obtained the W-score by calculating the sum of the weather element–specific forecast values for 7 days from the discharge date so that the weather forecast data from KMA can be utilized at the time of patient discharge. Since its purpose is to predict readmission for this patient at the time of discharge, it is designed considering that weather forecast data for the next 7 days will be input and used for clinical decision-making.

Model development

The prediction model for re-admission within 30 days was developed to reflect variables such as clinical diagnosis and drug prescription prior to patient discharge date as well as to predict the occurrence of re-admission of the patient by considering the W-score for the weather forecast at the patient’s residence location after the discharge date (Fig. 4).

Figure 4
figure 4

Prediction window.

We developed tree-based machine learning models, namely, DT, random forest (RF), ADA, and gradient boosting machine (GBM)–based classifiers, based on the weather and air quality feature set using the patient-level prediction R package developed by OHDSI19. Models were trained and tested on SNUH data. All possible combinations of the hyper-parameters are included in a grid search using cross-validation on the training set. Ten-fold cross-validation is used to select the optimal hyper-parameter and internal validation. The hyper-parameters that lead to the best cross-validation performance will then be chosen for the final model. For our problem, we choose to build tree-based classifiers with several hyper-parameter values, as described in Table 5. Moreover, the models were externally validated using the SNUBH dataset. Each model performance was evaluated using the area under the receiver operating characteristic curve.

Approval and consent waiver statement

This study was performed in accordance with the relevant guidelines and regulations of SNUH and SNUBH Institutional Review Board. As the data source was de-identified, this study was approved based on waivers of informed consent or exemptions by SNUH and SNUBH Institutional Review Board (SNUH IRB No: B-1504-296-302, SNUBH IRB No: X-1908-559-901).