1 Introduction

With the advancement of technology, a substantial amount of health data has accumulated in digital format. This data, which includes both individual patient and population-level health information, is kept in electronic health records (EHRs). The goal of EHRs is to increase the effectiveness of a healthcare system by managing patient medical records and minimizing medication errors, with the underlying intention of providing better care (Kruse et al. 2016). EHRs capture various data formats, including images, free text, numerical values, and symbols. These data can be divided into structured and unstructured categories. Structured data, such as patient demographics, blood pressure, and medications, are recorded using established metrics and are therefore straightforward to analyze with conventional machine learning techniques. In contrast, unstructured data consist of narrative or non-tabular content, such as medical images, pathology reports, and surgical notes, and are more complex to analyze because they lack a predefined format (Sun et al. 2018). Unstructured data often contain more valuable information than structured data. However, because unstructured data in medical reports sometimes contain ambiguous information, it is challenging to extract meaningful information from them. Clinical text also frequently contains abbreviations, spelling mistakes, and grammatical errors, which further complicates analysis.

In recent years, hospitals have increasingly used EHRs to store information related to patients' medical histories. This has indirectly led physicians to use computers for data records that contain a wide range of clinical data (Hornberger 2009). The vast amount of latent information in EHRs can support clinical decision making; as such, there is an inherent need for comprehensive models to analyze this type of data. Over the past ten years, various artificial intelligence and machine learning models have been successfully applied to EHRs to recognize, anticipate, evaluate, and categorize medical data (Maurya et al. 2021). Although numerous models have been created for EHR data, the heterogeneous nature of the data, which includes numerical values, date-time objects, free text, and more, still prevents healthcare information from being fully leveraged (Goldstein et al. 2017). Furthermore, most machine learning models have difficulty identifying temporal patterns in data that contain many repeated measurements of the same variables. Some conventional models reduce each time series to a single summary value, such as the mean, median, or another aggregate statistic (Xie et al. 2020). Failing to fully exploit the data's temporal dynamics in this way can result in the loss of valuable sequential information (Zhao et al. 2017).

Because conventional machine learning algorithms are limited in addressing these issues, modern deep learning approaches have been proposed for representing temporal EHR data. Sequential deep learning models can handle the temporal aspect of EHRs, and owing to their adaptability and generalizability, deep learning algorithms have proven more effective at modeling temporal EHR data in many applications (Yang et al. 2017; Hung et al. 2017; Reddy and Delen 2018; Park et al. 2020). Deep learning frameworks are designed to provide an end-to-end system that learns from raw data and autonomously completes predetermined tasks. Deep learning can be more advantageous than conventional machine learning because its multiple hidden layers allow it to learn abstract features directly from the original input, and it can analyze very large amounts of data with high accuracy. As a result, it has been widely adopted in medical research. A number of recent reviews have detailed the use of deep learning to analyze generic EHR data (Solares et al. 2020; Shickel et al. 2017; Si et al. 2021; Xiao et al. 2018). However, a systematic and thorough overview of the technical difficulties of managing temporal EHR data, and of the deep learning remedies for them, is still needed.

The purpose of this research is to analyze recent advances in deep learning models and databases used for EHR analysis, outlining their features in terms of key difficulties and the approaches adopted to overcome them. This paper is structured as follows: Sect. 2 briefly explains the typical deep learning architectures used for modeling EHR data, followed by a comparison table. Sect. 3 then discusses a number of deep learning models used in EHR analysis for clinical prediction. The online databases that contain EHR data are examined in Sect. 4. Finally, the conclusions are discussed, and future directions for deep learning modeling of EHRs are highlighted by examining several crucial factors that require greater attention.

2 Deep learning architectures

Over the past few years, the amount of research applying deep learning to EHR analysis has increased rapidly, and a rich variety of deep learning architectures has been introduced. This section briefly explains the conventional deep learning architectures used for modeling and analyzing EHRs. For a more thorough explanation, the basic equations underlying each architecture are also given. A comparison of deep learning architectures is presented in Table 1.

2.1 Autoencoder

An autoencoder (AE) is an unsupervised deep learning architecture that learns patterns from high-dimensional data, such as EHR data (Zamanzadeh et al. 2021). An AE performs dimensionality reduction through a nonlinear transformation; the resulting compressed encoding is known as the latent representation. Previous research has demonstrated that latent representations can extract useful clinical information from raw data (Beaulieu-Jones et al. 2016). Numerous forms of autoencoders have been developed, including variational autoencoders (VAE), sparse autoencoders (SAE), and denoising autoencoders (DAE). In general, however, all autoencoders share the structure and functions shown in Fig. 1 (Vincent et al. 2010). As Fig. 1 shows, an AE consists of three layers: an input layer, x; a single hidden layer, z; and a reconstruction layer, x̃. The data are reconstructed from x to x̃ through the encoding and decoding processes given in (1) and (2). The encoder converts the input into a latent representation in the hidden layer, and the decoder reconstructs the input from that latent representation. W and W′ are the encoding and decoding weights, respectively, b and b′ are the corresponding bias vectors, and σ is a nonlinear activation function. Training minimizes the reconstruction error ||x − x̃||, so that the encoded representation z retains the information needed to recover the input.

$$z = \sigma \left( {Wx + b} \right)$$
(1)
$$\tilde x = \sigma \left( {W'z + b'} \right)$$
(2)

An AE transforms the input data into a format in which only the most essential derived dimensions are stored. As such, AEs are comparable to standard dimensionality reduction techniques, such as singular value decomposition (SVD) or principal component analysis (PCA). However, they can be more beneficial for complex problems because of the nonlinear transformations applied by the activation function of each hidden layer.
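To make equations (1) and (2) concrete, the following minimal NumPy sketch trains a single-hidden-layer autoencoder by full-batch gradient descent on the squared reconstruction error. The sigmoid activation in both layers, the learning rate, the layer sizes, and the synthetic binary patient matrix are illustrative assumptions, not settings taken from any study reviewed here.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def train_autoencoder(X, n_hidden, lr=0.5, epochs=300, seed=0):
    """Fit z = sigma(W x + b) and x_tilde = sigma(W' z + b') by gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    W = rng.normal(0.0, 0.1, (n_hidden, n_in))      # encoding weights W
    b = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_in, n_hidden))  # decoding weights W'
    b_dec = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        Z = sigmoid(X @ W.T + b)              # Eq. (1), one row per patient
        X_rec = sigmoid(Z @ W_dec.T + b_dec)  # Eq. (2)
        err = X_rec - X
        losses.append(np.mean(np.sum(err ** 2, axis=1)))  # mean ||x - x_tilde||^2
        # Backpropagate the reconstruction error through both sigmoid layers.
        d_out = 2.0 * err * X_rec * (1.0 - X_rec) / n_samples
        d_hid = (d_out @ W_dec) * Z * (1.0 - Z)
        W_dec -= lr * (d_out.T @ Z)
        b_dec -= lr * d_out.sum(axis=0)
        W -= lr * (d_hid.T @ X)
        b -= lr * d_hid.sum(axis=0)
    return W, b, losses


# Synthetic binary "EHR-like" matrix: 50 patients x 8 indicator features.
X = (np.random.default_rng(1).random((50, 8)) > 0.5).astype(float)
W, b, losses = train_autoencoder(X, n_hidden=3)
codes = sigmoid(X @ W.T + b)  # 3-dimensional latent representation per patient
print(codes.shape, losses[0], losses[-1])
```

If both activations were replaced by the identity, the optimal hidden layer would span the same subspace as the leading principal components, which is why AEs are often described as a nonlinear generalization of PCA.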

Fig. 1 Autoencoder (AE) architecture

2.2 Restricted Boltzmann machine (RBM)

The Restricted Boltzmann machine (RBM) is another unsupervised deep learning architecture; it adopts a stochastic viewpoint and models the probability distribution of the input data. It has been reported to handle complex problems and to reduce the risk of overfitting. Additionally, because the RBM network is undirected, information can flow in both directions across it (feed-forward and feedback modes). An RBM has two layers: an input layer of visible units that encodes the observations (such as the occurrence of diseases) and a latent layer of hidden units. The hidden units can be used for tasks such as disease diagnosis and future risk prediction (Tran et al. 2015). The RBM is "restricted" because no two nodes in the same layer are connected. The RBM is an energy-based model with binary visible units, v, hidden units, h, bias vectors a and b, and a weight matrix, W, that connects the visible and hidden units. The energy function in (3) defines the interaction between these variables. Figure 2 presents an overview of the RBM architecture.

$$E\left( {v,h} \right) = - \left( {{a^T}v + {b^T}h + {v^T}Wh} \right)$$
(3)

RBMs are frequently trained with stochastic methods based on Gibbs sampling. The resulting hidden-unit activations h can be regarded as the learned representation of the original input data. RBMs can be stacked hierarchically to create a deep belief network (DBN) for supervised learning tasks. Such models are frequently used as a feature extraction strategy for EHR word embedding, assigning weights to various transcript measurement categories (Gupta et al. 2015). The architecture has also been used to embed medical objects, such as diagnosis codes, in a low-dimensional vector space (Tran et al. 2015).
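The sketch below is a minimal NumPy illustration of these ideas: it evaluates the energy in (3), with W of shape (visible × hidden), and trains a tiny binary RBM with one step of block Gibbs sampling per update, i.e., contrastive divergence (CD-1), a common Gibbs-based training rule. The layer sizes, learning rate, number of updates, and the six-code toy patient vector are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def energy(v, h, a, b, W):
    # E(v, h) = -(a^T v + b^T h + v^T W h); W has shape (n_visible, n_hidden)
    return -(a @ v + b @ h + v @ W @ h)


def cd1_update(v0, a, b, W, lr=0.1):
    """One contrastive-divergence (CD-1) step built on block Gibbs sampling."""
    p_h0 = sigmoid(b + v0 @ W)                          # p(h_j = 1 | v0)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # sample hidden units
    p_v1 = sigmoid(a + W @ h0)                          # p(v_i = 1 | h0)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)  # sampled reconstruction
    p_h1 = sigmoid(b + v1 @ W)
    # Positive-phase minus negative-phase statistics (arrays updated in place).
    W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    a += lr * (v0 - v1)
    b += lr * (p_h0 - p_h1)
    return p_h0  # hidden activation probabilities: the learned representation


n_visible, n_hidden = 6, 3
a, b = np.zeros(n_visible), np.zeros(n_hidden)
W = rng.normal(0.0, 0.1, (n_visible, n_hidden))
v = np.array([1, 0, 1, 1, 0, 0], dtype=float)  # e.g., presence of six diagnosis codes
for _ in range(100):
    representation = cd1_update(v, a, b, W)
print(energy(v, (representation > 0.5).astype(float), a, b, W))
```

Because the units within a layer are not connected, all hidden units can be sampled at once given v (and vice versa), which is what makes block Gibbs sampling cheap in an RBM.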

Fig. 2 Restricted Boltzmann machine (RBM) architecture

2.3 Convolutional neural network (CNN)

A convolutional neural network (CNN) is a deep neural network with a multilayer architecture that is frequently used for visual tasks (Yamashita et al. 2018). A CNN consists of an input layer, hidden layers, and an output layer. The hidden layers typically include convolutional layers, subsampling (pooling) layers, and fully connected layers. Each convolutional layer contains a collection of learnable filters, called kernels, and computes dot products between these filters and local regions of its input. A pooling (subsampling) layer is frequently used after the convolutions to combine the extracted information. Figure 3 presents a sample CNN architecture with two convolutional layers followed by a pooling layer. Because the filters are typically much smaller than the input, each output unit interacts with only a small region of the input (sparse interactions), and CNNs therefore have relatively few parameters. Since each filter is applied across the entire input during convolution, parameter sharing is also encouraged.
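The convolution and pooling operations described above can be sketched in NumPy as follows. A single 2 × 2 kernel is slid across a toy 4 × 4 input, a ReLU is applied, and the resulting feature map is max-pooled. The kernel values, the ReLU, and the input are illustrative assumptions; like most deep learning libraries, the sketch computes cross-correlation (no kernel flip).

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position: parameter sharing.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


image = np.arange(16, dtype=float).reshape(4, 4)
feature_map = np.maximum(conv2d_valid(image, np.ones((2, 2))), 0.0)  # ReLU activation
pooled = max_pool(feature_map)
print(feature_map)
print(pooled)  # [[30.]]
```

The kernel has only four weights regardless of the input size, and each output value depends on a single 2 × 2 patch, which illustrates the sparse interactions and parameter sharing mentioned above.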

CNNs excel at image classification and other deep learning application areas. In the EHR setting, CNN-based medical image analysis has produced promising results for images such as MRI scans, mammograms, and CT scans. By analyzing an image as a set of local pixel patches, a CNN can reliably extract significant features.

Fig. 3 Convolutional neural network (CNN) architecture

2.4 Recurrent neural network (RNN)

Recurrent neural networks (RNNs) can encode time-stamped events from EHR data and handle sequentially ordered input, such as natural language (Chen et al. 2019). RNNs are also, in principle, capable of capturing long-range temporal dependencies, although gated variants such as LSTM and GRU are typically used for this in practice. As depicted in Fig. 4, an RNN typically contains recurrent links that feed each layer's output back into itself. Because its hidden states and feedback loops can absorb and integrate prior knowledge about the patient, the RNN's recurrent structure makes it suitable for processing EHR data (Ho et al. 2017). At each time step, the current hidden state is updated from the current input and the preceding hidden state, so information accumulates progressively. After the complete sequence has been processed, the last hidden state carries information from all of the preceding elements.
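The recurrent update just described can be written as h_t = tanh(W_x x_t + W_h h_{t−1} + b). A minimal NumPy sketch of a vanilla RNN over a patient's visit sequence follows; the feature and hidden sizes, the random weights, and the toy visit vectors are assumptions for illustration, and the weights are not trained here.

```python
import numpy as np


def rnn_forward(inputs, W_x, W_h, b):
    """Vanilla RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), starting from h_0 = 0."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:  # visits processed in chronological order
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.array(states)  # row t is the hidden state after visit t


rng = np.random.default_rng(0)
n_features, n_hidden, n_visits = 5, 4, 3
visits = rng.random((n_visits, n_features))  # toy feature vectors, one per visit
W_x = rng.normal(0.0, 0.5, (n_hidden, n_features))
W_h = rng.normal(0.0, 0.5, (n_hidden, n_hidden))
b = np.zeros(n_hidden)
H = rnn_forward(visits, W_x, W_h, b)
patient_summary = H[-1]  # last hidden state summarizes the whole sequence
```

Because the same W_x and W_h are applied at every step, the model handles sequences of any length; the final state can be passed to a classifier for tasks such as mortality prediction.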

Fig. 4 Recurrent neural network (RNN) architecture

RNNs have recently been applied to medical tasks such as early heart failure detection, ICU mortality prediction, and prediction of patient decompensation (Shah et al. 2016; Choi et al. 2017a, b; Aczon et al. 2017). In addition, much research has used RNNs to build patient representations in EHRs from groups or sequences of clinical codes. Many medical applications also process large quantities of text, including clinical notes and medical queries, by looking for keywords related to common clinical entities, such as ICD and CPT codes.

Table 1 Advantages and disadvantages of deep learning architectures applied to electronic health records in clinical prediction

3 Deep learning models

In recent years, deep learning models have been developed to evaluate EHR data for managing chronic diseases, for example by predicting chronic diseases and identifying adverse drug events. The growth of EHR data brings multiple difficulties, including data heterogeneity, irregularity, and sparsity, and state-of-the-art deep learning models for EHR analysis have been proposed to address them. Owing to their demonstrated capacity for learning, adaptability, and generalizability, these models have been highly effective in modeling EHR data. This section discusses current deep learning models; Tables 2 and 3 summarize their frameworks.

Table 2 List of reviewed deep learning models
Table 3 Comparison of existing deep learning models

3.1 Doctor AI

Doctor AI is a predictive model that forecasts future diagnoses and medication orders, together with the timing of the next visit (Choi et al. 2016a, b). The model was built on an RNN architecture and applied to time-stamped longitudinal EHR data collected over an eight-year period. To improve accuracy and training speed, the RNN was initialized with skip-gram embeddings (Mikolov et al. 2013). The model takes encounter records, such as diagnosis, procedure, and medication codes, as input for diagnostic prediction, and it can use a patient's medical history to make multilabel predictions for any type of medication or condition. The data came from primary care patients of the Sutter Health Palo Alto Medical Foundation, which had used an Epic Systems Corporation EHR for more than ten years, covering 260 K patients and 2128 physicians. On these real-world EHR data, Doctor AI outperformed numerous baselines, achieving 79.58% recall@30 and accuracy comparable to that of physicians. Notably, the model also performed well on the open MIMIC dataset while maintaining high accuracy across the coding systems of different institutions. Finally, health professionals confirmed that Doctor AI could deliver valuable clinical information based on its diagnostic results. However, the model can exhibit biases stemming from imbalanced datasets, as well as problems such as overfitting and vanishing or exploding gradients. Data and temporal biases can skew outputs and lead to misinterpretation of data patterns, especially when visit timing is atypical. Overfitting occurs when the model is trained on a limited amount of data, leading to excellent performance on training data but poor generalization. Biases may also be exacerbated by embedding techniques such as skip-gram, in which stereotypical associations encoded in the training data influence downstream tasks.
Addressing these biases requires careful dataset curation, vigilant model training, and the application of debiasing techniques to improve model fairness and performance.
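The recall@30 figure reported for Doctor AI is a top-k metric. Assuming the usual definition, the predicted scores for all candidate codes are ranked, the fraction of the codes actually recorded at the next visit that appear among the top k is computed, and this value is averaged over visits. A small sketch of the per-visit computation with hypothetical scores:

```python
import numpy as np


def recall_at_k(scores, true_codes, k):
    """Fraction of the true codes that appear among the k highest-scoring codes."""
    top_k = set(np.argsort(scores)[::-1][:k].tolist())
    return len(top_k & set(true_codes)) / len(true_codes)


scores = np.array([0.9, 0.1, 0.8, 0.3])  # predicted scores for four hypothetical codes
print(recall_at_k(scores, true_codes=[0, 3], k=2))  # 0.5
```

Here codes 0 and 2 are ranked highest; only code 0 was actually observed among the two true codes, so recall@2 is 0.5.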

3.2 Deep patient

Deep Patient is an unsupervised deep learning model that forecasts patients' future health problems from a general-purpose patient representation derived from EHR data (Miotto et al. 2016). The framework uses a stacked denoising autoencoder (SDA) to capture hierarchical regularities and stable structures in EHR data. By automatically combining clinical descriptors, this technique creates a representation that is more compact, consistent, and non-redundant. Deep Patient consistently delivered a lower-dimensional representation than the raw EHR data, improving the performance of clinical analytics engines. Data from approximately 700,000 patients in the Mount Sinai data warehouse were used to train the SDA, and 76,214 test patients with 78 diseases were used to evaluate the deep representation. In predicting several disease categories, Deep Patient outperformed both raw EHR features and representations obtained with other feature learning techniques. This suggests that the learned features characterize patients in a broad and efficient manner that automated systems in multiple fields can use. Personalized prescriptions, treatment recommendations, and the recruitment of clinical trial participants could all benefit from such deep patient representations. However, the quality and completeness of the EHR data could be sources of bias in this model. If certain patient populations are underrepresented or the data contain errors, this could lead to biased predictions.

Another potential bias could be the selection of features used in the model. If certain features are more heavily weighted or if important features are missing, this could also lead to biased predictions.
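The stacked denoising autoencoder idea can be sketched as follows: each layer sees a corrupted copy of its input, learns to reconstruct the clean input, and the clean encodings of one layer become the training data for the next. This is a minimal NumPy sketch under stated assumptions: masking noise (zeroing a random fraction of entries), tied decoder weights, sigmoid units, the layer sizes, and the toy data are illustrative choices, not Deep Patient's published configuration.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def mask_corrupt(X, noise, rng):
    """Masking noise: set a random fraction `noise` of the entries to zero."""
    return X * (rng.random(X.shape) >= noise)


def train_dae_layer(X, n_hidden, noise=0.1, lr=0.5, epochs=200, seed=0):
    """Tied-weight denoising autoencoder: encode a corrupted input, reconstruct the clean one."""
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    W = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b, b_dec = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        X_noisy = mask_corrupt(X, noise, rng)
        Z = sigmoid(X_noisy @ W.T + b)
        X_rec = sigmoid(Z @ W + b_dec)  # decoder reuses W transposed (tied weights)
        d_out = 2.0 * (X_rec - X) * X_rec * (1.0 - X_rec) / n_samples
        d_hid = (d_out @ W.T) * Z * (1.0 - Z)
        W -= lr * (d_hid.T @ X_noisy + Z.T @ d_out)  # encoder + decoder gradients
        b -= lr * d_hid.sum(axis=0)
        b_dec -= lr * d_out.sum(axis=0)
    return W, b


def stacked_representation(X, layer_sizes):
    """Greedy layer-wise stacking: each layer is trained on the previous layer's codes."""
    codes = X
    for size in layer_sizes:
        W, b = train_dae_layer(codes, size)
        codes = sigmoid(codes @ W.T + b)  # clean (uncorrupted) encoding
    return codes


X = (np.random.default_rng(1).random((40, 12)) > 0.7).astype(float)
patients = stacked_representation(X, layer_sizes=[8, 4])
print(patients.shape)  # (40, 4)
```

Training each layer to undo the corruption encourages features that are robust to missing or noisy entries, a property that is useful for sparse EHR matrices.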

3.3 RETAIN

The RETAIN (Reverse Time Attention) model was created to address the limited interpretability of RNNs. It uses a two-level attention mechanism over sequential input to provide a detailed interpretation of its predictions while maintaining prediction accuracy (Choi et al. 2016a, b). RETAIN mimics medical practice by giving more weight to recent clinical visits in existing EHR data. The framework was inspired by how doctors respond to a patient's needs and review the patient record, focusing on specific information from the present back to the past. The RETAIN algorithm consists of five steps: embedding the visit information, generating visit-level attention weights, generating variable-level attention weights, creating the context vector, and making predictions. The model employs RNNs in steps 2 and 3 to capture the sequential information and imitate the behavior of doctors. RETAIN was tested on a large EHR dataset with 14 million visits by 263 K patients over an eight-year period, and it predicted heart failure more accurately and quickly than conventional machine learning techniques. The goal of RETAIN is to enhance prediction performance while retaining a simple, interpretable representation learning component, by placing the sophistication in the attention generation process. The authors intend to develop an interactive visualization system for RETAIN and to apply the RETAIN paradigm to more healthcare applications in the future.
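The five steps can be traced in a small forward pass. The sketch below, in NumPy, assumes plain tanh RNNs run in reverse time order, omits bias terms, and uses random untrained weights and a binary sigmoid output (for example, heart-failure risk); all sizes and weight names are hypothetical. It shows the structural point of RETAIN: a scalar weight α per visit, a vector weight β per embedding dimension, and a context vector that is a weighted sum of the visit embeddings themselves.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def reverse_rnn(V, W_x, W_h):
    """Run a tanh RNN from the latest visit back to the first; return states in visit order."""
    h = np.zeros(W_h.shape[0])
    states = []
    for v in V[::-1]:
        h = np.tanh(W_x @ v + W_h @ h)
        states.append(h)
    return np.array(states[::-1])


def retain_forward(X, p):
    """X: (T, n_codes) multi-hot visits in chronological order; p: weight dictionary."""
    V = X @ p["W_emb"].T                              # step 1: visit embeddings v_i
    g = reverse_rnn(V, p["Wx_alpha"], p["Wh_alpha"])  # RNN feeding visit-level attention
    h = reverse_rnn(V, p["Wx_beta"], p["Wh_beta"])    # RNN feeding variable-level attention
    alpha = softmax(g @ p["w_alpha"])                 # step 2: one scalar weight per visit
    beta = np.tanh(h @ p["W_beta"].T)                 # step 3: one weight per embedding dim
    c = np.sum(alpha[:, None] * beta * V, axis=0)     # step 4: context vector
    y = sigmoid(p["w_out"] @ c)                       # step 5: e.g., heart-failure probability
    return alpha, beta, y


rng = np.random.default_rng(0)
n_codes, m, p_dim, q_dim, T = 10, 6, 5, 5, 4
shapes = {"W_emb": (m, n_codes), "Wx_alpha": (p_dim, m), "Wh_alpha": (p_dim, p_dim),
          "w_alpha": (p_dim,), "Wx_beta": (q_dim, m), "Wh_beta": (q_dim, q_dim),
          "W_beta": (m, q_dim), "w_out": (m,)}
params = {name: rng.normal(0.0, 0.3, shape) for name, shape in shapes.items()}
X = (rng.random((T, n_codes)) > 0.7).astype(float)  # 4 toy visits over 10 codes
alpha, beta, y = retain_forward(X, params)
```

Since the RNNs only produce the weights and the prediction is linear in the visit embeddings before the sigmoid, each visit's (and each code's) contribution can be read off directly, which is the basis of RETAIN's interpretability.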

3.4 T-LSTM

To analyze longitudinal patient records with irregular elapsed times, a new LSTM variant called Time-Aware LSTM (T-LSTM) was created (Baytas et al. 2017). Such irregularity arises, for instance, in patient records, where consecutive visits are separated by varying intervals and sequences differ in length, and in videos with missing frames. For patient subtyping, T-LSTM is used to learn a single, robust representation of each patient's sequential records, which is then clustered with the k-means method to group patients into clinical subtypes. T-LSTM is a variation of the standard long short-term memory (LSTM) framework that adjusts the memory content of the unit by taking into account the interval between consecutive elements of a sequence. T-LSTM keeps the standard LSTM's forget, input, and output gates and additionally learns a neural network that decomposes the cell memory into short-term and long-term components. The core of this approach is the subspace decomposition applied to the memory from the previous time step. The amount of information retained from earlier time steps is adjusted without losing the patient's overall profile: T-LSTM uses the elapsed time between successive elements to weight, and thereby discount, the short-term memory content. A non-increasing function of the elapsed time converts each time gap into a suitable weight. T-LSTM was evaluated in supervised and unsupervised experiments on synthetic and real-world datasets, and it performed better than a standard LSTM on data with irregular elapsed times. However, biases in T-LSTM may arise from how temporal data are integrated into the model. For instance, inadequate normalization of time information or ineffective modeling of temporal relationships within the architecture could introduce biases into the predictions.
Furthermore, biases might result from specific design decisions within the T-LSTM model, such as selecting which time features to incorporate or determining how time interacts with the LSTM architecture.
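The memory-discount step can be sketched in NumPy as follows, assuming the decomposition in which a learned tanh layer extracts a short-term component from the previous cell memory, that component is scaled by a non-increasing function g(Δt) of the elapsed time, and the remainder is kept as long-term memory. The choice g(Δt) = 1/log(e + Δt), the weight sizes, and the random values are assumptions; the parameterization in the original work may differ.

```python
import numpy as np


def elapsed_time_weight(delta_t):
    """Non-increasing discount of the elapsed time; 1 / log(e + delta_t) is one choice (assumed)."""
    return 1.0 / np.log(np.e + delta_t)


def adjust_previous_memory(C_prev, delta_t, W_d, b_d):
    """Discount only the short-term part of the previous cell memory by the elapsed time."""
    C_short = np.tanh(W_d @ C_prev + b_d)  # learned subspace decomposition: short-term part
    C_long = C_prev - C_short               # remainder is treated as long-term memory
    C_short_discounted = C_short * elapsed_time_weight(delta_t)
    return C_long + C_short_discounted      # memory then passed to the usual LSTM gates


rng = np.random.default_rng(0)
C_prev = rng.normal(size=4)
W_d, b_d = rng.normal(0.0, 0.5, (4, 4)), np.zeros(4)
same_day = adjust_previous_memory(C_prev, 0.0, W_d, b_d)  # g(0) = 1: no discount
year_later = adjust_previous_memory(C_prev, 365.0, W_d, b_d)
```

Because only the short-term component is discounted, a long gap between visits weakens recent, transient information without erasing the long-term profile of the patient.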

3.5 Deep Diabetologist

Deep Diabetologist uses sequential EHR data and an RNN architecture to support medication prescription for diabetes patients (Mei et al. 2018). The algorithm uses a hierarchical recurrent neural network (HRNN) framework to capture heterogeneous sequential information in EHR data; it can handle a variety of diagnoses and clinical measurements and is suited to multi-resolution learning. Technically, learning takes place after the EHR data have been cleaned and imputed. The preprocessing yields 481 clinical variables, including 350 three-digit ICD-10 codes, 124 lab tests, and 7 previously used drug classes. The LSTM and RNN models were implemented with the Theano backend on a GPU machine. The model was evaluated in two experiments, one with past treatments and one without, and compared with other models. Deep Diabetologist performed marginally better than the other models and outperformed a baseline logistic regression (LR). However, its overall performance remains modest, which can result in inadequate medication recommendations. In addition, the model is constrained by the limited clinical measurements available in its EHR repository, which limits its effectiveness.

3.6 DeepCare

DeepCare is a dynamic neural network designed to predict future medical outcomes from medical records, illness histories, and current illness states (Pham et al. 2017). It is a general-purpose predictive framework that can be applied to the EHRs of any healthcare practice. The model uses an LSTM architecture that can handle irregularly timed events and directly models medical interventions. The LSTM models the progression of patients' illness trajectories and healthcare interventions in time-stamped order: the data extracted from each admission are the input to the LSTM, and the output is the illness state at the time of that admission. The DeepCare framework consists of three main layers, each with a specific purpose. A modified LSTM, called C-LSTM, in the bottom layer handles interventions and irregular timing. In the middle layer, illness states are combined through multiscale weighted pooling. Finally, a neural network in the top layer uses the pooled states and other data to estimate the outcome probability. However, the model may produce biased predictions if certain features are weighted more heavily than others. DeepCare can be used with current EMR systems, but further research is needed to perform thorough assessments across various cohorts, sites, and outcomes. In this setting, sharing parameters across multiple cohorts and hospitals leaves room for domain adaptation.

3.7 GRNN-HA

Gated recurrent unit RNNs with hierarchical attention (GRNN-HA) were used to build a mortality prediction model suitable for clinical decision support systems (Sha and Wang 2017). The aim of this approach was to address current issues in managing, modeling, and interpreting healthcare data. GRNN-HA uses a low-dimensional representation of medical codes, learned with Word2vec (Mikolov et al. 2013), to handle high-dimensional medical information. A bidirectional GRU is also used to encode temporal information in the medical data. To make the data easier to interpret, the approach uses a hierarchical organization that separates information at the visit and code levels and learns attention weights at both levels in a dependent manner. The interpretability of the framework rests on these attention weights, which are allocated to specific diagnostic codes and hospital visits according to their relative importance for the prediction.

As a result, GRNN-HA can visualize interpretations of its predictions and achieves higher prediction accuracy than baseline models. However, the work has several limitations that affect performance. First, data quality is limited by the scarce longitudinal medical data available for patients. Second, the number of samples is insufficient, since deep learning models require large training datasets to learn adequate attention weights and make meaningful predictions.
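A generic two-level attention pipeline of this kind can be sketched in NumPy: code-level attention pools the code embeddings inside each visit, a bidirectional GRU runs over the resulting visit vectors, and visit-level attention pools the GRU states into a patient vector for a mortality score. The additive scoring function, the omitted biases, the sizes, the random stand-in for Word2vec code vectors, and all names are assumptions; the dependent learning of the two attention levels in GRNN-HA is not reproduced.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def gru_step(x, h, P):
    """One GRU update with update gate z and reset gate r (biases omitted)."""
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h)
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h)
    h_cand = np.tanh(P["Wc"] @ x + P["Uc"] @ (r * h))
    return (1.0 - z) * h + z * h_cand


def bi_gru(seq, P_fwd, P_bwd, n_hidden):
    """Concatenate forward and backward GRU states for every element of seq."""
    fwd, bwd = [], []
    h = np.zeros(n_hidden)
    for x in seq:
        h = gru_step(x, h, P_fwd)
        fwd.append(h)
    h = np.zeros(n_hidden)
    for x in seq[::-1]:
        h = gru_step(x, h, P_bwd)
        bwd.append(h)
    return np.concatenate([np.array(fwd), np.array(bwd[::-1])], axis=1)


def attention_pool(H, W, u):
    """Additive attention: score rows with u^T tanh(W h), softmax, weighted sum."""
    weights = softmax(np.tanh(H @ W.T) @ u)
    return weights, weights @ H


def grnn_ha_forward(visits, E, prm, n_hidden):
    """visits: list of code-index lists; E: (n_codes, d) code embeddings."""
    code_weights, visit_vecs = [], []
    for codes in visits:  # code level: weight the codes inside each visit
        w, vec = attention_pool(E[codes], prm["W_code"], prm["u_code"])
        code_weights.append(w)
        visit_vecs.append(vec)
    H = bi_gru(np.array(visit_vecs), prm["fwd"], prm["bwd"], n_hidden)
    visit_weights, patient = attention_pool(H, prm["W_visit"], prm["u_visit"])  # visit level
    return code_weights, visit_weights, sigmoid(prm["w_out"] @ patient)  # mortality risk


rng = np.random.default_rng(0)
n_codes, d, n_hidden, n_att = 20, 6, 4, 5


def gru_params():
    # Input weights (W*) map d -> n_hidden; recurrent weights (U*) map n_hidden -> n_hidden.
    return {k: rng.normal(0.0, 0.3, (n_hidden, d if k[0] == "W" else n_hidden))
            for k in ("Wz", "Uz", "Wr", "Ur", "Wc", "Uc")}


prm = {"W_code": rng.normal(0.0, 0.3, (n_att, d)), "u_code": rng.normal(0.0, 0.3, n_att),
       "fwd": gru_params(), "bwd": gru_params(),
       "W_visit": rng.normal(0.0, 0.3, (n_att, 2 * n_hidden)),
       "u_visit": rng.normal(0.0, 0.3, n_att), "w_out": rng.normal(0.0, 0.3, 2 * n_hidden)}
E = rng.normal(0.0, 0.3, (n_codes, d))  # stand-in for pre-trained Word2vec code vectors
visits = [[1, 4, 7], [2, 3], [5, 9, 11, 13]]  # three hypothetical visits
code_w, visit_w, risk = grnn_ha_forward(visits, E, prm, n_hidden)
```

The returned code-level and visit-level weights are what would be visualized to explain a prediction, for example by highlighting the visits and diagnoses with the largest weights.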

3.8 Timeline

Timeline is a state-of-the-art deep learning model that predicts clinical events from previous visits while taking into account the interval between visits and two different variables pertaining to those visits (Bai et al. 2018). The model learns a time decay factor for each medical code. This capability enables Timeline to recognize that chronic diseases affect future visits more subtly than acute ones. Timeline uses an attention mechanism (Vaswani et al. 2017) to improve the visit embedding vectors. By examining Timeline's attention weights and disease progression functions, it is possible to interpret the predictions and understand how the risks for subsequent visits change over time. The model was evaluated on medical claims data from the SEER-Medicare Linked Database (Feng et al. 2017), which contain patients' diagnosis and procedure billing codes. In experiments on two large real-world datasets, the method outperformed state-of-the-art deep neural networks at predicting the primary diagnosis of a future hospitalization.
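The idea of a learned, code-specific time decay can be illustrated with a short NumPy sketch. Timeline's actual decay function and attention mechanism may differ: here an exponential decay exp(−λ_c Δt) is assumed for each code c, the attention is a softmax over fixed scores, and the two decay rates are made-up values chosen so that one code fades slowly and the other quickly.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def visit_representation(code_ids, delta_t, E, attn_scores, decay_rates):
    """Weight each code of a past visit by attention and by a code-specific decay of its
    influence over the elapsed time delta_t (days); return weights and the visit vector."""
    alpha = softmax(attn_scores[code_ids])            # attention over the visit's codes
    decay = np.exp(-decay_rates[code_ids] * delta_t)  # assumed exponential decay per code
    weights = alpha * decay
    return weights, weights @ E[code_ids]


E = np.eye(2)                          # toy embeddings for two hypothetical codes
attn_scores = np.array([0.0, 0.0])     # equal attention, to isolate the effect of decay
decay_rates = np.array([0.001, 0.05])  # per day: slowly vs. quickly decaying code (assumed)
codes = np.array([0, 1])
w_recent, _ = visit_representation(codes, 0.0, E, attn_scores, decay_rates)
w_old, v_old = visit_representation(codes, 90.0, E, attn_scores, decay_rates)
```

With no elapsed time, both codes contribute equally; after 90 days the quickly decaying code contributes almost nothing, while the slowly decaying code retains most of its weight. Learning one rate per code is what allows such differences between diseases to emerge from the data.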

3.9 DMNC

To address the challenge of asynchronous sequential data, the Dual Memory Neural Computer (DMNC) was developed as a novel memory-augmented neural network (Le et al. 2018). DMNC combines three neural controllers with two external memories. The architecture includes two encoders that read from and write to the external memories to encode the input views. The model offers two memory-access modes, early fusion and late fusion, which correspond to early and late information exchange between the views. In early fusion, the encoders share a memory address space that both can access and modify, so information is exchanged during encoding; in late fusion, the memory spaces are kept independent. In both cases, the decoder combines the knowledge from the memories to make its predictions. DMNC was compared with numerous earlier models on two different tasks. It consistently outperformed other deep learning models in drug prescription. In disease progression tasks, its prediction accuracy increased with the number of predictions and exceeded that of DeepCare.
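A full memory controller (write heads, usage tracking, temporal links) is beyond a short example, so the NumPy sketch below shows only how a decoder might read under the two fusion modes, assuming differentiable-neural-computer-style content-based addressing (softmax over cosine similarity between a read key and each memory slot). The slot contents are random stand-ins for what the encoders would write, the two view names are hypothetical, and the key strength β = 5 is an arbitrary assumption.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def content_read(memory, key, beta=5.0):
    """Content-based addressing: softmax over cosine similarity between key and memory rows."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    weights = softmax(beta * sims)
    return weights @ memory


rng = np.random.default_rng(0)
diag_rows = rng.normal(size=(3, 4))  # hypothetical slots written while encoding view 1
proc_rows = rng.normal(size=(2, 4))  # hypothetical slots written while encoding view 2
query = rng.normal(size=4)           # decoder's read key

# Early fusion: both encoders write into one shared memory, which is read once.
early = content_read(np.vstack([diag_rows, proc_rows]), query)
# Late fusion: each view keeps its own memory; the decoder combines separate reads.
late = np.concatenate([content_read(diag_rows, query), content_read(proc_rows, query)])
```

In the early-fusion read, slots from both views compete for the same attention weights, whereas late fusion keeps one read per view and leaves their combination to the decoder.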

3.10 KAME

The Knowledge-based Attention Model (KAME) was created to predict patients' future health information (Ma et al. 2018). To increase prediction accuracy, KAME uses general medical knowledge throughout the entire prediction process. It learns representations of medical codes from a medical ontology, such as the International Classification of Diseases (ICD) or the Clinical Classifications Software (CCS). The medical codes of each input visit are encoded into a low-dimensional vector, which is then fed into an RNN to produce a hidden state representation. The hidden state is used, together with embeddings of the codes' ancestors in the knowledge graph, to compute knowledge attention weights. These ancestor embeddings capture broader knowledge about each medical code, such as its higher-level categories in the medical graph. Using the relevant high-level information and the corresponding attention weights, KAME creates a knowledge vector. The model was evaluated on three real-world medical datasets, on which KAME generally outperformed GRAM (Choi et al. 2017a, b), Dipole (Ma et al. 2017), RNN+, and RNN. A further experiment found that KAME also outperformed the baselines with both sufficient and insufficient data.
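The knowledge attention step can be sketched in NumPy as follows: the RNN hidden state serves as the query, each ancestor embedding is scored against it, and the softmax-weighted sum of ancestor embeddings forms the knowledge vector. The bilinear score g_jᵀ W h_t, the concatenation of h_t and the knowledge vector as the prediction input, the sizes, and the random values are assumptions for illustration rather than KAME's exact formulation.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def knowledge_vector(h_t, ancestor_embeddings, W):
    """Attend over ontology ancestor embeddings using the RNN hidden state as the query."""
    scores = ancestor_embeddings @ (W @ h_t)  # bilinear score g_j^T W h_t (assumed form)
    weights = softmax(scores)
    return weights, weights @ ancestor_embeddings


rng = np.random.default_rng(0)
h_t = rng.normal(size=5)            # hidden state after the latest visit
ancestors = rng.normal(size=(3, 4)) # embeddings of 3 hypothetical ancestor categories
W = rng.normal(0.0, 0.3, (4, 5))
weights, k_t = knowledge_vector(h_t, ancestors, W)
features = np.concatenate([h_t, k_t])  # combined input to the prediction layer (assumption)
```

Because ancestor categories are shared by many specific codes, their embeddings can be learned from more data, which is one reason such knowledge vectors can help when training data for individual codes are scarce.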

3.11 COAM

Many current deep learning models use attention mechanisms to analyze EHR data; however, most do not exploit the correlations between different types of EHR data, which can lead to inaccurate predictions. The CrossOver Attention Model (COAM) exploits the connection between diagnosis and treatment data through a crossover attention mechanism to enhance prediction performance (Guo et al. 2019). To process the medical data efficiently, the framework uses two bidirectional recurrent neural networks (BRNNs), BRNNd and BRNNt, which process the input visits in both temporal directions. BRNNd reads diagnosis representations, whereas BRNNt reads treatment representations. In general, COAM consists of five key steps: (1) embedding diagnosis and treatment information; (2) using BRNNd and BRNNt to process a patient's previous diagnoses and treatments; (3) combining the data and determining the weights of diagnoses and treatments; (4) creating context vectors using the crossover attention mechanism; and (5) making predictions using both context vectors together. COAM was found to achieve high prediction accuracy without employing any expert knowledge. Furthermore, owing to its strong interpretability, the model can analyze disease conditions and recommend effective treatment approaches. As such, it can help lower a patient's risk of illness and enable doctors to provide precise medical care. However, COAM relies on learned data representations, which could be biased or incomplete. If the learned representations do not accurately capture the underlying structure of the data, they may lead to biased predictions or decisions.
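The description above does not give COAM's attention equations, so the following NumPy sketch is only a hypothetical illustration of what "crossover" could mean: the visit weights for the diagnosis sequence are computed from the treatment RNN states and vice versa, and the two resulting context vectors are concatenated for prediction. The linear scoring vectors w_d and w_t and the random stand-ins for BRNNd and BRNNt outputs are assumptions.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def crossover_attention(H_diag, H_treat, w_d, w_t):
    """Hypothetical crossover: treatment states score the diagnosis visits and vice versa."""
    alpha_diag = softmax(H_treat @ w_t)  # weights for diagnosis visits, from treatment states
    alpha_treat = softmax(H_diag @ w_d)  # weights for treatment visits, from diagnosis states
    c_diag = alpha_diag @ H_diag         # diagnosis context vector
    c_treat = alpha_treat @ H_treat      # treatment context vector
    return np.concatenate([c_diag, c_treat])  # both contexts feed the prediction layer


rng = np.random.default_rng(0)
T, d_diag, d_treat = 5, 6, 4
H_diag = rng.normal(size=(T, d_diag))    # stand-in for BRNNd outputs, one row per visit
H_treat = rng.normal(size=(T, d_treat))  # stand-in for BRNNt outputs
context = crossover_attention(H_diag, H_treat, rng.normal(size=d_diag), rng.normal(size=d_treat))
```

The design intent being illustrated is that what was done for a patient (treatments) can inform which diagnoses matter, and the diagnoses can inform which treatments matter.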

3.12 VS-GRU

Variable-sensitive GRU (VS-GRU) is a GRU-based framework that takes the missing rates of the variables as input and learns the characteristics of each variable separately to reduce the influence of variables with high missing rates (Li and Xu 2019). In practice, each variable is monitored at a different frequency depending on its properties. This is especially relevant in healthcare, where the doctor chooses which variables to track. The model targets multivariate time series containing a large number of missing values. Furthermore, VS-GRU can handle such time series without requiring a two-step method, and it can capture the characteristics of different variables automatically without additional computational cost. However, the model's structure needed to be improved to handle more complicated problems, such as multi-task classification. The developers therefore extended VS-GRU into VS-GRU integration (VS-GRU-i), which consists of a two-layer GRU: VS-GRU forms the first layer, and a standard GRU that incorporates the features from the first layer forms the second. A penalized mechanism is applied to the second layer's input to help identify variables that are either entirely absent or have few observed values. The model was tested on two open clinical datasets, from PhysioNet and MIMIC-III. In the experiments, VS-GRU and VS-GRU-i both performed well on single-label and multi-label classification tasks. The findings show that the model can capture patterns in time series with substantial missing data and suggest that it is also useful beyond the healthcare industry.
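How per-variable missing rates might be supplied to a recurrent model can be sketched as a NumPy preprocessing step. The per-time-step concatenation of filled values, an observation mask, and the per-variable missing rates is an assumption for illustration; forward filling (with 0 before the first observation) is only a placeholder so that the arrays contain no NaNs, and VS-GRU's variable-specific gating that consumes such inputs is not reproduced here.

```python
import numpy as np


def missingness_features(X):
    """X: (T, D) series with np.nan for unobserved values.
    Returns forward-filled values, an observation mask, and per-variable missing rates."""
    mask = ~np.isnan(X)
    missing_rate = 1.0 - mask.mean(axis=0)  # fraction of time steps each variable is missing
    filled = np.zeros_like(X)
    last = np.zeros(X.shape[1])             # assumed default before the first observation
    for t in range(X.shape[0]):
        last = np.where(mask[t], X[t], last)  # carry the last observed value forward
        filled[t] = last
    return filled, mask.astype(float), missing_rate


# Two hypothetical vital signs over three time steps; the second is rarely measured.
X = np.array([[1.0, np.nan],
              [np.nan, np.nan],
              [3.0, 4.0]])
filled, mask, missing_rate = missingness_features(X)
step_inputs = np.hstack([filled, mask, np.tile(missing_rate, (len(X), 1))])
```

Feeding the missing rates alongside the values gives the network the information it needs to down-weight sparsely observed variables, which is the behavior the paragraph above attributes to VS-GRU.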

3.13 Patient2vec

The EHR includes three types of records: patient medical treatments, diagnosis codes, and physical symptoms. By combining features from physical symptoms and medical treatments, Patient2vec learns a bi-dimensional representation of EHR data (Wang et al. 2019). Patient2vec also incorporates an RNN to capture the sequential, context-aware aspects of visits, and the learned representations are fed to a classifier to predict diagnoses. The framework uses Word2Vec to embed diagnosis codes, building vector representations with a dynamic window; Word2Vec's Skip-Gram model predicts the words (here, codes) that appear in the vicinity of a target word. Patient2Vec was evaluated on the public MIMIC-III dataset, comprising 61,532 visits by 46,520 patients. A multi-class diagnosis prediction experiment was also conducted, and performance was compared against three other baselines to corroborate the results. The studies revealed that the information hidden in physical symptoms and medical conditions plays a critical role in patient representation, yielding up to a 76% increase in AUC for predicting diagnoses across all diseases and an 80% top-10 recall for the target disorders. However, the quality and completeness of the EHR data used to train the model are potential sources of bias: if the data are incomplete or contain errors, the resulting patient embeddings may be biased, which could degrade downstream tasks such as patient similarity or clustering.
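
The Skip-Gram training signal described above consists of (target, context) pairs drawn from a window around each code. The sketch below generates such pairs; the ICD-10 codes are illustrative only, and the "dynamic window" is assumed to follow the original word2vec convention of sampling a window size between 1 and `max_window` for each target, which may differ from the exact scheme used in Patient2vec.

```python
import random

def skipgram_pairs(codes, max_window, rng=None):
    """Generate (target, context) pairs from a sequence of diagnosis codes.

    If rng is given, the window size for each target is sampled uniformly
    from 1..max_window (word2vec-style dynamic window); otherwise
    max_window is used for every target.
    """
    pairs = []
    for i, target in enumerate(codes):
        w = rng.randint(1, max_window) if rng is not None else max_window
        lo, hi = max(0, i - w), min(len(codes), i + w + 1)
        for j in range(lo, hi):
            if j != i:  # a code is not its own context
                pairs.append((target, codes[j]))
    return pairs

visit_codes = ["I10", "E11.9", "N18.3"]  # hypothetical ICD-10 codes for one patient
pairs = skipgram_pairs(visit_codes, max_window=1)
print(pairs)
# [('I10', 'E11.9'), ('E11.9', 'I10'), ('E11.9', 'N18.3'), ('N18.3', 'E11.9')]
```

These pairs would then be used to train the embedding model; the embedding training itself is omitted.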

3.14 ConvAE

ConvAE is an unsupervised deep learning framework designed to assess heterogeneous EHR data quickly and accurately (Landi et al. 2020). It combines word embeddings, convolutional neural networks, and autoencoders to map each patient to a low-dimensional latent vector, which enables efficient patient classification with little effort. Using encodings learned from heterogeneous, domain-free EHRs, the model identifies subtypes of several complex illnesses and establishes their clinical relevance through qualitative analysis. ConvAE's performance was evaluated on real EHR data from 1.6 million patients of the Mount Sinai Health System in New York. In testing, ConvAE significantly outperformed numerous baselines in clustering patients with complex illnesses. It was also able to recognize many clinically meaningful disease subtypes related to disease progression and comorbidities, which are major contributors to the clinical phenotypic heterogeneity of complex illnesses as captured in the EHR.
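
The encoding path described above (word embedding, then convolution, then a compact patient vector) can be sketched as follows. This is a simplified, untrained encoder under stated assumptions: the embedding table and kernels are random, a single convolutional layer with ReLU and max-pooling over time stands in for the real architecture, and the autoencoder's decoder and reconstruction loss are omitted.

```python
import numpy as np

def conv1d_valid(x, kernels):
    """x: (L, E) embedded sequence; kernels: (F, K, E). Returns (L - K + 1, F)."""
    F, K, _ = kernels.shape
    L = x.shape[0]
    out = np.empty((L - K + 1, F))
    for t in range(L - K + 1):
        window = x[t:t + K]  # (K, E) slice of consecutive codes
        # Contract the kernel's (K, E) axes with the window's (K, E) axes.
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

def encode_patient(code_ids, embedding, kernels):
    """Map a sequence of code ids to a fixed-length latent vector (encoder only)."""
    x = embedding[code_ids]                            # (L, E) embedding lookup
    feats = np.maximum(conv1d_valid(x, kernels), 0.0)  # ReLU feature maps
    return feats.max(axis=0)                           # (F,) max-pool over time

rng = np.random.default_rng(0)
vocab, E, F, K = 50, 8, 16, 3
embedding = rng.normal(size=(vocab, E))   # stand-in for learned code embeddings
kernels = rng.normal(size=(F, K, E))      # stand-in for learned conv filters
z = encode_patient(np.array([3, 17, 17, 42, 5]), embedding, kernels)
print(z.shape)  # (16,)
```

In ConvAE, such encodings are learned by training the autoencoder to reconstruct its input, and the resulting vectors are then clustered.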

3.15 BEHRT

BEHRT, a deep neural sequence model for EHRs, was introduced to forecast the likelihood of 301 conditions occurring in a patient's upcoming visits (Li et al. 2020). The model is inspired by BERT, a powerful Transformer-based NLP architecture. Feed-forward neural networks are used to model the temporal evolution of EHR data through a variety of sequential concepts. Because it considers the complete sequence concurrently and processes the input in parallel rather than sequentially, BEHRT's feed-forward structure avoids the exploding and vanishing gradient problems while still capturing sequential information. BEHRT can be pre-trained on a large dataset and, after modest fine-tuning, delivers strong performance on a variety of downstream tasks. Its effectiveness was demonstrated by training and testing it on CPRD, one of the largest linked primary care EHR systems, to predict which diseases are likely to be present at a patient's upcoming visits. According to the findings, BEHRT outperformed the best deep EHR models described in the literature by 8% when predicting more than 300 diseases. Owing to its scalability, higher accuracy, and flexible design, BEHRT can provide personalized interpretations and incorporate numerous heterogeneous concepts. However, the model is also subject to potential biases, such as selection bias, measurement bias, confounding bias, and information bias, which can affect the accuracy and validity of results obtained with BEHRT.
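
The parallel processing mentioned above comes from self-attention, in which every position in the sequence attends to every other position in a single matrix product rather than through a recurrent chain. The sketch below shows single-head scaled dot-product self-attention with random projection matrices; BEHRT's input embeddings, multiple heads, masking, and stacked layers are omitted, so this is an illustration of the mechanism, not of the full model.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (T, D) embeddings for T events; w_q, w_k, w_v: (D, Dk) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])         # (T, T): all pairs at once
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ v, weights                     # (T, Dk), (T, T)

rng = np.random.default_rng(1)
T, D, Dk = 5, 8, 4
x = rng.normal(size=(T, D))  # stand-in for embedded diagnosis codes
w_q, w_k, w_v = (rng.normal(size=(D, Dk)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 4)
```

Because no output depends on a previous step's hidden state, gradients do not have to pass through a long recurrent chain.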

3.16 HiTANet

To use time information more effectively for risk prediction, HiTANet, a novel hierarchical time-aware attention network, was developed to model how clinicians make risk prediction decisions (Luo et al. 2020). HiTANet models temporal information both locally and globally. In the local evaluation stage, it embeds time information into each visit-level embedding and computes a local attention weight for each visit. In the global synthesis stage, it assigns global weights to the different time steps using a time-aware key-query attention mechanism. The two types of attention weights are then dynamically combined to create patient representations for subsequent risk prediction. HiTANet was evaluated on three real-world datasets and compared against 12 competing baselines. In the experiments, HiTANet outperformed state-of-the-art deep neural network models and showed consistent gains on risk prediction tasks across three large real-world disease cohorts, improving the F1 score by more than 7% on all datasets. These results demonstrate the model's efficacy, and its inference process for risk prediction is easy to interpret.
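
The two-stage weighting can be illustrated with a simplified NumPy sketch. It is not HiTANet's actual formulation: the time embedding `tanh(gap * w_time + b_time)`, the single query vector, and the fixed blend factor `mix` are assumptions chosen for clarity (HiTANet is described as blending the two weight types dynamically), and all parameters are random stand-ins for learned weights.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def time_aware_patient_repr(visits, gaps_years, params, mix=0.5):
    """visits: (T, D) visit embeddings; gaps_years: (T,) time before prediction.

    params holds hypothetical learned vectors of length D:
    w_time, b_time (time embedding), w_local (local scores), query (global query).
    """
    # Local stage: embed each visit's time gap and add it to the visit embedding.
    time_emb = np.tanh(gaps_years[:, None] * params["w_time"] + params["b_time"])  # (T, D)
    keys = visits + time_emb
    local_w = softmax(keys @ params["w_local"])                           # (T,)
    # Global stage: scaled key-query attention over the time-aware visits.
    global_w = softmax(keys @ params["query"] / np.sqrt(visits.shape[1]))  # (T,)
    # Blend the two weight types (fixed here, dynamic in HiTANet).
    weights = mix * local_w + (1.0 - mix) * global_w
    return weights @ keys, weights

rng = np.random.default_rng(2)
T, D = 6, 8
visits = rng.normal(size=(T, D))
gaps = np.array([2.0, 1.5, 1.0, 0.5, 0.25, 0.0])  # years before the prediction time
params = {name: rng.normal(size=D) for name in ("w_time", "b_time", "w_local", "query")}
rep, weights = time_aware_patient_repr(visits, gaps, params)
print(rep.shape)  # (8,)
```

A convex blend of two probability vectors is itself a probability vector, so the combined weights still sum to one.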

4 Online databases for EHR data

Historically, disease-specific data were compiled manually into listings and stored in disease repositories that housed healthcare data. In recent years, advances in technology have made digital health record systems more readily available in hospitals, and novel methods for presenting, understanding, and interpreting healthcare data have emerged that leverage the capabilities of internet databases. An online database, in the healthcare context, is a database that can be viewed and accessed online. Because the data are typically stored in a cloud database hosted on a website, practitioners may need a subscription to access the clinical information. Online healthcare databases stand out primarily because of the variety of patient-level data that is automatically gathered from EHRs. Research on clinical interactions, clinical decisions, and disease processes draws on large numbers of high-resolution features derived from many individuals. The primary goal of an online healthcare database is to organize clinical data so that researchers can assess it and derive useful knowledge from it quickly and easily. Because such data make it possible to crowdsource new machine learning approaches using open-source programming tools, there has been significant interest in studying massive amounts of health data (Sanchez-Pinto et al. 2018). Table 4 compares online databases for EHR data.

Table 4 Comparison of existing online databases for EHR data

4.1 MIMIC

The Medical Information Mart for Intensive Care (MIMIC) (Johnson et al. 2016) is one of the best-known and most widely used open-access clinical databases in the world. In 2013, the Massachusetts Institute of Technology (MIT) Laboratory for Computational Physiology, Philips Medical Systems, and Beth Israel Deaconess Medical Center collaborated to create MIMIC with support from the National Institute of Biomedical Imaging and Bioengineering. The main objective of MIMIC is to increase the speed, precision, and efficiency of clinical decision-making for patients in intensive care units (ICUs); it was initially developed to support sophisticated ICU patient monitoring and decision support systems. For researchers interested in using the database, MIMIC also maintains documentation of its data structure and a public GitHub repository. By reusing the public code, new users can build on others' work and are encouraged to contribute their own, enhancing and expanding MIMIC's influence.

4.2 eICU-CRD

Another example of an open-access database is the eICU Collaborative Research Database (eICU-CRD) (Pollard et al. 2018), which was inspired by the Philips® Healthcare critical care telehealth program. Created by the MIMIC team, the eICU-CRD is a multi-center intensive care unit (ICU) database containing high-granularity data on more than 200,000 admissions in 2014 and 2015 to 208 ICUs across the United States that were monitored by eICU Programs. The de-identified database stores vital sign measurements, care plan documentation, illness severity measures, diagnoses, treatments, and more. Practitioners can access the data after registering, which involves completing a course on conducting research with human subjects and signing a data use agreement that mandates responsible handling of the data and adherence to the principles of collaborative research. The data can support several efforts, such as the development of machine learning algorithms and decision support systems, as well as ongoing clinical research.

4.3 PhysioBank

PhysioBank is a large and continually growing library of well-characterized digital recordings of physiological signals and associated data, available to the biomedical research community (Goldberger et al. 2000). It currently includes databases of multiparameter cardiopulmonary, neural, and other biomedical signals collected from healthy people and from people with a variety of serious conditions that affect public health, such as life-threatening arrhythmias, congestive heart failure, sleep apnea, neurological disorders, and aging. Spanning more than 80 databases, PhysioBank comprises approximately 90,000 recordings, or over four terabytes, of digitized physiological signals and time series. Time-series data from publicly funded studies, such as large multicenter clinical trials or physiological research carried out by the National Aeronautics and Space Administration (NASA), can be archived in PhysioBank as a final and permanent repository.

4.4 CPRD

The Clinical Practice Research Datalink (CPRD) is a high-coverage database of anonymized medical records spanning 674 UK practices and 11.3 million patients (Herrett et al. 2015). The 4.4 million active (living and currently registered) patients who meet quality requirements account for around 6.9% of the UK's total population, and in terms of age, gender, and ethnicity, patients largely mirror the UK general population. The CPRD primary care database therefore provides a wealth of health-related data for research, including demographics, symptoms, tests, diagnoses, treatments, health-related behaviors, and referrals to secondary care. For more than half of patients, linkage to secondary care databases, disease-specific cohorts, and death records provides an even wider range of data for study. Details of more than 1000 research projects that used CPRD to conduct epidemiological research on a variety of health outcomes have been published in peer-reviewed journals.

4.5 Cerner Health Facts®

Cerner Corporation has maintained Health Facts® (HF) since 2000. It is the largest vendor-specific EHR database (DeShazo and Hoffman 2015) and contains 84 million patient records collected from more than 500 healthcare facilities across the United States over the past 20 years. To safeguard the privacy of both patients and organizations, HF data are de-identified and HIPAA-compliant. The database consists of longitudinal, de-identified electronic health record (EHR) patient data that have been collected and organized to facilitate research and reporting. The data types in HF include demographics, medications, test results, and pharmacy data. These records are often thorough and include 300 data elements, including information about consultations and laboratory results. The clinical data are mapped to the most prevalent standards: diagnoses and procedures are linked to their ICD (International Classification of Diseases) codes, medication information includes National Drug Codes (NDCs), and laboratory tests are linked to their LOINC (Logical Observation Identifiers Names and Codes) codes (Rasmy et al. 2018). Cerner Corporation (Kansas City, MO) makes the Health Facts® data available as a publicly accessible resource.

4.6 Healthcare cost and utilization project

The Healthcare Cost and Utilization Project (HCUP) is a valuable resource that combines data from the State Inpatient Databases (SID), the Nationwide Inpatient Sample (NIS), the Kids' Inpatient Database (KID), and the outpatient State Ambulatory Surgery and Services Databases (SASD) and State Emergency Department Databases (SEDD). HCUP stores multistate inpatient and outpatient records for both insured and uninsured patients. The program aims to build a multistate health data system that supports health services research and the development of tools that aid administrators and researchers alike. To increase the value of the data, HCUP also offers a collection of software applications and other resources. The data can be accessed by individuals who have signed the data use agreement.

5 Discussion

Patient status information is extensively recorded within Electronic Health Records (EHRs). As a result, EHR data offer a practical way to monitor patient health and to enhance decision-making with data-driven technologies. Unlike data from clinical trials or other biomedical research, secondary data obtained from EHRs are not collected to address a specific, predefined hypothesis. The emerging literature recognizes several challenges in using EHR data for predictive modeling of health trajectories. Data quality is a persistent concern: healthcare professionals cite issues in the healthcare environment, clinical documentation, and data tools that affect EHR accuracy. Researchers also note the limited accuracy of diagnostic codes and the potential of free-text fields to capture information that is otherwise missed. Moreover, several policy and business barriers can hinder the effective use of these datasets for such purposes, including data privacy and security concerns, inconsistent data standards, lack of interoperability, and problems with data quality and accuracy. Addressing these barriers requires collaboration among healthcare stakeholders, policymakers, researchers, and technology vendors. Implementing clear data sharing agreements, improving data standards and interoperability, strengthening privacy safeguards, and providing incentives for data sharing can help overcome these obstacles and harness the potential of EHR datasets for predictive modeling of health trajectories.

Numerous studies have undertaken predictive modeling with EHR data, which involves using machine learning techniques to build a statistical model from EHR data that predicts a specific clinical outcome of interest (Wu et al. 2010). However, EHR data are often uncurated, of poor quality, high-dimensional, and multimodal, which makes it difficult to use raw EHR data directly in machine learning models for predictive modeling. An essential step in predictive modeling with EHR data is therefore converting patient information from its original EHR structure into a machine-readable format, that is, translating patient data into meaningful features that algorithms can interpret. The success of predictive models in improving disease diagnosis, phenotyping, and prognosis depends heavily on the quality of this feature representation.
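
One common way to perform this conversion for coded data is multi-hot encoding: each visit becomes a binary vector with one position per diagnosis code in a shared vocabulary. The sketch below is a minimal example of that representation; the ICD-10 codes and the nested-list patient layout are illustrative assumptions, and real pipelines also handle numeric measurements, timestamps, and free text.

```python
import numpy as np

def build_vocab(patients):
    """Assign a stable integer index to every code seen across all visits."""
    codes = sorted({c for visits in patients for visit in visits for c in visit})
    return {c: i for i, c in enumerate(codes)}

def multi_hot(visits, vocab):
    """Encode one patient's visits as a (num_visits, vocab_size) 0/1 matrix."""
    mat = np.zeros((len(visits), len(vocab)), dtype=np.int8)
    for t, visit in enumerate(visits):
        for code in visit:
            if code in vocab:  # ignore codes not seen when building the vocabulary
                mat[t, vocab[code]] = 1
    return mat

# Hypothetical patients: each is a list of visits, each visit a list of ICD-10 codes.
patients = [
    [["I10", "E11.9"], ["I10"]],
    [["J45.909"]],
]
vocab = build_vocab(patients)
X = multi_hot(patients[0], vocab)
print(vocab)       # {'E11.9': 0, 'I10': 1, 'J45.909': 2}
print(X.tolist())  # [[1, 1, 0], [0, 1, 0]]
```

The resulting matrix, one row per visit, is the kind of sequential input that the recurrent and attention-based models discussed earlier consume.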

Recently, deep learning models have proven to be excellent tools for identifying illnesses and predicting clinical events or outcomes (such as mortality or treatment response) from time series data, such as EEG or ICU biosignals, as well as from imaging data. Nevertheless, despite the encouraging results of deep learning methods on numerous analytical tasks, several difficulties remain unresolved. One is transfer learning in which both the data and the labels shift: deep models frequently fail to account explicitly for uncertainty, which reduces their robustness to shifts in the underlying data distribution. Consequently, deployed models risk having their future predictions compromised by real-world EHR data, a concern that is especially significant in healthcare environments. Furthermore, regarding interpretability and transparency, current efforts (such as attention mechanisms, visualization, and explanation by example) usually aim to explain individual predictions. However, to apply deep models developed from EHR data effectively, users often need to understand the mechanisms underlying how the models operate, and achieving that level of transparency remains a formidable challenge. Lastly, achieving direct clinical impact requires attention to the deployment and automation of deep learning models. For instance, substantial volumes of EHR data must be processed to create standardized inputs for training deep models, and the challenge of obtaining extensive EHR datasets must be addressed before deep EHR models can be integrated into practical EHR systems.

6 Conclusion

In recent years, the use of deep learning architectures to analyze EHR data has increased, and this trend remains strong. Efforts to combine EHRs with novel technologies have produced numerous computational models for the medical context, and EHRs have recently been applied to tasks including disease prediction, disease and patient classification, and clinical decision support. This paper presented an overview of a number of deep learning architectures, models, and databases, together with details of their applications in healthcare, and explored the pros and cons of popular deep learning architectures. The review found that researchers prefer RNN architectures for modeling EHR patient data because they can handle the challenges associated with such data. RNNs are effective at processing sequential data and can address the temporal structure of the EHR, since they can map the whole history of prior inputs to each output (Solares et al. 2020).
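
The property that an RNN maps the whole history of prior inputs to each output can be checked directly with a minimal Elman-style RNN. In the sketch below (random weights, inputs standing in for visit vectors), changing the first visit alters the final hidden state, whereas changing the last visit leaves all earlier states untouched, because each state depends only on inputs up to its own time step.

```python
import numpy as np

def rnn_forward(xs, w_xh, w_hh, b_h):
    """xs: (T, D) inputs. Returns hidden states (T, H); h_t summarizes x_1..x_t."""
    h = np.zeros(w_hh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(x @ w_xh + h @ w_hh + b_h)  # new state mixes input and previous state
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(3)
T, D, H = 5, 4, 6
w_xh = rng.normal(size=(D, H)) * 0.5
w_hh = rng.normal(size=(H, H)) * 0.5
b_h = np.zeros(H)
xs = rng.normal(size=(T, D))  # stand-in for five visit vectors

h1 = rnn_forward(xs, w_xh, w_hh, b_h)
xs_first = xs.copy()
xs_first[0] += 1.0            # change only the first visit
xs_last = xs.copy()
xs_last[-1] += 1.0            # change only the last visit
h2 = rnn_forward(xs_first, w_xh, w_hh, b_h)
h3 = rnn_forward(xs_last, w_xh, w_hh, b_h)
print(np.allclose(h1[:-1], h3[:-1]))  # True: earlier states ignore future inputs
```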

Several deep learning models and EHR databases were also reviewed, and the features and capabilities of each were briefly outlined to give an overview of their range. Researchers can draw on this information to formulate new ideas for enhancing models' ability to generate meaningful insights for healthcare practitioners. Each model has unique capabilities and qualities that can benefit various therapeutic applications, and every aspect of a model affects the effectiveness and precision of its predictions. Efficiently addressing both long-standing and emerging difficulties with EHR data therefore depends on appropriate model design. As EHR data continue to expand and models become more advanced, they will play a greater role in clinical prediction tasks by continuously supplying insight into patient representation.

Section 4 reviewed and discussed six healthcare databases that contain EHR data. Before beginning their investigations and experiments, researchers must collect the necessary data. Because EHR data are confidential and cannot be made public, accessing and obtaining them remains difficult. EHR records usually include identifying information, such as patient IDs and addresses; as a result, EHR data are rarely uploaded to databases and are difficult for researchers to access. All six databases reviewed here are web-based and available online, but only three offer unrestricted free access that can be used on a regular basis. These databases contain data on several disease categories and differ in how they deliver the data, so the information they provide may be inconsistent. The benefits and drawbacks of each database are listed in Table 4. Relatively few healthcare databases are currently available; more comprehensive databases are therefore required to support researchers in conducting larger-scale EHR-oriented experiments. Key to this is solving confidentiality problems on an ongoing basis, as the data need to be updated regularly so that they can be accessed and used freely for research. Moreover, addressing the data scarcity challenge in EHR systems requires innovative strategies that improve data availability and quality for diverse healthcare applications. These strategies include:
i- Generating synthetic data through techniques such as interpolation, extrapolation, and perturbation to expand the dataset without compromising patient privacy.
ii- Forming partnerships among healthcare organizations to pool anonymized EHR data, using privacy-preserving methods such as federated learning for collective analysis.
iii- Combining EHR data with related sources, such as wearable devices, genetic data, and social determinants, for a more comprehensive understanding of patient health.
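
The federated learning idea in item ii can be illustrated with federated averaging (FedAvg): each site trains on its own records, and only model weights are sent to a server, which averages them weighted by site size. The sketch below uses a linear least-squares model and noiseless toy data at three hypothetical sites; it is a minimal illustration, not a production protocol. FedAvg by itself does not give a formal privacy guarantee, since shared updates can leak information, and deployments often add secure aggregation or differential privacy.

```python
import numpy as np

def local_update(w, x, y, lr=0.1, epochs=20):
    """Gradient descent on a least-squares loss using one site's data only."""
    for _ in range(epochs):
        grad = x.T @ (x @ w - y) / len(y)
        w = w - lr * grad
    return w

def aggregate(updates, sizes):
    """Server step: average site weights, weighted by each site's sample count."""
    return np.average(np.stack(updates), axis=0, weights=np.asarray(sizes, dtype=float))

def fedavg_round(w_global, sites, lr=0.1, epochs=20):
    """One FedAvg round: each site trains locally; only weights leave the site."""
    updates = [local_update(w_global.copy(), x, y, lr, epochs) for x, y in sites]
    return aggregate(updates, [len(y) for _, y in sites])

# Toy data: three hypothetical sites with different numbers of patients.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for n in (50, 80, 120):
    x = rng.normal(size=(n, 2))
    sites.append((x, x @ true_w))

w = np.zeros(2)
for _ in range(30):
    w = fedavg_round(w, sites)
print(w)  # approaches true_w
```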

Finally, although modern deep learning (DL) models, including transformers, hold immense potential, their promise often overshadows the practical challenges they face, particularly in real-world applications such as clinical care, where privacy and the protection of Protected Health Information (PHI) are paramount. DL models, including transformers, often require vast amounts of data to train effectively, raising concerns about privacy breaches and unauthorized access to sensitive information. In healthcare, where patient confidentiality is sacrosanct, the deployment of DL models must navigate stringent regulations and ethical considerations to safeguard PHI. Ultimately, while the promise of modern DL techniques in healthcare is undeniable, their successful integration into real-world clinical care settings hinges on our ability to navigate and overcome the persistent challenges of privacy protection and PHI safeguarding.