Abstract
Objective
To address the challenge of assessing sedation status in critically ill patients in the intensive care unit (ICU), we aimed to develop a non-contact automatic classifier of agitation using artificial intelligence and deep learning.
Methods
We collected video recordings of ICU patients and cut them into 30-second (30-s) and 2-second (2-s) segments. All segments were annotated for agitation status as “Attention” or “Non-attention”. After transforming the video segments into quantified movement features, we constructed agitation classifiers using a threshold method, random forest, and long short-term memory (LSTM), and evaluated their performance.
Results
The video recording segmentation yielded 427 30-s and 6405 2-s segments from 61 patients for model construction. The LSTM model achieved remarkable accuracy (ACC 0.92, AUC 0.91), outperforming other methods.
Conclusion
Our study proposes an advanced monitoring system combining LSTM and image processing to ensure mild patient sedation in ICU care. LSTM proves to be the optimal choice for accurate monitoring. Future efforts should prioritize expanding data collection and enhancing system integration for practical application.
Introduction
Assessing sedation in non-communicative critically ill patients is crucial. Excessive sedation can prolong mechanical ventilation and increase morbidity and mortality, while insufficient sedation may cause agitation, anxiety, and pain [1, 2]. Hence, a sedation assessment tool is essential for monitoring the sedation levels of critically ill patients. Sedation monitors such as the Bispectral index (BIS) [3] are recommended but not universally available. Currently, nurse-protocolized (N-P) targeted sedation protocols, employing scales like the Richmond Agitation-Sedation Scale (RASS), are commonly used [4,5,6]. Unfortunately, these intermittent assessments cannot continuously monitor sedation levels to titrate sedatives.
The COVID-19 pandemic has prompted a heightened emphasis on wireless sensing technologies to reduce human interactions and prioritize non-contact healthcare, particularly for healthcare workers, to mitigate virus spread [7]. Utilizing continuous and remote non-contact monitoring systems has proven effective in detecting various health conditions such as sleep disorders, heart failure, arrhythmia, activity levels, and stress [8,9,10,11]. This approach aligns with infection control measures and enables real-time optimization of care through fine-tuned treatment strategies.
Recent advances in deep learning and AI models have gained popularity in clinical applications for disease diagnosis and prevention. Fang et al. developed a video-based non-invasive respiration monitoring system that detects infants’ respiratory frequency to alert caregivers to potential incidents and mitigate Sudden Infant Death Syndrome (SIDS) risks [12]. Another study demonstrated the effectiveness of deep learning-based pain classifiers using facial expressions for automated pain assessment in critically ill patients, achieving promising accuracy in both image- and video-based classifiers. Additionally, deep learning can be applied to screen for depression, observe behaviors, track posture, and monitor epilepsy [8, 13].
Our study aimed to design an AI-assisted automatic classifier of agitation, which could be applied in a non-contact, continuous sedation monitoring system. The system could aid nurses in assessing and monitoring the movement of intensive care unit patients and facilitate timely intervention and treatment based on the assessment outcomes. Using artificial intelligence and deep learning, we successfully extracted the features of real-time video and constructed the models to classify the agitation status automatically.
Materials & methods
Setting
This study was conducted in the intensive care units of Taichung Veterans General Hospital (TCVGH), a 1530-bed medical center in central Taiwan. The study was approved by the Institutional Review Board and Ethics Committee of TCVGH (IRB No. CG21307B). Informed consent was obtained, and digital video recordings of ICU patients were taken without disrupting standard care. Patients under 20 years of age, pregnant individuals, and patients with HIV were excluded. Each patient’s age, sex, RASS score, and restraint status were also recorded.
Research framework
The study consisted of seven major steps: (1) patient video collection in the ICU, (2) video segmentation, (3) annotation, (4) patient movement quantification (MediaPipe, BackgroundSubtractorMOG2), (5) data preprocessing (data replacement, normalization), (6) model construction with three methods, and (7) evaluation of model performance (Fig. 1).
Patient video collection
The digital video recordings were captured with a 4K webcam at 1080p/30 fps from patients in the ICUs of Taichung Veterans General Hospital. Each patient, on average, had 8 min of recorded video footage.
Video segmentation
Patients with a sedation level of RASS ≤ -3 were excluded because of deep sedation and absence of movement, leaving 61 patients in the study. Video recordings were cut into 30-second (30-s) segments for continuous observation and categorization. Each 30-s segment was then cut into 2-second (2-s) sub-segments for single-action classification. Segments with more than 10 s of interference within the 30-s window, such as caregiver interventions or camera shake, were excluded. In total, 427 30-s segments for continuous observation and 6405 2-s sub-segments for single-action classification were obtained (Fig. 2).
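The segmentation above is simple frame-index arithmetic at the reported 30 fps. A minimal sketch (the helper name and the handling of a trailing partial segment are assumptions, not taken from the paper):

```python
FPS = 30  # recording frame rate reported in the paper

def segment_frames(n_frames, seg_seconds, fps=FPS):
    """Split a frame sequence into fixed-length segments.

    Returns a list of (start, end) frame-index pairs; any trailing
    partial segment is dropped.
    """
    seg_len = seg_seconds * fps
    starts = range(0, n_frames - seg_len + 1, seg_len)
    return [(s, s + seg_len) for s in starts]

# An 8-minute recording (the average per patient) at 30 fps:
frames = 8 * 60 * FPS
segs_30s = segment_frames(frames, 30)        # continuous-observation windows
subsegs_2s = [segment_frames(30 * FPS, 2)    # indices relative to each window
              for _ in segs_30s]

print(len(segs_30s))       # 16 thirty-second segments
print(len(subsegs_2s[0]))  # 15 two-second sub-segments per window
```

Each 30-s window thus yields fifteen 2-s sub-segments, consistent with the 427 / 6405 segment counts (427 × 15 = 6405).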
Annotation
Based on clinical experience, movements in different body regions pose varying levels of risk. For instance, raising the hands was considered high-risk, while lifting the feet was perceived as lower risk. This differentiation aids the model in learning movement patterns more precisely, ensuring a more accurate assessment of the patient’s condition.
Three experienced ICU nurses were invited for annotation. Before annotating, they discussed the agitated features of postures and movements (head, trunk, and lower limbs) (Table 1) and reached consensus as follows. Ten cases were randomly selected and marked by two nurses based on patient activity. A third nurse helped resolve cases where the first two nurses’ annotations differed. They annotated another 20 cases to validate their consensus in classifying “attention” and “no attention”. “Attention” was defined as video recordings showing patients resisting the restraint belt or moving limbs or head out of the bed with agitation and safety risks, roughly corresponding to RASS +2 to +4. Recordings without these conditions were labeled “no attention”. The nurses then labeled all the 30-s and 2-s segments.
Patient movement quantification
The MediaPipe machine learning framework developed by Google Research is highly valuable in the healthcare field. It is used to track hand movements and assess tremor in Parkinson’s disease, as well as diagnose low back pain by tracking joint positions in the body [14, 15]. Designed specifically for RGB video footage, the Pose model annotates 33 key joint positions for precise measurement.
This study determined the patient’s recumbent position (horizontal or vertical) by analyzing the distance between the y-coordinates of the left and right shoulders and between the right shoulder and right hip. For patients lying horizontally, the next step involved determining the head orientation by comparing the x-coordinates of the left shoulder node to the hip node. The head was above the coordinates of the right shoulder, the trunk was between the coordinates of the right shoulder and the right hip, and the lower limbs were below the coordinates of the right hip (Fig. 3).
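These coordinate comparisons reduce to a few arithmetic rules once the landmarks are available. A minimal sketch, assuming the MediaPipe Pose landmarks have already been extracted as (x, y) pixel coordinates; the exact decision rule and the band layout below are assumptions for illustration:

```python
def is_horizontal(l_shoulder, r_shoulder, r_hip):
    # Treat the patient as lying horizontally in the frame when the
    # left/right shoulder vertical spread exceeds the shoulder-to-hip
    # vertical distance (the exact decision rule is an assumption).
    return abs(l_shoulder[1] - r_shoulder[1]) > abs(r_shoulder[1] - r_hip[1])

def split_regions(extent, shoulder_c, hip_c):
    """Partition one image axis into head / trunk / lower-limb bands at
    the right-shoulder and right-hip coordinates; the head side is
    assumed to start at 0, and which axis applies depends on the
    detected lying orientation."""
    lo, hi = sorted((shoulder_c, hip_c))
    return {"head": (0, lo), "trunk": (lo, hi), "lower_limbs": (hi, extent)}

# (x, y) pixel coordinates, e.g. MediaPipe Pose landmarks scaled to the frame
l_shoulder, r_shoulder, r_hip = (640, 200), (640, 420), (980, 430)
horizontal = is_horizontal(l_shoulder, r_shoulder, r_hip)  # True here
regions = split_regions(1920, r_shoulder[0], r_hip[0])     # split along x
```

For a horizontally lying patient the bands run along the x-axis, as in this example; for a vertically lying patient the same split applies along the y-axis.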
The OpenCV BackgroundSubtractorMOG2 algorithm uses Gaussian mixture models (GMM) for background separation in videos [16]. It learns the background and isolates moving foreground objects by associating each image pixel with a Gaussian distribution; the distribution weight reflects how long a color has been present, which helps identify the background. The motion detection process converts the video into a black-and-white image, where white areas indicate patient movement and higher feature values represent more significant movement. These values are calculated by summing and averaging frames within each two-second interval (Fig. 4).
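The summing-and-averaging step can be sketched in a few lines. For self-containment, this example assumes the binary foreground masks (as produced by `cv2.createBackgroundSubtractorMOG2().apply(frame)`) are already stacked into a numpy array; the function name and region layout are illustrative:

```python
import numpy as np

FPS = 30

def motion_per_region(masks, regions):
    """Average white (foreground) pixel count per frame, within each
    body-region band, over one 2-second window.

    masks   : (n_frames, H, W) binary array, 1 = moving foreground
              (e.g., thresholded BackgroundSubtractorMOG2 output).
    regions : {name: (row_start, row_end)} bands of the image.
    """
    feats = {}
    for name, (lo, hi) in regions.items():
        band = masks[:, lo:hi, :]
        # sum white pixels over the window, then average per frame
        feats[name] = band.sum() / masks.shape[0]
    return feats

# toy example: 60 frames (2 s at 30 fps), 8x6 image, motion only in rows 0-3
masks = np.zeros((2 * FPS, 8, 6), dtype=np.uint8)
masks[:, 0:4, :] = 1
regions = {"head": (0, 4), "trunk": (4, 6), "lower_limbs": (6, 8)}
feats = motion_per_region(masks, regions)
print(feats)  # head: 24.0 white pixels per frame; trunk/lower_limbs: 0.0
```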
Data preprocessing
After converting the video into numerical data, this study replaced segments affected by external factors with the preceding adjacent values so that the model’s learning was not influenced. Additionally, the data were normalized to optimize the model parameters.
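Both preprocessing steps can be sketched as follows; the carry-forward replacement follows the text, while min-max scaling to [0, 1] is an assumption since the exact normalization scheme is not stated:

```python
import numpy as np

def replace_interference(values, bad):
    """Replace interference-affected entries with the preceding valid
    value (carry-forward), as described in the text."""
    out = np.asarray(values, dtype=float).copy()
    for i in range(1, len(out)):
        if bad[i]:
            out[i] = out[i - 1]
    return out

def min_max_normalize(values):
    """Scale features to [0, 1] (min-max scaling is an assumption)."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span else np.zeros_like(v)

x = [3.0, 7.0, 50.0, 5.0]            # 50.0 marks a camera-shake artifact
bad = [False, False, True, False]
clean = replace_interference(x, bad)  # -> [3., 7., 7., 5.]
norm = min_max_normalize(clean)       # -> [0., 1., 1., 0.5]
```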
Model construction
After preprocessing, the data were provided to the classification models. The threshold and random forest methods were used for single-action classification (2-s).
(1) Threshold
The threshold method classified the head, trunk, and lower limbs into three movement severity levels: no movement (1 point), bed movement (2 points), and significant off-bed movement (3 points). Scores for each body part’s classification results were aggregated (3 to 9 points). Thresholds for each body part and classification outcomes were determined using box plots, ensuring clinical requirements were met through confusion matrix indicators.
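The scoring scheme above reduces to a small rule-based classifier. A sketch using the cut-off values later reported in Fig. 5 (whether the aggregate rule is strict `>` or `≥` is an assumption):

```python
def severity(value, low_cut, high_cut):
    """Map a region's movement feature to a severity level:
    1 = no movement, 2 = in-bed movement, 3 = significant off-bed movement."""
    if value < low_cut:
        return 1
    return 2 if value < high_cut else 3

# (low, high) cut-offs per region, as reported in Fig. 5A:
CUTS = {"head": (0.8, 5), "trunk": (0.8, 14), "lower_limbs": (0.8, 11)}
TOTAL_CUT = 5  # aggregate cut-off from Fig. 5B

def classify(features):
    # aggregate score ranges from 3 (all still) to 9 (all off-bed movement)
    total = sum(severity(features[r], *CUTS[r]) for r in CUTS)
    return ("attention" if total > TOTAL_CUT else "no attention"), total

print(classify({"head": 6.0, "trunk": 20.0, "lower_limbs": 0.5}))
# -> ('attention', 7): head and trunk score 3 each, lower limbs score 1
```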
(2) Random forests
The Random Forest (RF) algorithm, a highly effective classification method, excels in accuracy for big data scenarios [17]. Utilizing ensemble learning, RF constructs multiple decision trees during training, deriving predictions from identified patterns. This study applied RF to machine learning with quantized movement data, aiming to classify patients every two seconds. Key parameters were set as follows: n_estimators = 100, max_features = auto, criterion = Gini.
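With scikit-learn, the reported parameters translate directly; note that `max_features="auto"` in older scikit-learn versions is equivalent to `"sqrt"` for classifiers (the `"auto"` alias has since been removed). The synthetic features below are a stand-in for the quantified movement data, and the four-column layout is an assumption for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-in for one 2-s window's features, e.g.
# [head, trunk, lower_limbs, total] movement (layout assumed):
X_calm = rng.uniform(0.0, 1.0, size=(80, 4))    # little movement
X_agit = rng.uniform(5.0, 20.0, size=(80, 4))   # large movement
X = np.vstack([X_calm, X_agit])
y = np.array([0] * 80 + [1] * 80)  # 0 = no attention, 1 = attention

# Parameters as reported in the study:
clf = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", criterion="gini", random_state=0
)
clf.fit(X, y)
preds = clf.predict([[0.2, 0.5, 0.1, 0.8], [12.0, 9.0, 15.0, 36.0]])
print(preds)  # -> [0 1]
```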
(3) Long short-term memory (LSTM)
LSTM simulated continuous observations and classified patients every 30 s. The LSTM model used in this study featured 20 hidden units, 2 stacked layers, an input size of 4, and a time step of 15. Model hyperparameters were fine-tuned, including data split ratios, activation functions, and whether to apply data normalization. The training process was visualized by calculating losses and accuracies on the validation set after each epoch and recording metrics for both training and validation sets. Ultimately, the model with optimal stability and performance was chosen. The data were split into 80% training and 20% testing, with validation data drawn from 10% of the training set. We used a categorical cross-entropy loss function, the Adam optimizer, and a softmax activation function.
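The reported dimensions (input size 4, 20 hidden units, 2 stacked layers, 15 time steps — the fifteen 2-s windows in one 30-s segment) can be sketched in PyTorch; the classification head and the use of the last time step are assumptions, and in practice `nn.CrossEntropyLoss` would be applied to the pre-softmax logits:

```python
import torch
import torch.nn as nn

class AgitationLSTM(nn.Module):
    """Sketch of the reported architecture; the head layout is assumed."""
    def __init__(self, n_features=4, hidden=20, layers=2, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers,
                            batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, 15, 4)
        out, _ = self.lstm(x)
        logits = self.head(out[:, -1, :])  # classify from the last time step
        return torch.softmax(logits, dim=-1)

model = AgitationLSTM()
probs = model(torch.randn(8, 15, 4))  # a batch of eight 30-s segments
print(probs.shape)                    # torch.Size([8, 2])
```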
Evaluation of model performance
In this study, confusion matrices and ROC curves were utilized as evaluation metrics, including accuracy (ACC), precision (P), recall (R), and F1 score, and k-fold cross-validation (k = 10) was applied to ensure model stability. The relationship between sensitivity and specificity is depicted in the ROC curve, and the area under the ROC curve (AUC) is calculated.
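These metrics map directly onto scikit-learn. A minimal sketch on toy labels (the arrays below stand in for one validation fold and are not the study's data):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

# Toy labels/scores standing in for one validation fold:
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.6, 0.8, 0.7, 0.9, 0.3, 0.2])
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(accuracy_score(y_true, y_pred),    # ACC
      precision_score(y_true, y_pred),   # P
      recall_score(y_true, y_pred),      # R (sensitivity)
      f1_score(y_true, y_pred),
      roc_auc_score(y_true, y_prob))     # AUC from the raw scores
```

For the 10-fold cross-validation, `sklearn.model_selection.cross_val_score(clf, X, y, cv=10)` averages the same accuracy metric across folds.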
In addition to evaluating model performance, this study used line charts to analyze patient movement over 30-s windows to confirm that the model’s assessments and image quantification align with real-world scenarios. Two representative cases were selected: “attention” cases involved significant cross-region movements posing potential risks, while “no attention” cases involved only in-bed movements. Through motion analysis, the study clearly illustrates these distinctions with quantified results.
Results
System configuration
All experiments conducted in this paper were completed using the system configuration outlined in Table 2.
Patients and video recording
We collected video recordings of 150 patients. Only the 61 patients with RASS scores ≥ -2 were enrolled for analysis. The average age was approximately 60 years. Male patients predominated (M/F 46/15), particularly in videos featuring patients with a RASS score of 0 (Table 3). The video recordings of the 61 patients were cut into 427 30-s and 6405 2-s segments (Fig. 2).
Model construction
Thresholds definition
Figure 5 presents the threshold definitions using box plots. The specified cut-off values for the body parts were set at 0.8 and 5 for the head, 0.8 and 14 for the trunk, and 0.8 and 11 for the lower limbs (Fig. 5A). Additionally, the aggregate score for all body parts was subjected to a cut-off value of 5 (Fig. 5B).
Model performance
In this study, validation was conducted through confusion matrices and ROC curves to compare the three classification methods. The cross-validation average accuracy (k-fold = 10) of both RF and LSTM was 0.90. The LSTM model achieved the highest accuracy (ACC = 0.92). LSTM, which uses time-series data for classification, yielded the highest sensitivity (recall) for patients requiring attention and significantly improved the other performance metrics (Table 4).
Additionally, the ROC curves showed that the AUC of the LSTM model surpassed the other methods (AUC = 0.91) (Fig. 6). This result emphasizes the outstanding performance of the LSTM model in simulating time-series data of patient clinical observations.
Patient movement analysis
Patient movement analysis of all cases
We classified all cases into “Attention” and “Non-attention” and further stratified them into “Non-attention Without Restraint Belt,” “Non-attention With Restraint Belt,” “Attention Without Restraint Belt,” and “Attention With Restraint Belt.”
Referring to Fig. 7, it becomes apparent that the outcomes of image analysis align with clinical observations. There exists a notable contrast in movement between patients classified as “Attention” and those as “Non-attention.” Patients in the “Attention” category exhibit significantly more extensive movements, including those spanning different body regions. Within the “Attention” category, a noteworthy distinction surfaces between patients with and without restraint belts, with patients under restraint belts displaying reduced movement in the trunk area.
Patient movement analysis of the representative cases
Patient 49 was categorized as “Non-attention,” with the image module detecting minimal head and limb movement (Fig. 8a). Patient 58 was classified as “Attention,” with motion quantification revealing significant head and limb movements, including inter-regional motion (Fig. 8b). The analysis results from the image module align with the observed patient movements, demonstrating accurate detection of displacement in each region. For privacy reasons, the patients’ heads are not shown in the videos. These movements correlate with the analytical data, and the corresponding videos are included in the supplementary material.
Discussion
Ensuring mild patient sedation in ICU care is crucial, but current clinical assessment methods encounter challenges like low frequency, subjectivity, and evolving professional standards, emphasizing the need for advanced, continuous monitoring methods [18, 19]. This study proposes a monitoring system that combines LSTM and image processing to address challenges such as ICU lighting variations and effective activity detection even when the patient is covered. The integrated AI technology enhances system accuracy, compensating for current monitoring limitations. Results align with expert observations.
Previous studies used cameras for agitation and sedation monitoring in the ICU. Chase et al. captured limb movements, quantifying sedation and agitation levels using fuzzy logic methods [20]. Becouze et al. used cameras to record facial expressions, measuring agitation levels in critically ill patients [21]. Martinez et al. employed multiple cameras to observe patient behavior in the ICU for sedation control and accident prevention [22]. However, those studies faced detection issues and lacked detailed evaluation metrics.
This study compared three methods, and the results indicated that LSTM is the optimal choice. LSTM is renowned for selectively retaining information and discarding unnecessary details, which can enhance performance [23]. Lipton et al.’s research demonstrated that expert-level diagnostic differentiation of various diseases can be achieved using electronic health records (EHR) and recurrent neural networks (RNN) [24]. LSTM technology aids healthcare professionals in diagnosis, prediction, and treatment, potentially enhancing efficiency and accuracy in the medical field and ultimately improving patient experiences and outcomes.
Despite significant progress, we are aware of certain limitations. This study restricted the length and number of video segments because of patient safety and privacy concerns. The limitations include a relatively small number of cases and few cases with agitation (89 cases were excluded), along with a binary labeling of only “attention” and “no attention”. However, practical judgment by clinical personnel confirms that this method holds clinical value in enhancing patient safety through continuous monitoring. Currently, interventions by healthcare professionals still depend on manual review of the footage. Addressing these challenges requires improvements in smart device integration and workflows. Standardized methods, image transmission connections, and enhanced system security are crucial for implementing the monitoring system and ensuring the legality, privacy, and reliability of the results. Future efforts can focus on expanding data collection, increasing the automation of medical interventions, and improving system integration and security to enhance practicality.
This study still holds significant value in clinical applications and provides solutions for future challenges. Despite existing challenges and risks, the potential benefits in patient care and reducing complications make these advancements promising for future clinical applications.
Conclusion
Our study proposes an advanced monitoring system combining LSTM and image processing to address challenges in ICU care. It offers continuous and accurate monitoring, crucial for ensuring mild patient sedation amidst evolving standards and subjective assessments. LSTM emerges as the optimal choice, leveraging its information retention capabilities for enhanced performance, as seen in other medical applications.
While limitations exist due to patient safety and privacy concerns, our system holds clinical value in enhancing patient safety through continuous monitoring. Addressing these challenges requires improvements in device integration, workflows, and system security. Future efforts should focus on expanding data collection and enhancing system integration and security for practicality.
Data availability
All data supporting the findings of this study are available within the paper and its Supplementary Information.
References
Page V, McKenzie C. Sedation in the Intensive Care Unit. Curr Anesthesiology Rep. 2021;11(2):92–100.
Jackson DL, et al. The incidence of sub-optimal sedation in the ICU: a systematic review. Crit Care. 2009;13(6):R204.
Devlin JW, et al. Clinical practice guidelines for the Prevention and Management of Pain, Agitation/Sedation, Delirium, Immobility, and sleep disruption in adult patients in the ICU. Crit Care Med. 2018;46(9):e825–73.
Sessler CN, et al. The Richmond agitation–sedation scale. Am J Respir Crit Care Med. 2002;166(10):1338–44.
Ely EW, et al. Monitoring sedation Status Over Time in ICU patients. JAMA. 2003;289(22):2983.
Riker RR, Picard JT, Fraser GL. Prospective evaluation of the sedation-agitation scale for adult critically ill patients. Crit Care Med. 1999;27(7):1325–9.
Saeed U, et al. Machine learning empowered COVID-19 patient monitoring using non-contact sensing: an extensive review. J Pharm Anal. 2022;12(2):193–204.
Jakkaew P, Onoye T. Non-contact respiration monitoring and body movements detection for Sleep using Thermal Imaging. Sens (Basel). 2020;20(21):6307. https://doi.org/10.3390/s20216307. PMID: 33167556; PMCID: PMC7663997.
Hu M, et al. Combination of near-infrared and thermal imaging techniques for the remote and simultaneous measurements of breathing and heart rates under sleep situation. PLoS ONE. 2018;13(1):e0190466.
Block VAJ, et al. Remote physical activity monitoring in neurological disease: a systematic review. PLoS ONE. 2016;11(4):e0154335.
Wei J, et al. Transdermal Optical Imaging reveal basal stress via heart rate variability analysis: a novel methodology comparable to Electrocardiography. Front Psychol. 2018;9:98.
Fang CY, Hsieh HH, Chen SW. A Vision-Based Infant Respiratory Frequency Detection System. in 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA). 2015.
Ahmed I, et al. Internet of health things driven deep learning-based system for non-invasive patient discomfort detection using time frame rules and pairwise keypoints distance feature. Sustainable Cities Soc. 2022;79:103672.
Güney G, Jansen TS, Dill S, Schulz JB, Dafotakis M, Hoog Antink C, Braczynski AK. Video-Based Hand Movement Analysis of Parkinson Patients before and after medication using high-frame-rate videos and MediaPipe. Sensors. 2022;22:7992. https://doi.org/10.3390/s22207992.
Hustinawaty, Rumambi T, Hermita M. Motion detection application to measure straight leg raise ROM using MediaPipe Pose. 2022 4th International Conference on Cybernetics and Intelligent System (ICORIS), Prapat, Indonesia, 2022, pp. 1–5. https://doi.org/10.1109/ICORIS56080.2022.10031299.
OpenCV. Background Subtraction. https://docs.opencv.org/3.4/de/df4/tutorial_js_bg_subtraction.html(2023). Accessed 21 Feb 2023.
Shrivastava D, et al. Bone cancer detection using machine learning techniques. Smart Healthcare for Disease diagnosis and Prevention. Academic; 2020. pp. 175–83.
Sessler CN, Gosnell MS, Grap MJ, Brophy GM, O’Neal PV, Keane KA, Tesoro EP, Elswick RK. The Richmond Agitation-Sedation Scale: validity and reliability in adult intensive care unit patients. Am J Respir Crit Care Med. 2002;166(10):1338-44. https://doi.org/10.1164/rccm.2107138. PMID: 12421743.
Grap MJ, Hamilton VA, McNallen A, Ketchum JM, Best AM, Isti Arief NY, Wetzel PA. Actigraphy: analyzing patient movement. Heart Lung. 2011;40(3):e52–9. https://doi.org/10.1016/j.hrtlng.2009.12.013.
Chase JG, et al. Quantifying agitation in sedated ICU patients using digital imaging. Comput Methods Programs Biomed. 2004;76(2):131–41.
Becouze P, et al. Measuring facial grimacing for quantifying patient agitation in critical care. Comput Methods Programs Biomed. 2007;87(2):138–47.
Martinez M, Stiefelhagen R. Automated multi-camera system for long term behavioral monitoring in intensive care units. MVA. 2013:97–100.
Shung D, Huang J, Castro E, et al. Neural network predicts need for red blood cell transfusion for patients with acute gastrointestinal bleeding admitted to the intensive care unit. Sci Rep. 2021;11:8827. https://doi.org/10.1038/s41598-021-88226-3.
Lipton ZC, Kale DC, Elkan C, Wetzel R. Learning to diagnose with LSTM recurrent neural networks. arXiv:1511.03677 (2015).
Acknowledgements
This study was supported by Taichung Veterans General Hospital (TCVGH-1114404C), the National Science and Technology Council (NSTC 111-2634-F-A49-014-1), and the National Science and Technology Council (NSTC 112-2321-B-075A-001-1). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Funding
None.
Author information
Authors and Affiliations
Contributions
P.D. was involved in research conception and design, data analysis and interpretation, article writing, and manuscript preparation. Y.W. was involved in data analysis and interpretation, article writing, and manuscript preparation. R.S. and C.W. oversaw the overall implementation of the project and were involved in research conception and design. P.L., W.C., G.L., C.W., and L.C. helped to write and review this work. All authors gave final approval of the version to be published.
Corresponding author
Ethics declarations
Ethics approval
The study was approved by the Institutional Review Board (IRB) of Taichung Veterans General Hospital (IRB No. CG21307B). Video recordings were obtained from consenting patients admitted to the adult intensive care unit at Taichung Veterans General Hospital from December 1, 2022, to December 31, 2023. All procedures were conducted in accordance with the 1964 Declaration of Helsinki and its later amendments. Participants were informed of the study’s purpose, the confidentiality of their data, and the voluntary nature of their participation. All participants signed informed consent forms.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Dai, PY., Wu, YC., Sheu, RK. et al. An automated ICU agitation monitoring system for video streaming using deep learning classification. BMC Med Inform Decis Mak 24, 77 (2024). https://doi.org/10.1186/s12911-024-02479-2