Abstract
The way people lead their lives is considered an important factor in health. In this chapter, we describe a system that provides behavior-based risk assessment for the health insurance sector. The system processes real-world data (RWD) collected from individuals in their daily life, which enumerate different aspects of behavior. The data have been captured using the Healthentia platform and a simulator that augments the actual dataset with synthetic data. Classifiers are built to predict variations in people’s short-term well-being outlook. Risk assessment services are provided to health insurance professionals by processing the classifier predictions over the long term, while explanations of the classifier decisions provide insights for coaching the users of the service.
1 Introduction
Personalization has always been a key factor in health insurance product provision. Traditionally, it involves a screening based on static information from the customer: their medical record and the questionnaires they answer. But their medical history, enumerated by clinical data, can be scarce, and it certainly is just one determinant of health. The way people live their lives, enumerated by behavioral data, is the second determinant, and for risk assessment of chronic, age-related conditions of now seemingly healthy individuals, it is the most important, as indicated by several studies. A study on diabetes prevention [1] provides evidence for the importance of lifestyle for the outcomes in youths and adults. Another study [2] correlates health responsibility, physical activity, and stress management to obesity, a major risk factor for cardiovascular diseases, type 2 diabetes, and some forms of cancer. The 2017 Global Burden of Disease Study [3] considers behavioral, environmental, occupational, and metabolic risk factors.
Risk assessment has always been an integral part of the insurance industry [4]. Unlike risk assessment in medicine, which is based on continuous estimation of risk factors, its insurance counterpart is usually static, done at the beginning of a contract with a client. Dynamic personalized products have only recently begun to appear, in the form of data-based digital risk assessment platforms. Such platforms are starting to transform insurance by disrupting the ways premiums are calculated [5] and are already being utilized in car insurance. In the scope of car insurance, continuous vehicle-based risk assessment [6] is already considered important for optimizing insurance premiums and providing personalized services to drivers. Specifically, driver behavior is analyzed [7] for usage-based insurance. Moreover, telematic driving profile classification [8] has facilitated pricing innovations in German car insurance.
Similarly, personalization of health insurance products needs to be based on continuous risk assessment of the individual, since lifestyle and behavior cannot be assessed at one instance in time; they involve people’s habits and their continuous change. Health insurance products employing continuous assessment of customers’ lifestyle and behavior are dynamically personalized.
Behavioral assessments, much like their clinical counterparts, rely on data. For behavior, the data collection needs to be continuous, facilitated by software tools for the collection of information capturing the important aspects of lifestyle and behavior. In the INFINITECH project [9], insurance experts define the data to be collected, and the Healthentia eClinical system [10] facilitates the collection. Specifically, Healthentia provides interfaces for data collection from medical and consumer devices, including IoT (Internet of Things) devices. Moreover, continuous risk assessment services are provided to health insurance professionals by training machine learning (ML) prediction models for the required health parameters. ML has been used in the insurance industry to analyze insurance claim data [11, 12]. Vehicle insurance coverage affects driving behavior and hence insurance claims [13]. These previous works employed ML to analyze data at the end of the insurance pathway, after the event. Instead, in this chapter, we follow the approach in [14]. We expand on the results presented therein, focusing on the continuous analysis of data at the customer side to personalize the health insurance product by modifying the insurance pathway.
Personalized dynamic product offerings benefit both the insurance companies and their customers, but the continuous assessment imposes a burden on the customers. Insurance companies gain competitive advantages with lower prices for low-risk customers. The customers have a direct financial benefit in the form of reduced premiums due to the lower risk of their healthy behavior. They also have an indirect benefit stemming from coaching about aspects of their lifestyle, both those that drive the risk assessment models toward positive decisions and those that drive them toward negative decisions. The identification of these aspects is made possible by explainable AI techniques applied to the individual model decisions. The insurance companies need to balance the increased burden of the monitoring with the added financial and health benefits of using such a system.
The system for personalized health insurance products devised in the INFINITECH project is presented in Sect. 2 of this chapter. Then, its main components are detailed, covering the data collection (Sect. 3), the model training test bed (Sect. 4), and the provided ML services of risk assessment and lifestyle coaching (Sect. 5). Finally, the conclusions are drawn in Sect. 6.
2 INFINITECH Healthcare Insurance System
The healthcare insurance pilot of INFINITECH focuses on health insurance and risk analysis by developing two AI-powered services:
1. The risk assessment service allows the insurance company to adapt prices by classifying individuals according to their lifestyle.
2. The coach service advises individuals on their lifestyle choices, aiming both to improve their health and to persuade them to use the system correctly.
These two services rely on a model of health outlook trained on the collected data and used in the provision of the services.
An overview of the pilot system for healthcare insurance is given in Fig. 16.1. It comprises two systems: the pilot test bed, built within the INFINITECH project, and the Healthentia eClinical platform, provided by Innovation Sprint. The data are collected by Healthentia, as detailed in Sect. 3, where the complete Healthentia eClinical platform is also presented. The pilot test bed facilitates secure and privacy-preserving model training, as discussed in Sect. 4. The trained model is utilized for the risk assessment and the lifestyle coach ML services detailed in Sect. 5, and the results are finally visualized by the dashboards of the Healthentia portal web app.
3 Data Collection
Two types of data are collected to train the health outlook model: measurements and user reports. The measurements are values collected by sensors, which are automatically reported by these sensors to the data collection system, without the intervention of the user. They are objective data, since their quality only depends on the devices’ measurement accuracy. They have to do with physical activity, the heart, and sleep. The physical activity measurements involve steps, distance, elevation, energy consumption, and time spent in three different zones of activity intensity (light, moderate, and intense). The heart measurements include the resting heart rate and the time spent in different zones of heart activity (fat burn, cardio, and peak). The sleep measurements include the bedtime and the wake-up time (and thus, indirectly, the sleep duration), as well as the time spent in the different sleep stages (light, REM, and deep sleep).
The reports are self-assessments of the individuals; hence, they are subjective data. They cover common symptoms, nutrition, mood, and quality of life. The symptoms are systolic and diastolic blood pressure and body temperature (entered as numbers measured by the users), as well as cough, diarrhea, fatigue, headache, and pain (where the user provides a five-level self-assessment of severity from not at all up to very much). Regarding nutrition, the user enters the number of meals and whether they contain meat, as well as the consumption of liquids: water, coffee, tea, beverages, and spirits. Mood is a five-level self-assessment of the user’s psychological condition, from very positive to neutral and down to very negative. Finally, quality of life [15] is reported on a weekly basis using the Positive Health questionnaire [16] and on a monthly basis using the EuroQol EQ-5D-5L questionnaire [17], which asks the user to assess their status in five dimensions using five levels. The dimensions are mobility, self-care, usual activities, pain/discomfort, and anxiety/depression, complemented with the overall numeric health self-assessment.
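To make the two data types concrete, the following minimal sketch shows how one day of measurements and reports might be flattened into a single feature row. It is not the Healthentia data model: the class names, field names, and the intermediate severity labels are hypothetical, and only a handful of the attributes listed above are included.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Optional

# Five-level severity scale. Only the two endpoints ("not at all", "very much")
# come from the text; the three intermediate labels are assumptions.
SEVERITY_LEVELS = ("not at all", "a little", "moderately", "quite a bit", "very much")


@dataclass
class Measurements:
    """Objective values reported automatically by sensors (hypothetical fields)."""
    steps: int
    distance_km: float
    resting_heart_rate: float
    sleep_minutes: float


@dataclass
class Reports:
    """Subjective self-assessments; None marks a question left unanswered."""
    fatigue: Optional[str] = None   # one of SEVERITY_LEVELS
    headache: Optional[str] = None  # one of SEVERITY_LEVELS
    meals: Optional[int] = None


def severity_to_number(level: Optional[str]) -> Optional[int]:
    """Map a severity label to 0..4, keeping missing answers as None."""
    if level is None:
        return None
    return SEVERITY_LEVELS.index(level)


def to_feature_row(m: Measurements, r: Reports) -> Dict[str, Optional[float]]:
    """Flatten one day of measurements and reports into one feature dictionary."""
    row: Dict[str, Optional[float]] = dict(asdict(m))
    row["fatigue"] = severity_to_number(r.fatigue)
    row["headache"] = severity_to_number(r.headache)
    row["meals"] = r.meals
    return row
```

Keeping unanswered questions as None rather than zero preserves the difference between "no symptom" and "no report", which matters because the reports depend on the user actually answering.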
The data collection is facilitated by Healthentia [10]. The platform provides secure, persistent data storage and role-based, GDPR-compliant access. It collects the data from the mobile applications of all users, facilitating smart services, such as risk assessment, and providing both original and processed information to the mobile and portal applications for visualization. The high-level architecture of the platform is shown in Fig. 16.2. The service layer implements the necessary functionalities of the web portal and the mobile application. The study layer facilitates study management, organizing therein healthcare professionals, participants, and their data. The studies can be formal clinical studies or informal ones managed by pharmaceutical companies, hospitals, research centers, or, in this case, insurance companies.
The Healthentia core layer comprises high-order functionalities on top of the data, like role-based control, participant management, participants’ report management, and ML functionalities. The low-level operations on the data are hosted in the data management layer. Finally, the API layer provides the means to expose all the functionalities of the layers above to the outside world. It facilitates data exporting toward the pilot test bed and model importing from the test bed.
The Healthentia mobile application (Fig. 16.3) enables data collection. Measurements are obtained from IoT devices, third-party mobile services, or a proprietary sensing service. User reports are obtained via answering questionnaires that either are regularly pushed to the users’ phones or are accessed on demand by the users themselves. Both the measured and reported data are displayed to the users, together with any insights offered by the smart services of the platform.
The Healthentia portal application (Fig. 16.4) targets the health insurance professionals. It provides an overview of the users of each insurance organization and details for each user. Both overview and details include analytics based on the collected data and the risk assessment insights. It also facilitates managing the organization, providing, for example, a questionnaire management system to determine the types of self-assessments and reports provided by the users.
4 INFINITECH Healthcare Insurance Pilot Test Bed
The INFINITECH healthcare insurance test bed facilitates model training by providing the necessary regulatory compliance tools and the hardware to run the model training scripts, whenever new models are to be trained. Its high-level architecture is shown in the upper part of Fig. 16.1. The test bed ingests data from Healthentia, processes it for model training, and offers the means to perform the model training. It then provides the models back to Healthentia for online usage in risk assessment.
The regulatory compliance tools provide the data in a form that is compliant for model training. The tools comprise the Data Protection Orchestrator (DPO) [18, 19], which, among other things, regulates data ingestion, and the anonymizer. Both are presented in detail in Chap. 20.
The INFINITECH data collection module [20] is responsible for ingesting the data from Healthentia, utilizing the Healthentia API when so instructed by the DPO. It then provides data cleanup services, before handing the data to the INFINITECH anonymizer [21, 22]. The ingested data are already pseudonymized, as only non-identifiable identifiers are used to designate the individuals providing the data, but the tool performs anonymization of the data itself. The anonymized data are stored in LeanXcale [23], the test bed’s database. Different anonymized versions are to be stored, varying the strength of anonymization, with the aim of determining its effect on the quality of the trained models. The model training is an offline process. Hence, the ML engineers responsible for model training will be instructing the DPO to orchestrate data ingestion at different anonymization levels.
Models based on logistic regression [24], random forest [25], and (deep) neural networks [26] are trained to predict the self-reported health outlook variation on a weekly basis. Binary and tristate models have been trained using data collected in the data collection phase of the healthcare insurance pilot, involving 29 individuals over periods of time spanning 3–15 months. The classification accuracy of the random forest models is the best in this context, since the dataset is too limited for neural networks of some depth. The results are shown in Fig. 16.5.
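The chapter does not spell out how the weekly outlook variation is turned into binary or tristate training targets. The sketch below shows one plausible labeling, under stated assumptions: the function name, the label names, the tolerance parameter, and the choice to fold "no change" into the non-worsening class in the binary case are all hypothetical.

```python
from typing import List


def outlook_labels(weekly_scores: List[float],
                   tolerance: float = 0.0,
                   tristate: bool = True) -> List[str]:
    """Label each week-to-week change of a self-reported health score.

    Assumption: a change within +/- tolerance counts as "constant" for
    tristate models and as "not worsening" for binary models.
    """
    labels = []
    for previous, current in zip(weekly_scores, weekly_scores[1:]):
        change = current - previous
        if change < -tolerance:
            labels.append("worsening")
        elif not tristate:
            labels.append("not worsening")
        elif change > tolerance:
            labels.append("improving")
        else:
            labels.append("constant")
    return labels
```

A series of N weekly scores yields N - 1 labels, one per transition, which are the targets a classifier such as a random forest would be trained to predict from the behavioral data of the preceding days.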
Shapley additive explanations (SHAP) analysis [27] is employed to establish the impact of the different feature vector elements on the classifier decisions (either positive or negative). Averaged over all decisions, this gives the importance of the different attributes for the task at hand. Attributes of negligible impact on decisions can be removed, and the models can be retrained on a feature space of lower dimensions. Most importantly though, SHAP analysis is used in the virtual coaching service, as discussed in Sect. 5.2.
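The averaging and pruning step can be sketched as follows. The sketch assumes that per-decision SHAP values have already been computed by some explainer and are available as one dictionary per decision; it uses the mean absolute SHAP value as the importance measure, which is a common convention rather than something the chapter states, and the function names and threshold are hypothetical.

```python
from typing import Dict, List


def global_importance(shap_rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean absolute SHAP value per attribute over all decisions."""
    totals: Dict[str, float] = {}
    for row in shap_rows:
        for name, value in row.items():
            totals[name] = totals.get(name, 0.0) + abs(value)
    return {name: total / len(shap_rows) for name, total in totals.items()}


def keep_attributes(importance: Dict[str, float], min_importance: float) -> List[str]:
    """Names of the attributes whose importance reaches the threshold."""
    return sorted(name for name, score in importance.items() if score >= min_importance)
```

Taking absolute values before averaging prevents an attribute that pushes strongly in both directions from appearing unimportant because its positive and negative contributions cancel.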
These models are not the final ones. During the pilot validation phase that will start in September 2021, 150 individuals will be using the system for 12 months. The data from approximately two-thirds of these participants will be used to keep retraining the models.
5 ML Services
Two services are built using the models trained in Sect. 4. The model decisions accumulated across time per individual are used in risk assessment (see Sect. 5.1), while the SHAP analysis of the individual decisions is used in virtual coaching (see Sect. 5.2).
Since the actual data collected thus far are barely enough for training the models, a synthetic dataset is generated to evaluate the performance of both services. Five behavioral phenotypes are defined in the simulator of [14]. Two are extreme ones: at one end, the athletic phenotype, who likes indoor and outdoor exercise and enjoys a good night’s sleep, and, at the other end, the gamer, who is all about entertainment, mainly indoors, enjoys work, and is not too keen on sleeping on time. In between lies the balanced phenotype, with all behavioral traits being more or less of equal importance, as they are allowed a small variance around the average. Two random variants of the balanced phenotype are also created, with the behavioral traits allowed quite some variance from the balanced state. One random phenotype is associated with excellent health status, while the other is associated with a typical health status. Two hundred individuals of each phenotype are simulated for a duration of 2 years.
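The phenotype idea can be illustrated with a toy population generator. This is not the simulator of [14]: the trait names, the mean weights, the spreads, and the Gaussian sampling are all assumptions chosen only to show how extreme, balanced, and random phenotypes differ in the mean and variance of their behavioral traits.

```python
import random
from typing import Dict, List

TRAITS = ("indoor_exercise", "outdoor_exercise", "entertainment", "work", "sleep")
BALANCED = {name: 0.5 for name in TRAITS}

# (mean trait weights, spread) per phenotype; every number here is illustrative.
PHENOTYPES = {
    "athletic": ({"indoor_exercise": 0.9, "outdoor_exercise": 0.9,
                  "entertainment": 0.3, "work": 0.5, "sleep": 0.9}, 0.05),
    "gamer": ({"indoor_exercise": 0.1, "outdoor_exercise": 0.1,
               "entertainment": 0.9, "work": 0.7, "sleep": 0.2}, 0.05),
    "balanced": (BALANCED, 0.05),                  # small variance around average
    "random_excellent_health": (BALANCED, 0.25),   # large variance
    "random_typical_health": (BALANCED, 0.25),     # large variance
}


def simulate_population(per_phenotype: int = 200, seed: int = 0) -> List[Dict]:
    """Draw trait weights in [0, 1] for every simulated individual."""
    rng = random.Random(seed)  # seeded so the population is reproducible
    people = []
    for phenotype, (means, spread) in PHENOTYPES.items():
        for _ in range(per_phenotype):
            traits = {
                name: min(1.0, max(0.0, rng.gauss(means[name], spread)))
                for name in TRAITS
            }
            people.append({"phenotype": phenotype, "traits": traits})
    return people
```

With the default arguments this produces 1000 individuals, 200 per phenotype, matching the population size described above; generating two years of daily activities from these traits is left out of the sketch.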
5.1 Personalized Risk Assessment
Personalized risk assessment is based on the decisions of the models for each individual. The assessments are long term in the sense that they take into account all the model decisions over time intervals that are very long. In this study, we calculate the long-term averages of the different daily decisions with a memory length corresponding roughly to half a year. There are two such averaged outputs for models with binary decisions and three for those with tristate ones. In every case, the averages are run for the whole length of the synthetic dataset (two years), and for each day of decision, they sum to unity. At any day, the outlook is assessed as the sum of all the averaged positive outcomes from the beginning of the dataset up to the date of the assessment, minus the sum of the negative ones in the same time interval. In the tristate case, the difference is normalized by the sum of the constant ones. The resulting grade is multiplied by 100 and thus can be in the range of [−100,100]. Obviously, outlook grades larger than zero correspond to people whose well-being outlook has been mostly positive in the observation period, and outlook grades smaller than zero correspond to people whose well-being outlook has been mostly negative in the observation period.
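The grade computation can be sketched in a few lines. Several details are assumptions, since the description leaves them open: the long-term average is implemented as an exponential moving average with a memory of about 182 days, the per-class outputs are ordered with the positive class first and the negative class last, and the difference is normalized by the total accumulated mass of all classes so that the grade stays within [−100, 100]. The function names are hypothetical.

```python
from typing import List, Sequence


def smooth_decisions(daily_probs: Sequence[Sequence[float]],
                     memory_days: float = 182.0) -> List[List[float]]:
    """Exponential moving average of the per-class daily decisions.

    Each input row sums to one; every smoothed row is a convex combination
    of such rows, so it also sums to one.
    """
    if not daily_probs:
        return []
    alpha = 1.0 / memory_days
    state = list(daily_probs[0])
    smoothed = []
    for row in daily_probs:
        state = [(1.0 - alpha) * s + alpha * x for s, x in zip(state, row)]
        smoothed.append(state)
    return smoothed


def outlook_grade(smoothed: Sequence[Sequence[float]]) -> float:
    """Grade in [-100, 100] from the start of the data up to the last day.

    Assumed class order: index 0 is positive, index -1 is negative, and a
    middle column (if present) is the constant class of tristate models.
    """
    if not smoothed:
        raise ValueError("at least one day of decisions is required")
    positive = sum(row[0] for row in smoothed)
    negative = sum(row[-1] for row in smoothed)
    total = sum(sum(row) for row in smoothed)
    return 100.0 * (positive - negative) / total
```

For example, `outlook_grade(smooth_decisions(decisions[:day]))` gives the grade at a given day, so calling it for increasing values of `day` traces the evolution of a person's outlook over the observation period.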
The daily evolutions of the accumulated model decisions over the 2 years of observation are shown in Fig. 16.6 for the first simulated person of the athletic, balanced, and gamer phenotypes. Clearly, the athletic person is doing great, and the balanced one is doing quite well. The gamer is not worsening but looks rather stagnant. The histograms of the outlook grades after 2 years are shown in Fig. 16.7 for each of the five behavioral phenotypes. It should be no surprise that the two extreme phenotypes are at opposite sides of the outlook spectrum, clearly separated by a range of values occupied by the balanced and random phenotypes. It is the actual activities done that determine the outlook grade, and in the balanced and random phenotypes, the selection of activities is quite different within each phenotype, so they exhibit quite a lot of spread in the outlook, as expected in real life.
5.2 Personalized Coaching
The SHAP analysis results for the individual decisions are shown in Fig. 16.8. Each row corresponds to a lifestyle attribute, and each dot in a specific row corresponds to the value of that element in one of the input daily vectors. The color of the dot indicates the element’s value (from small values in blue to large values in red). The placement of the dot on the horizontal axis corresponds to the SHAP value. Values close to zero correspond to lifestyle attributes with negligible effect on the decision, while large positive or negative values correspond to lifestyle attributes with large effects. The vertical displacement indicates how many feature vectors fall into the particular range of SHAP values. Thus, thick dot cloud areas correspond to many input daily vectors in that range of SHAP values. Dots on the left correspond to attribute values that direct one toward a prediction that health is improving, while dots on the right suggest a worsening of health. For example, red dots of large values of the body mass index trend (increasing weight) are on the right indicating negative health outlook. Purple dots of moderate body mass index trend are around zero, indicating negligible effect on the decisions. Finally, blue dots indicating a trend to lose weight are on the left, indicating improved health outlook.
The individual SHAP coefficients per model decision are employed to establish the per-person importance of lifestyle attributes in a positive or negative well-being outlook. The most influential lifestyle attributes for a positive outlook are collected over any short time interval as those with the largest positive SHAP coefficients. Similarly, the most influential negative attributes are obtained for the same interval. Then, the individual is coached about these positive and negative attributes. The personalized coach offers advice toward behaviors of the positive attributes and away from those of the negative attributes.
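A minimal sketch of this selection step follows. It assumes that the SHAP coefficients of one person's decisions over the chosen interval are available as one dictionary per decision and that they are expressed with respect to the positive-outlook output, so that positive coefficients push toward a positive outlook (the signs would need flipping if the explained output is the opposite class). Summing the coefficients over the interval, the function name, and the `top_k` parameter are assumptions.

```python
from typing import Dict, List, Tuple


def coaching_targets(shap_rows: List[Dict[str, float]],
                     top_k: int = 3) -> Tuple[List[str], List[str]]:
    """Most influential positive and negative attributes over an interval.

    Returns (positive, negative), each ordered from most to least influential.
    """
    totals: Dict[str, float] = {}
    for row in shap_rows:
        for name, value in row.items():
            totals[name] = totals.get(name, 0.0) + value
    ranked = sorted(totals, key=lambda name: totals[name], reverse=True)
    positive = [name for name in ranked if totals[name] > 0][:top_k]
    negative = [name for name in reversed(ranked) if totals[name] < 0][:top_k]
    return positive, negative
```

The coach would then encourage the behaviors in the first list and advise against those in the second; attributes whose contributions cancel out over the interval appear in neither.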
It is worth mentioning that the explainable AI technique for personalized coaching discussed here is only about the content of the advice. An actual virtual coach should also involve decisions on the timing, the modality, and the tone of the messages carrying the advice to the individuals.
6 Conclusions
The INFINITECH way of delivering personalized services for health insurance is discussed in this chapter. To that end, the healthcare insurance pilot of the project is integrating Healthentia, an eClinical platform for collecting real-world data from individuals, into a test bed for training classification models. The resulting models are used in providing risk assessment and personalized coaching services. The predictive capabilities of the models are acceptable, and their use in the services to analyze the simulated behavior of individuals is promising. The actual validation of the provided services is to be carried out in a pilot study with 150 individuals, which started in September 2021 and will last 1 year.
Our future work on this usage-based healthcare insurance pilot will not be confined to validating the technical implementation of the pilot system, including its data collection and machine learning-based analytics parts. We will also explore how such usage-based systems can enable new business models and healthcare insurance service offerings. For instance, we will study possible pricing models and related incentives that could make usage-based insurance more attractive than conventional insurance products to the majority of consumers.
References
1. Grey, M. (2017). Lifestyle Determinants of Health: Isn’t it all about genes and environment? Nursing Outlook, 65, 501–515. https://doi.org/10.1016/j.outlook.2017.04.011
2. Joseph-Shehu, E. M., Busisiwe, P. N., & Omolola, O. I. (2019). Health-promoting lifestyle behaviour: A determinant for noncommunicable diseases risk factors among employees in a Nigerian University. Global Journal of Health Science, 11, 1–15.
3. Stanaway, J. D., Afshin, A., Gakidou, E., Lim, E. S., Abate, D., Abate, K. H., Abbafati, C., Abbasi, N., Abbastabar, H., Abd-Allah, F., et al. (2018). Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks for 195 countries and territories, 1990–2017: A systematic analysis for the Global Burden of Disease Study 2017. Global Health Metrics, 392, 1923–1994.
4. Blackmore, P. (2016). Easier approach to risk profiling. Available online: https://www.insurancethoughtleadership.com/easier-approach-to-risk-profiling/. Accessed on 1/3/2021.
5. Blackmore, P. (2016). Digital risk profiling transforms insurance. Available online: https://www.insurancethoughtleadership.com/digital-risk-profiling-transforms-insurance/. Accessed on 1/3/2021.
6. Gage, T., Bishop, R., & Morris, J. (2015). The increasing importance of vehicle-based risk assessment for the vehicle insurance industry. Minnesota Journal of Law, Science & Technology, 16, 771.
7. Arumugam, S., & Bhargavi, R. (2019). A survey on driving behavior analysis in usage based insurance using big data. Journal of Big Data, 6, 1–21.
8. Weidner, W., Transchel, F. W. G., & Weidner, R. (2017). Telematic driving profile classification in car insurance pricing. Annals of Actuarial Science, 11, 213–236.
9. Infinitech H2020. (2021). Infinitech: The flagship project for digital finance in Europe. Available online: https://www.infinitech-h2020.eu/. Accessed on 7/6/2021.
10. Innovation Sprint. (2021). Healthentia: Driving real world evidence in research & patient care. Available online: https://innovationsprint.eu/healthentia. Accessed on 7/6/2021.
11. Bermúdez, L., Karlis, D., & Morillo, I. (2020). Modelling unobserved heterogeneity in claim counts using finite mixture models. Risks, 8, 10.
12. Burri, R. D., Burri, R., Bojja, R. R., & Buruga, S. (2019). Insurance claim analysis using machine learning algorithms. International Journal of Innovative Technology and Exploring Engineering, 8, 147–155.
13. Qazvini, M. (2019). On the validation of claims with excess zeros in liability insurance: A comparative study. Risks, 7, 71.
14. Pnevmatikakis, A., Kanavos, S., Matikas, G., Kostopoulou, K., Cesario, A., & Kyriazakos, S. (2021). Risk assessment for personalized health insurance based on real-world data. Risks, 9(3), 46. https://doi.org/10.3390/risks9030046
15. Revicki, D. A., Osoba, D., Fairclough, D., Barofsky, I., Berzon, R., Leidy, N. K., & Rothman, M. (2000). Recommendations on health-related quality of life research to support labeling and promotional claims in the United States. Quality of Life Research, 9, 887–900.
16. Huber, M., van Vliet, M., Giezenberg, M., Winkens, B., Heerkens, Y., Dagnelie, P. C., & Knottnerus, J. A. (2016). Towards a ‘patient-centred’ operationalisation of the new dynamic concept of health: a mixed methods study. BMJ Open, 6, e010091. https://doi.org/10.1136/bmjopen-2015-010091
17. Stolk, E., Ludwig, K., Rand, K., van Hout, B., & Ramos-Goñi, J. M. (2019). Overview, update, and lessons learned from the international EQ-5D-5L valuation work: Version 2 of the EQ-5D-5L valuation protocol. Value in Health, 22, 23–30.
18. Notario, N., Cicer, E., Crespo, A., Real, E. G., Catallo, I., & Vicini, S. (2017). Orchestrating privacy enhancing technologies and services with BPM Tools. The WITDOM data protection orchestrator. ARES’17, Reggio Calabria, Italy.
19. INFINITECH H2020 consortium. (2021). D3.16 – Regulatory compliance tools – II
20. INFINITECH H2020 consortium. (2020). D5.13 – Datasets for algorithms training & evaluation – I
21. Adkinson, O. L., Dago, C. P., Sestelo, M., & Pintos, C. B. (2021). A new approach for dynamic and risk-based data anonymization. In Á. Herrero, C. Cambra, D. Urda, J. Sedano, H. Quintián, & E. Corchado (Eds.), 13th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2020) (CISIS 2019. Advances in intelligent systems and computing) (Vol. 1267). Springer. https://doi.org/10.1007/978-3-030-57805-3_31
22. INFINITECH H2020 consortium. (2021). D3.13 – Data governance framework and tools – II
23. LeanXcale. (2021). LeanXcale: The database for fast-growing companies. Available online: http://leanxcale.com. Accessed on 4/6/2021.
24. Tolles, J., & Meurer, W. J. (2016). Logistic regression relating patient characteristics to outcomes. JAMA, 316(5), 533–534.
25. Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
26. Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
27. Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., & Lee, S. I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2, 56–67.
Acknowledgments
This work has been carried out in the H2020 INFINITECH project, which has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 856632.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
About this chapter
Cite this chapter
Pnevmatikakis, A., Kanavos, S., Perikleous, A., Kyriazakos, S. (2022). Risk Assessment for Personalized Health Insurance Products. In: Soldatos, J., Kyriazis, D. (eds) Big Data and Artificial Intelligence in Digital Finance. Springer, Cham. https://doi.org/10.1007/978-3-030-94590-9_16
DOI: https://doi.org/10.1007/978-3-030-94590-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-94589-3
Online ISBN: 978-3-030-94590-9
eBook Packages: Engineering, Engineering (R0)