
8.1 Introduction

There is a lot of excitement about the potential benefits that the use of artificial intelligence (AI) can bring to healthcare. Health systems are struggling with rising costs, staff shortages and burnout, an increasingly elderly population with more complex health needs, and health outcomes that often fall short of expectations. AI is seen as the next step in addressing these challenges, with the hype so high that it prompted leading US digital medicine researcher Eric Topol to compile an amusing list of “outlandish expectations” of AI, such as: the ability to diagnose the undiagnosable; treat the untreatable; predict the unpredictable; classify the unclassifiable; eliminate workflow inefficiencies; and cure cancer [29].

There is certainly a strong appetite among governments around the world to promote the use of AI in healthcare. In the UK, a dedicated body (NHSX) has been set up with a remit to accelerate the digital transformation of the National Health Service (NHS) and to support the development and integration of AI applications into the NHS.

Examples of the potential benefit that AI can bring to healthcare can be found readily in news reports and in the scientific literature. Over 200 AI-based medical devices have already received regulatory approval in Europe and the USA [16], and there are many more AI applications that do not require such approvals (i.e., which fall outside the narrow definition of medical devices). The area of diagnostics is particularly strong with examples including AI applications to support identification of diabetic retinopathy [1], skin cancer [7], and, recently, to distinguish COVID-19 from other types of chest infections [12]. Other developments include, for example, ambulance service triage, sepsis diagnosis and prognosis, patient scheduling, and drug and vaccine development.

While these studies provide encouraging results, the evidence base remains weak for several reasons: the focus of the evaluation is usually on a narrowly defined task; the evaluation is typically undertaken retrospectively by the technology developer, and independent evaluation remains the exception; the number of human participants tends to be small; and prospective trials are still infrequent. Taken together, claims that AI outperforms humans are likely to be overstated given the limitations in the study design, reporting and transparency, and the high risk of study bias [17].

The real challenges for the adoption of AI in healthcare will arise when algorithms are integrated into health systems to deliver a service in collaboration with healthcare professionals as well as other technology. It is at this level of the wider system, where teams consisting of healthcare professionals and AI applications cooperate and collaborate to provide a service, that safety challenges will need to be addressed [26].

The aim of this paper is to review and highlight some of the safety challenges at the system level relevant to the use of AI in healthcare settings, by looking back at what we already know from the extensive research on automation as well as at what novel challenges might need further attention. Two examples are described, (1) the design of an autonomous infusion pump and (2) the implementation of AI in an ambulance service call centre to detect out-of-hospital cardiac arrest, to illustrate the types of design and implementation issues that should be considered.

8.2 Challenges Old and New

Many of the challenges with using AI in healthcare are actually very familiar. Back in the 1970s and 1980s, industrial systems saw the widespread introduction of automation to improve efficiency and to reduce failures attributed to human error. This was soon accompanied by research studying failures involving highly automated systems, which highlighted the potential for “automation surprises” [22] and the “ironies of automation” [2]. Problems with automation can arise because people are not actually eliminated from the system; instead, the automation changes the nature of the work that people do [18], often leaving them with the residual set of tasks left over by the developers of the automation. This can make the human role and the interaction between people and automation challenging, e.g., due to lengthy periods of monitoring, the need to respond to abnormal situations under time pressure, and the difficulty of building an understanding of different situations and strategies for their management.

However, modern AI systems (especially those that are increasingly autonomous) also present completely new challenges that were not as relevant in the design of traditional automated systems. AI systems can augment what people do in ways that were not possible when machines simply replaced physical work. The interaction with interconnected AI-based systems could potentially develop more into a relationship between people and the AI, especially where the AI has means of expressing something akin to a personality via its interfaces [11]. Social aspects will become much more relevant, as will mutual understanding of expected behaviours and norms. We might think of, for example, the seemingly ubiquitous voice-enabled virtual assistants (e.g., Amazon’s Alexa or Apple’s Siri) that aim to deliver a realistic and natural social interaction experience. Examples from healthcare might include mental health chatbots and assistive robots. These relationships between people and technology, along with social, cultural and ethical aspects, have much greater importance for future AI-based systems than for traditional automation. Healthcare professionals, patients and AI will increasingly collaborate as part of the wider clinical system.

The use of healthcare AI also raises ethical challenges on a much wider and more fundamental scale compared with traditional automation. For example, concerns about privacy and data protection have come to the fore, such as the controversy and subsequent litigation around the transfer of 1.6 million identifiable confidential electronic patient records from the Royal Free London NHS Foundation Trust to Google subsidiary DeepMind in 2015. This data transfer was within the scope of a collaboration to develop a tool to support the identification of patients at risk of developing acute kidney injury (which was subsequently abandoned), but the data sharing agreement did not impose any explicit bounds on the use of these patient records. In addition, wider issues around fairness and impact on different stakeholder groups need to be considered, such as racial bias and disparities in accuracy across different population groups [30]. Many data sets are representative of more affluent health systems and are, therefore, at risk of disadvantaging ethnic minority and vulnerable groups. Fairness at the health system level can also go beyond issues of bias in training data. For example, the use of AI-based patient-facing symptom checkers paired with remote consultations (such as the UK “GP at hand” service offered by Babylon Health) can potentially disadvantage elderly patients and those with significant healthcare needs by shifting and depleting the budget allocated to primary care: these services are typically attractive to younger, healthier populations, leaving traditional primary care services to care for more complex cases with a significantly reduced budget. It is important to note that addressing such concerns requires a broader range of expertise and a social and political dialogue to advance health equity in the age of AI [23].

Designers of AI and healthcare organisations deploying AI should be aware of these critical considerations at the systems level. Examples from the extensive literature have been summarised in a recent White Paper published by the UK Chartered Institute of Ergonomics and Human Factors and include (not an exhaustive list) [27]:

  • Situation awareness: design options need to consider how AI can support, rather than erode, people’s situation awareness. The Distributed Situation Awareness (DSA) model emphasises the systems perspective on situation awareness [24]. According to this model, situation awareness is distributed around the socio-technical system and is built through interactions between agents, both human and non-human (e.g., AI). Understanding the situation awareness requirements of each agent can inform the design of the AI and its integration into the wider system.

  • Workload: the impact of AI on workload needs to be assessed because AI can both reduce and increase workload in certain situations. An example of an unintended increase in workload is the introduction of electronic health records, which has led to situations where clinicians spend around 40% of their time on data entry [10].

  • Automation bias: automation bias (or automation-induced complacency) describes the phenomenon that people tend to trust and then start to rely on automated systems uncritically [18]. Studies on automation bias suggest that the accuracy figures of AI applications in isolation do not allow prediction of what will happen in clinical use, when the clinician is confronted with a potentially inaccurate system output [13]. Strategies need to be considered to guard against people relying uncritically on the AI, e.g., the use of explanation and training.

  • Explanation and trust: explainability and transparency of AI decision making might reduce the potential for automation bias. However, there is limited agreement on how to achieve this. Many approaches focus on providing detailed accounts of how an algorithm operates, i.e., on explaining why a decision was made, for example by reference to salient features [15]. For explanation to be fully useful, and to support building and maintaining trust in AI decision making, effort needs to be put into developing interfaces that enable users to interrogate recommendations and that allow dialogue between the user and the AI.

  • Human–AI teaming: models of teamwork, e.g., the Big Five model (Salas et al., 2005), can provide insights for the design of behaviours (leadership, mutual performance monitoring, back-up behaviour, adaptability, and team orientation) and supporting mechanisms (shared mental models, mutual trust, closed-loop communication) to enhance human–AI teaming. The design should consider how human team members can understand the AI’s roles and responsibilities, and—more challenging—how the AI can understand the human’s roles and responsibilities, e.g., in dynamic AI applications that take over human tasks when people are at risk of being overloaded. It is also important that appropriate mental models are shared across human and AI team members.

  • Training: people require opportunities to practise and retain their skill sets when AI is introduced, and they need to have a baseline understanding of how the AI works. Maintaining core skills is important to provide healthcare workers with the confidence to override and take over from AI applications. Healthcare workers also need to understand potential weaknesses of the AI and how the safe envelope is defined, maintained or breached.

  • Relationships between staff and patients: the impact on relationships needs to be considered, e.g., whether staff will be working away from the patient once more and more AI is introduced [28].

  • Privacy and ethical concerns: at the European level, the High-Level Expert Group on AI published “Ethics Guidelines for Trustworthy AI” [9]. The guidelines are based on a Fundamental Rights Impact Assessment and operationalise ethical principles through seven key requirements: human agency and oversight; technical robustness and safety; privacy and data governance; transparency; diversity, non-discrimination and fairness; societal and environmental well-being; and accountability. These ethical requirements necessitate a thorough understanding of stakeholders and their diverse needs and expectations.

Below, two examples are described to illustrate the type of considerations that follow from this line of systems thinking for the design and use of healthcare AI.

8.3 Example 1: AI-based infusion pumps for IV medication administration

This first illustrative example is taken from a project that studied safety assurance challenges of the use of autonomous infusion pumps (i.e., infusion pumps driven by AI) for intravenous (IV) medication administration in intensive care [25]. The purpose of the research was to make recommendations that could feed into the design and implementation of the autonomous technology in such a way that its use enhances rather than diminishes safety.

It has been estimated that as many as 237 million medication errors occur in England every year, and that these cause over 700 deaths [6]. Intravenous medication preparation and administration are particularly error prone. The introduction of highly automated and ultimately autonomous IV medication management systems might contribute to reducing these error rates by taking over functions previously carried out by clinicians, such as safety cross-checks (e.g., patient identity and prescription), calculating infusion rates and independently adjusting infusion parameters based on the patient’s physiology. A large UK-US study found that whether or not infusion technology successfully improved patient safety depended largely on the specific context of implementation within the clinical system [3].
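
To make concrete what functions such as infusion rate calculation and safety cross-checks involve, the following minimal sketch (in Python) illustrates the kind of logic an autonomous pump would need to automate. It is purely illustrative: the function names, parameters and limit values are hypothetical and are not taken from the study or from any real device.

    def infusion_rate_ml_per_h(dose_mcg_per_kg_min, weight_kg, concentration_mcg_per_ml):
        """Convert a weight-based prescription into a pump rate in mL/h."""
        dose_mcg_per_min = dose_mcg_per_kg_min * weight_kg
        return dose_mcg_per_min * 60 / concentration_mcg_per_ml

    def within_safe_envelope(rate_ml_h, max_rate_ml_h):
        """Illustrative hard-limit cross-check against a per-drug maximum rate."""
        return 0 < rate_ml_h <= max_rate_ml_h

    # Illustrative values: 0.1 mcg/kg/min for an 80 kg patient, 80 mcg/mL syringe
    rate = infusion_rate_ml_per_h(0.1, 80, 80)  # -> 6.0 mL/h
    print(rate, within_safe_envelope(rate, max_rate_ml_h=20.0))

An autonomous system would embed calculations of this kind within a drug library and combine them with checks of the prescription and patient identity; the point of the example is simply to show how small and well defined the calculation step is compared with the wider system questions discussed below.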

The project was carried out in an English NHS hospital, serving a population of 600,000. The intensive care unit (ICU) within the hospital has 16 beds and cares for 1300 patients annually. Patients on ICU are, by definition, very ill. Patients can be on life support machines, such as ventilators, and they typically require a significant number of drugs. Some of these drugs are given intravenously via an infusion pump. The infusion pump controls the flow of the drug. The traditional set-up is that a doctor (or clinician with prescribing privileges) prescribes a drug as part of the patient’s treatment plan, and a nurse then needs to prepare the drug syringe, load the infusion pump with the drug syringe and then program the infusion pump to run at the required infusion rate for a specific duration. This is the baseline scenario used for illustration in this paper. A more comprehensive description of the analysis is given in [8].

Interviews with patients, healthcare professionals and individuals with responsibility for procurement, IT integration and training were undertaken, as well as an analysis of existing working practices. The interviews and analysis helped to anticipate and explore potential implications for the design of the AI and the impact on the wider clinical system, such as:

  • Will clinicians’ skills related to medication administration be affected? This relates to training needs insofar as clinicians require opportunities to practise their core clinical skills after the AI has taken over this task. The ICU had already observed a decrease in manual drug dose calculation skills after so-called smart pumps, which automated this task, were brought in.

  • What new skills do clinicians require, e.g., how to tell if an autonomous infusion pump is working correctly? This also relates to training, but is concerned with how to manage and supervise an AI system, e.g., how to tell the difference between “good” and “bad” AI performance, what to look out for and how to recognise the limitations of the AI.

  • How will the relationship between clinicians be affected? The autonomous system replaces the practice of double checking by a second nurse, which often also serves as an opportunity for teaching and discussion.

  • How will the relationship between patients and clinicians be affected? The use of autonomous infusion pumps could provide nurses with more time to spend with patients—or nurses might be spending more time managing and supervising autonomous systems away from the patient’s bedside. Patients on ICU form very close bonds with their nurses, and they are anxious that nurses might spend more time away from the bedside. Nurses suggest that the operation of (standard) infusion pumps also provides them with an opportunity to do other things concurrently, e.g., check up on the patient’s wellbeing and social/psychological needs.

  • How will the autonomous system interact with other systems, e.g., other autonomous infusion pumps or the electronic health record, and what will be the impact on the overall IT infrastructure (e.g., in case of failures)? Lack of interoperability of IT systems is a major problem in clinical settings.

  • What is the impact on the medication administration task, e.g., does the autonomous system reduce clinician workload by taking over parts of the task or does it increase workload, e.g., due to monitoring and administration requirements? For example, the AI system requires high-quality data, but electronic patient records are often incomplete. This can be potentially safety-critical unless clinicians spend additional time providing that high-quality data to the AI.

  • How does the autonomous system impact clinician situation awareness, if clinicians do not manage infusion settings by themselves any longer? Is the autonomous system able to exchange situation awareness with the clinician? Can clinicians easily tell what the system is doing and what kind of situation awareness it has, e.g., through the use of interfaces that explain behaviour and that allow clinicians to explore what the AI is doing?

  • What is the impact on the perception of job roles, e.g., on the nursing role? Will nurses be regarded as autonomous clinicians who manage and supervise autonomous infusion pumps potentially away from the bedside, or will nurses’ roles change towards more personal caring tasks with less responsibility and authority for managing medications?

These considerations at the level of the clinical system can support designers of AI applications in defining the operating environment and in understanding relevant interactions with people, other tools and systems, other tasks that might be relevant and the characteristics of the local work environment.

8.4 Example 2: AI to support the recognition of out-of-hospital cardiac arrest

The second example is concerned with the implementation of an AI support system in an NHS ambulance service to improve the recognition rate of, and time to recognition of, out-of-hospital cardiac arrest (OHCA). Currently in the UK, approximately 30,000 people sustain an OHCA annually, and survival to hospital discharge ranges from 2.2% to 12% [20]. Early defibrillation within the initial 3–5 min could deliver survival rates of 50–70%, but each minute of delay to defibrillation reduces the probability of survival by 10% [19]. Hence, speedy recognition of OHCA by the ambulance service call handler is crucial to support bystander cardiopulmonary resuscitation and to enable fast paramedic attendance at the scene. However, recognition of OHCA is difficult, because signs can be subtle, and the international evidence demonstrates that around 25% of OHCA cases are not picked up by call centre operators [5].

An AI system to support ambulance service call handlers in recognising OHCA has been developed by a Danish manufacturer, and initial independent retrospective evaluation with data from Copenhagen produced encouraging results demonstrating that the AI system had higher sensitivity than call handlers (84% vs. 75%), but slightly lower specificity (97% vs. 99%) [5]. However, a subsequent randomised controlled trial of the technology in use found that the AI support did not lead to improved recognition of OHCA [4].
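
These figures are worth unpacking because OHCA is a rare event among emergency calls, so even a small drop in specificity translates into many additional false alerts. The short calculation below (in Python) illustrates this; the call volume and the 1% prevalence are assumed values chosen for illustration only and are not taken from the cited studies.

    def alert_counts(n_calls, prevalence, sensitivity, specificity):
        """Expected numbers of true and false OHCA alerts per n_calls."""
        positives = n_calls * prevalence
        negatives = n_calls - positives
        return sensitivity * positives, (1 - specificity) * negatives

    n_calls, prevalence = 10_000, 0.01  # assumed: 1 in 100 calls involves an OHCA
    for label, sens, spec in [("call handlers", 0.75, 0.99), ("AI support", 0.84, 0.97)]:
        true_alerts, false_alerts = alert_counts(n_calls, prevalence, sens, spec)
        print(f"{label}: ~{true_alerts:.0f} true, ~{false_alerts:.0f} false alerts")
    # call handlers: ~75 true, ~99 false alerts
    # AI support: ~84 true, ~297 false alerts

Under these assumptions, the AI would flag roughly nine additional genuine arrests per 10,000 calls, but at the cost of roughly three times as many false alerts, which is one reason why headline accuracy figures alone cannot predict performance in use.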

Again, this reinforces the need for consideration of the wider system when designing and implementing AI technology. Taking a systems perspective can help understand the breadth of design decisions and their potential impact, especially when considering the implementation of the AI tool in a different context (in this case using the technology in a more rural environment as opposed to the urban environment where it was initially tested). For example, the interaction between ambulance service call handler and the AI can be designed to accommodate different levels of support (or levels of automation); see also Fig. 8.1:

Fig. 8.1 Different levels of automation and interaction in the implementation of AI support for the recognition of out-of-hospital cardiac arrest. The figure maps the stages of information acquisition, information analysis, decision selection and action implementation against current, medium-term future and long-term future scenarios, up to an AI chatbot that asks about symptoms, prioritises them and dispatches an ambulance.

Figure from the white paper “Human Factors and Ergonomics in Healthcare AI”, 2021, reproduced with permission from the UK Chartered Institute of Ergonomics and Human Factors. All rights reserved.

  • No AI support (current situation).

  • AI operates autonomously (full automation), e.g., an AI chatbot interacts with the caller, asks for symptoms, prioritises the symptoms and selects call priority, and then dispatches an ambulance according to the call priority.

  • Several levels of support and interaction in-between, e.g., the call handler leads the conversation with the caller, but the AI picks up symptoms and prioritises these, the call handler makes the call priority decision, and the AI dispatches the ambulance.
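
One way to make such design options explicit is to record, for each of the four stages shown in Fig. 8.1, which agent is responsible. The sketch below (in Python) is a hypothetical illustration of this idea; the class, field and level names are invented and do not describe the deployed system.

    from dataclasses import dataclass
    from enum import Enum

    class Agent(Enum):
        HUMAN = "call handler"
        AI = "AI system"

    @dataclass
    class AutomationLevel:
        information_acquisition: Agent   # who elicits symptoms from the caller
        information_analysis: Agent      # who interprets and prioritises symptoms
        decision_selection: Agent        # who decides the call priority
        action_implementation: Agent     # who dispatches the ambulance

    no_support = AutomationLevel(Agent.HUMAN, Agent.HUMAN, Agent.HUMAN, Agent.HUMAN)
    intermediate = AutomationLevel(Agent.HUMAN, Agent.AI, Agent.HUMAN, Agent.AI)
    full_automation = AutomationLevel(Agent.AI, Agent.AI, Agent.AI, Agent.AI)

Making the allocation of each stage explicit in this way helps surface the interaction questions that follow, such as how the call handler learns what the AI has picked up at the analysis stage before making the priority decision.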

Interviews with ambulance service staff and a cognitive task analysis [14] of call handlers’ tasks suggested that different types of interaction design might have far-reaching consequences that need to be considered:

  • To what extent should the AI communicate to the call handler the reasoning behind its decision making, e.g., should the AI simply pop up an alert suggesting that it recognised a potential cardiac arrest, or should it provide a running commentary on what it considered for that decision?

  • How will call handlers know whether the AI is making good or bad decisions?

  • How will the interaction with the AI affect call handler workload? Will looking at AI alerts increase or decrease workload?

  • How will the false positive rate of the AI affect call handler trust in the system? Will call handlers disregard the AI input or will they start over-relying on it?

  • How will call handlers’ skills in recognising cardiac arrest be affected?

In addition to the above questions about the interaction between the AI and the call handler, it is also not clear how the AI best augments what the call handlers do, i.e., how to support call handlers with difficult decisions. For example, the cognitive task analysis identified issues that make OHCA recognition more challenging as well as strategies that call handlers employ to overcome these difficulties:

  • there are difficulties in understanding what is being said, e.g., the caller has poor mobile phone reception, does not speak the language or has speech impairments;

  • the caller is in a panic and unable to provide a coherent description;

  • the caller might be hesitant to provide accurate description of the patient’s condition, e.g., a close relative being in shock and denial; or the caller might use ambiguous and contradictory language;

  • the caller is hesitant to provide cardiopulmonary resuscitation;

  • strategies that call handlers employ include aiming to calm down the caller, asking clarifying questions, listening to background noises (e.g., patient breathing), and using synonyms to describe symptoms (e.g., “is the patient gulping for air like a fish out of the water?” to establish whether the patient is breathing sufficiently).

8.5 Conclusion

The aspiration of using AI to improve the efficiency of health systems and to enhance patient safety requires a transition from the predominant technology-centric focus that contrasts people and AI (“humans vs. machines”) towards a systems approach that considers AI as part of the wider health system.

Several lessons can be learned from research and practical experiences with the design and operation of highly automated systems. However, advanced AI systems also present novel challenges around social and relational aspects, and human–AI teaming. Addressing these requires a multidisciplinary approach as well as a broader political and societal dialogue around fairness and values in algorithms. This should be reflected in policies of research funding bodies and regulators, because funding specifications and regulatory frameworks frequently only reflect the technology-centric perspective of AI rather than reinforcing a systems approach.

There is a need to raise awareness of these issues among healthcare stakeholders, because Human Factors and Ergonomics (HF/E) and safety science, which advocate a systems approach, are currently not sufficiently well embedded in health systems.