Introduction

“One of the most concerning developments is the weaponisation of AI technologies” (Bode, 2023) and similar statements are voiced by authors, engineers and spokespersons in the public domain. A response to the apparent concerns raised by technologies such as AI and autonomous systems is the call for ethical frameworks, governance, and regulation of such systems. Within the engineering community, the IEEE presented its ethics by design framework for systems design, captured in the P7000 standards series (IEEE, 2021). The IEEE P7000 is a “Model Process for Addressing Ethical Concerns During System Design”. This paper tailors the IEEE ethics by design framework to the development of autonomous and AI enabled systems in the context of defence.

Several definitions of autonomous systems have been proposed. We start with a rather technical definition of autonomous and AI enabled systems as provided by the OECD and will provide a refined version later. The OECD makes explicit mention of both AI and autonomy (Footnote 1): “An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.” (OECD, 2024). Without providing a definition of autonomous systems here, we posit that the internal workings of autonomous and AI enabled systems often rest on a combination of technical and mathematical optimization principles. What to optimize for (e.g., accuracy, speed, user experience), and what to assign greater or lesser weight to in the design process (e.g., live input values from sensors or entry fields, pre-set values), involves human design choices. Throughout the design process, there are questions about which problem to optimize for at the cost of which other problem, and such optimization decisions are not technical in nature but ethical, as they concern which problems are more important (Danks, 2022). For example, as those who have tried out current versions of ChatGPT (a trained model which interacts in a conversational way with its users) may have noticed, ChatGPT’s designers seemed to prioritize resemblance to human language, optimizing for accurate and harmonious grammatical patterns over the value of presenting reliable information or accurate facts (Cassel, 2023).

Explicit or implicit ethical choices during technology ideation and development inscribe digital technologies with values. The IEEE P7000 resembles the ‘value sensitive design’ (VSD) methodology, initially developed by Friedman and Kahn in the 1990s (Friedman & Kahn, 2003). Both VSD and IEEE P7000 pay explicit attention to stakeholders’ values in systems design. In the case of systems design for defence purposes, the systems are likely to be used by military organizations on behalf of their nations. Therefore, when it comes to systems design for military purposes in the spirit of VSD and according to IEEE P7000 standards, one could consider the populations of the nations on whose behalf technologies are utilized an important stakeholder. Military personnel that operate these technologies are drawn from society and act on behalf of society. With an increase in autonomous functions in such technologies, the systems’ ‘actions and decisions’ represent societal values that have guided the systems’ use and design.

There are several additional reasons for including society as a stakeholder in value sensitive systems design for defence. First, there is a methodological reason for including the civil population as a stakeholder group in autonomous systems design for defence when following the VSD approach, as VSD calls for inclusion of relevant stakeholders.

Second, paying explicit attention to values in systems design is desirable in itself, because it increases the likelihood that these systems are not merely optimized in terms of efficiency, functionality and cost, but that they are also ethically ‘better’. Third, there are pragmatic reasons to pay careful attention to ethical values expressed by society, because technologies that include a society’s values are more likely to be accepted by that society (Taebi, 2017).

Fourth, public opinion on autonomous systems in defence is generally not well understood, apart from sporadic and rather simplistic surveys on autonomous weapons (Boshuijzen-van Burken et al., 2024); therefore, academic research in this area is desirable.

A final reason is of a legal nature. In international humanitarian law the so-called Martens Clause states that “if there is no specific law on a topic, civilians are still protected by the principles of humanity and dictates of public conscience”.Footnote 2 Although the interpretation of this clause varies between countries, it could be argued that in the case of developing autonomous systems for defence, in particular systems that may play a role in the application of lethal force, the Martens Clause is relevant. Our research contributes by grasping some resonances of the ‘dictates of public conscience’ through value mining from the general public, in this case in Australia.

Over the past decade an overwhelming number of ethics frameworks, principles and guidelines have been published around the emergence and uptake of AI in society, with new publications on the topic still appearing. Legislation is in place in the form of the European Union AI Act (European Parliament, 2024). It is noteworthy that the EU AI Act excludes defence applications from its scope. Some have criticized the focus on principles and have pointed out that ethics is not ‘done’ by following guidelines or principles (Casiraghi, 2023; Koniakou, 2023), while others have argued that the idea of AI ethics is useless (Munn, 2022). Similar initiatives for ethics frameworks to guide the development and use of autonomous systems in a military context have emerged (Devitt et al., 2020; Enemark, 2023; ICRC, 2021; NATO, 2021b; UK Ministry of Defence, 2022; US DOD, 2023). It is not our aim to add another framework, but to guide readers with a specific interest in autonomous systems in defence to the relevant frameworks, while complementing what has been done by others with a unique dataset that includes values mined from the Australian population. A detailed theoretical background of the topic is discussed in preceding research on VSD for autonomous systems in defence (Boshuijzen-van Burken 2023a, 2023b).

We start with a brief summary of what value sensitive design is. The core of the paper follows the VSD structure of conceptual, empirical and technical considerations and discusses these considerations in subsequent sections. We conclude the paper with a discussion and recommendations for future research.

Value sensitive design

Value Sensitive Design (VSD) was initially developed by Batya Friedman and Peter Kahn in the early 1990s. Through a conceptual analysis of human agency and responsible computer systems design, they argued for bringing moral deliberations into computer systems design (2003). VSD is now a proven concept for including ethical considerations in technology development, and it explicitly calls for stakeholder engagement for the mining of stakeholder values. It is an iterative tripartite design methodology consisting of conceptual, empirical and technical investigations (Friedman, 1996, 1998; Friedman & Kahn, 2003). The VSD methodology has been used in numerous contexts; see Winkler and Spiekerman (2021) for an overview. We use VSD to inform the design of military autonomous systems. Friedman and Hendry state that “the design process engages reciprocally with and, […] co-evolves technology and social structure. Social structures are viewed broadly and may include policy, law, regulations, organizational practices, social norms, and others.” (Friedman & Hendry, 2019, p. 68). Our research may add to the social structure of good engineering practice for autonomous systems design, potentially informing policy and acquisition for autonomous systems for military purposes. We enhance VSD with two novel methods for eliciting values, namely a Group Decision Room (GDR) and a Participatory Value Evaluation (PVE). The PVE is used to elicit values from an unorganized stakeholder group, namely Australian citizens. This is an important stakeholder group, since autonomous systems in defence do not operate in a social void but are deployed in the military and on behalf of society. In the next section we follow the first step suggested in the VSD approach, namely conceptual considerations.

Conceptual considerations

In this section on conceptual considerations, we start with a definition of values as provided by Friedman and Kahn. They state that a value is something that “[…], a person or group of people consider important in life” (2006, p. 349). For the purposes of this paper, we adopt this rather loose definition of a value. We acknowledge that values are different from principles, but for the purpose of our research and in light of the conflicting terminology in emerging ‘lists’ for ethical AI, we do not distinguish between the two for the remainder of the paper.

We now discuss what we mean by autonomous systems, followed by existing frameworks relevant to including values in the development of autonomous systems in defence and we end with methodological suggestions regarding stakeholder selection.

Autonomous systems in defence

We acknowledge that there exist numerous definitions of autonomous systems and we provided a technical example from OECD in the introduction. For the remainder of this paper we propose a more concise understanding, and consider autonomous systems as systems that can perform actions or ‘make decisions’ with little or no human intervention, often involving some degree of artificial intelligence (AI). Furthermore, we concur that “Autonomous decision-making capability can be understood as a spectrum. At the upper end of this spectrum, there are highly autonomous systems to which we delegate both the decisional process and the implementation of the subsequent action. At the lower end, there are advisory systems to which we delegate some or most of the decisional process, but the implementation of the recommended action is the responsibility of the human in charge.” (Burton et al., 2020).

Our research focusses on autonomous systems in a military context, including but not limited to autonomous weapons systems.Footnote 3 Such systems may be used for reconnaissance, mine clearing, cyber operations, battlefield management, etcetera.

Existing frameworks

Conceptual considerations in the VSD methodology include non-empirical methods of value gathering. Several countries and entities have suggested ethical frameworks, guidelines and practices for autonomous systems design in a military context. Table 1 lists principles developed by or in the United States (Defense Innovation Board, 2019; US DOD, 2023), the United Kingdom (UK Ministry of Defence, 2022), Australia (TAS 2023a)Footnote 4 and NATO (NATO, 2021a). As can be seen in Table 1, there is significant overlap between the principles, although several stand out as unique.

Table 1 Overview of military AI and autonomous systems principles

The UK’s principle of ‘human-centricity’ stands out, as there is no equivalent principle in the other lists. NATO adds ‘accountability’ to responsibility. The USA has ‘equitable’, but in their explanation they focus on bias mitigation, which is an explicit principle for NATO and the UK, and hence these terms may be considered equivalent in meaning. Australia is unique in listing ‘trust’. At first glance, trust comes close to reliability (UK) and reliable (USA), but there is no use of the word ‘trust’ in any of the explanations that the UK or the USA give for reliability or reliable, respectively.

Whatever the status of the principles listed in Table 1 is, the entities that published them agree that principles and values are relevant to autonomous systems design and that developers and users should take these seriously when making design choices. How to do this exactly is less clear (Blanchard et al., 2024).

There are many more principles listed for AI development in the civil domain (Hagendorff, 2020) that could be relevant to the design and use of autonomous systems in defence, but some may not necessarily map onto a military context. We advocate a context specific approach to values and principles for autonomous systems, similar to Liscio et al. (2021), who argue for the need for context specific values when engineering intelligent agents. Our approach resonates with the Artificial Intelligence Risk Management Framework developed by the US National Institute of Standards and Technology, which states that “Approaches for measuring impacts on a population work best if they recognize that contexts matter, that harms may affect varied groups or sub-groups differently, and that communities or other sub-groups who may be harmed are not always direct users of a system.” (NIST, 2023, p. 6). Principles that are highly relevant in the societal, medical or commercial domain may be less relevant, sensible, or important in the military domain. The principle of transparency, for example, may be important in some organizations or societal contexts, but it may clash with the value of privacy in society, or with commercial confidentiality, or with the covertness of military operations. A good source of values relevant to military systems in the context of Australia is the set of Australian Defence Force (ADF) values of service, courage, respect, integrity and excellence. This is a unified set of values that apply to decision making and actions across all Groups and Services in the ADF (Australian Department of Defence, 2024). But again, which values are important to the specific military context should be decided in a participatory and deliberative process with different stakeholders.

Stakeholder inclusion

Conceptual considerations include a deliberate process for stakeholder selection. The VSD method does not give much guidance on including stakeholders, and to fill this gap we resort to Critical Systems Heuristics (Ulrich, 1996; Ulrich & Reynolds, 2010), which includes a method for stakeholder selection. According to Critical Systems Heuristics, researchers need to deliberately choose a certain stakeholder to be the focal stakeholder in their research. This is a conceptual step prior to the question of how to ‘harvest’ values from stakeholders and how to make values explicit. We identify the following potential stakeholders in the design of autonomous systems in defence: decision makers that decide to deploy the systems, soldiers that work with or alongside autonomous systems, developers and software engineers, national citizens of the countries that develop military autonomous systems, citizens of hostile nations that are most likely to encounter autonomous systems activity, politicians, lawyers, societal and non-governmental organizations, the international community, industry and businesses. Ulrich and Reynolds distinguish between involved and affected stakeholders (Ulrich & Reynolds, 2010). Involved stakeholders are those that have a say in choices regarding the system and that can affect its design (e.g., through their expertise, through financial or legal power, or through motivational power, as is the case with clients). Affected stakeholders do not have an active say in design or operation but bear the consequences once the system is in place. Involved stakeholders could be politicians or industry partners, whereas affected stakeholders could be soldiers or citizens of hostile nations. The distinction between affected and involved stakeholders is important, as it directly influences the design process by taking into account what is most important to a certain (group of) stakeholders. How an autonomous system technology is designed depends on which, or whose, basic point of concern is chosen. For example, if the autonomous system is designed in such a way that it maximizes the value of protection of soldiers that work with the technology, the system will behave differently than when it is designed with the highest concerns for bystanders in mind. In the first case, systems design might favour false positives over false negatives in threat detection algorithms, whereas when bystanders are the highest concern, false positives may not be allowed under any circumstance, resulting in threshold settings that increase the chance of false negatives.
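To make the threshold trade-off concrete, below is a minimal, illustrative sketch in Python; the threat scores, labels and thresholds are invented for illustration and do not come from any actual threat detection system discussed in this paper.

# Illustrative sketch: how a single threshold setting trades false positives
# against false negatives for a hypothetical threat-detection score.
def confusion_counts(scores, labels, threshold):
    """Count false positives and false negatives for a given threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn

# Hypothetical threat scores (0..1) and ground truth (1 = real threat).
scores = [0.10, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 1, 1]

# Soldier-protection stance: low threshold, more false positives tolerated.
print(confusion_counts(scores, labels, threshold=0.30))  # (2, 0)
# Bystander-protection stance: high threshold, more false negatives accepted.
print(confusion_counts(scores, labels, threshold=0.70))  # (0, 2)

The same detector thus behaves very differently depending on whose protection is prioritized; the threshold is a value-laden design choice, not a purely technical one.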

The context of our research is Australia, and since we are looking for context specific values, we limited the empirical data gathering to Australia, with Australian citizens as an important stakeholder. This means that potentially affected stakeholders that we identified above, such as citizens of foreign countries, were excluded from the research. Unfortunately, we were also unable to include important stakeholder groups within the context of Australia, such as those who identify as Aboriginal and Torres Strait Islanders (who might have values particularly relevant to how systems affect land, sea or sacred places), people with physical or mental impairments (who might value alternative means to communicate with and behave around autonomous systems), and members of the Australian Defence Force (who will work alongside these systems). The researchers identified these groups as highly relevant stakeholders; however, due to tight restrictions in the human research ethics approval we were not allowed to explicitly invite or identify people from Indigenous, impaired or ADF communities.Footnote 5

The conceptual considerations above suggested definitions, relevant ethics frameworks and a method for stakeholder inclusion. They all inform the empirical considerations that we present in the next step. Systems definitions are important for communicating to stakeholders what the technologies being designed entail. Existing ethics frameworks serve as inspiration and demarcation of relevant high-level values guiding the design of autonomous systems. Stakeholder selection and justification form an important step prior to empirical data gathering, and a method was suggested in the final section of the conceptual considerations.

Empirical considerations

VSD does not prescribe the use of specific tools or methods for empirical data gathering. In our research we suggested and used the following empirical methods:

  1. Semi-structured interviews to gather values from the initial stakeholder group representatives

  2. Group Decision Room (GDR) to elicit values from initial stakeholders and to explore how values can inform design choices in the case of autonomous systems for defence

  3. Participatory Value Evaluation (PVE) as a method to invite the wider society (unorganized affected stakeholders) to make value trade-offs and value prioritizations.

The results of the semi-structured interviews and the GDR have been presented in detail in Boshuijzen-van Burken (2023a, 2023b). In this paper we discuss the general findings of the PVE, but we summarize the GDR step below because it served a key role in the design of the PVE.

A GDR is a helpful way to harvest stakeholders’ values so they can be incorporated into the design of autonomous systems. A GDR can best be described as an anonymous computer mediated focus group, where people may physically sit together or participate online, but each individual contributes anonymously to the discussion through an online interface. Hierarchy, personalities (introvert versus extrovert), power relations, etcetera, are likely to have less influence on the discussion than in a traditional oral meeting. An ideal number of participants in a GDR is between 15 and 20 people. The GDR that we conducted for this research consisted of 17 participants and was fully online. Participants included hardware and software engineers from industry, legal scholars, ethicists, and representatives from NGOs such as the International Committee of the Red Cross (ICRC) and the Older Persons Advocacy Network (OPAN). The results of the GDR do not qualify as social science data in the strict sense, as we do not aim to derive statistically significant inferences or conclusions from the data, since it concerns a small group of representatives; nonetheless, the results of a GDR can be considered reliable as a group outcome (Gray, 2008; Kolfschoten et al., 2011).

PVE is a value elicitation method developed in the economic sciences. It aims to map values in a large and diverse group of citizens. PVE has been used for large scale value mining regarding, e.g., flood protection (Mouter et al., 2021a, 2021b), COVID-19 measures (Mouter et al., 2022; 2021a, 2021b), healthcare funding (Rotteveel et al., 2022) and climate change mitigation (Itten & Mouter, 2022). The essence of a PVE is that participants can give advice to a decision-maker. Participants are effectively placed in the seat of a decision-maker. Participants are invited into an online environment where they (a) see which options the decision-maker is considering, (b) see the concrete impacts of those options, and (c) make choices within given constraint(s). Subsequently, participants are asked to explain their choices. Individuals’ preferences over (the impacts of) options can be determined by feeding these choices into behaviourally informed choice models and can, for instance, be used to rank options in terms of their desirability. In this case we used the PVE as part of a VSD research project. The strength of the method is that participants are not presented with a Likert-scale list of options on which they can mark everything they find important, as in a utopian world, but are forced to make trade-offs, which reveals what people value under non-ideal, real-world circumstances.
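As a rough illustration of this logic, the sketch below (in Python, with invented option names, costs, budget and choices) aggregates budget-constrained selections into a simple popularity ranking; an actual PVE analysis would feed the choices into behaviourally informed choice models rather than rely on raw counts.

# Minimal sketch of the PVE logic: options with costs, a budget constraint,
# and a crude popularity ranking over valid submissions. All figures invented.
from collections import Counter

OPTIONS = {  # option -> hypothetical cost
    "human_assisted_decision_making": 50,
    "extra_sonar_sensors": 30,
    "australian_supply_chain": 25,
    "extra_batteries": 20,
}
BUDGET = 60

def is_valid(selection):
    """A submission counts only if the chosen options fit within the budget."""
    return sum(OPTIONS[option] for option in selection) <= BUDGET

# Hypothetical submissions from three participants.
submissions = [
    {"extra_sonar_sensors", "extra_batteries"},
    {"human_assisted_decision_making"},
    {"extra_sonar_sensors", "australian_supply_chain"},
]

counts = Counter(option for s in submissions if is_valid(s) for option in s)
print(counts.most_common())  # ranks design options by how often they were chosen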

Participatory value evaluation preparatory work and set-up

The GDR served as a preparatory step to inform the PVE. Two example autonomous systems that were suggested by participants in the GDR (see Boshuijzen-van Burken 2023a, 2023b for a detailed description) were used in the PVE set-up. The first one is the design of an autonomous underwater mine sweeper and the second one is an armed drone. The GDR took place fully online via Teams and Mural software. Participants wrote on a digital whiteboard and digital sticky notes. One step in the GDR was the ‘brain dump of values’, where we asked participants to list all values that they deemed relevant to autonomous systems in general. The next step was that participants matched up an example autonomous system (e.g., an autonomous submarine) with the most important values for that example system. They were also asked to indicate clashes between values and to distinguish between ‘must have’ and ‘nice to have’ values. Finally, they were asked to translate each value into a norm and a required behaviour or design requirement for the system, after we had briefed them on Van de Poel’s value hierarchy (Van de Poel, 2013). For the PVE we selected the two examples of the mine clearing submarine and the lethal drone, first, because participants had provided a relatively rich set of values and design requirements for them in the GDR session. Second, the two examples represent both offensive and defensive autonomous systems, and in this way we may be able to see whether participants make different value trade-offs when it comes to offensive and defensive situations. We summarize the findings from the GDR below and refer to Boshuijzen-van Burken (2023a, 2023b) for more details on the methodology and results of this initial stakeholder engagement process.

For the design of the autonomous mine counter submarine, GDR participants suggested:

  • Reliability

  • Protection of marine life and non-combatants

  • Transparency

  • Accuracy

  • Cost

  • Safety

GDR participants provided the following values for the design of the autonomous drone that bombs targets:

  • Trust

  • Distinction

  • Control

  • Reliable

  • Accountable

  • International humanitarian law (IHL)

  • Explainability

  • Awareness of limits

  • Traceability

The values suggested in the GDR were used to inform the design of the PVE, together with conceptual values from the sources mentioned in the “Existing frameworks” section. The PVE consisted of two design tasks that were based on Van de Poel’s (2013) value hierarchy for implementing values into technical design (Van de Poel’s value hierarchy is explained in detail in the “Technical considerations and operationalization of values” section). We note that one value can be realized via multiple design options and, vice versa, that a design option can represent more than one value. Most of the value translations below were suggested by GDR participants.

The first case was an autonomous underwater mine sweeper (see Fig. 1). PVE participants were asked to make choices between four different design goals that reflect values: (1) they could choose to increase human assisted decision-making, to promote the value of human control and accountability; (2) they could choose to add sonar sensors, to promote sensitivity and the value of distinction relevant to IHL; (3) they could choose to increase use of Australia’s supply chain, to promote Australia’s self-sustainability; and (4) they could choose to increase the number of batteries of the autonomous system, to promote reliability and durability of the system.

Fig. 1 Screenshot of design task for autonomous mine clearing submarine

The second design task is an autonomous drone that can drop bombs (see Fig. 2). Participants could make choices between seven different design features: (1) additional battery power, to promote trust and reliability of the system; (2) Video recordings of bombings, to promote accountability and traceability; (3) human assisted decision-making, to promote control and accountability; (4) geographical restriction, to prevent misuse and promote distinction and IHL obligations; (5) a swarm mode, to promote effectiveness; (6) adapter clasp for seed spreading, to promote sustainability; and (7) an emergency communication system to promote reliability and control.

Fig. 2 Screenshot of design task for armed autonomous drone

An important aspect of the PVE is that participants make decisions under constraints, i.e., they are forced to make value trade-offs. In this case, we used a monetary constraint in the shape of a budget. Participants could only submit their design choices if they had stayed within the budget. The costs associated with the various design options were not random choices but were set in consultation with a number of subject-matter experts, in order to provide participants with realistic monetary trade-offs. For example, the option for human control included costs for human training, a pay scale at a certain education level, a stand-by controller/back-up, shift loadings, etcetera. Both PVEs used a monetary restriction that allowed participants to choose around 25% of the total available options.
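As an illustration of how such a constraint can be operationalised, the sketch below assembles a hypothetical cost for the human control option from invented cost components and sets a budget at roughly a quarter of the summed option costs; none of the figures are those used in the actual study.

# Illustrative only: building an option cost from components and deriving a
# budget that permits roughly 25% of the options (assuming comparable costs).
human_control_cost_components = {
    "operator_training": 15,
    "operator_salary_at_required_pay_scale": 20,
    "standby_backup_controller": 10,
    "shift_loadings": 5,
}

option_costs = {
    "human_assisted_decision_making": sum(human_control_cost_components.values()),
    "video_recordings_of_bombings": 15,
    "geographical_restriction": 10,
    "swarm_mode": 45,
}

budget = round(0.25 * sum(option_costs.values()))
print(option_costs["human_assisted_decision_making"], budget)  # 50 30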

Each design task was accompanied by a hypothetical yet realistic sample scenario designed in collaboration with subject matter experts, to provide participants with a potential use case of the system.

For the submarine the following was presented:

  • Imagine you are designing an autonomous submarine for clearing mines at the bottom of the sea.

  • When can such autonomous submarines be used?

For example, hostile foreign military forces may sabotage Australia’s offshore underwater internet cables. They could leave a trail of sea mines in the access route to the cables, to slow down repair attempts. When this happens, many Australians will not be able to use internet communications for a long time.

  • What are the features of the autonomous submarine that you are designing?

  • It is an unmanned vessel that can cruise under water.

  • It uses sonar sensors to detect sea mines and then blows them up all by itself.

  • It has battery power to autonomously cruise for 3 days before the batteries need to be recharged.

For the drone the following scenario was presented:

  • Imagine you are designing autonomous drones for dropping bombs on targets.

  • When can such drones be used?

An example is when a hostile foreign military force lands troops on Australian shores and takes over an Australian port. The drones you are designing can find the exact location of targets, they can assess the situation, and they can carry out an attack all by themselves. Drones can destroy enemy soldiers and equipment while minimising casualties to Australian soldiers and civilians.

  • What are the features of the autonomous drone for dropping bombs on targets that you are designing?

  • It is an unmanned aircraft that does not need a remote pilot for flying and navigation.

  • The drone you are designing is trained to autonomously detect and strike targets with precision (it strikes the intended target in 98% of cases).

  • The drone that you are designing can carry a bomb that can strike a medium-sized target, like a vehicle.

Results

Participants in the experiment were sampled from an online panel (Dynata), with a view to being representative of the Australian population in terms of age, gender and education. The Human Research Ethics Committee of UNSW approved our study protocol (HC220732). The experiment ran from 13 March 2023 to 22 May 2023, and a total of 1980 participants completed the PVE. We limit ourselves here to findings at a group level as they relate to value prioritizations. We discuss diversity in value patterns and diverging perspectives amongst participants elsewhere (Boshuijzen-van Burken 2023a, 2023b, 2024).

An important finding is a discrepancy with what some theories suggest, namely that moral values are universally shared and held in similar regard (Schwartz, 1992). In contrast, the PVE shows that normative diversity is the rule rather than the exception. In both PVE tasks, there are no design choices that are chosen by everybody, and no options that are chosen by nobody. See, for instance, Table 2, which shows the most chosen combinations of options in the case of the autonomous armed drone. Each row represents an often-combined set of options (noting that some options were chosen as a single choice, so there is no ‘combined set’) and the percentage of participants selecting that combination. Although there is great diversity in the data, it shows that some options are preferred over others.

Table 2 Overview of participants’ design choices for armed autonomous drone

There are strong voices in the public debates about autonomous technology that call for ‘humans in the loop’ to promote human control and accountability (Bode, 2023; ICRC, 2021). However, this study shows that human control is an important but not a dominant value when people weigh it against other design choices which they value (and which are less expensive). For example, in the case of the autonomous underwater mine counter vessel, human control is the design choice that is most often not selected. It also seems less relevant to participants, as one participant explains their choice: “Minimal risk of harming people, so no strong requirement for a human in the loop.” In the armed drone choice-task the option for ‘human assisted decision-making’ is the fourth most popular option out of seven. That said, it is not as unpopular as the options that increase sustainability (seed clasp) and effectiveness (swarm mode) for the armed drone. In the case of the armed drone, human assisted decision-making is also the most expensive option. Comparing human assisted decision-making to the other most expensive option (‘swarm mode’), human assisted decision-making is almost three times more popular (28% versus 10%). And when the average share of available budget allocated to the options is considered, it is the option on which participants spent the most budget.

Furthermore, design choices that increase reliability are on average preferred over the promotion of human control options in both choice-tasks. In the mine counter submarine choice-task we observe that on average participants value increasing reliability over human control. A majority of participants (81%) spent at least some extra budget on increasing the reliability of the submarine. In the armed drone choice-task, the option to build in ‘redundancy in battery power’ is more often chosen than ‘human assisted decision-making’. Participants motivated their choice by referring to the distances a drone would have to travel, as one participant stated: “Short range is generally fine, except that in an Australian context 2 h flying is nothing.”

Also, one can argue that the promotion of the value of accountability is a driver of participants’ choices. In the case of the armed drone, design choices that increase accountability together account for 22% of the total budget spent. Both ‘human assisted decision-making’ and ‘video recordings of bombings’ are popular options. In the top ten combinations of choices, six out of ten contain one or both of these options. Furthermore, in the written motivations for these choices people state that the promotion of accountability is at least part of the reason for choosing the options for ‘human assisted decision-making’ and ‘video recordings of bombings’. One participant who chose ‘human assisted decision-making’ stated: “If people are going to die from this, I want there to be a judgement call made, not just an algorithm.” Another participant who selected the recording option motivated their choice by stating: “To ensure accountability for drone actions and to allow technology redesign.”

Implications

What do these findings suggest for the design of autonomous systems in the military? First, a ‘golden list’ of values that designers of autonomous systems for defence may use to guide the design of a system is helpful at a high level, but it may not be the best way forward for those who are faced with the question of actual systems design. There exists a variety of use contexts and concrete autonomous systems that are best served by a more granular approach to values put forward by stakeholders. The PVE showed that people choose different values for systems that operate in different conditions or for different purposes. It is unclear whether the difference in value trade-offs was due to the fact that one system’s context was underwater, with little likelihood that people are present, while the other system operates in the air and likely in the vicinity of people, or due to the fact that one system served defensive and the other offensive purposes, or whether there is a different reason altogether. What is clear is that different contextual or artifactual risk perceptions give rise to different value prioritizations and combinations. Finally, the highly debated value of human control over decisions that autonomous systems make is an important value, but people gave less priority to this value in the case of the autonomous mine counter vessel, possibly because there is a lower risk of harming people, and the value of system reliability is often chosen over human control.

Technical considerations and operationalization of values

In this section we provide suggestions for translating values into technical considerations, which is the final VSD step. We start with Van de Poel whose work can be applied to any technological design (we followed his method ourselves in the GDR and PVE set-up), followed by a military example and we end with an exploration of value-based methods tailored to (autonomous) systems design.

Van de Poel (2013) introduces the notion of a value hierarchy, meaning a hierarchical structure of values, norms and design requirements (see Fig. 3).

Fig. 3 Van de Poel’s Value Hierarchy (Van de Poel, 2013, p. 259)

The upper layer in Van de Poel’s pyramid consists of values, while the bottom layer represents the most concrete layer of design requirements, with an intermediate layer of norms between them. It should be noted that ‘norms’ have no univocal meaning, and depending on the disciplinary context there exist ethical norms, technical norms, social norms, legal norms, aesthetic norms, etcetera. In this case, a norm is at a minimum the lead-up to technical design requirements, and Van de Poel uses ‘norms’ to refer to all kinds of prescriptions for, and restrictions on, action. Van de Poel pays specific attention to end-norms, which are norms referring to an end to be achieved or strived for (cf. Richardson, 1997, p. 50). In design, end-norms may refer to properties, attributes or capabilities that the designed artefact should possess. “Such end-norms may include what sometimes are called objectives (strivings like ‘maximize safety’ or ‘minimize costs’ without a specific target), goals (that specify a target such as ‘this car should have a maximum speed of 150 km/h’), and constraints (that set boundary or minimum conditions).” (van de Poel, 2013, p. 258). Van de Poel’s value hierarchy is helpful in the context of our project on including values in autonomous systems design. The options that were available to participants in our PVE can in many cases be considered end-norms, for example the objective that a human should be ‘in control’ without specifying the degree of control, the goal that ‘this submarine should have 40% of its parts from Australia’, or the constraint that the armed drone cannot fly over busy places and is therefore geographically restricted to certain areas.
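As a minimal sketch of how this hierarchy can be represented in software, the Python fragment below encodes values, end-norms (tagged as objective, goal or constraint) and design requirements, using examples from our PVE; the class names and fields are our own illustration, not part of Van de Poel’s framework.

# A value hierarchy sketch: value -> norm(s) -> design requirement(s).
from dataclasses import dataclass, field

@dataclass
class Norm:
    statement: str
    kind: str  # end-norm type: "objective", "goal" or "constraint"
    design_requirements: list = field(default_factory=list)

@dataclass
class Value:
    name: str
    norms: list = field(default_factory=list)

human_control = Value(
    name="human control",
    norms=[Norm(
        statement="a human should be able to intervene in targeting decisions",
        kind="objective",
        design_requirements=["human assisted decision-making interface"],
    )],
)

distinction = Value(
    name="distinction (IHL)",
    norms=[Norm(
        statement="the drone must not operate over densely populated areas",
        kind="constraint",
        design_requirements=["geographical restriction (geofencing)"],
    )],
)

for value in (human_control, distinction):
    for norm in value.norms:
        print(value.name, "->", norm.kind, "->", norm.design_requirements)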

Van de Poel’s suggestion of end-norms is a helpful intermediate for bridging the gap between values and design. We suggest extending Van de Poel’s notion of end-norms to include the good(s) (cf. MacIntyre, 1981) that an entity such as an organization, artifact, or practice is realizing. We argue that end-norms should be understood in a broad context, to include the good(s), sometimes called the ‘destination’ that the entity helps to realize, or the ‘qualifying norm’ of the practice (Verkerk et al., 2015). This goes beyond mere goals and includes the entity’s ‘reason for being’. It is about the broad vision of what the context of use is ultimately about and what it is good for, in the broadest possible sense. For example, designing an autonomous system for detecting diseased cells in a medical healthcare context gives that system a destination beyond the goal of ‘count diseased cells’, namely ‘contribute to patient care’. Should that autonomous system be designed for the context of academic medical research, where detection of diseased cells should contribute to the destination of ‘contribute to theoretical knowledge’, it may affect the design requirements, for example thresholds for counting a cell as diseased (Kraemer et al., 2011) and offsets, interface design, data handling, etcetera. We deem the inclusion of context relevant for the case of military systems design. As Van Burken and De Vries (2012) have argued, the reason for any technology in use by the military should be to contribute to the promotion of justice. In other words, when translating values into norms and then into design requirements, it is important to keep an eye on the bigger picture. Autonomous systems in defence are ultimately designed for a particular ‘good’ or ‘destination’, namely the promotion of justice.Footnote 6 In particular where systems have a role in the use of force, these systems should always function under this guiding norm. Attention to the good(s) (cf. MacIntyre) can overcome the challenges pointed out by Danks (2022), who argued that engineers should have an understanding of what the application does and should investigate foundational questions starting from the application context.

A military example of values implemented in technology is the design of the safety pin on a rifle. The value of safety leads to a norm ‘no bullet should accidentally leave the barrel’, which leads to a design requirement ‘add a pin to manually block or release the trigger’. Different weapons safety mechanism designs exist, each, as it seems, having a different use or user in mind and weighing value conflicts differently, for example favouring a quick release mechanism over deliberate delay (see Boshuijzen-van Burken 2023a, 2023b for an elaboration on this example).

Literature on operationalizing values in the context of autonomous systems has not been overwhelmingly convincing (Mittelstadt, 2019; Morley et al., 2020; Prem, 2023). Shahin et al. (2022) examined literature on the translation of values into design requirements for the case of software engineering in general. They found that most of the proposed techniques were poorly supported by a ‘tool’, and that the tools were not validated in real-life projects. According to them, “(a) It takes years before such innovations are commercially feasible and reach the industry (this is a general trend in software engineering research); (b) It would be challenging to prove their efficacy and efficiency in real-life projects; (c) Since these tools are perceived as ‘cooked-up’ in research labs rather than co-developed with the practitioners, tools uptake requires a disruption rather than diffusion of the innovation.” (Shahin et al., 2022). They concluded that the most promising way forward is to utilize value-based approaches with a degree of co-creation. A good example of a value-based approach that includes stakeholders is the VCIO (Values, Criteria, Indicators, Observables) model (AI Ethics Impact Group, 2020). VCIO is an interdisciplinary framework to operationalize AI ethics. It offers a guide to incorporating values into algorithmic decision-making and a way to measure the fulfilment of values using criteria, indicators and observables, combined with a context dependent risk assessment. The authors pay explicit attention to the “context-dependency of realising ethical values, the sociotechnical nature of AI usage, and the different requirements of different stakeholders concerning the ‘ease of use’ of ethics frameworks” (AI Ethics Impact Group, 2020, p. 6). The authors advocate combining high-level, context-independent values (such as justice) with a classification of different user contexts and stages in AI development, distinguishing between process requirements and system requirements. Another good example is the Waatu tool, which is meant to support organizations to collaboratively plan, build and manage AI and autonomous systems projects responsibly (IVAI, 2022). The tool takes a whole-of-organisation approach with careful attention to stakeholders, ethics and responsibility.
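To give a flavour of the VCIO structure, the following sketch maps a value to a criterion, an indicator and an observable with a simple threshold check; the concrete entries and figures are invented for illustration and are not taken from the AI Ethics Impact Group’s actual tables.

# Sketch of a VCIO-style mapping: value -> criterion -> indicator -> observable.
vcio = {
    "transparency": {                              # value
        "traceability of decisions": {             # criterion
            "logging of system decisions": {       # indicator
                "observable": "share of decisions with a stored log entry",
                "measured": 0.92,
                "threshold": 0.90,
            },
        },
    },
}

def criterion_met(value, criterion):
    """A criterion is met when every indicator meets its observable threshold."""
    indicators = vcio[value][criterion]
    return all(ind["measured"] >= ind["threshold"] for ind in indicators.values())

print(criterion_met("transparency", "traceability of decisions"))  # True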

The best attempt at a similar tool for the development of autonomous systems in the defence context is the work done by Trusted Autonomous Systems (TAS), in the form of the Responsible AI for Defence (RAID) Toolkit (TAS 2023b). It is designed for industry, to facilitate communication with Australian Defence about ethical and legal risks of their technologies and to point them towards relevant stages of the acquisition process. It consists of an AI Checklist (a quick set of questions to better understand the AI capability and to prompt risk identification), a Risk Register (a spreadsheet to record, track and allocate ethical and legal risks) and a Legal and Ethical Assurance Program Plan (LEAPP) (a more detailed consideration of ethical and legal risks, particularly useful in the lead-up to an Article 36 weapons review).

Our research could be integrated into existing tools such as the RAID toolkit by paying explicit attention to the values we found in the PVE when going through the toolkit’s section on “tracking ethical risks”, while noting that value integration and the resolution of value conflicts are always context dependent.

A final remark in this section on the operationalisation of values concerns value conflicts, which will likely occur when trying to operationalise values. Often, the moment of translating values into norms is when a conflict becomes visible. For example, the norm ‘anyone should be able to see which data is used’ (for the value of transparency) conflicts with the norm ‘no one should have their data exposed’ (for the value of privacy). Different ways to resolve value conflicts exist, for example top-down or bottom-up resolutions (AI Ethics Impact Group, 2020). A top-down approach considers some values optional, and these can be violated if needed, while there are higher-level values that should never be violated. A bottom-up approach takes the situation at hand as a point of departure for deciding a conflict of values, where the urgency of the concrete matter is decisive; a violation of some values is allowed, so long as the value that is preserved saves the matter altogether. We propose a third and a fourth way. The third way goes back to the qualifying norm that we mentioned before, as this is the guiding norm for the practice in which the autonomous system is envisioned or deployed. Value conflicts should be resolved by considering the qualifying norm above all other norms. For the military context of use, all value conflicts should be viewed in light of the question of which value best supports the qualifying norm, namely the promotion of justice. The fourth way is inspired by Critical Systems Heuristics, which provides methodologies that could aid in solving value clashes by reiterating the roles and responsibilities of the different stakeholders and deciding whose values should be justifiably prioritized in a technological design. Finally, rather than resorting to value prioritization in value conflicts, these conflicts may be resolved via innovative means in which apparently conflicting values are both honoured through a (socio-)technical intervention or innovation (Van den Hoven, 2013).

Discussion

One of the critiques of VSD is that proposed values are often formulated by researchers and designers of technology, rather than by the end-users or stakeholders of technology. We were unfortunately unable to include key end-users (ADF personnel) due to human research ethics restrictions; however, we showed a way forward using a PVE that included Australian citizens as a stakeholder group. We realize that value mining from society at large may not be available to everyone, so a suggestion for an alternative way to mine societal values is through social media content analysis. A parallel investigation of the conceptual and empirical work may be done via social media and publicly available sources, such as blogs and newspaper articles. The relatively large amount of attention given to military autonomous systems in media and society makes them a good source for value mining. Such content analysis can be done through human efforts of coding, clustering and analysing, but it can also be done by computer-based content analysis software (Su et al., 2017). Briggs and Thomas (2015) describe a helpful manner of tapping into social media for value-sensitive design research, which could be adopted for our case of VSD in autonomous systems in defence. Boshuijzen-van Burken et al. piloted a Python-based text analyser to mine texts for ethically relevant information (Boshuijzen-van Burken et al., 2022), which may be further developed for mining values from professional literature and public outlets about autonomous systems in the military.
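As a minimal sketch of such computer-based value mining, the Python fragment below counts value-related terms in a text using a small, invented lexicon; the actual analyser piloted in Boshuijzen-van Burken et al. (2022) is more elaborate than this.

# Keyword-based value mining sketch: count mentions of value-related terms.
import re
from collections import Counter

VALUE_LEXICON = {
    "accountability": {"accountable", "accountability", "responsibility"},
    "human control": {"human control", "human in the loop", "oversight"},
    "reliability": {"reliable", "reliability", "robust"},
}

def mine_values(text):
    """Count occurrences of lexicon terms per value in a document."""
    lowered = text.lower()
    counts = Counter()
    for value, terms in VALUE_LEXICON.items():
        for term in terms:
            counts[value] += len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
    return counts

sample = ("The public expects a human in the loop and clear accountability "
          "when reliable autonomous systems are deployed.")
print(mine_values(sample))  # Counter({'accountability': 1, 'human control': 1, 'reliability': 1})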

Our research was limited to value inclusion at the level of autonomous technology (the artifact), but we concur with Hussain et al. (2022) that a whole-of-organization approach is needed for the inclusion of ethical and societal values in military autonomous systems. Hussain et al. suggest intervention points at the level of artefacts, roles, ceremonies, practices, and culture, to ensure human values are considered as primary artefacts in software development practice.

Finally, VSD of technology for deployment into military practice comes with some specific challenges related to the realities of this practice. Recognizing these challenges is important for successful design and uptake in the military. One of the challenges is adversarial threats, meaning that in the defence context there are others explicitly looking for vulnerabilities potentially or accidentally aggravated by a VSD method. A concrete example is that autonomous systems rely on network availability; adversarial thinking forces design solutions that take into account that vital connections may be unavailable at critical moments. A sensitivity to this and other organizational realities is key for a successful translation of values into design. Another challenge is that the military consists of several domains (maritime, land, air, cyber and space) and sub-practices (logistics, reconnaissance, manoeuvre, psychological operations, counter mining, air defence, etcetera), which means that defence systems may vary greatly and that a sensitivity to these differences is important when designing autonomous systems. Yet another challenge is the reality of joint military operations, where allied partners work alongside one another or where military and non-military parties work together, such as in Maritime Border Command or National Operations. Allied partners may prioritize values in a different manner and may have designed autonomous systems in a way that replicates their own societal values, which may misalign amongst allies. Even if values are harmonized at an abstract level between allied nations, such as those presented in Table 1, there may be deviations in the translation to norms and design requirements, similar to differences in operational procedures (for example, in some countries the force escalation procedure includes firing a warning shot or shooting to incapacitate, while in other countries soldiers are not allowed to fire other than to kill an adversary). In the case of working alongside allied partners, the way values are translated into design requirements may differ greatly, resulting in different system behaviours. Similarly, when systems are developed elsewhere, there may be a mismatch with the values of the nations that acquire and deploy these systems.

A suggestion for further research is to include members of the defence organization(s) as explicit stakeholder group, as they are best placed to decide on values, norms and design requirements that take the above listed challenges into account.

Conclusion

We enhanced VSD with a novel method, namely PVE. By doing so we contributed to the existing body of knowledge around VSD. The values listed in this paper are not limited to stakeholder groups that have an active voice on either side of the autonomous systems debate, such as academics, policy makers and pressure groups, but include the voice of developers, industry, and the ‘silent majority’ of Australian citizens. Our research suggests that value prioritizations differ depending on the context of use (including but not limited to air versus underwater and offensive versus defensive use) and that no one value fits all autonomous systems. General high-level frameworks can serve as guiding principles, but when it comes to actual technologies, a much more nuanced and granular approach to the incorporation of values is needed. We suggested that VSD, with its distinctive conceptual, empirical and technical steps, provides a helpful methodology for including values in systems design.

Our research contributes to a greater awareness of the importance of including values in autonomous systems design in the context of defence. It presented a set of values that are of particular relevance to the Australian context. It suggested relevant frameworks and methods for values-based autonomous systems design for defence in the context of Australia (which may serve as a blueprint for other countries), and finally, we provided suggestions on how to operationalise values in systems design, as this is underrepresented in the current literature.