The decisions we make are typically influenced by our principles and values, or our ethics. Ethics guide us toward what we believe to be the best course of action. This decision-making process is particularly important in the context of software development, as our society relies on software for critical systems and increasingly relies on artificial intelligence (AI)–driven software in these systems. Ethical decisions made by the developers of these systems can therefore have amplified impacts. Thus, it is essential to understand and support ethical practices in software development.

Rarely do software developers actively think about or openly discuss whether their actions are ethical. However, whether or not developers consider ethics when making decisions, their decisions often have quite tangible impacts on society that regularly capture media attention. By examining the software development process through the philosophical lens of ethics, we can better understand the types of decisions that software developers make and how to better support them in refining and applying their ethical principles.

We argue that diversity, equity, and inclusion are three core principles that ought to be considered under the umbrella of ethical software development. If we encourage ethical practices, we are inherently making foundational strides toward these principles. In this chapter, we bring these aspects to the forefront of our discussion of ethics in software development. More specifically, we will

  • Examine a case study that demonstrates the importance of ethics in an AI-driven software development environment by highlighting the potential harms when unethical software persists.

  • Explore ethics from a philosophical perspective and discuss how existing frameworks can be applied in the context of software development.

  • Discuss existing efforts in understanding and supporting ethical decision-making in software development, specifically how software developers think about the role of ethics in their decision-making and signals they use to indicate when and how they are thinking about ethics.

  • Outline existing tools and techniques that support ethical decision-making and other ways we can work toward explicitly considering the ethics behind decision-making throughout the process of building and maintaining software.

Making a Case for Ethics

There are many examples that help emphasize the importance of making explicit ethical considerations when building technology [9, 13]. Anecdotes range from the “Dieselgate” scandal where Volkswagen vehicles were programmed to evade emission regulations [12] to racial bias in algorithms and software used in criminal justice [1, 5, 8, 21]. To illustrate the value of explicit ethical considerations, in this section we will examine one of these many case studies. Namely, we will examine the work done by Obermeyer and colleagues to identify racial bias, and ultimately improve equity, in healthcare tech [10, 14].

Despite the numerous examples of tech gone wrong, more and more we are seeing technology injected into everyday societal interactions and decision-making processes. There is no silver bullet to solving the problem of biased technology. However, there are actionable ways we can work toward reducing and eventually eliminating inequitable outcomes.

“Dissecting Racial Bias”

In 2019, Obermeyer and colleagues identified bias against Black patients in a diagnostic and treatment algorithm used widely across the United States [14]. Collaborating with an academic hospital, they collected and analyzed the algorithmic risk scores for 6,079 patients who self-identified as Black and 43,539 patients who self-identified as White. Their data covered 11,929 and 88,080 patient-years, respectively, where 1 patient-year refers to the data that was collected from one patient in one calendar year.

The data and various analyses conducted by Obermeyer and colleagues pointed to a higher need for healthcare among Black patients. In other words, they found that Black patients exhibited more “illness burden,” or a greater incidence rate of chronic illness. However, their analyses also found that the algorithm in question erroneously assigned similar risk scores to ill Black patients and healthier White patients.

The root cause of this bias was the fact that the algorithm used healthcare costs as a proxy for health risk. Black patients with higher risks for chronic illness had similar healthcare costs when compared with healthier White patients. One explanation for this was a correlation between race and income. Black patients in the study were more likely to have lower incomes and thus less likely to have access to and engage with the medical system (even when insured). This lower engagement has been studied and linked to factors such as reduced trust and differences in access to healthcare, among others.

So what action can we take to help balance the scales?

Seeking a Solution

The results of the studies conducted by Obermeyer and colleagues point to the impact decision-making can have on technological outcomes. Because cost was chosen as the prediction target, and Black patients generate fewer medical expenses on average, even accurate predictions yielded inequitable outcomes.

While using cost prediction as a proxy for health risk seems reasonable, according to Obermeyer and colleagues we could and should be doing more. One suggestion was that it may make more sense to focus on “future avoidable costs,” that is, those associated with emergency care and hospitalization. Alternatively, we could move away from predicting healthcare costs altogether and instead use a more direct measure of health, such as the number of active chronic health conditions.

To better understand the potential for label choice to reduce bias in this case, Obermeyer and colleagues conducted a series of experiments that started with the development of three new models. These models were used to predict the following outcomes:

  • Total cost in year t: Overall, how much will patients spend on healthcare in a year?

  • Avoidable cost in year t: These are just the costs associated with emergency room visits and hospitalizations.

  • Health in year t: This is determined based on the number of chronic condition flare-ups in year t.

They trained each model on a random two-thirds subset of their data and tested their models on the remaining one-third. They also excluded race from the training data. The three models all performed reasonably similarly at predicting their respective outcome variables (total cost, avoidable cost, and health in year t). However, depending on the model chosen, the composition of the highest-risk group varied drastically. For the models trained primarily to predict the healthcare costs of patients, Black patients comprised 14.1% of the highest-risk category. On the other hand, for models trained primarily to predict chronic conditions, Black patients comprised 26.7% of the highest-risk category.
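To make the shape of this label-choice experiment concrete, the following is a minimal sketch of how one might train models on different outcome labels and compare who lands in the highest-risk group. It is purely illustrative: the synthetic data, feature names, model choice, and top-score cutoff are assumptions made for the example, not the pipeline used by Obermeyer and colleagues.

```python
# Illustrative sketch of a label-choice comparison; not the original study's pipeline.
# All data, features, and thresholds below are assumptions for the example.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
patients = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "num_visits": rng.poisson(3, n),
    "race": rng.choice(["Black", "White"], n),
    # Three candidate labels a risk model could be trained to predict:
    "total_cost": rng.gamma(2.0, 1500, n),
    "avoidable_cost": rng.gamma(1.5, 400, n),
    "chronic_flareups": rng.poisson(1.2, n),
})

LABELS = ["total_cost", "avoidable_cost", "chronic_flareups"]
FEATURES = ["age", "num_visits"]  # race is deliberately excluded from training

train, test = train_test_split(patients, test_size=1 / 3, random_state=0)

for label in LABELS:
    model = GradientBoostingRegressor().fit(train[FEATURES], train[label])
    scores = model.predict(test[FEATURES])

    # Compare who lands in the "highest-risk" group under each label choice
    # (here, the top 3% of predicted scores; the cutoff is an arbitrary illustration).
    cutoff = np.quantile(scores, 0.97)
    highest_risk = test[scores >= cutoff]
    black_share = (highest_risk["race"] == "Black").mean()
    print(f"{label:>16}: Black patients = {black_share:.1%} of highest-risk group")
```

On real data, the point of such a comparison is exactly what Obermeyer and colleagues observed: the models can be similarly accurate on their own labels while producing very different highest-risk groups.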

This variation in outcomes based on label choice could be seen as an insurmountable challenge and inherent feature of learning algorithm use. We see it as an opportunity, as Obermeyer and colleagues did, to be more informed and take intentional action to actively consider the ethical implications of the choice we make while developing software.

Following the publication of these findings, the authors joined forces with the company that developed the original cost-prediction algorithm to make these improvements in practice [10]. While the process is not always easy or straightforward, decision-making is the primary control developers have, regardless of domain, over the outcomes of the technology they produce.

The Philosophy of Ethics: Defining “Ethical”

Notions of ethics have been discussed by philosophers since long before the Obermeyer study, the rise of modern software development, and the increased use of AI. In this section, we will briefly summarize a few of these pre-existing philosophical frameworks and explore how they can be applied in the context of software development. Here, the key question we’ll focus on is one of normative ethics: what constitutes “ethical” behavior? Three prevailing philosophical theories that approach this question are deontology, consequentialism, and virtue ethics.

Deontology: The theory of deontology, or duty ethics, states that behavior can be deemed ethical or unethical by strictly applying some universal set of moral rules or laws, for instance, “do no harm” or “do not steal.” Under this theory, we can disregard situational factors such as an individual’s intent as well as the consequences of their actions, so long as their behavior adheres to a particular set of moral rules. Of course, philosophers disagree over which moral rules should be used and whether they should be derived from a divine power, nature, or some other source. In the context of software development, many organizations, like the ACM, have begun to capture and describe what could be considered a set of domain-specific moral principles. For instance, the ACM Code of Ethics includes principles like “Respect privacy” or “Be honest and trustworthy” (Footnote 1). (For a more thorough discussion of codes of conduct, which have become increasingly pervasive in open source software development, refer to [Chapter 17, “Codes of Conduct in Open Source”].)

Consequentialism: The theory of consequentialism argues that the consequences of our actions are more important than any set of rules or laws. Positive actions are those that result in some benefit to the actor or to society. This theory is tied to notions of utilitarianism, or the argument that actions are good if they maximize benefit for the majority of people. A common critique of utilitarianism is that actions that consistently benefit the majority may also consistently harm minorities. In the field of software engineering, we might view grey-hat hackers (security engineers who identify vulnerabilities in systems, sometimes without prior authorization) as consequentialists. Grey-hat hackers may not subscribe to rules like “do not hack.” Instead, they could argue that the consequences of their actions (revealing potentially harmful vulnerabilities before malicious actors are able to exploit them) provide greater benefit to society.

Virtue ethics: The theory of virtue ethics can be traced back to the ancient Greek philosophers Plato and Aristotle. In contrast with deontology and consequentialism, virtue ethics claims that ethical behavior stems from who we are as people, rather than a set of rules or the consequences of our actions. This perspective argues that it is more important to have and to develop good character over one’s lifetime. Under this framework we might think of individuals as “good” or “bad” software developers.

Understanding Ethics in Practice

In this section, we outline previous studies that empirically examined how practitioners view ethics. We focus on previous work that examined both AI-driven development practices and more general software development practices. In contrast to studies that aim to evaluate interventions, which will be discussed in later sections, the research discussed here studied the question of ethics more broadly.

Much of the limited work in this space has been conducted by Vakkuri and colleagues [17, 18, 19, 20]. In a series of studies, this research group has conducted case studies [17, 20], semi-structured interviews [18], and a survey of practitioners [19]. Their findings characterize how software developers implement (or disregard) ethics in some types of AI-driven systems. For instance, in startup-like environments, Vakkuri and colleagues report that developers take responsibility for issues related to software development, such as finding bugs, and they generally care about ethics on a personal level. However, little is done to tackle ethical concerns that arise during product development [18]. A separate study reveals a similar disconnect in the development of autonomous cyber-physical systems – developers unanimously indicate that they consider ethics useful to their organization, but also unanimously report that their practices do not account for ethics [20].

In another attempt to better understand how developers are thinking about and applying ethics in practice, specifically in the context of AI technologies, Vakkuri and colleagues surveyed over 200 developers across over 200 software companies to gain insights on the state of practice in AI ethics [19]. Their sample included companies that build AI-enabled technologies as well as those that do not. The survey asked developers to evaluate the importance of various ethical concepts, such as transparency and responsibility, and recount experiences dealing with software unpredictability. Their findings suggest that while ethics is not completely absent from the software landscape, the state of practice is a mixed bag and mostly immature or undefined.

Most recently, Lu and colleagues conducted an empirical study with 21 practitioners at an Australian research agency to better understand how AI ethics is being considered and applied in practice [11]. They found that while AI practitioners are sometimes explicitly taking ethics into consideration, they lack guidance on how to operationalize ethical principles. Based on their findings, as well as other existing efforts, they offer a template, or list of patterns, that AI practitioners can use to better integrate ethical practices into their work.

While there has been a recent increase in interest around ethics in software development due to the rapid advancement and integration of AI technologies, research on ethics in software development has been happening for decades. Much of this work involved case studies aimed at better understanding the role of ethical decision-making in software development and how to adequately support it. For example, Stapleton studied the effects of neglecting ethical considerations in large-scale software systems and found that ethical issues may be more complex than they seem and that a lack of ethical consideration can have an impact on project outcomes [16]. Others have also conducted case studies to better understand ethics in practice in various domains and contexts [2, 7, 15] and studied the effects and implications of codes of ethics in practice [6, 13].

Supporting Ethical Decision-Making

Researchers and practitioners have proposed numerous interventions to better support ethical software development, most of which aim to support the development of AI and machine learning (ML) software systems. Some of these contributions take the form of actionable frameworks and guidelines, while others are tools that can be used for various tasks throughout the development process. In this section, we outline some of the many existing contributions to ethical software development practices. All of the works discussed below, along with others, can be found in our short paper on ethical software development practices [9].

Ethical Frameworks

Over the years, there have been numerous contributions to ethics in the form of frameworks, principles, or guidelines. For many existing efforts, an important goal is achieving ethics by design. Prior efforts have proposed the concept of “ethical by design” and provided best practices both in general and in specific domains, such as natural language processing (NLP) systems. The frameworks that emerged to accomplish the goal of ethical by design vary. Some propose new entities that can be integrated into existing processes to explicitly review and support ethical considerations, from algorithmic design all the way to system design.

Other frameworks that aim to support ethical design take a more stakeholder-centric approach that centers on effectively integrating stakeholders for identifying and addressing potential ethical concerns. This includes an ethical by design manifesto, which outlines principles for supporting various software stakeholders when attempting to integrate ethical concerns in the design process. Considerations in the space of ethical by design frameworks include offering alternatives to support shared, decision-based usage and designing through empathy for users. All of these efforts aim to increase accountability and responsibility for potential impact on users and other stakeholders when designing software systems.

While there is a concerted effort to integrate ethical considerations before development, some existing frameworks aim to support ethical considerations throughout the entire software development pipeline. These frameworks provide guidelines to be followed before, during, and after system development.

Many of these efforts center on supporting ethical decision-making while also providing practitioners with insights into the consequences of their decisions. A specific example of this is the “ethics-aware software engineering” framework, the steps of which are depicted in Figure 9-1. This framework takes a more exhaustive view of ethics, beyond just artificial intelligence technologies and the engineers developing the software. It starts with articulating ethics requirements and organizing them into an ethics specification, followed by implementing both software and processes centered on that specification, and lastly verifying and validating the software against the ethics specification. As implied by Figure 9-1, this is intended to be an iterative process.

Figure 9-1
Visualization of the methods from ethics-aware software engineering [3]. The cycle diagram shows the sequence of articulation, specification, implementation, and verification and validation, annotated with the enablers E0 (ethics knowledge), E1 (awareness), E2 (conscious valuing), and E3 (transparency).

The researchers who proposed this framework describe four “enablers” that can be used to facilitate the adoption of ethics-aware software engineering:

  • Ethics knowledge is required for each phase of the framework and refers to the process of specifying what is and is not considered ethical. This speaks directly to the articulation and specification portions of the framework.

  • Awareness connects practitioners to ethical issues and their potential impacts (which, as we’ve mentioned, is an important step in realizing ethical decision-making).

  • Conscious valuing goes beyond awareness to placing value on the ethical issues that pertain to the software system. It is generally after this valuing that requirements start to form and specifications can be documented.

  • Transparency is important in software development to ensure that the behavior of the artifact and the development processes align with specifications and ethical requirements. This is achieved by making both the processes and the artifact’s behavior visible for validation.
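To make such a process more tangible, here is a minimal sketch of how a team might encode an ethics specification as data and verify an artifact against it. The requirement fields, statuses, and checks are assumptions invented for illustration; they are not part of the published ethics-aware software engineering framework.

```python
# Illustrative sketch only: one way to represent an ethics specification and
# verify an artifact against it. Fields and checks are assumptions, not part
# of the published ethics-aware software engineering framework.
from dataclasses import dataclass, field

@dataclass
class EthicsRequirement:
    identifier: str
    statement: str              # articulated ethical concern, in plain language
    acceptance_check: callable  # returns True if the artifact satisfies the requirement
    satisfied: bool = False

@dataclass
class EthicsSpecification:
    requirements: list = field(default_factory=list)

    def verify(self, artifact) -> list:
        """Run every acceptance check against the artifact; return open issues."""
        open_issues = []
        for req in self.requirements:
            req.satisfied = req.acceptance_check(artifact)
            if not req.satisfied:
                open_issues.append(req.identifier)
        return open_issues

# Articulate -> specify -> implement -> verify, then iterate on any failures.
spec = EthicsSpecification([
    EthicsRequirement(
        identifier="ETH-1",
        statement="Selection rates across demographic groups differ by less than 5%",
        acceptance_check=lambda audit: abs(audit["rate_gap"]) < 0.05,
    ),
])
audit_results = {"rate_gap": 0.12}   # stand-in for a trained model's audit output
print(spec.verify(audit_results))    # ['ETH-1'] -> requirement not yet met; iterate
```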

Another specific example of an ethical framework that targets AI-based systems is Australia’s AI Ethics Framework [4]. The framework is separated into two components: core principles for AI and toolkits for ethical AI. The core principles range from the common mantra “do no harm” to promoting regulatory and legal compliance of AI technologies. On the implementation side, the framework outlines tool support that ranges from assessing impacts and risks to supporting collaboration and consultation.

Efforts to develop frameworks and guidelines that can be applied in practice are a meaningful step toward realizing ethical software practices. However, studies have shown that the existence of guidelines may not be enough to effectively support ethical decision-making. In both research and practice, there have been efforts to bridge the gap between ethics principles and everyday practice. Much of the effort in this space has been centered on connecting principles to action. For some frameworks, this means grounding guidelines in strategies from relevant domains to support actionability. For others, it means mapping elements of the framework to relevant information, such as concerns and actions, that directly points to a rationale and a course of action. The goal of these action-focused frameworks ranges from individual developer support all the way to industry-level support.

Ethics Tools

Separate from the ethical frameworks that continue to emerge, there is also a steady increase in the tools that are being developed to support ethical software development practices. Many of these efforts have focused on increasing software fairness, and most aim to support development of AI-driven systems.

One of the first published efforts at providing fairness tooling was FairML (Footnote 2), a toolbox aimed at mitigating bias in black-box machine learning models. FairML supports evaluating how individual inputs influence a model’s decisions in order to assess their effect on fairness.

In 2018, IBM introduced AI Fairness 360 (AIF360), a Python toolkit for measuring and mitigating bias in machine learning models (Footnote 3). The toolkit provides an exhaustive and extensible set of open source models and algorithms, along with fairness metrics for models and datasets. Similar to AIF360 is Aequitas, a Python toolkit for systematic auditing of model fairness (Footnote 4). As with most fairness tools, Aequitas was designed to be used by data scientists; however, it was also designed for use by policy makers. It also provides tooling for analyzing bias in datasets and determining optimal metrics for a given situation.
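To give a sense of what working with such a toolkit looks like, the following is a small sketch of a dataset-level bias check using AIF360’s dataset and metric classes. The toy data frame and the privileged/unprivileged group definitions are assumptions made purely for illustration.

```python
# Sketch of a dataset-level bias check with AIF360 (pip install aif360).
# The toy data and group definitions below are assumptions for illustration.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

df = pd.DataFrame({
    "race":  [1, 1, 0, 0, 1, 0, 0, 1],   # 1 = privileged group, 0 = unprivileged
    "score": [0.9, 0.4, 0.7, 0.2, 0.8, 0.5, 0.3, 0.6],
    "label": [1, 0, 1, 0, 1, 0, 0, 1],   # favorable outcome = 1
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["label"],
    protected_attribute_names=["race"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"race": 1}],
    unprivileged_groups=[{"race": 0}],
)

# Disparate impact: ratio of favorable-outcome rates (1.0 means parity).
# Statistical parity difference: gap in favorable-outcome rates between groups.
print("disparate impact:", metric.disparate_impact())
print("statistical parity difference:", metric.statistical_parity_difference())
```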

In the same year, we saw the introduction of Themis, the first tool designed to test any kind of software for discrimination (Footnote 5). Like other available fairness tools, Themis works based on common definitions of fairness. Unlike them, however, Themis can measure and detect bias in software independent of any model that may (or may not) be integrated into it. Themis also generates tests that engineers can incorporate into their own test suites.

The contributions to this space, specifically with respect to ethics in AI and ML software systems, continued in the years to come. This includes tools like fairkit-learn (Footnote 6), FairVis (Footnote 7), Fairlearn (Footnote 8), FAT Forensics (Footnote 9), and the LinkedIn Fairness Toolkit (LiFT) (Footnote 10) that support detecting and measuring bias during model training and selection. LiFT, introduced in 2020, is a Scala/Spark library that supports measuring and mitigating bias in large-scale machine learning workflows. Unlike other fairness tools, fairkit-learn, FairVis, and Fairlearn use visualizations to support exploring and discovering biases in machine learning models. Fairkit-learn and Fairlearn provide additional unique features, such as interactive model comparison and analysis of fairness and performance tradeoffs. FAT Forensics is another unique tool that also supports inspecting the accountability and transparency aspects of machine learning software.
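As one concrete example, a minimal disaggregated evaluation with Fairlearn’s MetricFrame might look like the sketch below; the toy labels, predictions, and sensitive feature are assumptions for illustration.

```python
# Minimal Fairlearn sketch (pip install fairlearn): break a metric down by a
# sensitive feature. The toy data below is assumed purely for illustration.
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
group  = ["A", "A", "A", "B", "B", "B", "B", "A"]   # hypothetical sensitive feature

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)

print(mf.by_group)       # per-group accuracy and selection rate
print(mf.difference())   # largest between-group gap for each metric
```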

As we have outlined, there is no shortage of interventions available for attempting to support ethical decision-making during software development. Check out our paper on ethical practices to learn more about these efforts and others [9]. While none of these solutions provide a “silver bullet” for the problem of ethical decision-making in software development, collectively they provide hope for a more equitable technological future.

Summary

In this chapter, we made an argument for ethical decision-making as the umbrella over diversity, equity, and inclusion efforts. For those who skipped to the credits, let’s recap:

  • Ethical concerns, such as fairness and safety, are addressable when explicitly considered.

  • There is more than one way to define ethics, but all definitions are centered on the actions taken by an individual (or organization) and why.

  • While ethics has a long history in the context of computing, most of the recent efforts have focused on ethics when developing machine learning– and artificial intelligence–based software systems.

  • Research has provided some insights into ethical software development practices (though much more is needed to understand and support ethics in practice).

  • There are numerous frameworks and tools available to support ethical software development practices, many of which target components of AI-based software systems.

By encouraging and supporting ethical decision-making with concepts like “ethical by design” and frameworks like ethics-aware software engineering, we can work toward realizing ethical software development in practice.