Software has a pivotal role in human lives. It is instrumental in transforming dreams into reality, as in self-driving cars and space exploration. Software is also fueling our worst nightmares. Channeling the spread of misinformation, inciting communal violence, and making life-threatening decisions for humankind are examples of how software harms society. In the twenty-first century, software is the most potent tool available to humankind, one that both nurtures and ruptures the social fabric of modern human civilization.

Software is a creation and a reflection of humans. From inception to implementation and maintenance, an individual or a group works and makes decisions on software. This process infuses the technical details of software development with the experiences and worldviews of the individuals and groups involved in its development. The resulting software is not just for the people and by the people but also a reflection of the people involved in its development.

Software is made with various intentions and serves a range of purposes. Even with the best intentions, software can unintentionally harm humankind, especially marginalized communities.

One such example is a face recognition system. Face recognition systems help humankind automatically process people’s information, individually and as a crowd. These systems, however, have been shown to harm marginalized communities (e.g., the Black population) through systematic misclassification, placing them in a disadvantaged position.

Explorations on diversity and inclusion focus on appropriately handling the distinguishing characteristics that systematically place an individual or group in a disadvantaged position. These characteristics can be visible and perceived (e.g., gender [21], age [10], nationality [18], and race [15]) or subtle and not otherwise visible (e.g., beliefs and background [9]) [20]. In simpler terms, promoting diversity starts with understanding when to distinguish and when not to. For instance, one way to promote diversity is to not distinguish people by gender in job applications. In another scenario, promoting diversity means creating special provisions for visually challenged people. Inclusion goes further, toward making everyone feel included.

This chapter offers the author’s perspective on promoting diversity and inclusion and discusses the role of software engineering research. The chapter introduces the unique position of the software industry in exacerbating diversity and inclusion issues and the unique opportunity to solve a problem of historical relevance. The author describes the two paths research can pursue, at the software and human levels, to mitigate diversity and inclusion issues and the scope of each exploration. Finally, the chapter characterizes the software engineering research focused on diversity and inclusion (with examples) and proposes a roadmap for improvement.

This chapter is written for various audiences. Software engineering researchers can pursue the two paths presented in this chapter to promote diversity and inclusion. The software industry (including software-defined enterprises) can use it for an overview of the current state of research and potential solution spaces. For the broad computer science audience and other fields interested in exploring diversity and inclusion, this chapter introduces the unique position of software systems in exacerbating diversity and inclusion issues and in providing a unique solution space. While there are many paths to explore diversity and inclusion issues (e.g., through policy-making bodies and government), the insights presented in this chapter are limited to what we know from the scientific literature in software engineering research. Interested readers may find it worthwhile to explore those other paths as well.

A Unique Opportunity

Software engineering is in a unique position regarding diversity and inclusion. On the one hand, software is exacerbating diversity and inclusion issues at an unprecedented pace. On the other hand, it offers a unique solution space that never existed earlier.

Diversity and inclusion issues are not new. Some early documented references to diversity and inclusion issues date to the 1960s [24], although the issues have existed for much longer. These issues have percolated in society for a long time and have many manifestations. For instance, Chapter 6, “Elicitation Revisited for More Inclusive Requirements Engineering,” describes diversity and inclusion issues in requirements engineering and offers advice for more inclusive requirements engineering. Another manifestation occurs during software development, where contributions are evaluated systematically differently depending on the gender [21] and geographical location [18] of code contributors. Part 3 of this book offers detailed insights into diversity and inclusion in development teams. Further, some issues affect a specific subpopulation (e.g., a region), whereas others can involve a wider population (e.g., multiple countries) [17].

With software, diversity and inclusion issues become more widespread and more severe. As reflections of people, software imbibes people’s characteristics and cues that introduce biases and fairness issues. For example, a recent study showed that setting gender to “female” resulted in fewer high-paying job ads than setting gender to “male” [4]. A similar exploration in software engineering research showed that the chances of women’s contributions being accepted are higher when their gender identity is unknown [21]. Software picks up cues from its environment as well. Machine learning software, in particular, picks up cues and amplifies what it learns, which worsens existing issues. Software systems can now inflict more harm than they could have in the past.

Software also offers an unprecedented opportunity and a solution space to mitigate this problem of historical relevance. This unique position comes from the characteristics of software (as opposed to humans), the role of software as an intermediary in our daily lives, and the historical data generated during its development and use.

The first opportunity comes from the characteristics of software: it offers a way to enforce rules where humans fail and subconsciously introduce biases [13]. The second opportunity comes from the position of software in modern human civilization. Like the invisible air around us, software is everywhere, which gives it the unique opportunity and capability to influence communications, beliefs, experiences, and more. If software’s position is leveraged wisely, it can contribute to reintroducing lost values.

Software engineering offers another opportunity in the form of data to investigate and reflect on software and its development. During software development, traces of development activities are captured, for example, in version control systems, mailing lists, and issue-tracking systems. For decades, these logs, with their rich insights on activities, actors, and events, have helped us improve software and its development [11, 19]. These traces have recently shown us an evidence-based path to investigate diversity and inclusion issues, including measuring the extent of the problem [17, 18, 21] and potential solution spaces [17]. For more initiatives in practice and education, refer to Parts 4 and 5 in the book.

This opportunity is unprecedented in history for various reasons. We now have data on the actions and behavior that contribute to diversity and inclusion issues. For the first time, we can study decisions made behind closed doors. Qualitative analysis has helped us understand perceptions and experiences; with data, we can now substantiate those claims. Software data can help us go beyond perceptions, which might not always reflect reality. Experiments are another powerful tool to study behavior, but participants may change their behavior because they are being observed, which is referred to as the Hawthorne effect. Data collected as a by-product of development mitigates this effect to some extent.

Today, data-driven explorations promise to offer objective insights as long as the data reasonably reflect actual events. Analyzing activity log data in its current form offers a space where the phenomenon seen is close to the actual event. However, there remains a real risk that once these data sources are used for analysis, people change their actions and behavior. Another challenge is the nonavailability or limited availability of data. Given the topic’s sensitive nature, there is often little to no data available to study diversity and inclusion.

Two Solution Spaces

Software and humans are in a vicious cycle in which they constantly learn and influence each other. To break this cycle, changes are required at the level of software and people, the two solution spaces to promote diversity and inclusion. Changes at either of the two levels will influence one another and create a cycle of changes promoting diversity and inclusion. For example, a recommender system that offers equal job opportunities to people from different demographics will foster diversity. In the long run, this is likely to change marginalized communities’ social status and contribute to changing the prevalent notions responsible for biases.

Making software systems fairer can start at any stage of software development. For software systems yet to be developed, discussions on fairness should start at the inception of an idea, leading its way into its design, implementation, testing, and use. For software systems in use, it is still necessary to gauge how the outcomes of a software system influence people.

During ideation, fairness refers to the awareness of audiences and their various needs. The better we understand the needs, the more prepared we are to propose a solution that likely works for all. However, there remains an unforeseeable future that is hard to quantify. For example, creating software for a visually challenged audience requires understanding their needs. One such need is comprehending an image. See Chapter 6, “Elicitation Revisited for More Inclusive Requirements Engineering,” for insights on how requirements can be made more inclusive.

Fairness in design choices is about balancing the stakes by making trade-offs among competing and often conflicting priorities. In the preceding example, designing a software system for both sighted and visually challenged audiences requires presenting the same information in different formats. For example, offering alternative text for an image makes the content of the image comprehensible to a visually challenged person.
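The alternative-text example can be turned into a simple automated check. The following is a minimal sketch, assuming the content is HTML and that a missing or empty alt attribute on an img element is worth flagging for review. The class name MissingAltFinder, the function images_missing_alt, and the sample page are hypothetical and used only for illustration.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no non-empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # Assumption: an empty alt is flagged too, although it can be a
        # deliberate choice for purely decorative images.
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src", "<no src>"))


def images_missing_alt(html):
    """Return the sources of images that need alternative text."""
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing


# Hypothetical page fragment: one image with alt text, one without.
page = (
    '<p><img src="chart.png" alt="Bar chart of monthly sign-ups">'
    '<img src="logo.png"></p>'
)
flagged = images_missing_alt(page)
print(flagged)
```

A check like this only detects that alternative text is absent. Whether the text that is present actually conveys the content of the image still needs human judgment.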

During development, individual and group experiences and decisions are codified in source code, potentially introducing fairness issues. These issues can come through three channels: (a) algorithms making decisions, (b) machine learning components acting as abstract algorithms, and (c) the data these are trained on or applied to.

Algorithms as decision-makers, unless designed to meet the requirements of any, are likely to miss the needs of many. These gaps emerge as bias against subpopulations. Machine learning components are a more complex alternative to hand-written algorithms for making decisions, and it is often unclear what rules they follow. Issues can creep in from the data a component is trained on or from the data it is applied to. Even when an algorithm is unbiased, it can introduce unfairness if the data it is applied to has inherent biases.

Testing for fairness comes into play when a system is ready for use or in use. It examines attributes such as gender and ethnicity (also referred to as sensitive or protected attributes) to assess how a software system behaves across the values of one or more of them [5]. Finally, during deployment, who has access to the software or its features is another source of fairness issues.
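One simple form of such a test compares how often a system produces a favorable outcome for each value of a protected attribute. The sketch below is illustrative only and is not the method of the cited work: it assumes decisions are available as records with a protected attribute and a boolean outcome, and it uses the difference in selection rates between groups as the measure. The function names, the field names, and the data are made up.

```python
from collections import defaultdict


def selection_rates(records, attribute, outcome="selected"):
    """Return the share of favorable outcomes per value of a protected attribute."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        group = record[attribute]
        totals[group] += 1
        if record[outcome]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Largest difference in selection rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical decisions produced by a system under test (made-up data).
decisions = [
    {"gender": "female", "selected": True},
    {"gender": "female", "selected": False},
    {"gender": "female", "selected": False},
    {"gender": "female", "selected": False},
    {"gender": "male", "selected": True},
    {"gender": "male", "selected": True},
    {"gender": "male", "selected": False},
    {"gender": "male", "selected": False},
]

rates = selection_rates(decisions, "gender")
gap = parity_gap(rates)
print(rates, gap)
```

A gap of zero means every group is selected at the same rate. What size of gap is acceptable, and whether equal selection rates are the right notion of fairness at all, depends on the context and is a choice the team has to make explicitly.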

Another way to break the vicious cycle starts with the people involved in software development. As the driving force, humans have a significant role in shaping software. A team that takes diversity into account is therefore more likely to create software that caters to the needs of many. This starts with creating and promoting a fair development environment (including its process) and a diverse and inclusive workforce.

People shape software in various capacities. Some of these roles are more visible, for example, developer and tester. Other roles are more subtle, such as users and business requirements analysts, who shape the features implemented in software. As users, people consciously or subconsciously drive the need for software or its features. An architect envisions what the software may look like, and developers bring the vision to reality. There are more roles than the ones mentioned here. Collectively, these individual and group dynamics, if meaningfully controlled, can improve the software system and help solve fairness issues at their source.

The two solution spaces each have their utility and purpose. Fixing software systems for fairness is a reasonable short-term goal, although it has its share of issues. In particular, fixing fairness issues is challenging in the current landscape of somewhat homogeneous teams. It is unreasonable to expect teams with a limited understanding of the problem to devise a solution that works for all, and the meaningfulness of such a solution needs to be investigated.

The other alternative is diversifying the stakeholders who participate in software development and devising initiatives and processes that foster inclusion. This solution space is even more challenging since the issues in question have percolated for centuries and are now deep-rooted. Any attempt at solving the problem from either end will go a long way in mitigating diversity and inclusion issues.

Studies in Software Engineering

Explorations on diversity and inclusion in software engineering can be best described based on (a) the objective of a study, (b) the context, (c) the characteristics of stakeholders, and (d) the choice of data and method for exploration.

Objective: The objective of a study can be (a) to identify a problem, (b) to characterize the state of practice, or (c) to propose or adopt solutions. Today, some studies identify and report a problem relating to diversity and inclusion. For instance, a study by Terrell et al. shows evidence of gender bias by eliminating other potential explanations for an observation [21]. Similar studies offer insights pointing to other diversity issues (e.g., geographic disparity in code evaluation [18]).

Studies characterize the state of practice to describe how things work in practice. This includes investigations into the interactions between diversity and inclusion goals and software development activities, processes, and outcomes. These explorations form the basis for understanding the problem space and exploring appropriate solutions. For instance, a study on gender diversity showed that gender diversity correlates with improved productivity [23].

The solution space often takes inspiration from other fields for ideas likely to work in software engineering. One example is anonymized peer review as a potential solution for mitigating biases in software engineering [14]. Another example is the GenderMag tool, designed to identify diversity issues in software [3].

Context: The context for explorations on diversity and inclusion can be characterized as open source, industry-sponsored open source software systems, commercial software systems, and software-defined enterprises. Other contextual factors are the phase of software development and specific development activity. Here, the phase of software development refers to requirements elicitation, design, development, testing, and maintenance [20]. In each of these software development phases, activities such as code review [18], debugging [8], and pair programming [7] are studied to investigate diversity and inclusion issues.

Stakeholders: Each exploration addresses the problems or needs of one or more stakeholders. A stakeholder is characterized by features (visible or invisible) that uniquely define an individual or a group. These features can be identifiable, cognitive, changing over time, and role-based. Identifiable features generally refer to gender [21], ethnicity [15], and culture [1]. A particular class of identifiable characteristics includes specially abled individuals, for example, visually challenged [12]. Cognitive features include aspects such as ethics [2] and personality [6]. Features that change with time are age [10], experience [10], and status as a newcomer [16]. Finally, people in various roles include users, developers, and managers.

Data and methods: There are quantitative explorations based on traces of development activities inferred from archival data and its associated metadata (e.g., [19]). There are qualitative explorations (in the form of surveys and interviews) soliciting the perceptions and experiences of participating stakeholders (e.g., [22]). Other explorations follow a mixed-methods approach, use ethnography, or conduct experiments [20]. Ultimately, these explorations generate awareness and offer recommendations and tools that help identify and understand a problem and propose solutions.

The Road Ahead

Data: To understand diversity and inclusion issues, we need data for analysis. While the software industry may have information on some attributes relevant to diversity studies, this is not the case for all of them. For instance, asking about an employee’s ethnicity at work might not be legal. Even when we have access to data that can offer insights, care must be taken since observation will likely change behavior.

Some studies derive information such as gender [21], geographic location [18], and ethnicity [15] from signals, but this approach has its limits. These automated solutions miss nuances and sometimes reinforce stereotypes prevalent in society. Therefore, diversity and inclusion must be investigated with great care. For more on ethics, refer to Chapter 9, “The Role of Ethics in Engineering Fair AI (and Beyond).”

Intersectionality: Most explorations in software engineering have focused on understanding the problem space in one dimension. It remains unclear how improving diversity in one aspect influences another. For instance, if we are trying to improve gender diversity, are we also improving diversity in other forms?

A common notion is that improving diversity in any form promotes diversity in other forms, but preliminary evidence suggests otherwise. For instance, a recent study showed that teams diverse in gender are not necessarily diverse in geographical location and vice versa [17].
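To see how a team can score differently on two dimensions, diversity can be quantified separately for each attribute. The sketch below uses the Blau index (one minus the sum of squared category shares), a common heterogeneity measure chosen here for illustration; the text does not state which measure the cited study used. The team data and field names are made up.

```python
from collections import Counter


def blau_index(values):
    """Blau index: 1 - sum of squared category shares (0 means homogeneous)."""
    if not values:
        raise ValueError("blau_index needs at least one value")
    counts = Counter(values)
    total = len(values)
    return 1 - sum((count / total) ** 2 for count in counts.values())


# Hypothetical team: balanced in gender, but all members are in one country.
team = [
    {"gender": "woman", "country": "IN"},
    {"gender": "man", "country": "IN"},
    {"gender": "woman", "country": "IN"},
    {"gender": "man", "country": "IN"},
]

gender_diversity = blau_index([member["gender"] for member in team])
country_diversity = blau_index([member["country"] for member in team])
print(gender_diversity, country_diversity)
```

Reporting one index per attribute, rather than a single combined score, keeps such mismatches visible.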

This raises an even more important question: whose problem do we solve? This question goes beyond the discussion on gender and location into deeper spaces where solving the problem for one person creates a problem for another. It only gets complicated from here, with software now having an uncodified part (courtesy of machine learning) that continues to evolve. This ever-changing element further complicates our understanding of fairness, diversity, and inclusion.

One way ahead is investigating how solving a problem for one subgroup influences another. This facilitates informed choices and trade-offs. Such information is relevant since it generates awareness of what we can improve and acceptance of what we cannot.

While we previously discussed limitations relating to data, these issues multiply when investigating trade-offs. From the given data sources, it is often difficult to gauge the influence of all the characteristics that interact. Alternatives, such as conducting experiments on humans, are challenging. The closest we have is conducting experiments on the student population. Still, in many situations, studying a problem on a student population is not feasible. For instance, how global software engineering teams collaborate might be challenging to replicate with a student population, even a globally distributed one. Chapter 12, “Exploring Intersectional Perspectives in Software Engineering Through Narratives,” presents explorations on intersectionality using narratives.

Ethics: Most problems arise because of the topic’s sensitive nature. With studies on diversity and inclusion, considering the ethics of doing a study becomes as important as the objective of bringing positive change in society. Since these goals can often be conflicting, studying and meaningfully exploring the choices is hard.

Diversity and inclusion: Improving diversity is essential; doing it meaningfully is even more so. Most explorations so far have tried to understand the problem space and how to quantify it. Future explorations should focus on finding solutions and also on looking into inclusion. This matters because improving diversity has little meaning without attempts at inclusion, which is a much more complex problem. Chapter 10, “Beyond Diversity: Computing for Inclusive Software,” provides more details on the subject.

Sustainable solutions: Another aspect to consider is the sustainability of a solution. Most current solutions are explored in isolation. In practice, however, for a solution to be sustainable, it must balance business needs and the needs of society. Only then are we minimizing the potential for harm while maximizing the interests of software, society, and industry.

Inspiring other fields: Amid all these challenges and limitations in software engineering, we still have something that can inspire research in other fields: software and data. We hope that what software engineering can do with them inspires other fields to build a closer understanding of their own subjects. Future research can explore novel ways to study the problems relevant to solving diversity and inclusion issues. These could come either from software solutions or from guidelines for building software solutions.

Takeaways

  • Software is a creation and reflection of humans.

  • For a better society, have better software. For better software, have a better software development team.

  • Unless designed for the needs of any, a software system will miss the needs of many.

  • Development activity traces are our unique opportunity to understand diversity and inclusion issues in the wild.

  • Improving diversity is important; doing it meaningfully is even more so.

  • Sustainable solutions to improve diversity and inclusion are pragmatic and account for business needs.