The research team’s exploration of data ethics management included an inquiry into the risks—to individuals, groups, and society as a whole—that such management efforts sought to address. Nearly everyone we surveyed and interviewed acknowledged that business use of advanced analytics poses such risks, and that it is important for companies to address them. The survey and interview components of the research project each utilized small samples shaped by selection bias, so they can tell us only about a specific set of companies. Still, their consistency on the question of risks is striking.Footnote 1 At the time that we collected our data, most companies that talked publicly about their use of advanced analytics focused on the valuable insights that it produces. The survey and interviews suggested that this rosy view was only part of the picture and that some of the more sophisticated companies, at least, also recognized the very real threats that these activities create.

While survey respondents acknowledged a broad range of risks, they reported that their companies pay far more attention to some risks than others, as illustrated in Fig. 3.1.

Fig. 3.1 Corporate attention to risks from advanced analytics. Grouped bar graph of the percentage of respondents reporting each level of attention for each risk. Privacy most often received a great deal of attention; manipulation most often received some attention; displacement of workers was most often rated not relevant or as having received no attention.

Eighty percent of respondents said that companies pay a great deal of attention to the privacy risks that advanced analytics creates. Nearly half said that businesses pay a great deal of attention to risks of discrimination or bias, unfairness (e.g., predictions that might bring undue harm to consumers or employees), lack of transparency or accountability (e.g., through opaque algorithms), and, to a slightly lesser extent, to errors in decision making. On the other hand, the respondents reported far less attention to risks of manipulation (e.g., of consumers or users of a service), and an especially low level of attention to risks of worker displacement through automation, which most respondents rated as essentially irrelevant. Interestingly, while more than 60% of respondents saw risks of manipulation as relevant to business use of advanced analytics and AI, they most commonly rated this as having received just some attention rather than a great deal of attention. This is surprising given that the highest-profile breach of data ethics—the Facebook-Cambridge Analytica incident—involved the use of advanced analytics to manipulate voters.

Our interviews allow for a closer look at how businesses perceive these risks, as well as an even broader array of concerns about how business use of advanced analytics and AI could be harmful to individuals and societies.

3.1 Privacy Violations

Companies can use advanced analytics to take seemingly innocuous surface data about people and infer highly sensitive information from it with high levels of accuracy. For example, researchers at Cambridge University were able to take a person’s Facebook likes and infer their gender, sexuality, age, race, and political affiliation “with remarkable accuracy” based solely on these surface data (Rosen 2013). Predictive analytics thus poses a profound threat to personal privacy (Rubinstein 2013). The interviewees expressed keen awareness of this threat to privacy. As one remarked: “You’re learning my weaknesses or learning my pregnancy status, you’re learning whether I’m gay, you’re learning intimate information about what I do in my home. Is it ethical for you to be doing that even though your policy said you do research and we collect that information for product improvement?” (Interviewee #19).

The interviewees distinguished between different types or levels of privacy invasion. Some predictive insights feel “creepy,” such as when Facebook inferred which of its users were Jewish and sent Rosh Hashanah (Jewish New Year) greetings to them (Interviewee #23). Other insights are more invasive and can cause severe embarrassment, distress, or even danger. For example, some gay teens have been outed to their parents after receiving gay-themed advertising from companies that had inferred their sexual orientation (Interviewee #23). Finally, companies may deny people jobs, loans, or other important life opportunities based on predictive insights about their physical or mental health status, sexual orientation, or other highly personal attributes. One interviewee gave the example of a producer of smart toothbrushes that faced economic pressure to sell household toothbrushing data to insurance companies that could infer risk of heart disease from it. “This might go to future insurability of the kids or payment for pre-existing condition of the adults. My point is, the world is changing and measurement or observation of us, which is happening, this is the way it all works, is very, very important. We've got to decide what the rules are now. Right?” (Interviewee #6).

3.2 Manipulation

Data scientists can employ advanced analytics to infer people’s vulnerabilities. This can allow bad actors to manipulate, or even exploit, these individuals. For example, a business might predict that an individual is likely to experience early-stage dementia and target the person with predatory loans intended to take advantage of her diminished, but undiagnosed, mental state. Or, as actually happened, a company such as Cambridge Analytica might take people’s Facebook “likes,” use them to infer their personality types, and then target them with political advertisements that appeal to their unconscious in ways that they find hard to resist (Rosenberg 2018).Footnote 2 One interviewee saw such manipulation as a growing issue:

[P]eople are becoming more sensitive to some of the risks that I might put into the category of being unfairly manipulative, or kind of unfair in some way. That might be using predictive analytics to sell people things they don't need or can't really afford. Or, targeting people based on vulnerabilities, whether it's age, or cognitive abilities, or some other disability . . . . When you see them happening . . . people recoil as slimy and nobody wants to be that . . . . When do those lines get crossed? So that's not always obvious. There's certainly a sensitivity around that. (Interviewee #12).

The difficult questions lie in identifying the point at which marketing becomes unacceptable manipulation, or even exploitation. An interviewee from the retail industry talked about how they approach this issue:

How much imputation can you do before you're actually manipulating and defining the behavior and causing the behavior, rather than responding to it? . . . I've said this a lot to our marketing teams. I was like, "so long as you are persuading." In your gut, [if you] know that you are persuading and providing an offer, and something of value -- you're good. The moment you feel that you are manipulating, you've gone too far, and we need to have a conversation. (Interviewee #17).

This gut-level, know-it-when-you-see-it approach to drawing the line between marketing and unacceptable manipulation leaves a great deal of room for interpretation.

3.3 Bias Against Protected Classes

The law distinguishes between disparate treatment and disparate impact discrimination. Disparate treatment occurs when someone intentionally disadvantages another person based on a protected characteristic (e.g., race or gender). Disparate impact occurs when a policy or practice that is neutral on its face disproportionately and negatively affects a group of people defined by a protected characteristic (e.g., race, sex, or religion), and either there is no legitimate business necessity for the practice or there is a legitimate business purpose that could be achieved in a less discriminatory way.

Advanced analytics and AI can produce disparate treatment. For example, a company could infer someone’s protected characteristic (e.g., pregnancy) and intentionally discriminate against the person on this basis.Footnote 3 More likely, however, these technologies will produce disparate impact discrimination. For example, a model trained on data that were themselves shaped by past bias can replicate and perpetuate that bias. Amazon ran into this problem when it tried to develop an AI tool that could separate viable from non-viable resumes (Dastin 2018). It trained the tool on the resumes of existing Amazon employees, most of whom, likely because of pre-existing bias in the technology field, were male. The tool accordingly learned to reject applicants whose resumes identified them as female (e.g., by listing an all-women’s college). Amazon discovered this problem early on and, unable to fix it, ultimately abandoned the project. But bias in training data can be subtle, and many companies may miss it.

Harmful bias seemed to be one of the top, if not the top, concern of the interviewees:

Algorithmic discrimination is a top tier issue for me and my group, and I've made it a priority. What I mean by that is to work, and help, and focus, our engineering teams on evaluating outcomes as we build out especially our machine learning portfolio. You're never going to be able to be 100 percent positive, in a testing environment, that your algorithm isn't creating some disparate impact. That's very difficult to do . . . How do you get data that doesn't have a lot of bias in it? That's also tricky, but there's some data sets that we all know to have tremendous bias in it, so maybe steering away from those insofar as you're training the models might be helpful, right? (Interviewee #18).

Increasingly, companies try to address algorithmic bias by identifying biased data sets and then either excluding or modifying them. This is an important strategy. The ethical question that the interviewees posed was how far to take it. Specifically, do companies have an obligation to “fix” long-standing social inequalities that are accurately reflected in the training data? For example, should a facial recognition tool that learned it could identify gender in part by whether the person was standing in a kitchen (women were more likely to be in the kitchen) deliberately ignore this finding? (Interviewee #19). If women, through their online behavior, express less interest in certain high-paid jobs than men, should a company nonetheless advertise the jobs equally to both women and men? (Interviewee #19). Should the company ignore or alter the training data in these cases, or modify the conclusions that emerged from it? The interviewees talked with their data scientists about this question. As one explained, “[t]he concern is now you’ve taught this thing, this code, to be biased. On the other hand, do they have some obligation to have the algorithm be less accurate... do you want me to pull [those data that are the product of bias] out? So these sorts of questions are being asked of us by the AI folks: ‘We’ll figure out how to do or not do... but tell us when and where prediction is discriminatory in a way that is to be deterred.’” (Interviewee #19).

Another grey area was when, if ever, it is acceptable to use a protected characteristic in algorithmic decision-making. For example, when data show that different racial or ethnic groups have different preferences, is it appropriate to take this into account when marketing to members of these groups? An interviewee from the retail industry provided an example:

So we know that there's different body sizes, or different body types perhaps, for different ethnicities. You might need wider-thighed jeans. We're conscious of that. And again, this is just matching the customer with what they need. So in that case, if we have a special on jeans and we want to make sure they're the right jeans, that ethnic code might actually be important. (Interviewee #17)

These interviews raise deep and interesting questions about what a society that values equality and justice should look like, and about what companies should do to try to achieve that vision. They suggest that at least some corporate privacy professionals and data scientists are discussing these issues but are doing so without the benefit of well-developed tools, resources or ethical frameworks that could help them navigate the grey areas.

3.4 Increased Power Imbalances

Businesses that employ advanced analytics to gain highly accurate insights into their customers can use those insights to gain an advantage over them. For example, a company could infer the highest price that each customer would be willing to pay for a given good or service and then charge each individual that price. This would allow the company to capture all the gains from trade, leaving customers none of the benefit they would otherwise enjoy when a price falls below what they are willing to pay. Additionally, corporate use of advanced analytics to determine eligibility for loans, jobs, or other important opportunities can entrench existing inequalities. If more privileged applicants are more likely to possess the attributes (proxies) that predict job success or loan repayment, the algorithm will more likely select them for these opportunities. This can reproduce existing hierarchies and further lock the poor into poverty. Advanced analytics and AI can also enable companies to segment groups into much finer categories than was previously possible, with social and distributional effects. For example, finer segmentation can undermine the pooling of risk that has long been one of the social functions of insurance: when insurers can price each narrow segment according to its own predicted risk, those predicted to be higher-risk pay more or lose coverage rather than sharing costs across a broad pool. In each of these ways, the increased use of advanced analytics and AI can produce, and reproduce, inequality.

3.5 Error

Inaccurate data or faulty algorithms can produce erroneous predictions. In the marketing area, such errors can result in annoyed or dissatisfied customers (Interviewee #12). In the government context, the stakes can be much higher. As one interviewee recounted: “Our number one risk is if someone is killed because of our analytics. We’re working with the military, we’re working with intelligence and law enforcement, and I’ve impressed this on the engineers a number of times, you’re pointing a loaded gun at someone basically. Are we 100% confident in the analysis that we’re supporting here, and if we’re not, then the consequences are that level of seriousness.” (Interviewee #10).

3.6 Opacity and Procedural Unfairness

While it is true that algorithmic decision-makers can make errors, the same can be said of human decision-makers. The key distinction between algorithmic and human decision-making is not the former’s capacity for error but rather its opacity and imperviousness to challenge. For example, where a company determines through advanced analytics that an employee would not succeed in a higher position and denies the person a promotion, the employee would have no way to know what data or algorithm had produced this determination, and no way to challenge it (Rubinstein 2013). As far as the individual is concerned, such algorithmic determinations are a “black box” (Pasquale 2016). With some advanced machine learning techniques, even the company or other decision-maker may not understand how the technology arrived at its determination. The risk to the individual, then, is that machine-driven decisions deny people the core procedural rights to which they are entitled when others make important decisions about their lives: transparency and the right to be heard. One interviewee articulated this risk:

In this case, if a harm occurs, there is no mechanism to even understand why suddenly am I on the No Fly List. . . . How did I get on the No Fly List? There is no mechanism to ask. You will be told, “[it’s] none of your business, you simply can't fly anymore.” . . . What if [the list placement] was because in the third generation of processing, where they were not using data about me but data inferred about me, something got in there that was a horrible inaccuracy or trigger and now it is perpetuated because suddenly, it's no longer about the data about me, it's about data that has been inferred about me. Some risk score. And there is no mechanism to actually understand why [it happened] or to have [the data] corrected. (Interviewee #21).

3.7 Displacement of Labor

Advanced analytics facilitates increased automation, which in turn can displace human workers. As one interviewee explained:

The thing that worries me enormously in this way is driverless cars. . . . You’re going to put people out of work: trucking, cab drivers, low skill workers, people who aren’t going to be able to get other jobs and I don’t think the industry thinks it has to care about that. The speed at which it’s developing these things, if it builds a driverless car that works really well and starts replacing everybody before society is able to figure out what are we going to do with all these people that it’s displaced . . . that’s hugely irresponsible. That’s the kind of thing that topples governments, leads to the French Revolution, you know? This is significant, and I don’t think industry really takes responsibility for that . . . And what’s the legal solution? Ban driverless cars? Maybe, but that’s a hard call. What’s the rationale for that? I think these are the huge challenges that engineers have to own; but I’m not sure they know they should. (Interviewee #10).

3.8 Pressure to Conform

One interviewee expressed a deep fear that constant data collection about people, combined with analysis of that data to allocate goods and opportunities, would put profound pressure on individuals to conform to whatever behaviors they think will please the algorithmic decision-maker:

[M]y biggest fear, which is almost Orwellian, is that . . . [a]t some point, we as individuals will begin to realize that we are being observed. And everything about our behaviors and our patterns of behaviors are being understood, compiled, inferences are being created. And there will be a point in time in the near future . . . where we're going to internalize that. And you know what's going to happen . . . we are going to be the person we think people want us to be all the time. And what impact is that going to have on creativity? What impact is that going to have in ultimately funneling us all down into behavior that we believe or, worse case, know that we must conform to? . . . What impact is that going to have on society? On culture? On us as individuals? It scares the hell out of me. And it's happening right now. (Interviewee #21).

3.9 Intentional, Harmful Use of Analytics

Some companies worried that customers or others would use their analytic tools for morally problematic ends. For example, one company had an internal debate about whether to sell its technology to customers that might have ties to the Chinese government, which could use the technology to build facial recognition tools capable of identifying members of the Uighur minority (Interviewee #2).Footnote 4 In a well-publicized 2018 incident, thousands of Google employees signed a letter protesting the company’s work on a Pentagon pilot program, Project Maven, which used machine learning to interpret drone imagery and, potentially, to better target drone strikes against suspected terrorists or other individuals (Wakabayashi and Shane 2018). The letter expressed the employees’ view that “Google should not be in the business of war.” A few months later, Google announced that it would cease its involvement with the controversial Pentagon program (Harwell 2018).

The difficult question is where to draw the line. One interviewee described an employee complaint about the company’s analytic work for a cosmetics manufacturer. “[S]omebody sent an email to me and they said, ‘What good does it do the world to perpetuate working with companies whose primary mission is to make women feel bad about how they look?’ I thought about the question, it’s not really a civil liberties or privacy question, but we didn’t feel like we should ignore it, so we started having a conversation... but this was interesting: how do we evaluate?” (Interviewee #10). The lines are not clear. Even the question of whether to do advanced analytics work for the Pentagon has no obvious answer. Several months after Google’s announcement on Project Maven, Microsoft and Amazon separately affirmed their willingness to contribute to the Department of Defense’s AI efforts (Gregg 2018).

While it is commonplace today to think about the harm that advanced analytics and AI can create, that was not as common a view at the time of the interviews and survey on which this book is based. As the above discussion makes clear, the companies that we studied were aware of these risks. That, in turn, raised the question of what a company should do to address them. Was compliance with existing legal requirements sufficient? Or should a company go beyond this? It is to that question that we now turn.