As we addressed in preceding chapters,1 data, or “experience” as Holmes referred to the inputs in law, influences decision-making in a number of ways. It might influence decision-making in such a way that the decision made is illegal, immoral, unethical, or undesirable on some other grounds. Both legal decision-making and machine learning have struggled with what to do when presented with data that might influence decision-making in such a way. Courts have attempted to exclude particular pieces of data, what we will call “bad evidence,” from the decision-making process altogether.2 Exclusion before entry into the process removes the problem: data that doesn’t enter doesn’t affect the process. However, exclusion may also introduce a problem. In both legal settings and in machine learning, particular pieces of evidence or data might have undesired effects, but the same inputs might assist the process or even be necessary to it. It also might be that no mechanism exists that will reliably exclude only what we aim to exclude. So exclusion, or its collateral effects, may erode the efficacy or integrity of the process. Lawyers refer to the “probative value” of a piece of evidence, an expression they use to indicate its utility to the decision process—even when a risk exists that the evidence might have undesired effects.3 The vocabulary of machine learning has no such received term, but data scientists, as we described in Chapter 3, know well that the data that trains the machine is essential to its operation.

Exclusion is not the only strategy courts have used to address bad evidence and its kindred problem, bias. Another is to restrain the inferences that the decision-maker draws from certain evidence that might otherwise have undesirable effects on decision-making. This strategy entails an adjustment to the inner workings of the process of decision itself.

Finally, restraint may be imposed at a later stage. For example, courts review outputs (verdicts and judgments) and, if they’re not in accord with certain rules, strike them down, which, in turn, means that the instruments of public power will not act on them. In that strategy, a decision-making mechanism—for example, a jury, and as much might be said of a machine learning system—was not under inferential restraint (or it was, but it ignored the restraint); the output the mechanism gives is, on review, unacceptable in some way; and so the output is not used. The restraint did not operate within the mental or computational machinery that generated an output but, instead, upon those persons or instrumentalities who otherwise would have applied the output in the world at large. Having discerned the defect in the output, they don’t apply it.

We turn now to consider more closely the problem of bad evidence; the limits of evidentiary exclusion as a strategy for dealing with bad evidence in machine learning; and the possibility that restraining the inferences drawn from data and restraining how we use the outputs that a machine reaches from data—strategies of restraint that have antecedents in jurisprudence—might be more promising approaches to the problem of bias in the machine learning age.

8.1 The Problem of Bad Evidence

As an Associate Justice of the Supreme Court, Holmes had occasion in Silverthorne Lumber Co. v. United States4 to consider a case of bad evidence. Law enforcement officers had raided a lumber company’s premises “without a shadow of authority” to do so.5 It was uncontested that, in carrying out the raid and taking books, papers, and documents from the premises, they had breached the Fourth Amendment, the provision of the United States Constitution that protects against unreasonable searches and seizures. The Government then sought a subpoena that would authorize its officers to seize the documents they had earlier seized illegally. Holmes, writing for the Supreme Court, said that “the knowledge gained by the Government’s own wrong cannot be used by it in the way proposed.”6 As a result, the Government would not be allowed to use the documents7; it would not be allowed to “avail itself of the knowledge obtained by that means.”8 Obviously, no judge could efface the knowledge actually gained and thus lodged in the minds of the government officers concerned. The solution was instead to place a limit on what those officers were permitted to do with the knowledge: they were forbidden from using it to evade the original exclusion.

The “fruit of the poisonous tree,” as the principle of evidence applied in Silverthorne Lumber Co. came to be known, is invoked in connection with a range of evidentiary problems. Its distinctiveness is in its application to “secondary” or “derivative” evidence9—i.e., evidence such as that obtained by the Government in Silverthorne Lumber on the basis of evidence that had earlier been excluded. Silverthorne and, later, Nardone v. United States, where Holmes’s friend Felix Frankfurter gave the principle its well-known name, concerned a difficult question of causation. This is the question, a recurring one in criminal law, whether a concededly illegal search and seizure was really the basis of the knowledge that led to the acquisition of new evidence that the defendant now seeks to exclude. In the second Nardone case, Justice Frankfurter writing for the Court reasoned that the connection between the earlier illegal act and the new evidence “may have become so attenuated as to dissipate the taint.”10 But if the connection is close enough, if “a substantial portion of the case against him was a fruit of the poisonous tree,”11 then the defendant, as of right, is not to be made to answer in court for that evidence.12 That evidence, if linked closely enough to the original bad evidence, is bad itself.

We have noted three strategies for dealing with bad evidence: one of these is to cut out the bad evidence and so prevent it from entering the decision process in the first place. In a judicial setting, this strategy, which we will call data pruning,13 is to rule certain evidence inadmissible. It is a complete answer, when you have an illegal search and seizure, to the question of what to do with the evidence the police gained from that search. You don’t let it in. A different strategy is called for, however, if bad evidence already has entered some phase of a decision process. Judges are usually concerned here with the jury’s process of fact-finding. On close reading, one sees that Holmes in Silverthorne Lumber was concerned with the law enforcement officers’ process of investigation. In regard to either process, and various others, a strategy is called for that restrains the inferences one draws from bad evidence. We will call that strategy inferential restraint. Finally, and further down the chain, where a decision or other output might be turned into practical action in the world at large, a further sort of restraint comes into play: restraint upon action. We will call this variant of restraint executional restraint. Data pruning and the two variants of restraint, all familiar since Holmes’s day in American courtrooms, have surfaced as possible strategies to address the problems that arise with data in machine learning. We will suggest, given the way machine learning works, that data pruning and strategies of restraint are not equally suited to address those problems.

8.2 Data Pruning

Excluding bad evidence from a decision process has at least two aims. For one, it has the aim of deterring impermissible practices by those who gather evidence, in particular officials with police powers. Courts exclude evidence “to compel respect for the constitutional guaranty [i.e., against warrantless search and seizure] in the only effectively available way—by removing the incentive to disregard it.”14 For another, it has the aim of preventing evidence from influencing a decision, if the evidence tends to produce unfair prejudice against the party subject to the decision. In machine learning, the first of these aims—deterring impermissible data-gathering practices—is not absent. It is present in regulations on data protection.15 Our main focus here is on the second aim: preventing certain data from influencing the decision.16 Data pruning is the main approach to achieving that aim in judicial settings.

Data pruning avoids thorny questions of logic, in particular the problem of attenuated causation. Just what inferences did the jury draw from the improper statements or evidence? Just what inferences did the police draw from the evidence gained from the unlawful search? And how did any such inferences affect future conduct (meaning future decision)? It is better not to have to ask those questions. This is a salient advantage of data pruning. It obviates asking, as Justice Frankfurter had to, whether the link between the bad evidence and the challenged evidence has “become so attenuated as to dissipate the taint.”17

Data pruning has the related advantage that, if the bad data is cut away before the decision-maker learns of it, the decision-maker does not have to try not to think about something that she already knows. Data pruning avoids the problem that knowledge gained cannot be unlearnt. As courts have observed, one “cannot unring a bell.”18 The cognitive problem involved here is also sometimes signaled with the command, “Try not to think of an elephant.” By deftly handling evidentiary motions, or where needed by disciplining trial counsel,19 the judge cuts out the elephant before anybody has a chance to ask the jury not to think about it.

Machine learning has a fundamental difficulty with data pruning. To make a meaningful difference to the learnt parameters, and thus to the eventual outputs when it comes time to execute, you need to strike out huge amounts of data. And, if you do that, you no longer have what you need to train the machine. Machines are bad at learning from small amounts of data; nobody has figured out how to get a machine to learn, as a human infant can, from a single experience. Nor has anybody, at least yet, found a way to take a scalpel to datasets; there’s no way, in the state of the art, to excise “bad” data reliably for purposes of training a machine.20 Accordingly, data pruning is anathema to computer scientists.21
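
To make the first of these points concrete, here is a minimal sketch (our own illustration on synthetic data, using the scikit-learn library; all numbers are invented for the example). It fits a simple model twice, once on the full dataset and once with a small fraction of “bad” rows cut out, and compares the learnt parameters.

```python
# Illustrative only: synthetic data, not any real case or dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = (X @ np.array([1.0, -0.5, 0.3, 0.0, 0.2]) + rng.normal(size=10_000) > 0).astype(int)

full = LogisticRegression().fit(X, y)

# Suppose we could somehow identify 1% of the rows as "bad" and cut them out.
keep = np.ones(len(X), dtype=bool)
keep[rng.choice(len(X), size=100, replace=False)] = False
pruned = LogisticRegression().fit(X[keep], y[keep])

# The learnt parameters barely move; the model's behaviour is essentially unchanged.
print(np.abs(full.coef_ - pruned.coef_).max())
```

Pruning at a scale one could realistically hope to identify barely moves the parameters; cutting enough rows to change them appreciably would mean cutting away much of the training material itself.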

As for legal proceedings, data pruning is, as we said, a complete answer to the problem it addresses—in situations in which the data is pruned before a decision-maker sees it. As we noted, however, not all improper evidence stays out of the courtroom. Nor does all knowledge gained from improper evidence—fruit of poisonous trees—stay out. Once it enters, which is to say once a decision-maker, such as a juror, has learned it, its potential for mischief is there. You cannot undo facts. They exist. Experience is a fact. Things that have been experienced, knowledge that has been gained, do not disappear by fiat.

A formalist would posit that the only facts that affect the trial process are those that the filters of evidentiary exclusion are designed to let in. As we have discussed, however, Holmes understood the law, including the results of trials, to derive from considerably more diverse material. Juries, lawyers, and judges all come with their experiences and their prejudices. To Holmes, these were a given, which is why he thought trying to compel decision-makers “to testify to the operations of their minds in doing the work entrusted to them” was an “anomalous course” and fruitless.22 You cannot simply excise the unwanted experience from someone’s mind—any more than present-day computer scientists have succeeded in cutting the “bad” data from the training dataset.

8.3 Inferential Restraint

What you can do—however imperfect a strategy it may be—is place limits on what you allow yourself, the jury, the machine, or the judge to infer from the data or the experience. Inferential restraint is familiar in both law and machine learning. In efforts to address the problem of bad evidence (bad data) in machine learning, most of the energy has indeed been directed toward this approach: instead of pruning the data, computer scientists are developing methods to restrict the type of inferential outputs that the machine is able to generate.23
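
One simple member of this family of methods can be sketched as follows (a toy illustration of our own, not drawn from any cited system; the function name, the group labels, and the target rate are assumptions made for the example). The machine’s scores are left untouched, but the inference drawn from them, the decision whether to flag, is constrained so that each group is flagged at the same rate.

```python
# Toy sketch of an inferential restraint: the scores are left as the machine
# learnt them, but the inference drawn from them is constrained so that each
# group is flagged at (roughly) the same rate.
import numpy as np

def restrained_decisions(scores, group, target_rate=0.2):
    """Choose a per-group cutoff so that every group is flagged at target_rate."""
    decisions = np.zeros_like(scores, dtype=bool)
    for g in np.unique(group):
        mask = group == g
        cutoff = np.quantile(scores[mask], 1.0 - target_rate)
        decisions[mask] = scores[mask] > cutoff
    return decisions

# Example: scores that happen to run higher for group 1 than for group 0.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1_000)
scores = rng.normal(loc=group * 0.5, size=1_000)
flags = restrained_decisions(scores, group)
print(flags[group == 0].mean(), flags[group == 1].mean())  # approximately equal
```

Notice that applying the restraint requires the protected attribute to be available to the system, a point we return to in note 23.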

In the legal setting, placing restrictions upon inferences has been an important strategy for a long time. Judges’ instructions to juries serve that purpose; appeals courts recognize that judges’ instructions, properly given, have curative effect.24 Judges, in giving curative instructions, understand that, even when bad evidence of the kind addressed in Silverthorne and Nardone (evidence seized in violation of a constitutional right) has been stopped before it gets to the jury, there still might be knowledge in the jurors’ minds that could exercise impermissible effects on their decision. The jurors might have gained such knowledge from a flip word in a lawyer’s closing argument.25 They might have brought it in with them off the street in the form of their life experiences; Holmes understood juries to have a predilection for doing just that.26 Knowledge exists which is to be kept from affecting verdicts, if those verdicts are to be accepted as sound. But some knowledge comes to light too late to prune. There, instead, a cure is to be applied. In the courtroom, the cure takes the form of an instruction from the judge. The instruction tells the jurors to restrain the inferences they draw from certain evidence they have heard. The restraint is intended to operate in the mental machinery of each juror.

A further situation that calls for inferential restraint is that in which some piece of evidence has probative value and may be used for a permissible purpose, but a risk exists that a decision-maker might use the evidence for an impermissible purpose. Pruning the evidence would have a cost: it would entail losing the probative value. Thus, as judges tell jurors to ignore certain experiences that they bring to the courtroom and certain bad evidence or statements that, despite best efforts, have entered the courtroom, so do judges guide jurors in the use of knowledge that the court deliberately keeps.27 Here, too, analogous approaches are being explored in machine learning.28

8.4 Executional Restraint

From Holmes’s judgment in Silverthorne, one discerns that a strategy of restraint operates not just on the mental processes of the people involved at a given time but also on their future conduct and decisions. Silverthorne was a statement to the government about how it was to use knowledge. True, the immediate concern was to cut out the bad evidence root and branch, to keep it from undermining judicial procedure and breaching a party’s constitutional rights. Data pruning is what generations of readers of Silverthorne understand it to have done; the principle of the fruit of the poisonous tree, more widely, is indeed read as a call for getting rid of problematic inputs.29

There is more to the principle of the fruit of the poisonous tree, however, than data pruning. Consider closely what Holmes said in Silverthorne: “the knowledge gained by the Government’s own wrong cannot be used by it in the way proposed” (emphasis added). So the “Government’s own wrong” already had led it to gain certain knowledge. Holmes was not proposing the impossible operation of cutting that knowledge from the government’s mind. The time for pruning had come and gone. Holmes was proposing, instead, to restrain the Government from executing future actions that it might otherwise have executed on the basis of that knowledge: knowledge gained by the Government’s wrong was not to be “used by it.” The poisonous tree (to use Frankfurter’s expression) addresses a state of the world after the bad evidence has already generated knowledge. The effect of that knowledge on future conduct is what is to be limited. That is to say, executional restraint, the strategy of restricting what action it is permissible to execute, inheres in the principle.30
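
Carried over to a machine learning setting, executional restraint operates not inside the model but at the point where its output would be acted upon. The sketch below is schematic and entirely our own; every name in it (the Output record, the tainted-input flag, the act and defer_to_human callables) is a hypothetical placeholder rather than a reference to any real system.

```python
# Schematic sketch of executional restraint: the system may have computed an
# output, but whether that output is acted upon is gated downstream.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Output:
    score: float
    derived_from_tainted_input: bool  # hypothetical flag set upstream

def execute(output: Output,
            act: Callable[[Output], object],
            defer_to_human: Callable[[Output], object]):
    # The restraint operates on action, not on the computation that produced
    # the output: knowledge the system has gained is not to be "used by it"
    # where it traces back to a tainted source.
    if output.derived_from_tainted_input:
        return defer_to_human(output)
    return act(output)
```

As in Silverthorne, the restraint does not reach back into the process that produced the output; it governs only what may be done with the result.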

8.5 Poisonous Pasts and Future Growth

Seen in this, its full sense, the principle of the fruit of the poisonous tree has high salience for machine learning, in particular as people seek to use machine learning to achieve outcomes society desires. A training dataset necessarily reflects a past state of affairs.31 The future will be different. Indeed, in many ways, we desire the future to be different, and we work toward making it so in particular, desirable ways. But change, as such, doesn’t require our intervention. Even if we separate ourselves from our desires for the future, from values that we wish to see reflected in the society of tomorrow, it is a matter of empirical observation, a fact, that the future will be different. Thus, either way, whether or not our values enter into it, we err if we rely blindly on a mechanism whose outputs are a faithful reflection of the inputs from the past that shaped it. We must therefore restrain the conclusions that we draw from those outputs, and the actions we take, or else we will be getting the future wrong.

In machine learning, there is widespread concern about undesirable correlations. An example could be supplied by a machine that hands out prison sentences. The machine is based on data. The data is a given. Americans of African ancestry have received a disproportionate number of prison sentences. Trained on that data, a machine will give reliable results: it will give results that reliably project the past state of affairs onto its future outputs. African-Americans will keep getting a disproportionate number of prison sentences. Reliability here has no moral valence in itself; it connotes no right or wrong. It is simply a property of the machine. The reason society objects to reliability of this kind, when considering an example as obvious as the prison sentencing machine, is that this reliability stems from data collected under conditions that society hopes will not pertain in the future. We want to live under new conditions. We do not want a machine that perpetuates the correlations found in that data and thus perpetuates (if we obey the machine) the old conditions. Some computer scientists think there may be ways to address this concern about undesirable correlations by pruning the training dataset.32 We mentioned the technical challenges this presents for machine learning. We speculate that the other strategies will be as important in machine learning as they have been in law: restrain the inferences and actions that derogate from the values we wish to protect. That’s how we increase the chances that we’ll get the future right.
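
The point can be made concrete in a few lines (a purely synthetic illustration of our own, not real sentencing data; the group labels, coefficients, and sample size are invented). A model fitted to outcomes that were skewed against one group reliably reproduces that skew in what it predicts going forward.

```python
# Synthetic illustration only: a model trained on historically skewed outcomes
# reproduces the skew, "reliably", in its own predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 20_000
group = rng.integers(0, 2, size=n)     # 1 = the group historically over-sentenced
conduct = rng.normal(size=n)           # the thing we would like decisions to turn on
# Past outcomes depended on conduct AND on group membership.
past_sentenced = (conduct + 0.8 * group + rng.normal(size=n) > 0.5).astype(int)

model = LogisticRegression().fit(np.column_stack([conduct, group]), past_sentenced)
preds = model.predict(np.column_stack([conduct, group]))

print("predicted rate, group 0:", preds[group == 0].mean())
print("predicted rate, group 1:", preds[group == 1].mean())  # markedly higher
```

The machine is doing exactly what it was trained to do; the disparity in its predictions is the reliability described above.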

Notes

  1.

    See in particular Chapters 6 and 7, pp. 67 ff, 81 ff.

  2.

    Silverthorne Lumber Co. et al. v. United States, 251 U.S. 385, 392 (1920, Holmes, J.); Nardone et al. v. United States, 308 U.S. 338, 342 (1939, Frankfurter, J.).

  3.

    For an exposition of the concept of probative value by reference to principles of probability, see Friedman, A Close Look at Probative Value, 66 B.U. L. Rev. 733 (1986).

  4.

    Op. cit.

  5.

    251 U.S. at 390.

  6.

    Id. at 392.

  7.

    Id.

  8.

    Id.

  9.

    See Pitler, “The Fruit of the Poisonous Tree” Revisited and Shepardized, 56 Cal. L. Rev. 579, 581 (1968). Justice Frankfurter called it evidence “used derivatively”: Nardone et al. v. United States, 308 U.S. 338, 341 (1939, Frankfurter, J.). Cf. Wong Sun v. United States, 371 U.S. 471, 484 (1963) (Brennan, J.), noting that “[t]he exclusionary prohibition extends as well to the indirect as the direct products of such invasions [of a premises in breach of constitutional right].” See further Brown (Gen. Ed.), McCormick on Evidence (2006) § 176 pp. 292–94.

  10.

    308 U.S. at 342.

  11.

    Id. at 341. As to the sufficiency of connection, see Kerr, Good Faith, New Law, and the Scope of the Exclusionary Rule, 99 Geo. L. J. 1077, 1099–1100 (2011). Cf. Devon W. Carbado, From Stopping Black People to Killing Black People: The Fourth Amendment Pathways to Police Violence, 105 Cal. L. Rev. 125, 133–35 (2016).

  12.

    Undesirable outcomes from a machine learning process are shot through with questions of causation—e.g., is it appropriate to hold accountable the computer scientist who engineered a machine learning system, when an undesirable outcome is traceable back to her conduct, if at all, only by the most attenuated lines? Regarding the implications for tort law, see, e.g., Gifford, Technological Triggers to Tort Revolutions: Steam Locomotives, Autonomous Vehicles, and Accident Compensation, 11 J. Tort Law 71, 143 (2018); Haertlein, An Alternative Liability System for Autonomous Aircraft, 31 Air & Space L. 1, 21 (2018); Scherer, Regulating Artificial Intelligence Systems: Risks, Challenges, Competences, and Strategies, 29 Harv. J.L. & Tech. 353, 363–366 (2016); Calo, Open Robotics, 70 Md. L. Rev. 571, 602 (2011). Writers have addressed causation problems as well in connection with international legal responsibility and autonomous weapons: see, e.g., Burri, International Law and Artificial Intelligence, 60 GYIL 91, 101–103 (2017); Sassóli, Autonomous Weapons and International Humanitarian Law: Advantages, Open Technical Questions and Legal Issues to Be Clarified, 90 Int’l L. Stud. 308, 329–330 (2014).

  13.

    In the computer science literature, the expression “data pruning” has been associated with cleaning noisy datasets in order to improve performance. See, e.g., Anelia Angelova, Yaser S. Abu-Mostafa & Pietro Perona, Pruning Training Sets for Learning of Object Categories, in CVPR 2005 (San Diego, June 20–25, 2005), vol. 1, IEEE, 494–501.

  14.

    Mapp v. Ohio, 367 U.S. 643, 656 (1961).

  15.

    See for example Meriani, Digital Platforms and the Spectrum of Data Protection in Competition Law Analysis, 38(2) Eur. Compet. L. Rev. 89, 94–95 (2017); Quelle, Enhancing Compliance Under the General Data Protection Regulation: The Risky Upshot of the Accountability- and Risk-Based Approach, 9 Eur. J. Risk Regul. 502, 524–525 (2018).

  16.

    “Bad evidence” is thus of broadly two types. (i) Evidence may be bad because the manner of its collection is undesirable. That type of bad evidence might have raised no problem, if its collection had not been tainted. (ii) The other type is bad, irrespective of how the evidence collector behaved. It is bad, because it poses the risk of an invidious influence on the decision process itself.

  17.

    Doctrinal writers on evidence have struggled to articulate how to determine whether the link between bad evidence and challenged evidence is attenuated enough to “dissipate the taint.” Clear enough is the existence of an exception to the fruit of the poisonous tree. Unclear is when the exception applies. Here the main treatise on American rules of evidence has a go at an answer:

    This exception… does not rest on the lack of an actual causal link between the original illegality and the obtaining of the challenged evidence. Rather, the exception is triggered by a demonstration that the nature of that causal link is such that the impact of the original illegality upon the obtaining of the evidence is sufficiently minimal that exclusion is not required despite the causal link. Brown (Gen. Ed.), McCormick on Evidence (2006) § 179 p. 297.

    Note the circularity: the exclusion “exception is triggered” (i.e., the exclusion is not required) when the “exclusion is not required.” The hard question is what precisely are the characteristics that give a causal link such a “nature” that it is “sufficiently minimal.”

  18.

    Dunn v. United States, 307 F.2d 883, 886 (Gewin, J., 5th Cir., 1962). Courts outside the U.S. have used the phrase too: Kung v. Peak Potentials Training Inc., 2009 BCHRT 154, 2009 CarswellBC 1147 para 11 (British Columbia Human Rights Tribunal, Apr. 23, 2009).

  19.

    See for example Fuery et al. v. City of Chicago, 900 F. 3d 450, 457 (Rovner, J., 7th Cir., 2018).

  20.

    Broadly speaking, there are two ways to prune a dataset: removing items from the dataset (rows), for example to remedy problems of unbalanced representation, or removing a sensitive attribute from the dataset (a column). It has been widely observed that removing a sensitive attribute is no use if that attribute may be more or less reliably predicted from the remaining attributes, as the sketch below illustrates. Removing items is also tricky: for example, the curators of the ImageNet dataset, originally published in 2009 (see Prologue, p. xiii, n. 23), were as of 2020 still playing whack-a-mole to remedy issues of fairness and representation. See Yang et al. (2019).
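
    A minimal sketch of the proxy problem (entirely synthetic data; the column count and correlations are assumptions made for illustration): even with the sensitive column dropped, a simple classifier recovers it from the columns that remain.

```python
# Sketch of the proxy problem on synthetic data: the "removed" sensitive
# attribute is recoverable from the remaining columns.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 10_000
sensitive = rng.integers(0, 2, size=n)
# The remaining columns correlate with the sensitive one (think zip code, income).
remaining = rng.normal(size=(n, 3)) + sensitive[:, None] * np.array([1.0, 0.5, 0.0])

accuracy = cross_val_score(LogisticRegression(), remaining, sensitive, cv=5).mean()
print(f"sensitive attribute recovered with ~{accuracy:.0%} accuracy")  # well above 50%
```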

  21.

    Of course we don’t mean that computer scientists find the goal that motivates data pruning efforts morally or ethically objectionable. Rather, data pruning, a blunt instrument perhaps acceptable as a stop-gap, is at odds with how machine learning works.

  22.

    Coulter et al. v. Louisville & Nashville Railroad Company, 25 S.Ct. at 345, 196 U.S. at 610 (1905).

  23.

    For a cutting-edge illustration, see Madras et al. (2018). What is particularly interesting about their approach is that, in order to guarantee that the machine learning system’s inferences are unbiased against individuals with some protected attribute x, that attribute must be available to the machine. This illuminates why computer scientists are uneasy about data pruning.

  24.

    See Leonard (ed.), New Wigmore (2010) § 1.11.5 p. 95 and see id. 95–96 n. 57 for judicial comment. The standard for establishing that limiting instructions have failed is exacting. See, e.g., Encana Oil & Gas (USA) Inc. v. Zaremba Family Farms, Inc. et al., 736 Fed. Appx. 557, 568 (Thapar, J., 6th Cir., 2018).

  25.

    A problem addressed repeatedly by U.S. courts. See, e.g., Dunn v. United States, 307 F.2d 883, 885–86 (Gewin, J., 5th Cir. 1962); McWhorter v. Birmingham, 906 F.2d 674, 677 (Per Curiam, 11th Cir. 1990). A substantial literature addresses jury instructions, including from empirical angles. See, e.g., Mehta Sood, Applying Empirical Psychology to Inform Courtroom Adjudication: Potential Contributions and Challenges, 130 Harv. L. Rev. F. 301 (2017).

  26.

    See Chapter 7, p. 81 ff. See also Liska, Experts in the Jury Room: When Personal Experience is Extraneous Information, 69 Stan. L. Rev. 911 (2017).

  27.

    Appeals courts consider such instructions frequently. For a recent example, see United States v. Valois, slip op. pp. 13–14 (Hull, J., 2019, 11th Cir.). Cf. Namet v. U.S., 373 U.S. 179, 190, 83 S.Ct. 1151, 1156 n. 10 (Stewart, J., 1963).

  28.

    See Madras et al., op. cit.

  29.

    That reading is seen in judgments, including (perhaps particularly) judgments of foreign courts when they observe that “fruit of the poisonous tree” is not part of their law. See, e.g., Z. (Z.) v. Shafro, 2016 ONSC 6412, 2016 CarswellOnt 16284, para 35 (Kristjanson, J., Ontario Superior Court of Justice, Oct. 14, 2016). Some foreign courts do treat the doctrine as part of their law and apply a similar reading. See, e.g., Dela Cruz v. People of the Philippines (2016) PHSC 182 (Leonnen, J., Philippines Supreme Court, 2016), with precedents cited at Section III, n. 105. See the comparative law treatment of the principle by Thaman, “Fruits of the Poisonous Tree” in Comparative Law, 16 Sw. J. Int’l L. 333 (2010).

  30.

    Executional restraint and inferential restraint, as we stipulate the concepts, in some instances overlap, because an execution that is to be restrained might be a mental or computational process of inference. Overlap is detectable in Silverthorne Lumber. The Government, Holmes said, was to be restrained in how it used the knowledge that it had gained through an illegal search and seizure. The use from which Holmes called the Government to be restrained was equally the Government’s reasoning about where to go in search of evidence and the physical actions it would execute in the field. Restraint has both aspects as well where one is concerned, instead of with preventing people from using knowledge to generate more knowledge, with preventing a machine learning system from using an input to generate more outputs. The overlap arises in machine learning between the two variants of restraint because machine learning systems (at least in the current state of the art) don’t carry on computing with new inputs unless some action is taken to get them to execute. The executional restraint would be to refrain from switching on the machine (or, if its default position is “on,” then to switch the machine off).

    The overlap is also significant where human institutions function under procedures that control who gets what information and for what purposes. Let us assume that there is an institution that generates decisions with a corporate identity—i.e., decisions that are attributable to the institution, rather than to any one human being belonging to it. Corporations and governments are like that. Let us also assume that, in order to generate a decision that bears the corporate identity, two or more human beings must handle certain information; and one of them, or some third person, has the power to withhold that information. The person having the withholding power may place a restraint upon the institution: she may withhold the information and, thus, the institution cannot carry out the decision process. The restraint in this setting has overlapping characteristics. It is inferential, in that it restrains the decision process; it is executional, in that it restrains the actions of the individual constituents of the institution.

  31.

    See Chapter 3, p. 37.

  32.

    Chouldechova & Roth, op. cit., Section 3.4 p. 7. Cf. Paul Teich, Artificial Intelligence Can Reinforce Bias, Forbes (Sept. 24, 2018) (referring to experts who “say AI fairness is a dataset issue”).