Developers from historically marginalized groups often face more rejection and pushback than their peers. To address this problem, at Google we’ve been experimenting with anonymous author code review (AACR) as a way to combat the biases that all people have. In this chapter, we describe our experience designing, implementing, deploying, and using anonymous author code review over the last few years, covering who, how, when, and where to anonymize in a modern code review system. Our experience suggests that the design space for anonymous code review systems is wide and that implementations are not trivial.

Introduction

Prior work [4, 5, 7] has shown that code authors from historically marginalized groups face more rejection and more pushback on their changelists (CLs), also referred to as pull requests, than their majority peers, both in open source software and in industry:

  • On GitHub, Terrell and colleagues showed that code review acceptance rates differ between men and women, depending on whether reviewers can infer their gender from their GitHub profiles, that is, based on their username or profile photo. When the gender of a non-owner of a project is apparent, women have lower pull request acceptance rates than men [7]. When the gender is harder to infer, that trend is reversed: women actually have higher acceptance rates.

  • Based on race and ethnicity inferred from GitHub users’ names, Nadri and colleagues found that perceptibly White developers tended to have higher pull request acceptance rates than non-White developers [5].

  • In an industrial setting, we showed that men and White developers have lower odds of receiving pushback from their code reviewers [4]. In that work we also showed that older developers faced more pushback than younger ones.

Such research builds on the observation that it’s common for software developers to look at identity signals in a code author’s profile [1], such as their name or profile photo. In doing so, human biases likely come into play, resulting in the disparities in rejection and pushback observed in the research literature.

One way that prior work [4, 5] has suggested that such disparities can be mitigated is through anonymous code review, where information about a code change’s author is hidden from a reviewer, so that the reviewer may focus on the content of the changelist without being biased by the identity of the code author.

We have built two such anonymous author code review (AACR) features for code review inside of Google. In particular, the first author of this paper, a software engineer at Google, built a Chrome browser extension that provided AACR for those that installed it. The second author of this paper, a research scientist at Google, then evaluated the engineering impacts of this extension, described in a published experiment [4]. Since then, the first author implemented a new version of AACR directly in a code review tool (called Critique), which has been in production for more than two years. The third author of this paper has been studying usage of the Critique AACR implementation as a visiting scientist at Google.

While the idea of AACR seems straightforward – just remove the author’s name during a code review (Figure 19-1) – in this chapter we discuss some practical considerations in building and deploying this tool designed to increase equity in code review outcomes. We hope that our experiences will be helpful to toolsmiths and developers in other organizations that wish to implement anonymous code review. We structure the remainder of the chapter as follows: first, deciding who to anonymize; second, deciding how to deploy anonymization; third, deciding when to anonymize; and fourth, deciding where to anonymize.

Figure 19-1
A flowchart of the 9-step code review process. It starts with an issue created in the task-tracking tool, proceeds through CL creation, the author assigning reviewers to the CL, and the reviewer examining and approving the code, and ends with the author submitting the CL.

Flowchart showing an overview of the code review process at Google [6]

Who to Anonymize

Anonymous code reviews can take different forms. One option is to anonymize the author’s identity, to mitigate any biases that may emerge from social cues deciphered from the author’s profile or identity. The other option is double-anonymous review, where the reviewer’s identity is also hidden from the author. We believe the first option is better, since code reviewers act as gatekeepers and already occupy the position of power in this relationship. Further, there is a possibility that reviewer anonymity can lead to more destructive criticism, which is already a concern in open source [2].

How to Deploy

There are different levels at which AACR can be deployed, ranging from the individual to the entire organization. Here we discuss several approaches.

Authors: One AACR option is to let authors choose to anonymize particular changelists, or to have all their changes anonymized. We did not choose this option because reviewers may interpret this as authors not trusting them to evaluate their code objectively. Additionally, if people from marginalized communities are the ones who face pushback and turn this option on, this by itself would signal their identity, thereby perpetuating biases.

Reviewers: Another AACR option is to allow reviewers to opt in to reviewing all incoming changelists with the author identity hidden. This has the benefit of avoiding confusion when the author’s identity is hidden in a changelist, since the reviewer is aware they opted into this feature. We allowed reviewers to “break the glass” and reveal the author’s identity in cases where more context is needed. This ensures that AACR does not block a review from moving forward. One drawback with this approach is that reviewers who are aware of unconscious biases in code reviews – and try to combat them – may be more likely to opt in, yet those who do not opt in would arguably benefit more.
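The reviewer opt-in and “break the glass” behavior can be sketched roughly as follows. This is a minimal illustration, not Critique’s actual implementation; all names (`ReviewView`, `ANON_NAME`, and so on) are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical placeholder identity; the real tool uses anonymous animals.
ANON_NAME = "Anonymous Panda"

@dataclass
class ReviewView:
    author: str
    reviewer: str
    opted_in: bool = False   # reviewer opted in to AACR
    revealed: bool = False   # reviewer has "broken the glass"

    def displayed_author(self) -> str:
        # Hide the author only for opted-in reviewers who have not
        # explicitly revealed the identity.
        if self.opted_in and not self.revealed:
            return ANON_NAME
        return self.author

    def break_the_glass(self) -> str:
        # Reveal the author when more context is needed, so AACR never
        # blocks a review from moving forward.
        self.revealed = True
        return self.author
```

Because the reveal is reviewer-initiated and one-way, the reviewer keeps control while the author’s identity is hidden by default.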

Leadership: AACR could also be deployed in a top-down manner, at the behest of leadership. However, our prior experiment suggests that developers tend to prefer to retain control of whether or not they use anonymous code review [3]. Furthermore, our experience has been that there are two “chicken and egg” problems with a top-down approach. First, missing desirable features – such as indications of what time zone an anonymized author is in – tend to be more of a blocker for organizations than for self-selecting individuals. Yet significant feature investments in AACR are less justifiable without an existing, significant user base. Second, organizations tend to want evidence that AACR will solve inclusion problems, but a large user base is necessary to detect statistically significant effects.

Review types: This feature can be enabled for specific types of reviews, rather than being dependent on who the author or reviewer of the changelist is. One type of review that seems appropriate for anonymizing this way is the large-scale changelist, where an author makes a change across the entire codebase and sends out small reviews to each team whose code is impacted. Examples of large-scale changelists include changing an API method name or upgrading a specific build dependency. Reviewing this type of changelist is much less complex than reviewing changelists that modify behavior or features, so the identity of the author should not be relevant. Google readability reviews are another example of a type of review that is appropriate for always anonymizing the author. These reviews are performed after a changelist has already been approved by a teammate, as shown in step 8 of Figure 19-1, and the review is limited to ensuring the code conforms to language-specific best practices.
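A per-review-type policy like the one above amounts to a simple predicate over the kind of review being performed. The sketch below is illustrative only (the `ReviewType` enum and its values are our own labels, not Critique’s):

```python
from enum import Enum, auto

class ReviewType(Enum):
    REGULAR = auto()      # ordinary team code review
    LARGE_SCALE = auto()  # codebase-wide change split into small per-team CLs
    READABILITY = auto()  # post-approval language best-practices review

# Review types whose authors are always anonymized, independent of who
# the author or the reviewer is.
ALWAYS_ANONYMOUS = frozenset({ReviewType.LARGE_SCALE, ReviewType.READABILITY})

def anonymize_by_type(review_type: ReviewType) -> bool:
    return review_type in ALWAYS_ANONYMOUS
```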

When to Anonymize

There are several ways one can implement AACR so that changelists are anonymized and deanonymized at various times.

During review: A common concern raised with anonymizing the author of changelists is that developers will be less familiar with what work their colleagues are doing. To mitigate this issue, we implemented AACR such that as soon as the code is merged into the codebase, at step 9 of Figure 19-1, the identity of the author is no longer anonymized. We had considered going a step further and revealing the identity of the author as soon as the reviewer gave their mark of approval for the change to be merged (step 7 of Figure 19-1), but we found it is a relatively common practice to give approval before all the review comments are resolved. This is especially common when reviews are done across time zones and reviewers don’t want to slow down the author.
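The deanonymization timing described above reduces to a check on the changelist’s lifecycle state. A minimal sketch, with hypothetical state names keyed to the steps of Figure 19-1:

```python
from enum import Enum, auto

class CLState(Enum):
    IN_REVIEW = auto()  # review in progress
    APPROVED = auto()   # step 7: approved, but comments may still be open
    MERGED = auto()     # step 9: merged into the codebase

def author_hidden(state: CLState, reviewer_opted_in: bool) -> bool:
    # Identity stays hidden through approval, since approval is often
    # granted before all review comments are resolved, and is revealed
    # only once the CL is merged.
    return reviewer_opted_in and state is not CLState.MERGED
```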

Figure 19-2
A screenshot of anonymous author code review in the Critique code review tool. The username displayed is an anonymous panda with the status unresolved. The page lists details of reviewers, CC, bugs, workspace, created, last snapshot, and released.

Implementation of AACR in the Critique Code Review Tool at Google

Human authors: After releasing this feature, we realized that we had anonymized changelists sent by robot authors. Large-scale refactorings are sometimes done by sending out automatically generated small changes, with the author set as a non-human account. It turned out that looking at the author was one of the primary ways reviewers determined that they were performing a review of a large-scale change. Before we implemented a fix to never anonymize non-human accounts, reviewers who had anonymization on would sometimes even try to communicate with the robot author in the code review tool, unaware it wasn’t a real person.
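The fix amounts to gating anonymization on whether the author is a human account. The suffix convention below is hypothetical (Google’s actual account metadata differs), but it illustrates the check:

```python
# Hypothetical naming convention for robot accounts; a real system
# would consult account metadata instead.
ROBOT_SUFFIXES = ("-bot", "-robot", "-automation")

def is_human_account(username: str) -> bool:
    return not username.endswith(ROBOT_SUFFIXES)

def should_anonymize(author: str, reviewer_opted_in: bool) -> bool:
    # Robot authors are never anonymized: seeing the robot account is a
    # primary signal that the CL is part of a large-scale change.
    return reviewer_opted_in and is_human_account(author)
```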

Where to Anonymize

The basic idea behind anonymous author code review is to hide the author’s identity from reviewers in the code review tool. Figure 19-2 shows a picture of how we replaced the author’s username at the top of a code review with an anonymous animal.

The author of a change request is not only available via the code review tool but also in a suite of tools related to it. Here are some places where the author of the change request is discoverable outside of the code review tool:

  • In a task-tracking tool, which is often linked from the changelist

  • In a tool that displays the results of running the predefined set of continuous integration tests for each proposed code change, which is always linked from the changelist

  • In an extension that notifies developers of any new tasks, code reviews, etc. that are assigned to them

  • In emails sent by the code review tool when there are updates to the review

Additionally, within the code review tool, there were places the author’s name appeared that we weren’t expecting:

  • In other changelists in the same chain of changelists

  • In the workspace (or branch) name

  • As part of the directory structure of the code being modified

  • In the code itself

Some of the items mentioned here have a clear best solution. For example, the name of the workspace (or branch) is not relevant to reviewing the content of the code, so it can either be omitted entirely or hidden when it contains the author’s name. However, many of these items have legitimate reasons to keep the author’s name visible. In the following sections, we outline the most contentious of these issues.

Chains of CLs

A relatively common practice within Google is to split code changes into small reviewable pieces and send them for review as a chain, or stack, of changelists. Each changelist in the stack links to the others in case the reviewer wants to see more context about the full change, as seen in Figure 19-3. If a reviewer of a changelist in such a stack has the author identity hidden, then the author’s identity should also be hidden when viewing any other changelists in the stack, to avoid deanonymizing the author when the reviewer is simply trying to get more context for their review. However, if any changelists in the stack are already merged into the codebase, the author’s identity will be revealed because anonymization is only applied during review as discussed previously.
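The resulting display rule for a stack can be sketched as follows; the types and names here are illustrative, not Critique’s:

```python
from typing import List, NamedTuple

class StackEntry(NamedTuple):
    cl_id: str
    author: str
    merged: bool

# Hypothetical placeholder identity; the real tool uses anonymous animals.
ANON_NAME = "Anonymous Panda"

def displayed_authors(stack: List[StackEntry], anonymous_review: bool) -> List[str]:
    # Hide the author on every unmerged CL in the chain, so browsing
    # sibling CLs for context does not deanonymize the author. Merged
    # CLs still show the author, because anonymization applies only
    # during review.
    return [
        ANON_NAME if anonymous_review and not entry.merged else entry.author
        for entry in stack
    ]
```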

Figure 19-3
A screenshot of a list of three CLs in the code review tool. The column headers are changelist, author, status, last action, reviewers, size, and description.

How the code review tool displays a chain of CLs, all written by the same author

Email Notifications

Anonymizing the content: Anytime there is an update to a changelist (e.g., the code is updated by the author or new comments are made in the code review tool), an email summarizing the review activity is sent to the author of the change and all reviewers. We modified the content of the emails to omit the identity of the author where it wasn’t necessary and used the generic term “CL author” when it was necessary to refer to the author of the CL. See Figure 19-4 for an example of such an email.
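The content rewrite can be approximated as a substitution over the email body. A minimal sketch (the function name and the bare username-matching approach are our simplification; a production system would handle display names, email addresses, and other identity forms too):

```python
import re

def anonymize_email_body(body: str, author_username: str) -> str:
    # Replace mentions of the author's username with the generic term
    # "CL author" used in summary emails.
    return re.sub(re.escape(author_username), "CL author", body)
```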

Anonymizing the recipient list: One unintended consequence of allowing individual reviewers to opt in was that a single code review could have a mix of reviewers, some using the anonymization feature and some not. When any summary email is sent out, it is sent as a single email to all reviewers and the author. Since the author is one of the recipients, this would effectively deanonymize the author for any reviewers of the CL. One proposed solution to this problem was to send a separate email to each reviewer and a separate email to the author. However, having a single email thread that includes all reviewers and the author is a feature that many reviewers rely on for communication about the changelist. For example, some users check their email on a mobile device while not at the office and reply to the email as a way to alert everyone else that their review will be delayed. In the two years since this feature was released, there is still no way for users who are reviewing anonymously to view summary emails without risking deanonymizing the author. Our recommendation for users who turn this feature on is that they filter out the email notifications from their inbox, which is not an ideal experience for users, especially those who rely on the email notifications in their workflow.

Identifying author-anonymous reviewers: When a user is first assigned to review a changelist, an email is sent out to all existing reviewers (if any) and the author notifying them that a reviewer has been added. We made the decision to explicitly alert the author when a reviewer is reviewing their code anonymously for two reasons. First, so the author would be aware that talking to their reviewer outside the code review tool (in person or via chat) would deanonymize the review. Second, so the author wouldn’t be confused if the reviewer made a comment that would be considered strange had the reviewer known the identity of the author. For example, a reviewer who doesn’t know the identity of the author might say something like “Will you be able to release this to production next week?”, and if the author is a colleague who is going on leave the next day, it might be taken differently than intended. For these reasons, anytime someone is added who will be reviewing a changelist with author identity hidden, a sentence is added to the bottom of the email saying “[user] is reviewing this CL with author identity hidden via Anonymous Code Review [link to documentation].” One unexpected result of adding this text was that it served as an advertisement for AACR. This allowed feature adoption to grow organically.

Figure 19-4
A screenshot of the summary email in the code review tool. It depicts the sender and receiver email addresses at the top and a summary referring to the CL author in a block at the bottom.

A summary email referring to the author as “CL author” to avoid identifying them. Here, Jill Dicker (jdicker) is reviewing the changelist with the author’s identity hidden. For reasons discussed in the main text, the author of the changelist (Lanting) is visible in the email recipient list.

Within the Code

Occasionally the identity of the author appears in the code itself. One common reason this happens is that the author leaves a “TODO” in the code that includes their username and a description of what is left to do in a future change. Some less common ways the author’s identity is visible in the code are when the author is adding code to a personal directory that includes their username or when authors use their own name when writing unit tests. In weighing the benefit of preserving anonymity against the cost of no longer showing the true code that will be submitted into the codebase, we decided the risk and confusion of changing the content of the code was not worth it.

Linked Bug Reports

In many code review systems, it’s common to link a change being reviewed with a bug or bugs that are related (Figure 19-1, Step 1). This is useful for a code reviewer to understand more about the code change, especially understanding the rationale for why the change was necessary in the first place. Typically, the author of the change is the same person who is assigned to the linked bug report. This presents a challenge for AACR, since reviewers who look at the linked bug may see the identity of the assignee and reasonably infer that this person is the change’s author.

One solution is to anonymize the assignee of the linked bug report. In our implementations, we chose not to do this, mostly due to technical complexity: the code review and bug-tracking systems are independent codebases, so looking up which code reviews are anonymized from the bug tracker is nontrivial. A robust implementation, however, would likely (a) anonymize assignees only when the bug viewer is also reviewing a linked code review with AACR and (b) perform consistent anonymization across the code review tool and bug tracker, that is, use the same anonymous animal in each.
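One way to get consistent pseudonyms across independent tools, without shared state, is to derive the animal deterministically from the change and author. This is a sketch under our own assumptions (illustrative animal list and key format), not how Critique assigns animals:

```python
import hashlib

ANIMALS = ("Panda", "Otter", "Quokka", "Ibis", "Narwhal")  # illustrative list

def anonymous_animal(cl_id: str, author: str) -> str:
    # Hashing the same (CL, author) pair always yields the same animal,
    # so the code review tool and the bug tracker can compute the
    # pseudonym independently yet display it consistently.
    digest = hashlib.sha256(f"{cl_id}:{author}".encode()).hexdigest()
    return f"Anonymous {ANIMALS[int(digest, 16) % len(ANIMALS)]}"
```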

Discussion

Designing, implementing, and using anonymous code review has helped us think more deeply about bias in software development tools. A broader question that AACR raises is, what information is relevant and what information is irrelevant when performing software development tasks, and when should irrelevant information be hidden?

From a principled design perspective, by excluding irrelevant information – in this case, a code author’s identity – we aim to reduce the influence of bias. From that basic principle, we can examine where else irrelevant information should be removed. One idea here is issue trackers, where participants – issue reporters, triagers, and implementors – work collaboratively to decide if, when, and how a bug should be fixed or a feature should be implemented. Analogous to code review, it’s plausible that people from marginalized identities may face more negative outcomes; for instance, we might predict that issues filed by men would be more likely to be fixed than issues filed by others. For such tools and tasks, anonymization may also be a promising path to reducing the impact of bias and increasing diversity and inclusion.

At the same time, removing identity from software engineering systems is a blunt tool for a more specific problem. That is, identity per se isn’t the thing that activates people’s biases, but rather it’s a proxy for demographic identities, such as historically marginalized genders, races, ages, and so on. Because it’s difficult to mask only a person’s demographic identities, in anonymous code review, we instead mask the entirety of their identity. But this comes with a hidden disadvantage, which is that identity carries useful signals that the anonymization masks. For instance, reviewers who use AACR say they do need to know authors’ time zones, so that they can decide when to do a review [3]. But even revealing time zone is potentially fraught. For instance, because Asian authors typically get more pushback than White authors [4], knowing that an author is in Indian Standard Time suggests the author is more likely to be Indian. What we’ve concluded is that deciding what information to reveal about a developer is more complicated than it appears at first glance.

Another facet of identity that might be useful to reveal during anonymous code review is to indicate whether the author is on the same team as the reviewer. Preference data reveals that most anonymous code reviewers think that this facet would be useful or essential to know, but a nontrivial proportion think that knowing it could be harmful [3]. To us, the lesson here is that user preferences should be gathered and considered, but that other information needs to be taken into account when deciding how to implement anonymization. Indeed, we might ask whether anonymous code reviewers are even in a good position to know whether revealing some facet of identity will be harmful or not. Rather, directly measuring harm through empirical research may be a more fruitful path.

However, our experience has been that even direct measurement isn’t a panacea for decision making. For example, while our previous data on code review shows that folks from marginalized groups tend to face disproportionate pushback, it also shows that more junior code authors tend to face more pushback than more senior ones. Based on this result, should we conclude that junior authors are discriminated against? On one hand, it seems plausible that people see a junior developer and assume they’re not as competent as a more senior one, independent of the developer’s actual performance. On the other hand, senior developers usually do become more competent as they gain experience, so arguably the data does not suggest the presence of undue bias. So whether a code review system should display an author’s seniority remains an open question, even in the presence of data. The lesson that we have drawn from this is that some design decisions should be data-driven and others need not be. A good researcher, designer, or leader should know which is which.

Conclusion

Code review is a process where biases can taint reviewers’ judgements, and anonymous author code review is one technique designed to reduce the impact of such biases. Our experience implementing and deploying anonymous author code review has demonstrated that implementing it in a modern code review ecosystem is not as trivial as one might imagine. We hope that the lessons we’ve shared in this chapter are helpful for others who are considering implementing it in their tools and organizations.

Acknowledgments

Thanks to Ben Holtz, Katie Stolee, Lanting He, the Critique team, Google’s Core Developer team, and anonymous reviewers for their feedback and support.