Abstract
This introductory chapter outlines the technical and conceptual basics of genomics, its history and current impact. We use the metaphor of the history of genomics as an hourglass to challenge the widespread association of genomics with the idea of a single, international and successful ‘Human Genome Project’. We query this hourglass depiction by examining genomics research in both human and non-human organisms: yeast and pig, as well as Homo sapiens. We also introduce a concept that helps differentiate genomics across our three chosen species: communities of genomicists. The objectives, research necessities and visions of these communities are profoundly entangled with the genomes of the organisms they work on. This materialises in different models of organising and conducting genomics and affects the affordances and limitations of the resulting reference genomes.
In four decades, genomics has transformed the biological sciences and has penetrated well beyond them. The marriage of DNA sequencing techniques and computational infrastructures built to handle, store and analyse ever-increasing quantities of data has contributed to significant developments in:
- Our understanding of human history through our relationship to Neanderthals, Denisovans and other hominids (Pääbo, 2014);
- Our appreciation of the extent and diversity of life previously undetected by biological methods (Riesenfeld et al., 2004; Venter et al., 2004);
- Forensic science, food tracing and nature conservation (Arenas et al., 2017);
- Our picture of the Tree of Life and the evolutionary relationships within it (O’Malley et al., 2010);
- The reclassification of diseases resulting in improved diagnosis, prognosis and treatment options (Keating et al., 2016);
- Enhancements in the efficacy of selective breeding in agriculture (Lowe & Bruce, 2019);
- The reshaping of the fundamental models and metaphors with which we think about how living things develop and function (Keller, 2000).
DNA sequencing has gone from being a highly specialised practice, requiring considerable labour and skill, to being routinely applied in ordinary laboratory work while also being conducted at great scale, speed and accuracy in factory-style genome centres. In the late-1970s, manually sequencing the tiny genome of a bacteriophage (a virus that infects bacteria) was a monumental task, one that earned Frederick Sanger, who led the group undertaking it, a Nobel Prize (Brownlee, 2014; Hutchison, 2007). The determination of the whole human DNA sequence (commonly referred to as the Human Genome Project) took more than a decade, at a cost initially estimated at $3 billion. It started in the 1990s and concluded in 2003, expanding in speed and scale throughout.
Progress since then has been so dramatic that, more recently, well over fourteen million coronavirus genomes have been sequenced and shared via the Global Initiative on Sharing Avian Influenza Data.Footnote 1 Another example that illustrates how far genomics has come is that the cost of sequencing a whole human genome was estimated to be about £7000 in 2020, multiple orders of magnitude below the original budget of the Human Genome Project (Schwarze et al., 2020).Footnote 2
In 1999, four years before the Human Genome Project was officially concluded, the National Center for Biotechnology Information of the USA created a new database called RefSeq. The purpose of this database was to serve as a centralised repository that would gather the ongoing reference sequence of the human genome and those of other species completed or in progress. Those reference sequences were and still are curated and freely released to the research community. They serve as canonical representations of their designated species and are graded according to their level of comprehensiveness, representativeness and quality (Ostell, 2013, pp. 72–74; Tatusova et al., 2014, p. 135).Footnote 3
The number of entries in RefSeq has grown exponentially, from complete sequences representing just over two thousand different species in 2003, to 125,116 in November 2022.Footnote 4 On top of this, RefSeq also curates and stores a larger number of partial sequences, as well as variants and other versions of complete reference genomes. Life scientists from every discipline all around the world can access the sequences and curatorial metadata. In processing each existing and upcoming entry, RefSeq curators attempt to strike a balance between respecting the differences across the stored sequences and avoiding a Tower of Babel of different communities producing separate datasets that would require considerable efforts to integrate, use and compare outside their contexts of creation. Yet in fostering this universal—or at least commensurate—language, some of the distinctions between the individual reference genomes are flattened, and indeed lost.
In what follows, we make some of these distinctions visible again by looking at the history of the production of three reference genomes: those of the baker’s and brewer’s yeast Saccharomyces cerevisiae released in 1996 and published in 1997; Homo sapiens, published in 2001 as a working draft and in more definitive form in 2004; and the pig Sus scrofa, initially released in 2009 and published in 2012. Taken together, these three genomes embody overlapping trajectories of change and differentiation in the practices, goals, organisation and status of genomics research. While yeast is both a model organism in basic biomedical science and a tool for the brewing and biotechnology industries, pigs were mainly sequenced for agricultural purposes, but also to serve objectives of human medicine—for instance, helping organ transplantation. Sequencing H. sapiens became the most prominent area of genomics, one believed to have potentially invaluable clinical payoffs.
By examining the substantially different ways in which these endeavours were conducted across the three organisms, this book argues that producing a whole-genome reference sequence was not always the main—nor the universally accepted—objective of genomics, as the growing number of entries in RefSeq might suggest. What these now centrally curated reference sequences represented, and the uses to which they were put, also varied substantially across the communities that produced them, in spite of the commensuration work of RefSeq and cognate institutions and repositories.
The rest of this introductory chapter summarises the main features of genomics and how it historically emerged from the practices that have subsequently accompanied it and conferred its identity: mapping and sequencing DNA, and processing the resulting data with information technologies, including databases.Footnote 5 We then present the key concepts and analytical tools that we use throughout the book and outline how we develop them in the remaining seven chapters. We argue that popular and scholarly accounts have tended to excessively emphasise the Human Genome Project in the history of genomics, due to the perceived impact and high profile of this initiative. We refer to this Human Genome Project-centred history as the canonical, master narrative of genomics, and relate its structure to the hourglass model that prior historiography has applied to the study of heredity throughout the nineteenth and twentieth centuries. As in the case of the study of heredity (Barahona et al., 2010), the hourglass model aids the comprehension of the institutional and infrastructural landscape of genomics, while falling short in capturing its broader history. We escape the boundaries of the hourglass model by looking at non-human genomic endeavours and documenting the deep entanglement between the creation of reference genomes and the communities that were involved in their production. We propose the term genomicist to capture the crucial role of communities in the construction of genomic data and materials, and highlight both inclusive and exclusive mechanisms in the formation and operation of those communities.Footnote 6
1.1 Genomics, DNA Mapping and Sequencing
The sequencing of DNA is the determination of the order of the four ‘bases’ along each of the two complementary strands of nucleotides that wind around each other to produce the molecule’s double-helical structure: adenine, thymine, cytosine and guanine, known by their initials—A, T, C and G. Sequencing is central to genomics. However, genomics involves far more than just this, and sequencing can be conducted outside of genomics research and for other biological molecules, such as RNA and proteins. Indeed, while the history of sequencing—of proteins, RNA and then DNA—can be traced back to the 1950s, 1960s and 1970s respectively, genomics proper is recognised to have arisen only in the 1980s (García-Sancho, 2010). Its antecedents were not only sequencing practices, but also the mapping of chromosomes (bodies containing DNA in the cell), and the development of information technologies to process the resulting map and sequence data.
Chromosome mapping dates back to the early twentieth century and is conducted in order to find certain landmarks in them, such as genes (de Chadarevian, 2020; Hogan, 2016; Rheinberger & Gaudillière, 2004).Footnote 7 It was known since the early days of mapping that genes constitute only a small portion of chromosomes; after the discovery of the structure of DNA in 1953, genes were increasingly identified with partial, specific segments of the nucleotide sequence within the chromosome. The third central practice of genomics, the processing of the resulting map and sequence information using databases and computational methods, started to be applied to DNA in the 1970s. Similar practices involving other biological and medical data, such as the elucidation of protein sequences or the three-dimensional structures of proteins, can be traced back to the decades following World War II (Strasser, 2019, Ch. 3; de Chadarevian, 2002, Ch. 4).
What makes genomics distinct from sequencing and these other practices, when they are considered separately? While it is important to avoid the error of being too inclusive, there is also the risk that a strict and exclusive definition of genomics can project the way that genomics developed—or at least a particular trajectory of it—back on to the past. To put it bluntly, there is a danger of a winner’s narrative: that those who succeeded in making their vision of genomics a reality—or who are currently in charge of the institutional manifestation of it—dictate the boundaries of the field and project them retrospectively (Suárez-Díaz, 2010).
Areas of scientific endeavour, particularly ones with disciplinary names and associated journals, databases, brick-and-mortar facilities and well-funded institutions, are social and sociological phenomena. This means that the demarcation and boundary work performed by influential social groups and networks shapes the reality of the field. But scientific fields, disciplines and other phenomena are not only social creations and objects in this top-down political sense. They also comprise configurations of methods, techniques, technologies, theories, models, research programmes and commitments, norms and the careers, interests and activities of less-prominent scientists. These are no less infused with the social, cultural and political, but they are elements that deny the exclusivity of elite political, cultural and social mechanisms to define what scientific endeavours like genomics are.
It is not our job to provide an exhaustive and authoritative definition of genomics that takes account of these considerations. We can note, however, and show throughout this book, that the historical configuration of genomics involved a multi-directional, often dialectic, interaction between elite actors, less influential bench biologists and computer experts, all of whom mobilised differing visions, methods and forms of organisation. Genomics necessarily involves some form of sequencing and/or mapping of the genome, wherein the products—in the form of data—are stored and analysed using computational (informatics) infrastructures. To constitute genomics, this must be associated with a more general effort to construct a systematic representation of the genome, either in whole or in part.
The term ‘genome’ long antedates the idea of ‘genomics’, being coined by the German botanist Hans Winkler in 1920 to denote “the haploid chromosome set” (as translated in Lederberg, 2001). The haploid set comprises one of each pair of chromosomes; so for humans, who have a total of 46 chromosomes made up of 23 pairs,Footnote 8 the haploid set consists of 23 chromosomes. Scholars have noted that the term genome, and genomics itself, aims to capture something comprehensive, a totality (Rheinberger & Müller-Wille, 2017; Stevens, 2013). Does this mean that something can only be genomic if it aims at the complete mapping or sequencing of a genome? Not necessarily. On the basis of achieving total completeness or comprehensiveness, barely anything could constitute genomics. Additionally, what constitutes completeness or comprehensiveness is not fixed; as we see later in the book, and particularly in Chap. 7, the goal posts are always moving. One may say that, as long as there is a concerted effort being made towards that end, it is genomics. However, the indeterminacy of what constitutes the end-point means that there is no strict criterion for ruling any given endeavour either in or out. The idea of a process or journey towards a goal means that the line between ‘true’ genomics and mere sequencing and mapping is somewhat blurry. How close does one need to be to the ever-receding end-point to be doing genomics?
Instead, we prefer to recognise genomics through its systematicity and its treatment of the genome as the substrate of its efforts. By systematicity, we mean that there is some concerted—and often collective—effort to identify and establish relations between multiple objects in and across the genome. By substrate, we mean that the genome is the field of operations for this activity: that which is to be mapped and the map itself. This does not mean that the whole genome needs to be mapped—or sequenced—for an effort to be deemed genomic. We distinguish systematicity from comprehensiveness and argue that in the history of genomics—especially during the early days—there were a substantial number of systematic but not comprehensive efforts, in the form of concerted operations that only addressed certain regions of target genomes.
Our criteria do not imply that all research that tries to identify genes in the genome can be classed as genomics. If a molecular geneticist was able to identify a gene that they had good reason to believe was implicated in some process in the cell, sequence that gene and then study the way it is expressed—how it results in the production of a specific protein—this falls well short of being genomics in both aspects of our guideline. It only considers a single object in the genome. Even in cases where two or more genes were involved in the process of interest, if the research does not consider the relations between them in terms of them being objects in the genome it would still not fulfil our second, ‘genome-as-a-substrate’ criterion. If, instead, the researcher was using known products of genes relating to a biological process of interest in order to identify and map multiple DNA sequences across the genome—ideally in collaboration with other laboratories—they would have shifted towards a more genomic way of working. This is because the focus is now on the genome as a territory to be mapped, rather than just on individual genes. Indeed, as we show in the next chapters, this kind of activity and the communities that converged around it became key drivers of genomics research from the 1990s onwards.
The invention of DNA sequencing methods in the 1970s was crucial to the forging of genomics. One of the main pioneers was Frederick Sanger, who had previously worked to discern the sequence of amino acids—the fundamental building blocks of proteins—in insulin, for which he won the Nobel Prize in 1958. He then moved on to RNA, the intermediary molecules in the process by which stretches of DNA form the basis for the synthesis of proteins with specific amino acid compositions. While other researchers in the mid-1970s such as Allan Maxam and Walter Gilbert also developed DNA sequencing methods, the technique that Sanger and his team devised at the Medical Research Council’s Laboratory of Molecular Biology in Cambridge (UK) became the dominant approach before the creation of newer methods in the twenty-first century (García-Sancho, 2012, Chs. 1–2).
Sanger’s technique required extremely time-consuming and labour-intensive bench work, as well as considerable technical and interpretive skills. The refinement of manual methods alongside increasing automation of parts of the process—including the invention and ongoing improvement of automated sequencing machines from the mid-1980s—enabled more and more to be sequenced in less time (García-Sancho, 2012, Chs. 5–6).Footnote 9 As the 1980s proceeded, therefore, the quantities of DNA sequence data were rapidly expanding year-upon-year.
Alongside this were developments in mapping genes and other markers on the chromosomes. Genetic mapping had been pioneered by Thomas Hunt Morgan and his colleagues in the 1910s, working with the fruit fly Drosophila melanogaster. As in most animals, Drosophila’s chromosomes are paired in two sets within its cell nucleus. Morgan’s team observed, tracked and recorded different variant traits—such as the eye colour or wing shape—in many thousands of these flies, which were systematically bred and assessed (Kohler, 1994). The traits were presumed to result from different mutant versions of genes occurring across the chromosomes.
Morgan and his team exploited two facets of genetics: linkage and recombination. Linkage means that certain genes are commonly inherited together, which in the fly experiments meant that the associated traits were linked across generations. Recombination, discovered by the Morgan laboratory in their explorations of genetic linkage, happens during the creation of the sex cells (gametes), a process called meiosis in which the pairs of chromosomes separate. In it, parts of one of a pair of chromosomes can swap places with the corresponding parts of the other member of the pair. This means that the linkage between genes can be broken.
Morgan’s laboratory realised that they could use this to find out the relative positions of genes on the fly’s chromosomes: the further apart genes were, the more likely it was that a recombination event would occur between them, breaking their linkage. The frequencies of co-occurrence of versions of particular genes could be used to ascertain their relative proximity and order on the chromosomes. An array of relatively simple traits inherited from parent to offspring fly—such as the aforementioned eye colour and wing shape—enabled the group to map the Drosophila chromosomes and to further discern chromosomal dynamics in doing so. These maps of estimated chromosomal positions started to be called genetic linkage maps (see the upper part of Fig. 1.1).Footnote 10
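The reasoning behind linkage mapping can be illustrated with a minimal sketch. It assumes three hypothetical genes (labelled "eye", "wing" and "body") and invented offspring counts, none of which come from the Morgan laboratory's data. The pair with the highest recombination frequency is taken to lie furthest apart, so the remaining gene is placed between them.

```python
# Illustrative sketch only: gene labels and offspring counts are invented.

def recombination_frequency(recombinant, total):
    """Fraction of offspring in which the linkage between two genes was broken."""
    return recombinant / total

def order_three_genes(freqs):
    """Infer a linear order for three genes from pairwise recombination frequencies.

    freqs maps frozenset({gene_a, gene_b}) to a recombination frequency.
    The pair that recombines most often is assumed to be the outermost pair.
    """
    genes = sorted(set().union(*freqs))
    outer = max(freqs, key=freqs.get)
    middle = next(g for g in genes if g not in outer)
    first, last = sorted(outer)
    return [first, middle, last]

# Hypothetical (recombinant offspring, total offspring) per gene pair.
counts = {
    frozenset({"eye", "wing"}): (170, 1000),
    frozenset({"eye", "body"}): (60, 1000),
    frozenset({"body", "wing"}): (120, 1000),
}
freqs = {pair: recombination_frequency(r, n) for pair, (r, n) in counts.items()}
order = order_three_genes(freqs)
print(order)
```

With these invented counts, "eye" and "wing" recombine most often, so "body" is placed between them. A real linkage analysis would also have to account for double crossovers and sampling error, which this sketch ignores.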
It took several decades for this approach to be applied to humans. When it was, inter-generational studies of families experiencing disproportionate numbers of cases of particular medical conditions could be used to identify the kind of genetic basis underlying them and to perform some analyses to assess the linkage relationships (Comfort, 2012; Lindee, 2005). This practice received a considerable boost when, in the 1960s, molecular biologists began detecting polymorphic (many-variant) genetic markers that could be positioned on the chromosomal structures. These markers provided a greater number of landmarks for identification and analysis of variation beyond the small number of individuals suffering medical conditions or showing morphological traits that could be observed with the naked eye, and therefore mapped using the principles of genetic linkage. As we show in subsequent chapters, from 1973, human and medical geneticists periodically gathered in chromosome mapping workshops, with the first one held at Yale University. These workshops enabled attendees to systematically pool their mapping results—some of them obtained through molecular methods—and achieve an increasingly higher resolution in the location of genes and other markers of mainly medical interest.Footnote 11
The first genetic linkage map encompassing the whole human genome obtained through molecular markers—Restriction Fragment Length Polymorphisms or RFLPs—was published in 1980. It deployed a type of protein (restriction enzymes) that cleaved the DNA molecule at specific sequence sites. When applied to DNA samples from multiple individuals, if their sequences diverged, the cleavage would produce different patterns of fragments. These different fragment patterns could be detected and used to map the sequence-specific genome regions where the restriction enzymes acted (Botstein et al., 1980). The same enzymes had been used from the mid-to-late 1970s as part of the recombinant DNA technologies, a suite of methods that enabled researchers to cleave and isolate specific fragments of the genome of one organism and transfer them into another. For instance, as a result, human genes synthesising insulin—a protein used for the treatment of diabetes—could be expressed in a controlled way in bacteria (Rasmussen, 2014; Yi, 2015).
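The principle behind RFLPs can be sketched in a few lines. The example assumes the recognition site GAATTC of the restriction enzyme EcoRI, which cuts between the G and the first A, and two invented sequences for hypothetical individuals that differ by a single base. It is a simplified model of a single DNA strand, not a reconstruction of the method in Botstein et al. (1980).

```python
# Illustrative sketch only: the two sequences are invented.

def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a sequence at every occurrence of a recognition site.

    cut_offset is the position within the site where the cut falls
    (1 places it between G and AATTC, as EcoRI does).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments

def fragment_pattern(sequence, site="GAATTC"):
    """Sorted fragment lengths, the kind of pattern that would be compared."""
    return sorted(len(f) for f in digest(sequence, site))

# Hypothetical individuals differing by one base inside the second site.
individual_a = "ATGCGAATTCTTAGGCCGAATTCAAGT"
individual_b = "ATGCGAATTCTTAGGCCGAGTTCAAGT"

pattern_a = fragment_pattern(individual_a)
pattern_b = fragment_pattern(individual_b)
print(pattern_a, pattern_b)
```

The single-base difference removes one recognition site in the second individual, so the enzyme yields two fragments instead of three. That difference in fragment lengths is the polymorphism used as a marker.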
These molecular methods propelled the creation of a different type of map in the 1980s. Rather than representing the approximate location of genes and markers on the chromosomes—as the genetic linkage maps did—this new map visualised a set of ‘physical’ DNA fragments ordered as overlapping lines across the genome (see the lower part of Fig. 1.1). In organisms with larger genomes, the construction of these physical maps required the prior generation of libraries to store and manage the thousands of fragments into which the DNA contained in the different chromosomes would be broken.
Producing a ‘DNA library’ or ‘genome library’ involves using restriction enzymes and other recombinant techniques to insert DNA from the organism to be mapped into the genome of another organism (Hutchison, 2007; Loenen et al., 2014). As well as functioning as warehouses of the DNA inserts, the host organisms can also be used to amplify the fragments to be mapped, multiplying their number. This is achieved through the reproductive cycle of the host organism, which results in the production of cloned copies of the original inserted DNA. The libraries can be screened as well, for instance by hybridisation: using the property of chemical complementarity by which, in a double-stranded DNA molecule, adenines always bond with thymines and cytosines with guanines. Building on this, a probe containing a specific sequence can be designed to detect and locate particular fragments to which it will hybridise: chemically bond, due to the complementarity of its bases.Footnote 12
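Screening by hybridisation rests on base complementarity, which a short sketch can make concrete. The clone names, insert sequences and probe below are hypothetical, and the sketch assumes exact matching, whereas real hybridisation tolerates some mismatches depending on experimental conditions.

```python
# Illustrative sketch only: clone names, inserts and probe are invented.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Sequence of the strand that would pair with seq, read in its own direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def screen_library(library, probe):
    """Return the names of clones whose insert the probe could bind.

    Each insert is stored as one strand of a double-stranded molecule, so the
    probe can pair either with that strand (the insert contains the probe's
    reverse complement) or with the opposite strand (the insert contains the
    probe sequence itself).
    """
    target = reverse_complement(probe)
    return [name for name, insert in library.items()
            if target in insert or probe in insert]

library = {
    "clone_1": "GGATCCATTGCA",
    "clone_2": "TTTTGCAATGGA",
    "clone_3": "CCCCGGGGAAAA",
}
hits = screen_library(library, "ATTGCA")
print(hits)
```

Here the probe binds two of the three hypothetical clones, one on each strand, and leaves the third undetected.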
In the early days of sequencing, viruses or circular chromosomes called plasmids—present in bacteria such as Escherichia coli—were used as host organisms for libraries, but these were limited in storage capacity. In 1987, though, Yeast Artificial Chromosomes (YACs) were developed, offering considerably larger storage capacity. Later, in 1992, Bacterial Artificial Chromosome (BAC) libraries were created, with several quality-related advantages over YACs to compensate for their smaller capacity.
Ordering the inserted DNA fragments of these libraries in physical maps enabled researchers to isolate and access those fragments, which could be used for sequencing purposes or any other sort of genetic experiment. The overlaps detected between the fragments also allowed their assembly into a reference sequence, as was done with the human and other genomes (see lower part of Fig. 1.1). A central argument of this book is that the way in which libraries were constructed, and mapping was combined with sequencing, crucially distinguished the production of the yeast, human and pig reference genomes, thus embodying different forms of organising genomics, and affecting the potentialities and limitations of the resulting sequence data.
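The way overlaps allow ordered fragments to be joined into one continuous sequence can be shown with a toy example. It assumes three invented fragments that are already in the order given by a physical map, and it requires a minimum overlap of three bases. Actual assembly pipelines are far more elaborate and must cope with sequencing errors and repeated regions.

```python
# Illustrative sketch only: fragments are invented and assumed to be pre-ordered.

def overlap(left, right, min_len=3):
    """Length of the longest suffix of left that is also a prefix of right."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def assemble(ordered_fragments, min_len=3):
    """Merge map-ordered fragments into one sequence via their overlaps."""
    sequence = ordered_fragments[0]
    for fragment in ordered_fragments[1:]:
        size = overlap(sequence, fragment, min_len)
        if size == 0:
            raise ValueError("adjacent fragments do not overlap")
        sequence += fragment[size:]
    return sequence

fragments = ["ATGCGTAC", "GTACCTTA", "CTTAGGCA"]
assembled = assemble(fragments)
print(assembled)
```

Each fragment shares four bases with its neighbour, so the three eight-base fragments collapse into a single sixteen-base sequence.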
Fig. 1.1 Above, a genetic linkage map of the six chromosomes of the nematode worm Caenorhabditis elegans, elaborated by molecular biologists Sydney Brenner, Robert Horvitz and Jonathan Hodgkin in the 1970s, the decade that the chromosome workshops started. Below, a diagrammatic representation of how a physical map is produced from a BAC library and assembled into a sequence—in this case, the reference sequence of the human genome. The physical map is the third illustration starting from the top (“Organized mapped large clone contigs”) and the sequence is the bottom illustration (“Assembly”). Above image: Reproduced from Hodgkin, J, Horvitz, R, Brenner, S, Nondisjunction mutants of the nematode Caenorhabditis elegans. Genetics, 1979, 91(1), 67–94: Fig. 1 on p. 70, by permission of Oxford University Press. Below image: Reprinted by permission from Springer Nature Customer Service Centre GmbH: Springer Nature, Nature (https://www.nature.com/), Initial sequencing and analysis of the human genome, International Human Genome Sequencing Consortium, 2001: Fig. 2 on p. 863
The growing ability to map and sequence DNA presented a problem: what to do with the resulting data. In 1980, the first global database to gather DNA sequences was launched. This was the Nucleotide Sequence Data Library, sponsored by the European Molecular Biology Laboratory as a shared repository to which the life sciences community could both submit their sequencing results and access the data contributed by others (García-Sancho, 2012). In 1982, the US National Institutes of Health (NIH) created an equivalent repository—GenBank, on which RefSeq would later be built—and, two years later, the DNA Data Bank of Japan started its operation. During their early years, these repositories struggled to keep up with processing the increasing quantities of sequence data being produced, while simultaneously having to confront the problem that much of what was being produced was kept by the laboratories that performed the work and not shared with the wider community. In 1987, the three databases reached an agreement by which their entries would be mirrored and users would be able to access the same information regardless of the repository they queried. Their curators also started persuading journal editors to make submission to one of the databases compulsory ahead of the publication of new DNA sequences, something that became increasingly customary in the 1990s (Strasser, 2019, Chs. 5–6; Stevens, 2018).
That same year of 1987, the journal Genomics was founded. It was co-edited by prominent medical geneticists Victor McKusick and Frank Ruddle, who in the previous decade had played a leading role in organising the first chromosome mapping workshop at Yale University. The first editorial of Genomics, entitled “A new discipline, a new name, a new journal” stated that mapping and sequencing DNA should go “hand in hand” since both practices had the “same objective”. McKusick and Ruddle regarded mapping and sequencing genes as “the way to go” and the resulting sequence data as the “ultimate map” or the “Rosetta Stone” from which “the complexities of gene expression in development” could be discerned and the “genetic mechanisms of disease interpreted”. For the “newly developing discipline” of mapping and sequencing DNA, the co-editors “adopted the term GENOMICS” (McKusick & Ruddle, 1987, p. 1, capitals in the original; see also Kuska, 1998). In the late-1980s and especially the 1990s, Genomics established itself as a platform for the dissemination of mapping and sequencing results, along with other journals that reported on the progress of ongoing genomic research.
At this time, scientists and administrators began to consider the full mapping and sequencing of the genomes of different species. Already in the late-1970s, the tiny genomes of viruses had been sequenced, but the scale-up to even bacteria was daunting given the skills and time that the existing techniques required. From the mid-1980s onwards, however, serious proposals to map and sequence the human genome were presented and a number of national programmes began. As we show later in the book (Chap. 3), the most ambitious of these was the Human Genome Project (HGP), which started as a joint endeavour of the NIH and laboratories of the Department of Energy of the USA.
By 1990, an array of human and non-human genome projects were underway. Some, like that for the nematode worm Caenorhabditis elegans and the American side of yeast genome sequencing, were conceived as pilots for human genome sequencing, allowing methods and approaches to be tried and evaluated, then adapted and improved for the bigger task of tackling a larger genome. Others, like the European side of yeast genome sequencing (Chap. 2), and the mapping of the pig genome (Chap. 5), were driven by the research aims of particular communities of scientists working on the biology of those organisms. As we argue, it was in the specificities of the interactions between these communities and their target genomes where differences between the genome projects arose and distinct ways of practising and organising genomics were configured (for a timeline illustrating milestones in the history of genomics across these species and some select others, see Fig. 1.2).
Fig. 1.2 Timeline representing historical milestones in DNA mapping and sequencing, as well as genomic research. White arrows refer to human genomics, light grey to yeast genomics and dark grey to pig genomics. Black arrows refer to technical or infrastructural developments. Elaborated by Jarmo de Vries from information compiled by James Lowe. For a larger version of this figure that can be zoomed in and out, see https://www.pure.ed.ac.uk/ws/portalfiles/portal/290406301/Fig_1_2_zoomable_final.pdf (last accessed 29th November 2022)
Genomics came into the public spotlight with the ambitious plans to sequence the entire DNA of humans. These plans—and particularly their materialisation in the HGP—have, quite naturally, attracted considerable attention both in scholarly and non-scholarly literature. In the late 1990s, the US programme coalesced with other initiatives into a transnational effort to determine a reference sequence of the whole human genome. The label HGP was kept, but the meaning of this, in both the popular imagination and for the scientists and administrators involved, shifted from the national US project to designate a broader, multi-national endeavour (Fortun, 1999). The reference sequence was published between 2001 and 2004 by an International Human Genome Sequencing Consortium (IHGSC) formed by institutions from different countries, mainly the USA, UK, France, Germany, Japan and China (Chap. 4).Footnote 13 This was heralded as the entry of biology into the world of big science (Collins et al., 2003; Glasner, 2002; Hilgartner, 2013), a term characterising large-scale, coordinated scientific projects usually in the physical or engineering sciences, such as the World War II Manhattan Project, the Apollo space programme, or the creation and operation of CERN, the European centre for nuclear research (Barnes & Dupré, 2008, p. 43; Lenoir & Hayes, 2000).Footnote 14
A central thesis of this book is that the excessive emphasis on the determination of the human reference sequence has led the history of genomics to be presented in a somewhat narrow fashion. By focusing on genomic work concerning non-human species—namely yeast and pig—and outside the HGP framework, we aim to capture a more richly-textured trajectory in which genomics forked, diversified and permeated in different ways across many areas of the life sciences and the world beyond them. We do this, in part, by unpacking the history of certain aspects of genomics that have come to be conceived of in a teleological manner: that they were created or happened in a certain way because that is how genomics would inevitably develop. These include the multiple possible ways in which genomes can be sequenced—with the HGP representing one strategy among many—and the diverse nature and utility of the reference sequences that are available today in the RefSeq database.
Based on the idea that the human reference sequence is often conceived of in a totemic manner, we now draw analogies between an HGP-centred history of genomics and the hourglass metaphor that some scholars have used to model and interrogate the history of heredity (Barahona et al., 2010). In this hourglass representation, there are two periods featuring heterogeneous activities conducted by a wide array of actors, one before and one after a bottleneck which is narrower in both content and participation. In the case of genomics, the neck of that hourglass corresponds to the later stages of the HGP (1996–2003), an initiative that has shaped the institutional landscape and infrastructures for mapping and sequencing endeavours well beyond itself. In what follows, we look beyond that narrow neck, and past an hourglass-based view of genomics more generally. We do this by paying attention to the needs and objectives of some often overlooked communities of researchers and the interactions they have with their target genomes, of both human and non-human species.
1.2 Moving Away from a Human Genome Project-centred History of Genomics
Since its inception, genomics has been an area with a significant concentration of humanities and social science scholarship. In 1988, a programme to examine the ‘Ethical, Legal and Social Implications’ (ELSI) of genomics was announced by James Watson, co-discoverer of the double helical structure of DNA and then head of the NIH Office for Human Genome Research. ELSI was formally launched in 1990 and awarded no less than 5% of the budget that the NIH would devote to human genomics. Other programmes encompassing ‘Ethical, Legal and Social Aspects’ were also launched in the early years of genomics. The one sponsored by the European Commission began as a small element of the second Framework Programme for Research and Innovation, running from 1987 to 1991. Projects and collaborations aiming to analyse the socio-ethical dimensions of genomics were particularly strong in the USA, UK, Netherlands, Germany and Canada. [Footnote 15]
Sociological and ethical studies of human genomics have been particularly prominent, reflecting the societal concerns about the implications of the new technologies and the use of sequence data (e.g. see Gannett, 2019). These investigations have taken advantage of the possibility to pursue ethnographic approaches, examining the decision-making, organisation and re-configuration of this new science as it happened (Hilgartner, 2017; Stevens, 2013). Histories have also been published, initially by people close to those involved, for example, Robert Cook-Deegan’s The Gene Wars (1994; see also Gaudillière & Rheinberger, 2004). Philosophical accounts have explored the re-interpretations of the role of genes and genetics in the development of organisms in the light of the findings of genome projects (Keller, 2000; Moss, 2003). This includes aspects such as the smaller than expected number of human genes, the definition and identification of ‘functional elements’ (for example in the ENCODE—Encyclopedia of DNA Elements—project) and the so-called ‘missing heritability’ problem (e.g. Griffiths & Stotz, 2013; Guttinger & Dupré, 2016).
The existing historiography of genomics has been dominated by a particular phase of the HGP: that between the internationalisation, and radical scaling and speeding up of the project in the mid-to-late 1990s and the ‘completion’ of the reference sequence in the early 2000s. This was indeed the phase in which the vast majority of the data was produced. It was made especially salient by the story of a ‘race’ between the IHGSC, funded by an array of public bodies and charities, and the competing corporate effort led by Celera Genomics and its charismatic and controversial head, Craig Venter (Davies, 2001). [Footnote 16]
This phase was one in which an extraordinary concentration of sequencing capacities was effected in a small number of institutions, with large and increasing numbers of sequencing machines, and ever-developing pipelines to produce, assemble and assess sequence data. Pipelines are series of successive software tools and algorithms configured to refine and validate inputs from sequencing to enable the resulting data to undergo further processing and be integrated into data infrastructures. In those pipelines, the sequences are assembled, with the parts growing smaller in number and larger in size, and more connected to each other (Fig. 1.1, bottom illustration). Many smaller laboratories and centres that had been involved in the earlier stages of human genome mapping were progressively sidelined from the effort. The advent of the reference genome heralded an era that became commonly known as ‘post-genomic’, reinforcing the equation of genomics with the HGP. ‘Post-genomics’ constituted an emergence from the narrow tunnel of the human reference sequence.
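The assembly step described above, in which the parts grow smaller in number and larger in size, can be illustrated with a deliberately simplified sketch. It assumes error-free reads and merges the pair with the longest exact overlap until no overlaps remain. The function names `overlap` and `assemble`, the `min_len` threshold and the toy reads are hypothetical choices made for illustration; the pipelines run at the genome centres were far more elaborate, handling sequencing errors, repeats, quality scores and mapping information.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads into contigs (illustrative only)."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        # Find the ordered pair of contigs with the longest overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        # Join the pair, writing the shared stretch only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Toy reads (hypothetical): three overlap in a chain, one is unrelated.
reads = ["ATGGCG", "GCGTAC", "TACCTA", "TTTTTT"]
contigs = assemble(reads)
print(len(reads), "reads ->", len(contigs), "contigs:", contigs)
```

In this toy run, four fragments become two longer ones: a contig joining the three overlapping reads, and the unrelated read left on its own.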
The canonical history of genomics—with its emphasis on the HGP—can be portrayed as an hourglass. In its upper part, there were a number of collective efforts to map the human genome and sequence those of other ‘pilot’ organisms such as yeast and the worm C. elegans. These efforts involved heterogeneous collections of institutions, some specialising in genomics, and others concerned with particular aspects of biology, such as anthropology, evolution, cell biochemistry or medical genetics. The later stages of the HGP from 1996 to 2003 constitute the narrow neck, tapered in because of the smaller number of institutions involved, the singularity of the aims of the programmes, and the radical abstraction of the potential genomic variation that was being captured in a single, consensus reference sequence. Then, in the lower part of the hourglass, there is an opening out to the world of post-genomics (Fig. 1.3, left).
Fig. 1.3. An illustration of the two hourglass models we describe. The hourglass on the left represents the canonical history of genomics, as centred on the Human Genome Project. The hourglass on the right depicts the history of the scientific treatment of heredity over the nineteenth and twentieth centuries. In both cases, the hourglass models portray a change over time from a variety of practices, approaches and organisational forms (the upper part of the hourglass) to a narrower development (the neck of the hourglass) and then a return to a more diverse configuration (the lower part of the hourglass). Figure elaborated by both authors. For a larger version that can be zoomed in and out, see https://www.pure.ed.ac.uk/ws/portalfiles/portal/290406890/Fig_1_3_increased_final.pdf (last accessed 29th November 2022).
This hourglass model refers to both the scope of genomics and the historical trajectory that the HGP-centred narrative conveys. According to this narrative, the pre- and post-genomic stages were wider in their range of activities and institutional variety, with the HGP resembling the hourglass neck through its focus on the production of a reference sequence at specialist genome centres. This narrative projects a winner’s history in which the HGP is an obligatory passage point through which the sand in the hourglass flows: it is both the triumphant culmination of the pre-genomics stage and the opening to the post-genomic world.
The metaphor of an hourglass has also been used to productive effect when considering the history of the scientific study of heredity. In the second half of the nineteenth century, this research deployed a broad conception of heredity. In this, the roles of environment and inter-generational processes operating at different levels were explored and used to explain observed hereditary phenomena across a range of contexts. The advent of genetics as a discipline narrowed this sense of heredity, and also restricted the range of potential causal factors investigated and appealed to from the early 1900s onwards. This funnel effect, which was strengthened with the establishment of DNA as the genetic material, is what historians identify with the neck in the hourglass representing the study of heredity (Fig. 1.3, right). Then, later in the twentieth century and into the twenty-first, the concept of heredity has once again been opened up and linked with examinations of organismal development, epigenetics, evolution and interactions with the environment, to produce new configurations such as evolutionary developmental biology. These remove the partitions between a version of heredity understood in terms of the inter-generational transmission of genetic material and other objects of biological research. We are now very much in the wider, lower part of the hourglass (Barahona et al., 2010).
While recognising the general utility of this metaphor, in making it explicit, its proponents have specifically interrogated the potential value and limitations of the hourglass model in the historiography of heredity. Could the hourglass be a “historiographical artifact” resulting from “historical research centered on a few actors and fields, most of them located in the American and British scenarios” (Barahona et al., 2010, p. 7)? Indeed, heredity was implicated in a wide range of endeavours beyond the mainstream genetics research that has traditionally been the focus of historical (and social scientific and philosophical) inquiry: medicine, agriculture, anthropology, genealogy, natural history and taxonomy, physiology, embryology and evolution. However, a cautious and critical use of the hourglass model has enabled its proponent historians to advance knowledge on these endeavours without neglecting the role and influence of the narrow neck representing genetics research. [Footnote 17]
It is in this heuristic way that we intend to approach the hourglass model in the history of genomics. As we show later in the book, the effects of the HGP in the history of genomics are visible and self-evident. Key current institutions and infrastructures, such as RefSeq, were the products of its momentous impetus. The infrastructures, processes and materials produced through the HGP also shaped contemporary and subsequent genome initiatives, such as the sequencing of the yeast and pig genomes, respectively. In the USA, the NIH made the yeast initiative part of its national human genome programme: it was a pilot project through which technologies were developed and tested during the early-to-mid 1990s, thus preceding the intensive sequencing phase of H. sapiens (Chap. 2). Later on, in 2003, the Swine Genome Sequencing Consortium was formed. It made use of the infrastructures and processes developed at the Sanger Institute, a leading member of the IHGSC (Chap. 5). It was leading members of the IHGSC that advocated for the subsequent transition to a ‘post-genomic’ era. When depicting this transition, its advocates often implicitly deployed an hourglass metaphor, with the HGP featuring in the narrow neck (Fig. 1.4).
Yet, however influential, the organisational model of the HGP, with its emphasis on concentration and maximised rates of production, was just one among other forms of genomics that historically emerged throughout the 1980s and 1990s: we argue that it was an unusual and rather exceptional one (Chap. 3). The other configurations demonstrate that the history of genomics is more complex and richly textured than the master narrative of the HGP and its representation in hourglass form may suggest. In order to appreciate this multifaceted history and its multiple genealogies, we need to look beyond the HGP and examine genome projects in human and non-human species that occurred before, during and after it. Another crucial way of moving beyond the restrictions of the hourglass model is placing the communities that produced the genomes—rather than the sequence end products—at the centre of our history.
Fig. 1.4. A depiction of events preceding and succeeding the Human Genome Project (HGP) that illustrated an article co-authored by Eric Green and Marc Guyer in 2011. Green and Guyer were key scientific and administrative figures during the development of the HGP. After its conclusion, they were appointed director and deputy director, respectively, of the National Human Genome Research Institute of the NIH and tasked with planning what was by then called the ‘post-genomic’ era. In the illustration, the HGP is portrayed as a bulb powered by prior scientific achievements and illuminating subsequent milestones. The structure of events resulting from this past ‘powering’ the future places the HGP in a position that is analogous to the pinch-point of an hourglass. Reprinted by permission from Springer Nature Customer Service Centre GmbH: Springer Nature, Nature (https://www.nature.com/), Charting a course for genomic medicine from base pairs to bedside, Green and Guyer, 2011: Figure 1 on p. 205. A high-resolution version of this image is available in the open-access version of the article, which can be found online at: https://www.nature.com/articles/nature09764 (last accessed 29th November 2022). We thank Catherine Heeney for drawing our attention to the image.
1.3 ‘Thick Sequencing’, Communities and ‘Genomicists’
This book and the long-term historical narrative it encompasses enable us to probe, expand and develop a number of conceptual tools. While we use some of them for the first time here, we had originally proposed others elsewhere. Among the latter, we extend our distinction between thin and thick sequencing from its original context in making sense of pig genomics (Lowe, 2018), out to the history of genomics more generally. Thin sequencing is the compilation of the string of DNA nucleotides in order, while thick sequencing comprises all the processes, materials and organisational configurations that make the products of genomics—including the ‘thin’ sequence, but not limited to it—usable by a variety of potential actors. Thin sequencing is a feature of the narrowest point of the neck of the hourglass: it is the determination of the order of bases, whether manually or in a more automated way. This is not necessarily a simple task, as it requires the interpretation of recorded signals that are not always unequivocal. To understand the nature of genomics, however, and how its resulting outcomes can be taken up by different users in distinct ways, examining this part of the process alone is insufficient.
Capturing the thickness of sequencing means examining the obtaining and selection of DNA, its storage in DNA libraries, its mapping, the choice to sequence DNA fragments (clones) in YAC, BAC or other types of library, the extent of the coverage of the genome, and the selection of particular areas for more or less rigorous sequencing. [Footnote 18] The sequences so generated then need to be assembled and annotated. All of these steps require decisions about what is to be abstracted from the variation that the different individual genomes exhibit in nature and what variation is to be represented in the final result. There are more stable aspects of this process, such as common pieces of software, sequencing and informatics pipelines, quality and validation standards, but the products also depend on the decisions and choices made in the whole thick sequencing process (Lowe, 2018). It is the thickening of our historical approach to sequencing—by focusing on practices such as library construction, mapping and annotation—that enables us to probe the hourglass representation and examine processes, trajectories and lineages beyond the narrow (thin) neck.
Through a thick sequencing framework, the differences between sequencing endeavours across species, and how these differences affect the outcomes of genomics research—including reference genomes—become more manifest. One of the ways in which we capture these differences is by exploring the participation—or lack thereof—of particular communities of scientists in the production of reference genomes. These communities can be identified by coalescence around a particular object, such as a species, and/or a biological unit of it such as a cell. Additionally, or alternatively, they can be oriented around one or several biological processes such as heredity in the case of genetics, evolution or particular molecular mechanisms. These alliances are usually cemented and reinforced by common disciplinary membership and training, and participation in modes of scholarly communication and interaction such as a particular set of journals and conferences. These communities typically share “epistemic cultures” (Knorr-Cetina, 1999), and the extent of collaborative relations will be denser among members of a given community than between members of different communities.
There is no hard-and-fast rule for drawing the boundaries of particular communities, and weaker supra-communities or more specific sub-communities can also be identified. The notion of a community has long interested historians of science and scholars working in Science and Technology Studies (e.g. Shapin & Thackray, 1974). From the early days of both fields, a considerable amount of literature has explored the factors that lead scientists to group into communities and the dynamics of those groupings, from growth to stability, amalgamation, fragmentation or disappearance. Various mechanisms that glue communities together have been highlighted, among them common styles of thought or ways of knowing (Harwood, 1993; Pickstone, 2000), shared moral economies or working worlds (Agar, 2020; Kohler, 1994; Strasser, 2011) and particularly intense collaborative relationships (Vermeulen et al., 2013).
When we deploy the notion of community in this book, we refer to particular sets of individuals, laboratories and associated research practices converging around the description of a genome. Many of these consciously self-identify with communities, acting in concert to launch programmes and initiatives, and featuring specific conferences and venues of publication in common. Yet these communities are not homogeneous, and they may not exhibit the same characteristics or level of resolution. For instance, the community of yeast researchers we discuss (Chap. 2) is more heterogeneous than the medical geneticists we also survey (Chaps. 3 and 4). The pig genome community that we introduce (Chap. 5) is and was much smaller than both of these, but is in many respects broader, featuring different kinds of disciplinary backgrounds and researchers who have worked on other species, in addition to the pig. But, as we show, it was no less coherent a community for all that and acted as a community in shaping the genomics of its chosen species in a decisive and consequential manner. Genomics, and the object of a genome, can only be understood in relation to the particular communities that it both shapes and is shaped by, and to the wider social and technical configurations that it also impacts. [Footnote 19]
Our notion of communities builds on scholarship that considers the genome a rhetorical and practical space, as much as a material object (Szymanski et al., 2019). In this space, pre-existing scientific groupings can converge or fragment. Those, like the yeast biologists, who are more successful in defining and shaping the genome in their own terms, are in turn further unified by their orientation around the object of the genome. Human and medical geneticists, by contrast, formed a genome community that differed from the one assembled by the participants in the HGP. [Footnote 20] This rhetorical and pragmatic definition enabled us elsewhere to highlight different characteristics of genomics research depending on the communities involved with a given genome: a strict separation of producers and users in the case of the human reference genome (García-Sancho, Leng, et al., 2022), different degrees of proximity and distance between yeast sequencing and particular research goals (García-Sancho, Lowe, et al., 2022) and processes of bricolage or reuse of tools and resources that were deployed in the generation of the pig reference genome (Lowe, Leng, et al., 2022).
One conclusion arising from this community framework is that genomics can be regarded as a set of tools that enable groups of scientists to do different things and achieve different objectives with their target genomes (Lowe, García-Sancho, et al., 2022). Throughout the remaining seven chapters of this book, we propose the notion of yeast, human and pig genomicists as (often collective) subjects that make the history of genomics. In this process of construction, the genomicists mould their target genomes according to their necessities. They thus shape what these genomes represent and what they can do with them, sometimes quite consciously and deliberately.
This focus on communities of genomicists allows us to discern greater diversity and complexity in the history of genomics. In what follows, we show that yeast, human and pig genomicists have exhibited different mechanisms of inclusion and exclusion of particular sets of scientists and institutions. These have shaped each community differently and changed their compositions—and sometimes their roles—over time. The genomicists working on S. cerevisiae were relatively stable before, during and after the production of their reference genome, while in H. sapiens the leading genomicists of the early days were replaced by a different community based at specialist genome centres. For S. scrofa, the range of genomicists expanded, due to the convergence of a longstanding community of pig geneticists with practitioners from one of these specialist genome centres. These different trajectories further show that the history of genomics cannot be reduced to a single framework or periodisation.
Previous historiography has narrowly focused on a few, homogeneous genomicists: the participants in the HGP, recipients of the grants to determine the human reference genome and heads of the new institutions of genomics research: the genome sequencing centres. By looking at other less visible genomicists—those working on non-human organisms and beyond the HGP framework—we emphasise their agency as historical subjects and their capacity to pursue their own goals rather than following a teleological, pre-defined pathway. It is in the specificity of those goals, and in the genomicists’ agency in pursuing them, that the interactions between the genomes and their communities occur and that we identify trajectories and lineages diverging from the canonical history of genomics. In other words, when a heterogeneous and inclusive array of genomicists is considered, genomics becomes something other than a static, retrospectively constructed field: it becomes a science (and history) in the making.
1.4 Outline of Chapters and Structure of Our Argument
The book is divided into three parts, comprising two chapters each. Taken collectively, these three parts de-centre the historiography of genomics: from a focus on H. sapiens; from an emphasis on the HGP; and, finally, from excessive attention to the determination of DNA sequences themselves (what we defined above as ‘thin sequencing’). [Footnote 21] We achieve this by exploring genomic endeavours around yeast, human and pig—including their reference genome projects—that started in the mid-1980s and concluded towards the late 2010s. [Footnote 22] The sources that have enabled us to reconstruct these endeavours are oral histories, published literature—including scientific, administrative and policy reports—and archival materials. For the oral histories, we approached individuals ranging from Nobel Prize-winning scientists to administrators, lower-profile researchers and those devising and running the infrastructures of genomics. Our archival sources include catalogued and uncatalogued collections, as well as grey literature (see Appendix A and Appendix B at the end of the book for a complete list). We have also found extant and archived web pages to be useful in reconstructing parts of the history of genomics that had a lower public profile and lack an extensive secondary literature.
Part I of the book addresses what we call the distributed model of genomics. It starts with an account of the determination of the reference sequence of yeast: a non-human genome project that ended in 1996, just before the scaling-up of the HGP. The yeast effort enables us to show a greater variety of institutions and ways of organising mapping and sequencing practices than the ones behind the production of the human reference genome. Chapter 2 documents how institutional and organisational diversity was especially manifest in the European Commission-funded Yeast Genome Sequencing Project, which was not intended to serve as a pilot for the HGP, as the NIH S. cerevisiae genome programme was.
Similarly, a focus on the collective and systematic mapping work that preceded the large-scale sequencing characteristic of the latter stages of the HGP reveals a variety of heterogeneous human genome programmes. As we argue in Chap. 3, the HGP was but one among those many programmes: its focus on the rapid, industrial production of a reference sequence of the whole human genome was a particular—and rather singular—characteristic that distinguished the HGP from the others. The other, non-HGP programmes were more collective and inclusive of existing communities of medical geneticists. In order to accelerate the production of the reference sequence, the IHGSC that conducted the later stages of the HGP sidelined a large proportion of human and medical genetics institutions from its operation, starting in 1996. Yet these human and medical genetics communities continued their genome efforts, thus forming trajectories that the canonical winner’s history of genomics overlooks.
Part II compares the production of the human reference genome with those of other species, especially the pig S. scrofa. Chapter 4 presents a main participant in the production of the human reference sequence: the Sanger Institute. Chapter 5 shows how this institution also played a major role in the subsequent sequencing of the pig genome that started in 2006, three years after the HGP was deemed concluded. At first glance, the pig genome thus seems to be strongly modelled on the HGP. Yet the broader history of pig genomics allows us to qualify that impression. If we take into account the early pig genome mapping work, which started in the 1990s at the same time as the HGP, we see that the scientific communities working on the agricultural genetics and immunogenetics of S. scrofa were intensely involved then and, unlike human and medical geneticists, continued to be. Indeed, institutions working on the genetics of pig immune response and traits relevant for selective breeding processes were important drivers and participants in the Swine Genome Sequencing Consortium that organised, managed and coordinated the reference genome work.
Taken together, Chaps. 4 and 5 continue the de-centring exercise that we started in Part I. In this case, the de-centring is not only due to our consideration of non-human species (pigs, as well as yeast) but also to our addressing of longer-term trajectories: considering genome mapping, as well as sequencing. We look at the sources of the DNA libraries from which the reference sequences were obtained and show that in both cases they were derived from a narrow pool of a few humans and pigs. Yet in the case of S. scrofa, the engagement of the early mapping communities in the sequencing operation eased the connection of the resulting reference genome with more general immunogenetic goals and the development of data and tools to aid the improvement of agriculturally-relevant breeds. These were the problems that motivated the mapping activity of pig genomicists before their involvement in whole-genome sequencing.
Part III comprises Chaps. 6 and 7. In it, we address a number of features that have been commonly attributed to post-genomics, such as the connection of genomic data to other forms of biological data, and an attention to variation and diversity. We examine the annotation of reference genomes and other functional and systematic studies of sequence data. By functional studies, we mean the elucidation of the effects of particular genes and other genetic elements in the organism. By systematic studies, we mean the determination of patterns of variation within a given species or between species to inform, among other endeavours, evolutionary biology. We argue that our ‘thick sequencing’ approach—addressing the long-term processes by which DNA data become reference genomes—enables us to show that these practices have been deeply entangled throughout the whole history of genomics rather than necessarily following the completion of the HGP or any other reference sequence project.
Furthermore, in the case of the pig, the close involvement of the communities of immunogeneticists and agriculturally-oriented geneticists from the early days of genome mapping transformed annotation practices at the Sanger Institute into more collective and distributed endeavours. This paved the way to collaboration between two different communities of genomicists, one centred around the Sanger Institute and the other derived from the wider pig genetics community involved in mapping practices.
In our concluding Chap. 8, we explore the implications of our study beyond the realms of the history, philosophy and sociology of science. One of the preoccupations of science policymakers and funders in the wake of the HGP has been the notion of a ‘translational gap’ between the availability of masses of genome data and their exploitation, for example in effective new treatments or diagnostic tests in the clinic: ‘from bench to bedside’, as the slogan goes. We argue that this translational gap is an artifact of the particular configuration and history of the HGP: its model of concentrated production and the rigid division it implied between the producers of the reference sequence and the communities that would later use it in biomedical and clinical research. Other genomic endeavours that deployed more inclusive strategies show more immediacy and connection between the compilation of the data and their mobilisation towards particular goals. Our historical investigation thus illuminates ways of reducing the temporal, cognitive and conceptual distance between genomic data and user communities.
Dissatisfaction with reference genomes has given rise to new initiatives to represent genomic variation and to connect genomes to other forms of biological data and processes. As we show throughout, these qualms are based on trying to attribute particular functions to reference genomes and to make them carry weight that they were not designed or conceived for. Our book highlights that many of these problems stem from the contingent and historically driven processes of reference genome construction. Without a historical reconstruction, these processes and their consequences for the resulting reference genomes are flattened and rendered invisible.
Notes
- 1.
https://www.gisaid.org/ (last accessed 29th November 2022). The COVID-19 Genomics UK (COG-UK) Consortium alone has sequenced over two million SARS-CoV-2 viruses: https://www.cogconsortium.uk/ (last accessed 29th November 2022).
- 2.
Elsewhere, lower figures have been indicated (https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost, last accessed 29th November 2022), though these may not include the full range of costs involved in all aspects of the sequencing process, including processing, storage and curation of the resulting data.
- 3.
RefSeq distinguishes “reference genomes”, “representative genomes” and “variant genomes”. Throughout, when we refer to reference genomes, we are referring to objects that are designated by RefSeq as “reference genomes” and “representative genomes”. When the distinction between these becomes relevant in our narrative, we will specify which RefSeq category we are referring to. For RefSeq, “reference genomes” are “manually selected ‘gold standard’” high-quality complete genomes. “Representative genomes” are designated standard genomes for a given species of organism, while “variant genomes” constitute “genome variations within the species” (Tatusova et al., 2014, p. 135).
- 4.
See https://www.ncbi.nlm.nih.gov/refseq/statistics/ (last accessed 29th November 2022).
- 5.
- 6.
Our central idea of entanglement between genomes and communities of genomicists expands arguments that we formulated elsewhere, such as the distinction between ‘thin’ and ‘thick’ sequencing (Lowe, 2018) and the existence of different ways of sequencing that affect the ontological status and affordances of the resulting sequence data (Leng et al., 2022).
- 7.
- 8.
In humans, there are 22 pairs of non-sex chromosomes and typically one pair of sex chromosomes, XX or XY, though numerous exceptions to these figures exist, and the numbers and sets of distinct chromosomes differ in other organisms. The full complement of 46 chromosomes in humans is the diploid set. The meaning of ‘genome’ has, inevitably, shifted over time (Keller, 2011).
- 9.
For an explanation of how the manual and automated sequencing techniques work, see https://genomicsincontext.wordpress.com/dna-sequencing-and-its-history/dna-sequencing-from-manual-biochemistry-to-industrial-genomics/ (last accessed 29th November 2022).
- 10.
They are also often referred to simply as genetic maps or linkage maps. For clarity, we use the term ‘genetic linkage map’ throughout the book.
- 11.
- 12.
Another key object in the use of genome libraries, and in genomics research more generally, is the primer. Primers are short DNA fragments designed to bind specifically to a complementary sequence and so initiate the amplification of a target genome region by the enzyme DNA polymerase. This enables researchers to obtain multiple copies of a particular stretch of DNA that they seek to sequence, detect or otherwise investigate.
- 13.
As noted above, the name Human Genome Project and the acronym HGP are commonly used to refer to both the specific US programme and the later international initiative. In the remainder of this introductory chapter, our usage of HGP aligns with the latter sense: a coordinated effort that led to the production of the human reference sequence. Later in the book, and particularly in Chap. 3, we distinguish between the US human genome programme and later developments, designating the former as ‘US-HGP’ and differentiating it from the effort led by the International Human Genome Sequencing Consortium (the IHGSC endeavour).
- 14.
Some scholars of the life sciences query the novelty of the big science designation, drawing upon historical examples of large-scale coordinated endeavours that long predated the HGP, and indeed the Manhattan Project. These include eighteenth-century voyages of discovery, surveys of the natural world, concerted ecological research programmes and networks of collection and information exchange, for example those associated with great museums, botanic gardens or figures such as Charles Darwin (Aronova et al., 2010; Capshew & Rader, 1992; Strasser, 2019; Vermeulen, 2013). Others, while recognising that genomics does indeed constitute something new, highlight key differences between the way ‘Bigness’ manifests in the life sciences and the way it manifests in the physical or engineering sciences. The reasons for collaborating, and for forming networks and/or centralised facilities or resources, differ across the sciences, and even within the life sciences (Vermeulen, 2016; Vermeulen et al., 2013).
- 15.
- 16.
- 17.
Additionally, the hourglass model enables its proponents to unveil and scrutinise the tension between the desire to draw long-term lineages on the one hand and to historicise and contextualise work in particular eras and domains on the other. One may want to trace the ways in which aspects of the upper part of the hourglass survived and manifested in the neck, and were related to new developments in the lower part. But this should not come at the cost of equating twenty-first-century interest in concepts of epigenetics with analogous examples of the ways that scientists connected organismal development and evolution in the late nineteenth century (Barahona et al., 2010). This problem is not exclusive to historians: some scientists seek to draw historical parallels between their own interests and the ideas and practices of their predecessors. Scott Gilbert and Brian K. Hall are excellent examples of modern biologists interested in nineteenth-century organismal development (see Gilbert, 1991 and Hall, 2009).
- 18.
Coverage relates to the depth of sequencing: how many times, on average, any given nucleotide in the sequence has been determined. 2X coverage means that, on average, each nucleotide has two data points; 5X coverage means five, and so on. Higher coverage is more likely to iron out random errors, resulting in a higher-quality sequence.
- 19.
The notion of “working worlds”, introduced by historian of science Jon Agar, helps us understand this entanglement between genomes and communities. Working worlds are spheres of activity that pose and frame particular problems, which scientists tackle by constructing and working with abstract representations (Agar, 2012, 2020). In this book, we consider the working worlds of medical geneticists who engage with real patients and the clinic, of livestock geneticists who orient towards the needs of selective breeding for agriculture, and of those who develop and use one of a handful of model organisms in the biological sciences: yeast. What the reference genomes arising from these working worlds represent, and the problems that they are meant to address, vary significantly.
- 20.
In this way, the genome is analogous to the “epistemic space” that was opened up for heredity and its scientific investigation in the mid-nineteenth century, giving rise to the historical trajectory that has been analysed through the hourglass model (see above). While narrower than the space of heredity, the genome shares heredity’s “depend[ence] on a vast configuration of distributed technologies and institutions connected by a system of exchange” (Müller-Wille & Rheinberger, 2007, p. 25).
- 21.
On other de-centring exercises in the historiography of science, see Andrew Cunningham and Perry Williams’s work on the early-modern period (1993). They argue that what is now considered to be modern science did not emerge out of a single, sudden and epic event such as the so-called Scientific Revolution. Instead, there was a series of more gradual transformations that, over the sixteenth and seventeenth centuries, led to forms of knowledge-production more in line with our current understanding of science. A similar argument can be made about the HGP: however revolutionary and epic this event is made to appear, it does not in itself fully capture the emergence of genomics research.
- 22.
Our choice of these three species is necessarily selective, but as noted above encompasses different kinds of organisms used in distinct domains. Likewise, we have had to be selective in the choice of genomic projects and geographical scope concerning these species. Our focus on international initiatives—particularly those supported by the European Commission—has allowed us to provide an overview of the history of genomics that involves many different countries. In spite of this, further research on other species and geographical settings—most pressingly, Asia—would be valuable to complement and develop the arguments and perspectives that we raise in this book.
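As a hedged illustration of the coverage figure defined in note 18, the short Python sketch below computes average coverage as the total number of sequenced bases divided by the length of the genome. The function names and the toy reads are hypothetical and chosen only for illustration; they do not come from any project discussed in this book, and real sequencing pipelines handle errors, gaps and repeats that this sketch ignores.

```python
def mean_coverage(read_lengths, genome_length):
    """Average number of times each nucleotide position has been read."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(read_lengths) / genome_length


def per_base_depth(reads, genome_length):
    """Count how many reads overlap each position.

    `reads` is a list of (start, length) pairs with 0-based starts
    (an assumed toy representation, not a standard file format).
    """
    depth = [0] * genome_length
    for start, length in reads:
        # Clip each read at the end of the genome.
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth


# Toy example: a 10-base "genome" and four reads of 5 bases each.
reads = [(0, 5), (5, 5), (2, 5), (3, 5)]
depth = per_base_depth(reads, 10)
average = mean_coverage([length for _, length in reads], 10)
print(depth)
print(average)
```

In this toy case the average is 2.0, that is, 2X coverage, yet individual positions are read between one and three times. This is why coverage is stated as an average rather than as a guarantee for every nucleotide.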
References
Agar, J. (2012). Science in the twentieth century and beyond. Polity Press.
Agar, J. (2020). What is science for? The Lighthill report on artificial intelligence reinterpreted. The British Journal for the History of Science, 53(3), 289–310.
Arenas, M., Pereira, F., Oliveira, M., Pinto, N., Lopes, A. M., Gomes, V., et al. (2017). Forensic genetics and genomics: Much more than just a human affair. PLoS Genetics, 13(9), e1006960.
Aronova, E., Baker, K. S., & Oreskes, N. (2010). Big science and big data in biology: From the international geophysical year through the international biological program to the Long Term Ecological Research (LTER) Network, 1957–Present. Historical Studies in the Natural Sciences, 40(2), 183–224.
Atkinson, P., Glasner, P., & Greenslade, H. (2007). New genetics, new identities. Routledge.
Barahona, A., Suárez-Díaz, E., & Rheinberger, H.-J. (2010). The hereditary hourglass. Genetics and epigenetics, 1868–2000. Max Planck Institute for the History of Science Preprint 392. Retrieved December 4, 2022, from https://www.mpiwg-berlin.mpg.de/sites/default/files/Preprints/P392.pdf
Barnes, B., & Dupré, J. (2008). Genomes and what to make of them. The University of Chicago Press.
Bartlett, A. (2008). Accomplishing sequencing the human genome. PhD dissertation, Cardiff University.
Botstein, D., White, R. L., Skolnick, M., & Davis, R. W. (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. American Journal of Human Genetics, 32(3), 314–331.
Brownlee, G. G. (2014). Fred Sanger – Double Nobel Laureate: A biography. Cambridge University Press.
Capshew, J. H., & Rader, K. A. (1992). Big science: Price to the present. Osiris, 7, 2–25.
Collins, F. S., Morgan, M., & Patrinos, A. (2003). The Human Genome Project: Lessons from large-scale biology. Science, 300, 286–290.
Comfort, N. (2012). The science of human perfection: How genes became the heart of American medicine. Yale University Press.
Cook-Deegan, R. (1994). The gene wars: Science, politics, and the human genome. W. W. Norton and Company.
Cunningham, A., & Williams, P. (1993). De-centring the ‘big picture’: The Origins of Modern Science and the modern origins of science. The British Journal for the History of Science, 26, 407–432.
Davies, K. (2001). The sequence: Inside the race for the human genome. Weidenfeld & Nicolson.
de Chadarevian, S. (2002). Designs for life: Molecular biology after World War II. Cambridge University Press.
de Chadarevian, S. (2020). Heredity under the microscope: Chromosomes and the study of the human genome. The University of Chicago Press.
Dreger, A. D. (2000). Metaphors of morality in the Human Genome Project. In P. Sloan (Ed.), Controlling our destinies: Historical, philosophical, ethical, and theological perspectives on the Human Genome Project (pp. 155–184). University of Notre Dame Press.
Fortun, M. (1999). Projecting speed genomics. In M. Fortun & E. Mendelsohn (Eds.), The practices of human genetics (pp. 25–48). Kluwer Academic.
Fortun, M. (2006). Celera Genomics: The race for the human genome sequence. In A. Clarke & F. Ticehurst (Eds.), Living with the genome: Ethical and social aspects of human genetics (pp. 27–32). Palgrave Macmillan.
Gannett, L. (2019). The Human Genome Project. In Zalta, E. N. (Ed.), The Stanford Encyclopedia of Philosophy (Winter 2019 Edition). Retrieved December 4, 2022, from https://plato.stanford.edu/archives/win2019/entries/human-genome
García-Sancho, M. (2010). A new insight into Sanger’s development of sequencing: From proteins to DNA, 1943–1977. Journal of the History of Biology, 43(2), 265–323.
García-Sancho, M. (2012). Biology, computing and the history of molecular sequencing: From proteins to DNA, 1945–2000. Palgrave Macmillan.
García-Sancho, M., Leng, R., Viry, G., Wong, M., Vermeulen, N., & Lowe, J. W. E. (2022). The Human Genome Project as a singular episode in the history of genomics. Historical Studies in the Natural Sciences, 52(3), 320–360.
García-Sancho, M., Lowe, J. W. E., Viry, G., Leng, R., Wong, M., & Vermeulen, N. (2022). Yeast sequencing: ‘Network’ genomics and institutional bridges. Historical Studies in the Natural Sciences, 52(3), 361–400.
Gaudillière, J.-P., & Rheinberger, H.-J. (2004). From molecular genetics to genomics: The mapping cultures of twentieth-century genetics. Routledge.
Gilbert, S. F. (1991). A conceptual history of modern embryology. Springer.
Glasner, P. (2002). Beyond the genome: Reconstituting the new genetics. New Genetics and Society, 21, 267–277.
Glasner, P., & Rothman, H. (Eds.). (1998). Genetic imaginations: Ethical, legal and social issues in human genome research. Routledge.
Green, E., Guyer, M., & National Human Genome Research Institute. (2011). Charting a course for genomic medicine from base pairs to bedside. Nature, 470, 204–213.
Griffiths, P., & Stotz, K. (2013). Genetics and philosophy: An introduction. Cambridge University Press.
Guttinger, S., & Dupré, J. (2016). Genomics and postgenomics. In Zalta, E. N. (Ed.), The Stanford Encyclopedia of Philosophy (Winter 2016 Edition). Retrieved December 4, 2022, from https://plato.stanford.edu/archives/win2016/entries/genomics
Hall, B. K. (2009). Tapping many sources: The adventitious roots of evo-devo in the nineteenth century. In M. D. Laubichler & J. Maienschein (Eds.), From embryology to evo-devo: A history of developmental evolution (pp. 467–498). The MIT Press.
Harwood, J. (1993). Styles of scientific thought: The German genetics community, 1900–1933. The University of Chicago Press.
Hilgartner, S. (2013). Constituting large-scale biology: Building a regime of governance in the early years of the Human Genome Project. BioSocieties, 8, 397–416.
Hilgartner, S. (2017). Reordering life: Knowledge and control in the genomics revolution. The MIT Press.
Hogan, A. J. (2016). Life histories of genetic disease: Patterns and prevention in postwar medical genetics. Johns Hopkins University Press.
Hutchison, C. A., III. (2007). DNA sequencing: Bench to bedside and beyond. Nucleic Acids Research, 35(18), 6227–6237.
International Human Genome Sequencing Consortium. (2001). Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
Keating, P., Cambrosio, A., & Nelson, N. C. (2016). “Triple negative breast cancer”: Translational research and the (re)assembling of diseases in post-genomic medicine. Studies in History and Philosophy of Biological and Biomedical Sciences, 59, 20–34.
Keller, E. F. (2000). The century of the gene. Harvard University Press.
Keller, E. F. (2011). Genes, genomes, and genomics. Biological Theory, 6, 132–140.
Kevles, D., & Hood, L. (Eds.). (1992). The code of codes: Scientific and social issues in the Human Genome Project. Harvard University Press.
Knorr-Cetina, K. (1999). Epistemic cultures: How the sciences make knowledge. Harvard University Press.
Kohler, R. (1994). Lords of the fly: Drosophila genetics and the experimental life. The University of Chicago Press.
Kuska, B. (1998). Beer, Bethesda, and biology: How “Genomics” came into being. JNCI: Journal of the National Cancer Institute, 90(2), 93.
Lederberg, J. (2001). ‘Ome Sweet’ Omics – A genealogical treasury of words. The Scientist, April 2001.
Leng, R., Viry, G., García-Sancho, M., Lowe, J., Wong, M., & Vermeulen, N. (2022). The sequences and the sequencers: What can a mixed-methods approach reveal about the history of genomics? Historical Studies in the Natural Sciences, 52(3), 277–319.
Lenoir, T., & Hayes, M. (2000). The Manhattan Project for biomedicine. In P. R. Sloan (Ed.), Controlling our destinies: Historical, philosophical, ethical, and theological perspectives on the Human Genome Project (pp. 29–62). University of Notre Dame Press.
Lindee, M. S. (2005). Moments of truth in genetic medicine. Johns Hopkins University Press.
Loenen, W. A., Dryden, D. T., Raleigh, E. A., Wilson, G. G., & Murray, N. E. (2014). Highlights of the DNA cutters: A short history of the restriction enzymes. Nucleic Acids Research, 42(1), 3–19.
Lowe, J. W. E. (2018). Sequencing through thick and thin: Historiographical and philosophical implications. Studies in History and Philosophy of Biological and Biomedical Sciences, 72, 10–27.
Lowe, J. W. E., & Bruce, A. (2019). Genetics without genes? The centrality of genetic markers in livestock genetics and genomics. History and Philosophy of the Life Sciences, 41, 50.
Lowe, J. W. E., García-Sancho, M., Leng, R., Wong, M., Vermeulen, N., & Viry, G. (2022). Across and within networks: Thickening the history of genomics. Historical Studies in the Natural Sciences, 52(3), 443–475.
Lowe, J. W. E., Leng, R., Viry, G., Wong, M., Vermeulen, N., & García-Sancho, M. (2022). The bricolage of pig genomics. Historical Studies in the Natural Sciences, 52(3), 401–442.
Maxson Jones, K., Ankeny, R. A., & Cook-Deegan, R. (2018). The Bermuda triangle: The pragmatics, policies, and principles for data sharing in the history of the Human Genome Project. Journal of the History of Biology, 51(4), 693–805.
McKusick, V. A., & Ruddle, F. H. (1987). A new discipline, a new name, a new journal. Genomics, 1, 1–2.
Morange, M. (2020). The black box of biology: A history of the molecular revolution. Harvard University Press.
Moss, L. (2003). What genes can’t do. The MIT Press.
Müller-Wille, S., & Rheinberger, H.-J. (2007). Heredity–The formation of an epistemic space. In S. Müller-Wille & H.-J. Rheinberger (Eds.), Heredity produced: At the crossroads of biology, politics, and culture, 1500–1870 (pp. 3–34). The MIT Press.
Müller-Wille, S., & Rheinberger, H.-J. (2012). A cultural history of heredity. The University of Chicago Press.
O’Malley, M. A., Martin, W., & Dupré, J. (2010). The tree of life: Introduction to an evolutionary debate. Biology & Philosophy, 25, 441–453.
Ostell, J. (2013). What’s in a Genome at NCBI? In The NCBI handbook 2nd edition. National Center for Biotechnology Information (US).
Pääbo, S. (2014). Neanderthal man: In search of lost genomes. Basic Books.
Pickstone, J. V. (2000). Ways of knowing: A new history of science, technology and medicine. Manchester University Press.
Rasmussen, N. (2014). Gene jockeys: Life science and the rise of biotech enterprise. Johns Hopkins University Press.
Rheinberger, H.-J., & Gaudillière, J.-P. (2004). Classical genetic research and its legacy: The mapping cultures of twentieth-century genetics. Routledge.
Rheinberger, H.-J., & Müller-Wille, S. (Trans. A. Bostanci). (2017). The gene: From genetics to postgenomics. The University of Chicago Press.
Riesenfeld, C. S., Schloss, P. D., & Handelsman, J. (2004). Metagenomics: Genomic analysis of microbial communities. Annual Review of Genetics, 38, 525–552.
Schwarze, K., Buchanan, J., Fermont, J. M., Dreau, H., Tilley, M. W., Taylor, J. M., et al. (2020). The complete costs of genome sequencing: A microcosting study in cancer and rare diseases from a single center in the United Kingdom. Genetics in Medicine, 22, 85–94.
Shapin, S., & Thackray, A. (1974). Prosopography as a research tool in history of science: The British scientific community 1700–1900. History of Science, 12(1), 1–28.
Sloan, P. R. (2000). Controlling our destinies: Historical, philosophical, ethical, and theological perspectives on the Human Genome Project. University of Notre Dame Press.
Stevens, H. (2013). Life out of sequence: A data-driven history of bioinformatics. The University of Chicago Press.
Stevens, H. (2018). Globalizing genomics: The origins of the International Nucleotide Sequence Database Collaboration. Journal of the History of Biology, 51, 657–691.
Strasser, B. J. (2011). The experimenter’s museum: GenBank, natural history, and the moral economies of biomedicine. Isis, 102(1), 60–96.
Strasser, B. J. (2019). Collecting experiments: Making big data biology. The University of Chicago Press.
Suárez-Díaz, E. (2010). Making room for new faces: Evolution, genomics and the growth of bioinformatics. History and Philosophy of the Life Sciences, 32(1), 65–90.
Szymanski, E., Vermeulen, N., & Wong, M. (2019). Yeast: One cell, one reference sequence, many genomes? New Genetics and Society, 38, 430–450.
Tatusova, T., Ciufo, S., Fedorov, B., O’Neill, K., Tolstoy, I., & Zaslavsky, L. (2014). About prokaryotic genome processing and tools. In The NCBI handbook 2nd edition. National Center for Biotechnology Information (US).
Venter, J. C., Remington, K., Heidelberg, J. F., Halpern, A. L., Rusch, D., Eisen, J. A., et al. (2004). Environmental genome shotgun sequencing of the Sargasso Sea. Science, 304(5667), 66–74.
Vermeulen, N. (2013). From Darwin to the census of marine life: Marine biology as big science. PLoS ONE, 8(1), e54284.
Vermeulen, N. (2016). Big Biology. NTM Zeitschrift für Geschichte der Wissenschaften, Technik und Medizin, 24, 195–223.
Vermeulen, N., Parker, J. N., & Penders, B. (2013). Understanding life together: A brief history of collaboration in biology. Endeavour, 37(3), 162–171.
Winther, R. G. (2020). When maps become the world. The University of Chicago Press.
Yi, D. (2015). The recombinant university: Genetic engineering and the emergence of Stanford biotechnology. The University of Chicago Press.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
About this chapter
Cite this chapter
García-Sancho, M., Lowe, J. (2023). Introduction. In: A History of Genomics across Species, Communities and Projects. Medicine and Biomedical Sciences in Modern History. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-031-06130-1_1
DOI: https://doi.org/10.1007/978-3-031-06130-1_1
Publisher Name: Palgrave Macmillan, Cham
Print ISBN: 978-3-031-06129-5
Online ISBN: 978-3-031-06130-1
eBook Packages: History, History (R0)