In 2006, two years after the human reference genome was deemed ‘completed’, one of its key contributors, the Sanger Institute, became involved in another initiative to produce a full reference sequence: the Swine Genome Sequencing Project (SGSP). This project drew upon a variety of funders and contributing laboratories, and was led by the Swine Genome Sequencing Consortium (SGSC), which involved many institutions that had conducted pig genome mapping in the 1990s. The SGSC designated the Sanger Institute as the large-scale centre that would conduct most of the sequencing effort: determining the 2.7 billion nucleotides of the reference genome of the pig Sus scrofa, slightly smaller than that of Homo sapiens. By the time the SGSP started, the Sanger Institute had moved from its original, provisional accommodation to a purpose-built facility in what had by then become the Wellcome Genome Campus in Hinxton, Cambridgeshire. At this new location, it had dramatically increased its sequencing capacity and become one of the most productive genome centres worldwide. By the mid-2000s, the continuous decline of sequencing costs, due to improved instrumentation, more experienced personnel, and ever-refined pipelines and modes of organisation, enabled a single genome centre (in this case the Sanger Institute) to complete a large reference sequence without needing to ally with other genome centres.

The SGSP evolved from prior genome mapping programmes on S. scrofa. Different institutions seeking to locate genes and markers on pig chromosomes for a variety of purposes—from agricultural breeding to immunology and transplantation biology—converged in coordinated swine mapping efforts between the late-1980s and early-1990s. Some of them were conducted within a single country, including ones supported by the US Department of Agriculture (USDA): for instance, the in-house (intramural) mapping operation at the USDA Agricultural Research Service Meat Animal Research Center (USDA MARC). Based in Clay Center (Nebraska), USDA MARC operated with a factory-based model of mapping, analogous to the large-scale genomics facilities that the National Institutes of Health (NIH) and the US Department of Energy (DoE) were instituting (Chap. 4).

Another major effort sponsored by the USDA was the Pig Genome Coordination Program (PGCP), launched in 1993 under the leadership of Iowa State University animal geneticist Max Rothschild as part of the National Animal Genome Research Program. The PGCP was—and is—an extramural programme, and as such was conducted under the auspices of the USDA’s Cooperative State Research, Education and Extension Service (CSREES) from 1994 to 2009.Footnote 1 The PGCP has performed a coordinating and community-building function, funding and distributing shared resources such as mapping tools, contributing to the development of other community resources such as mapping databases, and helping to forge collaborations both within the USA and beyond. As in the contemporary UK Human Genome Mapping Project (Chap. 3), the USDA also disbursed grants to individual researchers and laboratories seeking to map areas of the pig genome.

Other swine genome programmes were funded by transnational institutions, such as the European Commission (EC). This was the case for the Pig Gene Mapping Project (PiGMaP), which by the close of its second iteration had established a network of 29 laboratories coordinated by the Roslin Institute in Scotland.Footnote 2 Between 1991 and 1996, these laboratories pooled and exchanged data and materials—such as DNA samples from carefully-bred reference families of pigs—to generate genetic markers, assign them to specific chromosomes and map their positions (Archibald et al., 1995; Yerle et al., 1995). The purpose of mapping such markers was to provide signposts to researchers so that they could narrow down the location of genes or other functionally-relevant regions in chromosomes.

As with the Human Genome Analysis Programme and unlike the Yeast Genome Sequencing Project—both of them also sponsored by the EC’s early Framework Programmes (Chap. 2)—PiGMaP was mainly focused on mapping the chromosomes of S. scrofa and did not seek to determine its full genome sequence. Indeed, the participants were quite adamant that at that point this was neither a feasible nor necessary task. The community was well aware that the number of mapped markers and genes at the outset of the 1990s was tiny, and that further populating those maps with additional markers was the immediate priority. This would enable more refined mapping based on the landmarks provided by these initial maps.

Throughout the 1990s, the community of pig genomicists that had formed around the mapping efforts continued to produce ever-more refined maps, including those using new kinds of markers and mapping methods. Developing the means by which these maps and mapping data could be exploited for a range of possible applications was a major focus. Completely integrated genetic linkage or physical maps were never produced in this period, in part because the primary interest of the community was in generating useful data rather than complete maps. But some integrated maps were developed. Significantly, one brought together the USDA MARC efforts with the growing alliance of PiGMaP and the network of institutions in its orbit, including some American institutions (e.g. for chromosome 6, Paszek et al., 1995).

At the turn of the millennium, these communities had not pursued significant sequencing of large stretches of the pig genome, with most sequencing efforts instead directed towards the focused characterisation of particular genes and their neighbourhoods. The funding that the pig mappers had access to was not sufficient for a whole-genome sequencing effort like the one that was being undertaken by the International Human Genome Sequencing Consortium (IHGSC). The immediate research needs of the pig genomicists did not require a reference sequence. This, as we show below, changed in the space of a few years. So did the wider situation in genomics, which made the prospects of sequencing the genome of a livestock species like S. scrofa more realistic and worthwhile for the community.

Individual researchers and laboratories, as well as the community as a whole, pursued a variety of different avenues of potential support and funding. This drew upon strategies of diversification and enabled different pots of funding to be accessed for particular tasks that could contribute towards the wider effort of sequencing the genome (Lowe, 2018). Like the IHGSC, the SGSC was supported by national public funding agencies—among them the USDA, the UK’s Biotechnology and Biological Sciences Research Council (BBSRC) and the Danish Government—but also sub-national administrations, such as funders from specific US states, as well as industry bodies. The Sanger Institute operated as a contractor for the community, drawing largely on funds acquired from the USDA. The relationship between the Sanger Institute and the community was far more integrated than such an arrangement might suggest, though, with both parties working together on defining the sequencing effort and shaping its products. The SGSC made the S. scrofa sequence data available in the global, open-access databases in 2009 and described the sequence in Nature in 2012 (Groenen et al., 2012).

The prominent role of the Sanger Institute in the SGSP and the sequencing of the human genome suggests that the production of the swine reference sequence was configured in a similar manner to the IHGSC-led project.Footnote 3 At first glance, both initiatives seem to have emerged from the formation of selective groupings and the channelling of several lines of funding into the concentrated and comprehensive production of a whole-genome sequence. The SGSP would appear to be even more concentrated and narrow than the human genome project. The development of technologies and fall of associated costs made the funnelling effect of the large-scale sequencing model more pronounced: one genome centre undertook the sequencing of the whole pig genome, while for the human one the Sanger Institute needed to pool its efforts with nineteen other institutions. The pig genome endeavour was also deeply informed by the experiences of the human genome sequencing that had preceded it. Yet a more detailed examination of the historicity of both reference genomes shows that the communities involved in the prior swine mapping programmes were much more represented in the SGSC than human and medical geneticists were in the IHGSC, and more heavily involved in shaping the reference genome that was produced.Footnote 4 In other words, when the trajectories of the communities involved in pig mapping are taken into account and the emphasis is not placed exclusively on large-scale sequencing, the funnelling effect caused by the advent of reference genome sequencing is less pronounced in pig genomics than in human. Indeed, some of the diversity of actors, practices and modes of organisation of the mapping phase survived during the production of the reference sequence of S. scrofa.

This chapter explores the means by which the pig mappers remained involved in the production of the reference sequence. In line with earlier parts of the book and prior scholarship (Szymanski et al., 2019), we show this by portraying the genome as a rhetorical and practical space in which pre-existing communities involved in DNA mapping and sequencing could converge or fragment. The next section of this chapter documents how the S. scrofa genome, as an object to be mapped, fostered an alliance dominated by animal geneticists oriented towards the problems of the animal breeding industry with whom they had regular contact. This community also included immunogeneticists pursuing research on the potential use of the pig as a source of organs for human transplantation. Many of the immunogenetics researchers were themselves institutionally associated with the agriculturally oriented animal geneticists. A substantial fraction of these animal geneticists were also interested in what we describe as systematic research, meaning an appreciation of diversity and evolutionary relationships. This line of research ran parallel to mapping and sequencing from the mid-1990s onwards, and as we address in Chap. 7, led to new collaborators participating in the pig genome community following the release of the reference genome.

The alliance of animal geneticists and immunogeneticists drove the production of successive genomic resources, methods and tools from the 1990s onwards. A key example of this was the creation of comprehensive libraries of DNA fragments, which would be used in the concerted physical mapping of the pig genome and its subsequent sequencing. As the IHGSC had done a decade earlier, the SGSC commissioned a specialist laboratory to construct some libraries that were used for the reference genome effort. These were produced by the same team led by Pieter de Jong that had assisted the human sequencers (Chap. 4). Yet in the case of S. scrofa, other libraries created by the pig mappers acted as additional DNA sources for the reference genome and were thus repurposed from their original agricultural and immunogenetic goals.Footnote 5

We conclude by observing that the previous trajectories of the pig genomicists, and their redeployment of tools and resources, made them acutely aware of the affordances and limitations of their reference sequence. Similar to the case of yeast (Chap. 7), they were cognisant of what variation was included in, and excluded from, their reference sequence. This allowed them to appropriately interpret what the reference sequence represented, and consequently to generate new genomic resources linked to it to compensate for the variation known, or reasonably suspected, to be absent. The pig reference sequence, however, differed from the Saccharomyces cerevisiae one in being a conglomeration of DNA from different breeds and populations, as opposed to being sourced from a single yeast strain (Chap. 2). Consequently, it was conceived more as a provisional resource than something definitive, reflecting the satisficing disposition of the pig community and the kinds of research purposes that they conceived that their data could contribute towards (on ‘satisficing’, see Wimsatt, 2007).

1 Mapping Markers and the Uses of Pig Genomics

In the 1990s, mapping the pig genome and finding ways to use the data they produced became a key task of the community of institutions and researchers that investigated the genetics of the pig. A substantial part of this community was oriented towards the problems of livestock breeding. As this had, prior to the 1990s, been dominated by quantitative genetics, pig genome mapping represented an intersection between the newer molecular genetics and the long-established quantitative genetic tradition. The latter involved formulating statistical approaches to enable breeders to make use of the plethora of data on a multitude of traits of interest to farmers—such as litter size or lean meat content—to inform selective breeding decisions for populations of farm animals. These breeding programmes were and are conducted by private sector breeding companies (such as the Pig Improvement Company, or PIC, which we encounter many times in the rest of the book), farmers’ cooperatives or state organisations. From the 1980s onwards, there has been a shift away from publicly-funded research institutions conducting many aspects of the breeding process, and towards these bodies concentrating on providing the scientific basis and data to inform private sector breeders (Agar, 2019, Ch. 3; Myelnikov, 2017).

The advent of genomics in the late-1980s provided an opportunity for agriculturally-oriented research institutions to recalibrate their work in this way. These institutions could now produce genomic data and statistical and computational tools for their potential application in breeding programmes, with the breeders themselves taking on the further development and incorporation of these data and tools in their own operations.

In the case of S. scrofa, as with other farm animal species, the 1990s represented a period in which maps became ever more populated with increasing numbers—and new kinds—of genetic markers: Restriction Fragment Length Polymorphisms, Amplified Fragment Length Polymorphisms, minisatellites and microsatellites, to name a few of the most significant (Table 5.1).

Table 5.1 Descriptions of four main types of genetic markers used in pig genome mapping. Adapted by both authors from Lowe and Bruce (2019)

New mapping assignments were made, databases for storing mapping and related data were developed, and statistical and computational tools were constructed for the detection of chromosomal loci associated with variation in traits of interest: Quantitative Trait Loci, or QTL. These loci were normally markers lying near genes. They could also be genes themselves, or parts thereof. Initial mapping relied on the extraction of DNA from cross-bred reference families of pigs, with DNA samples distributed across many laboratories in collaborative projects. PiGMaP, and the other national and international mapping collaborations, enabled a coordination (and in some respects, a division) of labour that made use of the capabilities and resources of particular laboratories to contribute to common resources such as maps and mapping databases. This was vital in a community where, USDA MARC apart, no one institution possessed the capacity to take on the tasks of genome mapping alone. Alan Archibald at the Roslin Institute was an instrumental figure in brokering these collaborative projects on the European side and in linking up the European efforts with US groups.Footnote 6

At this stage, there was no conception of producing a reference sequence—or even of mapping the whole pig genome—on the part of most pig geneticists. One reason for this was the increasingly difficult funding environment that this community had endured since the 1980s. The decreasing economic and social importance of agriculture had led most Western governments to expect that industry would become the main funder of food-related research. State support was channelled towards projects and tools that held promise for achieving more efficiency—rather than more quantity—in food production, such as genetic engineering (García-Sancho & Myelnikov, 2019). As a result, the pig genomics community developed a suite of approaches to use and adapt data, knowledge, methods and tools produced for the genomes of species such as human and mouse, which had a longer and more-established history of mapping (Hogan, 2016; Lyon, 2002; Paigen, 2003a; Paigen, 2003b; Rader, 2004) and more resources than were available to pig geneticists. The development of infrastructures and data for the genomes of other species, such as the human, therefore became a key resource for pig genomics, and a comparative inferential apparatus was articulated to make full use of it (Lowe, 2022).

Yet achieving an equivalent level of resolution to the human or mouse maps was not an objective of pig geneticists per se nor an inevitable outcome of their activities. This was due to their predominantly agricultural orientation as opposed to the biomedical goals of most human and mouse geneticists. For the majority of researchers mapping S. scrofa, as well as their associates in the breeding industry, the identification of particular genes with known biological mechanisms and phenotypic effects was desirable, but not essential. In the early-to-mid 1990s, it was presumed that knowledge of the presence or absence of particular markers known to be linked to a locus associated with variation in traits would be sufficient for improving the effectiveness of selective breeding. The goals of the research did not, therefore, require that the molecular genetic basis of observed phenotypic variation be discerned, contrary to the needs of medical genetics research where this is imperative (Lowe & Bruce, 2019). Because of this, comprehensive sequence data was not seen as a necessity for informing breeding decisions, unlike in fields where it was felt to be a key resource that would radically advance the understanding of the genetic basis of disease (to cite the justification of the likes of James Watson for completely sequencing the human genome) or the identification and characterisation of genes responsible for key cellular processes (André Goffeau’s motivation for sequencing the yeast genome; Chaps. 2 and 3).

Several developments around the turn of the millennium changed this perspective. First of all, the maps were becoming extremely well-stocked with markers of different kinds, which were arrayed across the chromosomes at increasingly higher resolutions. The payoffs from incremental improvements to these maps were therefore diminishing. It also became increasingly apparent that using a panel of even dozens of markers linked to variation in traits that breeders wanted to select for was not yielding results that matched the high expectations some had for this approach.Footnote 7 Soon, statistical models were articulated by quantitative geneticists that required the use of orders of magnitude more markers across the genome, an approach known as genomic selection (Haley & Visscher, 1998; Meuwissen et al., 2001). One kind of marker, abundant and available across the genome, was particularly valuable for this approach: the Single Nucleotide Polymorphism, or SNP. Whole-genome sequencing efforts were a good source of the data needed to identify SNPs.

Another significant area where it was becoming increasingly apparent that a fully sequenced genome would be valuable was in research on the immunogenetics of the pig. This was tied to the decades-long history of using the pig as a model for transplantation research and surgery, and more recently as a potential source of organs for humans—xenotransplantation. Researchers working in this field had been mapping the Major Histocompatibility Complex (MHC), a region in chromosome 7 of pigs (and chromosome 6 in humans) that is densely populated with genes involved in immune response. Incompatibilities between the products of different genes—and versions of genes—in this genome region are the cause of adverse reactions leading to the immune rejection of a transplanted organ. Identifying these genes and their different variants is therefore a crucial task for effecting transplantations, both within species and across them.

The mapping of the swine and human MHCs—since the 1970s and 1960s, respectively—was an extremely tricky task given how densely packed and highly variable the genes are in this region (on the history of human MHC research, see: Thorsby, 2009). In the 1990s, the task of using pig organs for transplantation was complicated by the discovery of retroviruses embedded in the pig’s DNA—Porcine Endogenous RetroViruses, or PERVs. It was feared that viruses could become activated if pig organs were transplanted into immuno-compromised humans who had not co-evolved with the viruses like the pigs had. For these reasons, it became imperative to sequence the pig genome: to further characterise the swine MHC and its differences to the human MHC and to assess the presence or absence of PERVs (Rohrer et al., 2002).Footnote 8

Immunogenetics was thus one area of research that motivated the creation of a pig genomic library, a set of S. scrofa sequence fragments stored in the DNA of microorganisms such as viruses, yeast and bacteria. The natural proliferation processes of these vectors were used to clone and multiply the pig DNA fragments. First a Yeast, and then a Bacterial Artificial Chromosome library (a YAC and a BAC) of S. scrofa were constructed by a team at Laboratoire Mixte CEA-INRA de Radiobiologie Appliquée (hereafter, CEA-INRA).Footnote 9 This institution was based on the campus of the Institut National de la Recherche Agronomique (INRA) in Jouy-en-Josas, south-west of Paris.

CEA-INRA was originally set up in 1964 with funding from two state agencies: INRA, the multi-branch French agricultural research body, and the French atomic energy agency (Commissariat à l'Énergie Atomique; CEA).Footnote 10 It was led by Marcel Vaiman from its inception and pursued research on the genetics of immune response in the pig, in order to improve the efficacy of transplantations of organs. Another early member was Christine Renard, who joined at the outset of the 1970s and developed serological methods for immunological analysis (see below). Patrick Chardon joined the team in the 1970s and Claudine Geffrotin in the 1980s. They both implemented new molecular biology-based approaches in the group. A key addition in the 1990s was Claire Rogel-Gaillard, who was vital in developing and deploying the new genome libraries (Fig. 5.1).

Fig. 5.1 Picture of four key members of the CEA-INRA team over the course of its history from 1964. From left to right: Marcel Vaiman, Claire Rogel-Gaillard, Christine Renard and Patrick Chardon. Photograph taken by James Lowe, Paris, November 2017

The team’s early research led to the successful development of the pig as a surgical model for transplantation procedures. Researchers at CEA-INRA co-discovered the pig’s MHC (the Swine Leucocyte Antigen complex, SLA) in 1970, and then went on to pioneer the mapping—and later sequencing—of this region. Initially, this was achieved by serological methods, a core immunology technique that uses immune reactions between antibodies in blood serum and white blood cells from a different individual as a mapping indicator.Footnote 11 CEA-INRA was a participant in PiGMaP from the start of the project in the early-1990s. In it, they performed flow cytometry, a technique that sorts chromosomes and therefore aided the mapping of markers to specific pig chromosomes. They also serologically analysed pigs from reference families across Europe that were used in the mapping, as well as developing tools for the further characterisation of chromosome 7.

Through this, CEA-INRA used the funding and networking opportunities of PiGMaP to advance their ongoing survey of the SLA complex by employing physical mapping techniques.Footnote 12 This mapping endeavour involved the creation of DNA libraries and the use of probes to identify coding sequences. This work was conducted in the first year of the second round of PiGMaP, which ran from December 1994 to November 1996. The CEA-INRA team created a library using Yeast Artificial Chromosomes (YACs) as vectors. Here we focus on the source of DNA for this, the ways in which the creators of the libraries evaluated them, and the uses to which they were put. We then describe how and why they created DNA libraries in Bacterial Artificial Chromosomes (BACs), showing how they became community resources that aided the mapping of increasingly larger areas of the pig genome, as well as other forms of genome analysis.

For their library construction, the CEA-INRA workers drew on techniques used by a group led by Daniel Cohen in Paris, who constructed YAC libraries to contain clones of the MHC in H. sapiens: the Human Leucocyte Antigen complex (HLA). They had already collaborated with Cohen, who was a former student of Jean Dausset. Dausset had discovered the HLA, for which he won the Nobel Prize, and his team had been working with the CEA-INRA group since 1968. With Dausset, Cohen was a co-founder of the Centre for the Study of Human Polymorphism (Centre d'Etude du Polymorphisme Humain; CEPH), the institution from which Généthon had arisen in 1990 with funding from the French Muscular Dystrophy Association (Association Française contre les Myopathies). Généthon was founded to systematise their attempts at mapping the loci of different genetic diseases. This new institutional base had enabled Cohen to scale up from the HLA to the whole human genome: as we discussed in previous chapters, Généthon was a leading institution during the early stages of human genomics and produced the first comprehensive linkage and physical maps using high-throughput automated approaches (Chaps. 3 and 4; Kaufmann, 2004). In assisting in the construction of pig libraries to aid in the mapping of the SLA region, Cohen also contributed towards the scaling up to the eventual tackling of the whole pig genome.

To produce their pig library, the CEA-INRA group extracted DNA from peripheral blood lymphocytes, a kind of white blood immune cell, from two boars (males) of the Large White breed. The laboratory had long used Large White pigs in their immunogenetic research, dating back to the 1960s. A hardy and adaptable pink-skinned pig that is amenable to crossbreeding in livestock breeding programmes, it was also an internationally prevalent breed for commercial food production. The very thing that had made it useful for agriculture therefore also made it useful for conducting and applying pig genetics research. For instance, it was used in the crossing experiments of PiGMaP as well as in the production of the CEA-INRA YAC library.

The boars used for this library each had a distinct homozygous SLA haplotype, meaning that the genes making up the haplotype (see note 11) were the same on both homologous copies of the chromosome. The construction of this library rested on decades of prior mapping of the SLA complex to determine these haplotypes: sets of specific combinations of genetic variants. This mapping first used serological methods combined with cytogenetic techniques, and then from the 1980s involved genomic approaches. An early example of the latter was an experiment, published in 1985, in which the CEA-INRA team applied restriction enzymes to pig DNA samples and hybridised the resulting fragments to human cDNA probes acquired from CEPH. They showed that this technique had greater specificity than serological analysis, revealing different haplotypes within ones that serological methods had identified as the same. The new SLA variants were considered to be sub-types of those detected with the preceding mapping techniques. The nesting of the newly determined haplotypes in the older serologically identified ones lent the new approach credibility, as it conformed to previous classifications while partitioning them still further. As a result, they concluded that these genomic methods offered the prospect of “increasing knowledge concerning SLA genetic organization and complexity” (Chardon et al., 1985, p. 170).

Once CEA-INRA had constructed the YAC library, they needed to test—or validate—the new resource. They performed tests to discern the average size (and range of sizes) of the clones, how many YACs were chimeric (pig DNA contaminated with yeast DNA) and the presence or absence of particular sequences. For the latter, they used primers—which trigger the amplification of specific stretches of DNA—of particular known genes. These primers were either produced locally or acquired from ten other laboratories in the wider PiGMaP network. As well as inspecting the accuracy of their library, the CEA-INRA team examined whether there were enough overlapping sequences present in the clones to build them into larger sets of ordered fragments or contigs, and therefore be able to encompass broader areas of the pig genome. In these ways, they were assessing the utility of the YACs themselves (through evaluation of size and proportion of chimeras), whether the library provided sufficient coverage (through examining the presence of known genes) and the extent to which it could be applied to larger-scale physical mapping (Rogel-Gaillard et al., 1997). By the time of this evaluation, in 1997, the team at Jouy-en-Josas managed a library of some 18,000 clones that had been tested using underlying sequence information and DNA fragments from other pig breeds.Footnote 13

During the evaluation, they screened the library to identify clones containing parts of the SLA, using primers for four genes and finding three of these represented. They also screened the library for repeat sequences, as a starting point for being able to characterise the organisation of centromeres—regions that link the two halves of chromosomes and feature abundant repetitive sequence patterns. This task was crucial to their studies of the SLA, which spans the centromere of chromosome 7. Some of the YAC clones flagged in this screening were then sequenced and compared to previously identified centromeric repeat sequences.

While 85% of the sequences screened for were found, this percentage was lower than expected given prior knowledge of the prevalence of these repetitive sequences in the yeast genome (Chap. 2). These divergences, which could point to biases in the coverage of the library, were explained with reference to “the influence of the cloning system on the selection of specific regions”, the screening procedures they adopted and the quality and range of the primers used.Footnote 14 Additionally, they swapped samples from their own library with clones contained in two other pig genome libraries created in Göttingen and Berlin. By using different libraries and cloning systems in conjunction with their own, as well as refining methods and tools, they hoped to advance the coverage and utility of their YAC library (Rogel-Gaillard et al., 1997).

YAC libraries were favoured at this stage because of the large insert size they allowed: clones of up to 1 Mb (a million bases, or nucleotides). Once libraries were needed for fine-grained physical mapping, however, the disadvantages of YACs, such as the risk of chimerism due to contamination by yeast DNA, outweighed the storage capacity advantage. As with human genome mapping (Chap. 4), the CEA-INRA team therefore decided to produce a library stored in BACs, as their lack of chimerism made up for their smaller storage capacity of up to 300 kb (300,000 bases, or nucleotides). A BAC library of S. scrofa was created at Jouy-en-Josas in 1999 with DNA from one of the Large White boars that had been employed before. This time, the DNA was extracted from skin fibroblasts, connective tissue cells that synthesise collagen and other fibres.

Once again, the library was primarily constructed to address the immunogenetic interests of the group, in particular assessing the presence of PERVs in the DNA of pigs. As with the prior YAC library, it was also validated by assessment of its coverage, levels of chimerism and insert sizes. It was screened using known markers to test the extent to which it replicated known genomic features, and contigs were built using overlapping sequences that were identified. So validated, the library could now be screened using primers for known PERV sequences, and the clones thus identified were isolated and analysed. This enabled the researchers at Jouy-en-Josas both to satisfy their more immediate goals—probing the clones with known PERV sequences and identifying their chromosomal position—and to build larger contigs using overlaps between the library’s DNA fragments. In other words, the BAC library, like its YAC predecessor, overflowed its original SLA focus and could be used to map increasingly larger areas of the pig genome (Rogel-Gaillard et al., 1999).

The team at CEA-INRA screened the library on request from researchers across the world, distributing clones for free. They saw this as a key service to their fellow pig genomicists and other researchers. It also helped them to forge new connections in a network of laboratories that they perceived as becoming ever denser and more international.Footnote 15 Screening the library was a laborious process, involving manual rather than automated picking and analysis of clones. In the long term, it would have been far too strenuous and costly for it to continue to be conducted by the same researchers mapping the SLA complex. Consequently, the BAC-YAC Resource Center was formed with technicians and engineers placed in charge of managing and screening the library. The mapping team, therefore, transferred their libraries to the Resource Center, a technical laboratory also belonging to INRA that distributed clones on request to the wider research community.Footnote 16 Its rationale and operation resembled the Resource Centre that the UK Medical Research Council had created in the early 1990s within its Human Genome Mapping Project (HGMP, Chap. 2).

Other DNA libraries were established for concrete research purposes: a YAC library at USDA MARC was constructed and characterised in collaboration with a researcher at the University of Otago in New Zealand (Alexander et al., 1997); a PAC library was created by a German collaboration using an artificial chromosome derived from P1 bacteriophage;Footnote 17 and a BAC library (PigEBAC) was developed at the Roslin Institute.

PigEBAC was created over 1997 and 1998 with funding from the EC and the UK’s BBSRC. It was then further processed and housed at the HGMP Resource Centre, which by the mid-to-late 1990s had been relocated to the same campus near Cambridge where the Sanger Institute was based (Chap. 4). The clones were distributed to the wider community from the Resource Centre. The DNA used in PigEBAC, as with the French YAC library, was acquired from the peripheral blood lymphocytes of a boar. Yet in this case, the boar was the offspring of a cross between a Large White female and a Meishan breed male. This hybrid origin was considered to be appropriate to the stated motivation of producing the library: it was explicitly intended to aid specific genetic research as well as more general genome mapping efforts (Anderson et al., 2000).

Indeed, many of the reference populations used in PiGMaP had been constructed by crossing Large White and Meishan pigs. These two breeds of pig—though the Meishan is typically classified as a sub-breed of the Taihu pig—were geographically distinct in their origins: Yorkshire in the case of the Large White, and the Chinese province of Jiangsu for the Meishan. The two pigs were also quite dissimilar: the Meishan is darker, with wrinkled rather than smooth skin, and is fatter and more reproductively prolific (Fig. 5.2). The latter characteristic made it of interest to Western breeders and allied researchers, who aimed to boost this quality in their local pig populations by crossbreeding with Meishans. For this reason, efforts were made to import these pigs, which resulted in transplantations of small populations to France in 1979, the UK in 1987 and the USA in 1989.

Fig. 5.2
Two photographs. A Meishan sow is on the top and a Large White pig is at the bottom.

Top: A Meishan sow at the Roslin Institute. Bottom: A Large White pig, also at the Roslin Institute. Photographs taken by the Roslin Institute photographer Norrie Russell, and provided to us courtesy of Alan Archibald

The presumed genetic distance of the two breeds was deemed an additional advantage to their use in mapping. Polymorphisms or variability at particular loci in the cross-bred offspring could be used to calculate genetic linkage between pairs of genetic markers: the frequency at which they are jointly inherited. A BAC library based on the same kind of genetic material as used in the prior mapping of markers could help refine and evaluate the existing assignments of loci still further. This diversity of uses shows that although the Roslin library was designed in part for genome mapping, it reflected the trajectories, networks, and evolving goals and interests of the communities that had coalesced around the pig genome (Fig. 5.3). Apart from its use in aiding mapping, it was also intended to be used, more immediately, as a resource for the identification of QTL: chiefly those of value to the pig breeding industry that much of this community was oriented towards.

Fig. 5.3
A group photograph of the delegates at the PiGMaP meeting.

Photograph of an early PiGMaP meeting in Toulouse, December 1991. It features the communities of agriculturally-oriented geneticists and immunogeneticists that coalesced around the mapping of the pig genome. Note Alan Archibald of the Roslin Institute (ninth from right, wearing a Scottish kilt), Max Rothschild of Iowa State University (eleventh from left, with a beard) and Lawrence Schook of the University of Illinois (tall and at the back in the centre of the group). Many other key figures who continued to play a significant role in pig genomics were also in attendance, such as Marcel Vaiman (ninth from left, with light-grey hair and a tie on) and Patrick Chardon (brown hair, just to the left of centre at the back, behind the woman with the brown bag) of CEA-INRA, and Louis Ollivier (thirteenth from left, holding a coat) of the INRA station at Jouy-en-Josas, whom we meet in Chap. 7. Photograph courtesy of Lawrence Schook

DNA libraries such as the ones produced at CEA-INRA and Roslin are shared reference resources. They constitute validated and progressively characterised collections with known and described provenance that can be consulted by the wider community, and for which the potential uses are not narrowly prescribed or channelled by the sources and means of their construction. In this way, they are similar to cell lines (Landecker, 2010), mouse strains (Rader, 2004) and seeds held in banks (Curry, 2017; Curry, 2019; Peres, 2016). In the case of the S. scrofa libraries, their production, circulation and validation from the late-1990s onwards helped to intensify the connections that had begun to be forged in projects to improve maps of the pig genome. This convergence reflected, and was further fostered by, the ongoing mapping and by successive projects that aimed to produce other resources and tools of use for the community and the breeding industry.Footnote 18 The community dimension of pig mapping and its concomitant concern with variation persisted when, at the turn of the millennium, the opportunity arose to characterise the full S. scrofa genome.

2 The Genealogies of the Map and the Sequence

One of the ironies of pig genome sequencing was that, although physical mapping preceded and informed the whole-genome sequencing operation in a manner that faithfully replicated the original strategy of the IHGSC (Chap. 4), the creation of this physical map was a separate project designed to contribute to the community’s more proximate research goals. This physical mapping constituted a continuation, albeit in a more comprehensive way, of the preceding mapping activities of the pig community. As a result, it naturally used the DNA libraries that two of the more prominent mapping institutions—CEA-INRA and the Roslin Institute—had produced and distributed to their peers. Yet in the USA, the landscape of concerted mapping programmes was different. One of them, centred around USDA MARC, adopted some of the organisational and logistic aspects of the large-scale production model that was becoming increasingly pervasive in human genomics.

USDA MARC approached the same specialist team from which the IHGSC had commissioned the production of libraries to map and determine the reference sequence of the human genome: Pieter de Jong’s group at the Roswell Park Cancer Institute. The resulting pig BAC library was named after de Jong’s institutional acronym (RPCI) and given serial number 44, since the team had been involved in the construction of preceding libraries for other organisms. RPCI-44’s creation was paid for by USDA MARC, with several researchers from there involved in its characterisation and analysis. The library was derived “from four crossbred male pigs (breed composition: 37.5% Yorkshire [Large White], 37.5% Landrace, and 25% Meishan)” (Fahrenkrug et al., 2001, p. 472).Footnote 19 It therefore overlapped with the two breeds that provided the DNA for PigEBAC and, like the Roslin library, was intended to be used in both mapping and identifying further genetic markers of agricultural interest. Yet RPCI-44 was at this point the only library whose creators made explicit mention of its possible use for the sequencing of large genomic regions or even the whole genome of S. scrofa (Fahrenkrug et al., 2001).Footnote 20

RPCI-44 became publicly available in 1999, the same year as PigEBAC. By that time, the IHGSC effort was approaching its zenith and its participants were looking to the post-human reference sequence world. A potential new horizon for some of them, especially those more narrowly specialised in conducting large-scale mapping and sequencing, was undertaking the genomes of other organisms. Debates and planning took place at the NIH on opening funding streams to sequence non-human genomes that could provide both comparative insights for data validation and knowledge of interest for medicine, agriculture and industry (Chaps. 6 and 7). During the early-2000s, the communities that had converged around pig mapping attempted to take advantage of these opportunities to position S. scrofa as a candidate to be sequenced next.

Here, the role of a handful of international conferences in allowing the small and increasingly tight-knit pig genomics community to come together and develop new ideas and strategies was key. Three of these were especially significant: the Plant and Animal Genome conference held annually in January in San Diego, California; the biennial International Society of Animal Genetics (ISAG) conference held in a different location every even-year summer; and the quadrennial World Congress on Genetics Applied to Livestock Production, which also moves around venues worldwide. These meetings concerned (and still concern) multiple species, in contrast to the typical conferences of human genomics researchers, and are used as occasions for convening working groups and consortia, for holding meetings to discuss the progress of existing projects, and for debating the prospects for forming new ones.

Many of the researchers in the pig community work on other organisms as well, usually other farm animals. Contact with colleagues pursuing genomics research on other species broadened their horizons and informed their own strategising. For example, pig genomicists learned from developments in cattle genomics, as we shall see in this chapter and Chap. 7. Chicken genome researchers, who like cattle genomicists were pursuing a full reference sequence before the pig, also intersected with the pig genome community, though in a more direct way through the involvement of people like Wageningen University’s Martien Groenen in both efforts. Industry representatives, particularly from the breeding sector, were (and are) regular participants in these conferences, and the informal sharing of new developments in the academic and private settings has been key to shaping research and industry agendas.

The NIH held a workshop in July 2001 on ‘Developing Guidelines for Choosing New Genomic Sequencing Targets’, involving key figures in the IHGSC. Richard Frahm, the National Program Leader of Animal Genetics at the USDA’s CSREES, advocated for sequencing an agriculturally important species, emphasising its potential economic impact in addition to its value for comparative purposes. However, already at this stage, the phylogenetic position (where a species is located in the tree of life) of the candidate species was being emphasised by other participants at the workshop as a key criterion, and this meant that advocates for sequencing some non-agricultural organisms could stake more convincing claims. Parallel to this, the USDA was exploring its own options for genomics. The new Under Secretary for Agriculture responsible for research, Joseph Jen, requested that the US Government’s Office of Science and Technology Policy create an Interagency Working Group on Domestic Animal Genomics.Footnote 21

In 2001, a different group—the Alliance for Animal Genome Research—was established in the USA by agricultural industry bodies and research institutions to advocate for the development of genomics research concerning animals used in agriculture. From the beginning, it was led by Kellye Eversole, and the Alliance used her firm, Eversole Associates, to lobby public officials and politicians. This would prove useful in acquiring funds in the US Government’s budgeting process.Footnote 22

An early success of the Alliance for Animal Genome Research was in getting the US National Academy of Sciences to convene a meeting on domestic (i.e. farm) animal genomics. This was funded to the tune of $100,000 by the two main research arms of the USDA (the Agricultural Research Service—ARS—and CSREES) and took place on 19th February 2002 in Washington DC. In addition to contributions on comparative genomics, many of the discussions centred on which species should be sequenced (Pool & Waddell, 2002).

Spurred by the impending competition for resources for sequencing, the pig genome community marshalled its own efforts. Arising out of the ISAG meeting in August 2002, a permanent animal genome sequencing committee was created and a working group was tasked with writing a ‘White Paper’ to submit to the NIH to interest them in sequencing the pig genome. In October, the White Paper was submitted, and a ‘Scientific Stakeholders meeting’ of the Interagency Working Group (coordinated by Jen) was held, with Rothschild, Gary Rohrer from USDA MARC and Fuller Bazer (a Texas A&M University reproductive biologist) advocating for the pig.Footnote 23

The White Paper was co-authored by Rohrer, Rothschild, Lawrence Schook and Jon Beever of the University of Illinois, together with Richard Gibbs and George Weinstock of the Human Genome Sequencing Center at Baylor College of Medicine. Its arguments for sequencing the pig genome heavily emphasised its potential value for human health through developing the pig as a biomedical model, and in terms of what it could contribute to human genomics.Footnote 24 The former contention drew on long-established work in shaping the pig as an animal model of disease. Schook in particular had worked in this vein, and much early research at CEA-INRA had addressed the potential of S. scrofa for advancing human medicine. This line of work emphasised the biomedical fruits of pig genetics research to uncover genes relevant to human disease and health, some of which had been conducted by the more agriculturally-inclined scientists and institutions.

The ability to genetically modify and clone pigs and the existence of mapping and DNA library resources were adduced in support of their contention that pig genomics was sufficiently mature and ready for whole-genome sequencing. This case was further supported by the ongoing construction of a BAC fingerprint map by a consortium of INRA, the University of Illinois, USDA MARC, the Roslin Institute, the BBSRC and the Sanger Institute. The White Paper also stressed the comparative genomics expertise and knowledge built up by pig genomicists, which could provide a conduit for the translation of pig mapping and sequencing data to human genomics (Rohrer et al., 2002; García-Sancho et al., 2017, pp. 13-14).

These efforts culminated in the formation of the SGSC in 2003, with its inaugural meeting held at INRA Jouy-en-Josas in September, hosted by Schook and Patrick Chardon. In addition to researchers from many of the same mapping institutions that had come together in the 1990s on both sides of the Atlantic, representatives from China, South Korea and Japan were also present and played a significant role in the sequencing effort to come. Reflecting their growing importance in this area, agents of the USDA and the Alliance for Animal Genome Research were also in attendance. The basic principles for the operation of the Consortium, estimates for resource requirements and commitments for contributions towards the eventual project were laid out at this meeting (Schook et al., 2005).

Although at first glance its structure and operation seemed to replicate the IHGSC, the SGSC differed in a number of important respects. While the leading institutions of the IHGSC were large-scale sequencing centres that had been either created de novo or considerably enhanced for the determination of the human reference genome, the SGSC’s membership included many participants in the prior swine genome mapping programmes that existed long before concerted sequencing appeared on the horizon (Table 5.2).

Table 5.2 List of members of the Swine Genome Sequencing Consortium elaborated by James Lowe with data from: https://www.igb.illinois.edu/labs/schook/sgsc/index.php (last accessed 9th December 2022). Key participants in PiGMaP and USDA mapping initiatives are indicated in bold. The selection criterion for mapping participants included in this table is authorship on at least one of the following papers: Archibald et al. (1995), Yerle et al. (1995) and Rohrer et al. (1996). It should be noted that the selection criterion excludes many scientists who were involved in some way in mapping and/or sequencing. For instance, Timothy Smith was key to resequencing the pig genome (Chap. 7) but was not a member of the Consortium, and Patrick Chardon and Tosso Leeb (to pick only two examples of many possible ones) were involved in mapping and genome library creation in the 1990s, but were not authors on the three papers used for identifying mapping participants to include in this table. Compare the continuity exhibited in this table to the discontinuity in human genomics, as illustrated by the differences between the institutions listed in Table 3.1 (Chap. 3) and those listed in Table 4.1 (Chap. 4)

In terms of funding, the organisations that came to support the SGSC were more agriculturally-inclined and less biomedically-oriented than the ones that underwrote the human genome coalition. The contributors to the SGSC included a lower proportion of charities, but there was a stronger presence of public and private funders connected to local economic interests, such as livestock production and breeding. Finally, the SGSC was a unified body dedicated to garnering the funds needed to sequence the whole genome of the pig, to map out the strategy and means to do so, and also to guide and involve itself in that sequencing. It was a concrete entity from the very beginning, something that had been missing from the human genome effort, with the IHGSC largely being a retrospectively established name for a coalition that had emerged during the second half of the 1990s and into the 2000s (Chap. 4).

At its launch, the SGSC believed that a sum in the range of 50 million dollars would be required.Footnote 25 Fortunately for the pig genomicists, this proved not to be the case, as the body that could provide funds on such a scale—the NIH National Human Genome Research Institute—did not prioritise S. scrofa as a sequencing target, focusing instead on cattle as its chosen agriculturally-important species, in part because of backing from the cattle industry and the rapid progress that was being made as a result (Chaps. 6 and 7).Footnote 26

As they were unsuccessful in attracting NIH funds for whole-genome sequencing, they turned their attention to generating other key resources—such as the BAC fingerprint map and Expressed Sequence Tags. They also focused on further exploiting what they already had, using smaller pots of money and collaborating in a similar way to how they had been doing previously. Alongside this, efforts to secure funds from the USDA to sequence the whole pig genome continued. The existing genomic efforts and the support of companies and pork industry boards helped, as did the ongoing connections with USDA officials, the assistance of the local congressman for the University of Illinois at Urbana-Champaign, Representative Timothy Johnson, and the lobbying of the Alliance for Animal Genome Research.

This bore fruit in 2006, when Jen, just before leaving his post as Under Secretary at the USDA, approved $10 million for sequencing the pig genome (formally awarded to the University of Illinois), which was signed off by the then Secretary of Agriculture, Mike Johanns. Increasing automation and refinement of sequencing processes had reduced costs and therefore lowered the barriers for the full genome sequencing of less well-funded species, but the required investment was still substantial. The funds from the USDA were complemented by additional resources provided by Iowa State University and North Carolina State University, as well as industry bodies: the National Pork Board, the Iowa Pork Producers Association and the North Carolina Pork Council. The other institutions involved in the SGSC brought their own resources to bear on the overall programme, once again drawing on grants to perform particular pieces of research and create new resources (Lowe, 2018).Footnote 27 Key to the USDA’s support was the demonstration that the community of pig genomicists was united behind one project and that the initiative had international buy-in. The existing international basis of the community helped, as did the agreement of a separate Sino-Danish collaboration to contribute data from what had threatened to be a rival project.Footnote 28

Schook was appointed co-director of the SGSC alongside Mike Stratton, an expert in cancer genetics at the Sanger Institute. Schook approached pig genomics from the direction of establishing S. scrofa as an animal model of disease, but like many of the other members of the community, his genetic and genomic research led him to work towards multiple domains of application. Like the USDA MARC researchers, he relied on de Jong for the construction of the library that was mainly used in the concerted projects to physically map and then sequence the whole pig genome. In 2000, de Jong’s team had moved from RPCI to the Children's Hospital Oakland Research Institute (CHORI), located on the other coast of the USA within the San Francisco Bay Area. There they led the BACPAC Resources Center, a unit specialised in the mass production and distribution of DNA libraries.Footnote 29

De Jong had approached Schook at the January 2002 Plant and Animal Genome conference, with news that he had received funding to construct a pig DNA library. De Jong suggested that an inbred female pig be used, as this would have two copies of the X chromosome, no Y chromosome (as these are notoriously difficult to deal with), and reduced heterozygosity—the variation between each chromosome in a pair. Schook had access to a reference family that had been constructed at the University of Illinois at Urbana–Champaign for the purposes of mapping QTL.Footnote 30 One of the pigs in that family was more inbred than any other and would therefore be more homozygous and amenable to library production: a Duroc (North American domestic breed) sow born in 2001. Once Schook and his colleague Jon Beever had decided to use her, they sent de Jong 250 millilitres of her blood as requested, and from this the DNA was extracted from white blood cells to produce the CHORI-242 BAC library, as well as a separate fosmid library also used in sequencing.Footnote 31

The sow chosen by Schook and de Jong had a name: TJ Tabasco (Fig. 5.4, top). Oddly, she was named after her offspring, who were clones of her. TJ Tabasco was an acronym of the first letters of the names given to nine of them, deriving from animated characters: Tinker Bell, Jasmine, Tiana, Aurora, Belle, Ariel, Snow White, Cinderella and Olivia (Fig. 5.4, bottom).

Fig. 5.4
Two photographs. A pig is on the top and a litter of piglets is at the bottom.

Top: TJ Tabasco, as preserved on the wall of Lawrence Schook’s office at the University of Illinois at Urbana–Champaign. Bottom: Cloned offspring of TJ Tabasco. Cultures of foetal fibroblast cells and tissues at different developmental stages were derived from these pig clones and used to construct whole-genome shotgun libraries at the Sanger Institute and cDNA libraries. The cDNA libraries were used for the annotation of the reference sequence that was derived chiefly from DNA obtained from their mother. Photographs courtesy of Lawrence Schook

These DNA libraries and other reference resources were used in physical mapping at The Keck Center for Comparative and Functional Genomics of the University of Illinois at Urbana–Champaign, the French national sequencing centre Genoscope, and the Sanger Institute. Both Genoscope and the Sanger Institute were prominent IHGSC members. The Sanger Institute had been the second largest contributor to the draft human reference sequence published in 2001, and Genoscope was the seventh largest contributor (and second most productive European centre). The Sanger Institute was also contracted by the SGSC to determine the genome sequence and undertake some of the work that transformed this string of nucleotides into an assembled and fully annotated reference genome, as we see in Chap. 6.Footnote 32 The physical map and the BAC libraries were used as the basis for the main part of the sequencing of the reference genome at the Sanger Institute. This reference sequence and the physical map were largely derived from the CHORI-242 library that de Jong had produced for Schook, alongside the fosmid library from TJ Tabasco and the three other BAC libraries mentioned, produced by CEA-INRA, the Roslin Institute, and de Jong’s group at USDA MARC’s request (Schook et al., 2005; Humphray et al., 2007; Lowe, 2018). Annotation of the reference sequence made use of the cDNA libraries derived from the cultures of TJ Tabasco’s clones (Fig. 5.4).

In its overall strategy, the SGSP operated in a way that reflected the original plan for the determination of the human reference genome: a distinct physical mapping stage that preceded and informed subsequent hierarchical shotgun sequencing. The sequencing of the DNA from the libraries—again, chiefly CHORI-242—was almost exclusively undertaken by the Sanger Institute, using the factory-style methods refined in human genome sequencing. Yet beyond the mere determination of sequence data, there was substantial input from the rest of the SGSC members. First, the participating laboratories played a crucial role in augmenting the initial stitching together—assembly—of the sequenced clones into larger and more contiguous stretches of sequence across the whole genome. Secondly, a number of Chinese and Danish institutions provided key additional data for the assembled genome using next-generation sequencing (Wernersson et al., 2005).Footnote 33 Thirdly, this and other supplemental sequencing, along with the prior mapping and knowledge of genetic diversity across breeds possessed by the community, informed the analyses comparing the genomes of different breeds of pig that accompanied the 2012 Nature paper describing the reference genome.Footnote 34

There was a distinct role for the Sanger Institute in this project, compared with its roles in S. cerevisiae sequencing and the IHGSC. Contrary to the case of the yeast genome project—where the Sanger Institute was a participant—and the IHGSC effort that the Sanger Institute had helped to coalesce, the Sanger Institute here opened up to and worked with a separate community. The pig genomicists were able to take advantage of the repertoires established at the Sanger Institute to process DNA libraries, sequence DNA, assemble sequences and validate the results. Rachel Ankeny and Sabina Leonelli (2016, p. 19) deploy the term repertoire to mean “the material, social, and epistemic conditions under which individuals are able to join together to perform projects and achieve common goals, in ways that are relatively robust over time despite environmental and other types of changes, and [that] can be transferred to and learnt by other groups interested in similar goals”. In this case, however, other groups interested in different goals beyond the production of a reference genome participated in and helped to direct the repertoires established at the Sanger Institute. The availability of a comprehensive physical map and previous genetics research allowed members of the community to request that certain areas of the genome be given special attention, for example, with targeted sequencing of those areas at higher levels of coverage conducted by the Sanger Institute.

Alongside this, the fragments of determined sequence were joined together—assembled—using the previously elucidated physical map, which indicated the order and relative positions of clones derived from the DNA libraries. As had been the case in the IHGSC effort, the software package PHRAP was used to assemble the pig sequence data generated at the Sanger Institute into—in this case—279 sets of overlapping contiguous sequences known as ‘contigs’. Using methods and pipelines developed for human genome sequencing, workers at the Sanger Institute then applied automated pre-finishing procedures and closed remaining gaps with selective sequencing of BAC clones that were known to span them. The Sanger Institute made checks of coverage, extent and contiguity, and the pig genomics community themselves contributed to checking and correcting the provisional assembly so produced by the Sanger Institute. One way that they did this was to check the orientation and order of scaffolds containing contiguous sequence using a previous physical map.Footnote 35 Greater conformity with this map in a newer assembly (or build) was adduced as evidence that it constituted an improvement (Groenen et al., 2012, Supplementary Information).

Members of the pig genomics community were granted access to the Genome Evaluation Browser (gEVAL) that had been created and was maintained by the Genome Reference Informatics Team (GRIT) at the Sanger Institute. This enabled them to view the assembly, assess its accuracy and communicate their findings to GRIT. This process drew upon the community’s more general knowledge of the structure and nature of the pig genome, detailed knowledge of particular regions, and their facility in exploiting human genome data to inform their assessment of the S. scrofa sequence. In some cases, assembly errors identified by members of the community were used to amend algorithms that were deployed in the process of constructing genome assemblies (Lowe, 2018). We explore these relationships and interactions between the members of the pig genome community and the specialist genomic labour and pipelines at the Sanger Institute in more detail in Chap. 6.

The immediate output of this was an imperfect frozen abstraction, the representative reference genome of the pig, which could be continually annotated further, and eventually replaced by other frozen abstractions: newer versions of the representative reference genome. In terms of variation, the reference genome looked to be just as limited in scope as the human and yeast genomes. Yet, due to the nature of its construction—in particular, the involvement of the existing community of pig genomicists—a significant constituency of users were acutely aware of the variation included in—and excluded from—the reference sequence. Like the human genome, it was substantially based on DNA from a single individual, but for the pig genomics community, it constituted a resource to be built on and linked to others, just as with their previous efforts and outputs. It was not supposed to represent the species in all its variation and diversity. They were aware of this and compensated for it.

3 Reference Genomes and Their Affordances

The sequencing of the pig genome appears to represent a continuation of the tendency towards the concentration of reference sequence producers that we have outlined in the preceding chapters of this book. While in the early-to-mid 1990s, initiatives to complete the yeast genome combined distributed and concentrated approaches to the determination of the reference sequence—represented by the EC and US programmes, respectively (Chap. 2)—towards the end of the twentieth century the IHGSC consolidated an intensive, large-scale production system that was embodied in the genome centres (Chap. 4). Ten years later, in the mid-to-late 2000s, the role of the Sanger Institute in the SGSC implies an even more concentrated production model in which only one genome centre was sufficient to determine the full reference sequence of S. scrofa, as opposed to twenty in the human genome effort.

Yet the involvement of pig geneticists in the production and continuous adaptation of their reference genome (something that we explore further in Chap. 6) compensated for the delegation of part of the production process to the Sanger Institute. As a result, this community perceived the reference genome that it had helped to create differently from the way yeast and human genomicists perceived theirs (Table 5.3).

Table 5.3 A comparison of the yeast, human and pig reference genomes (elaborated by both authors)

The yeast genome was produced as a community resource to be shared by geneticists, biochemists and cell biologists interested in the study of this single-celled organism. Key to its production was agreement about the suitability of using the S288C strain of S. cerevisiae as a model to investigate the workings of genes and cellular processes in eukaryotic organisms such as yeast. S288C had a prior history of use in genetic experimentation and shared genetic linkage and physical maps of it existed when the genome sequencing efforts started in 1989. This eased the convergence of the different communities of yeast geneticists and biologists around the objective of genome sequencing (Szymanski et al., 2019; Vermeulen & Bain, 2014).

The S288C strain thus became the glue that aligned the heterogeneous institutions and differential sequencing practices of the distributed approach promoted by the EC’s programme. The presumed—or intended—invariance of this strain also allowed the meshing of the data produced by the European consortium with that generated by the American and Asian institutions also involved in yeast genome sequencing. S288C was used as a fundamental common object for investigating the biology and genetics of yeast by its communities of researchers. Not surprisingly, once its reference sequence was produced in 1996, the further functional explorations of the S. cerevisiae genome—among them the EUROFAN project—relied on the same strain and were undertaken by largely the same participants as the genome sequencing programmes (Chap. 7).

The case of the human genome is quite different. Here, the strategy propounded by the leaders of the genome centres that were being established during the 1990s departed from the existing chromosome mapping practices of human and medical geneticists. These genome centre leaders belonged to a new generation of researchers who were supported both by rising biomedical funders (the Wellcome Trust) and by influential scientific celebrities such as the Nobel laureate molecular biologist James Watson. The younger breed of researchers and their supporters formulated a vision of producing a map and reference sequence of the entire human genome. These reference resources would represent the species as a whole and, as with the map and sequence of the yeast genome, would enable researchers to address the molecular basis of fundamental life processes, including pathologies and development from embryo to adult. Unlike the yeast efforts, though, the human genome did not correspond to a specific ‘strain’ of H. sapiens; it was a vaguer abstraction than that. Furthermore, the human reference sequence was determined by a less heterogeneous set of institutions and techniques: the IHGSC coalition of genome centres deploying industrial modes of data production and processing, supported by administrative agencies and bioinformatic infrastructures (Chap. 4).

The vision of the genome centre coalition clashed with the approach of human and medical geneticists, for whom only the genome regions that varied between healthy individuals and those suffering from genetic diseases deserved attention. [Footnote 36] Prominent and long-serving geneticists based in hospitals and medical schools, such as Victor McKusick and Walter Bodmer, had dominated early discussions of genomics as a nascent discipline during the mid-to-late 1980s. They had also served as coordinators and advisors in the first concerted initiatives to map and sequence the human genome, which largely adopted the distributed, networked approach of the EC’s yeast genome programme, with the notable exception of the national human genome effort in the USA (Chap. 3). Throughout the 1990s, however, the influence that these reputed medical geneticists had in the new sequencing centres declined, which was reflected in their peripheral involvement in the IHGSC. The differences between the two groups inhibited the involvement of a large part of the human and medical genetics communities in the production of the reference sequence and hindered the subsequent clinical exploitation of this resource. For a substantial fraction of medically-oriented geneticists, the lineages of the IHGSC map and sequence (in terms of the human populations they represented, or their associations with previous maps of healthy and diseased individuals) were blurry and difficult to reconstruct. This led them to prefer to keep using their more chromosome- or region-specific, locally-produced and clinically-targeted resources: maps and sequences extracted from hospital patients that were compared to control data from persons not affected by the conditions being studied.

Pig genomics represents a third, distinct case. The involvement in the SGSC of the communities that had coalesced during projects to systematically map the pig genome meant that the resulting reference sequence would only ever be seen as an arbitrary abstraction from the known or supposed genetic diversity of pigs.

Crucially, unlike in the yeast and human reference sequence efforts, S. scrofa mapping and sequencing was never presumed to be comprehensive or complete. Satisficing according to proximate concrete translational goals was the aim of pig genomicists, as it had been for medical geneticists involved in the HGMP or other early human genome programmes. Due to this, the swine reference sequence was perceived as a dynamic resource that would qualitatively change when updated. This resource would also form the basis for the creation of new reference resources incorporating different variation, as the objectives of its user communities evolved. This continuous iterative adjustment was reflected in practices such as annotation of the reference sequence with data concerning immune response genes or other traits of interest for pig geneticists and the livestock breeding industry and the creation of new datasets cataloguing inter-breed variability. These practices were regarded as part of the ongoing production and use of the reference sequence, rather than as an appendix or postscript to be added once the genome was ‘finished’. As we see in the next chapter, this led the collaboration between the Sanger Institute and the other SGSC members to be extended in order to develop the annotation of the pig sequence so it could be aligned with numerous and changing research priorities (Chap. 6).

Key aspects of the production and nature of the pig genome remain invisible if its story is told only from the perspective of the sequencing work conducted at the Sanger Institute using a small set of libraries largely drawn from one highly-inbred pig. The other consortium members contributed an array of practices, from mapping to assembly and annotation, that were crucial in augmenting and transforming this partial sequence into a usable genome. [Footnote 37] They conducted these in conversation with the Sanger Institute, which opened itself up to the input of this community and changed the way key parts of its specialised data production and processing pipelines worked. This contribution gave pig geneticists a perspective on their reference genome that human geneticists lacked. For human and medical geneticists, their relative absence from the IHGSC effort complicated their ability to link the reference sequence to data they routinely produced about clinically-relevant variation.

These differences suggest that the master narrative of genomics—centred on the production of the human reference sequence but with that presumed to stand in for genomics as a whole—constitutes an artifactual historical representation. When the historical lens goes beyond the mere compilation of reference sequences, genomics research emerges as a broader enterprise that diverges from this canonical trajectory. The history of the production of less well-known—and especially non-human—genomes enables us to better discern connections between the creation of reference sequences and pre-existing practices and communities engaged with genome mapping. In the case of S. scrofa, these practices and communities continued shaping the string of nucleotides produced at the Sanger Institute and its representation in data infrastructures. When it came to the determination of the pig reference sequence, the range of actors and activities did not narrow as radically as they did in the IHGSC effort. The participation of an established community of researchers in the whole-genome sequencing effort with the Sanger Institute—which itself had further evolved as a specialist genome centre—accounts for the more direct and contextualised use of the sequence data by the pig geneticists, especially compared to the more peripheral human and medical geneticists. The next part of the book explores how this community involvement—and its differing extents and nature in yeast, human and pig genomics—shapes annotation and other post-reference genomic practices (Chaps. 6 and 7).

The extreme filtering out of much of the variety of pre-reference genomic research in the IHGSC is shown to be exceptional when lineages and connections between earlier and later genomic research are considered, for instance between the early pig mapping programmes and what has commonly been called “post-genomics” (Richardson & Stevens, 2015). We show that, outside the success narrative of the IHGSC, the advent of a reference sequence does not by itself create the post-genomic world. Furthermore, when other species such as S. scrofa are considered, continuities—from before the reference sequence was determined to after it—can be discerned for communities, practices, resources, knowledge and objectives. Taking these points on board transforms the history of genomics into a dynamic and recursive field rather than a dichotomous, linear and teleological space punctuated by the completion of reference sequences.