11.1 Introduction

Sweetpotato (Ipomoea batatas) is a widely consumed, vegetatively propagated crop with particular importance as a subsistence crop in Africa. However, due to its hexaploidy (2n = 90), low flowering, and outcrossing phenotypes, it is a challenging crop for breeding (Campos and Caligari 2017). During the last decade, large breeding projects were implemented, including the SASHA, GT4SP, and SWEETGAINS projects (Girard et al. 2017; Wu et al. 2018), intending to modernize sweetpotato breeding, address issues such as pathogen susceptibility, increase nutritional value, and work toward the application of new methods, such as genomic selection.

A characteristic of these newer, genome-based breeding methods is their data-intensive nature. This characteristic prompted the projects to focus on enhancing the available infrastructure to handle the large-scale phenotyping and genotyping datasets that are required. Such infrastructure includes databases such as Sweetpotatobase (https://sweetpotatobase.org/), which was established based on the Breedbase software (https://breedbase.org/) (Morales et al. 2022). It implements a digital ecosystem that can facilitate the work of a breeding program aligned with these projects’ goals. In addition, these projects addressed several other big obstacles, including the availability of the full genome sequence of sweetpotato (Chap. 2), the missing tools for analyzing polyploid genomes (Chaps. 4, 5) (Campos and Caligari 2017; Mollinari et al. 2020), and the creation and refinement of an appropriate ontology to describe the traits of sweetpotato (see https://cropontology.org/term/CO_331:ROOT).

In this chapter, we describe the available database infrastructure for sweetpotato. We also discuss how breeding programs can get the most out of the available system by following established best practices for data management and workflows in the complex reality of a breeding program. Further information is available elsewhere; for example, a good overview of the sweetpotato breeding community, best practices, new traits and methods, links to other resources, and many other aspects is available from the Sweetpotato Knowledge Portal at https://www.sweetpotatoknowledge.org/.

Conventional breeding is a complex process, and the complexity is scaled upwards as genome-related data is integrated into breeding decisions. Importantly, some prerequisites must be satisfied before one can even start thinking about such an endeavor. The most important prerequisites are: (1) a high-quality genome reference sequence that will facilitate the genotyping process; (2) algorithms that can be used to predict traits from genome data; and (3) ontologies that describe the traits in the crop at hand, with well-defined data formats such as scales or categories. In a hexaploid system such as sweetpotato, (1) and (2) are much harder to achieve than for diploids because of the difficulty of assigning sequence reads to individual chromosomes. Using special techniques, the sequences of the diploid progenitors (Wu et al. 2018) and of the hexaploid sweetpotato have only recently been completed (see Chap. 2). A complete database and website about the sequenced sweetpotato genomes is available at http://sweetpotato.uga.edu/, featuring interactive genome browsers and utilities such as BLAST searches. Polyploid genomes are also much harder to genotype; while there are only three possible states in a diploid marker, the number of states increases rapidly with increasing ploidy levels. New methods had to be developed to identify the genotypes of hexaploid sweetpotato reliably, as described in Chap. 4 (Campos and Caligari 2017; Mollinari et al. 2020). For the trait descriptors, work in collaboration with the crop ontology project (https://cropontology.org/) (Shrestha et al. 2012) yielded a standardized ontology (https://cropontology.org/term/CO_331:ROOT), which over the years has been improved and extended to adapt to the changing needs of the breeders, as new methods, such as near-infrared spectroscopy (NIRS), were introduced for many traits. New areas of interest, such as cooking quality, have been added to the phenotyping repertoire.
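To make the growth in marker states concrete, the following Python sketch counts the distinguishable genotype classes at a single marker. It assumes that genotypes are unordered sets of allele copies (so only allele dosage matters), labels the two alleles of a biallelic marker A and B purely for illustration, and uses function names of our own choosing; it is not part of any genotyping tool described in this book.

```python
from math import comb


def dosage_classes(ploidy: int) -> list[str]:
    """List the genotype classes of a biallelic marker, ordered by B-allele dosage."""
    # A class is defined by how many of the `ploidy` allele copies are B (0..ploidy).
    return ["A" * (ploidy - dose) + "B" * dose for dose in range(ploidy + 1)]


def genotype_count(ploidy: int, n_alleles: int = 2) -> int:
    """Count unordered genotypes: multisets of size `ploidy` drawn from `n_alleles` alleles."""
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


for ploidy in (2, 6):
    print(f"ploidy {ploidy}, 2 alleles: {', '.join(dosage_classes(ploidy))}")
    print(f"ploidy {ploidy}, 4 alleles: {genotype_count(ploidy, 4)} possible genotypes")
```

Under these assumptions, a biallelic marker has three classes in a diploid and seven in a hexaploid, and a marker with four alleles has 10 possible genotypes in a diploid but 84 in a hexaploid.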

The fourth prerequisite, which is the focus of this chapter, is a strong commitment to strict data management principles and the establishment of the necessary data management infrastructure. It comprises a breeding database and data collection tools that seamlessly integrate into a digital breeding ecosystem, where data is collected and processed digitally, and breeding decisions can be made right in the database based on the latest data.

11.2 Digital Breeding Ecosystem

Breeding decisions are only as good as the data they are based upon. As selection strategies grow in sophistication, it is all the more important to ensure that the data underlying their complex models is inter-related, accessible, and free of preventable error. The key to accomplishing these goals is to produce, transfer, and store breeding data entirely within a digital ecosystem (Fig. 11.1).

Fig. 11.1 Roundtripping within a digital breeding ecosystem

At its surface, a digital ecosystem is composed of tools for data collection in the field or lab and for data analysis in the office. Underlying the tools is a programming interface that enables them to communicate automatically with the core of the ecosystem, a central database that stores, combines, and disperses the data. Passing data along from tool to database to tool for different steps of the breeding process without ever leaving the digital ecosystem is known as roundtripping. Roundtripping involves an initial setup cost to populate the database with breeding material to be tested and with traits to be measured. Then the activities of each breeding step can be tracked digitally: first by creating database objects corresponding to the physical material, then by identifying the physical material with barcoded labels, and finally by collecting measurements on the physical material using the standardized traits. Once up and running, the roundtripping process pays dividends, as data is collected and related to other data with less effort, greater speed, and higher fidelity.

11.2.1 Data Collection

The data collection steps in the roundtripping process involve intricate workflows that need to be well-defined based on standard operating procedures (SOPs) and can be very different for different crops. Data needs to be collected not only on the physical attributes of breeding lines (phenotypes), but also on their genetic makeup (genotypes) and relationships (pedigrees). The following digital tools are used in sweetpotato programs for collecting each of these data types.

11.2.1.1 Phenotypic Data

There are many different approaches to collecting data in the field. Some examples include paper notebooks, digital spreadsheets, and custom breeding software on hand-held devices. Each has advantages and disadvantages, but we have found that the Field Book app provides the ideal combination of features for most situations.

Field Book is an open-source Android app that can be used to collect data on plants in breeding and research applications (Rife and Poland 2014). Its data entry is efficient, eliminates the need for data transcription, and reduces the risk of errors. It runs on a wide range of inexpensive hardware, allowing consumer-grade technology to be used in environments where cost and inflexibility have been limiting factors.

An important consideration with Field Book is how one identifies the plot or plant that is being phenotyped in the field. Field Book provides a search interface to find the desired entry by plot or plant name, and it is possible to move automatically to the next entry in the field using on-screen buttons. However, we have found that barcoding the field is currently the best solution for routine identification of plots or plants. Field Book fully supports barcode scanning using the tablet camera, including QR codes, and Breedbase can generate PDFs for printing the labels. In addition, Field Book supports taking images in the field, which are automatically associated with the corresponding plot (Fig. 11.2).

Fig. 11.2 Field Book app’s collect screen with collected sweetpotato data

For entering trait values, input screens adapt to the format of a trait; for example, for a categorical trait with categories 1, 2, 3, 4, and 5, five corresponding buttons will be displayed; it is impossible to enter an illegal value. A number pad is shown for numerical values, and for dates, a date selector, and so forth.
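The idea of format-aware entry can be sketched in a few lines of Python. This is only an illustration of the principle, not Field Book's actual implementation: the `TraitFormat` class, the three format names, and the example traits are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TraitFormat:
    """Hypothetical trait definition: a name, a format, and any allowed categories."""
    name: str
    kind: str  # "categorical", "numeric", or "date"
    categories: tuple[str, ...] = ()


def is_valid(trait: TraitFormat, raw: str) -> bool:
    """Check whether a raw entry is legal for the trait's declared format."""
    if trait.kind == "categorical":
        return raw in trait.categories
    if trait.kind == "numeric":
        try:
            float(raw)
        except ValueError:
            return False
        return True
    if trait.kind == "date":
        try:
            date.fromisoformat(raw)  # expects ISO dates such as 2023-05-17
        except ValueError:
            return False
        return True
    return False  # unknown formats are rejected rather than guessed


score = TraitFormat("example 1-5 score", "categorical", ("1", "2", "3", "4", "5"))
weight = TraitFormat("example root weight", "numeric")
harvest = TraitFormat("example harvest date", "date")

print(is_valid(score, "3"), is_valid(score, "6"))
print(is_valid(weight, "12.5"), is_valid(weight, "heavy"))
print(is_valid(harvest, "2023-05-17"), is_valid(harvest, "17/05/2023"))
```

Constraining entry at the point of collection, rather than cleaning values afterwards, is what keeps illegal values out of the database in the first place.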

Roundtripping in the context of Field Book means that the trial layouts and traits have to be created or be available in the database and have to be exported to Field Book. Field Book essentially attaches data to the pre-existing data object identifiers such that the collected data can easily be uploaded back to the database.

Until recently, these data transfers from the database to Field Book and back involved file transfers, with the complexity of connecting the tablet to a computer, creating the necessary files, and finding and transferring the files. Now, a BrAPI-based API (Selby et al. 2019) can be used, allowing Field Book to automatically import, export, and sync data from a BrAPI-enabled database over any internet connection with just the click of a button. This greatly reduces the amount of work required and makes the process of collecting data in parallel with multiple devices much easier.
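For readers who want to see what such a BrAPI exchange looks like underneath the button, the Python sketch below builds a BrAPI v2 request for the observations of one trial and parses a response in the standard BrAPI envelope (a `metadata` block with pagination and a `result` block with the data). The base URL is an assumption that should be confirmed for the database in use, the identifiers are placeholders, and the sample response is hand-written rather than real data. No network request is made; a real client would also need to handle authentication and multiple pages.

```python
import json
from urllib.parse import urlencode

# Assumed base URL; confirm the BrAPI path for the database you are using.
BASE_URL = "https://sweetpotatobase.org/brapi/v2"


def observations_url(study_db_id: str, page: int = 0, page_size: int = 100) -> str:
    """Build a BrAPI v2 GET /observations URL filtered to one study (trial)."""
    query = urlencode({"studyDbId": study_db_id, "page": page, "pageSize": page_size})
    return f"{BASE_URL}/observations?{query}"


def parse_observations(body: str) -> tuple[list[dict], int]:
    """Extract the observation records and the total page count from a response body."""
    payload = json.loads(body)
    records = payload["result"]["data"]
    total_pages = payload["metadata"]["pagination"]["totalPages"]
    return records, total_pages


# Hand-written sample in the BrAPI response envelope; identifiers are placeholders.
SAMPLE_BODY = json.dumps({
    "metadata": {"pagination": {"currentPage": 0, "pageSize": 100,
                                "totalCount": 1, "totalPages": 1}},
    "result": {"data": [{"observationUnitDbId": "plot-0001",
                         "observationVariableDbId": "variable-0001",
                         "value": "3"}]},
})

print(observations_url("trial-0001"))
records, total_pages = parse_observations(SAMPLE_BODY)
print(records[0]["observationUnitDbId"], records[0]["value"], total_pages)
```

Because every record carries the database identifiers of the plot and the trait variable, collected values can be matched back to existing database objects without manual bookkeeping.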

Unfortunately, Field Book is not a one-size-fits-all solution for phenotypic data types. Some data must be collected by incompatible hardware or in unsuitable workflows or conditions. In these cases, the roundtripping process can still be maintained by ensuring the necessary identifiers are propagated through the process using barcoded labels, or by making use of the standardized BrAPI calls.

A common example of one of these alternative workflows is the collection of near-infrared spectroscopy (NIRS) spectra. NIRS data collection requires the use of specialized benchtop or hand-held hardware, and outputs large quantities of spectral data. Regardless of the technology used, roundtripping can be maintained by propagating the unique identifiers of samples through the whole process so they are included in the output data. This ensures that the resulting spectra can be easily loaded and linked to the proper objects within Sweetpotatobase.

11.2.1.2 Genotypic Data

Genotypic data is a complex data type, and tissue sample collection, processing, and analysis can be challenging. Collection of samples in the field is often an error-prone manual process; a breeder must go to the field, collect and label individual samples, and lay those samples out on a plate that can be submitted for sequencing, taking particular care to prevent mix-ups so that when data is returned from a genotyping facility, it can easily be connected back to the original samples (Fig. 11.3).

Fig. 11.3 Coordinate data collection screen

In the digital ecosystem, the Coordinate app provides support for these activities. Coordinate is a flexible, open-source Android app that is used to collect and organize samples. Coordinate functions by defining templates and then collecting data in grids created from those templates. The plot or plant barcode can be scanned with the app to identify the samples collected, and a unique identifier is generated for each sample. The samples are arrayed in 96-well plates, and the corresponding data is uploaded to the database, which can, for some providers, submit the data automatically to genotyping facilities.
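The bookkeeping behind a plate layout can be sketched as follows. This Python example is an illustration under stated assumptions, not Coordinate's implementation: wells are filled in column-major order, the sample identifier is formed from a hypothetical plate name plus the well position, and the plot identifiers are placeholders.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]   # rows A to H
COLUMNS = range(1, 13)       # columns 1 to 12


def plate_wells() -> list[str]:
    """Return the 96 well positions in column-major order: A01, B01, ..., H12."""
    return [f"{row}{col:02d}" for col in COLUMNS for row in ROWS]


def layout_plate(plate_name: str, plot_ids: list[str]) -> dict[str, dict[str, str]]:
    """Assign scanned plot identifiers to wells and derive one sample identifier per well."""
    wells = plate_wells()
    if len(plot_ids) > len(wells):
        raise ValueError("a 96-well plate holds at most 96 samples")
    return {
        well: {"sample_id": f"{plate_name}_{well}", "source_plot": plot_id}
        for well, plot_id in zip(wells, plot_ids)
    }


layout = layout_plate("example-plate-01", ["plot-0001", "plot-0002", "plot-0003"])
for well, sample in layout.items():
    print(well, sample["sample_id"], sample["source_plot"])
```

Keeping the source plot attached to each well is what allows genotypes returned by a facility to be connected back to the original material.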

In turn, Sweetpotatobase can serve as a repository for various genetic markers that are output by the genotyping process. Supported types include Single Nucleotide Polymorphism (SNP) markers, Kompetitive Allele-Specific PCR (KASP) markers, and Simple Sequence Repeats (SSR) (Morales et al. 2020). By associating these genotypes back to the source tissue samples generated by Breedbase and tracked in Coordinate, they are automatically integrated with the broader set of phenotypic and relationship data. This contextualized genotypic data is instrumental in unraveling the genetic diversity of sweetpotato. It enables sweetpotato breeding programs to deploy a powerful set of tools, from purely quantitative genomic selection (GS) models to genome-wide association studies (GWAS) that identify specific inheritance patterns and marker-assisted selection (MAS) strategies that exploit them.

11.2.1.3 Crossing Data

Crossing is frequently the least digitized part of the breeding process due to the highly complex and variable biology of different crops. Handwritten paper tags are common due to their flexibility but limit the speed and accuracy with which pollination data can be linked back to the rest of the digital ecosystem. Where digital solutions exist, they are often tightly customized to the crop (btract, banana) or a specific breeding program (pollination-toolbox, NCSU sweetpotato). An exception to this is the Intercross app, a beta implementation of a general solution to cross data collection (Rife et al. 2022). Intercross digitizes the process through a streamlined interface to manage potential parents, make crosses, and track additional cross data. Parents are identified by scanning barcoded labels, while newly created pollinations are tracked using labels produced on demand by a Bluetooth-connected mobile printer. These barcoded labels ensure fast and reliable tracking of identities and are important to maintain data connections in downstream processes such as seed inventory and seedling trials.

Intercross does not yet implement all of its imports and exports via BrAPI, but the necessary file formats are interoperable with Sweetpotatobase. The Sweetpotatobase crossing experiment page can generate both a parent file to import the necessary male and female ids, as well as a wishlist file used to set pollination targets for specific parental combinations.

Exported crossing data can be updated in a standard spreadsheet format accepted by Sweetpotatobase. When uploaded, this data automatically populates details including unique pollination event ids, timestamps, operator name, and optional fields like flower number. As the seed generated is collected, planted, and selected, any downstream data collected can be linked back to the original cross using the unique id encoded in the barcoded label. Digital tracking of data such as seed numbers and progeny names allows Sweetpotatobase to automatically calculate pollination success rates and selection percentages, and to populate the pedigrees of newly selected material (Fig. 11.4).

Fig. 11.4 Details of a sweetpotato pollination recorded in the Intercross app
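As an illustration of the kind of summary that digitally tracked crossing data makes possible, the Python sketch below computes a success rate per parental combination. The definition used here (the fraction of pollination events that yielded at least one seed) is our assumption and may differ from the calculation performed in Sweetpotatobase; the field names and records are hypothetical and do not represent the upload format.

```python
from collections import defaultdict

# Hypothetical pollination events; field names and values are illustrative only.
events = [
    {"cross_id": "cross-0001", "female": "parent-A", "male": "parent-B", "seeds": 4},
    {"cross_id": "cross-0002", "female": "parent-A", "male": "parent-B", "seeds": 0},
    {"cross_id": "cross-0003", "female": "parent-A", "male": "parent-C", "seeds": 2},
    {"cross_id": "cross-0004", "female": "parent-A", "male": "parent-B", "seeds": 0},
]


def success_rates(events: list[dict]) -> dict[tuple[str, str], float]:
    """Fraction of pollination events per (female, male) pair that yielded any seed."""
    attempts: dict[tuple[str, str], int] = defaultdict(int)
    successes: dict[tuple[str, str], int] = defaultdict(int)
    for event in events:
        combination = (event["female"], event["male"])
        attempts[combination] += 1
        if event["seeds"] > 0:
            successes[combination] += 1
    return {combo: successes[combo] / attempts[combo] for combo in attempts}


for (female, male), rate in success_rates(events).items():
    print(f"{female} x {male}: {rate:.2f}")
```

The calculation is trivial once every event carries a unique id and its parents; the hard part, which the barcoded workflow addresses, is capturing those links reliably at the time of pollination.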

11.2.2 Data Transfer

As with other technologies, breeding data transfer has come a long way, from transcription, to manual transfer of digital files, to Breeding API (BrAPI) calls (Selby et al. 2019). Although the BrAPI standard is relatively new and still under active development, all the digital tools we recommend here have adopted it as a way to automate data transfer between freestanding software tools and into flexible analysis environments like RStudio. These automated transfers are the glue that keeps the ecosystem together and can be the difference between a nearly frictionless roundtripping experience and a tedious file-transfer process that limits roundtripping or breaks it entirely. While the digital ecosystem is flexible in that the available tools are optional and may change over time, it is crucial that any additions to the system speak the shared language.

11.2.3 Data Management

11.2.3.1 The Search Wizard

In Sweetpotatobase, as in other Breedbase databases, the Search Wizard is a major query tool, empowering users with an intuitive and efficient approach to data exploration and retrieval. In the Search Wizard, the data in the database is viewed as a multi-dimensional cube, in which the dimensions represent attributes of the trial data, such as location, year, breeding program, accessions, traits, and so forth. The Search Wizard allows the specification of data items along these dimensions to create intersections in the data cube, efficiently generating highly precise datasets that can be stored in the database and used for downstream analyses (Fig. 11.5).

Fig. 11.5 Slice of sweetpotato data in the Search Wizard
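The data cube view can be illustrated with a small Python sketch. It is a conceptual model only and says nothing about how the Search Wizard is implemented: the records, dimension names, and the `intersect` function are hypothetical.

```python
# Hypothetical observation records; every value is invented for illustration.
records = [
    {"location": "site-1", "year": 2021, "accession": "acc-1", "trait": "trait-x", "value": 21.0},
    {"location": "site-1", "year": 2022, "accession": "acc-1", "trait": "trait-x", "value": 23.5},
    {"location": "site-2", "year": 2021, "accession": "acc-1", "trait": "trait-x", "value": 19.0},
    {"location": "site-1", "year": 2021, "accession": "acc-2", "trait": "trait-y", "value": 3.0},
    {"location": "site-1", "year": 2021, "accession": "acc-2", "trait": "trait-x", "value": 25.0},
]


def intersect(records: list[dict], **selection: set) -> list[dict]:
    """Keep the records whose value on every selected dimension is in the chosen set."""
    return [
        record for record in records
        if all(record[dimension] in allowed for dimension, allowed in selection.items())
    ]


# Select one location, one year, and one trait; unselected dimensions stay unrestricted.
dataset = intersect(records, location={"site-1"}, year={2021}, trait={"trait-x"})
for record in dataset:
    print(record["accession"], record["value"])
```

Each keyword argument narrows one dimension of the cube, and the resulting dataset is the intersection of all the selections, here the two accessions measured for the chosen trait at the chosen site and year.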

Analysis tools that support data input by wizard datasets include heritability assessments, stability evaluations, genomic selection methodologies, and mixed models.

A significant advantage lies in the tool's capacity to combine these different dimensions and store each parameter individually, both within lists and datasets. Lists prepared through the Search Wizard are automatically validated, whereas manually created lists require additional validation.

Lists and datasets play an indispensable role in many functionalities within the Breedbase platform and expedite activities from trial creation to seamlessly integrating accessions and facilitating the use of various other tools.

11.2.3.2 The Pedigree Viewer

Pedigrees assist breeders in identifying crossbreeding combinations that result in desirable traits, such as high yields and resistance to biotic and abiotic stress. Breeders can observe allele inheritance through pedigrees and understand their influence on trait expression and breeding strategies. Breeders can generate large pedigree structures using this genetic and phenotypic data. This helps breeders and researchers to make informed decisions about which plant lines to use in subsequent crossings.

Sweetpotatobase contains over 150,000 Ipomoea batatas accessions, encompassing a wealth of genetic diversity sourced across multiple breeding programs. Accessions can be linked to pedigrees, which are generated when accessions are crossed using the crossing tool or can be uploaded into the system via a specific table format (Fig. 11.6).

Fig. 11.6 Sweetpotato accession’s pedigree, visualized using the Pedigree Viewer

Within Sweetpotatobase, pedigrees can be displayed and interacted with using the Pedigree Viewer. By default, the viewer shows an accession's male and female parents, identified using color-coded lines. Purple arrows indicate nodes that can be expanded to display more relationships, either additional progeny, siblings, or parents, depending on the location and direction of the arrow. Access to pedigrees in this way adds depth and lineage context to the stored germplasm and provides valuable insights into genetic relationships and traits.
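Expanding a pedigree node by node corresponds to walking a simple parent lookup table. The Python sketch below shows the idea with a hypothetical pedigree; the accession names and the `ancestors` function are ours, and the table is not the Sweetpotatobase upload format.

```python
# Hypothetical pedigree: accession -> (female parent, male parent); None = unknown.
PEDIGREE = {
    "acc-5": ("acc-3", "acc-4"),
    "acc-3": ("acc-1", "acc-2"),
    "acc-4": ("acc-2", None),
}


def ancestors(accession: str, generations: int) -> dict[int, set[str]]:
    """Collect known ancestors by generation (1 = parents, 2 = grandparents, ...)."""
    result: dict[int, set[str]] = {}
    current = {accession}
    for generation in range(1, generations + 1):
        parents = {
            parent
            for individual in current
            for parent in PEDIGREE.get(individual, (None, None))
            if parent is not None
        }
        if not parents:
            break  # no recorded parents further back
        result[generation] = parents
        current = parents
    return result


for generation, names in ancestors("acc-5", 3).items():
    print(generation, sorted(names))
```

In this example acc-2 is a grandparent through both the female and the male line, the kind of shared ancestry that is easy to overlook without a pedigree view.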

11.2.3.3 The Trait Ontology

The trait ontology is an important aspect of any breeding program, as it defines which traits can be measured and stored in the database, and it also standardizes how a trait is measured to ensure comparability of the results, potentially across breeding programs worldwide (Shrestha et al. 2012). The sweetpotato ontology in the database consists of 327 variables, which refer to traits with methods and scales (Fig. 11.7).

Fig. 11.7 Sweetpotato ontology, as seen at cropontology.org

The Breedbase system allows post-composing of trait variables using other orthogonal ontologies, which can specify sampling conditions, temporal components, or sample treatments. These can be formed on the fly, while traits in the ontology itself can only be changed through a request to the ontology team to ensure standardization and avoid duplication of terms. Post-composing increases the flexibility of the annotations without sacrificing the standardization of the terms. In Sweetpotatobase, 105 post-composed traits have been created by users, showing the platform's adaptability and usability in storing diverse data and enhancing its accessibility for researchers and breeders alike. The sweetpotato ontology has been continually developed in collaboration with the crop ontology project (https://cropontology.org/).
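The principle of post-composing can be sketched as a data structure: a curated trait variable is left untouched and is combined with qualifier terms from other ontologies, and an existing combination is reused rather than duplicated. The Python example below is a conceptual illustration only. The class names, the registry, the label format, and all term identifiers are placeholders of our own and do not reflect how Breedbase stores or names post-composed traits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """An ontology term; the identifiers used below are placeholders, not real IDs."""
    term_id: str
    label: str


@dataclass(frozen=True)
class PostComposedTrait:
    """A curated trait variable plus qualifier terms drawn from other ontologies."""
    variable: Term
    qualifiers: tuple[Term, ...]

    @property
    def label(self) -> str:
        # Hypothetical display format; the separator is an arbitrary choice.
        return " | ".join([self.variable.label, *(q.label for q in self.qualifiers)])


def compose(variable: Term, qualifiers: list[Term], registry: dict) -> PostComposedTrait:
    """Return the existing composed trait for this combination, or register a new one."""
    key = (variable.term_id, tuple(sorted(q.term_id for q in qualifiers)))
    if key not in registry:
        registry[key] = PostComposedTrait(variable, tuple(qualifiers))
    return registry[key]


registry: dict = {}
base = Term("CO_331:XXXXXXX", "example trait variable")
week = Term("TIME:XXXXXXX", "example time point")
tissue = Term("TISSUE:XXXXXXX", "example sample tissue")

first = compose(base, [week, tissue], registry)
second = compose(base, [tissue, week], registry)  # same combination, different order
print(first.label)
print(first is second, len(registry))
```

The curated ontology stays fixed while combinations are created on demand, which is the balance between flexibility and standardization described above.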

11.2.3.4 Data Analysis

Analytical tools within Sweetpotatobase offer a comprehensive suite for sweetpotato breeders, empowering users with nuanced insights into their crop's performance. Available phenotypic analyses include analysis of variance (ANOVA) to assess the significance of comparisons of sweetpotato traits across different varieties or treatments. Heritability and stability analysis help quantify the extent to which observed traits are genetically inherited and how consistent they remain across environments. This insight aids breeders in selecting superior sweetpotato cultivars with desirable and stable characteristics. The GWAS tool (Morales et al. 2020) runs GWAS analyses right in the database, using datasets selected by the Wizard tool. Population structure analysis elucidates genetic diversity and relatedness among sweetpotato accessions, informing breeding strategies for maximizing heterosis and minimizing inbreeding depression. Mixed models complement this by accounting for genetic relatedness in statistical analyses, offering more accurate trait predictions and breeding value estimations. A built-in genomic selection workflow called solGS (Tecle et al. 2014) can be used to run the entire genomic selection pipeline in the Breedbase system, including the generation of models and prediction of GEBVs from genotypic information.
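As a point of reference for what a heritability estimate expresses, the Python sketch below implements the textbook broad-sense heritability on an entry-mean basis for a single trial, H² = Vg / (Vg + Ve / r), where Vg is the genotypic variance, Ve the residual variance, and r the number of replicates. This is a standard formula offered for illustration; we do not claim that it is the exact calculation performed by the Sweetpotatobase tool, and the variance components used in the example are made up.

```python
def entry_mean_heritability(var_genotype: float, var_error: float, n_reps: int) -> float:
    """Broad-sense heritability on an entry-mean basis: Vg / (Vg + Ve / r)."""
    if n_reps < 1:
        raise ValueError("n_reps must be at least 1")
    denominator = var_genotype + var_error / n_reps
    if denominator <= 0:
        raise ValueError("variance components must give a positive phenotypic variance")
    return var_genotype / denominator


# Made-up variance components, evaluated for increasing numbers of replicates.
for reps in (1, 2, 4):
    print(reps, round(entry_mean_heritability(4.0, 8.0, reps), 3))
```

With these example values the estimate rises from 0.333 with one replicate to 0.5 with two and 0.667 with four, which shows how replication reduces the contribution of residual variance to entry means.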

The built-in support for BrAPI in Breedbase means that BrAPI-enabled analysis tools can be used with Sweetpotatobase. One such tool is MrBean, which offers a user-friendly platform tailored for molecular marker data analysis, particularly suited for plant breeding and genetic studies. Its intuitive interface streamlines tasks like marker-trait association studies, population structure analysis, and genomic prediction (Aparicio et al. 2024).

Using MrBean and other BrAPI-enabled tools, researchers can complement the built-in functions of Breedbase to conduct comprehensive phenotypic and genetic analyses. Its functionality extends to diversity analysis, facilitating the exploration of genetic variation within and among populations.

11.3 Conclusion

Sweetpotatobase provides a digital ecosystem to assist breeders worldwide with managing sweetpotato breeding data and making more informed, data-based breeding decisions. Sweetpotatobase integrates many different data types, such as phenotypic and genotypic information, and associated analysis tools that can aid in improving selections. When many breeding programs contribute data to one system, collaboration improves and larger models with more predictive power can be built, increasing genetic gain. The database seamlessly interfaces with digital data acquisition tools from the PhenoApps project, including the Field Book, Coordinate, and Intercross apps. It supports the BrAPI standard for interfacing with tools and analyses. A comprehensive trait vocabulary (ontology) has been developed, is constantly updated, and is used routinely in more than a dozen registered breeding programs. Databases such as Sweetpotatobase are part of the necessary foundation for any modern breeding program.