2.1 Introduction

Sweetpotato, Ipomoea batatas (L.) Lam. (2n = 6x = 90), is a globally important crop with an annual world production of more than 90 million tons over the past ten years (https://www.fao.org/faostat). It originated in South America (Roullier et al. 2013a; Muñoz-Rodríguez et al. 2018) and has been domesticated for more than 5000 years (Austin 1988). Sweetpotato is a rich source of carbohydrates and many other essential nutrients. Due to its hardiness and good yield, sweetpotato plays a vital role in food security to alleviate famine, especially in Africa and Southeast Asia (Loebenstein 2009). In addition, biofortification with provitamin A-rich orange-fleshed sweetpotato in sub-Saharan Africa has greatly reduced diseases caused by vitamin A deficiency in children under five years old. The World Food Prize in 2016 was awarded to scientists who pioneered this effort, highlighting the significance of sweetpotato in shifting human health outcomes.

Despite the importance of sweetpotato, its improvement has been hindered by its genetically complex polyploid nature and, until recently, by the lack of robust reference genome sequences (Wu et al. 2018). The release of reference genome sequences of major crops and model species in the 2000s revolutionized plant biological research and breeding (The Arabidopsis Genome Initiative 2000; International Rice Genome Sequencing Project, Matsumoto et al. 2005; Schnable et al. 2009). A surge in plant genome sequencing has enabled genomic studies that combine a single reference genome with the sequences of a large panel of individuals, allowing a deeper understanding of the genetic diversity present in crop species, mainly at the single nucleotide polymorphism (SNP) level (Wang et al. 2018; Zhao et al. 2019). Pan-genomes that represent the genetic diversity of an entire species have greatly advanced plant breeding and evolutionary studies (Bayer et al. 2020; Della Coletta et al. 2021). More recently, rapid advances in sequencing technologies have provided access to multiple reference-grade genome assemblies, making it possible to study structural variants (SVs), which contribute substantially to genomic and phenotypic diversity (Alonge et al. 2020). Graph-based pan-genomes that capture the entire genome content of a species, including SVs, have been employed to facilitate the discovery of causal genetic variants of traits and the understanding of the evolution and domestication of crops (Zhou et al. 2022; Tang et al. 2022; Li et al. 2023). High-quality reference genomes and a pan-genome of hexaploid sweetpotato would provide valuable resources for genomics-enabled breeding of this important crop.

2.2 Complexity of the Hexaploid Sweetpotato Genome

Several factors have made de novo assembly of a sweetpotato genome challenging. The cultivated sweetpotato is a hexaploid with an estimated genome size of approximately 3.0 Gb (3.05 pg/2C nucleus) and with 90 chromosomes (six sets of 15 chromosomes). Polysomic inheritance observed in sweetpotato (Mollinari et al. 2020) argues against an allopolyploid origin involving three divergent donors, as seen in the allohexaploid (AABBDD) bread wheat, Triticum aestivum (International Wheat Genome Sequencing Consortium 2018). To date, the origin of the hexaploid sweetpotato remains controversial, with several hypotheses proposed. An initial scenario suggested the involvement of two diploid relatives, I. trifida (2n = 2x = 30) and I. triloba (2n = 2x = 30) (Austin 1988). Another hypothesis invokes autopolyploidization within I. trifida (Roullier et al. 2013b; Muñoz-Rodríguez et al. 2018). Morphological (Austin 1988), cytogenetic (Srisuwan et al. 2006) and molecular (Roullier et al. 2013b) evidence has shown the close relationship between I. trifida and I. batatas. A third hypothesis suggested that I. batatas originated from allopolyploidization between I. trifida and a recently identified, closely related wild autotetraploid species, I. aequatoriensis (2n = 4x = 60) (Muñoz-Rodríguez et al. 2022). Most recently, a fourth hypothesis suggested the contribution of both I. aequatoriensis and wild tetraploid I. batatas (2n = 4x = 60) to the hexaploid sweetpotato genome (Yan et al. 2024). These hypotheses have yet to be tested using a phased, haplotype-resolved, chromosome-scale hexaploid sweetpotato genome.

In addition to the high ploidy level, self-incompatibility and clonal propagation have led to high heterozygosity in the hexaploid sweetpotato genome. High heterozygosity in a genome usually leads to fragmented genome assemblies even for diploid species, especially for assemblies generated from short reads (Pryszcz and Gabaldón 2016). A previous effort attempted to exploit the highly heterozygous nature of the hexaploid sweetpotato genome to generate a haplotype-resolved sweetpotato genome using short reads (Yang et al. 2017). However, due to the limitations of short reads, the resulting 836-Mb consensus genome assembly (larger than the estimated 500-Mb monoploid genome size due to redundancy) was incomplete and contained many misassemblies (Wu et al. 2018), limiting its use as the reference genome for sweetpotato.
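The size figures quoted in this section can be cross-checked with simple arithmetic. The sketch below converts the 3.05 pg/2C estimate into base pairs and divides by the ploidy to obtain a monoploid genome size. It assumes the widely used conversion of 1 pg of DNA to roughly 978 Mb, which is not stated in the text, and the function names are purely illustrative.

```python
# Assumed conversion factor (not given in the text): 1 pg of DNA ~ 978 Mb.
PG_TO_MB = 978.0


def genome_size_mb(pg_per_2c: float) -> float:
    """Total DNA content of a 2C nucleus, expressed in Mb."""
    return pg_per_2c * PG_TO_MB


def monoploid_size_mb(pg_per_2c: float, ploidy: int) -> float:
    """Size of one chromosome set (haplome), assuming the 2C nucleus
    holds `ploidy` chromosome sets of equal size."""
    return genome_size_mb(pg_per_2c) / ploidy


total_mb = genome_size_mb(3.05)        # hexaploid 2C content
mono_mb = monoploid_size_mb(3.05, 6)   # one of the six chromosome sets

print(f"2C content: {total_mb:.1f} Mb")
print(f"Monoploid size: {mono_mb:.1f} Mb")
```

Under this assumption the 2C content corresponds to about 2,983 Mb and the monoploid genome to about 497 Mb, in line with the approximate 3.0-Gb and 500-Mb values given above.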

2.3 Efforts in Sequencing the Genomes of Diploid Relatives, I. trifida and I. triloba

The smaller genome size and simpler chromosome composition of diploid relatives of polyploid crops offer substantial advantages for genomic research and breeding. Self-compatible diploid species, such as woodland strawberry (Fragaria vesca) (Shulaev et al. 2011) and diploid cotton (Gossypium raimondii) (Wang et al. 2012), as well as Triticum urartu (Ling et al. 2018) and Aegilops tauschii (Luo et al. 2017) (diploid progenitors of the A and D subgenomes, respectively, of hexaploid wheat), were first sequenced to serve as reference genomes of the polyploid crops. In more complicated cases involving self-incompatible diploid progenitors, homozygous doubled monoploid lines were developed for constructing high-quality reference genomes of polyploid crops such as potato (The Potato Genome Sequencing Consortium 2011) and modern rose (Saint-Oyant et al. 2018).

While genome sequences of I. trifida lines were first released in 2015, they were fragmented and incomplete (Hirakawa et al. 2015). The first reference-grade genome sequences of diploid wild relatives of cultivated hexaploid sweetpotato were reported in Wu et al. (2018), and are available at the Sweetpotato Genomics Resource (http://sweetpotato.uga.edu). I. trifida NCNSP0306 and I. triloba NCNSP0323 were selected for reference genome sequencing. I. triloba NCNSP0323 is a highly homozygous inbred line derived from PI 618966, originally collected in Michoacan, Mexico. I. trifida NCNSP0306 is a self-compatible inbred line with a relatively low level of heterozygosity (0.24%), derived from PI 540724, originally collected in Magdalena, Colombia. The genomes were sequenced mainly using Illumina short-read technology. Illumina short reads from paired-end genomic libraries and from mate-pair libraries with insert sizes ranging from 2 to 40 kb were generated and used for genome assembly, resulting in assembled scaffolds with N50 lengths of 1.2 and 6.9 Mb for I. trifida and I. triloba, respectively. PacBio long reads were also generated and used for gap-filling, and de novo-assembled BioNano maps were used to refine the assemblies. The final assemblies were 462.0 Mb and 457.8 Mb for I. trifida and I. triloba, respectively, and each was anchored and oriented onto the 15 chromosomes using a high-density genetic map. A total of 32,301 and 31,423 protein-coding genes were predicted in the I. trifida and I. triloba genomes, respectively. More than 88% of the genes were assigned putative functions by comparing their protein sequences to various public protein and domain databases.
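For readers less familiar with the scaffold N50 statistic reported above, the following minimal sketch shows one common way to compute it: scaffolds are sorted from longest to shortest, and the N50 is the length of the scaffold at which the running total first reaches half of the assembled bases. The scaffold lengths in the example are invented for illustration and are not taken from the I. trifida or I. triloba assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    The N50 is the length L such that scaffolds of length >= L
    together contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one scaffold length")


# Hypothetical scaffold lengths (in Mb), for illustration only.
toy_scaffolds = [8, 6, 4, 3, 2, 1]
toy_n50 = n50(toy_scaffolds)
print(toy_n50)
```

In this toy example the total is 24 Mb, and the two longest scaffolds (8 + 6 = 14 Mb) already cover half of it, so the N50 is 6 Mb.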

The high-quality I. trifida and I. triloba reference genomes enabled comparative genomic analyses to uncover Ipomoea lineage-specific expanded gene families that function in storage root development and defense (Wu et al. 2018). Syntenic blocks within the I. trifida or I. triloba genome revealed a whole-genome triplication (WGT) event specific to the Ipomoea lineage that occurred around 46 million years ago (Wu et al. 2018). Functional enrichment analysis of genes induced by stress treatments suggested that the WGT event played a critical role in adaptation. Key genes associated with storage root development were also found to be contributed by this WGT event. The I. trifida reference was used to study the expression of hexaploid sweetpotato orthologs that are involved in abiotic stress tolerance (Lau et al. 2018; Kitavi et al. 2023) and disease resistance (Bednarek et al. 2021), providing candidates for breeding and research. Furthermore, these diploid references facilitated the identification of genes and alleles associated with high β-carotene content in cultivated hexaploid sweetpotato (Wu et al. 2018) and of the negative association between β-carotene and starch contents due to physical linkage (Gemenet et al. 2020). These findings demonstrated the robustness of the diploid I. trifida and I. triloba reference genomes in supporting studies that investigate the genetic and molecular bases of sweetpotato agronomic traits.

More than 92% of genomic reads generated from the hexaploid sweetpotato could be aligned to the diploid references (Wu et al. 2018), permitting the construction of an ultra-dense phased genetic map to characterize the inheritance system in hexaploid sweetpotato (Mollinari et al. 2020). Sweetpotato accessions from the Mwanga diversity panel (MDP) have been extensively used as parents in African sweetpotato breeding programs (David et al. 2018). The MDP contains a total of 16 accessions, including breeding lines, cultivars, and landraces sourced from different areas across Uganda, as well as a few selected introduction lines. The I. trifida and I. triloba genomes were used as references to call SNPs from genome resequencing data of the 16 MDP sweetpotato lines. The resulting high-density, robust polymorphic marker set confirmed the highly heterozygous nature of hexaploid sweetpotato, revealed the population structure of these key breeding lines, and improved the delineation of their phylogenetic relationships. The mapping depth of hexaploid sweetpotato genomic reads on the diploid genomes also revealed chromosomal aberrations in hexaploid lines. NASPOT 5, a member of the MDP, was identified as a double monosomic line with 88 instead of 90 chromosomes, which was confirmed using cytogenetics (Wu et al. 2018). These results showcase the usefulness of diploid reference genomes in characterizing the genetic and genomic features of hexaploid sweetpotatoes.
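The idea of inferring chromosome copy number from read mapping depth can be illustrated with a small sketch. It is not the procedure used in the cited study; it simply assumes that the median depth across the 15 reference chromosomes corresponds to six copies and scales each chromosome accordingly. The depth values, and the choice of which two chromosomes show reduced depth, are hypothetical.

```python
from statistics import median


def estimate_copies(depth_by_chrom, ploidy=6):
    """Estimate copy number per reference chromosome from mean read depth.

    Assumption: the median depth across chromosomes represents the
    expected `ploidy` copies, so copies = ploidy * depth / median depth.
    """
    baseline = median(depth_by_chrom.values())
    return {
        chrom: round(ploidy * depth / baseline)
        for chrom, depth in depth_by_chrom.items()
    }


# Hypothetical mean depths for 15 reference chromosomes.
depths = {f"Chr{i:02d}": 60.0 for i in range(1, 16)}
depths["Chr03"] = 50.0  # made-up value: about 5/6 of the baseline
depths["Chr11"] = 50.2  # made-up value: about 5/6 of the baseline

copies = estimate_copies(depths)
total_chromosomes = sum(copies.values())
print(copies["Chr03"], copies["Chr11"], total_chromosomes)
```

With these toy values, two chromosomes are estimated at five copies and the remaining thirteen at six, giving 88 chromosomes in total, the count expected for a double monosomic hexaploid. Real data would require normalization for mappability and GC content, which this sketch ignores.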

2.4 Development of Chromosome-Scale Haplotype-Resolved Hexaploid Sweetpotato Reference Genomes

The diploid reference genomes can serve as a fundamental tool for modern breeding of polyploid crops. However, the homozygous diploid genomes cannot fully represent the gene and allele diversity in polyploid genomes, and polyploid references are still required for detecting and studying polyploid-specific loci and genes that control agronomically important traits. Continual improvement of genome assembly algorithms and of sequencing technologies that produce longer and more accurate reads, combined with phase information from genetic maps, has allowed the de novo assembly of phased heterozygous diploid genomes such as those of apple and potato (Sun et al. 2020; Zhou et al. 2020). However, haplotype phasing and construction of chromosome-level assemblies of highly heterozygous autopolyploid genomes remain challenging due to the presence of more than two haplotypes with highly similar sequences. Chromosome conformation capture (Hi-C) sequencing data have been applied to resolve haplotypes and achieve chromosome-scale autopolyploid assemblies (Zhang et al. 2018, 2019; Healey et al. 2024). Yet, there is no straightforward standard method for assembling complex autopolyploid genomes. For example, in addition to PacBio high-accuracy long reads (HiFi) and Hi-C data, single-cell sequencing of diploid gametes was used to separate reads derived from different haplotypes in order to resolve collapsed contigs in a tetraploid potato genome assembly and reconstruct the sequences of all four haplotypes (Sun et al. 2022).

To facilitate sweetpotato breeding and research, hexaploid cultivated sweetpotato accessions, including Beauregard, Tanzania and New Kawogo, were selected for genome sequencing. De novo assembly of these hexaploid genomes was performed using the haplotype-resolved assembler hifiasm (Cheng et al. 2021) with PacBio HiFi reads. Phased genetic maps, constructed using full-sib populations derived from a cross between Tanzania and Beauregard (Mollinari et al. 2020) and a cross between Beauregard and New Kawogo (unpublished), together with Hi-C data, were then used for haplotype phasing and pseudochromosome construction. The resulting assembly of Beauregard has a total size of 2.70 Gb, with 2.54 Gb of sequence (94.1% of the total assembly) anchored into 90 haplotype-resolved chromosomes. For Tanzania and New Kawogo, 2.78 Gb and 2.77 Gb were assembled, with 2.53 Gb and 2.49 Gb anchored to the 90 chromosomes, respectively. Genetic maps, Hi-C contact signals along the pseudochromosomes, and HiFi read alignments were further used to manually curate misassemblies. Assessment of the Beauregard, Tanzania and New Kawogo genome assemblies, including benchmarking universal single-copy orthologs (BUSCO) analysis (Simão et al. 2015), k-mer based evaluation of completeness, and inference of collapsed sequences in the assemblies based on read coverage, indicates that these phased assemblies are highly complete.
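The anchoring rates implied by the assembly sizes above can be derived directly from the reported figures. The short sketch below does so for the three accessions; because it works from sizes rounded to 0.01 Gb, the resulting percentages are approximate, and the variable names are illustrative only.

```python
# Assembly sizes in Gb as reported in the text: (total, anchored).
assemblies = {
    "Beauregard": (2.70, 2.54),
    "Tanzania": (2.78, 2.53),
    "New Kawogo": (2.77, 2.49),
}

# Percentage of each assembly anchored to the 90 chromosomes.
anchored_pct = {
    name: 100.0 * anchored / total
    for name, (total, anchored) in assemblies.items()
}

for name, pct in anchored_pct.items():
    print(f"{name}: {pct:.1f}% anchored")
```

From these rounded values, roughly 94.1%, 91.0% and 89.9% of the Beauregard, Tanzania and New Kawogo assemblies, respectively, are anchored to chromosomes.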

Genome annotation has been performed for the Beauregard, Tanzania and New Kawogo assemblies using a custom annotation pipeline for sweetpotato, resulting in the prediction of 225,111, 234,617 and 230,838 high-confidence gene models in these three assemblies, respectively, with numbers of genes in a haplome (one set of chromosomes) ranging from 34,682 to 38,505. More than 99% of the conserved plant genes were found complete in the predicted genes, indicating high completeness and quality of the genome annotation. The three genome assemblies and predicted genes have been made available to public and private research communities as a resource to facilitate sweetpotato biological discovery and breeding through Sweetpotato Genomics Resource (http://sweetpotato.uga.edu/). These hexaploid sweetpotato reference genomes provide more precise sequences for genome editing than diploid references. The phased genome assembly representing all six haplotypes in cultivated sweetpotato is crucial for studying the role of dosage and allele-specific gene expression in conferring traits of interest. In addition, through comparative genomic analyses, the chromosome-scale haplotype-resolved assemblies will be central to revealing the origin of hexaploid sweetpotato.

2.5 Towards the Development of a Sweetpotato Pan-Genome

Extensive structural variations have been found within and among tetraploid potato genomes (Hoopes et al. 2022). The same is expected for hexaploid sweetpotato genomes. Indeed, sweetpotato exhibits both intra- and inter-genome structural variations. For example, our comparative analysis has detected large inversions among different haplotypes within the same sweetpotato genome. Furthermore, aneuploidy has been discovered in cultivated sweetpotato, demonstrating an extreme form of presence/absence variation (PAV) between accessions (Wu et al. 2018).

To assist genetic analyses of complex biological traits in sweetpotato and capture causal genetic variants, a pan-genome comprising variations from different hexaploid sweetpotato accessions exhibiting contrasting traits and covering various breeding interests will be developed. A recently reported tetraploid potato pan-genome focused on the genic portion, which described the variation in gene content between haplomes as well as between accessions that result in a highly complex transcriptome in tetraploid potato (Hoopes et al. 2022). We also hypothesize that similar to potato (Hoopes et al. 2022), the clonally propagated hexaploid sweetpotato will be littered with dysfunctional and deleterious alleles, due to the inability to purge non-functional alleles via meiosis. In addition to gene PAVs, genomic variations outside genes have been found to explain a substantial proportion of phenotypic variations, and several agronomically important traits have been found to be controlled by gene regulation (Rodgers-Melnick et al. 2016; Alonge et al. 2020). Access to multiple reference-grade phased hexaploid sweetpotato genomes will provide the opportunity to construct a graph-based pan-genome to integrate intra- and inter-genomic variations from multiple and diverse accessions under a single genome coordinate system. This sweetpotato pan-genome graph will capture small and large genomic variants in both genic and intergenic regions and permit determination of their contribution to phenotypic variation. It will also serve as the foundation for biological discovery and improvement of sweetpotato breeding.
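As a simplified illustration of how gene presence/absence variation is commonly summarized in a gene-based pan-genome, the sketch below classifies genes as core, dispensable or private according to the number of accessions that carry them. It is a toy example, not the planned sweetpotato pipeline: the gene names and presence calls are invented, and a graph-based pan-genome would represent far more than gene-level presence.

```python
def classify_pav(presence, accessions):
    """Classify genes by presence across a panel of accessions.

    presence:   dict mapping gene name -> set of accessions carrying it
    accessions: list of all accessions in the panel
    """
    panel = set(accessions)
    classes = {}
    for gene, carriers in presence.items():
        n_present = len(set(carriers) & panel)
        if n_present == len(panel):
            classes[gene] = "core"         # found in every accession
        elif n_present == 1:
            classes[gene] = "private"      # found in a single accession
        elif n_present == 0:
            classes[gene] = "absent"       # not found in this panel
        else:
            classes[gene] = "dispensable"  # found in some, not all
    return classes


accessions = ["Beauregard", "Tanzania", "New Kawogo"]

# Hypothetical presence calls, for illustration only.
presence = {
    "geneA": {"Beauregard", "Tanzania", "New Kawogo"},
    "geneB": {"Beauregard", "Tanzania"},
    "geneC": {"New Kawogo"},
}

classes = classify_pav(presence, accessions)
print(classes)
```

In a phased hexaploid assembly the same logic could in principle be applied at the haplome level, treating each of the six chromosome sets as a separate column, which is one way to describe variation in gene content between haplomes as well as between accessions.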

2.6 Summary

In summary, the genome sequences of the diploid wild relatives I. trifida and I. triloba that we developed are useful in improving our knowledge of the mode of inheritance in hexaploid sweetpotato and the genetic basis of important agronomic traits. Recent advances in sequencing technologies and computational algorithms have allowed us to assemble the complex genomes of hexaploid sweetpotatoes. The chromosome-scale haplotype-resolved hexaploid genomes present an ample opportunity for facilitating sweetpotato breeding and for gaining a deeper understanding of the genetic and molecular mechanisms underlying complex traits. In the future, a pan-genome that integrates all genomic variations into a single graph will be developed. Bioinformatic tools that can effectively use the pan-genome to associate variants with phenotypes will help realize the potential of these genomic resources in sweetpotato breeding.