3.1 Introduction

Sweetpotato (Ipomoea batatas (L.) Lam) is widely cultivated and consumed worldwide, with a global production of 86.4 million tons in 2022 (FAOSTAT). China is the leading producer, accounting for 54% of the world’s total production. Sweetpotato is also a popular crop in neighboring countries such as Japan and South Korea, and research on the breeding and cultivation of sweetpotato has been actively conducted in the region. To promote exchange among scientists studying sweetpotato in East Asia, the Trilateral Research Association of Sweetpotato (TRAS) was established in 2004 by sweetpotato scientists from China, South Korea, and Japan. The inaugural symposium took place in Mokpo, South Korea, and subsequent symposiums have been held approximately every two years, rotating among the three countries. Nine international symposiums have been held to date, with the most recent one taking place in September 2022 in Xuzhou, China.

At the 5th International Sweetpotato Symposium, held on Jeju Island, South Korea, in 2012, the three countries agreed to construct a reference genome for sweetpotato. After subcommittee meetings in Tokyo and Jeju in 2013, the TRAS genome sequencing consortium was formally launched in Beijing in 2014. The consortium consists of six organizations: the Jiangsu Xuzhou Sweetpotato Research Center, CAAS (China); China Agricultural University (China); the Rural Development Administration (Korea); the Korea Research Institute of Bioscience and Biotechnology (Korea); the Institute of Sweetpotato Research, National Agriculture and Food Research Organization (Japan); and the Kazusa DNA Research Institute (Japan). The composition and roles of the consortium are shown in Table 3.1.

Table 3.1 Organizational overview and role of the TRAS genome sequencing consortium

Sweetpotato is a hexaploid species with 90 chromosomes (2n = 6x = 90) and a large genome size of 4.8–5.3 pg/2C nucleus (Ozias-Akins and Jarret 1994). When de novo assembly is performed in polyploid species, it is common to advance the analysis by referencing the genome of closely related diploid species (Kyriakidou et al. 2018). Sweetpotato is the only species in the genus Ipomoea that is cultivated as a crop; among the genus’s wild species, thirteen are thought to be closely related to sweetpotato (Austin 1988). Although no definitive conclusions have been reached as to the evolutionary origin and genome structure of sweetpotato, I. trifida (H.B.K.) Don. has been considered a likely diploid progenitor of sweetpotato (Nishiyama 1971).

In 2012, when genome sequence analysis was first proposed as an appropriate project for TRAS, the genome sequences of diploid species of Ipomoea had not yet been published. Therefore, the consortium decided to conduct genome analysis, not only for the hexaploid sweetpotato but also for related diploid species. For genome assembly and transcriptome analysis in I. batatas, we used the Chinese variety ‘Xushu 18’, which is a leading variety in China, bred at Xuzhou Institute of Agricultural Sciences in Jiangsu Xuhuai District and released in 1977.

3.2 Genome Assembly of I. trifida ‘Mx23Hm’

Whole-genome sequencing and assembly were first performed for two I. trifida lines, a selfed line, ‘Mx23Hm’, and a heterozygous line, ‘0431–1’ (Hirakawa et al. 2015). The whole-genome de novo assembly was conducted using Illumina paired-end (PE) and mate-pair (MP) libraries. The assembled genome was initially employed for genetic analysis, such as SNP detection, serving as the first reference genome for I. trifida. However, because the assembly was based solely on short reads, the scaffolds were fragmented and lacked chromosome-scale connectivity.

To obtain chromosome-scale scaffold sequences, the RDA group obtained PacBio subreads with a total length of 64.26 Gb from ‘Mx23Hm’ and conducted whole-genome de novo assembly. The subreads were assembled using the SMRTMAKE assembly pipeline (Chin et al. 2013), generating 2881 contigs with a total length of 495.7 Mb. The 2881 contigs were polished with Illumina reads, and chromosome-scale scaffolding was then performed by HiRise (Putnam et al. 2016) with 471 M Hi-C reads. The 15 chromosome-scale scaffolds and the chr0 sequences were designated as Itr_r2.2 (Table 3.2). The total length of Itr_r2.2 was 502.2 Mb, comprising 460.77 Mb for the 15 pseudomolecules and 41.47 Mb for the chr0 scaffold. Itr_r2.2 covered 97.4% of the ‘Mx23Hm’ genome when the genome size was taken to be 515.8 Mb (Hirakawa et al. 2015), while the 15 chromosome-scale scaffolds alone covered 89.3%. The ratio of complete BUSCOs was 98.5%, including 93.4% single-copy genes and 5.1% duplicated genes (Simão et al. 2015). The ratios of fragmented and missing BUSCOs were 0.8% and 0.7%, respectively. A total of 34,386 gene sequences were predicted on the Itr_r2.2 genome based on ab initio and evidence-based gene models.
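The coverage figures above follow directly from the reported lengths. As a minimal sketch (the variable and function names are illustrative, not part of any pipeline used by the consortium), the arithmetic can be reproduced as follows:

```python
# Lengths quoted in the text for Itr_r2.2, all in Mb.
PSEUDOMOLECULES_MB = 460.77  # 15 chromosome-scale scaffolds
CHR0_MB = 41.47              # sequences not placed on chromosomes (chr0)
GENOME_SIZE_MB = 515.8       # estimated 'Mx23Hm' genome size (Hirakawa et al. 2015)


def coverage_percent(assembled_mb: float, genome_mb: float) -> float:
    """Return the assembled length as a percentage of the estimated genome size."""
    return 100.0 * assembled_mb / genome_mb


total_mb = PSEUDOMOLECULES_MB + CHR0_MB
total_cov = coverage_percent(total_mb, GENOME_SIZE_MB)
chrom_cov = coverage_percent(PSEUDOMOLECULES_MB, GENOME_SIZE_MB)

print(f"Itr_r2.2 total: {total_mb:.2f} Mb ({total_cov:.1f}% of the genome)")
print(f"Chromosome-scale scaffolds only: {chrom_cov:.1f}% of the genome")
```

The same check applies to the BUSCO figures: the single-copy and duplicated fractions (93.4% + 5.1%) sum to the complete fraction (98.5%), and complete, fragmented, and missing fractions sum to 100%.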

Table 3.2 Statistics on the assembled I. trifida ‘Mx23Hm’ (Itr_r2.2)

3.3 Genome Assembly of I. batatas ‘Xushu 18’

When the TRAS genome sequencing consortium started the whole-genome de novo assembly of I. batatas ‘Xushu 18’ in 2012, long-read sequencing was expensive, and its utilization in genome assembly was not realistic. Consequently, our approach involved the use of Illumina short reads for sequencing, and the PE and MP sequences shown in Table 3.3 were obtained.

Table 3.3 Sequenced Illumina short reads of ‘Xushu 18’

The genome size of ‘Xushu 18’ was estimated as 2.6 Gb on the basis of the distribution of distinct k-mers (K = 17) identified by Jellyfish (Marçais and Kingsford 2011) with a total length of 215.7 PE read. Genome size estimates have varied across studies. For example, Ozias-Akins and Jarret (1994) reported the nuclear DNA content of sweetpotato as 4.8–5.3 pg/2C, while Srisuwan et al. (2019) reported it as 3.1–3.3 pg/2C. Given that the haploid genome size of the diploid I. trifida is around 500 Mb, it is reasonable to assume that the genome size of the hexaploid sweetpotato is around 3 Gb/2C (6 × 500 Mb). The Jellyfish estimate (2.6 Gb) was therefore considered an underestimate, owing to the influence of similar sequences shared across homoeologous chromosomes.
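The comparison of these estimates can be made concrete with a short sketch. It assumes the widely used conversion of 1 pg of DNA to about 978 Mb, and it uses the simplest form of k-mer-based estimation (total k-mers divided by the depth of the main peak). The histogram values are a toy example, and the function names and the `min_depth` cutoff are hypothetical; this is not necessarily the exact procedure applied to the Jellyfish output in this study.

```python
MB_PER_PG = 978.0  # assumed conversion: 1 pg of DNA is about 978 Mb


def pg_to_gb(picograms: float) -> float:
    """Convert a DNA amount in pg to Gb."""
    return picograms * MB_PER_PG / 1000.0


def kmer_genome_size(histogram: dict, min_depth: int = 3) -> float:
    """Estimate genome size (in bases) as total k-mers / peak depth.

    `histogram` maps k-mer depth -> number of distinct k-mers at that depth.
    Depths below `min_depth` are ignored as likely sequencing errors.
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    peak_depth = max(usable, key=usable.get)          # depth of the main peak
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


# Flow-cytometry values quoted in the text (pg/2C), converted to Gb.
for pg in (3.1, 3.3, 4.8, 5.3):
    print(f"{pg} pg/2C is about {pg_to_gb(pg):.2f} Gb")

# Expectation from the diploid relative: six copies of a ~500 Mb genome.
expected_gb = 6 * 500 / 1000.0
print(f"Expected hexaploid size: {expected_gb:.1f} Gb")

# Toy histogram (not real data): error k-mers at depth 1-2, main peak at 20.
toy_histogram = {1: 1000, 2: 50, 19: 100, 20: 300, 21: 100}
toy_size = kmer_genome_size(toy_histogram)
print(f"Toy k-mer estimate: {toy_size:.0f} bases")
```

Under this conversion, 3.1–3.3 pg/2C corresponds to roughly 3.0–3.2 Gb, which agrees with the 3 Gb expectation, whereas 4.8–5.3 pg/2C corresponds to roughly 4.7–5.2 Gb.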

De novo whole-genome assembly was performed with Illumina short reads using three assembly tools. However, the N50 length ranged from 347 to 1598 bp, indicating significant fragmentation (Table 3.4).
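For readers unfamiliar with the N50 statistic used here to judge fragmentation, the following is a minimal sketch of its standard definition: the length L such that sequences of length L or longer contain at least half of the assembled bases. The contig lengths are a toy example, not values from Table 3.4.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy example: total = 300, so N50 is reached at the second-longest contig.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

An N50 of a few hundred to a few thousand bases, as obtained here, means that half of the assembly sits in sequences shorter than a typical gene, which is why the short-read-only assemblies were not usable as a reference.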

Table 3.4 Results of de novo assembly of ‘Xushu 18’ with Illumina short reads

Two approaches were then used for haploid-resolved genome sequence assembly: DenovoMAGIC (NRGene, Israel) for Illumina and 10X Genomics reads, and Falcon-unzip (PacBio) for PacBio reads (Yoon et al. 2022). The total lengths of the Falcon-unzip primary contigs and haplotigs were 1.8 Gb (N50 = 325.5 Kb) and 336 Mb (N50 = 44.9 Kb), respectively (Table 3.5), while the total and N50 lengths assembled by DenovoMAGIC were 2.4 Gb and 2150 Kb, respectively. The shorter total lengths of the PacBio and DenovoMAGIC assemblies are considered to be due to the integration of sequences across homoeologous chromosomes. Consequently, hybrid assembly with the Illumina scaffolds and PacBio reads was then performed by NRGene, generating a total of 110,708 sequences with a total length of 2.91 Gb. This total length was close to the estimated genome size of sweetpotato, suggesting that hybrid assembly using Illumina DenovoMAGIC scaffolds and PacBio reads is effective for haploid-resolved assembly in autopolyploid species.

Table 3.5 Status of whole genome assembly in I. batatas ‘Xushu 18’

To create chromosome-scale scaffolds, an S1 linkage map was constructed using the variants identified on the I. trifida genome. The dd-RAD-Seq sequences of 437 S1 individuals were mapped onto 520 scaffolds comprising the ‘Mx23Hm’ Hi-C scaffolds. A total of 534 scaffolds were aligned on the linkage map as 90 chromosome-level scaffolds. Together with 109,896 unplaced scaffolds, the 90 chromosome-level scaffolds were designated as IBA_r1.0. The total length of IBA_r1.0 was 2907.4 Mb, consisting of 2168.4 Mb at the chromosome level and 738.9 Mb of unplaced scaffolds (Table 3.6). The ratio of complete BUSCOs in IBA_r1.0 was 99.5%, including 1.7% single-copy genes and 97.8% duplicated genes. A total of 175,633 gene sequences were predicted for the IBA_r1.0 genome based on ab initio and evidence-based gene models.

Table 3.6 Statistics on the assembled I. batatas ‘Xushu 18’ (IBA_r1.0) genome sequences and genes

The genome sequences of the 90 chromosome-level scaffolds were then compared with I. trifida genome sequences (Itr_r2.2). There was clear macro-synteny between I. batatas and the diploid species (Fig. 3.1).

Fig. 3.1 Comparison between I. batatas ‘Xushu 18’ genome (IBA_r1.0) and I. trifida ‘Mx23Hm’ genome (Itr_r2.2) sequences

3.4 Application of Assembled Genome Sequences for Crop Improvement and Future Prospects

The Itr_r2.2 and IBA_r1.0 genome sequences are available on Plant GARDEN (Itr_r2.2: https://plantgarden.jp/ja/list/t35884/genome/t35884.G002, IBA_r1.0: https://plantgarden.jp/ja/list/t4120/genome/t4120.G001) and have already been used for genomic and genetic analysis. For example, Suematsu et al. (2022) identified a major QTL for root thickness in I. trifida using a QTL-Seq approach. A BC1F1 population derived from crosses between ‘Mx23Hm’ and ‘0431–1’ was used for the analysis, and the major QTL for root thickness (qRT1) was located on chr06 of the Itr_r2.2 genome. Haque et al. (2023) reported a genetic analysis of starch content (SC) using 204 F1 progenies derived from a bi-parental cross between the I. batatas cultivars ‘Konaishin’ and ‘Akemurasaki’. Base variants were identified on the Itr_r2.2 genome, and significant QTL for SC were identified on Chr15. Based on the results of qRT-PCR analysis, one of the candidate genes located in the QTL regions, IbGBSSI, was considered to be involved in starch accumulation in sweetpotato roots.

For the expression analysis of starch, anthocyanin, and carotenoid genes in I. batatas tissues, RNA-Seq analysis was performed on RNAs extracted from leaves at 42 days after transplantation (DAT), stems at 42 DAT, and roots at 90 DAT (Yoon et al. 2022). Fragments per kilobase of transcript per million mapped reads (FPKM) values were calculated for the genes predicted on the I. batatas genome, IBA_r1.0. Significantly high expression was observed in roots for starch pathway genes. Conversely, robust expression of anthocyanin-associated genes was observed in the leaves.
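The FPKM measure normalizes a gene's fragment count by both transcript length and sequencing depth. The sketch below illustrates the standard formula only; the gene names, counts, and lengths are hypothetical and are not taken from the IBA_r1.0 annotation or from the expression data of Yoon et al. (2022).

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical library of 10 million mapped fragments.
TOTAL_MAPPED = 10_000_000

# Hypothetical genes: name -> (fragment count, transcript length in bp).
example_genes = {
    "gene_A": (500, 2000),
    "gene_B": (50, 1000),
}

values = {
    name: fpkm(count, length, TOTAL_MAPPED)
    for name, (count, length) in example_genes.items()
}
for name, value in values.items():
    print(f"{name}: FPKM = {value:.1f}")
```

Because the value is scaled by transcript length and library size, genes of different lengths can be compared within a sample; comparisons between tissues, such as roots and leaves, additionally depend on how the libraries were normalized.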

Sweetpotatoes are utilized for a diverse range of purposes, including food and processed products such as starch, distilled spirits and natural colorants. Given the various applications, breeding goals for sweetpotato are diverse, necessitating genetic analyses across a multitude of traits. According to the comprehensive review by Yan et al. (2022), previous genetic analyses have predominantly focused on yield, root development, quality, and biotic resistance. Until recently, genetic analyses were predominantly conducted using the genome sequences of diploid species like I. trifida. However, the recent completion of the hexaploid genome sequence now paves the way for more advanced analyses. While the sweetpotato genome structure has been suggested to be either complete auto-hexaploid or auto-allo-hexaploid, elucidating the extent of genome sequence variation among homoeologous chromosomes and the conservation of gene sequences on these chromosomes is a task for the future. This advancement is anticipated to enhance our understanding of how genes governing target traits are regulated across homologous chromosomes, enabling more precise breeding strategies.

In the era of climate change, when food production faces escalating challenges, sweetpotato, with its relatively stable yields even in marginal lands, is expected to attract greater attention as a source of nutrition. The sweetpotato genomes, including those created by the TRAS consortium, are poised to serve as a crucial information resource for accelerating global sweetpotato breeding efforts. As we anticipate difficulties with food production amid changing climates, leveraging the genomic information of sweetpotato will become crucial for developing resilient crops and ensuring global food security.