
4.1 Introduction

The use of molecular markers has had a significant impact on our understanding of the genetic basis of phenotypic expression and consequently enables applications in trait discovery (i.e., identification and functional validation of marker–trait associations) and crop improvement. Landmarks in the application of molecular markers have been achieved in the past half-century and have continued to evolve (Schlotterer 2004). Applications in sweetpotato that have benefitted from the use of molecular markers include genetic diversity analysis, DNA fingerprinting, genomic prediction, genetic and physical mapping, QTL mapping, and association mapping. As in most crops, the earliest molecular marker technologies were deployed with limited success and resolution. Nevertheless, some of the inferences drawn from those initial efforts still hold up following the use of better technologies. In this chapter, we highlight historical perspectives on the incremental improvements of molecular markers and their applications in sweetpotato. Besides the limitations inherent in specific molecular marker platforms, we also review how the biology and genomic landscape of hexaploid sweetpotato impact the accuracy and utility of each molecular marker system.

Sweetpotato has benefited from molecular marker technologies since the advent of the first molecular markers, i.e., protein-based Isozymes and DNA-based Restriction Fragment Length Polymorphism (RFLP) markers. Like other species where these first-generation molecular markers were used, the use of polymorphisms in DNA-based markers was rapidly favored over the protein-based Isozyme marker system. The preferences for the DNA-based markers were mostly driven by the marker density across the genome and the technical ease of developing and generating these markers. Furthermore, the relative ease of localizing DNA molecular markers to physical genomic locations enables the use of these markers for functional analysis and marker–trait validation. For example, tightly linked or functional markers can be directly used in marker-assisted selection and/or for identifying candidate genes, which is a requirement for developing genetically modified organisms. Similarly, knowledge of alleles, allele effect estimates, and allele dosage effect, particularly in polyploids, can be highly informative and useful for evaluating how much of the phenotypic variation can be explained in controlled experiments before embarking on the implementation of time-consuming selective breeding or development of transgenic lines.

The evolution of DNA molecular markers and classification into first, second, and third-generation platforms is based on a combination of marker density (i.e., low-, medium- and high-density markers, respectively) and the strategy (i.e., assay method) used for the identification of polymorphisms, i.e., DNA–DNA hybridization, PCR, and sequence-based methods, respectively. Consequently, while the properties associated with these methods are often used as the basis for classifying them into first, second, and third-generation platforms, there are exceptions where more advanced methods incorporate methods from older technologies.

First-generation molecular markers use biochemical reactions (isozymes), hybridization of antibodies (isozymes), and DNA probes (RFLP) to detect variants of the molecule separated on a gel matrix. Second-generation platforms are based on PCR amplification with random primers (e.g., RAPD: Random Amplified Polymorphic DNA) or sequence-specific primers (e.g., SSR: Simple Sequence Repeats, SCAR: Sequence Characterized Amplified Region, and STS: Sequence-Tagged Sites); a combination of PCR amplification and restriction enzyme digest (e.g., AFLP: Amplified Fragment Length Polymorphism; and CAPS: Cleaved Amplified Polymorphic Sequences); and detection of allelic differences based on physical changes in the DNA conformation of amplified fragments (SSCP: Single Strand Conformation Polymorphism). Third-generation molecular marker platforms, marked by the genomic era, typically target single nucleotide polymorphisms (SNPs) by deploying assays for allele-specific hybridization (SNP chip/array and DArT: Diversity Arrays Technology) and genome-wide sequencing. Next-generation sequencing (NGS)-based genotyping can be untargeted (e.g., GBS: Genotyping-By-Sequencing, and ddRADseq: double digest restriction-site associated DNA sequencing) or targeted (e.g., multiplexed PCR or hybridization-based sequence capture followed by sequencing). While third-generation platforms are typically designed to target SNPs, state-of-the-art sequencing approaches can also identify other types of polymorphisms such as insertion-deletion polymorphisms (indels) and short sequence repeats.

Even though sweetpotato research is supported by a small community of researchers, the diversity of molecular markers that have been used spans multiple methods within each of the first-, second-, and third-generation marker platforms (Figs. 4.1 and 4.2). This breadth highlights the importance of sweetpotato as a globally and economically important crop, as well as the interest in its polyploid evolution, domestication, and relationship with wild diploid Ipomoea spp. (i.e., morning glories). These research interests span decades and pre-date molecular marker technologies.

Fig. 4.1

Timeline of invention (Lewontin and Hubby 1966; John et al. 1969; Pardue and Gall 1969; Grodzicker et al. 1974; Williams et al. 1990; Zietkiewicz et al. 1994; Vos et al. 1995; Pinkel et al. 1998; Baird et al. 2008; Elshire et al. 2011; Peterson et al. 2012; Sun et al. 2013; Wadl et al. 2018); utilization of first-, second-, and third-generation molecular marker technologies in sweetpotato and its crop wild relatives (Jarret et al. 1992; Reyes and Collins 1992; Connolly et al. 1994; Zhang et al. 2000; Veasey et al. 2008; Li et al. 2010; Shirasawa et al. 2017; Wadl et al. 2018; Bararyenya et al. 2020; Yamakawa et al. 2021; Yan et al. 2022); and landmark technologies that the marker platforms depend on (Tiselius 1937; Southern 1975; Saiki et al. 1985; Ronaghi et al. 1996)

Fig. 4.2

Frequency of studies published from 1992 to 2024 that used various molecular marker types in Ipomoea spp.

4.2 First-Generation Molecular Marker Platforms Deployed in Sweetpotato

Following the inability to use morphological and cytogenetic markers to establish phylogenetic relationships between cultivated sweetpotato and its crop wild relatives (CWR), the earliest uses of molecular markers in sweetpotato were reported in 1992 at the USDA Agricultural Research Station, Griffin, GA, USA (Jarret et al. 1992) and North Carolina State University, Raleigh, NC, USA (Reyes and Collins 1992). The studies used RFLP and Isozyme markers, respectively, to understand phylogenetic relationships among cultivated sweetpotato and crop wild relatives, particularly species in the batatas complex. All polyploid populations were found to share almost the same number of isozymes as the diploid I. trifida, one of the putative ancestral progenitors of cultivated sweetpotato. The RFLP markers revealed diploid I. trifida and two Mexican tetraploids, an endangered I. tabascana (Austin 1988; Austin and De La Puente 1991; McDonald and Austin 1990) and an accession K233, that were closely related to cultivated sweetpotato with strong bootstrap support. This finding is in concordance with the fact that I. trifida has been observed to have traits that are required for commercial/cultivated sweetpotato, i.e., some lines develop thick roots similar to the sweetpotato, although rare, and some lines are sexually compatible with sweetpotato (Iwanaga 1988; Orjeda et al. 1990). This was reported at the first planning conference on exploration, maintenance, and utilization of sweetpotato genetic resources, held at the International Potato Center, Lima, Peru (Austin 1988). The RFLP marker data suggested two possible rounds of polyploidization events. The study with these molecular markers provided the first evidence for the allo-autopolyploid nature of the hexaploid sweetpotato.

Besides these two studies that used isozyme and RFLP markers in sweetpotato (Jarret et al. 1992; Reyes and Collins 1992), another study used chloroplast-derived RFLP markers in New World Ipomoea spp. (McDonald and Austin 1990). The transition to second-generation markers was rapid due to their ease of use and relatively lower cost. The isozyme marker system provides a universal protocol that produces markers that are conserved across diverse genetic backgrounds (intra- and inter-specific). The evolutionary constraints on isozyme proteins and the polymorphisms that do not inactivate enzyme activity allow for their utility in intra- and inter-specific studies. The DNA-based RFLP markers and the inclusion of a more diverse set of Old and New World Ipomoea species provided the initial insights into the origins of sweetpotato and its relationship with CWR. Nevertheless, technical issues and low-throughput assays associated with first-generation molecular markers limit their routine application. For isozyme markers, the requirement for fresh samples (or freezing of fresh material), instability of some proteins, and a limited number of markers across the genome can limit their application. They can also suffer from bias since these proteins can be a product of direct selection that is unrelated to traits of interest or phylogenetic models of species trees (Schlotterer 2004). While RFLPs can produce a higher number of markers across the genome (i.e., a high abundance of restriction sites), in practice the tedious assay limits the number of markers that can be developed. Additionally, RFLPs and isozymes require the development of DNA probes and biochemical assays, respectively, and large amounts of starting material. Consequently, their applications have been limited to diversity, phylogenetic, and low-resolution linkage mapping studies. Because they are dominant markers (2 alleles represented by the presence and absence of a band), the inability to identify heterozygotes (achievable with co-dominant markers) is a major drawback for estimating allele effects. While it is impossible to determine allele dosage with isozyme markers, allele dosage can theoretically be inferred from the intensity of bands using RFLP markers; however, this is often not feasible since there is low confidence in making such an inference.

4.3 Second-Generation Platforms Deployed in Cultivated Sweetpotato

Before the emergence of SNP arrays and NGS-based genotyping, which are central to the state-of-the-art third-generation marker platforms, second-generation molecular markers were the markers of choice from the early 1990s, and they remain in use. Similar to the observations in other species, only a handful of second-generation markers have been consistently used in sweetpotato (i.e., RAPD, AFLP, and SSR; Fig. 4.2). Other second-generation markers occasionally used in sweetpotato include ITS, RIP, SRAP, cpSSR, ISSR, EST-SSR, and competitive allele-specific PCR markers (e.g., KASP). While KASP markers have been sparingly used in the past, there is sustained interest in continuing to use this marker platform for applications that require only a few markers. Since the use of and results from the low-throughput KASP marker platform have not been reported in peer-reviewed publications, its frequency of use in sweetpotato is anecdotal. Some of these second-generation markers, such as AFLPs and SSRs, can produce medium-density markers (i.e., a few thousand), which allows for their utility in genetic analyses that require genome-wide marker data. Besides their use in diversity, fingerprinting, and phylogenetic analysis, they have been used to generate the first sweetpotato linkage map and QTL analyses (Kriegner et al. 2003). The limited genome resolution in large genomes such as the approximately 3 Gb genome of sweetpotato results in major gaps in linkage groups, multiple noncontiguous linkage groups that should all map to a single chromosome, and markers of unknown sequence context that are rarely anchored to positions in a physical reference genome assembly. Consequently, their utility for functional validation tends to be limited. Second-generation markers suffer from this limitation since a primer pair can produce amplification products from multiple loci. Moreover, since amplicons are not sequenced, the sequence context of alleles cannot be verified or confidently assigned to a physical genomic location. For example, SSR primer pairs tend to produce multiple alleles/bands (i.e., fragments with variable sizes) depending on PCR conditions, even in diploid genomes that should only produce 2 alleles per sample.

With the availability of reference genomes for hexaploid sweetpotato, the sequence context and physical position(s) of some of these second-generation markers can be determined. Although not limited to second-generation markers, genotypes with multiple alleles per locus and individual (even in diploids) are indicative of alleles that are obtained from multiple loci. Not to be confused with multi-allelic markers in polyploids (i.e., potentially up to 6 alleles per locus in a hexaploid), multi-locus markers derived from paralogous sequences can lead to erroneous interpretation in some genetic analyses and violate the assumption that markers are derived from a single locus. For example, this is one of the reasons for segregation distortion in genetic linkage maps and possibly for false negatives and false positives in marker–trait associations. The latter would be more problematic for single marker–trait genome-wide association analysis and, to a lesser extent, for interval mapping approaches based on markers that have been tested to segregate in Mendelian fashion, i.e., with no segregation distortion (Table 4.1).

Table 4.1 Publications using second-generation molecular markers for various genetic analyses in sweetpotato

RAPD markers were mostly used for diversity studies from 1994 to 2020, while AFLP markers were used for genetic diversity and linkage analysis (genetic map construction and QTL analysis) from 2000 to 2014. Since then, neither RAPD nor AFLP markers have been reported in sweetpotato studies. On the other hand, SSR markers have remained in use since they were first used in 2008. Applications of SSR markers in genetic analyses include genetic diversity analysis, phylogenetic analysis, genetic linkage analysis, marker–trait association, and genomic prediction. The first effort to characterize and develop SSR loci was based on EST-derived (expressed sequence tag) SSR markers. A study that aimed to develop genome-wide SSR markers identified a total of 8294 SSRs from 7163 unique ESTs out of 181,615 ESTs evaluated, i.e., 3.9% of the ESTs contained SSRs (Wang et al. 2011). The di-nucleotide repeats were the predominant repeats (41.2%), with AG/CT accounting for 26.9% of repeats. Other repeats in high frequency include AAG/CTT, AT/TA, CCG/CGG, and AAT/ATT, which accounted for 13.5%, 10.6%, 5.8%, and 4.5% of SSR repeats, respectively. From these, only 1060 high-quality SSR primer pairs were designed. Following validation, 816 primer pairs produced reproducible and strong amplicons, while 195 and 342 SSR markers exhibited polymorphism between 2 and among 8 cultivated sweetpotato clones, respectively. Although the medium-density marker data derived from SSRs limit their application in analyses that require genome-wide data, their co-dominant nature and ease of use (i.e., simple PCR assay and ability to resolve some polymorphisms on agarose gels) make SSR markers a popular choice.
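As a rough illustration of how SSR loci can be mined from EST sequences, the following Python sketch scans a sequence for di- and tri-nucleotide repeats with a regular expression. The function name `find_ssrs`, the minimum repeat thresholds, and the example sequence are assumptions made for illustration only; they are not the parameters or software used by Wang et al. (2011).

```python
import re

# Assumed minimum number of tandem repeats per motif length (illustrative only).
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for di- and tri-nucleotide SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical EST fragment containing one (AG)8 and one (AAG)5 repeat.
    est = "TTC" + "AG" * 8 + "CCT" + "AAG" * 5 + "T"
    for start, motif, count in find_ssrs(est):
        print(f"position {start}: ({motif}){count}")
```

In practice, complementary motifs (e.g., AG and CT) would be grouped into a single class, as in the AG/CT notation used above, and primer pairs would then be designed in the flanking sequence.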

4.3.1 Application of Second-Generation Markers for the Relationship Between Sweetpotato and Its Crop Wild Relatives, and Genetic Diversity Studies

The initial use of ITS markers derived from the internal transcribed spacer (i.e., spacer DNA situated between the subunits of ribosomal RNA genes) revealed that the ITS markers poorly resolved relationships among 13 Ipomoea spp. (Huang and Sun 2000). In contrast, AFLP and SSR markers were found to be more efficient in characterizing genetic diversity and phylogenetic relationships at both intra- and interspecific levels in 36 accessions that represent 10 Ipomoea spp. (Huang et al. 2002). A total of 1182 AFLP bands (loci) were identified, of which 891 were polymorphic across all accessions evaluated. The AFLP markers were generated using six primer combinations. Consistent with studies that used first-generation markers to examine the relationship between sweetpotato and its crop wild relatives, I. trifida was found to be the most closely related to hexaploid sweetpotato (I. batatas), while I. ramosissima and I. umbraticola were the most distantly related to I. batatas (Huang et al. 2002). In a study that used ITS sequences, the nuclear ITS suggested an autopolyploid origin for sweetpotato, whereas two I. batatas chloroplast lineages were identified (Roullier et al. 2013). More divergence was found between the I. batatas chloroplast lineages than with the closest putative progenitor, I. trifida. While this indicated an allopolyploid or allo-autopolyploid origin, the study also proposed two distinct autopolyploidization events involving polymorphic wild populations of a single progenitor species. Subsequent studies with high-density third-generation molecular markers all support the allo-autopolyploid origin suggested by these earlier findings.

The second-generation markers routinely used for genetic diversity studies include platforms that can produce at least medium-density marker data, i.e., hundreds to a few thousand markers (Fajardo et al. 2002; Zhang et al. 2000, 2004). A total of 210 polymorphic AFLP fragments revealed that genetic diversity was highest in Central America and lowest in Peru-Ecuador. These results support the hypothesis that Central America is the primary center of diversity and most likely the center of origin of sweetpotato. Furthermore, while the post-Columbus dispersal of sweetpotato to Asia and the Pacific is well documented, the hypothesis that there was a prehistoric transfer of sweetpotato by Peruvian or Polynesian voyagers from Peru to Oceania has long been controversial. A set of 210 AFLP markers revealed that Mexican and Oceania cultivars grouped together, while Peru-Ecuador germplasm was genetically distant from Oceania germplasm (Zhang et al. 2004). Consequently, the study suggested that Peru-Ecuador may not be the source of the Oceania germplasm.

4.3.2 Application of Second-Generation Markers for Genetic Linkage Analysis

The medium-density marker data produced from AFLP and SSR markers have been applied to genetic linkage map construction and QTL analysis. While these molecular marker platforms cannot call dosage directly, studies have classified the pseudo-diploidized codominant SSR and dominant AFLP genotypic classes to infer dosage based on the Mendelian segregation ratio. Linkage models in these studies often tested for autopolyploidy (hexasomic) and allo-autopolyploidy (tetrasomic) without direct dosage information. Double reduction events, where sister alleles move to the same gamete during meiosis and multivalent formation, were not modeled in the era of second-generation markers. In hexasomic inheritance (autopolyploids), pairing is random with all pairs of homoeologous chromosomes. Assuming sweetpotato is an allo-autopolyploid and that preferential pairing occurs, we would expect hexasomic (if there is partial preferential pairing), tetrasomic (random pairing with pairs of 4 of 6 homologous chromosomes), and disomic (random pairing with pairs of 2 of 6 homologous chromosomes) inheritance.

Based on Jones’s cytological hypotheses in sweetpotato (Jones 1967), where the other parental genotype is nulliplex, markers were classified into four types based on their segregation ratios: (1) simplex/single-dose markers present in one parent in a single copy and with a segregation ratio of 1:1 (presence: absence); (2) duplex/double-dose markers present in one parent in two copies and with hexasomic (4:1), tetrasomic (5:1), or disomic/tetradisomic (3:1) ratios; (3) triplex/triple-dose markers present in one parent in three copies and with hexasomic (19:1), tetradisomic (11:1) or disomic (7:1) ratios; and (4) double-simplex markers present in both parents in a single copy and with a 3:1 segregation ratio (Cervantes-Flores et al. 2008; Kriegner et al. 2003). Inferring dosage or mode of inheritance in this manner is limited to cases where the other parent is a nulliplex (or where the marker is simplex in both parents), i.e., it cannot be applied to multiple-dose marker genotypes in both parents. Furthermore, even though second-generation genetic markers can produce bridge markers (allele segregating in both parents; simplex-by-simplex marker configuration), earlier genetic linkage maps using these markers did not always use the bridge marker information to determine the 6 sets of linkage groups that correspond to the 6 sets of homoeologs. The exception is a study that partially identified some homoeologous linkage groups (Ma et al. 2020). Furthermore, no attempt was made using these markers to create a consensus map from the two parental maps. These gaps highlight the major limitations of second-generation molecular markers for applications in genetic linkage analysis. The limitations are probably due to the inability to directly call dosage-based genotypes and the high marker density required for this genome-wide analysis.
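The segregation ratios listed above can be reproduced with a short calculation. The Python sketch below is a minimal illustration, assuming random chromosome segregation without double reduction and a nulliplex second parent: a gamete receives half of the chromosomes in each pairing set, and a progeny lacks the marker band only if the gamete carries no copy. The function names and the way the disomic and tetradisomic cases are decomposed into independent pairing sets are assumptions made for this example.

```python
from fractions import Fraction
from math import comb


def absence_fraction(dose, set_size):
    """Fraction of gametes with no marker copy from one pairing set.

    A set of `set_size` homologs carries `dose` copies of the marker and
    contributes set_size // 2 chromosomes, drawn at random, to each gamete.
    """
    half = set_size // 2
    return Fraction(comb(set_size - dose, half), comb(set_size, half))


def presence_absence_ratio(configs):
    """Presence:absence ratio for independently segregating pairing sets.

    `configs` is a list of (dose, set_size) tuples, one per pairing set.
    """
    absent = Fraction(1)
    for dose, set_size in configs:
        absent *= absence_fraction(dose, set_size)
    if absent == 0:
        raise ValueError("marker is present in every gamete; no segregation")
    return (1 - absent) / absent


if __name__ == "__main__":
    examples = {
        "simplex, hexasomic": [(1, 6)],
        "duplex, hexasomic": [(2, 6)],
        "duplex, tetrasomic": [(2, 4)],
        "duplex, disomic": [(1, 2), (1, 2)],
        "triplex, hexasomic": [(3, 6)],
        "triplex, tetradisomic": [(2, 4), (1, 2)],
        "triplex, disomic": [(1, 2), (1, 2), (1, 2)],
    }
    for label, configs in examples.items():
        print(f"{label}: {presence_absence_ratio(configs)}:1")
```

Under these assumptions the sketch returns 1:1, 4:1, 5:1, 3:1, 19:1, 11:1, and 7:1 for the seven configurations, matching the ratios used to classify markers.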

The first attempt to construct a genetic linkage map in sweetpotato was based on a study that used 134 polymorphic markers and 76 F1 progenies (Thompson et al. 1997). While a linkage map was not constructed, the 1:1 segregation ratio from 74 polymorphic markers (presence–absence of bands in Vardaman and Regal parents) indicated that linkage map construction is possible with sufficient markers and progenies. The first genetic linkage map was constructed by Kriegner et al. (2003) using a total of 632 (Tanzania) and 435 (Bikilamaliya) AFLP markers that were mapped to and ordered on 90 and 80 linkage groups, respectively. The map lengths covered 3655.6 cM and 3011.5 cM, respectively, with an average marker interval of 5.8 cM. In this study, to determine if sweetpotato is an autopolyploid or allopolyploid, the ratio of linkage in the coupling phase to linkage in the repulsion phase and the ratio of non-simplex to simplex markers were examined. The results support the predominance of polysomic inheritance with some degree of preferential pairing that suggests an allo-autopolyploid genome. Subsequently, Cervantes-Flores et al. (2008) generated 1944 and 1751 AFLP bands in Tanzania and Beauregard, with 1511 and 1303 being single-dose markers, respectively. The framework maps consisted of 86 and 90 linkage groups for Tanzania and Beauregard, respectively. The first sweetpotato map that used SSR markers was based on ISSR (inter simple sequence repeat) markers, which produced a low-resolution map with linkage groups ranging from 10.7 to 149.1 cM (Chang et al. 2009). Only 37 and 47 markers were mapped to parental maps spanning 479.8 and 853.5 cM, respectively. Another study combined only 130 EST-SSR markers with 1824 AFLP markers to produce a genome-wide genetic linkage map of the sweetpotato genome (Yu et al. 2014). The only case of deploying SRAP (sequence-related amplified polymorphism) markers in sweetpotato is for linkage analysis (Li et al. 2010). A total of 800 SRAP markers were used to construct 2 parental linkage maps (Luoxushu 8 × Zhengshu 20) with 473 (81 linkage groups) and 328 (66 linkage groups) markers spanning 5802.5 and 38967.9 cM, respectively, and with average marker intervals of 10.2 and 12.0 cM, respectively.

A high-resolution genetic linkage map comprising only SSR markers was constructed using a de novo assembly of publicly available ESTs and mRNAs in sweetpotato (Zhang et al. 2016). A total of 1824 SSR markers were obtained from 1476 primer pairs. Of these, 214 pairs of primers that identified polymorphic loci produced 1278 alleles with an average of 5.97 per locus and a major allele frequency of 0.77. Another study that mapped only 210 SSR markers produced a genetic linkage map that had a low resolution and only produced small linkage groups that were limited in their representation of the sweetpotato genome (Kim et al. 2017). Similarly, a map by Ma et al. (2020) that used a higher marker density (484 and 573 polymorphic SSR markers in Jizishu 1 and Longshu 9, respectively) had a significant number of linkage groups that were mostly small and of low resolution. Most of these efforts that used hundreds of SSR markers for linkage analysis revealed the limited marker density of the initial SSR marker sets, whereas other studies show that at least a few thousand markers are required for good genome coverage.

Meng et al. (2021) were able to generate 5057 polymorphic SSR markers from 571 polymorphic genomic SSR primer pairs and 35 EST-based SSR primer pairs. They produced a map of 90 linkage groups that covered 13,299.9 cM with an average marker interval of 2.6 cM. Using 3009 SSR markers, the Zhengshu 20 parental map spanned 11,122.9 cM, comprised 90 linkage groups, and had an average marker interval of 3.7 cM. The SSR primer pairs were derived from an initial 2545 primer pairs, including 1215 genomic SSR (gSSR) primer pairs and 1330 BES-SSR (bSSR) primer pairs designed from BAC-end sequences. Using a cross between Xushu 18 and Xu 781 sweetpotato cultivars and 601 SSR primer pairs, Zheng et al. (2023) generated 5547 SSR markers and 4599 SSR markers, respectively, to produce parental maps that spanned 18,263.5 cM and 18,043.7 cM, respectively.

At an average of 8.86–9.23 markers per SSR primer pair, it is expected that a significant number of the primer pairs non-specifically amplify products from multiple loci (paralogous sequences), since the maximum number of alleles per individual at a single locus can be no more than 6 in hexaploid sweetpotato. To resolve this ambiguity, unique SSR bands (i.e., fragments of the same or similar length under the low resolving power of agarose gels) are often scored as dominant markers that are derived from a single locus. Nevertheless, some paralogs can produce amplicons of the same fragment length. The ability to accurately score genotypes is crucial for polyploids since markers have a high chance of erroneously fitting multiple segregation ratios under hexasomic, tetrasomic, and disomic inheritance (i.e., 1:1, 4:1, 5:1, 3:1, 19:1, 11:1, 7:1, 3:1).
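To illustrate how a single marker can fit several segregation ratios, the following Python sketch applies a chi-square goodness-of-fit test (1 degree of freedom) to presence/absence counts against each candidate ratio. The progeny counts are hypothetical, the 0.05 significance threshold is an assumption, and the function names are introduced only for this example.

```python
from math import erfc, sqrt

# Candidate presence:absence ratios named in the text (3:1 listed once).
CANDIDATE_RATIOS = {"1:1": 1, "3:1": 3, "4:1": 4, "5:1": 5,
                    "7:1": 7, "11:1": 11, "19:1": 19}


def chi_square_fit(present, absent, ratio):
    """Chi-square statistic and p-value for a presence:absence ratio of ratio:1."""
    total = present + absent
    exp_absent = total / (ratio + 1)
    exp_present = total - exp_absent
    chi2 = ((present - exp_present) ** 2 / exp_present
            + (absent - exp_absent) ** 2 / exp_absent)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


def compatible_ratios(present, absent, alpha=0.05):
    """Labels of all candidate ratios that are not rejected at level alpha."""
    return [label for label, ratio in CANDIDATE_RATIOS.items()
            if chi_square_fit(present, absent, ratio)[1] >= alpha]


if __name__ == "__main__":
    # Hypothetical counts: the same 60:16 proportion in a small and a large progeny.
    print(compatible_ratios(60, 16))
    print(compatible_ratios(600, 160))
```

With 76 progeny, the hypothetical 60:16 split is not rejected under the 3:1, 4:1, or 5:1 ratios, whereas a tenfold larger progeny with the same proportion is compatible only with 4:1. This is why small populations and scoring errors make the inferred dosage and mode of inheritance ambiguous.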

4.3.3 Application of Second-Generation Markers for Selection of Parental Genotypes

While second-generation markers have not been used for genomic selection in sweetpotato, knowledge of genetic relationships and diversity is important to prevent the narrowing of the genetic base during breeding. Naidoo et al. (2022) used SSR markers for the selection of parental cultivars to maintain genetic diversity during breeding. Using 31 genotypes originating from the African and American continents and eight highly polymorphic SSR primers that produced 83 alleles, the study revealed that despite the high diversity among the genotypes, genetic distances among them were relatively low. To some extent, clustering identified three groups that reflect geographic origins and pedigree. The study suggested two heterotic groups of African and American origin.

4.4 Third-Generation Platforms Deployed in Cultivated Sweetpotato

The third-generation molecular markers (also known as next-generation or advanced molecular markers) are the current state of the art and represent a significant advancement marked by high-throughput genotyping at a significantly lower cost than any of the other marker platforms. These molecular marker platforms use technologies that include DNA arrays (on slides or beads made from plastic or glass) or NGS. They offer improved capabilities for applications that require high-density genome-wide marker data and allele-dose information, and for studying complex traits and complex genomic features.

Diploid and a few polyploid organisms started benefiting from early third-generation marker platforms, including SNP arrays developed in 1998 (The Whitehead Institute and Affymetrix SNP array/chip; Pinkel et al. 1998) and NGS-based genotyping (Baird et al. 2008; Balagué-Dobón et al. 2022; Elshire et al. 2011; Peterson et al. 2012). However, sweetpotato, like most complex polyploids, lagged due to the prohibitive cost of developing SNP arrays for a small community of users and the inability of early NGS-based genotyping platforms to accurately capture allele dosage. Initially, the ability of SNP arrays to call allele dosage was limited, and in the cases where they were used in polyploids, genotype calls were often limited to 2× pseudo-diploidized genotypes in autopolyploids (e.g., potato) or subgenome-specific diploid genotype calls in allopolyploids (e.g., wheat) (Sun et al. 2020). Since then, the development of tools such as fitTetra (Zych et al. 2019) and ClusterCall (Schmitz Carley et al. 2017) has allowed for dosage calling, although the application is limited to auto-tetraploids.

For higher ploidy levels in autopolyploids, the development of SuperMASSA allowed for dosage calling using a graphical Bayesian model for SNP genotyping. However, its application is limited to biparental populations since it requires Mendelian segregation information to re-classify genotypes into appropriate dosage classes (Serang et al. 2012). Although it can model all dosage configurations, it is similar to earlier approaches used for second-generation markers in that dosage calls are imputed or re-assigned based on expected Mendelian segregation rather than strictly using allelic read depth information. The approach is necessary since early third-generation marker platforms are inherently limited in their ability to accurately quantify allele dose. The polyRAD (Clark et al. 2019) and updog (Gerard et al. 2018) software provide functionality similar to SuperMASSA but extend dosage calling to diversity panels and natural populations by using multiple features to update priors. The features in polyRAD include population structure, modeled allele frequency gradients, a self-fertilization rate ranging from zero to one, and linkage disequilibrium among markers that have known physical map positions in the reference genome. The features in updog include allele bias, overdispersion, and sequencing error.
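The core idea of calling dosage from allelic read depth can be sketched with a simple binomial model. The Python example below is a minimal illustration under stated assumptions: a flat prior over dosage classes, a fixed sequencing error rate of 1%, no allele bias, and no overdispersion. It is not the model implemented in SuperMASSA, polyRAD, or updog, which add the population-level priors and error features described above; the function names and parameter values are hypothetical.

```python
from math import comb


def dosage_posteriors(ref_reads, alt_reads, ploidy=6, seq_error=0.01):
    """Posterior probability of each alternate-allele dosage (0..ploidy).

    Assumes binomial sampling of reads and a flat prior over dosage classes.
    """
    depth = ref_reads + alt_reads
    likelihoods = []
    for dose in range(ploidy + 1):
        p_alt = dose / ploidy
        # Expected fraction of alternate reads after allowing for sequencing error.
        p_obs = p_alt * (1 - seq_error) + (1 - p_alt) * seq_error
        likelihoods.append(
            comb(depth, alt_reads) * p_obs ** alt_reads * (1 - p_obs) ** ref_reads
        )
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]


def call_dosage(ref_reads, alt_reads, ploidy=6, seq_error=0.01):
    """Return the most probable dosage and its posterior probability."""
    posteriors = dosage_posteriors(ref_reads, alt_reads, ploidy, seq_error)
    best = max(range(ploidy + 1), key=posteriors.__getitem__)
    return best, posteriors[best]


if __name__ == "__main__":
    # Hypothetical read counts with the same 2:1 ref:alt proportion at two depths.
    for ref, alt in [(40, 20), (4, 2)]:
        dose, prob = call_dosage(ref, alt)
        print(f"ref={ref} alt={alt}: dosage {dose} (posterior {prob:.2f})")
```

Both hypothetical samples are called as duplex, but the posterior support is much weaker at a depth of 6 reads than at 60 reads, which illustrates why read depth limits dosage calling in hexaploids.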

While the advent of reduced representation sequencing (RRS) democratized high-throughput genotyping in model, non-model, and under-studied crops, the ability to use RRS for dosage calling is limited in the first iteration of RRS protocols (e.g., RADseq, GBS, and ddRADseq). This is because allele read depth ratios are often highly skewed and read depth is not uniform across the genome, resulting in a significant number of loci with read depth that is insufficient for dosage calling. These limitations necessitated tools such as SuperMASSA, polyRAD, and updog. In studies where the first iteration of RRS methods was used, sweetpotato genotype calls were often based on 2× pseudo-diploidized genotype calls rather than 6× dosage calls. Examples of sweetpotato studies that used pseudo-diploid genotypes from RRS data include using DArTseq for GWAS, quality assurance and control, and genomic prediction studies (Bararyenya et al. 2020; Gemenet et al. 2020a, 2020b); and using SLAF-seq for linkage/QTL analysis (Yan et al. 2022). While analysis with the pseudo-diploidized genotypes produced meaningful results, comparison with 6× dosage genotype calls revealed that using dosage calls often produced superior results (Gemenet et al. 2020b). Other studies have attempted to use this first iteration RRS method with some limited success (Table 4.2).
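The information lost by pseudo-diploidization can be made concrete with a small example. The Python sketch below assumes a common 0/1/2 coding (homozygous reference, heterozygous, homozygous alternate) and a hypothetical function name; individual studies may have coded genotypes differently.

```python
def pseudo_diploidize(dosage, ploidy=6):
    """Collapse an alternate-allele dosage (0..ploidy) to a diploid-style code.

    0 = no alternate allele, 1 = any mixture of both alleles,
    2 = alternate allele only (assumed coding for illustration).
    """
    if not 0 <= dosage <= ploidy:
        raise ValueError("dosage must be between 0 and ploidy")
    if dosage == 0:
        return 0
    if dosage == ploidy:
        return 2
    return 1


if __name__ == "__main__":
    for dosage in range(7):
        print(f"6x dosage {dosage} -> pseudo-diploid {pseudo_diploidize(dosage)}")
```

The five heterozygous classes of a hexaploid (simplex through pentaplex) collapse into a single class, so dosage-dependent allele effects can no longer be estimated from the pseudo-diploid calls.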

Table 4.2 Publications using third-generation molecular markers for various genetic analyses in sweetpotato

Dosage-sensitive genotyping-by-sequencing (qRRS-based genotyping) has emerged as an amenable and robust strategy in polyploids due to its low cost and ability to quantitatively sequence alleles for dosage estimation. While approaches implementing quantitative reduced representation sequencing, GBSpoly and OmeSeq-qRRS, have been used in sweetpotato (da Silva et al. 2020; Gemenet et al. 2020b; Mollinari et al. 2020; Oloka et al. 2021; Wadl et al. 2018; Wu et al. 2018), multiplexed-PCR based approaches are being tested in sweetpotato and other polyploid crops. The GBSpoly protocol is a modification of the ligation-based GBS protocol that uses double-digestion with methylation-sensitive (TseI; rare cutter) and methylation-insensitive (CviAII; frequent cutter) restriction enzymes. To improve the quantitative sequencing assay, a library construction method (OmeSeq-qRRS) uses isothermal amplification (instead of a ligation approach) to incorporate barcoded adapters into genomic fragments following double-digestion of the genome with methylation-insensitive restriction enzymes (NsiI and NlaII). The methylation-insensitive restriction enzymes eliminate variability between hypomethylated and hypermethylated sequences, which also differ across tissue types. A lower number of overall NGS reads is sufficient to achieve similar marker density in OmeSeq-qRRS compared to GBSpoly-qRRS. Additionally, the ligation bias toward smaller genomic fragments is mitigated and chimeric ligation is eliminated by using isothermal amplification.

RRS/NGS-based genotyping is more amenable to polyploids than SNP arrays, which require significant assay development costs and accurate genome sequence data and assembly. The more recent NGS-based genotyping platforms aim to quantitatively sequence loci while maintaining allelic ratios in order to estimate dosage more accurately. Consequently, tools such as GATK and Freebayes directly use quantitative sequencing information (i.e., allele read depth) for dosage-based variant calling by using haplotype-based calling. Variant calling based on dosage presents greater challenges in polyploids due to the many potential genotypes at each locus, which arise from various combinations of unique alleles. Sequence reads cannot distinguish identical copies of alleles, particularly when these are not physically linked to other heterogeneous alleles. Additionally, as the ploidy level increases, determining allele copy numbers through dosage calling becomes increasingly complex at a given read depth. The adoption of a haplotype-based strategy enhances genotyping accuracy by jointly assessing the combinations of multiple nearby alleles, known as haplotypes (Cooke et al. 2021).
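The effect of ploidy on dosage calling at a fixed read depth can be illustrated numerically. The sketch below assumes a simple binomial read-sampling model with symmetric sequencing error and a maximum-likelihood call at a single biallelic site; it is not the algorithm of GATK, Freebayes, or any other variant caller, and all function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k alternate reads out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def read_prob(dosage, ploidy, err):
    """Probability that one read shows the alternate allele."""
    frac = dosage / ploidy
    return frac * (1 - err) + (1 - frac) * err


def ml_call(k, n, ploidy, err):
    """Maximum-likelihood dosage given k alternate reads out of n."""
    return max(
        range(ploidy + 1),
        key=lambda d: binom_pmf(k, n, read_prob(d, ploidy, err)),
    )


def prob_correct_call(true_dosage, ploidy, depth, err=0.01):
    """Exact probability that the ML call recovers the true dosage."""
    p_true = read_prob(true_dosage, ploidy, err)
    return sum(
        binom_pmf(k, depth, p_true)
        for k in range(depth + 1)
        if ml_call(k, depth, ploidy, err) == true_dosage
    )


# Balanced heterozygote in a diploid versus a hexaploid at the same depth,
# and the hexaploid again at a higher depth.
print(prob_correct_call(1, ploidy=2, depth=20))
print(prob_correct_call(3, ploidy=6, depth=20))
print(prob_correct_call(3, ploidy=6, depth=100))
```

Under these assumptions, adjacent dosage classes in a hexaploid differ in expected allele fraction by only one sixth, so a read depth that is ample for a diploid heterozygote leaves substantial ambiguity in a hexaploid.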

While variant calling tools for polyploids assume that the genome is autopolyploid, studies in sweetpotato often indicate that it is an allo-autopolyploid and that large structural variations might exist between sweetpotato accessions (Wu et al. 2018). To address the allo-autopolyploid nature, accurate variant calling in polyploids can benefit from using multiple reference genomes of putative ancestral progenitors and haplotype-resolved genome assembly (6 sweetpotato haplomes). In addition to the sequencing of putative ancestral progenitors within the I. batatas complex (I. trifida, I. triloba, and I. tabascana), haplotype-resolved genome assemblies based on multiple sweetpotato cultivars are available (http://sweetpotato.uga.edu). A variant calling pipeline, GBSapp (Bararyenya et al. 2020; Gemenet et al. 2020b; Wadl et al. 2018), was developed to address the allo-autopolyploid nature of the sweetpotato genome by resolving sequence reads that map uniquely to haplomes or subgenomes and reads that are conserved across all 6 sweetpotato haplomes or subgenomes (using putative ancestral progenitors). Modeling dosage based on the number of homoeologs containing a specific sequence is particularly important for allo-autopolyploids since current variant calling tools, including haplotype-based variant calling tools, will erroneously coerce genotypes to the dosage specified by the user (i.e., they assume strict autopolyploidy). In the absence of a high-quality and complete haplotype-resolved reference genome assembly, which is preferred for variant calling, the GBSapp pipeline can use the known progenitors. Since the identity of the other ancestral progenitor is not known with certainty, GBSapp uses the closest ancestral diploid progenitor (I. trifida) and the more distantly related diploid (I. triloba) as reference subgenomes. This ensures that, despite the evolutionary divergence of the latter, sequences conserved across both genomes would likely exist in hexaploid sweetpotato and in other species within the batatas species complex; hence, the dosage would most likely be 6 ×.

4.4.1 Application of Third-Generation Markers for the Relationship Between Sweetpotato and Its Crop Wild Relatives, and Genetic Diversity Studies

The two studies conducted to understand the origins and domestication of sweetpotato were based on shotgun whole genome sequencing (Munoz-Rodriguez et al. 2022; Yan et al. 2024). Both studies confirmed I. trifida as a putative ancestral progenitor and hexaploid sweetpotato as an allo-autopolyploid, and both aimed to identify the putative tetraploid ancestral progenitor. These findings significantly advance the understanding of sweetpotato domestication and inform whether variant calling and other genetic analyses should model sweetpotato as an autopolyploid or allo-autopolyploid. Munoz-Rodriguez et al. (2022) proposed that I. aequatoriensis, a species of Mexican origin, played a direct role in the origin of hexaploid sweetpotato. This also underscored Central America as the origin of sweetpotato. Likewise, Yan et al. (2024) used a haplotype-based phylogenetic analysis (HPA) to confirm that sweetpotato originated from reciprocal crosses between a diploid and a tetraploid progenitor.

Genetic diversity analyses performed with two USDA diversity panels (417 and 604 accessions) using the GBSpoly-qRRS marker platform revealed similar results (Slonecki et al. 2023; Wadl et al. 2018). The clusters identified from STRUCTURE and phylogenetic analysis correspond to the geographical locations from which the accessions were collected. Accessions from the Pacific Islands and from the Caribbean/Central America cluster in close proximity, supporting earlier studies suggesting that germplasm from the Pacific Islands originated from Central America. North American accessions form distinct clusters that also lie in close proximity to one another.

4.4.2 Application of Third-Generation Markers for Genetic Linkage Analysis

Genetic linkage maps constructed with third-generation markers are typically characterized by high-density marker data sets (i.e., about 30,000 markers). Early NGS-based genotyping platforms (ddRADseq and SLAF-seq) are limited to using simplex and nulliplex markers for linkage analysis (Shirasawa et al. 2017), while recent quantitative NGS-based genotyping markers (qRRS) use all marker configurations, including low- and high-dose markers (Mollinari et al. 2020). Several publications have used third-generation markers for both QTL and genome-wide analysis (Bararyenya et al. 2020; da Silva et al. 2020; Gemenet et al. 2020c; Haque et al. 2020a; Haque et al. 2020b; Oloka et al. 2021).
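The difference between simplex markers and higher-dose configurations can be seen in their expected segregation ratios. The sketch below assumes an idealized autohexaploid with random chromosome segregation (hypergeometric sampling of three of the six homologs per gamete), no double reduction, and no preferential pairing; it is an illustration only and is not the model used by any mapping software cited here. The function names are hypothetical.

```python
from math import comb


def gamete_distribution(parent_dosage, ploidy=6):
    """Probability of each allele count in a gamete (ploidy/2 homologs)."""
    half = ploidy // 2
    total = comb(ploidy, half)
    dist = {}
    for g in range(half + 1):
        ways = comb(parent_dosage, g) * comb(ploidy - parent_dosage, half - g)
        if ways:
            dist[g] = ways / total
    return dist


def offspring_distribution(dosage_p1, dosage_p2, ploidy=6):
    """Expected dosage distribution in the progeny of two parents."""
    g1 = gamete_distribution(dosage_p1, ploidy)
    g2 = gamete_distribution(dosage_p2, ploidy)
    out = {}
    for a, pa in g1.items():
        for b, pb in g2.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


simplex = offspring_distribution(1, 0)  # simplex x nulliplex
duplex = offspring_distribution(2, 0)   # duplex x nulliplex
print(simplex)
print(duplex)
```

Under these assumptions a simplex × nulliplex marker segregates 1:1 for presence and absence, which is why presence/absence scoring suffices for such markers, whereas a duplex × nulliplex marker produces three dosage classes that can only be separated with quantitative genotype calls.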

4.4.3 Application of Third-Generation Markers in Genomic Prediction

The third-generation markers deployed for genomic prediction in sweetpotato are derived from the DArTseq and GBSpoly-qRRS platforms (Batista et al. 2022; Gemenet et al. 2020b). Using DArTseq (pseudo-diploidized markers) and GBSpoly-qRRS, genomic predictive abilities (PA) in a biparental population (Beauregard × Tanzania, BT) across root quality and yield-related traits revealed that models using allele dosage information and G-matrix based additive effects had the best PA for most of the traits (Gemenet et al. 2020b).
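How allele dosage enters an additive genomic relationship matrix can be sketched as follows. The code assumes a VanRaden-type formulation extended to an arbitrary ploidy, in which dosages are centered on their expected value and scaled by the summed marker variance; this is a common construction and not necessarily the exact one used in the studies cited above. The function name `additive_g_matrix` and the toy data are hypothetical.

```python
def additive_g_matrix(dosages, ploidy=6):
    """Additive relationship matrix from a dosage table.

    dosages: list of individuals, each a list of marker dosages (0..ploidy).
    Assumes at least one marker is polymorphic so the scaling term is nonzero.
    """
    n_ind = len(dosages)
    n_mrk = len(dosages[0])
    # Allele frequency per marker, estimated from the dosage table itself.
    freqs = [
        sum(ind[j] for ind in dosages) / (ploidy * n_ind) for j in range(n_mrk)
    ]
    # Center each dosage on its expectation (ploidy * allele frequency).
    z = [[ind[j] - ploidy * freqs[j] for j in range(n_mrk)] for ind in dosages]
    # Scale by the summed expected marker variance.
    scale = ploidy * sum(p * (1 - p) for p in freqs)
    return [
        [sum(z[a][j] * z[b][j] for j in range(n_mrk)) / scale for b in range(n_ind)]
        for a in range(n_ind)
    ]


# Toy data: four individuals scored at three markers (hexaploid dosages).
toy = [[0, 3, 6], [1, 3, 4], [2, 2, 5], [6, 0, 1]]
G = additive_g_matrix(toy)
for row in G:
    print([round(v, 3) for v in row])
```

With pseudo-diploidized markers the same construction would be applied to codes of 0, 1, and 2 with a ploidy of 2, so individuals that differ only in their heterozygous dosage class would appear identical at that marker.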

4.5 Conclusion

Until 2018, sweetpotato lagged in the use of high throughput third-generation molecular markers since earlier methods were limited in their ability to estimate allele dosage. Consequently, the need for high throughput and inexpensive genotyping in sweetpotato has driven innovations in quantitative genotyping-by-sequencing (or qRRS). The availability of NGS-based RRS and shotgun whole genome sequencing (WGS) data in sweetpotato has prompted the development of cutting-edge bioinformatic and analytical tools/pipelines for polyploid genetics that are being applied to other simple and complex polyploids. These tools include GBSapp (Wadl et al. 2018), MAPpoly (Mollinari et al. 2020), QTLpoly (da Silva et al. 2020), VIEWpoly (Taniguti et al. 2022), haplotype-based phylogenetic analysis (Yan et al. 2024), Ranbow (Moeinzadeh et al. 2020), and ngsAssocPoly (Yamamoto et al. 2020). The emergence of molecular marker platforms sensitive enough to quantify allele dosage is revitalizing various areas of research in sweetpotato and its wild relatives.