
5.1 Introduction

Genetic linkage mapping is a fundamental tool for understanding the inheritance mechanism in cultivated crops. It is important for identifying and characterizing genomic regions associated with agronomic traits, supporting evolutionary studies, and assisting the assembly of reference genomes. Linkage analysis has been playing an important role in biology and, more specifically, in genetic studies. It started with the investigations of Thomas Hunt Morgan and his team conducting pioneering experiments with Drosophila melanogaster (fruit flies) that led to the discovery of gene linkage and the concept of genetic recombination (Morgan 1911). Further, he and his team, including his doctoral student Alfred Sturtevant, noticed that the frequency of certain traits being inherited together varied in a way that could be correlated with their relative positions on the chromosome.

Building on these observations, Sturtevant constructed the first genetic map, demonstrating the linear arrangement of genes on a chromosome (Sturtevant 1913). This work showed that the closer two genes are to each other on a chromosome, the less likely they are to be separated during genetic recombination, leading to the concept of linkage maps. These contributions were facilitated by key characteristics of the Drosophila model, including fast and easy manipulation of experimental populations (a small, short-lived organism that tolerates inbreeding) and a small number of large chromosomes (2n = 2x = 8) that carry the genetic information for relatively simple morphological traits, such as eye color, wing shape, and wing size. However, other organisms proved to be more complex than Drosophila, and it soon became clear that, although the basic concepts of linkage remained the same, extensions and modifications to the initial linkage analysis were necessary.

In the initial works on genetic linkage maps, a widely accepted assumption was that the linkage phase, i.e., the arrangement or orientation of alleles in the homologous chromosomes, was known for the parents in the studied population. Knowing the parental linkage phases simplifies the task of distinguishing between recombinant and non-recombinant individuals. This applies particularly to the most frequently utilized crossing structures, including F2 generations, backcrosses, and Recombinant Inbred Lines (RILs) (Mollinari et al. 2009). In that context, the process of building a genetic map can be summarized in three steps: (1) calculating the recombination fractions for all pairs of markers; (2) grouping markers into linkage groups; (3) ordering markers within linkage groups. While these three steps remain valid and de novo maps can still be constructed by applying them, recent advancements in sequencing technologies have facilitated the use of prior genomic information to group and order genetic markers more effectively. For the first step, the calculation of recombination fractions depends on the crossing structure of the mapping population. These calculations are detailed extensively in Liu (1998) and usually follow the general formula:

$$r_{ij} = \frac{R}{R + NR}$$

In this formula, \(r_{ij}\) is the estimated recombination fraction between loci \(i\) and \(j\), \(R\) is the number of observed recombinant offspring, and \(NR\) is the number of observed non-recombinant offspring for both loci. The calculated recombination fractions can be readily converted into genetic distances, under certain assumptions, by employing one of the following mapping functions (Morgan 1917; Haldane 1919; Kosambi 1943):

$$d_{ij} = r_{ij}$$
$$d_{ij} = - \frac{1}{2}{\text{ln}}\left( {1 - 2r_{ij} } \right)$$
$$d_{ij} = \frac{1}{4}{\text{ln}}\left[ {\frac{{1 + 2r_{ij} }}{{1 - 2r_{ij} }}} \right]$$

where \(d_{ij}\) is the genetic distance between loci \(i\) and \(j\), expressed in Morgans (multiplying by 100 gives the distance in centimorgans, cM), and \(r_{ij}\) is the recombination fraction between loci \(i\) and \(j\).
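The estimator and the three mapping functions above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and not taken from any mapping package, and the factor of 100 is an assumption made here so that the distances are reported in centimorgans.

```python
import math


def recombination_fraction(recombinants: int, non_recombinants: int) -> float:
    """Estimate r = R / (R + NR) from counts of recombinant and non-recombinant offspring."""
    total = recombinants + non_recombinants
    if total == 0:
        raise ValueError("at least one informative offspring is required")
    return recombinants / total


def _check(r: float) -> None:
    # The Haldane and Kosambi functions are only defined for 0 <= r < 0.5.
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")


def morgan_cm(r: float) -> float:
    """Morgan function: distance equals r (complete interference)."""
    _check(r)
    return 100.0 * r


def haldane_cm(r: float) -> float:
    """Haldane function: assumes no crossover interference."""
    _check(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi function: assumes partial crossover interference."""
    _check(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    r = recombination_fraction(recombinants=10, non_recombinants=90)
    print(r, morgan_cm(r), haldane_cm(r), kosambi_cm(r))
```

For small recombination fractions the three functions give similar distances; they diverge as \(r_{ij}\) approaches 0.5, which is why the choice of mapping function matters mostly for sparsely spaced markers.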

Estimating the recombination fraction can become complex when the dataset has missing observations or when deriving the population from homozygous lines proves impractical. Under these circumstances, it becomes essential not only to tackle the missing data, but also to accurately infer the linkage phase configurations to ensure the integrity of the genetic map construction process. In such cases, more sophisticated methodologies are required, such as the general maximum-likelihood-based algorithm for simultaneously estimating linkage and linkage phases for markers with varying degrees of missingness, presented by Wu et al. (2002a, b) and implemented in the R package OneMap (Margarido et al. 2007).

This scenario becomes significantly more complex when dealing with organisms that possess multiple copies of their entire chromosome set, known as polyploids. Polyploid organisms are classified into autopolyploids, where the multiple chromosome sets originate from the same species, and allopolyploids, where the sets come from different species. In most instances, allopolyploids demonstrate segregation patterns similar to diploids, primarily because their homologous chromosomes usually form bivalents within each sub-genome (preferential pairing). On the other hand, autopolyploids often display either random bivalent formation or the formation of multivalents during meiosis, leading to more complex polysomic segregation patterns (Sybenga 1975; Soltis and Soltis 1993; Osborn et al. 2003; Mollinari and Garcia 2019).

In this chapter, we integrate the foundational principles of genetic linkage mapping with the specific complexities of the autohexaploid genome of sweetpotato (Ipomoea batatas (L.) Lam., 2n = 6x = 90). We examine the challenges and specialized methodologies required to build genetic linkage maps in a polyploid context, highlighting the differences from simpler diploid genetic models. First, we outline the types of experimental populations commonly used in genetic mapping, such as backcrosses, F2, and RILs, and discuss the limitations that often restrict their application in polyploid mapping, focusing on sweetpotato. Subsequently, we describe the inheritance patterns in autopolyploid organisms and review the genotyping techniques employed in constructing genetic maps for autopolyploids, providing a historical perspective on their use. Finally, we concentrate on the history of sweetpotato genetic mapping and present our group's contributions to the current state of the art in sweetpotato genetic maps.

5.2 Types of Experimental Populations Used in Genetic Mapping

Experimental populations are essential in various genetic mapping studies, including linkage analysis, quantitative trait loci (QTL) mapping, and candidate gene identification. The available choices and definitions of optimal experimental populations for a given species depend on its particularities and the objectives of the experiment (Doerge et al. 1997; Lynch and Walsh 1998; Doerge 2002). These particularities include reproductive mechanisms, genetic diversity, feasibility of controlled crosses, and availability of resources. Such populations are designed to achieve specific goals in genetic mapping studies or to meet predefined objectives in a breeding program.

In several diploid species, the possibility of selfing individuals and obtaining inbred lines has been utilized as an advantageous tool for breeding and genetic mapping studies. In major diploid crops, such as maize, soybean, and rice, experimental populations for genetic mapping are usually derived from crosses between homozygous or inbred lines. Depending on the predominant reproductive mechanism of the target species, these inbred lines can be readily available, obtained by repeated self-fertilization of heterozygous material, or by double-haploidization techniques. Once two inbred lines are crossed, all first-generation (F1) individuals will be identical hybrids, while individuals formed in later self-pollination stages will segregate and increase homozygosity accordingly. Several experimental populations can be obtained by backcrossing or selfing strategies using the F1 individuals and founder parents (Lynch and Walsh 1998).

In experimental populations derived from inbred lines, the founder individuals will present homozygous genotypes for all loci. Consequently, the offspring will be composed of the known founder genotypes recombined according to the genetic distances between loci. This is valid for all inbred-based designs, such as RILs, backcrosses, Nested Association Mapping (NAMs), F2, and others. In such cases, the only variables to be estimated are the recombination frequency rates between markers in the genome (Liu 1998; Broman and Sen 2009). Hence, the phasing procedure, i.e., assessing the haplotype composition of individuals in the population, becomes trivial since the founder haplotypes are known by design.

However, many crop species do not tolerate self-pollination, which prevents the development of inbred lines. Several biological mechanisms can limit inbreeding, including dioecy, chasmogamy, self-incompatibility, spatial and temporal barriers, and others (Soltis and Soltis 2012). When one or more of these mechanisms are present, outcrossing might become the major form of reproduction, and the genetic structure of the populations can be associated with high levels of heterozygosity. Experimental populations can be obtained in such cases, but since founder genotypes will be composed of several heterozygous loci, the F1 population will present genetic segregation and the assortment of alleles in the homologous chromosomes is not defined by design (i.e., parental haplotypes are unknown).

Dealing with outcrossing species is usually associated with additional layers of complexity that sit on top of well-known practical challenges, such as making controlled crosses between individuals that present incompatibilities, unsynchronized reproductive maturity, production of a small number of seeds, and others. From a genetic perspective, the complexity often lies in the unknown linkage phase of heterozygous genotypes. These heterozygous loci complicate the analysis, as it becomes necessary not only to estimate the recombination frequencies but also to determine the genetic linkage phases between loci. Since one depends on the other, this dual estimation task significantly contributes to the complexity of genetic analysis (Wu et al. 2002a, b). The task becomes considerably more challenging in other scenarios, such as complex crossing schemes (e.g., diallel crosses and breeding designs), especially at higher ploidy levels (Serang et al. 2012; Mollinari and Garcia 2019).

As observed for outcrossing diploid species, obtaining homozygous lines in most autopolyploid crops, such as sweetpotato, is impractical. While a high level of homozygosity can be achieved with five to six generations of selfing in diploid species, this number is much higher in autopolyploids. For instance, in diploids, it is possible to obtain approximately 97% of homozygosity with five self-generations from a heterozygous individual Aa (Fig. 5.1). On the other hand, in autohexaploids, this level of homozygosity could only be obtained after 34 self-fertilizations from a heterozygous individual AAAaaa. Besides the impractically long time needed to obtain that many generations, most autopolyploids present biological barriers to inbreeding, such as inbreeding depression, incompatibilities, and constraints to self-fertilization, which make obtaining homozygous material even more difficult.

Fig. 5.1 Frequency of genotypic classes over successive generations of selfing, starting from heterozygous diploid (Aa), tetraploid (AAaa), and hexaploid (AAAaaa) genotypes. Notice that diploids achieve 97% homozygosity in five self-generations, while in hexaploids, only 56% of the genotypes are homozygous in the same number of self-generations
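The effect of ploidy on the approach to homozygosity can be explored numerically. The sketch below is a simplified model, not the procedure used to produce the figure: it assumes random bivalent pairing with no double reduction, so that the allele dosage of a gamete follows a hypergeometric distribution, and it iterates selfing starting from a balanced heterozygote. The function names are illustrative, and other meiotic assumptions (for example, multivalent formation) would give different values.

```python
from math import comb


def gamete_distribution(dosage: int, ploidy: int) -> list[float]:
    """Probability that a gamete carries k copies of allele A (k = 0..ploidy/2).

    Assumes random sampling of ploidy/2 homologs without double reduction.
    """
    half = ploidy // 2
    total = comb(ploidy, half)
    return [comb(dosage, k) * comb(ploidy - dosage, half - k) / total
            for k in range(half + 1)]


def self_once(dist: list[float], ploidy: int) -> list[float]:
    """Advance the dosage distribution of a population by one generation of selfing."""
    new = [0.0] * (ploidy + 1)
    for dosage, prob in enumerate(dist):
        if prob == 0.0:
            continue
        gametes = gamete_distribution(dosage, ploidy)
        # A selfed offspring is the union of two independent gametes of the same plant.
        for i, gi in enumerate(gametes):
            for j, gj in enumerate(gametes):
                new[i + j] += prob * gi * gj
    return new


def homozygosity_after_selfing(ploidy: int, generations: int) -> float:
    """Fraction of fully homozygous genotypes after selfing a balanced heterozygote."""
    dist = [0.0] * (ploidy + 1)
    dist[ploidy // 2] = 1.0  # Aa, AAaa, AAAaaa, ...
    for _ in range(generations):
        dist = self_once(dist, ploidy)
    return dist[0] + dist[ploidy]


if __name__ == "__main__":
    for ploidy in (2, 4, 6):
        print(ploidy, round(homozygosity_after_selfing(ploidy, 5), 4))
```

Under these assumptions, `homozygosity_after_selfing(2, 5)` returns 0.96875, which corresponds to the roughly 97% quoted for diploids, and the fraction of homozygous genotypes decreases as the ploidy level increases.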

Therefore, polyploid genetic mapping studies have extensively used experimental populations derived from bi-parental (or full-sib) crosses between heterozygous genotypes due to their practical aspects, relatively low resource demand, and genetic properties. Various statistical genetics methods have been developed to capture the Mendelian inheritance and facilitate genetic mapping studies in outcrossing and polyploid species (Wu et al. 1992, 2004; Grattapaglia and Sederoff 1994; Hackett et al. 2001; Luo et al. 2004; Bourke et al. 2018b; Mollinari and Garcia 2019), and examples of studies utilizing full-sib populations in outcrossing and polyploid species are widely available in the literature (Hackett et al. 2001, 2013; Ming 2001; Hackett 2003; Pastina et al. 2011; Margarido et al. 2015; Balsalobre et al. 2017; Ferreira et al. 2019; Deo et al. 2020; Mollinari et al. 2020; da Pereira et al. 2020; Cappai et al. 2020).

In addition to full-sib populations, other experimental population designs have been recently developed and employed in genetic mapping studies to increase genetic diversity and capture different allelic combinations. Diallel crosses, where multiple full-sib populations are obtained by crossing a set of founders in all possible combinations (complete diallel) or partial combinations (partial diallels), have been increasingly used for both diploid (Rosyara et al. 2013; Bink et al. 2014) and polyploid species (Zheng et al. 2021). These experimental populations provide a broader genetic basis for mapping studies and can enhance the detection of QTL associated with complex traits. In addition, these populations might already be in place or could serve purposes other than discovery studies, which is especially relevant for breeding programs with limited resources. Similar to the advancements in studying bi-parental outcrossing and polyploid populations, novel statistical methods have been developed and implemented for analyzing populations resulting from diallel crosses (Amadeu et al. 2021; Zheng et al. 2021).

Other variations of experimental population designs have also been used in genetic mapping studies, including factorial, top-crosses, and poly-crosses, all composed of a combination of multiple full-sib populations where parental haplotypes were recombined in a single generation. These populations can also benefit from tools developed to analyze diallel populations (Amadeu et al. 2021) since their basic unit and genetic structure are essentially the same. Other experimental population structures can involve multiple generations, such as NAMs (Yu et al. 2008; Buckler et al. 2009; Nice et al. 2017; Song et al. 2017) and the Multiparent Advanced Generation Inter-Cross (MAGIC) design (Huang et al. 2015). However, despite their adoption for genetic mapping studies in diploid species where inbred lines are possible (Yu et al. 2008; Buckler et al. 2009), no statistical methods or tools are available so far to perform genetic mapping studies in populations that involve multiple generations for outcrossing or polyploid species.

The ultimate challenge in genetic mapping studies is the inclusion of more complex scenarios and population structures closer to a true breeding population. These can include multiple combinations of various experimental population designs in both diploid and polyploid settings, with multiple generations and even populations derived from individuals with mixed ploidy levels. Including such complex structures in genetic mapping studies would constitute a tipping point for most breeding programs, where populations developed for production may also be used for discovery purposes.

Although using breeding populations for genetic mapping studies would be ideal, many other challenges arise in such scenarios due to their complexity. They usually involve multiple crosses between highly heterozygous genotypes across multiple generations, which can also include a lack of genetic relatedness between individuals. These conditions make it difficult to track and account for recombination events, especially when high ploidy levels and multiple generations are involved. Thus, the design and analysis of experimental populations require careful consideration of the ploidy level, allele dosage effects, and statistical methods suitable for the species under study. Furthermore, computational tools and algorithms specifically tailored for polyploid mapping are crucial for accurate genetic analysis in these species.

5.3 Inheritance Patterns in Sweetpotato

The key element in a genetic mapping study is inferring the genome of offspring individuals based on the recombination of the founders’ genomes. Addressing this problem in sweetpotato requires careful consideration of its unique meiotic characteristics. Sweetpotato can be classified as a functional autopolyploid. This classification is significant because, despite the ongoing debate in the scientific literature about the origin of its multiple genomes, sweetpotato displays inheritance patterns typical of an autohexaploid. These patterns are characterized by predominant bivalent formation with random chromosome pairing, and the sporadic occurrence of multivalents during meiosis (Mollinari et al. 2020).

In polyploids, the formation of viable gametes usually involves each resulting cell receiving a balanced subset of the organism's multiple chromosome sets. During meiosis, chromosomes undergo pairing and segregation. However, unlike diploids, where each chromosome pair segregates into different gametes, polyploids must manage multiple sets of chromosomes. The outcome is the production of gametes with a balanced number and combination of chromosomes, which is crucial for the genetic stability and viability of the offspring (Zielinski and Mittelsten Scheid 2012; Soares et al. 2021). In this context, we typically expect gametes to contain half the ploidy level of the total chromosome set. As sweetpotato is an autohexaploid with a basic number of 15 chromosomes (2n = 6x = 90), a balanced gamete would ideally contain three copies of each of the 15 homology groups, resulting in a total of 45 chromosomes (n = 3x = 45) (Fig. 5.2).

Fig. 5.2 Diagram showing a balanced sweetpotato gamete with three copies of each of the 15 chromosomes

Note that there are 20 possible ways to choose three homologs from a complete set of six, represented by the binomial coefficient \(\binom{6}{3} = 20\). When combining two gametes in a simple cross, a staggering 400 genotypes can emerge in a single generation for a specific genome position. This number is 100 times greater than the four genotypes expected in a diploid cross, even though the ploidy level is only three times higher. This example underscores the rapid increase in genetic complexity associated with higher ploidy levels in autopolyploids (Mollinari et al. 2020). Even though there are 400 possible genotypes for a single position in a hexaploid cross, modern molecular markers, such as single nucleotide polymorphisms (SNPs), predominantly yield biallelic data. As a consequence, depending on which specific homologs carry a given biallelic variant, several resulting genotypes are categorized into broader classes rather than identified individually. This grouping produces complex segregation patterns, presenting additional challenges in accurately interpreting the genetic behavior and inheritance in these species. Figure 5.3 compares diploid and hexaploid crosses when evaluated using two types of markers: a completely informative multiallelic marker and a biallelic marker. For the diploid case (A) with a biallelic marker, each homolog carries either variant A or variant a, and the four possible genotypes are reduced to three genotypic classes, because the two heterozygous combinations are merged into a single class and cannot be distinguished. In the hexaploid context (B) with a biallelic marker, each parent carries three homologs with variant A and three with variant a (triple-dose or triplex marker), and the 400 possible genotypes are merged into seven genotypic classes where genotypes within each class are not distinguishable, which represents a drastic reduction in the informativeness of the marker. For the diploid cross, the expected segregation ratio is the well-known 1:2:1. Conversely, in the hexaploid cross, the segregation ratio becomes a more complex 1:18:99:164:99:18:1, reflecting the increased genetic complexity inherent in polyploid inheritance.
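The segregation ratios quoted above can be reproduced by counting gametes. The sketch below assumes random bivalent pairing without double reduction, in which case the number of gametes carrying a given dosage is a product of binomial coefficients and the offspring classes follow from combining the gametes of both parents. The function names are hypothetical and chosen only for this illustration.

```python
from functools import reduce
from math import comb, gcd


def gamete_counts(dosage: int, ploidy: int) -> list[int]:
    """Number of distinct gametes carrying k copies of allele A (k = 0..ploidy/2)."""
    half = ploidy // 2
    return [comb(dosage, k) * comb(ploidy - dosage, half - k)
            for k in range(half + 1)]


def segregation_ratio(dosage_p1: int, dosage_p2: int, ploidy: int) -> list[int]:
    """Expected offspring ratio across dosage classes 0..ploidy for a biallelic marker."""
    g1 = gamete_counts(dosage_p1, ploidy)
    g2 = gamete_counts(dosage_p2, ploidy)
    counts = [0] * (ploidy + 1)
    # Every pair of gametes (one per parent) contributes to the class i + j.
    for i, a in enumerate(g1):
        for j, b in enumerate(g2):
            counts[i + j] += a * b
    divisor = reduce(gcd, counts)  # reduce the ratio to its simplest integer form
    return [c // divisor for c in counts]


if __name__ == "__main__":
    print(comb(6, 3), comb(6, 3) ** 2)   # gametes per parent, genotypes per cross
    print(segregation_ratio(1, 1, 2))    # diploid Aa x Aa
    print(segregation_ratio(3, 3, 6))    # hexaploid triplex x triplex
```

The same function can be used for other dosage combinations; for example, a simplex marker present in only one hexaploid parent, `segregation_ratio(1, 0, 6)`, reduces to two equally frequent classes.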

Fig. 5.3 Segregation patterns considering completely informative markers and biallelic markers. (a) In a diploid bi-parental cross, there are four possible genotypes, which are combined into three classes with proportions 1:2:1 when using biallelic markers. (b) In the hexaploid sweetpotato, 400 possible genotypes are combined into seven classes when assessed with biallelic markers that are triplex in both parents. In this case, the expected segregation is 1:18:99:164:99:18:1

Given this genetic complexity, numerous challenges can arise in dealing with polyploid species. Besides the inherent genetic intricacies, most polyploid species exhibit some level of incompatibility, often imposing constraints on their breeding or mating strategies, as well as on the types of populations that can be created and obtained (Gallais 2003). This aspect is particularly critical for genetic studies, such as linkage and QTL mapping, where specific experimental designs and known genetic segregation patterns are required in the target population. With genetic incompatibilities present, individuals tend to exhibit a high degree of heterozygosity, leading to extensive segregation in the progenies of any viable cross. Consequently, appropriate genetic designs are essential when studying the genetic behavior of polyploid species.

5.4 Assessing Genotypes in Polyploids

From the 1910s through the 1960s, several studies laid the theoretical groundwork for understanding inheritance and genetic linkage analysis in polyploid organisms, with significant contributions from researchers such as Muller (1914), Haldane (1930), De Winton and Haldane (1931), Mather (1935, 1936), Wright (1938), Fisher (1943, 1947), and Elandt-Johnson (1967). The pioneering concepts of employing molecular markers in autopolyploids were introduced in the early 1990s by Sorrells (1992). Wu et al. (1992) suggested using restriction fragment length polymorphism (RFLP) markers present on single homologous chromosomes, referred to as single-dose or simplex markers. These markers, whose name reflects the concept of dosage in polyploids, i.e., the number of copies of a particular allele, segregate in a 1:1 ratio in gametes. Therefore, when a simplex marker is unique to one parent, the resulting observed genotypes will also follow a 1:1 segregation pattern. In polyploid crosses, simplex markers are analogous to heterozygous markers in diploid crosses. This similarity allows the use of standard diploid mapping software and methodologies for linkage analysis and genetic map construction in polyploids. The linkage phase between markers can be deduced by evaluating the likelihood of competing models for restriction fragments found on the same (in coupling) or different (in repulsion) homologs. Several genetic maps of sweetpotato have been developed employing simplex markers as a framework (Ukoskit and Thompson 1997; Kriegner et al. 2003; Cervantes-Flores et al. 2008; Ai-xian et al. 2010; Zhao et al. 2013; Monden and Tahara 2017; Shirasawa et al. 2017).

While simplex markers are a useful approach for navigating the complexities of map construction in polyploid genomes, they tend to oversimplify and constrict our understanding of the inheritance process within such genomes. For example, simplex markers facilitate the creation of maps based on individual homologs but do not account for their interactions within their respective homology groups. On the other hand, multi-dose (or multiplex) markers allow the integration of multiple homologs from the same homology group into a comprehensive genetic map. This approach has roots in practical research, as seen in the morphological marker studies by Lawrence (1929) and Fisher and Mather (1940), and gained momentum with its application in sugarcane by Da Silva (1993a, b). Nevertheless, the early molecular marker technologies, such as Random Amplified Polymorphic DNA (RAPD) and Amplified Fragment Length Polymorphism (AFLP), were limited by their density and dominant nature. This scarcity of multiplex markers led to a constrained integration of homolog maps. For example, if we consider a triplex dominant marker in Fig. 5.3, the complex segregation ratio of 1:18:99:164:99:18:1 is condensed to a simplified ratio of 1:399 that only differentiates the aaaaaa genotype from the remaining genotypes, which are combined in a single genotypic class (Haldane 1930; Ripol et al. 1999).

Modern sequencing technologies have advanced rapidly in recent years, leading to a massive increase in genomic data available to researchers. The ability to generate large amounts of sequencing data quickly and cost-effectively has revolutionized the genotyping of polyploid species. While traditional genotyping methods rely on techniques of limited scalability, modern sequencing technologies have greatly enhanced the ability to identify genetic variations quickly and accurately in large populations. These modern techniques allow the abundance of reference and alternate alleles to be assessed, and proper methods for genotype calling are necessary for their use in downstream analyses, especially for outcrossing and polyploid species where heterozygosity plays an important role. Correct genotype calling in polyploid genomes is fundamental to constructing accurate and meaningful genetic maps.

Several methods have been proposed and implemented for genotype calling in polyploids. FitTetra and its improved version, FitPoly (Voorrips et al. 2011), implement a classification mixture model weighted by expected frequencies of the genotypic classes in the population. This procedure was implemented to call SNPs in array data, such as Affymetrix Axiom® and Illumina GoldenGate® assays. In these data, the alleles are detected from the fluorescence of two probes measured with a laser scanning confocal microscope. The readings for the two channels provide a set of ordered pairs of allele intensity for each individual. FitPoly fits a mixture model where parameters are estimated separately for each slice of the population analyzed. As an output, the software provides the probability distribution of the dosage-based genotypic classes for the individuals in the population.

SuperMASSA (Serang et al. 2012) was primarily developed to handle mass-spectrometry-based SNPs. In this case, a Matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) spectrometer measures the time of flight of the two alleles, each harboring a different size flanking sequence. Sequences with different masses will arrive at different times at the mass spectrometer detection plate, and their abundances are recorded, generating an ordered pair for each SNP. A sequence of ordered pairs is generated in a population of individuals, which are classified in terms of their dosage using a Bayesian network. It also uses the ratio of the two allele channels weighted by the expected genotypic frequencies in the studied population. It implements frequencies based on a Mendelian F1 segregation model and a Hardy–Weinberg equilibrium model. Although SuperMASSA was developed to deal with mass spectrometry data, it has been successfully applied in several studies with different types of genotypic data, including sweetpotato populations that were genotyped with SNPs via Genotyping-by-sequencing (GBS) (Mollinari et al. 2020; Oloka et al. 2021). More recently, Gerard et al. (2018) developed a method to deal with data sets generated by GBS methods and implemented it in the R package updog. The package is designed to handle data overdispersion, sequence errors, and genotype biases, elements that are often found in sequence-based genotyping methods. Also, it implements several population models to deal with different assumptions and genotype frequencies. Other methods were proposed, including PolyRAD (Clark et al. 2019) and ClusterCall (Schmitz Carley et al. 2017), but their use is not documented in sweetpotato.
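The common idea behind these callers, combining the evidence in the allele signal with the genotype frequencies expected in the population, can be illustrated with a deliberately simplified sketch. The code below is not the algorithm of FitPoly, SuperMASSA, or updog: it assumes sequencing read counts, a plain binomial likelihood with a single hypothetical error rate (`seq_error`), no overdispersion or allelic bias, and an F1 prior derived from assumed parental dosages under random bivalent pairing. All function and parameter names are illustrative.

```python
from math import comb


def f1_dosage_prior(dosage_p1: int, dosage_p2: int, ploidy: int) -> list[float]:
    """Expected frequency of each offspring dosage class in an F1 population.

    Assumes random bivalent pairing without double reduction.
    """
    half = ploidy // 2

    def gametes(dosage: int) -> list[int]:
        return [comb(dosage, k) * comb(ploidy - dosage, half - k)
                for k in range(half + 1)]

    g1, g2 = gametes(dosage_p1), gametes(dosage_p2)
    counts = [0] * (ploidy + 1)
    for i, a in enumerate(g1):
        for j, b in enumerate(g2):
            counts[i + j] += a * b
    total = sum(counts)
    return [c / total for c in counts]


def call_dosage(ref_reads: int, alt_reads: int, ploidy: int,
                prior: list[float], seq_error: float = 0.01):
    """Return the most probable alternate-allele dosage and the posterior distribution."""
    n = ref_reads + alt_reads
    weights = []
    for dosage in range(ploidy + 1):
        frac = dosage / ploidy
        # Probability of observing an alternate read, allowing for miscalled bases.
        p_alt = frac * (1.0 - seq_error) + (1.0 - frac) * seq_error
        # The binomial coefficient is the same for every dosage, so it is omitted.
        likelihood = p_alt ** alt_reads * (1.0 - p_alt) ** (n - alt_reads)
        weights.append(prior[dosage] * likelihood)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("read counts are incompatible with the prior")
    posterior = [w / total for w in weights]
    best = max(range(ploidy + 1), key=posterior.__getitem__)
    return best, posterior


if __name__ == "__main__":
    prior = f1_dosage_prior(3, 3, ploidy=6)
    print(call_dosage(ref_reads=50, alt_reads=50, ploidy=6, prior=prior)[0])
```

In practice, the dedicated packages estimate additional parameters (such as bias and overdispersion) from the whole population, which is why they are preferred over a per-sample calculation like this one.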

All methods presented in this section provide a dosage-based genotype calling framework for polyploids. For the purpose of genetic mapping, they are often used in populations derived from biparental or interconnected crosses. The next step is to find the relationship between these markers, i.e., linkage analysis. Although this is a well-established procedure in diploids, especially in experimental populations (Lincoln et al. 1992; Stam 1993; Margarido et al. 2007), it was not until the recent advancement of high-throughput genomic methods that the full spectrum of genotypes could be incorporated into polyploid linkage analysis. In the next sections, we explore the methodologies for building genetic linkage maps using the different types of molecular data. We first present an overview of the initial strategy of creating separate maps for each parent and then trace how this approach has informed current methods, preparing us to tackle the unique complexities of polyploidy in genetic mapping studies.

5.5 Initial Polyploid Maps Focusing on Individual Parents

Given the challenges and complexities involved in polyploid genetics, the construction of the first linkage maps in polyploid species started by using methods initially developed for diploid species. Since obtaining inbred-based populations in polyploids is very difficult, several studies have been based on methods that were developed for genetic linkage analysis in outcrossing bi-parental populations, often treating one or both parents as diploids (Grattapaglia and Sederoff 1994; Wu et al. 1992). These methods involve a similar strategy, where simpler segregation cases are used to detect the linkage between markers and estimate the recombination fractions and linkage phase configurations, thus enabling genetic studies in complex and highly heterozygous organisms.

Wu et al. (1992) proposed a polyploid mapping method using single-dose restriction fragments (SDRFs), now commonly called simplex markers, which segregate in a 1:1 ratio in heterozygous plants. They demonstrated this method with hypothetical allopolyploid and autopolyploid species across different ploidy levels to identify SDRFs, detect linkages, and determine genome constitution. The study suggested a minimum population size of 75 to confidently identify simplex markers and detect linkage in coupling phase for both allopolyploids and autopolyploids. However, it noted that meaningful linkages in repulsion were less practical for autopolyploids due to the need for larger populations. Furthermore, the study indicated that the ratio of repulsion to coupling linkages could serve as an indicator of preferential chromosome pairing, which helps differentiate allopolyploids from autopolyploids.
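A common way to screen candidate simplex markers is a goodness-of-fit test of the observed presence/absence counts against the expected 1:1 ratio. The sketch below uses a standard Pearson chi-square test with one degree of freedom; it is a generic illustration rather than the exact criterion used by Wu et al. (1992), and the function names and the 5% significance level are assumptions made here.

```python
# Critical value of the chi-square distribution with 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_ALPHA_05 = 3.841


def chi_square_one_to_one(n_present: int, n_absent: int) -> float:
    """Pearson chi-square statistic for a 1:1 (presence:absence) segregation."""
    total = n_present + n_absent
    if total == 0:
        raise ValueError("no genotyped offspring")
    expected = total / 2.0
    return ((n_present - expected) ** 2 + (n_absent - expected) ** 2) / expected


def is_consistent_with_simplex(n_present: int, n_absent: int) -> bool:
    """True when the counts do not deviate significantly from 1:1."""
    return chi_square_one_to_one(n_present, n_absent) <= CHI2_CRITICAL_DF1_ALPHA_05


if __name__ == "__main__":
    print(is_consistent_with_simplex(40, 35))  # close to 1:1 in 75 offspring
    print(is_consistent_with_simplex(60, 15))  # strong excess of the present class
```

With many markers tested at once, a correction for multiple testing would normally be applied; it is left out here to keep the example short.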

Furthering this approach, the two-way pseudo-testcross strategy was initially devised to analyze linkage in diploid outcrossing species with unknown parental phases (Grattapaglia and Sederoff 1994). Due to its similarities with the approach by Wu et al. (1992), this method has been widely applied to create separate genetic maps for each parent in polyploid species. This method utilizes markers that follow a known Mendelian segregation pattern in the progeny, similar to a test-cross involving one genetically informative parent. For example, consider a diploid biparental cross with a biallelic marker A, where one parent is heterozygous (Aa), and the other is homozygous (aa). In this case, the progeny will show a Mendelian 1:1 segregation of Aa and aa genotypes, given that only the heterozygous parent produces gametes with different alleles. Extending this to two unlinked biallelic markers, A and B, with the first parent heterozygous for both (Aa, Bb) and the second parent homozygous (aa, bb), the progeny will segregate into four genotypic combinations (AaBb, Aabb, aaBb, aabb) in a 1:1:1:1 ratio, according to Mendelian inheritance. If markers A and B are linked, this ratio will alter in proportion to the genetic distance between them. Because only the heterozygous parent provides informative genetic variation, it's the linkage phase configuration of this parent that remains to be estimated. This approach simplifies the process of determining the recombination fraction across all potential linkage phase configurations, whether in coupling or repulsion, and ascertains the most probable configuration.
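For the diploid two-marker example above, the estimation can be sketched as follows. The code takes the counts of the four progeny classes, evaluates the recombination fraction under coupling and under repulsion, keeps the phase with the higher likelihood (equivalently, the one whose estimate does not exceed 0.5), and reports a LOD score against free recombination. The function name and return format are hypothetical, and the sketch covers only the diploid pseudo-testcross configuration described in the text.

```python
import math


def _count_times_log10(count: int, prob: float) -> float:
    """count * log10(prob), with the convention 0 * log10(0) = 0."""
    return 0.0 if count == 0 else count * math.log10(prob)


def two_point_pseudo_testcross(n_AaBb: int, n_Aabb: int, n_aaBb: int, n_aabb: int):
    """Estimate linkage phase, recombination fraction, and LOD for two testcross markers.

    The heterozygous parent is Aa,Bb and the other parent is aa,bb. In coupling
    (AB/ab), the classes Aabb and aaBb are recombinant; in repulsion (Ab/aB),
    the classes AaBb and aabb are recombinant.
    """
    n = n_AaBb + n_Aabb + n_aaBb + n_aabb
    if n == 0:
        raise ValueError("no genotyped offspring")
    recombinants_if_coupling = n_Aabb + n_aaBb
    if recombinants_if_coupling / n <= 0.5:
        phase, n_rec = "coupling", recombinants_if_coupling
    else:
        phase, n_rec = "repulsion", n_AaBb + n_aabb
    r = n_rec / n
    # LOD: log10 likelihood at the estimate minus log10 likelihood at r = 0.5.
    lod = (_count_times_log10(n_rec, r)
           + _count_times_log10(n - n_rec, 1.0 - r)
           - n * math.log10(0.5))
    return phase, r, lod


if __name__ == "__main__":
    print(two_point_pseudo_testcross(45, 5, 5, 45))
    print(two_point_pseudo_testcross(5, 45, 45, 5))
```

Both example calls give the same recombination fraction (0.1) and the same LOD score; only the inferred phase differs, which reflects the symmetry between coupling and repulsion in the diploid case.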

Beyond their simplicity, a key advantage of using these methods is that the estimators for the recombination fraction are the same for both auto- and allopolyploids when using single-dose markers in coupling phase, thus accommodating map construction for any ploidy level, including those with intermediate levels of preferential pairing. These methods have seen widespread application in the literature due to their versatility (Porceddu et al. 2002; Chakraborty et al. 2005; Cai et al. 2014; Yang et al. 2016; Vigna et al. 2016). While the approach by Wu et al. (1992) provides a means to infer the prevalent type of polyploidy, it lacks the more rigorous statistical analysis found in recent advancements such as those by Mollinari et al. (2020) or Bourke et al. (2021).

Although using simplex markers for constructing genetic maps in polyploids offers practical benefits and compatibility with diploid mapping software, it introduces several drawbacks. A significant limitation is the production of separate genetic maps for each parent rather than a single unified map. This occurs because markers segregating in a 1:1 ratio capture recombination events only in the parent they inform about, making the other parent's contribution to those loci invisible. Additionally, this strategy constructs maps on a per-homolog basis rather than per-homology group. This limitation arises because simplex markers located in different homologs (in repulsion) provide minimal information for estimating recombination fractions between them, with feasible estimations only achievable within the same homolog. The lack of information for simplex markers in repulsion becomes more pronounced in higher ploidy levels, which is the case of sweetpotato.

Furthermore, constructing genetic maps for polyploids on a per-homolog basis does not adequately capture the essence of trait expression in those species, as the alleles controlling specific traits are located at analogous positions across all homologs within homology groups. It is, therefore, crucial to treat homologs as part of a homology group rather than as separate entities. Additionally, building a comprehensive map using these strategies would require a density of simplex markers sufficient to ensure complete genome coverage of all homologs for both parents. However, this is not always feasible, sometimes due to limitations of the genotyping platform used and sometimes because of the biological characteristics of the species. The absence of simplex markers observed in chromosome 11 in the landrace Tanzania (Mollinari et al. 2020), probably caused by a recent nondisjunction of sister chromatids in meiosis II in one of Tanzania's parents, is a good example of a biological characteristic that limits this approach. Such biological constraints highlight the need for alternative mapping strategies that can accommodate the complexities of polyploid genomes and provide a more accurate portrayal of their genetic architecture.

5.6 Parental Integration and Dosage-Based Maps

Despite the substantial theoretical progress in polyploid linkage analysis made from the 1920s to the 1960s, the construction of genetic linkage maps well into the late 1990s continued to rely heavily on simplex markers. Early efforts to integrate multi-dose or multiplex markers into mapping studies of crops like sugarcane and alfalfa highlighted the promise of these methodologies. However, these initial endeavors were hindered by the technological limitations of the time (da Silva 1993a, b; Yu and Pauls 1993; Da Silva et al. 1995; Guimarães et al. 1997). Regardless of these limitations, Ripol et al. (1999) laid foundational work by dissecting the use of multiplex markers in genetic mapping for autopolyploids across arbitrary ploidy levels, although initially focusing on markers informative in just one parent. To expand the use of informative markers to both parents, Hackett et al. (1998) conducted a simulation study establishing formulae to compute the recombination fraction, standard error, and test power for all possible combinations of simplex and duplex markers in tetraploid species, marking a significant step toward comprehensive parental integration and dosage-based mapping.

Following these initial works, Luo et al. (2001) presented the use of dominant and codominant markers scored in an autotetraploid population for genetic map construction, while Hackett et al. (2003) laid out marker ordering procedures. These efforts culminated in the development of Tetraploidmap, the first linkage mapping software developed specifically for autotetraploid species (Hackett 2003). Additionally, a series of studies aimed at modeling the complexities of polyploid genetics, such as double reduction, multivalent formation, and preferential pairing, were presented (Wu et al. 2001a, b; Wu and Ma 2005), providing a deeper understanding of polyploid inheritance.

The development of multilocus maps in autotetraploids represented a pivotal leap forward in polyploid genetic mapping. Luo et al. (2004) laid a comprehensive theoretical groundwork for linkage analysis in autotetraploids, which was further advanced by Leach et al. (2010) through the proposal of a definitive multilocus tetrasomic linkage analysis using Hidden Markov Models (HMM). The significance of multilocus analysis in enhancing our understanding of polyploid genetics and its broader implications will be further explored in this chapter.

The advent of high throughput technologies marked a significant evolution in genetic mapping, enabling the detailed assessment of allelic variants through SNP dosage information. This breakthrough was exemplified in the work of Hackett et al. (2013), who, leveraging the capabilities of the Infinium 8303 potato SNP array (Felcher et al. 2012), advanced the methodologies initially proposed by Luo et al. (2001). They constructed a comprehensive genetic linkage map for a biparental autotetraploid potato population encompassing 190 individuals, utilizing SNP dosage data to map a total of 3839 markers. This map was then used in a QTL mapping study for several important traits in potato (Hackett et al. 2014). These endeavors led to the development of an enhanced version of Tetraploidmap named TetraploidSNPMap (Hackett et al. 2017; https://www.bioss.ac.uk/knowledge-exchange/software/TetraploidSNPMap). As the number of markers increased, there arose a need for faster and more accurate ordering algorithms. The introduction of the multidimensional scaling (MDS) algorithm for genetic mapping, along with the fitting of a principal curve to assess optimum marker order within a linkage group (Preedy and Hackett 2016), significantly propelled the advancement of genetic mapping in polyploids. This methodology facilitated the ordering and estimation of maps for thousands of markers, culminating in the creation of the R package MDSMap (https://cran.r-project.org/package=MDSMap).
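The idea behind MDS-based ordering can be illustrated with a deliberately simplified sketch. It is not the algorithm of Preedy and Hackett (2016), which weights marker pairs by their linkage information and fits a principal curve; here, as a stated simplification, unweighted classical MDS is applied to Haldane distances and markers are ordered along the first axis only. The function names and the toy marker positions are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def haldane_distance(r):
    """Convert recombination fractions (r < 0.5) to Haldane distances (Morgans)."""
    return -0.5 * np.log(1.0 - 2.0 * r)


def mds_order(rec_frac):
    """Order markers by the first axis of a classical MDS of map distances."""
    d = haldane_distance(np.asarray(rec_frac, dtype=float))
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering  # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(gram)         # eigenvalues in ascending order
    coord = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
    return np.argsort(coord)


# Toy example: five markers with known positions, presented in shuffled order.
true_pos = np.array([0.00, 0.05, 0.12, 0.20, 0.31])  # Morgans
shuffled = true_pos[[3, 0, 4, 1, 2]]
gaps = np.abs(shuffled[:, None] - shuffled[None, :])
rec = 0.5 * (1.0 - np.exp(-2.0 * gaps))  # inverse Haldane map function

order = mds_order(rec)
print(shuffled[order])  # positions come out sorted, possibly in reverse
```

Because the orientation of a linkage group is arbitrary, the recovered order may be the reverse of the true one. With real data the pairwise estimates are noisy and unequally informative, which is the motivation for the weighting and principal-curve fitting used in the published method.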

Through the application of mapping methodologies developed specifically for polyploids, studies have elucidated the genetic mechanisms underlying complex phenomena such as double reduction in potatoes (Bourke et al. 2015), preferential pairing in roses (Bourke et al. 2017), and hexasomic inheritance in chrysanthemums (van Geest et al. 2017b). The latter study utilized a biparental population consisting of 405 offspring individuals, enabling the detailed examination of genetic inheritance patterns. Using the same population, van Geest et al. (2017a) published the first integrated genetic map for an autohexaploid species. This map featured 30,312 segregating SNPs, covered 9 homology groups, and successfully identified 107 out of the 108 expected homologs. Further refining the methodologies employed in this study, Bourke et al. (2018a) released the R package polymapR (https://cran.r-project.org/package=polymapR), which was designed to create linkage maps using dosage-based markers in outcrossing diploid, autotriploid, autotetraploid, and autohexaploid species, as well as segmental allotetraploids.

The advancements in genetic mapping for polyploid species, notably facilitated by software such as TetraploidSNPMap and polymapR, marked a significant leap forward for the polyploid research community. However, these tools initially lacked the capability for multipoint mapping estimation for high ploidy levels, specifically hexaploids, a gap that was addressed by the introduction of MAPpoly (Mollinari and Garcia 2019, https://cran.r-project.org/package=mappoly). This R package, specifically designed for constructing genetic maps in both diploids and autopolyploids across even ploidy levels up to 8 using HMMs and up to 12 through two-point simplification, represented a crucial advancement in the field. The implementation of MAPpoly facilitated the development of the first integrated multilocus genetic map in sweetpotato, which will be presented in the next section. This development not only highlighted the continuous innovation within polyploid genetics, but also set the stage for the detailed exploration and understanding of complex genetic architectures in such species.

5.7 Linkage Analysis in the Cultivated Sweetpotato

Genetic mapping in sweetpotato spans decades, evolving through various phases of scientific discovery and technological advancement. In this section, we explore the early efforts and the progressive development of marker technologies that have shaped our understanding of genetic mapping in sweetpotato.

5.7.1 Historical Context of Genetic Mapping in Sweetpotato

In the early 1990s, Hong and Thompson (1994) conducted a preliminary analysis on four biparental crosses using RAPD markers to examine the genetic structure of parental and offspring individuals. They identified several primers that generated polymorphic bands, indicating linkage among them, and confirming their segregation according to the Mendelian ratios expected for a hexaploid organism. This early work demonstrated the feasibility of constructing a genetic linkage map for sweetpotato. Building on this foundation, Thompson et al. (1997) further analyzed 100 RAPD markers in two biparental populations, finding 74 markers that segregated at a 1:1 ratio and identifying five linked pairs of markers. This effort culminated in the first genetic linkage map for the hexaploid sweetpotato, featuring 188 RAPD markers across 18 and 16 linkage groups for each parent, respectively, with estimated total map lengths ranging from 173.1 to 265.4 cM (Ukoskit and Thompson 1997). Their method, which assessed the type of polyploidy in sweetpotato through the analysis of simplex versus non-simplex markers and the interaction phases between simplex markers, supported the autohexaploid model of sweetpotato, a conclusion corroborated by contemporary studies.

Subsequent work by Kriegner et al. (2003) employed AFLP markers within a biparental population to develop a genetic linkage map, utilizing the two-way pseudo-testcross strategy as described by Wu et al. (1992) and Grattapaglia and Sederoff (1994). The resulting map spanned 3655.6 cM and 3011.5 cM for the female and male parents, respectively, organized into 90 and 80 linkage groups. These were further aligned into 15 homologous linkage groups using multiplex markers. The authors analyzed the simplex versus multiplex marker ratio alongside the ratio of coupling versus repulsion linkage phases to determine the polyploidy type in sweetpotato. Their findings indicated a predominant polysomic inheritance of homologous chromosomes, once again affirming the autohexaploid nature of sweetpotato, though they noted a minor degree of preferential chromosome pairing. Additionally, the linkage map for the female parent was later detailed and utilized in research mapping resistance to virus diseases in sweetpotato (Mwanga et al. 2002), showcasing the map's practical application in addressing specific agricultural challenges.

Using another mapping population composed of 240 individuals of a cross between the cream-fleshed African landrace ‘Tanzania’ and the US orange-fleshed cultivar ‘Beauregard’, Cervantes-Flores et al. (2008) made a significant contribution by constructing a detailed genetic linkage map using 3695 AFLP markers, which spanned 5792 cM and 5276 cM for female and male genomes, respectively, across 86 and 90 linkage groups. Their findings pointed to the presence of distorted segregation among markers of different dosages, suggesting some level of preferential pairing. However, based on the observed segregation ratios and the proportion of simplex to multiplex markers, they concluded that strict allopolyploidy could be ruled out, suggesting a complex inheritance pattern more aligned with an autohexaploid organism. In parallel, other studies also contributed to the genetic mapping of sweetpotato. Chang et al. (2009) utilized 37 and 47 inter simple sequence repeat (ISSR) markers to represent 479.8 and 853.5 cM of each parent genome, and Li et al. (2010) utilized 801 sequence-related amplified polymorphism (SRAP) markers that showed polymorphisms in the 240 individuals of the progeny to represent 81 and 66 linkage groups with total lengths of 5802.46 and 3967.90 cM for each parent, respectively.

Zhao et al. (2013) published what was then the densest genetic linkage map of the sweetpotato genome, utilizing 4031 AFLP and SSR markers. These markers were distributed across 90 linkage groups for both parents, covering lengths of 8184.5 and 8151.7 cM, respectively. This achievement marked the first time a genetic linkage map covered all linkage groups for both parents in sweetpotato, representing a significant advancement for genetic research in this crop. Two years later, Monden et al. (2015) developed a genetic linkage map using 98 progeny individuals and 246 retrotransposon insertion polymorphism (RIP) markers. They successfully reconstructed 43 and 47 linkage groups for each parent, covering 931.5 cM and 734.3 cM of the parental genomes, respectively. This work demonstrated the usefulness and efficiency of retrotransposon-based molecular markers in constructing a genetic map for a polyploid species, highlighting the evolving landscape of genomic tools in sweetpotato research.

The first genetic linkage map that utilized data from a high throughput genotyping platform in sweetpotato was published by Shirasawa et al. (2017). This linkage map featured 28,087 markers obtained with the double-digest RAD-Seq (ddRAD-Seq) technology that spanned all 90 sweetpotato linkage groups, along with an additional six groups. These were subsequently consolidated into 15 homology groups, covering 33,020.4 cM of the sweetpotato genome. Notably, this map was also the first for sweetpotato to be constructed with the aid of a reference genome, utilizing the wild diploid relative Ipomoea trifida as a basis for anchoring and calling variants. It was also the first sweetpotato linkage map developed jointly for both parents. It remains the only map based on a population derived from a self-fertilizing individual, breaking a biological barrier and setting a precedent for future genetic research in sweetpotato.

Mollinari et al. (2020) published the most comprehensive genetic linkage map for sweetpotato to date, using a biparental population of 315 individuals from a cross between ‘Beauregard’ and ‘Tanzania’ (BT), the same parents studied by Cervantes-Flores et al. (2008), but in a reciprocal cross. This ultra-dense map featured 30,684 SNP markers distributed across 15 linkage groups, spanning 2708.36 cM of the sweetpotato genome. It was the first genetic linkage map constructed for both parents simultaneously, accounting for the hexaploid nature and multiple dosages of sweetpotato through a refined analytical approach, detailed by Mollinari and Garcia (2019). The process started with pairs of markers and progressively incorporated them into an HMM framework to re-estimate recombination fractions and linkage phases. The study revealed that homolog pairing predominantly occurs randomly, supporting sweetpotato's classification as an autohexaploid. Analysis of meiotic configurations revealed that most gametes formed through pairing and recombination between two homologs, with a smaller percentage showing configurations that involved three to six homologs, indicative of multivalent formations. This pattern was consistent across both ‘Beauregard’ and ‘Tanzania’ parents, with a noted correlation between multivalent formations and the length of linkage groups. An interactive version of the BT map can be accessed at https://gt4sp-genetic-map.shinyapps.io/bt_map/. Figure 5.4 features a screenshot from a web-based application that illustrates the meiotic processes implicated in forming homology group 1 for individual BT05.221. The application allows the exploration of any homology group across different offspring individuals, facilitating a comprehensive understanding of their gamete formation.

Fig. 5.4

Interactive display of haplotype estimation and multivalent evidence in gamete formation analysis of homology group 1 for individual BT05.221. The color-coded probability profiles in the figure showcase homologs a-f for ‘Beauregard’ and g-l for ‘Tanzania’. Arrows within these profiles mark crossing-over points, with their positions presented in the accompanying table. The network diagram employs the same colors to illustrate recombination chains; for instance, homologs c and d are connected by two edges, indicating a double crossover, whereas homologs i, j, h, and g form a recombination chain, revealing multivalent pairing evidence. The left panel provides user-controlled parameters for detecting these genetic events, with the methodology and analytical detail explained in Mollinari et al. (2020). This tool and its capabilities for exploring genetic linkages in BT offspring can be accessed at https://gt4sp-genetic-map.shinyapps.io/offspring_haplotype_BT_population/

Oloka et al. (2021) reported another sweetpotato map based on the same cross used by Cervantes-Flores et al. (2008) (‘Tanzania’ × ‘Beauregard’), but genotyped with a high-throughput GBS platform and analyzed with a methodology similar to that employed by Mollinari et al. (2020). The authors assembled a dense, high-quality genetic linkage map that comprised 14,813 markers, covering 2120.5 cM of the sweetpotato genome, and representing all 15 linkage groups. This linkage map was slightly shorter than the one reported by Mollinari et al. (2020), with a smaller progeny size and approximately half the number of markers in the final linkage map. As marker technologies have become more affordable and accessible, there has been a significant surge in studies incorporating genomic information in sweetpotato research. Kim et al. (2017) and Li et al. (2018) developed maps using EST-SSR and SRAP markers, respectively. Sasai et al. (2019) used retrotransposons, SSR, and SNP markers for their linkage map reconstruction. Ma et al. (2020) constructed the first genetic linkage map for purple sweetpotato using SSR markers, followed by Haque et al. (2020), who employed ddRAD-Seq markers. Meng et al. (2021) created the densest SSR-based linkage map to date for sweetpotato, while Yan et al. (2022) developed the second map for purple sweetpotato, this time utilizing SNP-based markers. Most recently, Zheng et al. (2023) achieved the longest coverage of the sweetpotato genome, with 10,146 SSR markers covering over 18,000 cM. A comprehensive list of all published sweetpotato genetic linkage maps is available in Table 5.1.

Table 5.1 Sweetpotato genetic maps published to date

Several key factors have contributed to the increased density and quality of these recent genetic linkage maps for sweetpotato. These include the reduced cost of genotyping technologies, a surge in multi-omics research efforts, and the availability of tools and resources that are appropriate for polyploid genotyping and data analysis. Such resources include sequencing protocols tailored for polyploids (Wadl et al. 2018; Mollinari et al. 2020), high-quality reference genomes (Wu et al. 2018), and advanced statistical genetics methods for conducting accurate linkage analysis in polyploids (Bourke et al. 2018a, b; Mollinari and Garcia 2019; Zheng et al. 2021).

5.7.2 Advanced Genetic Mapping Techniques for Sweetpotato

5.7.2.1 Multilocus Analysis: Importance and Methods

As presented so far, linkage analysis has been used to study patterns of inheritance, meiotic landscapes (Bourke et al. 2015), and to provide a foundational framework for subsequent genetic and genomic analyses, including QTL mapping (Doerge 2002), evolutionary studies (Ahn and Tanksley 1993; Huang and Rieseberg 2020), and the assembly of genomes (Lewin et al. 2009).

A significant challenge encountered in genetic mapping is the issue of missing data, which pertains not only to data failing to meet quality control thresholds, but also to data that inherently offer only partial genotype information of the loci under study. This challenge was notably prevalent in diploid mapping populations when using dominant markers, such as AFLPs and RAPDs. For instance, in an F2 cross, the anticipated 1:2:1 segregation ratio for AA:Aa:aa genotypes is effectively reduced to a 3:1 ratio due to these markers’ inability to distinguish between AA and Aa genotypes. Consequently, identifying recombination events with such markers necessitates resolving these combined classes through numerical algorithms like Expectation–Maximization (EM) (Dempster et al. 1977). This incomplete-data problem, long recognized in statistics, was explored in the context of linkage analysis in the influential work by Mather (1957). The impact of missing data becomes especially apparent in the context of linkage analysis between two positions in the genome. In such instances, the ability to detect recombination events is constrained by the information available from the experimental population concerning only those two specific loci. Typically referred to as two-point analysis, this approach involves examining all marker pairs across the genome to estimate the recombination fraction or genetic distance between them (Liu 1998).
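As an illustration of how EM resolves collapsed genotypic classes, the sketch below estimates the recombination fraction r between two dominant markers in a diploid F2. It rests on several assumptions that are ours rather than the chapter's: both markers are dominant, the parental phase is coupling, and recombination is equal in both meioses. Under these assumptions, with θ = (1 − r)², the four phenotypic classes A_B_, A_bb, aaB_, and aabb have probabilities (2 + θ)/4, (1 − θ)/4, (1 − θ)/4, and θ/4. The function name and the counts are illustrative.

```python
from math import sqrt


def em_f2_dominant_coupling(n_AB, n_Ab, n_aB, n_ab, tol=1e-12, max_iter=1000):
    """EM estimate of (r, theta) for two dominant markers in an F2 (coupling).

    theta = (1 - r)^2. The A_B_ class mixes a component of probability 1/2
    with a component of probability theta/4; only the latter is informative
    in the same direction as the aabb class.
    """
    theta = 0.5  # starting value
    for _ in range(max_iter):
        # E-step: expected count of the theta/4 component hidden in A_B_.
        hidden = n_AB * theta / (2.0 + theta)
        # M-step: binomial proportion using the completed counts.
        new_theta = (hidden + n_ab) / (hidden + n_Ab + n_aB + n_ab)
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return 1.0 - sqrt(theta), theta


# Illustrative counts for the classes A_B_, A_bb, aaB_, aabb.
r_hat, theta_hat = em_f2_dominant_coupling(125, 18, 20, 34)
print(round(theta_hat, 4), round(r_hat, 4))
```

Each iteration completes the data by apportioning the ambiguous A_B_ class according to the current estimate and then re-estimates θ from the completed counts, which is the same logic that multilocus methods apply on a larger scale.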

Another frequent issue in linkage analysis is the significant amount of noise observed in the dataset, which can propagate to downstream analyses. This noise may arise from duplications during the polymerase chain reaction (PCR), which lead to biases, and from sequencing errors; both result in inflated and incorrect linkage maps and have become more prevalent with the advent of high-throughput sequencing technologies (Taniguti et al. 2022). While these technologies have revolutionized genetic research by providing thousands of markers at a reasonable cost, they sometimes also introduce a lower signal-to-noise ratio. This decrease in data clarity can complicate analyses, making it challenging to distinguish between meaningful genetic signals and mere background noise. The problem is especially pronounced in polyploids, where many genotypes are possible and the genome's complexity adds layers of difficulty to the precise interpretation of data (Gemenet et al. 2020; Liao et al. 2021).

Addressing the challenges posed by noise and missing data in genetic datasets demands the adoption of a joint analysis of groups of genomic loci, a process that is fundamental to enhancing the clarity and utility of genetic information. This method, known as multilocus or multipoint analysis, capitalizes on the simultaneous use of information from multiple markers, facilitating information sharing among markers situated closely on the genome. This strategy has been a cornerstone of genetic linkage analysis since the early 1980s, marked by groundbreaking contributions from researchers like Thompson (1984), Lathrop and Lalouel (1984), and Lathrop et al. (1985), along with the seminal work by Lander and Green (1987). They introduced Hidden Markov Models for reconstructing genetic maps, offering a robust framework for addressing missing data issues, as thoroughly analyzed by Jiang and Zeng (1997) in various diploid experimental populations.

Multilocus methods have become increasingly important in polyploids. As illustrated in Fig. 5.3, the number of genotypes that can be generated from biparental crosses is staggering. A fully informative marker, i.e., a marker capable of distinguishing all possible genotypic classes, is practically nonexistent (Leach et al. 2010). Despite the widespread use of co-dominant markers such as SNPs, several genotypic classes are still collapsed into one class (Fig. 5.3). Therefore, applying multilocus analysis in polyploids becomes a crucial tool for overcoming the limitations of the less informative nature of commonly used markers. Furthermore, it is well-documented that as the ploidy level increases, differentiating between genuine genetic signals and background noise becomes more challenging (Liao et al. 2021), highlighting the critical need for advanced analysis techniques like multilocus analysis.
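The contrast between the number of possible genotypes and the number of classes a SNP can distinguish can be made concrete with a short count. The sketch assumes, for illustration, that all homologs of both parents are distinguishable and that double reduction does not occur; the function names are hypothetical.

```python
from math import comb


def n_gametes(ploidy: int) -> int:
    """Number of distinct homolog combinations in a gamete (no double reduction)."""
    return comb(ploidy, ploidy // 2)


def n_offspring_genotypes(ploidy: int) -> int:
    """Number of distinct offspring genotypes in a biparental cross."""
    return n_gametes(ploidy) ** 2


def n_dosage_classes(ploidy: int) -> int:
    """Maximum number of classes a biallelic SNP can distinguish (dosage 0..ploidy)."""
    return ploidy + 1


for m in (2, 4, 6):
    print(m, n_offspring_genotypes(m), n_dosage_classes(m))
```

For a hexaploid such as sweetpotato, this gives 20 × 20 = 400 possible offspring genotypes at a locus, whereas a biallelic SNP can reveal at most seven dosage classes, so many genotypes are necessarily collapsed into each observed class.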

In the realm of autotetraploid species, specifically designed and successfully applied multilocus methods have facilitated the construction of genetic maps, advancing our understanding of these organisms (Hackett 2001, 2003; Hackett et al. 2013; Leach et al. 2010; Xie and Xu 2000; Zheng et al. 2016, 2021). These studies not only highlight the utility of multilocus approaches in mapping autotetraploid species but also explore the complexities of tetrasomic inheritance, including the phenomenon of double reduction. To date, the only multilocus method available for constructing genetic maps in hexaploids was introduced by Mollinari and Garcia (2019). This method can theoretically handle a biparental cross between parents of any even ploidy level. In their work, the authors derived general equations to calculate recombination fractions between pairs of markers, considering all possible phase configurations. This two-point-based method was integrated with a general multipoint HMM approach and combined with a sequential algorithm to narrow down the search space for phase configurations. The R package MAPpoly incorporates the multipoint algorithm along with a suite of additional tools, offering a cutting-edge linkage analysis system specifically designed for outcrossing species with even ploidy levels varying from 2 to 8, when using the multilocus approach, and up to 12 when using two-point based algorithms.

A crucial benefit of the multilocus approach lies in its ability to utilize information propagated along the chromosome chain to refine loci genotypes, thereby rectifying potential dosage misclassifications and genotype inaccuracies. This refinement process, described in detail by Mollinari and Garcia (2019), leverages the analytical power of HMMs to update the posterior probabilities of genotypes. HMMs iteratively adjust these probabilities based on observed data and the sequence of events leading to them, thus effectively correcting genotype assignments. This correction mechanism was utilized in the genetic mapping efforts of Mollinari et al. (2020) and Oloka et al. (2021) and has been implemented in the genetic mapping software MAPpoly. Through this methodology, HMMs provide a robust framework for enhancing the accuracy of genetic maps, ensuring the integrity of the data analysis process.
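The way an HMM shares information among neighboring markers can be shown with a deliberately reduced example. The sketch below is not the model implemented in MAPpoly, whose state space covers all homolog combinations of both parents; as a stated simplification, it follows a single informative parent with two homologs, so the hidden state at each marker is simply which homolog was inherited (0 or 1). The function name, the global error rate, and the example data are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def posterior_states(obs, rec_fracs, error=0.05):
    """Forward-backward posterior of the inherited homolog at each marker.

    obs       : list of 0, 1, or None (missing call), in map order
    rec_fracs : recombination fractions between adjacent markers
    error     : assumed probability that an observed call is wrong
    """
    n = len(obs)
    # Emission probabilities; a missing call is equally compatible with both states.
    emit = np.ones((n, 2))
    for k, o in enumerate(obs):
        if o is not None:
            emit[k] = [1 - error, error] if o == 0 else [error, 1 - error]

    def transition(r):
        return np.array([[1 - r, r], [r, 1 - r]])

    fwd = np.zeros((n, 2))
    bwd = np.ones((n, 2))
    fwd[0] = 0.5 * emit[0]  # both homologs are equally likely a priori
    for k in range(1, n):
        fwd[k] = (fwd[k - 1] @ transition(rec_fracs[k - 1])) * emit[k]
    for k in range(n - 2, -1, -1):
        bwd[k] = transition(rec_fracs[k]) @ (emit[k + 1] * bwd[k + 1])

    post = fwd * bwd  # no rescaling: adequate only for short chains
    return post / post.sum(axis=1, keepdims=True)


# A missing call flanked by two consistent markers is imputed from its neighbors.
print(posterior_states([0, None, 0], [0.1, 0.1], error=0.0)[1])

# An isolated discordant call among tightly linked markers is down-weighted.
print(posterior_states([0, 0, 1, 0, 0], [0.01] * 4, error=0.05)[2])
```

In the second example, accepting the discordant call at face value would imply two recombination events within a very short interval, so the posterior favors a genotyping error instead. This is the intuition behind the genotype correction described above, applied here to a far smaller state space.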

5.7.2.2 High Throughput Data and Its Impacts in Sweetpotato Mapping

The advent of high-throughput sequencing technologies has fundamentally transformed genetic mapping in polyploid species, offering the capability to generate an unprecedented volume of molecular marker data. This surge in data availability has opened new avenues for genetic research, allowing for the detailed exploration of complex genomes that were once considered too challenging to analyze effectively. High-throughput technologies have facilitated the identification of a broad spectrum of genetic variations, providing a rich dataset for constructing more accurate and comprehensive genetic maps. The densest and highest quality linkage maps available for sweetpotato to date were constructed using molecular markers that were obtained using high throughput technologies (Shirasawa et al. 2017; Mollinari et al. 2020; Oloka et al. 2021; Haque et al. 2020; Sasai et al. 2019). Some of these maps, as well as other studies in sweetpotato, could benefit from the information contained in one or multiple reference genomes (Wu et al. 2018), which could only be assembled by using high throughput sequencing technologies. Such developments are crucial for advancing our understanding of the sweetpotato genome.

The abundance of molecular markers provided by these new technologies makes it feasible to rigorously filter out markers that exhibit low signal-to-noise ratios or are plagued by specific issues encountered in GBS, such as allele dropout, PCR amplification biases, or sequencing errors. However, it is crucial to avoid discarding markers that deviate from expected patterns, such as Mendelian segregation, without thorough investigation (Mollinari et al. 2020). Such deviations may not be artifacts but rather indicators of underlying biological phenomena. Identifying whether unexpected patterns are confined to specific chromosome segments or individuals within the population can provide valuable insights. The power of high-throughput sequencing is undeniable, but it also demands substantial computational resources and sophisticated algorithms to navigate the complexities inherent in polyploid analysis, underscoring the need for continuous advancements in bioinformatics to fully leverage the potential of this technology in genetic mapping (Taniguti et al. 2022).
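One simple way to flag, rather than discard, markers that deviate from Mendelian expectations is a chi-square goodness-of-fit test. The sketch below handles only the simplest case, a marker expected to segregate 1:1, for which the test has one degree of freedom and the p-value can be computed with the complementary error function. The function name and the counts are hypothetical, and the choice of threshold is left to the analyst.

```python
from math import erfc, sqrt


def chisq_one_to_one(n_class0: int, n_class1: int):
    """Chi-square statistic and p-value for a 1:1 segregation (1 d.f.)."""
    expected = (n_class0 + n_class1) / 2.0
    stat = ((n_class0 - expected) ** 2 + (n_class1 - expected) ** 2) / expected
    p_value = erfc(sqrt(stat / 2.0))  # survival function of chi-square with 1 d.f.
    return stat, p_value


# Hypothetical counts in a population of 315 individuals.
for name, counts in {"marker_1": (160, 155), "marker_2": (200, 115)}.items():
    stat, p = chisq_one_to_one(*counts)
    flag = "investigate" if p < 0.001 else "ok"
    print(name, round(stat, 2), flag)
```

Markers flagged in this way are candidates for closer inspection. Checking whether they cluster in particular chromosome segments or trace back to particular individuals, as suggested above, helps separate technical artifacts from genuine biological signals.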

5.8 Final Remarks

Although the concept of genetic maps dates to the origins of genetics, they remain vital tools for elucidating genome behavior during meiosis. This is particularly true in polyploids, where meiosis presents added complexity. Studies on potato by Bourke et al. (2015) and Pereira et al. (2021), blueberry by Cappai et al. (2020), and sweetpotato by Mollinari et al. (2020) exemplify the utility of genetic maps in unraveling meiotic processes and their characteristics in polyploid organisms. Furthermore, applying HMM-based multilocus analysis not only improves the construction of these maps, but also offers a means of correcting inaccuracies or misclassifications within datasets, a common issue in organisms with high ploidy levels, such as sweetpotato. As highlighted by Mollinari et al. (2020), a genetic map transcends the mere linear arrangement of markers along linkage groups; it elucidates the inheritance patterns governing genome transfer from parents to offspring, and the mapping method’s capacity to estimate haplotypes across generations provides a comprehensive characterization of this transmission process.