
6.1 Introduction

Most agronomically important traits are controlled by many regions in the genome, so the traits targeted by breeding programs are usually quantitative in nature and influenced to some degree by the environment. Breeders need to evaluate many candidate genotypes across multiple locations and years to select the best ones according to the product profiles they have in place. This process takes considerable time and resources, which slows variety renewal. For a crop species, product profiles vary among countries and among breeding programs within countries. In the case of sweetpotato, product profiles might include flesh and skin colors, β-carotene, sugar and dry matter contents, resistance to pests and diseases, and high yield, among others (Lindqvist-Kreuze et al. 2023). One way to accelerate this process is to use DNA-based markers genetically associated with quantitative traits to perform marker-assisted selection (MAS).

To implement MAS, a first step is to find which genomic regions underlie the variation of traits of interest (Collard and Mackill 2007). These regions are called quantitative trait loci (QTL), and the process of finding associations between the genotype (molecular markers) and the phenotype (trait expression) is called QTL mapping. By doing so, we aim to describe the genetic architecture of quantitative traits of interest, i.e., we seek to uncover the number of loci influencing the variation in such traits, along with their respective map (or genome) locations and effects. QTL mapping studies follow a typical workflow, consisting of (i) the choice of parental varieties and development of the segregating population, (ii) collection of phenotypic data from such population, (iii) genotypic evaluation with polymorphic molecular markers, (iv) linkage map construction, and (v) QTL mapping itself. Here, we will briefly describe the first four steps, but will mostly focus on previous and current methods for conducting QTL mapping in sweetpotato.

Despite our somewhat historical perspective on QTL methods, this chapter has no intention of covering all aspects and details involved in such models, but rather providing basic understanding of QTL methods specifically applied in sweetpotato research.

6.2 QTL Mapping

A QTL mapping study starts with the need to generate a biparental mapping population that segregates for the trait(s) of interest. On the one hand, if pure lines were available for sweetpotato, backcross, F2 and recombinant inbred line (RIL) populations could easily be employed. On the other hand, when dealing with highly heterozygous sweetpotato parents, full-sib families (i.e., segregating F1 populations) are generally used. Some of the reasons for absence of inbred lines in outcrossing species were mentioned in previous chapters. Self-incompatibility, inbreeding depression, and its autopolyploid nature are among the major factors preventing the existence of sweetpotato pure lines.

For outcrossing species in general, F1 populations can be utilized for linkage map construction and QTL analysis. If both parents are phenotypically contrasting due to complete fixation of respective Q and q alleles of a given QTL (e.g., QQQQQQ × qqqqqq), there will be no segregation for that locus as all F1 individuals will be QQQqqq. If it represents a major QTL, less (heritable) variation will be noticed in the progeny and marginal variation due to minor QTL will be hardly detectable, especially under limited population sizes. For most diploid linkage-based QTL studies providing sufficient resolution and statistical power, population sizes greater than 200 individuals are often utilized in literature. Though such studies are not very common in hexaploid species, based on experience, population sizes greater than 250–300 individuals are recommended.
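The lack of segregation when parents are fixed for contrasting alleles can be checked by enumerating gametes. The sketch below is only illustrative: it assumes random pairing of the six homologs with no double reduction and no preferential pairing, and the function names `gametes` and `f1_dosage_distribution` as well as the second cross (`QQQqqq` × `Qqqqqq`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def gametes(parent):
    """All gametes of an autopolyploid parent (half of its homologs).

    Assumes random segregation of homologs without double reduction.
    """
    half = len(parent) // 2
    return [tuple(parent[i] for i in idx)
            for idx in combinations(range(len(parent)), half)]


def f1_dosage_distribution(parent1, parent2, allele="Q"):
    """Expected frequency of each allele dosage in the F1 of parent1 x parent2."""
    counts = Counter()
    for g1 in gametes(parent1):
        for g2 in gametes(parent2):
            counts[(g1 + g2).count(allele)] += 1
    total = sum(counts.values())
    return {dose: n / total for dose, n in sorted(counts.items())}


# Parents fixed for contrasting alleles: every F1 individual is QQQqqq
fixed = f1_dosage_distribution("QQQQQQ", "qqqqqq")
# Hypothetical heterozygous parents: the locus segregates in the F1
segregating = f1_dosage_distribution("QQQqqq", "Qqqqqq")

print(len(gametes("QQQqqq")))  # number of distinct homolog combinations
print(fixed)
print(segregating)
```

Under these assumptions each hexaploid parent produces 20 homolog combinations, so a full-sib family has 20 × 20 = 400 possible genotypes at a locus.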

Once the parents have been crossed and the population has been established in a screenhouse, sweetpotato progenies can be cloned to provide plant material for genotyping as well as for phenotyping trials. Such trials can be conducted in a screenhouse or in the field depending on which traits will be evaluated. For example, reaction to disease under artificial infection or drought-related traits are more easily assessed in screenhouse trials, where environmental control is usually better, whereas most other traits can be assessed in field trials. In any case, experimental designs must be employed to increase the accuracy of individual mean estimates and, consequently, the ability to detect QTL. It is imperative that the residual variation is kept to a minimum so that genetic variation can be dissected in QTL studies. In fact, because sweetpotato can be clonally propagated, plots can have more than one plant and experiments can be replicated in different locations, seasons, or conditions. Although mapping populations are created with certain trait(s) in mind, other segregating traits are often studied in the same mapping population.

Genotyping is conducted based on DNA extracted from healthy, clean, fresh leaves from plant material available in the screenhouse or field. Methods for obtaining molecular markers have been discussed in previous chapters. Here, we will just reinforce that advances in genotyping platforms, such as those based on next-generation sequencing (NGS), allow for variation detection at single-base resolution for thousands of markers, namely single nucleotide polymorphisms (SNPs). In addition, the number of reads from each SNP can be leveraged for allele dosage or micro-haplotyping purposes, increasing the informativeness of such markers for linkage and subsequent QTL mapping (Hackett et al. 2014; Mollinari et al. 2020). Previous genotyping platforms, such as those based on electrophoretic gels, serve many purposes in genetic studies, but they are particularly limiting in the case of polyploids. The number of confounded classes using dominant markers increases from 2 in diploids (0 = aa versus 1 = Aa or AA), to 4 classes in autotetraploids (0 = aaaa versus 1 = Aaaa, AAaa, AAAa, or AAAA), and to 6 classes in autohexaploids (0 = aaaaaa versus 1 = Aaaaaa, AAaaaa, AAAaaa, AAAAaa, AAAAAa, or AAAAAA).

As described in the previous chapter, linkage mapping is then conducted to group, order and phase such markers. When there is missing data, as is in fact the case for allele dosage-based SNPs, methods using hidden Markov models should be preferred (Mollinari et al. 2020). The ultimate goal of linkage mapping is to provide a comprehensive view of the segregation of homologous chromosomes from parents to progeny and, by doing so, to allow the computation of the possible QTL genotypes (including those within marker intervals) that each individual is most likely to carry. The computation of QTL genotype probabilities conditional on a map is the basis of the most widely employed QTL detection methods, which we describe here. Even in the context of high-density maps, fully phased maps help identify the best haplotype(s) to be targeted for MAS (Gemenet et al. 2020a).

Finally, QTL mapping can be performed using different statistical genetics methods such as single-marker analysis (SMA; Stuber et al. 1987; Edwards et al. 1987), interval mapping (IM; Lander and Botstein 1989), composite interval mapping (CIM; Jansen and Stam 1994; Zeng 1994), and multiple interval mapping (MIM; Kao et al. 1999). Except for SMA, which relies on the marker information alone, all other methods employ the marker information in the context of a linkage map. All these methods were initially developed and broadly used in diploid species where traditional populations such as backcross, F2 or RIL were available. Most of these methods were applied to sweetpotato before integrated, fully phased maps were available for the crop. In most of those cases, however, IM and its variations were employed using separate maps, one for each parent, when computing QTL genotype conditional probabilities (Cervantes-Flores et al. 2008b). Currently, QTL mapping based on integrated maps can be performed for complex autopolyploid species (Da Silva Pereira et al. 2020). A summary of published QTL studies in the crop so far is available in Table 6.1.

Table 6.1 Quantitative trait loci (QTL) mapping studies for sweetpotato

We will take advantage of previously published data from the ‘Beauregard’ × ‘Tanzania’ (BT) population (Gemenet et al. 2020a) as an example to illustrate QTL identification in sweetpotato through a range of methods, discussing their basis without going into much technical detail. ‘Beauregard’ is an orange-fleshed American variety, with low dry matter content and susceptible to nematodes (namely Meloidogyne incognita and M. enterolobii) and sweetpotato virus disease (SPVD) (Rolston et al. 1987), whereas ‘Tanzania’ is a cream-fleshed African landrace showing high dry matter content and resistance to nematodes and SPVD (Mwanga et al. 2001). This population was evaluated in five environments in Peru for several traits, including flesh color, which is our target here (Fig. 6.1a). Flesh color was evaluated for 315 progenies based on scores ranging from 1 (white) to 8 (dark orange) (Grüneberg et al. 2019), and adjusted means were obtained as described before (Gemenet et al. 2020a) (Fig. 6.1b). The population was genotyped using a quantitative genotyping-by-sequencing based protocol (GBSpoly) and the reads were aligned against both Ipomoea trifida and I. triloba genomes. A total of 38,701 SNPs were used for map construction, and 17 progenies were filtered out, leaving a population size of 298 (Mollinari et al. 2020).

Fig. 6.1 Phenotypic segregation for flesh color in the ‘Beauregard’ × ‘Tanzania’ full-sib population (\(N = 315\)) evaluated in Peru. a Each photo depicts a different progeny (bottom) in comparison to their parents (top). b Distribution of adjusted means along with parents ‘Beauregard’ (B) and ‘Tanzania’ (T). Adapted from Gemenet et al. (2020a)

6.2.1 Single-Marker Analysis

Single-marker analysis (SMA) can be carried out using any statistical method that tests whether the differences among mean classes are significant or not, such as t-tests or analysis of variance (ANOVA) derived F-tests. For example, if we are interested in testing (additive) effects of single markers in a diploid F2 population derived from inbred parents, genotypic classes of codominant molecular markers can be scored as 0 (aa), 1 (Aa), or 2 (AA) depending on the number—or dosage—of a certain alternate allele A. Using the same reasoning, hexaploid genotypic classes can be represented by 0 (aaaaaa), 1 (Aaaaaa), …, up to 6 (AAAAAA). A simple linear regression model relating \(y_{i}\), the phenotype of individual \(i\) (or the response variable), to \(x_{i}\), the genotype of individual \(i\) (or the explanatory variable), can be performed as follows:

$$y_{i} = \mu + \beta x_{i} + \varepsilon_{i}$$

where \(\mu\) is the intercept; \(\beta\) is the regression coefficient representing the expected change in \(y_{i}\) for a one-unit change in \(x_{i}\) or, in other words, the additive effect as the average effect of allele substitution (when an a is replaced by an A); and \(\varepsilon_{i} \sim N\left( {0,\sigma^{2} } \right)\) is the residual term, expected to be normally distributed with mean zero and variance \(\sigma^{2}\). The residual term collects all the variation of variable \(y\) left unexplained after fitting variable \(x\). When this model is fitted by ordinary least squares, the coefficient of determination, \(R^{2}\), equals one minus the ratio between the residual sum of squares and the total sum of squares. In the context of QTL analysis, this estimate is often interpreted as the proportion of variance explained by a marker (or QTL), denoted \(PEV\) here.
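As a minimal sketch of the regression above, the following code fits \(y_{i} = \mu + \beta x_{i} + \varepsilon_{i}\) by ordinary least squares and returns \(R^{2}\). The dosage and score values are made-up toy data, not taken from the BT population, and the function name `simple_linear_regression` is hypothetical.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = mu + beta * x; returns (mu, beta, R^2)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx                      # additive (allele substitution) effect
    mu = mean_y - beta * mean_x           # intercept
    rss = sum((yi - mu - beta * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return mu, beta, 1.0 - rss / tss      # R^2 = 1 - RSS / TSS


# Toy data (hypothetical): allele dosage and flesh color score of 8 individuals
dosage = [0, 0, 1, 1, 2, 2, 3, 3]
score = [2.0, 3.0, 3.0, 4.0, 5.0, 4.0, 6.0, 7.0]

mu, beta, r2 = simple_linear_regression(dosage, score)
print(round(mu, 3), round(beta, 3), round(r2, 3))
```

In this toy example each additional copy of the allele raises the expected score by the estimated \(\beta\), and \(R^{2}\) plays the role of \(PEV\).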

Maximum likelihood can be leveraged for parameter estimation and for assessing significance. In our case, we are interested in knowing whether the null hypothesis (\(H_{0} :\beta = 0\)) is to be rejected in favor of the alternate one (\(H_{1} :\beta \ne 0\)) given the data. To do so, the likelihood \({\mathcal{L}}_{1}\) of the full model (including variable \(x\), thus under \(H_{1}\)) is tested against the likelihood \({\mathcal{L}}_{0}\) of the reduced model (\(y_{i} = \mu + \varepsilon_{i}\), thus under \(H_{0}\)) by means of a likelihood ratio test (\(LRT\)) as follows:

$$LRT = 2 \times \log \frac{{{\mathcal{L}}_{1} \left( {\mu ,\beta ,\sigma^{2} } \right)}}{{{\mathcal{L}}_{0} \left( {\mu ,\sigma^{2} } \right)}}$$

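\(LRT\) is assumed to follow, asymptotically, a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters between the full and reduced models (one for the additive model above, or the number of genotypic classes minus one if each class is given its own mean), from which \(P\)-values can be drawn. Genome-wide thresholds for declaring significant QTL can be obtained through permutation tests (Doerge and Churchill 1996).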
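The test can be sketched for the additive single-marker model as follows. The sketch assumes normal residuals with variances estimated by maximum likelihood, in which case \(LRT = n\log \left( {RSS_{0} /RSS_{1} } \right)\), and a single slope parameter, so that the chi-squared reference distribution has one degree of freedom and its upper tail can be written with the complementary error function. The function name `lrt_single_marker` and the toy data are hypothetical.

```python
import math


def lrt_single_marker(x, y):
    """Likelihood ratio test of H0: beta = 0 in y = mu + beta * x + e.

    Returns (LRT, P-value), using a chi-squared distribution with 1 df.
    """
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    rss0 = sum((yi - mean_y) ** 2 for yi in y)   # reduced model: y = mu + e
    rss1 = rss0 - sxy ** 2 / sxx                 # full model: y = mu + beta*x + e
    lrt = n * math.log(rss0 / rss1)
    # Upper tail of a chi-squared with 1 df: P(X > lrt) = erfc(sqrt(lrt / 2))
    p_value = math.erfc(math.sqrt(lrt / 2.0))
    return lrt, p_value


# Toy data (hypothetical): allele dosage and flesh color score of 8 individuals
dosage = [0, 0, 1, 1, 2, 2, 3, 3]
score = [2.0, 3.0, 3.0, 4.0, 5.0, 4.0, 6.0, 7.0]

lrt, p_value = lrt_single_marker(dosage, score)
print(round(lrt, 2), p_value)
```

With so few individuals the asymptotic approximation is rough; the example only shows how the two residual sums of squares lead to the test statistic.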

In our illustration, a total of 28,651 SNPs derived from alignment to the I. trifida reference genome were tested using an additive (dosage) model (Fig. 6.2a). Several other models could have been tested to find associations between phenotype and genotype in autopolyploids, such as those proposed in the GWASpoly R package (Rosyara et al. 2016). In fact, SMA corresponds to a typical genome-wide association study (GWAS) model without the need to control for population structure or cryptic relatedness. The SNP Chr03_3120245 (Fig. 6.2b) contributed most (\(PEV = 39\%\)) to explaining flesh color variation in the BT population, followed by Chr12_22117539 (\(PEV = 21\%\)).

Fig. 6.2 Single-marker analysis for flesh color in sweetpotato ‘Beauregard’ × ‘Tanzania’ full-sib population (\(N = 298\)) evaluated in Peru. a Log of \(P\)-values for simple linear regression for 28,651 SNPs based on an additive model. Markers are ordered according to the Ipomoea trifida genome. b Phenotype distribution (along with parents B and T) according to SNP Chr03_3120245 dosage classes. c Simple linear regression of the top associated markers on chromosomes 3 (3,120,245 bp) and 12 (22,117,539 bp) depicting additive and dominant models. In dominant models, dosages > 0 are grouped into a single genotype class 1 (i.e., at least one A)

The additive model tests the linear relationship between flesh color scores and allele dosage as explained earlier, thus we expect scores to increase (positive slope) or decrease (negative slope) with dosage if they are significantly associated with a marker. When a marker shows only two dosage classes, the dominant model is equivalent to the additive one. Examples of dominant models, the only ones that electrophoresis gel-based markers would allow testing, are shown for the same markers for comparison purposes. In the case of Chr12_22117539 (three classes), \(PEV = 25\%\) for the dominant model, whereas in the case of Chr03_3120245 (two classes), the two models are equivalent (Fig. 6.2c). For highly heritable traits, such as flesh color (\(H^{2} = 0.92\) in our example), and densely genotyped individuals, SMA might be able to detect genomic regions associated with such traits, as seen here.
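The relationship between the two codings can be illustrated with a small sketch. The data below are hypothetical, and `r_squared` and `to_dominant` are illustrative helper names; the point is only that collapsing dosages > 0 into one class changes the fit when three classes are present but not when there are two.

```python
def r_squared(x, y):
    """R^2 of the simple linear regression of y on x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return sxy ** 2 / (sxx * tss)


def to_dominant(dosage):
    """Dominant coding: 0 if the allele is absent, 1 if at least one copy."""
    return [1 if d > 0 else 0 for d in dosage]


# Hypothetical markers with three and two dosage classes, same toy phenotype
three_classes = [0, 0, 1, 1, 2, 2]
two_classes = [0, 0, 0, 1, 1, 1]
scores = [1.0, 2.0, 4.0, 5.0, 4.0, 5.0]

r2_add_3 = r_squared(three_classes, scores)
r2_dom_3 = r_squared(to_dominant(three_classes), scores)
r2_add_2 = r_squared(two_classes, scores)
r2_dom_2 = r_squared(to_dominant(two_classes), scores)

print(round(r2_add_3, 3), round(r2_dom_3, 3))  # differ with three classes
print(r2_add_2 == r2_dom_2)                    # identical with two classes
```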

Even less informative markers, such as those based on amplified fragment length polymorphism (AFLP), were first used to carry out SMA by Cervantes-Flores et al. (2008b) when working with the reciprocal ‘Tanzania’ × ‘Beauregard’ (TB) population (\(N = 240\)). They found nine markers associated with root-knot nematode (RKN) resistance, each explaining from 2.2% to 11.5% of the total variation, where both ‘Tanzania’ (seven) and ‘Beauregard’ (two) appeared to hold resistance alleles. Similarly, Yada et al. (2017), using the ‘New Kawogo’ × ‘Beauregard’ (NKB) population (\(N = 240\)) (Yada et al. 2015), were able to identify 12, 4, 6, and 8 SSR markers associated with yield (\(PEV = 4.2\sim9.1\%\)), dry matter (\(PEV = 3.1\sim4.4\%\)), starch (\(PEV = 4.3\sim6.9\%\)), and β-carotene (\(PEV = 2.0\sim7.4\%\)), respectively. As observed, despite limitations in gathering high-density markers at that time, progress was made in characterizing variation of important traits in sweetpotato via molecular markers even when genetic maps were not available.

6.2.2 Fixed-Effect Interval Mapping Model

Interval mapping (IM) was first introduced in the context of newly developed linkage maps based on multipoint estimations using the hidden Markov model framework (Lander and Botstein 1989). Such maps are used for the computation of conditional probability distribution of genotypes (Jiang and Zeng 1997) allowing for a systematic search of QTL, including within marker intervals (inter-marker search). This idea, initially implemented for diploid inbred-derived populations, was later extended to accommodate both diploid (Wu et al. 2002) and autopolyploid (Mollinari and Garcia 2019) outbred-derived populations. In any case, because we do not observe the QTL genotypes, they are treated as latent variables and can be modeled as a mixture of normal distributions. An F2 model for testing QTL additive effects, e.g. every 1 cM, can be represented as follows:

$$y_{i} = \mu + \beta^{*} x_{i}^{*} + \varepsilon_{i}$$

where \(x_{i}^{*}\) is an indicator variable with probabilities of individual \(i\) being 0 (qq), 1 (Qq) or 2 (QQ) at given position; \(\beta^{*}\) represents the additive effect of a QTL (instead of a marker); and \(\varepsilon_{i} \sim N\left( {0,\sigma^{2} } \right)\). Again, \(LRT\) can be carried out based on the ratio between likelihoods of models under alternate hypothesis \({\mathcal{L}}_{1} \left( {\mu ,\beta^{*} ,\sigma^{2} } \right)\) and null hypothesis \({\mathcal{L}}_{0} \left( {\mu ,\sigma^{2} } \right)\), known as odds ratio. Broadly preferred for interpretation (and plotting) purposes, logarithm of the odds ratio (\({\text{LOD}}\)) scores can be obtained by using:

$$LOD = \log_{10} \frac{{{\mathcal{L}}_{1} \left( {\mu ,\beta^{*} ,\sigma^{2} } \right)}}{{{\mathcal{L}}_{0} \left( {\mu ,\sigma^{2} } \right)}}$$

Or simply by using \(LOD = LRT/\left[ {2 \times \log (10)} \right]\).
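Two ingredients of interval mapping can be sketched in a few lines: the conditional probability of a QTL genotype given the flanking markers, and the conversion from \(LRT\) to \(LOD\). For simplicity, the sketch assumes a backcross (two genotype classes coded 0 and 1) rather than the F2 model above, the Haldane map function, and no crossover interference; the function names are hypothetical.

```python
import math


def haldane(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def qtl_prob_backcross(m1, m2, r1, r2):
    """P(QTL genotype = 1 | flanking marker genotypes m1 and m2) in a backcross.

    r1 and r2 are the recombination fractions between the tested position and
    the left and right markers, respectively (no interference assumed).
    """
    def same_or_recombinant(a, b, r):
        return 1.0 - r if a == b else r

    w1 = same_or_recombinant(m1, 1, r1) * same_or_recombinant(1, m2, r2)
    w0 = same_or_recombinant(m1, 0, r1) * same_or_recombinant(0, m2, r2)
    return w1 / (w1 + w0)  # equal prior probabilities (1/2) cancel out


def lrt_to_lod(lrt):
    """LOD = LRT / (2 * ln 10)."""
    return lrt / (2.0 * math.log(10.0))


# Tested position 5 cM away from each flanking marker (hypothetical interval)
r_left, r_right = haldane(5.0), haldane(5.0)
for m1, m2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(m1, m2, round(qtl_prob_backcross(m1, m2, r_left, r_right), 4))
```

In a backcross, this probability is also the expected value of \(x_{i}^{*}\) at the tested position; when both flanking markers agree the QTL genotype is nearly certain, and when they disagree at equal distances it is 0.5.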

An extension of IM, composite interval mapping (CIM), proposes the inclusion of \(M\) markers as covariates (also called cofactors) in order to control variation outside the region being searched for QTL, increasing the detection power (Zeng 1994), as follows:

$$y_{i} = \mu + \beta^{*} x_{i}^{*} + \mathop \sum \limits_{m = 1}^{M} \beta_{m} x_{mi} + \varepsilon_{i}$$

Similar to what can be done for SMA, in order to evaluate significance (declare a QTL), empirical \(LOD\) thresholds are computed for each trait using permutations (Doerge and Churchill 1996). Such models and algorithms for running IM and CIM are available in software like WinQTL Cartographer (Basten et al. 1999) and MapQTL (van Ooien et al. 2000), both broadly used in sweetpotato QTL mapping work (Table 6.1).
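The permutation procedure can be sketched as follows: phenotypes are shuffled relative to genotypes, the genome-wide maximum \(LOD\) is recorded for each shuffle, and the threshold is taken as the \(1 - \alpha\) quantile of those maxima. The sketch uses single-marker regression as the test, simulated data, and hypothetical function names; it is not the implementation found in the software cited above.

```python
import math
import random


def lod_single_marker(x, y):
    """LOD score of the regression of phenotype y on marker dosage x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:  # monomorphic marker: nothing to test
        return 0.0
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    rss0 = sum((yi - mean_y) ** 2 for yi in y)
    rss1 = rss0 - sxy ** 2 / sxx
    return (n / 2.0) * math.log10(rss0 / rss1)


def permutation_threshold(markers, y, n_perm=1000, alpha=0.05, seed=1):
    """Genome-wide LOD threshold from the distribution of permutation maxima."""
    rng = random.Random(seed)
    y_perm = list(y)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any genotype-phenotype association
        maxima.append(max(lod_single_marker(m, y_perm) for m in markers))
    maxima.sort()
    index = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return maxima[index]


# Simulated toy data: 100 individuals, 20 markers, a QTL at the first marker
rng = random.Random(42)
n_ind, n_markers = 100, 20
markers = [[rng.randint(0, 2) for _ in range(n_ind)] for _ in range(n_markers)]
phenotype = [2.0 * markers[0][i] + rng.gauss(0.0, 1.0) for i in range(n_ind)]

threshold = permutation_threshold(markers, phenotype, n_perm=200)
observed = [lod_single_marker(m, phenotype) for m in markers]
significant = [i for i, lod in enumerate(observed) if lod > threshold]
print(round(threshold, 2), significant)
```

Because the maximum over all tested positions is recorded in each permutation, the resulting threshold accounts for multiple testing across the genome.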

This model can be employed in linkage maps constructed using the double pseudo-test cross method, resulting in two separate maps, one for each parent (Grattapaglia and Sederoff 1994). After building such maps for the TB population (Cervantes-Flores et al. 2008a), IM and CIM were used for QTL confirmation for RKN resistance (Cervantes-Flores et al. 2008b) as well as for QTL identification for dry matter (13, \(PEV = 15\sim24\%\)), starch (12, \(PEV = 17\sim30\%\)), and β-carotene (8, \(PEV = 17\sim35\%\)) in the TB population (Cervantes-Flores et al. 2011). Both methods were also used to identify another 27 QTL in different environmental conditions for dry matter (\(PEV = 9.0\sim45.1\%\)) (Zhao et al. 2013), and 8 QTL for starch (\(PEV = 9.1\sim38.8\%\)) (Yu et al. 2014). For yield traits, 23 QTL have been identified using CIM in the separate ‘Nancy Hall’ (\(PEV = 14.1 \sim 29.8\%\)) and ‘Tainung 27’ (\(PEV = 16.0 \sim 29.9\%\)) maps (Chang et al. 2009), whereas another study identified 45 QTL using IM and CIM, explaining between 10.2 and 59.3% of the phenotypic variation.

A first approach to map QTL in autopolyploid species using the information from fully phased linkage maps was proposed for autotetraploids (Hackett et al. 2014), herein called the fixed-effect interval mapping (FEIM) model. It consists of a single-QTL model, where every position is tested according to a model that can be more generally written for any given ploidy \(p\) as follows:

$$Y = \mu_{C} + \mathop \sum \limits_{j = 2}^{p} \alpha_{j} X_{j} + \mathop \sum \limits_{j = p + 2}^{2p} \alpha_{j} X_{j}$$

where \(\mu_{C}\) is the intercept, and \(\alpha_{j}\) and \(X_{j}\) are the main additive effects and indicator variables for allele \(j\) (i.e., haplotype probabilities inferred from fully phased linkage maps), respectively, where \(j = \left\{ {1, \ldots ,p} \right\}\) and \(j = \left\{ {p + 1, \ldots ,2p} \right\}\) represent the two sets of alleles, one for each parent. The constraints \(\alpha_{1} = 0\) and \(\alpha_{p + 1} = 0\) are imposed because the indicator variables satisfy \(\sum\nolimits_{j = 1}^{p} {X_{j} } = p/2\) and \(\sum\nolimits_{j = p + 1}^{2p} {X_{j} } = p/2\) and are therefore linearly dependent within each parent; as a consequence, \(\mu_{C}\) is a constant that is hard to interpret. Notice that the higher the ploidy level, the more effects must be estimated. For example, tetraploid models have six main effects, hexaploid models have 10 effects, and octoploid models have 14 effects (i.e., \(2p - 2\)), which are needed for every new QTL added in a multiple-loci model.
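A small sketch may help fix the bookkeeping of this model. It builds the indicator variables \(X_{1} , \ldots ,X_{2p}\) for one offspring whose inherited homologs are assumed to be known with certainty (in practice they are probabilities), checks the two sum conditions, and counts the effects left after the constraints. The function names and the example offspring are hypothetical.

```python
def n_main_effects(ploidy):
    """Number of additive effects estimated per QTL: 2p - 2."""
    return 2 * ploidy - 2


def indicator_row(inherited_p1, inherited_p2, ploidy):
    """Indicators X_1..X_2p for one offspring.

    inherited_p1 and inherited_p2 list the homologs (numbered 1..p) received
    from parents 1 and 2; known inheritance is an assumption of this sketch.
    """
    row = [0] * (2 * ploidy)
    for j in inherited_p1:
        row[j - 1] = 1
    for j in inherited_p2:
        row[ploidy + j - 1] = 1
    return row


ploidy = 6
row = indicator_row((1, 3, 5), (2, 4, 6), ploidy)

# Each parent contributes p/2 homologs, so each block of indicators sums to p/2
print(sum(row[:ploidy]), sum(row[ploidy:]))

# Dropping X_1 and X_{p+1} (alpha_1 = alpha_{p+1} = 0) leaves 2p - 2 columns
design = row[1:ploidy] + row[ploidy + 1:]
print(len(design), [n_main_effects(p) for p in (4, 6, 8)])
```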

Such a model has been implemented for hexaploid species in R packages like polymapR (Bourke et al. 2018) and QTLpoly (Da Silva Pereira et al. 2020). Application of the QTLpoly function ‘feim()’ to our flesh color illustration shows two QTL in the same genomic regions as identified using SMA (Fig. 6.3a). The analysis used the linkage map reported for the population, combining a total of 38,701 SNPs aligned against both I. trifida and I. triloba genomes (Mollinari et al. 2020). The QTL on chromosome 3 at 34.11 cM (\(LOD = 31.21\), \(PEV = 41.8\%\)) has its peak close to SNP Chr03_2615608, whereas the QTL on chromosome 12 at 146.02 cM (\(LOD = 17.98\), \(PEV = 21.6\%\)) was mapped close to SNP Chr12_22131994 (both in relation to the I. trifida reference genome). A LOD threshold of 7.7 for 95% genome-wide significance was obtained from 1,000 permutation tests (Fig. 6.3b).

Fig. 6.3 Fixed-effect interval mapping analysis of flesh color in sweetpotato ‘Beauregard’ × ‘Tanzania’ full-sib population (\(N = 298\)) evaluated in Peru. a Log-of-the-odds (LOD) score profile showing QTL on chromosomes 3 and 12 (triangles). b Distribution of maximum LOD scores from 1000 permutation tests and genome-wide significance threshold for \(\alpha = 0.05\)

6.2.3 Multiple QTL Random-Effect Model

Although all these studies were important and advanced the understanding of agronomically important traits in sweetpotato, they were limited by the QTL mapping methods available. IM and CIM do improve on SMA by allowing tests within marker intervals and increasing detection power. However, they all consist of single-QTL models, whereas we expect multiple QTL to contribute to the variation of a quantitative trait. In this scenario, a multiple interval mapping (MIM) model together with an algorithm for searching QTL was needed, similar to what was available for diploid, inbred-based populations (Kao et al. 1999).

Our method is based on the following random-effect model (Da Silva Pereira et al. 2020):

$${\varvec{y}} = \varvec{1}\mu + \mathop \sum \limits_{q = 1}^{Q} {\varvec{g}}_{q} + {\varvec{\varepsilon}}$$

where the vector of phenotypic values from a specific trait \({\varvec{y}}\) is a function of the fixed intercept \(\mu\), the \(q = 1, \ldots ,Q\) random QTL effects \({\varvec{g}}_{q} \sim N\left( {0,{\varvec{G}}_{q} \sigma_{q}^{2} } \right)\), and the random environmental error \({\varvec{\varepsilon}} \sim N\left( {0,{\varvec{I}}\sigma^{2} } \right)\). \({\varvec{G}}_{q}\) is an identity-by-descent (IBD) matrix comparing all possible 400 genotypes in an autohexaploid biparental population (i.e., whether two individuals share from 0 to 6 alleles IBD) according to genotype conditional probabilities of QTL \(q\), working similarly to an additive relationship matrix. Because we only need to estimate one parameter per QTL (the very variance component associated with it), it is relatively easy to look for additional QTL and add them to the variance component model, without ending up with an overparameterized model.
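The counts mentioned above can be reproduced by enumeration. The sketch below lists the 400 possible genotypes as sets of inherited parental homologs, counts how many homologs two genotypes share, and averages that count over genotype probabilities. It assumes no double reduction, the function names are hypothetical, and the averaging step is only an illustration of the idea, not the exact computation of \({\varvec{G}}_{q}\) in QTLpoly.

```python
from itertools import combinations


def possible_genotypes(ploidy=6):
    """All offspring genotypes as sets of inherited parental homologs.

    Homologs 0..p-1 belong to parent 1 and p..2p-1 to parent 2.
    """
    half = ploidy // 2
    from_p1 = list(combinations(range(ploidy), half))
    from_p2 = list(combinations(range(ploidy, 2 * ploidy), half))
    return [frozenset(g1 + g2) for g1 in from_p1 for g2 in from_p2]


def expected_sharing(prob_a, prob_b, genos):
    """Expected number of homologs shared IBD by two individuals, given their
    genotype probabilities at a position (one probability per genotype)."""
    return sum(pa * pb * len(ga & gb)
               for pa, ga in zip(prob_a, genos) if pa > 0
               for pb, gb in zip(prob_b, genos) if pb > 0)


genos = possible_genotypes(6)
sharing_values = sorted({len(a & b) for a in genos for b in genos})
print(len(genos), sharing_values)

# Two individuals with completely uninformative (uniform) probabilities
uniform = [1.0 / len(genos)] * len(genos)
print(round(expected_sharing(uniform, uniform, genos), 6))
```

A single variance component per QTL then scales a matrix built from such pairwise similarities, which is why adding a QTL costs only one extra parameter.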

A multiple-QTL model is known to have increased power when compared to a single-QTL model, with the ability to detect minor or separate yet linked QTL (Da Silva Pereira et al. 2020). Variance components associated with putative QTL (\(\sigma_{q}^{2}\)) are tested using score statistics from the R package varComp (v. 0.2-0) (Qu et al. 2013). Final models are fitted using residual maximum likelihood (REML) from the R package sommer (v. 3.6) (Covarrubias-Pazaran 2016). Rather than guessing pointwise significance levels for declaring QTL, we use the score-based resampling method to assess the genome-wide significance level \(\alpha\) (Zou et al. 2004).

Building a multiple QTL model is considered a model selection problem, and there are several ways to approach it. QTLpoly tries to provide functions flexible enough, so that the users can build a multiple QTL model on their own, manually. The strategy mentioned below has been tested through simulations and it is implemented in a function called ‘remim()’. It consists of an adaptation of the algorithm proposed by Kao et al. (1999) for fixed-effect MIM for diploids to our random-effect MIM (REMIM) for polyploids, which is summarized as follows:

  1. Null model. For each trait, a model starts with no QTL:

    $${\varvec{y}} = \varvec{1}\mu + {\varvec{\varepsilon}}$$

  2. Forward search. QTL (\(q = 1, \ldots ,Q\)) are added one at a time, conditional to the one(s) (if any) already in the model, under a less stringent genome-wide significance level (e.g., \(\alpha < 0.20\)):

    $${\varvec{y}} = \varvec{1}\mu + \mathop \sum \limits_{q = 1}^{Q} {\varvec{g}}_{q} + {\varvec{\varepsilon}}$$

  3. Model optimization. Each QTL \(r\) is tested again conditional to the remaining one(s) in the model under a more stringent genome-wide significance level (e.g., \(\alpha < 0.05\)):

    $${\varvec{y}} = \varvec{1}\mu + {\varvec{g}}_{r} + \mathop \sum \limits_{q \ne r} {\varvec{g}}_{q} + {\varvec{\varepsilon}}$$

    Steps 2 and 3 are repeated until no more QTL can be added to or dropped from the model, and positions of the remaining QTL do not change. After the first model optimization, any following forward searches use the more stringent threshold (e.g., \(\alpha < 0.05\)) as the detection power is expected to increase once QTL have already been added to the model.

  4. QTL profiling. Score statistics for the whole genome are updated conditional to the final set of selected QTL. Once the final model is fitted, QTL heritability is computed as \(h_{q}^{2} = \sigma_{q}^{2} /\sigma_{p}^{2}\), where \(\sigma_{p}^{2}\) is the total phenotypic variance.
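The control flow of the null model, forward search, and model optimization steps can be sketched schematically. This is not the ‘remim()’ implementation: the score-based test is replaced by a stand-in function `toy_pvalue` that ignores the QTL already in the model, position refinement is omitted, and all names, labels, and \(P\)-values below are hypothetical.

```python
def forward_search(model, positions, pvalue, alpha):
    """Add the most significant position, one at a time, while P < alpha."""
    added = False
    while True:
        candidates = [pos for pos in positions if pos not in model]
        if not candidates:
            break
        best = min(candidates, key=lambda pos: pvalue(pos, model))
        if pvalue(best, model) >= alpha:
            break
        model.append(best)
        added = True
    return added


def optimize(model, pvalue, alpha):
    """Re-test each QTL conditional on the others; drop it if P >= alpha."""
    dropped = False
    for pos in list(model):
        others = [q for q in model if q != pos]
        if pvalue(pos, others) >= alpha:
            model.remove(pos)
            dropped = True
    return dropped


def build_model(positions, pvalue, alpha_forward=0.20, alpha_backward=0.05,
                max_rounds=100):
    model = []                                    # null model: no QTL
    forward_search(model, positions, pvalue, alpha_forward)
    for _ in range(max_rounds):                   # repeat until stable
        dropped = optimize(model, pvalue, alpha_backward)
        added = forward_search(model, positions, pvalue, alpha_backward)
        if not dropped and not added:
            break
    return sorted(model)


# Stand-in test: fixed genome-wide P-values per position (made-up numbers).
# A real test would depend on the QTL already in the model.
TOY_P = {"pos_a": 1e-12, "pos_b": 1e-6, "pos_c": 0.12, "pos_d": 0.60}


def toy_pvalue(position, qtl_in_model):
    return TOY_P[position]


final_model = build_model(list(TOY_P), toy_pvalue)
print(final_model)
```

In this toy run, `pos_c` enters under the relaxed forward threshold and is dropped during model optimization, which mirrors the role of the two significance levels described above.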

The BT mapping population, the same one used in our example, was leveraged to identify two major QTL related to starch, β-carotene, and their respective correlated traits, dry matter and flesh color (Gemenet et al. 2020a) (Fig. 6.4a). Again, two QTL were detected for flesh color: one on chromosome 3 at 36.14 cM (\(P < 10^{ - 16}\), \(h_{q}^{2} = 0.536\)), located close to SNP Chr03_2994719 (thus close to the FEIM result), and one on chromosome 12 at 146.02 cM (\(P < 10^{ - 16}\), \(h_{q}^{2} = 0.293\)), mapped close to SNP Chr12_22131994 (same position as for FEIM). Together with RNA-seq data, this research has shown that the QTL on chromosome 3 had correlated effects, reducing starch (and dry matter) and increasing β-carotene contents (and flesh color scores, Fig. 6.4b) in genotypes carrying a haplotype from the ‘Beauregard’ parent. This sheds light on the genetic basis of the negative correlation between dry matter and flesh color, which is well known to breeders (Gemenet et al. 2020a).

Fig. 6.4 Random-effect multiple interval mapping analysis for flesh color in sweetpotato ‘Beauregard’ × ‘Tanzania’ full-sib population (\(N = 298\)) evaluated in Peru. a Log of P-value profile showing QTL (triangles) on chromosomes 3 (\(h_{q}^{2} = 0.536\)) and 12 (\(h_{q}^{2} = 0.293\)). b Prediction of parental haplotype contributions to increasing (blue) or decreasing (red) overall mean \(\mu = 5.68\) per QTL. Adapted from Gemenet et al. (2020a)

Using the same BT population integrated linkage map (Mollinari et al. 2020) and the REMIM approach, 13 QTL were mapped for eight yield-related traits, with the number of QTL per trait ranging from one to four. These QTL explained up to 55% of the total variation, and both parents (‘Beauregard’ and ‘Tanzania’) contributed alleles that increased the trait means (Da Silva Pereira et al. 2020). For iron (Fe) and zinc (Zn) contents evaluated in Ghana in the BT population, two QTL were found for each trait, explaining 51.0% and 23.5% of the total variation, respectively, at the same locations as the QTL for β-carotene (Mwanga et al. 2021), making double biofortification efforts likely to be successful. Finally, one major QTL explaining 58.3% of the total variation for root-knot nematode resistance was also detected in the reciprocal TB population (Oloka et al. 2021).

We are currently carrying out validation tests on SNPs converted into kompetitive allele specific PCR (KASP) within QTL regions associated with several traits (Da Silva Pereira et al. 2023). In our flesh color illustration, the genome of I. trifida shows six transcripts annotated as phytoene synthase, namely itf12g01830.t1, itf03g05110.t1, itf03g10720.t1, itf03g10720.t2, itf14g07540.t1, itf14g07550.t1 (http://sweetpotato.uga.edu/). From the QTL mapping analysis, we found out that itf03g05110 is the most likely gene involved in variation of flesh color in the BT population, as it matches the location of QTL on chromosome 3. Polymorphic SNPs derived from whole-genome sequencing of the 16 parents of an 8 × 8 diallel called ‘Mwanga Diversity Panel’ (MDP) (Wu et al. 2018) were selected within itf03g05110. Results for one SNP, Chr03_3120259, are shown here. Allele dosage was estimated using fitPoly R package (Voorrips et al. 2011) (Fig. 6.5a), and tested against the flesh color scores of a sample of the 16 parents plus 78 progenies from MDP. The results have shown significant association (\(P = 8.0 \times 10^{ - 6}\)) between genotype and phenotype, with 21% of proportion of variance explained by this single marker (Fig. 6.5b), making it a candidate for MAS purposes in sweetpotato.

Fig. 6.5 QTL for flesh color on chromosome 3 converted into kompetitive allele specific PCR (KASP) marker. The marker was designed at 3,120,259 bp within phytoene synthase gene (itf03g05110) of Ipomoea trifida genome. a Dosage calling from alleles a and A intensities where each dot represents an individual assigned to given genotype class (color) under certain probability P (transparency). b Association with allele dosage shows an \(R^{2} = 0.21\) (\(P = 8.0 \times 10^{ - 6}\)) for samples of an 8 × 8 diallel (\(N = 94\)). Adapted from Da Silva Pereira et al. (2023)

6.3 BSA-seq

As observed in the previous section, the detection of QTL depends on the availability of DNA molecular markers and the construction of linkage maps from biparental crosses with segregating phenotypes. As such, fine QTL mapping requires a high number of polymorphic markers and a large population size. Although sequencing costs have decreased over time, high-throughput SNP genotyping of large populations is still costly, especially for polyploid species, which require high sequencing depth for accurate dosage calling (Gemenet et al. 2020b).

Bulk-segregant analysis sequencing (BSA-seq) is a strategy that enables the identification of SNPs associated with traits of interest at a lower cost than conventional QTL mapping strategies, by combining bulked-segregant analysis (BSA) (Michelmore et al. 1991) and whole genome sequencing (WGS). The DNA of progenies at the extremes of the trait distribution is bulked according to phenotypic class, forming one ‘low’ and one ‘high’ bulk. Both bulks are subjected to WGS, and the resulting reads are mapped to a reference genome. A similar frequency distribution of alleles from both parents is expected in regions that are not associated with the phenotype expression, while an uneven representation of one of the two parental alleles is expected in QTL regions.

This strategy was first proposed in yeast (Ehrenreich et al. 2010) and later applied to rice under the name QTL-seq, where QTL for blast disease resistance and seedling height were successfully identified (Takagi et al. 2013). The proportion of reads derived from each parental genome was used to determine a SNP-index, where 0 and 1 mean that all reads contain the reference or the alternative allele, respectively, and 0.5 represents equal contributions. The difference between the SNP-indexes of the low and high bulks, called ΔSNP-index, takes values around 0 in regions where both bulks carry the parental alleles in similar proportions, and absolute values close to 1 in QTL regions where each bulk is enriched for a different parental allele. Since then, BSA-seq has been used in several species such as tomato (Wen et al. 2019), capsicum (Park et al. 2019), and soybean (Zhang et al. 2018).
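The SNP-index and ΔSNP-index described above can be sketched as follows. The function names are hypothetical, read counts are given as (reference, alternative) pairs, and the sign convention (high bulk minus low bulk) is an assumption made for illustration only.

```python
def snp_index(ref_reads, alt_reads):
    """Fraction of reads carrying the alternative allele at one SNP."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this SNP")
    return alt_reads / depth


def delta_snp_index(low_counts, high_counts):
    """SNP-index of the 'high' bulk minus that of the 'low' bulk.

    Each argument is a (ref_reads, alt_reads) pair. The direction of the
    subtraction is an assumed convention.
    """
    return snp_index(*high_counts) - snp_index(*low_counts)


# Hypothetical read counts for two SNPs in a diploid cross.
unlinked = delta_snp_index(low_counts=(48, 52), high_counts=(51, 49))
linked = delta_snp_index(low_counts=(95, 5), high_counts=(4, 96))
print(f"unlinked SNP: {unlinked:+.2f}; SNP in a QTL region: {linked:+.2f}")
```

In practice the index is usually averaged over sliding windows along each chromosome to smooth sampling noise before peaks are inspected.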

BSA-seq was first applied to a polyploid species in 2018. Clevenger et al. (2018) used the strategy to identify QTL for late leaf spot disease resistance in allotetraploid peanut. The initial hindrance was that the SNP detection methods used for diploid species produced a high proportion of false-positive SNPs in peanut. To circumvent this issue, the authors used a polyploid SNP calling pipeline, which allowed the identification of three QTL and the development of SNP markers for MAS. More recently, BSA-seq was used in combination with other techniques for QTL detection and MAS application for seed weight in peanut (Wang et al. 2022) and disease resistance in allotetraploid cotton (Zhao et al. 2021). In allooctoploid strawberry, BSA-seq was used to specify the subgenome origins of three male sterility QTL (Wada et al. 2021).

For autopolyploid species, a polyploid BSA-seq method was developed and tested using data from tetraploid potato and hexaploid sweetpotato. The minimum sequencing depth for calling parent-specific simplex SNPs was determined to be 40× and 75× for potato and sweetpotato, respectively (Yamakawa et al. 2021). Sequences from one parent are aligned to the public genome of the species to identify the reference SNP allele. Reads from the bulks and from the second parent are then reported relative to the reference alleles, and SNP-indexes are calculated. SNP loci are evaluated for both parents considering sites where the SNP-index equals 0 in one of the parents, i.e., nulliplex cases. With this method, the H1 resistance locus of potato was identified on chromosome 5, and an anthocyanin QTL of sweetpotato was detected on chromosome 12 (Yamakawa et al. 2021).
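To give a rough sense of why higher ploidy calls for deeper sequencing, the sketch below computes the expected SNP-index of a simplex × nulliplex marker in a bulk. It rests on assumptions of ours that are not taken from the cited study: 1:1 segregation of simplex and nulliplex progenies (random chromosome pairing, no double reduction) and equal DNA contribution of every individual to the bulk. The function name is hypothetical, and we do not claim this is how the depth thresholds above were derived.

```python
from fractions import Fraction


def expected_simplex_index(ploidy, carrier_fraction=Fraction(1, 2)):
    """Expected SNP-index of a parent-specific simplex allele in a bulk.

    carrier_fraction is the share of bulked individuals carrying one copy
    of the allele: 1/2 in a region unlinked to the trait (assumed 1:1
    segregation), approaching 0 or 1 in a QTL region.
    """
    if ploidy < 2 or ploidy % 2:
        raise ValueError("ploidy must be an even number >= 2")
    carrier_fraction = Fraction(carrier_fraction)
    if not 0 <= carrier_fraction <= 1:
        raise ValueError("carrier_fraction must lie between 0 and 1")
    # Each carrier contributes one allele copy out of `ploidy` copies.
    return carrier_fraction / ploidy


for name, ploidy, depth in (("potato", 4, 40), ("sweetpotato", 6, 75)):
    unlinked = expected_simplex_index(ploidy)
    fixed = expected_simplex_index(ploidy, 1)
    print(f"{name}: unlinked={unlinked}, all carriers={fixed}, "
          f"expected alt reads at {depth}x in unlinked regions="
          f"{float(unlinked * depth):.2f}")
```

Under these assumptions, the largest possible ΔSNP-index for a simplex marker is 1/ploidy (1/6 in a hexaploid) rather than 1, so far more reads are needed to separate a true signal from sampling noise than in a diploid.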

In our illustration, we simply combined the read counts, taken from the raw variant call format (VCF) files, of BT progenies whose flesh color scores were lower than 4.30 (24 individuals in the ‘low’ bulk) or greater than 7.21 (23 individuals in the ‘high’ bulk), as if they had been sequenced in their respective bulks. From a total of 87,134 variants derived from the I. trifida genome alignment, there were 8,567 and 18,226 SNPs in simplex (Aaaaaa) state for ‘Beauregard’ and ‘Tanzania’, respectively, contrasting with the nulliplex (aaaaaa) state of the other parent. The differences between the SNP-indexes of the low and high bulks, ΔSNP-index, allowed us to detect the same regions on chromosomes 3 and 12 as associated with flesh color (Fig. 6.6).

Fig. 6.6

BSA-seq for flesh color in sweetpotato ‘Beauregard’ × ‘Tanzania’ full-sib population (\(N = 298\)) evaluated in Peru. The evaluation of 26,793 simplex SNPs (8,567 for ‘Beauregard’ and 18,226 for ‘Tanzania’) showed the highest absolute values of ΔSNP-index between ‘low’ and ‘high’ bulks on chromosomes 3 (3,120,245 bp) and 12 (21,812,147 bp for ‘Beauregard’, and 21,271,480 bp for ‘Tanzania’), close to previously identified QTL regions
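The in silico bulking used for this illustration can be sketched as follows. Only the two score thresholds (4.30 and 7.21) come from the text; the function names, the data layout (per-individual reference and alternative read counts at one simplex × nulliplex SNP, already extracted from the VCF files), and the toy values are hypothetical.

```python
def assign_bulk(score, low_cut=4.30, high_cut=7.21):
    """Return 'low', 'high', or None for individuals between the cutoffs."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return None


def pooled_delta_index(scores, counts, low_cut=4.30, high_cut=7.21):
    """ΔSNP-index (high minus low, an assumed convention) at one SNP.

    scores: individual -> phenotype score
    counts: individual -> (ref_reads, alt_reads) at the SNP
    Read counts are summed within each bulk, as if the bulk had been
    sequenced as a single pooled sample.
    """
    pooled = {"low": [0, 0], "high": [0, 0]}
    for individual, score in scores.items():
        bulk = assign_bulk(score, low_cut, high_cut)
        if bulk is None or individual not in counts:
            continue  # intermediate phenotype or no genotype data
        ref, alt = counts[individual]
        pooled[bulk][0] += ref
        pooled[bulk][1] += alt
    index = {}
    for bulk, (ref, alt) in pooled.items():
        if ref + alt == 0:
            raise ValueError(f"no reads in the {bulk} bulk")
        index[bulk] = alt / (ref + alt)
    return index["high"] - index["low"]


# Hypothetical toy data for five progenies at one simplex SNP.
toy_scores = {"p1": 2.0, "p2": 3.5, "p3": 5.0, "p4": 8.0, "p5": 9.0}
toy_counts = {"p1": (60, 0), "p2": (58, 2), "p3": (50, 10),
              "p4": (50, 10), "p5": (48, 12)}
delta = pooled_delta_index(toy_scores, toy_counts)
print(f"delta SNP-index = {delta:.3f}")
```

Summing raw read counts weights each individual by its sequencing depth; averaging per-individual indexes instead would weight individuals equally, which may be preferable when depth varies widely.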

6.4 Future Prospects

The main goal of QTL mapping is to investigate the genetic architecture of quantitative traits of interest. Single- and multiple-QTL models have been available for inbred, diploid mapping populations for quite some time now (see Da Costa and Zeng (2010) for a comprehensive review). However, only recently have these methods become available for outbred, polyploid mapping populations. For sweetpotato, QTL mapping has been used in different populations (backgrounds) and for a set of traits to date. Great progress has been achieved, particularly in recent studies that used molecular markers and statistical methods specifically developed for complex autopolyploid species. The next steps in QTL mapping should focus on extending the methods to account for multiple traits or environments simultaneously, enabling the investigation of pleiotropic effects and linkage as well as the interaction between QTL and environments. The results of these approaches should improve our understanding of the genetic architecture of the traits. Major-effect QTL that are detected stably across multiple environments and for several traits could be incorporated as fixed terms into statistical prediction models in the context of genomic selection. A first version of these models is currently under development for sweetpotato populations, and the use of novel QTL could be helpful. Overall, we believe these approaches will provide valuable information for MAS in sweetpotato breeding.