Improvement of genomic prediction in advanced wheat breeding lines by including additive-by-additive epistasis

Raffo, Miguel Angel; Sarup, Pernille; Guo, Xiangyu; Liu, Huiming; Andersen, Jeppe Reitan; Orabi, Jihad; Jahoor, Ahmed; Jensen, Just

doi:10.1007/s00122-021-04009-4

Improvement of genomic prediction in advanced wheat breeding lines by including additive-by-additive epistasis

Original Article
Open access
Published: 01 January 2022

Volume 135, pages 965–978, (2022)
Cite this article

Download PDF

You have full access to this open access article

Theoretical and Applied Genetics Aims and scope Submit manuscript

Improvement of genomic prediction in advanced wheat breeding lines by including additive-by-additive epistasis

Download PDF

Miguel Angel Raffo ORCID: orcid.org/0000-0002-5330-0117¹,
Pernille Sarup^1,2,
Xiangyu Guo¹,
Huiming Liu¹,
Jeppe Reitan Andersen²,
Jihad Orabi²,
Ahmed Jahoor^2,3 &
…
Just Jensen¹

3632 Accesses
20 Citations
3 Altmetric
Explore all metrics

Abstract

Key message

Including additive and additive-by-additive epistasis in a NOIA parametrization did not yield orthogonal partitioning of genetic variances, nevertheless, it improved predictive ability in a leave-one-out cross-validation for wheat grain yield.

Abstract

Additive-by-additive epistasis is the principal non-additive genetic effect in inbred wheat lines and is potentially useful for developing cultivars based on total genetic merit; nevertheless, its practical benefits have been highly debated. In this article, we aimed to (i) evaluate the performance of models including additive and additive-by-additive epistatic effects for variance components (VC) estimation of grain yield in a wheat-breeding population, and (ii) to investigate whether including additive-by-additive epistasis in genomic prediction enhance wheat grain yield predictive ability (PA). In total, 2060 sixth-generation (F₆) lines from Nordic Seed A/S breeding company were phenotyped in 21 year-location combinations in Denmark, and genotyped using a 15 K-Illumina-BeadChip. Three models were used to estimate VC and heritability at plot level: (i) “I-model” (baseline), (ii) “I + G_A-model”, extending I-model with an additive genomic effect, and (iii) “I + G_A + G_AA-model”, extending I + G_A-model with an additive-by-additive genomic effects. The I + G_A-model and I + G_A + G_AA-model were based on the Natural and Orthogonal Interactions Approach (NOIA) parametrization. The I + G_A + G_AA-model failed to achieve orthogonal partition of genetic variances, as revealed by a change in estimated additive variance of I + G_A-model when epistasis was included in the I + G_A + G_AA-model. The PA was studied using leave-one-line-out and leave-one-breeding-cycle-out cross-validations. The I + G_A + G_AA-model increased PA significantly (16.5%) compared to the I + G_A-model in leave-one-line-out cross-validation. However, the improvement due to including epistasis was not observed in leave-one-breeding-cycle-out cross-validation. We conclude that epistatic models can be useful to enhance predictions of total genetic merit. However, even though we used the NOIA parameterization, the variance partition into orthogonal genetic effects was not possible.

Genome-wide mapping and prediction suggests presence of local epistasis in a vast elite winter wheat populations adapted to Central Europe

Article 19 December 2016

Genomic prediction of agronomic traits in wheat using different models and cross-validation designs

Article 01 November 2020

Choosing the right tool: Leveraging of plant genetic resources in wheat (Triticum aestivum L.) benefits from selection of a suitable genomic prediction model

Article Open access 01 October 2022

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Genomic selection (GS, Meuwissen et al. 2001) methods based on whole-genome prediction (WGP) have been successfully applied for a variety of quantitative traits of agronomic importance in animals and plants (Poland et al. 2012; Gianola and Rosa 2015; Crossa et al. 2017; Kristensen et al. 2019).

In quantitative genetics, a distinction is made between the genomic estimated breeding value (GEBVs, estimated additive genetic effects) and the total genetic value (estimated additive plus non-additive genetic effects). Traditionally, wheat breeders have based the selection of lines on phenotypic selection, which can be seen as a measure of total genetic value. The better performance of GS over phenotypic selection (Crossa et al. 2011; Michel et al. 2017; Tessema et al. 2020) has led many wheat breeding programs to implement GS, and base the selection of lines on the prediction of GEBVs, which in general are used to select both breeding lines and commercial varieties. However, the non-additive genetic effects can play a relevant role in the determination of complex traits such as grain yield (Carlborg and Haley 2004; Mackay 2014). Separating additive and non-additive genetic effects can be favorable if it contributes to a more accurate estimate of both additive and total genetic merit. In this context, treating additive and non-additive effects separately can result in an improved strategy of selection, allowing to select crossing parents based exclusively on the additive effect, and develop commercial varieties, based on both additive plus non-additive effects.

The non-additive genetic effect can be defined by their “biological” meaning, referred to the variations due to gene action, or as defined by Fisher (1918), by their “statistical” meaning, referred to deviations from additivity in a statistical model. The non-additive genetic effects are classified into epistasis and dominance. Epistasis is defined as the interaction between alleles at different loci, and it can be divided into pairwise classes: (i) additive-by-additive, (ii) additive-by-dominance and (iii) dominance-by-dominance, and into higher-order epistatic classes involving more than two loci. In wheat breeding, commercial cultivars are commonly developed by several generations of selfing to create inbred lines. Due to the high homozygosity of inbred lines obtained by seed multiplication via selfing, the epistatic interactions are fixed in cultivars and can be kept for future generations used in commercial production.

Modelling additive-by-additive effects in genomic prediction (GP) can be restrictive due to the high computational load caused by the high number of interactions between markers if all possible interactions are considered. Under the assumption of quantitative trait loci (QTL) effects coming from the same normal distribution, a mathematically equivalent alternative to model epistasis, and reduce the computational load, is to use models including genomic relationship matrices as covariance structures for individuals. Several authors have proposed to extend the genomic best linear unbiased prediction (G-BLUP) model (Habier et al. 2007; VanRaden 2008) by adding non-additive terms (extended best linear unbiased prediction, EG-BLUP). The term “EG-BLUP” refers in the literature to a model with multiple types of genetics effects (additive, dominance, epistatic), in which the coding of the marker matrix to calculate the relationship matrices can be flexible (Su et al. 2012; Xu 2013; Jiang and Reif 2015; Martini et al. 2016). Henderson (1985) proposed to use the Hadamard product of the pedigree-based additive relationship matrix with itself to approximate the additive-by-additive epistatic matrix. Henderson’s approach was later implemented in the genomic framework by Su et al. (2012), where the Hadamard product of the additive genomic relationship matrix was used to build the additive-by-additive matrix. The resulting marker-based relationship matrix captures deviations due to additive-by-additive interactions plus dominance when it is present (Martini et al. 2016). Marker-based epistatic relationship matrices are also proposed to estimate the additive-by-additive interactions without including the dominance effect (Xu 2013; Jiang and Reif 2015; Martini et al. 2016). Recently, Vitezica et al. (2017) proposed to use the natural and orthogonal interactions (NOIA) approach (Alvarez-Castro and Carlborg 2007) to model non-additive genetic effects in GP. However, as recently reported by Joshi et al. (2020), the EG-BLUP and the NOIA are equivalents if the marker coding for the EG-BLUP follows VanRaden (2008) and only additive and additive-by-additive epistatic effects are included in the models.

The dominance genetic effect has also been investigated in GS for wheat breeding. Dominance is defined as the effects of allelic interaction within loci (Fisher 1918), and it has been particularly relevant for the heterotic effect in hybrid wheat populations (Zhao et al. 2015; Jiang et al. 2017). Jiang et al. (2017) found a heterotic effect for grain yield in a hybrid population of winter wheat derived from crosses among diverse elite parents. In their study, the hybrids outperformed the mid-parents by 10% on average. The relevance of accounting for dominance in prediction models has also been investigated in simulation studies, reporting an increase in the prediction accuracy for populations presenting a dominance effect when dominance was accounted for in prediction models (Wellmann and Bennewitz 2012). However, for inbred wheat lines, the dominance effects are very low to negligible due to their reduced heterozygosity, and the epistasis is, therefore, the only relevant non-additive genetic effect.

The lack of independence between loci, and having linked markers instead of causative mutations may affect the orthogonal partition of genetic effects into independent statistical components and lead to problems in the estimation of genetic variances (Zeng et al. 2005; Wang and Zeng 2006; Hill and Mäki‐Tanila 2015; Vitezica et al. 2017). The lack of orthogonality between genetic effects can be evidenced by estimates that are affected when an additional genetic term is included in the model (Papoulis and Pillai 2002; Vitezica et al. 2017; Joshi et al. 2020). Nevertheless, several authors have reported that including epistasis in genetic models can be useful to enhance prediction and selection (Hu et al. 2011; He et al. 2016; Martini et al. 2017). On the other hand, different results have been reported by other authors, Jarquín et al. (2014) found that including epistasis did not improve PA, and Lorenzana and Bernardo (2009) even found a negative effect of including epistasis in PA.

In this study, we use a large set of winter wheat breeding lines, phenotyped for grain yield in multiple environments and in multiple years. Our study had two specific objectives:

(i)
To evaluate the performance of models including additive and additive-by-additive epistatic effects for variance component estimation for grain yield.
(ii)
To investigate the predictive ability (PA) of such models for prediction of advanced breeding lines.

Materials and methods

Experimental data

The plant material consisted of 2060 sixth-generation (F₆) winter wheat lines (T. aestivum L.) developed by the breeding company Nordic Seed A/S. The data were collected from seven breeding cycles from 2013 to 2019, each including around 330 lines evaluated in three locations in Denmark (DK): Odder (Central DK), Holeby (South DK) and Skive (North DK). The F₆ lines of each breeding cycle originated from approximately 60 parental line-crosses, followed by five generations of selfing, including creating single seed descent (SSD) lines in generation F₄. The breeding cycles from 2013 to 2016 were evaluated in two consecutive years (cycle 1: 2013–2014, cycle 2: 2014–2015, cycle 3: 2015–2016, cycle 4: 2016–2017), and the cycles coming from 2017 to 2019 were evaluated in one year only (cycle 5: 2017, cycle 6: 2018, cycle 7: 2019). The field trials consisted of 15 blocks of 46 line plots of 8.25 m² per year × location combination. Each block had two replicates of 21 F₆ lines and two checks randomly assigned. The experimental conditions within the year × location subsets were homogeneous for the trials (e.g., sowing time, application of treatments, assessment time). The quantitative trait analyzed in this study was the yield measured as kg per plot (8.25 m²).

Genotyping

DNA extractions from the plant material were based on a modified CTAB method (Rogers and Bendich 1985). The genotyping was carried out using a 15 K Illumina Infinium iSelect HD Custom Genotyping BeadChip technology (Wang et al. 2014). For the quality control, the SNPs with minor allele frequency (MAF) lower than 5% and with a call rate < 0.90 were removed. Missing genotypes were imputed with mean value (∼1.3% of missing values imputed). In total, 10,688 SNPs passed the quality control.

Statistical models

This study compared three different models. Firstly, a baseline mixed model without genomic information (I-model, Eq. 1), including fixed and random effects, was used as the starting point for the construction of the other models (Cericola et al. 2017; Tsai et al. 2020). Secondly, the I + G_A-model (Eq. 2) was used to extend the I-model with an additive genomic effect according to the NOIA parametrization proposed by Alvarez-Castro and Carlborg (2007) and later extended to GP by Vitezica et al. (2017). Third, the I + G_A + G_AA-model (Eq. 4) was used to extend the I + G_A-model by adding a pairwise additive-by-additive epistatic terms according to the NOIA parametrization.

I-model (Baseline)

The baseline model (Eq. 1) was developed considering the main sources of variability affecting the experimental data and included them as fixed or random effects, and we referred to as “I-model” hereinafter since it uses an identity covariance matrix for the line effects. A similar model has also been presented in earlier studies working with a set of data from Nordic Seed A/S (Cericola et al. 2017; Tsai et al. 2020). The I-model was defined as:

$${\varvec{y}} = {\varvec{Xb}} + {\varvec{Z}}_{1} {\varvec{l}} + {\varvec{Z}}_{2} {\varvec{f}} + \mathop \sum \limits_{i = 1}^{9} {\varvec{Z}}_{{{\varvec{i}} + 2}} {\varvec{s}} + {\varvec{e}}$$

(1)

where ${\varvec{y}}$ is the vector of observed phenotypes; ${\mathbf{X}}$ is the design matrix for fixed effects; ${\varvec{b}}$ is the vector of fixed trial effects nested within year, location and breeding cycle; ${\mathbf{Z}}_{1}$ and ${\mathbf{Z}}_{2}$ are design matrices of random effects; ${\varvec{l}}$ is a vector of line effect with $\user2{l }\sim N\left( {0,{\varvec{I}}\sigma_{l}^{2} } \right)$, where ${\varvec{I}}$ is an identity matrix and $\sigma_{l}^{2}$ is the variance due to uncorrelated line effects; ${\varvec{f}}$ is a vector of line by environment interaction (lines-by-year-location) with $\user2{f }\sim N\left( {0,{\varvec{I}}\sigma_{f}^{2} } \right)$, where $\sigma_{f}^{2}$ is the variance due to uncorrelated line by environment effects; ${\varvec{s}}$ is a vector of spatial effect with $\user2{s }\sim N\left( {0,{\varvec{I}}\sigma_{s}^{2} } \right)$, where $\sigma_{s}^{2}$ is the spatial effect variance. The spatial effect contains the X and Y coordinate of the target plot and the eight surrounding plots (n = 9), for plots located in the border, virtual plots were added to guarantee all plots have n = 9 in order to account for border effects (Supplementary material Fig. 1S). Therefore, the spatial effect on an individual plot is the sum of effects with the square centered on the plot itself plus the effects of eight surrounding plots with a square centered on those plots; ${\varvec{e}}$ is a vector of random residuals with $\user2{e }\sim N\left( {0,{\varvec{I}}\sigma_{e}^{2} } \right)$, where $\sigma_{e}^{2}$ is the residual variance. All random effects were assumed to be independent.

Note that the genetic term in the I-model is miss-specified since the model assumes all lines to be unrelated. Therefore, it may lead to a biased estimation of the total genetic variance.

I + GA-model

The “I + G_A-model” (Eq. 2) was the second model used, and it includes an additive genomic relationship matrix based on the NOIA parametrization as covariance structure to define the additive genetic effects. The I + G_A-model was defined as:

$${\varvec{y}} = {\varvec{Xb}} + {\varvec{Z}}_{1} {\varvec{l}} + {\varvec{Z}}_{3} {\varvec{g}} + {\varvec{Z}}_{2} {\varvec{f}} + \mathop \sum \limits_{i = 1}^{9} {\varvec{Z}}_{{{\varvec{i}} + 3}} {\varvec{s}} + {\varvec{e}}$$

(2)

where ${\mathbf{X}}$, ${\mathbf{Z}}_{{\mathbf{n}}}$, ${\varvec{b}}$, ${\varvec{l}}$, ${\varvec{f}}$, ${\varvec{s}}$, and ${\varvec{e}}$ are the same as described in the I-model (Eq. 1); ${\varvec{g}}$ is a vector of additive genomic breeding values with $\user2{g }\sim N\left( {0,{\mathbf{G}}_{{{\mathbf{NOIA}}}} \sigma_{g}^{2} } \right)$, where $\sigma_{g}^{2}$ is the genomic additive genetic variance and ${\mathbf{G}}_{{{\mathbf{NOIA}}}}$ is a genomic relationship matrix constructed based on Vitezica et al. (2017):

$${\varvec{G}}_{{{\varvec{NOIA}}}} = \frac{{{\varvec{H}}_{{\varvec{a}}} {\varvec{H}}_{{\varvec{a}}}^{\prime } }}{{tr\left( {{\varvec{H}}_{{\varvec{a}}} {\varvec{H}}_{{\varvec{a}}}^{\prime } } \right)/n}}$$

(3)

where ${\varvec{H}}_{{\varvec{a}}}$ is an n rows (number of lines) x m columns (number of markers) matrix containing the additive coefficients as:

$${\varvec{H}}_{{\varvec{a}}} = \left( {\begin{array}{*{20}c} {{\varvec{h}}_{{{\varvec{a}}_{{\varvec{i}}} }} } \\ {\begin{array}{*{20}c} . \\ . \\ . \\ {{\varvec{h}}_{{{\varvec{a}}_{{\varvec{n}}} }} } \\ \end{array} } \\ \end{array} } \right)$$

and ${\varvec{h}}_{{{\varvec{a}}_{{\varvec{i}}} }}$ is a row vector for the ith individual with m columns. For individual 1 with marker j = 1,.., m, the element ${\varvec{h}}_{{{\varvec{a}}_{1} {\varvec{j}}}}$ is equal to:

$${\varvec{h}}_{{{\varvec{a}}_{1} {\varvec{j}}}} = \left\{ {\begin{array}{*{20}c} {\begin{array}{*{20}c} { - \left( {p_{Aa} - 2p_{aa} } \right)} \\ { - \left( {1 - p_{Aa} - 2p_{aa} } \right)} \\ \end{array} } \\ { - \left( {2 - p_{Aa} - 2p_{aa} } \right)} \\ \end{array} } \right.{\text{for genotypes}}\;\left\{ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {AA} \\ {Aa} \\ \end{array} } \\ {aa} \\ \end{array} } \right.$$

where $p_{Aa}$ and $p_{aa}$ are the genotypic frequencies for the genotypes Aa and aa in locus A. The term $tr\left( {{\varvec{H}}_{{\varvec{a}}} {\varvec{H}}_{{\varvec{a}}}^{\prime } } \right)/n$ is the trace for the ${\varvec{H}}_{{\varvec{a}}} {\varvec{H}}_{{\varvec{a}}}^{\prime }$ matrix, which standardize ${\varvec{G}}_{{{\varvec{NOIA}}}}$ to a variance equal to one.

I + GA + GAA-model

Our last model, extend I + G_A-model by including an additive-by-additive epistatic term using a genomic relationship matrix based on NOIA parametrization (Alvarez-Castro and Carlborg 2007; Vitezica et al. 2017) as covariance structures. The I + G_A + G_AA-model was defined as:

$${\varvec{y}} = {\varvec{Xb}} + {\varvec{Z}}_{1} {\varvec{l}} + {\varvec{Z}}_{3} {\varvec{g}} + {\varvec{Z}}_{4} {\varvec{h}} + {\varvec{Z}}_{2} {\varvec{f}} + \mathop \sum \limits_{i = 1}^{9} {\varvec{Z}}_{{{\varvec{i}} + 4}} {\varvec{s}} + {\varvec{e}}$$

(4)

where ${\mathbf{X}}$, ${\mathbf{Z}}_{{\mathbf{n}}}$, ${\varvec{b}}$, ${\varvec{l}}$, ${\varvec{f}}$, ${\varvec{s}}$, ${\varvec{g}}$, and ${\varvec{e}}$ are the same as described in the I-model (Eq. 1) and I + G_A-model (Eq. 2); ${\varvec{h}}$ is a vector of epistatic genomic values for the lines with $\user2{h }\sim N\left( {0,{\varvec{H}}_{{{\varvec{NOIA}}}} \sigma_{h}^{2} } \right)$, where $\sigma_{h}^{2}$ is the genomic epistatic variance and ${\varvec{H}}_{{{\varvec{NOIA}}}}$ is the epistatic relationship matrix constructed based on Vitezica et al. (2017):

$${\varvec{H}}_{{{\varvec{NOIA}}}} = \frac{{{\varvec{G}}_{{{\varvec{NOIA}}}} \odot {\varvec{G}}_{{{\varvec{NOIA}}}} }}{{tr\left( {{\varvec{G}}_{{{\varvec{NOIA}}}} \odot G_{NOIA} } \right)/n}}$$

(5)

where the $\odot$ operator represents the Hadamard product between matrices, and the term $tr\left( {{\varvec{G}}_{{{\varvec{NOIA}}}} \odot {\varvec{G}}_{{{\varvec{NOIA}}}} } \right)/n$ is the trace for the ${\varvec{G}}_{{{\varvec{NOIA}}}} \odot {\varvec{G}}_{{{\varvec{NOIA}}}}$ matrix, which standardize ${\varvec{H}}_{{{\varvec{NOIA}}}}$ to a variance one.

As explained in the introduction, the model following NOIA parametrization is equivalent to a EG-BLUP model following VanRaden (2008) coding since in the current scenario, only additive and additive-by-additive effects are considered in the models, and no dominance effect is present in the population (Joshi et al. 2020).

Variance components and heritability

The estimation of variance components (VC) was performed using the Average Information Restricted Maximum Likelihood (AI-REML) algorithm in the DMU software (Madsen and Jensen 2013). The phenotypic variance of the plot ($\sigma_{P}^{2}$) for the I + G_A-model (Eq. 2) was calculated as:

$$\hat{\sigma }_{P}^{2} = \hat{\sigma }_{l}^{2} + \hat{\sigma }_{g}^{2} + \hat{\sigma }_{f}^{2} + 9\hat{\sigma }_{s}^{2} + \hat{\sigma }_{e}^{2}$$

(6)

where $\hat{\sigma }_{l}^{2}$ is the estimated variance of the line that cannot be attributed to the markers; $\hat{\sigma }_{g}^{2}$ is the genomic estimated additive variance; $\hat{\sigma }_{f}^{2}$ is the line by environmental estimated variance; $9\hat{\sigma }_{s}^{2}$ is the estimated spatial variance for an individual plot ($\hat{\sigma }_{s}^{2}$) multiplied by nine, which is the total number of plots considered as random effect for each observation; $\hat{\sigma }_{e}^{2}$ is the estimated variance of residuals. Narrow-sense (Eq. 7) and broad-sense (Eq. 8) plot heritabilities for the I + G_A-model (Eq. 2) were estimated as:

$$\hat{h}^{2} = \hat{\sigma }_{g}^{2} /\hat{\sigma }_{P}^{2}$$

(7)

$$\hat{H}^{2} = \left( {\hat{\sigma }_{l}^{2} + \hat{\sigma }_{g}^{2} } \right)/\hat{\sigma }_{P}^{2}$$

(8)

Additionally, for the I + G_A + G_AA-model (Eq. 4), the estimated epistatic variance ($\hat{\sigma }_{h}^{2}$) was considered in the calculation of broad-sense heritability and total phenotypic variance ($\hat{\sigma }_{P}^{2}$) for this model. For the I-model, only the broad-sense heritability was calculated. The total genetic variance for each model ($\hat{\sigma }_{G}^{2}$) was defined as $\hat{\sigma }_{l}^{2}$ for the I-model, $\hat{\sigma }_{l}^{2}$ + $\hat{\sigma }_{g}^{2}$ for I + G_A-model, and $\hat{\sigma }_{l}^{2}$ + $\hat{\sigma }_{g}^{2}$ + $\hat{\sigma }_{h}^{2}$ for the I + G_A + G_AA-model.

Cross-validation schemes and model validation

The PA ($r_{{\widehat{g,}p}}$) of the models was evaluated using two cross-validation (CV) schemes: (i) leave-one-line-out (LOO), and (ii) leave-one-breeding-cycle-out (LSO) CVs. The LOO CV scheme was used to get the PA with the largest reference population possible and investigate the potential performance of the genetic models on PA. The LOO strategy was performed by masking the phenotype of a single line and using the remaining lines to predict the GEBV and the Genomic Estimated Epistatic Value (GEEV) of the masked line. This methodology was repeated n-times (n = no. of lines = 2060) until all lines were predicted. The LSO CV was used to measure the PA of genetic models in conditions closer to those observed in wheat breeding programs. For LSO the phenotypes from a breeding cycle were masked, and the information from the remaining breeding cycles was used to predict the genetic values. This process was repeated n-times (n = no. of breeding cycles = 7) until all breeding cycles were predicted. The PA was calculated as the Pearson correlation between the vector of all predictions and the lines averages after correcting for the fixed effects. The predicted values were the additive predicted values (predicted GEBVs) for the I + G_A-model and I + G_A + G_AA-model, and the additive (predicted GEBVs) plus epistatic (predicted GEEVs) values for the I + G_A + G_AA-model. The fixed effects were estimated in a model using the complete phenotypic information. The line averages were computed first subtracting the estimates of the fixed effects from each plot observation and then averaging the values of the lines without fixed effect across year-locations and repetitions. To contrast the PA for models in the LOO CV scenario, an ordinary nonparametric bootstrap with replacement based on a sample size equal to n = 2060 (full sample size), and 10,000 replicates was performed. In each bootstrap replication, the PA was recorded until reaching 10,000 bootstrap-based PAs, and the standard error of PAs was obtained. The bootstrap procedure was performed for I + G_A-model and I + G_A + G_AA-model, and a two-tailed paired t-test was used to contrast the bootstrap PAs from both models (significance threshold set at 0.01). The relative difference (RD) in PA between prediction for the additive genetic effect using I + G_A-model (GEBVs) and total genetic effect using I + G_A + G_AA-model (GEBVs + GEEVs) was estimated as: $RD = \frac{{I + G_{A} + G_{AA} \hbox{-}modelr_{{\widehat{g,}p}} - I + G_{A} \hbox{-}modelr_{{\widehat{g,}p}} }}{{ I + G_{A} \hbox{-}modelr_{{\widehat{g,}p}} }}$. The maximum potential PA was calculated for the I + G_A-model and for the GEBVs of the I + G_A + G_AA-model as: $\sqrt {{\text{n}}h^{2} /\left( {1 + \left( {{\text{n}} - 1} \right)h^{2} } \right)}$, where n is the average number of lines repetitions, and for the GEBVs + GEEVs of the I + G_A + G_AA-model using the same equation but with the proportion of total variance explained by additive plus epistatic effects instead of $h^{2}$.

The statistics for bias ($\mu_{wp}$) and variance inflation ($b_{w,p}$) in the predicted genetic values were estimated according to the LR method (Legarra and Reverter 2018). The $\mu_{wp}$ was calculated as $\mu_{wp} = E\left( {\overline{{\widehat{{{\text{u}}_{p} }}}} - \overline{{\widehat{{{\text{u}}_{w} }}}} } \right)$; where $\overline{{\widehat{{{\text{u}}_{p} }}}}$ represents the mean of the genomic estimated values with “partial” (subscript p) information (predictions for all genotypes from CVs when their own phenotypes were masked, e.g., 2060 “partial” dataset of one line and seven “partial” dataset of one breeding cycle were generated for LOO and LSO CVs, respectively), and $\overline{{\widehat{{{\text{u}}_{w} }}}}$ represents the mean of the genomic estimated values with “whole” (subscript w) information (estimations with complete phenotypic information for all genotypes). The statistics $\mu_{wp}$ has an expected value of 0 when the estimations are unbiased. The $b_{w,p}$ was calculated as the regression of estimated values obtained with whole information (subscript w) on the estimated with partial information (subscript p), $b_{w,p} = \frac{{cov\left( {\widehat{{{\text{u}}_{w} }} , \widehat{{{\text{u}}_{p} }}} \right)}}{{var\left( {\widehat{{{\text{u}}_{p} }}} \right)}}$. The statistic $b_{w,p}$ has an expectation $E\left( {b_{w,p} } \right) = 1$ when there is no under- or over-dispersion in the predictions. Additionally, the Pearson correlation was used to compare predictions between models, where the correlation between the estimated values with whole information for the I + G_A-model and the I + G_A + G_AA-model ($\rho_{{I + G_{A} \hbox{-}model_{GEBV} , I + G_{AA} + G_{AA} \hbox{-}model_{GEBV} }}$ and $\rho_{{I + G_{A} \hbox{-}model_{GEBV} , I + G_{A} + G_{AA} \hbox{-}model_{GEEV} }}$) was calculated.

Results

Phenotyping and genotyping

The descriptive statistics for grain yield are presented in Table 1. The average yield was 8.71 kg of grain for an 8.25 m² plot, ranging from 3.85 to 12.35 kg/8.25 m², and the coefficient of variation was 11.27% when using the simple SD of all observations.

Table 1 Descriptive statistics for the yield of F6 wheat breeding lines

Full size table

A total of 10,688 SNPs passed the quality control filters and were used to build the genomic relationship matrices. According to the heat map and the principal component analysis of the G-matrix (Fig. 1), there was no clear separation of breeding cycles. However, there was a trend that lines coming from the first four breeding cycles were more separated by the first principal component from lines coming from last three breeding cycles. The first and second principal components together explained 52.8% of the total variance (40.4 and 12.4% of the variance for first and second principal component, respectively) showing that there are strong relationships between the lines included in the study. The observed level of heterozygosity of the lines had an average value of 2.70% as expected after five generations of selfing.

Variance components and heritability

Three models differing in how the genetic components were treated (Eqs. 1, 2, and, 4) were used to estimate VC and the narrow-sense and broad-sense plot heritabilities (Table 2). The estimates for total phenotypic ($\hat{\sigma }_{P}^{2}$) and error variance ($\hat{\sigma }_{e}^{2}$) were similar for all models. The highest variance was attributed to the genotype-by-environment interaction, which explained around 40% of the total variability. The estimated total genetic variance ($\hat{\sigma }_{G}^{2}$) was largest when the I + G_A-model (0.104) or I + G_A + G_AA-model (0.098) were used, followed by the baseline model with the lowest value (0.089). The models using genomic information captured around 15% more $\hat{\sigma }_{G}^{2}$ compared to the baseline model.

Table 2 Estimation of variance component, narrow-sense, and broad-sense plot heritabilities

Full size table

The partition of total genetic variance $\hat{\sigma }_{G}^{2}$ estimated by the different models is shown in Fig. 2. For the I + G_A-model, the estimated additive variance ($\hat{\sigma }_{g}^{2}$) was approximately half of the total genetic variance $\hat{\sigma }_{G}^{2}$ (48.8%). The partition of the estimated variances for the I + G_A + G_AA-model changed considerably compared to the I + G_A-model. The estimate of additive genetic variances ($\hat{\sigma }_{g}^{2}$ for I + G_A + G_AA-model) was reduced to approximately 20%, and the estimated epistatic variances ($\hat{\sigma }_{h}^{2}$) represented 65.4% of the total genetic variance $\hat{\sigma }_{G}^{2}$. Note that the inclusion of an epistatic term captured much of what had previously been part of the estimated line and additive variances in the I + G_A-model. The reduction in the additive variance when the epistatic effect is included in the model can be seen as a signal of lack of orthogonality between the additive and additive-by-additive genetic effects.

The H² estimate was slightly different for the genomic models I + G_A-model (0.31) and I + G_A + G_AA-model (0.30), and in both cases, it was higher than the I-model (0.28), which did not include genomic information. The narrow-sense heritability estimated (h²) for the I + G_A-model on the plot level had a value of 0.15. For the I + G_A + G_AA-model, h² was not estimated because the estimation of additive variance may be especially affected due to the lack of orthogonality among genetic effects when the epistatic genetic effect is considered in the model.

Genomic prediction

The PA between the lines averages after correcting for fixed effects and the predicted genetic values (${\varvec{r}}_{{\widehat{{{\varvec{g}},}}{\varvec{p}}}}$) was evaluated for the proposed models using LOO and LSO CV schemes (Fig. 3). The I-model was not included in this section because such a model has no PA in CVs due to the model assumptions of independence between lines.

In the LOO CV, the highest PA was observed for prediction of total genetic merit (additive plus epistatic genetic effects), combining the predictions for the additive effect (GEBVs) plus the predictions for the epistatic effect (GEEVs) from the I + G_A + G_AA-model (PA = 0.45). The theoretical maximum PA was also the highest for the I + G_A + G_AA-model when additive plus epistatic predictions were combined (green bars in Fig. 3). The PA of the I + G_A + G_AA-model for total genetic merit (PA = 0.45) was contrasted to the PA of the I + G_A-model for the additive effect (PA = 0.39), and it was significantly different in a two-tailed paired t-test (significance threshold set at 0.01), showing an increase of 16.5% in PA for the I + G_A + G_AA-model. For the LSO CV scheme, the highest PA between predicted genetic values and corrected phenotypes was reached when the GEBVs from the I + G_A-model were used, PA = 0.20, while the PA using the GEBVs plus GEEVs from I + G_A + G_AA-model was 0.19. Nevertheless, the difference in PA between models for LSO CV was not significant in a two-tailed paired t test (significance threshold set at 0.01).

Model validation

The regression coefficient ($b_{w,p}$), used as a test of variance inflation in the predicted genetic effects, was measured as the slope of the regression between observed and predicted values (Fig. 4). In the LOO CV, the $b_{w,p}$ did not present significant under- or over-dispersion since it had values around 1 for both models (Fig. 4a–c). The GEBVs from I + G_A-model and I + G_A + G_AA-model had a $b_{w,p}$ value of 0.99, while the GEEVs presented a value of 1.04. The $b_{w,p}$ was also estimated for the combination of I + G_A + G_AA-model predictions (GEBVs + GEEVs, data not displayed in the plot), which presented an intermediate value of 1.02. In the LSO CV, the $b_{w,p}$ statistic indicates over-dispersion (inflation) for predicted values since it had values below 1 (Fig. 4d–f). The GEBVs from I + G_A-model and I + G_A + G_AA-model had $b_{w,p}$ values of 0.85 and 0.91, respectively, while the GEEVs from the I + G_A + G_AA-model had a lower $b_{w,p}$ value of 0.70. The $b_{w,p}$ for the combination of I + G_A + G_AA-model predictions (GEBVs + GEEVs, data not displayed in the plot) presented an intermediate value of 0.78.

The bias in prediction of genetic values ($\mu_{wp}$) was analyzed following the LR method (Table 3). For both LOO and LSO CVs, all the predictions showed a $\mu_{wp}$ close to 0 for I + G_A-model and I + G_A + G_AA-model, which reflects unbiased estimation for all cases.

Table 3 Estimated bias ($\mu_{wp}$) for predictions of I + G_A-model and I + G_A + G_AA-model

Full size table

Correlation between G-BLUP and NOIA predictions

The additive and epistatic predictions using complete phenotypic information for the I + G_A-model and the I + G_A + G_AA-model were compared using Pearson’s correlation ($\rho_{{G\hbox{-}BLUP_{w} NOIA_{w} }}$). The correlation for GEBVs between I + G_A-model and I + G_A + G_AA-model had a high value of 0.94, while the correlation between GEBVs from I + G_A-model and GEEVs from I + G_A + G_AA-model had a lower value of 0.65. It was also reflected in the ranking of the best lines for the different genetic effects, where 7 of the 10 lines with highest GEBVs were common for predictions of I + G_A-model and I + G_A + G_AA-model, but when GEBV from I + G_A-model and GEEV from I + G_A + G_AA-model were compared, only 3 of 10 lines were common.

Discussion

In this study, we investigated the performance of the NOIA parametrization (I + G_A-model and I + G_A + G_AA-model) in the estimation of VC for a set of advanced wheat breeding lines from the commercial breeding company Nordic Seed A/S. The I + G_A + G_AA-model was not able to achieve an orthogonal estimation of genetic variance components as revealed by the difference of the estimated additive variance between I + G_A-model and I + G_A + G_AA-model. We also investigated the PA for the developed models in two CVs schemes: (i) leave-one-line-out and (ii) leave-one-breeding-cycle-out. We observed a significant increase of 16.5% (P-value < 0.01) in the PA for the LOO CV when I + G_A + G_AA-model was used to predict total genetic merit compared to I + G_A-model predictions. However, the improvement for including epistasis was not observed in the LSO CV, where no significant differences between PA from I + G_A-model and I + G_A + G_AA-model were observed.

Variance components

The partition of genetic variance through the NOIA parametrization led to problems of non-orthogonality of genetic effects. The clearest signal of lack of orthogonality was observed in the difference of the estimated additive variance between I + G_A-model and I + G_A + G_AA-model. When the epistatic effect was present in the I + G_A + G_AA-model, it caused a considerable reduction in the additive variance (58.4% of reduction) compared to the I + G_A-model estimation. The non-orthogonal partition of genetic variances can most likely be caused as result of a mix between lack of independence of causal effect, lack of independence of markers (both influenced by LD), and for having linked markers instead of causative mutations (Wang and Zeng 2006; Hill and Mäki‐Tanila 2015; Vitezica et al. 2017). The lack of orthogonality of genetic effects can be also evidenced in the high negative correlation (−0.36) among additive and additive-by-additive epistatic variance component estimates for the I + G_A + G_AA-model. For an orthogonal partition of variance into genetic components, the correlation between the variance component estimates is expected to be close to 0 (correlation of zero indicates independence between model effects). These results have also been consistent with the simulation study performed by Vitezica et al. (2017), where they tested the performance of the NOIA parametrization for an LD simulated population, and concluded that VC were wrongly estimated. In our study, negative correlations among genetic variance estimates were also observed between the line and additive effect for the I + G_A-model (−0.44) and line and epistatic effects for the I + G_A + G_AA-model (−0.74). These trends are expected since the line (l) effect in the I-model can be seen as a mix of additive and non-additive effects. Therefore, when the additive effect is included in the I + G_A-model, it takes the proportion of additive variance explained by SNPs. Then, the line covariance of l in the I + G_A-model can be interpreted as an estimate of remaining non-additive effects which can be partially captured by the epistatic effect in the I + G_A + G_AA-model.

Narrow and broad-sense heritability

The interpretation of the h² is strongly related to the orthogonality of the estimated genetic variances. When additive and non-additive genetic effects are considered in genomic models, the lack of orthogonality affects the estimation of h². Due to this issue, we analyzed h² only for the I + G_A-model, which does not include the definition of a genomic epistatic term. Note that in our study, we have approached the heritability calculations considering the line effect (${\varvec{l}}$) in the models. This approach was used in order to have control for the genetic factors (additive and non-additive) that are not captured by the markers in the genomic terms. The h² estimated using the I + G_A-model was 0.15, representing around half of the total genetic variation, and H² was 0.31 to 0.30 for I + G_A-model and I + G_A + G_AA-model, respectively. The difference between h² and H² is given by non-additive variance and by remaining additive variance not captured by the markers (e.g., due to imperfect LD between markers and QTLs). The sizable difference between h² and H² may suggest a significant non-additive effect for wheat grain yield in the analyzed population, which also agrees with prior expert-knowledge from the breeding company.

Genomic predictive ability

The PAs estimated as the correlation between the line averages after correcting for fixed effects and the predicted values of the I + G_A-model and the I + G_A + G_AA-model were estimated for the LOO and LSO CV schemes. Note that while the LOO CV is useful for model comparison and investigating the potential PA of genetic models, this scheme provides higher PAs than expected for breeding situations (Shao 1993; Kohavi 1995); conversely, the LSO CV better reflects the conditions in a breeding scenario where new lines must be predicted before phenotypes are obtained. In the LOO CV, the PA for I + G_A + G_AA-model combining the predictions for the additive effect (GEBVs) plus the predictions for the epistatic effect (GEEVs) outperformed the I + G_A-model PA using GEBVs with a significant (P-value: 0.01) increase of 16.5%. However, the improvement in PA for including epistasis was not observed when the LSO CV was used. The differences in the performance of models in the LOO and LSO CVs indicates a strong influence of relationships among individuals from the reference and validation population over the PA, as close relatives like full sibs are excluded in the LSO scenario. A possible explanation for the effect of genetic relationships on the performance of epistatic predictions could be related to the fact that the additive-by-additive effect is the result of a pairwise interaction, and it is more likely that the pairs involved in the interaction are present in close relatives as usually happen in the LOO CV but not in the LSO CV. Another factor that could be affecting the predictive performance is a weak LD for the additive-by-additive effects; while for the additive effect of a gene, the LD depends on the genetic distance between the gene and the linked marker, for the epistatic effect of a pair of genes, the LD depends on the product of the genetic distance between each gene of the pair and their linked markers, which may result in poorer predictive performance when relationship in reference population are lower.

In the literature, the value of including epistasis in GP has been population dependent and has varied among studies. While in some studies the PA increased (Heslot et al. 2012; He et al. 2016), in others, it changed very little (Jarquín et al. 2014) or even decreased (Lorenzana and Bernardo 2009). Increases in PA ranging from 4 to 25% have been found for random folds CV (fivefold or tenfold) when shifting from additive to additive plus epistatic effects models in wheat (Crossa et al. 2010; Heslot et al. 2012; Jiang and Reif 2015; He et al. 2016), which agrees with the range of improvement found in our study for the LOO CV. Recently, Schrauf et al. (2020) found a better PA for non-additive models even when non-additive variance was expected to be low. They attributed this improvement to a better capacity of epistatic models to capture additive variance (of causal loci) associated with non-additive apparent effects (on markers) at low marker densities (“Phantom epistasis”). These authors have warned on the risk of over-interpretation of the biologically functional meaning of estimated statistical parameters. While a straightforward biological interpretation is to relate the highest PA of epistatic models to an underlying genetic architecture of substantial additive-by-additive epistasis, it could also reflect “Phantom epistasis” due to incomplete LD due to low marker density. Contrasting these results, Lorenzana and Bernardo (2009) using a fivefold CV found a poorer performance for predictions when the model accounted for additivity and epistasis in comparison with a model accounting only for additivity. The discrepancies among the results found may be explained by differences in marker density, the level of additive-by-additive epistasis among the evaluated populations. Forneris et al. (2017) explored the effect of including epistasis in the evaluation model (knowing the causal mutations), and they reported that including epistasis in the models when there was none led to lower prediction accuracies.

Beyond the discussion of whether the improvement in PA comes from a real reflection of additive-by-additive epistasis or from apparent epistasis, this does not undervalue the potential of epistatic models to improve GS. Therefore, the statistical advantage of improving GS is recognized and the use is encouraged. In addition, the expert knowledge about the genetic architecture of the trait as well as the type of population and species may be relevant factors to determine the potential of including epistasis in GS.

Inflation of variance and bias

The test for variance inflation in the predicted genetic effects, calculated as the regression of estimated values with whole information on estimated with partial information ($b_{w,p}$), led to regression coefficients close to 1 for the LOO CV, which means that none of the proposed models had a significant under- or over-dispersion in their predictions. Note that the LOO CV represents an optimal scenario due to the use of the largest possible reference population for predictions, and therefore, under- or over-dispersion in predictions of genetic values is in general not observed. In the LSO, values of $b_{w,p}$ lower than 1 were observed for predictions of both genetic effects (GEBVs and GEEVs), indicating over-dispersion of genomic predicted values. Particularly, predictions of epistatic values (GEEVs) for the I + G_A + G_AA-model had the lowest $b_{w,p}$ value ($b_{w,p}$ = 0.70), suggesting that the epistatic predictions were more sensitive to the lack of information in the reference population. The bias ($\mu_{wp}$) of predictions had coefficients close to 0 for GEBVs and GEEVs in both CVs utilized; it indicates that unbiased genomic values were reached for all proposed models.

Correlation between G-BLUP and NOIA estimates

We found that Pearson’s correlation between GEBVs from I + G_A-model and I + G_A + G_AA-model was high (0.94) compared to the correlation between GEBVs and genomic estimated epistatic values (0.65). Accordingly, differences were also evidenced in a change of ranking between lines with superior additive value (based on GEBVs) and lines with superior total genetic value (based on GEBVs plus genomic estimated epistatic values), indicating that the use of I + G_A + G_AA-model to predict lines with higher total genetic value led to a different selection of candidate lines than using the I + G_A-model. These differences could be exploited by addressing the selection of crossing parents based on the I + G_A-model predictions and commercial varieties based on the I + G_A + G_AA-model predictions.

As reflected by the LOO CV, the current study confirms the potential of increasing the PA for total genetic merit by including epistasis in GS models. Importantly, the differences found for I + G_A-model and I + G_A + G_AA-model must not be interpreted as exclusive for the NOIA parametrization since other codings such as for EG-BLUP based on Su et al. (2012) or Martini et al. (2016) are equivalent in the current scenario (Joshi et al. 2020). Further studies are required to: (i) investigate the influence of genetic relationships on the performance of epistatic predictions and develop CVs schemes that allow to capitalize the benefit of epistatic models in wheat breeding programs, (ii) develop breeding programs that consider more elaborate mating schemes in order to improve the genetic relationships between breeding cycles, and (iii) develop a GP model in which the inclusion of pairwise interaction effects has minimal impact on the estimates of additive effects and their variance.

Conclusions

In this research, we found that the orthogonal partition of genetic variances into additive and additive-by-additive epistatic effects was not possible. Nevertheless, including additive-by-additive epistasis in a genomic prediction model increased predictive ability for total genetic merit significantly (16.5%) compared to an additive genomic-based model in a leave-one-line-out cross-validation. The advantage of including epistasis in predictive ability was not observed for a leave-one-breeding-cycle-out cross-validation. Further studies are required to: (i) investigate the influence of genetic relationships on the performance of epistatic predictions and develop CVs schemes that allow to capitalize the benefit of epistatic models in wheat breeding programs, (ii) develop breeding programs that consider more elaborate mating schemes in order to improve the genetic relationships between breeding cycles, and (iii) develop a GP model in which the inclusion of pairwise interaction effects has minimal impact on the estimates of additive effects and their variance.

Data availability

The datasets analyzed during the current study are available in the Harvard dataverse public repository at the following link: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ULGTGT.

Abbreviations

ADD:: Predicted additive values
CV:: Cross-validation
DK:: Denmark
EG-BLUP:: Extended genomic best linear unbiased prediction
EPI:: Predicted epistatic values
G-BLUP:: Genomic best linear unbiased prediction
GEBV:: Genomic estimated breeding value
GEEV:: Genomic estimated epistatic value
GP:: Genomic prediction
GS:: Genomic selection
HWE:: Hardy–Weinberg equilibrium
LD:: Linkage disequilibrium
LE:: Linkage equilibrium
LOO:: Leave-one-line-out
LSO:: Leave-one-breeding-cycle-out
MAF:: Minor allele frequency
NOIA:: Natural and orthogonal interactions approach
PA:: Predictive ability
RD:: Relative difference
SNP:: Single nucleotide polymorphism
VC:: Variance components
WGR:: Whole genome regression

References

Alvarez-Castro JM, Carlborg O (2007) A unified model for functional and statistical epistasis and its application in quantitative trait Loci analysis. Genetics 176:1151–1167. https://doi.org/10.1534/genetics.106.067348
Article PubMed PubMed Central Google Scholar
Carlborg Ö, Haley CS (2004) Epistasis: too often neglected in complex trait studies? Nat Rev Genet 5:618–625
Article CAS Google Scholar
Cericola F, Jahoor A, Orabi J, Andersen JR, Janss LL, Jensen J (2017) Optimizing training population size and genotyping strategy for genomic prediction using association study results and pedigree information. A case of study in advanced wheat breeding lines. PLoS One 12:e0169606. https://doi.org/10.1371/journal.pone.0169606
Article CAS PubMed PubMed Central Google Scholar
Crossa J et al (2010) Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186:713–724. https://doi.org/10.1534/genetics.110.118521
Article CAS PubMed PubMed Central Google Scholar
Crossa J, Pérez P, de los Campos G, Mahuku G, Dreisigacker S, Magorokosho C (2011) Genomic selection and prediction in plant breeding. J Crop Improvement 25:239–261. https://doi.org/10.1080/15427528.2011.558767
Article Google Scholar
Crossa J et al (2017) Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci 22:961–975. https://doi.org/10.1016/j.tplants.2017.08.011
Article CAS PubMed Google Scholar
Fisher RA (1918) The correlation between relatives on the supposition of mendelian inheritance. Trans R Soc Edinb 53:399–433
Google Scholar
Forneris NS, Vitezica ZG, Legarra A, Perez-Enciso M (2017) Influence of epistasis on response to genomic selection using complete sequence data. Genet Sel Evol 49:66. https://doi.org/10.1186/s12711-017-0340-3
Article CAS PubMed PubMed Central Google Scholar
Gianola D, Rosa GJM (2015) One hundred years of statistical developments in animal breeding. Annu Rev Anim Biosci 3:19–56. https://doi.org/10.1146/annurev-animal-022114-110733
Article PubMed Google Scholar
Habier D, Fernando RL, Dekkers JC (2007) The impact of genetic relationship information on genome-assisted breeding values. Genetics 177:2389–2397
Article CAS Google Scholar
He S et al (2016) Genomic selection in a commercial winter wheat population. Theor Appl Genet 129:641–651. https://doi.org/10.1007/s00122-015-2655-1
Article CAS Google Scholar
Henderson C (1985) Best linear unbiased prediction of nonadditive genetic merits in noninbred populations. J Anim Sci 60:111–117
Article Google Scholar
Heslot N, Yang H-P, Sorrells ME, Jannink J-L (2012) Genomic selection in plant breeding: a comparison of models. Crop Sci 52:146–160. https://doi.org/10.2135/cropsci2011.06.0297
Article Google Scholar
Hill W, Mäki-Tanila A (2015) Expected influence of linkage disequilibrium on genetic variance caused by dominance and epistasis on quantitative traits. J Anim Breed Genet 132:176–186
Article CAS Google Scholar
Hu Z, Li Y, Song X, Han Y, Cai X, Xu S, Li W (2011) Genomic value prediction for quantitative traits under the epistatic model. BMC Genet 12:15
Article Google Scholar
Jarquín D, Kocak K, Posadas L, Hyma K, Jedlicka J, Graef G, Lorenz A (2014) Genotyping by sequencing for genomic prediction in a soybean breeding population. BMC Genom 15:1–10
Article Google Scholar
Jiang Y, Reif JC (2015) Modeling epistasis in genomic selection. Genetics 201:759–768. https://doi.org/10.1534/genetics.115.177907
Article CAS PubMed PubMed Central Google Scholar
Jiang Y, Schmidt RH, Zhao Y, Reif JC (2017) A quantitative genetic framework highlights the role of epistatic effects for grain-yield heterosis in bread wheat. Nat Genet 49:1741–1746. https://doi.org/10.1038/ng.3974
Article CAS PubMed Google Scholar
Joshi R, Meuwissen THE, Woolliams JA, Gjoen HM (2020) Genomic dissection of maternal, additive and non-additive genetic effects for growth and carcass traits in Nile tilapia. Genet Sel Evol. https://doi.org/10.1186/s12711-019-0522-2
Article PubMed PubMed Central Google Scholar
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. Ijcai, vol 2. Montreal, Canada, pp 1137–1145
Google Scholar
Kristensen PS, Jahoor A, Andersen JR, Orabi J, Janss L, Jensen J (2019) Multi-trait and trait-assisted genomic prediction of winter wheat quality traits using advanced lines from four breeding cycles. Crop Breed Genet Genom 1(2):e190010. https://doi.org/10.20900/cbgg20190010
Article Google Scholar
Legarra A, Reverter A (2018) Semi-parametric estimates of population accuracy and bias of predictions of breeding values and future phenotypes using the LR method. Genet Sel Evol 50:53. https://doi.org/10.1186/s12711-018-0426-6
Article PubMed PubMed Central Google Scholar
Lorenzana RE, Bernardo R (2009) Accuracy of genotypic value predictions for marker-based selection in biparental plant populations. Theor Appl Genet 120:151–161. https://doi.org/10.1007/s00122-009-1166-3
Article PubMed Google Scholar
Mackay TF (2014) Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nat Rev Genet 15:22–33. https://doi.org/10.1038/nrg3627
Article CAS PubMed Google Scholar
Madsen P, Jensen J (2013) An user’s guide to DMU, Version 6, Release 5.1. Center for Quantitative Genetics and Genomics. In: Dept. of Molecular Biology and Genetics, University of Aarhus. Research Centre Foulum Tjele, Denmark
Martini JW, Wimmer V, Erbe M, Simianer H (2016) Epistasis and covariance: how gene interaction translates into genomic relationship. Theor Appl Genet 129:963–976. https://doi.org/10.1007/s00122-016-2675-5
Article CAS PubMed Google Scholar
Martini JW, Gao N, Cardoso DF, Wimmer V, Erbe M, Cantet RJ, Simianer H (2017) Genomic prediction with epistasis models: on the marker-coding-dependent performance of the extended GBLUP and properties of the categorical epistasis model (CE). BMC Bioinform 18:3. https://doi.org/10.1186/s12859-016-1439-1
Article Google Scholar
Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829
Article CAS Google Scholar
Michel S et al (2017) Genomic assisted selection for enhancing line breeding: merging genomic and phenotypic selection in winter wheat breeding programs with preliminary yield trials. Theor Appl Genet 130:363–376
Article CAS Google Scholar
Papoulis A, Pillai SU (2002) Probability, random variables and stochastic processes, 4th edn. McGraw-Hill Education
Poland J et al (2012) Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome. https://doi.org/10.3835/plantgenome2012.06.0006
Article Google Scholar
Rogers SO, Bendich AJ (1985) Extraction of DNA from milligram amounts of fresh, herbarium and mummified plant tissues. Plant Mol Biol 5:69–76
Article CAS Google Scholar
Schrauf MF et al (2020) Phantom epistasis in genomic selection: on the predictive ability of epistatic models. G3 Genes Genom Genet 10:3137–3145
CAS Google Scholar
Shao J (1993) Linear model selection by cross-validation. J Am Stat Assoc 88:486–494
Article Google Scholar
Su G, Christensen OF, Ostersen T, Henryon M, Lund MS (2012) Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. PLoS One 7:e45293. https://doi.org/10.1371/journal.pone.0045293
Article CAS PubMed PubMed Central Google Scholar
Tessema BB, Liu H, Sørensen AC, Andersen JR, Jensen J (2020) Strategies using genomic selection to increase genetic gain in breeding programs for wheat. Front Genet. https://doi.org/10.3389/fgene.2020.578123
Article PubMed PubMed Central Google Scholar
Tsai HY et al (2020) Use of multiple traits genomic prediction, genotype by environment interactions and spatial effect to improve prediction accuracy in yield data. PLoS One 15:e0232665. https://doi.org/10.1371/journal.pone.0232665
Article CAS PubMed PubMed Central Google Scholar
VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423
Article CAS Google Scholar
Vitezica ZG, Legarra A, Toro MA, Varona L (2017) Orthogonal estimates of variances for additive, dominance, and epistatic effects in populations. Genetics 206:1297–1307. https://doi.org/10.1534/genetics.116.199406
Article PubMed PubMed Central Google Scholar
Wang T, Zeng ZB (2006) Models and partition of variance for quantitative trait loci with epistasis and linkage disequilibrium. BMC Genet 7:9. https://doi.org/10.1186/1471-2156-7-9
Article CAS PubMed PubMed Central Google Scholar
Wang S et al (2014) Characterization of polyploid wheat genomic diversity using a high-density 90 000 single nucleotide polymorphism array. Plant Biotechnol J 12:787–796
Article CAS Google Scholar
Wellmann R, Bennewitz J (2012) Bayesian models with dominance effects for genomic evaluation of quantitative traits. Genet Res 94:21–37
Article Google Scholar
Xu S (2013) Mapping quantitative trait loci by controlling polygenic background effects. Genetics 195:1209–1222. https://doi.org/10.1534/genetics.113.157032
Article CAS PubMed PubMed Central Google Scholar
Zeng ZB, Wang T, Zou W (2005) Modeling quantitative trait Loci and interpretation of models. Genetics 169:1711–1725. https://doi.org/10.1534/genetics.104.035857
Article CAS PubMed PubMed Central Google Scholar
Zhao Y et al (2015) Genome-based establishment of a high-yielding heterotic pattern for hybrid wheat breeding. Proc Natl Acad Sci 112:15624–15629. https://doi.org/10.1073/pnas.1514547112
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank Aarhus University for providing financial support to authors MR, XG, HL, and JJ, and the commercial partner Nordic Seed A/S breeding company for PS, JRA, JO and AJ funding, as well as for providing phenotypic and genotypic data. We also thank the three anonymous reviewers for their valuable comments on our work. Financial support from the National Research and Innovation Agency of Uruguay for the PhD scholarship of author MR (code POS_EXT_2018_1_154284) is also greatly acknowledged.

Funding

This research received funding from the National Research and Innovation Agency (POS_EXT_2018_1_154284).

Author information

Authors and Affiliations

Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
Miguel Angel Raffo, Pernille Sarup, Xiangyu Guo, Huiming Liu & Just Jensen
Nordic Seed A/S, Galten, Denmark
Pernille Sarup, Jeppe Reitan Andersen, Jihad Orabi & Ahmed Jahoor
Department of Plant Breeding, The Swedish University of Agricultural Sciences, Uppsala, Sweden
Ahmed Jahoor

Authors

Miguel Angel Raffo
View author publications
You can also search for this author in PubMed Google Scholar
Pernille Sarup
View author publications
You can also search for this author in PubMed Google Scholar
Xiangyu Guo
View author publications
You can also search for this author in PubMed Google Scholar
Huiming Liu
View author publications
You can also search for this author in PubMed Google Scholar
Jeppe Reitan Andersen
View author publications
You can also search for this author in PubMed Google Scholar
Jihad Orabi
View author publications
You can also search for this author in PubMed Google Scholar
Ahmed Jahoor
View author publications
You can also search for this author in PubMed Google Scholar
Just Jensen
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Conceptualization: JJ, MR, PS, JRA. Data curation: MR, JRA, PS, JO. Formal analysis: MR. Funding acquisition: JJ, MR, JRA, JO, AJ. Investigation: MR, JJ. Methodology: MR, JJ, XG. Project administration: MR, JJ. Resources: JJ, PS. Software: MR. Supervision: JJ, PS. Validation: MR, HL. Visualization: MR. Writing—original draft: MR. Writing—review and editing: JJ, PS, HL, XG, JRA, JO, AJ.

Corresponding author

Correspondence to Miguel Angel Raffo.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding authors state that there is no conflict of interest.

Additional information

Communicated by Mikko J. Sillanpää.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 78 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Raffo, M.A., Sarup, P., Guo, X. et al. Improvement of genomic prediction in advanced wheat breeding lines by including additive-by-additive epistasis. Theor Appl Genet 135, 965–978 (2022). https://doi.org/10.1007/s00122-021-04009-4

Download citation

Received: 12 April 2021
Accepted: 29 November 2021
Published: 01 January 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s00122-021-04009-4

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Improvement of genomic prediction in advanced wheat breeding lines by including additive-by-additive epistasis

Abstract

Key message

Abstract

Similar content being viewed by others

Genome-wide mapping and prediction suggests presence of local epistasis in a vast elite winter wheat populations adapted to Central Europe

Genomic prediction of agronomic traits in wheat using different models and cross-validation designs

Choosing the right tool: Leveraging of plant genetic resources in wheat (Triticum aestivum L.) benefits from selection of a suitable genomic prediction model

Introduction

Materials and methods

Experimental data

Genotyping

Statistical models

I-model (Baseline)

I + GA-model

I + GA + GAA-model

Variance components and heritability

Cross-validation schemes and model validation

Results

Phenotyping and genotyping

Variance components and heritability

Genomic prediction

Model validation

Correlation between G-BLUP and NOIA predictions

Discussion

Variance components

Narrow and broad-sense heritability

Genomic predictive ability

Inflation of variance and bias

Correlation between G-BLUP and NOIA estimates

Conclusions

Data availability

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Supplementary Information

Supplementary file1 (DOCX 78 KB)

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation