Introduction

How humans choose candidates for selection has evolved over time. At first, selection based solely on phenotypic observation of progeny was next assisted by pedigree-based predictions, more particularly in animals (Henderson 1975; Falconer and Mackay 1996). Nowadays, genomic predictions of individual genetic values (Whittaker et al. 2000; Meuwissen et al. 2001) and cross values (Schnell and Utz 1975; Zhong and Jannink 2007; Lehermeier et al. 2017; Danguy des Déserts et al. 2023) are promising and open up new prospects for improving mating design and optimizing long-term genetic gain (Allier et al. 2019a, 2019b).

In plants, crosses are essentially selected to secure a high mean yield performance of the progeny. Breeders seek to identify which crosses will produce the best superior progeny in order to ensure genetic gain in the next selection cycle, and to maintain long-term genetic gain. It is thus important that the selected crosses also generate a high progeny genetic variance. A strategy to find these crosses is to select on the usefulness criterion (\(\text{UC}\)) (Schnell and Utz 1975; Zhong and Jannink 2007) defined as: \(\text{UC}=\text{PM}+i\times \text{SD}\), where \(\text{PM}\) is the parental mean, \(i\) the selection intensity corresponding to the fraction \(q\) of selected progenies, and \(\text{SD}\) the progeny standard deviation (square root of progeny variance). PM can be estimated from the mean of additive parental genetic values using their phenotypes or genomic estimated breeding value (GEBV). Several attempts have been made to predict progeny variance using phenotypic distances (Utz et al. 2001) and genetic distances (Bohn et al. 1999; Hung et al. 2012), but without success. According to the recent methodological studies, genomic predictions appear to be a promising tool for optimizing complementarity between parents, as well as short- and long-term genetic gains in breeding schemes.

Many factors can influence genomic prediction accuracy (Liu et al. 2018). This was extensively demonstrated for GEBV (see Elsen 2022 for a review), and to a lesser extent for progeny standard deviation (\(\text{SD}\)), and for usefulness criterion (\(\text{UC}\)) which combines PM and SD. The accuracy of these predictions influences the relative efficiency of selection using \(\text{UC}\). A first factor influencing genomic prediction accuracy is the genetic architecture of traits, which includes the number of QTL and the distribution of linkage disequilibrium over the genome. Zhong and Jannink (2007) demonstrated algebraically that genetic gain using \(\text{UC}\) criteria decreases when the number of QTL increases in plants. In dairy cattle, Santos et al. (2019) demonstrated a higher \(\text{UC}\) efficiency relatively to classical selection based on GEBV when the trait is controlled by 200 QTL rather than 20. A second factor influencing genomic prediction ability is the number of progenies. The progeny standard deviation (\(\text{SD}\)) estimated algebraically, or by simulation, may be correct for an ideal offspring population of infinite size, but in practice the number of progenies is limited. A third factor is the precision of the marker effects estimation which depends (1) on the size of the reference population (the set of individuals both phenotyped and genotyped) and (2) on the statistical method used when estimating these marker effects. The effects of the size and characteristics of the reference population have been extensively studied for the estimation of GEBV (Hayes et al. 2009; Calus 2010; Goddard et al. 2011; Erbe et al. 2012; Lee et al. 2017), showing that the precision increases when the reference population is bigger, more diverse and genetically closer to the set of selection candidate individuals. A huge literature deals with comparisons of statistical prediction methods of marker effects and GEBV precision. The list includes genomic best linear unbiased prediction (GBLUP) (Habier et al. 2007), RR-BLUP, Bayesian methods such as Bayes A, B, C, etc. (Meuwissen et al. 2001; Gianola et al. 2009), partial least square and least absolute shrinkage and selection operator (Usai et al. 2009). If accuracies of the different genomic prediction methods are similar when the trait depends on a large number of genes with small effect (infinitesimal model), Bayesian approaches perform better when the trait is controlled by QTL with very different effects.

The precision of the progeny standard deviation (\(\text{SD}\)) estimations not only depends on the marker effects estimation method, but also on the way they are processed. An algebraic formula to predict inbred progeny variance derived from a cross between two inbred lines was provided by Lehermeier et al. (2017) based on marker effect estimates and their co-segregation in progeny. In this paper, to this prediction described as the “Variance of posterior mean” (VPM) was compared a “Posterior mean variance” (PMV) estimator which, using an MCMC method, takes into account the error in marker effect estimates and gives higher genetic gain. In the present paper, we use an algebraic interpretation of their PMV and we propose a more advanced estimator which considers the fact that the uncertainty of the estimation of marker effects is modulated by the genomic composition of each parent as well.

Those elements influencing estimation precisions were considered in the simulations we performed to better understand the parameters impacting our predictive ability of \(\text{PM}\), \(\text{SD}\), and \(\text{UC}\).

Several other studies showed that selecting on criteria based on progeny variance such as UC could actually increase the genetic gain in plants and in animals (Tiede et al. 2015; Allier et al. 2019a, 2019b; Bijma et al. 2020). But, to our knowledge, although the ability to predict the value of a cross has only been validated by simulations, just a few publications have validated the ability to predict UC compared to real data (Lian et al. 2015; Neyhart et al. 2019; Osthushenrich et al. 2017), none on wheat, and no publications tested the ability of prediction of SD.

So this study was carried out to (i) estimate the ability of genomic prediction of three cross value components (PM, SD, and UC) in experimental data, (ii) to identify the factors influencing this predictive ability based on simulations, (iii) to test if we could improve SD estimation by taking into account the error of the estimation of marker effects and the genomic composition of each parent. We were able to compare cross value components predictions and experimental observations for 101 crosses with an average of 55 unselected progenies each.

Materials and methods

The objective of the study was to compare genomic predicted values of parental/progeny mean (PM), parental gametic variance/standard deviation (SD) and mean of top progenies (UC: usefulness criterion) to experimental observations of winter bread wheat. We evaluated four traits (yield, grain protein content, plant height, and heading date) for 101 crosses with 55 progenies on average for 3 years.

In order to compare our results to a reference and to better understand the parameters influencing our predictive ability, we conducted simulations based on the same crosses, using the same parental genotypes.

Materials

The training population (TP)

TP composition

The TP was composed of 2,146 INRAE-AO (the French research institute Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement and its subsidiary company Agri-Obtentions) breeding lines and 650 French registered varieties evaluated by GEVES (Groupe d'Étude et de contrôle des Variétés Et des Semences). Data were collected between 2000 and 2022 in two French breeding networks (North and South). Each year, the lines were phenotyped in up to 11 locations for INRAE-AO and 15 for GEVES in split-plot trials with two repetitions. In each location, four registered varieties were used as controls for several consecutive years. Those controls evolved gradually during 20 years (only one control can be changed on one specific year). In total, up to 169 (42) lines were observed each year for the INRAE-AO (GEVES) dataset. For two consecutive years, up to 76 (40) lines were in common for the INRAE-AO (GEVES) dataset. We report here data for yield, grain protein content, plant height and heading date under high-yield crop management (optimized pesticide, fungicide and nitrogen amount) (Supplementary Tables S1–S9).

TP adjusted phenotypes

Trial experts removed blocks presenting obvious experimental problems and outliers. In each trial, we modeled spatial heterogeneity when field coordinates where available. Rows and columns were modelized as random effects, and field trends were modelized as a smooth surface as implemented in the 2-dimensional P-spline mixed model of the SpATS v1.0.16 R package (Rodríguez-Álvarez et al. 2018) (Supplementary Table S1). Genotype was set as a fixed term to estimate genotypic BLUEs. Note that it has been proven that when there was no spatial variation, this model did not perform worse than not including a spatial effect (Selle et al. 2019). In the other trials, we applied arithmetic means to replicates to get genotypic values. We then calculated best linear unbiased estimator (BLUE) values for each line of the TP using the following model:

$$y_{{ij}} ~ = \mu + g_{i} + e_{j} + \varepsilon _{{ij}}$$

where \(y_{{ij}}\) is the adjusted mean for line \(i\) in environment (year and location intersection)\(j\), \(\mu\) the grand mean, \({g}_{i}\) the fixed effect of line \(i\), \({e}_{j}\) the random effect of environment \(j\), and \({\varepsilon }_{ij}\) the random effect of residual. The model was implemented using the lme4 R package. We considered these BLUE values as phenotypes in further analyses (Supplementary Table S1).

TP genotypes

The 2,146 lines were genotyped using a 35 K SNP (single nucleotide polymorphisms) chip representative of the TaBW280K array (Rimbert et al. 2018; Ben-Sadoun et al. 2020). After quality filtering for minor allele frequency below 0.01 and for missing call rate superior to 0.9, we used 19,065 markers. We imputed missing genotypes with the algorithm implemented in BEAGLE v5.3 (Browning et al. 2018) using the genetic positions previously estimated for a West-European bread wheat population (Danguy des Déserts et al. 2021). The total number of lines with genotypes and phenotypes was 2,146, 2,062, 2,126 and 2,145 for yield, grain protein content, plant height and heading date, respectively (Supplementary Table S1).

TP variance components

First, we estimated repeatability of each trait, using the following model:

$$y_{{ijk}} = \mu + g_{i} + e_{j} + \left( {g \times e} \right)_{{ij}} + \varepsilon _{{ijk}}$$

where \(y_{{ijk}}\) is the raw observations for line \(i\) in environment (year and location intersection) \(j\), repetition \(k\), \(\mu\) the grand mean, \({g}_{i}\) the effect of line \(i\), \({e}_{j}\) the effect of environment \(j\), \({\left(g\times e\right)}_{ij}\) the effect of the interaction between line \(i\) and environment \(j\), and \(\varepsilon\) the effect of residual. All effects were considered as random. The model was implemented using the lme4 R package. We then computed repeatability (Piepho and Möhring 2007) at the plot and design levels as:

$$\widehat{{{\text{rep}}}}_{{{\text{plot}}}} = \frac{{\hat{\sigma }_{g}^{2} }}{{\hat{\sigma }_{g}^{2} + \hat{\sigma }_{{\left( {g \times e} \right)}}^{2} + \hat{\sigma }_{\varepsilon }^{2} }}$$
$$\widehat{{{\text{rep}}}}_{{{\text{design}}}} = \frac{{\hat{\sigma }_{g}^{2} }}{{\hat{\sigma }_{g}^{2} + \frac{{\hat{\sigma }_{{\left( {g \times e} \right)}}^{2} }}{{{\text{nb}}\_{\text{env}}}} + \frac{{\hat{\sigma }_{\varepsilon }^{2} }}{{{\text{nb}}\_{\text{rep}}}}}}$$

where \({\widehat{\sigma }}_{g}^{2}\), \({\widehat{\sigma }}_{\left(g\times e\right)}^{2}\) and \({\widehat{\sigma }}_{\varepsilon }^{2}\) are the line, the interaction between line and environment and the residual estimated variances, respectively, and \(\text{nb}\_\text{env}\) and \(\text{nb}\_\text{rep}\) are the average number of environments and replications per line, respectively.

Next, we estimated genomic heritability of each trait with a RR-BLUP (Ridge Regression Best Linear Unbiased Predictor) approach, using the following model, implemented in the rrBLUP v4.6.1 R package (Endelman 2011):

$$\varvec{y} = \varvec{\mu } + \varvec{X\beta } + \varvec{r}$$

where \(\varvec{y}\) is the vector of BLUE values, \({\varvec{\mu}}\) the grand mean, \({\varvec{\beta}}\) a vector of random SNP effects assumed to be normally distributed \(N\left(0,{{\varvec{I}}\sigma }_{\beta }^{2}\right)\), with its matrix of incidence \({\varvec{X}}\), and \({\varvec{r}}\) the vector of random residual effects assumed to be normally distributed \(N\left(0,{{\varvec{I}}\sigma }_{r}^{2}\right)\). We then computed genomic heritability (Nyquist and Baker 1991; Falconer and Mackay 1996) as:

$$\hat{h}_{{{\text{genomic}}}}^{2} = \frac{{\hat{\sigma }_{g}^{2} }}{{\hat{\sigma }_{g}^{2} + \hat{\sigma }_{r}^{2} }}$$

where \(\hat{\sigma }_{g}^{2} = \hat{\sigma }_{\beta }^{2} \times 2\mathop \sum \limits_{{i = 1}}^{n} p_{i} q_{i}\), with \({p}_{i}\) being the allele frequency of SNP\(i\),\({ q}_{i}=1-{p}_{i}\), and \(n\) the total number of SNP.

Predictive ability of genome estimated breeding values (GEBV) in TP

We assessed the predictive ability in the TP using a tenfold cross-validation procedure iterated 10 times. In each iteration, the TP was split into 10 equal sets, each containing 1/10-th of the TP, which was predicted by the remaining 9/10-ths of the TP. The predictive abilities of individual models are expressed as the average between the GEBV and observed phenotypic values in the validation set across all tenfold. The process was repeated 10 times with 10 new independent sets being sampled each replication, in which case the reported predictive abilities are averages across all folds and replications. This cross-validation procedure is implemented in the PopVar v1.3.0 R package (Mohammadi et al. 2015; Neyhart et al. 2019).

The experimental population (EP)

In this study, we generated 101 crosses evaluated in six environments for three years. The objective was to compare cross value components estimated using genomic predictions and the same components observed in the field. Details about the experimental design are provided in Supplementary Table S10.

EP description

Within the FSOV PrediCropt project, a total number of 101 crosses using 106 different parents included in the TP were generated between 2020 and 2022. These crosses were generated by a private company (Florimond-Desprez) (16), and INRAE-AO (85). The number of progenies per cross ranged from 8 to 122 (mean: 54.8 progenies per cross, Supplementary Table S11). The 16 crosses generated by Florimond-Desprez were evaluated in two locations (Cappelle and Houville, France), and the 85 remaining were evaluated in 4 other locations in France (Auzeville: INRAE UE GC, Clermont-Ferrand: INRAE UE PHACC, Lusignan: INRAE UE FERLUS, and Mons: INRAE UE GCIE). Crop management methods corresponded to high-yield objectives (optimized pesticide, fungicide and nitrogen amount). Two registered varieties and the parents of the generated crosses were used as controls and repeated across trials (year x location). At least 10% of the progenies were replicated in each trial. For each partner, all crosses were observed in all locations.

Lines were evaluated for yield, grain protein content, plant height and heading date. For grain protein content, only crosses generated during the two first years (73) were analyzed in this study. We discarded from further analyses plots that were affected by lodging or that were still segregating (presenting different types of plants). We kept all crosses for the study after checking congruency between parents and progeny (by genotyping 2 progenies per cross and applying quick visual observations such as presence/absence of awns, common stems and leaf color). After cleaning raw observations, the total number of phenotyped progenies were 5565, 3522, 5591 and 5590 for yield, grain protein content, plant height and heading date, respectively (Supplementary Table S10).

Adjusted EP phenotypes

For each trial, we adjusted phenotypes for spatial effects using the SpATS R package. We checked for usual Pearson’s correlations between sites and years for each trait (Supplementary Tables S12 and S13).

We then calculated BLUE values for each line using the following model:

$$y_{{ij}} ~ = \mu + g_{i} + e_{j} + \varepsilon _{{ij}}$$

where \(y_{{ij}}\) is the adjusted phenotype for line \(i\) in environment (year and location intersection)\(j\), \(\mu\) the grand mean, \({g}_{i}\) the fixed effect of line \(i\), \({e}_{j}\) the random effect of environment \(j\), and \({\varepsilon }_{ij}\) the random effect of residual. The total number of environments was 11 for grain protein content and 15 for the three other traits. The model was implemented using the lme4 R package. We considered these BLUE values as phenotypes in further analyses. The total number of BLUE values was 5658, 3596, 5684 and 5683 for yield, grain protein content, plant height and heading date, respectively (Supplementary Table S10).

Methods

Cross value components predictive ability

In this study, we considered three components for the value of a cross: the parental mean (PM), the progeny standard deviation (SD) which is the square root of gametic variance and the UC, corresponding in our study to the expected mean of the 7% best progeny of a cross (Schnell and Utz 1975; Zhong and Jannink 2007; Lehermeier et al. 2017).

Models and formulae to predict cross value components

The three cross value components can be predicted by analytic formulae using information obtained from the TP.

Marker effects estimation The estimation of marker effects was performed with the PopVar R package using the following model (Meuwissen et al. 2001):

$$\varvec{y} = \varvec{\mu } + \varvec{X\beta } + \varvec{r}$$

where \(\varvec{y}\) is the vector of TP BLUEs described above, \({\varvec{\mu}}\) the grand mean, \({\varvec{\beta}}\) a vector of random SNP effects assumed to be normally distributed \(N\left(0,{\boldsymbol{ }{\varvec{I}}\sigma }_{\beta }^{2}\right)\), with its matrix of incidence \({\varvec{X}}\) for SNPs (TP), and \({\varvec{r}}\) the vector of random residual effects assumed to be normally distributed \(N\left(0,{ I\sigma }_{r}^{2}\right)\). We tested genomic prediction models using different hypotheses of marker effect variance distribution (Meuwissen et al. 2001; Habier et al. 2007, 2011; VanRaden 2008; Park and Casella 2008): “BRR” (Bayesian ridge regression, Bayesian approach with a Gaussian prior for SNP effects), “BayesA” (Bayesian approach with a scaled-t prior), “BayesB” (Bayesian approach with a two-component mixture prior with a point of mass at zero and a scaled-t slab), “BayesC” (Bayesian approach with a two-component mixture prior with a point of mass at zero and a Gaussian slab), and “BL” (Bayesian Lasso, Bayesian approach with a double-exponential prior). These models are implemented in the R package BGLR v1.0.9 (Pérez and de los Campos 2014) available in PopVar.

PM estimation We predicted PM as the average parental GEBV computed as the matrix product between parental genotypes and estimated marker effects:

$$\hat{\mu }_{{P_{1} \times P_{2} }} = \frac{{\varvec{\hat{\beta }}^{\varvec{'}} \left( {\varvec{X}_{{\varvec{P}_{1} }} + \varvec{X}_{{\varvec{P}_{2} }} } \right)}}{2}$$

where \(\mu _{{P_{1} \times P_{2} }}\) is the PM for cross \({P}_{1}\times {P}_{2}\), \(\widehat{{\varvec{\beta}}}\) the vector of estimated marker effects, and \({{\varvec{X}}}_{{{\varvec{P}}}_{1}}\) and \({{\varvec{X}}}_{{{\varvec{P}}}_{2}}\) are the vectors of genotypes for parent \({P}_{1}\) and parent \({P}_{2}\), respectively.

Gametic variance estimation The formula takes into account marker effects and phase in parents as well as recombination rates. We used the formula proposed by Lehermeier et al. 2017:

$$\hat{\sigma }_{{P_{1} \times P_{2} }}^{2} = \varvec{\hat{\beta }}^{\varvec{'}} \varvec{V}_{{\varvec{P}_{1} \times \varvec{P}_{2} }} \varvec{\hat{\beta }}$$

where \(\hat{\sigma }_{{P_{1} \times P_{2} }}^{2}\) is the gametic variance for cross \({P}_{1}\times {P}_{2}\), and \({{\varvec{V}}}_{{{\varvec{P}}}_{1}\times {{\varvec{P}}}_{2}}\) the genotypic variance–covariance matrix for biparental RIL progeny calculated as follows:

$$\left( {V_{{P_{1} \times P_{2} }} } \right)_{{jl}}^{{{\text{RIL}}\left( k \right)}} = 4D_{{jl}}^{*} \left( {\mathop \sum \limits_{{{\text{gen}} = 1}}^{k} \left( {0.5\left( {1 - 2c_{{jl}} } \right)} \right)^{{{\text{gen}}}} } \right)$$

where \({D}_{jl}^{*}\) is the linkage disequilibrium (LD) between alleles at loci \(j\) and \(l\) among both parental lines (either 0 if parents carry the same alleles at loci \(j\) and/or \(l\), or 0.25 if alleles are different at both locus between parents and in coupling phase, that is, one parent carries the two beneficial alleles, while the other carries deleterious alleles, or -0.25 if the alleles are in repulsion phase), \(k\) the number of generations, and \({c}_{jl}\) the recombination rate between loci \(j\) and \(l\). The recombination rates were computed from the West-European bread wheat population genetic map (Danguy des Déserts et al. 2021).

An alternative approach was used to compute the gametic variance. We called Vg1 (standing for variance of gametes) the formulae proposed by Lehermeier et al. 2017 using RR-BLUP model (Meuwissen et al. 2001) to estimate marker effects:

$$\begin{array}{*{20}c} {\left( {\hat{\sigma }_{{P_{1} \times P_{2} }}^{2} } \right)_{{{\text{Vg}}_{1} }} = \varvec{\hat{\beta }}^{\varvec{'}} \varvec{V}_{{\varvec{P}_{1} \times \varvec{P}_{2} }} \varvec{\hat{\beta }}} \\ \end{array}$$
(1)

For the model called Vg2, we added a first term taking into account the error in marker effect estimation given by Lehermeier et al. 2017 (Eq. 10) in their algebraic version of the posterior mean variance (PMV):

$$\left( {\hat{\sigma }_{{P_{1} \times P_{2} }}^{2} } \right)_{{Vg_{2} }} = \hat{\beta }^\prime V_{{P_{1} \times P_{2} }} \hat{\beta } + {\text{trace}}\left\{ {V_{{P_{1} \times P_{2} }} \text{var} \left( {\beta |X,y} \right)} \right\}$$
(2)

with \(\left( {\beta |X,y} \right) = \hat{\sigma }_{\beta }^{2} \left( {I - X^{\prime} \left( {XX^{\prime} + I\frac{{\hat{\sigma }_{r}^{2} }}{{\hat{\sigma }_{\beta }^{2} }}} \right)^{{ - 1}} X} \right)\), where \({\widehat{\sigma }}_{\beta }^{2}\) and \({\widehat{\sigma }}_{r}^{2}\) are the markers and residual estimated variances, and \({\varvec{X}}\) is the matrix of TP’s genotypes.

For the model called Vg3, we added a second term that aims to consider the fact that the uncertainty of the estimation of marker effects is modulated by the genomic composition of each parent:

$$\left( {\hat{\sigma }_{{P_{1} \times P_{2} }}^{2} } \right)_{{Vg_{3} }} = \hat{\beta }^{\prime} V_{{P_{1} \times P_{2} }} \hat{\beta } + {\text{trace}}\left\{ {V_{{P_{1} \times P_{2} }} \text{var} \left( {\beta |X,y} \right)} \right\} + 0.25X_{{P_{1} \times P_{2} }}^{\prime } \text{var} \left( {\beta |X,y} \right)X_{{P_{1} \times P_{2} }}$$
(3)

where \({{\varvec{X}}}_{{{\varvec{P}}}_{1}\times {{\varvec{P}}}_{2}}\) is the vector of genotypes for the F1 of cross \({P}_{1}\times {P}_{2}\). Details are provided in Supplementary Information S1.

UC estimation The UC was calculated as follows (Schnell and Utz 1975; Zhong and Jannink 2007):

$$\widehat{{UC}}_{{P_{1} \times P_{2} }} = \hat{\mu }_{{P_{1} \times P_{2} }} + i\hat{\sigma }_{{P_{1} \times P_{2} }}$$

where \(i\) is the selection intensity corresponding in our study to a 7% selection rate (\(i \sim 1.91\), computed as the inverse Mills ratio), and \({\widehat{\sigma }}_{{P}_{1}\times {P}_{2}}\) the estimated progeny SD for cross \({P}_{1}\times {P}_{2}\).

EP cross value components

We calculated the experimental cross value components for each cross from the distribution of the progeny BLUE values: mean (PM), standard deviation (SD) and mean of the 7% best progeny (observed UC) (or the value of the best progeny when progeny size was inferior to 15).

Cross value components’ predictive ability

We compared the three cross value components (PM, SD and UC) genomic estimates with the corresponding experimental values using a weighted Pearson’s correlation (weights v1.0.4 R package (Pasek and Schwemmle 2021)) to adjust for the number of progenies per cross. These correlations were considered as predictive ability.

Factors influencing predictive ability of the cross value components (simulations)

In order to evaluate the different parameters impacting the predictive ability of the three cross value components (PM, SD and UC) and to have a reference for the values we can get in optimal/suboptimal conditions (perfect/unperfect prediction of marker effects, limited/infinite progeny size), we developed a simulation study.

Simulation design description

The general scheme of the simulation study is provided in Fig. 1. We estimated cross value components using analytic formulae 1 or by simulating progenies 2 for the same 73 crosses used in the experiment.

Fig. 1
figure 1

Simulation scheme. TP training population, TBV true breeding value, pred_true: predicted true cross value components, pred_estimated: predicted estimated cross value components, progeny_phenotypes: observed (simulated phenotypes) cross value components

We ran different scenarios, varying the number of QTL (from 10 to 1,000), heritability (from 0.2 to 0.8) and progeny size (from the size in experiment to 2,000). We simulated 20 traits for each number of QTL. We positioned the QTL randomly along the genetic map (Danguy des Déserts et al. 2021). We assigned them an effect drawn from a normal distribution \(N\left(\text{0,1}\right).\) These true marker effects 1.2 were then adjusted to provide a variance of true breeding values (TBV) similar to the GEBV variance observed for yield in the TP.

Analytic formulae 1. To estimate cross value components, we either used the true marker effects 1.2 (TRUE scenario) to mimic a scenario where the error in marker effect prediction is negligible (TP size is infinite and the prediction model perfect) or estimated marker effects 1.5 (ESTIMATED scenario) when the size of the TP is limited. The TBV of the TP were calculated as the cross-product between true marker effects and genotypes 1.3. We simulated phenotypes 1.4 for the TP by adding a noise to their TBV. This noise was normally distributed with a variance corresponding to the residual variance from a specified heritability. For instance, if the TBV variance is 80 and we specify a heritability of 0.7, we would draw the noise for phenotypes computation from \(N\left(\text{0,34}\right)\) according to the following equation: \({h}^{2}=\frac{80}{80+34}=0.7\). We used the genotypes (excluding QTL) and simulated phenotypes of the TP to estimate marker effects 1.5 using different genomic prediction models. The pred_true (pred_estimated) cross value components were calculated using true (estimated) marker effects using the formula described in 2.2.1.1. Recombination rates were estimated in a Western Europe diversity panel (Danguy des Déserts et al. 2021).

Simulated progenies 2. We simulated progeny genotypes 2.2 for the 73 experimental crosses 2.1 phenotyped for all four traits, using the genetic map from (Danguy des Déserts et al. 2021) and the MoBPS v1.6.64 R package (Pook et al. 2020). The number of progenies per cross varied between the one observed in the experiment (Supplementary Table S11) and 2,000. We produced 25 times these number of progenies per cross to account for variations due to Mendelian gamete sampling. (The results presented in this study are the average of these 25 simulations.) Progeny TBV 2.3 were calculated as the cross-product between true simulated marker effects and the simulated progeny genotypes. Progeny phenotypes were simulated 2.4 by adding a noise to their TBV. This noise was normally distributed with a variance corresponding to the residual variance observed in the TP (34 if we keep the example of the previous section with a heritability of 0.7).

Finally, we calculated the cross value components for each cross from the distribution of the progeny phenotypes (progeny_phenotypes in Fig. 1): mean, standard deviation and mean of the 7% best progeny for the UC (or the value of the best progeny when progeny size was inferior to 15).

Predictive ability For each scenario, we compared the genomic predictions of the three cross value components obtained with the analytic formulae using true or estimated marker effects, with the ones obtained in simulated progenies using a weighted Pearson’s correlation to adjust for the number of progenies per cross. These correlations were considered as predictive ability.

Parameters influencing the predictive ability

For each scenario, 20 trait architectures were simulated. To compare scenarios, we considered for each cross value component the median values of correlations across the 20 runs.

We explored different parameters that could impact the predictive ability of the three cross value components: genetic architecture (number of QTL and heritability of the trait), number of progenies per cross, genomic prediction models, three different formulae to estimate the gametic variance (Vg1, Vg2, Vg3) and mating design (similar to the experiment or maximizing diversity).

Genetic architecture We tested different numbers of QTL for the simulation of the genetic determinism of agronomic traits: 10-30-300-1,000. The scenarios with 10 or 30 QTL represent polygenic traits (with major genes), whereas 300 and 1,000 QTL scenarios correspond to quantitative traits. Heritability ranged from 0.2 to 0.8 (with a 0.1 step). Note that this heritability is the true heritability. In our simulations, the corresponding genomic heritability estimated using a RR-BLUP approach was lower. For instance, a true heritability of 0.7 corresponded to a genomic heritability of ~ 0.5 (Supplementary Figure S1), congruent with the genomic heritability observed for yield in experimental data.

Progeny size To evaluate if the observed variance in the field using a limited progeny size is reliable, we tested different numbers of progenies per cross, starting with the real numbers observed in our experiment (ranging from 5 to 122 progenies; mean = 47.6 progenies per cross, Supplementary Table S11) to a maximum of 2000 progenies for all crosses (50, 100, 200, 1,000 and 2,000).

Precision of marker effect estimation To evaluate the importance of the quality of marker effects estimation in genomic predictions, we implemented a true and an estimated scenario. For the later, several genomic selection methods were compared: different Bayesian approaches and a RR-BLUP approach.

Precision of SD estimation We implemented two algebraic variants of the PMV approach proposed by Lehermeier et al. 2017 with a RR-BLUP approach described earlier (“Vg1,” “Vg2,” “Vg3”).

Genetic composition of the mating plan The experimental mating plan of the PrediCropt FSOV project was chosen by breeders to maximize genetic gain. To evaluate the bias possibly generated by expert choice (some parents were used in several crosses and some were genetically similar, as described in Supplementary Table S11 and Supplementary Figure S2, respectively), we tested a different set of parents. This new set was chosen to be representative of the total genetic diversity present in the TP and the least possible genetically similar. We chose 146 different parents to generate 73 crosses. To do so, we first extracted the 10 principal components (PC analysis) from the variance-standardized genomic relationship matrix (GRM) of the TP lines, using PLINK v2.00a2LM toolset. We computed Euclidean distances between lines using those 10 PC and performed a hierarchical cluster analysis (146 groups) with the Ward’s minimum variance method (Murtagh and Legendre 2014) (dist and hclust functions, respectively, from the stats v4.1.1 R package (R Core Team 2021)). In each cluster, we picked the most distant line. Finally, we generated 73 crosses by randomly mating this new set of 146 parents 10 times. We considered the median values of correlations across the 10 random mating plans to compare with the crosses from the experiment. The plots of the 4 first PC for the two sets of parents (“PrediCropt” crosses and this new set of “Unrelated” parents) show that “Unrelated” parents cover a larger diversity (Supplementary Figure S2).

Results

Quality of the training population (TP)

We used as TP the historical French registration data (GEVES) and the INRAE-AO breeding program, including 2000–2022 evaluation trials, both Southern and Northern France networks. The average Pearson’s correlations between years in each dataset were 0.69 for yield, 0.78 for grain protein content, 0.87 for plant height and 0.91 for heading date (Supplementary Tables S2 to S9).

The repeatabilities at the plot level were 0.32 for yield, 0.52 for grain protein content, 0.73 for plant height and 0.85 for heading date. Repeatabilities at the design level were very high for all traits (ranging from 0.91 to 0.99). Those values are in congruence with other trials and reflect higher GxE interactions for grain protein content and yield compared to plant height and heading date. Genomic heritabilities were 0.53 for yield, 0.51 for grain protein content, 0.56 for plant height and 0.73 for heading date. If we compare repeatability’s rankings to genomic heritability’s rankings, we notice a high missing heritability for plant height, maybe due to epistasis. Finally, the predictive ability using an additive model ranged from 0.56 for plant height to 0.70 for heading date (Table 1).

Table 1 Variance components, repeatabilities and predictive ability in the TP

Simulation study

In our simulation study, we explored different ranges of parameters that could impact the predictive ability of the three cross value components (PM, SD and UC): the number of QTL (10–1000), the heritability (0.2–0.8), the progeny size (experimental progeny size ~ 48–2000), the precision of the estimation of marker effect (true or estimated) and the genetic composition of the mating plan. For all scenarios, the predictive ability was higher for PM and UC compared to SD.

Genetic architecture

The predictive ability of the three cross value components increased with heritability (Fig. 2). For 300 QTL, for instance, predictive abilities increased from 0.64 to 0.93 for PM, from 0.48 to 0.86 for UC and from 0.04 to 0.36 for SD between heritability 0.2 and heritability 0.8.

Fig. 2
figure 2

Impact of genetic architecture on the predictive ability of the three cross value components, based on simulations. The number of progenies per cross was the same as in experimental data, and the marker effects were estimated with “BayesA” method. PM parental mean, SD standard deviation, UC usefulness criterion

The predictive ability of the three cross value components decreased when QTL number increased (Fig. 2). While PM predictions were slightly impacted (0.91 to 0.84 between 10 and 1,000 QTL with h2 = 0.6, for instance), SD decreased significantly (from 0.69 to 0.18). UC that is calculated with both PM and SD (\(\text{UC}=\text{PM}+i\times \text{SD}\)) was still correctly estimated (dropped from 0.82 to 0.77). This suggests that UC is more driven by PM than SD in this elite material.

Progeny size

To evaluate if the observed variance in the field using a limited progeny size (< 100) is reliable, we tested different numbers of progenies per cross (Fig. 3).

Fig. 3
figure 3

Impact of the number of progenies per cross on the predictive ability of the three cross value components, based on simulations. A number of progenies equal to 0 means that the number of progenies was the same as in the experimental data (see Materials and methods section). Marker effects were estimated with “BayesA” method. PM parental mean, SD standard deviation, UC usefulness criterion

The predictive ability of the three cross value components increased when the number of progenies per cross increased from the number of progenies in the experiment (~ 48) to 2,000 (Fig. 3). PM and UC were slightly impacted: The predictive abilities increased from 0.90 to 0.91 for PM and from 0.82 to 0.90 for UC when h2 = 0.7 and QTL number = 300, for instance. The interest to produce large progeny was much higher for SD estimation: Predictive abilities increased from 0.29 to 0.57. We observed a first slope decrease when progeny size is superior to 250 and a second when progeny size is superior to 1000, close to a plateau.

The relative increase in predictive ability for SD was remarkably stronger in scenarios with high number of QTL (Fig. 3 and Supplementary Figure S3), with a plateau when the number of QTL is superior to 300.

Marker effect estimation precision

We compared two scenarios to test the impact of marker effect estimation precision on cross value components prediction. We considered the TRUE scenario as a reference (pred_true in Fig. 1). In this optimal situation, TP size is infinite, and its genetic composition ideal and marker effects are perfectly estimated: We use true simulated marker effects to estimate PM, SD and UC. In the more realistic ESTIMATED scenario (pred_estimated in Fig. 1), marker effects are estimated using a genomic prediction model from our TP that is limited.

In the TRUE scenario (pred_true), the predictive abilities were very high for PM and UC whatever the number of QTL or heritability. But SD predictive ability was high only when the number of QTL was inferior to 300. For instance, the predictive ability was 0.97, 0.89 and 0.81 for PM, UC and SD, respectively, when QTL number was 10 and when the number of progenies was the same as in the experimental data (Fig. 3 and Supplementary Figure S3), and 0.99, 0.95 and 0.95 when the number of progenies was 2,000 (Fig. 4B). We observed a strong decrease in predictive ability for SD when the number of QTL increased (0.37 for QTL number = 1000 versus 0.81 for QTL number = 10 and when progeny size = the number of progenies in experimental data, Fig. 4A). This decrease was much smaller when the number of progenies per cross was 2000 (0.86 for QTL number = 1000 versus 0.95 for QTL number = 10, Fig. 4B). All these results show that if we would perfectly estimate marker effects and progeny size was sufficient, we would be able to estimate SD as well very well, even when the number of QTL is high.

Fig. 4
figure 4

Impact of the quality of marker effect estimates on the predictive ability of the three cross value components based on simulations. The number of progenies per cross was the same as in experimental data for plot A and equal to 2000 for plot B. The heritability was 0.7. The true cross value components (pred_true) were calculated with true marker effects, whereas the estimated cross value components (pred_estimated) were calculated with estimated marker effects using “Bayes A” method. PM parental mean, SD standard deviation, UC usefulness criterion

As we observed an important gap of predictive abilities between the TRUE and the ESTIMATED scenarios for SD (Fig. 4), we tested different genomic prediction models with different hypotheses on trait architecture (BayesA, BayesB, BayesC, BL, BRR and RR-BLUP (Vg1)). We also tested 2 different SD estimators deriving from the algebraic formulae of SD: Vg2 = Vg1 + term that takes into account the error of estimation of marker effects, and Vg3 = Vg2 + term that takes into account the variance in parents (Supplementary Information S1) (Fig. 5).

Fig. 5
figure 5

Impact of the genomic selection method on the predictive ability of SD based on simulations. The number of progenies per cross was the same as in experimental data for plots A and B and equal to 2,000 for plots C and D. In A and C, the heritability was 0.7. In B and D, the number of QTL was 300. PM parental mean, SD standard deviation, UC usefulness criterion

When progeny size was equal to the experimental design (~ 48) and QTL number was inferior or equal to 300, Bayesian models performed better than the RR-BLUP approach, and more particularly the Bayes A and B models. For instance, predictive abilities were 0.76 for both Bayes A and B models versus 0.62 for Vg1 when QTL number = 10 and heritability = 0.7 (Fig. 5A and B). Taking into account the error in marker effect estimates slightly increased the predictive abilities in scenarios with 1,000 QTL and heritability = 0.7 (0.26 for Vg2 and Vg3 compared to 0.23 for Vg1) (Fig. 5B).

When progeny size was 2,000 and QTL number was inferior to 300, Bayes B still gave better predictions of SD (for instance, 0.91 for Bayes B versus 0.73 for Vg1 when QTL number = 10 and heritability = 0.7; Fig. 5C). When number of QTL ≥ 300, Vg3 increased predictive abilities (Fig. 5C), especially when heritability was low (0.25 for Vg1 versus 0.33 for Vg3 when QTL number = 300 and heritability = 0.2; Fig. 5D). Vg3 is equal to Bayes B when heritability = 0.8 and number of QTL = 300 (0.64) (Fig. 5D).

All these results show that the quality of marker effect estimates has a strong impact on the predictive ability of SD. For traits with small number of QTL, Bayesian models (Bayes A or B) should be preferred. When the number of QTL is high, Vg3 should be used, especially if heritability is low. (It is equivalent to Bayes B when heritability is very high.)

Genetic composition of the mating plan

To evaluate the bias possibly generated by the choice of crosses in our experimental design, we compared the results obtained with the experimental crosses called “PrediCropt” with a set of crosses called “Unrelated.” We selected the “Unrelated” parents to be representative of the total genetic diversity present in the TP and the crosses by mating parents with low genetic similarity (see Materials and methods). The 4 first PC for the two sets of parents are provided in Supplementary Figure S2 and show that the “Unrelated” parents are more scattered throughout the TP than “PrediCropt” parents (for instance, no “PrediCropt” parent in the bottom left side in graphic A—PC 1 versus PC 2).

Ten random associations of the “Unrelated” set of parents gave the predictive abilities’ results provided in Supplementary Figure S4 for scenario with 300 QTL and heritability of 0.7. Little differences were observed between the two sets of parents. Indeed, the predictive abilities were 0.90 (“PrediCropt”) versus 0.88 (“Unrelated”) for PM, 0.82 for both sets for UC, and 0.29 (“PrediCropt”) versus 0.24 (“Unrelated”) for SD. According to these results, we considered as negligible the bias generated by the choice of our experimental crosses.

Experimental data application

In our experimental data, we phenotyped yield, plant height and heading date for 101 progenies and grain protein content for 73 progenies (Supplementary Table S11). On average correlations between trials (year x location) were 0.48, 0.43, 0.79 and 0.88 for yield, grain protein content, plant height and heading date, which is congruent with other trials (Supplementary Tables S12 and S13).

We compared experimental cross value components to genomic predictions described above. Note that both are predictors of the true values and we do not know a priori if one is better than the other. We just look here at their correlations (Table 2). Observed and predicted PM was significantly correlated for the 4 phenotypes. Median correlation values across all prediction models were 0.38, 0.63, 0.51 and 0.91, for yield, grain protein content, plant height and heading date, respectively. SD was significantly correlated for plant height and heading date (0.59 and 0.38, respectively) but not for yield and grain protein content (0.01 and 0.13, respectively). UC was, however, significantly correlated for all 4 traits (0.45, 0.52, 0.54 and 0.74 for yield, grain protein content, plant height and heading date, respectively).

Table 2 Correlations between predictions and experimental values of the three cross value components

Note that for plant height and heading date, SD was better predicted with non-Gaussian Bayesian approaches (0.63–0.64 and 0.46–0.49, respectively) than with Gaussian approaches (0.53–0.54 and 0.33–0.34, respectively), suggesting that those traits are polygenic with some major QTL (Ellis et al. 2002).

Discussion

The present study aimed at investigating the genomic predictive ability of the cross value called UC (usefulness criterion; \(\text{UC}=\text{PM}+i\times \text{SD}\)) and its two components, the mean (PM; parental mean) and the variance (SD; standard deviation) of the progeny. We used simulations to test the impact of the precision of marker effect estimation in predictions. We compared two scenarios, one using perfectly estimated marker effects (true marker effects) and one using more realistically estimated marker effects. We tested two new estimators of SD that take into account the error in marker effect estimation. We also tested the impact of trait heritability (varying from 0.2 to 0.8), trait architecture (number of QTL varying between 10 and 1000) and progeny size (varying from the number of progenies per cross observed in the field to 2000). We used those figures as a reference grid for the comparison of predictions with experimental data composed of yield, grain protein content, plant height and heading date evaluated for 101 crosses with 55 progenies on average. Our results shed light on the factors influencing cross value predictive ability and provide insights into the prerequisites for using such methods in precision breeding programs.

A high-quality training population (TP)

In this study, we took advantage of a historical TP of more than 20 years of data collection to predict three cross value components (PM, SD and UC) of 73 or 101 crosses depending on the trait using genomic prediction models. We compared these predictions to experimental observations of progenies. All the results concerning the TP (correlations between years, repeatabilities at plot and design levels) suggest a high-quality dataset. The predictive abilities obtained by cross-validation ranging from 0.56 (plant height) to 0.70 (heading date) were higher than previously obtained with a smaller TP (G. Charmet, personal communication). We checked if we could optimize the training population (data not shown) by an iterative approach that removes the environments (year x location) one by one and that computes predictive ability at each iteration (Heslot et al. 2013). We finally kept all the environments of the TP because we did not detect any environment deteriorating the quality of the prediction. Even trials from the South network were improving predictions of North trials. We also checked (data not shown) the predictive ability within each sub-dataset of the TP (GEVES, INRAE-AO, North and South) and the predictive abilities of the 73 crosses obtained for the three cross-value components using these sub-datasets as TP. We observed better predictive abilities using all datasets combined as TP. All these preliminary results led us to retain the entire dataset as the TP in this study.

Due to the size and complexity of the TP design (not uniform spatial information), for computational reasons, and as done in previous studies (Tiede et al. 2015; Danguy des Déserts et al. 2023), we separated the steps of correction for the fixed effects such as spatial coordinates and environmental effects (location and year) from the estimation of marker effects. It would be interesting to evaluate the impact of this pre-correction on the estimation of marker effects step.

Parameters impacting predictive abilities based on simulations

To evaluate different parameters impacting predictive abilities, we simulated progenies using the same crosses than in our experiment, varying heritability, QTL number, progeny size, prediction models, and genetic composition of the mating plan.

The first comment on this simulation study concerns the difference observed between true heritability and estimated heritability, with a missing heritability ranging from 0.1 to 0.2 in estimated heritabilities, whatever the number of QTL simulated. This could be explained by a lack of genetic information, such as the absence of rare variants or structural variants (Manolio et al. 2009), epistasis and GxE. According to this observation, we focused our results on scenarios where heritability = 0.7, which correspond to an estimated heritability of 0.5, congruent with yield observations.

First, the predictive ability of the three cross value components increased when heritability increased. The impact of QTL number impacted SD predictions: the predictive ability decreased when QTL number increased. These two results were in concordance with the results obtained in previous studies (Wimmer et al. 2013; Tiede et al. 2015; Yao et al. 2018). The first conclusion about this simulation study is that trait architecture is the most important factor impacting predictive abilities and more especially for the prediction of SD.

Second, the predictive ability of the three cross value components increased when progeny size increased, the SD being strongly affected. To our knowledge, no study has yet investigated this parameter. Our results suggest that more than 1000 progenies per cross should be sufficient to maximize SD predictive ability as little differences were observed between the 1000 and 2000 progenies per cross-scenarios.

Third, we observed that if we perfectly knew marker effects (TRUE scenario), we were able to estimate the three cross value components when QTL number is low and progeny size is large almost perfectly. But SD predictions decrease strongly with the number of QTL, especially when marker effects are estimated. This could be due to two factors: The TP is not informative enough compared to our validation population, or/and the prediction model is not optimal. We could not explore the first factor because no other TP was available for this study. However, we investigated the second parameter by testing different prediction models. We observed small differences between prediction models, and mostly in the small progeny size scenarios. This result was in concordance with several previous studies (Yao et al. 2018; Neyhart et al. 2019; Santos et al. 2019). RR-BLUP is known to have a similar or slightly better performance for GEBV prediction than models with differential shrinkage when the number of QTL is large (Daetwyler et al. 2010). We actually observed that Bayesian models, Bayes A and B in particular, outperformed models using Gaussian marker effect variance distribution when the number of QTL was low, in accordance with several previous studies (Meuwissen et al. 2001; Shepherd et al. 2010; Legarra et al. 2011).

In this study, we computed SD using the algebraic formula proposed by Lehermeier et al. 2017 for gametic variance. As the precision of marker effect estimates strongly impacted predictive abilities, we proposed two variants for the computation of SD that account for the error in marker effect estimation. The first variant (Vg2) was the algebraic version of the posterior mean variance (PMV) (Lehermeier et al. 2017). The second variant (Vg3) aimed at considering the fact that the uncertainty of the estimation of marker effects is modulated by the genomic composition of each parent. These two variants were computed only with marker effects estimated by a RR-BLUP approach in our study. They increased SD predictive abilities when progeny size was large and the number of QTL was high, and to a greater extent when heritability was low. This result is in concordance with the 300 QTL simulation scenario described in Lehermeier et al. 2017.

Finally, SD predictive ability was high only in the TRUE scenario, when the number of QTL was 10. We did not take into account the interaction between genotype and environment or epistasis in our prediction models. This could improve predictive ability of traits such as yield, grain protein content and plant height in particular. We could also test if our predictive ability is sufficient to improve our mating design and short/long-term genetic gain in elite and more diverse material.

Validation in experimental data

The predictive ability of cross value components based on experimental progenies’ phenotypes was validated for maize yield, moisture and test weight SD (0.18 for yield, 0.49 for moisture and 0.52 for test weight) (Lian et al. 2015). For flowering time (Osthushenrich et al. 2017), the correlations were around 0.90 for PM and of 0.46–0.65 for SD. For barley (Tiede et al. 2015; Neyhart et al. 2019), the predictive abilities for PM were moderate to high (0.46–0.69), whereas those for SD were lower (0.01–0.48). They were higher for heading date (the most heritable trait) and lower for FHB severity (the least heritable trait).

In our study, observed and predicted PM and UC were significantly correlated for all traits (PM: 0.38, 0.63, 0.51 and 0.91; UC: 0.45, 0.52, 0.54 and 0.74; for yield, grain protein content, plant height and heading date, respectively), while SD was correlated only for heading date and plant height (0.64 and 0.49, respectively). We were not able to validate our ability of prediction of SD for yield because progeny sizes were too small according to simulations. As quoted by Santos et al. (2019), inferences using SD « should be regarded as a bet»: The value of the best individual among 40 may differ from its parental UC calculated with a of 10 or 2%. For grain protein content, plant height and heading date, non-Gaussian Bayesian approaches outperformed Gaussian approaches for SD prediction, indicating that considering more flexible distributions for marker effects can improve SD estimation accuracy.

It would now be interesting to validate the value of UC in terms of genetic gain using different levels of diversity in the starting germplasm (elite breeding program vs pre-breeding program) under a long-term perspective, using different traits of interest. The progeny size is very important to calibrate to assure the realization of the SD potential of crosses. It would also be interesting to test if including interaction effect of markers with environments improves SD predictive ability.

Recommendations for breeding schemes

Our results contribute to practical guidance for precision breeding programs. The study highlights the importance of accurate marker effect estimates, thanks to large and optimized TP datasets, as well as the optimization of prediction models to estimate the value of a cross. As progeny size is determinant to realize the potential of a cross, we can advise the breeders to produce larger populations for crosses with outstanding SD. Optimizing mating design based on UC might be even more interesting with more diverse material where the variance of SD between crosses is larger, in pre-breeding programs, for instance.

Finally, breeders are interested in predicting several traits of agronomic interest simultaneously in progenies, the more challenging case being those with negative genetic correlations, yield and grain protein content, for instance, for bread wheat (Thorwarth et al. 2018). One interesting perspective is now to optimize the predictive ability of this correlation (Neyhart et al. 2019) in order to predict crosses with outstanding progenies for the main trait and satisfying values for the secondary trait.