Introduction

Now that the barley genome has been entirely mapped (Beier et al. 2017; Mascher et al. 2017) and the cost-efficiency of genotyping technology is continuously improving, molecular markers are widely used in plant breeding to accelerate crop improvement by increasing the genetic gain per generation (Bernardo 2008; Yabe and Iwata 2020). In organic farming, crop yield and product quality vary more across environments, so testing across more years and locations is required to identify varieties with stable performance under low-input growing conditions. Marker-assisted selection (MAS) would allow lines to be pre-selected based on the presence of genomic regions known to influence the trait of interest, thus reducing the need for extensive phenotyping. However, successive breeding programs focused primarily on yield improvement may have inadvertently lost favorable alleles for traits valuable under low-input conditions, such as disease resistance, straw strength, weed competitive ability, and nitrogen use efficiency (Newton et al. 2010).

Reintroducing ancient varieties preserved in Genebanks into breeding programs may help recover some of these favorable alleles (Alqudah et al. 2020). Genome-wide association studies (GWAS) are a powerful tool for identifying genes of interest by screening a large and diverse population for both phenotype and genotype. Many factors influence the statistical power of GWAS models, including population size, marker density, minor allele frequency (MAF), and linkage disequilibrium (LD) (Alqudah et al. 2020). LD represents the degree of correlation between markers: the more rapidly LD decreases with increasing distance between markers, the higher the marker density needed to cover the whole genome. Single Nucleotide Polymorphisms (SNP) are now commonly used in GWAS because they are cheap and abundant across the genome. Moreover, marker density has increased considerably over the past decade, with, for example, the release of a barley 50k SNP genotyping chip in 2017 (Bayer et al. 2017).

The first GWAS models tested each marker independently. Whole-genome models were developed thereafter to estimate all marker effects simultaneously, avoiding the overestimation of single-marker effects when markers in LD add noise and redundancy to the analysis (Tibbs Cortes et al. 2021).

GWAS may also produce false positives due to confounding factors, such as population structure arising from common ancestry and cryptic relatedness arising from genetic proximity between individuals (He et al. 2019), with common SNP then capturing a large part of the trait heritability. When many markers are tested, correcting the p-value significance threshold for multiple testing is also critical to identify true associations. A stringent correction such as Bonferroni efficiently controls the false-positive rate but may increase the false-negative rate because it assumes that markers are independent (Kaler and Purcell 2019), an assumption that does not hold because of LD. Cheverud (2001) proposed a correction based on the number of independent tests instead of the total number of markers tested, but the traditional Bonferroni correction remains widely used.
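To illustrate the difference between the two corrections, the Python sketch below computes an effective number of independent tests from the eigenvalues of the marker correlation matrix, using the eigenvalue-variance formula commonly attributed to Cheverud (2001), and divides the nominal α by it instead of by the raw number of markers. It is a minimal sketch assuming complete, 0/1/2-coded genotypes with no monomorphic markers; the function names are hypothetical, and this is not the procedure used later in this study, which relies on the standard Bonferroni correction.

```python
import numpy as np

def effective_number_of_tests(genotypes: np.ndarray) -> float:
    """Eigenvalue-based effective number of independent tests (Cheverud-style).

    genotypes: (n_individuals, n_markers) matrix coded 0/1/2, no missing values.
    """
    corr = np.corrcoef(genotypes, rowvar=False)   # marker-by-marker correlation matrix
    eigvals = np.linalg.eigvalsh(corr)            # real eigenvalues of a symmetric matrix
    m = corr.shape[0]
    var_eig = np.var(eigvals, ddof=1)             # eigenvalues average 1, so this is sum((l - 1)^2) / (m - 1)
    return 1.0 + (m - 1) * (1.0 - var_eig / m)

def corrected_threshold(alpha: float, n_tests: float) -> float:
    """Bonferroni-type threshold: alpha divided by the (effective) number of tests."""
    return alpha / n_tests
```

With perfectly correlated markers the effective number collapses to 1, and with uncorrelated markers it equals the number of markers, so the resulting threshold lies between the Bonferroni value and the uncorrected α.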

To increase the statistical power to detect true associations while keeping computation efficient, various statistical models have been developed. The Efficient Mixed-Model Association (EMMA) (Kang et al. 2008) is a single-locus mixed linear model (MLM) that can use the Population Parameters Previously Determined (P3D) approach, which estimates variance components only once for all single-marker tests and thus considerably improves computational efficiency. Generally, in mixed-effect models, the major genetic principal components are included as fixed factors and a kinship matrix as a random factor to correct for population structure and relatedness, respectively. The G model is a multi-locus model developed by Bernardo (2013) that, unlike common GWAS models, accounts for population structure and relatedness by estimating marker effects on each chromosome separately while controlling for the background effect of markers on the remaining chromosomes. In a second step, the most significant markers are identified via stepwise regression. The author argues that this method corrects for an additional level of redundancy and may thus be more effective in identifying “true” associations; the G model has, for example, been used to validate results from other methods (Gao et al. 2016; Sallam et al. 2017). Another approach to correct for LD patterns is the Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK) (Huang et al. 2019). This model uses LD to select a set of independent markers that are then tested in a fixed-effect model, and repeats this procedure iteratively. This method has been shown to have high power to detect large-effect QTL. Lastly, the recently developed GWAS model 3VMrMLM (Zhang et al. 2022) is a two-step method that first estimates marker effects in a single-locus model to select a set of markers based on a relaxed p-value significance threshold, and then applies a multi-locus compressed analysis of variance to this selection (Li et al. 2022a, b). This method has been shown to have increased statistical power to detect small-effect and dominant-effect markers when heterozygosity is present. Indeed, unlike other models, 3VMrMLM estimates additive and dominant allelic effects in the second step. Moreover, multi-environment analyses with 3VMrMLM allow the exploration of marker-by-environment interactions.
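The P3D idea can be sketched as follows: the variance components are assumed to have been estimated once from a null model, the phenotypes and design are whitened with the Cholesky factor of V = σg²K + σe²I, and each marker is then tested by ordinary least squares with principal components as fixed covariates. This Python sketch (numpy only) is a simplified illustration, not the EMMA or rrBLUP code: the function name, the per-marker re-estimation of the residual scale, and the normal approximation for p-values are our assumptions.

```python
import math
import numpy as np

def p3d_scan(y, markers, covariates, kinship, sigma2_g, sigma2_e):
    """Single-locus GLS scan with variance components fixed in advance (P3D-style).

    y: (n,) phenotypes; markers: (n, m) genotypes coded 0/1/2;
    covariates: (n, c) fixed covariates such as principal components;
    kinship: (n, n) relationship matrix; sigma2_g, sigma2_e: variance
    components assumed to come from a single null-model fit.
    Returns marker effects and two-sided p-values (normal approximation).
    """
    n = y.shape[0]
    V = sigma2_g * kinship + sigma2_e * np.eye(n)
    L = np.linalg.cholesky(V)                     # V = L @ L.T
    # Pre-multiplying by L^-1 "whitens" the data, so GLS reduces to OLS.
    y_w = np.linalg.solve(L, y)
    base_w = np.linalg.solve(L, np.column_stack([np.ones(n), covariates]))
    effects, pvalues = [], []
    for j in range(markers.shape[1]):
        x_w = np.linalg.solve(L, markers[:, j].astype(float))
        X = np.column_stack([base_w, x_w])
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ (X.T @ y_w)
        resid = y_w - X @ beta
        s2 = resid @ resid / (n - X.shape[1])     # residual scale on the whitened data
        z = beta[-1] / math.sqrt(s2 * XtX_inv[-1, -1])
        effects.append(beta[-1])
        pvalues.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return np.array(effects), np.array(pvalues)
```

Because V is factorized once and reused, the per-marker cost is only a triangular solve and a small least-squares fit, which is the computational gain P3D targets.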

The efficient prioritization of significant Marker-Trait Associations (MTA) discovered in GWAS relies on accurate estimation of marker effects, which is challenging in the presence of non-additive effects (dominance, heterozygote, epistatic, and marker-by-environment interaction effects) (Zhu and Zhou 2020). Only some GWAS models report an estimate of the proportion of the trait variance explained by the marker (PVE). Marker effects can be calculated in multiple ways, which may yield contrasting, over- or underestimated values. The additive effect of one allele is commonly estimated but may underestimate the true effect when non-additive effects occur. Similarly, estimating the effect in a single-marker regression may over- or underestimate it because of epistasis. The effect may also vary across environments; thus, when looking for QTL with a consistent effect, calculations based on the mean BLUP would not necessarily provide reliable estimates.

In this study, the first objective was to compare different GWAS models, including the single-locus model EMMA and 3 multi-locus models (the G model, BLINK, and 3VMrMLM), on their ability to identify genetic associations with key agronomic, disease, and grain quality traits in naked barley grown under organic conditions. The second objective was to prioritize MTA from multiple model outputs to identify major Quantitative-Trait Loci (QTL) and investigate associated candidate genes. The last objective was to investigate the practical use of these QTL in organic naked barley breeding programs to favor long-term genetic gain.

Material and methods

Plant material

The diversity panel comprised 247 spring naked barley lines, including 58 lines from Genebanks and 189 from recent international breeding programs (Table S1).

Field trial

The panel was grown over 3 years (2019, 2020, 2021) in an organic-certified field in Camolin, Co. Wexford, Ireland (52° 37′ 21.5″ N, 6° 25′ 06.1″ W). The trial followed a type II modified augmented experimental design (MAD2) (Lin and Poushinsky 1985). Tested lines were unreplicated, while 3 checks were replicated within the design: Full Pint, a two-row hulled malting barley variety (OSU) released in 2014; CDC Clear, a two-row naked malting barley variety (CDC, Canada) released in 2012; and Annapurna, a two-row naked food barley variety (Semillas Batlle, Spain) released in 2018. Because these checks were selected in different climate zones and on different quality criteria, they were considered representative of the panel.

Sowing dates were 04/05/2019, 22/04/2020, and 24/04/2021. Accessions were sown with a Hege cartridge seeder in rows of 1.5 m length, organized in a rectangular field of 25 m length and 15 m width comprising 10 rows by 30 columns of plots, divided into 20 incomplete blocks arranged in 2 block-rows and 10 block-columns. Plots were spaced 0.5 m apart horizontally and 1 m apart vertically. The primary check was assigned to the central plot of each block and differed each year (Full Pint in 2019, CDC Clear in 2020, Annapurna in 2021). The 2 secondary checks were randomly assigned to 2 plots within each of 4 of the 10 blocks in each block-row, bringing the number of check plots to 36 (20 primary and 16 secondary). The randomized tested lines were then allocated to the remaining plots (Fig. S1).
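The block structure described above can be checked with a short Python sketch that places the checks on a 10 × 30 grid split into 20 blocks of 5 × 3 plots. Which blocks and plots receive the secondary checks is random here; the seed, the function name, and the assumption that each secondary check occupies one plot per chosen block are illustrative, and Fig. S1 shows the actual layout.

```python
import random

N_ROWS, N_COLS = 10, 30              # plot grid described in the text
BLOCK_ROWS, BLOCK_COLS = 2, 10       # 20 incomplete blocks
R_BLOCK = N_ROWS // BLOCK_ROWS       # 5 plot rows per block
C_BLOCK = N_COLS // BLOCK_COLS       # 3 plot columns per block

def mad2_layout(primary, secondary, n_secondary_blocks=4, seed=0):
    """Return a 10 x 30 grid with check names placed; None marks plots for tested lines."""
    rng = random.Random(seed)
    grid = [[None] * N_COLS for _ in range(N_ROWS)]
    for br in range(BLOCK_ROWS):
        # Blocks of this block-row that receive the secondary checks.
        chosen = set(rng.sample(range(BLOCK_COLS), n_secondary_blocks))
        for bc in range(BLOCK_COLS):
            r0, c0 = br * R_BLOCK, bc * C_BLOCK
            centre = (r0 + R_BLOCK // 2, c0 + C_BLOCK // 2)
            grid[centre[0]][centre[1]] = primary          # primary check in the central plot
            if bc in chosen:
                free = [(r, c) for r in range(r0, r0 + R_BLOCK)
                        for c in range(c0, c0 + C_BLOCK) if (r, c) != centre]
                # Each secondary check gets one randomly chosen plot in this block.
                for (r, c), check in zip(rng.sample(free, len(secondary)), secondary):
                    grid[r][c] = check
    return grid
```

Calling `mad2_layout("Full Pint", ("CDC Clear", "Annapurna"))` would place 20 primary and 16 secondary check plots, leaving 264 plots (None) for the randomized tested lines.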

The trial was conducted under organic conditions. The farmer usually incorporates about 4 t/ha of beef cattle manure into the soil every other year, or as indicated by soil analysis results. No pest or disease management was applied, so that plant responses to natural infection could be assessed. Hand weeding was carried out every two to three weeks, depending on weed pressure. Each plot was harvested at a residue height of 5 cm and tied into a bundle. Samples were run through an experimental thresher (Almaco Small Vogel Plot or Bundle Thresher) as many times as required to ensure complete separation of grain and husks. Each sample was then run through an Air Blast Seed Cleaner (Almaco) to remove the remaining crop residues, straw, and husks.

Phenotyping

Seedling early vigor was scored 3 weeks after sowing (Vigour, score 1–8). Growing degree days required to reach anthesis (GDDA, °Cd) and physiological maturity (GDDPM, °Cd) were estimated, with growth stages identified using Zadoks’ scale (Zadoks et al. 1974). Plant height (PH, cm) and straw strength (Lodging, score) were measured at the grain-filling stage. Disease severity was estimated by visually assessing the percentage of the whole-plot leaf area covered by symptoms (AHDB 2008; FAO 2016; Spaner et al. 1998). The maximal percentage for each genotype was converted to a score using the following scale: 0 (0%), 1 (< 5%), 2 (5–10%), 3 (10–20%), 4 (20–30%), 5 (30–40%), 6 (40–50%), 7 (50–65%), 8 (65–80%), 9 (> 80%). Major diseases observed included Brown Rust (Puccinia hordei) (BrownRust), Net Blotch (Pyrenophora teres) (NetBlotch), Powdery Mildew (Blumeria graminis) (Mildew), Ramularia leaf spot (Ramularia collo-cygni) (Ramularia), Rhynchosporium leaf scald (Rhynchosporium secalis) (Rhyn), and barley yellow dwarf virus (BYDV).
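The two conversions in this paragraph can be written as small Python helpers. Growing degree days are accumulated here with the simple daily-mean method and a base temperature of 0 °C; both the method and the base temperature are assumptions, since the text does not state them. The disease scale is encoded from the class limits listed above, assuming that a value falling exactly on a limit goes to the higher class.

```python
from bisect import bisect_right

def growing_degree_days(tmax, tmin, t_base=0.0):
    """Accumulated GDD (°Cd) from daily max/min air temperatures (t_base is an assumption)."""
    return sum(max(0.0, (hi + lo) / 2.0 - t_base) for hi, lo in zip(tmax, tmin))

# Upper limits (% leaf area) of disease scores 1-8; values above the last limit score 9.
SCORE_LIMITS = [5, 10, 20, 30, 40, 50, 65, 80]

def disease_score(percent):
    """Convert a maximal % leaf area with symptoms into the 0-9 score."""
    if percent <= 0:
        return 0
    # bisect_right places a value equal to a limit in the higher class.
    return 1 + bisect_right(SCORE_LIMITS, percent)
```

For example, 15% of the leaf area with symptoms falls in the 10–20% class and returns score 3.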

Traits related to threshing hardness comprised the number of runs required for threshing (Thresh, count) and the weight proportion of hulled grains remaining after threshing (hulled, %). Yield and grain quality traits included grain yield (yield, g/plot), harvest index (HI, %; grain yield/total bundle weight), thousand kernel weight (TKW, g/1000 grains), grain plump fraction retained above a 2.5 mm screen (plump, %), and grain protein (protein, %) and beta-glucan (Bgluc, %) content. Grain protein and moisture content were estimated by near-infrared (NIR) spectroscopy with the GrainSense® tool. Beta-glucan content was measured on flour samples with the mixed-linkage method (McCleary and Codd 1991) using the Megazyme kit (Megazyme 2021). To adapt the protocol to the equipment available in our laboratory, we partly followed the modified protocol proposed by Hu and Burton (2008). After incubation, 1 mL from each test tube was transferred into 1.2 mL cluster tubes and centrifuged at 2200 rpm for 15 min in a microcentrifuge (Thermo Scientific Pico 21). The data was adjusted for grain moisture content to express beta-glucan as a percentage of dry weight.
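Harvest index and the moisture adjustment amount to simple arithmetic. The sketch below assumes the usual as-is to dry-basis conversion (dividing by the dry-matter fraction); the text does not give the exact formula, and the helper names are hypothetical.

```python
def harvest_index(grain_g, bundle_g):
    """HI (%) = grain yield / total bundle weight x 100."""
    return 100.0 * grain_g / bundle_g

def to_dry_basis(percent_as_is, moisture_pct):
    """Express a content measured on as-is material as a percentage of dry weight."""
    return percent_as_is / (1.0 - moisture_pct / 100.0)
```

For example, 4.5% beta-glucan measured on material at 10% moisture corresponds to 5.0% of dry weight.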

Three years of data were collected for GDDA, BYDV, yield, TKW, protein, and Bgluc, and two years of data for Vigour, GDDPM, PH, Ramularia, Mildew, Rhyn, BrownRust, NetBlotch, Lodging, HI, Thresh, hulled, and plump.

Phenotypic data analysis

Data was first adjusted for external factors such as field heterogeneity and environmental variation. Mixed linear models were fitted with the R packages lme4 and lmerTest using restricted maximum likelihood (REML) and the Nelder-Mead optimizer (Rice et al. 2020). In the models below, random terms are underlined and fixed terms are not.

Adjustment of beta-glucan data

$$y=\mu +Day+Batch +\underline{Assay}+\underline{\epsilon }$$
(1)

Bgluc values were first adjusted for batch effects when significant, for each year separately, using the values obtained for the barley flour control. Two dummy variables were created to differentiate control from tested flour samples: for control samples, both ControlsvsTested and Tested were set to 0, while for tested samples, ControlsvsTested equals 1 and Tested equals the sample plot number. Eq. (1) was used to fit the mixed model, where μ is the grand mean; Day, the day the assay was run, and Batch, the batch number, are both fixed effects; Assay is the random effect of the assay for tested samples only (i.e., the interaction between ControlsvsTested and Tested); and ϵ is the random experimental error.

Heritability calculation

$$y=\mu +Year:CheckvsTested+Year:Checks+Year+\underline{Year:Row}+\underline{Year:Column}+\underline{Year:Block}+\underline{Geno}+\underline{Geno:Year}+\underline{\epsilon }$$
(2)

For the following analyses, 3 dummy variables were created to separate check varieties from tested lines, as described by Piepho and Williams (2016). CheckvsTested equals 0 for check lines and 1 for tested lines; Check equals 0 for tested lines and the genotype identification number (gid) for check lines; Tested equals 0 for check lines and the gid for tested lines. The effect of tested lines alone (Geno) or check lines alone (Checks) corresponds to the interaction between CheckvsTested and Tested, or between CheckvsTested and Check, respectively. Eq. (2) describes the fitted mixed model with μ, the grand mean; Year, the experimental year; Year:CheckvsTested, the difference between checks and tested lines in each year; Year:Checks, the difference between check lines in each year; Year:Row, Year:Column, and Year:Block, the field effects in each year; Geno, the effect of tested lines; Geno:Year, the genotype-by-year interaction effect; and ϵ, the random experimental error.
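The dummy coding can be expressed as a small Python function. It assumes that genotype identification numbers are nonzero integers, since 0 is reserved to mean "not in this group"; the function name is hypothetical. Because Tested is 0 for every check plot, all check plots share a single level of that factor, so tested-line effects are estimated separately from the checks (and conversely for Check).

```python
def check_test_dummies(gid, check_gids):
    """Return (CheckvsTested, Check, Tested) for one plot, following the coding in the text."""
    if gid in check_gids:
        return 0, gid, 0     # check plot: CheckvsTested = 0, Check = gid, Tested = 0
    return 1, 0, gid         # tested plot: CheckvsTested = 1, Check = 0, Tested = gid

# Hypothetical plots: gids 1-3 are the checks, 57 is a tested line.
plots = [1, 57, 2, 3]
coded = [check_test_dummies(g, {1, 2, 3}) for g in plots]
```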

$$H^{2}=\frac{\sigma_{\text{g}}^{2}}{\sigma_{\text{p}}^{2}},\quad \sigma_{\text{p}}^{2}=\sigma_{\text{g}}^{2}+\frac{\sigma_{\text{gy}}^{2}}{y}+\frac{\sigma_{\text{e}}^{2}}{y\,n};\quad n=\left(\sum_{i}n_{i}\right)^{2}+\sum_{i}n_{i}^{2}$$
(3)

Trait heritability (H²) was estimated using variance components extracted from model (2) with the VarCorr function. The phenotypic variance was calculated on a genotype-mean basis using the formula described by You et al. (2016) for MAD2 designs (Eq. 3), where \(\sigma_{\text{g}}^{2}\) is the genetic variance associated with the term Geno; \(\sigma_{\text{gy}}^{2}\), the tested-line-by-year interaction variance associated with the term Geno:Year; \(\sigma_{\text{e}}^{2}\), the residual error variance; y, the number of years; n, the average number of check replicates; and \(n_i\), the number of replicates of the ith check.
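Once the variance components are available, Eq. (3) is a one-line computation. In this Python sketch the average number of check replicates n is passed in by the caller rather than derived from the replicate counts; the function name and the example values are hypothetical.

```python
def heritability_mad2(var_g, var_gy, var_e, n_years, n_rep):
    """Entry-mean heritability H2 = var_g / var_p, with var_p as in Eq. (3).

    var_g: genetic variance (Geno); var_gy: genotype-by-year variance (Geno:Year);
    var_e: residual variance; n_years: y; n_rep: average number of check replicates (n).
    """
    var_p = var_g + var_gy / n_years + var_e / (n_years * n_rep)
    return var_g / var_p

# Hypothetical variance components: H2 = 2 / (2 + 0.5 + 1) = 4/7, about 0.57.
h2 = heritability_mad2(var_g=2.0, var_gy=1.0, var_e=3.0, n_years=2, n_rep=1.5)
```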

Calculation of adjusted means

$$y=\mu +Year+Year:CheckvsTested+Year:Checks+\underline{Year:Row}+\underline{Year:Column}+\underline{Year:Block}+\underline{Year|Geno}+\underline{\epsilon }$$
(4)

Adjusted means for tested lines were estimated using Eq. (4). The only term differing from Eq. (2) is Year|Geno, the within-year effect of tested lines. Field effects were retained only when they significantly improved the model, with the Bayesian Information Criterion (BIC) used to select the best-fitting model. Model residuals were considered normally distributed if the absolute values of skewness and kurtosis were below 0.8 and 4, respectively; otherwise, the data was quantile-transformed using the orderNorm function of the bestNormalize package. NIR protein readings were low for black grains because no calibration for darker grain colors was yet available; seed color was therefore included as a fixed factor in the model used to adjust protein data.
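The residual check and the fallback transformation can be sketched in Python with the standard library only. The sketch assumes moment-based skewness and excess kurtosis (the text does not state which kurtosis definition was used), and it implements a rank-based inverse normal transformation in the spirit of orderNorm, assuming the (rank - 0.5)/n quantile definition and breaking ties by input order; it does not reproduce the bestNormalize implementation.

```python
from statistics import NormalDist, fmean

def skewness_kurtosis(x):
    """Moment-based skewness and excess kurtosis (population moments, no bias correction)."""
    n = len(x)
    m = fmean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def looks_normal(residuals, max_abs_skew=0.8, max_abs_kurt=4.0):
    """Rule of thumb from the text: |skewness| < 0.8 and |kurtosis| < 4."""
    skew, kurt = skewness_kurtosis(residuals)
    return abs(skew) < max_abs_skew and abs(kurt) < max_abs_kurt

def rank_inverse_normal(x):
    """Map each value to the standard-normal quantile of (rank - 0.5) / n.

    Ties are broken by input order (a simplification).
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]
```

In use, the residuals of the fitted model would be passed to `looks_normal`, and only if it returns False would the raw trait values be transformed with `rank_inverse_normal` before refitting.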

The Best Linear Unbiased Predictors (BLUP) for the term Year|Geno were extracted from the model output using the ranef function. These values represent the deviation of each accession from the grand mean in each year. Final adjusted means were obtained by adding the yearly trait mean across all accessions.

Genotyping data

The genotyping data, obtained with the Illumina 50K Single Nucleotide Polymorphism (SNP) bead chip (Bayer et al. 2017), was provided by Oregon State University (OSU) in HapMap format for 247 genotypes and 44,040 SNP. Data quality checks and filtering were performed in TASSEL 5.0: SNP and genotypes with more than 10% missing data, and SNP with a MAF below 5%, were removed from the dataset. For population structure and relatedness analyses, the data was further pruned to LD < 0.2 with the snpgdsLDpruning function of the R package SNPRelate, to avoid bias from correlated markers.
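The filtering and pruning steps can be approximated with numpy. The thresholds (10% missing data, 5% MAF, 0.2) come from the text, but the filtering order, the greedy comparison with the last kept SNP, and the use of r² as the LD measure are assumptions of this sketch; TASSEL and snpgdsLDpruning use their own algorithms and default LD measures.

```python
import numpy as np

def qc_filter(geno, max_missing=0.10, min_maf=0.05):
    """SNP then genotype filtering on a (n_genotypes, n_snps) 0/1/2 matrix with NaN for missing.

    SNP are dropped if their missing rate exceeds max_missing or their MAF is below
    min_maf; genotypes are then dropped if their missing rate over the kept SNP
    exceeds max_missing.
    """
    snp_missing = np.isnan(geno).mean(axis=0)
    p = np.nanmean(geno, axis=0) / 2.0             # frequency of the counted allele
    maf = np.minimum(p, 1.0 - p)
    keep_snp = (snp_missing <= max_missing) & (maf >= min_maf)
    g = geno[:, keep_snp]
    keep_geno = np.isnan(g).mean(axis=1) <= max_missing
    return g[keep_geno], keep_snp, keep_geno

def greedy_ld_prune(geno, max_r2=0.2, window=50):
    """Keep a SNP only if its r^2 with each of the last `window` kept SNP is below max_r2.

    Assumes SNP are ordered by map position and have no missing values.
    """
    kept = []
    for j in range(geno.shape[1]):
        independent = True
        for k in kept[-window:]:
            r = np.corrcoef(geno[:, j], geno[:, k])[0, 1]
            if r * r >= max_r2:
                independent = False
                break
        if independent:
            kept.append(j)
    return kept
```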

Population structure analysis

A Principal Component Analysis (PCA) was performed on the pruned dataset with the snpgdsPCA function. The dissimilarity matrix computed on the pruned SNP set with snpgdsDiss was used as input to the hclust function for hierarchical clustering with the Ward.D2 method, chosen based on the agglomerative coefficient from the agnes function. Results were visualized on a dendrogram to identify the optimal number of clusters, and the genetic clusters were then represented on a PCA plot of individuals using the ggplot2 R package.

Relatedness analysis

Kinship coefficients were computed by maximum likelihood estimation (MLE) of Identity-By-Descent (IBD) and assembled into a 247 × 247 kinship matrix. The pruned dataset was used as input to the snpgdsIBDMLE function, and the resulting matrix was normalized to obtain the positive semi-definite kinship matrix required in GWAS. A heatmap was generated with the GAPIT package.

LD analysis

LD was estimated as the squared Pearson correlation coefficient between the allelic states of two markers (r²). r² and its p-value were computed for each pair of markers in TASSEL 5.0. Markers were considered unlinked when the p-value was above the significance threshold (0.05). The critical r² was derived from the 95th percentile of the square-root-transformed r² values of unlinked marker pairs (Bengtsson et al. 2017), and the LD decay distance was defined as the distance beyond which markers are unlikely to be associated, i.e., where LD falls below the critical r². A non-linear model was fitted between r² and marker distance to estimate the mean decay distance for each chromosome and across the whole genome (Remington et al. 2001).
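A minimal numpy sketch of these two quantities follows. It assumes that the non-linear model is the Hill and Weir expectation of r² under drift and recombination, as used by Remington et al. (2001), with C = ρ·d for distance d and sample size n. The decay rate ρ is fitted by a simple grid search (an assumption; nonlinear least squares would be the usual choice), and the decay distance is the point where the fitted curve crosses the critical r². Function names are hypothetical.

```python
import numpy as np

def critical_r2(r2_unlinked, q=95):
    """Square of the q-th percentile of sqrt(r^2) among unlinked marker pairs."""
    return float(np.percentile(np.sqrt(r2_unlinked), q) ** 2)

def hill_weir(c, n):
    """Expected r^2 for C = rho * distance and sample size n."""
    c = np.asarray(c, dtype=float)
    a = (10 + c) / ((2 + c) * (11 + c))
    b = 1 + ((3 + c) * (12 + 12 * c + c ** 2)) / (n * (2 + c) * (11 + c))
    return a * b

def fit_rho(dist, r2, n, grid):
    """Grid-search the rho value minimizing the squared error to observed r^2."""
    sse = [np.sum((r2 - hill_weir(rho * dist, n)) ** 2) for rho in grid]
    return float(grid[int(np.argmin(sse))])

def decay_distance(rho, n, r2_crit, d_max, tol=1e-8):
    """Distance at which the fitted curve drops to r2_crit, found by bisection."""
    def f(d):
        return float(hill_weir(rho * d, n)) - r2_crit
    lo, hi = 0.0, d_max
    if f(lo) <= 0 or f(hi) >= 0:
        raise ValueError("critical r2 is not crossed within [0, d_max]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Fitting ρ per chromosome and then calling `decay_distance` with the critical r² would give one decay distance per chromosome, matching the structure of the analysis described above.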

Genome-wide association study

Four methods were used for GWAS: a single-locus model (EMMA) and 3 multi-locus models (BLINK, G model, and 3VMrMLM). Three types of phenotypic data (BLUP types) were used in the study: (a) within-year adjusted means (2019, 2020, 2021), (b) the average of the within-year adjusted means (mean), and (c) multiple within-year adjusted means fitted together (multi). All models could fit phenotypes (a) and (b), whereas multi-environment data could only be analyzed with 3VMrMLM and EMMA.

For multi-year analyses, 3VMrMLM was fitted with the “Multi-env” option of the IIIVMrMLM function (IIIVMrMLM R package), and EMMA with the GWAS function of the rrBLUP R package, using the modified code proposed by Isidro-Sánchez et al. (2017). For single-year datasets and means across years, BLINK was fitted with the GAPIT R package, 3VMrMLM with the “Single-env” option of the IIIVMrMLM function, and the G model with the Fortran program developed by Bernardo (2013).

Relatedness was accounted for as a random effect, using an additive relationship matrix computed with the A.mat function (rrBLUP R package) in EMMA, or the kinship matrix previously obtained by IBD in BLINK and 3VMrMLM. Population structure was accounted for in 3VMrMLM and EMMA by fitting the first four genetic principal components (PC) as fixed factors. BLINK was fitted with 0 to 4 PC, and the optimal number of PC was the one yielding the largest number of MTA while keeping the genomic inflation factor below 1.1 (λ < 1.1) (Yang et al. 2011). For the G model, LD pruning was carried out with snpgdsLDpruning before analysis to remove the highest level of redundancy, which is mostly associated with the genetic background and population structure, keeping only one marker among highly correlated markers (r² > 0.85).
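The selection rule for the number of PC can be sketched in Python. The genomic inflation factor is computed here as the median of the 1-df chi-square statistics implied by the two-sided p-values, divided by its expected value under the null (about 0.455). Breaking ties in favor of fewer PC is our assumption, and the function names and example numbers are hypothetical.

```python
from statistics import NormalDist, median

CHI2_1_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def genomic_inflation(pvalues):
    """Lambda = median observed 1-df chi-square / expected median (p-values in (0, 1])."""
    nd = NormalDist()
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1_MEDIAN

def choose_n_pcs(results, max_lambda=1.1):
    """results: {n_pcs: (lambda, n_mta)}; keep lambda < max_lambda, then maximize the MTA count."""
    ok = [(n_mta, -k, k) for k, (lam, n_mta) in results.items() if lam < max_lambda]
    if not ok:
        return None
    return max(ok)[2]   # ties on n_mta resolved toward fewer PC

# Hypothetical BLINK runs with 0-4 PC.
runs = {0: (1.30, 40), 1: (1.05, 25), 2: (1.02, 31), 3: (1.01, 31), 4: (0.98, 20)}
best = choose_n_pcs(runs)
```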

The Bonferroni correction was applied to the significance threshold (0.05/number of tested SNP) for all models; for the G model, the denominator was the number of SNP remaining after pruning.

All significant MTA identified across methods and datasets were compiled. LD blocks were defined with the getLD and findLDblocks functions of the trio package to group correlated SNP. To avoid bias due to multicollinearity in the multivariate regressions carried out later, the MTA with the lowest p-value was selected as the representative of each block.
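The compilation step can be sketched in Python: MTA records are grouped by trait and LD block, the SNP with the lowest p-value represents the block, and the number of distinct analyses detecting the block is kept so that blocks found in at least two analyses can be retained. The record field names and example values are hypothetical.

```python
from collections import defaultdict

def block_representatives(mtas, min_analyses=1):
    """Pick the lowest-p SNP per (trait, LD block).

    mtas: iterable of dicts with keys 'trait', 'block', 'snp', 'pvalue', 'analysis'.
    Returns {(trait, block): (snp, pvalue, n_analyses)} for blocks detected in at
    least `min_analyses` distinct analyses.
    """
    best = {}
    analyses = defaultdict(set)
    for m in mtas:
        key = (m["trait"], m["block"])
        analyses[key].add(m["analysis"])
        if key not in best or m["pvalue"] < best[key][1]:
            best[key] = (m["snp"], m["pvalue"])
    return {k: (snp, p, len(analyses[k]))
            for k, (snp, p) in best.items() if len(analyses[k]) >= min_analyses}

records = [
    {"trait": "GDDA", "block": "b1", "snp": "snpA", "pvalue": 1e-8, "analysis": "BLINK/mean"},
    {"trait": "GDDA", "block": "b1", "snp": "snpB", "pvalue": 1e-10, "analysis": "3VMrMLM/multi"},
    {"trait": "Rhyn", "block": "b2", "snp": "snpC", "pvalue": 1e-7, "analysis": "EMMA/multi"},
]
```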

Comparison of GWAS models

Multivariate linear regression analysis was used to compare the models on their ability to explain the trait additive variance and to detect large-effect markers. For each analysis (1 trait × 1 BLUP type × 1 model), the genotypes at all SNP found significant with the corresponding model were fitted against the trait phenotypic values in the analyzed dataset. The phenotypic data was standardized before analysis using the scale function, and the filtered genotyping data was converted to numeric values according to the number of copies of the minor allele: genotypes homozygous for the minor allele were coded 2, heterozygous genotypes 1, and genotypes homozygous for the major allele 0.

$$\begin{gathered} \text{Full model: } y = \mu + Year + \sum_{j=1}^{4} PC_{j} + \sum_{i=1}^{n} SNP_{i} + \epsilon \\ \text{Reduced model: } y = \mu + Year + \sum_{j=1}^{4} PC_{j} + \epsilon \end{gathered}$$
(5)

The total variance explained by the selected SNP (PVEtotal) was estimated by subtracting the R² of the reduced model (without SNP) from the R² of the full model (with SNP and covariates). In the models in (5), μ is the trait grand mean for the BLUP type analyzed, Year the year effect (for the multi-year BLUP type), PCj the jth principal component accounting for population structure, SNPi the effect of the ith SNP significant in the analysis, and \(\epsilon\) the residual error.

SNP effects were estimated simultaneously to account for possible epistatic effects. The percentage of variance explained by each significant SNP (PVEsnp) was estimated with formula (6), where a is the associated regression coefficient and p the minor allele frequency (MAF).

$${Varsnp}_{i}=2*p*(1-p)*a^{2};{PVEsnp}_{i}=\frac{{Varsnp}_{i}}{\sum_{i}{Varsnp}_{i}}*PVE_{total}$$
(6)
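Formulas (5) and (6) can be sketched with numpy least squares. The phenotype is standardized as with R's scale(), the reduced model contains only the covariates (principal components, plus year dummies for the multi-year BLUP type, which the caller would add as extra columns), and the SNP matrix is assumed to be coded as minor-allele dosage so that its column mean divided by 2 is the MAF. Function names are hypothetical; the study itself used R.

```python
import numpy as np

def minor_allele_dosage(geno):
    """Recode 0/1/2 allele counts so that 2 = homozygous for the minor allele."""
    g = np.asarray(geno, dtype=float).copy()
    flip = g.mean(axis=0) / 2.0 > 0.5            # the counted allele is the major one
    g[:, flip] = 2.0 - g[:, flip]
    return g

def r_squared(y, X):
    """OLS R^2 and coefficients (intercept first)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2), beta

def pve(y, covariates, snps):
    """PVE_total (Eq. 5) and per-SNP PVE (Eq. 6); snps coded as minor-allele dosage."""
    y = (y - y.mean()) / y.std(ddof=1)           # standardization, as with R's scale()
    r2_full, beta = r_squared(y, np.column_stack([covariates, snps]))
    r2_reduced, _ = r_squared(y, covariates)
    pve_total = r2_full - r2_reduced
    a = beta[-snps.shape[1]:]                    # SNP regression coefficients
    p = snps.mean(axis=0) / 2.0                  # MAF, given minor-allele coding
    var_snp = 2.0 * p * (1.0 - p) * a ** 2
    return pve_total, var_snp / var_snp.sum() * pve_total
```

Because each PVEsnp is a share of PVEtotal weighted by 2p(1 - p)a², the per-SNP values always sum to PVEtotal.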

SNP prioritization

$$y=\mu +Year+\sum_{j=1}^{4}PC_{j}+\sum_{i=1}^{n}\underline{SNP_{i}}+\underline{\epsilon }$$
(7)

Associations between an LD block and a trait detected in at least 2 analyses were selected (i.e., SNP in the same LD block found significantly associated with the same trait). Among the SNP in the same LD block, the one with the lowest p-value across analyses was selected as representative of the block. The variance explained by all (PVEtotal) and by each (PVEsnp) selected SNP was estimated for each BLUP type (single-year, multi-year, mean) by multivariate regression analysis, with SNP fitted either as fixed factors (PVEtotal_fixed, PVEsnp_fixed) or as random factors (PVEtotal_random, PVEsnp_random). The former reflects the SNP additive effect, while the latter also accounts for the heterozygous effect. For the fixed PVE, the methodology based on formulas (5) and (6) was applied to the n selected SNP, while for the random PVE, the mixed model (7) was fitted; it is similar to the full model in (5) but with SNP as random factors.

PVEtotal_random was estimated with the r.squaredGLMM function of the R package MuMIn, and variance components were extracted with VarCorr (lme4, via lmerTest). PVEsnp_random was calculated with formula (6), as for PVEsnp_fixed.

Mapping and identification of candidate genes

Barleymap (Cantalapiedra et al. 2015) was used to obtain the physical positions of all validated SNP on the Morex v3 reference genome. Candidate genes were investigated for SNP with a fixed or random PVEsnp above 5% across all BLUP types (within-year, mean, multi-year); these SNP were defined as potential Quantitative-Trait-Loci (QTL) hotspots. The GrainGenes platform (Yao et al. 2022) was used to look for previously reported MTA or QTL at similar positions.

Favorable haplotypes

A relationship between the number of favorable alleles carried by accessions and the trait value was shown by Gao et al. (2016). However, this approach does not consider heterozygous genotypes that may outperform the homozygous ones. Therefore, the relationship between the number of favorable QTL for SNP with medium to large effect size (PVE above 5%, consistent across BLUP types) (NbFav) and the trait value was investigated. The SNP random effects (i.e., the deviation from the trait mean associated with each allelic state of the SNP present in the population) were extracted from model (7) using the ranef function (lmerTest). For each SNP, a genotype (allelic state) was considered favorable if the sign of its effect matched expectations. Desirable barley traits may vary between end-uses; here, we considered the grain quality requirements for malting: low protein, low beta-glucan content, high TKW and plumpness, etc. For each trait, calculations were based on the mean BLUP type, and SNP genotypes were coded 2 for the most favorable, 1 for the second most favorable (if any), and 0 for unfavorable (opposite effect sign). NbFav was obtained by summing these codes for each accession.
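The coding of favorable genotypes and the NbFav sum can be sketched in Python. Each SNP's random effects are assumed to be available as a mapping from genotype to deviation from the trait mean, and the direction of desirability (for example, lower for protein, higher for TKW) is supplied per trait; the names and example genotypes are hypothetical.

```python
def favorable_codes(effects, higher_is_better):
    """Code one SNP's genotypes: 2 = most favorable, 1 = second favorable, 0 = unfavorable.

    effects: {genotype: random effect (deviation from the trait mean)}.
    """
    sign = 1.0 if higher_is_better else -1.0
    favorable = sorted((g for g, e in effects.items() if sign * e > 0),
                       key=lambda g: sign * effects[g], reverse=True)
    codes = {g: 0 for g in effects}
    for g, code in zip(favorable, (2, 1)):
        codes[g] = code
    return codes

def nb_fav(accession_genotypes, codes_by_snp):
    """Sum the codes over SNP for one accession ({snp: genotype})."""
    return sum(codes_by_snp[snp][g] for snp, g in accession_genotypes.items())

# Hypothetical SNP: a heterozygote outperforming both homozygotes for a "higher is better" trait.
c1 = favorable_codes({"AA": 0.3, "AB": 0.5, "BB": -0.8}, higher_is_better=True)
# Hypothetical SNP for a "lower is better" trait such as protein.
c2 = favorable_codes({"AA": -0.2, "BB": 0.2}, higher_is_better=False)
```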

A multivariate regression was then fitted to compare means between groups of accessions with the same NbFav, using the formula Trait ~ μ + Year + NbFav + Year:NbFav + ϵ.

The R package emmeans was used to calculate marginal means and confidence intervals for the interaction term (Year:NbFav). The cld function of the multcomp package was then applied to group means into statistical groups using a Bonferroni-corrected significance threshold.

Multi-trait genotype ideotype distance index applied to marker-assisted selection

Marker-Assisted Selection (MAS) is generally based on the presence or absence of one or a few SNP associated with a trait of interest, whereas organic farmers are looking for crops with good overall performance across multiple traits. Indices such as the multi-trait genotype-ideotype distance index (MGIDI) (Olivoto and Nardino 2021) are used in phenotypic selection to identify lines that best balance multiple traits. We investigated the potential of applying MGIDI in MAS by using each trait's NbFav as input instead of the phenotypic values. Calculations were performed with the mgidi function of the R package metan. The objective was to maximize NbFav for desirable trait values, namely low protein, low beta-glucan content, high plumpness, high TKW, high yield, low threshing hardness, high Vigour, high lodging resistance, low disease scores, and many growing degree days to maturity but few to anthesis. The function first groups correlated traits into factors and calculates a score for each factor; the final index value of an accession is its distance to the ideotype computed from these factor scores. The lower the index, the closer the line is to the ideotype. A selection intensity of 15% was applied to the panel, and the selected accessions were compared with the whole panel for the proportion of accessions from each breeding origin and for the BLUP population means of key naked barley traits.

All figures were generated in R with the ggplot2 package.

Results

Population structure and relatedness

35,552 SNP markers remained after filtering for missing data, MAF, and unknown map position, and a subset of 6505 independent SNP was used for population structure and relatedness analyses. In the Principal Component Analysis, the first four principal components explained 32% of the genotypic variation in the diversity panel, with each remaining PC explaining less than 4%, indicating a moderate population structure.

The dendrogram associated with the heatmap shows how the population can be divided according to genetic relatedness and suggests at least 2 groups (Fig. 1). However, the heatmap indicates that within groups, there are some small subgroups of highly genetically related lines but also lines that are not close to any other accession.

Fig. 1

(left) Kinship matrix heatmap and associated dendrogram, with genotypes on each axis; (right) PCA plot showing the position of accessions on the first two PCs. Point shapes and ellipses indicate the genetic groups resulting from the hierarchical clustering analysis described in Table 1

Hierarchical clustering identified 2, 4, or 6 groups, with 4 being the optimal number. The main criteria differentiating the groups are the line breeding history (Genebanks (G1, G2) or breeding programs (G3, G4)), the seed color (either yellow (G3, G4) or colored (G1, G2)), the specific line donor or breeding program and the spike morphology (2 or 6-rows). Table 1 summarizes the number of lines in each genetic group for each possible combination of these criteria.

Table 1 Distribution of accessions across genetic groups

Linkage disequilibrium (LD)

The maximum LD between 2 SNP markers was 0.46, and the critical r2 was 0.12. Figure 2 shows that LD decays rapidly, with LD decay estimated at 1.72 Mbp. However, LD decay varied across chromosomes (1.91 ± 0.40 Mbp).

Fig. 2

LD (r²) between pairs of SNP markers as a function of their distance. Mbp = megabase pairs; horizontal line = critical r²; vertical line = LD decay distance

Phenotypic data range and distribution

In the following sections, results on individual traits are only described for 7 of the 19 traits, each representing a trait category: phenology (GDDA), agronomy (Lodging), diseases (Rhyn), yield, threshing hardness (Thresh), grain quality (plump), and nutritional value (Bgluc). Results for the remaining traits are provided in supplementary materials.

Figure 3 shows the data range and distribution of genotypic BLUPs for each trait and BLUP type (within-year BLUP, mean of within-year BLUP, multiple within-year BLUP). Out of the 84 datasets analyzed, only 14 passed the Shapiro–Wilk test (p-value ≥ 0.01); however, only 9 had an absolute skewness above 0.8, and kurtosis remained below 4, indicating only slight deviations from normality that are unlikely to affect the GWAS analyses (Table S4). Only three traits had heritability below 60% (yield, HI, and hulled) (Table S3), indicating good reproducibility of the data.

Fig. 3

Violin plots and boxplots for each trait and each BLUP type. See Material and methods for a full description of traits and measurement units; Lodging, Rhyn, and Thresh were quantile-transformed prior to mixed linear modeling

Genome-wide association studies

The study identified 1742 Marker-Trait Associations (MTA) across traits, models, and BLUP types. 1653 MTA remained after grouping significant SNP into LD blocks and selecting the most significant as representative SNP of the block.

Comparison of GWAS models

Overall, the 3VMrMLM model performed best, with its significant MTA explaining the largest proportion of the trait additive variance in most analyses (Table 2). 3VMrMLM also covered all effect size ranges (Table 3). BLINK and 3VMrMLM discovered 3 and 11 MTA, respectively, that explained more than 5% of the trait variance, whereas the other models discovered only small-effect markers. On the multi-year BLUP type, EMMA failed to detect significant associations for 8 of the 19 traits, and it detected none on the within-year BLUP types (Table 2). The G model generally identified more MTA, but with similar or lower PVEtotal; interestingly, it performed better on some traits (e.g., plump and Ramularia). BLINK identified fewer markers, but their combined effects explained a larger proportion of the variance in some cases.

Table 2 Total additive variance explained (PVEtotal) by significant MTA discovered by each model on each BLUP type for seven of the traits studied
Table 3 Number of MTA across traits and datasets for each model and additive effect size category
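PVEtotal can be illustrated as the R² of an ordinary least-squares regression of genotypic BLUPs on all significant markers fitted jointly as fixed additive covariates (0/1/2 coding). This numpy sketch shows only one possible definition: each GWAS package computes PVE differently, so it is not meant to reproduce the values in Table 2. The function name is illustrative.

```python
import numpy as np

def pve_total_fixed(blup, markers):
    """PVE (%) of all significant markers fitted jointly as fixed additive effects.

    blup: genotypic BLUPs (n accessions); markers: n x k dosage matrix (0/1/2).
    Returns 100 * R² of the OLS fit with an intercept.
    """
    y = np.asarray(blup, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(markers, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 100.0 * (1.0 - ss_res / ss_tot)
```

Fitting the markers jointly, rather than summing single-marker PVE values, avoids double-counting variance shared by correlated markers.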

MTA validation and QTL identification

In total, 259 MTA were significant in at least two analyses (Fig. 4, Table S6). The positions and effect sizes of SNP associated with 7 of the 19 traits are shown in Fig. 5. Among MTA detected on multiple BLUP types, 175 were selected with the G model, 13 with BLINK, and 50 with 3VMrMLM. Twenty-nine MTA were validated by at least two models and/or on several BLUP types, but none by all four models: 13 by BLINK and 3VMrMLM (including 2 QTN-by-environment interactions, QEI), 7 by 3VMrMLM and EMMA, 2 by BLINK, EMMA, and 3VMrMLM, 1 by the G model and 3VMrMLM, and 1 by the G model and EMMA. Rhyn was associated with only 14 SNP, more than half of which were selected, whereas 111 MTA were discovered for GDDA, but only 24 of them in several analyses.
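Selecting MTA that are significant in at least two analyses amounts to counting, for each LD block, the distinct (model, BLUP type) pairs in which the block was detected. A stdlib sketch for one trait, with hypothetical block IDs and BLUP-type labels:

```python
from collections import defaultdict

def validated_blocks(mta, min_analyses=2):
    """mta: iterable of (block_id, model, blup_type) for one trait.

    Returns {block_id: sorted list of models} for blocks detected in at least
    `min_analyses` distinct (model, BLUP type) analyses.
    """
    analyses = defaultdict(set)
    for block_id, model, blup_type in mta:
        analyses[block_id].add((model, blup_type))
    return {
        block_id: sorted({model for model, _ in found})
        for block_id, found in analyses.items()
        if len(found) >= min_analyses
    }
```

Blocks whose model list has more than one entry correspond to between-model validations; blocks with a single model were validated within that model across BLUP types.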

Fig. 4

Venn diagram showing the number of MTA shared between BLUP types (across models and traits) and between models (across BLUP types and traits)

Fig. 5

Position and effect size category of MTA validated across analyses. Effect size category: minimum percentage of variance explained by the SNP, fitted as a random effect, across BLUP types

Selected SNP explained a larger proportion of the variance when fitted as random rather than fixed effects (Table 4). Fixed and random PVE were correlated for both PVEtotal (\(r_{\text{Pearson}} = 0.69\)) and PVEsnp (\(r_{\text{Pearson}} = 0.43\)) (Fig. S3). Based on random-effect PVE, 36 MTA were identified as “major QTL”, each explaining more than 5% of the trait variance on every BLUP type (Table 5, Table S7).

Table 4 Prioritization of SNP
Table 5 SNP markers prioritized via multi-model GWAS with medium to large fixed and/or random effect
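Why a SNP can explain more variance as a random than as a fixed effect is easiest to see for an overdominant SNP. The numpy sketch below compares (i) a fixed additive fit, where PVE is the squared correlation between dosage and BLUP, with (ii) genotype classes treated as levels of a random factor, whose variance component is estimated with the one-way ANOVA (method-of-moments) estimator. The exact mixed model used in the study is not described in this section; the ANOVA estimator is a stand-in, and the function names are illustrative.

```python
import numpy as np

def pve_fixed_additive(blup, dosage):
    """PVE (%) of one SNP fitted as a fixed additive covariate (0/1/2): 100 * r²."""
    r = np.corrcoef(np.asarray(dosage, dtype=float), np.asarray(blup, dtype=float))[0, 1]
    return 100.0 * r ** 2

def pve_random_genotype(blup, genotype):
    """PVE (%) with genotype classes as levels of a random factor.

    Variance components from the one-way ANOVA (method-of-moments) estimator;
    needs >= 2 classes, more accessions than classes, and nonzero total variance.
    """
    y = np.asarray(blup, dtype=float)
    genotype = np.asarray(genotype)
    groups = [y[genotype == g] for g in np.unique(genotype)]
    n_total, n_classes = len(y), len(groups)
    sizes = np.array([len(g) for g in groups], dtype=float)
    grand_mean = y.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (n_classes - 1)
    ms_within = ss_within / (n_total - n_classes)
    # Effective class size, which handles unbalanced genotype classes.
    n0 = (n_total - (sizes ** 2).sum() / n_total) / (n_classes - 1)
    var_snp = max((ms_between - ms_within) / n0, 0.0)  # truncate negative estimates at 0
    return 100.0 * var_snp / (var_snp + ms_within)
```

With dosages `[0, 0, 1, 1, 2, 2]` and BLUPs `[0, 1, 5, 6, 0, 1]`, heterozygotes outperform both homozygotes: the additive fit explains essentially nothing, whereas the random genotype-class fit explains about 94% of the variance.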

The search for candidate genes or known QTL in LD with the markers further validated 20 of the 36 major QTL, which explained up to 26% of the trait variance on average across BLUP types (Table S7). For example, Q.Thresh4H explained at least 20.9% of the Thresh variance, and Q.Lodging3H.1 explained 21.2% of the Lodging variance.

Relationship between the number of favorable major QTL and the trait value

Figure 6 shows an additive effect of the number of favorable genotypes at major QTL (NbFav), consistent across years for most traits. Overall, significant differences were observed between low and high NbFav, except when the number of accessions was too small, leading to large standard errors. However, the results also reveal a genotype-by-year interaction. Only one major QTL was identified for yield, and its favorable genotype significantly improved yield in 2020 and 2021 but not in 2019. For Rhyn, disease levels in 2021 were low for all haplotypes (i.e., combinations of alleles at all major QTL), mainly because disease pressure was lower than in 2020.

Fig. 6

Effect of combinations of favorable allelic states at major QTL associated with 7 of the 19 studied traits. Letters indicate statistical groups from multiple comparisons of means with Bonferroni-adjusted p-values. Squares show the marginal means and error bars the associated confidence intervals. The number of lines carrying each haplotype is indicated above the x-axis

Multi-trait marker-assisted selection with MGIDI based on the number of favorable QTL (NbFav)

The MGIDI index selected 37 accessions with maximized NbFav across the studied traits. Figure 7A shows how the traits were grouped into factors according to their NbFav values and the proportion of the selection index explained by each factor. A high contribution to the MGIDI indicates a strength, and a low contribution a weakness. Overall, all selected accessions showed strengths for plump and Thresh, whereas the other factors showed more disparity (Table S12). Setana Hadaka appears to maximize NbFav for FA2, which corresponds to yield and disease resistance, whereas some DH lines show weaknesses for FA1, which groups agronomic and nutritional traits. Comparing trait means between the selection and the full panel confirms that the selection favored genetic gain for plump, TKW, and Thresh (Table 6). For other important traits, such as protein, Bgluc, and Lodging, no genetic gain was observed, but mean values remained in an acceptable range. Diversity was also maintained, with similar proportions of accessions from each breeding history in the selection and in the full panel (Fig. 7B).
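MGIDI ranks accessions by their distance to an ideotype in a reduced factor space, so lower index values are better. The numpy sketch below is a simplified version under explicit assumptions: every trait is rescaled to 0–100 so that the ideotype scores 100 on all traits, factors are the principal components of the trait correlation matrix with eigenvalue > 1, and the varimax rotation of the original method is skipped. The reference implementation is the `mgidi()` function of the R package metan; this sketch will not reproduce its values exactly, and `simplified_mgidi` is a hypothetical name.

```python
import numpy as np

def simplified_mgidi(data, increase, sel_fraction=0.15):
    """Simplified MGIDI-like index (no varimax rotation); lower values = closer to the ideotype.

    data: accessions x traits (e.g. NbFav per trait); increase: one bool per trait,
    True if higher values are desired. Assumes no trait is constant across accessions.
    Returns (index, indices of the selected accessions, best first).
    """
    X = np.asarray(data, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Rescale every trait to 0-100 so that the ideotype is 100 on every trait.
    up = np.asarray(increase, dtype=bool)
    scaled = 100.0 * np.where(up, X - lo, hi - X) / (hi - lo)
    ideotype = np.full(X.shape[1], 100.0)
    mean, sd = scaled.mean(axis=0), scaled.std(axis=0, ddof=1)
    z = (scaled - mean) / sd
    # Principal components of the trait correlation matrix, largest eigenvalue first.
    eigval, eigvec = np.linalg.eigh(np.corrcoef(scaled, rowvar=False))
    order = np.argsort(eigval)[::-1]
    n_keep = max(int((eigval > 1.0).sum()), 1)  # Kaiser criterion, at least one factor
    loadings = eigvec[:, order[:n_keep]]
    scores = z @ loadings
    ideotype_scores = ((ideotype - mean) / sd) @ loadings
    # Euclidean distance from each accession to the ideotype in factor space.
    index = np.sqrt(((scores - ideotype_scores) ** 2).sum(axis=1))
    n_sel = max(1, round(sel_fraction * len(X)))
    return index, np.argsort(index)[:n_sel]
```

The default selection fraction of 15% is an assumption; an accession that matches the ideotype on every trait gets an index of 0 and is selected first.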

Fig. 7

(A) Radar chart showing the contribution of each factor to the MGIDI index for each accession selected based on NbFav. (B) Proportion (%) of accessions from each breeding history (Donors) in the whole panel (p0) and in the selection (pS); donor acronyms are described in Table 1

Table 6 Population means across the 3 years of trials for accessions selected with MGIDI based on NbFav (MeanS), compared with the full panel (Mean0), for the main naked barley traits of interest. Bold values indicate favorable genetic gain from selection

Discussion

Performance and complementarity of different GWAS models

The recent model 3VMrMLM showed the best performance in identifying main-effect markers: 24 of the 36 major QTL were identified with this model, 17 of them with this model only. Since its development, a few studies have also demonstrated the high power of this model to estimate marker effects without bias and to detect markers across the full range of effect sizes. He et al. (2022), Li et al. (2022a, b), Wei et al. (2022), and Zhang et al. (2022) compared 3VMrMLM with a single-locus method, GEMMA, REMMA, MLM, and EMMA, respectively, and showed that these methods had lower power to detect small-effect and stable QTNs. In the present study, the single-locus model EMMA failed to identify any significant MTA on the single-year and mean BLUP types. However, using multi-year data increased the sample size and thus the power of the model, allowing the identification of 369 MTA across 11 of the 19 traits, of which only 8 were also discovered with other models. A fair comparison of 3VMrMLM, BLINK, the G model, and EMMA is not straightforward because of their very different characteristics (e.g., number of SNP at the start of the analysis, analysis steps). Only 3VMrMLM and EMMA can fit multi-environment data, and only 3VMrMLM can detect SNP-by-environment interactions; thus, no comparison could be made for the latter.

QTL were mostly selected based on multiple analyses with the same model on different BLUP types, but combining models explained a larger part of the variance for 7 of the 19 traits (Table S9). Each model seems to have strengths and weaknesses depending on the trait category. For example, major QTL discovered with the G model or 3VMrMLM alone each explained up to 18% of the random variance, whereas combining QTL selected from the outputs of both models explained up to 47%. Similarly, for Bgluc, QTL found with BLINK and 3VMrMLM alone explained up to 29% and 37%, respectively, whereas QTL selected from the outputs of both models explained up to 58% of the random variance. This suggests that very different GWAS models can be complementary and provide a broader understanding of a trait's genetic architecture.

Efficient marker prioritization for the identification of major QTL

Grouping correlated SNP markers into LD blocks and selecting blocks associated with a trait in more than one analysis allowed more associations to be prioritized than restricting the selection to identical SNP. Although this was not necessarily the case for within-model selection across BLUP types, between-model selection identified 29 MTA when based on LD blocks versus 22 when based on individual SNP. For example, only 2 MTA were shared between the G model and other models, whereas 4 additional SNP were selected based on LD blocks. Similarly, Gao et al. (2016) used the G model to validate results from MLM for wheat leaf rust resistance, and the two models detected closely mapped, but not necessarily identical, markers.
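The gain from block-based matching can be illustrated by comparing two models' hits for the same trait at the SNP level and at the LD-block level. A stdlib sketch, assuming a precomputed mapping from each SNP to its LD block; the SNP and block IDs are hypothetical:

```python
def shared_between_models(hits_a, hits_b, block_of):
    """hits_a, hits_b: sets of SNP IDs significant for the same trait in two models.
    block_of: mapping SNP ID -> LD block ID.
    Returns (number of shared SNP, number of shared LD blocks)."""
    shared_snps = hits_a & hits_b
    shared_blocks = {block_of[s] for s in hits_a} & {block_of[s] for s in hits_b}
    return len(shared_snps), len(shared_blocks)
```

Two models that flag neighboring but different SNP in the same block (here `s1` and `s2` in block `b1`) share no SNP there, yet they share the block.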

The second step of SNP prioritization is based on effect size. GWAS model outputs do not necessarily provide this information, and when they do, the values are not calculated the same way across models. Indeed, multiple statistical methods can be used to estimate marker effects, each with specific advantages and limitations (Xavier 2019; Zhu and Zhou 2020). Compared with the fixed-effect procedure, the random-effect procedure prioritized more MTA and identified heterozygous genotypes with superior performance that may be of interest. This study proposed a methodology to prioritize SNP discovered with multiple models run on multiple BLUP types. Calculating the effect of markers on each BLUP type and prioritizing SNP based on their minimum PVE across BLUP types ensured the selection of SNP with a consistently large effect across years.
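The minimum-PVE rule can be sketched as follows: for each SNP, take the smallest PVE across BLUP types and keep the SNP only if that minimum exceeds the 5% threshold used for major QTL. The input layout (a nested dict of SNP, then BLUP type, then PVE in %) and the function name are assumptions for illustration.

```python
def prioritize_by_min_pve(pve_by_blup, threshold=5.0):
    """pve_by_blup: {snp: {blup_type: PVE in %}}.

    Returns [(snp, min PVE)] for SNP whose smallest PVE across BLUP types exceeds
    `threshold`, sorted from largest to smallest minimum PVE.
    """
    min_pve = {snp: min(by_type.values()) for snp, by_type in pve_by_blup.items()}
    kept = [(snp, value) for snp, value in min_pve.items() if value > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

A SNP with a large effect in a single year only (for example 30% in 2020 but 3% in 2021) is dropped, which is what makes the rule favor effects that are stable across years.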

Validating prioritized SNP with genes previously characterized as affecting the studied trait provides an additional level of reliability to the marker. Half of the 36 major QTL are potentially novel, i.e., no obvious candidate gene or co-localized QTL was identified. One explanation for not finding a candidate gene may be that the gene has not yet been characterized. Candidate genes are usually searched for in the immediate vicinity of the SNP, but because LD patterns vary within as well as between chromosomes, the associated gene might be located farther away than expected. For example, Pasam et al. (2012) observed faster LD decay among landraces than among modern barley cultivars, and higher LD between SNP in six-row than in two-row barley.
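One way to account for chromosome-specific LD decay during the candidate-gene search is to size the search window per chromosome. The sketch below assumes a gene list of (ID, chromosome, start, end) with coordinates in Mbp and falls back on the genome-wide decay distance of 1.72 Mbp when no chromosome-specific value is given; the gene IDs and window values in the example are hypothetical.

```python
def genes_near_snp(snp_chrom, snp_pos_mbp, genes, decay_by_chrom, default_decay=1.72):
    """genes: iterable of (gene_id, chrom, start_mbp, end_mbp).

    Returns the IDs of genes overlapping snp_pos_mbp +/- the LD decay distance
    of the SNP's chromosome (or default_decay if that chromosome is not listed).
    """
    window = decay_by_chrom.get(snp_chrom, default_decay)
    lo, hi = snp_pos_mbp - window, snp_pos_mbp + window
    return [gene_id for gene_id, chrom, start, end in genes
            if chrom == snp_chrom and end >= lo and start <= hi]
```

Widening the window on a chromosome with slower LD decay (for example from 2.0 to 3.0 Mbp) can bring in a gene that a fixed genome-wide window would miss.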

Some QTL may be expressed differently from one environment to another. GWAS analysis of the aggregated within-year BLUPs with 3VMrMLM identified 13 significant QTN-by-environment interactions (QEIs) associated with Bgluc, protein, yield, HI, Vigour, BYDV, and GDDA, which explained between 0.7 and 3.9% of the trait variance (Table S8). Interestingly, Q.BYDV3H.1 was a QEI that was also identified as a major QTL. Candidate genes for QEIs mainly encode proteins involved in various biotic and abiotic stress tolerance mechanisms, such as chaperone proteins, hexosyltransferases (Dawood et al. 2020), zinc finger proteins (Hussain et al. 2022), ferredoxin (Wójcik-Jagła et al. 2020), and serine/threonine protein kinases. This result is consistent with the contrasting weather conditions at the experimental site: 2020 was characterized by a warm and dry spring followed by a wet and cool summer, and 2021 by a cold spring but a warm and dry summer, with reduced disease pressure compared with 2020.

The environmental effect was also highlighted by year-specific MTA. Three major QTL, for Thresh, Ramularia, and GDDPM, were identified only on the 2020 and mean datasets, and one QTL for Lodging only on the 2021 and mean datasets. As noted above for diseases, low symptom severity may have led to partial or no expression of the QTL, which would explain the inconsistent GWAS output. For example, JHI-Hv50k-2016-426290 on 6H (546.7 Mbp) explained up to 29% of the variance for Ramularia in 2020 but only 3% in 2021. A relevant candidate gene is HORVU.MOREX.r3.6HG0627440, which encodes an agmatine coumaroyltransferase-2 protein involved in reinforcing cell walls against pathogen infection (Lemcke et al. 2021). Although this SNP was not prioritized based on our data, it may still be of interest. This observation highlights the need for phenotyping in more years and locations to validate QTL.

Predicting the genotype performance using major QTL identified in multi-model GWAS

The relationship between the number of favorable alleles and the trait value has been demonstrated for wheat coleoptile length, based on allelic additive effects (Wei et al. 2022). However, in the presence of heterozygote effects, it may be more accurate to consider the favorable genotypes of major QTL (i.e., heterozygous, or homozygous for the minor or major allele). In this study, multiple comparisons of haplotypes showed that the haplotype combining the most favorable genotypes at all major QTL had the best trait value. However, this theoretical best haplotype did not always exist in the studied population, or too few accessions carried it to detect a significant difference (Table S11). Nevertheless, our results highlighted a correlation between the number of favorable genotypes at major QTL (NbFav) and the trait value.
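Counting favorable genotypes rather than favorable alleles lets a heterozygote count as favorable when it performs best. A stdlib sketch of NbFav for one accession and of its Pearson correlation with trait values, using hypothetical QTL names and genotype calls ("AA", "AB", "BB"):

```python
def nb_fav(genotypes, favorable):
    """Number of major QTL at which an accession carries a favorable genotype.

    genotypes: {qtl: call} for one accession, e.g. {"Q1": "AB"}; missing calls count as unfavorable.
    favorable: {qtl: set of favorable calls}; may include the heterozygote.
    """
    return sum(genotypes.get(qtl) in calls for qtl, calls in favorable.items())

def pearson(x, y):
    """Pearson correlation, e.g. between NbFav and trait values across accessions."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

Because `favorable` stores a set of calls per QTL, the same function covers additive QTL (one favorable homozygote) and QTL where the heterozygote is, or is among, the favorable genotypes.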

A practical application of this finding is the use of NbFav in a multi-trait index for more efficient MAS. In our data, the selection succeeded in identifying a subset of lines that best balanced multiple traits while maintaining genetic diversity. Compared with phenotypic selection, MAS may yield lower genetic gain in the short term but higher gain in the long term, with better trait stability.

In the early stages of breeding programs, MAS may serve more to eliminate the least desirable accessions than to select the best ones. Part of the trait variance remains unexplained because of the highly quantitative nature of some traits, such as phenological traits or yield. These traits are controlled by many genes with rather small effects and are strongly influenced by the environment. The QTL identified in this study may not be effective at other locations; thus, further phenotyping in multiple environments would be required to select reliable QTL for broad use in MAS. Finally, identifying environment-specific QTL may also be useful for selecting for local adaptation.

Conclusion

Each of the four GWAS models tested showed strengths and limitations, but 3VMrMLM performed best overall. Using multiple models validated more MTA than single-model analysis of multiple datasets. Prioritizing validated MTA based on their minimum random PVE across BLUP types identified 36 large-effect QTL, 18 of which were associated with a candidate gene or a known QTL, the remainder being potentially novel. The study also highlighted a relationship between the number of favorable genotypes at major QTL and the trait value. Thus, maximizing this number for each trait of interest in a selection index with an ideotype design may enable efficient marker-assisted selection (MAS) of accessions that best balance multiple quantitative traits.