Introduction

Understanding the genetic basis of variation for quantitative traits has been a long-standing challenge in biology. Forward genetics is the classical approach in genetics research. In contrast to reverse genetics, forward genetics progresses from phenotype to genotype, aiming to identify elements of genetic variation that underlie the corresponding phenotypes1,2. Association analysis of natural populations and linkage mapping of biparental segregating populations (F2, recombinant inbred lines [RILs], etc.) constitute two approaches to map QTLs3. The efficacy of forward genetics is affected by the density of genetic markers, the accuracy of scored phenotypes and the size of the mapping population4. Benefiting from the development of high-throughput sequencing technology, genotyping by sequencing facilitates the identification of genetic variation. More and more phenotypes can be accurately measured using the multi-omics approach, such as transcriptomics, metabolomics and phenomics5. This also underlies and enables genetical genomics or systems genetics approaches that combine genetics with gene expression analysis to understand complex traits6,7. The concept of systems genetics was proposed in the early 21st century7,8,9, and much progress has been made in humans10,11,12 and plants13,14,15. The application of systems genetics in crops promises to facilitate both gene discovery and molecular breeding.

Potato (Solanum tuberosum L.) is the most important tuber crop and staple food for 1.3 billion people worldwide16. However, the complexity of tetrasomic inheritance has limited genetic studies in potato. It is challenging to resolve QTLs to the single-gene level using autotetraploid populations (2n = 4x = 48). Genetic analysis at the diploid level will help to simplify analyses and identify genes associated with agronomically important traits. By overcoming self‐incompatibility17,18,19,20 and inbreeding depression21,22,23, tremendous efforts have already been made to reinvent potato as a seed-propagated diploid crop; yet, compared with other major crops, diploid potato breeding is still in its infancy5,24,25,26. Several studies have explored the potential to map QTLs using diploid populations27,28,29,30,31. However, almost all of the diploid mapping populations have been derived from two heterozygous parents that contain four segregating alleles, and the lack of large-scale genetic analysis limits their application in potato breeding. In a previous study, we developed a genome-design approach and created a diploid hybrid potato based on two inbred parental lines23. The two parents, derived from two different diploid accessions (S. tuberosum group Stenotomum and S. tuberosum group Phureja), differ substantially in tuber traits, gene expression and metabolite content23,32. The segregating F2 population generated from the two lines is an ideal resource for large-scale genetic analyses of potato, especially for the genetic dissection of heterosis.

Heterosis refers to the superior performance of hybrids over their parents33,34,35, which has been extensively applied for agronomic purposes in animals and plants. Understanding the genetic mechanisms underlying heterosis is beneficial for future crop hybrid breeding. Dominance, overdominance and epistasis are three classical models causally explaining the phenomenon of heterosis36,37,38. Recent reports in rice and maize found that dominance and overdominance effects have explanatory power in the different F2 populations39,40. However, mechanisms underlying heterosis in clonally propagated crops are largely unknown. The diploid hybrid potato we created exhibits significant heterosis at different developmental stages23,32, and it constitutes suitable material for the study of mechanisms underpinning heterosis. Furthermore, genetic dissections of the key loci contributing to heterosis are achievable in the F2 population, enabling us to evaluate the potential heterotic effects of each locus.

In this study, we conduct large-scale genetic and heterotic analyses in a diploid inbred line-based F2 population. Using a multi-omics dataset, we construct a large database of genetic resources. Importantly, we reveal the genetic basis of heterosis in this elite hybrid potato cross and identify a male fertility-related PME gene with dominance heterotic effect. Our findings contribute to the molecular breeding of diploid potato and provide insights into the understanding of heterosis in clonally propagated crops.

Results

The sequencing map of an inbred line-based F2 population

A summary of our sequencing map and research pipeline is outlined in Fig. 1. We constructed an immortalized F2 population, including 1064 individuals and conserved by tissue culture, derived from two homozygous diploid inbred lines23: A6-26 and E4-63. We re-sequenced all F2 plants with an average coverage of 3× and constructed a high-density genotype and genetic map based on 4,794,364 single-nucleotide polymorphisms (SNPs) for QTL mapping (Supplementary Fig. 1a, b). We identified two genomic regions with severe segregation distortion (SD) where the segregation ratios in the F2 population did not agree with 1:2:1 proportion (Supplementary Fig. 1c). These two SD regions colocalize with two self-compatibility-related genes: the S-RNase mutant Ss11 on chromosome 1 (chr01) (derived from the female parent A6-26)17 and the S-locus inhibitor (Sli) on chr12 (derived from the male parent E4-63)19,20. Only pollen grains harboring either one or both genes complete the double fertilization, therefore resulting in SD in the F2 population.
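
The segregation-distortion check described above can be illustrated with a minimal Python sketch. It assumes a chi-square goodness-of-fit test of genotype counts against the expected 1:2:1 ratio; the function name and the example counts are hypothetical and are not taken from this study, and the test actually used may differ.

```python
import math


def segregation_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of F2 genotype counts against 1:2:1.

    Returns (chi2, p_value). Assumed approach, not the study's exact method.
    """
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("no genotyped individuals at this marker")
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three genotype classes give 2 degrees of freedom, and the chi-square
    # survival function for df = 2 is exactly exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Hypothetical counts (homozygous parent 1, heterozygous, homozygous parent 2).
balanced = segregation_chi2(250, 500, 250)
distorted = segregation_chi2(400, 500, 100)
```

A marker with a very small p-value, such as the second hypothetical example, would be flagged as distorted; in practice a multiple-testing correction across markers would also be needed.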

Fig. 1: Schematic diagram of genetic and heterosis analyses in an inbred line-based F2 population.

The F1 hybrid potato with strong heterosis was selfed to generate the F2 segregating population. The analyses of genetic network and potato heterosis were conducted using multi-omics data including genomic, phenomic (macro-phenome), transcriptomic and metabolomic data.

To boost the progress of gene discovery in potato, we developed a high-throughput phenome detection and analysis system that integrates multi-omics technology and further divided the potato phenome into macro-phenome and micro-phenome. The macro-phenome includes traits investigated by manual and high-throughput optical imaging (RGB camera, hyperspectral imaging and structured light imaging) (Supplementary Fig. 2 and Supplementary Methods 1 and 2). We thus generated 537 macro-traits including pollen viability, yield-related traits, tuber structure-related traits, morphology-related traits, etc. (Supplementary Data 1). The micro-phenome consists of the transcriptome and the metabolome of tuber. After filtering out low-quality data, we identified 19,166 expressed genes and 679 metabolites for genetic and heterotic analyses.

QTL mapping of the multi-omics traits

We mapped several qualitative traits to known loci using bulked-segregant analysis (BSA). For example, genes controlling tuber flesh color and tuber shape mapped to the ends of chr03 and chr10, corresponding to the Y locus41 and the Ro gene29,42, respectively (Supplementary Fig. 3a, b). Purple Tuber Bud is located on chr02, corresponding to the same region as purple flower32 and possibly regulated by DFR (Dihydroflavonol-4-reductase, D locus)43 (Supplementary Fig. 3c). The gene associated with yellow leaf was mapped to chr12 (Supplementary Fig. 3d). The underlying gene yl1 came from parent A6-26 and was also found in progeny of PG6359 (the parental line of A6-26)21,23.

We applied the composite interval-mapping method to map quantitative traits, identifying 135 macro-phenome QTLs (pQTLs) (Fig. 2a, Table 1, Supplementary Data 2). We collected data on pollen viability (male fertility-related) and flowering time (yield-related) during the 2021 trial and obtained data for four other yield-related traits (plant height, tuber yield, tuber number and tuber size) during the trials in 2021 and 2023, with three replicates each year (see Methods). The correlations between the five yield-related traits of the 1064 individuals revealed that tuber number, tuber size and plant height all contribute positively to tuber yield in both years, while tuber size and tuber number are negatively correlated, consistent with many previous observations (Fig. 2b and Supplementary Fig. 4). Among the 135 pQTLs, 68 are related to shoot RGB image-based above-ground traits, including shoot greenness-related traits, shoot morphology-related traits, etc. (Supplementary Methods 1 and 2). Of those, C01G_Bin340 controls seven morphology-related traits (Supplementary Data 2) and harbors an OVATE transcription factor (St_E4-63_C01G002870). The OVATE family proteins have been reported to regulate plant architecture44 and fruit morphology45. Because hyperspectral imaging is primarily used for prediction and has been successfully applied to predict water content, yield and metabolites in wheat46,47, the hyperspectral traits were excluded from genetic mapping. Correlation coefficients of tuber size and tuber reflectance reveal that tuber size is positively correlated with total near-infrared reflectance (R > 0.7, p-value < 2.2 × 10⁻¹⁶ for all wavelengths), but negatively correlated with average visible-light reflectance (R < −0.4, p-value < 2.2 × 10⁻¹⁶ for all wavelengths) (Fig. 2c). To further connect the tuber spectroscopic data with tuber-related traits, we modeled them using stepwise linear regression analysis.
We found that tuber reflectance could be a good indicator of tuber dry matter and tuber yield, whose direct measurement is labor-intensive and time-consuming. Through 10-fold cross-validation, we further identified the wavelengths that are most effective for predicting dry matter and yield (Supplementary Data 3 and 4). The best prediction accuracy (R²) is 0.81 for tuber dry matter and 0.62 for tuber yield (Supplementary Fig. 5), which indicates that these tuber reflectance traits can be used for non-destructive selection of potato lines with higher dry matter or yield. The same strategy was used to predict the content of tuber metabolites. We identified 162 metabolites that can be well predicted (R² > 0.5) using hyperspectral reflectance (Supplementary Data 3 and 4). Tuber structure, including shape-related traits (length, width and height) and size-related traits (surface area and volume), was detected by structured light imaging (Supplementary Methods 1 and 2). The QTLs of these traits are mainly enriched on chr09 (all five traits) and chr10 (tuber length, tuber width and tuber height), consistent with the results generated by manual investigation (Supplementary Data 2). These results indicate that the high-throughput optical imaging system can be applied for efficient phenotype detection of potato.
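
To make the cross-validation step concrete, the following minimal Python sketch estimates an out-of-fold R² for a reflectance-based predictor. It is a deliberate simplification: it fits a single-predictor least-squares line rather than the stepwise multi-wavelength regression used here, uses deterministic interleaved folds, and defines R² as 1 − SS_res/SS_tot, which is an assumption. All function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_r2(xs, ys, k=10):
    """R^2 of pooled out-of-fold predictions from k-fold cross-validation."""
    n = len(xs)
    predictions = [0.0] * n
    for fold in range(k):
        # Interleaved folds: sample i is held out in fold i % k.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        slope, intercept = fit_line(train_x, train_y)
        for i in test_idx:
            predictions[i] = slope * xs[i] + intercept
    mean_y = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical reflectance values and trait values (not data from this study).
reflectance = [float(i) for i in range(30)]
trait_exact = [2.0 * x + 1.0 for x in reflectance]
trait_noisy = [2.0 * x + 1.0 + (1.0 if i % 2 else -1.0)
               for i, x in enumerate(reflectance)]

r2_exact = cross_validated_r2(reflectance, trait_exact)
r2_noisy = cross_validated_r2(reflectance, trait_noisy)
```

Because every prediction is made for a sample left out of the fit, the resulting R² is a more conservative estimate of predictive accuracy than an in-sample fit.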

Fig. 2: Overview of QTLs at the whole genome and correlation of different traits.

a QTL distribution analyzed by sliding windows with 1-Mb window and 100-kb step sizes. The y-axis of i–iv indicates QTL number. i, pQTLs; ii, mQTLs; iii, local eQTLs; iv, distant eQTLs; v, the regulatory network of distant eQTL hotspots. b Correlation of tuber yield-related traits in 2021. c Correlation of tuber size and tuber reflectance. The first row concerns tuber size, indicated by the arrow. d Correlation of metabolites. Different kinds of metabolites can be separated by their levels of correlation (R). The x and y axes indicate different kinds of metabolites. In (b–d), red indicates positive and blue indicates negative correlations. Source data are provided as a Source Data file.

Table 1 Numbers of traits and QTLs

To explore genetic variants involved in gene expression, we conducted expression QTL (eQTL) mapping using normalized FPKM values. We identified 24,371 eQTLs associated with 14,835 genes, accounting for 77.4% of tuber-expressed genes (Fig. 2a, Table 1, Supplementary Data 2). About 43.0% of the genes (6735) are regulated by more than one eQTL. To better understand the regulatory mechanism of eQTLs, we further divided them into 7273 local eQTLs and 17,098 distant eQTLs, based on the distance between eQTLs and their corresponding genes. In our dataset, we identified nine distant eQTL hotspots (permutation test, p-value < 0.01, distant eQTL number > 223). These hotspots comprise 3009 correlations and can regulate the expression of 2876 genes (Fig. 2a).
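
The local/distant split and the hotspot count can be sketched in Python as follows. This is an illustration under stated assumptions: the 1-Mb distance cutoff is hypothetical (no cutoff is stated in this section), hotspots are counted per bin rather than derived from a permutation test, and only the threshold of more than 223 distant eQTLs comes from the text above. Function names and example inputs are invented for the sketch.

```python
from collections import Counter

# Hypothetical cutoff; the distance criterion is not given in this section.
LOCAL_WINDOW_BP = 1_000_000


def classify_eqtl(eqtl_chrom, eqtl_pos, gene_chrom, gene_pos,
                  window=LOCAL_WINDOW_BP):
    """Label an eQTL as 'local' or 'distant' relative to its target gene."""
    if eqtl_chrom == gene_chrom and abs(eqtl_pos - gene_pos) <= window:
        return "local"
    return "distant"


def hotspot_bins(distant_eqtl_bins, min_count=223):
    """Return bins carrying more than min_count distant eQTLs.

    distant_eqtl_bins holds one bin identifier per distant eQTL.
    """
    counts = Counter(distant_eqtl_bins)
    return sorted(b for b, c in counts.items() if c > min_count)


# Hypothetical examples.
same_region = classify_eqtl("chr01", 5_000_000, "chr01", 5_400_000)
other_chrom = classify_eqtl("chr01", 5_000_000, "chr07", 5_400_000)
far_away = classify_eqtl("chr01", 5_000_000, "chr01", 60_000_000)
hotspots = hotspot_bins(["bin_A"] * 224 + ["bin_B"] * 223)
```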

For the tuber metabolome, we found that different kinds of metabolites can be separated by their strength of correlation (Fig. 2d). The similar accumulation patterns of highly correlated metabolites suggest that they might be controlled by the same regulator or affected by a change in the abundance of upstream metabolites from a single pathway. We detected 1264 mQTLs for 538 metabolites (Fig. 2a, Table 1, Supplementary Data 2). Among these metabolites, 69.1% have more than one mQTL. Moreover, we found that some highly correlated metabolites mapped to the same genetic regions. For example, several steroid alkaloids (also called solanines in potato) are all regulated by C01G_Bin334–C01G_Bin338. Some flavonols, mainly glycosylated substances, colocalize in C01G_Bin210–C01G_Bin222 (Supplementary Data 2). This finding can help identify potential master regulators of these metabolites.

The systems genetics of the F2 population

Integrating the multi-omics data, we applied a systems-genetics approach to construct the genetic network of this population (Supplementary Fig. 6a). First, we conducted a weighted correlation network analysis (WGCNA) to identify genes with similar expression patterns, uncovering 21 modules (referred to as M1–M21) based on 19,166 tuber-expressed genes (Supplementary Fig. 6b). Correlation analysis between gene expression and tuber-related traits (yield and metabolites) identified 66,512 correlations (q-value < 0.001) with a median correlation coefficient of 0.35, associated with 12,824 genes, 376 metabolites and three yield-related traits (Supplementary Data 5).
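
Filtering tens of thousands of gene–trait correlations by q-value requires a multiple-testing adjustment. The Python sketch below shows the Benjamini–Hochberg procedure as one common way to obtain adjusted p-values; whether this or another q-value estimator was used here is not stated in this section, so it should be read as an assumption. The example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four gene-trait correlations.
raw_p = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in q_values]
```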

Next, according to the results of QTL mapping, we analyzed the bins regulating both gene expression and tuber-related traits (metabolite content and yield-related traits), defined as so-called triple relationships. A total of 3499 triple relationships (gene–bin–trait) emerged, including 2728 genes, 361 metabolites and three yield-related traits (Fig. 3a, Supplementary Data 6). Solanine content of potato tubers is a domesticated trait and plays an important role in tuber quality. As mentioned above, solanines have a common mapping region at C01G_Bin334 (Fig. 3b). Since the identified solanines are highly correlated, we further conducted QTL mapping using the dimensionality reduction method48, and the same locus was found on chr01 (Supplementary Fig. 7a). The WGCNA data show that Module 20 (M20) is highly correlated with various types of solanines (Supplementary Fig. 6b). Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed that genes in M20 are significantly enriched in the “steroid biosynthesis” term (Supplementary Fig. 7b), such as St_E4-63_C04G002433 (cycloartenol synthase), St_E4-63_C01G002910 (sterol 4α-methyl oxidase), and others. Based on the triple relationships, 19 genes are also regulated by the solanine-common bin C01G_Bin334, of which three are regulated in a locally acting manner: St_E4-63_C01G002818, St_E4-63_C01G002829 and St_E4-63_C01G002853, with only St_E4-63_C01G002818 belonging to M20. The correlation data of gene expression and metabolites revealed that only St_E4-63_C01G002818 shows significant correlation (q-value < 0.001) with all colocalized solanines (Fig. 3c). According to its gene annotation, St_E4-63_C01G002818 encodes an ethylene-responsive transcription factor known as GAME9, a master regulator of solanine49,50. We then checked the expression level of GAME9 and solanine content of the parents’ tubers. We found that GAME9 is more highly expressed in A6-26 than in E4-63 (p-value < 0.05) (Supplementary Fig. 7c).
Additionally, by comparing the promoter sequence (2000 bp upstream of the ATG) of GAME9 in the two parental lines, we identified an activation sequence-1 (as-1) element that can activate gene expression in plants51 and was disrupted by an 11-bp insertion in E4-63, which may lead to lower expression of GAME9 (Supplementary Fig. 7d). Consistent with this, the tubers of A6-26 contain markedly more solanines than those of E4-63 (p-value < 0.001) (Supplementary Fig. 7e), which further suggests that GAME9 is responsible for solanine accumulation in this population. Interestingly, GAME9 is located at a domestication-selection sweep in potato52, which is consistent with the domestication of solanines. We applied the same strategy for colocalized flavonols and identified two candidate genes (Supplementary Fig. 7f–h). These results demonstrate the high efficacy of this genetic network and bode well for its use in gene discovery.
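
The construction of gene–bin–trait triple relationships amounts to joining eQTL and trait-QTL results on their shared bin. The following Python sketch shows one minimal way to do this; the function name, the input layout and the trait and gene labels other than C01G_Bin334 and St_E4-63_C01G002818 are hypothetical, and the pipeline actually used may apply additional filters.

```python
from collections import defaultdict


def triple_relationships(eqtls, trait_qtls):
    """Join eQTLs and trait QTLs that map to the same bin.

    eqtls: iterable of (gene, bin) pairs.
    trait_qtls: iterable of (trait, bin) pairs.
    Returns a sorted list of (gene, bin, trait) triples.
    """
    genes_by_bin = defaultdict(set)
    for gene, bin_id in eqtls:
        genes_by_bin[bin_id].add(gene)
    triples = set()
    for trait, bin_id in trait_qtls:
        for gene in genes_by_bin.get(bin_id, ()):
            triples.add((gene, bin_id, trait))
    return sorted(triples)


# Hypothetical inputs; trait names and the second gene are placeholders.
eqtls = [
    ("St_E4-63_C01G002818", "C01G_Bin334"),
    ("gene_x", "C05G_Bin010"),
]
trait_qtls = [
    ("solanine_1", "C01G_Bin334"),
    ("solanine_2", "C01G_Bin334"),
    ("tuber_yield", "C09G_Bin001"),
]
triples = triple_relationships(eqtls, trait_qtls)
```

Candidate regulators can then be prioritized by intersecting these triples with module membership and gene–trait correlations, as done for GAME9 above.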

Fig. 3: The systems-genetics network.

a The triple relationships (gene–bin–trait) associated with 2728 genes, 245 bins, 361 metabolites and three yield-related traits. b QTLs of different kinds of solanines on chr01. The dotted line shows the LOD cutoff value (3.5). c The regulatory network of GAME9. C01G_Bin334 regulates the expression of GAME9 in a locally acting manner; GAME9 significantly correlates with solanines (purple triangles) in the WGCNA (q-value < 0.001). Source data are provided as a Source Data file.

Single-locus heterotic effects in the diploid potato hybrid

Heterosis, characterized by the better performance of hybrids compared to their parents, has been extensively applied in crop breeding. To better understand the mechanisms underlying heterosis in potato, we estimated the genetic effects of the identified QTLs. By comparing the heterozygous genotype with both homozygous genotypes, we used the degree of dominance (d/a) of each QTL to estimate single-locus heterotic effects.
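
The degree of dominance can be illustrated with a short Python sketch. It assumes the conventional definitions a = |P1 − P2|/2 (additive effect, half the difference between the two homozygous genotype means) and d = H − (P1 + P2)/2 (deviation of the heterozygote from the mid-parent value). The cutoffs of 0.25 and 1.25 correspond to the reference lines drawn in Fig. 4a, but mapping them onto the five categories below is an assumption of this sketch, as are the function names and example values.

```python
import math


def degree_of_dominance(hom_a, het, hom_b):
    """Return d/a from the mean trait values of the three genotypes."""
    mid_parent = (hom_a + hom_b) / 2.0
    a = abs(hom_a - hom_b) / 2.0
    d = het - mid_parent
    if a == 0.0:
        # Equal homozygotes: the ratio is undefined or unbounded.
        return 0.0 if d == 0.0 else math.copysign(math.inf, d)
    return d / a


def classify_effect(ratio, additive_cut=0.25, dominance_cut=1.25):
    """Assign a single-locus effect category from d/a (assumed cutoffs)."""
    if abs(ratio) <= additive_cut:
        return "additive"
    if ratio > 0:
        return "dominance" if ratio <= dominance_cut else "overdominance"
    return "recessive" if ratio >= -dominance_cut else "underdominance"


# Hypothetical genotype means (homozygote 1, heterozygote, homozygote 2).
examples = {
    "additive": (10.0, 15.0, 20.0),
    "dominance": (10.0, 20.0, 20.0),
    "overdominance": (10.0, 30.0, 20.0),
    "recessive": (10.0, 10.0, 20.0),
    "underdominance": (10.0, 0.0, 20.0),
}
labels = {name: classify_effect(degree_of_dominance(*vals))
          for name, vals in examples.items()}
```

Note that d/a becomes very large when the two homozygous genotypes have nearly equal means, so extreme values should be interpreted together with the genotype means themselves.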

Male fertility and tuber yield are two key traits for hybrid potato breeding. Through this analysis, we identified four pollen viability QTLs and 24 yield-related QTLs across the two years. Of these 28 QTLs, 21 exhibited consistent heterotic patterns across years (including traits measured in only one year). We found that (partial) dominance constitutes the main heterotic effect, followed by overdominance (Fig. 4a), similar to reports in rice39 and maize40. The dominance model of heterosis implies that the trait value of the heterozygous QTL is the same as that of the advantageous homozygous genotype. Thus, to further elucidate the contributors of advantageous alleles in F1 hybrids, we assessed the trait values of all dominant QTLs. The contributions of better yield and pollen viability from the two parents are almost equal (Fig. 4b). For instance, both A6-26 and E4-63 contribute to tuber yield of hybrids (Supplementary Fig. 8a). Five QTLs (Tuber Yield, TY; Tuber Size, TS) show dominance effects (TY2, TY4, TY5, TS3 and TS4). Among these dominant QTLs, the advantageous alleles of TY4 and TS4 were contributed by E4-63 (Supplementary Fig. 8b, c), and those of TY2, TY5 and TS3 were contributed by A6-26 (Supplementary Fig. 8d–f). The complementation of these dominant QTLs in F1 hybrids results in partial yield heterosis.

Fig. 4: Single-locus effects of heterosis in the diploid hybrid.

a The heterotic effects of yield and pollen viability QTLs. The y axis indicates d/a values; d/a values > 3 are displayed as 3. The red dotted line indicates ±1.25 and the dotted blue lines indicate ±0.25. Data from 2021 were used. b The contributor (parental) source of advantageous alleles in hybrid potato. c The yield of different genotypes of TY1 in 2021. A/A, the A6-26 homozygous genotype (A6-26/A6-26); A/E, the heterozygous genotype (A6-26/E4-63); E/E, the E4-63 homozygous genotype (E4-63/E4-63); n = 253, 498 and 163 for A/A, A/E and E/E, respectively. The upper and lower edges of the boxes denote 75% and 25% quartiles, and the central line indicates the median. Whiskers extend to the lower hinge –1.5× interquartile range and upper hinge +1.5× interquartile range of the data. P values are obtained by Student’s t tests (two-tailed). d The proportion of different genetic effects of mQTLs. The numbers in parentheses refer to QTL numbers and percentages. e Correlations (R) between dry matter and metabolites with p-values < 0.01. P values are computed with Pearson correlation tests. Source data are provided as a Source Data file.

Of the overdominant or pseudo-overdominant QTLs, TY1 (C01G_Bin337) showed the largest heterotic effect in both 2021 and 2023 (d/a = 14.35 [2021] and 19.01 [2023]), explaining 6.55% and 4.21% of the phenotypic variance for 2021 and 2023, respectively (Fig. 4a and Supplementary Fig. 9a). Trait value analysis of TY1 showed that the yield of TY1/ty1 was significantly higher than that of TY1/TY1 (E/E, p-value = 6.9 × 10⁻⁷) and ty1/ty1 (A/A, p-value = 1.6 × 10⁻¹²), whereas there was no significant difference between the two homozygous (parental) genotypes (p-value = 0.28) (Fig. 4c). The same heterotic pattern was found for TY1 in 2023 (Supplementary Fig. 9b). Interestingly, C01G_Bin337 is the common QTL of three yield-related traits (tuber yield, tuber size and plant height) (Supplementary Data 2). Consistent with the tuber yield, C01G_Bin337 (TS1 for tuber size) also showed an overdominant/pseudo-overdominant effect for tuber size in both years (Fig. 4a and Supplementary Fig. 9a). Also, there were no negative heterotic effects (recessive or underdominance) in yield heterosis. All yield QTLs added positive or additive contributions to yield performance in heterozygous genotypes, leading to yield heterosis in F1 hybrids. For a deeper understanding of yield heterosis, additional efforts are needed to resolve the identified QTLs to the single-gene level.

The expression patterns of genes also contribute to crop heterosis. We found that additive effects are the major genetic effects of gene expression. About 37.7% of eQTLs show additive effects (Supplementary Fig. 10a), indicating that the gene expression level controlled by these eQTLs in heterozygous genotypes is close to the mid-parent value, i.e., about half the combined level of the two homozygous genotypes. We then focused on (partial) dominance/overdominance and further identified 1878 genes with positive eQTLs (all eQTLs associated with the expression of a gene show dominance or overdominance effects) (Supplementary Data 7). The KEGG analysis found that genes with positive eQTLs are significantly enriched in some primary metabolic pathways, such as “citrate cycle” and “biosynthesis of amino acids” (Supplementary Fig. 10b), which might be associated with the energy metabolism process in tubers.

We found different patterns when estimating the heterotic effects of metabolites. In contrast to yield-related traits, the recessive model (51.5%) represents the principal heterotic effect for metabolites (Fig. 4d and Supplementary Fig. 10c). Dominance and overdominance account for only 12.3% and 3.9%, respectively (Fig. 4d). Although the primary metabolites are associated with more positive mQTLs than the secondary metabolites, the overall patterns of primary and secondary metabolites are consistent with the pattern for the entirety of metabolites (Supplementary Fig. 10d, e). This implies that most mQTLs tend to cause the heterozygous genotypes to contain lower levels of small-molecule metabolites than the mid-parent value. Our previous study demonstrated that most metabolites in F1 tubers show negative mid‐parent heterosis32, consistent with the findings in the F2 population. As conjectured for F1 hybrids, this suggests that the energy of heterozygous-genotype tubers is preferentially used to synthesize dry matter such as starch and proteins, which cannot be detected by the metabolome technology used in this study. Correlation analysis showed that 84.0% of metabolites are negatively correlated with dry matter (p-value < 0.01) (Fig. 4e). We then evaluated the d/a of the mQTLs using dry matter as the input trait. Of the recessive/underdominant mQTLs, only 19.0% also showed recessive or underdominant effects for dry matter, while 73.1% showed dominance/overdominance effects (Supplementary Fig. 10f), revealing opposite patterns for dry matter and metabolites (more dry matter and lower metabolite content in F1 tubers). These results further support our findings on metabolite heterosis.

A pectin methylesterase contributes to male fertility heterosis

Among the four pollen viability QTLs, the locus on chr07 with the highest LOD value (PV1) exhibited a dominance heterotic effect (Fig. 4a and Fig. 5a). The trait value analysis revealed that the beneficial allele comes from the A6-26 parent (Fig. 5b). Although the pollen viability of A6-26 was inferior to that of E4-63 (ref. 32), A6-26 produced more seeds than E4-63. To investigate whether PV1 is associated with this phenomenon, we integrated genomic and transcriptomic data to clone this major-effect gene governing pollen viability in this population.

Fig. 5: The PME gene contributes to male fertility heterosis in potato.

a The QTL mapping of pollen viability. The cutoff LOD value was set to 3.5. b The pollen viability of different genotypes of PV1. A/A, the A6-26 homozygous genotype (A6-26/A6-26); A/E, the heterozygous genotype (A6-26/E4-63); E/E, the E4-63 homozygous genotype (E4-63/E4-63); n = 185, 399 and 233 for A/A, A/E and E/E, respectively. The upper and lower edges of the boxes denote 75% and 25% quartiles, and the central line indicates the median. Whiskers extend to the lower hinge –1.5× interquartile range and upper hinge +1.5× interquartile range of the data. P values are obtained by Student’s t tests (two-tailed). c The sequence of knockout target and main mutant type of PMEKO plants. d The pollen grain stain of WT and PMEKO plants. Red means good viability. e Statistics of pollen viability of WT and PMEKO plants. n = 20 for WT, KO-1, KO-2 and KO-3; The upper and lower edges of the boxes denote 75% and 25% quartiles, and the central line indicates the median. Whiskers extend to the lower hinge –1.5× interquartile range and upper hinge +1.5× interquartile range of the data. P values are obtained by Student’s t tests (two-tailed). f Pollen germination of WT and PMEKO plants. g The fruit and seed pictures of WT and PMEKO plants. h Statistics of seed number of WT and PMEKO plants. n = 15, 8, 17 and 18 for WT, KO-1, KO-2 and KO-3, respectively. The upper and lower edges of the boxes denote 75% and 25% quartiles, and the central line indicates the median. Whiskers extend to the lower hinge –1.5× interquartile range and upper hinge +1.5× interquartile range of the data. P values are obtained by Student’s t tests (two-tailed). Source data are provided as a Source Data file.

Based on the reference genome of E4-63, 21 genes were identified within the 310-kb PV1 interval. We conducted anther transcriptome sequencing of E4-63 at four important developmental stages, encompassing the initial to complete maturity stages (Supplementary Fig. 11a). Among the 21 genes, 11 were expressed (FPKM > 1.5) in at least one stage (Supplementary Fig. 11b). To pinpoint the candidate gene, we checked their expression in other tissues23 and found that only St_E4-63_07G001402 and St_E4-63_07G001408 showed lower expression in other tissues but higher expression in anther (Supplementary Fig. 11b). Notably, these two genes displayed a rising-then-falling expression pattern in anther, peaking at stage 3 (with FPKMs of 6.32 and 23.72 for St_E4-63_07G001402 and St_E4-63_07G001408, respectively). Based on gene annotation, St_E4-63_07G001402 encodes a zinc/iron permease, while St_E4-63_07G001408 encodes a pectin methylesterase (PME) with a signal peptide and a predicted PME domain. To our knowledge, zinc/iron permease is not associated with pollen development, but PME is a widely distributed cell wall-related enzyme in plants, implicated in pollen wall synthesis53.
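
The first filtering step, retaining genes expressed above FPKM 1.5 in at least one anther stage, can be written as a short Python sketch. The function name, gene labels and FPKM values below are hypothetical placeholders and do not reproduce the expression data of this study.

```python
def expressed_genes(fpkm_by_gene, threshold=1.5):
    """Return genes whose FPKM exceeds the threshold in at least one stage.

    fpkm_by_gene maps a gene identifier to its FPKM values across stages.
    """
    return sorted(
        gene for gene, values in fpkm_by_gene.items()
        if any(value > threshold for value in values)
    )


# Hypothetical FPKM values across four anther stages.
anther_fpkm = {
    "gene_a": [0.2, 0.9, 6.0, 2.1],   # passes: stage 3 and 4 above 1.5
    "gene_b": [0.0, 0.3, 1.5, 0.4],   # fails: never strictly above 1.5
    "gene_c": [1.6, 0.1, 0.0, 0.0],   # passes: stage 1 above 1.5
}
candidates = expressed_genes(anther_fpkm)
```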

To verify the function of the PME gene in potato, we constructed a CRISPR/Cas9 system targeting the first exon to knock out the PME gene in the diploid potato clone 01-58. Although they carried different mutation types, both alleles of the PME gene were mutated in three transgenic T0 PME-knockout (PMEKO) plants (Fig. 5c). Pollen viability in PMEKO plants (34–45%) was significantly reduced compared to that in wild-type (WT) plants (~80%) (Fig. 5d, e). Scanning electron microscopy revealed that some pollen grains of the PMEKO plants exhibited aberrant morphology (Supplementary Fig. 11c, d). As pectin plays a vital role in pollen hydration, the lack of PME could impair pectin demethylation, affecting pollen tube germination54. In vitro germination assays revealed a reduced number of germinated pollen grains in PMEKO plants compared to WT plants (Fig. 5f). Consistent with this phenotype, when selfed, the PMEKO plants produced fewer seeds per fruit, with the average seed number in WT plants exceeding that of PMEKO plants by over threefold (Fig. 5g, h). To further confirm these results and rule out possible off-target effects, we conducted deep whole-genome sequencing (~60×) of 01-58 and three PMEKO plants. According to a reported method55, we identified 131 possible off-target sites (Supplementary Data 8) and detected no off-target mutations at any of these 131 sites in the three PMEKO plants. These results indicate that the phenotypes of PMEKO plants are caused by mutation of the PME gene. In summary, the PME gene with the dominance heterotic effect confers superior pollen viability and enhanced pollen tube germination, leading to male fertility heterosis in hybrids.

Discussion

Mining functional genes is a critical part of plant biology and crop breeding. Based on genetic manipulations of functionally important genes, researchers have achieved de novo domestication breeding of wild tomato56 and rice57. A quantitative genomics map of rice was generated by utilizing mapped QTLs/genes to guide breeding58. Unlike these well-researched crops, progress in mapping of functional QTLs/genes in potato has been hampered by the complexities of tetrasomic inheritance. Potato is going through a green revolution via efforts to transform it into a seed-propagated diploid crop23,59, promising to greatly simplify genetic analyses. In this study, we conducted a large-scale genetic analysis of an inbred line-based F2 population comprising 1064 individuals. The parents A6-26 and E4-63 were derived from different lineages of S. tuberosum, with many traits segregating in the F2 population. Thanks to advances in omics technologies, we can now identify plant phenotypes at different levels more quickly and accurately than with traditional manual surveys. In this study, we identified 20,382 traits and 25,770 QTLs using transcriptomics, metabolomics and phenomics (including RGB imaging, structured light imaging and hyperspectral imaging) technologies (Fig. 2a, Table 1, Supplementary Data 1, 2). Several phenotypes were mapped to previously reported loci, such as tuber shape29,42 and tuber flesh color41 (Supplementary Fig. 3), supporting the reliability of our database. This QTL database will provide useful genetic markers for molecular breeding and gene discovery in potato.

Systems genetics combines genetics with gene expression analysis to explain complex traits7. In this study, we integrated multi-omics analyses to explore potato traits using the F2 population. Considering that any changes in gene expression would lead to later changes in the accumulation and composition of metabolites in tubers at a final mature stage, we advanced the sampling time for the transcriptome analysis to better correlate gene expression with the final metabolite content (see Methods). This approach has been proven effective15,60. Combined with QTL results, we constructed a systems-genetics network in this F2 population. Based on this network, we efficiently identified the master regulator GAME9 controlling solanine accumulation. Although the systems-genetics network can help to narrow down the suite of candidate genes, the traits affected by variations in protein function require further study (fine mapping, etc.). Moreover, since the F2 population has experienced only one round of recombination, the average resolution of QTLs is ~190 kb in this study, making it difficult to directly identify candidate genes. Additionally, the high genomic and phenotypic diversity in this F2 population leads to many minor QTLs, hampering the identification of major QTLs. For example, we identified seven tuber yield QTLs in this study. TY1, with the highest LOD value, explains only 6.55% and 4.21% of the phenotypic variance in 2021 and 2023, while all seven QTLs together explain 25.4% and 24.4%, respectively. Thus, a near-isogenic line population based on backcrosses is necessary to further fine-map promising genes and reduce the genetic variance to elevate the genetic effect of targeted QTLs. The genome information of the parental lines enables us to track the source of advantageous alleles and select the more suitable parent for backcrossing.
Combined with genetic markers and whole-genome resequencing, this will turn complex quantitative traits into simple qualitative traits.

Heterosis has been extensively applied in hybrid crop breeding, particularly in maize and rice38,61. However, heterosis of clonally propagated crops has rarely been analyzed. Using the parental lines and their F1 hybrids, our previous study revealed the multi-omics basis of potato heterosis32. Here, we further explored the genetic basis of heterosis of the elite potato hybrid. The dominance, recessive and additive effects are the principal single-locus heterotic effects of yield-related traits, metabolites and gene expression, respectively (Fig. 4 and Supplementary Fig. 10). We found that hybrid tubers contain more dry matter and lower metabolite content, and that over 60% of mQTLs show negative heterotic effects (Fig. 4d). This phenomenon might be explained by a trade-off between dry matter and small-molecule metabolites. As the energy sink of potato, tubers are storage compartments for dry matter62, which serves as the nutrition provider for asexual reproduction. In tubers with heterozygous genotypes, the distribution of energy or resources seems more rational, and energy or resources might preferentially flow to fuel the synthesis of dry matter, leading to relatively lower levels of small-molecule metabolites in hybrids. Similar results were also found in maize63 and Arabidopsis64. Moreover, we identified a PME gene with a dominance heterotic effect involved in male fertility in potato. The advantageous allele comes from A6-26 and confers better pollen viability and more seeds for hybrids, potentially explaining the dominant effect.

However, we did not identify epistatic interactions between the 24 yield-related QTLs in either year. To further identify interactions at the genome-wide level (all markers involved), backcross populations can be developed to reduce the genetic diversity within the population, because the interactions in the F2 population are complex and hard to detect accurately65. This process can be carried out simultaneously with the construction of introgression lines. Therefore, we believe that genetic studies on this population should prioritize the construction of F2-derived populations. This approach not only focuses on cloning genes associated with agronomically important traits but also aims to utilize key genetic loci for the improvement of diploid inbred lines.

Overall, molecular breeding and functional genomics studies in diploid potato are still in their infancy. This study provides valuable genetic and phenotypic resources for the potato community. By integrating multi-omics data to construct a systems-genetics network and by dissecting heterosis, this work contributes to gene discovery and enhances our understanding of the genetic basis of heterosis in potato.

Methods

Plant materials

The parents and their F1 hybrid (A6-26 × E4-63) were developed in a previous study23. The homozygosity of the parents is 98.16% and 98.52% for A6-26 and E4-63, respectively. The F2 population was generated through selfing the F1 hybrid. The 1064 F2 individuals were randomly selected. Each F2 individual was planted on MS medium for permanent preservation by tissue culture.

Construction of the genotype map and genetic map

Genomic DNA of the 1064 lines was extracted from fresh leaves by the CTAB method. Whole-genome sequencing was conducted on the DNBSEQ platform at Annoroad Gene Technology company (Beijing, China). For each line, ~2 Gb clean data with 150-bp read length were generated. The clean reads were aligned against the E4-63 reference genome23 using BWA (0.7.5a-r405)66. GATK (v4.2.1.0) was used to mark duplicated reads67. We used SAMtools (v.1.9)68 and BCFtools (v.1.9)66 to extract SNPs. The SNPs between the two parents were identified based on 50× DNA resequencing data32. These SNPs were further filtered with base quality ≥ 40, mapping quality ≥ 30 and 20 ≤ depth ≤ 100, and only SNPs with the “1/1” genotype were retained. The filtered SNPs were then used to construct a parent SNP database. Only SNPs of F2 individuals present in this database were retained for subsequent analyses (using the BCFtools -R option). The genotype map was constructed by calculating the genotype of each bin21,69. A total of 2475 bins were identified in the genotype map of 1064 individuals. For genetic map construction, we fed the bins of the genotype map to QTL IciMapping (v4.2)70.
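The parental SNP filter described in this paragraph can be sketched as follows. This is a minimal illustration and not the pipeline actually used: the `ParentSnp` record, its field names and the helper functions are hypothetical, and only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentSnp:
    """Hypothetical record for one SNP called from parental resequencing data."""
    chrom: str
    pos: int
    base_quality: float
    mapping_quality: float
    depth: int
    genotype: str  # VCF-style genotype string, e.g. "1/1"


def passes_parent_filter(snp: ParentSnp) -> bool:
    """Apply the thresholds quoted in the text to a single parental SNP."""
    return (
        snp.base_quality >= 40
        and snp.mapping_quality >= 30
        and 20 <= snp.depth <= 100
        and snp.genotype == "1/1"
    )


def build_parent_snp_database(snps):
    """Return the set of (chrom, pos) sites that pass the parental filter."""
    return {(s.chrom, s.pos) for s in snps if passes_parent_filter(s)}


def restrict_to_database(f2_sites, database):
    """Keep only F2 SNP sites that are present in the parental database."""
    return [site for site in f2_sites if site in database]
```

In practice the restriction step would be done on VCF files (the text mentions the BCFtools -R option); the in-memory set here only makes the logic explicit.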

Segregation distortion analysis

In the F2 population, the expected segregation ratios of zygotes and gametes are 1:2:1 and 1:1, respectively. The χ2 test was applied to assess whether the observed segregation ratio deviated significantly from the expected segregation ratio, with a cutoff value of P = 0.001.
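As an illustration of this test, the sketch below computes the χ2 statistic for observed genotype (or allele) counts against an expected ratio and compares the p-value with the 0.001 cutoff. The function names and the example counts are hypothetical; the closed-form p-values are valid only for 1 and 2 degrees of freedom, which cover the 1:1 and 1:2:1 cases.

```python
import math


def chi_square_statistic(observed, expected_ratio):
    """Pearson chi-square statistic for observed counts versus an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    statistic = 0.0
    for count, ratio in zip(observed, expected_ratio):
        expected = total * ratio / ratio_sum
        statistic += (count - expected) ** 2 / expected
    return statistic


def chi_square_p_value(statistic, df):
    """Upper-tail p-value of the chi-square distribution for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def is_distorted(observed, expected_ratio, alpha=0.001):
    """True if the observed counts deviate from the expected ratio at P < alpha."""
    statistic = chi_square_statistic(observed, expected_ratio)
    df = len(observed) - 1
    return chi_square_p_value(statistic, df) < alpha


# Hypothetical counts for one marker in 1064 F2 individuals (AA, AB, BB).
print(is_distorted((400, 500, 164), (1, 2, 1)))
```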

Collection of phenome data

Individuals used for collecting different traits are listed in Supplementary Data 1. In 2021 and 2023, the 1064 individuals were cultivated with three replicates in Kunming, China. Plant height, tuber yield, tuber number, and tuber size were assessed in both years, while flowering time and pollen viability were investigated in 2021. Yield-related traits (plant height, flowering time, tuber yield, tuber size and tuber number) and the qualitative traits (Supplementary Fig. 3) were measured manually. The individuals used for BSA are listed in Supplementary Data 1. For BSA, 20–50 individuals, depending on the trait, were selected for each pool (Supplementary Data 1). Using our previously developed high-throughput phenotyping facility71, 38 aboveground traits were measured with an RGB camera from two photo angles (from the top and from the sides) at 60 days after transplanting (Supplementary Data 1). With the RGB imaging device, six side-view images and one top-view image of each plant were taken from a fixed horizontal projection of equal angles. Using near-infrared hyperspectral imaging (1000–1700 nm, Headwall Hyperspec Starter Kit-VNIR, USA) and visible-light hyperspectral imaging (400–1000 nm, Headwall VNIR A-Series, USA), the mature tubers were scanned, and 488 hyperspectral traits were analyzed using a hyperspectral image-analysis pipeline72,73 (Supplementary Method 1). Mature tubers were also scanned using structured light imaging (Reeyee Pro, WIIBOOX, China), and five tuber traits were analyzed using a structured light image-analysis pipeline74 (Supplementary Method 1). Potato image processing, image trait extraction and definition of the macro-phenomics traits are explained in Supplementary Methods 1 and 2. For each F2 individual, three replicates were phenotyped. The average values of all traits were used for further genetic analyses.

To determine the elements regulating gene expression, we conducted transcriptome sequencing of developing tubers of 204 F2 individuals (80 days after transplanting) that were randomly selected. To ensure the accuracy of subsequent analyses, genes with low expression level (FPKM < 1) in over 90% of the 204 samples were filtered out. Finally, 19,166 expressed genes remained after our filtering steps, accounting for 40.4% of all annotated genes.
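The expression filter can be sketched as below. This is an illustrative reading of the rule (a gene is dropped when more than 90% of its samples have FPKM < 1); the function name, the dictionary layout and the gene names are hypothetical.

```python
def filter_expressed_genes(fpkm_by_gene, min_fpkm=1.0, max_low_fraction=0.9):
    """Drop genes whose FPKM is below min_fpkm in more than max_low_fraction of samples.

    fpkm_by_gene maps a gene identifier to a list of FPKM values, one per sample.
    """
    kept = {}
    for gene, values in fpkm_by_gene.items():
        n_low = sum(1 for value in values if value < min_fpkm)
        if n_low / len(values) <= max_low_fraction:
            kept[gene] = values
    return kept


# Hypothetical toy matrix with 10 samples per gene.
example = {
    "gene_expressed": [5.0] * 10,
    "gene_partly_expressed": [0.2] * 8 + [3.0] * 2,
    "gene_silent": [0.0] * 10,
}
print(sorted(filter_expressed_genes(example)))
```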

We applied an ultra-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) approach to detect metabolite content in mature tubers of 215 F2 individuals (120 days after transplanting), of which 191 are shared with the transcriptome dataset; the other 24 individuals were selected randomly. In total, 679 metabolites were identified. The metabolome and transcriptome data of parents sampled at the same stage have been reported32.

Metabolite detection

For each sample, three mature tubers in 2021 were freeze-dried and mixed. The tuber extracts were analyzed using a UPLC-MS/MS system. The UPLC was equipped with an Agilent SB-C18 column (1.8 µm, 2.1 mm × 100 mm). The mobile phase included pure water with 0.1% formic acid (solvent A) and acetonitrile with 0.1% formic acid (solvent B). The column temperature was set to 40 °C and the injection volume was 4 μL. Metabolites were quantified by multiple reaction monitoring mode (MRM) of MS/MS with collision gas (nitrogen) set to medium. According to the metabolites eluted, a specific set of MRM transitions was monitored for each period. Analyst (v1.6.3) software (https://sciex.com/products/software/analyst-software) was used to assess and quantify the MS data.

Transcriptome profiling

Total RNA was extracted from three developing tubers (mixed together) collected at 80 days after transplanting in 2021 and sequenced on the DNBSEQ platform at the China National GeneBank (Shenzhen, China). The raw data were filtered to remove low‐quality reads and adapters using SOAPnuke (v1.5.6)75. Then, the ~4 Gb clean reads for each sample were mapped to the E4-63 reference genome using HISAT2 (v2.1.0)76. Gene expression levels were quantified as FPKM values calculated with StringTie (v2.1.1)77.

QTL mapping

QTL mapping was carried out by R/qtl using the composite interval-mapping method78. Bins with LOD values > 3.5 were selected. Consecutive QTLs (LOD > 3.5) were merged and considered as one QTL. Gene expression levels were normalized with quantile–quantile normalization.
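The merging of consecutive significant bins into a single QTL can be sketched as follows. The LOD profile would come from R/qtl; here it is a hypothetical list of (chromosome, bin index, LOD) tuples, and the function and key names are illustrative rather than taken from the original analysis.

```python
def merge_significant_bins(lod_profile, threshold=3.5):
    """Merge runs of consecutive bins with LOD > threshold into single QTLs.

    lod_profile is an iterable of (chrom, bin_index, lod) tuples ordered along
    each chromosome. Each returned QTL records its bin range and peak bin.
    """
    qtls = []
    current = None
    for chrom, bin_index, lod in lod_profile:
        if lod <= threshold:
            current = None  # a non-significant bin closes the current run
            continue
        adjacent = (
            current is not None
            and current["chrom"] == chrom
            and current["last_bin"] == bin_index - 1
        )
        if adjacent:
            current["last_bin"] = bin_index
            if lod > current["peak_lod"]:
                current["peak_lod"] = lod
                current["peak_bin"] = bin_index
        else:
            current = {
                "chrom": chrom,
                "first_bin": bin_index,
                "last_bin": bin_index,
                "peak_bin": bin_index,
                "peak_lod": lod,
            }
            qtls.append(current)
    return qtls


# Hypothetical LOD profile for one trait.
profile = [
    ("chr01", 0, 1.0), ("chr01", 1, 4.0), ("chr01", 2, 6.0), ("chr01", 3, 5.0),
    ("chr01", 4, 2.0), ("chr01", 5, 3.6), ("chr02", 0, 4.2), ("chr02", 1, 3.5),
]
for qtl in merge_significant_bins(profile):
    print(qtl)
```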

eQTL analysis

The eQTLs were divided into local and distant eQTLs based on their distance from the corresponding genes. To identify local eQTLs, we selected the flanking bins of the peak-containing bin as the eQTL borders. If the interval defined by the flanking bins lay within 100 kb of the regulated gene, the eQTL was defined as a local eQTL; otherwise, it was considered a distant eQTL. Distant eQTL hotspots were identified with 10,000 permutation tests. In each permutation, all distant eQTLs were randomly assigned to 1-Mb genomic intervals. The number of distant eQTLs in each interval was then counted to determine hotspots with a cutoff p-value < 0.01.
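The two steps of this analysis can be sketched as below. The classification follows the 100-kb rule in the text. For the hotspot test, the text does not specify how the permutation p-value was computed, so the sketch assumes one common choice: the null distribution is the maximum per-interval count in each permutation, and an observed count is compared against it. All function names and coordinates are hypothetical.

```python
import random


def interval_gap(start_a, end_a, start_b, end_b):
    """Distance in bp between two intervals; 0 if they overlap."""
    if end_a < start_b:
        return start_b - end_a
    if end_b < start_a:
        return start_a - end_b
    return 0


def classify_eqtl(eqtl, gene, max_distance=100_000):
    """Classify an eQTL as 'local' or 'distant'; both arguments are (chrom, start, end)."""
    same_chrom = eqtl[0] == gene[0]
    if same_chrom and interval_gap(eqtl[1], eqtl[2], gene[1], gene[2]) <= max_distance:
        return "local"
    return "distant"


def null_max_counts(n_eqtls, n_intervals, n_permutations=10_000, seed=0):
    """Randomly assign eQTLs to intervals and record the largest count per permutation."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_permutations):
        counts = [0] * n_intervals
        for _ in range(n_eqtls):
            counts[rng.randrange(n_intervals)] += 1
        maxima.append(max(counts))
    return maxima


def hotspot_p_value(observed_count, maxima):
    """Fraction of permutations whose maximum count reaches the observed count."""
    return sum(1 for m in maxima if m >= observed_count) / len(maxima)
```

Under this assumption, an interval would be called a hotspot when `hotspot_p_value` for its observed number of distant eQTLs is below 0.01.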

Correlation analysis

Pearson’s correlation coefficients and p-values were calculated with the ‘rcorr’ function of the R package ‘Hmisc’79. The R package ‘corrplot’ was used to visualize the results79.

Construction of the systems-genetics network

First, gene modules were identified using the R package WGCNA80. The soft-thresholding power β was set to 16, and 21 gene modules were identified. The online tool (https://cloud.metware.cn) was used to conduct GO and KEGG analyses. The “corPvalueStudent” function of WGCNA was used to analyze relationships between gene modules/genes and traits. The triple relationships were linked by QTLs. To ensure the reliability of results, only QTLs with LOD values > 5.0 were considered for triple-relationship analysis. We selected QTLs associated with both classes of traits (i.e. metabolites and yield-related traits, data from 2021) and gene expression, and we then merged QTLs within 30 kb. The triple relationships were visualized with Cytoscape software81. The WGCNA data coupled with the triple relationships comprise the systems-genetics network.
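One way to picture how QTLs link traits and genes into triple relationships is sketched below. It is an assumption-laden illustration and not the authors' procedure: QTLs passing the LOD > 5.0 cutoff are merged when they lie within 30 kb on the same chromosome, and a merged locus is kept only if it carries at least one trait QTL and at least one eQTL. The tuple layout, function name and example targets are hypothetical.

```python
def build_triple_relationships(qtls, min_lod=5.0, max_gap=30_000):
    """Merge nearby QTLs and keep loci shared by a trait and a gene.

    qtls is an iterable of (chrom, start, end, kind, target, lod) tuples,
    where kind is 'trait' (metabolite or yield-related trait) or 'gene' (eQTL).
    """
    kept = sorted(q for q in qtls if q[5] > min_lod)
    loci = []
    for chrom, start, end, kind, target, _lod in kept:
        last = loci[-1] if loci else None
        if last and last["chrom"] == chrom and start - last["end"] <= max_gap:
            last["end"] = max(last["end"], end)  # extend the merged locus
        else:
            last = {"chrom": chrom, "start": start, "end": end,
                    "trait": set(), "gene": set()}
            loci.append(last)
        last[kind].add(target)
    # A triple relationship needs a locus, a trait and a gene.
    return [locus for locus in loci if locus["trait"] and locus["gene"]]


# Hypothetical QTLs; names and coordinates are for illustration only.
example_qtls = [
    ("chr01", 100_000, 150_000, "trait", "solanine", 8.0),
    ("chr01", 170_000, 200_000, "gene", "geneA", 12.0),
    ("chr01", 900_000, 950_000, "gene", "geneB", 9.0),
    ("chr02", 100_000, 120_000, "trait", "tuber_yield", 4.0),
    ("chr02", 125_000, 130_000, "gene", "geneC", 7.0),
]
print(build_triple_relationships(example_qtls))
```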

Evaluation of single-locus effects

The degree of dominance for each QTL was estimated as the ratio of the dominant effect to the additive effect (d/a). The dominant effect (d) is the difference between the value of the heterozygous genotype and the mid-parent value. The additive effect (a) is half the difference between the two homozygous genotypes. The genetic model was further partitioned into five types using the following standards40: (1) overdominance: d/a ≥ 1.25; (2) dominance (including partial dominance): 0.25 ≤ d/a < 1.25; (3) additive effect: −0.25 ≤ d/a < 0.25; (4) recessive effect (including partial recessive effect): −1.25 < d/a < −0.25; (5) underdominance: d/a ≤ −1.25.
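The classification can be sketched from the genotype means at a QTL as below. The text does not state the sign convention for a; this sketch assumes a is taken as a positive magnitude (half the absolute difference between the homozygotes), so that the sign of d/a follows the sign of d. The function names and the example means are hypothetical.

```python
def dominance_ratio(mean_hom_1, mean_het, mean_hom_2):
    """Return d/a from the phenotype means of the three genotype classes."""
    mid_parent = (mean_hom_1 + mean_hom_2) / 2
    d = mean_het - mid_parent
    # Assumption: a is used as a positive magnitude.
    a = abs(mean_hom_1 - mean_hom_2) / 2
    if a == 0:
        raise ValueError("additive effect is zero; d/a is undefined")
    return d / a


def classify_genetic_model(ratio):
    """Map d/a to one of the five classes defined in the text."""
    if ratio >= 1.25:
        return "overdominance"
    if ratio >= 0.25:
        return "dominance"
    if ratio >= -0.25:
        return "additive"
    if ratio > -1.25:
        return "recessive"
    return "underdominance"


# Hypothetical genotype means: homozygotes at 10 and 20, heterozygote at 19.
print(classify_genetic_model(dominance_ratio(10.0, 19.0, 20.0)))
```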

Detection of pollen viability and pollen tube germination assay

The mature pollen grains were collected from freshly opened flowers. Then, the pollen grains were stained using 0.4% 2,3,5-triphenyltetrazolium chloride (Sangon Biotech) at 37 °C for 20 min. After staining, the pollen solution was dispensed onto a glass slide and the red pollen grains (good viability) were observed under the microscope.

Ca2+ and boric acid are necessary substances for the in vitro germination of potato pollen. In this study, we used a medium containing 17% sucrose, 0.05% CaCl2, 0.01% boric acid and 0.05% agar. The pollen grains were incubated in medium at 28 °C in the dark for 20 hours, and the pollen tube germination was observed.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.