Background

First reported in Japan in 1910 [1, 2], rice sheath blight (ShB) is one of the most devastating fungal diseases threatening rice production worldwide [3,4,5]. The causal agent of ShB is the soil-borne, necrotrophic fungus Rhizoctonia solani Kühn [teleomorph: Thanatephorus cucumeris (A.B. Frank) Donk], which belongs to the division Basidiomycota and subdivision Agaricomycotina. R. solani is a species complex that is classified into 13 anastomosis groups (AGs) based on the ability of genetically similar isolates to undergo hyphal fusion (anastomosis) [6,7,8]. Isolates in each AG are further categorized into intraspecific groups (ISGs) based on differences in host range, pathogenicity, cultural morphology, and biochemical characteristics [6, 8,9,10]. For instance, isolates of ISG IA belonging to AG1 (AG1-IA) can infect members of the Poaceae family, including rice, maize, and turfgrass [11,12,13], causing ShB, banded leaf sheath blight, and brown patch disease, respectively [3, 6, 14, 15]. Multiple studies have used genomic [13, 16,17,18,19,20,21,22,23], transcriptomic [16, 24], and proteomic approaches [25, 26] to examine the molecular basis of R. solani pathogenesis. A 36.94-Mbp draft genome sequence of R. solani AG1-IA isolated from infected rice in South China was assembled into 2648 scaffolds (N50 scaffold size, 474.5 kb), in which 6156 genes were annotated [16]. However, more high-quality genome sequences from multiple rice-infecting isolates are needed to robustly identify conserved genomic signatures of rice-infecting R. solani AG1-IA.

Pathogenic fungi secrete various proteins that promote successful infection by suppressing host defenses and/or manipulating the physiology of host cells [27, 28]. Accordingly, this suite of proteins determines both the lifestyle and the host range of these fungi [27]. A subset of these proteins consists of small (≤ 300 amino acids), cysteine-rich secreted proteins (SSPs) called effectors [29, 30]; their cysteine residues can form disulfide bridges that stabilize the tertiary structure, making the proteins more resistant to degradation [31,32,33]. Plant pathogenic fungi also secrete carbohydrate-active enzymes (CAZymes) that allow them to breach the plant cell wall and enter their hosts [34,35,36]. CAZymes are classified into the following modules: glycoside hydrolases (GHs), carbohydrate esterases (CEs), auxiliary activities (AAs), glycosyltransferases (GTs), polysaccharide lyases (PLs), and (non-catalytic) carbohydrate-binding modules (CBMs) [36, 37].

This study aimed to sequence the genomes of four R. solani AG1-IA isolates from ShB-infected rice and to compare them with each other and with four R. solani genomes belonging to other AGs (AG1-IB, AG2, AG3, and AG8). We hypothesized that rice-infecting R. solani would possess a large arsenal of cell wall-degrading genes to support its necrotrophic lifestyle and host range. To test this hypothesis, we assembled high-quality genome sequences for four rice-infecting R. solani strains isolated from rice grown in geographically diverse regions (USA, China, and India) and compared them with publicly available genomes of R. solani AG1-IA, AG1-IB, AG2-2IIIB, AG3, and AG8 [13, 16, 17, 19, 20]. We also included representative genomes from Basidiomycota (9 genomes) and Ascomycota (9 genomes) encompassing different nutritional lifestyles and hosts in our comparative analyses. Pairwise whole-genome alignments suggest that macrosynteny exists among the rice-infecting R. solani genomes. This close relationship is supported by a maximum likelihood phylogenetic tree and by a larger set of core orthogroups among the rice-infecting AG1-IA genomes (5812 orthogroups) than among R. solani genomes from diverse AGs (3635 orthogroups). Comparative genome analyses also revealed that rice-infecting R. solani have a smaller set of SSPs than biotrophs and other necrotrophs (cereal). Conversely, rice-infecting R. solani genomes code for the highest number of CAZymes, which are predicted to be involved in plant cell wall modification and degradation. Specifically, all R. solani genomes used in this study, regardless of AG, were highly enriched in pectin-degrading genes, containing even more such genes than the well-known pectin-degrading necrotrophic fungus Verticillium dahliae.
The high-quality genome sequencing data and comparative genomic results from this study are useful resources for functional analysis of pathogenicity genes in this important fungal pathogen of rice.

Results

High-quality genome sequences for rice-infecting R. solani AG1-IA isolates

Four rice-infecting R. solani isolates were collected from rice grown in the USA (B2), India (ADB, WGL), and China (YN-7) (Table 1). The B2 genome was sequenced de novo using Single-Molecule Real-Time (SMRT; Pacific Biosciences) sequencing; the WGL, ADB, and YN-7 genomes were sequenced using Illumina technology and subsequently assembled using the B2 genome as a reference (Additional file 1: Fig. S1). B2 had the largest genome of the four isolates (45.01 Mbp; 96 scaffolds), while YN-7 had the smallest (38.92 Mbp; 413 scaffolds). ADB (39.90 Mbp; 811 scaffolds) and WGL (39.98 Mbp; 724 scaffolds) were intermediate in size. Accordingly, B2 had the highest number of protein-coding genes (11,505), followed by WGL (10,044), ADB (10,010), and YN-7 (9715). Despite these differences in genome size and gene number, all four genomes had similar total GC contents (47.3 to 47.7%). Across the four new assemblies, scaffold numbers ranged from 96 to 811 and N50 values from 1.22 to 1.56 Mbp; on this basis, the newly sequenced R. solani AG1-IA B2 genome is of higher quality than the previously sequenced R. solani AG1-IA genome (2648 scaffolds; 36.94 Mbp) [16]. Thus, all comparative genomic analyses hereafter used the B2 genome as the representative of rice-infecting R. solani AG1-IA isolates.
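
The assemblies above are compared by scaffold number and N50. As a minimal sketch of the standard N50 definition (not the authors' assembly pipeline), the function below sorts scaffold lengths in descending order and returns the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of an assembly: the length L such that scaffolds of
    length >= L together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings the running total to >= 50%.
        if 2 * running >= total:
            return length
    return 0  # not reached: the running total always ends at the full size


# Hypothetical scaffold lengths (kb); not taken from Table 1.
print(n50([5, 4, 3, 2, 1]))  # 4, because 5 + 4 = 9 >= 15 / 2
```

Because N50 is weighted by length, a few very long scaffolds raise it even when many short scaffolds remain, which is why scaffold count is reported alongside it.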

Table 1 Genome statistics of the four rice-infecting Rhizoctonia solani AG1-IA (B2, ADB, WGL and YN-7), AG1-IB, AG2-2IIIB, AG3 and AG8 isolates

Phylogenetic proximity of R. solani isolates

To evaluate the phylogenetic relationships of all 27 fungal genomes used in this study (Additional file 2: Table S1), we constructed a maximum likelihood phylogenetic tree from single-copy orthogroups (Fig. 1a). The four rice-infecting R. solani isolates were most closely related to each other and to the previously sequenced Chinese R. solani isolate, followed by AG1-IB and then the remaining R. solani AGs used in this study. In addition, the rice-infecting R. solani genomes shared 5812 orthogroups of protein-coding genes, whereas the genomes of different AG-ISGs (AG1-IA B2, AG1-IB, AG2-2IIIB, AG3, and AG8) shared only 3635 orthogroups (Fig. 1b). Of these orthogroups, 25 to 164 were specific to individual rice-infecting R. solani AG1-IA genomes, while 318 to 3329 were specific to individual AG-ISGs.
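
The core and genome-specific orthogroup counts in Fig. 1b are set operations over per-genome orthogroup memberships. A minimal sketch, assuming each genome has already been reduced to a set of orthogroup identifiers (the orthology tool is not named here, and the function name and identifiers below are hypothetical):

```python
def core_and_specific(
    memberships: dict[str, set[str]],
) -> tuple[set[str], dict[str, set[str]]]:
    """Return (core, specific): orthogroups present in every genome, and,
    for each genome, the orthogroups found in no other genome."""
    core = set.intersection(*memberships.values())
    specific = {}
    for genome, groups in memberships.items():
        # Union of the orthogroups of all *other* genomes.
        others = set().union(
            *(g for name, g in memberships.items() if name != genome)
        )
        specific[genome] = groups - others
    return core, specific


# Hypothetical orthogroup IDs for three genomes.
genomes = {
    "B2": {"OG1", "OG2", "OG3", "OG5"},
    "ADB": {"OG1", "OG2", "OG4"},
    "YN-7": {"OG1", "OG2", "OG3"},
}
core, specific = core_and_specific(genomes)
print(sorted(core))            # ['OG1', 'OG2']
print(sorted(specific["B2"]))  # ['OG5']
```

The core set can only shrink as genomes are added, so core sizes are most comparable between comparisons that include similar numbers of genomes.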

Fig. 1

Evolutionary relationships among the genomes of rice-infecting R. solani AG1-IA and the selected fungal outgroups used in this study. a Maximum likelihood phylogenetic tree based on single-copy orthogroups, illustrating the evolutionary proximity of R. solani isolates relative to other members of the Basidiomycota and Ascomycota. b Venn diagrams of orthogroups shared among R. solani genomes within and between anastomosis groups

Genome synteny between R. solani isolates

To determine the synteny between R. solani genomes, we performed pairwise whole-genome alignments using PROmer [38], a script-based pipeline that aligns divergent sequences and identifies similar genomic regions based on translation in all six reading frames. Large syntenic region sizes (Fig. 2a) and diagonal dot plots (Additional file 3: Fig. S2) suggested that the five rice-infecting R. solani genomes (the four from this study and the previously sequenced Chinese isolate) share a high degree of genome conservation, ranging from 66 to 70.9% (Additional file 4: Table S2) when the B2 genome was used as the reference. In contrast, genome conservation between B2 and the genomes of other R. solani AGs (AG1-IB, AG2-2IIIB, AG3, and AG8) decreased dramatically to 3.3 to 9.6%, suggesting that the AG1-IA genome is highly divergent from those of other AGs.
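
Conservation percentages such as the 66 to 70.9% above can be expressed as the fraction of reference bases covered by at least one alignment. The sketch below assumes that definition (the exact formula behind Table S2 is not given here) and that alignment coordinates on the B2 reference have already been extracted, for example from the tabular output of MUMmer's `show-coords`; overlapping intervals are merged so that no base is counted twice. The function name is hypothetical.

```python
def covered_fraction(intervals: list[tuple[int, int]], genome_size: int) -> float:
    """Fraction of reference bases covered by at least one aligned interval.
    Intervals are 1-based and inclusive; reversed (end < start) intervals,
    if present, are normalized first."""
    spans = sorted((min(s, e), max(s, e)) for s, e in intervals)
    covered = 0
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / genome_size


# Hypothetical alignments on a 1-kb reference: (1-150) and (301-400) covered.
print(covered_fraction([(1, 100), (50, 150), (400, 301)], 1000))  # 0.25
```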

Fig. 2

Genome-level synteny between R. solani anastomosis groups and proteome-level conservation of genes within and between the selected comparison groups. a Circos plot depicting the syntenic region size of all R. solani genomes used in this study. The outer block represents the accumulated syntenic region size (Mbp) calculated by PROmer. Red and blue blocks and ribbons represent AG1 and the other anastomosis groups, respectively. b Comparison of protein-coding gene similarity in five comparison groups: intra-AG1-IA (rice-infecting R. solani AG1-IA), inter-AG (AG1-IA B2, AG1-IB, AG2, AG3, and AG8), Basidiomycetes (Piriformospora indica, Pleurotus ostreatus, Armillaria ostoyae, Heterobasidion irregulare, Dacryopinax sp.), Ustilago, and Trametes. Asterisks indicate significant differences in distribution according to the t-test (P > 0.05, not significant; ∗∗∗P ≤ 0.001)

Protein-coding gene conservation among R. solani isolates

To determine the degree of proteome similarity within and between AGs, we performed pairwise ortholog clustering of the predicted protein sequences. The number of shared orthologs among the R. solani AG1-IA isolates ranged from 8798 to 9723 (Additional file 5: Table S3), and the average proteome similarity within AG1-IA was 89.69%. In contrast, proteome similarity between AGs was more variable, with an average of 68.36% of predicted proteomes shared. To place these values in the context of the Basidiomycota, we added four Ustilago and four Trametes genomes to the five Agaricomycetes genomes used for the phylogenetic analyses (Additional file 6: Table S4). Inter-AG similarity of R. solani (compared with the B2 strain) was higher than the similarity between different Basidiomycota genera (54.53%). However, genomes within a single Basidiomycota genus showed higher protein-coding gene similarity than the R. solani inter-AG comparisons (Ustilago, 88.08%; Trametes, 83.41%) (Fig. 2b). When only single-copy orthologs were compared, the gap between intra- and inter-AG similarity widened further (intra-AG1-IA, 63.89%; inter-AG, 33.27%).
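
The method used for pairwise ortholog clustering is not specified above. As a hedged illustration, the sketch below uses reciprocal best hits (RBH), one common way to call one-to-one orthologs between two proteomes, and then expresses shared orthologs as a percentage of one proteome (an assumed definition). The score dictionaries stand in for parsed all-versus-all similarity search results (e.g. bit scores), and all names are hypothetical.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query protein to its highest-scoring subject.
    Ties keep the first subject encountered."""
    best: dict[str, tuple[float, str]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(
    a_vs_b: dict[tuple[str, str], float],
    b_vs_a: dict[tuple[str, str], float],
) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


def percent_shared(n_orthologs: int, n_genes: int) -> float:
    """Shared orthologs as a percentage of one proteome (assumed definition)."""
    return 100.0 * n_orthologs / n_genes


# Hypothetical similarity scores between proteomes A and B.
a_vs_b = {("a1", "b1"): 100.0, ("a1", "b2"): 50.0,
          ("a2", "b2"): 80.0, ("a3", "b2"): 90.0}
b_vs_a = {("b1", "a1"): 100.0, ("b2", "a3"): 90.0, ("b2", "a2"): 80.0}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(sorted(pairs))                            # [('a1', 'b1'), ('a3', 'b2')]
print(round(percent_shared(len(pairs), 3), 2))  # 66.67
```

Here a2 is excluded because its best hit, b2, prefers a3; this one-to-one constraint is what distinguishes RBH from simply counting proteins with any hit.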

Transposable element profiles of the rice-infecting R. solani isolates

Transposable elements (TEs), such as class I retrotransposons and class II DNA transposons, can create temporary or permanent genomic rearrangements and modifications [39], and their abundance and frequency can significantly influence the size of eukaryotic genomes [40]. To define the repetitive element profiles of the rice-infecting R. solani genomes, we analyzed the type and proportion of repetitive elements in each newly sequenced genome. The B2 genome contains the largest proportion of TEs (26.74%) of the four R. solani AG1-IA genomes, compared with ADB (8.89%), WGL (9.16%), and YN-7 (6.18%) (Additional file 7: Table S5). The total numbers of TEs in the ADB and WGL genomes are comparable (ADB, 10,421; WGL, 10,248), lower than in the B2 genome (17,123), and higher than in YN-7 (8030). Specifically, the B2 genome contains the highest proportions of DNA transposons, long terminal repeat (LTR) retrotransposons, and long interspersed nuclear elements (LINEs) among the R. solani AG1-IA genomes; the proportion of LTRs in B2 (20.14%) is more than three times that in ADB (5.47%), WGL (5.65%), and YN-7 (3.84%). However, all four newly sequenced AG1-IA genomes, as well as the previously sequenced AG1-IA genome, contain fewer TEs than the AG1-IB, AG2-2IIIB, AG3, and AG8 genomes. On average, repetitive sequences in the PacBio-sequenced B2 genome (797.4 bp) are approximately twice as long as those in the Illumina-sequenced ADB (370 bp), WGL (385.6 bp), and YN-7 (345.2 bp) genomes from this study and in the previously sequenced AG1-IA, AG1-IB, AG2-2IIIB, AG3, and AG8 genomes.
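
The TE proportions and mean lengths above can be derived from a repeat annotation table. A minimal sketch, assuming each annotated repeat is given as a (class, start, end) tuple on one assembly and that annotations do not overlap (otherwise intervals should be merged first); the function name and data are hypothetical:

```python
from collections import defaultdict


def te_summary(
    repeats: list[tuple[str, int, int]], genome_size: int
) -> dict[str, tuple[int, float, float]]:
    """Per TE class: (count, percent of genome, mean length in bp).
    Coordinates are 1-based and inclusive; overlaps are assumed absent."""
    lengths: dict[str, list[int]] = defaultdict(list)
    for te_class, start, end in repeats:
        lengths[te_class].append(abs(end - start) + 1)
    return {
        te_class: (len(ls), 100.0 * sum(ls) / genome_size, sum(ls) / len(ls))
        for te_class, ls in lengths.items()
    }


# Hypothetical annotations on a 10-kb sequence.
summary = te_summary(
    [("LTR", 1, 1000), ("LTR", 2001, 2500), ("DNA", 5001, 5200)], 10_000
)
print(summary["LTR"])  # (2, 15.0, 750.0)
print(summary["DNA"])  # (1, 2.0, 200.0)
```

Because a repeat that is fragmented in an assembly is annotated as several shorter pieces, mean length is sensitive to assembly contiguity; this is one plausible contributor to the longer average repeats in the long-read B2 assembly.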

Predicted secretome of rice-infecting R. solani isolates

The putative secretomes of the rice-infecting R. solani isolates were analyzed, and 818 to 888 genes encoding predicted secreted proteins were identified (Fig. 3a, Additional file 8: Table S6). Thus, rice-infecting R. solani isolates have a secretome of intermediate size: smaller than those of necrotrophic ascomycetes (cereal) but larger than those of the brown-rot fungi Postia placenta and Dacryopinax sp. and the biotrophs Ustilago maydis and Blumeria graminis. The number of SSPs (putative effectors) in the four genomes ranged from 263 to 279, accounting for 30–33% of each isolate's predicted secretome. We identified 367 R. solani-specific SSP orthogroups. Ortholog comparison of the putative SSP gene sets identified 12 (AG1-IB) to 105 (AG8) AG-specific SSPs among them, with AG8 having the most (Additional file 9: Table S7). We also observed that the R. solani AG1-IA genomes have relatively small numbers of predicted protein-coding genes and SSPs, placing them with the necrotrophic fungal groups (Fig. 3b).
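
The SSP counts above depend on how "small" and "cysteine-rich" are operationalized. A minimal sketch of such a filter, using the ≤ 300 amino acid cutoff from the Background and a hypothetical threshold of at least four cysteines; the secretion flag is assumed to come from upstream signal peptide and transmembrane predictions, which are not reimplemented here. Class, constant, and function names are hypothetical.

```python
from dataclasses import dataclass

MAX_LENGTH = 300    # SSP length cutoff (amino acids), as defined in the Background
MIN_CYSTEINES = 4   # hypothetical cut-off for "cysteine-rich"


@dataclass
class Protein:
    name: str
    sequence: str
    secreted: bool  # assumed: precomputed from signal peptide/TM predictions


def is_ssp(protein: Protein) -> bool:
    """True if the protein is secreted, short, and cysteine-rich."""
    seq = protein.sequence.upper().rstrip("*")  # drop a trailing stop symbol
    return (
        protein.secreted
        and len(seq) <= MAX_LENGTH
        and seq.count("C") >= MIN_CYSTEINES
    )


def ssp_share(proteins: list[Protein]) -> tuple[int, float]:
    """Number of SSPs and their fraction of the predicted secretome."""
    secretome = [p for p in proteins if p.secreted]
    ssps = [p for p in secretome if is_ssp(p)]
    fraction = len(ssps) / len(secretome) if secretome else 0.0
    return len(ssps), fraction


# Hypothetical toy proteome.
proteins = [
    Protein("p1", "M" + "C" * 4 + "A" * 100, secreted=True),  # SSP
    Protein("p2", "MC" + "A" * 400, secreted=True),           # too long
    Protein("p3", "MCCCCA", secreted=False),                  # not secreted
]
print(ssp_share(proteins))  # (1, 0.5)
```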

Fig. 3

Distribution of small secreted proteins in R. solani isolates and other fungal species. a The number of small secreted proteins in the total secretome of each fungal genome. Gray and red bars represent the size of the secretome and the number of small secreted proteins, respectively. b The number of SSPs in relation to the number of total protein-coding genes. Red, blue, and gray dots represent genomes belonging to intra-AGs, inter-AGs, and other fungal species, respectively. c Heatmap showing the conservation of the 272 B2 SSPs in the other R. solani genome sequences. Exonerate 2.4.0 was used to perform protein-to-genome sequence alignments of the SSPs

Furthermore, we aligned the protein sequences of the 272 SSPs identified in B2 to the genomes of R. solani AG1-IA, AG1-IB, AG2-2IIIB, AG3, and AG8. The B2 SSPs were highly conserved among the rice-infecting AG1-IA genomes but showed reduced homology to the genomes of other R. solani AGs (Fig. 3c, Additional file 10: Table S8).

Predicted CAZyme genes of the rice-infecting R. solani isolates

We identified genes encoding cell wall-degrading enzymes by searching for each CAZyme gene family across all 27 fungal genomes (Fig. 4). These fungal genomes were then categorized into 11 groups based on their nutritional lifestyle and host type. A chi-square test of proportions was used to determine whether differences in gene frequency between groups were significant (Additional file 11: Table S9 and Additional file 12: Table S10). Our analyses indicated significant variation across all 11 groups for all CAZyme gene families except GTs. Rice-infecting R. solani genomes had the highest enrichment of CAZyme genes (725 genes), while the genomes of other R. solani AGs and of necrotrophs (cereal and dicot) showed only moderate enrichment. In contrast, biotroph and brown-rot genomes contain relatively few CAZymes compared with rice-infecting R. solani genomes.
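
The exact layout of the chi-square test is not given above. As a sketch under that caveat, the code below assumes one contingency table per CAZyme family, with one row per lifestyle/host group and two columns (genes in the family, all other genes), and tests homogeneity of proportions across rows with a Pearson chi-square statistic. The p-value uses the regularized upper incomplete gamma function (series and continued-fraction forms, as in standard numerical references), so only the standard library is needed. Function names and counts are hypothetical.

```python
import math

EPS = 1e-15        # relative convergence tolerance
FPMIN = 1e-300     # floor that keeps Lentz's method away from division by zero
MAX_ITER = 10_000


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square distribution with df
    degrees of freedom, i.e. the regularized gamma function Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower function P(a, z); Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(MAX_ITER):
            ap += 1.0
            term *= z / ap
            total += term
            if abs(term) < abs(total) * EPS:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, z), evaluated with the modified Lentz method.
    b = z + 1.0 - a
    c = 1.0 / FPMIN
    d = 1.0 / b
    h = d
    for i in range(1, MAX_ITER):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < FPMIN:
            d = FPMIN
        c = b + an / c
        if abs(c) < FPMIN:
            c = FPMIN
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < EPS:
            break
    return math.exp(log_prefactor) * h


def chi2_homogeneity(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson chi-square test that all rows share the same column proportions.
    Each row is one group: [genes in the family, all other genes].
    Assumes every expected count is > 0 (no all-zero row or column)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts for one CAZyme family in two groups.
stat, df, p = chi2_homogeneity([[10, 90], [30, 70]])
print(stat, df)  # 12.5 1
```

In practice, `scipy.stats.chi2_contingency` returns the same statistic, p-value, and degrees of freedom, except that it applies Yates' continuity correction to tables with one degree of freedom unless `correction=False` is passed.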

Fig. 4

Distribution of gene families in rice-infecting R. solani AG1-IA isolates and the fungal outgroups used in this study. Phylogenetic tree annotated with contracted and expanded gene families. Abundance of genes in carbohydrate-binding module (CBM), glycoside hydrolase (GH), carbohydrate esterase (CE), glycosyltransferase (GT), polysaccharide lyase (PL), and auxiliary activity (AA) families. Expansion and contraction of enriched pectin lyase and pectate lyase (PL/PNL: PL1–1 (EC 4.2.2.2), PL1–2 (EC 4.2.2.10), PL3–1 (EC 4.2.2.2), PL4, PL9–1 (EC 4.2.2.2)), polygalacturonase (PG: GH28–1 (EC 3.2.1.15), GH28–2 (EC 3.2.1.67)), pectin methylesterase (PME: CE8), pectin acetylesterase (PAE: CE12–1 (EC 3.1.-)), and other GH (GH105–1 (EC 3.2.1.172), GH88–1 (EC 3.2.1.-), GH78–1 (EC 3.2.1.40)) families are indicated for all 27 fungal genomes used in this study. Red circles indicate ECs gained in the monophyletic R. solani lineage

Lignocellulose-degrading genes in rice-infecting R. solani isolates

To ascertain whether rice-infecting R. solani isolates can degrade lignocellulose in a similar fashion to fungi in the subdivision Agaricomycotina, we specifically searched for genes encoding AA families (Additional file 13: Table S11). Genomes of white-rot fungi and other R. solani AGs had the highest total number of AA genes (131 genes), followed by necrotrophs (cereal) (128 genes) and rice-infecting R. solani (121 genes). In contrast, brown-rot and biotroph genomes completely lacked these lignin depolymerization genes. We also examined the genomes for genes belonging to individual AA subfamilies. Genes of the AA1 subfamily (laccases; EC 1.10.3.2) were most abundant in white-rot fungi but were also found in rice-infecting R. solani isolates, hemibiotrophs, and necrotrophs (cereal); however, AA1 genes were significantly less abundant in brown-rot fungi and biotrophs. In addition, while brown-rot and rice-infecting R. solani genomes did not contain any members of the AA2 subfamily, these genes were present in the genomes of necrotrophs (cereal) and white-rot fungi; the latter were highly enriched in manganese peroxidase (MnP; EC 1.11.1.13) and versatile peroxidase (VP; EC 1.11.1.16) genes. We also observed enrichment of AA8 subfamily genes in the genomes of rice-infecting R. solani isolates, whereas white-rot genomes had only 1 or 2 AA8 genes and cereal necrotroph genomes had none (except for Septoria nodorum, which had 1). Finally, the AA5 subfamily, which was absent in brown-rot fungi, was significantly enriched in other R. solani AGs and, to a lesser extent, in rice-infecting R. solani and white-rot fungi.

We assessed the abundance of cellulose-degrading genes across all genomes in a similar fashion (Additional file 13: Table S11). Endoglucanase genes (GH3) were equally abundant in rice-infecting R. solani isolates and necrotrophs (cereal). However, genes encoding exo-1,3-β-glucanase (EC 3.2.1.58, GH5), endo-1,6-β-D-glucanase (EC 3.2.1.75, GH5), and cellulose 1,4-β-cellobiosidase (EC 3.2.1.176, GH7) were enriched in rice-infecting R. solani isolates but absent in necrotrophic fungi (cereal). Strikingly, rice-infecting R. solani genomes were highly enriched in genes encoding starch-degrading α-amylase (EC 3.2.1.1, GH13). Moreover, genes encoding endo-1,4-β-D-glucanohydrolase (EC 3.2.1.4, GH5), which catalyzes the endohydrolysis of (1 → 4)-β-D-glucosidic linkages in cellulose, lichenin, and cereal β-D-glucans, were enriched in rice-infecting R. solani and S. nodorum. Finally, genes encoding lytic polysaccharide monooxygenases (LPMOs) of the AA9 subfamily were significantly enriched in other R. solani AGs, although they were also present in rice-infecting R. solani. In contrast, biotrophs and brown-rot fungi had at most one AA9 gene.

Pectin-degrading and modifying genes in rice-infecting R. solani

Extensive enrichment of PL family genes was observed in both rice-infecting R. solani (82 genes) and other R. solani AGs (76 genes) (Additional file 13: Table S11). Among the other genomes, only the necrotrophic (dicot) V. dahliae was predicted to have an enriched number of PL genes, although it had only approximately half as many as R. solani. In-depth analyses of the PL subfamilies revealed that genes encoding pectate lyases (EC 4.2.2.2; PL1, 3, 9), pectin lyases (EC 4.2.2.10; PL1), and rhamnogalacturonan endolyases (EC 4.2.2.-; PL4) were significantly enriched in both rice-infecting R. solani and other R. solani AGs.

GH genes encoding pectin-degrading enzymes, such as unsaturated rhamnogalacturonyl hydrolase (EC 3.2.1.172, GH105), unsaturated β-glucuronyl hydrolase (EC 3.2.1.-, GH88), and α-L-rhamnosidase (EC 3.2.1.40, GH78), were analyzed in a similar fashion. GH28 genes were more enriched in rice-infecting R. solani than in necrotrophs (cereal) and V. dahliae. Specifically, 13 to 14 polygalacturonase (PG; EC 3.2.1.15, GH28) genes were found in the AG1-IA genomes and other R. solani AG genomes, whereas V. dahliae had only 2 PG genes. Exo-PGs (EC 3.2.1.67, GH28) were similarly abundant and enriched in R. solani and V. dahliae. Meanwhile, CE genes encoding pectin-modifying enzymes, including pectin methylesterases (PMEs; CE8) and pectin acetylesterases (CE12; EC 3.1.1.-), were most abundant in the R. solani genomes. All of the enriched homogalacturonan-modification gene families (PAE, PME, PG, and PL/PNL) expanded extensively at the divergence of R. solani (Additional file 14: Fig. S3).
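
The homogalacturonan (HG)-modification classes discussed above each group one or more CAZy families. A minimal tallying sketch, assuming one family label per gene (e.g. from a CAZyme annotation table) and using the family-to-class assignments given in this and the preceding paragraph; PL4, which the text assigns to rhamnogalacturonan endolyases, is deliberately left out. Subfamily suffixes written as `_n` (for example `PL1_7`) are an assumed naming convention, and all names and data are hypothetical.

```python
from collections import Counter

# Family -> HG-modification class, following the assignments in the text.
HG_CLASSES = {
    "CE8": "PME",      # pectin methylesterases
    "CE12": "PAE",     # pectin acetylesterases
    "GH28": "PG",      # endo- and exo-polygalacturonases
    "PL1": "PL/PNL",   # pectate and pectin lyases
    "PL3": "PL/PNL",   # pectate lyases
    "PL9": "PL/PNL",   # pectate lyases
}


def tally_hg_genes(annotations: list[tuple[str, str]]) -> Counter:
    """Count genes per HG-modification class from (gene_id, family) pairs."""
    counts: Counter = Counter()
    for _gene_id, family in annotations:
        base = family.split("_")[0]  # assumed subfamily syntax: "PL1_7" -> "PL1"
        if base in HG_CLASSES:
            counts[HG_CLASSES[base]] += 1
    return counts


# Hypothetical annotations for one genome.
genes = [("g1", "GH28"), ("g2", "PL1_7"), ("g3", "PL3"),
         ("g4", "CE8"), ("g5", "GH5"), ("g6", "PL4")]
print(dict(tally_hg_genes(genes)))  # {'PG': 1, 'PL/PNL': 2, 'PME': 1}
```

Running the same tally per genome and then per lifestyle group yields the kind of counts that a per-family chi-square comparison would take as input.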

Genes for monocot-specific cell wall degrading enzymes

To determine whether rice-infecting R. solani isolates have CAZyme genes that may allow them to infect their monocot host, we searched for CAZyme genes that degrade arabinoxylans, ferulic acids, and mixed-linkage glucans (MLGs), namely α-L-arabinofuranosidases (EC 3.2.1.55), feruloyl esterases (EC 3.1.1.73), and (1,3;1,4)-β-D-glucan endohydrolases/licheninases (EC 3.2.1.73), respectively (Additional file 15: Table S12). α-L-arabinofuranosidase genes were enriched in necrotrophs, including R. solani AG1-IA and other R. solani AGs, but not in white- and brown-rot fungi. However, although feruloyl esterase genes were enriched in necrotrophs (cereal), symbionts, and hemibiotrophs, none were identified in rice-infecting R. solani. Although the raw counts suggested an enrichment of genes encoding (1,3;1,4)-β-D-glucan endohydrolases/licheninases in rice-infecting R. solani genomes, the chi-square test did not reject the null hypothesis of equal proportions; thus, we found no significant evidence that the proportion of these genes differs across the fungal genomes.

Prediction of secondary metabolite biosynthesis gene clusters

antiSMASH [41] was used to identify putative secondary metabolite biosynthesis gene clusters for polyketide synthases (PKSs), terpene synthases (TSs), non-ribosomal peptide synthetases (NRPSs), and other accessory enzymes in rice-infecting R. solani isolates (Additional file 15: Table S12). None of the secondary metabolite biosynthesis gene clusters predicted for rice-infecting R. solani isolates or members of related AGs contained PKS genes. In contrast, secondary metabolite-producing enzymes, including type 1 PKS, type 2 PKS, NRPS, and TS, were abundant in necrotrophic ascomycetes (cereal). Putative secondary metabolite biosynthesis gene clusters in R. solani genomes were thus low in abundance and diversity, although R. solani AGs and necrotrophic ascomycetes (cereal) had similar numbers of terpene clusters; this suggests that secondary metabolites may not be important for R. solani virulence.

Discussion

De novo and reference-based genome assemblies of rice-infecting R. solani isolates

Both SMRT and Illumina sequencing technologies were used for the de novo sequencing of the B2 genome, which allowed us to exploit the ability of SMRT sequencing to generate long reads (5 to 20 kb) that precisely capture genomic regions containing repetitive elements and novel gene isoforms [42]. These sequencing approaches facilitated assembly of the 45-Mbp B2 genome, which is much larger than the previously published 36.94-Mbp R. solani AG1-IA genome and has a relatively high proportion of repetitive sequences. They also provided an opportunity to accurately annotate the TE content of the B2 genome. Compared with both newly and previously sequenced R. solani AG1-IA genomes as well as genomes of other AGs, the B2 genome possesses the highest proportion of TEs and the longest average TE length. Thus, the higher-quality B2 genome generated using the two sequencing methods allowed us to more accurately annotate the R. solani AG1-IA genomes for detailed comparative genome analyses of this important fungal pathogen.

Genomic differences among R. solani isolates

In previous studies, R. solani isolates have been classified based on their ability to undergo hyphal fusion or through sequence analysis of phylogenetic markers [9, 43, 44], but whole-genome comparisons were not available. Here, we compared five R. solani AG1-IA genomes and four representative genomes of neighboring AGs and found that genomic similarity decreased drastically from comparisons among AG1-IA genomes to comparisons among genomes of different AGs. Moreover, the similarity between AGs was lower than that within other Basidiomycota genera. This phylogenomic result suggests that the genome content of the R. solani species complex has features of multiple distinct species.

Small set of predicted putative effectors in rice-infecting R. solani isolates

Necrotrophic fungi have been shown to have fewer effectors than biotrophs [45]. Consistent with this, the newly sequenced rice-infecting R. solani AG1-IA genomes had a relatively small set of effectors among the fungal genomes analyzed in this study. Previous reports suggest that the broad host range of R. solani depends not on the size of its secretome [46] but rather on the secretion of specific effectors that enable infection of a variety of hosts, as is the case for the necrotroph Sclerotinia sclerotiorum [47]. Furthermore, Colletotrichum pathogens from different clades have been reported to possess tailored suites of CAZymes specific to their individual host ranges and infection lifestyles [48]. We speculate that the relatively low number of SSPs (putative effectors) may be compensated by the diverse and large arsenal of CAZymes in rice-infecting R. solani, making them competitive pathogens with a broad host range. Additional bioinformatic and functional genomics analyses are needed to dissect the role of SSPs in rice-infecting R. solani. Nevertheless, the high homology of SSPs among the rice-infecting R. solani genomes, together with their reduced homology in the genomes of other R. solani AGs, suggests that the SSPs of the R. solani species complex as a whole may be more diverse than previously expected.

Lignocellulose-degrading CAZyme genes in rice-infecting R. solani isolates

Essential for cell growth and differentiation, plant cell walls are composed primarily of cellulose, hemicellulose, pectin, and lignin [49,50,51,52,53], which also provide plants with resistance to and protection from biotic and abiotic stresses [54,55,56]. Plant biomass-degrading fungi of the subdivision Agaricomycotina, which inhabit diverse ecological niches such as forest litter, trees, crops, and grasses [34], are classified as either white- or brown-rot fungi [57] based on their ability to degrade lignin. White-rot fungi use oxidative enzymes [58, 59] such as glyoxal oxidases to efficiently depolymerize lignin, and class II heme peroxidases of the AA2 subfamily, such as MnPs and VPs, to degrade the lignin matrix and expose the embedded cellulose [60, 61]. Although we observed enrichment of glyoxal oxidase-encoding genes in rice-infecting R. solani genomes, we could not identify any peroxidase genes in the rice-infecting R. solani or brown-rot genomes. This apparent lack of peroxidases may explain why brown-rot fungi rapidly degrade cellulose but leave behind a chemically modified lignin matrix for long-term degradation by other microbes [62,63,64].

Although a relationship between modes of wood decomposition and CAZyme families exists, some wood-decaying fungi cannot be strictly categorized as white- or brown-rot, suggesting that a continuum may exist between these two types of wood-decay fungi [57]. Rice-infecting R. solani isolates were enriched in lignocellulose-degrading CAZyme genes as well as genes encoding potent cellulose-degrading enzymes, which are hallmarks of white- and brown-rot fungi, respectively. Hence, we hypothesize that R. solani may fall along the continuum between these two types of wood-decay fungi. Furthermore, the abundance of oxidoreductase and iron reductase genes in the R. solani genomes may indicate that these enzymes are involved in producing hydroxyl radicals that drive lignocellulose attack, working together with crystalline cellulose-degrading LPMOs to open up the complex lignocellulose structure for further degradation by other CAZymes.

Enrichment of pectin-degrading enzymes in rice-infecting R. solani isolates

Pectin is a structural heteropolysaccharide with a 1,4-α-D-galacturonic acid (GalA) backbone that contributes to the mechanical strength of plants [65,66,67] by forming a gel-like matrix that interacts with cellulose and hemicellulose in the primary plant cell wall [51, 68]. It is present in higher proportions in dicot (type I) cell walls than in monocot (type II) cell walls [50], and some reports suggest that dicot-specific fungal pathogens have more pectin-degrading enzymes than monocot-specific pathogens [69, 70]. Pectin is classified based on the degree of methoxylation, methylesterification, and/or acetylation of its backbone [71, 72]. Homogalacturonan (HG), a polymer of 1,4-linked α-D-galactopyranosyluronic acid, exists in methylesterified or acetylated form in the primary cell walls of plants [65], and pectin methylesterases (PMEs; EC 3.1.1.11) and pectin acetylesterases (PAEs; EC 3.1.1.6) catalyze the demethylesterification and deacetylation of HG, respectively [73,74,75]. This process yields substrates for PGs [76], pectin lyases (PNLs; EC 4.2.2.10), and pectate lyases (PLs; EC 4.2.2.2), which loosen the cell wall [77]. Demethylesterification of HG can also lead to ‘egg-box’ formation and cell wall stiffening through the interaction of negatively charged demethylesterified HG with divalent cations such as calcium ions [77]; this may explain the stiff, hollow-textured stem phenotypes observed in ShB-susceptible rice cultivars but not in moderately ShB-resistant cultivars (Lee et al., unpublished data). The significant enrichment of PMEs, PAEs, PGs, PNLs, and PLs in R. solani genomes suggests that these pathogens may have evolved a diverse suite of pectin depolymerization enzymes that allow them to efficiently breach host cell walls. The enzymes encoded by these expanded homogalacturonan-modification genes are known to loosen cell walls, a role relevant to the infection process.
Consistent with the role of HG demethylesterification in wall stiffening, mutations in Arabidopsis thaliana pectin methylesterase 35 (PME35) suppress HG demethylesterification and concomitantly increase the stem deformation rate, supporting the essential role of pectin in maintaining cell wall integrity and the mechanical properties of the plant [65].

Gene family expansions and contractions are signatures of an organism's adaptation to new ecological niches [78], as exemplified by the numerous pectin-degrading genes in rice-infecting R. solani and neighboring R. solani AGs. However, dicot-specific fungal pathogens may not necessarily have more specialized pectin-depolymerizing enzyme suites than monocot-specific pathogens, as genes encoding HG-modifying enzymes are more highly enriched in rice-infecting R. solani isolates than in the dicot-specific pathogen V. dahliae. This enrichment in pectin-degrading genes suggests that R. solani can degrade a wide range of pectic substrates and polysaccharide linkages, allowing it to use multiple virulence mechanisms to invade a variety of hosts. Moreover, the large and diverse suite of pectin-degrading enzymes in rice-infecting R. solani isolates may have evolved not in response to the amount of pectin in host plant cell walls but rather as an efficient mechanism for loosening plant cell walls, breaking crosslinks with other cell wall components, and dissolving plant tissue.

Monocot-specific degrading genes in rice-infecting R. solani isolates

Previous studies have shown a correlation between cell wall composition and host specificity, suggesting that plant pathogens produce host-specific cell wall degrading enzymes [79, 80]. Accordingly, monocot-specific fungal pathogens are adept at hydrolyzing monocot cell walls, whereas dicot-specific fungal pathogens are better at degrading dicot cell walls [81]. Non-cellulosic polysaccharides such as arabinoxylans and MLGs, as well as hydroxycinnamates such as ferulic acid, are enriched in grass cell walls but limited or absent in dicot cell walls [50, 82,83,84]; these components are degraded by α-L-arabinofuranosidases (EC 3.2.1.55, GH51) [81, 85], licheninases [86], and feruloyl esterases (EC 3.1.1.73) [34], respectively. However, we did not observe significant differences in the number of monocot-specific cell wall degrading CAZyme genes among the rice-infecting R. solani genomes, which may reflect the broad host range of this necrotrophic fungus [13].

Secondary metabolite biosynthesis clusters in R. solani isolates

Previous studies have associated the loss of secondary metabolite genes with biotrophy [87, 88]. However, most of the secondary metabolite biosynthesis pathways in the biotrophic fungus Cladosporium fulvum were found to be cryptic [89]. Although our results suggest that R. solani possesses a limited number of secondary metabolite biosynthesis clusters, expression analyses of the putative secondary metabolite genes identified here, combined with metabolite extraction and chromatography, will be needed. These analyses should provide more conclusive evidence about the extent to which secondary metabolite genes are involved in the lifestyle and pathogenicity of R. solani.

Conclusion

In this study, we analyzed the cell wall degrading enzyme profiles of four newly sequenced rice-infecting R. solani genomes. Comparative analyses of these genomes can help identify the cell wall degrading mechanisms, such as homogalacturonan modification, used by this necrotrophic ShB pathogen. As more R. solani genomes are sequenced, the reclassification of this fungal pathogen should be discussed and implemented. Moreover, our findings, along with the high-quality genome sequences of rice-infecting R. solani AG1-IA isolates, provide additional genomic resources that can further our understanding of the pathobiology of this necrotrophic fungal pathogen.

Methods

Sources of R. solani AG1-IA isolates

Four R. solani AG1-IA isolates were collected from rice cultivars grown in the USA, India, and China with the farmers’ permission. B2 was recovered from Jerry Bogard Farms, Stuttgart, Arkansas, USA. ADB and WGL were recovered from Srinivasa Rao Farms in Adilabad and Bose Reddy Farms in Warangal, respectively, both in Telangana State, India. YN-7 was recovered from Zongliang Chen Farms in Yangzhou, China.

Fungal DNA extraction of R. solani isolates

Hyphal tip isolation and culture maintenance of R. solani isolates were conducted on Potato Dextrose Agar (PDA). Agar blocks containing R. solani hyphae were excised from the actively growing portion of the mycelium and cultured in the dark in liquid Potato Dextrose Broth (PDB) at 25 °C on an orbital shaker (150 rpm) for 4–5 days. Mycelia were collected by filtration through sterile Miracloth (MilliporeSigma, Burlington, MA, USA), rinsed with sterile distilled water, and frozen in liquid nitrogen. Genomic DNA (gDNA) was extracted using a DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA), and the resulting DNA pellet was resuspended in 10 mM Tris-HCl (pH 8.0). The quality of the isolated gDNA was assessed by agarose gel electrophoresis, and the total DNA concentration was determined from UV-Vis absorbance measurements on a NanoDrop spectrophotometer. Isolated gDNAs were sent to the National Instrumentation Center for Environmental Management (NICEM), Seoul National University, Korea, for SMRT and Illumina-based sequencing.
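The concentration reported by a UV-Vis instrument rests on a simple conversion from absorbance at 260 nm. The sketch below shows that arithmetic, assuming a 1 cm-equivalent path length and the conventional factor of 50 ng/µL per A260 unit for double-stranded DNA; the A260/A280 purity ratio is a common companion check that the text above does not mention, and the readings used are hypothetical, not measurements from this study.

```python
# Sketch: dsDNA concentration from a UV absorbance reading (Beer-Lambert based).
# Assumption: 1 cm-equivalent path length and the conventional factor of
# 50 ng/uL per A260 unit for double-stranded DNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Return dsDNA concentration (ng/uL) from a blank-corrected A260 reading."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("A260 must be >= 0 and the dilution factor > 0")
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; values near 1.8 are commonly read as protein-free DNA."""
    if a280 <= 0:
        raise ValueError("A280 must be > 0")
    return a260 / a280


if __name__ == "__main__":
    # Hypothetical readings, not measurements from this study.
    print(f"{dsdna_concentration(0.85):.1f} ng/uL")       # 42.5 ng/uL
    print(f"A260/A280 = {purity_ratio(0.85, 0.46):.2f}")  # A260/A280 = 1.85
```

Instrument software applies an equivalent conversion internally; the sketch only makes the arithmetic explicit.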

Genome sequencing, assembly, and annotation

A PacBio-based assembly strategy was used to assemble the B2 genome. Raw PacBio RSII reads were corrected and assembled using Canu v2.0 [90] and trimmed using Circlator v0.14.0 [91]. The resulting contigs were then joined, and the Redundans assembly pipeline [92] and Pilon v1.22 [93] were used to improve the final draft genome.

For the Illumina-sequenced ADB, WGL, and YN-7 genomes, contigs were aligned to the PacBio RSII-based B2 genome assembly. As with B2, the Redundans assembly pipeline was used to process the Illumina reads, but with an additional step: reads that passed quality filtering with FastQC v0.11.6 [94] were processed using the k-mer counter Jellyfish v1.1.5 [95]. Alignment and assembly were performed using SOAPdenovo2 [96] and Velvet v1.2.10 [97]. Gaps within scaffolds were filled using GapCloser v1.12-r6 [98]. The resulting sequences were joined and corrected using Pilon v1.22. AUGUSTUS v3.2.2 [98] was used to predict genes based on the protein-coding gene sets of six R. solani isolates deposited in NCBI (AG1-IA, AG1-IB, AG2-2IIIB, AG3 Rhs1AP, AG3 123E, and AG8 WAC10335). To assess the repetitive element content of the newly sequenced R. solani AG1-IA isolates, RepeatScout [99] was used to predict de novo consensus repeat families in all R. solani genomes used in this study. Low-complexity sequences, tandem repeats, and repeat families with fewer than 10 copies were filtered out. The remaining consensus repeat families were then classified using TEclass [100] and mapped using RepeatMasker v4.0.7 [101].
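The repeat-family filtering criteria described above (discard tandem repeats and families with fewer than 10 copies) can be sketched as a simple filter. This is a minimal illustration, not the RepeatScout filtering scripts themselves: the `RepeatFamily` record, the exact-periodicity test used as a stand-in for tandem/low-complexity detection, and the example families are all hypothetical.

```python
from dataclasses import dataclass

MIN_COPIES = 10  # families with fewer genomic copies than this are discarded


@dataclass
class RepeatFamily:
    name: str
    consensus: str  # consensus sequence of the family
    copies: int     # number of genomic copies assigned to the family


def is_tandem(seq: str, max_period: int = 6) -> bool:
    """True if seq is an exact repeat of a short unit (homopolymer or microsatellite)."""
    seq = seq.upper()
    for period in range(1, min(max_period, len(seq)) + 1):
        unit = seq[:period]
        # Tile the unit past the sequence length, then trim and compare.
        if (unit * (len(seq) // period + 1))[: len(seq)] == seq:
            return True
    return False


def filter_families(families: list[RepeatFamily]) -> list[RepeatFamily]:
    """Keep families that are not simple tandem repeats and have >= MIN_COPIES copies."""
    return [f for f in families if f.copies >= MIN_COPIES and not is_tandem(f.consensus)]


if __name__ == "__main__":
    families = [
        RepeatFamily("R1", "ACGTTGCAAGCTTGACCTAG", 25),
        RepeatFamily("R2", "ATATATATATATATAT", 40),     # dinucleotide tandem repeat
        RepeatFamily("R3", "GGCATTACGATCCGATTAGC", 4),  # too few copies
    ]
    print([f.name for f in filter_families(families)])  # ['R1']
```

Real low-complexity detection is fuzzier than an exact-periodicity check; the sketch only shows where each criterion applies.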

Comparative analyses and ortholog clustering

For pairwise genome comparisons, MUMmer v3.23 [38] was used to align the whole-genome sequences of the R. solani isolates. PROmer, a program included in the MUMmer package that generates and aligns the translations of all six reading frames of the genome sequences of interest, was used to determine the extent of synteny between the genomes used in this study. OrthoFinder v2.2.7 [102] was used for ortholog clustering to identify single-copy gene families, which are the most phylogenetically informative. Single-copy orthologous genes from all fungal species were then aligned using ClustalW v2.1 [103], and poorly aligned regions were removed using trimAl v1.2 with the strict method [104]. A maximum likelihood phylogenetic tree was constructed using RAxML v8.2.8 [105] with 1000 bootstrap replicates. Orthologous genes were assigned Gene Ontology (GO) terms using InterProScan v5.20 [106].
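The single-copy criterion used to select phylogenetically informative gene families can be expressed as a filter over a per-species gene-count table. This sketch assumes a tab-separated layout with one row per orthogroup, one column per species, and a "Total" column, similar to the gene-count table OrthoFinder writes; the exact file names and columns vary between OrthoFinder versions, so treat the layout, the species column names, and the counts below as hypothetical.

```python
import csv
import io

# Hypothetical per-species gene-count table in the tab-separated layout assumed here
# (one row per orthogroup, one column per species, plus a "Total" column).
EXAMPLE_COUNTS = (
    "Orthogroup\tB2\tADB\tWGL\tYN7\tTotal\n"
    "OG0000001\t1\t1\t1\t1\t4\n"
    "OG0000002\t2\t1\t1\t1\t5\n"
    "OG0000003\t1\t0\t1\t1\t3\n"
)


def single_copy_orthogroups(tsv_text: str) -> list[str]:
    """Return orthogroups with exactly one gene in every species column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    species = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    return [
        row["Orthogroup"]
        for row in reader
        if all(int(row[s]) == 1 for s in species)
    ]


if __name__ == "__main__":
    print(single_copy_orthogroups(EXAMPLE_COUNTS))  # ['OG0000001']
```

OG0000002 is excluded because one species has a paralog, and OG0000003 because one species lacks the gene entirely; both situations would complicate a species-tree alignment.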

Gene family analyses

Genes encoding plant cell wall degrading enzymes were predicted and categorized using HMMER searches against dbCAN v6 [107]. EC numbers were obtained from the CAZyDB-ec-info.txt.07-20-2017 classification file, and each dbCAN-classified group was further subdivided by EC number using BLAST 2.2.26. Before phylogenetic analysis, the EC-classified protein sequences were aligned using ClustalW 2.1, and poorly aligned regions were removed using trimAl v1.2. Phylogenetic trees were constructed using RAxML version 8.2.9 with 1000 bootstrap replicates. We reconciled the resulting gene trees with the species tree using NOTUNG 2.6 [108]. The secretome data of selected species were obtained from the Fungal Secretome Database (FSD) [109], which predicts secreted proteins using SignalP 3.0 [110] and excludes proteins with transmembrane domains or endoplasmic reticulum retention signals. SSPs were then selected from each fungal secretome as proteins shorter than 300 amino acids, as previously described [45]. Exonerate 2.4.0 [111] was used to perform protein-to-genome alignments of the effectors across R. solani genomes. Genes encoding laccases and peroxidases were predicted using fPoxDB [112], putative secondary metabolite biosynthesis gene clusters were identified using antiSMASH v3.0 [41], and the P450 database [113] was searched to predict cytochrome P450 genes in each genome. Transcription factors were identified using the Fungal Transcription Factor Database (FTFD) pipeline [114], which uses data from InterPro v12 [115].
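The SSP selection step (keeping secreted proteins shorter than 300 amino acids) amounts to a length filter over the predicted secretome. The minimal sketch below applies that cutoff to a FASTA-formatted secretome and, as an extra that the text above does not describe, reports each retained protein's cysteine count, since effectors are often cysteine-rich. The parser is deliberately simple, and the protein IDs and sequences are hypothetical.

```python
MAX_LEN = 300  # SSP length cutoff (proteins shorter than this are kept)


def parse_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA parser: record ID (first word of header) -> sequence."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def small_secreted(secretome: dict[str, str], max_len: int = MAX_LEN) -> dict[str, int]:
    """Keep secreted proteins shorter than max_len; report each one's cysteine count."""
    kept = {}
    for pid, seq in secretome.items():
        seq = seq.upper().rstrip("*")  # drop a trailing stop symbol if present
        if len(seq) < max_len:
            kept[pid] = seq.count("C")
    return kept


# Hypothetical secretome: p1 is 10 aa long with 3 cysteines, p2 is 350 aa long.
EXAMPLE_FASTA = ">p1 putative effector\nMKLSCCVLAC\n>p2\n" + "M" + "A" * 349 + "\n"

if __name__ == "__main__":
    print(small_secreted(parse_fasta(EXAMPLE_FASTA)))  # {'p1': 3}
```

The input is assumed to be already restricted to predicted secreted proteins, so no signal-peptide or transmembrane prediction is attempted here.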

Statistical analysis

Chi-square tests of proportions

Chi-square tests of proportions for comparative analyses of CAZyme genes and secondary metabolite biosynthesis clusters were performed using R [116].
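For two groups, a chi-square test of proportions compares the observed counts in a 2 × 2 table with the counts expected under a pooled proportion. The sketch below implements the Pearson statistic without continuity correction; it should agree with R's prop.test(x, n, correct = FALSE), whereas prop.test applies the Yates correction by default for two groups. It handles only two groups so that the 1-degree-of-freedom p-value can be computed with the standard library, and the gene counts in the example are hypothetical.

```python
import math


def prop_test_2(x: list[int], n: list[int]) -> tuple[float, float]:
    """Pearson chi-square test that two proportions x[i]/n[i] are equal (1 df, no continuity correction)."""
    if len(x) != 2 or len(n) != 2:
        raise ValueError("this sketch handles exactly two groups")
    p_pool = sum(x) / sum(n)  # pooled proportion under H0
    if not 0.0 < p_pool < 1.0:
        raise ValueError("pooled proportion must be strictly between 0 and 1")
    stat = 0.0
    for xi, ni in zip(x, n):
        # Observed vs expected counts for "successes" and "failures" in each group.
        for observed, expected in ((xi, ni * p_pool), (ni - xi, ni * (1.0 - p_pool))):
            stat += (observed - expected) ** 2 / expected
    # For 1 df, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 genes vs. 10 of 100 genes in some category.
    stat, p = prop_test_2([30, 10], [100, 100])
    print(f"X-squared = {stat:.2f}, p = {p:.2e}")  # X-squared = 12.50, p = 4.07e-04
```

Comparing more than two genomes at once requires the chi-square survival function for k − 1 degrees of freedom, which R's prop.test handles directly.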