Background

Small RNA-mediated RNA silencing is a conserved mechanism that regulates various bioprocesses in eukaryotes [1]. Two types of endogenous small RNAs, microRNAs (miRNAs) and small interfering RNAs (siRNAs), are highly abundant in plants.

The biogenesis of a miRNA in plants begins with the transcription of a primary miRNA (pri-miRNA). Next, an RNase III family DICER-LIKE (DCL) enzyme, usually DCL1, processes the pri-miRNA into a precursor (pre-miRNA) and then cuts the pre-miRNA into a short miRNA/miRNA* duplex with the help of HYPONASTIC LEAVES 1 (HYL1) and SERRATE (SE). One strand of the duplex (miRNA*) is degraded, and the mature miRNA strand is incorporated into the RNA-induced silencing complex (RISC). The mature miRNA, which is highly complementary to its target, guides RISC to the target mRNA, leading to cleavage of the mRNA followed by its degradation [2]. Recent research has indicated critical roles for miRNAs in various biological processes in plants, such as growth and development, stress response and metabolism [3, 4]. For example, OsmiR393a and OsmiR393b regulate rice primary root elongation and adventitious root number via the auxin signaling pathway [5]. miR398 is directly linked to Arabidopsis stress regulatory networks responding to oxidative stress, water deficit, salt stress, etc. [6].

The biogenesis of siRNAs can be triggered either endogenously, by genetic events, or exogenously, for example by virus infection or transgenic manipulation [7]. In contrast to miRNAs, the precursors of siRNAs are usually long and double-stranded [7]. In recent years, researchers have found that the biogenesis of some siRNAs is triggered by miRNA-mediated cleavage. Fragments resulting from mRNA cleavage are typically subjected to rapid degradation. However, a small proportion of the fragments survive and are subsequently converted into double-stranded RNA (dsRNA) by RNA-dependent RNA polymerase 6 (RDR6) with the aid of Suppressor of Gene Silencing 3 (SGS3). These double-stranded fragments are further cleaved by Dicer-like (DCL) proteins in phased registers to produce a series of 21- or 24-nt siRNAs, termed phased small interfering RNAs (phasiRNAs) [8].

The 21-nt siRNAs regulate gene expression by cleaving complementary transcripts, in the same way as miRNA-mediated cleavage in plants. The best-characterized phasiRNAs are the TAS loci-derived 21-nt trans-acting siRNAs (tasiRNAs) in Arabidopsis. miR173 targets TAS1 and TAS2, which results in the production of tasiRNAs. Interestingly, some of these tasiRNAs go on to recognize target transcripts and produce tertiary phasiRNAs [9]. The TAS1- and TAS2-derived tasiRNAs are involved in the regulation of stress responses, such as improving thermotolerance [10] and maintaining normal flower morphogenesis under drought stress [11]. The biogenesis of TAS3-derived tasiRNAs is triggered by miR390 recognition [12]. These tasiRNAs target ARF family members, which regulate various biological processes, including embryo development, thermotolerance, developmental transitions, leaf morphology, flower and root architecture and stress responses [13, 14]. In addition, TAS4-derived tasiRNAs induced by miR828 have been reported to regulate anthocyanin biosynthesis via repression of MYB genes [15]. The 24-nt siRNAs are key players in triggering RNA-directed DNA methylation (RdDM) [16], the major small RNA-mediated epigenetic pathway that causes transposable element repression and transcriptional gene silencing (TGS) in plants [17]. For example, recent research found that the distribution of 24-nt siRNAs differs between rice gametes (sperm and egg), and between gametes and vegetative tissues, which suggests a major difference in the reprogramming of their genomes prior to fertilization [18].

Different algorithms and software tools have been employed not only to mine novel miRNA-phasiRNA pathways, but also to explore the extended regulatory networks of miRNAs [19]. Two miRNAs, miR2118 and miR2275, were found to be mainly responsible for triggering the biogenesis of 21-nt and 24-nt phasiRNAs, respectively [20]. Subsequent reports focused on the miR2118-phasiRNA and miR2275-phasiRNA biogenesis pathways and their biological functions [21, 22]. To our knowledge, about 56 phasiRNA precursors (PHAS loci) have been identified in rice. In other economic crops, approximately 261 PHAS loci in Zea mays (maize), 916 in Setaria italica (foxtail millet), 201 in Solanum tuberosum (potato) and 123 in Solanum lycopersicum (tomato) have been discovered [23]. Moreover, in addition to non-coding regions of the genome, protein-coding genes can also act as PHAS loci in plants [23, 24], which implies a more complicated mechanism of plant phasiRNA biogenesis.

Due to the biological significance of phasiRNAs, the mining of novel miRNA-phasiRNA pathways and of their functional cascade amplification has attracted wide attention. Rice is an important economic crop; investigating novel phasiRNA pathways in rice will not only benefit our understanding of post-transcriptional regulation in this organism, but could also serve as a reference for other economic crops.

Previously, we discovered many siRNAs in a three-week-old seedling sample by using the corresponding sRNA high-throughput sequencing (HTS) datasets. This suggested to us that some of them might be phasiRNAs. Here, we used our previously developed approach [25] to systematically mine phasiRNA biogenesis pathways from these sRNA HTS datasets. In addition, considering that the expression of some phasiRNAs might be tissue specific or stress dependent, we collected comparable sRNA HTS datasets published elsewhere from tissue-specific rice samples cultured under normal (control) or stress conditions. The targets of novel phasiRNAs were further predicted and verified in order to provide substantial information on the miRNA/sRNA-phasiRNA regulatory network in rice.

Results

Identification of novel phasiRNA biogenesis pathways in Oryza sativa

The sRNA HTS datasets from different rice samples were used as inputs, and rice cDNA sequences as the alignment reference, to search for PHAS loci capable of producing 21-nt or 24-nt phasiRNAs. As a result, fourteen 21-nt and nineteen 24-nt PHAS loci candidates passed the filtering procedures and the corresponding search for sRNA triggers of phasiRNA production (Additional file 1: Table S1, Additional file 2: Table S2). Recent reports found that the processing of 21-nt phasiRNAs mainly depends on OsDCL4, and that OsDCL3 is required for the biogenesis of 24-nt phasiRNAs in rice [20]. Therefore, we evaluated the abundance of 21-nt and 24-nt phasiRNAs generated from the potential PHAS loci candidates by comparing the wild type (wt) with the osdcl4 knockdown mutant (osdcl4–1) [26] (for 21-nt phasiRNAs) and the osdcl3 knockdown mutant (osdcl3–1) [20] (for 24-nt phasiRNAs), respectively.

As a result, five novel 21-nt PHAS loci and five novel 24-nt PHAS loci, along with their corresponding miRNA/sRNA triggers, were identified (Table 1). As shown in Fig. 1 and Fig. 2, the trigger-mediated cleavages in the target PHAS loci were detected in at least one degradome sequencing dataset. Each cleavage site was close to the flank of the phasiRNA production region, as indicated by the relative abundances of phasiRNAs (middle panel), which suggests that these sites are the primary registers for the phasing process. Additionally, the abundance of phasiRNAs generated from the newly found 21-nt and 24-nt PHAS loci was higher in the wild type than in the osdcl4–1 and osdcl3–1 mutants, respectively. This indicates that the production of 21-nt and 24-nt phasiRNAs is OsDCL4- and OsDCL3-dependent, respectively (Additional file 3: Figure S1 and Additional file 4: Figure S2). Taken together, these results demonstrate that the newly found PHAS loci fit the profiles of canonical phasiRNA precursors [8, 20].

Table 1 Novel PHAS loci in Oryza sativa
Fig. 1

Identification of novel 21-nt phasiRNA biogenesis pathways in Oryza sativa. a OSsRNA-1-induced phasiRNA generation from the transcript of LOC_Os01g57968.1 in seedling, b OSsRNA-2-induced phasiRNA generation from the transcript of LOC_Os02g18750.1 in panicle, c osa-miR2118f-induced phasiRNA generation from the transcript of LOC_Os04g25740.1 in panicle, d OSsRNA-3-induced phasiRNA generation from the transcript of LOC_Os05g43650.1 in seedling, e OSsRNA-4-induced phasiRNA generation from the transcript of LOC_Os06g30680.1 in panicle, and f OSsRNA-4-induced phasiRNA generation from the transcript of LOC_Os06g30680.1 in panicle (drought stress). For each graph, the degradome-supported cleavage signatures on the PHAS locus are profiled at the top; four high-throughput degradome sequencing datasets (GSM434596, GSM455938, GSM455939 and GSM476257, represented by triangles, diamonds, circles and squares in different colors, respectively) were used to scan for the cleavage sites of the sRNA triggers, which are marked by black arrows. The x axis represents the position on the PHAS locus and the y axis represents the signature abundance. The abundance of 21-nt phasiRNAs generated from the sense and antisense strands of the PHAS locus in different samples is profiled in the middle panel; the x axis represents the position on the PHAS locus and the y axis represents the phasiRNA abundance. The phasing scores of 21-nt PHAS windows are profiled at the bottom; the x axis represents the position on the PHAS locus and the y axis represents the phasing score

Fig. 2

Identification of novel 24-nt phasiRNA biogenesis pathways in Oryza sativa. a OSsRNA-14-induced phasiRNA generation from the transcript of LOC_Os01g37325.1 in seedling (salt stress). b OSsRNA-14-induced phasiRNA generation from the transcript of LOC_Os01g37325.1 in panicle. c OSsRNA-14-induced phasiRNA generation from the transcript of LOC_Os01g37325.1 in panicle (drought stress). d OSsRNA-15- and OSsRNA-16-induced phasiRNA generation from the transcript of LOC_Os02g20200.1 in seedling. e OSsRNA-17-induced phasiRNA generation from the transcript of LOC_Os02g55550.1 in seedling. f OSsRNA-18- or OSsRNA-19-induced phasiRNA generation from the transcript of LOC_Os04g45834.2 in seedling. g OSsRNA-20-induced phasiRNA generation from the transcript of LOC_Os09g14490.1 in seedling. For each graph, the degradome-supported cleavage signatures on the PHAS locus are profiled at the top; four high-throughput degradome sequencing datasets (GSM434596, GSM455938, GSM455939 and GSM476257, represented by triangles, diamonds, circles and squares in different colors, respectively) were used to scan for the cleavage sites of the sRNA triggers, which are marked by black arrows. The x axis represents the position on the PHAS locus and the y axis represents the signature abundance. The abundance of 24-nt phasiRNAs generated from the sense and antisense strands of the PHAS locus in different samples is profiled in the middle panel; the x axis represents the position on the PHAS locus and the y axis represents the phasiRNA abundance. The phasing scores of 24-nt PHAS windows are profiled at the bottom; the x axis represents the position on the PHAS locus and the y axis represents the phasing score

Previously, we found many sRNAs generated in three-week-old seedling tissues (the GEO accession of the dataset and the culture conditions of the plants are given in Additional file 5: Table S3). Here, we tested whether these sRNAs are phasiRNAs by using our mining method. As expected, phasiRNAs generated from two novel 21-nt PHAS loci (LOC_Os01g57968.1 and LOC_Os05g43650.1) and four novel 24-nt PHAS loci (LOC_Os02g20200.1, LOC_Os02g55550.1, LOC_Os04g45834.2 and LOC_Os09g14490.1) were identified in three-week-old seedling tissues (Table 1).

Since the triggering of phasiRNAs is sometimes tissue specific and stress dependent, a series of sRNA HTS datasets from different rice samples was used to mine novel PHAS loci in different tissues and under different stress conditions (the GEO accessions of the datasets and the culture and treatment conditions of the plants are given in Additional file 5: Table S3). As shown in Fig. 1, transcripts of the 21-nt PHAS loci LOC_Os02g18750.1 and LOC_Os04g25740.1 produced 21-nt phasiRNAs in panicle under normal conditions. LOC_Os06g30680.1-derived 21-nt phasiRNAs and LOC_Os01g37325.1-derived 24-nt phasiRNAs were detected in panicle under both drought and normal conditions.

According to the gene annotation, LOC_Os01g57968.1, LOC_Os02g18750.1, LOC_Os04g25740.1 and LOC_Os05g43650.1 encode proteins of unknown function, and LOC_Os06g30680.1 encodes a protein containing a WD domain, G-beta repeat. LOC_Os01g37325.1 and LOC_Os02g20200.1 are two retrotransposon genes, LOC_Os02g55550.1 encodes an F-box/LRR-repeat protein, LOC_Os04g45834.2 encodes a protein with a DUF584 domain, and LOC_Os09g14490.1 encodes a TIR-NBS type disease resistance protein.

Taken together, the fact that these protein-coding genes act as PHAS loci in different tissues and under different stress conditions suggests that these coding sequences are regulated at the post-transcriptional level in response to growth stage and stress conditions.

Consistent with previous discoveries, two known 21-nt PHAS loci (LOC_Os12g42380.1 and LOC_Os12g42390.1), which have been identified as two parts of a long non-coding RNA (lncRNA) [27], were also uncovered by our screening procedure (Additional file 1: Table S1). LOC_Os12g42380.1-derived phasiRNAs were detected in both seedling and panicle under normal, drought and salinity stress conditions. Yet they were only detected in shoot under salinity stress. LOC_Os12g42390.1-derived phasiRNAs were detected in shoot under normal conditions and in panicle under drought. These results imply that there are three alternative phasiRNA production regions within the lncRNA PHAS loci, and therefore that the capability for phasiRNA production might vary between developmental stages and stress conditions.

Notably, to our knowledge, among all these newly found PHAS loci, only the biogenesis of LOC_Os04g25740.1-derived phasiRNAs is triggered by a known miRNA, miR2118f. The rest were discovered for the first time and are recognized by novel sRNAs (Table 1), which suggests that these phasiRNA biogenesis pathways do not belong to the miR2118- or miR2275-mediated regulatory networks.

Analysis of the regulatory function of novel phasiRNAs generated from 21-nt PHAS loci

The tasiRNAs are those 21-nt phasiRNAs that exert a trans-regulatory function by cleaving target mRNAs in plants. In order to identify novel tasiRNAs generated from the newly found 21-nt PHAS loci, all the 21-nt phasiRNAs were systematically predicted based on a modified tasiRNA biogenesis model [28]. All detectable phasiRNAs were then used for target prediction with the miRU algorithm, and the predictions were verified with degradome-based HTS data (see details in Methods). The results indicated that ten novel tasiRNAs were generated from three newly found 21-nt PHAS loci (LOC_Os02g18750.1, LOC_Os05g43650.1 and LOC_Os06g30680.1). These tasiRNAs mediated forty sRNA-target interactions (Table 2, Fig. 3, Additional file 6: Figure S3). Among these targets, LOC_Os02g39380.1 plays important roles in plant cellular signaling cascades [29]. LOC_Os01g34620.8, LOC_Os02g52900.2, LOC_Os04g39600.1, LOC_Os08g40440.1, LOC_Os06g23274.1, LOC_Os06g47850.1, LOC_Os11g41860.1, LOC_Os11g41860.2 and LOC_Os05g46580.1 are involved in plant growth and development [30,31,32,33,34,35]. LOC_Os09g12230.1, LOC_Os04g38450.1 and LOC_Os04g49160.1 are related to plant defense and stress response [36,37,38].

Table 2 Targets of novel tasiRNAs in Oryza sativa
Fig. 3

Examples of degradome sequencing-based validation of the phasiRNA-target interactions. Two degradome sequencing libraries (GSM434596 and GSM476257) were used for T-plot profiling. The IDs of the target transcripts and the corresponding phasiRNAs generated from the transcripts of LOC_Os02g18750.1 and LOC_Os05g43650.1 are listed at the top. The y axis measures the normalized reads (in RPM, reads per million) of the degradome signals, and the x axis represents the position of the cleavage signals on the target transcripts. The binding sites of the phasiRNAs on their target transcripts are denoted by gray horizontal lines, and the dominant cleavage signals are marked by black arrows

Although the transcript of LOC_Os12g42380.1 has been identified as part of an lncRNA phasiRNA precursor [27], one novel LOC_Os12g42380.1-derived tasiRNA was found based on our revised tasiRNA biogenesis model [28]. LOC_Os12g42380.1 (414)21 5’D7(+) targets an NAD-dependent epimerase/dehydratase gene (LOC_Os07g47700.1) (Table 1, Additional file 6: Figure S3), which suggests that it might be involved in the regulation of plant growth, development and environmental stress responses [39, 40]. Taken together, these results suggest that the OSsRNA-2-LOC_Os02g18750.1-phasiRNA, OSsRNA-3-LOC_Os05g43650.1-phasiRNA, OSsRNA-4-LOC_Os06g30680.1-phasiRNA and OSsRNA-5-LOC_Os12g42380.1-phasiRNA pathways might play crucial regulatory roles in rice growth, development and stress response. In addition, the regulatory networks of these phasiRNA pathways were constructed based on the target information (Fig. 4).

Fig. 4

The regulatory networks of phasiRNA pathways in Oryza sativa. The OSsRNA-2-LOC_Os02g18750.1-phasiRNA (a), OSsRNA-3-LOC_Os05g43650.1-phasiRNA (b), OSsRNA-4-LOC_Os06g30680.1-phasiRNA (c) and OSsRNA-5-LOC_Os12g42380.1-phasiRNA (d) regulatory networks were constructed with Cytoscape based on the validated phasiRNAs and their targets. The gray nodes represent the phasiRNAs, and the orange nodes represent targets involved in plant development, stress response, disease resistance or signaling transport. The blue nodes represent expressed proteins of unknown function

Analysis of the RNA directed DNA methylation (RdDM) regulated promoters of novel 24-nt phasiRNAs

RdDM is an important regulatory event that establishes repressive epigenetic modifications and thereby triggers transcriptional gene silencing. In order to analyze novel 24-nt phasiRNA-mediated RdDM in rice, we scanned all known promoter sequences for target sites of the novel phasiRNAs generated from the five newly found 24-nt PHAS loci. The results indicated that the promoter of the LOC_Os02g40860.1 gene was targeted by five LOC_Os01g37325.1-derived phasiRNAs (Table 3). Since LOC_Os01g37325.1-derived phasiRNAs were detected in panicle rather than in root tissue (Fig. 2), we used the bisulfite-seq and RNA-seq datasets [41] of rice panicle and root to examine DNA methylation mediated by LOC_Os01g37325.1-derived phasiRNAs in the target promoter, and its role in the transcriptional repression of the target gene (LOC_Os02g40860.1). It has been reported that methylation in the CG and CHG contexts is maintained by DNA methyltransferases and histone modifications, while CHH methylation is associated with 24-nt siRNA-guided RdDM [16]. We found that CHH methylation of the promoter was relatively higher in panicle than in root (Fig. 5). In addition, the expression level of LOC_Os02g40860.1 was relatively lower in panicle than in root. These results imply methylation-mediated transcriptional silencing at the promoter of LOC_Os02g40860.1.

Table 3 The target promoter of LOC_Os01g37325.1-derived phasiRNAs
Fig. 5

DNA methylation status and expression analysis of the target promoter. DNA methylation in the CG, CHG and CHH contexts at the promoter of LOC_Os02g40860.1 in panicle (a) and root (b) was analyzed and profiled. The x axis represents the position on the promoter sequence and the y axis represents the abundance of CG, CHG or CHH methylation. The expression level of LOC_Os02g40860.1 in panicle and root is also shown in a bar chart (c)

LOC_Os02g40860.1 encodes a Casein kinase I1 (OsCKI1) protein that belongs to the CKI protein family, which is highly conserved in eukaryotes. CKIs are involved in a variety of important biological events, since they have a wide substrate specificity in vitro [42]. Taken together, we speculate that the OSsRNA-14-LOC_Os01g37325.1-phasiRNA pathway might play crucial roles in rice seedling and panicle development.

Discussion

In recent years, research on Oryza sativa has shown that 21- or 24-nt sRNAs are distributed in genomic clusters [43]. To date, dozens of PHAS loci have been discovered in rice [23]. However, two miRNAs, miR2118 and miR2275, are mainly responsible for triggering the biogenesis of 21-nt or 24-nt phasiRNAs from these PHAS loci.

Considering that there are rich sources of miRNA/sRNA-phasiRNA pathways in other plant species, we reasoned that the miRNA-phasiRNA pathways in rice have not been fully discovered either. Therefore, it is worth continuing the mining effort to better understand the mechanism of phasiRNA biogenesis and the miRNA-derived regulatory network. In our previous work, we found many sRNAs of unknown function and origin in an sRNA HTS dataset from three-week-old seedling tissue, and speculated that some of them were phasiRNAs with regulatory functions. In this study, we performed a systematic search for novel PHAS loci in rice cDNA by applying our previously established mining approach [25] to the same seedling dataset.

As we expected, two novel 21-nt phasiRNA biogenesis pathways (the OSsRNA-1-LOC_Os01g57968.1-phasiRNA and OSsRNA-3-LOC_Os05g43650.1-phasiRNA pathways) and four novel 24-nt phasiRNA biogenesis pathways (the OSsRNA-15/OSsRNA-16-LOC_Os02g20200.1-phasiRNA, OSsRNA-17-LOC_Os02g55550.1-phasiRNA, OSsRNA-18/OSsRNA-19-LOC_Os04g45834.2-phasiRNA and OSsRNA-20-LOC_Os09g14490.1-phasiRNA pathways) were discovered. In addition, since phasiRNAs are involved in the regulation of plant growth, development and stress responses, we integrated a series of sRNA HTS datasets from different tissues (including two-week-old seedling samples) under normal and stress conditions. As a result, three novel 21-nt phasiRNA biogenesis pathways (the OSsRNA-2-LOC_Os02g18750.1-phasiRNA, osa-miR2118f-LOC_Os04g25740.1-phasiRNA and OSsRNA-4-LOC_Os06g30680.1-phasiRNA pathways) and one novel 24-nt phasiRNA biogenesis pathway (OSsRNA-14-LOC_Os01g37325.1-phasiRNA) were discovered. These results substantially extend the knowledge of phasiRNA biogenesis pathways in rice. However, the six novel phasiRNA biogenesis pathways that we discovered in three-week-old seedlings were not detected in two-week-old seedling samples, which might be caused by the low expression level of the phasiRNAs generated from these pathways in younger seedlings.

The novel 21-nt PHAS locus LOC_Os05g43650.1 is a miniature inverted-repeat transposable element (MITE) gene [44]. In addition, two 24-nt PHAS loci, LOC_Os01g37325.1 and LOC_Os02g20200.1, are retrotransposon genes. This indicates that the transcripts of transposons and retrotransposons are capable of producing secondary siRNAs, which is consistent with the phenomenon reported by Creasey et al. in Arabidopsis [45, 46].

According to the target information of the phasiRNAs, the OSsRNA-3-LOC_Os05g43650.1-phasiRNA and OSsRNA-14-LOC_Os01g37325.1-phasiRNA pathways are required for rice development. Transposons and retrotransposons, which play important roles in plant gene and genome evolution, are ubiquitous in plants [47]. We hypothesize that the transcripts of transposons and retrotransposons might also function as important sources of phasiRNAs in plants. Further exploration of such phasiRNA biogenesis pathways could benefit in-depth investigation of their biogenesis mechanism and of the miRNA/sRNA-directed regulatory networks in plants.

For the phasiRNAs generated from the transcripts of LOC_Os01g57968.1, LOC_Os02g20200.1, LOC_Os02g55550.1, LOC_Os04g45834.2 and LOC_Os09g14490.1, no targets were identified. However, considering that these phasiRNAs were detected only in seedling, we cannot rule out the possibility that these phasiRNA biogenesis pathways take part in rice seedling development. LOC_Os04g45834.2 encodes a DUF584 domain-containing protein. This protein family has been implicated in leaf senescence in plants [48]. LOC_Os09g14490.1 encodes a TIR-NBS type disease resistance protein, a type of protein that has been implicated in resistance to multiple viruses in plants [49,50,51]. LOC_Os02g55550.1 encodes an F-box/LRR-repeat protein 14, which is involved in the plant immune response [52]. These genes have been shown to play important roles in plants; however, their capability of producing secondary phasiRNAs suggests that they might be involved in more complex functions than expected. Similarly, no targets of LOC_Os01g57968.1-derived phasiRNAs were identified; however, since these phasiRNAs were only expressed in panicle tissue under normal conditions, the OSsRNA-1-LOC_Os01g57968.1-phasiRNA pathway might be related to rice panicle development. Thus, systematic investigation of the temporal and spatial expression specificity of phasiRNAs generated from the transcripts of protein-coding genes in our future work might provide insight into the requirements and mechanism of the biogenesis of these phasiRNAs.

In this study, two cDNA sequences, LOC_Os09g00999.1 and LOC_Os09g01000.1, which produced abundant Dicer-independent secondary siRNAs in most tissues, attracted our attention. We further used the siRNAs generated from LOC_Os09g00999.1 and LOC_Os09g01000.1 for target prediction and identification, and many siRNA-target interaction pairs were discovered (data not shown). This might suggest a novel pattern of secondary siRNA biogenesis. Therefore, further investigation of Dicer-independent secondary siRNA biogenesis pathways in plants might provide stronger evidence for this biogenesis pattern, and more meaningful information on the small RNA regulatory mechanism in plants.

Conclusions

Here, we performed degradome-based screening of novel phasiRNA biogenesis pathways in rice. Five novel 21-nt phasiRNA biogenesis pathways and five novel 24-nt phasiRNA biogenesis pathways were identified, in addition to two known 21-nt phasiRNA biogenesis pathways. Further analysis of the targets of these novel phasiRNAs of 21-nt and 22-nt length revealed that eleven novel phasiRNAs mediated forty-one siRNA-target interactions during rice growth and development (Table 2, Additional file 1: Table S1, Additional file 2: Table S2 and Additional file 6: Figure S3). These results demonstrate the effectiveness of degradome-based screening in mining novel phasiRNA biogenesis pathways and substantially extend the information on phasiRNA biogenesis pathways in rice. We believe that more novel phasiRNA biogenesis pathways might be identified if our approach is extended to other plant species.

Methods

Data source

The Oryza sativa sRNA HTS datasets of seedling, root, shoot and panicle samples under normal (control) and stress conditions, the sRNA HTS datasets of the wild type and of the osdcl4 and osdcl3 mutants, and the degradome sequencing datasets were retrieved from GEO (Gene Expression Omnibus; http://www.ncbi.nlm.nih.gov/geo/). The bisulfite-seq and RNA-seq datasets of panicle and root were contributed by Zhao et al. [41]. All the HTS datasets used in our study are listed in Additional file 5: Table S3.

The cDNAs and full-length genomic sequences of Oryza sativa were retrieved from PlantGDB (http://plantgdb.org/XGDB/phplib/). The promoter sequences of Oryza sativa were retrieved from PlantProm DB (http://linux1.softberry.com/). All the high-throughput sequencing data were pre-processed before use, and the data of each library were normalized to RPM (reads per million) as described in our previous report [53].

Identification of phasiRNA biogenesis pathways in Oryza sativa

The phasiRNA loci identification criteria were established based on the revised trans-acting siRNA (tasiRNA) biogenesis model that we reported previously [28]. The screening of PHAS loci in rice followed five steps: (1) cDNA/genome sequence-derived 21-nt phased duplexes were computationally predicted by “phase processing”; each of these duplexes has a 2-nt overhang at the 3′ end. (2) Each of these duplexes was separated into its two component strands, which were matched against the small RNAs from the rice small RNA high-throughput sequencing datasets. A potential phasiRNA production region must contain at least 5 tandem “processing” duplexes, and each of these duplexes must contain a detectable phasiRNA from the sense strand (plus siRNA) and/or the antisense strand (minus siRNA). (3) Degradome HTS libraries contributed by Wu et al. [54], Li et al. [55] and Zhou et al. [56] were used to systematically scan for degradome-supported cleavage signatures on the candidate phasiRNA production regions, as described in our previous work [28], and the PHAS loci candidates with cleavage signatures located in the phasiRNA production region were retained. (4) The sRNAs bound to the PHAS loci were analyzed with the miRU algorithm [57], and the sRNA cleavage sites on those loci were further verified with the degradome sequencing libraries. The degradome-supported cleavage site of an sRNA trigger must reside within 10 to 11 nt from the 5′ end of the binding site [58]. (5) The phasing score of phasiRNA production from each PHAS locus candidate should be above 1.
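Steps (1) and (2) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`revcomp`, `phased_duplexes`, `is_candidate_region`), the 0-based coordinate convention, and the choice to start phasing at a user-supplied position are all assumptions made for illustration. Each duplex is modeled as a sense siRNA plus an antisense siRNA shifted by 2 nt, which gives the 2-nt 3′ overhang on both strands.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (cDNA letters A, C, G, T)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def phased_duplexes(seq, start, size=21, overhang=2, n=5):
    """Return up to n tandem (sense, antisense) siRNA pairs, written 5'->3'.

    start is the 0-based index of the first nucleotide of the first sense
    siRNA (an assumed convention, e.g. the first position after the cut).
    """
    duplexes = []
    for k in range(n):
        s = start + k * size
        if s - overhang < 0 or s + size > len(seq):
            break  # this duplex would run off the transcript
        sense = seq[s:s + size]
        # The antisense strand pairs with the sense coordinates shifted by
        # `overhang` toward the 5' end, so both strands keep a 3' overhang.
        antisense = revcomp(seq[s - overhang:s + size - overhang])
        duplexes.append((sense, antisense))
    return duplexes


def is_candidate_region(seq, start, reads, size=21, min_duplexes=5):
    """True if min_duplexes tandem duplexes each have a sequenced strand."""
    duplexes = phased_duplexes(seq, start, size, n=min_duplexes)
    if len(duplexes) < min_duplexes:
        return False
    return all(sense in reads or anti in reads for sense, anti in duplexes)
```

Here `reads` is assumed to be a set of sRNA sequences in DNA letters; passing `size=24` would apply the same logic to 24-nt registers.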

Calculation of phasing score

Phasing scores of phasiRNA regions were calculated based on the formula contributed by Zheng et al. [23]: \( \mathrm{Phasing\ score}=\ln\left[{\left(1+10\times \frac{\sum_{i=1}^{5}{p}_i}{1+\sum U}\right)}^{n-2}\right] \), where n represents the number of phase registers occupied by at least one unique 21-nt/24-nt small RNA within a five-phase register window, p represents the total number of reads for all 21-nt/24-nt small RNAs falling into a given phase in a given window, and U represents the total number of unique reads for all 21-nt/24-nt small RNAs falling out of a given phase.
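As a worked illustration of the phasing score, the following Python sketch evaluates the formula for one five-register window. The function name `phasing_score` and its input layout are hypothetical, and returning 0 when fewer than three registers are occupied is an assumption (the formula itself gives 0 for n = 2 and is not meaningful below that).

```python
import math


def phasing_score(in_phase_reads, out_of_phase_unique):
    """Phasing score for one window of five phase registers.

    in_phase_reads: five totals p_1..p_5, the read counts of 21-nt/24-nt
        small RNAs falling into each phase register of the window.
    out_of_phase_unique: U summed over the window, the number of unique
        reads falling out of phase.
    """
    # n = number of registers occupied by at least one small RNA
    n = sum(1 for p in in_phase_reads if p > 0)
    if n < 3:
        return 0.0  # assumption: too few occupied registers to score
    ratio = sum(in_phase_reads) / (1 + out_of_phase_unique)
    # ln[(1 + 10 * ratio)^(n - 2)] == (n - 2) * ln(1 + 10 * ratio)
    return (n - 2) * math.log(1 + 10 * ratio)
```

For example, a window with 10 in-phase reads in each of the five registers and no out-of-phase reads gives 3 × ln(501), well above the cutoff of 1 used in the screening.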

Identification of phasiRNA-target interaction based on degradome sequencing

The expressed novel phasiRNAs generated from 21-nt PHAS loci were predicted based on the previously modified model of tasiRNA biogenesis in plants [28]. The predicted phasiRNAs were used for target prediction with miRU under default parameters [57], followed by degradome sequencing-based verification, as described previously [53, 59].

Gene expression level analysis

The sequences of the RNA-seq datasets were mapped to the reference cDNA sequences, and the expression level of each gene was calculated as the total RPM of the sequences mapped to it.
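The RPM-based expression measure can be sketched as follows. This is a minimal illustration only: the function names and the dictionary inputs are hypothetical, and normalizing by the total read count of the library (rather than, say, by mapped reads only) is an assumption.

```python
from collections import defaultdict


def to_rpm(raw_counts):
    """Convert raw read counts {sequence: count} to reads per million."""
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("library contains no reads")
    return {read: count * 1_000_000 / total for read, count in raw_counts.items()}


def gene_expression(rpm, read_to_gene):
    """Sum the RPM of all mapped sequences for each gene.

    read_to_gene maps a sequence to the cDNA it aligned to; sequences that
    are absent from the mapping are treated as unmapped and ignored.
    """
    expression = defaultdict(float)
    for read, value in rpm.items():
        gene = read_to_gene.get(read)
        if gene is not None:
            expression[gene] += value
    return dict(expression)
```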

Identification of 24-nt phasiRNA targets

In order to identify potential 24-nt phasiRNA target sites in promoter sequences, BLAST analysis was performed to locate sequences perfectly complementary to the 24-nt phasiRNAs, with no mismatches [60]. Promoters possessing phasiRNA binding sites were retained as potential target promoters. As each downloaded promoter sequence contains a partial mRNA sequence, we identified the corresponding potential target genes by mapping the partial mRNA sequence to the cDNA sequences. The DNA methylation status of potential target promoters was analyzed with the bisulfite-seq datasets of rice panicle and root. The tissue-specific expression of a phasiRNA should be consistent with increased methylation of the target promoter.
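The perfect-complementarity criterion can be illustrated with plain string matching in place of BLAST. The sketch below is a simplified stand-in, not the procedure used in the study: the function names are hypothetical, and it assumes that the phasiRNA is given in RNA letters and that only the promoter strand as supplied is searched.

```python
def revcomp_dna(rna_or_dna):
    """Reverse complement, returned in DNA letters (U is read as T)."""
    dna = rna_or_dna.upper().replace("U", "T")
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_target_sites(phasirna, promoter):
    """Return 0-based start positions in `promoter` that are perfectly
    complementary to `phasirna` (no mismatches, no gaps)."""
    site = revcomp_dna(phasirna)
    promoter = promoter.upper()
    positions = []
    pos = promoter.find(site)
    while pos != -1:
        positions.append(pos)
        pos = promoter.find(site, pos + 1)  # also reports overlapping hits
    return positions
```

To cover both strands, the same function could be called a second time on the reverse complement of the promoter.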

The DNA methylation analysis of promoters was performed according to the method developed by Zhao et al. [41]. The sequences of the bisulfite sequencing libraries were mapped to the potential promoter sequences, and the uniquely mapped sequences were used for further analysis of DNA methylation levels. The DNA methylation level of each cytosine was obtained by calculating the total coverage of the individual cytosine in RPM.
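Grouping cytosines by sequence context (CG, CHG or CHH, where H is A, C or T) can be sketched as below. This is an illustrative helper rather than the method of Zhao et al. [41]: the function names are hypothetical, only the forward strand is considered, and the per-cytosine methylation levels are assumed to be supplied as a dictionary keyed by 0-based position.

```python
def cytosine_context(seq, i):
    """Classify the cytosine at 0-based index i as 'CG', 'CHG' or 'CHH'.

    Returns None if seq[i] is not C, if the context runs past the end of
    the sequence, or if it contains an ambiguous base such as N.
    """
    seq = seq.upper()
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"
    if seq[i + 1] not in "ACT" or i + 2 >= len(seq):
        return None
    if seq[i + 2] == "G":
        return "CHG"
    if seq[i + 2] in "ACT":
        return "CHH"
    return None


def methylation_by_context(seq, level_by_pos):
    """Sum per-cytosine methylation levels (e.g. in RPM) per context."""
    totals = {"CG": 0.0, "CHG": 0.0, "CHH": 0.0}
    for pos, level in level_by_pos.items():
        context = cytosine_context(seq, pos)
        if context is not None:
            totals[context] += level
    return totals
```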