Abstract
DNA copy number aberrated regions in cancer are known to harbor cancer driver genes and short non-coding RNA molecules, i.e., microRNAs. In this study, we integrated multi-omics datasets such as copy number aberration, DNA methylation, gene and microRNA expression to identify signature microRNA-gene associations from frequently aberrated DNA regions across pan-cancer using a LASSO-based regression approach. We studied 7294 patient samples associated with eighteen different cancer types from The Cancer Genome Atlas (TCGA) database and identified several cancer-specific and common microRNA-gene interactions enriched in experimentally validated microRNA-target interactions. We highlighted several oncogenic and tumor suppressor microRNAs that were cancer-specific or common to several cancer types. Our method substantially outperformed five state-of-the-art methods in selecting significantly known microRNA-gene interactions in multiple cancer types. Several microRNAs and genes were found to be associated with tumor survival and progression. Selected target genes were found to be significantly enriched in cancer-related pathways, cancer hallmark and Gene Ontology (GO) terms. Furthermore, subtype-specific potential gene signatures were discovered in multiple cancer types.
Introduction
MicroRNAs (miRNAs) are small non-coding RNAs that modulate the expression of their target genes either by inhibiting translation or by promoting RNA degradation1. Several studies found miRNAs to be regulators of cancer driver genes that promote tumor initiation, progression and proliferation2,3,4.
Several state-of-the-art methods utilize miRNA and gene expression data to infer miRNA-gene regulatory networks. Among these, ARACNe5 and ProMISe6 use mutual information-based algorithms and HiddenICP7, idaFast8 and jointIDA9 use invariant causal relationships, i.e., direct or indirect effects of miRNAs on targets to infer miRNA-gene regulatory networks.
Several studies found that DNA copy number aberrated areas, i.e., amplification and deletion regions harbor cancer-driving genes10,11 and miRNAs12,13,14.
Several studies integrated copy number data, DNA methylation and gene expression to compute miRNA-gene regulatory networks in cancer15,16 using regression-based approaches. These studies, however, mined miRNAs and target genes from the entire genomic locations.
In our previous study, we developed a computational pipeline called miRDriver based on the hypothesis that copy number data from cancer patient samples can be utilized to discover driver miRNAs of cancer17. miRDriver assumes that miRNAs located within an aberrated region regulate the expression of the genes outside the aberration, extending the aberration effects across the genome and beyond the aberrated region. Since other factors can influence the expression of the genes outside the aberration, miRDriver integrates DNA methylation and copy number aberration (CNA) of these genes, transcription factors (TFs) and the expression of the genes located inside an aberration along with the miRNAs to select the regulatory miRNAs for these genes17. We computed frequently aberrated chromosomal copy number regions, namely, GISTIC regions, among tumor patient samples (see Materials and Methods). Then, for each GISTIC region, we computed differentially expressed (DE) genes between the tumor samples with the aberration and the samples that did not have the aberration. Afterward, we computed DE trans genes (genes outside of aberrated areas) and cis genes (genes inside of aberrated areas) for each GISTIC region. Finally, we applied a LASSO-based18 regression model to select miRNAs regulating DE genes' expression (Fig. 1).
miRDriver outperformed ARACNe, ProMISe, Hidden-ICP, ICP-PAM50, idaFast and jointIDA in retrieving significantly enriched miRNA-gene interactions with the known miRNA-gene interactions. miRDriver discovered several potentially novel interactions in multiple cancer types. Several oncogenic and tumor suppressor miRNAs and genes were found to be enriched in the computed miRNA-gene networks. Several miRNAs were found to be associated with patients' survival and disease progression. Selected target genes were found to be significantly enriched in cancer-related biological pathways and GO terms19. Furthermore, subtype-specific gene signatures were discovered in multiple cancer types.
In our previous publication, we have demonstrated miRDriver’s statistical robustness by applying it to two different cancer types. This study has unique contributions. In the current study, we present miRDriver as an R software package with various options for users to run our workflow. We have also demonstrated its application and biological importance by running miRDriver on eighteen different cancer types. We have presented extensive results on these cancer types that were not present in our prior publication. We have also presented pan-cancer-wide findings and their relevance to cancer. We have put together a resource of pan-cancer miRNA-gene interactions that will be useful to biologists, clinicians and scientists working on cancer research.
Results
In this study, we integrated CNA, DNA methylation, TF-gene interactions, gene, and miRNA expression datasets in the miRDriver tool to compute miRNA-gene interactions based on DNA copy number aberrated regions in eighteen different cancer types from TCGA. Table 1 shows the cohort sizes for each data modality, the number of all GISTIC regions, the count of trans genes in the LASSO step, and the computed miRNA-gene interactions in eighteen different cancer types.
Computed miRNAs were significantly enriched in the experimentally-validated oncogenic miRNAs
We performed a two-sided Fisher's exact test to check the association between the cancer-related miRNAs in OncomiRDB (see Materials and Methods) and the computed miRNAs by miRDriver. For each cancer type, the background set in the Fisher's exact test consisted of all TCGA miRNAs used in the LASSO step (see Materials and Methods) for that cancer type. For all cancer types, computed miRNAs were significantly enriched (Fisher's exact test p-value < 0.05) in the oncogenic miRNAs in OncomiRDB (Table 1).
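The test described above can be sketched as follows. This is a minimal, standard-library illustration of a two-sided Fisher's exact test on a 2x2 table, not the code used in the study; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: computed miRNAs listed in the reference set (e.g. OncomiRDB)
    b: computed miRNAs not listed in the reference set
    c: remaining background miRNAs listed in the reference set
    d: remaining background miRNAs not listed in the reference set
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x overlaps with all margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of every table at least as unlikely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 40 computed miRNAs, 25 of them in the reference set,
# against a background of 300 miRNAs with 90 reference miRNAs overall.
p_value = fisher_exact_two_sided(25, 15, 65, 195)
print(f"p = {p_value:.3g}")
```

The background (all miRNAs entering the LASSO step for that cancer type) determines the four cell counts, so changing the background changes the p-value.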
Computed miRNA-gene interactions were enriched in the known miRNA-gene interactions
To check if the miRNA-gene interactions computed by miRDriver were significantly enriched in the known miRNA-gene interactions, we performed a hypergeometric test for each miRNA's computed target genes in each cancer type. We considered only those miRNAs that had at least one known target in the ground truth data (i.e., known miRNA-gene interactions) (see Materials and Methods) from the computed target list. We labeled them as "Eligible miRNAs" for the hypergeometric test. The background set, i.e., the hypergeometric test universe, was the set of all the trans genes in the HGNC symbol20 that were common to the ground truth data. For fourteen cancer types, at least 50% of the "Eligible miRNAs" had significant enrichment (p-value < 0.05) (Table 2). The entire list of the computed miRNAs with individual hypergeometric p-values for all eighteen cancer types can be accessed in Supplemental Table S1.
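As an illustration of this per-miRNA test, the sketch below computes the upper-tail hypergeometric p-value for each "Eligible miRNA". It assumes the inputs are plain Python sets and dictionaries; the function names and the toy gene and miRNA identifiers are hypothetical, and the study itself may have used a different implementation.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, known_in_universe, n_computed, n_overlap):
    """P(X >= n_overlap) when n_computed genes are drawn without replacement."""
    total = comb(universe_size, n_computed)
    upper = min(known_in_universe, n_computed)
    tail = sum(
        comb(known_in_universe, k)
        * comb(universe_size - known_in_universe, n_computed - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrichment_for_eligible(computed, known, universe):
    """Return {miRNA: p-value} for miRNAs with at least one known computed target.

    computed, known: dicts mapping a miRNA to a set of target genes
    universe: set of trans genes that also appear in the ground truth data
    """
    results = {}
    for mirna, targets in computed.items():
        targets = targets & universe
        known_targets = known.get(mirna, set()) & universe
        overlap = targets & known_targets
        if not overlap:
            continue  # not an "Eligible miRNA"
        results[mirna] = hypergeom_enrichment_p(
            len(universe), len(known_targets), len(targets), len(overlap)
        )
    return results


universe = {f"g{i}" for i in range(10)}
computed = {"miR-a": {"g0", "g1", "g2"}, "miR-b": {"g5"}}
known = {"miR-a": {"g0", "g1", "g2"}, "miR-b": {"g6"}}
print(enrichment_for_eligible(computed, known, universe))
```

In this toy input, miR-b is skipped because none of its computed targets is a known target.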
miRDriver outperformed five state-of-the-art methods in inferring significant miRNA-gene interactions
We compared miRDriver with five state-of-the-art methods, namely, ARACNe, ProMISe, HiddenICP, idaFast and jointIDA, by running them on eighteen different cancer types from TCGA. For all these methods, we used gene expression data to compute miRNA-gene interaction networks for our comparison (see Materials and Methods). We performed the hypergeometric test to measure each miRNA's computed targets' enrichment significance in the known miRNA-gene interaction data. We selected only "Eligible miRNAs" (i.e., miRNAs with at least one known target in the ground truth data) for this test. We computed the overlapping "Eligible miRNAs" for miRDriver and each comparable method. We checked if the count of the "Significant miRNAs" (i.e., miRNAs with target enrichment test p-value < 0.05) in miRDriver was more (i.e., miRDriver won), less (i.e., miRDriver lost), or equal (i.e., there was a draw) than the other method in the overlap. miRDriver had more "Significant miRNAs" than all other methods for most of the cancer types. For ACC, LUSC and THCA, miRDriver and the different methods had no common "Eligible miRNAs"; hence, we eliminated these three cancer types from this test. Table 3 summarizes the comparison results in all the cancer types. Table 4 presents the comparison results for ovarian cancer (OV) in detail with the number of "Eligible miRNAs" and "Significant miRNAs" in all the methods. For a detailed comparison with all the cancer types, see Supplemental Table S2. We also compared miRDriver with sequence-based competing endogenous RNA (ceRNA) prediction tool, Cupid21 for BRCA. miRDriver outperformed Cupid as well. Cupid predicts miRNAs that are also predicted to "mediate" ceRNA interactions. For TCGA BRCA, the authors of Cupid predicted 299K candidate miRNA–target interactions. We filtered this list with 6504 input genes and 255 miRNAs, the same inputs we used in miRDriver for BRCA. 
We considered the top 2437 (top 1 percentile) of miRNA-gene interactions based on Cupid reported scores to get highly confident interactions for our comparison. The count of the "Significant miRNAs" in miRDriver was higher than Cupid in the overlap (see Supplemental Table S2).
Computed genes were enriched in biological pathways, cancer hallmark and GO terms
To evaluate the functional roles of the computed target genes by miRDriver for each cancer type, we checked whether these genes were enriched in the biological pathways and GO terms19. For this purpose, we performed pathway enrichment analysis with the pathways in REACTOME22 and KEGG23 databases. For REACTOME pathway enrichment, we used R package Pathfinder24 and for KEGG pathways, hallmark gene set from the MSigDB25,26 database and GO enrichment, we used R package clusterProfiler27. We selected the pathways and GO terms with significant enrichment (multiple testing corrected, i.e., adjusted p-value < 0.05). We found 213 unique REACTOME pathways spanning over seventeen cancer types, twelve unique KEGG pathways in twelve cancer types and 224 unique enriched GO terms spanning over fifteen cancer types. Table 5 shows the enriched pathways and GO terms that were common in multiple cancer types. We provided the entire list of enriched pathways and GO terms for all the cancer types in Supplemental Table S3. Among these pathways, "Immune System" related pathways were found to play essential roles in cancer28,29. The G protein-coupled receptors (GPCRs)-related REACTOME pathways such as "Signaling by GPCR", "GPCR ligand binding" and "GPCR downstream signalling", which were implicated in several cancer-related studies, were found to be enriched in the computed target genes in more than ten cancer types in our study. These pathways were found to play crucial roles in tumor development, invasion, migration, survival, and metastasis30,31. The GO terms, such as "receptor ligand activity" and "receptor regulator activity", enriched in at least five cancer types, were highlighted in several cancer studies for playing roles in drug toxicity, cell function, tumor growth32,33,34. The computed target genes in each cancer type were also enriched in the cancer hallmark gene set (Table 6).
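The text does not state which multiple-testing correction produced the adjusted p-values. The sketch below assumes the Benjamini–Hochberg procedure, a common choice for enrichment analyses, purely for illustration; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adj) if p < 0.05]
print(adj, significant)
```

A term is then reported as enriched when its adjusted p-value is below 0.05, as in the analysis above.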
Furthermore, miRDriver computed 22 common miRNAs that were shared in at least eight different cancer types among eighteen total cancer types used in the study (Table 7). The targets of these miRNAs could regulate the common biological processes in cancer. Hence, we performed a GO enrichment test with 1161 computed genes targeted by at least one of these 22 miRNAs among eighteen cancer types and found 49 GO terms with significant enrichment. Table 8 shows a few of these GO terms with their cancer-related citations; the entire list can be found in Supplemental Table S4.
Although there were common miRNAs across multiple cancer types, there were not many common miRNA-gene interactions because the number of trans genes was much higher than the number of miRNAs in this pan-cancer analysis. Table 9 presents fourteen common gene-miRNA interactions shared in two cancer types among 11,548 selected interactions from pan-cancer. Among these, the RSPO3 and miR-22 interaction was selected in LAML (leukemia) and LUAD (lung cancer). Interestingly, RSPO3 was found to play a role in leukemia35 and to promote tumors in lung cancer36. miR-22 was found to play an anti-tumor role with therapeutic potential in acute myeloid leukemia37 and to have roles in lung cancer via CNAs38. Another interaction, PAX5 with miR-5699, was found in BLCA (bladder cancer) and OV (ovarian cancer). Interestingly, PAX5 was found to have a role in bladder cancer39 and ovarian cancer40 as a co-regulator of PAX8. miR-5699 has a proven role in the oxidative response to ovarian cancer treatment41. There are some miRNA-long noncoding RNA (lncRNA) interactions in Table 9. lncRNAs are known to have binding sites for miRNAs, and lncRNAs can also be direct or indirect targets of miRNAs42,43. Several lncRNAs were found to be prevalent in cancer44. In our case, the LINC01833-miR-1226 interaction was found in BRCA (breast cancer) and LGG (brain cancer). LINC01833 was listed in the top five lncRNAs according to the prioritization of variation in ER-negative-associated lncRNAs in breast cancer45. miR-1226 was found to regulate the expression of the mucin 1 oncoprotein and induce cell death in a breast cancer study46.
Several cancer-related terms and pathways were enriched in the targets of the computed miRNAs
We checked the involvement of the computed miRNAs in cancer-related pathways. For this analysis, we collected all 556 miRNAs that were computed by miRDriver in at least one cancer type. We collected the computed target genes for each of these miRNAs from all the cancer types where that miRNA was present. We performed cancer hallmark gene set enrichment with these collected target genes of each miRNA. We found 38 unique enriched cancer hallmark terms (adjusted p-value < 0.05) for 134 miRNAs (Supplemental Table S5).
We also performed REACTOME pathway enrichment analysis with these collected target genes of each miRNA. We found 240 unique enriched REACTOME pathways (adjusted p-value < 0.05) for 69 miRNAs with these target genes (Supplemental Table S5). Eleven of these enriched pathways, such as "Epithelial-Mesenchymal Transition", "Hypoxia", "Inflammatory Response", "KRAS Signaling Up", "p53 Pathway", "PI3K AKT MTOR Signaling", "Xenobiotic Metabolism", "Apoptosis", "DNA Repair" and "Immune", were present in nineteen experimentally-validated cancer-related pathways for miRNAs57.
Furthermore, we performed an analysis to find cancer-driving miRNAs (i.e., tumor suppressors, oncogenes or both) using the enriched cancer hallmark terms (Supplemental Table S5). We hypothesized that a miRNA could be a candidate cancer-driving miRNA if its target genes that were enriched in the cancer hallmark terms were also enriched in the known cancer-driving genes. Hence, for each of the enriched cancer hallmark terms, we gathered all the miRNAs with their target genes for which that term was enriched (Table 10). We downloaded a list of 83 cancer-driving genes found to be frequently mutated in different cancer types from the cancer gene census project58 of the Catalogue Of Somatic Mutations In Cancer (COSMIC) database. We performed a hypergeometric test for the overlap of the target genes with the 83 cancer-driving genes for each cancer hallmark term. The background gene set for this test was all 5604 target genes computed by miRDriver in pan-cancer. We considered the miRNAs associated with a hypergeometric p-value < 0.05 as candidates to be evaluated as cancer-driving miRNAs since their targets were enriched in known cancer-driving genes. Furthermore, since the up- or down-regulation of a miRNA causes the inverse regulation of its target genes59,60,61, we specifically checked the target genes of these candidate miRNAs that had a negative LASSO regression coefficient computed by miRDriver in different cancer types (Table 11). Interestingly, all of the target genes in this group (Table 11), except OLIG2, were found to be oncogenes in previous studies62,63,64,65,66,67,68. OLIG2 was found to work as a tumor-suppressor gene (TSG) in human glioblastoma69. All the miRNAs except miR-5001 and miR-2276 were found to act as TSGs in cancer in several studies70,71,72,73,74. miR-5001 and miR-2276 were found to have evidence of working as oncogenes in endometrial cancer and colorectal cancer, respectively75,76.
These studies support the findings of miRDriver in terms of connecting miRNAs and genes that were related inversely, having a possibility to be working as drivers in pairs of TSG-oncogene in different cancer types.
Computed target genes revealed the subtype-specific expression signature in multiple cancer types
We checked the subtype-specific association of gene expression of computed target genes in BRCA, LGG, LUSC and PAAD. We used the R package TCGAbiolinks77 to download the subtype labels for the different cancer types. Since TPM (transcripts per million) values are normalized and comparable across samples, for this analysis, we utilized RNA-Seq data in TPM of TCGA samples whose subtype labels were available. We applied the log2(TPM + 1) transformation used by the Cancer Dependency Map [https://depmap.org]. For all these cancer types, we performed unsupervised clustering using gene expression of these target genes and compared these clusters with baseline (i.e., known) subtype clusters using Rand Index (RI) and Uniform Manifold Approximation and Projection (UMAP)78 plots.
For BRCA, we computed a UMAP plot using around 1000 BRCA samples and 106 high-degree genes (i.e., computed genes targeted by more than three miRNAs) to check the PAM50 gene-based subtypes79. These subtypes were Basal-like (BL), HER2-enriched (HER2+), LuminalA (LA), LuminalB (LB) and Normal-like (NL) (Fig. 2A). We also computed the UMAP plot using the PAM50 genes with PAM50 gene-based subtypes (Fig. 2B). These UMAP plots show a clear separation between different subtype-specific clusters. We also performed unsupervised clustering (k-means, using the R base package stats with k = 5 and all other parameters at their defaults) on the BRCA cohort with high-degree target genes (Fig. 2C) and with PAM50 genes (Fig. 2D). The computed RIs between the five known subtype labels and the five clusters predicted from the computed high-degree target genes and from the PAM50 genes were 0.74 and 0.82, respectively. This result shows that both the computed high-degree target genes and the PAM50 gene set were able to detect subtype structure in BRCA samples with high accuracy.
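For reference, the Rand Index used to compare predicted clusters with known subtype labels can be sketched as below. This is a minimal pairwise-counting implementation; the function name and the toy labels are hypothetical, and the clustering step itself (k-means) is not reproduced here.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two partitions agree.

    A pair agrees when both partitions put the two samples in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two samples")
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


known_subtypes = ["BL", "BL", "LA", "LA"]
kmeans_clusters = [2, 1, 1, 1]
print(rand_index(known_subtypes, kmeans_clusters))
```

The index depends only on how samples are grouped, so the arbitrary cluster numbers returned by k-means do not need to be matched to subtype names.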
Furthermore, we used the high-degree genes to classify the BRCA cohort into five different classes. For this purpose, we used R package keras80 (https://github.com/rstudio/keras) implementation of the Random Forest classifier with 80% samples for training with 10-fold cross-validation where 20% of data was held out to test the performance of the model. We achieved a high classification accuracy of 0.86. The same sample cohort was classified with PAM50 genes and achieved a classification accuracy of 0.89. Figure 2E,F present the confusion matrices for both cases with F1 scores. The F1 scores for the classification with high-degree target genes were comparable to F1 scores of the PAM50-based classification, which suggests that these high-degree target genes can serve as potential markers for PAM50-based subtype signatures in BRCA.
For the other cancer types except for LGG, we computed UMAP plots to check the baseline subtype clusters with the selected high-degree target genes. For these cancer types, since fewer genes were targeted by more than three miRNAs, we defined high-degree genes as the genes targeted by more than two miRNAs. For LGG, we used 402 samples with all 151 computed target genes since no gene was targeted by multiple miRNAs (Fig. 2G). For LUSC, we used 178 patient samples with 75 high-degree target genes (Fig. 2H), and for PAAD, we used 150 patient samples with 101 selected high-degree target genes (Fig. 2I). We also performed k-means clustering for all these cancer types. For LGG, LUSC and PAAD, the computed RIs between the known subtype clusters and the predicted clusters were 0.71, 0.62 and 0.70, respectively. For LGG and PAAD, in which we achieved high RI values, the UMAP plots showed clear separation among the known subtype-specific clusters. For LUSC, although we achieved a lower RI value, the "Basal" cluster was separated from other clusters (Fig. 2H). These results showed that the computed high-degree target genes could reveal subtype-specific expression signatures in multiple cancer types.
Computed miRNAs were found to be potential biomarkers for patients' survival and progression of the disease in each cancer type
We performed survival analysis with the computed miRNAs to assess the miRNAs' prognostic relevance as clinical biomarkers for patients' survival (Fig. 3). For each miRNA, we divided the patient cohort of each cancer type into two groups, high expression and low expression for that miRNA. We considered the available clinical variables among age, race, gender, stage, and grade as independent variables (see Materials and Methods). To remove the confounding effect of multiple factors, we used the Adjusted Kaplan–Meier Estimator and computed adjusted survival curves by weighting the individual contributions by the inverse probability weighting (IPW) using the R package IPWsurvival82. We considered four different survival endpoints, namely, Overall Survival (OS), Progression Free Interval (PFI), Disease Specific Survival (DSS) and Disease Free Interval (DFI) (see Materials and Methods). We found several prognostic miRNAs (adjusted log-rank test p-value < 0.05) based on Adjusted Kaplan–Meier survival plots in multiple cancer types. Figure 3 shows the survival plots for the common miRNAs in different cancer types. Among the 22 common miRNAs (Table 7), eighteen had significant survival differences between the high and low miRNA expression patient groups in at least one cancer type (Fig. 3). We provided the survival plots for all miRNAs for eighteen cancer types in Supplemental Figure S1–S18.
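The weighting idea behind an adjusted Kaplan–Meier curve can be sketched as follows. This is not the IPWsurvival implementation; it is a minimal illustration that assumes each patient's probability of belonging to the high-expression group (the propensity) has already been estimated from the clinical covariates, for example by logistic regression. All function names and the toy data are hypothetical.

```python
def ipw_weights(in_high_group, propensity_high):
    """Inverse probability weights: 1/p for the high group, 1/(1-p) otherwise."""
    return [
        1.0 / p if high else 1.0 / (1.0 - p)
        for high, p in zip(in_high_group, propensity_high)
    ]


def weighted_kaplan_meier(times, events, weights):
    """Weighted Kaplan-Meier curve for one group as [(event_time, survival)].

    times: follow-up time per patient
    events: True if the event was observed, False if censored
    weights: one non-negative weight per patient (all 1.0 gives the plain curve)
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival, curve = 1.0, []
    for t in event_times:
        # Weighted number still at risk just before t, and weighted events at t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        died = sum(w for ti, e, w in zip(times, events, weights) if ti == t and e)
        survival *= 1.0 - died / at_risk
        curve.append((t, survival))
    return curve


times = [1, 2, 3, 4]
events = [True, False, True, True]
print(weighted_kaplan_meier(times, events, [1.0] * 4))
```

In use, the curve for each expression group would be computed from that group's patients with their own weights, so that the two weighted groups have comparable covariate distributions.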
miRDriver discovered several cancer-specific miRNAs
In this study, miRDriver discovered 240 cancer-specific miRNAs, i.e., these miRNAs were selected in only one cancer type. We used the R package OncoScore83 to measure these miRNAs' association with cancer based on citation frequencies in cancer-related biomedical literature. Fifty percent of these miRNAs (i.e., 121) were found to be cited in cancer-related studies (Supplemental Table S6). Moreover, several of these miRNAs were found to be prognostic, i.e., associated with patients' survival based on Adjusted Kaplan–Meier survival analysis (adjusted log-rank test p-value < 0.05) (Table 12).
The copy number changes of the computed miRNAs were predictive of their expressions
We computed the Spearman correlation values between copy number and expression across all the samples of the computed miRNAs of miRDriver in eighteen different cancer types (Supplemental Figure S19). As expected, we observed that most miRNAs had a positive correlation between their copy number and expression. There were also some negative correlations, but this is not surprising as miRNA expression is dependent on regulatory factors beyond copy number events, too. Despite this, the positive median distribution of correlations across all cancer types supports our hypothesis that miRNA expression in copy number areas may be predictive of DE trans gene expression variation.
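A Spearman correlation of this kind is the Pearson correlation of the ranks. The sketch below is a small standard-library version with average ranks for ties; the function names and the toy copy number and expression values are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation between two equally long sequences."""
    return pearson(average_ranks(x), average_ranks(y))


copy_number = [-0.4, 0.0, 0.3, 0.9]      # hypothetical miRNA-centric values
expression = [10.0, 20.0, 30.0, 45.0]    # hypothetical expression values
print(spearman(copy_number, expression))
```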
Selected high-degree genes were more significant than low-degree genes as potential biomarkers to predict prognosis in cancer patients in several cancer types
We defined high-degree target genes as the genes targeted by four or more miRNAs and low-degree target genes as the genes targeted by only one miRNA, and computed the hazard ratio (HR) for the genes in both groups. We performed the multivariate Cox regression analysis84 using these genes. Because the number of high-degree target genes was low, we computed the effect size using the r-value of the Mann–Whitney test on |ln (HR)|. A higher |ln (HR)| implies a stronger association between an increase or decrease in gene expression and the risk of an event. The r-value was negative if the |ln (HR)| values in the high-degree group were higher than in the low-degree group and positive otherwise. We used OS, PFI, DSS and DFI as clinical endpoints in this analysis. We ran this analysis on fifteen different cancer types, omitting the cancer types with no high-degree target gene (THCA and PRAD) and no clinical endpoint (LAML). In our previous work17 with BRCA and OV, we discussed the significance of high-degree target genes; hence, we omitted these two cancer types as well, leaving thirteen cancer types for this analysis. Although the Wilcoxon rank-sum test p-values for the comparison between the boxplots of the two groups were insignificant (p-value > 0.05), we found negative r-values in most of the cancer types (see Fig. 4). The hazard ratio boxplots of all thirteen cancer types with r-values in different clinical endpoints can be found in Supplemental Figure S20–S23. Table 13 shows the high-degree target genes with OS in seven cancer types that had negative r-values. According to OncoScore, a high percentage (≥ 50%) of the total citations of these genes in the biomedical literature were in cancer-related work. The entire list of high-degree genes with OncoScore frequencies has been provided in Supplemental Table S7.
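One way to obtain an effect size with the sign convention described above is r = Z / sqrt(N), with Z taken from the normal approximation of the Mann–Whitney U statistic of the low-degree group. The exact formula used in the study is not given, so the sketch below is an assumption (no tie correction in the variance); the function names and hazard ratios are hypothetical.

```python
from math import log, sqrt


def abs_log_hr(hazard_ratios):
    """|ln(HR)| for each hazard ratio."""
    return [abs(log(hr)) for hr in hazard_ratios]


def mann_whitney_r(low_group, high_group):
    """Effect size r = Z / sqrt(N) from the Mann-Whitney U of the low group.

    r is negative when values in high_group tend to exceed those in low_group.
    """
    n1, n2 = len(low_group), len(high_group)
    # U counts the pairs in which the low-group value is larger (ties count 0.5).
    u_low = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in low_group
        for b in high_group
    )
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_low - mean_u) / sd_u
    return z / sqrt(n1 + n2)


low = abs_log_hr([1.1, 0.9, 1.2])    # hypothetical low-degree gene HRs
high = abs_log_hr([2.0, 0.4, 1.8])   # hypothetical high-degree gene HRs
print(mann_whitney_r(low, high))
```

Taking the absolute value of ln(HR) treats protective and harmful associations alike, so the comparison reflects the strength of the association rather than its direction.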
Materials and methods
All the experiments were conducted in accordance with relevant guidelines and regulations.
Running miRDriver on pan-cancer
In this study, we conducted a pan-cancer analysis where we applied the miRDriver R package to identify copy-number derived miRNA-gene interactions. We integrated gene expression, CNA, DNA methylation, TF-gene interactions and miRNA expression data from eighteen different cancer types (Table 1). miRDriver has four computational steps: GISTIC Step, DE Step, REGULATOR Step, and LASSO Step. In the following paragraphs, we described the miRDriver R functions to run these steps. The entire pipeline of miRDriver running on pan-cancer is illustrated in Fig. 1.
To mine miRNAs that reside in the aberrated chromosomal regions of cancer patients, in the first step (i.e., GISTIC Step), we computed frequently aberrated chromosomal regions, namely, GISTIC regions, for eighteen different cancer cohorts. We utilized segmented chromosomal copy number profiles of each cancer cohort as inputs in GISTIC 2.085 tool in GenePattern86 webserver and computed chromosomal regions that were frequently aberrated within each patient cohort using a confidence interval of 0.90. The GISTIC regions with a log2 ratio above 0.1 were considered amplified and the GISTIC regions with a log2 ratio below −0.1 were considered deleted. We further processed the GISTIC regions of each cancer type using the getRegionWiseGistic function in the miRDriver R package to gather patients from each region with their aberration status (i.e., aberrated and non-aberrated).
In the second step (i.e., the DE Step), we computed DE genes for each GISTIC region. We computed these DE genes between frequently aberrated and non-aberrated patient sample groups in each cancer type cohort using getDifferentiallyExpressedGenes function in miRDriver with default parameters. This function employed edgeR87 package in R utilizing mRNA raw counts to compute DE genes among these two groups using absolute log fold change (logFC) ≥ 1 and adjusted p-value < 0.05. Using the makingCisAndTransGenes function, we annotated DE genes located inside the GISTIC region as cis genes and DE genes outside of the GISTIC region as trans genes. This step also retrieves the miRNAs (i.e., cis miRNAs) in each GISTIC region. Since the number of cis miRNAs per GISTIC region was extremely low, to avoid reducing the sensitivity and precision of our findings, we did not further filter cis miRNAs based on differential expression. The counts of trans genes, cis genes and cis miRNAs for each GISTIC region in eighteen different cancer types can be accessed from Supplemental Table S8.
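The two operations in this step, thresholding the differential expression results and splitting DE genes by location, can be sketched as below. This is not the miRDriver code: the function names and coordinates are hypothetical, and "inside the region" is assumed here to mean that the gene lies entirely within the region on the same chromosome.

```python
def is_differentially_expressed(log_fc, adj_p, min_abs_log_fc=1.0, max_adj_p=0.05):
    """Apply the |logFC| >= 1 and adjusted p-value < 0.05 thresholds."""
    return abs(log_fc) >= min_abs_log_fc and adj_p < max_adj_p


def split_cis_trans(de_genes, region):
    """Split DE genes into cis (inside the region) and trans (outside).

    de_genes: dict mapping gene -> (chromosome, start, end)
    region: (chromosome, start, end) of one GISTIC region
    """
    r_chrom, r_start, r_end = region
    cis, trans = [], []
    for gene, (chrom, start, end) in de_genes.items():
        inside = chrom == r_chrom and start >= r_start and end <= r_end
        (cis if inside else trans).append(gene)
    return cis, trans


# Hypothetical edgeR-style results: gene -> (logFC, adjusted p-value, location).
results = {
    "GENE_A": (1.8, 0.001, ("chr8", 1_200, 1_900)),
    "GENE_B": (-1.2, 0.030, ("chr3", 5_000, 5_800)),
    "GENE_C": (0.4, 0.001, ("chr8", 1_300, 1_500)),
}
de_genes = {
    gene: loc
    for gene, (log_fc, adj_p, loc) in results.items()
    if is_differentially_expressed(log_fc, adj_p)
}
print(split_cis_trans(de_genes, ("chr8", 1_000, 2_000)))
```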
In the REGULATOR Step (i.e., the third step) of miRDriver, we collected all the potential predictors, namely, cis genes, cis miRNAs, gene-centric copy number data, gene-centric DNA methylation beta values and TFs in each cancer type that could influence each DE trans gene's expression. We used the getTransGenePredictorFile function to gather all the predictors. This function only considered those trans genes that had at least one cis miRNA as a possible predictor.
In the LASSO Step, we computed the potential cis miRNAs that regulate the DE trans genes' expression variation. We used the lassoParallelForTransGene function in the miRDriver R package that utilized R package glmnet88 to perform LASSO to compute miRNA regulators of the DE trans genes. This function considered the gene-centric copy number, gene-centric DNA methylation, TFs, miRNA expression as independent variables and the trans gene's expression as the response variable. For each trans gene, out of all its candidate predictors (i.e., independent variables), LASSO selected a set of non-zero coefficient predictors. Since the independent variables selected by LASSO have been shown to be inconsistent, especially when the sample size gets large89, we ran LASSO 100 times for each trans gene and kept the cis miRNAs selected by LASSO at least 70 times. We found that miRNAs with threshold 70 to be the most consistent set of potential regulator miRNAs to be considered in the computed miRNA-gene interaction networks in each cancer type cohort (Supplemental Fig. S24). To optimize the regularization parameter λ of LASSO, for each of 100 runs, we applied 10-fold cross-validation and picked λ that provided the simplest model with the minimum cross-validation error.
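The repeated-run filter described above can be illustrated with a simple selection-frequency count. The sketch assumes the 100 LASSO fits have already been performed elsewhere and that each run yields the names of its non-zero predictors; the function name and the predictor names are hypothetical.

```python
from collections import Counter


def stable_mirnas(selected_per_run, candidate_mirnas, min_count=70):
    """Return the cis miRNAs selected in at least min_count LASSO runs.

    selected_per_run: one iterable of non-zero predictor names per run
    candidate_mirnas: set of cis miRNA names among the predictors
    """
    counts = Counter()
    for selected in selected_per_run:
        # Count each miRNA at most once per run; ignore non-miRNA predictors.
        counts.update(set(selected) & candidate_mirnas)
    return {mirna for mirna, count in counts.items() if count >= min_count}


# Hypothetical outcome of 100 runs for one trans gene.
runs = (
    [["miR-x", "miR-y", "TF1"]] * 60
    + [["miR-x", "TF1"]] * 20
    + [["TF1"]] * 20
)
print(stable_mirnas(runs, {"miR-x", "miR-y"}))
```

Here miR-x is selected in 80 runs and kept, while miR-y is selected in only 60 runs and dropped; TF1 is never reported because it is not a candidate miRNA.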
Although miRNAs typically cause the inverse regulation of their target genes59,60,61, miRDriver considers both positively and negatively correlated miRNA-target pairs for each cancer type. Since miRDriver computes miRNA-gene interactions that could be direct or indirect interactions, a positive correlation between them is also possible. Furthermore, a positive correlation between miRNAs and their direct targets is also possible90,91,92,93. The computed miRNA-gene interactions in eighteen different cancer types can be accessed from Supplemental Table S9.
Running state-of-the-art methods
We compared miRDriver with five state-of-the-art methods, namely, ARACNe5, ProMISe6, HiddenICP7, idaFast8 and jointIDA9, by running them on datasets from eighteen cancer types in TCGA. Since these methods can only utilize gene expression data, we used gene expression data to compute miRNA-gene interaction networks for our comparison. For ARACNe, ProMISe and HiddenICP, we used the same number of input genes and miRNAs that we used in miRDriver for each cancer type. Since the idaFast and jointIDA methods have high computational complexity and therefore are not scalable to large datasets, we ran these two methods with ≤ 50 top miRNAs and ≤ 1500 top genes selected by Feature Selection Based on The Most Variant Median Absolute Deviation (FSbyMAD)94 for each cancer type. After running ARACNe, we selected all of the miRNA-gene interactions that had non-zero scores to be compared with our method. For ProMISe, HiddenICP, idaFast and jointIDA, we considered the top 3, 3, 3.5 and 3.5 percentile of miRNA-gene interactions based on reported scores, respectively. These thresholds were chosen based on our previous work with the breast cancer cohort to get highly confident gene-miRNA interactions for comparison and were used for all eighteen cancer types. The details of running these methods can be found in our previous publication17.
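The two filtering operations mentioned above can be sketched as follows. The first function ranks features by their median absolute deviation (MAD), which is one reading of FSbyMAD and not necessarily its exact implementation; the second keeps the highest-scoring percentage of interactions, assuming that higher scores mean higher confidence. All names and values are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a sequence of numbers."""
    values = list(values)
    med = median(values)
    return median(abs(v - med) for v in values)


def top_by_mad(expression, k):
    """Return the k features with the largest MAD across samples.

    expression: dict mapping feature name -> list of values across samples
    """
    ranked = sorted(expression, key=lambda f: mad(expression[f]), reverse=True)
    return ranked[:k]


def top_percentile(scored_interactions, percent):
    """Keep the highest-scoring percent of (miRNA, gene, score) tuples."""
    n_keep = max(1, int(len(scored_interactions) * percent / 100))
    ranked = sorted(scored_interactions, key=lambda t: t[2], reverse=True)
    return ranked[:n_keep]


expression = {"a": [1, 1, 1, 1], "b": [0, 5, 10, 15], "c": [1, 2, 3, 4]}
print(top_by_mad(expression, 2))

interactions = [(f"miR-{i}", f"gene{i}", float(i)) for i in range(200)]
print(len(top_percentile(interactions, 3)))
```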
Datasets to run miRDriver on pan-cancer
In this study, we utilized gene expression, CNA, DNA methylation, TF-gene interaction and miRNA expression data from eighteen different cancer types. We used the R Bioconductor package TCGAbiolinks77 to download the genomic data of cancer patient samples from TCGA. We retrieved gene expression quantification data as raw counts (Illumina HiSeq) and RNA sequencing data as FPKM (Fragments Per Kilobase of transcript per Million mapped reads) for all the cancer types. TCGA gene expression data consist of mRNAs (i.e., messenger RNAs), lncRNAs, and pseudogenes. Thus, our analysis considered all these RNAs.
We downloaded miRNA gene quantification expression data with file type hg19.mirbase20.mirna and isoform gene quantification data with file type hg19.mirbase20.isoform from the legacy data of TCGA. For each cancer type, we used the miRNAs that had an RPM (reads per million mapped reads) value ≥ 0.01 in ≥ 30% of the cohort.
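The miRNA expression filter above can be sketched in a few lines of Python. This is an illustration only: the function name `filter_mirnas`, the dictionary layout and the toy RPM values are hypothetical, and the actual filtering in the study was done within the R workflow.

```python
def filter_mirnas(rpm, min_rpm=0.01, min_fraction=0.30):
    """Return miRNAs with RPM >= min_rpm in at least min_fraction of samples.

    rpm: dict mapping miRNA name -> list of RPM values, one per sample.
    """
    kept = []
    for mirna, values in rpm.items():
        if not values:
            continue  # no samples, nothing to evaluate
        expressed = sum(1 for v in values if v >= min_rpm)
        if expressed / len(values) >= min_fraction:
            kept.append(mirna)
    return kept

# Toy cohort of 10 samples (hypothetical values).
rpm = {
    "miR-X": [0.5, 0.2, 0.0, 0.0, 1.3, 0.0, 0.0, 0.02, 0.0, 0.0],  # 4/10 expressed
    "miR-Y": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.05],  # 2/10 expressed
}
kept = filter_mirnas(rpm)
print(kept)
```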
We retrieved masked copy number variation (Affymetrix SNP Array 6.0) and computed the gene-centric copy number value compatible with hg38 using the R Bioconductor package CNTools95.
We downloaded DNA methylation data of Infinium HumanMethylation27 Bead-Chip (27K) and Infinium HumanMethylation450 Bead-Chip (450K) platforms from TCGA. Gene-specific beta values were calculated separately for both platforms. For the 450K platform, the average beta value for promoter-specific probes was considered due to their role in transcriptional silencing96. Given lower coverage in the 27K platform, we utilized all the probes. In this case, we set the DNA methylation of a gene as the average beta values of all its probes.
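The gene-level methylation rule described above (promoter probes only for 450K, all probes for 27K) can be sketched as follows. The function name `gene_beta`, the tuple representation of probes and the toy beta values are assumptions for illustration; how probes were annotated as promoter-specific is not shown here.

```python
from statistics import mean

def gene_beta(probes, platform):
    """Average beta value for one gene.

    probes: list of (beta, is_promoter) tuples for the gene's probes.
    platform: "450K" (promoter probes only) or "27K" (all probes).
    Returns None when no probe qualifies.
    """
    if platform == "450K":
        values = [beta for beta, is_promoter in probes if is_promoter]
    elif platform == "27K":
        values = [beta for beta, _ in probes]
    else:
        raise ValueError("unknown platform: %r" % platform)
    return mean(values) if values else None

# Toy gene with two promoter probes and one gene-body probe (hypothetical values).
probes = [(0.125, True), (0.375, True), (1.0, False)]
beta_450k = gene_beta(probes, "450K")
beta_27k = gene_beta(probes, "27K")
print(beta_450k, beta_27k)
```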
We downloaded experimentally-validated TF-gene interactions from TRED and TRRUST databases to incorporate TF-gene interactions in the LASSO step. Table 1 shows the sample sizes of different data modalities used in this study for eighteen different cancer types from TCGA.
Datasets to evaluate miRDriver
To check the correlation between copy number and expression across all the samples of the computed miRNAs of miRDriver, we used TCGA's masked copy number variation (Affymetrix SNP Array 6.0) data. We utilized the R Bioconductor package CNTools95 to compute the miRNA-centric copy number value by giving miRNA coordinates extracted from the TCGA's legacy data file type hg19.mirbase20.isoform.
To evaluate if the miRNAs computed by miRDriver were enriched in cancer-related miRNAs, we downloaded a list of 351 known oncogenic miRNAs from the oncomiRDB database97. Each miRNA listed in oncomiRDB is involved in at least one cancer-related phenotype or cellular process. We harmonized the names of the oncomiRDB miRNAs according to the miRBase98 database.
To check if the miRNA-gene interactions computed by miRDriver were significantly enriched in the known miRNA-gene interactions, we performed a hypergeometric test for the computed target genes of each miRNA. For this purpose, we compiled a list of experimentally-validated miRNA-gene interactions from the miRTarBase v6.1, TarBase v7.0 and miRWalk databases99 as our ground truth data. Considering that miRDriver could compute direct targets and the indirect downstream targets (i.e., targets of the direct targets), we included potential indirect targets in the ground truth dataset. Hence, for each miRNA-gene interaction where the gene was a known TF, we included the experimentally-validated targets of this TF obtained from the TRED and TRRUST databases.
To assess the prognostic relevance of the miRDriver-selected miRNAs as clinical biomarkers, we performed multivariate survival analysis82 and multivariate Cox regression84. We downloaded the clinical data for eighteen different cancer types using TCGAbiolinks77. We included age, race, gender, stage, and grade as independent variables whenever they were available (see Table 14).
We considered four different endpoints, namely, OS, PFI, DSS and DFI. In OS, patients who were dead from any cause were considered as dead, otherwise censored. In PFI, a patient was considered as "event occurred" if the patient had a new tumor event, whether it was a progression of the disease, local recurrence, distant metastasis or a new primary tumor, or died with cancer without a new tumor event; cases with a new tumor event whose type was N/A were also counted as events. All other patients were censored. DFI was similar to PFI, except that patients with new primary tumors in other organs were censored, and patients who were dead with tumors without a new tumor event and patients with stage IV disease were excluded. In DSS, disease-specific survival time in days, last contact days, or death days, whichever was larger, was used to identify "event occurred" versus censored patients100.
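For readers less familiar with the enrichment test, the following Python sketch computes an upper-tail hypergeometric p-value for one miRNA. The parameterization (N background genes, K known targets among them, n computed targets, k of which are known) and the function name are assumptions for illustration; the source does not state which background set or software routine was used.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of background genes
    K: number of known (ground truth) targets of the miRNA in the background
    n: number of targets computed for the miRNA
    k: number of computed targets that are also known targets
    """
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Toy example: 10 background genes, 4 known targets, 3 computed, 2 overlapping.
p_value = hypergeom_upper_tail(k=2, N=10, K=4, n=3)
print(p_value)
```

In the toy case the probability equals (C(4,2)·C(6,1) + C(4,3)·C(6,0)) / C(10,3) = 40/120, i.e., one third.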
We checked the subtype-specific association of gene expression of computed target genes in BRCA, LGG, KIRC, LUSC and PAAD. We used the R package TCGAbiolinks77 to download the different subtype labels for the different cancer types.
Discussion
We developed a computational pipeline called miRDriver, which integrates multi-omics datasets such as CNA, DNA methylation, TFs, and gene and miRNA expression to infer copy number-derived miRNA-gene interactions in cancer. In the current study, we extended the use of miRDriver with an R package and carried out a comprehensive and rigorous pan-cancer characterization of TCGA samples to infer miRNA-gene interaction networks by integrating multi-omics datasets. We focused on the DNA aberration regions of 7294 cancer samples associated with eighteen different cancer types, uncovering the tissue-specific omics interplay in establishing the miRNA–gene associations. miRDriver outperformed several existing methods in all of the cancer types used in the study. In each case, miRDriver was able to select many miRNA-gene interactions enriched in known miRNA-target databases. We observed that the miRNAs selected by miRDriver were significantly enriched in the known cancer-related miRNAs.
Several cancer-related biological pathways and GO terms were found to be enriched in the computed genes. Among these, GPCR-related pathways, which play crucial roles in tumor development, invasion, migration, survival, and metastasis, were enriched in ten or more cancer types. More than 40% of the total computed genes were cited in cancer-related studies based on OncoScore frequency. Among these, at least 50% of genes had more than ten cancer-related citations.
We highlighted 22 common miRNAs that were frequently selected in multiple cancer types and explored their prognostic roles. Several of these miRNAs had significant survival differences between high- and low-expression patient sample groups. Among these, miRNAs belonging to the let-7 family were found to act as both tumor suppressors and oncogenes in several studies101. miR-100, miR-149, miR-210, miR-31, miR-346, miR-34b, miR-486 and miR-675 were cited in cancer-related studies with high OncoScore frequency. We found several enriched GO terms with the computed targets of these 22 common miRNAs. Among these, GO terms such as "Regulation of gene silencing by miRNA" and "Regulation of post-transcriptional gene silencing" were implicated in several cancer-related studies explaining the miRNAs' roles in cancer initiation and progression53,102. The GO term "Chromatin silencing" was involved in cancer49,103. The GO term "DNA replication-dependent nucleosome assembly" has been studied concerning cell fate and differentiation regulation and suggested to be explored in cancer in a recent study104.
We also assessed these common miRNAs as non-invasive biomarkers, i.e., as circulating miRNAs that can be detected in body fluids after being released by tumor cells. For this purpose, we queried the miRandola105 database, a knowledge base of extracellular circulating miRNAs, with these 22 miRNAs to infer their relevance as non-invasive biomarkers. We found ten of the 22 common miRNAs, namely let-7b, miR-100, miR-1249, miR-149, miR-210, miR-31, miR-346, miR-34b, miR-486 and miR-675, to be potential non-invasive biomarkers.
Although there were common miRNAs across multiple cancer types, there were not many common miRNA-gene interactions. Only fourteen interactions were shared by at least two cancer types among ~ 10,000 computed interactions. Considering that the number of target genes used in this analysis was much higher than the number of miRNAs, these findings were not surprising. We discussed several of these interactions that were reported in experimental studies.
We identified several cancer driver genes targeted by multiple miRNAs (i.e., high-degree genes) across different cancer types. Also, high-degree target genes were shown to have a strong association with the molecular subtypes in multiple cancer types, such as BRCA, LGG, LUSC and PAAD. Specifically, in BRCA, 106 high-degree genes (three of which were also PAM50 genes) were found to serve as subtype-specific gene signatures with high classification accuracy with respect to the baseline PAM50 gene-based subtypes. We compared the prognostic significance of low-degree target genes with that of high-degree target genes in disease progression and survival hazards. We discovered high-degree genes to be more significant prognostic factors than low-degree genes. These findings suggest that multiple miRNAs acting in coordination can impact gene expression more strongly than a single miRNA.
The presented pan-cancer-wide analysis discovering copy number-aberration-influenced miRNA-target associations may be used in future experimental work to validate the roles of the miRNAs in context-specific gene regulation and to derive even greater confidence in their tissue-specific associations. We integrated several potential co-regulators, such as CNA, DNA methylation, miRNA expression and TFs, that can influence a trans gene's expression in the LASSO step. Other potential regulators such as histone modification and chromatin accessibility (such as ATAC-seq) could also be integrated. miRDriver outperformed the existing sequence-based ceRNA inference tool, Cupid. This comparison suggests that the work could be extended by taking into account the presence of recognized target sites that contribute to gene regulation, as well as by utilizing ceRNA interactions to improve the inferred miRNA-gene networks. miRDriver computes both direct and indirect targets of miRNAs, which helps decipher the downstream biological processes and pathways regulated by these miRNAs. To identify the direct targets of these selected miRNAs, one could utilize sequence-based filtering.
Finally, in this study, we established miRDriver as an R software package and provided users with a variety of options for running our workflow with their preferred settings. Users can, for example, utilize the tool exclusively with up- or down-regulated genes from amplified or deleted regions, or both. However, in these cases, the context in which miRNA-gene interactions are discovered will limit their detection. To obtain the most comprehensive list of miRNA-gene interactions, we propose that users evaluate all of the directions. In the software, we have also included the flexibility to utilize user-defined TF-targets, with options to filter cancer-related TF-target interactions from the DoRothEA gene set resource106 by evidence-based confidence level. In this study, however, we used only the highly confident TF-target interactions from TRED and TRRUST in the LASSO step, as using many predictors in LASSO could affect its performance and cause false positive and false negative interactions. Furthermore, considering that gene expression is controlled at multiple levels, including transcriptional regulation and post-transcriptional regulation, our software provides the flexibility to run the LASSO step in two phases. In the first run, only the transcriptional predictors could be utilized to explain the expression variation. In the second run, post-transcriptional predictors and the residual of the first LASSO run can be utilized as the independent and dependent variables, respectively. Alternatively, if the user has the transcriptional and post-transcriptional expression change data, both LASSO runs can be performed in any order. The details of all these options can be accessed in the vignette of the miRDriver R package.
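The two-phase idea (fit transcriptional predictors first, then explain the residual with post-transcriptional predictors) can be sketched as below. This is a toy, pure-Python stand-in and not the miRDriver or glmnet code: the coordinate-descent solver, the function names, the fixed penalty `lam`, the absence of an intercept and the toy data are all assumptions made for illustration.

```python
def soft_threshold(z, g):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def predict(X, beta):
    """Linear prediction without intercept."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            beta[j] = 0.0
            fitted = predict(X, beta)  # fit without predictor j
            rho = sum(X[i][j] * (y[i] - fitted[i]) for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta

def two_phase_lasso(X_tx, X_post, y, lam):
    """Phase 1: transcriptional predictors; phase 2: residual vs post-transcriptional."""
    beta_tx = lasso_cd(X_tx, y, lam)
    residual = [yi - fi for yi, fi in zip(y, predict(X_tx, beta_tx))]
    beta_post = lasso_cd(X_post, residual, lam)
    return beta_tx, beta_post

# Toy data (hypothetical): y = 2 * tf - 1 * mir with orthogonal predictors.
X_tx = [[1.0], [1.0], [-1.0], [-1.0]]    # one TF-like predictor
X_post = [[1.0], [-1.0], [1.0], [-1.0]]  # one miRNA-like predictor
y = [1.0, 3.0, -3.0, -1.0]
beta_tx, beta_post = two_phase_lasso(X_tx, X_post, y, lam=0.1)
print(beta_tx, beta_post)
```

With this toy input, the first phase recovers a shrunken positive coefficient for the transcriptional predictor and the second phase a shrunken negative coefficient for the post-transcriptional one; in practice the penalty would be tuned by cross-validation rather than fixed.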
Data availability
The miRDriver pipeline was developed as an R package. The source code of the package is available at https://github.com/bozdaglab/miRDriver under the Creative Commons Attribution Non Commercial 4.0 International Public License. The scripts for running the pipeline and the evaluation results can be accessed from the supplementary documents. The datasets can be accessed from Figshare via https://figshare.com/s/7400ad8445b2e78e4636.
References
He, L. & Hannon, G. J. MicroRNAs: Small RNAs with a big role in gene regulation. Nat. Rev. Genet. 5, 522–531 (2004).
Esquela-Kerscher, A. & Slack, F. J. Oncomirs—MicroRNAs with a role in cancer. Nat. Rev. Cancer 6, 259–269 (2006).
Liu, W., Lv, C., Zhang, B., Zhou, Q. & Cao, Z. MicroRNA-27b functions as a new inhibitor of ovarian cancer-mediated vasculogenic mimicry through suppression of VE-cadherin expression. RNA 23, 1019–1027 (2017).
Parikh, A. et al. microRNA-181a has a critical role in ovarian cancer progression through the regulation of the epithelial–mesenchymal transition. Nat. Commun. 5, 1–16 (2014).
Margolin, A. A. et al. ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7, S7 (2006).
Li, Y., Liang, C., Wong, K.-C., Jin, K. & Zhang, Z. Inferring probabilistic miRNA–mRNA interaction signatures in cancers: a role-switch approach. Nucleic Acids Res. 42, e76 (2014).
Pham, V. V. et al. Identifying miRNA-mRNA regulatory relationships in breast cancer with invariant causal prediction. BMC Bioinformatics 20, 143 (2019).
Williams, J. Causal inference using invariant prediction: identification and confidence intervals | Max Planck Institute for Intelligent Systems. https://is.tuebingen.mpg.de/.
Le, T. D. et al. Inferring microRNA–mRNA causal regulatory relationships from expression data. Bioinformatics 29, 765–771 (2013).
Shlien, A. & Malkin, D. Copy number variations and cancer. Genome Med. 1, 62 (2009).
Taylor, B. S. et al. Functional copy-number alterations in cancer. PLoS ONE 3, e3179 (2008).
Bertoli, G., Cava, C. & Castiglioni, I. MicroRNAs: New biomarkers for diagnosis, prognosis, therapy prediction and therapeutic tools for breast cancer. Theranostics 5, 1122–1143 (2015).
Calin, G. A. et al. MiR-15a and miR-16-1 cluster functions in human leukemia. Proc. Natl. Acad. Sci. 105, 5166–5171 (2008).
Zhang, L. et al. microRNAs exhibit high frequency genomic alterations in human cancer. Proc. Natl. Acad. Sci. USA 103, 9136–9141 (2006).
Setty, M. et al. Inferring transcriptional and microRNA-mediated regulatory programs in glioblastoma. Mol. Syst. Biol. 8, 605 (2012).
Li, Y., Liang, M. & Zhang, Z. Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia. PLOS Comput. Biol. 10, e1003908 (2014).
Bose, B. & Bozdag, S. miRDriver: A Tool to Infer Copy Number Derived miRNA-Gene Networks in Cancer. in Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics 366–375 (Association for Computing Machinery, 2019). https://doi.org/10.1145/3307339.3342172.
Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996).
Ashburner, M. et al. Gene Ontology: Tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Braschi, B. et al. Genenames.org: The HGNC and VGNC resources in 2019. Nucleic Acids Res. 47, D786–D792 (2019).
Chiu, H.-S. et al. Cupid: Simultaneous reconstruction of microRNA-target and ceRNA networks. Genome Res. 25, 257–267 (2015).
Jassal, B. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 48, D498–D503 (2020).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Ulgen, E., Ozisik, O. & Sezerman, O. U. pathfindR: An R package for comprehensive identification of enriched pathways in omics data through active subnetworks. Front. Genet. 10, 858 (2019).
Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545 (2005).
Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: An R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
Gonzalez, H., Hagerling, C. & Werb, Z. Roles of the immune system in cancer: From tumor initiation to metastatic progression. Genes Dev. 32, 1267–1284 (2018).
Nicolini, A., Ferrari, P., Diodati, L. & Carpi, A. Alterations of signaling pathways related to the immune system in breast cancer: New perspectives in patient management. Int. J. Mol. Sci. 19, 2733 (2018).
Arakaki, A. K. S., Pan, W.-A. & Trejo, J. GPCRs in cancer: Protease-activated receptors, endocytic adaptors and signaling. Int. J. Mol. Sci. 19, 1886 (2018).
Bar-Shavit, R. et al. G protein-coupled receptors in cancer. Int. J. Mol. Sci. 17, 1320 (2016).
Murray, I. A., Patterson, A. D. & Perdew, G. H. Aryl hydrocarbon receptor ligands in cancer: friend and foe. Nat. Rev. Cancer 14, 801–814 (2014).
van Waarde, A. et al. Potential applications for sigma receptor ligands in cancer diagnosis and therapy. Biochim. Biophys. Acta 10, 2703–2714. https://doi.org/10.1016/j.bbamem.2014.08.022 (2015).
Nguyen-Vu, T. et al. Liver × receptor ligands disrupt breast cancer cell proliferation through an E2F-mediated mechanism. Breast Cancer Res. 15, R51 (2013).
Salik, B. et al. Targeting RSPO3-LGR4 signaling for leukemia stem cell eradication in acute myeloid leukemia. Cancer Cell 38, 263-278.e6 (2020).
Gong, X. et al. Aberrant RSPO3-LGR4 signaling in Keap1-deficient lung adenocarcinomas promotes tumor aggressiveness. Oncogene 34, 4692–4701 (2015).
Jiang, X. et al. miR-22 has a potent anti-tumour role with therapeutic potential in acute myeloid leukaemia. Nat. Commun. 7, 11452 (2016).
Wang, J. et al. Molecular mechanisms and clinical applications of miR-22 in regulating malignant progression in human cancer (Review). Int. J. Oncol. 50, 345–355 (2016).
Mhawech-Fauceglia, P. et al. Pax-5 immunoexpression in various types of benign and malignant tumours: a high-throughput tissue microarray analysis. J. Clin. Pathol. 60, 709–714 (2007).
Adler, E. K. et al. The PAX8 cistrome in epithelial ovarian cancer. Oncotarget 8, 108316–108332 (2017).
Belotte, J. et al. The role of oxidative stress in the development of cisplatin resistance in epithelial ovarian cancer. Reprod. Sci. 21, 503–508 (2014).
López-Urrutia, E., Bustamante Montes, L. P., Ladrón de Guevara Cervantes, D., Pérez-Plasencia, C. & Campos-Parra, A. D. Crosstalk between long non-coding RNAs, micro-RNAs and mRNAs: Deciphering molecular mechanisms of master regulators in cancer. Front. Oncol. 9, 669 (2019).
Paraskevopoulou, M. D. & Hatzigeorgiou, A. G. Analyzing MiRNA-LncRNA interactions. Methods Mol. Biol. 1402, 271–286 (2016).
Jiang, M.-C., Ni, J.-J., Cui, W.-Y., Wang, B.-Y. & Zhuo, W. Emerging roles of lncRNA in cancer and therapeutic opportunities. Am. J. Cancer Res. 9, 1354–1366 (2019).
Zhang, J. et al. The transcriptional landscape of lncRNAs reveals the oncogenic function of LINC00511 in ER-negative breast cancer. Cell Death Dis. 10, 1–16 (2019).
Jin, C., Rajabi, H. & Kufe, D. miR-1226 targets expression of the mucin 1 oncoprotein and induces cell death. Int. J. Oncol. 37, 61–69 (2010).
Ballestar, E. & Esteller, M. The impact of chromatin in human cancer: linking DNA methylation to gene silencing. Carcinogenesis 23, 1103–1109 (2002).
Sarthy, J. F., Henikoff, S. & Ahmad, K. Chromatin bottlenecks in cancer. Trends Cancer 5, 183–194 (2019).
Brock, M. V., Herman, J. G. & Baylin, S. B. Cancer as a manifestation of aberrant chromatin structure. Cancer J. 13, 3–8 (2007).
Foglizzo, M. et al. A bidentate Polycomb Repressive-Deubiquitinase complex is required for efficient activity on nucleosomes. Nat. Commun. 9, 3932 (2018).
Lu, Y. et al. Epigenetic regulation in human cancer: the potential role of epi-drug in cancer therapy. Mol. Cancer 19, 79 (2020).
Perri, F. et al. Epigenetic control of gene expression: Potential implications for cancer treatment. Crit. Rev. Oncol. Hematol. 111, 166–172 (2017).
Oliveto, S., Mancino, M., Manfrini, N. & Biffo, S. Role of microRNAs in translation regulation and cancer. World J. Biol. Chem. 8, 45–56 (2017).
Peng, Y. & Croce, C. M. The role of MicroRNAs in human cancer. Signal Transduct. Target Ther. 1, 1–9 (2016).
Lemoine, N. R. Silencing RNA: A novel treatment for pancreatic cancer?. Gut 54, 1215 (2005).
DeOcesano-Pereira, C. et al. Post-Transcriptional Control of RNA Expression in Cancer. Gene Expression and Regulation in Mammalian Cells - Transcription From General Aspects (IntechOpen, 2018). https://doi.org/10.5772/intechopen.71861.
Dhawan, A., Scott, J. G., Harris, A. L. & Buffa, F. M. Pan-cancer characterisation of microRNA across cancer hallmarks reveals microRNA-mediated downregulation of tumour suppressors. Nat. Commun. 9, 5228 (2018).
Tate, J. G. et al. COSMIC: The catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Ritchie, W., Rajasekhar, M., Flamant, S. & Rasko, J. E. J. Conserved Expression Patterns Predict microRNA Targets. PLOS Comput. Biol. 5, e1000513 (2009).
Catalanotto, C., Cogoni, C. & Zardo, G. MicroRNA in control of gene expression: An overview of nuclear functions. Int. J. Mol. Sci. 17, 1712 (2016).
Valencia-Sanchez, M. A., Liu, J., Hannon, G. J. & Parker, R. Control of translation and mRNA degradation by miRNAs and siRNAs. Genes Dev. 20, 515–524 (2006).
Zhang, Z., Wang, Y., Zhang, J., Zhong, J. & Yang, R. COL1A1 promotes metastasis in colorectal cancer by regulating the WNT/PCP pathway. Mol. Med. Rep. 17, 5037–5042 (2018).
Duah, E. et al. Cysteinyl leukotriene 2 receptor promotes endothelial permeability, tumor angiogenesis, and metastasis. Proc. Natl. Acad. Sci. USA 116, 199 (2019).
Pellecchia, A. et al. Overexpression of ETV4 is oncogenic in prostate cells through promotion of both cell proliferation and epithelial to mesenchymal transition. Oncogenesis 1, e20–e20 (2012).
Ganaie, A. A. et al. Characterization of novel murine and human PDAC Cell models: Identifying the role of intestine specific homeobox gene ISX in hypoxia and disease progression. Transl. Oncol. 12(8), 1056–1071. https://doi.org/10.1016/j.tranon.2019.05.002 (2019).
Li, N.-F. et al. Genetic Variations in the KCNJ5 Gene in Primary Aldosteronism Patients from Xinjiang, China. PLoS ONE 8, e54051 (2013).
Yang, X. et al. NTRK1 is a positive regulator of YAP oncogenic function. Oncogene 38, 2778–2787 (2019).
Zhang, L. et al. SALL4, a novel marker for human gastric carcinogenesis and metastasis. Oncogene 33, 5491–5500 (2014).
Tabu, K. et al. A novel function of OLIG2 to suppress human glial tumor cell growth via p27Kip1 transactivation. J. Cell. Sci. 119, 1433–1441 (2006).
Pekow, J. et al. miR-4728-3p functions as a tumor suppressor in ulcerative colitis-associated colorectal neoplasia through regulation of focal adhesion signaling. Inflamm. Bowel Dis. 23, 1328–1337 (2017).
Yu, Q. et al. miRNA-346 promotes proliferation, migration and invasion in liver cancer. Oncol. Lett. 14, 3255–3260 (2017).
An, T. et al. Comparison of alterations in miRNA expression in matched tissue and blood samples during spinal cord glioma progression. Sci. Rep. 9, 9169 (2019).
Sun, C.-C. et al. The lncRNA PDIA3P interacts with miR-185-5p to modulate oral squamous cell carcinoma progression by targeting cyclin D2. Mol. Ther. Nucleic Acids 9, 100–110. https://doi.org/10.1016/j.omtn.2017.08.015 (2017).
Yan, W., Liu, Z., Yang, W. & Wu, G. miRNA expression profiles in Smad4-positive and Smad4-negative SW620 human colon cancer cells detected by next-generation small RNA sequencing. Cancer Manag. Res. 10, 5479–5490 (2018).
Canlorbe, G. et al. Identification of microRNA expression profile related to lymph node status in women with early-stage grade 1–2 endometrial cancer. Mod. Pathol. 29, 391–401 (2016).
Zhang, J., Luo, X., Li, H., Deng, L. & Wang, Y. Genome-wide uncovering of STAT3-mediated miRNA expression profiles in colorectal cancer cell lines. Biomed Res. Int. 2014, 187105 (2014).
Colaprico, A. et al. TCGAbiolinks: An R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 44, e71 (2016).
McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: Uniform manifold approximation and projection. J. Open Sour. Softw. 3, 861 (2018).
Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1160–1167 (2009).
Chollet, F. et al. R Interface to Keras. https://github.com/rstudio/keras (2017).
Collisson, E. A., Bailey, P., Chang, D. K. & Biankin, A. V. Molecular subtypes of pancreatic cancer. Nat. Rev. Gastroenterol. Hepatol. 16, 207–220 (2019).
Borgne, F. L. & Foucher, Y. IPWsurvival: Propensity Score Based Adjusted Survival Curves and Corresponding Log-Rank Statistic (2017).
Sano, L. D., Passerini, C. G., Piazza, R., Ramazzotti, D. & Spinelli, R. OncoScore: A tool to identify potentially oncogenic genes (Bioconductor version: Release (3.11), 2020). https://doi.org/10.18129/B9.bioc.OncoScore.
Bradburn, M. J., Clark, T. G., Love, S. B. & Altman, D. G. Survival analysis part II: Multivariate data analysis—An introduction to concepts and methods. Br. J. Cancer 89, 431–436 (2003).
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
Reich, M., Liefeld, T., Tamayo, P. & Mesirov, J. GenePattern 2.0. Nat. Genet. 38(5), 500–501 (2006).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Friedman, J. H., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. https://doi.org/10.18637/jss.v033.i01 (2010).
Tibshirani, R. J. The lasso problem and uniqueness. Electron. J. Statist. 7, 1456–1490 (2013).
Couzigou, J.-M. et al. Positive gene regulation by a natural protective miRNA enables arbuscular mycorrhizal symbiosis. Cell Host Microbe 21, 106–112 (2017).
Vasudevan, S. & Steitz, J. A. AU-rich-element-mediated upregulation of translation by FXR1 and Argonaute 2. Cell 128, 1105–1118 (2007).
Vasudevan, S., Tong, Y. & Steitz, J. A. Switching from repression to activation: MicroRNAs can up-regulate translation. Science 318, 1931–1934 (2007).
Xiao, M. et al. MicroRNAs activate gene transcription epigenetically as an enhancer trigger. RNA Biol. 14, 1326–1334 (2017).
Xu, T. & Thuc, L. FSbyMAD: Biological feature (such as gene) selection based on the most... in CancerSubtypes: Cancer subtypes identification, validation and visualization based on multiple genomic data sets. https://rdrr.io/bioc/CancerSubtypes/man/FSbyMAD.html.
Zhang, J. CNTools: Convert segment data into a region by sample matrix to allow for other high level computational analyses. R package version 1.40.0 (2019).
Maunakea, A. K. et al. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466, 253–257 (2010).
Wang, D., Gu, J., Wang, T. & Ding, Z. OncomiRDB: A database for the experimentally verified oncogenic and tumor-suppressive microRNAs. Bioinformatics 30, 2237–2238 (2014).
Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A. & Enright, A. J. miRBase: MicroRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 34, D140–D144 (2006).
Karagkouni, D. et al. DIANA-TarBase v8: A decade-long collection of experimentally supported miRNA–gene interactions. Nucleic Acids Res. 46, D239–D245 (2018).
Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400-416.e11 (2018).
Chirshev, E., Oberg, K. C., Ioffe, Y. J. & Unternaehrer, J. J. Let-7 as biomarker, prognostic indicator, and therapy for precision medicine in cancer. Clin. Transl. Med. https://doi.org/10.1186/s40169-019-0240-y (2019).
Macfarlane, L.-A. & Murphy, P. R. MicroRNA: Biogenesis, function and role in cancer. Curr. Genomics 11, 537–561 (2010).
Hon, G. C. et al. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res. 22, 246–258 (2012).
Serra-Cardona, A. & Zhang, Z. Replication-coupled nucleosome assembly in the passage of epigenetic information and cell identity. Trends Biochem. Sci. 43, 136–148 (2018).
Russo, F. et al. miRandola 2017: A curated knowledge base of non-invasive biomarkers. Nucleic Acids Res. 46, D354–D359 (2018).
Garcia-Alonso, L., Holland, C. H., Ibrahim, M. M., Turei, D. & Saez-Rodriguez, J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 29, 1363–1375 (2019).
Acknowledgements
This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM133657.
Author information
Contributions
B.B. and S.B conceived the study, B.B conducted the study, S.B supervised the study, B.B and M.M developed the software, B.B wrote the manuscript, B.B and S.B reviewed and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bose, B., Moravec, M. & Bozdag, S. Computing microRNA-gene interaction networks in pan-cancer using miRDriver. Sci Rep 12, 3717 (2022). https://doi.org/10.1038/s41598-022-07628-z
DOI: https://doi.org/10.1038/s41598-022-07628-z