Novel Monte Carlo approach quantifies data assemblage utility and reveals power of integrating molecular and clinical information for cancer prognosis

Verleyen, Wim; Langdon, Simon P.; Faratian, Dana; Harrison, David J.; Smith, V. Anne

doi:10.1038/srep15563

Article
Open access
Published: 27 October 2015

Volume 5, article number 15563, (2015)

Scientific Reports

Wim Verleyen¹, Simon P. Langdon², Dana Faratian², David J. Harrison³ & V. Anne Smith¹

Abstract

Current clinical practice in cancer stratifies patients based on tumour histology to determine prognosis. Molecular profiling has been hailed as the path towards personalised care, but molecular data are still typically analysed independently of known clinical information. Conventional clinical and histopathological data, if used, are added only to improve a molecular prediction, placing a high burden upon molecular data to be informative in isolation. Here, we develop a novel Monte Carlo analysis to evaluate the usefulness of data assemblages. We applied our analysis to varying assemblages of clinical data and molecular data in an ovarian cancer dataset, evaluating their ability to discriminate one-year progression-free survival (PFS) and three-year overall survival (OS). We found that Cox proportional hazard regression models based on both data types together provided greater discriminative ability than either alone. In particular, we show that proteomics data assemblages that alone were uninformative (p = 0.245 for PFS, p = 0.526 for OS) became informative when combined with clinical information (p = 0.022 for PFS, p = 0.048 for OS). Thus, concurrent analysis of clinical and molecular data enables exploitation of prognosis-relevant information that may not be accessible from independent analysis of these data types.

Introduction

Most current clinical oncology practice stratifies patients based on tumour histology to inform prognosis. Molecular analyses are heralded as the solution for personalised medicine¹, yet most such analyses view patients in segmented populations, either comparing molecular signatures across clinical and pathological categories^2,3,4,5,6 or evaluating clinicopathological characteristics of clusters based upon molecular features^7,8,9,10. This tends to underestimate the proven value of clinical and pathological information. When clinical and pathological information is used in combination with molecular analyses, it is typically in a post-hoc manner, that is, attempting to improve a molecular model with clinical information¹¹. This places a high burden on molecular data, as it is required to be useful in isolation before the sequential addition of clinicopathological data. Here, we investigate a more integrative approach, using ovarian cancer as an example, where we analyse molecular and clinical data in concert. We take the point of view that molecular data should not replace traditional clinical pathology, but instead add to it.

We show the added value of molecular data in ovarian cancer, a disease with particularly poor prognosis: despite often initially good responses to chemotherapy, 65% of patients die within 5 years^12,13. There are no predictive biomarkers to direct specific treatment regimens¹⁴. Most patients undergo costly, neurotoxic platinum plus taxane therapy, though 20–30% do not respond. Alternative therapy with platinum only or, less commonly, lower toxicity agents can sometimes be equally effective^12,15,16,17. Thus, personalising prognosis to enable better selection of these treatment options would be of great benefit in ovarian cancer.

We take advantage of the Edinburgh Ovarian Cancer Database¹⁸, a resource in which molecular data are available on samples with complete histopathology plus clinical outcomes. We develop a novel Monte Carlo approach to quantify the usefulness of different data assemblages and show that while proteomics data has low information content alone, selected informative proteomic features have high information content when viewed in the context of clinicopathological data.

Results

We measured protein and phosphoprotein profiles of 339 clinically-annotated samples from the Edinburgh Ovarian Cancer Database (EOCD)¹⁸, including markers of proliferation, cell cycle, apoptosis, DNA damage response, estrogen signalling and epithelial to mesenchymal transition (EMT). We applied a Cox proportional hazards regression model (CPHR) for both progression-free survival (PFS) and overall survival (OS) to the proteomics data alone, the clinicopathological data alone and the combined proteomics and clinicopathological data (Fig. 1a–c; measures detailed in Table 1; data available in Supplementary Data S1 and described in Supplementary Table S1). The combined models had higher concordance (c-index)¹⁹ than either data type alone (Fig. 1d for PFS; results for OS shown in Supplementary Fig. S1), indicating a greater discriminative ability; however, both the proteomics and combined models showed significant differences in cross-validation, suggesting potential overfitting (Supplementary Table S2).

Table 1 Clinicopathological and proteomic measures.

We then developed a novel Monte Carlo (MC) method to assess the information content of variable assemblages, measuring their capacity to discriminate prognoses. We shuffled the values of the variables in question independently with respect to patient (Fig. 2), then built a CPHR, for each of 10,000 randomised datasets. A p-value was calculated as the proportion of randomised datasets with c-index equal to or above the actual model (one-tailed due to directional nature of the c-index). A high (non-significant) p-value indicates that the actual data discriminates prognoses little differently than does randomly assigned data and thus the information content in that data assemblage is low; a low p-value indicates high information content and significant discriminative capacity.
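The p-value described above is a simple proportion, which the following minimal sketch makes concrete. The function name `monte_carlo_p_value` and the example c-index values are hypothetical and chosen only for illustration; they are not taken from the study or its supplementary code.

```python
def monte_carlo_p_value(actual_score, null_scores):
    """One-tailed p-value: share of randomised scores at or above the actual score."""
    if not null_scores:
        raise ValueError("need at least one randomised score")
    at_or_above = sum(1 for s in null_scores if s >= actual_score)
    return at_or_above / len(null_scores)


# Hypothetical c-index values from ten randomised datasets (illustration only;
# the study used 10,000 randomised datasets per assemblage).
null_c_indices = [0.52, 0.55, 0.58, 0.61, 0.49, 0.63, 0.57, 0.60, 0.54, 0.66]

# An actual c-index that randomised data often matches: low information content.
p_weak = monte_carlo_p_value(0.56, null_c_indices)

# An actual c-index that no randomised dataset reaches: high information content.
p_strong = monte_carlo_p_value(0.70, null_c_indices)

print(p_weak, p_strong)
```

With 10,000 randomised datasets, a count of zero at or above the actual value corresponds to a reported P < 0.0001.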

The MC analysis revealed that the proteomic data alone had low information content (P = 0.889 for PFS, 0.617 for OS; Fig. 1e, Supplementary Fig. S1) while the clinicopathological data alone had high information content (P < 0.0001 for both PFS and OS; Fig. 1f, Supplementary Fig. S1). Since we were specifically interested in whether adding proteomics data to the already information-rich clinicopathological data was beneficial, we shuffled only the proteomics data in the combined model. This confirmed that the apparent increased discriminative ability of the combined model was an artefact (P = 0.530 for PFS, 0.117 for OS; Fig. 1g, Supplementary Fig. S1). This MC result held regardless of whether the c-index from the full model (as in Fig. 1) or a corrected c-index based on cross-validation was used (Supplementary Fig. S2).

We then applied LASSO feature selection²⁰ to the data before building our CPHR models, to select only the most informative measures. Again, the combined models had greater discriminative ability than either individual model (Fig. 1h, Supplementary Fig. S1); this time, cross-validation showed no significant differences from the full models (Supplementary Table S2). However, the MC analysis revealed more detail: proteomics data alone still had low information content (P = 0.245 for PFS, 0.526 for OS; Fig. 1i, Supplementary Fig. S1) and clinicopathological high information content (P < 0.0001 for both PFS and OS; Fig. 1j, Supplementary Fig. S1), while the combined models now showed significantly increased discriminative capacity due to the added proteomics (P = 0.022 for PFS, 0.048 for OS; Fig. 1k, Supplementary Fig. S1). Again, the MC result also held if a corrected c-index based on cross validation was used (Supplementary Fig. S2); thus, the significant increase was not due to overfitting in the context of the full model. Because only the proteomics data were shuffled in the combined model, the results in Fig. 1i and Fig. 1k are directly comparable: proteomics data, which alone had low information content, showed added value when used alongside clinicopathological information.

This was not true for the entire proteomics profile, however (Fig. 1e compared to Fig. 1g); thus, only carefully selected molecular measures can significantly increase discriminative ability above that provided by clinicopathological information. Figure 1i–k and Supplementary Fig. S1 show the features selected for PFS and OS, respectively.

Discussion

Our work demonstrates the power of concurrent integration of traditional histopathology plus newer molecular measures to create something greater than either alone. Using proteomic profiles of samples with complete clinicopathological data, we have shown how incorporating molecular alongside clinicopathological data improves survival analyses. In doing so, we have developed a novel Monte Carlo analysis to quantify the usefulness of data assemblages.

Machine learning methodologies in molecular analyses of cancer have been criticised for overfitting problems²¹ and we directly address this problem with our Monte Carlo analysis. We reveal data assemblages with low information content yet high performance, whose performance must then be due to overfitting. Where 10-fold cross validation of the c-index suggested overfitting issues, our MC analysis agreed, showing low information content for both proteomics alone and combined datasets with no feature selection. However, our MC analysis provided further information where cross-validation showed no significant differences, revealing low information content in selected proteomics features alone. Only when these proteomics features were combined with selected clinical features did they prove to be informative.

We found that feature selection before survival analysis is key to producing sensible information out of the molecular data. Using all available proteomic measures in addition to clinicopathological data at first appears to increase the discriminatory ability of survival analysis, but this is in fact due to overfitting. However, if feature selection is first applied, the addition of proteomic to clinicopathological data significantly increases the discriminatory ability of our CPHR model. The measures selected provide insights into the biology of ovarian cancer. E-cadherin is related to cell adhesion and its loss has been reported to be associated with poor survival^22,23,24. Caspase-3 perhaps indicates benefits of propensity to apoptosis and has been associated with more favourable patient outcomes^25,26. pH2AX is a marker of DNA damage repair, while expression of the Wilms’ tumour 1 (WT1) gene has been associated with poor prognosis in ovarian cancer^27,28. In contrast, nuclear beta-catenin expression has been associated with favourable outcomes in this disease^29,30,31.

There is merit in further examination of the data, because the details reveal important features. Comparing Fig. 1d,h reveals that the CPHR models that contain all the proteomic data are more discriminatory (higher c-index) than those with only selected proteomic measures; however, we know this is due to overfitting from the MC analysis (Fig. 1g). Yet even the selected proteomics measures alone have poor discrimination (c-index close to 0.5) and non-significant MC p-values (Fig. 1i), indicating low information content. Only when these selected proteomics measures are combined with clinicopathological measures do we see improvement in the c-index and significant information content revealed by MC analysis (Fig. 1k). In particular, this MC analysis is directly comparable to that with just proteomics: since only the proteomics variables are shuffled, only the information content of these proteomics measures is revealed. Thus, the information content of the proteomics differs depending on the context. The proteomic data, which alone were uninformative, added value when used alongside clinicopathological information.

The above shows the power of our MC approach for assessing data assemblages. The information content of a data set can be assessed as a whole by shuffling all variables; alternatively, shuffling only those additional variables assesses the benefit of adding specific measurements to an already useful group of features. Thus, we present a method of quantifying usefulness of measures when direct success of a model may be less meaningful due to overfitting concerns. This quantification methodology could be applied to evaluate the discriminative ability of features used to assess patient outcome in many diseases, a necessary step for personalised medicine.

Our work demonstrates the path towards a systems pathology approach for personalised medicine. We move beyond sequential application of clinicopathological and molecular data to stratify groups or to refine models. We analyse proteomics data in concert with traditional histology and clinical measures, enabling better discrimination than either alone. This was true even though the proteomics data was uninformative alone, a stage at which many such molecular studies might otherwise be abandoned. Our Monte Carlo-based assessment of information content can quantify the added value of new data, thus both enabling the identification of beneficial variable additions and avoiding overfitting. Our results generalise to other diseases where long-established pathological analyses already produce valuable information that should not be ignored.

Methods

Study Population

Formalin-fixed, paraffin-embedded ovarian tumour samples were obtained from the Edinburgh Ovarian Cancer Database (EOCD) as previously described^8,18. The data set consisted of 339 samples, which form a subset of those analysed in Faratian et al.⁸. This research was approved by the Lothian Research Ethics Committee (08/S1101/41).

Clinicopathological Measures

Samples in the EOCD were annotated with clinicopathological information, which was divided into “input” measures (those relating to patient, disease and treatment characteristics) and “output” measures (those relating to survival). A summary of the clinicopathological measures is shown in Table 1; data are available in Supplementary Data S1 and described in Supplementary Table S1. The output measure of progression-free survival (PFS) represents the number of days between the start of treatment and the first signs of cancer recurrence; overall survival (OS) represents the number of days between the first histological diagnosis and the day of death. Both survival measures were right-censored.

Proteomic Measures

Proteins and subcellular locations measured are shown in Table 1. Protein and phosphoprotein levels were obtained by automated quantitative immunofluorescence using carefully validated antibodies as previously described⁸. Briefly, tissue microarrays (TMAs) were constructed using triplicate samples from each tumour. Immunofluorescence detection of phosphoprotein and other targets was performed using methods previously described^8,32; antibodies and conditions used are shown in Supplementary Table S3. Pan-cytokeratin antibody was used to identify infiltrating tumour cells, DAPI counterstain to identify nuclei and Cy5-tyramide detection of the target to allow compartmentalised (tissue and subcellular) analysis of tissue sections. Monochromatic images of each TMA core were captured with a ×20 objective using an Olympus AX-51 epifluorescence microscope, and high-resolution digital images were analysed with the AQUAnalysis™ software. If the epithelium comprised <5% of total core area, the core was excluded from analysis. Protein and phosphoprotein expression was quantified by calculating the Cy5 fluorescence signal intensity on a scale of 0–255 within each image pixel; the AQUA score was generated by dividing the sum of Cy5 signal within the epithelium by the area of the cytoplasm or nucleus for cytoplasmic or nuclear measurements, respectively. AQUA scores from triplicate cores were averaged to give a mean value per tumour.
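The scoring arithmetic described above can be sketched as follows. This is not the AQUAnalysis software, only an illustration under stated assumptions: each core is represented as flat per-pixel lists, the signal is summed over compartment pixels lying inside the epithelial mask, and a core with no such pixels yields no score. The names `core_score` and `tumour_score` are hypothetical.

```python
def core_score(cy5, epithelium, compartment, min_epithelium_fraction=0.05):
    """Return a compartment score for one core, or None if the core is excluded.

    cy5:         per-pixel Cy5 intensity (0-255)
    epithelium:  per-pixel bool, True inside the cytokeratin-defined tumour mask
    compartment: per-pixel bool, True inside the nucleus (or cytoplasm) mask
    """
    n_pixels = len(cy5)
    epithelium_area = sum(epithelium)
    if epithelium_area < min_epithelium_fraction * n_pixels:
        return None  # epithelium under 5% of core area: exclude the core
    # Assumption: only compartment pixels inside the epithelium are scored.
    in_compartment = [e and c for e, c in zip(epithelium, compartment)]
    area = sum(in_compartment)
    if area == 0:
        return None  # assumption: no compartment pixels means no score
    signal = sum(v for v, keep in zip(cy5, in_compartment) if keep)
    return signal / area


def tumour_score(core_scores):
    """Average the scores of the cores that passed the exclusion rule."""
    kept = [s for s in core_scores if s is not None]
    return sum(kept) / len(kept) if kept else None


# Toy four-pixel core: two pixels fall in the compartment within the epithelium.
example = core_score(
    cy5=[10, 200, 100, 50],
    epithelium=[False, True, True, True],
    compartment=[True, True, True, False],
)
print(example, tumour_score([example, None, 90.0]))
```

Returning `None` for excluded cores keeps them out of the per-tumour mean instead of counting them as zero signal.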

Survival Analysis

Cox proportional hazards regression (CPHR) was applied to clinicopathological inputs and proteomic measures, using the cph function in the R package rms (Breslow method; x and y set to ‘TRUE’ for use in cross-validation, below), to predict both PFS and OS. Models without feature selection were full multivariate models using all measures in Table 1; models using LASSO feature selection were multivariate models including those features as noted in Fig. 1 and Supplementary Fig. S1. Validity of the proportional hazards assumption was assessed using visual inspection of plots from the R functions survplot and cox.zph and examination of statistics of Schoenfeld residuals. Coefficients with 95% confidence intervals and associated Schoenfeld residual statistics for all models are presented in Supplementary Table S4. CPHR models were assessed using the concordance index (c-index)¹⁹, available from the R function validate. The c-index represents the probability that, for two randomly chosen patients, the model correctly orders the patients in their outcome measure (here PFS and OS). Ten-fold cross-validation was performed computing the c-index for each resample (dxy = ‘TRUE’) and repeated 100 times to provide average performance in cross-validation.
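To make the pairwise definition of the c-index concrete, here is a minimal pure-Python sketch of Harrell's concordance index for right-censored data. It is an illustration, not the rms implementation used in the study; the function name `concordance_index`, the convention that a higher risk score means a worse predicted outcome, and the toy data are assumptions.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data.

    times:       observed follow-up time per patient
    events:      1 if the event was observed, 0 if right-censored
    risk_scores: model output, higher meaning a worse predicted outcome
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event, higher risk: correct order
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, the second of whom is censored at day 8.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
perfect = concordance_index(times, events, [0.9, 0.4, 0.6, 0.1])
uninformative = concordance_index(times, events, [1, 1, 1, 1])
print(perfect, uninformative)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ordering. Somers' Dxy, the rank correlation referred to by the dxy setting, is related to the c-index by c = 0.5 + Dxy/2.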

Feature Selection

Feature selection was performed using the least absolute shrinkage and selection operator (LASSO)²⁰ to identify the most informative features for OS and PFS. LASSO was applied using the functions optL1 and profL1 in the R package penalized (and verified with glmnet); the sparsity parameter (λ) was chosen by 10-fold likelihood cross-validation over the interval 0.001 < λ < 50.
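As a reminder of what the sparsity parameter controls, the L1-penalised Cox fit can be written as below. This is the standard penalised partial likelihood formulation, not an equation given in the article; the symbols (coefficients β over p candidate features, covariate vector x_i, event indicator δ_i and risk set R(t_i) of patients still under observation at time t_i) are introduced here for illustration, and ties are ignored.

```latex
\hat{\beta}(\lambda)
  = \arg\max_{\beta}
    \left[ \ell(\beta) - \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right],
\qquad
\ell(\beta)
  = \sum_{i:\,\delta_i = 1}
    \left[ x_i^{\top}\beta
           - \log \sum_{k \in R(t_i)} \exp\!\left( x_k^{\top}\beta \right) \right]
```

Larger values of λ drive more coefficients exactly to zero, so the features retained are those with non-zero coefficients at the cross-validated λ.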

Monte Carlo Analysis

We developed a novel Monte Carlo analysis to evaluate information content of any variable assemblage. Figure 2 describes the shuffling methodology graphically: each variable is shuffled independently of all others and of patient outcome; all variables or a subset can be shuffled to analyse the information content of the entire assemblage or a particular group, respectively. This methodology can be applied with any analysis method that provides a scalar performance measure; we applied it to CPHR models evaluated via the c-index (see Results). R code to perform our Monte Carlo analysis for CPHR models is provided as Supplementary Data S2; an example vignette applying it to our data is available as Supplementary Note S1.
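The shuffling step can be sketched in a few lines. This is an illustrative Python sketch, not the R code of Supplementary Data S2: the function names, the dictionary-per-patient data layout, the toy data and the stand-in `toy_score` (used here in place of fitting a Cox model and computing its c-index) are all assumptions.

```python
import random


def shuffle_columns(rows, columns, rng):
    """Return a copy of rows with each named column permuted independently."""
    shuffled = [dict(row) for row in rows]
    for col in columns:
        values = [row[col] for row in rows]
        rng.shuffle(values)  # a fresh permutation for every column
        for row, value in zip(shuffled, values):
            row[col] = value
    return shuffled


def null_scores(rows, columns, score_fn, n_iter, seed=0):
    """Score n_iter randomised datasets in which only `columns` are shuffled."""
    rng = random.Random(seed)
    return [score_fn(shuffle_columns(rows, columns, rng)) for _ in range(n_iter)]


# Toy data (hypothetical): one clinical variable, one protein, one outcome.
rows = [{"stage": s, "protein": p, "days": d}
        for s, p, d in [(1, 0.2, 900), (1, 0.4, 800), (2, 0.5, 600),
                        (3, 0.7, 400), (3, 0.9, 200), (4, 1.1, 100)]]


def toy_score(data):
    # Stand-in performance measure: fraction of patient pairs in which the
    # patient with the higher protein level has the shorter survival time.
    pairs = [(a, b) for i, a in enumerate(data) for b in data[i + 1:]]
    hits = sum((a["protein"] - b["protein"]) * (a["days"] - b["days"]) < 0
               for a, b in pairs)
    return hits / len(pairs)


actual = toy_score(rows)
# Shuffle only the protein column; stage and outcome stay attached to patients.
nulls = null_scores(rows, ["protein"], toy_score, n_iter=1000)
p = sum(s >= actual for s in nulls) / len(nulls)
print(actual, p)
```

Passing every input column to `columns` assesses the assemblage as a whole, while passing only the added measures assesses their contribution on top of the variables left intact.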

Additional Information

How to cite this article: Verleyen, W. et al. Novel Monte Carlo approach quantifies data assemblage utility and reveals power of integrating molecular and clinical information for cancer prognosis. Sci. Rep. 5, 15563; doi: 10.1038/srep15563 (2015).

References

1. Patani, N., Martin, L.-A. & Dowsett, M. Biomarkers for the clinical management of breast cancer: International perspective. Int. J. Cancer 133, 1–13 (2013).
2. Marquez, R. T. et al. Patterns of gene expression in different histotypes of epithelial ovarian cancer correlate with those in normal fallopian tube, endometrium and colon. Human Cancer Biology 11, 6116–6126 (2005).
3. Santin, A. D. et al. Discrimination between uterine serous papillary carcinomas and ovarian serous papillary tumours by gene expression profiling. Brit. J. Cancer 90, 1814–1824 (2004).
4. Albain, K. et al. Prognostic and predictive value of the 21-gene recurrence score assay in postmenopausal, node-positive, estrogen receptor-positive breast cancer. Lancet Oncol. 11, 55–65 (2010).
5. Schwartz, D. R. et al. Gene expression in ovarian cancer reflects both morphology and biological behavior, distinguishing clear cell from other poor-prognosis ovarian carcinomas. Cancer Res. 62, 4722–4729 (2002).
6. Zorn, K. K. et al. Gene expression profiles of serous, endometrioid and clear cell subtypes of ovarian and endometrial cancer. Human Cancer Biology 11, 6422–6430 (2005).
7. Wamunyokoli, F. W. et al. Expression profiling of mucinous tumors of the ovary identifies genes of clinicopathologic importance. Clin. Cancer Res. 12, 690–700 (2006).
8. Faratian, D. et al. Phosphoprotein pathway profiling of ovarian carcinoma for the identification of potential new targets for therapy. Eur. J. Cancer 47, 1420–1431 (2011).
9. Tothill, R. W. et al. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin. Cancer Res. 14, 5198–5208 (2008).
10. Warrenfeltz, S. et al. Gene expression profiling of epithelial ovarian tumours correlated with malignant potential. Mol. Cancer 3, 27 (2004).
11. Verhaak, R. G. W. et al. Prognostically relevant gene signatures of high-grade serous ovarian carcinoma. J. Clin. Invest. 123, 517–525 (2012).
12. Cannistra, S. A. Cancer of the ovary. N. Engl. J. Med. 351, 2519–2529 (2004).
13. Levi, F., Lucchini, F., Negri, E. & La Vecchia, C. Trends in mortality from major cancers in the European Union, including acceding countries, in 2004. Cancer 101, 2843–2850 (2004).
14. Faratian, D., Clyde, R. G., Crawford, J. W. & Harrison, D. J. Systems pathology: taking molecular pathology into a new dimension. Nat. Rev. Clin. Oncol. 6, 455–464 (2009).
15. Muggia, F. M. et al. Phase III randomized study of cisplatin versus paclitaxel versus cisplatin and paclitaxel in patients with suboptimal stage III or IV ovarian cancer: a gynecologic oncology group study. J. Clin. Oncol. 18, 106–115 (2000).
16. Banerjee, S. & Gore, M. The future of targeted therapies in ovarian cancer. Oncologist 14, 706–716 (2009).
17. Yap, T. A., Carden, C. P. & Kaye, S. B. Beyond chemotherapy: targeted therapies in ovarian cancer. Nat. Rev. Cancer 9, 167–181 (2009).
18. Clark, T. G., Stewart, M. E., Altman, D. G., Gabra, H. & Smyth, J. F. A prognostic model for ovarian cancer. Brit. J. Cancer 85, 944–952 (2001).
19. Harrell, F. E., Califf, R. M., Pryor, D. B., Lee, K. L. & Rosati, R. A. Evaluating the yield of medical tests. J. Amer. Med. Assoc. 247, 2543–2546 (1982).
20. Tibshirani, R. The LASSO method for variable selection in the Cox model. Statistics in Medicine 16, 385–395 (1997).
21. Ransohoff, D. F. Bias as a threat to the validity of cancer molecular-marker research. Nat. Rev. Cancer 5, 142–149 (2005).
22. Bačić, B. et al. Prognostic role of E-cadherin in patients with advanced serous ovarian cancer. Arch. Gynecol. Obstet. 287, 1219–1224 (2012).
23. Ho, C. M. et al. Prognostic and predictive values of E-cadherin for patients of ovarian clear cell adenocarcinoma. Int. J. Gynecol. Cancer 20, 1490–1497 (2010).
24. Quattrocchi, L., Green, A. R., Martin, S., Durrant, L. & Deen, S. The cadherin switch in ovarian high-grade serous carcinoma is associated with disease progression. Virchows Arch. 459, 21–29 (2011).
25. Flick, M. B. et al. Apoptosis-based evaluation of chemosensitivity in ovarian cancer patients. J. Soc. Gynecol. Investig. 11, 252–259 (2004).
26. Kleinberg, L. et al. Cleaved caspase-3 and nuclear factor-κB p65 are prognostic factors in metastatic serous ovarian carcinoma. Hum. Pathol. 40, 795–806 (2009).
27. Netinatsunthorn, W., Hanprasertpong, J., Dechsukhum, C., Leetanaporn, R. & Geater, A. WT1 gene expression as a prognostic marker in advanced serous epithelial ovarian carcinoma: an immunohistochemical study. BMC Cancer 6, 90 (2006).
28. Yamamoto, S. et al. Clinicopathological significance of WT1 expression in ovarian cancer: a possible accelerator of tumor progression in serous adenocarcinoma. Virchows Arch. 451, 27–35 (2007).
29. Gamallo, C. et al. β-catenin expression pattern in stage I and II ovarian carcinomas: relationship with β-catenin gene mutations, clinicopathological features and clinical outcome. Am. J. Pathol. 155, 527–536 (1999).
30. Kildal, W. et al. Beta-catenin expression, DNA ploidy and clinicopathological features in ovarian cancer: a study in 253 patients. Eur. J. Cancer 41, 1127–1134 (2005).
31. Lee, C. M. et al. Beta-catenin nuclear localization is associated with grade in ovarian serous carcinoma. Gynecol. Oncol. 88, 363–368 (2003).
32. Faratian, D. et al. Systems biology reveals new strategies for personalizing cancer medicine and confirms the role of PTEN in resistance to trastuzumab. Cancer Res. 69, 6713–6720 (2009).

Acknowledgements

WV is a SULSA Systems Biology Prize PhD Student; VAS is supported by the BBSRC Research Council [grant number BB/F001398/1] and Medical Research Scotland [grant number FRG353]. DJH is supported by CASyM Concerted Action [grant number EU HEALTH-F4-2012-305033] and the Chief Scientist Office of Scotland.

Author information

Wim Verleyen
Present address: Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 500 Sunnyside Boulevard, Woodbury, NY, 11797, USA

Authors and Affiliations

School of Biology, University of St Andrews, St Andrews, Fife, KY16 9TH, UK
Wim Verleyen & V. Anne Smith
Division of Pathology, University of Edinburgh, Edinburgh, EH4 2XU, UK
Simon P. Langdon & Dana Faratian
School of Medicine, University of St Andrews, St Andrews, Fife, KY16 9TF, UK
David J. Harrison

Contributions

V.A.S. and D.J.H. conceived the study. W.V. and V.A.S. performed computational analyses. D.F. and S.P.L. conducted proteomics measurements. D.J.H. provided clinical samples. All authors consulted on analyses and results and prepared the manuscript together.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Supplementary Data S1

Supplementary Data S2

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Cite this article

Verleyen, W., Langdon, S., Faratian, D. et al. Novel Monte Carlo approach quantifies data assemblage utility and reveals power of integrating molecular and clinical information for cancer prognosis. Sci Rep 5, 15563 (2015). https://doi.org/10.1038/srep15563

Received: 29 July 2013
Accepted: 22 September 2015
Published: 27 October 2015
DOI: https://doi.org/10.1038/srep15563
Springer Nature Limited
