Predicting potential cancer genes by integrating network properties, sequence features and functional annotations

Liu, Wei; Xie, HongWei

doi:10.1007/s11427-013-4500-6

Research Paper
Open access
Published: 10 July 2013

Science China Life Sciences, Volume 56, pages 751–757 (2013)

Abstract

The discovery of novel cancer genes is one of the main goals of cancer research. Bioinformatics methods can accelerate cancer gene discovery, which may aid both in understanding cancer and in developing drug targets. In this paper, we describe a classifier that we developed to predict potential cancer genes by integrating multiple lines of biological evidence, including protein-protein interaction network properties, sequence features and functional features. We detected 55 features that differed significantly between cancer genes and non-cancer genes, and chose 14 cancer-associated features to train the classifier. Four machine learning methods were explored as classifier models to distinguish cancer genes from non-cancer genes: logistic regression, support vector machines (SVMs), BayesNet and a decision tree (J48). The predictive power of the different models was evaluated by 5-fold cross-validation. The areas under the receiver operating characteristic (ROC) curve for the logistic regression, SVM, BayesNet and J48 tree models were 0.834, 0.740, 0.800 and 0.782, respectively. Finally, the logistic regression classifier with multiple biological features was applied to the genes in the Entrez database, and 1976 cancer gene candidates were identified. We found that the integrated prediction model performed much better than models based on individual types of biological evidence, and that the network and functional features had greater predictive power than the sequence features.
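The abstract compares classifiers by their area under the ROC curve (AUC) under 5-fold cross-validation. The Python sketch below illustrates that evaluation loop for the logistic regression case only; it is not the authors' pipeline. It assumes binary labels (1 = cancer gene, 0 = non-cancer gene) and features already standardized to comparable scales, and the function names, learning rate, epoch count and toy data are all hypothetical. AUC is computed with the Mann-Whitney formulation: the probability that a randomly chosen cancer gene receives a higher score than a randomly chosen non-cancer gene, with ties counted as one half.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights and bias by batch gradient descent on the log-loss.
    lr and epochs are illustrative defaults, not tuned values."""
    n_feat, m = len(X[0]), len(X)
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(model, X):
    """Predicted probability of the positive (cancer gene) class."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(scores, labels):
    """ROC AUC as the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs in which the positive scores higher,
    counting ties as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, keeping the class ratio roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            folds[rank % k].append(i)
    return folds


def cross_validated_auc(X, y, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, then compute one AUC
    over the pooled out-of-fold scores."""
    scores = [0.0] * len(y)
    for test in stratified_folds(y, k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict_proba(model, [X[i] for i in test])):
            scores[i] = p
    return auc(scores, y)


if __name__ == "__main__":
    # Hypothetical toy data: two standardized features per gene, with
    # positives (label 1) shifted upward relative to negatives (label 0).
    rng = random.Random(42)
    X, y = [], []
    for _ in range(100):
        X.append([rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)])
        y.append(1)
        X.append([rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)])
        y.append(0)
    print(f"5-fold CV AUC: {cross_validated_auc(X, y):.3f}")
```

Pooling the out-of-fold scores yields a single ROC curve per model; averaging the five per-fold AUCs is an equally common alternative and can give slightly different numbers. Another learner, such as an SVM or a decision tree, could replace `fit_logistic` and `predict_proba` without changing the evaluation loop.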

Author information

Authors and Affiliations

College of Mechanical & Electronic Engineering and Automatization, National University of Defense Technology, Changsha, 410073, China
Wei Liu & HongWei Xie

Corresponding author

Correspondence to HongWei Xie.

Additional information

This article is published with open access at Springerlink.com

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Liu, W., Xie, H. Predicting potential cancer genes by integrating network properties, sequence features and functional annotations. Sci. China Life Sci. 56, 751–757 (2013). https://doi.org/10.1007/s11427-013-4500-6

Received: 25 October 2012
Accepted: 14 May 2013
Published: 10 July 2013
Issue Date: August 2013
DOI: https://doi.org/10.1007/s11427-013-4500-6