Abstract
Genes associated with similar diseases are often functionally related. This principle is supported by many biological data sources, such as disease phenotype similarities, protein complexes, protein-protein interactions, pathways and gene expression profiles. Integrating multiple types of biological data is an effective way to identify disease genes for many genetic diseases. To capture gene-disease associations from biological networks, we propose a kernel-based MRF method that combines graph kernels with the Markov random field (MRF) method. In the proposed method, three kinds of kernels are employed to describe the overall relationships between vertices in five biological networks, and a novel weighted MRF method is developed to integrate those data. In addition, an improved Gibbs sampling procedure and a novel parameter estimation method are proposed to generate predictions from the kernel-based MRF method. Numerical experiments are carried out by integrating known gene-disease associations, protein complexes, protein-protein interactions, pathways and gene expression profiles. The proposed kernel-based MRF method is evaluated by leave-one-out cross-validation and achieves an AUC score of 0.771 when all of these data are integrated, which suggests that the method compares favorably with many existing methods.
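As an illustration of what a graph kernel can look like, the sketch below computes a diffusion kernel, K = exp(-beta * L), on a toy network, where L is the graph Laplacian. This is only an assumed example: the abstract does not say which three kernels are used, and the function name `diffusion_kernel`, the value of `beta` and the four-vertex path graph are all hypothetical choices made for the illustration.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=0.5):
    """Diffusion kernel K = exp(-beta * L) for an undirected graph.

    L = D - A is the combinatorial graph Laplacian. The matrix exponential
    is computed from the eigendecomposition of L, which is valid because L
    is symmetric. `beta` is a hypothetical diffusion-strength parameter.
    """
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # V * exp(-beta * lambda) scales column j of V by exp(-beta * lambda_j),
    # so the product below equals V diag(exp(-beta * lambda)) V^T.
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


# Toy network (assumption): a path graph 0 - 1 - 2 - 3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
K = diffusion_kernel(A, beta=0.5)
print(np.round(K, 3))
```

Unlike the adjacency matrix, which only records direct neighbours, the kernel assigns a positive similarity to every pair of connected vertices, and that similarity decreases as the vertices get further apart in the network. This is the sense in which a kernel describes the overall relationships between vertices.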
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Cite this article
Chen, B., Li, M., Wang, J. et al. Disease gene identification by using graph kernels and Markov random fields. Sci. China Life Sci. 57, 1054–1063 (2014). https://doi.org/10.1007/s11427-014-4745-8
DOI: https://doi.org/10.1007/s11427-014-4745-8