Abstract
Identifying disease-causing genes among a large number of candidates is a fundamental challenge in human disease studies, and determining the true disease-causing genes through biological experiments remains time-consuming and laborious. Advances in high-throughput techniques have produced a large number of protein-protein interactions, and several methods based on protein interaction networks have been proposed to address this problem. In this paper, we propose a shortest path-based algorithm, named SPranker, to prioritize disease-causing genes in protein interaction networks. Because diseases with similar phenotypes are generally caused by functionally related genes, we further propose an improved algorithm, SPGOranker, that integrates the semantic similarity of Gene Ontology (GO) annotations. SPGOranker considers not only the topological similarity between protein pairs in a protein interaction network but also their functional similarity. SPranker and SPGOranker were applied to 1598 known orphan disease-causing genes from 172 orphan diseases and compared with three state-of-the-art approaches: ICN, VS, and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS, and RWR in prioritizing orphan disease-causing genes. Importantly, in a case study of severe combined immunodeficiency, SPranker and SPGOranker predict several novel causal genes.
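The general idea of shortest path-based prioritization can be illustrated with a small sketch. This is not the SPranker algorithm itself, whose scoring details are not given in the abstract. It assumes an unweighted, undirected interaction network stored as an adjacency dictionary, and it uses a hypothetical score (the mean of 1 / (1 + d) over known disease genes, where d is the hop distance). The function names `bfs_distances` and `rank_candidates` and the toy network are illustrative assumptions.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return hop counts from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_candidates(graph, seeds, candidates):
    """Rank candidate genes by closeness to known disease genes (seeds).

    Assumed score (illustrative only): mean of 1 / (1 + d) over all seeds,
    where d is the shortest-path length; unreachable seeds contribute 0.
    """
    scores = {c: 0.0 for c in candidates}
    for seed in seeds:
        dist = bfs_distances(graph, seed)
        for c in candidates:
            if c in dist:
                scores[c] += 1.0 / (1.0 + dist[c])
    n = len(seeds)
    # Sort by descending score, breaking ties alphabetically by gene name.
    return sorted(((c, s / n) for c, s in scores.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical toy network: nodes are proteins, edges are interactions.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}

ranking = rank_candidates(toy_graph, seeds=["A", "B"],
                          candidates=["C", "D", "E", "F"])
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

Candidates closer to the known disease genes rank higher. A functional-similarity term of the kind the abstract describes for SPGOranker would additionally require GO annotations and a semantic similarity measure, which this sketch leaves out.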
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Li, M., Li, Q., Ganegoda, G.U. et al. Prioritization of orphan disease-causing genes using topological feature and GO similarity between proteins in interaction networks. Sci. China Life Sci. 57, 1064–1071 (2014). https://doi.org/10.1007/s11427-014-4747-6
DOI: https://doi.org/10.1007/s11427-014-4747-6