dRHP-PseRA: detecting remote homology proteins using profile-based pseudo protein sequence and rank aggregation

Chen, Junjie; Long, Ren; Wang, Xiao-long; Liu, Bin; Chou, Kuo-Chen

doi:10.1038/srep32333

Article
Open access
Published: 01 September 2016

Volume 6, article number 32333, (2016)

Scientific Reports

Junjie Chen¹,
Ren Long¹,
Xiao-long Wang^1,2,
Bin Liu^1,2,3 &
…
Kuo-Chen Chou^3,4

Abstract

Protein remote homology detection is an important task in computational proteomics. Several computational methods have been proposed, which detect remote homology proteins based on different features and algorithms. As noted in previous studies, their predictive results are complementary to each other. Therefore, it is intriguing to explore whether these methods can be combined into one package so as to further enhance both predictive performance and ease of use. In view of this, we introduced a protein representation called the profile-based pseudo protein sequence to extract the evolutionary information from the relevant profiles. Based on the concept of pseudo proteins, a new predictor, called “dRHP-PseRA”, was developed by combining four state-of-the-art predictors (PSI-BLAST, HHblits, Hmmer, and Coma) via the rank aggregation approach. Cross-validation tests on a SCOP benchmark dataset demonstrate that the new predictor remarkably outperforms the existing methods for the same purpose in terms of ROC50 scores. Accordingly, it is anticipated that dRHP-PseRA holds very high potential to become a useful high-throughput tool for detecting remote homology proteins. For the convenience of most experimental scientists, a web-server for dRHP-PseRA has been established at http://bioinformatics.hitsz.edu.cn/dRHP-PseRA/.

Introduction

In the post-genomic age, protein sequence databases (such as UniProtKB¹) have been greatly enriched by the rapid development of sequencing technology, while the protein structure and function data in PDB² are growing much more slowly. This gap keeps widening³. To deal with this situation, it is critical to use the sequence data to infer the structures and functions of proteins⁴. Because protein structure is more conserved than sequence, proteins sharing low sequence similarity may still have similar structures; such proteins are known as remote homologs. Protein remote homology detection aims to find remote homologs with known structures and functions⁴. Unfortunately, it remains a challenging task in computational biology because of the low sequence identities involved.

Protein remote homology detection has been studied for a long time, and many researchers have proposed various approaches to address this task. They can be categorized into three groups^4,5,6: (1) alignment method, (2) discriminative method, and (3) ranking method.

The alignment method is the traditional approach: it identifies remote homology relationships by comparing pairwise alignment scores against a specified threshold. The early approaches were based on sequence alignment tools, such as Blast⁷ and FASTA⁸. Owing to the low similarity among remote homologous proteins, their performance was quite limited. By considering the information from multiple sequence alignments (MSAs), profile alignment approaches were proposed to improve the detection sensitivity. For example, PSI-BLAST⁹ and IMPALA¹⁰ are profile-sequence alignment methods, while COMPASS¹¹, FFAS^12,13,14, SPARK-X¹⁵ and COMA¹⁶ are based on profile-profile alignment. The latter have achieved much better results than the former. Compared with sequence/profile alignment, the profile hidden Markov model (profile-HMM) alignment approaches (such as Hmmer¹⁷ and HHblits¹⁸) additionally take into account the position-specific probabilities for insertions and deletions, and hence can achieve even better performance.

The discriminative method refers to classification models based on machine learning techniques, which can be used to classify a new protein into one of the superfamilies. Many machine learning techniques (such as RF¹⁹, NN²⁰, SVM²¹) have been used to train such models, among which the SVM-based methods achieved the state-of-the-art performance²²; examples include SVM-fisher²³, SVM-DR²⁴, SVM-LA²⁵, SVM-LSA²⁶, SVM-pairwise²⁷, and SVM-PDT²⁸. Most of them construct kernel-based feature vectors from the pairwise scores output by the alignment approaches. Unfortunately, since these approaches require labelled samples for training the models, they cannot work for proteins whose superfamilies or families are still unknown. Besides, it is often difficult to construct useful web-servers or stand-alone tools for these classification models.

The ranking method builds a ranking model to detect remote homology relationships. Like the alignment method, it relies on an estimated score and a specified threshold. Its distinguishing feature is that the ranking model is trained on the ranking lists returned by the basic methods to construct a kernel-based feature space, in which the homology relationship is measured by distance. Based on Google's PageRank algorithm²⁹, an unsupervised graph diffusion-based method called RankProp³⁰ was proposed, which builds a protein similarity network. Motivated by techniques from the field of natural language processing, ProtEmbed³¹ employs a large-scale semantic embedding method to learn a semantic embedding of protein sequences. Recently, ProtDec-L2R⁶ was proposed, which combines various ranking approaches via a learning to rank algorithm.

The aforementioned computational methods have considerably stimulated the development of protein remote homology detection. However, further work is still needed for the following reasons. (1) Since remote homologous proteins share very low sequence similarity (<30%), a more accurate protein representation that incorporates the evolutionary information contained in profiles is needed. (2) The outcomes of homology detection methods based on different techniques and models are complementary to each other; hence, it would be beneficial to develop a new framework that combines them into one. (3) Although several tools or web-servers have been proposed, most of them are not suitable for large-scale analysis because of their high computational cost; in this sense, an easy-to-use web-server or stand-alone tool would certainly be welcome.

To address the aforementioned three points, we construct a profile-based pseudo protein sequence to replace the original protein sequence. This representation transforms the evolutionary information of profiles into a pseudo protein sequence. Furthermore, the new approach combines several predictors through a rank aggregation method. The resulting predictor is called dRHP-PseRA. Finally, a web-server for dRHP-PseRA has been established, and it is available at http://bioinformatics.hitsz.edu.cn/dRHP-PseRA/. Detailed usage instructions for this web-server can be found on its ReadMe page.

Results and Discussion

Performance of different predictors can be improved by profile-based pseudo protein sequences

Four state-of-the-art predictors, namely PSI-BLAST⁹, HHblits¹⁸, Hmmer¹⁷, and Coma¹⁶, were selected to verify whether the proposed pseudo protein representation can improve their performance. The corresponding results are listed in Table 1. As we can see, the pseudo protein representation improves the performance of PSI-BLAST, Hmmer, and Coma, as reflected by both the ROC1 and ROC50 scores. Such outcomes are not surprising, since the pseudo proteins contain the evolutionary information from the relevant profiles. Consequently, they are more informative and accurate than the original sequence representation in detecting remote homology proteins. The one exception is HHblits, whose performance is not improved by the pseudo protein representation. This is because its HMMs already incorporate the evolutionary information via the position-specific probabilities for insertions and deletions.

**Table 1: The performance of various predictors on the benchmark dataset.**

Combining complementary predictors via the rank aggregation approach

As shown in Table 1, the performances of the various predictors on the same benchmark dataset are quite uneven. They can be combined to improve the performance. The pairwise comparison results of the four basic predictors are shown in Fig. 1. In each sub-figure, most of the points lie on either side of the diagonal line and only a few lie on it, indicating that the predictive results are complementary to each other. Various combinations of these predictors were combined via the proposed linear weighting rank aggregation approach (see the Materials and Methods section). The dRHP-PseRA predictor shows the best performance when combining the three methods PsePro-PSI-BLAST, PsePro-Hmmer, and HHblits, with the corresponding weights being 0.01, 0.29 and 0.7, respectively (Table 1). The correlations between the weight values and the ROC1 scores of the three methods are plotted in Fig. 2, from which we can see that a method with higher performance is assigned a higher weight, indicating that the rank aggregation approach is able to reflect the different importance of the three predictors. The performance of each method is plotted in Fig. 3, where a larger area under the curve means a better performance. As the figure shows, dRHP-PseRA clearly outperforms the other predictors on the ROC50 score, indicating that combining different predictors via a rank aggregation approach is a promising strategy, and that dRHP-PseRA is a more powerful predictor for protein remote homology detection.

**Figure 1: Pairwise comparison results of the four methods.**

**Figure 3: Comparisons of various methods.**

Discussion

Protein remote homology detection is a key technique for studying protein structures and functions. However, it is still a very challenging task, since remote homologous proteins usually share very low sequence similarities (<30%). Although several computational methods have been proposed, their performance is still too low for many practical applications. In this paper, we introduced the profile-based pseudo protein sequence formulation derived from protein profiles, and found that it can improve the performance of the individual predictors. Based on these findings, a novel predictor called dRHP-PseRA is proposed by combining the aforementioned four state-of-the-art predictors into one framework through the pseudo protein approach. Experimental results show that dRHP-PseRA outperforms each of the individual predictors based on ROC50 scores. Furthermore, a user-friendly web-server for dRHP-PseRA has been established at http://bioinformatics.hitsz.edu.cn/dRHP-PseRA/.

It is instructive to point out that, in addition to the four basic predictors selected in the current study, there are other methods for protein remote homology detection, such as FFAS^12,13,14 and SPARK-X¹⁵. It would be intriguing to extend the current study by exploring whether these methods can also be incorporated into the proposed ensemble learning framework to further improve the performance. We will address this problem in our future study.

Materials and Methods

Benchmark Datasets

In this study, we adopted a commonly used benchmark dataset³¹, which was constructed based on the SCOP database, with the sequences extracted from Astral³². Because this benchmark is used to evaluate the performance of unsupervised methods (no training set is required), a relatively high similarity threshold of 95% was used to remove redundancy; that is, the similarity between any two sequences is lower than 95%. The benchmark dataset contains 7329 protein sequences from 1824 families and 1070 superfamilies (Fig. 4), and can be defined as

$$\mathbb{S} = \mathbb{S}^{\mathrm{sf}}_{1} \cup \mathbb{S}^{\mathrm{sf}}_{2} \cup \cdots \cup \mathbb{S}^{\mathrm{sf}}_{1070} = \mathbb{S}^{\mathrm{f}}_{1} \cup \mathbb{S}^{\mathrm{f}}_{2} \cup \cdots \cup \mathbb{S}^{\mathrm{f}}_{1824} \tag{1}$$

where $\mathbb{S}^{\mathrm{sf}}_{i}$ (i = 1, 2, …, 1070) represents the i-th superfamily; $\mathbb{S}^{\mathrm{f}}_{k}$ (k = 1, 2, …, 1824) represents the k-th family; and the symbol ∪ represents the ‘union’ in set theory. The benchmark dataset is given in the Supplementary Information S1.

**Figure 4: A schematic drawing to show the dataset for protein remote homology detection.**

For a given query protein P, we search for its potential homologues in the benchmark dataset $\mathbb{S}$. According to the search results, we can form a rank vector R with its components in descending order of score:

$$R = [p_1, p_2, \ldots, p_n]^{T} \tag{2}$$

where p_i (i = 1, 2, …, n) represents the i-th potential homologous sequence of P in R; n is the total number of proteins in the ranking list R; and T denotes the transpose operator. If all of the query protein’s homologous proteins are ranked before the non-homologous ones, then the prediction is perfect.
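
The rank vector R and the "perfect prediction" criterion described above can be illustrated with a short sketch. The function names, the protein identifiers, and the tie-breaking rule (by identifier) are assumptions made for illustration only; they are not taken from the dRHP-PseRA implementation.

```python
from typing import Dict, List, Set


def build_rank_vector(scores: Dict[str, float]) -> List[str]:
    """Return database protein ids sorted by score, best first (the vector R)."""
    # Descending score; ties are broken by id so the result is deterministic
    # (the tie-breaking rule is an assumption, the source does not specify one).
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def is_perfect_ranking(ranking: List[str], homologs: Set[str]) -> bool:
    """True if every true homolog is ranked before every non-homolog."""
    seen_non_homolog = False
    for pid in ranking:
        if pid in homologs:
            if seen_non_homolog:
                return False  # a homolog appears after a non-homolog
        else:
            seen_non_homolog = True
    return True


# Hypothetical search scores for one query protein P (higher means more similar).
scores = {"p_a": 52.1, "p_b": 7.3, "p_c": 31.0, "p_d": 2.2}
ranking = build_rank_vector(scores)
print(ranking)
print(is_perfect_ranking(ranking, {"p_a", "p_c"}))
```

With these hypothetical scores, the two top-ranked proteins are exactly the assumed homologs, so the ranking counts as perfect.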

Descriptions of four predictors

For the sake of diversity and mutual complementarity, we selected the following four state-of-the-art ranking methods as the basic predictors: PSI-BLAST⁹, HHblits¹⁸, Hmmer¹⁷ and Coma¹⁶.

PSI-BLAST is a profile-sequence alignment method, which uses the query proteins to construct profiles and iteratively searches the sequence database. In this study, PSI-BLAST version 2.2.30 was employed with the number of iterations set to 3.

HHblits is an HMM-HMM alignment method, which constructs an HMM for both the sequence of the query protein P and the sequences in the database, and then iteratively searches the query HMM against the database of HMMs. HHblits version 2.0.16 was employed with the default parameters, except that the maximum number of iterations was set to 2.

Hmmer is a method based on probabilistic inference with HMMs. In this study, Hmmer version 3.1b2 with default parameters was used.

Coma is a profile-profile alignment method adopting position-dependent gap penalties and a global score system. The multiple sequence alignments generated by PSI-BLAST version 2.2.30 are fed into Coma for calculation. In this study, Coma version 1.10 with default parameters was employed.

Profile-based pseudo protein sequence

Remote homology proteins have very low sequence similarities (<30%); therefore, sequence information alone is not enough for accurate homology detection. As demonstrated in previous studies³³, evolutionary information extracted from profiles is useful for improving protein remote homology detection. Here, we construct the protein representation using the method proposed in these studies^5,33. The main steps for generating the profile-based pseudo protein sequence representation are briefly described as follows.

First, a protein sequence P is searched against NCBI’s nrdb90 database by running PSI-BLAST⁹ with parameters (-j 10, -e 0.001) to generate an MSA. Then the frequency profile of sequence P, a matrix M of size 20 × L (20 is the number of native amino acids and L is the length of sequence P), is calculated from the frequency of each amino acid at each site in the generated MSA.
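
To make the frequency profile concrete, the following is a minimal sketch of computing a 20 × L matrix from an MSA. It assumes the MSA is already available as equal-length strings trimmed to the positions of the query (running PSI-BLAST is not shown), that gaps and non-standard characters are simply ignored, and that no pseudocounts or sequence weighting are applied; the source does not specify these details, and the function name is hypothetical.

```python
from typing import List

# The 20 standard amino acids in a fixed order; row r of M corresponds to AMINO_ACIDS[r].
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_profile(msa: List[str]) -> List[List[float]]:
    """Return a 20 x L matrix M where M[r][c] is the frequency of residue r in column c."""
    if not msa:
        raise ValueError("the alignment must contain at least one sequence")
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")

    index = {aa: r for r, aa in enumerate(AMINO_ACIDS)}
    profile = [[0.0] * length for _ in AMINO_ACIDS]
    for col in range(length):
        # Assumption: gaps ('-') and non-standard characters are skipped.
        residues = [row[col] for row in msa if row[col] in index]
        for aa in residues:
            profile[index[aa]][col] += 1.0
        if residues:
            for r in range(len(AMINO_ACIDS)):
                profile[r][col] /= len(residues)
    return profile


# Toy alignment of four sequences over three query positions.
M = frequency_profile(["ACD", "ACE", "A-E", "GCE"])
print(M[AMINO_ACIDS.index("A")][0], M[AMINO_ACIDS.index("E")][2])
```

Each non-empty column of the resulting matrix sums to 1, so its entries can be read directly as per-site residue frequencies.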

Second, for each column in M, we sort the amino acids in descending order of their frequency values and select the amino acid with the maximal frequency value. The selected amino acids are concatenated to form a new pseudo protein sequence, which is called the profile-based pseudo protein sequence. Higher scores in M indicate more conserved sites in protein sequence P. Such a representation of proteins, defined by frequency profiles, should be more sensitive than raw protein sequences for detecting remote homologs.
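
The column-wise selection step can be sketched as follows. The function name is hypothetical, and two details are assumptions rather than the authors' stated choices: ties are resolved in favour of the residue that comes first in the fixed alphabet order, and a column with no observed residues falls back to the residue of the original sequence.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # row order of the profile matrix M


def pseudo_sequence(profile: Sequence[Sequence[float]], original: str) -> str:
    """Build the pseudo protein sequence by taking the most frequent residue per column."""
    residues: List[str] = []
    for col in range(len(original)):
        column = [profile[row][col] for row in range(len(AMINO_ACIDS))]
        # max() returns the first maximal index, i.e. ties go to the earlier residue.
        best = max(range(len(AMINO_ACIDS)), key=lambda r: column[r])
        if column[best] > 0.0:
            residues.append(AMINO_ACIDS[best])
        else:
            # Assumption: keep the original residue when the column is empty.
            residues.append(original[col])
    return "".join(residues)


# Toy 20 x 3 profile: column 0 favours G, column 1 is empty, column 2 is a C/W tie.
toy = [[0.0, 0.0, 0.0] for _ in AMINO_ACIDS]
toy[AMINO_ACIDS.index("A")][0] = 0.2
toy[AMINO_ACIDS.index("G")][0] = 0.8
toy[AMINO_ACIDS.index("C")][2] = 0.5
toy[AMINO_ACIDS.index("W")][2] = 0.5
print(pseudo_sequence(toy, "AKW"))
```

Because the output is an ordinary amino acid string of the same length as the input, it can be passed to existing search tools unchanged.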

The profile-based pseudo protein sequences were used to replace the raw protein sequences as inputs for the aforementioned four predictors without the need to modify the programs.

Rank aggregation

The aim of rank aggregation is to combine different ranking lists (Eq. 2) so as to obtain more accurate ranking results³⁴. In this study, a rank aggregation method based on the linear weighting method was introduced to combine various methods, as described below.

Given k ranking lists (Eq. 2) generated by k predictors, the rank aggregation calculates a rank aggregation score S(p_i) between a query protein P and a potential homologous protein p_i in the database $\mathbb{S}$, which can be defined as

$$S(p_i) = \sum_{j=1}^{k} w_j \, \hat{s}_j(p_i) \tag{3}$$

where w_j (j = 1, 2, …, k) is the weight of the j-th predictor, and $\hat{s}_j(p_i)$ (i = 1, 2, …, n) is the normalized alignment score between protein P and protein p_i calculated by the j-th predictor, which can be calculated by

$$\hat{s}_j(p_i) = \frac{s_j(p_i) - \min(s_j)}{\max(s_j) - \min(s_j)} \tag{4}$$

where s_j(p_i) (i = 1, 2, …, n) is the alignment score between query protein P and p_i given by the j-th predictor, and max(s_j) and min(s_j) represent the maximum and minimum alignment scores returned by the j-th predictor for the query protein P, respectively.

A larger rank aggregation score S(p_i) means that the query protein P and protein p_i have a closer homologous relationship. Consequently, the rank aggregation approach sorts the proteins in the database in descending order of their rank aggregation scores. In this way, the ranking lists generated by different predictors can be combined in one framework to produce a more accurate ranking list. Figure 5 is a flowchart of the proposed dRHP-PseRA predictor based on the rank aggregation approach.

**Figure 5: The flowchart of dRHP-PseRA.**
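
The linear weighting rank aggregation described in this section can be sketched as below: each predictor's scores are min-max normalized and then combined as a weighted sum. The function names and toy scores are hypothetical. The sketch also makes three assumptions that the text does not state: larger raw scores mean closer homology (E-values would first need to be converted), a protein missing from a predictor's list contributes 0 for that predictor, and a list whose scores are all identical is normalized to 0.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one predictor's scores to [0, 1] using its own minimum and maximum."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: a constant list carries no ranking information.
        return {pid: 0.0 for pid in scores}
    return {pid: (s - lo) / (hi - lo) for pid, s in scores.items()}


def aggregate(score_lists: List[Dict[str, float]],
              weights: List[float]) -> List[Tuple[str, float]]:
    """Return (protein id, aggregated score) pairs sorted by descending aggregated score."""
    if len(score_lists) != len(weights):
        raise ValueError("one weight is required per predictor")
    combined: Dict[str, float] = {}
    for scores, weight in zip(score_lists, weights):
        for pid, norm in min_max_normalize(scores).items():
            combined[pid] = combined.get(pid, 0.0) + weight * norm
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Toy scores from three predictors, combined with the weights reported in the text
# (0.01, 0.29 and 0.7); the scores themselves are made up for illustration.
psiblast = {"p1": 10.0, "p2": 30.0, "p3": 20.0}
hmmer = {"p1": 5.0, "p2": 1.0, "p3": 3.0}
hhblits = {"p1": 90.0, "p2": 40.0, "p3": 60.0}
result = aggregate([psiblast, hmmer, hhblits], [0.01, 0.29, 0.7])
print([pid for pid, _ in result])
```

In this toy case the lowest-weighted predictor prefers p2, but the two higher-weighted predictors agree on p1, so p1 ends up first in the aggregated list.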

Evaluation method of performance

The jackknife (leave-one-out) test was employed in remote homology detection, as it is deemed the most objective cross-validation approach³. ROC1 and ROC50 scores were used to evaluate the performance of the various predictors⁴. They represent the area under the ROC curve³⁵ up to the first and the fiftieth false positive, respectively. A larger score means a better performance.
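
A ROC-n score of this kind can be computed from a single ranking as sketched below. The normalization used here (dividing by n times the number of true homologs, so that a perfect ranking scores 1) is the convention commonly used in the field and is an assumption, as is the handling of rankings that contain fewer than n false positives; the text gives only the verbal definition, and the function name is hypothetical.

```python
from typing import List, Set


def roc_n(ranking: List[str], homologs: Set[str], n: int) -> float:
    """Normalized area under the ROC curve up to the n-th false positive."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not homologs:
        raise ValueError("the query needs at least one true homolog")

    true_pos = 0
    false_pos = 0
    area = 0
    for pid in ranking:
        if pid in homologs:
            true_pos += 1
        else:
            false_pos += 1
            area += true_pos  # homologs ranked above this false positive
            if false_pos == n:
                break
    # Assumption: if the list holds fewer than n false positives, the missing
    # ones are treated as ranked after every entry in the list.
    area += (n - false_pos) * true_pos
    return area / (n * len(homologs))


# Toy ranking: h* are true homologs of the query, x* are non-homologs.
ranking = ["h1", "h2", "x1", "h3", "x2"]
homologs = {"h1", "h2", "h3"}
print(roc_n(ranking, homologs, 1), roc_n(ranking, homologs, 50))
```

In the toy ranking, two of the three homologs precede the first false positive, so ROC1 is 2/3, whereas the more tolerant ROC50 is close to 1.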

Additional Information

How to cite this article: Chen, J. et al. dRHP-PseRA: detecting remote homology proteins using profile-based pseudo protein sequence and rank aggregation. Sci. Rep. 6, 32333; doi: 10.1038/srep32333 (2016).

References

1. Consortium, T. U. UniProt: a hub for protein information. Nucleic Acids Research 43, D204–D212, doi: 10.1093/nar/gku989 (2015).
2. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Research 28, 235–242, doi: 10.1093/nar/28.1.235 (2000).
3. Chou, K.-C. Some remarks on protein attribute prediction and pseudo amino acid composition. Journal of Theoretical Biology 273, 236–247, doi: http://dx.doi.org/10.1016/j.jtbi.2010.12.024 (2011).
4. Liu, B., Chen, J. & Wang, X. Protein remote homology detection by combining Chou’s distance-pair pseudo amino acid composition and principal component analysis. Molecular Genetics and Genomics 290, 1919–1931 (2015).
5. Liu, B., Wang, X., Lin, L., Dong, Q. & Wang, X. A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis. BMC Bioinformatics 9, 510 (2008).
6. Liu, B., Chen, J. & Wang, X. Application of Learning to Rank to protein remote homology detection. Bioinformatics 31, 3492–3498 (2015).
7. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic Local Alignment Search Tool. J Mol Biol 215, 403–410 (1990).
8. Pearson, W. R. Searching protein sequence libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics 11, 635–650, doi: http://dx.doi.org/10.1016/0888-7543(91)90071-L (1991).
9. Altschul, S. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389–3402 (1997).
10. Schäffer, A. A. et al. IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics 15, 1000–1011 (1999).
11. Sadreyev, R. & Grishin, N. COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J Mol Biol 326, 317–336 (2003).
12. Jaroszewski, L., Li, Z., Cai, X.-h., Weber, C. & Godzik, A. FFAS server: novel features and applications. Nucleic acids research 39, W38–W44 (2011).
13. Jaroszewski, L., Rychlewski, L., Li, Z., Li, W. & Godzik, A. FFAS03: a server for profile–profile sequence alignments. Nucleic acids research 33, W284–W288 (2005).
14. Rychlewski, L., Li, W., Jaroszewski, L. & Godzik, A. Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Science 9, 232–241 (2000).
15. Yang, Y., Faraggi, E., Zhao, H. & Zhou, Y. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 27, 2076–2082 (2011).
16. Margelevicius, M. & Venclovas, C. Detection of distant evolutionary relationships between protein families using theory of sequence profile-profile comparison. BMC Bioinformatics 11, 89 (2010).
17. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Research 39, W29–W37, doi: 10.1093/nar/gkr367 (2011).
18. Remmert, M., Biegert, A., Hauser, A. & Soding, J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Meth 9, 173–175, doi: http://www.nature.com/nmeth/journal/v9/n2/abs/nmeth.1818.html#supplementary-information (2012).
19. Breiman, L. Random forests. Machine learning 45, 5–32 (2001).
20. Hagan, M. T., Demuth, H. B., Beale, M. H. & De Jesús, O. Neural network design. Vol. 20 (PWS publishing company Boston, 1996).
21. Cortes, C. & Vapnik, V. Support-vector networks. Machine learning 20, 273–297 (1995).
22. Liu, X., Zhao, L. & Dong, Q. Protein remote homology detection based on auto-cross covariance transformation. Computers in biology and medicine 41, 640–647, doi: 10.1016/j.compbiomed.2011.05.015 (2011).
23. Jaakkola, T., Diekhans, M. & Haussler, D. A Discriminative Framework for Detecting Remote Protein Homologies. J. Comput Biol. 7, 95–114 (2000).
24. Liu, B. et al. Using distances between Top-n-gram and residue pairs for protein remote homology detection. BMC Bioinformatics 15, S3 (2014).
25. Saigo, H., Vert, J. P., Ueda, N. & Akutsu, T. Protein Homology Detection Using String Alignment Kernels. Bioinformatics 20, 1682–1689 (2004).
26. Dong, Q. W., Wang, X. L. & Lin, L. Application of Latent Semantic Analysis to Protein Remote Homology Detection. Bioinformatics 22, 285–290 (2006).
27. Muh, H. C., Tong, J. C. & Tammi, M. T. AllerHunter: a SVM-pairwise system for assessment of allergenicity and allergic cross-reactivity in proteins. PLoS One 4, e5861 (2009).
28. Liu, B., Wang, X., Chen, Q., Dong, Q. & Lan, X. Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection. PLoS ONE 7, e46633 (2012).
29. Franceschet, M. PageRank: Standing on the shoulders of giants. Communications of the ACM 54, 92–101 (2011).
30. Melvin, I., Weston, J., Leslie, C. & Noble, W. S. RANKPROP: a web server for protein remote homology detection. Bioinformatics 25, 121–122 (2009).
31. Melvin, I., Weston, J., Noble, W. S. & Leslie, C. Detecting remote evolutionary relationships among proteins by large-scale semantic embedding. PLoS computational biology 7, e1001047 (2011).
32. Chandonia, J. et al. The ASTRAL Compendium in 2004. Nucleic Acids Res. D189–D192 (2004).
33. Liu, B. et al. Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics 30, 472–479 (2014).
34. Liu, B., Chen, J. & Wang, S. Protein Remote Homology Detection by Combining Pseudo Dimer Composition with an Ensemble Learning Method. Current Proteomics 13, 86–91 (2016).
35. Chen, J., Wang, S. & Liu, B. iMiRNA-SSF: Improving the Identification of MicroRNA Precursors by Combining Negative Sets with Different Distributions. Scientific Reports 6, 19062 (2016).

Acknowledgements

This work was supported by National High Technology Research and Development Program of China (863 Program) [2015AA015405], the National Natural Science Foundation of China (No. 61300112, 61573118 and 61272383), the Natural Science Foundation of Guangdong Province (2014A030313695), Guangdong Natural Science Funds for Distinguished Young Scholars (2016A030306008), and Scientific Research Foundation in Shenzhen (Grant No. JCYJ20150626110425228).

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, 518055, Guangdong, China
Junjie Chen, Ren Long, Xiao-long Wang & Bin Liu
Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, 518055, Guangdong, China
Xiao-long Wang & Bin Liu
Gordon Life Science Institute, Boston, 02478, MA, USA
Bin Liu & Kuo-Chen Chou
Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
Kuo-Chen Chou

Contributions

B.L. conceived of the study. J.C. and R.L. carried out the protein remote homology detection study, participated in designing the study, coding the experiments, drafting the manuscript and performing the statistical analysis. X.-L.W. participated in performing the statistical analysis. K.-C.C. and B.L. participated in drafting the manuscript. Informed consent was obtained from all individual participants included in the study.

Corresponding author

Correspondence to Bin Liu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information (PDF 1696 kb)

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Cite this article

Chen, J., Long, R., Wang, Xl. et al. dRHP-PseRA: detecting remote homology proteins using profile-based pseudo protein sequence and rank aggregation. Sci Rep 6, 32333 (2016). https://doi.org/10.1038/srep32333

Received: 17 March 2016
Accepted: 04 August 2016
Published: 01 September 2016
DOI: https://doi.org/10.1038/srep32333
Springer Nature Limited
