iRSpot-DACC: a computational predictor for recombination hot/cold spots identification based on dinucleotide-based auto-cross covariance

Liu, Bingquan; Liu, Yumeng; Jin, Xiaopeng; Wang, Xiaolong; Liu, Bin

doi:10.1038/srep33483

Article
Open access
Published: 19 September 2016

Volume 6, article number 33483, (2016)

Scientific Reports

Abstract

Meiotic recombination is unevenly distributed across the genome. Genomic regions that exhibit relatively high frequencies of recombination are called hotspots, whereas those with relatively low frequencies are called coldspots. Hotspots and coldspots therefore provide useful information for studying the mechanism of recombination. In this study, we propose a computational predictor called iRSpot-DACC to predict hot/cold spots across the yeast genome. It combines Support Vector Machines (SVMs) with a feature called dinucleotide-based auto-cross covariance (DACC), which incorporates global sequence-order information and fifteen local DNA properties into the predictor. Combining it with Principal Component Analysis (PCA) further improves its performance. Experimental results on a benchmark dataset show that iRSpot-DACC achieves an accuracy of 82.7%, outperforming some highly related methods.

Introduction

Meiotic recombination is the process by which alleles are exchanged between homologous chromosomes during meiosis^1,2. It plays an important role in genome evolution^3,4. Because recombination produces diverse gametes, it provides material for natural selection. Moreover, recombination also influences genome evolution via gene conversion or mutagenesis^5,6.

Although the mechanism of recombination is still unclear, it is well established that recombination plays an important part in promoting genome evolution. The distribution of recombination positions has drawn much attention, and several studies have been performed on chromosomes^7,8,9. Some studies have found that recombination is unevenly distributed across the genome. Genomic regions that exhibit relatively high frequencies of recombination are called hotspots, while those with relatively low frequencies are called coldspots^10,11. With the rapid development of sequencing technology, the number of sequenced genomes is growing explosively. Therefore, it is necessary to develop reliable computational methods for the identification of recombination spots.

Although a great deal of recombination information can be acquired from experiments, identifying recombination hot/cold spots from DNA sequence information alone is still a challenging task. Recently, several models have been proposed to predict recombination hotspots and coldspots. For example, Zhou et al.¹² proposed an SVM-based method using codon composition to distinguish hotspots from coldspots. Later, Jiang et al.¹³ employed a Random Forest classifier trained with gapped dinucleotide composition features to distinguish hotspots from coldspots in Saccharomyces cerevisiae. Guo et al.¹⁴ proposed an SVM model based on DNA physical properties to predict hot/cold spots in yeast. Combining increment of diversity with quadratic discriminant analysis (IDQD), Liu et al.¹ presented a model based on k-mer frequencies in DNA sequences. Wu et al.¹⁵ proposed an SVM model based on genomic and epigenomic features to predict meiotic recombination hotspots in human and mouse. Chen et al.¹⁶ presented an SVM model based on pseudo dinucleotide composition. Wang et al.¹⁷ proposed a method based on gapped k-mers. Most of these predictors consider only local sequence-order information and take little global sequence-order information into account. However, global sequence-order information has shown strong discriminative power in many bioinformatics tasks and should therefore be incorporated into a predictor. Unfortunately, this is not easy, because DNA sequences differ in length.

To address this problem, a feature called dinucleotide-based auto-cross covariance (DACC)¹⁸, which incorporates the global sequence-order effects of DNA sequences into the predictor, is applied to recombination hot/cold spot identification. Combined with Support Vector Machines (SVMs), it yields a predictor called iRSpot-DACC. To further improve its performance and reduce its computational cost, Principal Component Analysis (PCA)¹⁹ is adopted. Experimental results on a benchmark dataset demonstrate that the proposed method outperforms some highly related models, including IDQD¹ and iRSpot-PseDNC¹⁶.

Results

Influence of parameters on the predictive performance of iRSpot-DACC

iRSpot-DACC has one parameter, lag (the distance between two dinucleotides), that affects its predictive performance. In the current study, lag is optimized via 5-fold cross validation. The influence of lag on the performance of iRSpot-DACC is shown in Fig. 1, from which we can see that the best performance is achieved when lag = 6, although this parameter has only a small impact on the performance. DACC is the combination of dinucleotide-based auto covariance (DAC) and dinucleotide-based cross covariance (DCC) (cf. section Material and Methods). With this parameter setting, the lengths of the feature vectors for DAC and DCC are 15 × 6 = 90 and 15 × 14 × 6 = 1260, respectively. Therefore, the dimension of DACC is 90 + 1260 = 1350.
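The dimension arithmetic above can be checked with a short Python sketch. The function name `dacc_dimensions` is hypothetical and only restates the counting rule given in the text (N × LAG for DAC, N × (N − 1) × LAG for DCC).

```python
def dacc_dimensions(n_properties, max_lag):
    """Return (DAC length, DCC length, DACC length) for the given settings."""
    dac = n_properties * max_lag                       # one value per property and lag
    dcc = n_properties * (n_properties - 1) * max_lag  # ordered pairs of distinct properties
    return dac, dcc, dac + dcc


# Fifteen dinucleotide properties and lag = 6, as reported in the text.
print(dacc_dimensions(15, 6))  # (90, 1260, 1350)
```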

The computational performance of iRSpot-DACC can be further improved by using PCA

To further improve the predictive performance of iRSpot-DACC and reduce its computational cost, Principal Component Analysis (PCA)¹⁹ is employed.

PCA has a parameter w (cf. Eq. (18)) that affects both the predictive accuracy and the dimension of the feature vectors. Therefore, we optimize this parameter using 5-fold cross validation. The results show that iRSpot-DACC-PCA (iRSpot-DACC combined with PCA) achieves the best performance when w = 0.99. Its performance is shown in Table 1, from which we can see that iRSpot-DACC-PCA outperforms iRSpot-DACC.

Table 1 Results of different predictors on benchmark dataset.

The dimension of the iRSpot-DACC-PCA feature vector is 173, far smaller than the original dimension of iRSpot-DACC (1350). Therefore, PCA both improves the predictive accuracy and reduces the computational cost of iRSpot-DACC.

Discriminative visualization and interpretation

To further explore the discriminative power of the features and their biological meaning, we calculate the discriminative weight vector following a previous study²⁰. The feature discriminative weight vector W can be formulated as:

$$W = A \cdot M \tag{1}$$

where A is the 1 × N vector of weights of the training samples obtained from the SVM training process; M is the N × j feature matrix of the benchmark dataset used in the current study; N is the number of DNA sequences in the training dataset; and j is the dimension of the feature vector. Therefore, W is a 1 × j vector, and each element in it represents the discriminative power of the corresponding feature.
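As a minimal sketch of this weighting step, the product of a 1 × N sample-weight vector and an N × j feature matrix can be written in plain Python as follows. The function name and the toy numbers are illustrative assumptions; in practice A would come from a trained SVM.

```python
def discriminative_weights(sample_weights, feature_matrix):
    """Compute W = A . M, giving one weight per feature (column of M)."""
    n_features = len(feature_matrix[0])
    return [
        sum(a * row[col] for a, row in zip(sample_weights, feature_matrix))
        for col in range(n_features)
    ]


# Toy values for illustration only (N = 3 samples, j = 2 features).
A = [0.5, -1.0, 0.25]
M = [[1.0, 2.0],
     [0.0, 1.0],
     [4.0, 0.0]]
W = discriminative_weights(A, M)
print(W)  # [1.5, 0.0]
```

Features with a larger absolute weight in W would be read as more discriminative under this scheme.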

The feature discriminative weight vector with 1350 features (cf. section Results) is depicted in Fig. 2, in which darker spots represent stronger discriminative power than lighter spots. From Fig. 2 we can see that the top three discriminative features are DAC(2, 3), DCC(2, 8, 3) and DCC(2, 15, 1). All three features are derived from the same property, F-tilt (μ = 2), which suggests the importance of this property. The top ten discriminative features are listed in Table 2, from which several conclusions can be drawn. First, the correlations between F-roll (μ = 1) and several other properties show strong discriminative power for identifying recombination hot/cold spots. Second, the correlations between F-tilt (μ = 2) and other properties, including itself, also show strong discriminative power. Third, features for which the distance between two dinucleotides equals 1, 2, 3 or 5 are important for identifying hot/cold spots.

Table 2 The top ten most important features in iRSpot-DACC for identifying hot/cold spots.

Comparison with other related predictors

Two existing methods for hot/cold spot identification, IDQD¹ and iRSpot-PseDNC¹⁶, are compared with the proposed methods iRSpot-DACC and iRSpot-DACC-PCA. The results of the various methods on the benchmark dataset S are shown in Table 1.

According to Table 1, iRSpot-DACC outperforms both IDQD¹ and iRSpot-PseDNC¹⁶. Furthermore, iRSpot-DACC-PCA outperforms iRSpot-DACC by adopting Principal Component Analysis (PCA). The main reasons are as follows: IDQD¹ considers only local sequence-order information, and iRSpot-PseDNC¹⁶ improves on it by incorporating global sequence-order information. iRSpot-DACC, however, not only incorporates global sequence-order information but also includes more DNA properties in the feature vectors. Therefore, we conclude that iRSpot-DACC would be a useful tool for hot/cold spot identification.

Discussion

In this study, we propose a computational method called iRSpot-DACC for yeast hot/cold spot identification. The method incorporates long-range, or global, sequence-order information. The results show that iRSpot-DACC outperforms other state-of-the-art predictors. Furthermore, iRSpot-DACC incorporates the correlations between different dinucleotide DNA properties. Another important advantage of our approach derives from PCA (principal component analysis)²¹, which not only improves the predictive accuracy but also reduces the computational cost. We expect that DACC will be a powerful feature extraction method that can be applied to other tasks in bioinformatics, such as DNA-binding protein identification²², protein fold prediction^23,24, cytokine detection^25,26, protein-protein interaction site prediction²⁷, tumor classification and analysis²⁸, etc. Moreover, since publicly accessible web servers help in developing more useful predictors, we will make efforts in our future work to provide a web server for the method proposed in this paper. Furthermore, we will apply other advanced machine learning techniques, such as deep learning and neural networks^29,30,31,32, to establish more accurate predictors for hot spot identification.

Material and Methods

Benchmark Dataset

The benchmark dataset used in this study was constructed by Jiang et al.¹³ and contains 490 hotspots and 591 coldspots. For more detailed information on this benchmark dataset, please refer to ref. 13.

Therefore, the benchmark dataset for the current study can be expressed as:

$$S = S^{+} \cup S^{-} \tag{2}$$

where S⁺ is the set of recombination hotspots, S⁻ is the set of recombination coldspots, and ∪ is the mathematical operator representing “union”.

Dinucleotide-based auto-cross covariance (DACC)

As described above, global sequence-order information shows strong discriminative power for identifying recombination hot/cold spots. Therefore, it is crucial to incorporate it into our model. To this end, a feature called dinucleotide-based auto-cross covariance (DACC)¹⁸ is adopted, which incorporates global sequence-order information along DNA sequences. DACC is the combination of dinucleotide-based auto covariance (DAC) and dinucleotide-based cross covariance (DCC). Next, we introduce DAC and DCC in turn.

Given a DNA sequence D:

$$D = R_1 R_2 R_3 \cdots R_L \tag{3}$$

where L is the length of the DNA sequence, R₁ is the nucleic acid residue at the first position in the sequence, R₂ is the nucleic acid residue at the second position, and so forth.

The DAC^18,33,34 represents the correlation of one DNA local property between two dinucleotides at a distance of lag in the sequence. DAC can be calculated by:

and

where μ is the index of the dinucleotide local property; L is the length of the DNA sequence; P_μ(R_iR_i+1) is the value of the dinucleotide R_iR_i+1 at position i for the local property index μ; and P̄_μ is the average value of P_μ(R_iR_i+1) over the DNA sequence, which can be calculated as:

$$\bar{P}_\mu = \frac{1}{L-1} \sum_{j=1}^{L-1} P_\mu(R_j R_{j+1}) \tag{6}$$

In this way, the length of the DAC feature vector is N × LAG, where N is the number of dinucleotide properties used in this study and LAG is the maximum value of lag.
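A minimal Python sketch of a DAC computation is given here for orientation. It assumes the standard auto-covariance form: mean-centred products of one property at dinucleotide positions i and i + lag, averaged over the L − lag − 1 available pairs. That form, the toy property tables (stand-ins for Table 3, not the real physicochemical values), the property-major ordering of the output and all function names are assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]

# Toy stand-ins for the dinucleotide property table (NOT the values of Table 3).
TOY_PROPERTIES = [
    {d: float(sum(b in "GC" for b in d)) for d in DINUCLEOTIDES},  # G/C count
    {d: float(sum(b in "AG" for b in d)) for d in DINUCLEOTIDES},  # purine count
]


def property_profile(seq, prop):
    """Property value of each of the L - 1 overlapping dinucleotides."""
    return [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]


def dac(seq, prop, lag):
    """Auto covariance of one property between dinucleotides lag apart."""
    values = property_profile(seq, prop)
    mean = sum(values) / len(values)
    n_pairs = len(seq) - lag - 1
    if n_pairs < 1:
        raise ValueError("sequence too short for this lag")
    total = sum((values[i] - mean) * (values[i + lag] - mean)
                for i in range(n_pairs))
    return total / n_pairs


def dac_vector(seq, properties, max_lag):
    """Concatenate DAC over all properties and lags (length N * LAG)."""
    return [dac(seq, p, lag)
            for p in properties
            for lag in range(1, max_lag + 1)]


features = dac_vector("ACGTACGTAC", TOY_PROPERTIES, max_lag=3)
print(len(features))  # 6 = 2 properties x 3 lags
```

Because each sequence is reduced to N × LAG numbers regardless of its length, sequences of different lengths map to feature vectors of the same size.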

The DCC^33,34,35 measures the correlation of two different properties between two dinucleotides separated by lag nucleic acid residues in the DNA sequence. DCC can be calculated by using the following equation:

and

where μ₁ and μ₂ are two different property indices; L is the length of the DNA sequence; P_μ₁(R_iR_i+1) (P_μ₂(R_iR_i+1)) is the numerical value of the dinucleotide R_iR_i+1 at position i for the property index μ₁ (μ₂); and P̄_μ₁ (P̄_μ₂) is the average value for property index μ₁ (μ₂) along the whole sequence, which has the same form as Eq. (6). In this way, the length of the DCC feature vector is N × (N − 1) × LAG, where N is the number of dinucleotide properties used in this study and LAG is the maximum value of lag. The processes for generating the feature vectors of DAC and DCC are presented in Fig. 3(a,b), respectively.
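The cross-covariance counterpart can be sketched in the same way. The sketch assumes the standard form in which the mean-centred value of property μ₁ at position i is multiplied by the mean-centred value of property μ₂ at position i + lag and averaged over the L − lag − 1 pairs. The toy property tables (not the values of Table 3), the ordering of the output vector and the function names are illustrative assumptions.

```python
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]

# Toy stand-ins for three dinucleotide properties (NOT the values of Table 3).
TOY_PROPERTIES = [
    {d: float(sum(b in "GC" for b in d)) for d in DINUCLEOTIDES},  # G/C count
    {d: float(sum(b in "AG" for b in d)) for d in DINUCLEOTIDES},  # purine count
    {d: float(d.count("T")) for d in DINUCLEOTIDES},               # T count
]


def property_profile(seq, prop):
    """Property value of each of the L - 1 overlapping dinucleotides."""
    return [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]


def dcc(seq, prop_a, prop_b, lag):
    """Cross covariance of two properties between dinucleotides lag apart."""
    a = property_profile(seq, prop_a)
    b = property_profile(seq, prop_b)
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    n_pairs = len(seq) - lag - 1
    if n_pairs < 1:
        raise ValueError("sequence too short for this lag")
    total = sum((a[i] - mean_a) * (b[i + lag] - mean_b)
                for i in range(n_pairs))
    return total / n_pairs


def dcc_vector(seq, properties, max_lag):
    """Concatenate DCC over ordered pairs of distinct properties and all lags."""
    return [dcc(seq, pa, pb, lag)
            for i, pa in enumerate(properties)
            for j, pb in enumerate(properties) if i != j
            for lag in range(1, max_lag + 1)]


features = dcc_vector("ACGTACGTAC", TOY_PROPERTIES, max_lag=2)
print(len(features))  # 12 = 3 * (3 - 1) * 2
```

Ordered pairs are used because DCC(μ₁, μ₂, lag) and DCC(μ₂, μ₁, lag) generally differ, which is why the count is N × (N − 1) rather than N × (N − 1) / 2.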

In this study, fifteen properties taken from a previous study³⁶ are used. Their values are listed in Table 3.

Table 3 The values of the fifteen DNA dinucleotide properties.

Support vector machine (SVM)

Support Vector Machine (SVM) is a pattern recognition technique introduced by Vapnik³⁷, which has been employed for many computational tasks in bioinformatics^38,39,40,41. It achieves classification by transforming the original feature space into a high-dimensional vector space and seeking an optimal separating hyperplane there.

In the current study, the ANACONDA package (http://www.continuum.io/), which contains an implementation of SVM, is adopted. The selected kernel function is the radial basis function (RBF), which is defined as:

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right) \tag{9}$$
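For illustration, the RBF kernel in its standard form, K(x, y) = exp(−γ‖x − y‖²), can be sketched in plain Python. The function name `rbf_kernel` and the sample vectors are hypothetical; this is not the SVM implementation used in the study.

```python
import math


def rbf_kernel(x, y, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Identical vectors give 1.0; the value decays towards 0 as vectors move apart.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))  # 1.0
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))
```

A larger γ makes the kernel decay faster with distance, which is why γ is tuned together with the regularization parameter C.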

Two parameters, the regularization parameter C and the kernel width parameter γ, are optimized on the dataset by using a grid tool provided by ANACONDA. In the current study, the values of the two parameters are shown below:

Principal Component Analysis (PCA)

Feature selection can remove noise and thereby improve classification performance⁴². To reduce redundant information, in this study we adopt Principal Component Analysis (PCA)¹⁹ to reduce the dimension of the original feature vectors. PCA reduces the dimension by projecting the feature space onto a smaller subspace that still represents the dataset well.

Suppose the original feature space of iRSpot-DACC can be represented as:

where N is the number of training samples and k is the dimension of the feature vectors. Then, the averages for every dimension of X can be expressed as:

where N and k have the same meaning as in Eq. (11). Therefore, the matrix composed of the mean vectors for every dimension of X can be represented as:

where e_ij represents the element of X and can be acquired from Eq. (12).

Then, the covariance matrix and its eigenvalues can be calculated and the eigenvalues can be represented as:

Next, the l eigenvectors with the largest eigenvalues are chosen to form a matrix, which can be represented as:

where each column represents an eigenvector and their corresponding eigenvalues can be represented as:

where . Finally, the new subspace M can be calculated by

Therefore, the dimension of the feature space is reduced from k to l. The values of k and l have been discussed in section Results.

The selection of principal components is based on the cumulative weight ratio w:

The values of w and l have been discussed in section Results.
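The selection step can be sketched as follows, assuming that the cumulative weight ratio is the sum of the l largest eigenvalues divided by the sum of all k eigenvalues and that l is the smallest value for which this ratio reaches w. Both the rule and the function name are assumptions for illustration; the eigenvalues in the example are made up.

```python
def n_components_for_ratio(eigenvalues, w):
    """Smallest l whose top-l eigenvalues cover at least a fraction w of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for l, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= w:
            return l
    return len(ordered)


# Made-up eigenvalues of a covariance matrix (k = 4).
toy_eigenvalues = [5.0, 3.0, 1.5, 0.5]
print(n_components_for_ratio(toy_eigenvalues, 0.75))  # 2
print(n_components_for_ratio(toy_eigenvalues, 0.99))  # 4
```

Under this reading, a value of w close to 1 keeps nearly all of the variance while still discarding directions whose eigenvalues are negligible.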

Jackknife test

In statistical prediction, three cross-validation methods, namely the independent dataset test, the sub-sampling (or K-fold cross-validation) test and the jackknife test, are often used to measure the performance of a predictor^43,44,45. Among the three, the jackknife test is deemed the most objective, which has led researchers to adopt it widely for evaluating the performance of various classifiers. Therefore, in the current study, the jackknife test is also adopted to measure the performance of iRSpot-DACC and iRSpot-DACC-PCA. In the jackknife test, each sequence in the benchmark dataset is in turn singled out as the test sample, and the predictor is trained on all remaining samples.
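The jackknife (leave-one-out) procedure can be sketched generically in Python. The nearest-centroid classifier below is only a stand-in so that the example runs without external libraries; the study itself uses an SVM, and all function names and toy data here are hypothetical.

```python
def jackknife(samples, labels, train, predict):
    """Predict each sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        predictions.append(predict(model, samples[i]))
    return predictions


# Stand-in classifier on 1-D features (an assumption, not the paper's SVM).
def train_centroids(xs, ys):
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def predict_nearest(centroids, x):
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Toy data: label 1 for "hotspot-like" values, 0 for "coldspot-like" values.
xs = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
ys = [0, 0, 0, 1, 1, 1]
preds = jackknife(xs, ys, train_centroids, predict_nearest)
print(preds)  # [0, 0, 0, 1, 1, 1]
```

With a benchmark of 490 hotspots and 591 coldspots, this loop would train 1081 models, one per held-out sequence.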

Criteria for performance evaluation

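Sensitivity (Se), Specificity (Sp), Accuracy (Acc), and Matthew’s Correlation Coefficient (Mcc)⁴⁶ are used to evaluate the performance of different methods. They are defined as follows:

$$
\begin{cases}
Se = \dfrac{TP}{TP + FN} \\[2mm]
Sp = \dfrac{TN}{TN + FP} \\[2mm]
Acc = \dfrac{TP + TN}{TP + FP + TN + FN} \\[2mm]
Mcc = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{cases} \tag{19}
$$

where TP, FP, TN and FN represent the numbers of true positives, false positives, true negatives and false negatives, respectively.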

Additional Information

How to cite this article: Liu, B. et al. iRSpot-DACC: a computational predictor for recombination hot/cold spots identification based on dinucleotide-based auto-cross covariance. Sci. Rep. 6, 33483; doi: 10.1038/srep33483 (2016).

References

1. Liu, G., Liu, J., Cui, X. & Cai, L. Sequence-dependent prediction of recombination hotspots in Saccharomyces cerevisiae. Journal of theoretical biology 293, 49–54 (2012).
2. Lynn, A., Ashley, T. & Hassold, T. Variation in human meiotic recombination. Annu. Rev. Genomics Hum. Genet. 5, 317–349 (2004).
3. Lewin, B. Genes VIII. 8th. 428–456 (New Jersey: Pearson/Prentice-Hall, Upper Saddle River, 2004).
4. Spencer, C. C. et al. The influence of recombination on human genetic diversity. PLoS Genet 2, e148 (2006).
5. Galtier, N., Piganeau, G., Mouchiroud, D. & Duret, L. GC-Content Evolution in Mammalian Genomes: The Biased Gene Conversion Hypothesis. Genetics 159, 907–911 (2001).
6. Lercher, M. J. & Hurst, L. D. Human SNP variability and mutation rate are higher in regions of high recombination. Trends in genetics 18, 337–340 (2002).
7. Baudat, F. & Nicolas, A. Clustering of meiotic double-strand breaks on yeast chromosome III. Proceedings of the National Academy of Sciences 94, 5213–5218 (1997).
8. Klein, S. et al. Patterns of meiotic double-strand breakage on native and artificial yeast chromosomes. Chromosoma 105, 276–284 (1996).
9. Liu, B., Wang, S., Long, R. & Chou, K.-C. iRSpot-EL: identify recombination spots with an ensemble learning approach. Bioinformatics, doi: 10.1093/bioinformatics/btw539 (2016).
10. Mancera, E., Bourgon, R., Brozzi, A., Huber, W. & Steinmetz, L. M. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454, 479–485 (2008).
11. Gerton, J. L. et al. Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences 97, 11383–11390 (2000).
12. Zhou, T., Weng, J., Sun, X. & Lu, Z. Support vector machine for classification of meiotic recombination hotspots and coldspots in Saccharomyces cerevisiae based on codon composition. BMC Bioinformatics 7, 223 (2006).
13. Jiang, P. et al. RF-DYMHC: detecting the yeast meiotic recombination hotspots and coldspots by random forest model using gapped dinucleotide composition features. Nucleic Acids Research 35, W47–W51 (2007).
14. Guo, S.-H., Xu, L.-Q., Chen, W., Liu, G.-Q. & Lin, H. Recombination spots prediction using DNA physical properties in the saccharomyces cerevisiae genome. AIP Conference Proceedings 1479, 1556–1559 (2012).
15. Wu, M., Kwoh, C. K., Przytycka, T. M., Li, J. & Zheng, J. In Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine 297–304 (ACM, Orlando, Florida, 2012).
16. Chen, W., Feng, P.-M., Lin, H. & Chou, K.-C. iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition. Nucleic acids research, gks1450 (2013).
17. Wang, R., Xu, Y. & Liu, B. Recombination spot identification Based on gapped k-mers. Scientific reports 6 (2016).
18. Liu, B. et al. Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic acids research 43, W65–W71 (2015).
19. Liu, B., Chen, J. & Wang, X. Protein remote homology detection by combining Chou’s distance-pair pseudo amino acid composition and principal component analysis. Molecular Genetics and Genomics 290, 1919–1931 (2015).
20. Liu, B. et al. Identification of microRNA precursor with the degenerate K-tuple or Kmer strategy. Journal of theoretical biology 385, 153–159 (2015).
21. Pearson, K. On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2, 559–572 (1901).
22. Song, L. et al. nDNA-prot: identification of DNA-binding proteins based on unbalanced classification. BMC bioinformatics 15, 1 (2014).
23. Wei, L., Liao, M., Gao, X. & Zou, Q. Enhanced protein fold prediction method through a novel feature extraction technique. IEEE transactions on nanobioscience 14, 649–659 (2015).
24. Zhao, X., Zou, Q., Liu, B. & Liu, X. Exploratory predicting protein folding model with random forest and hybrid features. Current Proteomics 11, 289–299 (2014).
25. Zou, Q. et al. An approach for identifying cytokines based on a novel ensemble classifier. BioMed research international 2013 (2013).
26. Zeng, X., Yuan, S., Huang, X. & Zou, Q. Identification of cytokine via an improved genetic algorithm. Frontiers of Computer Science 9, 643–651 (2015).
27. Wang, B. et al. Predicting protein interaction sites from residue spatial sequence profile and evolution rate. FEBS letters 580, 380–384 (2006).
28. Huang, D.-S. & Zheng, C.-H. Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics 22, 1855–1862 (2006).
29. Huang, D.-S. Radial basis probabilistic neural networks: model and application. International Journal of Pattern Recognition and Artificial Intelligence 13, 1083–1101 (1999).
30. Huang, D.-S. A constructive approach for finding arbitrary roots of polynomials by neural networks. IEEE Transactions on Neural Networks 15, 477–491 (2004).
31. Huang, D.-S. & Du, J.-X. A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks. IEEE Transactions on Neural Networks 19, 2099–2115 (2008).
32. Zhang, J.-R., Zhang, J., Lok, T.-M. & Lyu, M. R. A hybrid particle swarm optimization–back-propagation algorithm for feedforward neural network training. Applied Mathematics and Computation 185, 1026–1037 (2007).
33. Dong, Q., Zhou, S. & Guan, J. A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation. Bioinformatics 25, 2655–2662 (2009).
34. Chen, W. et al. PseKNC-General: A cross-platform package for generating various modes of pseudo nucleotide compositions. Bioinformatics, doi: 10.1093/bioinformatics/btu602 (2014).
35. Chen, W., Lei, T. Y., Jin, D. C., Lin, H. & Chou, K. C. PseKNC: a flexible web server for generating pseudo K-tuple nucleotide composition. Analytical biochemistry 456, 53–60 (2014).
36. Liu, G., Xing, Y. & Cai, L. Using weighted features to predict recombination hotspots in Saccharomyces cerevisiae. Journal of theoretical biology 382, 15–22 (2015).
37. Vapnik, V. N. & Vapnik, V. Statistical learning theory. Vol. 1 (Wiley, New York, 1998).
38. Liu, B., Wang, S., Dong, Q., Li, S. & Liu, X. Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning. IEEE Transactions on NanoBioscience, doi: 10.1109/TNB.2016.2555951 (2016).
39. Zou, Q., Mao, Y., Hu, L., Wu, Y. & Ji, Z. miRClassify: an advanced web server for miRNA family classification and annotation. Comput Biol Med 45, 157–160 (2014).
40. Dapeng, L., Ying, J. & Quan, Z. Protein Folds Prediction with Hierarchical Structured SVM. Current Proteomics 13, 79–85 (2016).
41. Chen, W. & Lin, H. Prediction of midbody, centrosome and kinetochore proteins based on gene ontology information. Biochemical and biophysical research communications 401, 382–384 (2010).
42. Zou, Q., Zeng, J., Cao, L. & Ji, R. A novel features ranking metric with application to scalable visual and bioinformatics data classification. Neurocomputing 173, 346–354 (2016).
43. Chen, W., Tran, H., Liang, Z., Lin, H. & Zhang, L. Identification and analysis of the N(6)-methyladenosine in the Saccharomyces cerevisiae transcriptome. Sci Rep 5, 13859 (2015).
44. Liu, B., Fang, L., Long, R., Lan, X. & Chou, K.-C. iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition. Bioinformatics 32, 362–369 (2016).
45. Chen, W., Feng, P., Ding, H., Lin, H. & Chou, K.-C. iRNA-methyl: identifying N 6-methyladenosine sites using pseudo nucleotide composition. Analytical biochemistry 490, 26–33 (2015).
46. Chen, J., Wang, X. & Liu, B. iMiRNA-SSF: improving the identification of MicroRNA precursors by combining negative sets with different distributions. Scientific reports 6 (2016).
47. Chang, C.-C. & Lin, C.-J. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2, 27 (2011).

Acknowledgements

This work was supported by the National High Technology Research and Development Program of China (863 Program) [2015AA015405], the National Natural Science Foundation of China (No. 61300112, 61672184, 61573118 and 61272383, 61572151), the Natural Science Foundation of Guangdong Province (2014A030313695), Guangdong Natural Science Funds for Distinguished Young Scholars (2016A030306008), and Scientific Research Foundation in Shenzhen (Grant No. JCYJ20150626110425228).

Author information

Liu Bingquan and Liu Yumeng contributed equally to this work.

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150080, Heilongjiang, China
Bingquan Liu
School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, 518055, Guangdong, China
Yumeng Liu, Xiaolong Wang & Bin Liu
School of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
Xiaopeng Jin
Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, 518055, Guangdong, China
Xiaolong Wang & Bin Liu

Contributions

B.L. conceived of the study and designed the experiments, participated in designing the study, drafting the manuscript and performing the statistical analysis. Y.L. participated in coding the experiments and drafting the manuscript. B.Q.L., X.P.J. and X.L.W. participated in performing the statistical analysis. All authors read and approved the final manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Received: 17 March 2016
Accepted: 25 August 2016
