Abstract
New approaches to a detailed similarity/dissimilarity analysis of DNA sequences are formulated. Several corrections are proposed that enrich the information obtainable from alignment methods. The corrections take into account the distributions of the aligned bases along the sequences, which standard alignment methods neglect. As a consequence, different aspects of similarity, for example the asymmetry of the gene structure, may be studied either with new similarity measures associated with the four-component spectral representation of DNA sequences or with alignment methods that include the corrections introduced in this paper. Both the corrections to the alignment methods and the statistical distribution moment-based descriptors derived from the four-component spectral representation are applied to similarity/dissimilarity studies of the β-globin gene across species. These studies are supplemented by detailed similarity studies of the histone H1 and H4 coding sequences. The data are described according to the latest version of the EMBL database. The work also includes a concise review of state-of-the-art graphical representations of DNA sequences.
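To make the idea of distribution moment-based descriptors concrete, the following is a minimal sketch and not the construction used in the paper. It assumes that each of the four bases is described by the distribution of its positions along the sequence, and that the descriptors are the low-order moments of that distribution (mean, variance, skewness). The function name `base_position_moments` and the scaling of positions to the interval (0, 1] are illustrative assumptions.

```python
from statistics import fmean


def base_position_moments(seq):
    """Hypothetical helper: moments of the position distribution of each base.

    Returns {base: (count, mean, variance, skewness)}. Positions are 1-based
    and divided by the sequence length (an assumption made here so that
    sequences of different lengths can be compared).
    """
    seq = seq.upper()
    n = len(seq)
    result = {}
    for base in "ACGT":
        # Normalized positions at which this base occurs.
        xs = [(i + 1) / n for i, b in enumerate(seq) if b == base]
        if not xs:
            # Base absent: no distribution, so no moments.
            result[base] = (0, None, None, None)
            continue
        mean = fmean(xs)
        # Population (not sample) central moments.
        var = fmean([(x - mean) ** 2 for x in xs])
        m3 = fmean([(x - mean) ** 3 for x in xs])
        # Skewness is undefined for zero variance; report 0.0 in that case.
        skew = m3 / var ** 1.5 if var > 0 else 0.0
        result[base] = (len(xs), mean, var, skew)
    return result


if __name__ == "__main__":
    # Toy sequences, not real gene data.
    for name, s in [("seq1", "ATGGTGCACCTG"), ("seq2", "ATGGTGCATCTG")]:
        print(name, base_position_moments(s))
```

Under these assumptions, two sequences with identical base composition but different arrangements of the bases yield different means, variances, or skewness values, which is the kind of positional information that a plain count of aligned bases does not carry.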
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Bielińska-Wąż, D. Graphical and numerical representations of DNA sequences: statistical aspects of similarity. J Math Chem 49, 2345–2407 (2011). https://doi.org/10.1007/s10910-011-9890-8
DOI: https://doi.org/10.1007/s10910-011-9890-8