Abstract
Transcriptome reconstruction is an important application of RNA-Seq, providing critical information for downstream analysis of the transcriptome. Although RNA-Seq offers the potential to capture the complete picture of the transcriptome, it also presents particular challenges. To address these difficulties and reconstruct the transcriptome as completely as possible, current computational approaches mainly employ two strategies: de novo assembly and genome-guided assembly. To identify the similarities and differences between these strategies, we first chose five representative assemblers belonging to the two classes. We then compared their algorithmic features in theory and their performance in practice. We found that all of the methods can be reduced to graph reduction problems, yet they differ in their conceptual and practical implementations. Consequently, each assembler has specific advantages and disadvantages, performing worse than the others in some respects while outperforming them in others. Finally, we merged the assemblies produced by the five assemblers and obtained a much better assembly. Additionally, we evaluated an assembler that uses a genome-guided de novo assembly approach, and it achieved good performance. Based on these results, we suggest that combining de novo assembly with genome-guided assembly is the better way to obtain a comprehensive set of recovered transcripts.
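To make the phrase "graph reduction" concrete for the de novo side, the following is a minimal illustrative sketch. It is not the implementation used by any of the assemblers compared here. It builds a simple de Bruijn graph in which nodes are (k-1)-mers and each k-mer from a read adds one edge. It then reduces the graph by collapsing maximal non-branching paths into contigs (unitigs). The function names `build_de_bruijn` and `compact_unitigs` are hypothetical. The sketch assumes error-free, single-strand reads and ignores reverse complements, coverage, and the isoform-level path search that real transcriptome assemblers perform. Genome-guided assemblers build different graphs from read alignments, and this sketch does not cover them.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Hypothetical helper: map each (k-1)-mer to the set of (k-1)-mers following it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def compact_unitigs(graph):
    """Hypothetical helper: collapse maximal non-branching paths into contig strings.

    Simplification: isolated cycles made only of 1-in/1-out nodes are not reported.
    """
    indeg = defaultdict(int)
    for dsts in graph.values():
        for dst in dsts:
            indeg[dst] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(node):
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if one_in_one_out(node):
            continue  # interior node; absorbed into a path started elsewhere
        for nxt in sorted(graph.get(node, ())):
            path = node + nxt[-1]
            # Extend while the next node neither branches nor merges.
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return contigs


# Two toy "isoforms" sharing a prefix and a suffix produce a branch (a bubble).
print(compact_unitigs(build_de_bruijn(["AAGTCC", "AAGACC"], 3)))
```

In this toy example, the shared prefix and the two alternative middles become separate contigs because the graph branches after `AG` and merges again at `CC`. Deciding which paths through such branches form real transcripts is the harder problem that the assemblers solve in different ways.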
This article is published with open access at Springerlink.com
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Lu, B., Zeng, Z. & Shi, T. Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq. Sci. China Life Sci. 56, 143–155 (2013). https://doi.org/10.1007/s11427-013-4442-z
DOI: https://doi.org/10.1007/s11427-013-4442-z