Sequencing technologies have reshaped the way we do molecular research over the last two decades. Directly sequencing the transcriptome has become a fast, cost-effective, and robust way to study gene expression. To sequence the genome or transcriptome of an organism, we have to break the sequence apart into millions of short fragments that can be read by the sequencing machine. In the absence of an appropriate reference genome, the resulting reads need to be reassembled into the original genomic or transcriptomic sequence de novo.
Many tools have been developed to solve the de novo transcriptome assembly problem, which is especially hard due to alternative splicing events, uneven coverage distributions, and repetitive regions. The big question is:
Which assembly software should you use to build a complete de novo assembly?
Martin conducted a large-scale comparative study, applying ten de novo assembly tools to nine RNA-Seq data sets spanning different kingdoms of life (bacteria, fungi, plants, animals).
And the winners are:
Trinity, rnaSPAdes, and Trans-ABySS generally outperformed all other tools tested. But: none of the tools was best on all data sets. Well-maintained tools generally performed better than the rest and should be preferred.
If you have to assemble transcriptomic sequencing data de novo, you should run several assembly tools with different parameter settings. Selecting the best result then requires a careful choice and normalization of evaluation metrics; this is a critical step in reconstructing the most complete de novo transcriptome assembly possible.
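To make the idea of normalizing evaluation metrics more concrete, here is a minimal Python sketch. It assumes a simple min-max scaling of each metric to the range 0 to 1, followed by an unweighted sum per assembly; this is one common way to combine metrics on different scales and is not claimed to be the exact scoring used in the study. The assembly names, metric names, and all numbers are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, List, Tuple


def min_max_normalize(values: List[float], higher_is_better: bool = True) -> List[float]:
    """Scale values to [0, 1]; flip the scale for metrics where lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All assemblies tie on this metric, so it cannot discriminate between them.
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_assemblies(
    metrics: Dict[str, Dict[str, float]],
    higher_is_better: Dict[str, bool],
) -> List[Tuple[str, float]]:
    """Sum the normalized metric scores per assembly and sort best-first."""
    names = list(metrics)
    totals = {name: 0.0 for name in names}
    for metric, better_high in higher_is_better.items():
        column = [metrics[name][metric] for name in names]
        for name, score in zip(names, min_max_normalize(column, better_high)):
            totals[name] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example values (NOT taken from the study).
example = {
    "assembly_A": {"mapping_rate_pct": 90, "complete_orthologs": 800, "misassemblies": 50},
    "assembly_B": {"mapping_rate_pct": 80, "complete_orthologs": 900, "misassemblies": 20},
    "assembly_C": {"mapping_rate_pct": 70, "complete_orthologs": 700, "misassemblies": 80},
}

# Direction of each metric: True means a higher value is better.
directions = {
    "mapping_rate_pct": True,
    "complete_orthologs": True,
    "misassemblies": False,
}

ranking = rank_assemblies(example, directions)
for name, score in ranking:
    print(f"{name}\t{score:.2f}")
```

Note that in this toy example the assembly with the highest read mapping rate does not end up on top once all three metrics are combined, which is why relying on a single metric can be misleading. An unweighted sum also treats every metric as equally important; weighting or dropping metrics is a design choice that depends on what the assembly will be used for.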
The study is accompanied by a comprehensive and extensible Electronic Supplement that summarizes all data sets, assembly execution instructions and evaluation results. All data (processed read data, assemblies, blast alignments, and mapping files) has been uploaded to the Open Science Framework.[bibtex file=https://raw.githubusercontent.com/rnajena/literature/master/webpage_literature.bib key=Hölzer:19]