Here, we focus on two main problems in transcriptome analysis: transcriptome assembly and quantification. Transcriptome assembly, also known as novel transcript discovery or reconstruction, is the problem of assembling full-length transcript sequences from RNA sequencing data. The ability to reconstruct full-length transcript sequences and accurately estimate their expression levels is widely believed to be critical for unraveling gene functions and transcription regulation mechanisms. Ubiquitous regulatory mechanisms such as the use of alternative transcription start and polyadenylation sites, alternative splicing, and RNA editing result in multiple messenger RNA (mRNA) isoforms being generated from a single genomic locus. Most prevalently, alternative splicing is estimated to take place in over 90% of multi-exon human genes across diverse cell types, with as many as 68% of multi-exon genes expressing multiple isoforms in a clonal cell line of colorectal cancer origin. Another challenge in transcriptomic analysis comes from ambiguities in mapping reads (or tags) to transcripts.
Most current research using RNA-Seq employs methods that depend on existing transcriptome annotations. Unfortunately, as shown by recent targeted RNA-Seq studies, existing transcript libraries still miss large numbers of transcripts. The incompleteness of annotation libraries poses a serious limitation to using this powerful technology, since accurate normalization of RNA-Seq data critically requires knowledge of the expressed transcript sequences.
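To illustrate why normalization depends on knowing which transcripts are expressed, the sketch below applies a transcripts-per-million (TPM) style length normalization. This is a common convention assumed here for illustration only; it is not necessarily the normalization used in this work. The function name `tpm` and the toy counts and lengths are hypothetical.

```python
def tpm(read_counts, lengths):
    """Length-normalize read counts into transcripts-per-million values."""
    # Reads per base: longer transcripts yield more reads at equal abundance.
    rates = {t: read_counts[t] / lengths[t] for t in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in rates}
    # Scale so that the values sum to one million.
    return {t: 1e6 * r / total for t, r in rates.items()}


counts = {"A": 200, "B": 100}      # hypothetical read counts
lengths = {"A": 2000, "B": 1000}   # hypothetical transcript lengths (bases)
normalized = tpm(counts, lengths)
```

Here A has twice as many reads as B but is also twice as long, so both receive the same normalized value. If the annotation instead listed A with a length of 1,000 bases, A would appear twice as abundant as B, which is how an incomplete or inaccurate annotation distorts the normalization.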
Massively parallel whole-transcriptome sequencing, commonly referred to as RNA-Seq, generates full transcriptome data at the single-transcript level. It thereby provides a powerful tool with multiple interrelated applications, including transcriptome assembly, gene and transcript expression level estimation (also known as transcriptome quantification), the study of trans- and cis-regulatory effects, the study of parent-of-origin effects, and the calling of expressed variants. RNA-Seq has become the technology of choice for performing transcriptome analysis, rapidly replacing array-based technologies. The Ion Torrent technology offers the fastest sequencing protocol for RNA-Seq experiments and is able to sequence a whole transcriptome in a few hours.
The sequences of novel transcripts can be reconstructed from deep RNA-Seq data, but this is computationally challenging due to sequencing errors, uneven coverage of expressed transcripts, and the need to distinguish between highly similar transcripts produced by alternative splicing. Another challenge in transcriptomic analysis comes from the ambiguities in mapping reads to transcripts.
We present MaLTA, a method for simultaneous transcriptome assembly and quantification from Ion Torrent RNA-Seq data. Our approach explores transcriptome structure and incorporates a maximum likelihood model into the assembly and quantification procedure. A new version of the IsoEM algorithm suitable for Ion Torrent RNA-Seq reads is used to accurately estimate transcript expression levels. Experimental results suggest increased transcriptome assembly and quantification accuracy of the MaLTA-IsoEM solution compared to existing state-of-the-art approaches.
Conclusions: Experimental results on both synthetic and real datasets show that Ion Torrent RNA-Seq data can be successfully used for transcriptome analyses. The MaLTA-IsoEM tool is publicly available.
High-throughput RNA sequencing (RNA-Seq) can generate whole-transcriptome information at the single-transcript level, providing a powerful tool with multiple interrelated applications, including transcriptome reconstruction and quantification.
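The read-mapping ambiguity and maximum likelihood quantification discussed above can be illustrated with a small expectation-maximization (EM) loop. This is a minimal sketch under simplifying assumptions (uniform read start positions, no fragment-length or sequencing-error model). It is not the IsoEM or MaLTA implementation; the function `em_frequencies`, its parameters, and the toy data are hypothetical.

```python
def em_frequencies(read_compat, lengths, n_iter=500, tol=1e-9):
    """Estimate relative transcript frequencies from ambiguously mapped reads.

    read_compat: list of lists; each inner list holds the ids of the
                 transcripts one read is compatible with.
    lengths:     dict mapping transcript id -> (effective) length in bases.
    """
    transcripts = sorted(lengths)
    # Start from uniform transcript frequencies.
    freq = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read among its compatible transcripts in
        # proportion to the current frequency estimates.
        expected = dict.fromkeys(transcripts, 0.0)
        for compat in read_compat:
            total = sum(freq[t] for t in compat)
            if total == 0.0:
                continue  # no compatible transcript has support
            for t in compat:
                expected[t] += freq[t] / total
        # M-step: convert expected read counts to per-base rates and
        # renormalize so that the frequencies sum to one.
        rates = {t: expected[t] / lengths[t] for t in transcripts}
        norm = sum(rates.values())
        if norm == 0.0:
            break  # no usable reads; keep the current estimate
        new_freq = {t: rates[t] / norm for t in transcripts}
        change = max(abs(new_freq[t] - freq[t]) for t in transcripts)
        freq = new_freq
        if change < tol:
            break
    return freq


# Toy data: 30 reads unique to A, 10 unique to B, 20 compatible with both.
reads = [["A"]] * 30 + [["B"]] * 10 + [["A", "B"]] * 20
estimate = em_frequencies(reads, {"A": 1000, "B": 1000})
```

On this toy input the estimate converges toward 0.75 for A and 0.25 for B, so the 20 ambiguous reads end up split roughly 3:1 in favor of A, matching the ratio of uniquely mapped reads. A production tool would additionally weight each read-transcript pair by alignment quality and fragment-length probabilities.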