
RNA Assembly Optimization #


Category
Analysis

Introduction: #

Next-generation sequencing has made it possible to perform differential gene expression studies in non-model organisms. In these studies the reference is usually a transcriptome assembled de novo from the RNA-seq data itself. However, transcriptome assembly produces a multitude of contigs: several isoforms of the same gene, as well as contigs created by sequencing artifacts and assembler artifacts. These redundant contigs can lead to false predictions in downstream analysis, so the assembled contigs should be clustered into gene-level groups before differential gene expression testing. Corset was developed for this purpose: it clusters contigs based on the reads mapped to them in multiple samples, and reports read counts for each cluster.


Clustering: #

Corset is the best method for clustering assembled contigs when compared with CAP3 and CD-HIT-EST, because it clusters contigs using the reads mapped to them rather than sequence similarity alone.
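The core idea of read-based clustering is that contigs sharing multi-mapping reads probably come from the same gene. The sketch below only finds groups of contigs connected by shared reads (a union-find over "read, contig" pairs); it is a simplified illustration, not Corset's algorithm, which additionally uses expression differences between samples to split such groups. The function name `shared_read_components` and its two-column input format are hypothetical.

```shell
#!/usr/bin/env bash
# shared_read_components: hypothetical helper, a simplified illustration
# of grouping contigs by shared reads. This is NOT the Corset algorithm.
# Input (stdin):  "read_id<TAB>contig_id", one line per alignment.
# Output:         "contig_id<TAB>group_id", sorted by contig ID.
shared_read_components() {
  awk -F'\t' '
    # Find the root of the group containing x, halving the path as we go.
    function find(x) {
      while (parent[x] != x) {
        parent[x] = parent[parent[x]]
        x = parent[x]
      }
      return x
    }
    # Put the groups of a and b together.
    function merge(a, b,   ra, rb) {
      ra = find(a); rb = find(b)
      if (ra != rb) parent[rb] = ra
    }
    {
      read = $1; contig = $2
      if (!(contig in parent)) parent[contig] = contig
      # Every contig hit by the same read joins the group of the first one.
      if (read in first) merge(first[read], contig)
      else first[read] = contig
    }
    END {
      for (c in parent) print c "\t" find(c)
    }
  ' | sort
}
```

Such pairs could be produced from a BAM file with `samtools view -F 4 sample1.bam | cut -f1,3`, which drops unmapped reads and keeps the read name and reference (contig) columns.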

Commands: #

Step 1: #

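Map the reads of each sample to the assembled transcriptome, producing one BAM file per sample. Keep reads that map to more than one contig, because Corset uses the reads shared between contigs to cluster them.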
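Step 1 can be sketched with Bowtie2 and samtools as below. This is a minimal example under assumptions not stated on this page: single-end reads in `<sample>.fastq`, the assembly at `../All.fasta` (the file used in Step 3), the index name `All_index`, and `-k 40` to keep up to 40 alignments per read. Any aligner that reports multi-mapping reads could be used instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions (hypothetical, adjust to your data): single-end reads in
# <sample>.fastq, the assembly at ../All.fasta, 8 threads.
ASSEMBLY="${ASSEMBLY:-../All.fasta}"
INDEX="${INDEX:-All_index}"
THREADS="${THREADS:-8}"

# Build the Bowtie2 index of the assembled transcriptome (run once).
build_index() {
  bowtie2-build "$ASSEMBLY" "$INDEX"
}

# Map one sample and write <sample>.bam.
# -k 40 asks Bowtie2 to report up to 40 alignments per read, so reads that
# map to several contigs are kept; the value 40 is an assumption.
# The alignments are converted to BAM without any sorting step.
map_sample() {
  local sample="$1"
  bowtie2 -p "$THREADS" -k 40 -x "$INDEX" -U "${sample}.fastq" \
    | samtools view -b -o "${sample}.bam" -
}

# Example usage (hypothetical sample names):
#   build_index
#   for s in test sample1 sample2; do map_sample "$s"; done
```

For paired-end data, `-U` would be replaced by Bowtie2's `-1`/`-2` options.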

Step 2: #

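Run Corset on the BAM files, one per sample, separated by spaces (not commas):

/data/Bioinformatics/Tools/corset-1.03-linux64/corset test.bam sample1.bam sample2.bam

Corset writes the contig-to-cluster assignments to clusters.txt and the read counts per cluster and sample to counts.txt.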
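When the samples belong to different experimental conditions, Corset can be told which sample is in which group. The sketch below builds such a call from a small sample sheet. It assumes Corset's `-g` (comma-separated group of each sample) and `-n` (comma-separated sample names) options as we understand them, so check them against the usage message of your Corset version; the sample names, groups and `<sample>.bam` naming are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Path from this page; override with CORSET=... if installed elsewhere.
CORSET="${CORSET:-/data/Bioinformatics/Tools/corset-1.03-linux64/corset}"

# Hypothetical design: sample names and their experimental groups, in the
# same order. BAM files are assumed to be named <sample>.bam.
SAMPLES=(ctrl1 ctrl2 treat1 treat2)
CONDITIONS=(1 1 2 2)

# Join the arguments with commas, e.g. 1 1 2 2 -> 1,1,2,2.
join_by_comma() {
  local IFS=,
  echo "$*"
}

# Run Corset with one BAM per sample (space-separated), the group of each
# sample (-g) and the sample names (-n), all in the same order.
run_corset() {
  local s
  local -a bams=()
  for s in "${SAMPLES[@]}"; do
    bams+=("${s}.bam")
  done
  "$CORSET" -g "$(join_by_comma "${CONDITIONS[@]}")" \
            -n "$(join_by_comma "${SAMPLES[@]}")" \
            "${bams[@]}"
}
```

Keeping names, groups and BAM files in parallel arrays makes it hard for the three lists to get out of order.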

Step 3: #

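Change the FASTA IDs of the assembly (../All.fasta) using the cluster assignments in clusters.txt, writing the result to Corset.fasta:

/data/Bioinformatics/Tools/corset-1.03-linux64/corset_fasta_ID_changer clusters.txt ../All.fasta > Corset.fasta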
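If `corset_fasta_ID_changer` is not available, a similar relabelling can be sketched with awk. This is an assumed equivalent, not a description of that tool: it reads clusters.txt as two whitespace-separated columns (contig ID, cluster ID) and rewrites each matching FASTA header as the hypothetical format `>clusterID|contigID`, so the output may differ from the tool's.

```shell
#!/usr/bin/env bash
set -euo pipefail

# relabel_fasta CLUSTERS FASTA: hypothetical helper.
# Prefixes each contig header with its cluster ID from clusters.txt.
relabel_fasta() {
  awk '
    NR == FNR { cluster[$1] = $2; next }   # first file: contig ID -> cluster ID
    /^>/ {
      id = substr($1, 2)                   # contig ID = first word of the header
      if (id in cluster) {
        print ">" cluster[id] "|" id       # header description is dropped
        next
      }
    }
    { print }                              # sequences and unclustered headers unchanged
  ' "$1" "$2"
}

# Example usage:
#   relabel_fasta clusters.txt ../All.fasta > Corset.fasta
```

Keeping the contig ID after the cluster ID leaves every header unique, since a cluster usually contains several contigs.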

Source code #

https://code.google.com/p/corset-project/

Reference #

  1. Davidson NM, Oshlack A. Corset: enabling differential gene expression analysis for de novo assembled transcriptomes. Genome Biology 15:410 (2014).


0.0.1_20140628_0