RNA Assembly Optimization #
Category: Analysis
Introduction: #
Next-generation sequencing has made it possible to perform differential gene expression studies in non-model organisms. For these studies, the reference is built by de novo assembly of the RNA-seq data rather than taken from a sequenced genome. However, transcriptome assembly produces a multitude of contigs per gene: some are different isoforms, while others arise from sequencing artifacts or assembler artifacts. These redundant contigs can lead to false predictions in downstream analysis, so the assembled contigs must be clustered into genes prior to differential gene expression detection. Corset was developed for this purpose: it clusters contigs based on the read counts mapped to the assembly across multiple samples.
Clustering: #
Corset is reported to cluster assembled contigs from mapped reads better than CAP3 and CD-HIT-EST.
Commands: #
Step 1: #
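Map the reads from each sample to the assembled transcriptome, producing one BAM file per sample. Corset relies on reads shared between contigs, so the aligner should report multiple alignments per read.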
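The mapping step could look roughly like the dry-run sketch below, which only prints the commands so they can be checked before running. It assumes Bowtie 2 and samtools as the tools, paired-end reads, and a `<sample>_R1.fastq` / `<sample>_R2.fastq` naming pattern; none of these are stated on this page, and `All_index` and the helper `map_cmd` are hypothetical names.

```shell
set -eu

INDEX=All_index   # hypothetical Bowtie 2 index prefix

# Print (do not run) the alignment command for one sample.
# --all asks Bowtie 2 to report every valid alignment of each read,
# so reads shared between contigs stay visible to the clustering step.
map_cmd() {
  sample=$1
  printf '%s\n' "bowtie2 --all -x ${INDEX} -1 ${sample}_R1.fastq -2 ${sample}_R2.fastq | samtools view -b -o ${sample}.bam -"
}

# The index is built once from the assembled transcriptome.
printf '%s\n' "bowtie2-build All.fasta ${INDEX}"

# One alignment command per sample (sample names taken from Step 2).
for s in test sample1 sample2; do
  map_cmd "$s"
done
```

Once the printed commands look right, they can be pasted into a terminal or piped to `sh`. Reporting all alignments can be slow and produce large BAM files on big assemblies.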
Step 2: #
Cluster the contigs with Corset, passing the BAM files from Step 1:
/data/Bioinformatics/Tools/corset-1.03-linux64/corset test.bam,sample1.bam,sample2.bam,
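After Corset finishes, a quick way to inspect the result is to count how many contigs fall into each cluster. The sketch below assumes that `clusters.txt` is a two-column, tab-separated file (contig ID, then cluster ID); it runs on a toy file with made-up IDs rather than real output, and `cluster_sizes` is a hypothetical helper name.

```shell
set -eu

# Toy stand-in for clusters.txt: contig ID, tab, cluster ID (made-up IDs).
toy=$(mktemp)
printf 'contig_1\tCluster-0.0\ncontig_2\tCluster-0.0\ncontig_3\tCluster-1.0\n' > "$toy"

# Print "cluster<TAB>number of contigs", sorted by cluster ID.
cluster_sizes() {
  awk -F '\t' '{ n[$2]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort
}

cluster_sizes "$toy"
```

On real data, `cluster_sizes clusters.txt` gives the same summary. Clusters with many contigs are the ones where isoforms or assembly artifacts were merged.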
Step 3: #
Relabel the sequences in the assembly FASTA using the cluster assignments in clusters.txt:
/data/Bioinformatics/Tools/corset-1.03-linux64/corset_fasta_ID_changer clusters.txt ../All.fasta > Corset.fasta
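A simple sanity check after this step is to count the records in the resulting FASTA. The sketch below uses a toy file with made-up sequences, and `count_records` is a hypothetical helper name. How the count should compare with the number of lines in `clusters.txt` depends on how the tool treats contigs that are absent from that file, which this page does not state.

```shell
set -eu

# Toy FASTA standing in for Corset.fasta (made-up IDs and sequences).
toy_fa=$(mktemp)
printf '>seq_a\nACGT\n>seq_b\nGGCC\nTTAA\n' > "$toy_fa"

# Count FASTA records by counting header lines (lines starting with ">").
# Note: grep -c exits non-zero when the count is 0, which stops the
# script under "set -e"; that is acceptable for a sanity check.
count_records() {
  grep -c '^>' "$1"
}

count_records "$toy_fa"
```

On real data, `count_records Corset.fasta` and `count_records ../All.fasta` can be compared side by side.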
Source code #
https://code.google.com/p/corset-project/
Reference #
- Davidson NM, Oshlack A. Corset: enabling differential gene expression analysis for de novo assembled transcriptomes. Genome Biology 2014, 15:410.