CNCI #

Coding-Non-Coding Index (CNCI) #

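  1. An analysis program for distinguishing protein-coding mRNAs from long non-coding RNAs (lncRNAs), which are not translated into protein.
  2. It classifies de novo transcripts assembled from NGS reads as coding or non-coding RNA.
  3. With a human training set, it classifies vertebrate lncRNAs very well but does not fit invertebrates or plants, which suggests that it could also be applied to the analysis of phylogenetic relationships.

  4. With the CNCI version 2 release, a plant model is also available.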

Principle #

  1. Classification relies on differences in the frequency of adjoining nucleotide triplets (ANT) between protein-coding sequences and non-coding sequences.

  2. ANT profiling is performed on a training set of well-studied protein-coding and non-coding sequences, and the result is stored as a matrix.

  3. Unknown sequences are classified as protein-coding or non-coding using the score values from the ANT frequency matrix.
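
The three steps above can be illustrated with a small Python sketch. It is not the CNCI implementation: the function names, the pseudocount, the log2 ratio and the fixed reading frame are all assumptions made for illustration, and the real tool does more (the install step below builds libsvm, which this sketch does not use).

```python
import math
from collections import Counter
from itertools import product

# All 64 nucleotide triplets; an ANT is an ordered pair of neighbouring triplets.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
TRIPLET_SET = set(TRIPLETS)


def ant_counts(seq, frame=0):
    """Count adjoining triplet pairs in one reading frame (assumed frame 0)."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return Counter(
        (a, b)
        for a, b in zip(codons, codons[1:])
        if a in TRIPLET_SET and b in TRIPLET_SET  # skip triplets containing N etc.
    )


def ant_frequencies(seqs, pseudocount=1.0):
    """Step 2: profile a training set into a 64 x 64 frequency table."""
    counts = Counter()
    for seq in seqs:
        counts.update(ant_counts(seq))
    pairs = list(product(TRIPLETS, repeat=2))
    total = sum(counts[p] for p in pairs) + pseudocount * len(pairs)
    return {p: (counts[p] + pseudocount) / total for p in pairs}


def log_ratio_matrix(coding_seqs, noncoding_seqs):
    """Step 1: capture the coding vs non-coding frequency difference (assumed log2 ratio)."""
    cod = ant_frequencies(coding_seqs)
    non = ant_frequencies(noncoding_seqs)
    return {p: math.log2(cod[p] / non[p]) for p in cod}


def score(seq, matrix):
    """Step 3: sum the matrix values over the ANTs of an unknown sequence."""
    return sum(matrix.get(p, 0.0) * n for p, n in ant_counts(seq).items())
```

In this sketch a positive score means the sequence looks more like the coding training set and a negative score more like the non-coding one, which mirrors the SCORE threshold of 0 used by filter_novel_lincRNA.py.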

Analysis method #

Install #

git clone git@github.com:www-bioinfo-org/CNCI.git
cd CNCI
unzip libsvm-3.0.zip
cd libsvm-3.0
make
cd ..

Analysis (cmd) #

compare.py : compare the merged/assembled transcripts with known gene annotation! #

Usage: compare.py [-h] -c coding_ref -n noncoding_ref -i input_gtf -o out_dir

Parameters:

-h, --help show this help message and exit.

-c CODING_REF, --coding_ref=CODING_REF

(Required.) The path of the coding reference gtf file. Two mandatory attributes (gene_id "value"; transcript_id "value") must be present in the file. Some prepared files can be downloaded from http://www.bioinfo.org/np/

-n NONCODING_REF, --noncoding_ref=NONCODING_REF

(Required.) The path of the lincRNA reference gtf file. Two mandatory attributes (gene_id "value"; transcript_id "value") must be present in the file. Some prepared files can be downloaded from http://www.bioinfo.org/np/

-i INPUT_GTF, --input_gtf=INPUT_GTF

(Required.) The path of the user's assembled gtf file. This file is usually generated by cufflinks/cuffcompare/cuffmerge. The same two mandatory attributes (gene_id "value"; transcript_id "value") must be present in the file.

-o OUT_DIR, --out_dir=OUT_DIR

(Required.) Output directory of the results.
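
Since all three gtf inputs need both gene_id and transcript_id, it can help to check a file before running compare.py. The following is a minimal sketch, not part of CNCI; the helper names are hypothetical and it assumes the usual nine tab-separated gtf columns with attributes written as key "value"; pairs.

```python
import re

# Matches attribute pairs such as: gene_id "XLOC_000001";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def missing_attributes(gtf_line, required=("gene_id", "transcript_id")):
    """Return the required attribute names that are absent from one gtf record."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        # No attribute column at all, so everything is missing.
        return list(required)
    attrs = dict(ATTR_RE.findall(fields[8]))
    return [name for name in required if name not in attrs]


def check_gtf(lines):
    """Yield (line_number, missing_names) for each record lacking an attribute."""
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        missing = missing_attributes(line)
        if missing:
            yield number, missing
```

For a file on disk, `list(check_gtf(open("input.gtf")))` returns an empty list when every record carries both attributes (the file name here is only a placeholder).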

CNCI : A classification tool for identifying coding or non-coding transcripts (fasta files and gtf files) #

Parameters:

-f or --file : input files

-o or --out : name of the output file, written to the current directory (CNCI creates a Temp sub-folder in the current directory while it runs and removes it automatically when the run finishes); the result is stored in xxx.index

-p or --parallel : number of CPUs to use

-m or --model : classification model ("ve" for vertebrate species, "pl" for plant species)

-g or --gtf : use this parameter if your input file is in gtf format

-d or --directory : required when -g or --gtf is used; give the path of your reference genome.
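
The dependency between -g and -d is easy to get wrong when CNCI.py is called from a pipeline. Below is a small sketch of a command builder that uses only the flags listed above; the function name and its defaults are hypothetical, and the script path is taken from the Example section.

```python
def build_cnci_command(input_file, out_name, model="ve", cpus=1,
                       gtf=False, genome=None,
                       script="CNCI_package/CNCI.py"):
    """Assemble an argument list for CNCI.py from the documented flags."""
    if model not in ("ve", "pl"):
        raise ValueError('model must be "ve" (vertebrate) or "pl" (plant)')
    if gtf and genome is None:
        # -d is mandatory whenever -g is given.
        raise ValueError("gtf input requires the reference genome path (-d)")

    cmd = ["python", script, "-f", input_file]
    if gtf:
        cmd.append("-g")
    cmd += ["-o", out_name, "-m", model, "-p", str(cpus)]
    if gtf:
        cmd += ["-d", genome]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with file paths.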

filter_novel_lincRNA.py : A tool that converts the index file produced by python CNCI_package/CNCI.py into four gene classes (novel_lincRNA, novel_coding, ambiguous_genes and filter_out_noncoding) #

Usage: filter_novel_lincRNA.py [-h] [-s 0] [-l 200] [-e 2] -i cnci_index -g unannotated_gtf -o out_dir

Parameters:

-h, --help show this help message and exit

-i INDEX, --index=INDEX

(Required.) The path of coding/noncoding index file. This file is the output file of CNCI.py.

-g GTF, --gtf=GTF

(Required.) The path of the potentially_novel gtf file. This file can be generated by compare.py.

-s SCORE, --score=SCORE

(Optional.) Threshold of the CNCI score. RNAs with a score less than SCORE will be classified as noncoding. The default is 0.

-l LENGTH, --length=LENGTH

(Optional.) Minimum length of a lincRNA. lincRNAs with length >= LENGTH will be kept. The default is 200.

-e EXON_NUM, --exon_num=EXON_NUM

(Optional.) Minimum exon number of a lincRNA. lincRNAs with exon number >= EXON_NUM will be kept. The default is 2.

-o OUT_DIR, --out_dir=OUT_DIR

(Required.) Output directory of the results.
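
The three optional thresholds combine as simple comparisons. The sketch below restates them in Python with the documented defaults; the function name is hypothetical, and how the script assigns transcripts to its four output classes is not described here, so the sketch only covers whether a transcript passes all three thresholds.

```python
def passes_lincrna_filters(score, length, exon_num,
                           min_score=0.0, min_length=200, min_exons=2):
    """Apply the documented -s, -l and -e thresholds to one transcript."""
    is_noncoding = score < min_score      # -s: score below SCORE means noncoding
    long_enough = length >= min_length    # -l: keep length >= LENGTH
    enough_exons = exon_num >= min_exons  # -e: keep exon number >= EXON_NUM
    return is_noncoding and long_enough and enough_exons
```

Note the asymmetry at the boundaries: a score exactly equal to SCORE is not counted as noncoding, while a length or exon number exactly equal to its threshold is kept.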

Example #

python CNCI_package/CNCI.py -f unannotation.gtf -g -o test -m ve -p 8 -d hg19.2bit

python filter_novel_lincRNA.py -i test.index -g unannotation.gtf -s 0 -l 200 -e 2 -o out_dir

python extract.py -i novel-noncoding.gtf,nov.gtf -n known-non-coding.gtf -c known-coding.gtf

Reference #

  1. Liang Sun, Haitao Luo, Dechao Bu, Guoguang Zhao, Kuntao Yu, Changhai Zhang, Yuanning Liu, RunSheng Chen and Yi Zhao. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Research (2013), doi: 10.1093/nar/gkt646
