MAKER: predicting gene structure for genome annotation #

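Once a genome has been sequenced, the basic steps for characterizing a species are genome assembly, structural annotation of the genome, and protein function analysis. Of these, structural annotation is generally carried out by mapping mRNA or protein sequences onto the genome sequence. MAKER is a representative program for this task.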

Main features of MAKER #

  1. Repeat element analysis with RepeatMasker
  2. Gene modeling by mapping EST sequences (BLASTN, Exonerate)
  3. Gene modeling by mapping protein sequences (BLASTX, Genewise)
  4. Gene model prediction with ab initio programs (SNAP, Augustus, GeneMark-ES, Fgenesh)
  5. Consensus gene model prediction from the combined gene model evidence

Installation requirements #

Perl Modules
    IO::All (optional, for accessory scripts)
    IO::Prompt (optional, for accessory scripts)
    forks (optional, for MPI scripts)
    forks::shared (optional, for MPI scripts)
External Programs
    Perl 5.8.0 or Higher
    SNAP version 2009-02-03 or higher
    RepeatMasker 3.1.6 or higher
    Exonerate 1.4 or higher
    NCBI BLAST 2.2.X or higher
    Genewise 2.2.0
Optional Components:
    Augustus 2.0 or higher
    GeneMark-ES 2.3a or higher
    FGENESH 2.6 or higher
Required for optional MPI support:
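
As a quick pre-flight check for the External Programs listed above, the following sketch reports which executables cannot be found on PATH. The executable names in the example call (snap, RepeatMasker, exonerate, blastn, genewise) are assumptions about a typical installation and may differ on your system. The check does not verify version numbers.

```shell
#!/bin/sh
# Print the name of every given executable that is not found on PATH.
# Names are supplied by the caller; the ones used below are assumptions.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example call with assumed executable names (adjust to your installation).
missing=$(missing_tools perl snap RepeatMasker exonerate blastn genewise)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All listed tools were found."
fi
```

Versions still have to be checked by hand, because each tool reports its version in a different way.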

Analysis procedure (eukaryotes) #

> maker -f -base [outhandle] -cpus 20 maker_opts.ctl >& maker_opts.ctl.log

> cat maker_opts.ctl
#-----Genome (Required for De-Novo Annotation)
genome=my_genome.fasta #genome sequence file in fasta format
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----EST Evidence (for best results provide a file for at least one)
est= #non-redundant set of assembled ESTs in fasta format (classic EST analysis)
est_reads= #unassembled nextgen mRNASeq in fasta format (not fully implemented)
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #EST evidence from an external gff3 file
altest_gff=rnaseq_transcripts.gff3 #Alternate organism EST evidence from a separate gff3 file

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=ref1_protein.fasta,ref2_protein.fasta #protein sequence file in fasta format
protein_gff=  #protein homology evidence from an external gff3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib=my_genome_repeat.fasta #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to run repeat masking on prokaryotes (don't change this), 1 = yes, 0 = no

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species=fly #Augustus gene prediction species model
fgenesh_par_file= #Fgenesh parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #gene prediction from protein homology (prokaryotes only), 1 = yes, 0 = no
unmask=0 #Also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
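
Most options in the control file are left blank, so before a long run it can help to list only those that are actually set. The sketch below is a hypothetical helper, not part of MAKER: `list_set_options` strips comments and prints every `key=value` line whose value is non-empty. It assumes that values contain no `#` character.

```shell
#!/bin/sh
# Print "key=value" for each option in a control file that has a non-empty value.
# Everything from '#' to the end of a line is treated as a comment and removed,
# so this assumes values themselves never contain '#'.
list_set_options() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' "$1" |
        awk -F= 'NF >= 2 && $2 != "" { print }'
}

# Example: summarize the control file in the current directory, if present.
if [ -f maker_opts.ctl ]; then
    list_set_options maker_opts.ctl
fi
```

For the listing above, this would report settings such as `genome`, `protein`, `rmlib` and `augustus_species`, and skip the blank entries such as `est=` and `snaphmm=`.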

References #