GATK
- Original author: yhshin@insilicogen.com
Structured data
- Category
- Software
GATK #
- Genome Analysis Toolkit
- A tool package developed at the Broad Institute for genome analysis using NGS data
Requirements #
- The mapping file must be a .bam file
- The BAM file must be indexed
- Mapped reads must be sorted in reference (coordinate) order
- Reads must carry at least one read group
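The sorting and read-group requirements above can be checked from the BAM header before any GATK tool is run. The following is a minimal sketch, not part of the original page: the function name `check_sam_header` is hypothetical, and it only inspects header text passed on standard input, so `samtools` is assumed to be available to produce that text.

```shell
# Hypothetical helper: reads SAM/BAM header text on stdin and reports whether
# it declares coordinate sorting (@HD ... SO:coordinate) and has an @RG line.
# Returns 0 when both checks pass, 1 otherwise.
check_sam_header() {
    header=$(cat)
    rc=0
    # The @HD line must carry SO:coordinate for reference-ordered reads
    if printf '%s\n' "$header" | grep -q '^@HD.*SO:coordinate'; then
        echo "sorted: ok"
    else
        echo "sorted: MISSING (expected SO:coordinate)"
        rc=1
    fi
    # At least one @RG line must be present
    if printf '%s\n' "$header" | grep -q '^@RG'; then
        echo "read group: ok"
    else
        echo "read group: MISSING"
        rc=1
    fi
    return $rc
}

# Usage (assumes samtools is on PATH; the file name is a placeholder):
#   samtools view -H input.bam | check_sam_header
```

The index requirement is not covered by the header and would need a separate check, for example testing that the `.bai` file exists next to the BAM file.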
Install programs #
- BWA
- SAMtools
- HTSlib (optional)
- Picard
- Genome Analysis Toolkit (GATK)
- IGV
- R
- FASTX-Toolkit
Manual #
- Pre-Processing ( FASTX-Toolkit )
  $ fastx_clipper -a [adapter_seq] -n -o [output.fastq] -i [input.fastq] > [report.txt]
  $ fastx_quality_filter [-i INFILE] [-o OUTFILE] > [report.txt]
- Mapping ( BWA )
  $ bwa index [-p prefix] [-a bwtsw|is] <in.fasta>
  $ bwa mem [reference_genome] [read1] [read2] > [output.sam]
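Because GATK requires read groups, they can also be attached at mapping time with the `-R` option of `bwa mem`, as an alternative to adding them afterwards. The sketch below is an illustration only: the function name `make_read_group` and all example values are hypothetical.

```shell
# Hypothetical helper: build the @RG header line expected by `bwa mem -R`.
# Arguments: ID, sample (SM), library (LB), platform (PL, defaults to illumina).
# Fields are joined with a literal backslash-t, which bwa converts to tabs.
make_read_group() {
    printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:%s\n' "$1" "$2" "$3" "${4:-illumina}"
}

# Usage (assumes bwa is on PATH and the reference is already indexed;
# file names are placeholders):
#   bwa mem -R "$(make_read_group run1 sampleA lib1)" ref.fasta r1.fastq r2.fastq > out.sam
```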
- Mark Duplicates ( Picard )
  # Sort by coordinate
  $ java -jar picard.jar SortSam I=[input.sam] O=[output.bam] SO=coordinate > [log]
  # Mark duplicates
  $ java -jar picard.jar MarkDuplicates I=[input.bam] O=[output.dup.bam] M=[metrics] > [log]
  # Add read group (RG) tags
  $ java -jar picard.jar \
      AddOrReplaceReadGroups \
      I=[input.dup.bam] \
      O=[output.dup.RG.bam] \
      RGID=[id] RGLB=[library] RGPL=[illumina] \
      RGPU=[barcode] RGSM=[group] RGCN=[center] > [log]
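After duplicate marking, the file written via `M=` records the duplication rate per library. The following sketch pulls that value out of the metrics table; the function name `dup_rate` is hypothetical, and the layout it assumes (comment lines starting with `#`, a tab-separated header row containing `PERCENT_DUPLICATION`, and a blank line before the histogram section) should be verified against the output of the Picard version in use.

```shell
# Hypothetical helper: print LIBRARY and PERCENT_DUPLICATION from a Picard
# MarkDuplicates metrics file read on stdin. The column is located by name.
dup_rate() {
    awk -F '\t' '
        /^#/            { next }   # skip comment lines
        col && NF == 0  { exit }   # a blank line ends the metrics table
        NF == 0         { next }   # skip blank lines before the table
        !col {                     # first non-comment row is the column header
            for (i = 1; i <= NF; i++)
                if ($i == "PERCENT_DUPLICATION") col = i
            next
        }
        { print $1 "\t" $col }     # one line per library
    '
}

# Usage (the file name is a placeholder):
#   dup_rate < sample.dup.metrics
```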
- Indel Realignment ( GATK )
  # Index / dictionary files
  $ java -jar picard.jar \
      CreateSequenceDictionary \
      R=[input_ref_seq.fasta] \
      O=[output_ref_seq.dict]
  $ samtools faidx [genome.fasta]
  $ samtools index [input.bam]
  # RealignerTargetCreator
  $ java -jar GenomeAnalysisTK.jar \
      -T RealignerTargetCreator \
      -R [reference] \
      -I [original bam] \
      -known [vcf_file] \
      -o [output_candidate_region]
  # Realignment
  $ java -jar GenomeAnalysisTK.jar \
      -T IndelRealigner \
      -R [reference] \
      -I [original bam] \
      -known [vcf_file] \
      -targetIntervals [file with target region] \
      -o [realigned.bam] \
      -filterNoBases
- Base Recalibration ( GATK )
  # Analyze covariates (build the recalibration table)
  $ java -jar GenomeAnalysisTK.jar \
      -T BaseRecalibrator \
      -R [reference] \
      -I [realigned.bam] \
      -knownSites [.vcf] \
      -knownSites [.vcf] \
      -o [output_file]
  # Apply the recalibration
  $ java -jar GenomeAnalysisTK.jar \
      -T PrintReads \
      -R [genome.fasta] \
      -I [realigned.bam] \
      -BQSR [recal_searching.table] \
      -o [output.bam]
- Variant Calling
  - UnifiedGenotyper ( GATK )
    $ java -jar GenomeAnalysisTK.jar \
        -T UnifiedGenotyper \
        -R [reference] \
        -I [input.bam] \
        -o [output.vcf] \
        -stand_call_conf 30 \
        -stand_emit_conf 10
  - HaplotypeCaller ( GATK )
    $ java -jar GenomeAnalysisTK.jar \
        -T HaplotypeCaller \
        -R [reference] \
        -I [input.bam] \
        -o [output.vcf] \
        -stand_call_conf 30 \
        -stand_emit_conf 10 \
        -minPruning 3
- Variant Recalibration ( GATK )
  # Recalibration training
  $ java -jar GenomeAnalysisTK.jar \
      -T VariantRecalibrator \
      -R [human_reference] \
      -input CEUTrio.HiSeq.WGS.b37.bestPractices.b37.chr20.vcf \
      -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf \
      -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.b37.vcf \
      -an DP -an QD -an FS \
      -mode [SNP|INDEL] \
      -recalFile [recalibration_file] \
      -tranchesFile [recal.tranches] \
      -rscriptFile [R_recal.plots.R]
  # Apply recalibration (VQSR)
  $ java -jar /DATA/1.src/bin/GenomeAnalysisTK.jar \
      -T ApplyRecalibration \
      -R [human_reference] \
      -input [input.vcf] \
      -mode SNP \
      -recalFile [recalibration_file] \
      -tranchesFile [recal.tranches] \
      -ts_filter_level [99.0] \
      -o [output.vcf]
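Once recalibration has been applied, the FILTER column of the output VCF shows which variants passed the chosen tranche threshold. A minimal sketch for tallying those values follows; the function name `count_vcf_filters` is hypothetical, and it relies only on the standard VCF layout, where FILTER is the seventh tab-separated column.

```shell
# Hypothetical helper: tally the FILTER column (7th field) of a VCF read on
# stdin, skipping header lines. Output is "<filter><TAB><count>", sorted by name.
count_vcf_filters() {
    awk -F '\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' | sort
}

# Usage (the file name is a placeholder):
#   count_vcf_filters < output.vcf
```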
- Genotype annotation ( snpEff )
  # List available databases
  $ java -jar snpEff.jar databases
  # Download a database
  $ java -jar snpEff.jar download -v [GRCh37.71|rice5]
  # Annotate with snpEff (GATK-compatible output)
  $ java -jar /DATA/1.src/snpEff/snpEff.jar \
      -v -onlyProtein \
      -i vcf \
      -o gatk [database] \
      [input.vcf] > [snpEff.output.vcf]
  # Annotation with GATK
  $ java -jar /DATA/1.src/bin/GenomeAnalysisTK.jar \
      -T VariantAnnotator \
      -R [human_reference.fasta] \
      -A SnpEff \
      --variant [input.SNP.vcf] \
      --snpEffFile [snpEff.vcf] \
      -o [output.SNP.anno.vcf]