GSNAP
- Original author
-
Last updated by
dskyoung-intern@insilicogen.com
Structured data
- Category
- Analysis
Overview #
GSNAP (Genomic Short-read Nucleotide Alignment Program) is an algorithm for aligning short single-end and paired-end reads obtained from platforms such as Illumina/Solexa or ABI/SOLiD. It is run from the command line, and its options are listed below.
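As a rough illustration of how these options combine on the command line, the following Python sketch assembles a GSNAP invocation as an argument list without running it. The genome directory, database name, read file names, and the thread count of 4 are hypothetical placeholders. Passing the read files as trailing positional arguments is an assumption not stated on this page. The flags themselves (-D, -d, -t, -A) come from the option list.

```python
import shlex


def build_gsnap_command(genome_dir, genome_db, read_files, threads=4, sam_output=True):
    """Assemble a gsnap argument list from a few options documented on this page."""
    cmd = ["gsnap", "-D", genome_dir, "-d", genome_db, "-t", str(threads)]
    if sam_output:
        # -A sam selects SAM output instead of the default GSNAP format.
        cmd += ["-A", "sam"]
    # Assumption: input read files follow the options as positional arguments.
    cmd += list(read_files)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and names, for illustration only.
    command = build_gsnap_command(
        "/path/to/genomes", "my_genome", ["reads_1.fastq", "reads_2.fastq"]
    )
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.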
Options #
Input options #
-D, --dir=directory
Genome directory
-d, --db=STRING
Genome database
-q, --part=INT/INT
Process only the i-th out of every n sequences, e.g., 0/100 or
99/100 (useful for distributing jobs to a computer farm).
--input-buffer=INT
Size of input buffer (program reads this many sequences at a
time for efficiency) (default 1000)
--barcode-length=INT
Amount of barcode to remove from start of read (default 0)
--pc-linefeeds
Strip PC line feeds (ASCII 13) from input
-o, --orientation=STRING
Orientation of paired-end reads. Allowed values: FR (fwd-rev, or
typical Illumina; default), RF (rev-fwd, for circularized
inserts), or FF (fwd-fwd, same strand)
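The -q, --part option above splits an input file across jobs. A minimal sketch of that idea is shown here, assuming the i-th part means every sequence whose zero-based index is congruent to i modulo n. The exact rule GSNAP applies is not stated on this page, and the function names are hypothetical.

```python
def parse_part(spec):
    """Parse an 'i/n' string such as '0/100' into the pair (i, n)."""
    i_text, n_text = spec.split("/")
    i, n = int(i_text), int(n_text)
    if n <= 0 or not 0 <= i < n:
        raise ValueError(f"part must satisfy 0 <= i < n, got {spec!r}")
    return i, n


def select_part(sequences, spec):
    """Yield sequences whose zero-based index equals i modulo n (assumed semantics)."""
    i, n = parse_part(spec)
    for index, seq in enumerate(sequences):
        if index % n == i:
            yield seq


if __name__ == "__main__":
    reads = [f"read{k}" for k in range(10)]
    # Job 1 of 3 would handle these reads under the assumed rule.
    print(list(select_part(reads, "1/3")))
```

With 100 jobs numbered 0/100 through 99/100, each sequence is handled by exactly one job under this rule.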
Computation options #
-B, --batch=INT
Mode  Offsets   Positions       Genome
0     allocate  mmap            mmap
1     allocate  mmap & preload  mmap
2     allocate  mmap & preload  mmap & preload  (default)
3     allocate  allocate        mmap & preload
4     allocate  allocate        allocate
Note: For a single sequence, all data structures use mmap. If
mmap is not available and allocate is not chosen, then fileio is
used (slow).
-m, --max-mismatches=FLOAT
Maximum number of mismatches allowed (if not specified, then
defaults to the ultrafast level of ((readlength+2)/12 - 2)). If
specified between 0.0 and 1.0, then treated as a fraction of
each read length. Otherwise, treated as an integral number of
mismatches (including indel and splicing penalties). For RNA-Seq,
you may need to increase this value slightly to align reads
extending past the ends of an exon.
--terminal-penalty=INT
Penalty for a terminal alignment (alignment from one end of the
read to the best possible position at the other end) (default 1)
-i, --indel-penalty=INT
Penalty for an indel (default 1). Counts against mismatches
allowed. To find indels, make indel-penalty less than or equal
to max-mismatches. For 2-base reads, indel-penalty needs to be
set somewhat high.
-I, --indel-endlength=INT
Minimum length at end required for indel alignments (default 3)
-y, --max-middle-insertions=INT
Maximum number of middle insertions allowed (default 9)
-z, --max-middle-deletions=INT
Maximum number of middle deletions allowed (default 30)
-Y, --max-end-insertions=INT
Maximum number of end insertions allowed (default 3)
-Z, --max-end-deletions=INT
Maximum number of end deletions allowed (default 6)
-M, --suboptimal-levels=INT
Report suboptimal hits beyond best hit (default 0). All hits with
best score plus suboptimal-levels are reported.
-R, --masking=INT
Masking of frequent/repetitive oligomers to avoid spending time
on non-unique or repetitive reads
0 = no masking (will try to find non-unique or repetitive
matches)
1 = mask frequent oligomers
2 = mask frequent and repetitive oligomers (fastest) (default)
3 = greedy frequent: mask frequent oligomers first, then try no
masking if alignments not found
4 = greedy repetitive: mask frequent and repetitive oligomers
first, then try no masking if alignments not found
-a, --adapter-strip=STRING
Method for removing adapters from reads. Currently allowed
values: paired
--trim-mismatch-score=INT
Score to use for mismatches when trimming at ends (default is
-3; to turn off trimming, specify 0)
-V, --snpsdir=STRING
Directory for SNPs index files (created using snpindex) (default
is location of genome index files specified using -D and -d)
-v, --use-snps=STRING
Use database containing known SNPs (in <STRING>.iit, built
previously using snpindex) for tolerance to SNPs
-C, --cmetdir=STRING
Directory for methylcytosine index files (created using
cmetindex) (default is location of genome index files specified
using -D, -V, and -d)
-c, --cmet
Use database for methylcytosine experiments (built previously
using cmetindex)
-t, --nthreads=INT
Number of worker threads
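The interaction between -m, --max-mismatches and -M, --suboptimal-levels described above can be sketched as follows. This is an illustration only, not GSNAP's implementation: the integer division, the clamp at zero, the truncation of fractional budgets, the treatment of exactly 1.0 as an integer, and the convention that a lower score means fewer mismatches are all assumptions. The function names are hypothetical.

```python
def mismatch_budget(read_length, max_mismatches=None):
    """Return the number of mismatches allowed for a read of the given length."""
    if max_mismatches is None:
        # Default "ultrafast" level from the help text: (readlength+2)/12 - 2.
        # Integer division and the clamp at zero are assumptions of this sketch.
        return max(0, (read_length + 2) // 12 - 2)
    if 0.0 < max_mismatches < 1.0:
        # Treated as a fraction of the read length (truncation is assumed).
        return int(max_mismatches * read_length)
    # Otherwise treated as an integral number of mismatches.
    return int(max_mismatches)


def select_hits(hits, suboptimal_levels=0):
    """Keep hits scoring within suboptimal_levels of the best (lowest) score.

    hits is a list of (name, score) pairs, where score counts mismatches
    plus penalties, so lower is better (assumed convention).
    """
    if not hits:
        return []
    best = min(score for _, score in hits)
    return [hit for hit in hits if hit[1] <= best + suboptimal_levels]


if __name__ == "__main__":
    for length in (36, 75, 100):
        print(length, mismatch_budget(length))
    hits = [("chr1:100", 1), ("chr2:500", 2), ("chr5:42", 4)]
    print(select_hits(hits, suboptimal_levels=1))
```

Under these assumptions, the default budget grows by one mismatch for every 12 additional bases of read length, which is why longer reads tolerate more mismatches without an explicit -m.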
Splicing options for RNA-Seq #
-s, --splicesites=STRING
Look for splicing involving known splice sites (in
<STRING>.iit), at short or long distances
-S, --splicetrie-precompute=INT
Pre-compute splicetrie for all known splice sites (0=no, 1=yes
(default)). Requires --splicesites flag and multiple sequence
input.
-N, --novelsplicing=INT
Look for novel splicing, not in known splice sites (if -s
provided)
--novel-doublesplices
Allow GSNAP to look for two splices in a single-end involving
novel splice sites (default is not to allow this). Caution: this
option can slow down the program considerably. A better way to
detect double splices is with known splice sites, using the
--splicesites option.
-w, --localsplicedist=INT
Definition of local novel splicing event (default 200000)
-e, --local-splice-penalty=INT
Penalty for a local splice (default 0). Counts against
mismatches allowed
-E, --distant-splice-penalty=INT
Penalty for a distant splice (default 3). Counts against
mismatches allowed
-k, --local-splice-endlength=INT
Minimum length at end required for local spliced alignments
(default 15, min is 14)
-K, --distant-splice-endlength=INT
Minimum length at end required for distant spliced alignments
(default 16, min is 14)
-l, --shortend-splice-endlength=INT
Minimum length at end required for short-end spliced alignments
(default 2)
--distant-splice-identity=FLOAT
Minimum identity at end required for distant spliced alignments
(default 0.95)
Options for paired-end reads #
--pairmax-dna=INT
Max total genomic length for paired reads (default 1000). Should
increase for RNA-Seq reads.
--pairmax-rna=INT
Max total genomic length for RNA-Seq paired reads, or other
reads that could have a splice (default 200000). Used if -N or
-s is specified. Should probably match the value for -w,
--localsplicedist.
--pairexpect=INT
Expected paired-end length (default 200)
--pairdev=INT
Allowable deviation from expected paired-end length, used for
discriminating between alternative alignments (default 50)
Options for quality scores #
--quality-protocol=STRING
Protocol for input quality scores. Allowed values:
illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
sanger (ASCII 33-126) (equivalent to -J 33 -j 0)
Default is sanger (no quality print shift). SAM output files
should have quality scores in sanger protocol.
Or you can customize this behavior with these flags:
-J, --quality-zero-score=INT
FASTQ quality scores are zero at this ASCII value (default is 33
for sanger protocol; for Illumina, select 64)
-j, --quality-print-shift=INT
Shift FASTQ quality scores by this amount in output (default is
0 for sanger protocol; to change Illumina input to Sanger
output, select -31)
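The relationship between -J, --quality-zero-score and -j, --quality-print-shift can be sketched in Python as follows. The zero score gives the ASCII value that represents quality 0, and the print shift is added to each quality character on output. The function and dictionary names are hypothetical, and this is an illustration of the arithmetic rather than GSNAP's own code.

```python
# (zero_score, print_shift) pairs corresponding to the two named protocols.
PROTOCOLS = {
    "illumina": (64, -31),  # equivalent to -J 64 -j -31
    "sanger": (33, 0),      # equivalent to -J 33 -j 0
}


def convert_quality(quality_string, zero_score=33, print_shift=0):
    """Return (numeric_scores, output_string) for one FASTQ quality line."""
    # A character's quality is its ASCII value minus the zero score.
    scores = [ord(ch) - zero_score for ch in quality_string]
    # On output, every character is shifted by print_shift.
    shifted = "".join(chr(ord(ch) + print_shift) for ch in quality_string)
    return scores, shifted


if __name__ == "__main__":
    zero, shift = PROTOCOLS["illumina"]
    # '@' is ASCII 64 (quality 0) and 'h' is ASCII 104 (quality 40).
    print(convert_quality("@h", zero, shift))
```

Because 64 - 31 = 33, shifting Illumina-encoded characters by -31 yields Sanger-encoded characters with the same numeric qualities, which is what SAM output expects.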
Output options #
-n, --npaths=INT
Maximum number of paths to print (default 100).
-Q, --quiet-if-excessive
If more than maximum number of paths are found, then nothing is
printed.
-O, --ordered
Print output in same order as input (relevant only if there is
more than one worker thread)
--show-refdiff
For GSNAP output in SNP-tolerant alignment, shows all
differences relative to the reference genome as lower case
(otherwise, it shows all differences relative to both the
reference and alternate genome)
--print-snps
Print detailed information about SNPs in reads (works only if -v
also selected) (not fully implemented yet)
--failsonly
Print only failed alignments, those with no results
--nofails
Exclude printing of failed alignments
--fails-as-input=STRING
Print completely failed alignments as input FASTA or FASTQ
format. Allowed values: yes, no
-A, --format=STRING
Another format type, other than default. Currently implemented:
sam. Also allowed, but not installed at compile-time: goby (to
install, need to re-compile with appropriate options)
Options for SAM output #
--no-sam-headers
Do not print headers beginning with '@'
--sam-headers-batch=INT
Print headers only for this batch, as specified by -q
--read-group-id=STRING
Value to put into read-group id (RG-ID) field
--read-group-name=STRING
Value to put into read-group name (RG-SM) field
Help options #
--version
Show version
--help
Show this help message