GSNAP #

Overview #

GSNAP (Genomic Short-read Nucleotide Alignment Program) is a program for aligning short single-end and paired-end reads obtained from platforms such as Illumina/Solexa and ABI/SOLiD. It is run from the command line, and its options are listed below.

Option #

Input options #

   -D, --dir=directory
          Genome directory

   -d, --db=STRING
          Genome database

   -q, --part=INT/INT
          Process only the i-th out of every n sequences  e.g.,  0/100  or
          99/100 (useful for distributing jobs to a computer farm).
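
As a rough illustration of how -q could split work across a computer farm, the sketch below assigns sequences to parts by position. The selection rule (sequence index modulo n equals i) is an assumption made for illustration; the help text only states that the i-th out of every n sequences is processed. The function name `select_part` is hypothetical.

```python
def select_part(sequences, i, n):
    """Return the sequences that a job started with -q i/n would handle.

    Assumed rule (not confirmed by the help text): a sequence belongs
    to part i when its zero-based index modulo n equals i.
    """
    if not 0 <= i < n:
        raise ValueError("part index must satisfy 0 <= i < n")
    return [seq for index, seq in enumerate(sequences) if index % n == i]


# Ten hypothetical reads distributed over four jobs (-q 0/4 ... -q 3/4).
reads = [f"read{k}" for k in range(10)]
parts = [select_part(reads, i, 4) for i in range(4)]

for i, part in enumerate(parts):
    print(f"-q {i}/4 ->", part)
```

Under this assumed rule every read is handled by exactly one job, so the outputs of the n jobs can simply be concatenated.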

   --input-buffer=INT
          Size  of  input  buffer  (program reads this many sequences at a
          time for efficiency) (default 1000)

   --barcode-length=INT
          Amount of barcode to remove from start of read (default 0)

   --pc-linefeeds
          Strip PC line feeds (ASCII 13) from input

   -o, --orientation=STRING
          Orientation of paired-end reads.  Allowed values: FR  (fwd-rev,
          or  typical  Illumina;  default),  RF (rev-fwd, for circularized
          inserts), or FF (fwd-fwd, same strand)

Computation options #

   -B, --batch=INT
          Batch mode (default 2)

           Mode     Offsets       Positions       Genome
             0      allocate      mmap            mmap
             1      allocate      mmap & preload  mmap
             2      allocate      mmap & preload  mmap & preload (default)
             3      allocate      allocate        mmap & preload
             4      allocate      allocate        allocate

          Note: For a single sequence, all data structures use  mmap.   If
          mmap  is  not  available  and  allocate  is not chosen, fileio is
          used (slow).

   -m, --max-mismatches=FLOAT
          Maximum number of mismatches allowed  (if  not  specified,  then
          defaults  to the ultrafast level of ((readlength+2)/12 - 2)).  If
          specified between 0.0 and 1.0, then treated  as  a  fraction  of
          each  read  length.  Otherwise, treated as an integral number of
          mismatches (including indel and  splicing  penalties).   For
          RNA-Seq, you may need to increase this value slightly  to  align
          reads extending past the ends of an exon.
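
The rules for -m can be worked through with a small calculation. The sketch below is not GSNAP's code; it assumes integer division in the default formula, clamps negative results to zero, and rounds fractional values to the nearest whole mismatch. None of those three details is stated in the help text, and the function names are hypothetical.

```python
def default_max_mismatches(read_length):
    """Ultrafast default from the help text: (readlength+2)/12 - 2.

    Integer division and clamping at zero are assumptions.
    """
    return max(0, (read_length + 2) // 12 - 2)


def effective_max_mismatches(read_length, max_mismatches=None):
    """Interpret a -m value the way the help text describes it."""
    if max_mismatches is None:
        # -m not given: fall back to the ultrafast default.
        return default_max_mismatches(read_length)
    if 0.0 <= max_mismatches < 1.0:
        # Treated as a fraction of the read length (rounding is assumed).
        return round(max_mismatches * read_length)
    # Otherwise treated as an integral number of mismatches.
    return int(max_mismatches)


for length in (36, 75, 100):
    print(length, default_max_mismatches(length))
```

With these assumptions, 36, 75 and 100 bp reads get defaults of 1, 4 and 6 mismatches, and `-m 0.05` on a 100 bp read allows 5.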

   --terminal-penalty=INT
          Penalty for a terminal alignment (alignment from one end of  the
          read to the best possible position at the other end) (default 1)

   -i, --indel-penalty=INT
          Penalty  for  an  indel  (default 1).  Counts against mismatches
          allowed.  To find indels, make indel-penalty less than or  equal
          to  max-mismatches.  For 2-base reads, need to set indel-penalty
          somewhat high.

   -I, --indel-endlength=INT
          Minimum length at end required for indel alignments (default 3)

   -y, --max-middle-insertions=INT
          Maximum number of middle insertions allowed (default 9)

   -z, --max-middle-deletions=INT
          Maximum number of middle deletions allowed (default 30)

   -Y, --max-end-insertions=INT
          Maximum number of end insertions allowed (default 3)

   -Z, --max-end-deletions=INT
          Maximum number of end deletions allowed (default 6)

   -M, --suboptimal-levels=INT
          Report suboptimal hits beyond  best  hit  (default  0).   All
          hits with best score plus suboptimal-levels are reported.
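
One possible reading of -M is sketched below: a hit is reported when its score is no worse than the best score plus the suboptimal level. The sketch assumes that a score counts mismatches and penalties, so lower is better; the help text does not spell this out. The hit names and the function `reportable_hits` are hypothetical.

```python
def reportable_hits(hits, suboptimal_levels=0):
    """Keep hits whose score is within suboptimal_levels of the best.

    hits is a list of (name, score) pairs. Lower score = better is an
    assumption (score taken as a count of mismatches and penalties).
    """
    if not hits:
        return []
    best = min(score for _, score in hits)
    return [hit for hit in hits if hit[1] <= best + suboptimal_levels]


# Hypothetical candidate alignments for one read.
candidates = [("chr1:1000", 1), ("chr2:5000", 2), ("chr7:900", 4)]

print(reportable_hits(candidates))       # default -M 0: best hits only
print(reportable_hits(candidates, 1))    # -M 1: one level beyond the best
```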

   -R, --masking=INT
          Masking  of frequent/repetitive oligomers to avoid spending time
          on non-unique or repetitive reads
           0 = no masking (will  try  to  find  non-unique  or  repetitive
          matches)
           1 = mask frequent oligomers
           2 = mask frequent and repetitive oligomers (fastest) (default)
           3 = greedy frequent: mask frequent oligomers first, then try no
          masking if alignments not found
           4 = greedy repetitive: mask frequent and  repetitive  oligomers
          first, then try no masking if alignments not found

   -a, --adapter-strip=STRING
          Method  for  removing  adapters  from  reads.  Currently allowed
          values: paired

   --trim-mismatch-score=INT
          Score to use for mismatches when trimming at  ends  (default  is
          -3; to turn off trimming, specify 0)

   -V, --snpsdir=STRING
          Directory for SNPs index files (created using snpindex) (default
          is location of genome index files specified using -D and -d)

   -v, --use-snps=STRING
          Use database  containing  known  SNPs  (in  <STRING>.iit,  built
          previously using snpindex) for tolerance to SNPs

   -C, --cmetdir=STRING
          Directory   for   methylcytosine   index  files  (created  using
          cmetindex) (default is location of genome index files  specified
          using -D, -V, and -d)

   -c, --cmet
          Use  database  for methylcytosine experiments (built previously
          using cmetindex)

   -t, --nthreads=INT
          Number of worker threads

Splicing options for RNA-Seq #

   -s, --splicesites=STRING
          Look   for   splicing   involving   known   splice   sites   (in
          <STRING>.iit), at short or long distances

   -S, --splicetrie-precompute=INT
          Pre-compute  splicetrie  for all known splice sites (0=no, 1=yes
          (default)). Requires --splicesites flag  and  multiple  sequence
          input.

   -N, --novelsplicing=INT
          Look  for  novel  splicing,  not  in  known  splice sites (if -s
          provided)

   --novel-doublesplices
          Allow GSNAP to look for two splices  in  a  single-end  read
          involving novel splice sites (default is  not  to  allow  this).
          Caution: this option can slow down the program  considerably.  A
          better  way  to  detect  double splices is with known splice
          sites, using the --splicesites option.

   -w, --localsplicedist=INT
          Definition of local novel splicing event (default 200000)

   -e, --local-splice-penalty=INT
          Penalty  for  a  local  splice  (default  0).   Counts   against
          mismatches allowed

   -E, --distant-splice-penalty=INT
          Penalty  for  a  distant  splice  (default  3).   Counts against
          mismatches allowed

   -k, --local-splice-endlength=INT
          Minimum length at end  required  for  local  spliced  alignments
          (default 15, min is 14)

   -K, --distant-splice-endlength=INT
          Minimum  length  at  end required for distant spliced alignments
          (default 16, min is 14)

   -l, --shortend-splice-endlength=INT
          Minimum length at end required for short-end spliced  alignments
          (default 2)

   --distant-splice-identity=FLOAT
          Minimum  identity at end required for distant spliced alignments
          (default 0.95)

Options for paired-end reads #

   --pairmax-dna=INT
          Max total genomic length for paired reads (default 1000). Should
          increase for RNA-Seq reads.

   --pairmax-rna=INT
          Max  total  genomic  length  for  RNA-Seq paired reads, or other
          reads that could have a splice (default 200000). Used if  -N  or
          -s  is  specified.   Should  probably  match  the  value for -w,
          --localsplicedist.

   --pairexpect=INT
          Expected paired-end length (default 200)

   --pairdev=INT
          Allowable deviation from expected paired-end  length,  used  for
          discriminating between alternative alignments (default 50)
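
The help text does not say how --pairexpect and --pairdev are applied internally. As a purely illustrative sketch, the code below prefers candidate pair lengths that fall within pairexpect ± pairdev and orders them by distance from the expected length. This ranking rule and the function name `rank_candidates` are assumptions, not GSNAP's documented behaviour.

```python
def rank_candidates(lengths, pairexpect=200, pairdev=50):
    """Order candidate paired-end lengths by closeness to pairexpect.

    Candidates within pairexpect +/- pairdev are preferred; if none
    qualifies, all candidates are kept (an assumed fallback).
    """
    within = [x for x in lengths if abs(x - pairexpect) <= pairdev]
    pool = within if within else list(lengths)
    return sorted(pool, key=lambda x: abs(x - pairexpect))


# Hypothetical alternative alignments of one read pair.
print(rank_candidates([480, 230, 180]))
print(rank_candidates([480, 900]))
```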

Options for quality scores #

   --quality-protocol=STRING
          Protocol for input quality scores.  Allowed values:

           illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
           sanger   (ASCII 33-126) (equivalent to -J 33 -j 0)

          Default is sanger  (no  quality  print  shift).   SAM  output
          files should have quality scores in sanger protocol.

          Or you can customize this behavior with these flags:

   -J, --quality-zero-score=INT
          FASTQ quality scores are zero at this ASCII value (default is 33
          for sanger protocol; for Illumina, select 64)

   -j, --quality-print-shift=INT
          Shift  FASTQ quality scores by this amount in output (default is
          0 for sanger  protocol;  to  change  Illumina  input  to  Sanger
          output, select -31)
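
The relation between -J and -j is plain ASCII arithmetic. The sketch below shows why a print shift of -31 turns Illumina-encoded qualities (zero at ASCII 64) into Sanger-encoded ones (zero at ASCII 33): 33 - 64 = -31. The quality string and the function names are hypothetical examples, not GSNAP internals.

```python
def phred_scores(quality, zero_score):
    """Decode a FASTQ quality string given the ASCII value of score 0 (-J)."""
    return [ord(ch) - zero_score for ch in quality]


def shift_quality(quality, print_shift):
    """Shift every quality character by print_shift (-j) for output."""
    return "".join(chr(ord(ch) + print_shift) for ch in quality)


# Hypothetical Illumina-encoded quality string (scores 40, 39, 38, 2).
illumina = "hgfB"

# Equivalent of --quality-protocol=illumina, i.e. -J 64 -j -31.
sanger = shift_quality(illumina, -31)

print(sanger)
print(phred_scores(illumina, 64))
print(phred_scores(sanger, 33))
```

Both decodings give the same scores, so only the encoding changes, not the qualities themselves.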

Output options #

   -n, --npaths=INT
          Maximum number of paths to print (default 100).

   -Q, --quiet-if-excessive
          If  more than maximum number of paths are found, then nothing is
          printed.

   -O, --ordered
          Print output in same order as input (relevant only if  there  is
          more than one worker thread)

   --show-refdiff
          For   GSNAP   output   in   SNP-tolerant  alignment,  shows  all
          differences relative to  the  reference  genome  as  lower  case
          (otherwise,  it  shows  all  differences  relative  to  both the
          reference and alternate genome)

   --print-snps
          Print detailed information about SNPs in reads (works only if -v
          also selected) (not fully implemented yet)

   --failsonly
          Print only failed alignments, those with no results

   --nofails
          Exclude printing of failed alignments

   --fails-as-input=STRING
          Print  completely  failed  alignments  as  input  FASTA or FASTQ
          format.  Allowed values: yes, no

   -A, --format=STRING
          Another format type, other than default.  Currently implemented:
          sam.  Also allowed, but not installed at compile-time: goby  (to
          install, need to re-compile with appropriate options)

Options for SAM output #

   --no-sam-headers
          Do not print headers beginning with '@'

   --sam-headers-batch=INT
          Print headers only for this batch, as specified by -q

   --read-group-id=STRING
          Value to put into read-group id (RG-ID) field

   --read-group-name=STRING
          Value to put into read-group name (RG-SM) field

Help options #

   --version
          Show version

   --help Show this help message
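
To show how the options above combine, the following sketch assembles a GSNAP command line as an argument list. It only builds and prints the command; it does not run it. The genome directory, database name and read file names are hypothetical, and passing the read files as trailing positional arguments is an assumption not covered by the option list on this page.

```python
import shlex


def build_gsnap_command(genome_dir, genome_db, reads, threads=4,
                        splicesites=None, npaths=100, sam=True):
    """Assemble a gsnap argument list from options documented above."""
    cmd = ["gsnap", "-D", genome_dir, "-d", genome_db,
           "-t", str(threads), "-n", str(npaths)]
    if threads > 1:
        # -O keeps output in input order when several worker threads run.
        cmd.append("-O")
    if splicesites is not None:
        # Known splice sites for RNA-Seq, read from <splicesites>.iit.
        cmd += ["-s", splicesites]
    if sam:
        cmd += ["-A", "sam"]
    # Assumption: input files follow the options as positional arguments.
    cmd += list(reads)
    return cmd


command = build_gsnap_command("/data/genomes", "hg19",
                              ["sample_1.fq", "sample_2.fq"],
                              threads=8, splicesites="known_sites")
print(shlex.join(command))
```

Keeping the command as a list avoids shell quoting problems if it is later passed to `subprocess.run`.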
