Overview #

GSNAP (Genomic Short-read Nucleotide Alignment Program) is an algorithm for aligning short single-end and paired-end reads obtained from platforms such as Illumina/Solexa or ABI/SOLiD. It is run from the command line, and its options are listed below.

Options #

Input options #

   -D, --dir=directory
          Genome directory

   -d, --db=STRING
          Genome database

   -q, --part=INT/INT
          Process only the i-th out of every n sequences, e.g., 0/100 or
          99/100 (useful for distributing jobs to a computer farm).
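
As a rough illustration of the i/n notation, the following Python sketch assumes that sequences are numbered from zero and that part i/n keeps every sequence whose index leaves remainder i when divided by n. The function name `select_part` and the modulo rule are assumptions made for illustration; the option text only says that the i-th out of every n sequences is processed.

```python
def select_part(sequences, part):
    """Return the sequences belonging to one part, given a spec like "0/100".

    Assumption: sequence k belongs to part i/n when k % n == i.
    """
    i_text, n_text = part.split("/")
    i, n = int(i_text), int(n_text)
    if n <= 0 or not 0 <= i < n:
        raise ValueError(f"invalid part specification: {part!r}")
    return [seq for k, seq in enumerate(sequences) if k % n == i]


reads = [f"read{k}" for k in range(10)]
print(select_part(reads, "0/4"))  # read0, read4, read8 under this assumption
print(select_part(reads, "3/4"))  # read3, read7
```

Under this scheme, running the parts 0/n through (n-1)/n as separate jobs covers every sequence exactly once.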

          Size  of  input  buffer  (program reads this many sequences at a
          time for efficiency) (default 1000)

          Amount of barcode to remove from start of read (default 0)

          Strip PC line feeds (ASCII 13) from input

   -o, --orientation=STRING
          Orientation of paired-end reads. Allowed values: FR (fwd-rev, or
          typical Illumina; default), RF (rev-fwd, for circularized
          inserts), or FF (fwd-fwd, same strand)

Computation options #

   -B, --batch=INT
          Batch mode (default 2)
           Mode     Offsets       Positions       Genome
             0      allocate      mmap            mmap
             1      allocate      mmap & preload  mmap
             2      allocate      mmap & preload  mmap & preload (default)
             3      allocate      allocate        mmap & preload
             4      allocate      allocate        allocate

          Note: For a single sequence, all data structures use mmap. If
          mmap is not available and allocate is not chosen, then fileio
          is used.

   -m, --max-mismatches=FLOAT
          Maximum number of mismatches allowed (if not specified, then
          defaults to the ultrafast level of ((readlength+2)/12 - 2)). If
          specified between 0.0 and 1.0, then treated as a fraction of
          each read length. Otherwise, treated as an integral number of
          mismatches (including indel and splicing penalties). For RNA-Seq,
          you may need to increase this value slightly to align reads
          extending past the ends of an exon.
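
The rules above can be worked through numerically. This Python sketch is only an illustration: the function name `max_mismatches` is hypothetical, and it assumes that the default formula uses integer division with a floor of zero, and that a value strictly between 0.0 and 1.0 is multiplied by the read length and rounded to the nearest integer.

```python
def max_mismatches(read_length, user_value=None):
    """Resolve the mismatch budget for one read.

    Assumptions: integer division in the default formula, a floor of zero,
    and nearest-integer rounding for fractional values.
    """
    if user_value is None:
        # Default "ultrafast" level: (readlength + 2) / 12 - 2
        return max(0, (read_length + 2) // 12 - 2)
    if 0.0 < user_value < 1.0:
        # Fraction of the read length
        return round(user_value * read_length)
    # Otherwise an integral number of mismatches
    return int(user_value)


print(max_mismatches(36))         # 1
print(max_mismatches(100))        # 6
print(max_mismatches(100, 0.05))  # 5
print(max_mismatches(100, 3))     # 3
```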

          Penalty for a terminal alignment (alignment from one end of  the
          read to the best possible position at the other end) (default 1)

   -i, --indel-penalty=INT
          Penalty for an indel (default 1). Counts against mismatches
          allowed. To find indels, make indel-penalty less than or equal
          to max-mismatches. For 2-base reads, need to set indel-penalty
          somewhat high.

   -I, --indel-endlength=INT
          Minimum length at end required for indel alignments (default 3)

   -y, --max-middle-insertions=INT
          Maximum number of middle insertions allowed (default 9)

   -z, --max-middle-deletions=INT
          Maximum number of middle deletions allowed (default 30)

   -Y, --max-end-insertions=INT
          Maximum number of end insertions allowed (default 3)

   -Z, --max-end-deletions=INT
          Maximum number of end deletions allowed (default 6)

   -M, --suboptimal-levels=INT
          Report suboptimal hits beyond best hit (default 0). All hits with
          best score plus suboptimal-levels are reported.
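
A minimal sketch of this reporting rule, assuming each hit carries a penalty-style score where lower is better. The `(name, score)` tuple layout and the function name `report_hits` are hypothetical and chosen only for illustration.

```python
def report_hits(hits, suboptimal_levels=0):
    """Keep hits whose score is within suboptimal_levels of the best score.

    Assumption: each hit is a (name, score) pair and a lower score is better.
    """
    if not hits:
        return []
    best = min(score for _, score in hits)
    return [hit for hit in hits if hit[1] <= best + suboptimal_levels]


hits = [("chr1:100", 1), ("chr2:500", 2), ("chr7:42", 4)]
print(report_hits(hits))     # only the best hit
print(report_hits(hits, 1))  # the best hit and the hit one level worse
```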

   -R, --masking=INT
          Masking of frequent/repetitive oligomers to avoid spending time
          on non-unique or repetitive reads
           0 = no masking (will try to find non-unique or repetitive
               alignments)
           1 = mask frequent oligomers
           2 = mask frequent and repetitive oligomers (fastest) (default)
           3 = greedy frequent: mask frequent oligomers first, then try no
               masking if alignments not found
           4 = greedy repetitive: mask frequent and repetitive oligomers
               first, then try no masking if alignments not found

   -a, --adapter-strip=STRING
          Method  for  removing  adapters  from  reads.  Currently allowed
          values: paired

          Score to use for mismatches when trimming at  ends  (default  is
          -3; to turn off trimming, specify 0)

   -V, --snpsdir=STRING
          Directory for SNPs index files (created using snpindex) (default
          is location of genome index files specified using -D and -d)

   -v, --use-snps=STRING
          Use database  containing  known  SNPs  (in  <STRING>.iit,  built
          previously using snpindex) for tolerance to SNPs

   -C, --cmetdir=STRING
          Directory for methylcytosine index files (created using
          cmetindex) (default is location of genome index files specified
          using -D, -V, and -d)

   -c, --cmet
          Use database for methylcytosine experiments (built previously
          using cmetindex)

   -t, --nthreads=INT
          Number of worker threads

Splicing options for RNA-Seq #

   -s, --splicesites=STRING
          Look   for   splicing   involving   known   splice   sites   (in
          <STRING>.iit), at short or long distances

   -S, --splicetrie-precompute=INT
          Pre-compute splicetrie for all known splice sites (0=no, 1=yes
          (default)). Requires --splicesites flag and multiple sequence
          input

   -N, --novelsplicing=INT
          Look for novel splicing, not in known splice sites (if -s
          is provided)

          Allow GSNAP to look for two splices in a single-end read
          involving novel splice sites (default is not to allow this).
          Caution: this option can slow down the program considerably. A
          better way to detect double splices is with known splice sites,
          using the --splicesites option.

   -w, --localsplicedist=INT
          Definition of local novel splicing event (default 200000)

   -e, --local-splice-penalty=INT
          Penalty  for  a  local  splice  (default  0).   Counts   against
          mismatches allowed

   -E, --distant-splice-penalty=INT
          Penalty  for  a  distant  splice  (default  3).   Counts against
          mismatches allowed
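
Because indel and splice penalties count against the mismatches allowed, an alignment's total can be pictured as a single sum. The Python sketch below is an illustration under assumptions: the function names are hypothetical, each indel or splice event is charged its penalty once, and the penalties are simply added to the mismatch count. The default values are the ones listed for -i, -e, and -E.

```python
def alignment_score(mismatches, indels=0, local_splices=0, distant_splices=0,
                    indel_penalty=1, local_splice_penalty=0,
                    distant_splice_penalty=3):
    """Sum mismatches and per-event penalties (assumed additive)."""
    return (mismatches
            + indels * indel_penalty
            + local_splices * local_splice_penalty
            + distant_splices * distant_splice_penalty)


def is_allowed(score, max_mismatches):
    """An alignment is kept when its score fits within the mismatch budget."""
    return score <= max_mismatches


# One mismatch plus one distant splice costs 1 + 3 = 4
score = alignment_score(1, distant_splices=1)
print(score, is_allowed(score, 3), is_allowed(score, 4))
```

With these defaults, a local splice is free while a distant splice uses up three mismatches of the budget, which is why the mismatch limit may need to be raised before distant splices can be reported.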

   -k, --local-splice-endlength=INT
          Minimum length at end  required  for  local  spliced  alignments
          (default 15, min is 14)

   -K, --distant-splice-endlength=INT
          Minimum  length  at  end required for distant spliced alignments
          (default 16, min is 14)

   -l, --shortend-splice-endlength=INT
          Minimum length at end required for short-end spliced  alignments
          (default 2)

          Minimum  identity at end required for distant spliced alignments
          (default 0.95)

Options for paired-end reads #

          Max total genomic length for paired reads (default 1000). Should
          increase for RNA-Seq reads.

          Max total genomic length for RNA-Seq paired reads, or other
          reads that could have a splice (default 200000). Used if -N or
          -s is specified. Should probably match the value for -w,
          --localsplicedist.

          Expected paired-end length (default 200)

          Allowable deviation from expected paired-end  length,  used  for
          discriminating between alternative alignments (default 50)

Options for quality scores #

          Protocol for input quality scores.  Allowed values:

           illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
           sanger   (ASCII 33-126) (equivalent to -J 33 -j 0)

          Default is sanger (no quality print shift). SAM output files
          should have quality scores in sanger protocol.

          Or you can customize this behavior with these flags:

   -J, --quality-zero-score=INT
          FASTQ quality scores are zero at this ASCII value (default is 33
          for sanger protocol; for Illumina, select 64)

   -j, --quality-print-shift=INT
          Shift  FASTQ quality scores by this amount in output (default is
          0 for sanger  protocol;  to  change  Illumina  input  to  Sanger
          output, select -31)
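
The relationship between the zero score and the print shift is plain ASCII arithmetic, shown in the Python sketch below. The helper names `phred_scores` and `shift_quality` are hypothetical; only the offsets 33 and 64 and the shift of -31 come from the option descriptions.

```python
def phred_scores(quality, zero_score=33):
    """Decode a FASTQ quality string into numeric scores."""
    return [ord(ch) - zero_score for ch in quality]


def shift_quality(quality, print_shift=0):
    """Shift every quality character by print_shift ASCII positions."""
    return "".join(chr(ord(ch) + print_shift) for ch in quality)


illumina = "hhgf"  # example quality string in the Illumina (ASCII 64) protocol
print(phred_scores(illumina, zero_score=64))   # [40, 40, 39, 38]
sanger = shift_quality(illumina, print_shift=-31)
print(sanger)                                  # IIHG
print(phred_scores(sanger, zero_score=33))     # [40, 40, 39, 38]
```

The shift of -31 is the difference between the two zero points (33 - 64), so the numeric scores are unchanged after conversion.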

Output options #

   -n, --npaths=INT
          Maximum number of paths to print (default 100).

   -Q, --quiet-if-excessive
          If more than maximum number of paths are found, then nothing is
          printed.
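
How -n and -Q might interact can be sketched as follows. This is an assumption-laden illustration, not GSNAP's actual output code: the function name `paths_to_print` is hypothetical, "more than the maximum" is read as a count strictly greater than npaths, the quiet case is assumed to print nothing, and without -Q the output is assumed to be cut off at the first npaths paths.

```python
def paths_to_print(paths, npaths=100, quiet_if_excessive=False):
    """Select the alignment paths to print for one read.

    Assumptions: excessive means len(paths) > npaths; with the quiet flag
    set nothing is printed, otherwise the list is truncated to npaths.
    """
    if len(paths) > npaths:
        return [] if quiet_if_excessive else list(paths[:npaths])
    return list(paths)


paths = ["path1", "path2", "path3"]
print(paths_to_print(paths, npaths=2))                           # first two paths
print(paths_to_print(paths, npaths=2, quiet_if_excessive=True))  # empty list
print(paths_to_print(paths, npaths=3, quiet_if_excessive=True))  # all three
```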

   -O, --ordered
          Print output in same order as input (relevant only if  there  is
          more than one worker thread)

          For   GSNAP   output   in   SNP-tolerant  alignment,  shows  all
          differences relative to  the  reference  genome  as  lower  case
          (otherwise,  it  shows  all  differences  relative  to  both the
          reference and alternate genome)

          Print detailed information about SNPs in reads (works only if -v
          also selected) (not fully implemented yet)

          Print only failed alignments, those with no results

          Exclude printing of failed alignments

          Print completely failed alignments as input FASTA or FASTQ
          format. Allowed values: yes, no

   -A, --format=STRING
          Another format type, other than default. Currently implemented:
          sam. Also allowed, but not installed at compile-time: goby (to
          install, need to re-compile with appropriate options)

Options for SAM output #

          Do not print headers beginning with '@'

          Print headers only for this batch, as specified by -q

          Value to put into read-group id (RG-ID) field

          Value to put into read-group name (RG-SM) field

Help options #

          Show version

   --help Show this help message
