
MUSCLE #

A representative multiple alignment software #

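  1. MUSCLE is one of the most widely used multiple alignment tools. It can align hundreds of sequences in seconds, and is considerably faster than ClustalW while remaining accurate.
  2. It is well suited to the large datasets that are needed to build a phylogenetic tree.
  3. It can update an existing alignment by adding new sequences to it, and it can also align two separately built alignment profiles to each other.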

Download #

  1. Linux : http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz

  2. Mac OSX : http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86darwin64.tar.gz

  3. Windows/Cygwin : http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86cygwin32.exe

Alignment : Making a multiple sequence alignment (MSA) #

muscle -in seqs.fa -fastaout seqs.afa -clwout seqs.aln

-clw    -clwout filename    CLUSTALW format. By default, will write 
                            MUSCLE as the program name in the file
                            header. If the -clwstrict option is 
                            specified, then the program name will be
                            written as "CLUSTAL W (1.81)". This is
                            useful if the output will be parsed by
                            scripts that check the program name.

-fasta  -fastaout filename  FASTA format (default).

-html   -htmlout filename   HTML (web page) output. The alignment is
                            colored using a color scheme from Eric
                            Sonnhammer's Belvu editor.

-phys   -physout filename   PHYLIP sequential format.

-phyi   -phyiout filename   PHYLIP interleaved format.

-msf    -msfout filename    MSF format, as used in the GCG package, is
                            requested by using the -msf option. As
                            with CLUSTALW format, this is easier for
                            people to read than FASTA. As of MUSCLE
                            3.52, the MSF format has been tweaked to
                            be more compatible with GCG. The
                            following differences remain.

                            (a) MUSCLE truncates labels at the first
                            white space or after 63 characters,
                            whichever comes first. The GCG package
                            apparently truncates after 10 characters.
                            If this is a problem for you, please let
                            me know and I'll add an option to
                            truncate after 10 in a future version.

                            (b) MUSCLE allows duplicate sequence
                            labels, while GCG forbids duplicates. If
                            you use the -stable option of muscle,
                            then the order of the input sequences is
                            preserved and sequences can be
                            unambiguously identified even if the
                            labels differ.
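
When several output formats are wanted from one run, it can help to assemble the command line in a script. The following is a minimal sketch, assuming the `-...out` flags from the table above as accepted by MUSCLE 3.8; the function name `build_muscle_command` and the short format keys are hypothetical names chosen for this example. It only builds the argument list and does not run MUSCLE.

```python
import shlex

# Output flags listed in the table above.
# The short keys ("fasta", "clw", ...) are names chosen for this sketch.
OUTPUT_FLAGS = {
    "fasta": "-fastaout",
    "clw": "-clwout",
    "html": "-htmlout",
    "phys": "-physout",
    "phyi": "-phyiout",
    "msf": "-msfout",
}


def build_muscle_command(input_fasta, outputs):
    """Return the argv list for one MUSCLE run that writes several formats.

    outputs maps a format key from OUTPUT_FLAGS to an output file name.
    """
    cmd = ["muscle", "-in", input_fasta]
    for fmt, path in outputs.items():
        if fmt not in OUTPUT_FLAGS:
            raise ValueError(f"unknown output format: {fmt}")
        cmd += [OUTPUT_FLAGS[fmt], path]
    return cmd


cmd = build_muscle_command("seqs.fa", {"fasta": "seqs.afa", "clw": "seqs.aln"})
print(shlex.join(cmd))
# prints: muscle -in seqs.fa -fastaout seqs.afa -clwout seqs.aln
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `muscle` binary is on the PATH.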

Tree making : Making a Neighbor-joining or UPGMA tree #

  1. Make a UPGMA tree from a multiple alignment:

    muscle -maketree -in seqs.afa -out seqs.phy
    
  2. Make a Neighbor-Joining tree from a multiple alignment:

    muscle -maketree -in seqs.afa -out seqs.phy -cluster neighborjoining
    
  3. Input file: given with the -in option. It must be in aligned FASTA format, so use the aligned FASTA file produced in the alignment step above.

  4. Output file: the tree is written in Newick format, which is supported by most phylogenetic analysis packages such as PHYLIP.

  5. UPGMA vs. Neighbor-joining: Neighbor-joining trees are the usual choice for general phylogenetic analysis. UPGMA is somewhat faster, so it is a practical alternative for very large datasets where Neighbor-joining would be too slow.
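
To make the UPGMA idea concrete, here is a small sketch of the clustering rule: repeatedly join the two closest clusters and set the distance from the merged cluster to every other cluster to the size-weighted average. This is an illustration only, not MUSCLE's implementation; it outputs the Newick topology without branch lengths, and the function name `upgma` and the toy distances are assumptions made for the example.

```python
def upgma(names, dist):
    """Cluster with UPGMA and return a Newick string (topology only).

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance for every pair of labels.
    """
    # Each cluster is identified by its Newick text and stores its leaf count.
    sizes = {name: 1 for name in names}
    d = dict(dist)
    while len(sizes) > 1:
        # Join the closest pair of clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        merged = f"({a},{b})"
        # Distance to the merged cluster is the size-weighted average.
        for c in sizes:
            if c in pair:
                continue
            d[frozenset({merged, c})] = (
                sizes[a] * d[frozenset({a, c})] + sizes[b] * d[frozenset({b, c})]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        for gone in (a, b):
            del sizes[gone]
        # Drop every distance that still refers to a or b.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
    return next(iter(sizes)) + ";"


toy = {
    frozenset({"A", "B"}): 2, frozenset({"A", "C"}): 6, frozenset({"B", "C"}): 6,
    frozenset({"A", "D"}): 10, frozenset({"B", "D"}): 10, frozenset({"C", "D"}): 10,
}
print(upgma(["A", "B", "C", "D"], toy))
# prints: (((A,B),C),D);
```

Neighbor-joining differs in how it picks the pair to join (it corrects each distance by the average divergence of the two clusters from all others), which is part of why it costs more per step.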

Adding new sequences to an existing MSA (alignment file) #

  muscle -profile -in1 existing_msa.afa -in2 new_seq.fa -out combined.afa

If you have more than one new sequence, you can align them to each other first and then add the result to the existing alignment, for example:

muscle -in new_seqs.fa -out new_seqs.afa

muscle -profile -in1 existing_msa.afa -in2 new_seqs.afa -out combined.afa
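
The choice between the one-step and the two-step procedure depends only on how many sequences are being added, so it is easy to script. The sketch below counts FASTA header lines and returns the command lines to run; the helper names `count_fasta_records` and `plan_add_sequences` are hypothetical, and the commands are only built, not executed.

```python
def count_fasta_records(text):
    """Count sequences in FASTA text by counting header lines."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def plan_add_sequences(existing_msa, new_fasta, new_fasta_text, out="combined.afa"):
    """Return the MUSCLE command lines needed to add new_fasta to existing_msa."""
    n = count_fasta_records(new_fasta_text)
    if n == 0:
        raise ValueError("no sequences found in the new FASTA input")
    if n == 1:
        # A single sequence can be added directly as a one-row profile.
        return [["muscle", "-profile", "-in1", existing_msa,
                 "-in2", new_fasta, "-out", out]]
    # Several sequences: align them to each other first, then merge profiles.
    aligned = new_fasta.rsplit(".", 1)[0] + ".afa"
    return [
        ["muscle", "-in", new_fasta, "-out", aligned],
        ["muscle", "-profile", "-in1", existing_msa, "-in2", aligned, "-out", out],
    ]


two_seqs = ">s1\nACGT\n>s2\nACGA\n"
for cmd in plan_add_sequences("existing_msa.afa", "new_seqs.fa", two_seqs):
    print(" ".join(cmd))
```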

MUSCLE algorithm #


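  • Stage 1: Draft progressive
    The first stage builds a draft alignment, favoring speed over accuracy.
    1.1 A k-mer distance matrix is computed by k-mer counting. (produces D1)
    1.2 Matrix D1 is clustered with UPGMA to give a tree. (produces TREE1)
    1.3 A progressive alignment is built following the branching order of TREE1. (produces MSA1)

  • Stage 2: Improved progressive
    In this stage the tree is re-estimated using the Kimura distance, which gives a more accurate alignment.
    2.1 Pairwise percent identities (%ids) are computed from MSA1 and converted to Kimura distances. (produces D2)
    2.2 Matrix D2 is clustered with UPGMA to give a tree. (produces TREE2)
    2.3 A progressive alignment is built following TREE2. (produces MSA2)

  • Stage 3: Iterative refinement
    3.1 An edge is chosen from TREE2.
    3.2 Deleting that edge splits TREE2 into two subtrees.
    3.3 The profile of each subtree is extracted, and the two profiles are re-aligned to each other to give a new multiple alignment.
    3.4 The new alignment is kept if its SP score improves; otherwise it is discarded.

  • Steps 3.1 to 3.4 are repeated until the alignment converges or a user-defined limit is reached.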
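
The two distance measures used in stages 1 and 2 can be sketched as follows. This is a simplified illustration, not MUSCLE's code: the k-mer distance here is one minus the fraction of shared k-mers on the plain sequence alphabet with an assumed default of k=3, and the Kimura distance uses the formula -ln(1 - D - D²/5) with D = 1 - fractional identity, which is only defined while the logarithm's argument stays positive. The function names are chosen for this example.

```python
import math
from collections import Counter


def kmer_distance(x, y, k=3):
    """1 - (shared k-mers / k-mers in the shorter sequence); needs no alignment."""
    cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    cy = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    shared = sum((cx & cy).values())  # per k-mer, the smaller of the two counts
    return 1.0 - shared / (min(len(x), len(y)) - k + 1)


def fractional_identity(a, b):
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(p, q) for p, q in zip(a, b) if p != "-" and q != "-"]
    if not pairs:
        raise ValueError("rows share no ungapped columns")
    return sum(p == q for p, q in pairs) / len(pairs)


def kimura_distance(identity):
    """Kimura distance from a fractional identity taken from an alignment."""
    d = 1.0 - identity
    arg = 1.0 - d - d * d / 5.0
    if arg <= 0:
        raise ValueError("sequences too divergent for this formula")
    return -math.log(arg)


# Stage 1 works on unaligned sequences, stage 2 on two rows of MSA1.
print(kmer_distance("ACGTACGT", "ACGTACGA"))
print(kimura_distance(fractional_identity("AC-GT", "ACTGA")))
```

The k-mer distance is cheap because it needs no alignment, which is why it is used for the draft tree; the Kimura distance needs aligned rows, so it only becomes available once MSA1 exists.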

Aligning two MSAs (alignment files) to each other : Profile-profile alignment #

 muscle -profile -in1 one.afa -in2 two.afa -out both.afa
  • Profile-profile alignment is not for homolog recognition. MUSCLE does not compute a similarity measure or a measure of
    statistical significance (such as an E-value), so this option is not useful for discriminating homologs from unrelated sequences.
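
Both inputs to -profile are expected to be alignments already, so a quick sanity check before running it is to confirm that every row in each file has the same length. The sketch below is a minimal FASTA reader plus that check; the function names `read_fasta` and `is_aligned` are hypothetical and the example data is made up.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (label, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            # Sequence data may be wrapped over several lines.
            records[-1][1] += line
    return [(label, seq) for label, seq in records]


def is_aligned(records):
    """True if there is at least one row and all rows have equal length."""
    return len(records) > 0 and len({len(seq) for _, seq in records}) == 1


one = read_fasta(">a\nAC-GT\n>b\nACTGT\n")
two = read_fasta(">c\nACGT\n>d\nACG\n")
print(is_aligned(one), is_aligned(two))
# prints: True False
```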

Refining an existing alignment #

  • Refinement is an attempt to improve an existing alignment. This can be done with the -refine option of MUSCLE, in which case the input is an existing MSA:

      muscle -in msa.afa -out refined_msa.afa -refine
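
One way to see whether refinement changed anything is to compare a sum-of-pairs (SP) score before and after. The sketch below uses a deliberately simple scoring scheme (match, mismatch and gap values are assumptions for illustration); MUSCLE's own objective uses a substitution matrix and different gap handling, so the numbers are not comparable to MUSCLE's internal scores.

```python
from itertools import combinations


def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length strings."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*rows):
        for p, q in combinations(column, 2):
            if p == "-" and q == "-":
                continue  # gap-gap pairs contribute nothing
            if p == "-" or q == "-":
                total += gap
            elif p == q:
                total += match
            else:
                total += mismatch
    return total


before = ["ACGT", "AC-T", "A-GT"]
after = ["ACGT", "AC-T", "AG-T"]  # same sequences, one gap moved
print(sp_score(before), sp_score(after))
# prints: 0 1
```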
    

References #

  1. MUSCLE home : http://www.drive5.com/muscle/

  2. Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5):1792-1797 [http://www.ncbi.nlm.nih.gov/pubmed/15034147].

  3. Edgar, R.C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113 [http://www.ncbi.nlm.nih.gov/pubmed/15318951].
