MMSeq #

MMseqs2 (Many-against-Many sequence searching) #

Introduction: #

MMseqs2 is protein sequence search software similar to BLAST, reported to be 400 times faster than PSI-BLAST at the same sensitivity. It can search and cluster huge protein sequence sets in a short time. It is open-source, GPL-licensed software implemented in C++, tested on Linux, Mac OS and Windows, and it supports multiple cores and servers.

Installation in Linux: #

tar xvzf mmseqs-static_sse41.tar.gz
export PATH=$(pwd)/mmseqs/bin/:$PATH
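
Before moving on, it can help to confirm that the binary is reachable and that the CPU supports the instruction set this static build targets. The check below is a minimal sketch, assuming a Linux system that exposes CPU flags in /proc/cpuinfo; the variable names cpu_status and mmseqs_status are illustrative only.

```shell
# Report whether the CPU advertises SSE4.1, which the sse41 static build targets.
# Assumption: CPU flags are listed in /proc/cpuinfo (true on typical Linux systems).
if grep -qw sse4_1 /proc/cpuinfo 2>/dev/null; then
    cpu_status="SSE4.1 supported"
else
    cpu_status="SSE4.1 not detected"
fi
echo "$cpu_status"

# Report whether the mmseqs binary can be found through PATH.
if command -v mmseqs >/dev/null 2>&1; then
    mmseqs_status="found at $(command -v mmseqs)"
else
    mmseqs_status="not found on PATH"
fi
echo "mmseqs $mmseqs_status"
```

Note that the export line above only affects the current shell session; adding the same line to a shell startup file such as ~/.bashrc makes the change persistent.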

Minimal Execution steps: #

Here the search is performed as an index-against-index search. The sequence search is conducted in the following steps.

Step 1: Create the reference database.

mmseqs createdb <User Fasta File>  <Name of DB>  --dont-split-seq-by-len

The input can be either of two FASTA file types (nucleotide or protein). If you use a nucleotide FASTA file, you must pass --dont-split-seq-by-len so that the database can be used for a translated sequence search.

Step 2: If you used a nucleotide FASTA file, extract the open reading frames (ORFs) from the database created in Step 1.
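
Since the nucleotide and protein cases are handled differently, a quick check of the input type can be useful when it is not obvious from the file name. The helper below is a rough heuristic sketch and not part of MMseqs2: the function name guess_fasta_type and the 0.9 cutoff are assumptions made for illustration.

```shell
# Hypothetical helper: guess whether a FASTA file holds nucleotide or protein
# sequences from the share of A/C/G/T/U/N letters in its sequence lines.
guess_fasta_type() {
    awk '
        /^>/ { next }                         # skip header lines
        {
            seq = toupper($0)
            total += length(seq)              # count all residues
            nuc += gsub(/[ACGTUN]/, "", seq)  # gsub returns the number of matches
        }
        END {
            if (total == 0) { print "empty"; exit }
            # The 0.9 cutoff is an arbitrary illustrative threshold.
            kind = (nuc / total >= 0.9) ? "nucleotide" : "protein"
            print kind
        }
    ' "$1"
}

# Demo on two tiny made-up files.
demo_dir=$(mktemp -d)
printf '>n1\nACGTACGTNN\n' > "$demo_dir/demo.fna"
printf '>p1\nMKVLAAGIVG\n' > "$demo_dir/demo.faa"
guess_fasta_type "$demo_dir/demo.fna"   # prints "nucleotide"
guess_fasta_type "$demo_dir/demo.faa"   # prints "protein"
rm -r "$demo_dir"
```

A short protein made up mostly of alanine, glycine, cysteine and threonine would be misclassified, so treat the result as a hint and not as a guarantee.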

mmseqs extractorfs <database> <orfs> --longest-orf --min-length <int> --max-length <int>

Step 3: Translate the ORFs into protein sequences.

mmseqs translatenucs <Orfs> <Orfs_AA>

Step 4: Index both databases (query and target).

mmseqs createindex <Name> tmp

Step 5: Search the query against the target. An iterative search can be requested here with --num-iterations.

mmseqs search <query DB> <Target DB> <Result DB> tmp --num-iterations 2

Step 6: Convert the result into the BLAST tabular output format.

mmseqs convertalis <QueryDB> <Target DB> <ResultDB> <Output Name>
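
The six steps above can be chained into one script. The following is a sketch under stated assumptions, not a definitive workflow: the file names (query.fna, target.faa), the database names, the ORF length limits, and the run and translated_search helpers are all hypothetical, and the flags are the ones shown in the steps above, which may differ between MMseqs2 versions. By default the script only prints the commands (dry run), so it can be inspected safely before anything is executed.

```shell
# Hypothetical file names; replace them with your own data.
QUERY_FASTA="${QUERY_FASTA:-query.fna}"     # nucleotide query sequences
TARGET_FASTA="${TARGET_FASTA:-target.faa}"  # protein target sequences
MIN_ORF_LEN="${MIN_ORF_LEN:-30}"            # illustrative value only
MAX_ORF_LEN="${MAX_ORF_LEN:-48000}"         # illustrative value only
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands, 0 = run them

# Print a command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Chain Steps 1 to 6; stop at the first command that fails.
translated_search() {
    run mkdir -p tmp &&
    # Step 1: create the query and target databases.
    run mmseqs createdb "$QUERY_FASTA" queryDB --dont-split-seq-by-len &&
    run mmseqs createdb "$TARGET_FASTA" targetDB &&
    # Step 2: extract ORFs from the nucleotide query database.
    run mmseqs extractorfs queryDB queryORFs --longest-orf \
        --min-length "$MIN_ORF_LEN" --max-length "$MAX_ORF_LEN" &&
    # Step 3: translate the ORFs into protein sequences.
    run mmseqs translatenucs queryORFs queryORFs_aa &&
    # Step 4: index both databases.
    run mmseqs createindex queryORFs_aa tmp &&
    run mmseqs createindex targetDB tmp &&
    # Step 5: iterative search with two iterations.
    run mmseqs search queryORFs_aa targetDB resultDB tmp --num-iterations 2 &&
    # Step 6: write the result in BLAST tabular format.
    run mmseqs convertalis queryORFs_aa targetDB resultDB result.m8
}

translated_search
```

To execute the commands for real, call the function as DRY_RUN=0 translated_search. The converted result file is tab-separated with 12 columns: query and target identifiers, sequence identity, alignment length, number of mismatches, number of gap openings, start and end positions in the query and in the target, E-value, and bit score.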

Reference: #

  1. Hauser, M.; Steinegger, M.; Söding, J., MMseqs software suite for fast and deep clustering and searching of large protein sequence sets. Bioinformatics 2016, 32, (9), 1323-1330.