MMseqs2 (Many-against-Many sequence searching)
Introduction:
MMseqs2 is a protein sequence search tool similar to BLAST. According to its authors, its profile searches reach the same sensitivity as PSI-BLAST at over 400 times the speed. MMseqs2 can search and cluster huge protein sequence sets in a short time. It is open-source, GPL-licensed software implemented in C++ and tested on Linux, macOS, and Windows, and it supports running on multiple cores and servers.
Installation on Linux:
wget https://mmseqs.com/latest/mmseqs-static_sse41.tar.gz
tar xvzf mmseqs-static_sse41.tar.gz
export PATH=$(pwd)/mmseqs/bin/:$PATH
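The archive above is the SSE4.1 build, so the CPU must support that instruction set. As a minimal sketch, the helper below (the function name `pick_build` is hypothetical, not part of MMseqs2) inspects a string of CPU flags and suggests which build to look for. MMseqs2 also distributes an AVX2 build, but its archive name is not given on this page, so check the download site before changing the URL.

```shell
#!/bin/sh
# Suggest an MMseqs2 build from a space-separated string of CPU flags.
# Prints "avx2" if AVX2 is listed, "sse41" if only SSE4.1 is listed,
# and "unsupported" otherwise. pick_build is a hypothetical helper name.
pick_build() {
    case " $1 " in
        *" avx2 "*)   echo "avx2" ;;
        *" sse4_1 "*) echo "sse41" ;;
        *)            echo "unsupported" ;;
    esac
}

# On Linux the flags are listed in /proc/cpuinfo; elsewhere the string stays empty.
cpu_flags=""
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
fi
echo "Suggested build: $(pick_build "$cpu_flags")"
```

If the script prints `sse41` or `avx2`, the SSE4.1 archive downloaded above should run on the machine, since CPUs with AVX2 also support SSE4.1.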
Minimal execution steps:
Here the search is performed as an index-against-index search, so both the query and the target database are indexed before searching. The search consists of the following steps.
Step 1: Create a database from the FASTA file (repeat for the query and the target).
mmseqs createdb <User Fasta File> <Name of DB> --dont-split-seq-by-len
Two types of FASTA file can be used: nucleotide and protein. If you use a nucleotide FASTA file, pass --dont-split-seq-by-len so that the database can be used for the translated sequence search.
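Because Step 1 differs for nucleotide and protein input, it can help to check which kind of FASTA file you have before running createdb. The function below is a rough sketch of such a check. The 90% threshold and the name `guess_fasta_type` are assumptions made for illustration; this is not how MMseqs2 itself decides.

```shell
# Guess whether a FASTA file holds nucleotide or protein sequences.
# Heuristic (an assumption, not MMseqs2's own logic): if at least 90% of
# the sequence letters are A, C, G, T, U or N, report "nucleotide".
guess_fasta_type() {
    awk '
        /^>/ { next }                       # skip header lines
        {
            seq = toupper($0)
            gsub(/[^A-Z]/, "", seq)         # keep letters only
            total += length(seq)
            nuc_only = seq
            gsub(/[^ACGTUN]/, "", nuc_only) # keep nucleotide letters only
            nuc += length(nuc_only)
        }
        END {
            if (total == 0) {
                print "empty"
            } else if (nuc / total >= 0.9) {
                print "nucleotide"
            } else {
                print "protein"
            }
        }
    ' "$1"
}
```

For example, `guess_fasta_type query.fasta` prints `nucleotide`, `protein`, or `empty`. Short or low-complexity protein sequences can fool a composition check like this, so treat the result as a hint only.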
Step 2: If you use a nucleotide FASTA file, extract the open reading frames (ORFs) from the nucleotide database.
mmseqs extractorfs <database> <orfs> --longest-orf --min-length <int> --max-length <int>
Step 3: Translate the ORFs into protein sequences.
mmseqs translatenucs <Orfs> <Orfs_AA>
Step 4: Index both databases (query and target).
mmseqs createindex <Name> tmp
Step 5: Search the query database against the target database. An iterative search can be performed with --num-iterations.
mmseqs search <query DB> <Target DB> <Result DB> tmp --num-iterations 2
Step 6: Convert the result database into a BLAST-like tab-separated output file.
mmseqs convertalis <QueryDB> <Target DB> <ResultDB> <Output Name>
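The six steps above can be sketched as a single script. This is a minimal illustration under stated assumptions: all file and database names (`query.fasta`, `queryDB`, `result.m8`, and so on) are hypothetical placeholders, the ORF length limits are arbitrary example values, and the flags are copied from the steps on this page, so they may differ between MMseqs2 versions. By default the script only prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless RUN=1, in which case execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# prepare_db <fasta> <db name> <kind: protein|nucleotide>
# Steps 1-4 for one input file. For nucleotide input, ORFs are extracted
# and translated. Sets SEARCH_DB to the database that was indexed.
prepare_db() {
    fasta=$1; db=$2; kind=$3
    if [ "$kind" = "nucleotide" ]; then
        run mmseqs createdb "$fasta" "$db" --dont-split-seq-by-len
        # 30 and 10000 are arbitrary example limits, not recommended values.
        run mmseqs extractorfs "$db" "${db}_orfs" --longest-orf --min-length 30 --max-length 10000
        run mmseqs translatenucs "${db}_orfs" "${db}_orfs_aa"
        SEARCH_DB="${db}_orfs_aa"
    else
        run mmseqs createdb "$fasta" "$db"
        SEARCH_DB="$db"
    fi
    run mmseqs createindex "$SEARCH_DB" tmp
}

# Hypothetical inputs: a nucleotide query and a protein target.
prepare_db query.fasta queryDB nucleotide
QUERY_DB=$SEARCH_DB
prepare_db target.fasta targetDB protein
TARGET_DB=$SEARCH_DB

# Step 5: iterative search. Step 6: convert to tab-separated output.
run mmseqs search "$QUERY_DB" "$TARGET_DB" resultDB tmp --num-iterations 2
run mmseqs convertalis "$QUERY_DB" "$TARGET_DB" resultDB result.m8
```

Printing the commands first makes it easy to review the database names before any files are written. Running with `RUN=1 sh pipeline.sh` (the script name is also a placeholder) requires `mmseqs` on the `PATH`, as set up in the installation section.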
References:
- Hauser, M.; Steinegger, M.; Söding, J. MMseqs software suite for fast and deep clustering and searching of large protein sequence sets. Bioinformatics 2016, 32 (9), 1323-1330.
- https://github.com/soedinglab/mmseqs2/wiki