
NLP in Peptide Functional Prediction #

Source of peptides #

All life on Earth depends on intricate logic to sustain its biological systems. Individual cell types, in particular, carry specific molecular signatures that let them operate their molecular machinery in distinct ways and adapt to many challenging environments. This machinery is programmed with signature-specific logic that is encoded in the central dogma of the cell. Advances in molecular techniques and computational algorithms (the domain of bioinformatics) now help researchers extract function- and disease-specific molecular signatures at various levels of cellular function. Peptides and their signatures, however, have been overlooked because of limitations in experimental and sequencing techniques. These gaps have mostly been filled by computational methods, particularly machine learning algorithms.

Peptides are small protein fragments of up to around 100 amino acids. They are present in all cellular fluids at different concentrations and take part in various molecular functions of the cell. In principle, peptides are generated from larger proteins through post-translational modification events. Peptides can now be captured and quantified directly thanks to advances in protein sequencing technologies such as mass spectrometry and de novo sequencing.

Nature contains a huge number of peptides, built as diverse scaffolds from the 20 amino acids (AA), and these can be observed with cost- and time-effective sequencing. Recent advances in DNA sequencing, i.e., second- to fourth-generation technologies, have allowed researchers at every level to sequence their organism of interest from across the tree of life at minimal cost. For example, the first human genome took 13 years to sequence with $300 million in worldwide funding, whereas current technologies can sequence a genome for about $1000 within a week. This huge reduction in cost and time has promoted large-scale genome projects in human populations, plants, insects, birds and other animals, aimed at understanding evolutionary history and exploiting therapeutic value to improve life. Protein sequencing, however, lags far behind DNA sequencing because of its greater chemical complexity: DNA has only four nucleotides, whereas proteins and peptides are built from 20 AA. Despite these difficulties, techniques borrowed from DNA sequencing are pushing protein sequencing toward single-molecule sequencing from single cells. These sequencing efforts have led to organism-specific peptide databases and to studies of the immune system and of various diseases. For example, the immunopeptidome consortium is establishing the SysteMHC Atlas database to detail the functioning of the human immune system.

While sequencing platforms continue to improve peptide capture and throughput, another open problem is the functional annotation of peptides. This has become feasible through machine learning models developed recently to predict various bioactivities of peptides. For example, SysteMHC Atlas data sets have been used to develop high-confidence predictors of immune epitope binding peptides. Similarly, machine learning models are continuously being developed and optimized to predict peptide bioactivities such as anti-cancer, anti-diabetic, anti-hypertensive, anti-microbial and other properties. This review focuses on how proteins are converted to peptides, and on how machine learning models predict the bioactivities and other functional properties of those peptides so that they can be used in therapeutic applications. It focuses in particular on work from the last five years on post-translational modifications (PTMs), cell-penetrating peptides (CPPs), anti-microbial peptides (AMPs), anti-cancer peptides, toxins and allergens.
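
As a rough illustration of how a peptide sequence might be turned into input for such machine learning models, the sketch below treats overlapping k-mers as the "words" of a sequence, in the spirit of NLP, and also computes a simple amino-acid composition vector. The function names, the choice of k = 3 and the example sequence are assumptions made for illustration; they are not taken from any specific tool or data set mentioned in this review.

```python
from collections import Counter

# The 20 standard amino acids as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def validate(peptide: str) -> str:
    """Upper-case the sequence and reject empty input or non-standard residues."""
    seq = peptide.strip().upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if not seq or unknown:
        raise ValueError(
            f"invalid peptide {peptide!r}; unknown residues: {sorted(unknown)}"
        )
    return seq


def kmer_tokens(peptide: str, k: int = 3) -> list[str]:
    """Split a peptide into overlapping k-mers (hypothetical NLP-style tokens)."""
    seq = validate(peptide)
    # A sequence shorter than k yields an empty token list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def aa_composition(peptide: str) -> list[float]:
    """Return the fraction of each amino acid, ordered as in AMINO_ACIDS."""
    seq = validate(peptide)
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


if __name__ == "__main__":
    example = "GLFDIVKKVV"  # arbitrary illustrative sequence, not a curated peptide
    print(kmer_tokens(example))
    print(aa_composition(example))
```

The k-mer tokens could be passed to any bag-of-words or embedding step, while the 20-dimensional composition vector is a fixed-length feature that a conventional classifier can consume directly. Real predictors typically use richer encodings; this is only a starting point.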

Points: #

Immune system and peptides: Understand the physiological processes of the immune system (adaptive and innate). Host defense peptides (cytotoxic mechanism; immunomodulatory activities such as suppressing pro-inflammatory cytokines; they also perturb the adaptive immune system and are used as vaccine adjuvants), antibiotic peptides and antimicrobial peptides.