Chemical Checker
The Chemical Checker (CC) provides processed, harmonized and integrated bioactivity data on ~800,000 small molecules, with a main focus on human pharmacology datasets (Duran-Frigola et al., 2020). The CC organizes these data into five categories (levels A to E), each further divided into five sub-categories, for a total of 25 spaces, as shown in Table 1.
The CC extends the similarity principle along the drug discovery pipeline, from in vitro assays to clinical observations, by treating all bioactivity data within a unified analytical framework. Naturally, fewer molecules are available as one advances along the CC levels from A to E: chemical information (A) is available for every molecule (778,460 in total), whereas clinical data (E) are scarce (9,165 molecules, including 4,232 drugs).
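The similarity principle states that similar molecules tend to have similar bioactivities. As an illustration only (the CC converts each space into its own numerical signatures, which this snippet does not reproduce), the sketch below computes the Tanimoto coefficient commonly used to compare binary fingerprints such as those in A1. Each fingerprint is assumed to be given as the set of indices of its "on" bits, and the bit indices are made-up values.

```python
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints.

    Each fingerprint is the set of indices of its 'on' bits.
    """
    if not fp_a and not fp_b:
        return 0.0  # convention chosen for two empty fingerprints (an assumption)
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical on-bit sets for three molecules.
query = {1, 5, 9, 42, 77}
close_analog = {1, 5, 9, 42, 80}
unrelated = {3, 12, 200}

print(round(tanimoto(query, close_analog), 3))  # 0.667
print(tanimoto(query, unrelated))  # 0.0
```

Under the similarity principle, the close analog (4 of 6 distinct bits shared) would be expected to behave more like the query than the unrelated molecule.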
Table 1. Categories and sub-categories of the Chemical Checker
Space | Name | Description | Source(s) |
---|---|---|---|
A1 | 2D fingerprints | Binary representation of the 2D structure of a molecule. The neighborhood of every atom is encoded using circular topology hashing. | RDKit |
A2 | 3D fingerprints | Similar to A1, the 3D structures of the three best conformers after energy minimization are hashed into a binary representation without the need for structural alignment. | E3FP |
A3 | Scaffolds | Largest molecular scaffold (usually a ring system) remaining after applying Murcko's pruning rules. In addition, we keep the corresponding framework, i.e. a version of the scaffold where all atoms are carbons and all bonds are single. The scaffold and the framework are encoded with path-based fingerprints, suitable for capturing substructures in similarity searches. | RDKit |
A4 | Structural keys | 166 functional groups and substructures widely accepted by medicinal chemists (MACCS keys). | RDKit |
A5 | Physicochemical parameters | Physicochemical parameters such as molecular weight, logP and refractivity; the number of hydrogen-bond donors and acceptors, rings, etc.; and drug-likeness measurements, e.g. the number of structural alerts, Lipinski's rule-of-5 violations or chemical beauty (QED). | RDKit and Silico-IT |
B1 | Mechanisms of action | Drug targets with known pharmacological action and mode of action (agonist, antagonist, etc.). | DrugBank and ChEMBL |
B2 | Metabolic genes | Drug-metabolizing enzymes, transporters and carriers. | DrugBank and ChEMBL |
B3 | Crystals | Small molecules co-crystallized with protein chains. Data are organized on the basis of the structural families of the protein chains. | PDB and ECOD |
B4 | Binding | Compound-protein binding data available in major public chemogenomics databases. Data come mainly from academic publications and patents. Binding affinities below a class-specific threshold are favored (kinases ≤ 30 nM, GPCRs ≤ 100 nM, nuclear receptors ≤ 100 nM, ion channels ≤ 10 µM and others ≤ 1 µM), and activities at most one order of magnitude higher are kept (capped at 10 µM). | ChEMBL and BindingDB |
B5 | HTS bioassays | Hits from screening campaigns against protein targets (mainly confirmatory functional assays below 10 µM). | PubChem Bioassays (from ChEMBL) |
C1 | Small-molecule roles | Ontology terms associated with small molecules that have recognized biological roles, such as known drugs, metabolites and other natural products. | ChEBI |
C2 | Small-molecule pathways | Curated reconstruction of human metabolism, containing metabolites and reactions. Data are represented as a network where nodes are metabolites and edges connect substrates and products of reactions. | Recon |
C3 | Signaling pathways | Canonical pathways related to known receptors of compounds (as recorded in B4). Pathways are assigned via a guilt-by-association approach, i.e. a molecule is related to a pathway when at least one of the targets is a member of it. | Reactome |
C4 | Biological processes | Similar to C3, biological processes from the Gene Ontology are associated with compounds via a guilt-by-association approach from B4 data. All parent terms are kept, from the 'leaves' of the ontology to its 'root'. | Gene Ontology |
C5 | Interactome | Neighborhoods of B4 targets are collected by inspecting several protein-protein interaction networks. A random-walk algorithm is used to obtain a robust measure of 'proximity' in the network. | STRING, InWeb and Pathway Commons, among others |
D1 | Gene expression | Transcriptional response of cell lines upon exposure to small molecules. A reference collection of gene expression profiles is used to map all compound profiles using a two-sided gene set enrichment analysis. | L1000 Connectivity Map (Touchstone reference) |
D2 | Cancer cell lines | Small-molecule sensitivity data (GI50) of a panel of 60 cancer cell lines. | NCI-60 |
D3 | Chemical genetics | Growth inhibition profiles in a panel of ~300 yeast mutants. Data are combined with yeast genetic interaction data, so that compounds can be assimilated to genetic alterations when they have similar profiles. | MOSAIC |
D4 | Morphology | Changes in U-2 OS cell morphology measured after compound treatment using a multiplexed-cytological 'cell painting' assay. 812 morphology features are recorded via automated microscopy and image analysis. | LINCS Portal |
D5 | Cell bioassays | Small-molecule cell bioassays reported in ChEMBL, mainly growth and proliferation measurements found in the literature. | ChEMBL |
E1 | Therapeutic areas | Anatomical Therapeutic Chemical (ATC) classification codes of drugs. All ATC levels are considered. | DrugBank and KEGG |
E2 | Indications | Indications of approved drugs and drugs in clinical trials. A controlled medical vocabulary is used. | DrugBank and ChEMBL |
E3 | Side effects | Side effects extracted from drug package inserts via text-mining techniques. | SIDER |
E4 | Diseases and toxicology | Manually curated relationships between chemicals and diseases. Chemicals include drug molecules and environmental substances, among others. | CTD |
E5 | Drug-drug interactions | Changes in the effect of a drug when it is co-administered with a second drug. Data are related to pharmacokinetic issues and/or adverse events. | DrugBank |
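The A5 space includes drug-likeness measurements such as the number of Lipinski rule-of-5 violations. Under the usual formulation of the rule (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 acceptors), the count can be sketched as follows. The descriptor values are assumed to be precomputed (for example with RDKit); the dictionary keys are hypothetical names, not the CC's own field names, and the CC's exact descriptor definitions may differ.

```python
from typing import Dict

# Usual rule-of-5 upper limits (a value equal to the limit is not a violation).
RULE_OF_FIVE_LIMITS = {
    "mol_weight": 500.0,  # Da
    "logp": 5.0,
    "h_donors": 5,
    "h_acceptors": 10,
}


def ro5_violations(descriptors: Dict[str, float]) -> int:
    """Count how many rule-of-5 limits a molecule exceeds."""
    return sum(
        1 for key, limit in RULE_OF_FIVE_LIMITS.items() if descriptors[key] > limit
    )


# Hypothetical descriptor values for two molecules.
small_drug = {"mol_weight": 180.2, "logp": 1.2, "h_donors": 1, "h_acceptors": 4}
large_lipophilic = {"mol_weight": 720.0, "logp": 6.3, "h_donors": 4, "h_acceptors": 12}

print(ro5_violations(small_drug))  # 0
print(ro5_violations(large_lipophilic))  # 3
```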
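The B4 row of Table 1 describes class-specific affinity thresholds. One possible reading of that rule is sketched below: an affinity at or below the class threshold is "favored", an affinity up to one order of magnitude above it is "kept" as long as it does not exceed 10 µM, and anything weaker is discarded. This interpretation, the class labels, the helper name `classify_affinity` and the fallback to the "others" threshold for unlisted classes are all assumptions made for illustration; the CC's actual processing may weigh or encode these categories differently.

```python
# Class-specific affinity thresholds from the B4 row of Table 1, in nM.
CLASS_THRESHOLDS_NM = {
    "kinase": 30.0,
    "gpcr": 100.0,
    "nuclear_receptor": 100.0,
    "ion_channel": 10_000.0,  # 10 µM
    "other": 1_000.0,  # 1 µM
}
GLOBAL_CAP_NM = 10_000.0  # activities weaker than 10 µM are never kept


def classify_affinity(affinity_nm: float, target_class: str) -> str:
    """Label a binding measurement as 'favored', 'kept' or 'discarded'.

    Hypothetical helper: unlisted target classes fall back to the 'other' threshold.
    """
    threshold = CLASS_THRESHOLDS_NM.get(target_class, CLASS_THRESHOLDS_NM["other"])
    if affinity_nm <= threshold:
        return "favored"
    # Allow at most one order of magnitude above the threshold, capped at 10 µM.
    if affinity_nm <= min(10.0 * threshold, GLOBAL_CAP_NM):
        return "kept"
    return "discarded"


for affinity, cls in [(25, "kinase"), (250, "kinase"), (400, "kinase"), (8_000, "other")]:
    print(cls, affinity, classify_affinity(affinity, cls))
```

Note that for ion channels the threshold already equals the 10 µM cap, so under this reading no ion-channel measurement falls into the "kept" band.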
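Spaces C3 and C4 assign annotations by guilt-by-association: a molecule receives a pathway or process term when at least one of its B4 targets carries that term, and in C4 all parent terms up to the ontology root are kept as well. The sketch below shows that logic on hypothetical toy identifiers (`TARGET_A`, `leaf_x`, etc. are placeholders, not real Reactome or Gene Ontology entries); passing an empty `parents` mapping gives the flat, C3-style behavior.

```python
from typing import Dict, Iterable, List, Set


def with_ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term together with all of its ancestors in the ontology."""
    collected: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in collected:
            continue  # already visited (a DAG can reach a term by several paths)
        collected.add(current)
        stack.extend(parents.get(current, []))
    return collected


def annotate_by_association(
    compound_targets: Dict[str, Iterable[str]],
    target_terms: Dict[str, Iterable[str]],
    parents: Dict[str, List[str]],
) -> Dict[str, Set[str]]:
    """A compound inherits every term (plus its ancestors) of any of its targets."""
    annotations: Dict[str, Set[str]] = {}
    for compound, targets in compound_targets.items():
        terms: Set[str] = set()
        for target in targets:
            for term in target_terms.get(target, []):
                terms |= with_ancestors(term, parents)
        annotations[compound] = terms
    return annotations


# Hypothetical toy data: compound -> targets, target -> leaf terms, term -> parents.
compound_targets = {"cpd_1": ["TARGET_A"], "cpd_2": ["TARGET_B", "TARGET_C"]}
target_terms = {"TARGET_A": ["leaf_x"], "TARGET_B": ["leaf_y"]}
parents = {"leaf_x": ["mid_1"], "leaf_y": ["mid_1"], "mid_1": ["root"]}

result = annotate_by_association(compound_targets, target_terms, parents)
print(sorted(result["cpd_1"]))  # ['leaf_x', 'mid_1', 'root']
```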
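The C5 row states that a random-walk algorithm measures network "proximity" to a compound's B4 targets. The CC's exact setup (networks, restart probability, normalization) is not described here, so the following is a generic random walk with restart on a small hypothetical graph: the walker starts at the seed nodes (the compound's targets), moves to a uniformly chosen neighbor, and jumps back to a seed with probability `restart` at each step. Higher steady-state probability means closer to the seeds.

```python
from typing import Dict, List, Set


def random_walk_with_restart(
    adjacency: Dict[str, List[str]],
    seeds: Set[str],
    restart: float = 0.5,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    `adjacency` must list every node as a key (undirected edges listed both ways).
    """
    nodes = list(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # With probability `restart`, jump back to a seed ...
        updated = {n: restart * start[n] for n in nodes}
        # ... otherwise move to a uniformly chosen neighbor.
        for node in nodes:
            neighbors = adjacency[node]
            if not neighbors:
                continue  # isolated node: its outgoing mass is dropped in this sketch
            share = (1.0 - restart) * prob[node] / len(neighbors)
            for neighbor in neighbors:
                updated[neighbor] += share
        delta = sum(abs(updated[n] - prob[n]) for n in nodes)
        prob = updated
        if delta < tol:
            break
    return prob


# Hypothetical path graph A - B - C - D, with the compound's target A as seed.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(graph, seeds={"A"})
print({n: round(p, 3) for n, p in scores.items()})
# {'A': 0.578, 'B': 0.311, 'C': 0.089, 'D': 0.022}
```

Proximity decays with distance from the seed, which is the kind of robust neighborhood measure the C5 description refers to.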
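The D1 row mentions a two-sided gene set enrichment analysis against a reference collection of expression profiles. The sketch below is a simplified scheme in the spirit of the original Connectivity Map, not the CC's actual pipeline: an unweighted Kolmogorov-Smirnov-style enrichment score is computed separately for an "up" and a "down" gene set on a ranked gene list, and the two are combined, returning 0 when both sets shift in the same direction. The division by 2 (to keep scores in [-1, 1]) and all gene names are assumptions made for illustration.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str], gene_set: Set[str]) -> float:
    """Unweighted Kolmogorov-Smirnov-style running-sum enrichment score.

    Walks down the ranked list, stepping up at genes in the set and down
    otherwise; returns the signed maximum deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(hits) - n_hits
    if n_hits == 0 or n_misses == 0:
        return 0.0  # undefined case, treated as no enrichment in this sketch
    running = best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best


def two_sided_score(ranked_genes: Sequence[str], up_genes: Set[str], down_genes: Set[str]) -> float:
    """Positive when up genes sit at the top and down genes at the bottom of the list."""
    es_up = enrichment_score(ranked_genes, up_genes)
    es_down = enrichment_score(ranked_genes, down_genes)
    if es_up * es_down > 0:
        return 0.0  # both sets shift the same way: no consistent connection
    return (es_up - es_down) / 2.0


# Hypothetical profile ranked from most up-regulated (g1) to most down-regulated (g10).
ranked = [f"g{i}" for i in range(1, 11)]
print(two_sided_score(ranked, {"g1", "g2"}, {"g9", "g10"}))  # 1.0
print(two_sided_score(ranked, {"g9", "g10"}, {"g1", "g2"}))  # -1.0
```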
Data Availability:
Reference:
- Duran-Frigola, M., Pauls, E., Guitart-Pla, O., Bertoni, M., Alcalde, V., Amat, D., Juan-Blanco, T., and Aloy, P. (2020). Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker. Nature Biotechnology 38, 1087-1096.