Chemical Checker #

The Chemical Checker (CC) provides processed, harmonized and integrated bioactivity data on ~800,000 small molecules, focused mainly on human pharmacology datasets. The CC is organized into five categories (A to E), and each category is further divided into five subcategories, as shown in Table 1.

The CC extends the similarity principle along the drug discovery pipeline, from in vitro assays to clinical observations, by treating bioactivity data within a unified analytical framework. Fewer molecules are available as we advance along the CC levels from A to E: chemical information (A) is always available (778,460 molecules), whereas clinical data (E) are scarce (9,165 molecules, including 4,232 drugs).

Table 1. Categories and subcategories of the Chemical Checker

| Space | Name | Description | Source(s) |
|---|---|---|---|
| A1 | 2D fingerprints | Binary representation of the 2D structure of a molecule. The neighborhood of every atom is encoded using circular topology hashing. | RDKit |
| A2 | 3D fingerprints | Similar to A1, the 3D structures of the three best conformers after energy minimization are hashed into a binary representation without the need for structural alignment. | E3FP |
| A3 | Scaffolds | Largest molecular scaffold (usually a ring system) remaining after applying Murcko's pruning rules. In addition, we keep the corresponding framework, i.e. a version of the scaffold where all atoms are carbons and all bonds are single. The scaffold and the framework are encoded with path-based fingerprints, suitable for capturing substructures in similarity searches. | RDKit |
| A4 | Structural keys | 166 functional groups and substructures widely accepted by medicinal chemists (MACCS keys). | RDKit |
| A5 | Physicochemical parameters | Physicochemical parameters such as molecular weight, logP and refractivity. Number of hydrogen-bond donors and acceptors, rings, etc. Drug-likeness measurements, e.g. number of structural alerts, Lipinski's rule-of-5 violations or chemical beauty (QED). | RDKit and Silico-IT |
| B1 | Mechanisms of action | Drug targets with known pharmacological action and modes (agonist, antagonist, etc.). | DrugBank and ChEMBL |
| B2 | Metabolic genes | Drug-metabolizing enzymes, transporters and carriers. | DrugBank and ChEMBL |
| B3 | Crystals | Small molecules co-crystallized with protein chains. Data are organized on the basis of the structural families of the protein chains. | PDB and ECOD |
| B4 | Binding | Compound-protein binding data available in major public chemogenomics databases. Data come mainly from academic publications and patents. Binding affinities below a class-specific threshold are favored (kinases ≤ 30 nM, GPCRs ≤ 100 nM, nuclear receptors ≤ 100 nM, ion channels ≤ 10 µM and others ≤ 1 µM), and activities at most one order of magnitude higher are kept (capped at 10 µM). | ChEMBL and BindingDB |
| B5 | HTS bioassays | Hits from screening campaigns against protein targets (mainly confirmatory functional assays below 10 µM). | PubChem Bioassays (from ChEMBL) |
| C1 | Small-molecule roles | Ontology terms associated with small molecules that have recognized biological roles, such as known drugs, metabolites and other natural products. | ChEBI |
| C2 | Small-molecule pathways | Curated reconstruction of human metabolism, containing metabolites and reactions. Data are represented as a network where nodes are metabolites and edges connect substrates and products of reactions. | Recon |
| C3 | Signaling pathways | Canonical pathways related to known receptors of compounds (as recorded in B4). Pathways are assigned via a guilt-by-association approach, i.e. a molecule is related to a pathway when at least one of its targets is a member of it. | Reactome |
| C4 | Biological processes | Similar to C3, biological processes from the Gene Ontology are associated with compounds via a guilt-by-association approach from B4 data. All parent terms are kept, from the 'leaves' of the ontology to its 'root'. | Gene Ontology |
| C5 | Interactome | Neighborhoods of B4 targets are collected by inspecting several protein-protein interaction networks. A random-walk algorithm is used to obtain a robust measure of 'proximity' in the network. | STRING, InWeb and Pathway Commons, among others |
| D1 | Gene expression | Transcriptional response of cell lines upon exposure to small molecules. A reference collection of gene expression profiles is used to map all compound profiles using a two-sided gene set enrichment analysis. | L1000 Connectivity Map (Touchstone reference) |
| D2 | Cancer cell lines | Small-molecule sensitivity data (GI50) of a panel of 60 cancer cell lines. | NCI-60 |
| D3 | Chemical genetics | Growth inhibition profiles in a panel of ~300 yeast mutants. Data are combined with yeast genetic interaction data, so that compounds can be assimilated to genetic alterations when they have similar profiles. | MOSAIC |
| D4 | Morphology | Changes in U-2 OS cell morphology measured after compound treatment using a multiplexed-cytological 'cell painting' assay. 812 morphology features are recorded via automated microscopy and image analysis. | LINCS Portal |
| D5 | Cell bioassays | Small-molecule cell bioassays reported in ChEMBL. Mainly, growth and proliferation measurements found in the literature. | ChEMBL |
| E1 | Therapeutic areas | Anatomical Therapeutic Chemical (ATC) classification codes of drugs. All ATC levels are considered. | DrugBank and KEGG |
| E2 | Indications | Indications of approved drugs and drugs in clinical trials. A controlled medical vocabulary is used. | DrugBank and ChEMBL |
| E3 | Side effects | Side effects extracted from drug package inserts via text-mining techniques. | SIDER |
| E4 | Diseases and toxicology | Manually curated relationships between chemicals and diseases. Chemicals include drug molecules and environmental substances, among others. | CTD |
| E5 | Drug-drug interactions | Changes in the effect of a drug when it is co-administered with a second drug. Data are related to pharmacokinetic issues and/or adverse events. | DrugBank |
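
The class-specific affinity rule described for the B4 (Binding) space can be illustrated with a short sketch. This is one possible reading of the table entry, not the CC pipeline's own code: the function name `classify_binding`, the class labels, and the three output labels are assumptions made for illustration. Affinities are taken in nM.

```python
# Class-specific thresholds in nM, copied from the B4 row of Table 1.
# The dictionary keys are hypothetical labels chosen for this sketch.
THRESHOLDS_NM = {
    "kinase": 30.0,
    "gpcr": 100.0,
    "nuclear_receptor": 100.0,
    "ion_channel": 10_000.0,  # 10 µM
}
DEFAULT_THRESHOLD_NM = 1_000.0  # "others": 1 µM
CAP_NM = 10_000.0               # relaxed limit is capped at 10 µM


def classify_binding(affinity_nm, target_class):
    """Label an affinity as 'favored', 'kept' or 'discarded'.

    Assumed interpretation: values at or below the class threshold are
    'favored'; values up to one order of magnitude above it (capped at
    10 µM) are 'kept'; anything weaker is 'discarded'.
    """
    threshold = THRESHOLDS_NM.get(target_class, DEFAULT_THRESHOLD_NM)
    relaxed_limit = min(10.0 * threshold, CAP_NM)
    if affinity_nm <= threshold:
        return "favored"
    if affinity_nm <= relaxed_limit:
        return "kept"
    return "discarded"
```

Under this reading, a 200 nM kinase measurement falls in the relaxed band (30 nM to 300 nM) and is kept, whereas ion channels have no relaxed band at all, because their threshold already equals the 10 µM cap.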

Data Availability: #

Reference: #

  1. Duran-Frigola, M., Pauls, E., Guitart-Pla, O., Bertoni, M., Alcalde, V., Amat, D., Juan-Blanco, T., and Aloy, P. (2020). Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker. Nature Biotechnology 38, 1087-1096.