
ChEMBL

ChEMBL is a manually curated, open-access database of bioactive small molecules with drug-like properties. It aggregates chemical structures, bioactivity measurements, and associated genomic and proteomic target data to support drug discovery and chemical biology research. The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) develops and maintains it. ChEMBL originated from the StARlite system of Inpharmatica Ltd. and was publicly launched in 2009 with funding from the Wellcome Trust. The database's core strength is its high-quality curation. Curators manually extract bioactivity data, such as affinities (e.g., IC50, Ki), functional potencies, and ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties. These data come from peer-reviewed literature, from patents via SureChEMBL, and from direct depositions by consortia such as EUbOPEN and resources such as BindingDB. Data are sourced from approximately 230 scientific journals and public repositories and follow the FAIR (Findable, Accessible, Interoperable, Reusable) principles. They are standardized using ontologies such as the Experimental Factor Ontology (EFO) for diseases and phenotypes. As of the ChEMBL 36 release in October 2025, the database contains approximately 2.8 million distinct compounds and 17,803 targets (primarily proteins, but also cell lines and organisms). It also holds millions of bioactivity data points across more than 830,000 functional and 520,000 binding assays. ChEMBL plays a pivotal role in cheminformatics and computational drug discovery, enabling applications such as quantitative structure-activity relationship (QSAR) modeling and machine learning-based target prediction. It integrates with other resources such as UniProt, PubChem, and the Open Targets platform. Access is available through a web interface, RESTful APIs, downloadable SQL dumps, and RDF formats for semantic querying.
Recent enhancements in ChEMBL 36 include expanded drug and clinical candidate data from FDA and EMA approvals (e.g., biotherapeutics and vaccines), tripled patent-derived assays from BindingDB, and new classifications for pesticides and natural products. These changes reflect its ongoing evolution to support AI-driven research and neglected disease initiatives. Over the past 15 years, ChEMBL has been cited in nearly 1,000 articles, underscoring its influence on therapeutic development.

Overview

Definition and Purpose

ChEMBL is a manually curated, open-access database of bioactive molecules with drug-like properties. It integrates chemical structures, bioactivity data (such as binding affinities and functional outcomes), and genomic information about molecular targets. Its primary purpose is to help translate genomic data into effective new medicines by supporting drug discovery and chemical biology efforts. It aids target validation, compound prioritization, and the elucidation of interactions between small molecules and biological targets. As of the ChEMBL 36 release in October 2025, the database contains over 2.8 million distinct compounds, more than 17,800 targets, and millions of bioactivity measurements. This scale makes it a key chemogenomic resource bridging chemistry and biology.

History and Development

ChEMBL originated as the StARlite database, developed by the biotechnology company Inpharmatica Ltd. in the early 2000s to capture structure-activity relationship data from the medicinal chemistry literature. Galapagos NV acquired Inpharmatica in 2006 and continued developing the resource as a proprietary chemogenomics platform. In July 2008, Galapagos transferred the database to EMBL-EBI under a £4.7 million Strategic Award from the Wellcome Trust, enabling its transition to a publicly accessible resource. EMBL-EBI rebranded the database as ChEMBL and launched it publicly in October 2009. The first release contained over 500,000 compounds, with a focus on curated bioactivity data extracted from peer-reviewed literature. This marked a shift from a commercial tool to an open-access model, broadening its utility for academic and industrial research. EMBL-EBI has maintained the resource since then, supported by core funding from the EMBL member states and the Wellcome Trust. Additional support has come from projects such as the EU Innovative Medicines Initiative (IMI) and Framework 7 programs. Key expansions followed the launch. In 2011, absorption, distribution, metabolism, excretion, and toxicity (ADMET) data were integrated to improve its usefulness for early-stage drug profiling. In 2023, updates added broader data types, such as detailed profiles for clinical candidate drugs, turning ChEMBL into a multifaceted drug discovery platform. The resource marked its 15th anniversary in October 2024, having grown from a literature-focused resource into a comprehensive, FAIR-compliant database used in cheminformatics research worldwide. ChEMBL has grown through regular releases: version 17, released in September 2013, contained over 12 million bioactivity measurements from more than 1 million assays.
By version 35, released in December 2024, the database covered about 17,500 approved drugs and clinical candidate drugs, showing sustained growth in scale and scope. The latest release, ChEMBL 36 (October 2025), added drug and clinical candidate data from FDA and EMA approvals, tripled the number of patent-derived assays from BindingDB, and introduced new classifications for pesticides and natural products.

Data Content and Curation

Sources of Data

ChEMBL obtains most of its data through manual extraction from peer-reviewed literature, focusing on seven core journals: Journal of Medicinal Chemistry, Bioorganic & Medicinal Chemistry Letters, European Journal of Medicinal Chemistry, Bioorganic & Medicinal Chemistry, Journal of Natural Products, ACS Medicinal Chemistry Letters, and MedChemComm. Additional sources include deposited datasets, such as those from PubChem BioAssay and BindingDB. Public screening datasets are also included, among them those from GSK, St. Jude, and others, the Sanger Institute's Genomics of Drug Sensitivity in Cancer, and the MMV Box. Patent data, including BindingDB patent extractions and SureChEMBL, supplement these sources. Clinical candidate information comes from regulatory sources such as the FDA Orange Book and EMA approvals. ChEMBL 36 (October 2025) tripled the patent-derived assays from BindingDB (to 13,847 assays). It also expanded data on biotherapeutics and vaccines from FDA and EMA approvals (up to November 2024) and added new classifications for pesticides and natural products. The database covers a range of data types centered on bioactive molecules. These include chemical structures of small molecules and peptides, along with approved drugs, clinical candidates, and experimental compounds. Bioactivity measurements form the core and cover binding affinities (e.g., IC50, Ki, Kd), functional assays, and ADMET (absorption, distribution, metabolism, excretion, and toxicity) endpoints. Target annotations link these measurements to proteins and genes, drawing on databases such as UniProt and Ensembl. Metadata include assay descriptions and organism contexts (e.g., human, rodent models). ChEMBL's content spans diverse therapeutic areas, with particular emphasis on neglected diseases through dedicated datasets such as those for malaria and cancer drug sensitivity.
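Each ChEMBL activity is linked to an assay whose one-letter assay_type code records the broad category of measurement described above. The minimal sketch below groups activity records by that code. It assumes that records are plain dictionaries carrying an assay_type field, which is our understanding of the web-service activity records. Only the main codes are shown, and the record IDs are made up.

```python
from collections import defaultdict

# Main one-letter assay_type codes used by ChEMBL (subset shown).
ASSAY_TYPE_LABELS = {
    "B": "Binding",
    "F": "Functional",
    "A": "ADMET",
    "T": "Toxicity",
    "P": "Physicochemical",
}


def group_by_assay_type(records):
    """Group activity records (dicts with an 'assay_type' key) by category.

    Codes missing from ASSAY_TYPE_LABELS are collected under "Other".
    """
    groups = defaultdict(list)
    for record in records:
        label = ASSAY_TYPE_LABELS.get(record.get("assay_type"), "Other")
        groups[label].append(record)
    return dict(groups)


# Toy records with hypothetical identifiers, for illustration only.
toy_records = [
    {"activity_id": 1, "assay_type": "B", "standard_type": "Ki"},
    {"activity_id": 2, "assay_type": "F", "standard_type": "EC50"},
    {"activity_id": 3, "assay_type": "A", "standard_type": "CL"},
    {"activity_id": 4, "assay_type": "B", "standard_type": "IC50"},
]
grouped = group_by_assay_type(toy_records)
print({label: len(recs) for label, recs in grouped.items()})
```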
Since its inception in 2009, the database has broadened the kinds of data it holds. It initially prioritized binding data, but after 2011 it began incorporating functional, ADMET, and other data from the literature and deposited sets, reflecting evolving research needs.

Curation Process

Curation in ChEMBL begins with manual extraction of scientific facts from peer-reviewed journal articles. Curators identify and record chemical structures, assay results (e.g., activity values), target mappings, and associated metadata such as experimental conditions and organism details. Recent releases, such as ChEMBL 36 (October 2025), also use text-mining tools such as LeadMine for semi-automated extraction of bioactivity data, while keeping core manual oversight. Curators draw chemical structures as molfiles or SMILES strings and annotate protein targets with accession numbers to ensure traceability.

Automated steps then standardize the activity data. Diverse units (e.g., 133 different concentration formats) are converted to a common scale such as nanomolar (nM). Derived values are also calculated, including the pChEMBL value: the negative base-10 logarithm of a molar half-maximal response, potency, or affinity measurement (e.g., IC50, EC50, Ki, Kd). For example, an IC50 of 10 nM corresponds to a pChEMBL value of 8. These steps aim to maximize comparability, while the originally reported values are kept in dedicated fields for transparency.

Chemical structure standardization is a core component and uses an open-source pipeline built on the RDKit cheminformatics toolkit. The pipeline has three modules:
- a Checker that validates structures against a set of rules, assigning penalty scores from 2 to 7 for issues such as invalid valences or stereochemistry mismatches; a score of 7 prevents loading;
- a Standardizer that applies FDA and IUPAC guidelines, for example normalizing charges, removing explicit hydrogens except in specific cases, and excluding organometallics;
- a GetParent module that strips salts and solvents, using predefined lists of 162 salts and 9 common solvents, to generate parent compounds.
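The unit standardization and pChEMBL calculation described above can be illustrated with a short sketch. The unit table is an illustrative subset we chose, not ChEMBL's actual 133-format mapping. The relation check only approximates ChEMBL's rules, which also limit pChEMBL to particular activity types.

```python
import math

# Multipliers converting common concentration units to nanomolar.
# Illustrative subset only; ChEMBL's own mapping covers many more formats.
UNIT_TO_NM = {
    "M": 1e9,
    "mM": 1e6,
    "uM": 1e3,
    "µM": 1e3,
    "nM": 1.0,
    "pM": 1e-3,
}


def to_nanomolar(value, unit):
    """Convert a concentration to nM, raising ValueError for unknown units."""
    if unit not in UNIT_TO_NM:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * UNIT_TO_NM[unit]


def pchembl_value(value, unit, relation="="):
    """Return -log10(molar value) rounded to two decimals, or None.

    Approximates the pChEMBL idea: only exact ('=') measurements get a value.
    """
    if relation != "=":
        return None
    nanomolar = to_nanomolar(value, unit)
    if nanomolar <= 0:
        raise ValueError("activity value must be positive")
    molar = nanomolar * 1e-9
    return round(-math.log10(molar), 2)


print(pchembl_value(10, "nM"))       # IC50 of 10 nM
print(pchembl_value(2.5, "uM"))      # 2.5 µM
print(pchembl_value(50, "nM", ">"))  # censored value, no pChEMBL
```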
Structures are then converted to SMILES, with isomers aggregated under parent forms and duplicates or errors flagged for manual review. This pipeline has standardized over 2 million compounds across releases, with ongoing additions from literature and deposited datasets.

During curation, each target-assay relationship receives a confidence score on a 0-9 scale that reflects the evidence level and specificity of the mapping. For example, 9 indicates a directly assigned single protein target, 4 indicates multiple homologous protein targets, and 0 indicates a target that is unknown or not yet curated. Curators assign scores manually from assay descriptions and prioritize direct interactions (e.g., biochemical binding assays) over inferred ones (e.g., phenotypic screens). Ambiguous cases are mapped to broader target types, such as protein families, to avoid over-assignment.

Quality control combines automated flagging of inconsistencies with manual validation. Flagged problems include out-of-range values and transcription errors such as 1000-fold discrepancies in measurements. The process aims to keep missing data below 0.1%, and potential issues are annotated in fields such as DATA_VALIDITY_COMMENT. External data, such as the filtered PubChem BioAssay data integrated since 2011, undergo the same standardization. They are also mapped to ChEMBL's controlled vocabularies (e.g., the BioAssay Ontology for assay formats and QUDT for units) to resolve literature ambiguities such as unclear targets or duplicate reports. Periodic releases incorporate these updates. Recent enhancements include semi-automated checks for pharmacokinetic/pharmacodynamic data and chemical probe annotations.

Literature reporting is often inconsistent, with varying assay formats or incomplete structural depictions. ChEMBL addresses this through validation rules and community deposition guidelines that promote the FAIR principles (Findable, Accessible, Interoperable, Reusable). For instance, new datasets such as EUbOPEN chemical probes or screening results are curated to the same standards, and documentation and training resources are available to depositors.
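A common downstream use of these scores is to keep only activities whose target mapping is unambiguous. The sketch below assumes that assay records are dictionaries with a confidence_score field, the column name used in ChEMBL's assays table as we understand it. The threshold of 9 is a typical but not mandatory choice, and the assay IDs are placeholders.

```python
# Labels for the confidence scores mentioned in the text (other scores exist).
CONFIDENCE_LABELS = {
    9: "direct single protein target",
    4: "multiple homologous protein targets",
    0: "target unknown or not yet curated",
}


def filter_by_confidence(assays, min_score=9):
    """Keep assays whose 'confidence_score' is at least min_score."""
    return [a for a in assays if a.get("confidence_score", 0) >= min_score]


def describe(score):
    """Human-readable label for a score, falling back to a pointer to the docs."""
    return CONFIDENCE_LABELS.get(score, "see ChEMBL documentation")


# Placeholder assay records for illustration.
toy_assays = [
    {"assay_chembl_id": "ASSAY_A", "confidence_score": 9},
    {"assay_chembl_id": "ASSAY_B", "confidence_score": 4},
    {"assay_chembl_id": "ASSAY_C", "confidence_score": 0},
]
strict = filter_by_confidence(toy_assays)
print([a["assay_chembl_id"] for a in strict])
```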
This combined manual and automated approach underpins ChEMBL's reliability for downstream applications in drug discovery.

Access and Interfaces

Web Interface and APIs

The ChEMBL web interface provides an interactive platform for querying and exploring bioactive molecules and bioactivity data. Users can search by compound name, chemical structure (including substructure and similarity searches), target (such as protein families or specific genes), assay type, document, cell line, or tissue. Searches use flexible text matching over an encrypted, secure connection. Dedicated browsing sections for approved drugs and clinical candidates can be filtered by development phase, molecule type, or first approval year. Visualization tools include interactive bubble charts summarizing entity counts (e.g., approximately 2.8 million compounds and 17,803 targets), hierarchical trees for classifications such as kinases or proteases, and bar charts of drug distributions by indication or phase. Clicking on these visualizations drills down into related activities, structures, and plots of potency data (e.g., pChEMBL values). This allows quick assessment of compound-target interactions without downloading data.

ChEMBL also offers a RESTful API for programmatic, real-time data retrieval. Key endpoints include:
- /molecule/search for keyword queries, with separate /similarity and /substructure endpoints for structure-based searches;
- /target for targets (e.g., filtering by names containing "kinase");
- /activity for bioactivity records (millions of entries).

The API supports pagination through the limit and offset parameters (default limit: 20) and Django-style filters. For example, pchembl_value__gte=5 keeps activities with a pChEMBL value of at least 5, that is, a potency of 10 µM or better. Results can be returned as JSON, wrapped in a metadata envelope that gives the total count and navigation links. ChEMBL Beaker, a set of cheminformatic utilities for advanced queries, extends these web services.
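The following minimal sketch shows how the filters described above map onto request URLs, using only the Python standard library. The base URL and the .json suffix reflect our understanding of the public service. The pref_name__icontains lookup is an assumption based on the Django-style filter syntax, so check the API documentation before relying on it.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def activity_url(target_chembl_id, min_pchembl=None, limit=20, offset=0):
    """Build a JSON activity query filtered by target and, optionally, potency."""
    params = {"target_chembl_id": target_chembl_id, "limit": limit, "offset": offset}
    if min_pchembl is not None:
        # Django-style lookup meaning pchembl_value >= min_pchembl.
        params["pchembl_value__gte"] = min_pchembl
    return f"{BASE_URL}/activity.json?{urlencode(params)}"


def target_name_search_url(fragment):
    """Build a target query for preferred names containing `fragment` (assumed lookup)."""
    return f"{BASE_URL}/target.json?{urlencode({'pref_name__icontains': fragment})}"


url = activity_url("CHEMBL2111439", min_pchembl=5)
print(url)
print(parse_qs(urlparse(url).query))
```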
Beaker generates similarity maps from SMILES or molfile inputs and matches substructures via SMARTS patterns to highlight fragments or compute maximum common substructures. It also calculates physicochemical properties with RDKit, such as molecular weight and hydrogen bond donor counts. As an example of API usage, the query /activity?target_chembl_id=CHEMBL2111439 retrieves bioactivity data for that target, with standard relations, activity types, and pChEMBL values for each record. The official Python client, chembl_webresource_client, simplifies integration into Python workflows. It supports Django-like filtering (e.g., new_client.activity.filter(target_chembl_id='CHEMBL2111439', pchembl_value__gte=6.0)) and local caching for efficiency. No strict rate limits are enforced, but several practices help with high-volume queries:
- paging through large result sets, or using the bulk downloads for very large ones;
- enabling client-side caching to minimize requests;
- requesting only the fields needed, to reduce payload size;
- setting timeouts (default: 10 seconds).
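The official client handles paging automatically. The sketch below shows the same logic with only the standard library. It assumes, based on our reading of the API, that each JSON page has a page_meta object whose next field holds a link (possibly relative to the host), or null on the last page. The fetch function is injectable so that the paging logic can be tested offline with canned pages instead of live responses.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

HOST = "https://www.ebi.ac.uk"


def fetch_json(url, timeout=10):
    """Fetch one page of results over HTTP and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def iter_records(first_url, key="activities", fetch=fetch_json):
    """Yield records page by page, following page_meta['next'] until it is null."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get(key, [])
        next_link = page.get("page_meta", {}).get("next")
        # 'next' may be a host-relative path; urljoin handles relative and absolute links.
        url = urljoin(HOST, next_link) if next_link else None


# Offline demo: two canned pages standing in for real API responses.
first = HOST + "/chembl/api/data/activity.json?limit=2&offset=0"
second = HOST + "/chembl/api/data/activity.json?limit=2&offset=2"
canned = {
    first: {"activities": [{"activity_id": 1}, {"activity_id": 2}],
            "page_meta": {"next": "/chembl/api/data/activity.json?limit=2&offset=2",
                          "total_count": 3}},
    second: {"activities": [{"activity_id": 3}],
             "page_meta": {"next": None, "total_count": 3}},
}
ids = [r["activity_id"] for r in iter_records(first, fetch=canned.__getitem__)]
print(ids)
```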

Data Downloads

ChEMBL offers bulk downloads of its entire dataset, so users who need comprehensive data can work offline without relying on online queries. The primary formats are relational database dumps for PostgreSQL, MySQL, and SQLite, which contain the full schema with tables for compounds, targets, bioactivities, and related entities. Flat-file exports are also available: SDF files for molecular structures, TSV files for tabular bioactivity and annotation data, and FASTA files for target protein sequences. RDF exports support semantic querying. Release notes describe the changes in each version, and schema documentation describes table relationships and data types to help with local database setup and querying.

New releases with newly curated data are issued periodically; version 36 became available in October 2025 via anonymous FTP from EMBL-EBI servers at ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/. Versioning supports reproducibility, because users can download specific historical releases (e.g., ChEMBL 35) rather than only the latest snapshot. The SQLite dump for recent releases, such as ChEMBL 36, is approximately 10 GB uncompressed, which makes it a convenient single-file option. By comparison, importing the MySQL or PostgreSQL dumps may require up to 35 GB of free disk space.

Besides the full core dataset of approximately 2.8 million distinct compounds and 17,803 targets, specialized subsets are provided for targeted analyses. These include extracts of approved drugs (covering FDA-approved small molecules and biologics) and kinase-focused data from the integrated SARfari project. The open-source tool chembl-downloader automates retrieval of these releases or subsets. It handles decompression, integrity checks, and integration with libraries such as RDKit for structure parsing.
All ChEMBL data are released under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. The license permits commercial and non-commercial use, modification, and redistribution, provided that EMBL-EBI receives proper attribution and derivative works are shared under the same terms. Users are advised to download via the stable FTP mirrors to manage large file transfers efficiently. The SQLite dump is recommended for quick local setups with tools such as DB Browser for SQLite. Entity-relationship diagrams in the documentation help users navigate the schema to query relationships such as compound-to-activity mappings. Typical uses for the downloads include building local databases for training bioactivity prediction models and incorporating ChEMBL data into proprietary cheminformatics pipelines for drug discovery. For ad hoc or smaller-scale needs, the web interface and API provide complementary access without full downloads.
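With the SQLite dump, a compound-to-activity query is a short join across four tables. The sketch assumes the table and column names shown (molecule_dictionary, activities, assays, target_dictionary), which follow the ChEMBL schema as we understand it. So that it runs offline, it builds a tiny in-memory stand-in with hypothetical IDs instead of opening a real release; the file path in the comment is likewise hypothetical.

```python
import sqlite3

# Join activities to compounds and targets (only a small subset of columns is used).
POTENT_QUERY = """
SELECT md.chembl_id, act.standard_type, act.standard_value,
       act.standard_units, act.pchembl_value
FROM activities AS act
JOIN molecule_dictionary AS md ON md.molregno = act.molregno
JOIN assays AS a ON a.assay_id = act.assay_id
JOIN target_dictionary AS td ON td.tid = a.tid
WHERE td.chembl_id = ? AND act.pchembl_value >= ?
ORDER BY act.pchembl_value DESC
"""


def potent_compounds(conn, target_chembl_id, min_pchembl=6.0):
    """Return (compound ID, type, value, units, pChEMBL) rows for one target."""
    return conn.execute(POTENT_QUERY, (target_chembl_id, min_pchembl)).fetchall()


def build_toy_db():
    """Create an in-memory stand-in with hypothetical IDs for offline testing."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE target_dictionary (tid INTEGER PRIMARY KEY, chembl_id TEXT);
        CREATE TABLE assays (assay_id INTEGER PRIMARY KEY, tid INTEGER);
        CREATE TABLE molecule_dictionary (molregno INTEGER PRIMARY KEY, chembl_id TEXT);
        CREATE TABLE activities (activity_id INTEGER PRIMARY KEY, molregno INTEGER,
            assay_id INTEGER, standard_type TEXT, standard_value REAL,
            standard_units TEXT, pchembl_value REAL);
        INSERT INTO target_dictionary VALUES (1, 'TOY_TARGET');
        INSERT INTO assays VALUES (10, 1);
        INSERT INTO molecule_dictionary VALUES (100, 'TOY_CPD_1'), (101, 'TOY_CPD_2');
        INSERT INTO activities VALUES
            (1, 100, 10, 'IC50', 12.0, 'nM', 7.92),
            (2, 101, 10, 'IC50', 5000.0, 'nM', 5.3);
    """)
    return conn


# With a real download you would connect to the extracted database file instead,
# e.g. sqlite3.connect("path/to/chembl_36.db") (path is hypothetical).
rows = potent_compounds(build_toy_db(), "TOY_TARGET", min_pchembl=7.0)
print(rows)
```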

Tools and Integrations

Associated Software Tools

ChEMBL develops and maintains several software tools for accessing, analyzing, and visualizing its bioactivity and chemical data.

- chembl_webresource_client is the official Python library for programmatic access to the ChEMBL RESTful API. Users can query compounds, targets, and activities without writing SQL or handling low-level HTTP requests. The client supports filtering, pagination, and integration into Python workflows, making it essential for automated data retrieval.
- ChEMBL Beaker is a web service providing RDKit-based cheminformatic utilities, such as molecular format conversion, fingerprint calculation (e.g., ECFP and MACCS keys), similarity calculations, and clustering. It was launched in 2015 as part of the updated ChEMBL web services and let users run these computations remotely, including on large datasets, without installing cheminformatics software locally. Beaker was retired in September 2025 because of advances in open-source toolkits, but it significantly influenced early web-based cheminformatics integrations.
- The structure curation pipeline is open source and built on RDKit. It consists of a Checker that validates chemical structures (e.g., flagging valence violations), a Standardizer that formats compounds (e.g., normalizing charges), and a GetParent component that strips salts and solvents. The pipeline ensures consistency during database ingestion and is available for local use in reproducible curation workflows.
- Nodes for visual workflow platforms let users embed ChEMBL queries and analyses in visual programming environments for tasks such as bioactivity retrieval and target prediction. These nodes were developed in collaboration with the platform providers and give drag-and-drop access to API endpoints and cheminformatic functions.
- The community-maintained chembl-downloader package provides reproducible downloads of ChEMBL releases and handles versioning and extraction, so scripts work with consistent data.
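To illustrate what a salt-stripping step such as GetParent does, here is a deliberately simplified, hypothetical sketch that operates on SMILES strings. It is not ChEMBL's implementation. The real component works on RDKit molecules with much larger curated lists. This version only removes exact string matches from a toy list, so differently written SMILES for the same salt would slip through. Returning the input unchanged when every component is listed is this sketch's own choice.

```python
# Hypothetical, deliberately tiny salt/solvent list. The real pipeline uses
# curated lists (162 salts, 9 solvents) and compares molecules, not strings.
SALTS_AND_SOLVENTS = {"Cl", "[Cl-]", "Br", "[Br-]", "[Na+]", "[K+]", "O", "CCO"}


def strip_salts(smiles):
    """Drop listed salt/solvent components from a dot-separated SMILES string.

    If every component is on the list, the input is returned unchanged so
    that the record is not emptied (a choice made for this sketch).
    """
    components = smiles.split(".")
    kept = [c for c in components if c not in SALTS_AND_SOLVENTS]
    return ".".join(kept) if kept else smiles


print(strip_salts("CN1CCC[C@H]1c1cccnc1.Cl"))  # nicotine hydrochloride
print(strip_salts("C1CCCCC1.O"))               # cyclohexane with water
print(strip_salts("[Na+].[Cl-]"))              # all components listed
```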
The SARfari series comprises specialized resources for browsing and exploring specific target classes. Kinase SARfari covers kinase inhibitor data and structure-activity relationships (SAR), and GPCR SARfari covers G protein-coupled receptor ligands. These legacy web-based workbenches, released around 2010, offered interactive interfaces for browsing assays, compounds, and activity landscapes. They also included visualization dashboards for mapping activity cliffs and SAR trends to support qualitative analysis of chemical space. Users are now directed to the main ChEMBL interface for updated data, although the SARfari source code and database dumps remain available for offline use. All major ChEMBL tools are open source and hosted on GitHub under the chembl organization. The repositories include examples for scripting advanced tasks such as batch similarity searches with precomputed fingerprints. Developments after 2015, including enhanced clients, let these tools generate machine learning-ready inputs such as molecular descriptors and embeddings for predictive modeling in drug discovery.

Integrations with Other Databases

ChEMBL maintains extensive cross-references to external databases that add context to its bioactive molecules and targets.

- Protein targets link directly to UniProt entries, which provide detailed protein sequence and functional annotations.
- Compound identifiers are cross-linked to PubChem for additional chemical and property data. Since 2011, filtered PubChem BioAssay datasets have also been integrated into ChEMBL; these comprise confirmatory and panel assays with active outcomes and expand bioactivity coverage.
- Genomic integrations connect ChEMBL targets to Ensembl for gene and variant information relevant to drug targets.
- BindingDB supplies quantitative binding affinity data, including thermodynamic parameters for protein-ligand interactions.
- The RCSB PDB provides structural context, with ChEMBL compounds referenced in the ligand annotations of protein structures.

ChEMBL is also part of ELIXIR, the European infrastructure for life-science data, where it is an ELIXIR Core Data Resource that promotes data standards across Europe's research community. UniChem, a cross-referencing service developed by the ChEMBL team at EMBL-EBI, is central to these connections. It maps ChEMBL compounds to over 200 external sources using standardized identifiers such as InChIKeys. Because these links are structure-based, they do not depend on names or synonyms. ChEMBL IDs give compounds, targets, and assays unique, resolvable identifiers that link to ontologies in resources such as ChEBI for chemical entity classification. ChEMBL also contributes bioactivity data for approved drugs to other resources, and reciprocal updates with ChEBI refine chemical ontologies through shared curation.
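The structure-based linkage that UniChem relies on can be sketched as an index keyed by InChIKey. This is a hypothetical illustration of the principle, not UniChem's code or API. The second source name and its identifiers are invented, while the aspirin entry uses its standard InChIKey and ChEMBL ID.

```python
import re
from collections import defaultdict

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def build_xref_index(records):
    """Map InChIKey -> {source: [identifiers]} from (source, id, inchikey) tuples.

    Records with malformed keys are skipped rather than guessed at.
    """
    index = defaultdict(lambda: defaultdict(list))
    for source, identifier, inchikey in records:
        if INCHIKEY_PATTERN.match(inchikey):
            index[inchikey][source].append(identifier)
    return {key: dict(sources) for key, sources in index.items()}


def connectivity_block(inchikey):
    """First block of the key: a hash of the connectivity layer."""
    return inchikey.split("-")[0]


records = [
    ("chembl", "CHEMBL25", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),   # aspirin
    ("example_db", "EX-001", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),  # hypothetical source
    ("example_db", "EX-002", "not-an-inchikey"),              # malformed, skipped
]
index = build_xref_index(records)
print(index)
```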
These integrations enable federated queries across platforms and reduce data silos in cheminformatics. Starting from a single ChEMBL entry, users can navigate to linked chemical and biological information, which speeds interdisciplinary research.

Applications

In Drug Discovery

ChEMBL plays a pivotal role in target identification and validation during drug discovery. Its comprehensive bioactivity data let researchers probe biological pathways and predict off-target effects. For instance, its curated collection of compound-target interactions supports analysis of selectivity profiles. In kinase inhibitors, off-target binding to related kinases can be assessed early in development to reduce potential toxicities. This data-driven approach combines genomic and phenotypic information to help prioritize viable targets and select tool compounds that validate therapeutic hypotheses.

In compound screening and prioritization, ChEMBL enables virtual screening through similarity searches and scaffold-hopping analyses, helping researchers identify novel hits among its more than 2.8 million distinct compounds. Bioactivity measurements such as IC50 and Ki values help select tool compounds for experimental assays. They also streamline the hit-to-lead process by filtering candidates on potency and structural diversity. These capabilities are especially useful in high-throughput workflows, where ChEMBL data make ligand-based virtual screening more efficient at prioritizing leads with desirable pharmacological profiles.

During lead optimization, ChEMBL's curated ADMET data are used to predict and improve absorption, distribution, metabolism, excretion, and toxicity properties. Researchers use this information to refine molecular structures while balancing potency against ADMET properties. For example, models trained on ChEMBL's approved and withdrawn drug data can forecast liabilities in a lead series. This supports iterative design cycles and lowers the risk of late-stage failures by providing quantitative insight into ADMET behavior.

ChEMBL also supports neglected disease research through the ChEMBL-NTD repository. This repository curates screening data for targets such as kinases and makes open-access hit identification possible for resource-limited research.
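Similarity searching of this kind typically ranks compounds by the Tanimoto coefficient between binary fingerprints. The minimal sketch below represents each fingerprint as a set of "on" bit positions. The bit sets and compound names are toys; in practice, fingerprints (for example, ECFP-like Morgan fingerprints) would be computed with a toolkit such as RDKit.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query_bits, library, threshold=0.0):
    """Return (compound_id, score) pairs at or above threshold, best first."""
    scored = ((cid, tanimoto(query_bits, bits)) for cid, bits in library.items())
    hits = [(cid, score) for cid, score in scored if score >= threshold]
    # Sort by descending score, breaking ties alphabetically by compound ID.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Toy on-bit sets standing in for real fingerprints.
query = {1, 2, 3, 4}
library = {"cpd_A": {1, 2, 3, 4}, "cpd_B": {1, 2, 5, 6}, "cpd_C": {7, 8}}
hits = rank_by_similarity(query, library, threshold=0.3)
print(hits)
```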
ChEMBL also tracks clinical candidates. Its latest release (ChEMBL 36, October 2025) covers over 17,500 approved drugs and clinical candidates, drawn from regulatory sources such as the EMA and FDA. These data are used to monitor development progress and inform repurposing efforts. ChEMBL's impact is reflected in nearly 1,000 citations related to hit-to-lead work and in its foundational role in AI-driven discovery, such as activity prediction models that accelerate therapeutic development.

In Research and Education

ChEMBL plays a pivotal role in academic research by providing a comprehensive collection of bioactivity data for advanced cheminformatics studies, such as training machine learning models to predict molecular interactions. For instance, researchers use ChEMBL datasets with molecular docking and predictive modeling to explore large chemical libraries and identify novel bioactive compounds. ChEMBL also supports polypharmacology research by enabling analysis of multi-target interactions. One example is the Polypharmacology Browser, which predicts off-target effects from ChEMBL's curated bioactivity annotations.

In education, ChEMBL is a key resource through EMBL-EBI's tutorials, webinars, and online courses, which teach students to query bioactivity data and apply it to drug discovery problems. These materials include practical sessions on API access and data visualization and are built into curricula to illustrate real-world cheminformatics. For example, workshops show how to retrieve and analyze compound-target relationships, giving students hands-on experience with drug-like molecules.

ChEMBL embodies open science principles by following the FAIR standards (findable, accessible, interoperable, and reusable), which promote reproducible research across disciplines. Because it is open access, ChEMBL can be integrated into benchmarking tools for activity prediction algorithms, where standardized datasets allow consistent comparisons of model performance. The database also accepts community contributions, and over 290 user-submitted depositions have extended its coverage of diverse bioactivities. ChEMBL further supports work on global challenges such as antimicrobial resistance by providing large-scale preclinical data on antibacterial compounds for analyses of metabolic networks.
Over its 15-year history, ChEMBL has fostered international collaborations, particularly in neglected diseases, by supplying data to initiatives focused on underserved therapeutic needs. Its influence also extends to discussions of equitable data access and bias mitigation in predictive models.

References

  1. [1]
    ChEMBL - EMBL-EBI
    ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic dataDatabase Schema · 2496335 Distinct Compounds · 16003 Targets
  2. [2]
    Fifteen years of ChEMBL and its role in cheminformatics and drug ...
    Mar 10, 2025 · ChEMBL has solidified its role as a pioneering database of highly curated and structured bioactivity data in the fields of cheminformatics and drug discovery.
  3. [3]
    The ChEMBL Database in 2023: a drug discovery platform spanning ...
    Nov 2, 2023 · The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods
  4. [4]
    ChEMBL 36 is live
    ### Summary of ChEMBL 36 Release
  5. [5]
    ChEMBL 36 is out!
    Sep 22, 2025 · Includes phenotype/disease context, mapped to EFO ontology. Stored in assay_parameters . Updated Data Sources. AI-driven Structure-enabled ...
  6. [6]
    What is ChEMBL? - EMBL-EBI
    ChEMBL is a 'chemogenomic' database that brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective ...Missing: definition | Show results with:definition
  7. [7]
    ChEMBL Database in 2023: a drug discovery platform spanning ...
    Nov 2, 2023 · Number of documents . Number of assays . Number of bioactivities . J Med Chem, 24 505, 569 146, 2 848 595. Bioorg Med Chem Lett, 23 763, 291 472 ...
  8. [8]
    ChEMBL 36 is live | EMBL-EBI
    Oct 15, 2025 · The new ChEMBL 36 release is now live. ChEMBL is a manually curated database of bioactive molecules with drug-like properties.
  9. [9]
    Open access drug discovery database launches with half a million ...
    Jan 18, 2010 · It was transferred from biotech firm Galapagos NV in July 2008 through a £4.7 million Strategic Award from the Wellcome Trust. ChEMBLdb is a ...
  10. [10]
    ChEMBL 34 is out!
    Apr 16, 2024 · Work contributing to ChEMBL34 was funded by the Wellcome Trust, EMBL ... EU Innovative Medicines Initiative (IMI) and EU Framework 7 programmes.
  11. [11]
    Acknowledgements | ChEMBL Interface Documentation - GitBook
    Sep 24, 2024 · The ChEMBL resources are made available due to funding from the following: Current Awards, Ongoing: The Member States of the European Molecular Biology ...Missing: projects | Show results with:projects
  12. [12]
    ChEMBL: a large-scale bioactivity database for drug discovery - PMC
    Sep 23, 2011 · ChEMBL is an Open Data database containing binding, functional and ADMET information for a large number of drug-like bioactive compounds.
  13. [13]
    Full article: The Chembl Database: A Taster for Medicinal Chemists
    Mar 17, 2014 · The ChEMBL database is updated on a regular basis and, as of February 2014, the current version (version 17) contains more than 12 million assay ...
  14. [14]
    Drug and Clinical Candidate Drug Data in ChEMBL
    In addition, there are around 2.5 million compounds with experimental bioactivity data (termed “research” compounds for the purposes of this paper).
  15. [15]
    Document and Data Source Questions
    Jul 13, 2023 · ChEMBL consists of data from a wide variety of data sources including scientific literature and patents, deposited data sets, PubChem BioAssay and BindingDB ...
  16. [16]
    Assay and Activity Questions | ChEMBL Interface Documentation
    Apr 2, 2024 · Binding (B) - Data measuring binding of compound to a molecular target, e.g. Ki, IC50, Kd. Functional (F) - Data measuring the biological effect ...
  17. [17]
    How is ChEMBL data curated? - EMBL-EBI
    ChEMBL data is curated by extracting scientific facts from articles, adding them in a structured format, and then performing additional curation and annotation.
  18. [18]
    Activity, assay and target data curation and quality in the ChEMBL ...
    Jul 23, 2015 · This curation process involves several manual and automated steps and aims to: (1) maximise data accessibility and comparability; (2) improve ...
  19. [19]
    An open source chemical structure curation pipeline using RDKit
    Sep 1, 2020 · The pipeline has three components: a Checker to validate structures, a Standardizer to format compounds, and a GetParent to remove salts/ ...
  20. [20]
    Introduction | ChEMBL Data Deposition Guide - GitBook
    Sep 11, 2025 · It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.Missing: curation | Show results with:curation
  21. [21]
    Searching using the web interface | ChEMBL - EMBL-EBI
    The ChEMBL web interface provides a flexible and easy way to access ChEMBL's core bioactivity data. Searching is via an encrypted and secure protocol.Missing: visualization features
  22. [22]
    Visualise ChEMBL - EMBL-EBI
    A visual overview of ChEMBL and a starting point for exploring the database. Explore ChEMBL Description: Shows a summary of the ChEMBL entities and quantities ...
  23. [23]
    ChEMBL Interface Documentation: Web Interface
    Mar 26, 2024 · You can see the related activities of compound, target, document, assay, cell line, or tissue by clicking on the visualisations.
  24. [24]
    ChEMBL Data Web Services
    Oct 6, 2025 · ChEMBL includes both mechanism of action information for approved drugs and pharmacology data from published assays.
  25. [25]
  26. [26]
    Cheminformatic Utils Web Services
    Mar 22, 2023 · ChEMBL Chemoinformatic Utils (aka 'ChEMBL Beaker') is a set of useful utility web service tools that provide RESTful access to commonly used cheminformatic ...
  27. [27]
    Official Python client for accessing ChEMBL API - GitHub
    The only official Python client library developed and supported by ChEMBL group. The library helps accessing ChEMBL data and cheminformatics tools from Python.
  28. [28]
    Downloads - ChEMBL Interface Documentation - GitBook
    Sep 19, 2025 · ChEMBL Database Release DOIs: CHEMBL36 (July 2025), 10.6019/CHEMBL.database.36; CHEMBL35 (December 2024), 10.6019/CHEMBL.database.35; CHEMBL34 ...
  29. [29]
    ChEMBL - EMBL-EBI
    ChEMBL is part of the ELIXIR infrastructure. ChEMBL is an ELIXIR Core Data Resource ...
  30. [30]
    Download Questions - ChEMBL Interface Documentation - GitBook
    Nov 8, 2023 · You will need around 35GB of free space available to import the ChEMBL dump files. What's the difference between the SDFile and the MySQL/ ...
  31. [31]
    cthoyt/chembl-downloader: Write reproducible code for ... - GitHub
    Don't worry about downloading/extracting ChEMBL or versioning - just use chembl_downloader to write code that knows how to download it and use it automatically.
  32. [32]
    General Questions - ChEMBL Interface Documentation - GitBook
    Sep 25, 2025 · ChEMBL is a database of bioactivity data for drug-like compounds and includes data from seven core medicinal chemistry journals, patents and ...
  33. [33]
    Accessing ChEMBL data - EMBL-EBI
    Via our Web Services · Downloading the whole database in a number of different formats. The following pages provide more information on these methods of access.
  34. [34]
    chembl/chembl_beaker: RDKit wrapper - GitHub
    This is chembl_beaker package developed at ChEMBL group, EMBL-EBI, Cambridge, UK. This is wrapper for RDKit which exposes following methods.
  35. [35]
    Saying Goodbye to ChEMBL Beaker
    Aug 14, 2025 · The goal was simple: make it easy for anyone to perform cheminformatics operations without having to wrestle with complex build chains, platform ...
  36. [36]
    ChEMBL Beaker: A Lightweight Web Framework Providing Robust ...
    ChEMBL Beaker is an open source web framework, exposing a versatile chemistry-focused API (Application Programming Interface) to support the development of ...
  37. [37]
    ChEMBL database structure pipelines - GitHub
    ChEMBL structure pipeline. ChEMBL protocols used to standardise and salt strip molecules. First used in ChEMBL 26.
  38. [38]
    [PDF] ChEMBL resources and KNIME
    ChEMBL provides data for drug discovery. KNIME allows access to ChEMBL via nodes, and democratizes access to data and tools.
  39. [39]
    Legacy Resources | ChEMBL Interface Documentation - GitBook
    Apr 1, 2019 · ADME SARfari. We recommend using the more recent bioactivity and ADME assay data including pharmacokinetics which is available via the main ...
  40. [40]
    GPCR SARfari Released - The ChEMBL-og
    Sep 24, 2010 · We are pleased to announce the release of GPCR SARfari. GPCR SARfari is a web based workbench focused on Class A G Protein-Coupled Receptors (Rhodopsin-like).
  41. [41]
    Chemical Biology Services @ EMBL-EBI - GitHub
    Chemical Biology Services @ EMBL-EBI has 75 repositories available. Follow their code on GitHub.
  42. [42]
    Beaker now officially part of ChEMBL web services
    Mar 12, 2015 · Beaker - what's this? It's a small utility, that makes chemistry software available securely over https. You no longer need to install a ...
  43. [43]
    ChEMBL web services: streamlining access to drug discovery data ...
    In this publication we describe an updated set of web services, which expand access to ChEMBL data and also include a number of new features, such as advanced ...
  44. [44]
    ChEMBL | Cross-referenced databases - UniProt
    ChEMBL is a database of bioactive drug-like small molecules. Its server is at www.ebi.ac.uk/chembl.
  45. [45]
    UniChem: a unified chemical structure cross-referencing and ...
    Jan 14, 2013 · UniChem is a freely available compound identifier mapping service on the internet, designed to optimize the efficiency with which structure-based hyperlinks ...
  46. [46]
    Data From External Resources Integrated Into RCSB PDB
    Mar 24, 2025 · ChEBI, Chemical entities of biological interest. ChEMBL, Manually curated database of bioactive molecules with drug-like properties. CSD ...
  47. [47]
  48. [48]
    Using ChEMBL for target identification and prioritisation
    Dec 5, 2019 · It contains more than 15 million bioactivity data points for ~1.9 million compounds, including compound interaction data against ~8,000 protein ...
  49. [49]
    A better way to find drug targets - EMBL-EBI
    Mar 31, 2017 · The ChEMBL team at EMBL-EBI collaborated on a new way to identify drug targets from Genome Wide Association Studies.
  50. [50]
    prioritization of compound data sets for scaffold hopping analysis in ...
    Sep 13, 2012 · Systematic assessment of scaffold distances in ChEMBL: prioritization of compound data sets for scaffold hopping analysis in virtual screening.
  51. [51]
    Pros and cons of virtual screening based on public “Big Data”
    Mar 1, 2019 · The nearly two million compounds in ChEMBL are often associated with reliable IC50/Ki measures, but these concern a plethora of different ...
  52. [52]
    predicting ADMET improvements of molecular derivatives with deep ...
    Oct 27, 2023 · Drug design requires a balancing act between optimizing the on-target potency of a drug lead and maintaining an appropriate absorption, ...
  53. [53]
    ChEMBL-NTD - GitBook
    Jun 11, 2024 · ChEMBL-NTD is a repository for Open Access primary screening and medicinal chemistry data directed at neglected tropical diseases.
  54. [54]
    Open Source Drug Discovery with the Malaria Box Compound ...
    Jul 28, 2016 · Preclinical development for drugs in neglected diseases remains a slow process due to a lack of access to compounds, and legal complications ...
  55. [55]
    Exploring the Chemical Space of CYP17A1 Inhibitors Using ... - MDPI
    Feb 9, 2023 · In this study, we performed systematic cheminformatic analyses and machine learning modeling techniques to explore the chemical space of CYP17A1 ...
  56. [56]
    Identifying Differences in the Performance of Machine Learning ...
    Jul 13, 2023 · Since data sets extracted from ChEMBL are often used for model training in research, (34−36) this can be an issue when applied these models ...
  57. [57]
    Rapid traversal of vast chemical space using machine learning ...
    Mar 13, 2025 · Here we explore a strategy that combines machine learning and molecular docking to enable rapid virtual screening of databases containing billions of compounds.
  58. [58]
    The polypharmacology browser: a web-based multi-fingerprint target ...
    Feb 21, 2017 · PPB searches through 4613 groups of at least 10 same target annotated bioactive molecules from ChEMBL and returns a list of predicted targets ...
  59. [59]
    Structural and Functional View of Polypharmacology - Nature
    Aug 31, 2017 · We compiled a dataset of drug-targets and drug-off targets by querying ChEMBL for approved multi-target drugs and the human proteins to which ...
  60. [60]
    A guide to exploring drug-like compounds and their ... - EMBL-EBI
    ChEMBL is a widely-used database of bioactivity data that links drug-like compounds to their biological targets and has applications in drug discovery.
  61. [61]
    ChEMBL | EMBL-EBI Training
    By the end of the course you will be able to: Describe what ChEMBL is and how it can help you to understand the interactions between drugs or drug-like ...
  62. [62]
    How to use the ChEMBL database | Online drug discovery course
    Dec 14, 2021 · In this video, a practical tutorial on how to use the ChEMBL database for retrieving bioactivity data is shown.
  63. [63]
    ChEMBL workshop by Anna Gaulton - YouTube
    Nov 23, 2020 · Part of the RSC Open Chemical Sciences workshop series (https://www.rsc.org/events/detail/42090/open-chemical-science). ChEMBL is a manually ...
  64. [64]
    Drug and Clinical Candidate Drug Data in ChEMBL - PubMed
    Sep 19, 2025 · ChEMBL 35 contains 17,500 approved drugs, and drugs that are progressing through the clinical development pipeline. Drug curation has formed an ...
  65. [65]
    Benchmarking compound activity prediction for real-world drug ...
    Jun 4, 2024 · CARA was curated mainly based on the ChEMBL database, which contained large-scale activity data of small molecule compounds against the ...
  66. [66]
    Large-scale comparison of machine learning methods for drug ... - NIH
    Jun 6, 2018 · The ChEMBL benchmark dataset which we created and used to compare various target prediction methods consists of 456 331 compounds. Chemical ...
  67. [67]
    Machine Learning Study of Metabolic Networks vs ChEMBL Data of ...
    Jun 7, 2022 · In this study, we employed the IFPTML = Information Fusion (IF) + Perturbation Theory (PT) + Machine Learning (ML) algorithm on a huge dataset from the ChEMBL ...
  68. [68]
    Machine Learning Study of Metabolic Networks vs ChEMBL Data of ...
    In this study, we employed the IFPTML = Information Fusion (IF) + Perturbation Theory (PT) + Machine Learning (ML) algorithm on a huge dataset from the ChEMBL ...
  69. [69]
    Machine learning, artificial intelligence, and data science breaking ...
    Jan 25, 2021 · Machine learning, artificial intelligence, and data science breaking into drug design and neglected diseases ... ChEMBL: a large-scale ...