
European Bioinformatics Institute

The European Bioinformatics Institute (EMBL-EBI) is an intergovernmental outstation of the European Molecular Biology Laboratory (EMBL). EMBL is a pan-European research organization founded in 1974 that now has 29 member states, and it is dedicated to advancing molecular biology through open data infrastructure and computational tools. EMBL-EBI was created following a 1992 decision of the EMBL Council. It is located on the Wellcome Genome Campus in Hinxton, United Kingdom, where its facilities include the new Thornton Building. It is the world's leading provider of public biomolecular data, supporting global life sciences research by curating, disseminating, and analyzing large datasets from genomics, proteomics, and structural biology. Its core mission is to promote scientific discovery by maintaining freely accessible databases and developing bioinformatics resources for research ranging from basic biology to clinical applications such as drug discovery and genomic medicine. Key services include:

- the European Nucleotide Archive (ENA) for raw sequencing data;
- the European Genome-phenome Archive (EGA) for controlled-access genomic datasets;
- UniProt for protein sequence and function information;
- Ensembl for genome annotation;
- the Protein Data Bank in Europe (PDBe) for biomolecular structures;
- other resources such as ChEMBL for chemical biology and the BioImage Archive for microscopy data.

In 2023 these resources received over 15 petabytes of data deposits and served 4.8 million unique users worldwide. EMBL-EBI also hosts collaborative resources such as the AlphaFold Protein Structure Database, built with Google DeepMind, whose AlphaFold developers shared the 2024 Nobel Prize in Chemistry for protein structure prediction.
With over 850 staff from more than 70 nationalities, EMBL-EBI conducts basic research in computational biology. It offers extensive training programs, including 21 in-person courses and 37 webinars, and its online training resources reached over 600,000 unique users in 2023. It also partners with industry and with initiatives such as ELIXIR to ensure long-term data preservation and interoperability across Europe. The institute is funded by EMBL member states and by external funders including the European Commission, Wellcome, UK Research and Innovation (UKRI), and the US National Institutes of Health (NIH). It follows open-data principles, making its tools, software, and datasets available without restriction to foster innovation in areas such as pathogens and rare diseases.

History

Establishment

The European Bioinformatics Institute (EMBL-EBI) was established on 1 September 1994 in Hinxton, Cambridgeshire, United Kingdom, as the third outstation of the European Molecular Biology Laboratory (EMBL). Its founding followed a 1992 decision by the EMBL Council to create a dedicated outstation for bioinformatics services, which moved core operations away from EMBL's headquarters in Heidelberg, Germany. The relocation involved transferring two resources:

- the EMBL Data Library, Europe's primary repository for nucleotide sequences;
- the SWISS-PROT protein database, which EMBL had maintained jointly with its developers in Geneva.

EMBL-EBI's initial objectives centered on centralizing bioinformatics infrastructure across Europe to support the collection, curation, and public dissemination of biological data, thereby enabling collaborative research in the life sciences. From the outset the institute was led by Graham Cameron, who had previously headed EMBL's Data Library in Heidelberg. As an integral part of EMBL, an intergovernmental organization, EMBL-EBI received core funding through contributions from EMBL's member states, which numbered 15 by the mid-1990s.

Key Milestones

From the mid-1990s, EMBL-EBI expanded its core data resources, starting with the EMBL nucleotide sequence data bank hosted at Hinxton from 1994. This data bank was a predecessor of the modern European Nucleotide Archive (ENA) and served as a central repository for nucleotide sequences. At the same time, EMBL-EBI integrated the SWISS-PROT protein knowledgebase, which initially held around 38,000 entries. Through later mergers and updates, SWISS-PROT became the foundation for UniProt.

The turn of the millennium brought substantial growth. In 1999 EMBL-EBI launched the Ensembl genome annotation project, a joint effort with the Wellcome Trust Sanger Institute (now the Wellcome Sanger Institute). Ensembl provided automated annotation of the human genome and other species ahead of the Human Genome Project's completion. The institute's facilities were consolidated on the Wellcome Genome Campus in Hinxton, and expansions had enhanced its computational infrastructure by the early 2000s. Staff numbers grew rapidly during this period, surpassing 200, to meet the increasing demands of genomic data management.

During the 2010s, EMBL-EBI advanced its technological capabilities, notably by adopting cloud solutions for computing and storage. One example is the R Cloud service, launched in 2010 to enable remote access to large datasets. In 2014, the institute celebrated its 20th anniversary with events emphasizing the role of bioinformatics in biological research and highlighting the exponential growth in sequence and structural information.

The 2020s brought transformative developments. EMBL-EBI responded rapidly to the COVID-19 pandemic by launching the COVID-19 Data Portal in April 2020, which aggregated genomic, proteomic, and clinical data to accelerate global research. In July 2021, in collaboration with DeepMind, EMBL-EBI released the AlphaFold Protein Structure Database, which provided predicted 3D structures for over 350,000 proteins from key organisms and enabled breakthroughs in structural biology. By 2024, the database had expanded to over 214 million predicted protein structures.
In October 2025, EMBL-EBI and Google DeepMind renewed their partnership and released a major update to the AlphaFold Database to better align it with UniProt. These developments fit within EMBL's 2022–2026 "Molecules to Ecosystems" programme, which integrates molecular data with environmental and ecosystem-level analyses to address complex biological challenges. The scale of EMBL-EBI's resources is shown by their 2024 usage: an average of 123 million web requests per day, from 42 million unique IP addresses over the year. These services support over 100,000 scientific publications each year across the life sciences.

Organization and Governance

Structure within EMBL

The European Bioinformatics Institute (EMBL-EBI) is one of six sites of the European Molecular Biology Laboratory (EMBL), an intergovernmental organization established in 1974 and headquartered in Heidelberg, Germany. Besides Heidelberg, EMBL has sites in Barcelona (Spain), Grenoble (France), Hamburg (Germany), Hinxton (United Kingdom, home to EMBL-EBI), and Rome (Italy). Each site focuses on distinct aspects of research, services, and training. As of 2025, EMBL has 29 full member states, comprising its founding members and later accessions.

EMBL-EBI reports directly to EMBL's Director General, who is appointed by and accountable to the EMBL Council. The Council is EMBL's primary governing body and is composed of representatives of all member states. It meets twice a year to oversee strategic direction, financial compliance, and programme approval, ensuring alignment across all sites. A Scientific Advisory Committee of independent experts also advises EMBL on its scientific programmes and priorities, including bioinformatics infrastructure at EMBL-EBI.

In 2024, EMBL-EBI's total operating expenditure was approximately €105.5 million, funded as follows:

- EMBL contributions: 42%
- external grants: 29%
- capital awards: 23%
- commercial collaborations: 6%

Key external funders include the European Commission through Horizon Europe, Wellcome, UK Research and Innovation (UKRI), and the US National Institutes of Health (NIH), which support specific projects and infrastructure. As a non-profit entity within the EMBL framework, EMBL-EBI follows an open-access policy. Its data resources are freely available under permissive licenses, such as CC0 where applicable, so the global research community can reuse them without restriction.
This model emphasizes data resilience through robust archiving and adherence to the FAIR principles, under which data should be Findable, Accessible, Interoperable, and Reusable, to facilitate discovery and collaboration in the life sciences. EMBL-EBI also serves as the hub for ELIXIR, Europe's distributed infrastructure for life science data, which coordinates activities across 21 member countries and their national nodes to build sustainable capabilities. In this role, EMBL-EBI fosters collaboration with national organizations, such as the Dutch Techcentre for Life Sciences in the Netherlands, to integrate and standardize bioinformatics tools and datasets across the continent.

Leadership and Staff

Since March 2025, the European Bioinformatics Institute (EMBL-EBI) has been led by an Interim Director, following Ewan Birney's move to an executive role at the wider European Molecular Biology Laboratory (EMBL). Birney directed EMBL-EBI from 2015 to 2025 and oversaw strategic advances such as the integration of large-scale genomic data resources. Associate Directors support specialized functions such as services and operations. For example, Cath Brooksbank is Head of Training; her programs reach tens of thousands of researchers each year through workshops and online resources on bioinformatics skills. Rolf Apweiler served as Joint Director from 2015 to 2024 and then as Associate Director until his retirement in October 2025, contributing to the development of protein databases such as UniProt.

As of 2025, EMBL-EBI employs approximately 700 full-time equivalents, including bioinformaticians, data curators, software engineers, and early-career researchers such as students and postdocs. The workforce draws from over 60 countries, which fosters interdisciplinary collaboration in data management. Recruitment emphasizes expertise across the biological and computational sciences. Dedicated programs for early-career researchers include fellowships and training pathways, with no restrictions on nationality, gender, or age. Under its current leadership, EMBL-EBI has expanded the use of AI in data curation, using large language models to streamline annotation while maintaining accuracy, as described in 2024 institutional reports and ongoing pilots.

Facilities and Location

Hinxton Campus

The European Bioinformatics Institute (EMBL-EBI) is situated on the Wellcome Genome Campus in the village of Hinxton, Cambridgeshire, England, approximately 10 miles (16 km) south of Cambridge. The 125-acre site, shared with the Wellcome Sanger Institute and other genomics organizations, is a dedicated hub for bioinformatics research and collaboration. EMBL-EBI was established on the campus in September 1994, following the relocation of key bioinformatics services, a process that began in 1992. The site was designed to capitalize on the growing field of genomics. It moved into permanent buildings around 2000, enabling expanded operations and interdisciplinary partnerships. By 2005, further expansions had met the space needs of a growing workforce, replacing the initial temporary accommodation and supporting over 400 researchers.

Campus amenities include state-of-the-art laboratories, secure data centers for managing large biological datasets, and extensive green spaces that promote well-being and informal collaboration. The campus's proximity to Cambridge supports strong ties with academic institutions and knowledge exchange in the life sciences. Sustainability efforts are prominent. The Thornton Building achieved a BREEAM Excellent certification (a score of 76% as of 2024) through energy-efficient design, including optimized systems that reduce the environmental impact of petabyte-scale data storage. In March 2025, the Thornton Building opened as EMBL-EBI's third permanent building, providing space for collaborative research on topics such as infectious diseases. The Hinxton site builds community through public events, guided tours, and educational programs. It also connects with the local ecosystem through initiatives such as the campus wetlands, which are used for environmental monitoring. These activities align with EMBL's broader programmes in environmental research and public engagement.

Infrastructure

The European Bioinformatics Institute (EMBL-EBI) relies on a robust technological infrastructure to manage and disseminate large biological datasets.

Storage and cloud. Its storage systems exceed 300 petabytes of raw capacity across SSD, disk, and other storage tiers, and hold approximately 25 billion files and objects. For resilience and scalability, EMBL-EBI uses a hybrid cloud model that combines public providers such as AWS and Google Cloud with private cloud platforms. Data are replicated across three geographically separate locations for redundancy and automatic failover. In 2024 this setup handled 3.5 billion requests per month from an average of 5.6 million unique IP addresses.

Computing and AI. Computing resources include high-performance computing (HPC) clusters optimized for large-scale data analysis, particularly artificial intelligence (AI) and machine learning (ML) applications. These clusters support tasks such as genomic sequence analysis. Recently integrated large language models (LLMs) help automate annotation and curation workflows, accelerating database updates. The open-source BioChatter framework is one example, enabling customizable LLM applications for biomedical research.

Software and access. Open-source tools support data submission and retrieval, including the Job Dispatcher for running analysis tools and DBfetch for entry retrieval. RESTful APIs let developers integrate EMBL-EBI resources such as Ensembl and UniProt into external pipelines, subject only to data owners' policies.

Security and preservation. Security measures comply with the EU General Data Protection Regulation (GDPR), and data policies follow the FAIR principles, promoting unrestricted access under machine-readable licenses. Long-term preservation relies on geographically dispersed backups in public clouds and on collaboration with international consortia such as the International Nucleotide Sequence Database Collaboration (INSDC).
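As a minimal illustration of this kind of programmatic access, the Python sketch below builds a DBfetch retrieval URL without sending any request. The endpoint and the `db`, `id`, `format`, and `style` query parameters reflect our understanding of DBfetch's public interface. Treat them as assumptions and check them against EMBL-EBI's current documentation.

```python
from urllib.parse import urlencode

# Assumed DBfetch endpoint; confirm against current EMBL-EBI documentation.
DBFETCH_BASE = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def dbfetch_url(db: str, entry_ids: list[str],
                fmt: str = "fasta", style: str = "raw") -> str:
    """Build (but do not fetch) a DBfetch URL for entries from one database."""
    if not entry_ids:
        raise ValueError("at least one entry identifier is required")
    params = {
        "db": db,                   # e.g. "uniprotkb" (assumed database name)
        "id": ",".join(entry_ids),  # several IDs sent as a comma-separated list
        "format": fmt,              # requested entry format
        "style": style,             # "raw" asks for plain text instead of HTML
    }
    # Keep commas readable in the query string instead of encoding them as %2C.
    return f"{DBFETCH_BASE}?{urlencode(params, safe=',')}"


print(dbfetch_url("uniprotkb", ["P05067"]))
# https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=uniprotkb&id=P05067&format=fasta&style=raw
```

Passing the URL to `urllib.request.urlopen` would perform the actual retrieval. Keeping URL construction separate makes requests easy to log and test.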
Infrastructure upgrades in 2024 and 2025 have emphasized AI-driven capabilities. These include expanded cloud integrations to handle surging demand from resources such as the AlphaFold Database, which now hosts over 200 million predictions and serves millions of users worldwide. The enhancements are part of a renewed partnership with Google DeepMind and add multiple sequence alignments and isoform support, improving predictive accuracy and usability as traffic grows.

Bioinformatics Databases

Ensembl

Ensembl is a bioinformatics resource developed by the European Bioinformatics Institute (EMBL-EBI) in collaboration with the Wellcome Trust Sanger Institute. It was launched in 1999, during the Human Genome Project era, to provide automated annotation and analysis of large-scale genomic data. The project initially focused on vertebrate genomes, and its public website debuted in July 2000 to disseminate draft annotations ahead of formal publication. Ensembl has since evolved into a comprehensive platform that integrates sequence data, gene models, and comparative analyses across many species.

Ensembl's core functions are:

- automated gene annotation;
- identification of regulatory features such as promoters and enhancers;
- variant effect prediction, which assesses the impact of genetic variation on genomic elements.

Central to these capabilities is the Ensembl Variant Effect Predictor (VEP). VEP annotates variants, including single-nucleotide polymorphisms (SNPs), insertions, deletions, and structural variants, by predicting their consequences for transcripts, proteins, and regulatory regions. It also incorporates functional impact scores such as SIFT and PolyPhen. These features let researchers explore genome architecture and functional elements without relying solely on manual curation.

Ensembl integrates diverse data sources. It links genomic sequences from the European Nucleotide Archive (ENA) with protein annotations from UniProt, providing context for gene function and evolutionary relationships. It supports comparative genomics by aligning sequences across species to highlight conserved regions and orthologs, particularly among vertebrates, and extends to broader eukaryotic and prokaryotic datasets through Ensembl Genomes. This interconnected framework lets users trace evolutionary changes and identify disease-associated variants in a multi-species context.
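The real VEP works from transcript models and handles many more cases, such as splice regions, UTRs, frameshifts, start codons, and regulatory features. Its core idea for a coding single-nucleotide variant can still be sketched compactly: translate the reference and alternate codons and compare the resulting amino acids. The Python function below is a hypothetical toy, not VEP code. It assumes the standard genetic code and returns the Sequence Ontology terms that VEP uses for these outcomes.

```python
# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def coding_consequence(ref_codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base substitution within one codon (toy version)."""
    ref_codon, alt_base = ref_codon.upper(), alt_base.upper()
    if ref_codon not in CODON_TABLE:
        raise ValueError(f"not a valid DNA codon: {ref_codon!r}")
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1 or 2 within the codon")
    if len(alt_base) != 1 or alt_base not in "ACGT" or alt_base == ref_codon[position]:
        raise ValueError("alt_base must be a single base differing from the reference")

    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]

    # Sequence Ontology terms of the kind VEP reports for coding changes.
    if ref_aa == alt_aa:
        return "stop_retained_variant" if ref_aa == "*" else "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"


print(coding_consequence("TGG", 2, "A"))  # TGG (Trp) -> TGA (stop): stop_gained
```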
Ensembl underpins genomic research in areas such as human health and disease susceptibility, serving as a foundational resource for projects that analyze genetic variation and regulatory mechanisms. It receives millions of requests daily, reflecting wide adoption by the research community. Complementary tools such as BLAST enable sequence similarity searches that can be combined with Ensembl's pipelines for deeper analysis. Ensembl issues regular releases several times a year, incorporating new genome assemblies, refined annotations, and expanded datasets. In 2024, enhancements included support for long-read sequencing data through the GENCODE Comprehensive Long-read Sequencing project, which improved transcript isoform resolution and annotation accuracy for complex loci. The current release, Ensembl 115 (September 2025), covers 314 species in its core database. Ensembl Genomes extends coverage to over 4,800 eukaryotic and 31,300 prokaryotic genomes for broader comparative studies.

UniProt

UniProt is developed and maintained by the European Bioinformatics Institute (EMBL-EBI) in collaboration with the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). It was created in 2002 as a unified resource by merging three databases: the manually curated Swiss-Prot database, the automatically annotated TrEMBL database, and the PIR Protein Sequence Database (PIR-PSD). This consolidation addressed the growing volume of protein data from genome projects and created a central hub for high-quality protein information. As of release 2025_04 (15 October 2025), UniProtKB contains approximately 199 million protein sequences. That release restricted UniProtKB/TrEMBL to sequences from reference proteomes, removing approximately 54 million redundant entries to improve data quality and focus on representative sequences.

The core of UniProt is the UniProt Knowledgebase (UniProtKB), which has two sections:

- UniProtKB/Swiss-Prot provides expertly curated entries with detailed functional annotations for a subset of sequences.
- UniProtKB/TrEMBL provides computationally predicted annotations for the vast majority of sequences, ensuring comprehensive coverage.

Two further resources complement UniProtKB:

- UniRef clusters reduce redundancy by grouping similar sequences at 100%, 90%, or 50% identity thresholds, which facilitates comparative analyses.
- UniParc is a non-redundant archive that preserves all protein sequences from public databases, without annotations, to track historical versions and isoforms.

UniProt annotations cover protein function and structure, including functional domains, post-translational modifications (PTMs), and molecular interactions such as protein-protein binding sites.
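UniRef90 and UniRef50 are built by clustering sequences around representative members at a sequence-identity threshold. The Python sketch below shows only the general greedy, representative-based idea. `lcs_identity` is a deliberately crude stand-in for a real pairwise alignment: the longest common subsequence length divided by the shorter sequence length. UniProt's production pipeline uses dedicated clustering software and, as we understand it, additional length-overlap criteria.

```python
def lcs_identity(a: str, b: str) -> float:
    """Crude identity: LCS length divided by the length of the shorter sequence."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / min(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Assign each sequence to the first representative it matches at >= threshold."""
    # Longest sequences first, so representatives tend to be the most complete ones.
    order = sorted(seqs, key=lambda sid: (-len(seqs[sid]), sid))
    clusters: dict[str, list[str]] = {}
    for sid in order:
        for rep in clusters:
            if lcs_identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:  # no representative was similar enough: start a new cluster
            clusters[sid] = [sid]
    return clusters


seqs = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "GGGGGGGG"}
print(greedy_cluster(seqs, 0.9))  # {'a': ['a', 'b'], 'c': ['c']}
```

Lowering the threshold merges more sequences into fewer clusters. That is the trade-off between the 90% and 50% UniRef levels.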
These annotations are standardized with controlled vocabularies, notably the Gene Ontology (GO) for molecular function, biological process, and cellular component terms, which enables consistent cross-species comparisons. Curation in UniProt combines manual expert annotation with automated methods. Rule-based systems and artificial intelligence (AI) propagate information from curated templates to uncharacterized proteins. Manual curation draws on evidence from over 500,000 publications, together with family-based curation, to keep claims accurate and evidence-based; automatic annotation handles the scale of incoming data. UniProt supports critical applications in proteomics, by providing reference sequences and functional data for protein identification, and in drug discovery, through insights into protein targets, variants, and interaction networks. Users can access data through a RESTful API for programmatic queries and through bulk downloads in formats such as FASTA and XML, allowing integration into computational workflows and large-scale analyses. UniProt also links to genomic resources such as Ensembl, placing proteins in the context of whole-genome annotations.
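A minimal Python sketch of consuming UniProt FASTA data follows. It assumes UniProtKB's usual header layout (`>db|ACCESSION|ENTRY_NAME description`). The REST URL pattern `https://rest.uniprot.org/uniprotkb/{accession}.fasta` is our assumption about the current endpoint. The code only builds the URL and parses text already in memory; it makes no network call.

```python
from urllib.parse import quote

# Assumed UniProt REST base URL; check the current UniProt API documentation.
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def entry_fasta_url(accession: str) -> str:
    """URL for a single UniProtKB entry in FASTA format (built, not fetched)."""
    return f"{UNIPROT_REST}/{quote(accession)}.fasta"


def parse_uniprot_fasta(text: str) -> dict[str, str]:
    """Map accession -> sequence for FASTA headers like '>sp|P69905|HBA_HUMAN ...'."""
    records: dict[str, str] = {}
    accession = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if accession is not None:
                records[accession] = "".join(chunks)
            fields = line[1:].split()
            identifier = fields[0] if fields else ""
            parts = identifier.split("|")
            # UniProtKB headers are db|ACCESSION|ENTRY_NAME; fall back to the raw ID.
            accession = parts[1] if len(parts) >= 3 else identifier
            chunks = []
        elif accession is not None:
            chunks.append(line)  # sequence lines are wrapped; join them later
    if accession is not None:
        records[accession] = "".join(chunks)
    return records


fasta = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha\nMVLSPADKTN\nVKAAWGKVGA\n"
print(parse_uniprot_fasta(fasta))  # {'P69905': 'MVLSPADKTNVKAAWGKVGA'}
```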

PDBe

The Protein Data Bank in Europe (PDBe), hosted by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), is the European partner of the Worldwide Protein Data Bank (wwPDB). PDBe was a founding member of the wwPDB when it was established in 2003. It plays a key role in collecting, processing, archiving, and disseminating experimentally determined three-dimensional (3D) structures of biological macromolecules. This work includes managing depositions through the unified OneDep system and ensuring data quality through rigorous annotation and validation. PDBe curates submissions from European and African institutions, processing thousands of entries annually to maintain the integrity of the global archive.

The archive holds structures determined primarily by X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy (cryo-EM). These structures cover proteins, nucleic acids, macromolecular complexes, and associated small molecules such as ligands and cofactors. Each entry comes with a validation report that assesses geometric quality, model-to-data fit, and other quality indicators, helping researchers interpret and use the data. As of November 2025, the wwPDB archive, synchronized across PDBe and its partner sites, holds nearly a quarter of a million experimentally determined structures, reflecting rapid growth in structural biology.

To improve accessibility, PDBe provides integrated tools for data exploration and analysis. The PDBe Knowledge Base (PDBe-KB) aggregates functional and biophysical annotations from multiple specialist resources, offering insights into protein function, evolutionary relationships, interactions, and other associations for PDB entries. The open-source Mol* viewer supports interactive 3D viewing, superposition, and analysis of structures directly in web browsers.
In 2024, PDBe-KB added support for integrating predicted structures from the AlphaFold Database, so users can compare experimental and computational models in a single interface. PDBe has a significant impact on structural biology and related fields, supporting both applied and fundamental research by providing high-quality, standardized data to a global community. The wwPDB resources, including PDBe, serve millions of unique users annually, and their data are downloaded billions of times, underscoring their essential role in biomedical science.
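Comparing an experimental structure with a predicted model usually starts with optimal rigid-body superposition followed by a root-mean-square deviation (RMSD). The Python sketch below implements the standard Kabsch algorithm with NumPy on matched coordinate arrays, for example Cα atoms paired by residue number. It is a generic illustration, not the code behind PDBe-KB or Mol*.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimally superposing p onto q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.ndim != 2 or p.shape[1] != 3 or p.shape != q.shape:
        raise ValueError("expected two coordinate arrays of identical shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Demo: a rotated and translated copy of a structure superposes with RMSD ~ 0.
rng = np.random.default_rng(0)
coords = rng.normal(size=(20, 3))
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(coords, moved))  # effectively zero (floating-point noise only)
```

The reflection check matters: without it, a mirror-image model could be superposed "perfectly" even though no physical rotation relates the two structures.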

AlphaFold Database

The AlphaFold Protein Structure Database is a comprehensive open-access resource developed by the European Bioinformatics Institute (EMBL-EBI) in collaboration with Google DeepMind. It provides AI-generated three-dimensional models of proteins to support biological research. The database launched in July 2021 with predictions for over 365,000 structures across 21 proteomes, a significant step in making high-accuracy structure predictions widely available to the scientific community.

The predictions are generated with the AlphaFold system, which uses a deep neural network trained on known protein structures to infer 3D conformations from amino acid sequences. Each model includes a per-residue confidence score, the predicted Local Distance Difference Test (pLDDT), which ranges from 0 to 100. Scores above 90 indicate very high reliability, with backbone geometry comparable in accuracy to experimental methods. These scores let users judge which regions of a model are reliable and prioritize high-confidence regions for downstream applications.

Coverage includes the complete proteomes of humans and 47 other organisms important to research and global health, alongside predictions for nearly all entries in UniProt, the central repository for protein sequences and annotations. Because entries share UniProt accession identifiers, sequence data can be cross-referenced with structural models for richer functional annotation. Access is free and unrestricted under a CC BY 4.0 license. Bulk downloads are available for entire proteomes or for subsets such as the reviewed Swiss-Prot section of UniProt, and an API enables programmatic retrieval of metadata and structures. These models have helped accelerate drug discovery, for example by identifying binding sites for novel therapeutics, and have aided the study of disease mechanisms by revealing previously unknown protein folds.
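AlphaFold DB model files store each residue's pLDDT in the B-factor column, so a few lines of Python can summarize a model's confidence. The sketch below makes three assumptions:

- The input is a single-chain file in PDB format.
- The four bands follow the database's legend: above 90, 70 to 90, 50 to 70, and below 50.
- Values that fall exactly on a band boundary are assigned by our own choice.

```python
from collections import Counter

BANDS = ("very high", "confident", "low", "very low")


def plddt_band(score: float) -> str:
    """Map a pLDDT value to the four confidence bands shown in AlphaFold DB's legend."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("pLDDT must lie between 0 and 100")
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def ca_plddt_from_pdb(pdb_text: str) -> dict[int, float]:
    """Read per-residue pLDDT from the B-factor column of C-alpha ATOM records."""
    scores: dict[int, float] = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def band_fractions(scores: dict[int, float]) -> dict[str, float]:
    """Fraction of residues falling in each confidence band."""
    if not scores:
        return {band: 0.0 for band in BANDS}
    counts = Counter(plddt_band(s) for s in scores.values())
    return {band: counts[band] / len(scores) for band in BANDS}
```

Regions dominated by "very low" scores are often disordered, so they are usually excluded before structural comparisons.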
In 2022, the database was expanded to over 214 million predictions, bringing its coverage close to the full UniProt knowledgebase. By late 2025, following the renewed partnership between EMBL-EBI and Google DeepMind, it had grown to approximately 250 million structures, reflecting ongoing synchronization with UniProt's evolving sequence data. Further developments in 2024 added support for multimeric complexes and interactions, derived from advances in the underlying models, enhancing the database's usefulness for studying protein-protein and protein-small molecule interfaces.

Ethical discussion of the database emphasizes responsible use to mitigate potential misuse, such as the design of harmful biomolecules. Analyses indicate, however, that AlphaFold's predictions do not substantially lower the barriers to such activities compared with existing experimental techniques. The developers have implemented attribution requirements and licensing terms that promote equitable access while discouraging applications that could pose risks.

Bioinformatics Tools

BLAST

The European Bioinformatics Institute (EMBL-EBI) provides a web-based implementation of the Basic Local Alignment Search Tool (BLAST). BLAST is an algorithm originally developed at the US National Center for Biotechnology Information (NCBI) for identifying regions of local similarity between biological sequences, which can reveal functional, structural, or evolutionary relationships. EMBL-EBI has hosted this service since the 1990s, now through its Job Dispatcher framework. The service offers free access to NCBI BLAST+ software, so researchers can run sequence comparisons without a local installation.

The service supports nucleotide and protein queries against comprehensive databases, including UniProt for proteins and Ensembl for genomic sequences. Users can paste sequences, for example in FASTA format, or supply database identifiers. Key variants include:

- BLASTN, which compares nucleotide queries with nucleotide databases;
- BLASTP, which compares protein queries with protein databases;
- TBLASTN, which searches protein queries against translated nucleotide databases;
- BLASTX, which searches translated nucleotide queries against protein databases.

These variants support applications such as identifying homologous genes or predicting protein function from sequence conservation. The statistical significance of an alignment is evaluated with the expect value (E-value), which estimates the number of hits of similar quality expected by chance in a database of the given size. Lower E-values indicate more reliable matches. The E-value is computed as
$$E = K\,m\,n\,e^{-\lambda S}$$
where $m$ is the length of the query, $n$ is the effective length of the database, $S$ is the raw alignment score, and $K$ and $\lambda$ are statistical parameters that depend on the scoring matrix and gap penalties (estimated empirically for gapped alignments). BLAST adjusts these quantities for the search context, accounting for factors such as sequence length and composition bias, so that E-values remain interpretable across different searches.
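The formula can be checked numerically. The Python sketch below computes an E-value in two ways: directly, and via the normalized bit score $S' = (\lambda S - \ln K)/\ln 2$, for which $E = m\,n\,2^{-S'}$. The two routes agree by construction. The values of $\lambda$ and $K$ and the sequence lengths are arbitrary illustrative numbers, not parameters of any real scoring system.

```python
import math


def evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expect value E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, in bits."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """Equivalent form E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Illustrative parameters only; real values depend on matrix and gap costs.
LAM, K = 0.3, 0.1
m, n = 250, 50_000_000          # query length, effective database length
for s in (60, 80, 100):
    e_direct = evalue(s, m, n, LAM, K)
    e_bits = evalue_from_bits(bit_score(s, LAM, K), m, n)
    print(f"S={s}: E={e_direct:.3g} (via bits: {e_bits:.3g})")
```

Because E scales linearly with n, doubling the database size doubles every E-value. For this reason, E-values are best compared within the same search context, while bit scores are independent of database size.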
BLAST at EMBL-EBI enables high-throughput detection of sequence similarity, supporting research workflows by quickly scanning large datasets for potential homologs. It integrates with other EMBL-EBI resources, such as UniProt for functional annotation of hits and Ensembl for genomic context, so users can chain analyses, for example by retrieving aligned sequences for further visualization or modeling. In 2024 the Job Dispatcher received a redesigned website with interactive result visualizations, streamlined job submission and monitoring, and updated documentation, improving accessibility and performance for large-scale queries. BLAST accounts for a substantial share of the more than 100 million jobs that EMBL-EBI's tools handled annually as of 2023. As a heuristic, BLAST approximates optimal local alignments to gain speed at some cost in sensitivity. It can therefore miss faint similarities that an exhaustive Smith-Waterman local alignment search would detect, but it remains highly effective for initial screening in most bioinformatics pipelines.
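Programmatic BLAST searches through the Job Dispatcher follow a submit, poll, and fetch pattern. The Python sketch below builds the submission request and polls a status function that the caller supplies, so the logic can be exercised without network access. Several details are assumptions based on our reading of the Job Dispatcher REST documentation and should be verified before use:

- the base URL and the `run`, `status`, and `result` paths;
- the form fields `email`, `program`, `stype`, `database`, and `sequence`;
- the `uniprotkb_swissprot` database name;
- the status strings.

```python
import time
from typing import Callable
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Job Dispatcher REST base for NCBI BLAST+; verify before use.
BASE = "https://www.ebi.ac.uk/Tools/services/rest/ncbiblast"
TERMINAL_FAILURES = {"ERROR", "FAILURE", "NOT_FOUND"}


def build_run_request(email: str, sequence: str, program: str = "blastp",
                      database: str = "uniprotkb_swissprot",
                      stype: str = "protein") -> Request:
    """Create (but do not send) a POST request that submits one BLAST job."""
    form = {
        "email": email,        # the service asks for a contact address
        "program": program,
        "stype": stype,
        "database": database,
        "sequence": sequence,
    }
    return Request(f"{BASE}/run", data=urlencode(form).encode("ascii"), method="POST")


def wait_for_job(job_id: str, fetch_status: Callable[[str], str],
                 poll_seconds: float = 5.0, max_polls: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job finishes; return the (assumed) URL of its text report."""
    for _ in range(max_polls):
        status = fetch_status(job_id).strip()
        if status == "FINISHED":
            return f"{BASE}/result/{job_id}/out"
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"job {job_id} ended with status {status}")
        sleep(poll_seconds)  # e.g. QUEUED or RUNNING: wait before asking again
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

In real use, `fetch_status` would issue a GET request to `{BASE}/status/{job_id}` (also an assumed path). Injecting it keeps the polling loop testable, and the polling interval keeps request rates modest.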

Clustal Omega

Clustal Omega is a multiple sequence alignment (MSA) program developed at the European Bioinformatics Institute (EMBL-EBI), designed to align large sets of biological sequences accurately and efficiently. It was released in 2011 as the successor to ClustalW and ClustalX. It was created to overcome the earlier programs' limitations with very large datasets, and it can align tens of thousands of sequences on standard hardware. It combines seeded guide trees with hidden Markov model (HMM) profile-profile alignment to improve both speed and quality, making it a cornerstone of bioinformatics workflows at EMBL-EBI.

The core algorithm follows a progressive alignment strategy:

1. It builds a guide tree using the mBed method, which embeds sequences in a low-dimensional space for rapid clustering. This step takes O(N log N) time for N sequences, compared with the O(N²) all-against-all distance computation behind ClustalW's neighbor-joining guide trees.
2. It aligns sequences progressively, following the guide tree.
3. Optionally, it iterates using HHalign to refine the HMM profiles, which improves accuracy for divergent sequences.

The program also supports external profile alignment, so users can align query sequences against pre-built profile HMMs of known families. It scales to hundreds of thousands of sequences, with built-in multithreading for computing distance matrices in parallel and partial parallelization of the progressive alignment phase on multi-core systems. Input can be provided in formats such as FASTA, GenBank, or EMBL. Output is available in several formats, including FASTA, PHYLIP, and Clustal, for compatibility with downstream tools.
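The embedding idea behind mBed can be illustrated with a simplified Python sketch. Instead of computing all N(N-1)/2 pairwise distances, each sequence is described only by its distances to a small set of seed sequences, roughly log2 N of them, so only about N log N distance evaluations are needed. The k-mer Jaccard distance and the evenly spaced seed choice below are illustrative stand-ins. Clustal Omega's actual k-tuple distance, seed selection, and subsequent clustering differ.

```python
import math


def kmer_set(seq: str, k: int = 2) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """1 - Jaccard similarity of k-mer sets (a simple alignment-free distance)."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    return 0.0 if not union else 1.0 - len(sa & sb) / len(union)


def seed_embedding(seqs: list[str], k: int = 2) -> tuple[list[int], list[list[float]]]:
    """Represent each sequence by its distances to ~log2(N) evenly spaced seeds."""
    n = len(seqs)
    if n == 0:
        return [], []
    n_seeds = max(1, math.ceil(math.log2(n)))
    step = n / n_seeds                    # >= 1, so seed indices are distinct
    seed_idx = [int(i * step) for i in range(n_seeds)]
    vectors = [[kmer_distance(s, seqs[j], k) for j in seed_idx] for s in seqs]
    return seed_idx, vectors  # N * n_seeds distance evaluations in total


print(seed_embedding(["AAAA", "AAAC", "CCCC", "CCCA"])[0])  # [0, 2]
```

The resulting short vectors can then be clustered cheaply, for example with k-means, before guide subtrees are built within each cluster. This is broadly the strategy mBed follows.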
Since the initial release, updates have added support for DNA and RNA alignment, compressed (zipped) input files, and customizable clustering parameters that optimize performance for very large datasets. Clustal Omega is widely used in phylogenetics, where its alignments support evolutionary analysis, and in motif discovery, where alignments reveal conserved regions across sequence families. It is integrated with EMBL-EBI resources such as Ensembl, and it can combine sequences retrieved by pairwise similarity searches into comprehensive alignments. Benchmarks reported by its developers showed accuracy and speed competitive with or better than alternatives such as MAFFT and MUSCLE on datasets of up to 50,000 sequences, with sum-of-pairs scores of around 0.708 on reference alignments. The original paper describing the tool has received over 12,000 citations, underscoring its impact on bioinformatics research.
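Benchmark suites such as BAliBASE score an alignment by comparing it with a trusted reference alignment. In its usual definition, the sum-of-pairs (SP) score is the fraction of residue pairs aligned in the reference that the tested alignment also aligns. Below is a minimal Python sketch. It assumes that '-' is the only gap character and that both alignments list the same sequences in the same order; it is not the code of any particular benchmark.

```python
from itertools import combinations


def aligned_pairs(msa: list[str]) -> set[tuple[int, int, int, int]]:
    """Residue pairs (row_a, residue_a, row_b, residue_b) placed in the same column."""
    widths = {len(row) for row in msa}
    if len(widths) != 1:
        raise ValueError("alignment must be non-empty with rows of equal length")
    next_residue = [0] * len(msa)    # index of the next ungapped residue per row
    pairs = set()
    for col in range(widths.pop()):
        present = []
        for row_idx, row in enumerate(msa):
            if row[col] != "-":
                present.append((row_idx, next_residue[row_idx]))
                next_residue[row_idx] += 1
        for (r1, i1), (r2, i2) in combinations(present, 2):
            pairs.add((r1, i1, r2, i2))
    return pairs


def _ungapped(msa: list[str]) -> list[str]:
    return [row.replace("-", "") for row in msa]


def sum_of_pairs_score(test: list[str], reference: list[str]) -> float:
    """Fraction of the reference's aligned residue pairs reproduced by the test MSA."""
    if _ungapped(test) != _ungapped(reference):
        raise ValueError("both alignments must contain the same sequences in the same order")
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


print(sum_of_pairs_score(["ACG-T", "ACTGT"], ["AC-GT", "ACTGT"]))  # 0.75
```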

Other Analysis Tools

In addition to sequence alignment tools, EMBL-EBI hosts a suite of specialized analysis tools covering protein function prediction, chemical entity data, gene expression profiling, pathway mapping, and variant interpretation. These tools let researchers perform integrative analyses by querying underlying databases such as UniProt for protein sequences.

InterPro is a key tool for protein domain prediction and functional classification. It integrates signatures from multiple databases, including Pfam and SMART, to identify protein families, domains, and sites, and annotates functional elements in protein sequences to aid the study of protein evolution and interactions. InterPro is developed collaboratively and maintained by EMBL-EBI as an open-source, community-driven resource. Its 2025 release (version 105.0) added AI-driven improvements to classification accuracy.

ChEBI (Chemical Entities of Biological Interest) is a database and ontology of small molecules, providing structured data on their structures and biological roles for cheminformatics applications. Users can search, visualize, and download chemical structures and annotations, which eases integration with metabolic workflows. ChEBI is an open resource curated by EMBL-EBI, and its 2025 update introduced new data products.

Expression Atlas analyzes and visualizes gene and protein expression patterns across species, tissues, and conditions, drawing on RNA-seq, microarray, and proteomics datasets. It supports both differential and baseline expression queries, helping researchers explore regulatory mechanisms. EMBL-EBI maintains it as an open resource and regularly adds new datasets and features.

For pathway analysis, Reactome provides an open-source database and toolset for visualizing, interpreting, and analyzing biological pathways, including signaling and metabolic processes.
Its enrichment analysis functionality identifies overrepresented pathways in gene lists from high-throughput experiments. Developed through international collaboration and hosted by EMBL-EBI, Reactome emphasizes peer-reviewed curation and supports programmatic access for advanced users. The Variant Effect Predictor (VEP) is a tool for interpreting the functional consequences of genetic variants, predicting impacts on transcripts, proteins, and regulatory regions while incorporating frequencies. It processes variant lists to prioritize clinically relevant changes, essential for genomic variant annotation. As an open-source, community-enhanced tool from EMBL-EBI, VEP offers flexible configurations for large-scale analyses. These tools are accessible via intuitive web interfaces, RESTful for programmatic integration, and downloadable software, with many embedded in the workflow platform to streamline multi-step analyses. Overall, EMBL-EBI's analysis tools facilitate integrative bioinformatics by combining diverse data types, supporting over 100 million daily web and API requests across resources as of 2025.
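Over-representation analysis of the kind described above is commonly based on the hypergeometric distribution: given a background of N genes, of which K belong to a pathway, it asks how likely it is that a list of n genes contains at least k pathway members by chance. The Python sketch below is a generic illustration, not Reactome's own implementation; the function names, gene identifiers, and background set are hypothetical, and the identifier mapping a real service must perform is omitted. Because many pathways are tested at once, it also includes a Benjamini-Hochberg false discovery rate correction.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # comb() returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


def overrepresentation(gene_list, pathways, background):
    """Return {pathway: (overlap, p_value)} for each pathway.

    gene_list:  genes of interest, e.g. hits from an experiment
    pathways:   mapping of pathway name -> set of member genes
    background: set of all genes that could have appeared in the list
    """
    genes = set(gene_list) & background
    results = {}
    for name, members in pathways.items():
        members = members & background
        overlap = len(genes & members)
        p = hypergeom_sf(overlap, len(background), len(members), len(genes))
        results[name] = (overlap, p)
    return results


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For instance, with a 20-gene background, a 5-gene pathway, and a 4-gene input list fully contained in that pathway, the one-sided p-value is 5/4845 (about 0.001). Adjusted values across all tested pathways can then be obtained with `benjamini_hochberg([p for _, p in results.values()])`.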

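For coding variants, the consequence terms VEP reports come from the Sequence Ontology (for example `missense_variant` or `stop_gained`). The Python sketch below illustrates only the simplest case, a single-nucleotide substitution inside a codon that has already been located on a transcript, using the standard genetic code. The function names are hypothetical, and the sketch is not VEP's implementation: transcript mapping, splice regions, insertions and deletions, start codons, and regulatory features are all out of scope.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def coding_consequence(ref_codon, alt_codon):
    """Classify a codon change with Sequence Ontology-style terms.

    Assumes both codons contain only A, C, G, T; '*' denotes a stop codon.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "stop_retained_variant" if ref_aa == "*" else "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"


def snv_consequence(codon, position, alt_base):
    """Consequence of replacing codon[position] (0, 1 or 2) with alt_base."""
    alt_codon = codon[:position] + alt_base + codon[position + 1:]
    return coding_consequence(codon, alt_codon)
```

For example, `snv_consequence("GAG", 1, "T")` yields `"missense_variant"` (glutamate to valine), while `snv_consequence("TGG", 2, "A")` yields `"stop_gained"`.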
Research and Training

Research Programs

The European Bioinformatics Institute (EMBL-EBI) conducts computational research to address key biological challenges, aligning with EMBL's 2022–2026 programme, "Molecules to Ecosystems," which emphasizes understanding life in context through molecular mechanisms and interactions. The programme integrates EMBL-EBI's work on microbial communities and environmental dynamics, fostering interdisciplinary approaches to accelerate discovery.

EMBL-EBI's research focuses on advancing artificial intelligence and machine learning applications to enable predictive modeling of biomolecular functions and interactions. In single-cell transcriptomics, EMBL-EBI maintains resources such as the Single Cell Expression Atlas for analyzing immune cell responses and cell lineages. Microbiome analysis targets ecosystem-level insights, with the Microbiome Informatics team curating sequence data to annotate microbial diversity and functional roles in environmental and host contexts. Key projects include the COVID-19 Data Portal, launched in 2020 and operational through 2025, which aggregates datasets for global research on viral evolution and therapeutics. Sustainability-focused efforts address climate impacts by providing resources for biodiversity monitoring, such as genomic sequences that support studies of species interactions and resilience.

Methodologies emphasize integrating multi-omics data across genomic, proteomic, and phenotypic datasets, alongside computational simulations of biological systems. Research outputs include hundreds of peer-reviewed publications by EMBL-EBI researchers each year, as well as software tools such as those integrated into ELIXIR's OpenEBench platform for benchmarking bioinformatics methods. Collaborations drive innovation, including the October 2025 renewal of the partnership with Google DeepMind on the AlphaFold Protein Structure Database and EU consortia such as the Federated European Genome-phenome Archive, highlighted in 2025 for advancing secure genomic medicine applications.

Training Initiatives

The European Bioinformatics Institute (EMBL-EBI) delivers a comprehensive training programme designed to equip scientists worldwide with bioinformatics skills, emphasizing free and accessible learning. Central to this effort is Train Online, an e-learning platform offering on-demand tutorials and live webinars on EMBL-EBI's core resources, such as Ensembl, with content tailored to users from beginner to advanced levels. Hands-on workshops and virtual events further support practical application, including sessions on using Ensembl for genomic data analysis. The programme reaches tens of thousands of participants annually: over 67,000 users from 159 countries engaged with its online courses in 2024 alone, and broader EMBL training attracted 8,246 participants from 101 countries that year. All offerings are free of charge, and many culminate in certificates of completion.

The curriculum spans key areas of modern bioinformatics, including data analysis techniques for next-generation sequencing and adherence to the FAIR (Findable, Accessible, Interoperable, Reusable) principles for data management. Materials are designed to foster both conceptual understanding and hands-on proficiency, ranging from introductory modules on bioinformatics fundamentals to advanced topics for the life sciences. EMBL-EBI's training team collaborates closely with the ELIXIR infrastructure to form a pan-European network, coordinating national efforts and integrating resources into a unified platform for bioinformatics education. Expansions in 2025 include the launch of Ada, an AI-driven assistant, alongside initiatives such as the Human Ecosystems Retreat, focused on modeling biological ecosystems through data integration.

These initiatives build global capacity, particularly in low-resource regions, by enabling remote access and partnering with international networks to support underrepresented scientists. A 2024 user survey found that 89% of respondents credited EMBL-EBI resources and training with enabling research that would otherwise have been impossible, underscoring the programme's broader scientific impact.

References

  1. [1]
    EMBL's Legal Status
    EMBL is an intergovernmental organisation, headquartered in Heidelberg, and was founded in 1974 with the mission of promoting molecular biology research in ...
  2. [2]
    Sparking a data revolution | EMBL
    Nov 25, 2024 · In 1992, the EMBL Council voted to establish EMBL's European Bioinformatics Institute (EMBL-EBI) at the Wellcome Trust Genome Campus in Hinxton, ...
  3. [3]
    EMBL's European Bioinformatics Institute (EMBL-EBI) in 2024 - PMC
    Nov 28, 2024 · Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory, Europe's only ...
  4. [4]
    [PDF] Highlights 2023 - European Molecular Biology Laboratory
    Apr 8, 2024 · EMBL's European Bioinformatics Institute (EMBL-EBI) is the world's leading source of public biomolecular data. We enable life science research ...
  5. [5]
    25 years of EMBL-EBI
    Sep 3, 2019 · In 1994, the two data resources for EMBL-EBI were the EMBL Nucleotide Data Bank – now the European Nucleotide Archive (ENA) – and Swiss-Prot.Missing: transition | Show results with:transition
  6. [6]
    European Bioinformatics Institute (EBI) databases - Oxford Academic
    The European Bioinformatics Institute (EBI) is an EMBL Outstation, located at Hinxton Hall, near Cambridge, UK. Since September 1994, all activities previously ...
  7. [7]
    [PDF] EMBL at 50 years
    Jun 14, 2024 · The Data Library merged into the Biocomputing programme at EMBL, founded in 1986. In 1992, the. EMBL Council voted to establish EMBL's European.Missing: transition SWISS-
  8. [8]
    Our story - EMBL-EBI
    The transition of two major bioinformatics services from Heidelberg to Hinxton began in 1992 and in September 1994, EMBL-EBI was firmly established in the UK.
  9. [9]
    25 years of EMBL-EBI
    Sep 6, 2019 · In 1994, the two data resources for EMBL-EBI were the EMBL Nucleotide Data Bank – now the European Nucleotide Archive (ENA) – and Swiss-Prot.Humble Beginnings · All About The Data · A Computing RevolutionMissing: transition | Show results with:transition
  10. [10]
    EMBL-EBI - ELIXIR Europe
    The EMBL-European Bioinformatics Institute (EMBL-EBI) is an academic research institute based in the UK, and is part of the European Molecular Biology ...
  11. [11]
    Celebrating 20 years of bioinformatics | EMBL
    Jul 8, 2014 · 1 Graham Cameron and first Head of Research Michael Ashburner, EMBL-EBI Head of Administration Mark Green and EMBL-EBI Director Janet Thornton.Missing: 1994-2003 | Show results with:1994-2003
  12. [12]
    Member states | EMBL.org
    Member states. Established in 1974 by 10 founding countries, the intergovernmental organisation EMBL today is supported by more than 30 countries.Missing: funding | Show results with:funding
  13. [13]
    About us - EMBL-EBI
    We are one of the six sites of the European Molecular Biology Laboratory (EMBL), an intergovernmental research organisation funded by over 20 member states, ...
  14. [14]
    About the Ensembl Project
    The Ensembl project was started in 1999, some years before the draft human genome was completed. Even at that early stage it was clear that manual annotation ...
  15. [15]
    Our History - Wellcome Sanger Institute
    1994. The European Molecular Biology Laboratory (EMBL) locates its new European Bioinformatics Institute (EMBL-EBI) next to the Sanger Centre. Long-standing ...
  16. [16]
    [PDF] Annual Report - 2005-2006 - European Molecular Biology Laboratory
    EMBL-EBI will have to grow to 400 staff during the next Programme, and ... expand the EBI staff and infrastructure to the size required to enable it to ...
  17. [17]
    [PDF] Annual Scientific Report 2010 - EMBL-EBI
    The. R Cloud service, launched in summer 2010, provides remote access to Atlas data and the R cloud-computing environment on EBI servers. Page 38. Paul Kersey.
  18. [18]
    EMBL-EBI launches COVID-19 Data Portal
    Apr 20, 2020 · EMBL-EBI and partners today launched the COVID–19 Data Portal, which enables the sharing and analysis of data related to the new coronavirus, SARS-CoV-2.
  19. [19]
    DeepMind and EMBL release the most complete database of ...
    Jul 22, 2021 · The database contains predicted 3D structures of ~20,000 human proteins, plus 350,000 structures from 20 organisms, more than doubling existing ...
  20. [20]
    EMBL Programme 2022–26: Molecules to Ecosystems
    This new Programme will expand EMBL's ability to bridge the gap between molecular biology and other disciplines, such as ecology, epidemiology, toxicology, ...Explore The Embl Programme · Scientific Services Plans · Data Sciences Plans
  21. [21]
    About EMBL | EMBL.org
    With support from more than 30 countries, the European Molecular Biology Laboratory (EMBL) has more than 110 independent research groups and service teams ...Missing: funding | Show results with:funding<|control11|><|separator|>
  22. [22]
    EMBL Sites
    The European Molecular Biology Laboratory is a single organisation spread across six European locations. Each of our sites hosts its own research units, ...
  23. [23]
    Latvia becomes EMBL's 29th member state
    Jan 9, 2024 · As EMBL's 29th full member state, Latvia joins after a three-year period as a prospect member during which engagement between EMBL and Latvian researchers ...
  24. [24]
    Governance | EMBL.org
    The EMBL Council is composed of all member states of the Laboratory. Each member state is represented by up to two delegates, who may be accompanied by advisers ...
  25. [25]
    EMBL Council
    The Council is composed of all member states of the Laboratory. Each member state is represented by up to two delegates, who may be accompanied by advisers.
  26. [26]
  27. [27]
    None
    ### Funding Details for EMBL-EBI in 2024
  28. [28]
    About us
    - **Position within EMBL**: EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), an intergovernmental research organization funded by over 20 member states, located on the Wellcome Genome Campus near Cambridge, UK.
  29. [29]
    Who we are | ELIXIR
    - **EBI's Coordination Role**: The European Bioinformatics Institute (EBI), part of EMBL-EBI, hosts the ELIXIR Hub at the Wellcome Genome Campus, near Cambridge, UK, coordinating ELIXIR’s activities across Europe.
  30. [30]
    Changes in EMBL-EBI leadership
    Mar 13, 2025 · Changes in EMBL-EBI leadership. Jo McEntyre will become Interim Director of EMBL-EBI as Ewan Birney takes up the role of EMBL Executive Director.
  31. [31]
    Johanna McEntyre, Interim Director, EMBL-EBI | People
    In March 2025, Jo became the Interim Director of EMBL-EBI. Previously, Jo was appointed Deputy Director of EMBL-EBI in 2024, after being Associate Director ...
  32. [32]
    Ewan Birney, Interim Executive Director, EMBL | People
    Ewan Birney is the Interim Executive Director of the European Molecular Biology Laboratory (EMBL). Together with Peer Bork, who is the Interim Director ...
  33. [33]
    Leadership - EMBL-EBI
    A part of EMBL. EMBL is led by Peer Bork as Interim Director General and Ewan Birney as Executive Director, as appointed by EMBL Council.
  34. [34]
    Cath Brooksbank, Head of Training, EMBL-EBI
    Cath joined EMBL-EBI in 2002 to develop the outreach programme, and extended her responsibilities to include trainingin 2006. Her team now coordinates a ...Missing: Rolf | Show results with:Rolf
  35. [35]
    Rolf Apweiler, Emeritus Visitor | People | EMBL
    Before retiring, Rolf was Associate Director between 2024 and 2025. Prior to that, he was Joint Director with Ewan Birney between 2015 and 2024, after many ...Missing: Cath | Show results with:Cath
  36. [36]
    Rolf Apweiler: what I've learned - EMBL-EBI
    Oct 16, 2025 · From student helper to EMBL-EBI Director, Rolf Apweiler has shaped the journey of EMBL and bioinformatics for over four decades.Missing: associate Brooksbank
  37. [37]
    Meet our people - EMBL-EBI
    Meet our people. With over 700 employees and fellows from 68 countries, EMBL-EBI brings together experts from many different disciplines.Leadership · Technical Careers · Phds And PostdocsMissing: 2025 | Show results with:2025
  38. [38]
    Group leader recruitment | EMBL.org
    EMBL requires a PhD or equivalent, with no minimum postdoc experience. There are no nationality, gender, or age restrictions. The selection process includes an ...Funding: Internally Provided... · External Funding · Scientific Services
  39. [39]
    Equality and Diversity - EMBL-EBI
    We encourage and empower our employees to be their authentic selves at work and we take our commitment to equality, diversity and inclusion seriously.Missing: recruitment | Show results with:recruitment
  40. [40]
    How generative AI can help us understand complex biodata - LinkedIn
    Sep 24, 2025 · At EMBL-EBI, researchers are testing how LLMs help streamline data curation and annotation while maintaining high standards of accuracy and ...
  41. [41]
    Campus life – WGC - Wellcome Genome Campus
    The entire Campus covers an area of 125 acres and provides those working here with a beautiful environment to enjoy for work and recreation. cafes, catering and ...
  42. [42]
    EMBL-EBI expansion goes ahead with help from The Wellcome ...
    May 13, 2005 · The new development will provide 1500 square metres of space which, together with their existing 3000 square metre building, will provide the space to house ...
  43. [43]
    EMBL-EBI South Building | abell nepp - Archello
    The EMBL-EBI South Building is located at the Wellcome Genome Campus at Hinxton, Cambridgeshire and is the second of three phases of the campus' South Field ...Missing: coordinates | Show results with:coordinates
  44. [44]
    None
    ### Summary of EMBL-EBI Highlights 2024 Report
  45. [45]
    Local community – WGC - Wellcome Genome Campus
    Visits and tours can be arranged through the Visit the Campus page. ... Fireworks night at the Wellcome Genome Campus open to the public October events, local ...
  46. [46]
    Biodiversity and climate change - EMBL-EBI
    Biodiversity and climate change. We enable world leading biodiversity initiatives to store, share and analyse species data for generations to come.Missing: centers green spaces
  47. [47]
    Wetlands nature trail | EMBL-EBI
    The ChEMBL team at EMBL-EBI have created a guided walk around the Wellcome Genome Campus Wetlands Nature Reserve for invited campus visitors. Species of flora ...Cards · Find Out More About The... · Watermint: Sensation And...Missing: studies | Show results with:studies
  48. [48]
    Storage – ITS Infrastructure - EMBL-EBI
    EMBL-EBI has some 300+ Petabytes of raw storage, holding around 25 billion files and objects, across a range of storage types including flash, SSD, disk and ...Missing: 2010s | Show results with:2010s
  49. [49]
    EMBL-EBI taps the cloud to accelerate biomedical research
    Dec 13, 2021 · UK-based research services provider EMBL-EBI is pursuing a hybrid cloud strategy to offer data and analysis to scientists around the world.
  50. [50]
  51. [51]
    HPC – ITS Infrastructure - EMBL-EBI
    Responsible for EMBL-EBI's high performance computing clusters, storage infrastructure, data centres and private cloud platforms.Missing: Institute | Show results with:Institute
  52. [52]
    Deciphering the data deluge: how large language models are ...
    Nov 16, 2023 · Large language models are changing the way we carry out scientific data curation, annotation, and research, setting the stage for a more efficient ...Missing: high- performance clusters<|separator|>
  53. [53]
    AI and machine learning - EMBL-EBI
    Our thought leaders work closely with external experts in the field to help make AI algorithms and predicted data openly available to the scientific community.Missing: 2024 | Show results with:2024
  54. [54]
    BioChatter: making large language models accessible for ...
    Jan 22, 2025 · BioChatter is an open-source Python framework for employing large language models (LLMs) in biomedical research. BioChatter can support the ...Missing: high- clusters
  55. [55]
    Job Dispatcher < EMBL-EBI
    Job Dispatcher Tools and dbfetch data can be accessed and retrieved via RESTful APIs. Learn more in our Documentation. There is a limit of 30 concurrent ...EMBOSS Tools · Clustal Omega · Multiple Sequence Alignment · EMBOSS NeedleMissing: open- | Show results with:open-
  56. [56]
    The EMBL-EBI search and sequence analysis tools APIs in 2019
    Apr 12, 2019 · The EMBL-EBI provides free access to popular bioinformatics sequence analysis applications as well as to a full-featured text search engine.
  57. [57]
    Services
    ### Main Bioinformatics Services and Databases Provided by EMBL-EBI
  58. [58]
    Data Protection at EMBL
    With the entry into force of the EU General Data Protection Regulation (GDPR) in May 2018, data protection in Europe has evolved – and EMBL has kept pace.Missing: EBI backup
  59. [59]
    Long-term data preservation - EMBL-EBI
    Data resource backup and recovery​​ EMBL-EBI infrastructure is distributed in three discrete data centres in different geographical locations to guarantee data ...Missing: GDPR compliance
  60. [60]
    EMBL-EBI and Google DeepMind renew partnership and release ...
    Oct 7, 2025 · The AlphaFold Database contains protein structure predictions for over 200 million proteins, and has been used by over three million people ...Missing: 250 2024
  61. [61]
    The Ensembl project - EMBL-EBI
    The project began in 1999 as a joint project between the EMBL European Bioinformatics Institute and the Wellcome Trust Sanger Institute (then named the Sanger ...Missing: collaboration | Show results with:collaboration
  62. [62]
    Ensembl 2024 | Nucleic Acids Research - Oxford Academic
    Nov 11, 2023 · The service provides data for genes, transcripts, proteins, associated metadata and genomic locations for all genomes available through our beta ...
  63. [63]
    The Ensembl Variant Effect Predictor | Genome Biology | Full Text
    Jun 6, 2016 · For all input variants, the VEP returns detailed annotation for effects on transcripts, proteins, and regulatory regions. For known or ...
  64. [64]
    Comparative Genomics - Ensembl
    Ensembl Compara provides cross-species resources and analyses, at both the sequence level and the gene level. These data can be accessed in various ways. Gene ...
  65. [65]
    Ensembl comparative genomics resources - PMC - PubMed Central
    Feb 20, 2016 · All these multi-species data resources are stored centrally in the Ensembl 'Compara' database. Other comparative resources are available. These ...Missing: ENA | Show results with:ENA
  66. [66]
    [PDF] ebi.ac.uk - European Molecular Biology Laboratory
    Staff growth at EMBL-EBI, 1999 - 2019. Number of staff. Staff in 2019 FTE (full-time equivalent). Gender distribution of staff in 2019. Senior roles held by ...
  67. [67]
    Ensembl Archives
    List of currently available archives · Ensembl GRCh37 · Ensembl 115: Sep 2025 · Ensembl 114: May 2025 · Ensembl 113: Oct 2024 · Ensembl 112: May 2024 · Ensembl 111: ...
  68. [68]
    What's coming in Ensembl release 113 / Ensembl Genomes 60?
    Aug 13, 2024 · We expect to release Ensembl 113 and Ensembl Genomes 60 in September October 2024. Below is a list of updates that we are hoping to include in the upcoming ...Missing: annual | Show results with:annual
  69. [69]
    Species List - Ensembl
    All Species ; American bison, Bison bison bison, 43346 ; American black bear, Ursus americanus, 9643 ; American mink, Neovison vison, 452646 ; Angola colobus ...
  70. [70]
    Ensembl 2025 | Nucleic Acids Research - Oxford Academic
    Dec 4, 2024 · This year has seen a continued expansion in the number of species represented, with >4800 eukaryotic and >31 300 prokaryotic genomes available.
  71. [71]
    The Universal Protein Resource (UniProt) - PMC - PubMed Central
    UniProt is produced by the UniProt Consortium, formed in 2002 by the European Bioinformatics Institute (EBI), the Protein Information Resource (PIR) and the ...Missing: evolution | Show results with:evolution
  72. [72]
    UniProt Knowledgebase User Manual - Expasy
    Until 2002, the EBI/SIB Swiss-Prot + TrEMBL databases and the PIR Protein Sequence Database (PIR-PSD) coexisted as protein databases with differing protein ...
  73. [73]
    UniProt: the Universal Protein knowledgebase - Oxford Academic
    The UniProt knowledgebase is the centrepiece of the consortium activities. We have merged Swiss‐Prot, TrEMBL and PIR‐PSD to form the UniProt knowledgebase in ...Missing: history | Show results with:history
  74. [74]
    UniProtKB | Statistics | UniProt
    This is release 2025_04 of UniProtKB, published on Wed Oct 15 2025 . Previous release statistics are available from the UniProt FTP server.
  75. [75]
    UniProt: the Universal Protein Knowledgebase in 2025
    Nov 18, 2024 · UniProt release 2024_04 contains approximately 246 million sequence records in UniProtKB. This represents a relatively conservative increase ...
  76. [76]
    Why is UniProtKB composed of 2 sections, UniProtKB/Swiss-Prot ...
    Apr 27, 2022 · The TrEMBL section of UniProtKB was introduced in 1996 in response to the increased dataflow resulting from genome projects. It was already ...
  77. [77]
    UniProt Knowledgebase (UniProtKB)
    The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich ...Missing: components | Show results with:components
  78. [78]
    UniRef - UniProt
    The UniProt Reference Clusters (UniRef) provide clustered sets of sequences from the UniProt Knowledgebase (including isoforms) and selected UniParc records.What is a UniRef cluster? · In UniRef search (704786086) · UniRef100Missing: components | Show results with:components
  79. [79]
    UniRef | UniProt help
    Dec 22, 2022 · The UniProt Reference Clusters (UniRef) provide clustered sets of sequences from the UniProt Knowledgebase (including isoforms) and selected UniParc recordsMissing: components | Show results with:components
  80. [80]
    UniParc sequence archive - UniProt
    The UniProt Archive (UniParc) is a comprehensive and non-redundant database of protein sequences. These sequences are sourced from public sequence databases.Missing: current | Show results with:current
  81. [81]
    Sequence annotation (Features) | UniProt help
    Sep 16, 2024 · Sequence annotations describe regions or sites of interest in the protein sequence, such as post-translational modifications, binding sites, enzyme active ...
  82. [82]
    UniProt: a worldwide hub of protein knowledge - PMC
    Nov 5, 2018 · The UniProt Knowledgebase is a collection of sequences and annotations for over 120 million proteins across all branches of life.
  83. [83]
    Gene Ontology (GO) | UniProt help
    Oct 16, 2025 · The Gene Ontology (GO) is a collaborative effort to provide structured, standardized descriptions of gene products (protein or ncRNA) across biological ...Missing: domains PTMs interactions
  84. [84]
    How do we manually annotate a UniProtKB entry? | UniProt help
    Jun 11, 2025 · This process consists of 6 major mandatory steps: (1) sequence curation, (2) sequence analysis, (3) literature curation, (4) family-based curation, (5) ...Missing: AI | Show results with:AI
  85. [85]
    Annotation guidelines | UniProt help
    Oct 16, 2025 · This document describes the manual curation procedure used by the UniProt Consortium members. The UniProt manual curation process comprises ...Missing: AI | Show results with:AI
  86. [86]
    Activities at the Universal Protein Resource (UniProt) - PMC
    UniProt biocuration. Manual and automatic annotation in UniProtKB.UniProt leads the world in providing full and comprehensive curation of the experimental data ...
  87. [87]
    Expert curation in UniProtKB: a case study on dealing with ...
    Mar 12, 2014 · To safely propagate information, we have developed the UniRule system, which uses manually curated annotation rules to enrich uncharacterized ...Introduction · The Sirt5 Case · Collaboration With Other...
  88. [88]
    UniProt Knowledgebase annotation process. Manually protein ...
    Two main approaches to this curation challenge adopted by biological knowledgebases are: 1) artificial intelligence (AI) and machine learning (ML) methods, and ...
  89. [89]
    UniProt and Mass Spectrometry-Based Proteomics—A 2-Way ...
    Reference sets of whole-proteome sequence, including human, made available. •. Includes isoforms, postprocessed chains, PTMs, variants, and functional domains.
  90. [90]
    Downloads | UniProt help
    Oct 16, 2025 · UniProt is updated roughly every eight weeks. You can download small data sets and subsets directly from this website by following the download link.
  91. [91]
    Programmatic access - Downloading data at every UniProt release
    Oct 16, 2025 · You can use the HTTP header X-UniProt-Release-Date to avoid downloading data more than once per release, if you use a download tool that makes use of this ...Missing: applications drug discovery
  92. [92]
  93. [93]
    wwPDB: 2003 News - Worldwide Protein Data Bank
    Nov 21, 2003 · Kim Henrick, head of the PDBe said, "The PDB is a canonical research resource that transcends both scientific and political boundaries. The ...
  94. [94]
    improved findability of macromolecular structure data in the PDB
    Nov 6, 2019 · PDBe is responsible for the processing of OneDep depositions from European and African institutions, totalling over 4,000 PDB entries in 2018, ...
  95. [95]
  96. [96]
    PDB Statistics: Overall Growth of Released Structures Per Year
    PDB Statistics: Overall Growth of Released Structures Per Year ; 2023, 214,192, 14,501 ; 2022, 199,691, 14,290 ; 2021, 185,401, 12,586 ; 2020, 172,815, 14,006.Missing: queries | Show results with:queries
  97. [97]
    PDBe-KB: a community-driven resource for structural and functional ...
    Oct 4, 2019 · PDBe-KB integrates data contributed by partner resources who provide a wide array of functional and biophysical annotations for PDB structures.
  98. [98]
    AlphaFold Protein Structure Database in 2024 - Oxford Academic
    Nov 2, 2023 · Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated ...Missing: upgrades driven
  99. [99]
    Mol* Viewer: modern web app for 3D visualization and analysis of ...
    It is the primary 3D structure viewer used by PDBe and RCSB PDB. It can be easily integrated into third-party services. Mol* Viewer is open source and freely ...
  100. [100]
    Protein Data Bank: Key to the Molecules of Life - NSF Impacts
    Structuring discoveries. With over 60,000 contributors globally and millions of annual users, the PDB has revolutionized drug development, informed ...
  101. [101]
    Download Statistics - wwPDB
    Annual Download Statistics ; 2023, 3,102,043,501, 2,035,853,611, 1,066,189,890 ; 2022, 3,134,697,434, 2,135,291,607, 999,405,827.
  102. [102]
    AlphaFold Protein Structure Database: massively expanding the ...
    Nov 17, 2021 · The initial release of AlphaFold DB contains over 360,000 predicted structures across 21 model-organism proteomes, which will soon be expanded ...
  103. [103]
    pLDDT: Understanding local confidence | AlphaFold - EMBL-EBI
    Feb 26, 2024 · pLDDT is a per-residue measure of local confidence. It is scaled from 0 to 100, with higher scores indicating higher confidence and usually a more accurate ...Eukaryotic Translation... · Test Your Knowledge · C-Type Lectin-Like Domain...
  104. [104]
    FAQs - AlphaFold Protein Structure Database
    Foldseek has been integrated into the AlphaFold Database, enabling easy access to similar structures across both experimentally determined structures from the ...
  105. [105]
    AlphaFold Protein Structure Database - EMBL-EBI
    AlphaFold DB provides open access to over 200 million protein structure predictions, generated by an AI system, to accelerate scientific research.Missing: launch | Show results with:launch
  106. [106]
    AlphaFold Protein Structure Database in 2024 - PubMed
    Jan 5, 2024 · The AlphaFold Protein Structure Database (AlphaFold DB) is a massive digital library of predicted protein structures, with over 214 million entries.Missing: 250 EMBL-
  107. [107]
    Structure annotation in UniProt
    Oct 23, 2025 · AlphaFold structural models. A 3D structural model from AlphaFold prediction is always presented as a full-length monomeric protein based on the ...Missing: organisms integration
  108. [108]
  109. [109]
    Alphafold2 protein structure prediction : Implications for drug discovery
    Jan 6, 2023 · The AlphaFold database, hosted at EMBL-EBI (https://alphafold.ebi.ac.uk/), provides free access for everyone to more than 200million protein ...
  110. [110]
    The rise of AlphaFold in drug design - ScienceDirect.com
    In this chapter we discuss advances across the drug design process, including target identification and validation, the acceleration of hit-finding campaigns ...
  111. [111]
    Accurate structure prediction of biomolecular interactions ... - Nature
    May 8, 2024 · Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes.
  112. [112]
    How our principles helped define AlphaFold's release
    Sep 14, 2022 · Through our discussions with external experts, it became clearer that AlphaFold would not make it meaningfully easier to cause harm with ...
  113. [113]
    Security challenges by AI-assisted protein design - NIH
    Mar 26, 2024 · Scientists and security experts are concerned that the increasing power of AI-assisted protein design and synthesis could be abused by various actors for ...
  114. [114]
    EMBL-EBI Job Dispatcher sequence analysis tools framework in 2024
    Apr 10, 2024 · The EMBL-EBI Job Dispatcher sequence analysis tools framework in 2024 Open Access · Abstract · Introduction · New website · New documentation.
  115. [115]
    Sequence similarity searches / BLAST submissions | UniProt help
    Jun 5, 2025 · BLAST (Basic Local Alignment Search Tool) is a widely used algorithm in bioinformatics that identifies regions of similarity between biological sequences.Missing: Ensembl | Show results with:Ensembl
  116. [116]
    Search and sequence analysis tools services from EMBL-EBI in 2022
    Apr 12, 2022 · The sheer volume of data being generated during the COVID-19 pandemic has resulted in an average of 2.5 million requests per day to the EBI ...
  117. [117]
    Fast, scalable generation of high-quality protein multiple sequence ...
    Oct 11, 2011 · In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate ...
  118. [118]
    Clustal Omega for making accurate alignments of many protein ...
    In general, Clustal Omega is fast enough to make very large alignments and the accuracy of protein alignments is high when compared to alternative packages. ...
  119. [119]
  120. [120]
    EMBL-EBI data resources and tools
    EMBL's European Bioinformatics Institute maintains the world's most comprehensive range of freely available and up-to-date molecular data resources.
  121. [121]
    InterPro - EMBL-EBI
    InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites. To classify proteins in this ...
  122. [122]
    InterPro: the protein sequence classification resource in 2025 - PMC
    Nov 20, 2024 · InterPro (https://www.ebi.ac.uk/interpro) is a freely accessible resource for the classification of protein sequences into families.
  123. [123]
    InterPro 105.0: AI for protein classification | EMBL-EBI
    Apr 28, 2025 · InterPro 105.0 is now live. This AI-driven update makes it easier than ever to explore the protein universe.
  124. [124]
    ChEBI - Chemical Entities of Biological Interest - EMBL-EBI
    Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on 'small' chemical compounds.
  125. [125]
    ChEBI 2.0 launches | EMBL-EBI
    Oct 20, 2025 · We help scientists exploit complex information to make discoveries that benefit humankind.
  126. [126]
    Home < Expression Atlas < EMBL-EBI
    EMBL-EBI Expression Atlas, an open public repository of gene expression pattern data under different biological conditions.
  127. [127]
    Expression Atlas update: gene and protein expression in multiple ...
    Nov 24, 2021 · The EMBL-EBI Expression Atlas is an added value knowledge base that enables researchers to answer the question of where (tissue, organism part, ...
  128. [128]
    Reactome Pathway Database: Home
    Reactome is pathway database which provides intuitive bioinformatics tools for the visualisation, interpretation and analysis of pathway knowledge.
  129. [129]
    The reactome pathway knowledgebase - Oxford Academic
    Nov 6, 2019 · Abstract. The Reactome Knowledgebase (https://reactome.org) provides molecular details of signal transduction, transport, DNA replication, ...
  130. [130]
    Ensembl Variant Effect Predictor (VEP)
    Ensembl VEP predicts the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on gene transcripts and protein sequence.
  131. [131]
    The European Bioinformatics Institute (EMBL-EBI) in 2021 - PMC
    Nov 25, 2021 · EMBL-EBI services are crucial to the life science community. On an average day in 2020, EMBL-EBI resources received over 81 million web and API ...
  132. [132]
    EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022
    Dec 7, 2022 · FUNDING. EMBL-EBI is indebted to its funders, including the EMBL member states; European Commission; Wellcome; UK Research and Innovation; US ...
  133. [133]
    [PDF] The EMBL Programme 2022–2026 - Molecules to Ecosystems
    The EMBL Programme 2022-2026, "Molecules to Ecosystems", covers topics like molecular building blocks, microbial ecosystems, and human ecosystems.
  134. [134]
    Immunology meets single-cell genomics | EMBL-EBI
    Mar 8, 2016 · A new method in single-cell genomics, TraCeR, provides a powerful tool for research into immune response, vaccination, ...
  135. [135]
    Microbiome Informatics - EMBL-EBI
    We aim to simplify access to curated, complex data, and to maximise biological knowledge by extending annotation based on sequence similarity.
  136. [136]
    COVID-19 Data Portal - accelerating scientific research through data
    Access the latest COVID-19 related datasets from EMBL-EBI and other biological data repositories. Part of the European COVID-19 Data Platform.
  137. [137]
    The COVID-19 Data Portal: accelerating SARS-CoV-2 and ... - NIH
    May 28, 2021 · The resource spans biomolecular data, comprising experimental and computational data types related to the genome, genes, proteins of SARS-CoV-2, ...
  138. [138]
    EMBL-EBI's open data resources for biodiversity and climate research
    Sep 13, 2024 · EMBL-EBI's Biodiversity Portal is the first port of call for scientists who want to access biomolecular data for their biodiversity research.
  139. [139]
    Scientific publications - EMBL-EBI
    EMBL-EBI researchers regularly produce high impact publications and strive to make their research and data open access.
  140. [140]
    OpenEBench - ELIXIR Europe
    OpenEBench (https://openebench.bsc.es) is the ELIXIR benchmarking and technical monitoring platform for bioinformatics tools, web servers and workflows.
  141. [141]
    EMBL-EBI and Google DeepMind renew partnership and release ...
    Oct 7, 2025 · Deeper collaboration between EMBL-EBI and Google DeepMind brings updates to the AlphaFold Database.
  142. [142]
    EMBL's European Bioinformatics Institute (EMBL-EBI) in 2024
    Nov 28, 2024 · EMBL-EBI's vision is to benefit humankind by advancing scientific discovery and impact through bioinformatics.
  143. [143]
    A federated future to support genomic medicine | EMBL
    Mar 7, 2025 · The Federated European Genome-phenome Archive enables secure access, sharing, and reuse for sensitive genomic data.
  144. [144]
    Fuelling discovery together: 2024 user survey learnings - EMBL-EBI
    Dec 17, 2024 · In summer 2024, EMBL-EBI ran a user survey, inviting our community to let us know how they use the open data resources we jointly manage with our collaborators.
  145. [145]
    Training – Annual Report - European Molecular Biology Laboratory
    Notably, in 2024, the European Commission funded EMBL's unique infrastructure training programme, ARISE2. ... 8,246 participants from 101 countries attended ...
  146. [146]
    About EMBL-EBI Training
  147. [147]
    Our partnerships - EMBL-EBI Training
    We collaborate with partners across the world to deliver training, build competence and capacity, and empower scientists to gain new scientific insights.
  148. [148]
    2025 – Course and Conference Office
    Introducing Ada: the new AI assistant from EMBL-EBI Training. We can't think of a better way to honour her in 2025 than by naming our LLM-driven, AI assistant ...
  149. [149]
    Human Ecosystems Retreat 2025
    Apr 28, 2025 · Human Ecosystems Retreat 2025. We were delighted to host this year's Human Ecosystems Retreat at European Bioinformatics Institute | EMBL-EBI.
  150. [150]
    We are EMBL: Kim Gurwitz on working with global partners to ...
    Jul 11, 2024 · The EMBL-EBI Training team delivers an extensive training programme including workshops, courses, and free, on-demand e-learning.