ENCODE

The Encyclopedia of DNA Elements (ENCODE) is an international public research consortium that aims to systematically identify and catalog all functional elements (such as protein-coding genes, non-coding RNAs, regulatory sequences, and chromatin structures) within the human and mouse genomes, in order to understand their roles in gene regulation and biological function. Launched in September 2003 by the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH), ENCODE began as a pilot phase that tested technologies on 44 selected regions representing 1% of the human genome, focusing on transcription, histone modifications, and transcription factor binding. This initial effort, completed by 2007, demonstrated the feasibility of large-scale functional annotation and paved the way for genome-wide expansion.

Subsequent phases from 2007 to 2022, including ENCODE 3 (2013–2020) and ENCODE 4 (2017–2022), scaled up to produce comprehensive maps across diverse cell types, tissues, and developmental stages in both humans and mice, incorporating advanced sequencing-based assays, DNase I hypersensitivity mapping, and single-cell profiling. Funded primarily by NHGRI with contributions from international partners, the project emphasizes open data sharing through the ENCODE Data Portal and integrations with resources like the UCSC Genome Browser. Key achievements include the 2012 integrated analysis reporting that over 80% of the human genome shows biochemical activity, the identification of millions of candidate cis-regulatory elements, and the creation of tools like the SCREEN database for querying functional annotations. By 2020, ENCODE had generated data from approximately 6,000 experiments, supporting more than 2,000 publications; by 2024, this had expanded to over 23,000 experiments across more than 800 cell types and tissues, advancing research into gene regulation and disease mechanisms. As of 2025, more than two decades after its launch, ENCODE continues to evolve by incorporating functional validation experiments and computational methods for data interpretation, fostering broader applications in precision medicine.

Overview

Project Goals

The Encyclopedia of DNA Elements (ENCODE) is a public research consortium launched in 2003 to systematically identify all functional elements within the human genome, encompassing protein-coding genes, non-coding RNA loci, regulatory sequences, and structural components. Operationally, ENCODE defines a functional element as a discrete genomic segment that either encodes a defined biochemical product, such as a protein or RNA, or exhibits a reproducible biochemical signature, including sites of protein binding, specific chromatin structures, or altered rates of chemical reactivity. This comprehensive mapping aims to catalog these elements across diverse cell types and conditions to reveal the genome's organizational principles.

The primary goals of ENCODE include constructing an exhaustive "parts list" of functional elements operating at the protein, RNA, and regulatory levels, while developing robust experimental and computational methods to annotate their roles. By integrating these annotations, the project seeks to elucidate how the genome functions in cellular processes. This integration extends to understanding the complex regulatory networks that control gene expression, thereby bridging sequence data with biological outcomes.

The Human Genome Project completed the human genome sequence in 2003 and showed that only about 2% of it is protein-coding, leaving the vast non-coding regions largely enigmatic. ENCODE was established to fill these knowledge gaps by probing the functional significance of non-coding DNA and the intricacies of gene regulation. In the long term, ENCODE's data resource is envisioned to enable predictive models of genome function, facilitating the interpretation of genetic variation and advancing applications in personalized medicine through a better understanding of disease susceptibility and therapeutic responses.

Scope and Methods

The ENCODE project initially concentrated on the human genome, with its pilot phase examining approximately 1% of the genomic sequence across 44 carefully selected regions to test feasibility and methods, while subsequent production efforts scaled to comprehensive, genome-wide coverage. This scope later expanded to encompass the mouse genome, enabling cross-species comparisons of functional elements.

As of 2025, ENCODE has profiled a diverse collection of hundreds of cell lines, primary cells, and tissues, including embryonic stem cells, differentiated cell types from various developmental stages, and cancer-derived lines such as those from hematopoietic and epithelial origins. ENCODE 3 encompassed data from more than 500 biological cell and tissue types sourced from over 1,300 samples; building on it, the project now includes over 29,000 biosamples, giving broad representation of human and mouse biosamples. The project continued to expand during ENCODE 4 (2017–2022) and had generated over 100,000 datasets as of 2024, incorporating advanced techniques such as single-nucleus profiling.

Central to ENCODE's approach are high-throughput sequencing-based assays, including ChIP-seq to map transcription factor binding and histone modifications, RNA-seq for quantifying transcripts and identifying non-coding RNAs, DNase-seq (or ATAC-seq in later iterations) to detect open chromatin regions, and bisulfite sequencing for DNA methylation patterns. These methods generate complementary epigenomic, transcriptomic, and chromatin accessibility data, which are integrated to define candidate functional elements across the genome.

Reproducibility is prioritized through standardized data production, featuring uniform pipelines that apply consistent read-mapping and peak-calling algorithms to raw sequencing data, alongside quality metrics such as library complexity, signal-to-noise ratios, and replicate correlations. Metadata standards mandate detailed documentation of biosample origins, protocols, and reagent validations, including antibody specificity tests for ChIP-seq, ensuring comparability and reliability across experiments. Over time, ENCODE's methods have advanced to include single-cell resolution assays, such as single-cell RNA-seq and ATAC-seq, for dissecting heterogeneity within cell populations, as well as CRISPR-based perturbation screens to validate the regulatory functions of mapped elements.
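The library complexity metric mentioned above can be illustrated with a short sketch. It assumes the commonly cited definitions of the non-redundant fraction (NRF) and the PCR bottlenecking coefficients (PBC1, PBC2); the function name and the input representation (one tuple per mapped read) are hypothetical choices for illustration and are not taken from the ENCODE pipelines themselves.

```python
from collections import Counter

def library_complexity(read_positions):
    """Compute NRF, PBC1 and PBC2 from mapped read positions.

    read_positions: iterable of hashable keys, e.g. (chrom, start, strand),
    one entry per uniquely mapped read (assumed input format).
    """
    counts = Counter(read_positions)
    total = sum(counts.values())   # all mapped reads
    distinct = len(counts)         # distinct genomic positions
    m1 = sum(1 for n in counts.values() if n == 1)  # positions hit exactly once
    m2 = sum(1 for n in counts.values() if n == 2)  # positions hit exactly twice
    return {
        "NRF": distinct / total if total else float("nan"),
        "PBC1": m1 / distinct if distinct else float("nan"),
        # With no doubly covered positions the ratio is undefined;
        # this sketch reports infinity in that case.
        "PBC2": m1 / m2 if m2 else float("inf"),
    }

if __name__ == "__main__":
    reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 250, "-"), ("chr2", 40, "+")]
    print(library_complexity(reads))
```

Values of NRF and PBC1 close to 1 indicate that most reads map to distinct positions, which is what a complex, lightly amplified library would be expected to produce.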

History

Pilot Phase (2003–2007)

The ENCODE Pilot Phase was launched in September 2003 by the National Human Genome Research Institute (NHGRI) as an international consortium involving more than 30 academic, government, and private sector institutions to test methods for identifying functional elements across the human genome. The initiative received approximately $40 million in funding over its duration to support coordinated efforts in developing high-throughput experimental and computational approaches. Target regions were selected to represent about 1% of the euchromatic genome, totaling roughly 30 megabases across 44 discrete segments distributed over multiple chromosomes, with choices emphasizing variation in gene density, evolutionary conservation, and inclusion of well-studied loci to ensure diverse representation.

During this phase, the consortium tested over 20 experimental assays on approximately 12 cell types and tissues, including cell lines such as HeLa S3, GM06990, K562, and HepG2, to generate initial maps of transcriptional activity, chromatin structure, replication timing, and other biochemical features. Key methods encompassed chromatin immunoprecipitation followed by microarray analysis (ChIP-chip), DNase I hypersensitivity assays, and tiling array-based transcription profiling, producing more than 200 datasets that captured diverse aspects of genome function in the selected regions. These efforts focused on evaluating the reliability and scalability of technologies, primarily array-based at the time, while generating public data releases to foster community integration and validation.

The pilot phase results demonstrated the feasibility of scaling functional element annotation to the full genome, revealing pervasive transcription across 93% of the targeted bases and identifying thousands of transcription start sites, regulatory elements, and constrained sequences, with about 5% of bases showing specific biochemical signatures.
Notably, the project annotated approximately 60% of evolutionarily constrained bases as functional, highlighting unexpected complexity in non-coding regions and informing strategies for genome-wide production. These findings were published in 2007 in multiple coordinated papers that detailed the integrated analyses.

Challenges encountered included technological constraints, such as reliance on array-based methods that limited resolution compared to emerging sequencing technologies, and difficulties in integrating heterogeneous datasets from diverse assays and cell types to achieve a unified view of functionality. Additionally, the pilot underscored gaps in detecting distal regulatory elements due to incomplete profiling and variability in functional signals across biological contexts, paving the way for methodological refinements in subsequent phases.

Production Phase (2007–2012)

Following the successful pilot phase, the ENCODE project transitioned in 2007 to its production phase, supported by funding from the National Human Genome Research Institute (NHGRI) to conduct genome-wide assays across more than 100 cell types. This scale-up involved 442 researchers from 32 laboratories worldwide, enabling the systematic mapping of functional elements throughout the human genome using high-throughput sequencing technologies. The effort generated over 1,600 experimental datasets, focusing on diverse biochemical activities in 147 cell types and tissues.

The production phase produced extensive data on transcription units, regulatory elements, and evolutionary conservation, culminating in approximately 30 publications released on September 5, 2012, including six in Nature, five in Genome Research, and others in Genome Biology. These papers detailed RNA sequencing for transcript identification, chromatin immunoprecipitation sequencing (ChIP-seq) for transcription factor binding sites, DNase I hypersensitive sites for open chromatin, and comparative analyses for conserved sequences. Key outputs included maps of over 4 million regulatory regions, such as 399,124 enhancer-like and 70,292 promoter-like elements, providing a comprehensive view of cis-regulatory landscapes.

A major achievement was the annotation of approximately 80% of the human genome as biochemically active, based on evidence of transcription, protein binding, or chromatin structure, challenging prior views of non-coding "junk" DNA. This work identified candidate cis-regulatory elements, establishing a foundational framework that informed subsequent developments like the formalized candidate cis-regulatory elements (cCREs) registry. Milestones included the integration of ENCODE data into the UCSC Genome Browser for visualization and the establishment of the ENCODE Data Coordination Center (DCC) to standardize, process, and release datasets publicly.
The production phase formally concluded in 2012 after five years of intensive effort, having expended about $123 million in NHGRI funding, though the data generated continued to support ongoing genomic research.

ENCODE 3 and Ongoing Work (2013–Present)

The third phase of the ENCODE project, ENCODE 3, launched in 2013 following the renewal of funding by the National Human Genome Research Institute (NHGRI), with a primary emphasis on validating the functional roles of previously identified genomic elements, advancing single-cell resolution profiling, and conducting comparative studies between human and mouse. This phase generated nearly 6,000 new experiments (4,834 in human cell types and tissues, and 1,158 in mouse) to deepen the annotation of regulatory elements and their dynamic roles across diverse biological contexts. Functional validation efforts included assays such as massively parallel reporter assays and transgenic models to test enhancer activity, confirming regulatory potential in a subset of candidate elements.

Key advancements in ENCODE 3 encompassed the development of a comprehensive registry of candidate cis-regulatory elements (cCREs), cataloging over 1.2 million elements across the human (926,535) and mouse (339,815) genomes based on integrated epigenetic and transcriptional data, serving as a foundational resource for prioritizing non-coding variants. The project integrated CRISPR-based perturbation screens, including CRISPR interference and activation RNA-seq datasets, to establish causal links between cCREs and target gene expression, thereby bridging correlative annotations with mechanistic insights. Profiling efforts were expanded to include developmental stages, generating single-cell RNA-seq data from mouse tissues like the embryonic limb to capture cell-type-specific trajectories and regulatory dynamics during differentiation.

Ongoing work as of 2025 continues to build on these foundations through iterative data releases and portal enhancements, with a redesigned homepage and advanced search tools introduced in October 2025 to improve dataset discovery and access. Recent updates include the release of additional ChIP-seq experiments from collaborative efforts like modERN, enhancing transcription factor binding maps in non-mammalian model organisms.
Integration of artificial intelligence (AI) and machine learning has emerged as a core component, with models trained on ENCODE data enabling automated prediction of chromatin accessibility patterns to accelerate hypothesis generation. A landmark 2020 collection in Nature synthesized ENCODE 3 outcomes, detailing expanded assays for RNA-binding proteins, chromatin looping, and disease-relevant cell types while emphasizing the project's role in interpreting non-coding genome function. In 2025, publications have leveraged ENCODE resources for single-cell epigenomics studies, such as profiling chromatin accessibility across aging mouse brain regions to reveal heterochromatin instability, and for AI-driven analyses, including models that decode regulatory grammars in non-coding DNA using ENCODE-derived training sets.

Future directions for ENCODE emphasize broadening to additional species, such as non-human primates and other vertebrates, to elucidate the evolutionary conservation of functional elements, alongside an increased focus on disease models to map regulatory disruptions in conditions like cancer and neurodegeneration.

Organization and Consortium

Structure and Governance

The ENCODE Consortium operates as a collaborative network led by the National Human Genome Research Institute (NHGRI), encompassing over 30 production centers, analysis groups, and supporting facilities dedicated to generating and interpreting functional genomic data. Central to this structure is the Data Coordination Center (DCC) at the University of California, Santa Cruz (UCSC), which manages data submission, quality control, standardization, and public dissemination. Additional components include analysis working groups that coordinate computational efforts and integrate findings across experiments.

Governance of the consortium is provided by NHGRI program directors, who oversee operations, with the ENCODE Research Consortium Steering Committee serving as the primary coordinating body to establish research priorities, resolve issues, and ensure alignment with project goals. The consortium adheres to a strict open-access policy, mandating that all data be released to public repositories within nine months of generation to facilitate broad scientific use and collaboration.

Funding for ENCODE has been provided primarily through NHGRI grants via competitive requests for applications (RFAs). The pilot phase from 2003 to 2007 received $36 million over three years to test methods on 1% of the human genome. The subsequent production phase (2007–2012) was supported with approximately $120 million to scale analyses genome-wide. Funding was renewed for ENCODE 3 (2013–2020) with grants supporting expansion of assay types and data integration. ENCODE 4 (2017–2022) concluded the funded phases of the project. Since its completion in 2022, the consortium's data resources have remained actively maintained and used in ongoing genomic research as of 2025.

Consortium policies include data use agreements that promote unrestricted access while encouraging collaborative analyses and proper attribution to the ENCODE Consortium.
Software and analysis tools, such as uniform processing pipelines for data harmonization, must be released openly to support reproducibility and community adoption.

Key Participants and Collaborations

The ENCODE project has been led by a core group of principal investigators who steered its scientific direction and consortium activities. Key figures include Ewan Birney, associate director at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), who contributed to data integration and analysis strategies; Michael Snyder, professor and chair of genetics at Stanford University, who focused on developing high-throughput functional genomics assays; Bradley E. Bernstein from the Broad Institute, who advanced epigenomic mapping techniques; and others such as Gregory E. Crawford from Duke University, Job Dekker from the University of Massachusetts Medical School, and Laura Elnitski from the National Human Genome Research Institute (NHGRI), who served on the steering committee during the production phase. These leaders coordinated efforts among 442 scientists across multiple institutions, culminating in the 2012 publication of 30 coordinated papers that synthesized the project's initial comprehensive findings.

Key institutions forming the backbone of the ENCODE consortium include the Broad Institute of MIT and Harvard, which pioneered large-scale epigenomic profiling; Stanford University, central to assay development and data generation; the University of California, Santa Cruz (UCSC), responsible for genome browser integration and data visualization tools; and Cold Spring Harbor Laboratory (CSHL), which contributed to chromatin structure and 3D genome mapping studies. International participation was bolstered by EMBL-EBI, which handled data archiving, standards development, and global dissemination. The consortium expanded to over 30 institutions by the production phase, fostering interdisciplinary expertise in genomics and bioinformatics.

ENCODE has established significant collaborations to enhance its data's utility for variant interpretation and tissue-specific analysis.
Integration with the 1000 Genomes Project incorporated genotype and sequence data from lymphoblastoid cell lines like GM12878, enabling the annotation of common genetic variants against ENCODE's functional element maps to identify regulatory impacts. Partnerships with the Genotype-Tissue Expression (GTEx) project, particularly through the Enhancing GTEx (eGTEx) initiative, combined ENCODE's epigenomic profiles with GTEx's RNA expression data across 54 tissue types, revealing tissue-specific regulatory mechanisms and quantitative trait loci (QTLs).

The consortium emphasized inclusion of early-career researchers and individuals from underrepresented groups to broaden participation and perspectives in genomics research. NHGRI-supported programs within ENCODE facilitated training workshops, data analysis challenges, and mentorship opportunities aimed at early-stage investigators from diverse backgrounds, including those historically underrepresented in the field.

Notable contributions from specific labs have advanced ENCODE's analytical framework, such as the Stanford lab of Anshul Kundaje, which led the development of computational pipelines for integrative analysis, including uniform processing of epigenomic data and models for predicting chromatin states and regulatory elements. These methods enabled scalable imputation of missing data types and improved the accuracy of functional annotations across the consortium's datasets.

Data Production and Types

Experimental Assays and Technologies

The ENCODE project initially employed array-based assays during its pilot phase (2003–2007), such as ChIP-chip for mapping protein-DNA interactions and tiling arrays for transcript identification, which provided targeted but limited genome coverage. With the advent of next-generation sequencing (NGS) technologies around 2007, the consortium shifted to sequence-based methods, enabling comprehensive, high-resolution genome-wide profiling across diverse cell types and tissues. This transition marked a pivotal technological advance, allowing assays like ChIP-seq to identify over 636,000 binding regions for 119 DNA-associated proteins in 72 cell lines by 2012.

Sequence-based assays form the core of ENCODE data production, including ChIP-seq for transcription factor and histone mark mapping, which has profiled 662 proteins and 11 histone modifications across 79 and 12 tissues in phase 3 (2013–2019). ATAC-seq, introduced in later phases for rapid assessment of chromatin accessibility, has generated profiles from 66 tissues across developmental stages, revealing over 500,000 accessible regions, and extended to 48 tissues. DNase-seq, an earlier accessibility assay, complements these by cataloging 3.6 million DNase hypersensitive sites across more than 200 cell types. RNA assays include CAGE for precise transcription start site mapping, identifying 62,403 sites in tier 1 and 2 cell types, and polyA+ RNA-seq for quantifying mature transcripts, covering 39.54% of the genome from promoter to polyA site. Total RNA-seq further captures long non-coding RNAs, with 62% coverage in multiple subcellular fractions.

In ENCODE 3 (2013–2019), technological innovations expanded to single-nucleus assays, facilitating profiling of rare cell types in complex tissues like the developing mouse limb via single-nucleus RNA-seq and ATAC-seq, enhancing cell-type-specific functional element annotations.
Functional assays were integrated to test regulatory predictions, including massively parallel reporter assays (MPRA) that validated 67 out of 151 candidate cis-regulatory elements (cCREs) in transgenic mouse models, with 44% showing activity in human cell lines like GM12878. In ENCODE 4 (2017–2022), these efforts continued with advanced functional validation, including CRISPR-based perturbations such as CRISPRi and CRISPRa, enabling direct assessment of element function through targeted interference or activation, as seen in over 540,000 noncoding perturbations covering 24.85 Mb of the human genome. Phase 4 also introduced new assays like Perturb-seq for combined perturbation and single-cell readout, SPEAR-ATAC for single-cell ATAC with perturbation, and long-read single-cell RNA-seq to capture full-length transcripts in diverse cell types.

Quality metrics ensure data reliability, with requirements for at least two biological replicates per assay to assess reproducibility via the Irreproducible Discovery Rate (IDR), targeting thresholds below 0.1 for peak calls in ChIP-seq and DNase-seq. Signal-to-noise ratios are evaluated using the Fraction of Reads in Peaks (FRiP), where values above 0.3 indicate strong enrichment, and the Signal Portion Of Tags (SPOT) score, with higher values (approaching 1.0) reflecting minimal background. Cross-lab standardization is achieved through uniform experimental guidelines, antibody validation protocols, and shared processing pipelines, applied consistently across the consortium's nearly 6,000 phase 3 experiments and over 23,000 released experiments as of 2025. Innovations like PRO-seq for nascent transcription mapping, which labels engaged RNA polymerase at single-nucleotide resolution, further refine these standards by integrating with existing assays to pinpoint active enhancers and promoters.
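As an illustration of the FRiP metric described above, the following sketch counts the fraction of reads that overlap at least one called peak. It is a minimal in-memory version under stated assumptions: intervals are half-open tuples of the form (chrom, start, end), the function names are hypothetical, and production pipelines would operate on BAM and peak files rather than Python lists.

```python
from bisect import bisect_right
from collections import defaultdict

def merge_peaks(peaks):
    """Group peaks by chromosome, sort them, and merge overlapping or touching intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged

def frip(reads, peaks):
    """Fraction of reads overlapping at least one peak (half-open coordinates)."""
    merged = merge_peaks(peaks)
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in merged.items()}
    in_peaks = 0
    for chrom, read_start, read_end in reads:
        intervals = merged.get(chrom)
        if not intervals:
            continue
        # Last merged peak whose start lies before the end of the read.
        i = bisect_right(starts[chrom], read_end - 1) - 1
        if i >= 0 and intervals[i][1] > read_start:
            in_peaks += 1
    return in_peaks / len(reads) if reads else 0.0

if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 150, 300)]
    reads = [("chr1", 90, 110), ("chr1", 400, 420)]
    print(frip(reads, peaks))
```

Merging the peaks first keeps the per-read lookup to a single binary search, because after merging the peaks on a chromosome are disjoint and sorted.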

Key Findings from Data

The ENCODE project's analysis of the human genome revealed that approximately 80% of the genome exhibits biochemical activity, such as transcription, open chromatin, or binding by regulatory factors, challenging earlier views of non-coding regions as largely inert. However, this pervasive transcription was found to be under low evolutionary constraint, suggesting that much of this activity may represent transcriptional noise rather than conserved functional elements. These findings highlighted the complexity of the regulatory landscape, where biochemical signals provide a broad map of potential regulatory roles without necessarily implying strict functionality.

ENCODE data enabled the systematic identification and annotation of key regulatory element types, including enhancers, promoters, and insulators, which collectively orchestrate gene expression. For instance, the project cataloged tens of thousands to over 100,000 candidate regulatory elements, including enhancers, per cell type, demonstrating their abundance and role in fine-tuning transcriptional output across diverse cellular contexts. These elements were distinguished through integrated analyses of chromatin states and transcription factor binding, revealing a modular architecture that supports combinatorial regulation of genes.

A major insight from ENCODE was the high degree of cell-type specificity in regulatory elements, with dynamic patterns observed across hundreds of biosamples representing various tissues and conditions. These variations underscore how enhancers and other elements activate or repress genes in a context-dependent manner, linking regulatory landscapes to cellular identity. Furthermore, overlaps between ENCODE-identified elements and genome-wide association study (GWAS) loci have implicated non-coding variants in disease susceptibility, such as those associated with autoimmune disorders and cancers, by disrupting regulatory functions.
Comparative analyses with mouse ENCODE data showed substantial conservation, with 60–80% overlap in key functional regulatory elements between human and mouse orthologous regions, indicating evolutionary preservation of core regulatory mechanisms. This cross-species conservation reinforced the relevance of ENCODE annotations for understanding mammalian genome regulation.

In subsequent phases, particularly ENCODE 4 (2017–2022), the project advanced to identifying candidate cis-regulatory elements (cCREs) with causal roles in gene regulation, integrating massive-scale functional data to prioritize elements likely to influence expression. These efforts have illuminated the regulatory contributions to developmental processes, such as embryonic specification, and have pinpointed non-coding variants driving phenotypes, including those in complex diseases.

Resources and Tools

ENCODE Data Portal

The ENCODE Data Portal, hosted at encodeproject.org, serves as the primary repository for the project's data and metadata, facilitating discovery, access, and analysis by the scientific community. Launched in 2013, the portal integrates with external resources such as the UCSC Genome Browser through track hubs for genomic visualization and the NCBI databases (including GEO and SRA) for data archiving and retrieval. This infrastructure supports the ENCODE Consortium's goal of providing a comprehensive catalog of functional elements in the human and mouse genomes, with data released under policies that promote widespread reuse while requiring proper attribution.

Key features of the portal enable efficient navigation of its extensive data holdings. Users can perform advanced experiment searches using facets such as assay and biosample, while matrix views summarize experiments by assay type and biosample, including specialized ChIP-seq matrices and body maps for human and mouse samples. As of October 2025, the portal hosts results from over 23,000 experiments and more than 800 functional element characterization experiments, encompassing raw sequencing files, processed alignments, and detailed metadata available for bulk download. Recent updates in 2025 include a redesigned homepage for better data discovery, an enhanced search interface with custom-designed result pages, and improved filtering options to streamline access to complex datasets.

Visualization and programmatic access further enhance the portal's utility. Integration with the WashU Epigenome Browser allows users to interactively explore epigenomic tracks from multiple ENCODE experiments alongside data from other consortia. A REST API enables automated querying and retrieval of metadata, files, and experiment details, supporting computational workflows and large-scale analyses.
All ENCODE data are openly accessible without restrictions, licensed under Creative Commons Attribution 4.0, but users must acknowledge the producing laboratory and cite the relevant dataset (e.g., ENCSR accession) and file (e.g., ENCFF accession) identifiers in publications. The portal's data release policy ensures timely public availability, typically within nine months of generation, to accelerate research while adhering to the FAIR principles of findability, accessibility, interoperability, and reusability.
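The programmatic access described above can be sketched with the Python standard library. The example assumes the portal's search endpoint accepts query parameters such as `type`, `format=json`, and `limit` and returns matching records under an `@graph` key; the filter name `assay_title` and the helper function names are illustrative assumptions and should be checked against the portal's current REST API documentation before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.encodeproject.org"

def build_search_url(filters, limit=5):
    """Build a search URL for Experiment records (parameter names are assumptions)."""
    params = {"type": "Experiment", "format": "json", "limit": str(limit)}
    params.update(filters)
    return f"{BASE_URL}/search/?{urlencode(params)}"

def fetch_experiments(filters, limit=5):
    """Query the portal and return (accession, assay_title) pairs from the response."""
    request = Request(
        build_search_url(filters, limit),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # Search results are assumed to be listed under the "@graph" key.
    return [
        (hit.get("accession"), hit.get("assay_title"))
        for hit in payload.get("@graph", [])
    ]

if __name__ == "__main__":
    # Requires network access; prints a handful of experiment accessions.
    for accession, assay in fetch_experiments({"assay_title": "TF ChIP-seq"}):
        print(accession, assay)
```

Keeping URL construction separate from the network call makes the query easy to inspect or log before any request is sent, and the accessions returned are the identifiers the portal asks users to cite.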

FactorBook and Derived Resources

FactorBook, introduced in 2012, serves as a transcription factor (TF)-centric repository that compiles and analyzes chromatin immunoprecipitation sequencing (ChIP-seq) data from the ENCODE project to identify TF binding sites across the human genome. As of 2025, it integrates results from over 3,300 ENCODE ChIP-seq experiments, providing detailed annotations on binding regions for more than 1,100 human TFs across 185 cell types, including sequence features, chromatin accessibility, and histone modifications surrounding these sites. The resource also catalogs TF binding motifs derived from both ChIP-seq and high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX) experiments, enabling predictions of TF-DNA interactions and target genes. Additionally, FactorBook links binding sites to potential disease associations by cross-referencing with genomic variant databases, facilitating research into regulatory disruptions in human diseases.

Derived from ENCODE data, RegulomeDB is a specialized tool for interpreting non-coding genetic variants by scoring their potential regulatory impact based on transcription factor binding, chromatin states, and evolutionary conservation. Launched in 2012, it aggregates ENCODE ChIP-seq, DNase-seq, and histone mark data to prioritize variants likely to affect gene regulation, such as those in enhancers or promoters implicated in traits and diseases. Users can query specific variants to retrieve evidence tracks from ENCODE, including overlap with motifs and quantitative scores for functional likelihood.

ENCODE-derived resources extend to visualization tools in genome browsers, where pre-processed tracks display TF binding clusters, chromatin states, and regulatory elements for seamless integration with other genomic annotations. In the UCSC Genome Browser, for instance, the ENCODE ChIP-seq Clusters track aggregates binding sites from hundreds of experiments, allowing users to view binding regions and their metrics across cell types.
These tracks, updated periodically with new ENCODE releases, support comparative analyses without requiring raw data downloads from the ENCODE Data Portal.

Computational tools derived from ENCODE include uniform analysis pipelines that standardize the processing of data for reproducibility and comparability. The ENCODE ChIP-seq pipeline, for example, employs the Irreproducible Discovery Rate (IDR) framework to call peaks by assessing replicate consistency, filtering out irreproducible signals to generate high-confidence TF binding sites. This pipeline processes raw sequencing reads through alignment, duplicate removal, and peak thresholding, producing outputs compatible with downstream tools like FactorBook. For candidate cis-regulatory elements (cCREs), ENCODE provides resources such as SCREEN, which clusters and classifies over 2.3 million human cCREs based on epigenetic signals from DNase-seq and histone ChIP-seq, enabling targeted queries for regulatory potential.

These resources have had substantial impact, with FactorBook and related ENCODE derivatives cited in thousands of peer-reviewed publications for advancing understanding of gene regulation and variant function. Recent integrations with machine learning models for predicting TF binding disruptions leverage ENCODE datasets to enhance precision in regulatory genomics. Maintenance involves ongoing synchronization with ENCODE data releases, including expanded motif catalogs from Phases II and III experiments, ensuring the tools remain current for emerging research needs.
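The idea of prioritizing non-coding variants by the regulatory evidence that overlaps them can be sketched as follows. This is a deliberately simplified illustration and not RegulomeDB's actual scoring scheme: the track names, the input format (0-based positions, half-open intervals), and the rule of ranking by the number of supporting tracks are all assumptions made for the example.

```python
from collections import defaultdict

def annotate_variants(variants, tracks):
    """Return, for each variant, the sorted names of tracks with an overlapping interval.

    variants: list of (variant_id, chrom, pos) with 0-based positions.
    tracks:   dict mapping track name -> list of (chrom, start, end), half-open.
    """
    evidence = defaultdict(list)
    for variant_id, chrom, pos in variants:
        for name, intervals in tracks.items():
            if any(c == chrom and s <= pos < e for c, s, e in intervals):
                evidence[variant_id].append(name)
    return {vid: sorted(evidence.get(vid, [])) for vid, _, _ in variants}

def rank_variants(annotations):
    """Order variant IDs by number of supporting tracks (descending), then by ID."""
    return sorted(annotations, key=lambda vid: (-len(annotations[vid]), vid))

if __name__ == "__main__":
    tracks = {
        "TF_ChIP": [("chr1", 100, 200)],
        "DNase": [("chr1", 150, 250)],
    }
    variants = [("varA", "chr1", 160), ("varB", "chr1", 210)]
    annotations = annotate_variants(variants, tracks)
    for vid in rank_variants(annotations):
        print(vid, annotations[vid])
```

The linear scan over intervals is adequate for a handful of regions; genome-scale annotation would normally use an interval index or a dedicated tool.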

modENCODE and Model Organisms

The modENCODE project was launched in 2007 by the National Human Genome Research Institute (NHGRI) as a companion to the human ENCODE initiative, targeting the genomes of the invertebrate model organisms Drosophila melanogaster and Caenorhabditis elegans to systematically identify functional elements and uncover conserved regulatory sequences across species. This effort aimed to annotate non-coding regions in these model systems, leveraging their genetic tractability to inform broader evolutionary principles of genome function.

The project's scope included over 1,000 experiments spanning multiple developmental stages, cell types, and conditions, using assays that paralleled those in ENCODE, such as ChIP-seq for mapping transcription factor binding sites and histone modifications, RNA-seq for transcriptome analysis, and DNase-seq for chromatin accessibility. For D. melanogaster, more than 700 datasets were generated, profiling transcripts, nucleosome positioning, and chromatin states across the lifecycle. In C. elegans, over 200 genome-wide datasets were collected by 2010 alone, expanding to include comprehensive maps of regulatory elements during embryogenesis and adulthood.

Key findings from modENCODE identified that approximately 30% of the C. elegans genome consists of evolutionarily constrained bases, the majority of which overlap with functional elements such as non-coding regulatory regions. Comparative analyses revealed shared chromatin signatures and motifs between flies, worms, and humans, underscoring conserved mechanisms of developmental gene regulation despite divergent evolutionary paths.

Building on modENCODE, the modERN (model organism Encyclopedia of Regulatory Networks) initiative, active through the 2020s, expanded profiling with ChIP-seq experiments for over 900 transcription factors in D. melanogaster and C. elegans, including 954 binding profiles released in comprehensive datasets by 2024. These efforts have been integrated with the main ENCODE data portal, enabling cross-species comparisons that elucidate functional conservation in regulatory circuits, such as motif conservation and co-binding patterns relevant to human disease modeling.
All modENCODE and modERN data are accessible via the unified ENCODE portal, facilitating queries and visualizations for homology-based studies.
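
As a rough illustration of the portal queries mentioned above, the following Python sketch builds a search URL for the ENCODE portal's REST API. The search endpoint and the `type`, `limit`, and `format` parameters follow the portal's public search interface as commonly documented. The helper `build_search_url`, the `assay_title` value, and the organism field path are illustrative assumptions that should be checked against the portal's current schema before use.

```python
from urllib.parse import urlencode

# Search endpoint of the ENCODE portal (assumed to be unchanged).
ENCODE_SEARCH = "https://www.encodeproject.org/search/"


def build_search_url(assay_title, organism, limit=10):
    """Build a JSON search URL for experiments matching an assay and organism.

    build_search_url is a hypothetical helper, not part of any ENCODE client.
    """
    params = [
        ("type", "Experiment"),
        ("assay_title", assay_title),
        # Field path assumed from the portal's experiment schema.
        ("replicates.library.biosample.donor.organism.scientific_name", organism),
        ("limit", str(limit)),
        ("format", "json"),
    ]
    # urlencode percent-encodes values and joins the pairs with "&".
    return ENCODE_SEARCH + "?" + urlencode(params)


url = build_search_url("TF ChIP-seq", "Caenorhabditis elegans")
print(url)
```

Fetching such a URL with any HTTP client that requests JSON should return the matching records, conventionally under an `@graph` key. The sketch stops at URL construction so that it runs without network access.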

Roadmap Epigenomics Project

The NIH Roadmap Epigenomics Mapping Consortium was initiated in 2008 as part of the NIH Common Fund's efforts to generate comprehensive reference epigenome maps for stem cells and primary ex vivo tissues, aiming to clarify how epigenomic variation contributes to human biology and disease. Involving contributions from over 20 laboratories across multiple institutions, the project produced 111 reference epigenomes from human primary cells and tissues, selected to represent key developmental stages and physiological states. A later study applied the same chromatin-state approach to 66 mouse epigenomes profiled during development.

The consortium's assays primarily targeted core epigenetic features: DNA methylation, assessed via whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS); histone modifications such as H3K27ac, measured by chromatin immunoprecipitation followed by sequencing (ChIP-seq); and chromatin accessibility, measured by DNase I hypersensitive site sequencing (DNase-seq). These data were integrated with complementary datasets from the ENCODE project to enhance the annotation of functional regulatory elements across cell types.

Major outputs included a series of 2015 publications in Nature, culminating in an integrative analysis of 111 human reference epigenomes that produced an atlas of tissue-specific regulatory elements. The analysis identified approximately 2.3 million enhancers and 80,000 promoters with distinct epigenomic signatures that vary across cell types and tissues. This work established 15 core chromatin states, providing a standardized framework for interpreting epigenomic landscapes and their dynamic changes.

The project's findings have informed disease research, for example by identifying cancer-specific epigenotypes in which tumor-associated variants are enriched in regulatory regions such as enhancers active in relevant tissues. Roadmap data have also been used in the Genotype-Tissue Expression (GTEx) project to annotate cell-type-specific expression quantitative trait loci (eQTLs), revealing tissue-dependent genetic effects on gene expression. Ongoing efforts through 2025 have incorporated single-cell epigenomic data into analysis pipelines and portals, extending the reference maps to resolve heterogeneity within tissues and support integrative studies.

Other Initiatives

The Genomics of Gene Regulation (GGR) program, funded by the NHGRI in the 2010s, aimed to develop better methods for constructing predictive gene regulatory network models from genomic data, including integration with large-scale datasets like those from ENCODE.

A related effort is the FANTOM5 consortium, which used Cap Analysis of Gene Expression (CAGE) to map transcription start sites and promoters across human and mouse samples at single-base-pair resolution, enabling detailed profiling of over 1,000 cell types and tissues. This effort complemented ENCODE by integrating CAGE data with chromatin immunoprecipitation sequencing (ChIP-seq) and DNA methylation profiles to elucidate global gene regulation mechanisms, such as enhancer-promoter interactions.

In the 2020s, efforts to extend ENCODE-like functional annotation to the fruit fly (Drosophila melanogaster) advanced to single-cell resolution through projects like the Fly Cell Atlas, building directly on the foundational data from modENCODE. The Fly Cell Atlas, also known as Tabula Drosophilae, generated a single-nucleus transcriptomic atlas of approximately 580,000 nuclei from 15 dissected adult tissues, capturing cell-type-specific gene expression in both sexes. This initiative improved understanding of cellular states in the fruit fly and facilitates comparative analyses with human ENCODE data for conserved regulatory principles.

The GENCODE project, initiated as a core component of ENCODE, provides high-accuracy reference annotations for genes and transcripts in the human and mouse genomes, using ENCODE's experimental data for validation. As of its 2025 release, GENCODE annotates 19,433 protein-coding genes in the human genome, along with detailed transcripts, pseudogenes, and non-coding RNAs, prioritizing biological evidence from multiple assays. This annotation effort supports downstream applications by offering a standardized framework for identifying functional gene features.

Internationally, the BLUEPRINT consortium mapped epigenomic landscapes of hematopoietic cell types, generating reference datasets for over 100 blood samples to reveal regulatory mechanisms in healthy and diseased states. This work, part of the International Human Epigenome Consortium, was integrated with ENCODE to compare epigenetic marks such as histone modifications and DNA methylation across diverse lineages. Similarly, PsychENCODE focuses on the molecular underpinnings of neuropsychiatric disorders, producing multi-omics data from postmortem brain tissues, including over 79,000 brain-active enhancers and single-cell expression profiles. These resources link genetic variants from genome-wide association studies to regulatory elements, advancing neuropsychiatric research through ENCODE-inspired approaches.

Emerging initiatives in 2025, such as those at the Broad Institute, use ENCODE datasets alongside GTEx to train AI models for genomic discovery, enabling predictions of gene regulation and variant effects at scale. These AI-driven projects analyze large datasets to identify novel regulatory networks and accelerate mechanistic insight.

Impact and Criticism

Scientific Contributions

The ENCODE project has significantly advanced the annotation of regulatory elements in the human genome, enabling more precise interpretation of non-coding variants. Tools such as RegulomeDB, which integrate ENCODE data on chromatin accessibility, transcription factor binding, and histone modifications, allow researchers to prioritize variants likely to have regulatory impact by scoring them on their overlap with experimentally supported elements. This approach has improved the functional annotation of variants from genome-wide association studies (GWAS), helping to identify causal regulatory elements among thousands of associated loci.

In medical applications, ENCODE data have linked non-coding variants to disease mechanisms by mapping them to regulatory regions that influence gene expression. For instance, integrating ENCODE annotations with expression quantitative trait loci (eQTLs) from projects like GTEx has revealed how variants in enhancers and promoters modulate tissue-specific expression, contributing to traits and disorders such as autoimmune diseases and metabolic conditions. ENCODE-based annotation of non-coding mutations in cancer genomes has also helped identify transcriptional networks altered in tumors, aiding the search for therapeutic targets.

ENCODE's technological influence extends to standardized pipelines for processing sequencing data, which have been adopted for genome-wide analyses across diverse assays. These uniform pipelines ensure consistency in mapping sequencing reads, calling peaks for chromatin features, and integrating multi-omics datasets, as demonstrated in the project's processing of over 23,000 experiments as of 2025. This standardization has spurred advances in single-cell genomics, where ENCODE's single-cell protocols have been extended to profile regulatory dynamics in heterogeneous cell populations, influencing fields such as disease modeling. Phase 4 of ENCODE (2017–2022) further expanded these efforts, incorporating additional datasets and computational methods to broaden the project's mapping of functional elements.

The project's educational and community impact is evident in its training initiatives and the widespread adoption of its resources. ENCODE has hosted numerous interactive workshops and tutorials at international conferences, equipping researchers with the skills to access its data portal and analyze its data, and thereby training thousands of researchers in its methods. By 2025, ENCODE data had been cited in thousands of scientific papers, underscoring their foundational role in genomics research.

ENCODE's interdisciplinary contributions include enabling AI-driven models for predicting and interpreting genome function. Its datasets have powered frameworks that aim to decode the grammar of the genome, predicting regulatory outcomes at high resolution.
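
The variant prioritization idea described in this section, scoring a non-coding variant by the regulatory annotations it overlaps, can be illustrated with a small Python sketch. This is a toy example under stated assumptions: the intervals, the evidence weights, and the function `score_variant` are all hypothetical, and the scheme does not reproduce RegulomeDB's actual scoring.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    kind: str   # e.g. "accessibility", "tf_binding", "histone_mark"


# Hypothetical weights: more direct evidence of binding counts for more.
WEIGHTS = {"tf_binding": 3, "accessibility": 2, "histone_mark": 1}


def score_variant(chrom, pos, annotations):
    """Sum the weights of the distinct annotation kinds covering a position."""
    kinds = {
        a.kind
        for a in annotations
        if a.chrom == chrom and a.start <= pos < a.end
    }
    return sum(WEIGHTS.get(kind, 0) for kind in kinds)


# Made-up annotation intervals and variant positions.
annotations = [
    Annotation("chr1", 100, 200, "accessibility"),
    Annotation("chr1", 150, 180, "tf_binding"),
    Annotation("chr1", 500, 900, "histone_mark"),
    Annotation("chr2", 100, 200, "tf_binding"),
]
variants = [("chr1", 160), ("chr1", 120), ("chr1", 600), ("chr1", 300)]

# Rank variants so that those with the most supporting evidence come first.
ranked = sorted(variants, key=lambda v: score_variant(*v, annotations), reverse=True)
for chrom, pos in ranked:
    print(chrom, pos, score_variant(chrom, pos, annotations))
```

Counting each kind of evidence once, rather than once per overlapping interval, keeps a variant from scoring highly merely because many redundant experiments cover the same site. A production tool would use an interval index instead of a linear scan.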

Controversies and Debates

The 2012 results of the ENCODE project generated significant controversy when the flagship publication asserted that at least 80% of the human genome displays biochemical activity, leading to interpretations that the vast majority of DNA is functional in a biologically meaningful way. This claim was sharply critiqued by evolutionary biologist Dan Graur and colleagues, who argued that equating detectable biochemical signatures (such as transcription or protein binding) with function conflates mere activity with selective constraint, inflating estimates of genomic functionality and conflicting with evolutionary principles. Graur's analysis highlighted that under strict evolutionary definitions of function, which require selected effects, the functional fraction of the genome is far smaller, and that ENCODE's approach risked reviving discredited notions without rigorous validation.

In response, ENCODE members clarified that their "functional" label referred specifically to biochemical activity, meaning observable molecular events such as chromatin accessibility or histone modifications, rather than evolutionary fitness or causal roles in phenotypes. Subsequent 2013 and 2014 publications refined these definitions, emphasizing a multi-tiered framework that distinguishes biochemical signatures from genetic and evolutionary evidence of function, while acknowledging the limits of biochemical assays in proving function. This clarification aimed to decouple the project's data generation from broader claims about "junk DNA," though critics maintained that the initial publicity had already misled interpretations.

Ongoing debates have centered on media overinterpretation of ENCODE's findings: headlines proclaimed the demise of junk DNA, amplifying the 80% figure beyond its intended scope and fueling public misconceptions about genomic complexity. A persistent challenge lies in validating function for the millions of regulatory elements identified, because biochemical signals alone cannot confirm phenotypic impact without extensive functional experiments, which remain infeasible at that scale.

Criticisms of ENCODE's scope also highlighted a bias toward immortalized cell lines such as K562, which exhibit aberrant regulatory landscapes due to oncogenic transformation and viral integration, potentially skewing annotations away from the physiological states of primary tissues. Early phases underrepresented diverse primary cell types and tissues, limiting how well the annotations generalized beyond those lines. These issues were addressed in ENCODE phase 3 (2013–2020), which expanded to over 1,300 biosamples, including primary cells from multiple tissues and developmental stages, to better capture context-specific regulation.

Resolutions to these debates have included the adoption of the candidate cis-regulatory elements (cCREs) framework in ENCODE 3, which integrates orthogonal datasets (e.g., DNase-seq, histone marks, and transcription factor binding) to classify putative enhancers and promoters, prioritizing those with convergent evidence over isolated signals. The ENCODE 3 registry listed roughly 0.9 million human cCREs, and later releases expanded it to over 2.3 million. Later ENCODE work has emphasized integrative approaches that combine biochemical maps with genetic variants from GWAS and with perturbation screens, to infer functional relevance and reduce the risk of overinterpretation.
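
The idea of prioritizing elements with convergent evidence over isolated signals can be illustrated with a small Python sketch. The signal names, the Z-score cutoff of 1.64, the labels, and the function `classify_element` are assumptions chosen for illustration. This is a simplified stand-in, not the procedure used to build the ENCODE registry.

```python
# Assumed cutoff: a signal counts as "high" when its Z-score exceeds this value.
Z_CUTOFF = 1.64


def classify_element(signals, near_tss):
    """Assign a coarse label based on which signals are high.

    signals: dict of Z-scores keyed by "dnase", "h3k4me3", "h3k27ac", "ctcf".
    near_tss: True if the element lies close to an annotated transcription
              start site (the distance rule is left to the caller).
    """
    high = {name for name, z in signals.items() if z > Z_CUTOFF}
    if "dnase" not in high:
        return "unclassified"       # no accessibility support
    if "h3k4me3" in high and near_tss:
        return "promoter-like"      # accessibility + promoter mark near a TSS
    if "h3k27ac" in high:
        return "enhancer-like"      # accessibility + active enhancer mark
    if "ctcf" in high:
        return "CTCF-only"
    return "accessible-only"        # isolated signal, lowest priority


# Made-up elements: (Z-scores, near_tss)
examples = [
    ({"dnase": 3.0, "h3k4me3": 2.5, "h3k27ac": 0.1, "ctcf": 0.0}, True),
    ({"dnase": 2.0, "h3k4me3": 0.2, "h3k27ac": 2.2, "ctcf": 0.3}, False),
    ({"dnase": 0.5, "h3k27ac": 3.0}, False),
    ({"dnase": 2.0}, False),
]
labels = [classify_element(signals, near_tss) for signals, near_tss in examples]
print(labels)
```

Requiring accessibility plus at least one independent mark before assigning a regulatory label is one simple way to encode "convergent evidence". An element supported by a single assay is kept but flagged as lower priority rather than discarded.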

References

  1. [1]
    The Encyclopedia of DNA Elements (ENCODE)
    Sep 17, 2023 · ENCODE is a public research consortium aimed at identifying all functional elements in the human and mouse genomes.Missing: 2025 | Show results with:2025
  2. [2]
    Project Overview - ENCODE
    The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels.
  3. [3]
    The ENCODE (ENCyclopedia Of DNA Elements) Project - PubMed
    Oct 22, 2004 · The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of ...
  4. [4]
    A User's Guide to the Encyclopedia of DNA Elements (ENCODE)
    Apr 19, 2011 · The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence.
  5. [5]
    NIH-funded project creates an encyclopedia detailing the inner ...
    Jul 29, 2020 · The Encyclopedia of DNA Elements (ENCODE) Project is a worldwide effort to understand how the human genome functions.
  6. [6]
    An Integrated Encyclopedia of DNA Elements in the Human Genome
    The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure, and ...
  7. [7]
    Data navigation on the ENCODE portal | Nature Communications
    Oct 30, 2025 · Spanning two decades, the collaborative ENCODE project aims to identify all the functional elements within human and mouse genomes.
  8. [8]
    Data navigation on the ENCODE portal - PMC
    Oct 30, 2025 · Spanning two decades, the collaborative ENCODE project aims to identify all the functional elements within human and mouse genomes.
  9. [9]
    ENCODE Publications
    The Encyclopedia of DNA Elements (ENCODE) project has established a genomic resource for mammalian development, profiling a diverse panel of mouse tissues.
  10. [10]
    NHGRI History and Timeline of Events
    ... ENCODE - aimed at discovering all parts of the human genome that are crucial ... after the Human Genome Project's Launch: Lessons Beyond the Base Pairs.
  11. [11]
    The ENCODE (ENCyclopedia Of DNA Elements) Project - Science
    Oct 22, 2004 · The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence.
  12. [12]
    Expanded encyclopaedias of DNA elements in the human ... - Nature
    Jul 29, 2020 · The ENCODE Project aims to delineate precisely and comprehensively the segments of the human and mouse genomes that encode functional elements.
  13. [13]
    An integrated encyclopedia of DNA elements in the human genome
    Sep 5, 2012 · The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and ...Missing: scope | Show results with:scope
  14. [14]
    Data standards – ENCODE
    ### Summary of ENCODE Data Standards for Reproducibility
  15. [15]
    Mapping a genetic world beyond genes | Broad Institute
    Sep 5, 2012 · Comprised of more than 30 participating institutions, including the Broad Institute, the ENCODE Project Consortium has helped to ascribe ...
  16. [16]
    ENCODE data describes function of human genome
    Sep 5, 2012 · In addition, NHGRI devoted about $40 million to the ENCODE pilot ... During the next phase, ENCODE will increase the depth of the catalog ...
  17. [17]
    ENCODE Pilot Project - National Human Genome Research Institute
    Oct 18, 2012 · On March 7, 2003, the NHGRI held a meeting to officially launch the ENCODE Pilot Project Research Consortium and to provide information to ...Missing: 2003-2007 | Show results with:2003-2007
  18. [18]
    Identification and analysis of functional elements in 1% of the human ...
    The Encyclopedia of DNA Elements (ENCODE) Project aims to provide a more biologically informative representation of the human genome by using high-throughput ...
  19. [19]
    ENCODE Pilot Project at UCSC
    The pilot project established protocols for scaling up to full-genome coverage and produced a wealth of data, elucidating elements such as protein-coding genes, ...
  20. [20]
    NHGRI completes phase 3 of ENCODE project
    Aug 6, 2020 · The Encyclopedia of DNA Elements (ENCODE) project is an international collaboration involving NHGRI-funded research groups.<|control11|><|separator|>
  21. [21]
    News & Updates - ENCODE
    The ENCODE portal serves as the primary and comprehensive source for data and information about the ENCODE project.
  22. [22]
    ENCODE 3 - Nature
    This Collection showcases the main articles and related content resulting from the third phase of ENCODE, during which almost 6,000 new ...
  23. [23]
    Single-Cell Epigenomics Uncovers Heterochromatin Instability and ...
    Apr 23, 2025 · We use single-cell epigenomics to profile chromatin accessibility and gene expression across eight brain regions in the mouse brain at 2, 9, and 18 months of ...
  24. [24]
    Beyond AlphaFold: how AI is decoding the grammar of the genome
    Aug 18, 2025 · Scientists are seeking to decipher the role of non-coding DNA in the human genome, helped by a suite of artificial-intelligence tools.
  25. [25]
    A conversation about the legacy of ENCODE and what comes next
    Oct 28, 2021 · Epigenetics researcher Charles Epstein discusses the impact of the NHGRI ENCODE consortium on our understanding of genome regulation.
  26. [26]
    ENCODE at UCSC
    ... funded by the National Human Genome Research Institute (NHGRI). ... This covers data generated during the two production phases 2007-2012 and 2013-present.Downloads · Experiment Matrix · Pilot (2003-2007) · Cell TypesMissing: total | Show results with:total
  27. [27]
    ENCODE whole-genome data in the UCSC Genome Browser
    INTRODUCTION. Following a 4-year pilot phase aimed at identifying functional elements in selected regions comprising 1% of the human genome (1–2) ...<|separator|>
  28. [28]
    Expired RFA-HG-16-005: ENCODE Data Coordinating Center (U24)
    Jan 15, 2016 · The ENCODE Research Consortium Steering Committee will serve as the main coordinating board of the ENCODE Research Consortium established ...
  29. [29]
    [PDF] ENCODE Consortia Data Release, Data Use, and Publication Policies
    Nov 22, 2009 · ENCODE/modENCODE research groups will release, to an appropriate public database, data obtained in experiments at the time that this standard ...Missing: steering | Show results with:steering
  30. [30]
    ENCODE Project Data Release Policy (2003-2007)
    Feb 6, 2012 · The ENCODE pilot phase, during which time data corresponding to only 1% of the human genome will be produced, will provide NHGRI with an ...Missing: launch | Show results with:launch
  31. [31]
    NHGRI Kicks Off ENCODE Project Expansion with About $80M in ...
    Oct 9, 2007 · The NHGRI is awarding about $80 million worth of grants over the next four years to expand the ENCyclopedia Of DNA Elements (ENCODE) project ...
  32. [32]
  33. [33]
    NIH ENCODE grants advance effort to survey entire human ...
    Grants totaling $30.3 million in fiscal year 2012 will expand the ENCyclopedia Of DNA Elements (ENCODE), a comprehensive catalog of ...Missing: 2012-2017 | Show results with:2012-2017
  34. [34]
    Data Use, Software, and Analysis Release Policies - ENCODE
    The ENCODE Project aims to enhance biomedical research by generating community resources of genomics data, software, tools and methods for genomics data ...<|separator|>
  35. [35]
    ENCODE Project Telebriefing Participant Bios
    May 2, 2014 · Ewan Birney, Ph.D. Dr. Birney is associate director of EMBL-EBI (European Bioinformatics Institute). He developed a number of databases ...Missing: principal | Show results with:principal
  36. [36]
    The ENCODE Project and the ENCODE Controversy
    The ENCyclopedia Of DNA Elements (ENCODE) project was an international research effort funded by the National Human Genome Research Institute (NHGRI) that ...
  37. [37]
    [PDF] A User's Guide to the Encyclopedia of DNA Elements (ENCODE)
    Apr 19, 2011 · The genotype and sequence data from GM12878 generated by the 1,000 Genomes Project are being integrated with sequence data from ENCODE chromatin ...
  38. [38]
    Collaborations - ENCODE
    The ENCODE consortium has coordinated with the Personal Genome Project (PGP) to deeply profile the epigenomic landscape of various cell types.Missing: 1000 | Show results with:1000
  39. [39]
    [PDF] NHGRI FY 2022 Congressional Justification
    Early and mid-career transitions are common exit points from the scientific pipeline. In FY 2022,. NHGRI will start transition awards to help retain.
  40. [40]
    ENCODE (ENCyclopedia Of DNA Elements) | Bethesda MD
    Rating 5.0 (3) This program is intended to support Early Stage and New Investigators from diverse backgrounds, including those from groups underrepresented in health-related ...
  41. [41]
    Anshul Kundaje - Stanford Profiles
    Sep 30, 2024 · Dr. Kundaje has led computational efforts of large genomics consortia including the ENCODE Project and the Roadmap Epigenomics Project. Dr.
  42. [42]
    Anshul Kundaje | Stanford Medicine
    Dr. Kundaje has led computational efforts of large genomics consortia including the ENCODE Project and the Roadmap Epigenomics Project. Dr. Kundaje is a ...
  43. [43]
    The ENCODE Imputation Challenge: a critical assessment of ...
    Apr 18, 2023 · We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging.
  44. [44]
    ENCODE whole-genome data in the UCSC Genome Browser - PMC
    Nov 17, 2009 · During the transition from pilot to production phase, the bulk of ENCODE investigators shifted methodologies from microarray to assays based ...
  45. [45]
    Multicenter integrated analysis of noncoding CRISPRi screens
    Mar 19, 2024 · The ENCODE CRISPR screening database contains >540,000 individual perturbations covering 24.85 megabases (Mb; 0.82%) of the human genome ( ...
  46. [46]
    2012 Quality Metrics for integrative analysis publications - ENCODE
    Larger SPOT values indicate higher signal to noise; 1.0 is the maximum possible value (all reads are signal) and 0 is the minimum possible value (all reads are ...
  47. [47]
    Current ENCODE Experiment Guidelines – ENCODE
    ### Summary of ENCODE Experiment Guidelines
  48. [48]
    Lis Lab PRO-seq Pipeline - ENCODE
    This pipeline from the Lis lab processes sequencing data from PRO-seq assays to produce signal tracks and bidirectional peaks indicating enhancer regions. Lab ...
  49. [49]
    ENCODE
    No readable text found in the HTML.<|control11|><|separator|>
  50. [50]
    The Encyclopedia of DNA elements (ENCODE): data portal update
    Nov 6, 2017 · To date, ENCODE alone has produced over 9000 high-throughput sequencing libraries from assays such as: RNA-Seq, chromatin immunoprecipitation ( ...
  51. [51]
    ENCODE data at the ENCODE portal - PMC - PubMed Central
    Nov 2, 2015 · The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the ...
  52. [52]
    Experiment Matrix - ENCODE
    Experiment search · Experiment matrix · ChIP-seq matrix · Human and mouse body maps ... Using the portal · Cart · REST API · Citing ENCODE · FAQ.Missing: features views
  53. [53]
    WashU Epigenome Browser - ENCODE
    The WashU Epigenome Browser provides visualization, integration, and analysis tools for epigenomic datasets, including data from ENCODE and other consortia.
  54. [54]
    Getting Started - ENCODE
    The ENCODE Portal contains raw and ground-level analysis data generated by participating mapping centers using a wide-range of assays (integrative analysis data ...Missing: volume | Show results with:volume
  55. [55]
    Citing ENCODE
    Cite the ENCODE Consortium, acknowledge the production lab, and reference the dataset (ENCSR...) and file (ENCFF...) accession numbers.Missing: attribution guidelines
  56. [56]
    Factorbook.org: a Wiki-based database for transcription ... - PubMed
    Factorbook is a transcription factor (TF)-centric repository of all ENCODE ChIP-seq datasets on TF-binding regions, as well as the rich analysis results of ...
  57. [57]
    Factorbook.org: a Wiki-based database for transcription factor ... - NIH
    Nov 29, 2012 · In Wiki format, factorbook is a transcription factor (TF)-centric repository of all ENCODE ChIP-seq datasets on TF-binding regions, as well as ...
  58. [58]
    Factorbook: an updated catalog of transcription factor motifs and ...
    The update includes an expanded motif catalog derived from thousands of ENCODE Phase II and III ChIP-seq experiments and HT-SELEX experiments; this motif ...Abstract · INTRODUCTION · OVERVIEW
  59. [59]
    Factorbook
    Factorbook offers a comprehensive resource of TF binding motifs and sites, enabling researchers to predict the impact of genetic variants on TF binding and gene ...Missing: 2025 | Show results with:2025<|separator|>
  60. [60]
    Annotation of functional variation in personal genomes using ... - NIH
    We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome.
  61. [61]
    RegulomeDB — Help
    RegulomeDB is a database that provides functional context to variants or regions of interest and serves as a tool to prioritize functionally important ...
  62. [62]
    Annotating and prioritizing human non-coding variants with ... - NIH
    In summary, RegulomeDB provides a user-friendly tool to annotate and prioritize variants in non-coding regions of the human genome, which can aid variant ...
  63. [63]
    hg19 Human: Transcription Factor ChIP-seq Clusters (161 factors ...
    This track shows regions of transcription factor binding derived from a large collection of ChIP-seq experiments performed by the ENCODE project, together with ...
  64. [64]
    ENCODE whole-genome data in the UCSC genome browser (2011 ...
    Oct 30, 2010 · The ENCODE whole-genome data is interspersed among non-ENCODE data tracks on the Genome Browser. As of August 2010 this data was available ...
  65. [65]
    ENCODE Data in the UCSC Genome Browser: year 5 update - NIH
    The ENCODE Data Coordination Center at UCSC (DCC) has accessioned all relevant ENCODE data at the Gene Expression Omnibus (GEO) (20) and the Short Read Archive.
  66. [66]
    Transcription Factor ChIP-seq Data Standards and Processing ...
    Data quality standards for ENCODE2 are outlined in ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. ... 2025 Stanford University.Missing: modERN | Show results with:modERN
  67. [67]
    ENCODE Uniform processing pipeline for ChIP-seq - GitHub
    Calculate and report overlapping peaks from both replicates. Peak calling (transcription factors). Call peaks with SPP. Threshold peaks with IDR. Report ...
  68. [68]
    SCREEN: Search Candidate Regulatory Elements by ENCODE
    SCREEN is a web interface for searching and visualizing the Registry of candidate cis-Regulatory Elements (cCREs) derived from ENCODE data.
  69. [69]
    Factorbook: an Updated Catalog of Transcription Factor Motifs and ...
    Oct 12, 2021 · Factorbook is a transcription factor-centric database cataloging information for 694 distinct human TFs and 62 mouse TFs profiled in 249 and 38 human and mouse ...
  70. [70]
    Factorbook.org: a Wiki-based database for transcription factor ...
    We are adding additional analyses performed by the ENCODE analysis working group and plan to update factorbook whenever more ENCODE data are available.Missing: impact | Show results with:impact
  71. [71]
    The modENCODE Project: Model Organism ENCyclopedia Of DNA ...
    Aug 27, 2014 · The modENCODE Project will provide important insights into the biology of D. melanogaster and C. elegans as well as other organisms, including humans.
  72. [72]
    Integrative Analysis of the Caenorhabditis elegans Genome by the ...
    Nov 17, 2010 · From the project start in 2007 (2), the C. elegans modENCODE groups had by February 2010 collected 237 genome-wide data sets (table S1) ...
  73. [73]
    Identification of Functional Elements and Regulatory Circuits by ...
    Dec 22, 2010 · The Drosophila modENCODE project has generated more than 700 data sets that profile transcripts, histone modifications and physical nucleosome ...
  74. [74]
    [PDF] Integrative Analysis of the Caenorhabditis elegans Genome by the ...
    Thus,. modENCODE explains an additional 27.4%. (8.1 Mb) of the constrained portion of the ge- nome; together with remaining unconfirmed. WormBase gene ...
  75. [75]
    Comparative analysis of regulatory information and circuits across ...
    Aug 27, 2014 · To compare regulatory architecture and binding across diverse organisms, the modENCODE and ENCODE consortia mapped the binding locations of 93 ...
  76. [76]
    Comparative modENCODE/ENCODE
    The comparative modENCODE/ENCODE project includes studies on metazoan chromatin, regulatory information, and transcriptomes across species, with related ...Missing: project 2007
  77. [77]
    Epigenomics | NIH Common Fund
    The Roadmap Epigenomics program issued its first round of awards in 2008. ... The NIH Roadmap Epigenomics Program joined the International Human Epigenome ...Missing: initiation | Show results with:initiation
  78. [78]
    The NIH Roadmap Epigenomics Mapping Consortium - PMC
    The NIH Roadmap Epigenomics Mapping Consortium aims to produce a public resource of epigenomic maps for stem cells and primary ex vivo tissues.Missing: mouse | Show results with:mouse
  79. [79]
    Integrative analysis of 111 reference human epigenomes - Nature
    Feb 18, 2015 · To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and ...Missing: initiation | Show results with:initiation
  80. [80]
    Annotation of chromatin states in 66 complete mouse epigenomes ...
    Feb 22, 2021 · The Roadmap Epigenomics Consortium previously defined 15 human chromatin states using five histone marks in 127 human biosamples. To investigate ...
  81. [81]
    Genetic effects on gene expression across human tissues - Nature
    Oct 12, 2017 · For 26 GTEx tissues matched with cell-type specific annotations from the Roadmap Epigenomics project, we applied a Bayesian hierarchical model ...
  82. [82]
    WashU Epigenome Browser update 2025 - Oxford Academic
    May 5, 2025 · This approach has been successfully implemented to manage over 200 000 datasets generated by consortia such as Roadmap Epigenomics, ENCODE, and ...
  83. [83]
    Genomics of Gene Regulation
    Jul 24, 2017 · The Genomics of Gene Regulation (GGR) project develops better methods to construct predictive, accurate gene regulatory network models using genomic data.
  84. [84]
    FANTOM5 CAGE profiles of human and mouse samples - Nature
    Aug 29, 2017 · In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at a single base-pair resolution and their frequencies ...
  85. [85]
    Paradigm shifts in genomics through the FANTOM projects - PMC
    Integrating the FANTOM5 CAGE expression atlas with complementary data such as ChIP-Seq and DNA methylation profiling allowed us to study regulation in ...Missing: GGR | Show results with:GGR
  86. [86]
    FlyBase:ScRNA-Seq
    Apr 3, 2025 · During 2020 and 2021, the FCA consortium ran a collaborative effort with CZ Biohub, Genentech, and NIH, to sequence all cells of the adult fly.Missing: 2020s | Show results with:2020s
  87. [87]
    Fly Cell Atlas: a single-nucleus transcriptomic atlas of the adult fruit fly
    Here we present a single cell atlas of the adult fly, Tabula Drosophilae, that includes 580k nuclei from 15 individually dissected sexed tissues as well as the ...
  88. [88]
    FLY CELL ATLAS - FlyCellAtlas description.
    The Fly Cell Atlas brings together Drosophila researchers interested in single-cell genomics, transcriptomics, and epigenomics, to build comprehensive cell ...
  89. [89]
    GENCODE: producing a reference annotation for ENCODE
    Aug 7, 2006 · The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination ...
  90. [90]
    Human Release Statistics - GENCODE
    Total No of Genes, 78691. Protein-coding genes, 19433. - readthrough genes (not included), 664. Long non-coding RNA genes, 35899. Small non-coding RNA genes ...
  91. [91]
    GENCODE 2025: reference gene annotation for human and mouse
    Nov 20, 2024 · Thus, GENCODE does not currently annotate Ribo-seq ORFs as novel protein-coding genes when the only support comes from immunopeptidomics data.
  92. [92]
    A BLUEPRINT of Haematopoietic Epigenomes
    The BLUEPRINT Project is a five-year project to further the understanding of how genes are activated and repressed in healthy and diseased human blood cells.
  93. [93]
    PsychENCODE
    The PsychENCODE Consortium brings together multidisciplinary teams to study the molecular basis of neuropsychiatric diseases.
  94. [94]
    The PsychENCODE project - PMC - PubMed Central - NIH
    Nov 25, 2015 · The PsychENCODE project aims to produce a public resource of multidimensional genomic data using tissue- and cell type–specific samples.
  95. [95]
    How massive datasets generated at Broad are powering the latest AI ...
    Sep 3, 2025 · Broad scientists describe how data resources they helped build over more than a decade now form the foundation for cutting-edge AI and ...
  96. [96]
    How AI uses ENCODE and GTEx datasets for genomic discovery
    Sep 3, 2025 · Each agent can specialize: one interprets raw sequencing data, another predicts disease risk, and a third suggests personalized treatments.
  97. [97]
    Interpreting non-coding disease-associated human variants using ...
    Genome-wide association studies (GWAS) have linked hundreds of thousands of sequence variants in the human genome to common traits and diseases.
  98. [98]
    The GTEx Consortium atlas of genetic regulatory effects across ...
    To test this supposition, we analyzed TAD data from ENCODE (1) and cis-eQTLs from matching GTEx tissues (table S3). Compared to matching random variant-gene ...
  99. [99]
    A global transcriptional network connecting noncoding mutations to ...
    Abstract. Although cancer genomes are replete with noncoding mutations, the effects of these mutations remain poorly characterized.
  100. [100]
    The ENCODE Uniform Analysis Pipelines - PMC - NIH
    While IDR is used to estimate reproducibility and stringent peak calls, the default “replicated” peaks are those that are identified by MACS2 with relaxed ...
  101. [101]
    Comprehensive functional genomic resource and integrative model ...
    First, we used standard pipelines to uniformly process single-cell RNA-seq data from PsychENCODE, in conjunction with other single-cell studies on the brain (14 ...
  102. [102]
    Tutorials & Workshops - ENCODE
    The ENCODE Consortium hosts interactive workshops at conferences around the world in order to teach participants how to navigate the ENCODE Encyclopedia.
  103. [103]
    Defining functional DNA elements in the human genome - PNAS
    Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments.
  104. [104]
    Scientists attacked over claim that 'junk DNA' is vital to life | Genetics
    Feb 23, 2013 · "Everything that Encode claims is wrong. Their statistics are horrible, for a start," the lead author of the paper, Professor Dan Graur, of ...