ATAC-seq
Assay for transposase-accessible chromatin using sequencing (ATAC-seq) is a high-throughput sequencing method that maps regions of open chromatin across the genome, identifying regulatory elements such as promoters, enhancers, and insulators by probing DNA accessibility with a hyperactive Tn5 transposase enzyme. The technique involves a simple two-step protocol where the transposase simultaneously fragments accessible DNA and ligates sequencing adapters to the ends, enabling library preparation from as few as 500 cells and generating nucleotide-resolution profiles of chromatin structure, nucleosome positioning, and transcription factor binding sites.[1] Developed to overcome limitations of prior methods like DNase-seq, ATAC-seq provides rapid and sensitive epigenomic profiling without the need for antibodies or extensive cell numbers.[1][2]
Introduced in 2013 by Buenrostro et al., ATAC-seq builds on earlier chromatin accessibility assays but distinguishes itself through its efficiency, requiring 50,000-fold fewer cells than traditional DNase hypersensitivity assays and completing in under three hours.[1][2] Its advantages include high reproducibility, low input requirements suitable for clinical or rare samples, and compatibility with diverse sample types, including frozen tissues via optimized protocols like Omni-ATAC.[3] Compared to ChIP-seq, which targets specific proteins but demands 10^5–10^7 cells and specialized antibodies, ATAC-seq offers a broader, unbiased view of regulatory landscapes at lower cost and complexity.[2]
ATAC-seq has revolutionized studies of gene regulation, enabling insights into cell-type-specific epigenomes during development, differentiation, and disease.[4] Key applications include profiling chromatin accessibility in embryonic tissues, immune cell responses, and cancer subtypes to uncover driver mutations and therapeutic targets.[5][2] Single-cell variants, such as scATAC-seq introduced in 2015, extend its utility to heterogeneous populations, revealing cell fate transitions and integrating with RNA-seq for multi-omics analysis of regulatory networks. Recent advancements, including high-throughput methods like sci-ATAC-seq and benchmarks of protocols, continue to enhance its resolution and scalability for large-scale epigenomic atlases.[6]
Introduction and Background
Overview and Principles
ATAC-seq, or Assay for Transposase-Accessible Chromatin using sequencing, is a high-throughput sequencing method designed to map regions of open chromatin across the genome, identifying sites where transcription factors and other regulatory proteins can bind to influence gene expression.[7] This technique leverages the natural accessibility of DNA in regulatory elements, such as promoters and enhancers, to provide insights into the epigenetic landscape that governs cellular identity and function.[8] Chromatin accessibility is a fundamental biological principle underlying gene regulation, where the packaging of DNA into chromatin modulates access to genetic information. Euchromatin represents loosely packed, transcriptionally active regions with high accessibility, facilitating the binding of regulatory proteins, while heterochromatin is densely compacted and generally repressive, limiting such interactions.[9] Nucleosome positioning plays a critical role in this process, as nucleosomes—histone octamers wrapped by DNA—act as barriers that, when repositioned or evicted, expose enhancers and promoters to enable precise control of transcriptional activity.[10] The biochemical foundation of ATAC-seq relies on the hyperactive Tn5 transposase, an engineered enzyme that catalyzes tagmentation—a coupled process of DNA cleavage and adapter sequence ligation—specifically at accessible chromatin sites. This transposase preferentially targets nucleosome-free or depleted regions, fragmenting the DNA and tagging it with sequencing adapters in a single step, which allows for high-resolution mapping down to the single-nucleotide level upon next-generation sequencing.[7] The method was first described in 2013 as a streamlined approach for epigenomic profiling. 
Compared to earlier techniques like DNase-seq, which uses DNase I digestion, or FAIRE-seq, which relies on formaldehyde-assisted isolation of regulatory elements, ATAC-seq offers significant advantages, including a rapid protocol completable in under 3 hours and compatibility with low cell inputs ranging from 500 to 50,000 cells, making it suitable for scarce or precious samples.[7][4] These features enhance its utility for profiling dynamic chromatin states in diverse biological contexts.
Historical Development
ATAC-seq was initially developed in 2013 by Jason D. Buenrostro, Paul G. Giresi, Lisa C. Zaba, Howard Y. Chang, and William J. Greenleaf at Stanford University, introducing a method for assaying chromatin accessibility genome-wide using hyperactive Tn5 transposase to insert sequencing adapters directly into open chromatin regions. This innovation addressed key limitations of prior techniques, such as DNase-seq, which required millions of cells and complex enzymatic digestion steps, and ChIP-seq, which was limited to predefined protein targets and dependent on antibody quality. The original protocol enabled profiling with as few as 500 cells in under 3 hours, facilitating rapid epigenomic analysis in diverse biological contexts. Key milestones followed swiftly, including the 2015 adaptation for single-cell resolution (scATAC-seq) by Buenrostro, Bo Wu, Utz-Maria Litzenburger, and colleagues, which allowed high-throughput mapping of chromatin accessibility from individual cells, revealing principles of regulatory variation across cell types.[11] In 2017, M. Ryan Corces, Anshul Kundaje, Howard Y. Chang, and others refined the method into Omni-ATAC, optimizing it for low-input samples (down to 50,000 nuclei) and frozen tissues while reducing mitochondrial background noise, thus broadening applicability to archival and clinical specimens. 
Post-2020 advances extended ATAC-seq spatially, as demonstrated by Deng et al.'s 2022 spatial-ATAC-seq, which integrated tissue sectioning with barcoded capture to profile chromatin accessibility at near-cellular resolution in intact mouse and human tissues.[12]
ATAC-seq's adoption accelerated rapidly, with integration into the ENCODE project's phase 3 by 2015, where it complemented DNase-seq for comprehensive mapping of regulatory elements across human cell lines and tissues.[13] By 2025, ATAC-seq had been referenced in over 9,000 publications, reflecting its widespread use in genomics research.[14] The Chang laboratory at Stanford played a pivotal role in multimodal extensions, combining ATAC-seq with RNA-seq for joint profiling of accessibility and transcription.[11] Meanwhile, contributions from the Broad Institute, including computational pipelines for peak calling and integration, enhanced data analysis scalability and reproducibility.[13]
Methodology
Core Experimental Procedure
The core experimental procedure for ATAC-seq involves isolating nuclei from a small number of cells, followed by tagmentation to insert sequencing adapters into accessible chromatin regions, purification, and PCR amplification to generate a sequencing-ready library. This protocol, originally developed for bulk analysis, requires approximately 50,000 intact nuclei obtained from fresh or frozen cells or tissues, ensuring high-quality starting material to maintain chromatin integrity. Nuclei are isolated by lysing cells in a hypotonic buffer containing detergents like NP-40, followed by centrifugation to pellet the nuclei while removing cytoplasmic contaminants; this step typically takes 10-15 minutes and is performed on ice to prevent degradation.
Tagmentation, the hallmark step of ATAC-seq, employs hyperactive Tn5 transposase from the Nextera kit in a specialized buffer to simultaneously fragment DNA and ligate adapters preferentially at open chromatin sites, generating adapter-tagged fragments whose lengths range from under 100 base pairs to several hundred base pairs. The isolated nuclei are resuspended in a transposition mix consisting of 2× TD buffer, Tn5 transposase, and nuclease-free water (total volume ~50 µL), then incubated at 37°C for 30 minutes in a thermal cycler to allow enzyme activity without excessive background cutting. This controlled reaction exploits the transposase's preference for accessible DNA, yielding fragments that reflect regulatory elements like promoters and enhancers. Following tagmentation, the reaction is stopped by adding EDTA and purified using a Qiagen MinElute column to remove the transposase enzyme, free adapters, and debris, resulting in a clean DNA eluate of ~10-20 µL.
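As a rough aid for preparing several reactions at once, the sketch below scales the transposition mix described above. The per-reaction volumes (25 µL 2× TD buffer, 2.5 µL Tn5 transposase, 22.5 µL nuclease-free water) are an assumption based on commonly used protocol values and are not stated in this article; they were chosen to sum to the ~50 µL total. The function name and the 10% overage are likewise illustrative.

```python
# Assumed per-reaction volumes in microlitres (not given in the text above);
# together they make up the ~50 µL reaction volume.
PER_REACTION_UL = {
    "2x TD buffer": 25.0,
    "Tn5 transposase": 2.5,
    "nuclease-free water": 22.5,
}


def transposition_master_mix(n_samples, overage=0.10):
    """Return the total volume (µL) of each component for n_samples reactions.

    overage is the extra fraction prepared to cover pipetting loss.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    if overage < 0:
        raise ValueError("overage must be non-negative")
    scale = n_samples * (1.0 + overage)
    return {name: round(vol * scale, 2) for name, vol in PER_REACTION_UL.items()}


if __name__ == "__main__":
    for component, volume in transposition_master_mix(8).items():
        print(f"{component}: {volume} µL")
```

Any real experiment should take its volumes from the kit or protocol actually in use; only the scaling logic is meant to carry over.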
The purified tagmented DNA is then amplified via PCR using 1× NEBNext master mix and barcoded Nextera primers (1.25 µM each) to add full sequencing adapters and indices; the cycling conditions include an initial 72°C for 5 minutes, 98°C for 30 seconds, followed by 8-12 cycles of 98°C for 10 seconds, 63°C for 30 seconds, and 72°C for 1 minute, with a final 72°C extension. Amplification cycle number is optimized via qPCR to avoid over-amplification, which can introduce bias, and the entire tagmentation-to-amplification process can be completed in 2-3 hours.
Library quality is assessed by quantifying DNA concentration using qPCR and evaluating fragment size distribution with a Bioanalyzer or TapeStation, where nucleosome-free regions appear as a peak at 150-250 base pairs (including adapters), and mono-nucleosomal fragments at ~300 bp. Sequencing is performed on Illumina platforms with paired-end reads of 50-75 base pairs at a depth of 25-50 million reads per sample to achieve sufficient coverage for peak calling.
To ensure quality, over-tagmentation is avoided by strict adherence to incubation times, as prolonged exposure can lead to excessive fragmentation and reduced library complexity; additionally, an RNase treatment can be included if necessary to remove contaminating RNA.
Protocol Variations and Optimizations
To address challenges with limited starting material, the Omni-ATAC protocol optimizes the standard ATAC-seq workflow for low-input samples, enabling reliable profiling from as few as 500 cells through refined nuclear lysis using a combination of mild detergents (NP-40, digitonin, and Tween-20) and adjusted transposase concentrations that enhance tagmentation efficiency while minimizing PCR duplicates and background noise. This adaptation maintains high correlation with bulk ATAC-seq data (r > 0.95) across diverse cell types, facilitating applications in rare populations without substantial loss in resolution. Adaptations for fixed-tissue samples incorporate formaldehyde fixation to preserve chromatin structure in archival or clinical specimens, such as formalin-fixed paraffin-embedded (FFPE) tissues, followed by de-crosslinking steps and extended incubation times (up to 2 hours) during transposition to recover accessible regions comparable to fresh samples (correlation r ≈ 0.87). These modifications extend ATAC-seq utility to biobanked materials, though they may increase sequencing depth requirements by 20-50% to compensate for fixation-induced biases. For high-throughput processing, automation via 96-well plate formats and robotic liquid handling systems, such as RoboATAC, supports multiplexing of up to 96 samples per run, streamlining library preparation while preserving signal-to-noise ratios equivalent to manual methods. Microfluidic integrations further enable scalable tagmentation, reducing hands-on time to under 2 hours and minimizing variability across batches. Bias mitigation strategies include custom lysis and transposition buffers that suppress mitochondrial DNA contamination by over 90% through selective nuclear permeabilization, as implemented in Omni-ATAC, thereby enriching for nuclear chromatin signals. 
Additionally, ATAC-STARR-seq combines ATAC-seq enrichment with self-transcribing active regulatory region sequencing to functionally validate enhancer activity in accessible regions, identifying active versus poised elements with high specificity in reporter assays.[15]
In the 2020s, protocol updates have integrated ATAC-seq with long-read platforms like PacBio's Fiber-seq, which captures extended chromatin fragments (>10 kb) to phase haplotypes within accessible regions, revealing allele-specific accessibility patterns and structural variants not detectable by short-read methods.[16]
Troubleshooting high nucleosome occupancy, particularly in stem cells with compact chromatin, involves incorporating mild detergents (e.g., 0.1% NP-40) into the lysis buffer to gently disrupt membranes without altering nucleosome positioning, yielding up to 30% more peaks in low-accessibility samples compared to standard conditions.
Applications
Fundamental Uses in Genomics
ATAC-seq serves as a foundational tool for mapping regulatory elements across the genome by identifying regions of open chromatin that correspond to enhancers, promoters, and insulators, which are typically associated with active transcription. These accessible sites, detected as peaks in sequencing reads, highlight DNA sequences bound by transcription factors and other regulatory proteins, providing insights into the architectural organization of gene regulation. For instance, in human cell lines, ATAC-seq peaks often overlap with known active promoters near transcription start sites and distal enhancers that loop to influence gene expression, enabling the annotation of functional non-coding elements without prior knowledge of specific protein bindings.[1] Beyond peak identification, ATAC-seq facilitates the inference of nucleosome positioning by analyzing the size distribution of sequenced fragments, where protected regions of approximately 147 base pairs indicate nucleosome cores, and shorter fragments in linker DNA reveal accessible intervals. This approach allows for high-resolution mapping of the +1 nucleosome immediately downstream of transcription start sites, as well as periodic nucleosome arrays in promoter and enhancer regions, which influence chromatin compaction and accessibility. In yeast and human cells, such patterns have demonstrated shifts in nucleosome occupancy during environmental responses, underscoring the role of chromatin structure in modulating regulatory potential. At sub-nucleosomal resolution, ATAC-seq enables transcription factor footprinting by detecting localized reductions in accessibility within open chromatin peaks, corresponding to motifs where TFs bind and protect DNA from transposase insertion. Computational tools like HINT-ATAC correct for Tn5 transposase biases to accurately delineate these footprints, revealing binding dynamics for hundreds of TFs in a single experiment. 
This has proven effective in identifying key regulatory motifs in immune cells, where footprints correlate with ChIP-seq validated sites, offering a cost-efficient alternative for genome-wide TF occupancy profiling.[17]
In comparative epigenomics, ATAC-seq profiles chromatin accessibility differences across species, tissues, or conditions, as exemplified by ENCODE datasets from diverse human cell lines, which catalog over 100,000 reproducible peaks per sample to reveal cell-type-specific regulatory landscapes. These comparisons highlight evolutionary conservation of open chromatin at core promoters while exposing condition-specific gains or losses at enhancers, aiding in the dissection of regulatory evolution and perturbation responses.[13]
ATAC-seq integrates seamlessly with other epigenomic assays, such as H3K27ac ChIP-seq, to distinguish active enhancers from poised ones; regions with high ATAC accessibility and H3K27ac enrichment denote dynamically active regulatory elements driving transcription. Deep learning models trained on such paired data enhance prediction of active marks from ATAC-seq alone, improving the resolution of enhancer states in low-input samples.[18]
A notable application is illustrated in a 2015 study on human T cell differentiation, where ATAC-seq revealed dynamic chromatin accessibility changes at cytokine loci, such as IL2 and IFNG, correlating with lineage-specific activation during immune responses. These shifts in accessibility preceded transcriptional upregulation, demonstrating how ATAC-seq captures regulatory remodeling in bulk immune cell populations.
Applications in Disease and Development
ATAC-seq has been instrumental in profiling chromatin accessibility alterations in cancer, enabling the identification of regulatory changes driving tumorigenesis. A landmark study generated genome-wide accessibility profiles for 410 tumor samples across 23 cancer types from The Cancer Genome Atlas (TCGA), revealing cancer-specific open chromatin regions enriched near oncogenes.[5] Such analyses have uncovered tissue-specific enhancer landscapes that distinguish tumor subtypes and inform potential therapeutic vulnerabilities.[19] In developmental biology, ATAC-seq facilitates tracking dynamic epigenetic reprogramming during embryogenesis by capturing waves of chromatin accessibility that coincide with key lineage specification events. For instance, a multi-omics study of mouse gastrulation at single-cell resolution demonstrated sequential accessibility changes in promoter and enhancer regions, marking the transition from pluripotent to differentiated states around embryonic day 6.5 to 7.5. These accessibility waves were linked to the activation of developmental transcription factors, providing insights into the regulatory mechanisms orchestrating gastrulation.[20] Quantitative assessments often employ fold-change metrics, such as Δaccessibility = log₂(peak intensity_disease / peak intensity_control), to quantify these shifts relative to baseline states, highlighting up to 2- to 4-fold increases in accessibility at lineage-specific loci. ATAC-seq has revealed aberrant chromatin landscapes in neurological disorders, particularly in Alzheimer's disease (AD) where accessibility changes at enhancers contribute to pathological gene dysregulation. 
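The log₂ fold-change metric quoted above can be made concrete with a small worked example. The sketch below assumes depth-normalized peak intensities as input. The function names, the pseudocount, and the numeric values are illustrative and do not come from any cited study.

```python
import math


def log2_fold_change(signal_condition, signal_control, pseudocount=1.0):
    """log2 ratio of normalized peak signal between two conditions.

    The pseudocount (an illustrative choice) avoids division by zero
    when a peak has no signal in one of the conditions.
    """
    if signal_condition < 0 or signal_control < 0:
        raise ValueError("signals must be non-negative")
    ratio = (signal_condition + pseudocount) / (signal_control + pseudocount)
    return math.log2(ratio)


def fold_from_log2(delta):
    """Convert a log2 fold-change back to a linear fold-change."""
    return 2.0 ** delta


if __name__ == "__main__":
    # Hypothetical normalized intensities for a single peak.
    delta = log2_fold_change(120.0, 30.0, pseudocount=0.0)
    print(delta, fold_from_log2(delta))
```

On this scale a value of 1 corresponds to a 2-fold change and a value of 2 to a 4-fold change, so a threshold of 1.5 log₂ units is roughly a 2.8-fold difference. Negative values indicate reduced accessibility relative to the control.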
In AD, snATAC-seq profiling of human brain tissue identified altered chromatin accessibility in neuronal enhancers associated with tau pathology, correlating with disease progression.[21] These findings link open chromatin regions to the propagation of tau pathology, with differential accessibility scores showing significant fold-changes (e.g., >1.5 log₂) in disease-associated loci compared to controls.[21] Such patterns underscore ATAC-seq's utility in dissecting epigenetic drivers of neurodegeneration. As of 2024, integrations of ATAC-seq with single-nucleus multi-omics have identified disease-critical cell types in brain-related disorders.[22]
In infectious diseases, ATAC-seq elucidates host-pathogen interactions by mapping chromatin remodeling at viral integration or latency sites. For HIV, analyses of latently infected CD4+ T-cells demonstrated reduced proviral chromatin accessibility in latent reservoirs, identifying closed regions that enforce viral silencing and evade immune detection. These latency-associated sites exhibited lower ATAC-seq signal intensities, with log₂ fold-changes indicating up to 3-fold decreased accessibility relative to productively infected cells, informing strategies for reservoir reactivation.[23]
Clinically, ATAC-seq-derived accessibility signatures are emerging as biomarkers for predicting immunotherapy responses in tumors. A 2024 multi-omics study in adrenocortical carcinoma integrated ATAC-seq data to identify differential accessibility patterns in immune-related enhancers, associating open chromatin signatures with improved outcomes to checkpoint inhibitors like PD-1 blockers.[24] These signatures, quantified via fold-change metrics in peak intensities between responders and non-responders, enable non-invasive prognostic tools and guide personalized treatment.[24]
Variants and Extensions
Single-Cell ATAC-seq
Single-cell ATAC-seq (scATAC-seq) adapts the bulk ATAC-seq protocol to profile chromatin accessibility at the resolution of individual cells, enabling the study of regulatory heterogeneity within populations. The method originated in 2015 with a pioneering approach by Buenrostro et al., who captured individual cells on a programmable microfluidics platform and performed tagmentation on each, yielding sparse but informative accessibility profiles from hundreds of cells.[11] Combinatorial indexing strategies, such as sci-ATAC-seq, described by Cusanovich et al. in 2015 and applied at atlas scale in 2018, barcode nuclei across multiple rounds of splitting and pooling to scale up to tens of thousands of cells without physical isolation.[25] Droplet-based encapsulation protocols, exemplified by the 10x Genomics Chromium system, further revolutionized scalability by partitioning nuclei into emulsion droplets for barcoded tagmentation, typically requiring an input of 5,000–10,000 nuclei per sample to generate libraries from thousands to tens of thousands of cells.[6]
scATAC-seq data exhibit high sparsity, with individual cells covering only 1–10% of the genome due to limited fragment detection per nucleus, often resulting in binary accessibility matrices where peaks are scored as present or absent.[26] To address amplification biases introduced during library preparation, some protocols incorporate unique molecular identifiers (UMIs) to deduplicate PCR artifacts and improve quantification accuracy, particularly in low-input settings.[27] These features distinguish scATAC-seq from bulk methods, which average signals across populations, by revealing cell-to-cell variability in open chromatin regions.
Key applications of scATAC-seq include deconvoluting cell types in complex tissues, such as identifying rare immune subsets like exhausted T cells within tumor microenvironments, as demonstrated in profiling of human cancer samples.
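The binary, sparse accessibility matrices mentioned above can be illustrated with a toy example. The sketch below uses plain Python lists for a cells-by-peaks count matrix. The function names and counts are made up for illustration, and real analyses would normally use sparse matrix libraries instead.

```python
def binarize(counts):
    """Convert a cells x peaks count matrix into 0/1 accessibility calls."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def fraction_accessible_per_cell(binary):
    """Fraction of peaks scored as accessible in each cell."""
    return [sum(row) / len(row) for row in binary]


def sparsity(binary):
    """Fraction of all matrix entries that are zero."""
    total = sum(len(row) for row in binary)
    nonzero = sum(sum(row) for row in binary)
    return 1.0 - nonzero / total


if __name__ == "__main__":
    # Toy fragment counts: 3 cells x 5 peaks (made-up values).
    counts = [
        [0, 2, 0, 0, 1],
        [0, 0, 0, 1, 0],
        [3, 0, 0, 0, 0],
    ]
    binary = binarize(counts)
    print(binary)
    print(fraction_accessible_per_cell(binary))
    print(sparsity(binary))
```

Binarizing discards count magnitude, which is usually acceptable because most nonzero entries in scATAC-seq data are very small counts.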
It also supports trajectory inference to map differentiation paths, for instance, tracing hematopoietic lineage progression from progenitors to mature cells by linking accessibility changes to developmental stages.[28] When integrated with single-cell RNA-seq, scATAC-seq enhances regulatory inference, though it can also be used on its own to study epigenetic heterogeneity. A landmark example is the 2021 atlas by Domcke et al., profiling over 1.3 million single cells and nuclei across 30 human tissues, which uncovered subtype-specific regulatory elements driving tissue-specific gene programs.[29]
Spatial and Multimodal ATAC-seq
Spatial ATAC-seq extends traditional ATAC-seq by incorporating spatial barcoding to map chromatin accessibility while preserving tissue architecture, enabling the study of epigenetic variation in situ. One seminal method, spatial-ATAC-seq, uses microfluidic channels to deliver barcoded oligonucleotides to fixed tissue sections, followed by in situ Tn5 tagmentation and sequencing, achieving resolutions down to 20 μm per pixel, often capturing single nuclei.[12] This approach has revealed region-specific chromatin accessibility patterns in mouse embryos and adult brain tissues, such as differential accessibility in radial glia progenitors during central nervous system development.[12] More recent innovations, like SPACE-seq introduced in 2025, adapt ATAC-seq with polyA-tailed transposomes on platforms such as 10x Visium, combining spatial epigenomics with transcriptomics and lineage tracing in frozen tissues at similar micron-scale resolutions.[30] Multimodal ATAC-seq variants integrate chromatin accessibility with other omics layers, typically RNA expression, to link regulatory elements to transcriptional states in single cells, providing foundational data for spatial extensions. 
sci-CAR, developed in 2018, employs split-pool barcoding to co-profile ATAC-seq and RNA-seq in thousands of nuclei, yielding thousands of unique fragments and UMIs per cell to infer cis-regulatory correlations, such as in pseudotemporal dynamics of treated cell lines.[31] Building on this, SHARE-seq (2020) uses iterative hybridization for high-throughput joint profiling, detecting over 8,000 ATAC fragments and 2,500 RNA UMIs per cell across tissues like mouse skin, where it identifies domains of regulatory chromatin (DORCs) that overlap super-enhancers and predict cell fate potential.[32] ISSAAC-seq (2022) further enhances sensitivity through in situ RNA-DNA hybrid sequencing post-ATAC tagmentation, enabling flexible plate- or droplet-based workflows that capture chromatin heterogeneity within expression-defined cell types, such as during oligodendrocyte maturation in mouse cortex.[33]
Protocol variations in these spatial and multimodal methods emphasize in situ tagmentation on tissue slides or split-pool ligation to minimize dissociation artifacts, with resolutions typically ranging from 10-100 μm to align epigenetic data with histological features.[12][30] Key insights include spatially resolved accessibility gradients in brain regions, highlighting tissue architecture's role in developmental epigenetics, and heterogeneous chromatin states in tumor microenvironments that reveal immune cell organization relative to lymphoid structures.[12] Data outputs consist of spatially binned accessibility matrices, often co-registered with images, facilitating overlays of peak calls with gene expression or protein markers.[12][30]
Advances in 2024-2025 have incorporated AI for multimodal fusion, such as SIMO, which integrates spatial transcriptomics with single-cell ATAC and RNA data using optimal transport algorithms to predict regulatory interactions and map modalities at accuracies exceeding 80% in complex tissues like mouse brain.[34] These tools enable inference of 3D chromatin contacts from fused epigenomic layers, enhancing predictions of long-range interactions in developmental and disease contexts without direct Hi-C measurements.[34]
Data Analysis and Computational Tools
Preprocessing and Peak Identification
Raw ATAC-seq data are typically obtained as FASTQ files from Illumina sequencing platforms, consisting of paired-end reads with a minimum length of 45 base pairs.[13] Initial preprocessing involves quality assessment using tools like FastQC to evaluate base quality scores, GC content distribution, and adapter contamination, followed by adapter trimming and removal of low-quality bases with software such as Trimmomatic or Cutadapt.[4] Reads are then aligned to a reference genome, such as hg38 for human samples, using aligners like BWA-MEM or Bowtie2, targeting a unique mapping rate exceeding 80%.[4] Post-alignment processing includes removal of PCR duplicates via Picard MarkDuplicates and filtering out mitochondrial reads to ensure less than 5% mitochondrial content, as higher levels indicate poor cell quality or contamination.[35] Aligned reads are stored in BAM format, with at least 50 million non-duplicate, non-mitochondrial reads recommended for reliable open chromatin profiling.[13] Quality control metrics are essential to validate data integrity. The fraction of reads in peaks (FRiP) measures the proportion of aligned reads overlapping called peaks, with values greater than 30% indicating high-quality enrichment for accessible regions.[36] Transcription start site (TSS) enrichment score assesses nucleosome-free fragment accumulation at gene promoters, where scores above 5 (ideally >7 for hg38) confirm effective tagmentation and low background noise.[37] These metrics, computed using tools like ATACseqQC, help filter suboptimal samples before proceeding.[4] Peak calling identifies regions of open chromatin by detecting significant read enrichments. 
The widely adopted MACS2 algorithm is often run with parameters tailored for ATAC-seq, such as -f BAM -g hs --nomodel --shift -75 --extsize 150, which centers a 150 base pair window on the 5′ end of each read so that signal accumulates around Tn5 insertion sites rather than over the fragment body. In paired-end mode (-f BAMPE), MACS2 instead piles up whole fragments and does not apply --shift or --extsize. Read positions are also commonly offset by +4 base pairs on the plus strand and −5 base pairs on the minus strand to account for the 9 base pair duplication created by Tn5 insertion.[38] This process typically yields 50,000 to 100,000 peaks per sample, depending on cell type and sequencing depth.[13] Peak significance is determined using a Poisson model, where the p-value reflects the probability of observing at least the measured read count under a null background distribution:
-\log_{10}(p\text{-value}) = -\log_{10}\left( P(X \ge k) \right), \quad X \sim \text{Poisson}(\lambda)
Here, k is the observed read count in the region and \lambda is the expected count under the background model, which MACS2 takes as the largest of a genome-wide estimate and estimates from local windows around the region.[4]
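As a rough illustration of the significance score, the sketch below computes −log₁₀ of an upper-tail probability for an observed count k given an expected background count λ. It assumes a Poisson null of the kind MACS2 uses. The function names, tolerance, and example values are hypothetical, and a real peak caller additionally estimates λ from the data and corrects for multiple testing.

```python
import math


def poisson_upper_tail(k, lam, tol=1e-12, max_terms=100000):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail.

    Summing the tail (rather than computing 1 - CDF) keeps precision
    when the probability is very small.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    # First term P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = term
    i = k
    for _ in range(max_terms):
        i += 1
        term *= lam / i          # P(X = i) from P(X = i - 1)
        total += term
        # Stop once past the mode and the terms no longer contribute.
        if i > lam and term <= tol * total:
            break
    return min(total, 1.0)


def neg_log10_p(k, lam):
    """Significance score -log10(P(X >= k)); infinite if p underflows to 0."""
    p = poisson_upper_tail(k, lam)
    return float("inf") if p == 0.0 else -math.log10(p)


if __name__ == "__main__":
    # Hypothetical region: 20 reads observed where 5 are expected.
    print(neg_log10_p(20, 5.0))
```

Larger scores indicate stronger enrichment over background, and the score is 0 when the observed count gives no evidence against the null (for example k = 0).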
To mitigate biases, downsampling is applied to normalize for GC content variations, and peaks overlapping ENCODE blacklist regions—known for artefactual enrichments like mappability issues—are removed.[13] Final outputs include BED files listing peak coordinates and scores for downstream use, as well as bigWig files generated from normalized read coverage for visualization in genome browsers like IGV.[4]
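The blacklist-filtering step can be sketched as a simple interval-overlap check. The example below assumes BED-style 0-based, half-open coordinates. The function names and all coordinates are made up for illustration and do not correspond to real blacklist regions.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def remove_blacklisted(peaks, blacklist):
    """Drop peaks that overlap any blacklist interval on the same chromosome.

    peaks and blacklist are lists of tuples whose first three fields are
    (chrom, start, end); any extra fields on a peak are carried through.
    """
    by_chrom = {}
    for region in blacklist:
        by_chrom.setdefault(region[0], []).append((region[1], region[2]))
    kept = []
    for peak in peaks:
        chrom, start, end = peak[0], peak[1], peak[2]
        hit = any(
            overlaps(start, end, b_start, b_end)
            for b_start, b_end in by_chrom.get(chrom, [])
        )
        if not hit:
            kept.append(peak)
    return kept


def to_bed_lines(peaks):
    """Format peaks as tab-separated BED lines."""
    return ["\t".join(str(field) for field in peak) for peak in peaks]


if __name__ == "__main__":
    # Made-up peaks (chrom, start, end, name) and one made-up blacklist region.
    peaks = [
        ("chr1", 100, 200, "peak_1"),
        ("chr1", 500, 600, "peak_2"),
        ("chr2", 100, 200, "peak_3"),
    ]
    blacklist = [("chr1", 150, 300)]
    for line in to_bed_lines(remove_blacklisted(peaks, blacklist)):
        print(line)
```

The linear scan is adequate for a short blacklist. Dedicated interval tools such as bedtools (intersect with the -v option) perform the same exclusion efficiently on full peak sets.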