
Single-cell transcriptomics

Single-cell transcriptomics is an approach in genomics that profiles the complete set of RNA transcripts, known as the transcriptome, within individual cells to uncover cellular heterogeneity and dynamic gene expression patterns that are masked in bulk tissue analyses. This technique, primarily executed through single-cell RNA sequencing (scRNA-seq), enables high-resolution mapping of cell types, states, and transitions in complex biological systems, and it has transformed fields such as developmental biology, immunology, and oncology. By isolating and sequencing mRNA from single cells or nuclei, it reveals subtle variations in gene activity that drive differentiation, response to stimuli, and disease progression.

The origins of single-cell transcriptomics trace back to early efforts in the 1990s using quantitative PCR (qPCR) to measure gene expression in isolated cells, but the field accelerated with the integration of next-generation sequencing technologies around 2009. A landmark study by Tang et al. in 2009 demonstrated the first whole-transcriptome sequencing of single cells, using individual mouse oocytes and blastomeres, paving the way for scalable methods that could profile thousands to millions of cells simultaneously. Subsequent innovations, including microfluidics-based droplet encapsulation and barcoding introduced in the mid-2010s (e.g., Drop-seq and inDrop), dramatically reduced costs and increased throughput, making the technology accessible for large-scale projects like the Human Cell Atlas initiative launched in 2016. As of 2025, advancements in long-read sequencing, computational tools, and multimodal integrations (e.g., with spatial transcriptomics) have further enhanced sensitivity, allowing for isoform-level resolution and integration across species.

Key methodologies in single-cell transcriptomics encompass a range of platforms balancing throughput, coverage, and cost. Plate-based approaches, such as SMART-seq3, provide full-length transcript coverage for detailed analysis of splice variants and allele-specific expression but are limited to thousands of cells because each cell is handled in its own well. In contrast, droplet-based systems like 10x Genomics Chromium encapsulate cells in oil droplets with unique barcodes, enabling ultra-high-throughput profiling of up to approximately 1 million cells per run with multiplexing, though they offer shallower coverage per cell. Combinatorial indexing methods, including SPLiT-seq, further improve scalability by labeling cells through successive split-pool rounds of barcoding, supporting population-scale studies while minimizing equipment needs. Sample preparation typically involves dissociating tissues into viable single-cell suspensions, capturing polyadenylated mRNA via oligo-dT primers, reverse transcription, and library amplification before paired-end sequencing.

The significance of single-cell transcriptomics lies in its ability to catalog novel cell types, infer developmental trajectories, and elucidate mechanisms of heterogeneity in health and disease. Applications span creating comprehensive atlases of human and model organism tissues, identifying rare disease-associated cell states in cancer and neurodegeneration, and tracing evolutionarily conserved gene programs across species. Despite challenges like technical noise, dropout events, and data integration, ongoing refinements in multimodal profiling (combining transcriptomics with proteomics or spatial data) promise deeper insights into cellular ecosystems.

Background

Overview and importance

Single-cell transcriptomics encompasses a suite of technologies designed to profile the transcriptome—the full repertoire of RNA molecules, including messenger RNAs (mRNAs) and non-coding RNAs—within individual cells, thereby enabling the measurement of gene expression at unprecedented resolution. This approach captures the dynamic and heterogeneous nature of cellular states, revealing variations in gene activity that define cell types, developmental trajectories, and responses to environmental cues. By focusing on single cells rather than populations, it addresses fundamental limitations of traditional methods, providing insights into biological processes at the granular level essential for understanding complex systems. The importance of single-cell transcriptomics lies in its ability to uncover cellular heterogeneity that is obscured in bulk RNA sequencing, where gene expression signals are averaged across thousands of cells, masking rare subpopulations comprising less than 5% of a tissue. This resolution shift has revolutionized the study of dynamic processes, such as cell differentiation and lineage tracing, by identifying transient states and rare cell types that drive tissue function and disease progression. For instance, in oncology, it elucidates intratumor heterogeneity, highlighting diverse malignant subclones that contribute to therapy resistance and tumor evolution. In neuroscience, single-cell transcriptomics has illuminated the vast neuronal diversity in the brain, cataloging thousands of distinct subtypes based on unique transcriptional signatures that underpin circuit function and vulnerability to disorders. Overall, these capabilities position single-cell transcriptomics as a cornerstone of precision medicine, facilitating personalized diagnostics and targeted interventions by linking molecular profiles to individual cellular behaviors in health and disease.

Historical development

The development of single-cell transcriptomics originated in the early 1990s with pioneering efforts to quantify gene expression in individual cells using techniques like quantitative PCR (qPCR) and mRNA amplification. A foundational advance came in 1992 when Eberwine et al. demonstrated the amplification of mRNA from single live neurons via microinjection of primers, nucleotides, and enzymes into acutely dissociated rat hippocampal cells, allowing detection of specific transcripts in defined neuronal populations. This approach addressed the challenges of low RNA abundance in single cells and set the stage for more comprehensive profiling.

A critical innovation emerged with the Switching Mechanism at the 5' end of the RNA Template (SMART), introduced in 2001, which exploited the template-switching activity of certain reverse transcriptases to generate full-length cDNA from minimal RNA inputs without fragmentation or tailing. This method enabled efficient amplification while preserving transcript integrity, becoming integral to subsequent single-cell protocols.

The field transformed in 2009 with the first single-cell RNA sequencing (scRNA-seq) experiment by Tang et al., who developed an mRNA-seq assay to profile the whole transcriptome of individual mouse oocytes and blastomeres, detecting 5,270 (75%) more genes than contemporary microarrays and uncovering novel splice junctions and transcript isoforms. This proof of concept shifted focus from targeted qPCR to unbiased genome-wide analysis, revealing maternal mRNA contributions to early embryonic development. Building on SMART chemistry, the Smart-seq protocol was adapted for single cells in 2012, supporting plate-based full-length mRNA sequencing from individual circulating tumor cells or limited samples, though still constrained to low throughput (typically 1-100 cells per experiment).

Scalability surged in the mid-2010s through droplet-based microfluidics, enabling high-throughput profiling. In 2015, Macosko et al. introduced Drop-seq, a method that encapsulates single cells with barcoded mRNA-capture beads in nanoliter droplets, facilitating simultaneous RNA-seq of thousands of mouse retinal cells and identifying rare neuronal subtypes. That same year, Klein et al. introduced inDrop, using releasable hydrogel barcodes in droplets to index transcripts from embryonic stem cells, achieving comparable throughput while minimizing biases in gene detection. These innovations marked a departure from labor-intensive plate-based systems to automated, scalable platforms processing >1,000 cells per run. Commercialization in 2016 via the 10x Genomics Chromium system democratized access, integrating droplet encapsulation with gel-bead emulsions to routinely capture and barcode up to 10,000 cells per sample, accelerating adoption in diverse biological contexts.

CRISPR-Cas9 genome editing, recognized by the 2020 Nobel Prize in Chemistry, further propelled the field by enabling genome-wide perturbation screens paired with scRNA-seq, illuminating gene function at single-cell resolution. By the 2020s, throughput evolved to exceed 1 million cells per experiment, with recent integrations with spatial transcriptomics, such as sequencing-free whole-genome profiling of 23,000 human genes in single cells and tissue sections, enhancing spatiotemporal resolution of cellular heterogeneity. This progression has fundamentally expanded the ability to dissect complex tissues into their molecular constituents.

Experimental methods

Cell isolation and preparation

Cell isolation and preparation represent the foundational steps in single-cell transcriptomics, where tissues or cell suspensions are processed to yield viable individual cells or nuclei suitable for downstream RNA capture and sequencing. This process aims to preserve cellular integrity and transcriptional states while minimizing artifacts such as stress-induced gene expression changes or loss of fragile cell types. Effective isolation ensures high-quality input, typically targeting yields of 10,000 to 1,000,000 cells per sample to support comprehensive profiling across diverse populations. Mechanical dissociation techniques, such as pipetting or grinding, are commonly employed for non-adherent cells or soft tissues to gently separate cells without chemical interference, though they may result in lower yields and higher debris compared to enzymatic methods. Enzymatic dissociation, using proteases like trypsin for epithelial cells or collagenase for stromal components, is widely adopted for adherent tissues, but requires optimization to avoid prolonged exposure that induces stress responses, such as upregulation of heat shock genes. For instance, dissociation at 4°C minimizes these artifacts by reducing enzymatic activity and preserving RNA integrity. Tissue-specific protocols are essential; in brain tissue, thin slicing followed by mild enzymatic treatment helps isolate neurons while mitigating dissociation-induced signatures in glia, where aggressive methods can artifactually elevate inflammatory gene expression. Flow cytometry (FACS) enables marker-based sorting of specific cell populations prior to transcriptomics, improving purity and reducing heterogeneity, particularly for rare subtypes like immune cells in complex tissues. Microfluidic approaches, such as droplet encapsulation in platforms like 10x Genomics, integrate isolation with barcoding, allowing high-throughput processing but necessitating uniform cell suspensions to avoid encapsulation biases. 
Post-isolation, viability is assessed using trypan blue exclusion, with protocols recommending >80% viable cells to ensure reliable RNA recovery, as dead cells compromise library quality. For archival or frozen samples, cryopreservation with 10% DMSO in fetal bovine serum maintains cell viability for months, enabling retrospective studies, though thawing must be rapid to limit RNA degradation. Fixation methods, such as methanol for nuclei, preserve samples for delayed processing without significantly altering transcriptomic profiles, making it suitable for droplet-based workflows. Single-nucleus RNA sequencing (snRNA-seq) circumvents issues with fragile intact cells by isolating nuclei via homogenization, which is particularly advantageous for frozen or fibrous tissues like brain or muscle. Key challenges include doublet formation, where multiple cells are captured as one, inflating heterogeneity and requiring computational detection downstream; this risk increases with higher cell loads in droplet systems. Dissociation biases further complicate representation, as epithelial cells are more susceptible to lysis than robust fibroblasts, leading to underrepresentation of fragile types in suspensions. Recent advances like laser capture microdissection (LCM) address spatial precision, enabling isolation of targeted cells from tissue sections for transcriptomics, with 2024 protocols enhancing RNA yield from fixed samples via optimized lysis buffers. Yield optimization focuses on balancing recovery with quality, often achieving 10^4-10^6 cells through iterative protocol refinement to support low-input RNA amplification needs.
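
The link between cell load and doublet formation can be illustrated with a simple loading model. The sketch below assumes, as an idealization not stated above, that the number of cells per droplet follows a Poisson distribution with mean `lam`; the function name and the example loading values are illustrative only.

```python
import math

def multiplet_fraction(lam):
    """Fraction of cell-containing droplets that hold two or more cells.

    Assumes cells per droplet ~ Poisson(lam), a common idealization.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    p_empty = math.exp(-lam)          # droplets with no cell
    p_single = lam * math.exp(-lam)   # droplets with exactly one cell
    p_occupied = 1.0 - p_empty        # droplets with at least one cell
    return (p_occupied - p_single) / p_occupied

for lam in (0.05, 0.1, 0.3):
    frac = multiplet_fraction(lam)
    print(f"mean cells per droplet {lam}: multiplet fraction {frac:.3f}")
```

Under this model the multiplet fraction grows roughly in proportion to the loading density, which is why lower cell loads are used when doublets must be kept rare.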

RNA capture, amplification, and library preparation

In single-cell transcriptomics, RNA capture begins immediately after cell lysis to preserve the transcriptome snapshot. It usually targets messenger RNA (mRNA), which makes up only about 1–5% of total cellular RNA (approximately 0.1–1.5 pg per mammalian cell). The predominant method employs poly-A selection, where oligo-dT primers bound to magnetic beads or surfaces hybridize to the poly-A tails of mRNA molecules, enabling efficient isolation from the total RNA pool while largely excluding non-polyadenylated RNAs such as ribosomal RNA. For broader transcriptome coverage, including non-polyadenylated RNAs, alternative strategies such as ribosomal RNA depletion or total RNA capture via random priming are used, though these increase complexity and potential off-target capture. To facilitate high-throughput multiplexing, unique molecular identifiers (UMIs) and cell-specific barcodes are incorporated during capture; these short DNA sequences tag individual transcripts and cells, allowing demultiplexing post-sequencing and mitigating amplification artifacts.

Following capture, reverse transcription converts RNA to complementary DNA (cDNA) using reverse transcriptase enzymes, often with template-switching mechanisms to add universal priming sites for subsequent amplification. Amplification is essential due to the sparse starting material, employing whole-transcriptome amplification (WTA) via polymerase chain reaction (PCR) or in vitro transcription (IVT) to generate sufficient material for sequencing. Plate-based protocols like SMART-seq utilize template-switching oligo (TSO) technology to achieve full-length cDNA coverage, enabling isoform detection but at the cost of higher per-cell expense and lower throughput. In contrast, droplet-based methods, such as Drop-seq and the 10x Genomics Chromium system, focus on 3'-end capture with UMIs to reduce PCR duplicates and bias, supporting thousands of cells per run through emulsion-based barcoding in gel-beads-in-emulsion (GEMs). However, these approaches introduce 3' bias and dropouts, where longer genes or lowly expressed transcripts are underrepresented due to incomplete reverse transcription and fragmentation inefficiencies.

Library preparation transforms amplified cDNA into a sequencing-ready format, involving fragmentation to generate insert sizes compatible with short-read platforms, followed by end-repair, A-tailing, and adapter ligation for Illumina-compatible indexing. For efficiency, tagmentation enzymes like those in the Nextera system simultaneously fragment and append adapters, streamlining the process and reducing hands-on time in high-throughput workflows. In the 10x Genomics protocol, post-amplification libraries are purified and quantified before pooling, with UMIs enabling molecule-level quantification by counting unique transcript molecules rather than PCR duplicates. Recent enhancements include UMI-optimized kits from 10x Genomics that achieve capture efficiencies of approximately 30% of input mRNA, reducing dropout rates through advanced barcoding and enzymatic formulations and enhancing quantification accuracy in diverse cell types.
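
How UMIs separate unique molecules from PCR duplicates can be sketched in a few lines. This is a minimal illustration, assuming reads have already been reduced to (cell barcode, UMI, gene) tuples and ignoring sequencing errors in the UMI itself; the function name and the toy reads are hypothetical.

```python
from collections import defaultdict

def count_umis(reads):
    """Collapse reads into UMI counts per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, a simplified
    stand-in for aligned, barcode-tagged reads.
    """
    molecules = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        # Reads sharing cell, gene, and UMI are treated as PCR duplicates
        molecules[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}

reads = [
    ("AAAC", "TTGA", "Actb"),
    ("AAAC", "TTGA", "Actb"),  # PCR duplicate of the read above
    ("AAAC", "CCGT", "Actb"),
    ("GGTA", "TTGA", "Actb"),  # same UMI, different cell: a separate molecule
    ("GGTA", "ACGT", "Gapdh"),
]
counts = count_umis(reads)
print(counts)
```

Counting distinct UMIs instead of reads means that a transcript amplified many times still contributes a single count.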

Sequencing platforms and protocols

Single-cell transcriptomics relies primarily on high-throughput sequencing platforms to generate the vast amounts of data required for profiling gene expression at cellular resolution. The dominant platform is Illumina's short-read sequencing by synthesis (SBS) technology, exemplified by the NovaSeq series, which offers ultra-high throughput of up to 20 billion single reads per dual flow cell run in under two days, enabling the analysis of millions of cells per experiment. This platform's high accuracy, with per-base error rates below 0.1%, makes it ideal for standard single-cell RNA sequencing (scRNA-seq) workflows.

Emerging long-read technologies, such as Oxford Nanopore Technologies' nanopore sequencing and Pacific Biosciences' (PacBio) single-molecule real-time (SMRT) sequencing, are gaining traction for their ability to capture full-length transcripts and detect isoforms, which short-read methods often fragment. Nanopore sequencing provides real-time, ultra-long reads exceeding 100 kb with moderate accuracy (90-98%), and pilot studies in 2024 demonstrated its application to scRNA-seq for isoform-level resolution in mouse retina cells, yielding over 1.4 billion long reads from 30,000 cells. Similarly, PacBio's HiFi reads (5-30 kb) achieve high accuracy (up to 99.9%) for full-length isoform sequencing in single cells, as shown in methods like MAS-Seq, which integrate barcoding and unique molecular identifiers (UMIs) to profile isoform diversity.

| Platform | Read length | Throughput | Accuracy | Key scRNA-seq application |
| --- | --- | --- | --- | --- |
| Illumina (e.g., NovaSeq) | Short (50-300 bp) | Very high (up to 20B reads/run) | >99.9% | High-throughput gene quantification |
| Oxford Nanopore | Long (>100 kb) | Moderate-high | 90-98% | Isoform detection in single cells |
| PacBio (SMRT/HiFi) | Long (5-30 kb) | Moderate | 90-99.9% | Full-length transcript profiling |

Protocols for scRNA-seq sequencing emphasize efficiency to balance resolution and cost, typically employing paired-end reads to improve alignment and quantification accuracy over single-end reads, particularly for protocols like 10x Genomics Chromium, which recommend a minimum depth of 20,000-50,000 read pairs per cell for basic gene expression profiling. Multiplexing scales experiments significantly, with 10x Genomics enabling 5,000-20,000 cells per lane on NovaSeq through barcode demultiplexing, allowing thousands of cells to be sequenced simultaneously. UMIs are incorporated during library preparation to enable deduplication, correcting for amplification biases in downstream processing. Costs have plummeted since early scRNA-seq efforts around 2010, when profiling a single cell could exceed $1,000 due to low throughput and manual handling, to approximately $0.01-0.10 per cell in 2025 high-throughput setups on Illumina platforms, driven by economies of scale and automation. Recent innovations are exploring sequencing-free alternatives to further reduce costs and complexity, such as the 2025 reverse-padlock amplicon-encoding fluorescence in situ hybridization (RAEFISH) method, which enables whole-genome spatial transcriptomics of up to 23,000 human genes at single-molecule resolution without sequencing, using cost-effective probe synthesis ($158 per experiment versus thousands for sequencing-based approaches). This hybridization-based readout integrates well with scRNA-seq pipelines for validation, highlighting a shift toward hybrid protocols that bypass traditional sequencing for specific applications like spatial profiling.
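
The depth recommendations above translate directly into a sequencing budget. The sketch below works through that arithmetic for a hypothetical 10,000-cell experiment; it assumes, as a simplification, that one read pair consumes one of the 20 billion reads quoted for a NovaSeq run, and the function names are illustrative.

```python
def required_read_pairs(n_cells, read_pairs_per_cell):
    """Total read pairs needed to reach a target depth per cell."""
    return n_cells * read_pairs_per_cell

def run_fraction(n_cells, read_pairs_per_cell, run_capacity):
    """Share of one sequencing run consumed by the experiment."""
    return required_read_pairs(n_cells, read_pairs_per_cell) / run_capacity

n_cells = 10_000                  # hypothetical experiment size
run_capacity = 20_000_000_000     # reads per run, taken from the text above
for depth in (20_000, 50_000):    # recommended range of read pairs per cell
    total = required_read_pairs(n_cells, depth)
    share = run_fraction(n_cells, depth, run_capacity)
    print(f"{depth} read pairs/cell -> {total:,} read pairs ({share:.1%} of the run)")
```

Because a single sample uses only a small share of a run at these depths, many samples are usually pooled on one flow cell.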

Quality control and experimental considerations

Quality control in single-cell transcriptomics experiments is essential to ensure the reliability and reproducibility of data, focusing on minimizing technical artifacts during cell isolation, RNA capture, and library preparation. Standard metrics include cell viability, ideally exceeding 90%, assessed via flow cytometry or viability dyes to exclude dead cells that could introduce bias through ambient RNA contamination. RNA integrity, measured by the RNA Integrity Number (RIN), should be greater than 7 before library preparation to avoid degradation-related dropout events and skewed expression profiles. Doublet rates are targeted below 5%, achieved through optimized cell loading densities and encapsulation efficiencies in droplet-based methods to prevent artificial hybrid profiles. Capture efficiency is evaluated after the experiment, aiming for detection of over 1000 unique genes per cell to confirm adequate transcript recovery at the chosen sequencing depth.

Experimental considerations encompass several potential pitfalls unique to single-cell workflows. Batch effects arising from variations in reagent lots, dissociation enzymes, or processing times can confound biological signals, necessitating randomized or balanced experimental designs in which conditions are distributed across batches. Environmental stresses, such as hypoxia or mechanical damage during tissue dissociation, may induce transient gene expression changes (e.g., stress response pathways), which can be minimized by performing procedures under controlled normoxic conditions and using cold-active proteases for gentler tissue disaggregation. For human samples, ethical and safety protocols require Institutional Review Board (IRB) compliance to ensure informed consent and proper handling of biohazards. Troubleshooting low yields often involves optimizing lysis buffer conditions, such as maintaining a pH around 8.0 to enhance RNA release without degradation, alongside empirical testing of enzyme concentrations.
Cost-benefit analyses highlight trade-offs between high-throughput platforms (e.g., 10x Genomics, enabling >10,000 cells per run at lower per-cell cost but shallower coverage) and deep-coverage methods (e.g., SMART-seq, for fewer cells with higher transcript detection but increased expense), guiding selection based on study goals like rare cell identification. Recent 2025 guidelines emphasize standardization for reproducibility, recommending detailed protocol documentation and validation across labs. Key concepts include the incorporation of spike-in controls, such as External RNA Controls Consortium (ERCC) mixes, added at known concentrations to enable absolute quantification of transcripts and assessment of technical noise independent of cellular variability. Experimental design should incorporate power analysis to determine sample sizes, typically requiring over 1000 total cells to detect rare populations with sufficient statistical power, ensuring robust inference of cell states. Sequencing depth, while optimized in prior protocol sections, influences capture efficiency and should be balanced to achieve these metrics without unnecessary expenditure.
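
A simple form of the power analysis mentioned above asks how many cells must be profiled to observe a rare population at all. The sketch below assumes cells are sampled independently, so the number of rare cells follows a binomial distribution; the 1% frequency, the 95% power, and the function names are illustrative choices, not recommendations from a specific guideline.

```python
import math

def detection_probability(n_cells, frequency, min_cells):
    """P(at least `min_cells` of the rare type among `n_cells` sampled cells)."""
    p_fewer = sum(
        math.comb(n_cells, i) * frequency**i * (1 - frequency) ** (n_cells - i)
        for i in range(min_cells)
    )
    return 1.0 - p_fewer

def cells_needed(frequency, min_cells, power=0.95):
    """Smallest number of cells giving the requested detection probability."""
    if not 0 < frequency <= 1:
        raise ValueError("frequency must be in (0, 1]")
    n = min_cells
    while detection_probability(n, frequency, min_cells) < power:
        n += 1
    return n

# Cells required to see at least one, or at least ten, cells of a 1% population
print(cells_needed(frequency=0.01, min_cells=1))
print(cells_needed(frequency=0.01, min_cells=10))
```

Requiring more than a handful of rare cells, which clustering usually does, pushes the necessary total well beyond the number needed to capture a single one.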

Data preprocessing

Raw data processing and alignment

Raw data processing in single-cell transcriptomics begins with the conversion of sequencer output files, typically FASTQ files containing lane-demultiplexed reads, into a quantifiable format suitable for downstream analysis. This involves initial quality assessment and filtering to remove low-quality reads and adapters, often using tools like FastQC for visualization of read quality metrics and Cutadapt for trimming adapters and low-quality bases. These steps are crucial due to the sparse and noisy nature of single-cell data, where sequencing errors can disproportionately affect the limited transcripts per cell. Following quality control, demultiplexing assigns reads to individual cells based on cell barcodes, while unique molecular identifiers (UMIs) enable error correction and deduplication to mitigate PCR amplification biases. Pipelines like Cell Ranger from 10x Genomics perform barcode whitelisting against known sequences and correct UMI errors using a Hamming distance threshold of 1 (substitutions only), ensuring accurate assignment of reads to cells and genes. Similarly, STARsolo implements comparable barcode and UMI processing logic, integrating it with spliced alignment for efficient handling of droplet-based protocols. Alternative tools, such as Alevin (built on the Salmon quasi-mapping framework), offer faster processing for large datasets by using indexed transcriptome references and probabilistic assignment of multimapping reads, reducing runtime while maintaining accuracy comparable to STARsolo. Alignment maps filtered reads to a reference genome or transcriptome, typically using indexed assemblies like those from GENCODE for human or mouse annotations, to quantify expression at the gene level. 
STARsolo, an extension of the STAR aligner optimized for single-cell RNA-seq, excels in handling spliced alignments and multimappers, including intronic reads that capture pre-mRNA and improve detection of lowly expressed genes; including introns can increase usable reads by 20-40% in 3' gene expression libraries. Recent multi-center benchmarks highlight STAR's superior performance, achieving high mapping rates, typically 80-90%, on diverse datasets when paired with comprehensive annotations. Alevin further addresses multimappers through equivalence classes and decoy sequences, minimizing biases from repetitive regions. The culmination of these steps produces a gene-cell count matrix, where rows represent genes and columns represent cells, with entries denoting UMI counts (or reads after deduplication) in sparse matrix format such as MatrixMarket (.mtx) to efficiently store the high proportion of zeros inherent to single-cell data. This matrix serves as the foundational input for subsequent analyses, though challenges persist with novel or lowly expressed transcripts, where reference-based mapping may miss variants; de novo assembly is rarely applied in single-cell contexts due to the sparsity and dropout events that complicate reconstruction without a reference genome.
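
The idea of correcting barcodes within a Hamming distance of 1 can be sketched as follows. This is a simplified illustration rather than the Cell Ranger or STARsolo implementation: it considers substitutions only, ignores base qualities and barcode abundance, and discards ambiguous matches. The function names and the toy whitelist are hypothetical.

```python
def hamming_neighbors(seq, alphabet="ACGT"):
    """Yield every sequence that differs from `seq` by one substitution."""
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]

def correct_barcode(observed, whitelist):
    """Return the whitelisted barcode for `observed`, or None if unresolved."""
    if observed in whitelist:
        return observed  # exact match, nothing to correct
    candidates = {n for n in hamming_neighbors(observed) if n in whitelist}
    if len(candidates) == 1:
        return candidates.pop()  # unique one-mismatch rescue
    return None  # no candidate, or more than one equally close candidate

whitelist = {"AAAA", "CCCC", "GGGT"}
for barcode in ("AAAA", "AAAT", "AATT"):
    print(barcode, "->", correct_barcode(barcode, whitelist))
```

Reads whose barcode cannot be resolved are typically dropped, so the count matrix only contains cells with a confident barcode assignment.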

Normalization, scaling, and batch effect correction

In single-cell transcriptomics, normalization adjusts raw gene expression counts to account for technical variations such as differences in sequencing depth and capture efficiency across cells, enabling fair comparisons of biological signal. A common approach is library size normalization, where counts are scaled by the total number of reads or unique molecular identifiers (UMIs) per cell; for example, counts per million (CPM) computes the expression as (raw count / total counts per cell) × 10^6. More generally, each raw count is divided by a cell-specific size factor, an estimate of that cell's relative library size, to mitigate biases from uneven sequencing. UMI-based protocols, by collapsing duplicate molecules, avoid PCR amplification biases that disproportionately affect lowly expressed genes in amplification-heavy workflows.

Single-cell-specific normalization methods address sparsity and noise inherent to scRNA-seq data. The scran package employs a pooling strategy, summing counts across groups of similar cells to robustly estimate size factors via deconvolution, which outperforms global scaling in heterogeneous datasets by borrowing information across cells. Similarly, sctransform uses regularized negative binomial regression to model technical noise, producing Pearson residuals that normalize and stabilize variance simultaneously; these residuals are defined as (observed - expected) / sqrt(variance), capturing deviations from the mean-variance trend while handling zero inflation. Zero-inflated models like ZINB-WaVE incorporate a dropout component in a negative binomial framework, estimating latent factors for both expression and dropout probabilities to yield denoised low-dimensional representations suitable for downstream analysis.

Scaling transforms normalized data to meet assumptions of analytical methods, such as linearity in regression or normality in clustering.
Log-transformation, typically log1p (log(1 + normalized count)), reduces skewness and handles zeros by adding a pseudocount of 1, though it can distort lowly expressed genes if not preceded by proper normalization. Variance stabilization further adjusts for heteroscedasticity, where highly expressed genes show inflated variance; methods like deviance residuals from generalized linear models provide a stabilized scale by measuring the contribution of each observation to the model's deviance. Batch effects, arising from systematic technical differences between experimental runs (e.g., reagent lots or instruments), confound biological interpretations and require correction to integrate datasets. ComBat, an empirical Bayes method originally for microarrays, adjusts for known batch covariates by estimating and removing additive and multiplicative effects while protecting biological variance; adaptations like ComBat-seq extend it to count data using negative binomial assumptions. Harmony integrates batches by projecting cells into a shared corrected space via iterative linear transformations on principal components, preserving cell-type structure in large datasets. The mutual nearest neighbors (MNN) algorithm, implemented efficiently as fastMNN, identifies cross-batch cell pairs with similar expression profiles and corrects via low-rank approximations, effectively aligning subspaces without over-correcting rare populations. Probabilistic approaches like scVI employ variational autoencoders to model counts with batch as a latent covariate, enabling joint normalization, imputation, and integration in a Bayesian framework that quantifies uncertainty. Approaches to handle dropouts, such as imputation, carry pitfalls including artificial reduction of gene expression variability and introduction of false correlations, potentially masking true biological heterogeneity; thus, model-based residuals are preferred over direct imputation.
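
The normalization formulas in this section can be written out directly. The sketch below shows CPM scaling, the log1p transform, and a Pearson residual under a negative binomial model whose variance is assumed to be mu + mu^2 / theta; the function names and example values are illustrative, and the code does not reproduce the regularized fitting that sctransform performs to obtain `mu` and `theta`.

```python
import math

def cpm(counts):
    """Counts per million for one cell: (count / total counts) * 10^6."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total * 1e6 for c in counts]

def log1p_cpm(counts):
    """log(1 + CPM), which keeps zeros at zero and compresses large values."""
    return [math.log1p(x) for x in cpm(counts)]

def pearson_residual(observed, mu, theta):
    """(observed - expected) / sqrt(variance) with NB variance mu + mu^2/theta."""
    variance = mu + mu**2 / theta
    return (observed - mu) / math.sqrt(variance)

cell = [0, 1, 3, 0]  # toy UMI counts for four genes in one cell
print(cpm(cell))
print(log1p_cpm(cell))
print(pearson_residual(observed=6, mu=2.0, theta=2.0))
```

In practice the expected value `mu` depends on each cell's sequencing depth, which is how residual-based methods normalize and stabilize variance in one step.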

Core data analysis

Dimensionality reduction techniques

Single-cell transcriptomics generates high-dimensional datasets, often comprising expression profiles for 20,000 genes across 10,000 or more cells, exacerbating the curse of dimensionality where Euclidean distances lose interpretability and analyses become computationally intensive. Dimensionality reduction techniques address this by projecting data into lower-dimensional spaces, typically 2D or 3D, to facilitate visualization of cell populations and enable downstream analyses like clustering while retaining key biological variance. These methods are generally applied following data normalization and selection of highly variable genes (HVGs), such as the top 2,000 genes identified via mean-variance modeling to prioritize biologically informative features over technical noise.

Principal component analysis (PCA) is a foundational linear technique that transforms the data by finding orthogonal axes of maximum variance. In single-cell RNA-seq (scRNA-seq), the first 10–50 principal components commonly capture 50–80% of the total variance, providing a global overview of data structure. PCA is computed via singular value decomposition (SVD) on the centered expression matrix X (cells × genes):

$$X = U \Sigma V^T$$

where U contains the left singular vectors, \Sigma the singular values, and V the right singular vectors representing principal components; the scores for cells are given by X V. To determine the number of components to retain, elbow plots visualize the variance explained by each component, revealing an inflection point beyond which additional components add minimal biological signal.

Non-linear methods like t-distributed stochastic neighbor embedding (t-SNE) excel at preserving local neighborhoods, making them ideal for revealing fine-grained cell subtypes in scRNA-seq visualizations. t-SNE minimizes the Kullback-Leibler (KL) divergence between joint probability distributions in high- and low-dimensional spaces:

$$\mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} P_{ij} \log \frac{P_{ij}}{Q_{ij}}$$

where P and Q represent pairwise similarities between cells i and j in the original and embedded spaces, respectively. The perplexity parameter, typically set to 30–50 for scRNA-seq datasets of thousands of cells, controls the balance between local and global structure but requires tuning to avoid overcrowding or fragmentation.

Uniform manifold approximation and projection (UMAP) offers a faster, graph-based alternative to t-SNE, constructing a fuzzy topological representation of the data before optimizing low-dimensional embeddings, often with default n_neighbors=15 to capture local cell similarities in scRNA-seq. UMAP better preserves both local clusters and global relationships compared to t-SNE, enabling scalable analysis of large datasets while reducing computational demands.

Single-cell-specific adaptations emphasize HVG pre-selection to mitigate sparsity and noise, enhancing reduction quality. Batch-aware approaches, such as scPCA frameworks, integrate condition effects and technical batches directly into the decomposition to prevent confounding biological signals. Methods such as PHATE, introduced in 2017, use diffusion-based distances to preserve trajectory-like structures in scRNA-seq data, offering improved global continuity over traditional methods as demonstrated in applications through 2024.
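
The choice of how many principal components to keep follows from the singular values: the variance along component k is proportional to the square of its singular value, so the fraction explained is that square divided by the sum of all squares. The sketch below assumes the full set of singular values of the centered matrix is available; the function names, toy values, and thresholds are illustrative.

```python
def explained_variance_ratio(singular_values):
    """Fraction of total variance carried by each principal component."""
    squared = [s * s for s in singular_values]
    total = sum(squared)
    return [v / total for v in squared]

def components_for_variance(singular_values, target):
    """Smallest number of leading components whose cumulative share >= target."""
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratio(singular_values), start=1):
        cumulative += ratio
        if cumulative >= target:
            return k
    return len(singular_values)

singular_values = [4.0, 2.0, 2.0, 1.0]  # toy values, sorted in decreasing order
print(explained_variance_ratio(singular_values))
print(components_for_variance(singular_values, target=0.75))
print(components_for_variance(singular_values, target=0.90))
```

A cumulative threshold is one heuristic among several; the elbow of the per-component curve often suggests a similar cutoff.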

Clustering and cell type annotation

Clustering in single-cell transcriptomics involves partitioning cells into groups based on similarities in their gene expression profiles, typically after dimensionality reduction to low-dimensional embeddings such as those from PCA or UMAP. Graph-based methods are widely used, constructing a k-nearest neighbors (k-NN) graph where edges represent expression similarity, followed by community detection algorithms like Louvain or its improved variant, Leiden. These algorithms optimize modularity to identify clusters, with the resolution parameter (often tuned between 0.4 and 1.2) controlling the granularity of partitioning—lower values yield broader clusters, while higher values promote finer subdivisions. Density-based clustering, such as HDBSCAN, offers an alternative by identifying clusters as high-density regions in the expression space without assuming spherical shapes, making it robust to varying cluster densities common in heterogeneous tissues. Model-based approaches, including Gaussian mixture models, assume data arise from a mixture of multivariate normal distributions and use expectation-maximization to assign cells probabilistically to components, providing uncertainty estimates for assignments. Overclustering, where a high resolution is initially applied to generate many small clusters, allows subsequent merging to resolve rare subtypes while avoiding underclustering of diverse populations. Cluster quality is often validated using the silhouette score, which measures how well cells fit within their cluster relative to others; scores above 0.5 indicate strong separation, though high noise in single-cell data can lower averages to 0.3-0.6. Prior to clustering, doublet removal is essential to eliminate artifacts from co-encapsulated cells, with tools like Scrublet simulating doublets and scoring cells (threshold >0.2 for removal) based on their similarity to simulated aggregates. 
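The silhouette score mentioned above can be sketched in a few lines. This is a minimal illustration using only the Python standard library and Euclidean distance on a low-dimensional embedding; the function names `silhouette_scores` and `mean_silhouette` and the toy coordinates are hypothetical, and a real analysis would more likely rely on an established library implementation.

```python
import math


def silhouette_scores(points, labels):
    """Per-cell silhouette values s = (b - a) / max(a, b), Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Convention: a cell alone in its cluster gets a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other cells of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the cells of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


def mean_silhouette(points, labels):
    """Average silhouette over all cells."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Toy 2D embedding (assumed values): two tight, well-separated clusters.
embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
cluster_labels = [0, 0, 1, 1]
print(round(mean_silhouette(embedding, cluster_labels), 3))
```

Computing the score per cell rather than only as an average makes it possible to flag individual cells that sit between clusters, which can be useful when deciding whether overclustered groups should be merged.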
Cell type annotation assigns biological identities to clusters by leveraging prior knowledge of gene expression patterns. Marker gene identification finds cluster-specific genes using statistical tests like the Wilcoxon rank-sum test, which compares expression distributions between a cluster and the rest of the cells to highlight enriched genes; these are then interpreted manually against marker databases such as PanglaoDB. Automated reference-based mapping, such as SingleR, correlates query profiles with annotated reference expression datasets, assigning labels based on Spearman rank correlations to known cell type profiles. Recent advancements incorporate AI foundation models for automated annotation; for instance, scExtract uses large language models to process raw data through to labeling, achieving high accuracy on diverse datasets by integrating contextual gene knowledge. These methods reduce manual effort while improving consistency, particularly for novel cell states not in traditional references.

Differential expression analysis

Differential expression (DE) analysis in single-cell transcriptomics identifies genes whose expression levels differ significantly between predefined groups of cells, such as clusters identified from prior analyses, facilitating the interpretation of cellular heterogeneity and functional differences. This process is essential for pinpointing marker genes that define cell types or states, but it must account for the unique characteristics of single-cell RNA sequencing (scRNA-seq) data, including high dropout rates, sparsity, and technical variability. Unlike bulk RNA-seq, where averaging across many cells smooths noise, scRNA-seq requires methods that model the discreteness and zero-inflation of counts to avoid false positives or negatives.

Methods for DE analysis can be broadly categorized into general-purpose or bulk-derived approaches and those specifically designed for single-cell data. The first group includes the nonparametric Wilcoxon rank-sum test implemented in Seurat's FindMarkers function and adaptations of DESeq2 via pseudobulk aggregation, where counts are summed across cells within each sample or group so that samples, rather than individual cells, serve as replicates for established negative binomial models. These methods perform well when cell numbers per group are sufficient but may underperform with sparse data due to unmodeled zeros. In contrast, single-cell-specific methods like MAST (Model-based Analysis of Single-cell Transcriptomics) explicitly account for zero-inflation with a two-part hurdle model: one component models whether a gene is detected in a cell, and the other models its expression level in the cells where it is detected. Similarly, edgeR's quasi-likelihood framework extends negative binomial modeling to handle overdispersion and low cell counts in scRNA-seq, using empirical Bayes shrinkage for robust dispersion estimates. Recent tools, such as DiSC (2025), provide fast DE analysis tailored for large-scale scRNA-seq datasets.
The typical workflow for DE analysis involves comparing expression between two or more groups, computing log2 fold-changes (often requiring an absolute value above 1 for meaningful differences), and assessing statistical significance via p-values. Because thousands of genes are tested, the p-values are adjusted for multiple testing using the Benjamini-Hochberg procedure, which ranks them and scales each by the number of tests divided by its rank to control the false discovery rate (FDR), typically at <0.05. For rare cell populations, methods must consider statistical power, as low cell numbers can inflate variance; tools like edgeR incorporate quasi-likelihood tests to maintain reliability even with few observations.

A key statistical test in many frameworks, including DESeq2 adaptations and scDE tools, is the Wald test, which assesses the significance of the log fold-change coefficient \beta by computing the z-score z = \beta / \mathrm{SE}(\beta), where SE is the standard error derived from the variance-covariance matrix of the model; under the null hypothesis, z approximately follows a standard normal distribution, yielding a p-value. Post-DE, gene ontology (GO) enrichment on significant genes uses the hypergeometric test to evaluate overrepresentation of pathways, calculating the probability of observing at least k annotated genes in the DE list by chance given the total annotations.

To mitigate biases, analysts must avoid overinterpreting dropouts as biological zeros, as technical factors like capture efficiency often contribute; methods like MAST help by estimating the proportion of truly expressing cells. Together, these steps help ensure that identified differentially expressed genes provide reliable insights into cellular functions without undue emphasis on noise.
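The three statistical steps described above (Wald test, Benjamini-Hochberg adjustment, and hypergeometric enrichment) can be sketched with the Python standard library alone. The function names and the toy coefficients below are hypothetical and chosen for illustration; this is not the code path of DESeq2, edgeR, or any other package, which add dispersion estimation and shrinkage on top of these basics.

```python
import math


def wald_p_value(beta, se):
    """Two-sided p-value for z = beta / SE(beta) under a standard normal null."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original gene order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def hypergeom_enrichment_p(k, n_de, n_annotated, n_total):
    """P(X >= k): chance of at least k annotated genes among n_de DE genes."""
    upper = min(n_de, n_annotated)
    tail = sum(
        math.comb(n_annotated, x) * math.comb(n_total - n_annotated, n_de - x)
        for x in range(k, upper + 1)
    )
    return tail / math.comb(n_total, n_de)


# Toy (beta, SE) pairs for three genes; values are assumed, not real estimates.
coefficients = [(2.0, 0.5), (0.3, 0.4), (-1.5, 0.5)]
raw_p = [wald_p_value(beta, se) for beta, se in coefficients]
adj_p = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in adj_p]
print(significant)
```

Adjusting from the largest p-value downward with a running minimum keeps the adjusted values monotone in the raw p-values, so a gene with a smaller raw p-value never ends up with a larger adjusted one.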

Advanced analyses

Trajectory and pseudotime inference

Trajectory inference in single-cell transcriptomics aims to reconstruct continuous biological processes, such as cell differentiation or response to stimuli, from static snapshots of gene expression data by ordering cells along a pseudotemporal axis. This approach assumes that cells captured at a single time point represent a progression through developmental states, enabling the inference of dynamic changes without temporal sampling. Seminal methods leverage low-dimensional embeddings of the data to model trajectories as paths on a manifold, approximating the underlying biological continuum. Key methods include Diffusion Pseudotime (DPT), which computes pseudotime as the geodesic distance along diffusion maps, capturing branching trajectories by modeling cell transitions via random walks on the data manifold. In DPT, pseudotime for a cell i relative to a root is defined as the diffusion distance \tau_i = d_{\text{diff}}(i, root), normalized to a [0,1] scale, where d_{\text{diff}} approximates the geodesic distance on the manifold embedded by the diffusion kernel. Monocle, particularly in its second version using Discriminative Dimensionality Reduction via Tree embedding (DDRTree), constructs a principal tree by embedding cells into a low-dimensional space and fitting a minimum spanning tree to detect branches, allowing robust inference of complex, multifurcating trajectories. Slingshot builds on pre-computed clusters by constructing a minimum spanning tree across cluster centroids and fitting smooth curves (e.g., via principal curves) to infer lineages and assign pseudotimes along each path. These methods have been benchmarked for accuracy and scalability, with Slingshot, Monocle DDRTree, and DPT showing superior performance on linear and branching topologies in datasets like hematopoiesis. The typical workflow begins with root selection, often guided by known progenitor markers or automated scoring to identify the starting point of the trajectory. 
Pseudotime is then assigned to each cell as a continuous value from 0 (root) to 1 (endpoint), representing progress along the inferred path, while branch detection identifies bifurcation points where cells diverge into distinct lineages. These approaches assume deterministic progression, where gene expression changes follow a fixed sequence, though stochasticity can introduce noise. Validation is commonly performed using well-characterized systems, such as hematopoietic differentiation, where inferred trajectories align with established lineage markers like CD34 for progenitors. An extension incorporating dynamical information is RNA velocity, which models the rate of change in gene expression by distinguishing unspliced (nascent) from spliced mRNA ratios to predict future cell states and refine trajectory directions. The velocity for a gene, representing the rate of change ds/dt, is estimated as v_g = \beta_g u_g - \gamma_g s_g, where u_g and s_g are unspliced and spliced mRNA counts, \beta_g is the splicing rate, and \gamma_g is the degradation rate of spliced mRNA. Parameters are typically fitted assuming steady-state dynamics across cells, enabling projection of trajectories beyond observed data. Recent advances in time-resolved single-cell sequencing, such as Zman-seq for timestamping transcripts during immune responses, further enhance trajectory accuracy by integrating real temporal data with pseudotime inferences.
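The velocity equation above can be illustrated for a single gene. Under the steady-state assumption ds/dt = 0, the relation \beta_g u_g = \gamma_g s_g implies that unspliced counts are proportional to spliced counts, so the ratio \gamma_g / \beta_g can be estimated as a regression slope through the origin. The sketch below fixes \beta_g = 1 and fits across all cells, both of which are simplifying assumptions; the function names and counts are hypothetical.

```python
def fit_gamma_steady_state(unspliced, spliced):
    """Least-squares slope through the origin of u on s, i.e. gamma / beta."""
    denom = sum(s * s for s in spliced)
    if denom == 0:
        raise ValueError("spliced counts are all zero; slope is undefined")
    return sum(u * s for u, s in zip(unspliced, spliced)) / denom


def velocity(u, s, gamma, beta=1.0):
    """v = beta * u - gamma * s; positive values suggest induction."""
    return beta * u - gamma * s


# Toy counts for one gene across three cells assumed to be at steady state.
u_counts = [0.5, 1.0, 1.5]
s_counts = [1.0, 2.0, 3.0]
gamma = fit_gamma_steady_state(u_counts, s_counts)

# A cell with more unspliced mRNA than the steady-state line predicts.
v_up = velocity(u=2.0, s=2.0, gamma=gamma)
# A cell with less unspliced mRNA than predicted.
v_down = velocity(u=0.5, s=2.0, gamma=gamma)
print(gamma, v_up, v_down)
```

Cells above the fitted line carry excess nascent transcript and receive a positive velocity, while cells below it receive a negative one, which is what gives the inferred trajectory a direction.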

Gene regulatory network inference

Gene regulatory network (GRN) inference in single-cell transcriptomics aims to reconstruct the interactions between transcription factors (TFs) and their target genes from scRNA-seq data, revealing regulatory mechanisms at cellular resolution. Unlike bulk RNA-seq, single-cell data's sparsity and heterogeneity pose unique challenges, such as high dropout rates leading to zero-inflated expression profiles, necessitating the use of highly variable genes (HVGs) to focus on informative features and mitigate noise. Methods typically leverage co-expression patterns, motif enrichment, or causal inference to identify direct regulatory links, distinguishing them from indirect associations through pruning or scoring techniques.

Correlation-based approaches, such as those using mutual information (MI), quantify dependencies between gene expression levels to infer potential regulatory edges. MI measures the shared information between two variables X and Y as I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}, where p(x,y) is the joint probability and p(x), p(y) are marginals; higher values indicate stronger associations. The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe), originally developed for bulk data, applies MI followed by the data processing inequality (DPI) to prune indirect edges, assuming the weakest link in a triplet is indirect, and has been adapted for single-cell contexts by focusing on HVGs. Machine learning extensions like GENIE3 use random forests to predict TF-target links by modeling each gene's expression as a function of others, outperforming correlation methods in benchmarks on simulated scRNA-seq data.

Integrative pipelines, exemplified by SCENIC (Single-Cell rEgulatory Network Inference and Clustering), combine co-expression with prior knowledge of TF binding motifs to infer regulons, that is, modules of genes co-regulated by a TF.
The workflow begins with GENIE3 to identify co-expressed TF-gene pairs, followed by cisTarget for motif scanning in promoter regions to enrich direct targets, and AUCell to score regulon activity per cell based on gene set enrichment. This modular approach addresses sparsity by aggregating evidence and has been validated against ChIP-seq data, showing high overlap with in vivo binding sites in immune and tumor cells. For dynamic processes, Granger causality methods like SINGE extend inference by treating pseudotime-ordered cells as time series, testing if past values of one gene predict another's future expression beyond autocorrelation, thus capturing regulatory directionality in trajectories. Key challenges include distinguishing direct from indirect regulation, as co-expression alone cannot confirm causality, and sparsity amplifies false positives; solutions involve indirect edge pruning in ARACNe or motif-based filtering in SCENIC. Validation often overlays inferred networks with orthogonal data like ChIP-seq, where SCENIC regulons recover up to 70% of known TF targets in benchmarks. These methods enable the discovery of cell-type-specific regulators, such as in differentiation pathways, without relying on population averages.
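The mutual information formula and the DPI pruning step described in this section can be sketched as follows. This is a simplified illustration under stated assumptions: expression is binarized by detection before computing MI, the pruning drops the weakest edge of every fully connected triplet in a single pass, and the gene names are hypothetical. It is not the ARACNe implementation, which uses its own MI estimator and tolerance settings.

```python
import math
from collections import Counter
from itertools import combinations


def binarize(counts):
    """Discretize expression as detected (1) or not detected (0)."""
    return [1 if c > 0 else 0 for c in counts]


def mutual_information(x, y):
    """I(X;Y) in nats for two equal-length sequences of discrete states."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (xv, yv), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[xv] / n) * (py[yv] / n)))
    return mi


def dpi_prune(edge_mi):
    """Drop the weakest edge of each fully connected gene triplet.

    edge_mi maps frozenset({gene_a, gene_b}) to an MI value.
    """
    genes = sorted({g for edge in edge_mi for g in edge})
    to_remove = set()
    for trio in combinations(genes, 3):
        edges = [frozenset(pair) for pair in combinations(trio, 2)]
        if all(e in edge_mi for e in edges):
            to_remove.add(min(edges, key=edge_mi.get))
    return {e: v for e, v in edge_mi.items() if e not in to_remove}


# Toy MI values (assumed): TF1 -> G1 -> G2 with a weak indirect TF1-G2 edge.
network = {
    frozenset({"TF1", "G1"}): 0.9,
    frozenset({"G1", "G2"}): 0.8,
    frozenset({"TF1", "G2"}): 0.3,
}
pruned = dpi_prune(network)
print(sorted(tuple(sorted(e)) for e in pruned))
```

Binarizing by detection is only one possible discretization; it is used here because it keeps the joint distribution small enough to inspect by hand.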

Multi-omics and dataset integration

Single-cell transcriptomics often requires integration with other omics modalities, such as epigenomics (e.g., ATAC-seq), proteomics, or spatial transcriptomics, to capture a more holistic view of cellular states, as well as harmonization across multiple datasets to mitigate batch effects from technical variations like different sequencing platforms or experimental conditions. This process enables the identification of shared biological signals while preserving dataset-specific features, facilitating downstream analyses like cell type annotation and trajectory inference.

One foundational approach for dataset integration is canonical correlation analysis (CCA), as implemented in the Seurat framework, which aligns multiple single-cell RNA-seq (scRNA-seq) datasets by maximizing the correlation between their low-dimensional representations. In CCA, for two datasets X and Y, the canonical vectors A and B are found by solving \max_{A,B} \operatorname{corr}(XA, YB), equivalent to \arg\max_{A,B} \operatorname{cov}(XA, YB) under unit variance constraints, projecting data into a shared subspace that captures aligned variance. This method has been widely adopted for batch correction, demonstrating effective removal of technical artifacts while retaining biological heterogeneity in benchmarks across diverse tissues.

Anchor-based methods build on this by identifying "anchors" (pairs of mutual nearest neighbors, or MNNs, across datasets) to propagate corrections without assuming linear relationships, as seen in Seurat's integration workflow. These anchors, often selected via k-nearest neighbors in the CCA subspace, enable label transfer and alignment even for datasets with substantial batch effects, improving cell type concordance in large-scale atlases.
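The idea of mutual nearest neighbors behind anchor selection can be sketched with a brute-force search. This is a minimal illustration, assuming both batches have already been projected into a shared low-dimensional space; the function names and coordinates are hypothetical, and it omits the anchor scoring and filtering that integration tools apply afterwards.

```python
import math


def k_nearest(query, reference, k):
    """Indices of the k reference points closest to `query` (Euclidean)."""
    order = sorted(
        range(len(reference)), key=lambda j: math.dist(query, reference[j])
    )
    return set(order[:k])


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Pairs (i, j) where a_i and b_j are each among the other's k nearest."""
    nn_a_to_b = [k_nearest(a, batch_b, k) for a in batch_a]
    nn_b_to_a = [k_nearest(b, batch_a, k) for b in batch_b]
    return [
        (i, j)
        for i, neighbours in enumerate(nn_a_to_b)
        for j in sorted(neighbours)
        if i in nn_b_to_a[j]
    ]


# Toy shared-space coordinates (assumed): the third cell of batch B has no
# counterpart in batch A and should not form an anchor when k = 1.
batch_a = [(0.0, 0.0), (10.0, 10.0)]
batch_b = [(0.1, 0.0), (10.0, 10.2), (4.0, 4.0)]
anchors = mutual_nearest_neighbors(batch_a, batch_b, k=1)
print(anchors)
```

Requiring the neighbor relation to hold in both directions is what keeps a population present in only one batch from being forced onto an unrelated population in the other.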
For multi-omics integration, tools like Multi-Omics Factor Analysis (MOFA) decompose multiple modalities (e.g., RNA, ATAC-seq, proteomics) into shared latent factors that explain coordinated variation, while allowing modality-specific factors for unique signals. MOFA+ extends this to single-cell data by incorporating sparsity and scalability, revealing differentiation trajectories in applications like hematopoietic stem cell profiling. Joint embedding techniques, such as scVI (single-cell variational inference), leverage conditional variational autoencoders to learn a probabilistic latent space that integrates datasets or modalities by modeling batch effects as covariates in the generative process. This approach aligns cells via amortized inference, enabling transfer learning for annotation across unseen datasets and outperforming linear methods in preserving global structure during multi-batch integration. Modality alignment often relies on shared cells or landmarks, such as dissociated cells profiled in both scRNA-seq and spatial platforms, to map transcriptomic profiles onto tissue contexts. Recent advances include foundation models like CellFM, a transformer-based architecture with 800 million parameters pretrained on transcriptomics from 100 million human cells, which facilitates zero-shot integration and multi-omics querying by embedding diverse datasets into a unified representation. For spatial-transcriptomics integration, methods like those in Seurat or SpatialScope align Visium data with scRNA-seq references using anchors or generative models, deconvoluting spots into cell-type compositions and localizing rare subpopulations with sub-spot resolution. These strategies underscore the shift toward scalable, multimodal frameworks that enhance biological interpretability without extensive retraining.

Applications

Developmental and stem cell biology

Single-cell transcriptomics has revolutionized the study of developmental and stem cell biology by enabling the resolution of cellular hierarchies, lineage relationships, and dynamic gene expression changes at unprecedented granularity. In embryonic development, it facilitates lineage tracing by mapping transcriptional states across stages, revealing how progenitor cells diversify into specialized lineages during processes like gastrulation. This approach identifies transient cell states and rare populations that are undetectable in bulk analyses, providing insights into the molecular mechanisms driving morphogenesis and organogenesis. A prominent application is in constructing gastrulation atlases from human embryos using single-cell RNA sequencing (scRNA-seq), which has delineated the emergence of germ layers and mesodermal subtypes in the 2020s. For instance, spatially resolved scRNA-seq of a gastrulating human embryo at Carnegie Stage 7 identified distinct transcriptional profiles for epiblast, primitive endoderm, and emerging mesoderm, highlighting the coordination of cell migration and differentiation. These atlases, integrating datasets from zygote to gastrula stages, serve as references for validating in vitro embryo models and uncovering conserved developmental programs across species. Earlier foundational work, such as the 2019 scRNA-seq profiling of over 116,000 cells from mouse embryos spanning gastrulation, mapped the molecular transitions from naive pluripotency to lineage commitment, establishing benchmarks for trajectory reconstruction in mammalian development. In stem cell biology, scRNA-seq tracks differentiation trajectories from induced pluripotent stem cells (iPSCs), such as those converting to neurons, to dissect potency loss and branching decisions. 
Pseudotime analysis along these trajectories quantifies the progressive decline in pluripotency markers like NANOG and OCT4, correlating with upregulation of lineage-specific genes during iPSC-to-neuron differentiation. A 2022 study using scRNA-seq on human iPSC-derived dopaminergic neurons revealed branching points where neural progenitors diverge into midbrain or forebrain fates, guided by transcription factors like FOXA2 and LMX1A. Such analyses inform protocols for generating pure neuronal populations by identifying marker panels—e.g., SOX2 for neural progenitors and TUJ1 for mature neurons—for fluorescence-activated cell sorting (FACS) enrichment. Insights into rare progenitors, such as neural crest cells comprising approximately 1% of early embryonic cells, have been gained through scRNA-seq, which identifies unique markers like NGFR and SOX10 to isolate these multipotent cells for transplantation studies. ScRNA-seq also enables modeling of the Waddington landscape, visualizing developmental potential as a topographic map of transcriptional states where valleys represent stable cell fates and ridges denote barriers to transitions. By integrating pseudotime with diffusion maps, this framework reconstructs the energy landscape of stem cell differentiation, showing how stochastic fluctuations drive cells toward committed states during potency loss. A 2024 study applied this to single-cell data from differentiating neural stem cells, quantifying flux along trajectories to predict bifurcation points in gliogenesis. These models, briefly referencing trajectory inference methods like Monocle, underscore how scRNA-seq deciphers the regulatory logic of pluripotency maintenance and loss in both in vivo development and in vitro reprogramming. 
For glia differentiation, a 2024 practical guide in neuroscience highlights scRNA-seq workflows to profile astrocyte and oligodendrocyte trajectories from neural progenitors, emphasizing key markers like GFAP and OLIG2 for sorting and validation.

Immunology and disease modeling

Single-cell transcriptomics has revolutionized the study of immune cell diversity by enabling detailed mapping of leukocyte populations across human tissues, as demonstrated in the Human Cell Atlas initiative's comprehensive immune landscape. A landmark effort profiled over 20,000 immune cells from multiple organs, revealing tissue-specific signatures in T cells, B cells, and myeloid lineages that underpin immune homeostasis and response to perturbations. This atlas highlighted the prevalence of T cells as the dominant immune subset in most tissues, providing a reference for identifying rare or disease-associated states. In immunology, single-cell transcriptomics combined with T and B cell receptor (TCR/BCR) sequencing has facilitated clonotype tracking to dissect adaptive immune responses, particularly in infectious diseases like COVID-19. For instance, large-scale analyses of peripheral blood mononuclear cells from over 196 patients identified SARS-CoV-2-specific clonotypes enriched in activated CD4+ T cells and spike-reactive B cells, correlating disease severity with clonal expansion and polyfunctional responses. Such profiling revealed persistent memory clones post-infection, informing vaccine design and long-term immunity. Applications in disease modeling extend to oncology, where single-cell transcriptomics elucidates tumor microenvironments (TMEs) by characterizing immune exhaustion and heterogeneity. In pancreatic ductal adenocarcinoma (PDAC), profiling of tumor-infiltrating lymphocytes uncovered exhausted CD8+ T cells marked by high PD-1 and TIM-3 expression, driven by chronic antigen exposure and immunosuppressive cytokines, which correlate with poor prognosis. Integrating spatial transcriptomics further revealed perineural invasion patterns, with malignant cells co-opting nerves via ligand-receptor interactions to promote metastasis. 
Recent reviews emphasize how these insights enable personalized therapies, such as targeting exhaustion pathways to reinvigorate T cells in immunotherapy-resistant tumors. In hematologic malignancies like acute myeloid leukemia (AML), single-cell transcriptomics has exposed intratumor heterogeneity beyond genomic subtypes, identifying distinct transcriptional states linked to stemness and differentiation blocks. For example, analyses of patient-derived blasts delineated rare progenitor-like subpopulations with upregulated survival pathways, contributing to relapse risk. Differential expression (DE) analyses in these contexts have pinpointed drug resistance mechanisms, such as upregulated efflux pumps and anti-apoptotic genes in tolerant persister cells during chemotherapy exposure. Key concepts in these applications include inferring cell-cell communication through ligand-receptor pairs, as implemented in tools like CellPhoneDB, which statistically evaluates interactions in TMEs to uncover immunosuppressive signaling, such as PD-L1/PD-1 between tumor-associated macrophages and T cells. Additionally, single-cell RNA-seq enables aneuploidy detection by quantifying chromosome-wide expression imbalances, revealing mosaic karyotypes in cancer cells that drive genomic instability and therapeutic evasion. These approaches collectively enhance disease modeling by linking cellular states to pathological outcomes.

Emerging computational tools and databases

Recent advancements in single-cell transcriptomics have been driven by sophisticated computational pipelines that streamline data processing and analysis. Scanpy, a Python-based toolkit, enables scalable preprocessing, visualization, clustering, and trajectory inference for large datasets, with benchmarks showing it processes data over three times faster than alternatives like Seurat for growing dataset sizes. Seurat v5, an R package, introduces enhanced multimodal integration and graph-based methods for quality control and exploration, supporting efficient handling of single-cell RNA-seq data while addressing scalability for millions of cells. These pipelines facilitate reproducible workflows through integration with containerization tools like Docker, which encapsulate environments to mitigate variations in software dependencies and ensure consistent results across studies. Artificial intelligence models are transforming single-cell analysis by enabling predictive tasks and zero-shot annotations. CellFM, a foundation model pre-trained on over 100 million cells, excels in recovering masked gene embeddings and supports downstream applications like cell type prediction without task-specific fine-tuning. Other single-cell foundation models (scFMs) integrate heterogeneous datasets for biological discovery, with zero-shot evaluations highlighting their potential and limitations in generalizing across tissues and species. For visualization, interactive Shiny applications in R allow dynamic exploration of expression patterns and clustering results, aiding researchers in interpreting complex datasets through user-friendly interfaces. Key databases serve as repositories for sharing and querying single-cell transcriptomic data, promoting reproducibility and meta-analysis. The Single Cell Portal from the Broad Institute provides access to curated datasets with tools for gene search and visualization across thousands of experiments. 
The Human Cell Atlas (HCA) has amassed over 63 million cells by 2025, offering multi-omic profiles from diverse human tissues to map cellular diversity. GEO includes extensive single-cell subsets, while PanglaoDB specializes in gene markers for cell type identification, aggregating data from mouse and human studies. A 2024 Frontiers review emphasizes challenges in these databases, such as data standardization and interoperability, which hinder cross-study comparisons despite their growing scale. Benchmarking initiatives like the DREAM challenges evaluate tool performance on tasks such as clustering and differential expression, guiding the adoption of robust methods and revealing gaps in current approaches. Time-resolved tools, including those for pseudotime inference in dynamic processes, further enhance analysis of developmental trajectories, as highlighted in recent reviews. These resources collectively lower barriers to entry, foster collaborative research, and accelerate insights into cellular heterogeneity.

References

  1. [1]
    Establishing single cell RNA transcriptomics: a brief guide
    Sep 2, 2025 · Single cell RNA sequencing is a tool for evaluating the specific transcriptome usage of different cell types within an organism.
  2. [2]
    Single cell transcriptomics comes of age | Nature Communications
    Aug 27, 2020 · Single cell transcriptomics technologies have vast potential in advancing our understanding of biology and disease. Here, Sarah Aldridge and ...Development Of Single-Cell... · Single-Cell Transcriptomics... · Future Outlook
  3. [3]
  4. [4]
    Review The Technology and Biology of Single-Cell RNA Sequencing
    May 21, 2015 · The first transcriptomes generated via single-cell RNA-sequencing (scRNA-seq) were published in 2009 (Tang et al., 2009), only 2 years after the ...Missing: seminal | Show results with:seminal
  5. [5]
    Beyond bulk: A review of single cell transcriptomics methodologies ...
    In this review, we will focus on a number of technical aspects of single cell transcriptomic profiling, such as various methods of single cell isolation and ...
  6. [6]
    Low-input and single-cell RNA sequencing - Illumina
    Difference between bulk and single-cell RNA-Seq​​ However, bulk RNA-Seq may fail to capture transcripts from rare but biologically relevant subpopulations, such ...10x Genomics Chromium Single... · Smart-Seq Ultra Low Input... · Nextseq 1000 And Nextseq...
  7. [7]
    Single-Cell Transcriptomic Analysis of Tumor Heterogeneity - PubMed
    Single-cell RNA sequencing (scRNA-seq) may help dissect diverse tumor cells, potentially informing targeted therapies and clinical trial criteria.
  8. [8]
    Neural cell diversity in the light of single-cell transcriptomics - PMC
    Jan 23, 2024 · In this review, we summarize how single-cell transcriptomics have changed our view on the cellular diversity in the human brain.
  9. [9]
    A practical guide to single-cell RNA-sequencing for biomedical ...
    Aug 18, 2017 · Since the first single-cell RNA-sequencing (scRNA-seq) study was published in 2009, many more have been conducted, mostly by specialist ...Missing: seminal | Show results with:seminal
  10. [10]
    Analysis of gene expression in single live neurons - PMC - NIH
    The RNA from defined single cells is amplified by microinjecting primer, nucleotides, and enzyme into acutely dissociated cells from a defined region of rat ...
  11. [11]
    Review : Single-Cell mRNA Amplification: Implications for Basic and ...
    Eberwine J., Yeh H., Miyashiro K., et al. Analysis of gene expression in single live neurons. Proc Natl Acad Sci 1992;89:3010-3014. Crossref · PubMed · Web of ...
  12. [12]
    a SMART approach for full-length cDNA library construction - PubMed
    Here, we describe a fast, simple method for constructing full-length cDNA libraries using SMART technology. This novel procedure uses the template-switching ...Missing: termini paper
  13. [13]
    Full-Length mRNA-Seq from single cell levels of RNA and individual ...
    Here we describe a novel and robust mRNA-Seq protocol (Smart-Seq) that is applicable down to single cell levels.
  14. [14]
    mRNA-Seq whole-transcriptome analysis of a single cell - Nature
    Apr 6, 2009 · Here we modified a widely used single-cell whole-transcriptome amplification method to generate cDNAs as long as 3 kilobases (kb) efficiently ...
  15. [15]
    mRNA-Seq whole-transcriptome analysis of a single cell - PubMed
    mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009 May;6(5):377-82. doi: 10.1038/nmeth.1315. Epub 2009 Apr 6. Authors. Fuchou Tang ...
  16. [16]
    10x Genomics Announces Commercial Availability of Single Cell 3 ...
    ... ChromiumTM System launch anticipated for Q2-2016. The company began shipping the Single Cell 3' Solution early to meet demand from early adopting customers ...
  17. [17]
    CRISPR screening in hematology research: from bulk to single-cell ...
    Oct 24, 2023 · CRISPR tiling scanned different exons of a gene using a high-density gRNA library while simultaneously performing single-cell RNA-seq.
  18. [18]
    CellFM: a large-scale foundation model pre-trained on ... - Nature
    May 20, 2025 · We have collected a diverse dataset of 100 million human cells, on which we train a single-cell foundation model (CellFM) containing 800 million parameters.
  19. [19]
    Optimized design of single-cell RNA sequencing experiments for ...
    Oct 30, 2020 · Single-cell RNA-sequencing (scRNA-Seq) is a compelling approach to directly and simultaneously measure cellular composition and state, ...
  20. [20]
    Single‐cell RNA sequencing technologies and applications: A brief ...
    Mar 29, 2022 · Single‐cell RNA sequencing (scRNA‐seq) technology has become the state‐of‐the‐art approach for unravelling the heterogeneity and complexity of RNA transcripts ...Missing: seminal | Show results with:seminal
  21. [21]
    Systematic assessment of tissue dissociation and storage biases in ...
    Jun 2, 2020 · Here, we compare gene expression and cellular composition of single-cell suspensions prepared from adult mouse kidney using two tissue dissociation protocols.
  22. [22]
    Image-seq: Spatially Resolved Single-Cell Sequencing
    Nov 24, 2022 · We present Image-seq, a technology that provides single-cell transcriptional data on cells that are isolated from specific spatial locations under image ...
  23. [23]
    DMSO cryopreservation is the method of choice to preserve cells for ...
    Jul 23, 2019 · DMSO cryopreservation is the method of choice to preserve cells for droplet-based single-cell RNA sequencing | Scientific Reports.
  24. [24]
    Methanol fixation is the method of choice for droplet-based single ...
    May 15, 2023 · Taken together, our results show that methanol fixation is the method of choice for performing droplet-based single-cell transcriptomics ...
  25. [25]
    enrichment of adherent cell types in single-nucleus RNA sequencing
    Dec 2, 2022 · Dissociation involves enzymatic digestion, which requires incubation. This leads to the alteration of gene expression by the cell ...
  26. [26]
    Protocol for high-quality single-cell RNA-seq from tissue sections ...
    Jun 21, 2024 · Single-cell RNA sequencing (scRNA-seq) combined with laser capture microdissection (LCM) offers a versatile framework for comprehensive ...
  27. [27]
    Single-cell RNA sequencing technologies and bioinformatics pipelines
    Aug 7, 2018 · In this review, we will focus on technical challenges in single-cell isolation and library preparation and on computational analysis pipelines available for ...
  28. [28]
    Massively parallel digital transcriptional profiling of single cells
    Jan 16, 2017 · The scRNA-seq microfluidics platform builds on the GemCode technology, which has been used for genome haplotyping, structural variant analysis ...
  29. [29]
    Advancements in single-cell RNA sequencing and spatial ...
    Feb 4, 2025 · This review highlights single-cell sequencing approaches, recent technological developments, associated challenges, various techniques for expression data ...
  30. [30]
    NovaSeq 6000 Sequencing System specifications - Illumina
    a. All sample throughputs are estimates and are based on dual flow cell runs. Human Genomes assumes > 120 Gb of data per sample to achieve 30× genome coverage ...
  31. [31]
  32. [32]
    Single-cell characterization of transcript isoforms with long-read ...
    May 24, 2024 · We profiled approximately 30,000 mouse retina cells, yielding 1.54 billion Illumina short-reads and 1.4 billion long nanopore sequencing reads.
  33. [33]
    High-throughput and high-accuracy single-cell RNA isoform ... - Nature
    May 6, 2023 · Recently, through combining single-molecule long-read sequencing technology (PacBio or Oxford Nanopore sequencing), researchers have developed ...
  34. [34]
    What is the recommended sequencing depth for Single Cell 3' and 5 ...
    Nov 3, 2025 · Answer: For new sample types, we recommend sequencing a minimum of 20,000 read pairs/cell for Single Cell 3' and Single Cell 5' gene expression ...
  35. [35]
    Sequencing Requirements for Flex Gene Expression - 10x Genomics
    Oct 29, 2025 · The sequencing depth requirement (10,000 read pairs per cell) applies to the total pool of material representing the entire cell population. How ...
  36. [36]
    Cell Ranger's Gene Expression Algorithm - 10x Genomics
    A set of analysis pipelines that perform sample demultiplexing, barcode processing, single cell 3' and 5' gene counting, V(D)J transcript sequence assembly ...
  37. [37]
    A real-world multi-center RNA-seq benchmarking study using the ...
    Jul 22, 2024 · Second, for alignment tools, STAR with Ensembl annotation resulted in the highest mapping rate. Similar to previous studies, the three alignment ...
  38. [38]
    Generation of count matrix | Introduction to Single-cell RNA-seq
    Each value in the matrix represents the number of reads in a cell originating from the corresponding gene. Using the count matrix, we can explore and filter ...
  39. [39]
    Single-cell RNA-Seq Count Matrix - OmicsBox User Manual - BioBam
    Export Count Matrix: Export the count matrix in Matrix Market File format. Three locations must be specified in the wizard (Figure 5) in order to save the mtx ( ...
  40. [40]
    Single-Cell Transcriptomics: Current Methods and Challenges in ...
    Apr 22, 2021 · However, due to low starting materials, the SC-RNA-seq data face various computational challenges: normalization, differential gene expression ...
  41. [41]
    Normalizing single-cell RNA sequencing data - PubMed Central - NIH
    In this perspective, we discuss commonly used normalization approaches and illustrate how these can lead to misleading results.
  42. [42]
    Current best practices in single‐cell RNA‐seq analysis: a tutorial
    Jun 19, 2019 · This review will serve as a workflow tutorial for new entrants into the field, and help established users update their analysis pipelines.
  43. [43]
    Pooling across cells to normalize single-cell RNA sequencing data ...
    Apr 27, 2016 · We present a novel approach where expression values are summed across pools of cells, and the summed values are used for normalization.
  44. [44]
    Gene length and detection bias in single cell RNA sequencing ... - NIH
    The extensive PCR amplification that is required for scRNA-Seq increases technical variability in the data by introducing amplification biases (Stegle et al., ...
  45. [45]
    Normalization and variance stabilization of single-cell RNA-seq data ...
    Dec 23, 2019 · We present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments.
  46. [46]
    A general and flexible method for signal extraction from single-cell ...
    Jan 18, 2018 · We present a general and flexible zero-inflated negative binomial model (ZINB-WaVE), which leads to low-dimensional representations of the data.
  47. [47]
    ComBat-seq: batch effect adjustment for RNA-seq count data
    We developed a batch correction method, ComBat-seq, using a negative binomial regression model that retains the integer nature of count data in RNA-seq studies.
  48. [48]
    Fast, sensitive and accurate integration of single-cell data ... - Nature
    Nov 18, 2019 · Harmony iteratively learns a cell-specific linear correction function ... single-cell RNA-seq batch correction. Nat. Methods 16, 43–49 (2019) ...
  49. [49]
    Statistics or biology: the zero-inflation controversy about scRNA-seq ...
    Jan 21, 2022 · pointed out that a major drawback of scRNA-seq imputation is diminished gene expression variability across cells after imputation; they argued ...
  50. [50]
    Towards a comprehensive evaluation of dimension reduction ...
    Jul 19, 2022 · In this paper, we introduce and perform a systematic evaluation of popular DR methods, including t-SNE, art-SNE, UMAP, PaCMAP, TriMap and ForceAtlas2.
  51. [51]
    A Comparison for Dimensionality Reduction Methods of Single-Cell ...
    We developed a strategy to evaluate the stability, accuracy, and computing cost of 10 dimensionality reduction methods using 30 simulation datasets and five ...
  52. [52]
    Benchmarking principal component analysis for large-scale single ...
    Jan 20, 2020 · In this work, we review the existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA ...
  53. [53]
    The art of using t-SNE for single-cell transcriptomics - Nature
    Nov 28, 2019 · A protocol for creating more faithful t-SNE visualisations. It includes PCA initialisation, a high learning rate, and multi-scale similarity kernels.
  54. [54]
    Dimensionality reduction for visualizing single-cell data using UMAP
    Dec 3, 2018 · A benchmarking analysis on single-cell RNA-seq and mass cytometry data reveals the best-performing technique for dimensionality reduction.
  55. [55]
    Nonlinear dimensionality reduction based visualization of single-cell ...
    Jan 11, 2024 · This paper navigates through a landscape of dimensionality reduction techniques essential for distilling meaningful insights from the scRNA-seq datasets.
  56. [56]
    [PDF] Review of single-cell RNA-seq data clustering ... - CityUHK Scholars
    May 1, 2023 · In this section, we review several commonly used dimension reduction methods including principal component analysis (PCA), t-distributed ...
  57. [57]
    Chapter 5 Clustering, redux | Advanced Single-Cell Analysis with ...
    A useful aspect of the silhouette width is that it naturally captures both underclustering and overclustering. Cells in heterogeneous clusters will have a large ...
  58. [58]
    Diffusion pseudotime robustly reconstructs lineage branching - Nature
    Aug 29, 2016 · Diffusion pseudotime (DPT) enables robust and scalable inference of cellular trajectories, branching events, metastable states and ...
  59. [59]
    Slingshot: cell lineage and pseudotime inference for single-cell ...
    Jun 19, 2018 · We introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data.
  60. [60]
    RNA velocity of single cells - Nature
    Aug 8, 2018 · RNA velocity is a high-dimensional vector that predicts the future state of individual cells on a timescale of hours. We validate its accuracy ...
  61. [61]
    Regulatory Network Inference Methods: Single Cell RNA Sequencing
    In this article, we review 15 such network inference methods developed for single-cell data. We discuss their underlying assumptions, inference techniques, ...
  62. [62]
    Integrating spatial and single-cell transcriptomics data using deep ...
    Nov 29, 2023 · We present SpatialScope, a unified approach integrating scRNA-seq reference data and ST data using deep generative models.
  63. [63]
    A technical review of multi-omics data integration methods
    Aug 1, 2025 · VAEs are widely used for multi-omics analysis, particularly for single-cell experiments, due to their flexibility in handling high-dimensional ...
  64. [64]
    A benchmark of batch-effect correction methods for single-cell RNA ...
    Jan 16, 2020 · We compare 14 methods in terms of computational runtime, the ability to handle large datasets, and batch-effect correction efficacy while preserving cell type ...
  65. [65]
    Introduction to scRNA-seq integration - Satija Lab
    Nov 16, 2023 · The Seurat v5 integration procedure aims to return a single dimensional reduction that captures the shared sources of variance across multiple layers.
  66. [66]
    Multi‐Omics Factor Analysis—a framework for unsupervised ...
    In an application to multi‐omics profiles from single‐cells, MOFA recovers differentiation trajectories and identifies coordinated variation between the ...
  67. [67]
    MOFA+: a statistical framework for comprehensive integration of ...
    May 11, 2020 · We present Multi-Omics Factor Analysis v2 (MOFA+), a statistical framework for the comprehensive and scalable integration of single-cell multi-modal data.
  68. [68]
    12. Data integration - Single-cell best practices
    Nov 10, 2022 · Batch effects are corrected by forcing connections between cells from different batches and then allowing for differences in cell type ...
  69. [69]
    Analysis, visualization, and integration of spatial datasets with Seurat
    Jun 17, 2025 · This tutorial demonstrates how to use Seurat (>=3.2) to analyze spatially-resolved RNA-seq data. While the analytical pipelines are similar to the Seurat ...
  70. [70]
    A spatially resolved single cell atlas of human gastrulation - bioRxiv
    Jul 21, 2020 · We characterize in a spatially resolved manner the single cell transcriptional profile of an entire gastrulating human embryo approximately 16 to 19 days after ...
  71. [71]
    A comprehensive human embryo reference tool using single-cell ...
    Nov 14, 2024 · To authenticate human embryo models, single-cell RNA sequencing has been utilized for unbiased transcriptional profiling. However, an organized ...
  72. [72]
    Single-cell transcriptomics reveals the cell fate transitions of human ...
    Aug 13, 2022 · Our study provides the single-cell transcriptional landscape of in vitro DA differentiation, which can guide future improvements in DA preparation and quality ...
  73. [73]
    Uncovering underlying physical principles and driving forces of cell ...
    Aug 16, 2024 · Quantifying Waddington landscape and flux of cell development from single-cell transcriptomics data. (A) Workflow of constructing cell ...
  74. [74]
    A practical guide for single-cell transcriptome data analysis in ...
    This tutorial provides a practical guide to scRNA-seq data analysis in neuroscience, focusing on the essential workflows and theoretical foundations.
  75. [75]
    Single-cell transcriptome profiling of an adult human cell atlas of 15 ...
    We identified a total of 20,034 T cells prevailing in the immune cells of most organ tissues (Additional file 4: Table S21), which is consistent with a previous ...
  76. [76]
    Single-cell profiling of T and B cell repertoires following SARS-CoV ...
    Dec 22, 2021 · Our analyses revealed enrichment of spike-specific B cells, activated CD4+ T cells, and robust antigen-specific polyfunctional CD4+ T cell responses ...
  77. [77]
    Applications and techniques of single-cell RNA sequencing across ...
    Jul 23, 2025 · Abstract. Single-cell ribonucleic acid sequencing (scRNA-seq) is an important tool in molecular biology, allowing transcriptomic profiling ...
  78. [78]
    Unveiling novel insights in acute myeloid leukemia through single ...
    Apr 21, 2024 · Here, we summarize findings from various RNA-seq studies in adult AML that dissected the cellular and molecular heterogeneity of the AML ...
  79. [79]
    Single-cell transcriptional changes associated with drug tolerance ...
    Mar 12, 2021 · Recurrently overrepresented ontologies in genes that are differentially expressed between drug tolerant cell populations and drug sensitive ...
  80. [80]
    Mosaic autosomal aneuploidies are detectable from single-cell ...
    Nov 25, 2017 · We have developed a method that uses chromosome-wide expression imbalances to identify aneuploidies from single-cell RNA-seq data.
  81. [81]
    Best Tools for Single Cell RNA-Seq Analysis in 2025 - Biostate.ai
    Aug 2, 2025 · Benchmark studies show Scanpy processes data over three times faster than Seurat, with this advantage increasing as the dataset size grows.
  82. [82]
    Scanpy – Single-Cell Analysis in Python — scanpy
    Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, ...
  83. [83]
    Tools for Single Cell Genomics • Seurat - Satija Lab
    Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data. Seurat aims to enable users to identify and interpret sources of ...
  84. [84]
    Tools and Databases in Transcriptomics Analysis - ResearchGate
    Aug 9, 2025 · This essay could explore the current state of transcriptome technology in medicine and clinical sciences. It could discuss the techniques, such ...
  85. [85]
    CellFM: a large-scale foundation model pre-trained on ...
    May 20, 2025 · CellFM is categorized as a value-projection-based single-cell foundation model, as it aims to recover the vector embeddings of masked genes ...
  86. [86]
    Biology-driven insights into the power of single-cell foundation models
    Oct 3, 2025 · Single-cell foundation models (scFMs) have emerged as powerful tools for integrating heterogeneous datasets and exploring biological systems ...
  87. [87]
    Zero-shot evaluation reveals limitations of single-cell foundation ...
    Apr 18, 2025 · Our findings underscore the importance of zero-shot evaluations in development and deployment of foundation models in single-cell research.
  88. [88]
    Top 10 Bioinformatics Tools for scRNA-seq in 2025 - Neovarsity
    May 16, 2025 · In this blog, we highlight 10 of the most impactful and widely adopted tools in single-cell analysis today.
  89. [89]
    Single Cell Portal
    The stromal cell compartment plays a central role in maintaining tissue homeostasis by coordinating with the immune system throughout the inception, ...
  90. [90]
    HCA Data Portal - Human Cell Atlas
    HCA Data Portal. Explore the datasets of the Human Cell Atlas. Community generated, multi-omic, open data. 63.3M cells. 11.1k donors.
  91. [91]
    PanglaoDB - A Single Cell Sequencing Resource For Gene ...
    PanglaoDB is a database for the scientific community interested in exploration of single cell RNA sequencing experiments from mouse and human.
  92. [92]
    A systematic overview of single-cell transcriptomics databases, their ...
    This review underscores the importance of leveraging computational approaches to unravel the complexities of single-cell data and offers a promising ...
  93. [93]
    An overview of computational methods in single-cell transcriptomic ...
    May 10, 2025 · In this review, we systematically examine these annotation approaches based on transcriptomics-specific gene expression profiles and provide a comprehensive ...