
Trajectory inference

Trajectory inference is a computational method in single-cell omics analysis that reconstructs the dynamic progression of cell states by ordering individual cells along continuous trajectories, typically parameterized by pseudotime. It models biological processes such as differentiation, maturation, and response to stimuli from snapshot data. These methods address the challenge of inferring temporal dynamics from static, high-dimensional measurements such as gene expression profiles, enabling the study of cellular transitions at single-cell resolution without requiring time-series experiments.

The field emerged around 2014 with pioneering tools such as Monocle, which ordered cells along a minimum spanning tree built on dimensionality-reduced data, and was followed by diffusion pseudotime (DPT), which estimates pseudotime from diffusion on k-nearest neighbor graphs. Since then, over 70 trajectory inference tools have been developed, reflecting rapid evolution driven by advances in single-cell RNA sequencing (scRNA-seq) technologies. Early methods focused on linear or branching topologies, but subsequent innovations incorporated more complex structures, such as cycles and multifurcations, to better capture real biological variability.

Trajectory inference methods can be broadly categorized into clustering-based, graph-based, and probabilistic approaches. Clustering-based methods, such as Slingshot, first identify discrete cell clusters representing states and connect them via minimum spanning trees or elastic principal curves to form trajectories. Graph-based techniques, including PAGA and TSCAN, use nearest-neighbor graphs or random walks to propagate pseudotime and infer global structures, and often scale well to large datasets. Probabilistic models, such as VeloVI, introduce uncertainty quantification through generative processes, such as Gaussian processes or Markov chains, and can integrate additional data modalities such as RNA velocity to predict future cell fates.
Applications of trajectory inference extend beyond trajectory reconstruction to downstream analyses, including differential gene expression along paths (e.g., via tradeSeq), gene regulatory network inference, and alignment of trajectories across datasets or conditions. It has been instrumental in fields such as developmental biology, immunology, and oncology, revealing insights into processes such as embryonic development, immune cell activation, and tumor heterogeneity. Benchmarks, such as the dynverse evaluation of 45 methods on synthetic and real datasets, highlight trade-offs in accuracy, stability, and computational efficiency, with no single method outperforming the others universally; the best choice depends on data topology, dimensionality, and noise levels.

Despite these advances, challenges persist, including sensitivity to technical noise, assumptions about continuous progression that may not hold for discrete transitions, and the need for robust validation to distinguish true trajectories from artifacts. Ongoing developments focus on multi-omics integration, such as combining transcriptomics with chromatin accessibility or spatial data, and on handling complex, non-linear dynamics more effectively.

Background

Definition and principles

Trajectory inference is a computational approach that reconstructs continuous developmental or differentiation paths, known as trajectories, from static snapshot data obtained from single-cell omics technologies, such as single-cell RNA sequencing (scRNA-seq). It does so by ordering individual cells along a pseudotime axis, which serves as a proxy for biological progression through a dynamic process, thereby enabling the inference of temporal dynamics without time-series experiments. The approach assumes that the observed cells are asynchronous samples from an underlying continuous biological process in which gene expression changes gradually, so that cell fate transitions can be reconstructed from high-dimensional data.

The core principles of trajectory inference revolve around modeling cellular progression as a graph-like structure, where nodes represent cell states and edges denote transitions along pseudotime. Pseudotime quantifies the extent of progress through the process and is often derived from diffusion-based or graph-based approaches that capture manifold geometry in the data. Trajectories can take various topologies, including linear paths for unbranched processes, branching structures for fate bifurcations, and multifurcating graphs for decision points involving multiple lineages. Primarily applied to high-dimensional data such as gene expression profiles, trajectory inference facilitates the study of cell fate decisions in contexts such as embryonic development or regeneration, where traditional time-course experiments are scarce or infeasible. For instance, in scRNA-seq analysis of differentiating cells, clusters representing distinct stages are identified and linked via edges to form a graph that traces progression from progenitor to mature states. Dimensionality reduction techniques, such as principal component analysis (PCA), are often employed as a preprocessing step to aid visualization and neighborhood computation in these high-dimensional spaces.
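To make the diffusion-based idea concrete, the following NumPy sketch builds a Gaussian-kernel Markov matrix P = D^{-1}K over cells, takes its leading non-trivial eigenvector, and reads pseudotime as normalized distance from a chosen root cell along that direction. It assumes a single unbranched trajectory, a hand-picked kernel bandwidth `epsilon`, and a known root cell; the helper names are hypothetical, and this is a simplification rather than the full diffusion pseudotime (DPT) definition.

```python
import numpy as np


def diffusion_components(X, epsilon, n_components=2):
    """Leading non-trivial right eigenvectors of the Markov matrix P = D^-1 K."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)           # Gaussian affinities between cells
    d = K.sum(axis=1)                         # degree of each cell
    S = K / np.sqrt(np.outer(d, d))           # symmetric matrix similar to P
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]           # sort eigenvalues in descending order
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(d)[:, None]         # convert to right eigenvectors of P
    # Index 0 is the trivial constant eigenvector (eigenvalue 1); skip it.
    return evals[1:n_components + 1], psi[:, 1:n_components + 1]


def dominant_direction_pseudotime(psi1, root):
    """Pseudotime as normalized distance from the root along the first component."""
    tau = np.abs(psi1 - psi1[root])
    return tau / tau.max()


# Toy data: cells sampled along a noisy half-circle, stored in shuffled order.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, np.pi, 100)
X = np.column_stack([np.cos(theta), np.sin(theta)])
X += rng.normal(scale=0.01, size=X.shape)
perm = rng.permutation(len(theta))
X, true_position = X[perm], theta[perm]

evals, psi = diffusion_components(X, epsilon=0.05)
root = int(np.argmin(true_position))          # assume the starting cell is known
tau = dominant_direction_pseudotime(psi[:, 0], root)
```

On this shuffled half-circle the recovered ordering should closely track the true position along the curve. With branches or cycles a single eigenvector is no longer sufficient, which is one reason dedicated methods use more components or explicit graph structure.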

Historical development

Trajectory inference emerged in the mid-2010s, driven by advances in single-cell technologies such as scRNA-seq that provided high-resolution snapshots of cellular states and called for computational methods to reconstruct dynamic processes such as differentiation. The foundational concept of pseudotime, ordering cells along inferred developmental paths based on the similarity of their molecular profiles, was introduced in 2014 through two landmark publications. Wanderlust, developed by Bendall et al., used a k-nearest-neighbor graph of single cells measured by mass cytometry to order cells along a trajectory reflecting human B-cell development, marking an early shift from static clustering to dynamic modeling. Independently, Monocle, proposed by Trapnell et al., used unsupervised algorithms to order cells by pseudotemporal changes in expression, demonstrating its utility in myoblast differentiation and revealing regulatory cascades.

Key milestones in the late 2010s addressed complex topologies such as branching lineages. In 2016, methods extending graph diffusion techniques to detect bifurcations enabled high-resolution mapping of diverging trajectories in hematopoietic and neural lineages. The same year, TSCAN introduced cluster-based pseudotime reconstruction using minimum spanning trees over cluster centers, offering a robust way to order cells in branching structures without prior lineage knowledge. Slingshot, building on these ideas and published in 2018 (with a preprint in 2017), inferred multiple lineages by constructing minimum spanning trees over clusters and fitting principal curves, providing flexible pseudotime estimation along curved paths. Concurrently, Haghverdi et al. (2016) introduced diffusion pseudotime (DPT), a manifold learning-based metric that robustly captures branching and metastable states using random-walk-based distances on diffusion maps. Street et al. (2018), the Slingshot publication, emphasized the method's stability on noisy scRNA-seq data. These developments were synthesized in a 2019 benchmark of 45 methods, which underscored the shift toward scalable, topology-aware algorithms.
Post-2019 advancements focused on scalability, multi-omics integration, and dynamic information. Monocle 3, introduced in 2019, enabled trajectory construction for million-cell datasets by incorporating graph-based partitioning and partition-based graph abstraction, facilitating analyses in contexts such as embryonic development. RNA velocity, which estimates transcriptional kinetics from unspliced and spliced mRNA abundances, was introduced by La Manno et al. (2018) and has since been increasingly integrated to refine trajectories and predict cell fate directions. PAGA (2019), developed by Wolf et al., reconciled clustering with trajectory inference via topology-preserving graph abstractions, enabling coarse-grained connectivity maps of complex manifolds such as those arising in regeneration. By 2024, methods such as IDTI addressed time-series scRNA-seq, using diversity minimization to infer trajectories without predefined starting cells and improving accuracy on such datasets. In 2024-2025, methods such as Genes2Genes for gene trajectory alignment and CASCAT for spatial data further advanced multi-omics and spatiotemporal applications. These developments reflect trajectory inference's maturation into a cornerstone of single-cell analysis, with ongoing emphasis on robustness and interpretability as of November 2025.

Theoretical foundations

Key assumptions

Trajectory inference methods rely on several core assumptions to reconstruct cellular progression from high-dimensional single-cell data. A fundamental premise is that the data capture a continuous process in which gene expression changes gradually across states, and that cells are independent, asynchronous samples drawn from this process at different points of progression. Another is the existence of an underlying low-dimensional manifold in the expression space, which dimensionality reduction techniques can reveal to facilitate embedding. These assumptions allow cells to be ordered along a pseudotemporal axis without time-labeled samples.

Biologically, trajectory inference typically presumes unidirectional progression along the trajectory, excluding cycles unless explicitly modeled, as in differentiation, where cells advance toward terminal states. Branching structures are assumed to reflect fate decisions, with divergence points representing bifurcations driven by regulatory changes. The methods also assume that technical noise, such as dropout events in single-cell sequencing, does not overwhelm the signal of the underlying process, so that continuity in expression profiles is preserved. From a statistical perspective, expression profiles are expected to vary smoothly along the inferred pseudotime, implying that cells close in pseudotime have similar transcriptomic states. Methods further assume sufficient sampling density across trajectory states to avoid gaps that could distort reconstruction, ensuring that transient intermediates are represented. Violations of these assumptions can introduce artifacts, such as the erroneous linearization of cyclic processes like the cell cycle, leading to biologically implausible trajectories. To assess reliability, validation often employs metrics such as the w-correlation to quantify agreement between inferred and reference trajectories, exposing discrepancies caused by violated assumptions.
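The w-correlation used for validation is described in this article as a weighted Pearson correlation between pseudotimes. A minimal sketch follows, assuming per-cell weights are supplied by the caller (deriving them, for example from diffusion distances, is left open); `weighted_pearson` is a hypothetical helper, not a function from any benchmarking package.

```python
import numpy as np


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation between two pseudotime vectors.

    x, y: per-cell pseudotime values (e.g. inferred vs. reference)
    w:    non-negative per-cell weights; how they are derived is an assumption
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()                            # normalize weights to sum to 1
    mx, my = np.sum(w * x), np.sum(w * y)      # weighted means
    cov = np.sum(w * (x - mx) * (y - my))      # weighted covariance
    sx = np.sqrt(np.sum(w * (x - mx) ** 2))    # weighted standard deviations
    sy = np.sqrt(np.sum(w * (y - my) ** 2))
    return cov / (sx * sy)


reference = np.linspace(0.0, 1.0, 6)
inferred = reference ** 2        # same ordering, non-linear time warping
reversed_tau = 1.0 - reference   # trajectory oriented from the wrong root
uniform = np.ones_like(reference)

r_warped = weighted_pearson(inferred, reference, uniform)
r_reversed = weighted_pearson(reversed_tau, reference, uniform)
```

With uniform weights the function reduces to the ordinary Pearson correlation. A trajectory rooted at the wrong end flips the sign, which is one way an orientation error surfaces during validation.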

Mathematical models

Pseudotime serves as a fundamental scalar metric in trajectory inference, assigning to each cell a continuous value \tau_i \in [0, 1] that quantifies its progression along an inferred developmental path. It is often derived by projecting a cell's expression profile x_i onto a parameterized trajectory curve \mu(t), such that \tau_i = \arg\min_t \|x_i - \mu(t)\|^2. This formulation assumes a smooth, low-dimensional manifold underlying the data, enabling cells to be ordered without direct temporal information. In diffusion-based approaches, pseudotime can instead be computed from diffusion distances derived from the leading eigenvectors of a transition matrix built from cell-cell similarities; these distances approximate geodesic distances on the data manifold and help reconstruct branching structures robustly.

Trajectory representations frequently employ graph-based models, in which cells or clusters form nodes in a similarity graph and edges are weighted by metrics such as k-nearest neighbor distances or kernel affinities. The minimum spanning tree (MST) of this graph approximates the global topology by connecting all nodes with minimal total edge weight and no cycles, providing a tree-like backbone for pseudotime assignment along paths from root to terminal nodes. Complementing this, diffusion maps embed the data into a lower-dimensional space via the eigenvectors \Psi_k(X) of a Markov transition matrix P, where P_{ij} is the transition probability from cell i to cell j, yielding coordinates that preserve local manifold geometry and allow pseudotime to be computed as progress along the embedding's dominant direction.

Branching models extend linear trajectories to tree structures with bifurcation points where cell fates diverge. These are often parameterized as Gaussian processes per branch, with the posterior over the branching time t_b obtained from Bayes' theorem as p(t_b \mid Y) \propto p(Y \mid t_b)\, p(t_b), enabling gene-specific identification of dynamic changes at branches. Such models assume independent evolution along branches after splitting, grounded in the key assumption of a tree-like topology without cycles.

Integration of RNA velocity enhances trajectory inference by estimating future cell states from the abundances of unspliced (u) and spliced (s) mRNA. For each gene, the kinetics are \frac{du}{dt} = \alpha - \beta u and \frac{ds}{dt} = \beta u - \gamma s, where \alpha is the transcription rate, \beta the splicing rate, and \gamma the degradation rate. Written as a linear system, \frac{dX}{dt} = A X + B with X = (u, s)^T, A = \begin{pmatrix} -\beta & 0 \\ \beta & -\gamma \end{pmatrix}, and B = (\alpha, 0)^T. In the steady-state model, \gamma is estimated from the ratio of u to s in cells assumed to be at equilibrium, and the velocity of a gene is \frac{ds}{dt}. Projecting these velocities onto the trajectory as arrows refines pseudotime and branch directions.

Validation of inferred trajectories relies on metrics that assess consistency with ground-truth progressions, such as the w-correlation, a weighted Pearson correlation between pseudotimes that emphasizes local manifold structure by weighting pairs according to their diffusion distances, checking that biological ordering is preserved across methods or batches.
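A minimal sketch of the steady-state velocity model, assuming \beta = 1 (rates are identifiable only up to a common scale), a single simulated gene, and that cells at the lower and upper extremes of normalized u + s are near equilibrium, a heuristic loosely in the spirit of the extreme-quantile fits used by velocity tools. The gene is simulated with explicit Euler steps of the two kinetic equations; helper names and parameter values are hypothetical.

```python
import numpy as np


def simulate_gene(alpha=2.0, beta=1.0, gamma=0.5, t_switch=10.0, dt=0.01, every=10):
    """Euler integration of du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s.

    Transcription is on (rate alpha) until t_switch and off afterwards.
    Every `every`-th state is kept as one sampled 'cell'.
    """
    u, s = 0.0, 0.0
    us, ss, phases = [], [], []
    n_on = int(round(t_switch / dt))
    for step in range(2 * n_on):
        a = alpha if step < n_on else 0.0
        # explicit Euler step; the right-hand side uses the previous (u, s)
        u, s = u + dt * (a - beta * u), s + dt * (beta * u - gamma * s)
        if step % every == 0:
            us.append(u)
            ss.append(s)
            phases.append("induction" if a > 0 else "repression")
    return np.array(us), np.array(ss), np.array(phases)


def steady_state_velocity(u, s, percentile=5.0):
    """Steady-state velocity v = u - gamma*s, with beta fixed to 1.

    gamma is fitted by least squares through the origin on cells in the
    lower and upper extremes of normalized (u + s), assumed near equilibrium.
    """
    score = u / u.max() + s / s.max()
    lo, hi = np.percentile(score, [percentile, 100.0 - percentile])
    mask = (score <= lo) | (score >= hi)
    gamma = np.sum(u[mask] * s[mask]) / np.sum(s[mask] ** 2)
    return gamma, u - gamma * s


u, s, phase = simulate_gene()
gamma_hat, velocity = steady_state_velocity(u, s)
```

Cells sampled early in induction get positive velocity (spliced mRNA still rising) and cells during repression get negative velocity; projecting such per-gene velocities onto a trajectory is what orients pseudotime.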

Methods

Dimensionality reduction techniques

Dimensionality reduction techniques are essential preprocessing steps in trajectory inference for single-cell RNA sequencing (scRNA-seq) data, projecting high-dimensional gene expression profiles, often spanning thousands of genes, into low-dimensional spaces of 2 to 50 dimensions to uncover underlying manifold structures and cell neighborhoods. This reduction facilitates the identification of continuous trajectories by mitigating the curse of dimensionality, noise, and the computational burden of analyzing raw data.

Linear methods, such as principal component analysis (PCA), serve as a baseline for initial noise reduction and capture of global structure. In PCA, the data matrix X is centered by subtracting the mean of each gene, followed by singular value decomposition (SVD) to yield X = U \Sigma V^T; the reduced representation is obtained by projecting onto the top principal components, X_{\text{reduced}} = U_k \Sigma_k, where k is the number of retained components. This approach assumes linearity and works well for denoising scRNA-seq data, but it may distort the nonlinear manifolds prevalent in biological processes.

Nonlinear methods better preserve local and global structures in complex scRNA-seq datasets. t-distributed stochastic neighbor embedding (t-SNE) visualizes clusters by minimizing the Kullback-Leibler divergence between high- and low-dimensional probability distributions, controlled by the perplexity parameter, which balances local and global neighborhood preservation (typically 5 to 50 for single-cell data). Uniform manifold approximation and projection (UMAP) offers faster computation through fuzzy simplicial set representations of the data manifold, optimizing a cross-entropy loss to embed cells while maintaining topological relationships. Diffusion maps approximate the manifold's geometry via eigenvectors of a diffusion kernel, providing coordinates that can serve as a basis for pseudotime estimation by capturing diffusion distances along potential trajectories.

The sparsity of scRNA-seq data, arising from dropout events in which true expression is recorded as zero, poses challenges for standard reduction techniques and can distort the inferred manifold. To address this, zero-inflated models such as zero-inflated factor analysis (ZIFA) explicitly parameterize dropout probability alongside latent factors, enabling robust low-dimensional embeddings without full imputation. Alternatively, imputation methods can fill in dropouts based on similar cells before reduction, although over-aggressive imputation risks introducing bias. Evaluation of these techniques often emphasizes preservation of neighborhood structure, quantified for example by the trustworthiness score, which penalizes cells that appear among a cell's k nearest neighbors in the embedding but were not near neighbors in the original space (ranging from 0 to 1, with higher values indicating better preservation). In scRNA-seq benchmarks, nonlinear methods such as UMAP often achieve higher trustworthiness (e.g., >0.8 on some datasets) than PCA, supporting their use for trajectory preprocessing.
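The PCA formula above can be checked directly with NumPy. The sketch below centers a cells-by-genes matrix, computes U_k \Sigma_k from the SVD, and then measures how many of each cell's k nearest neighbors survive the reduction. That overlap score is a simpler relative of trustworthiness (it does not rank-weight intruding neighbors); scikit-learn's `sklearn.manifold.trustworthiness` implements the rank-weighted score. The toy data (three latent axes mixed into 50 genes) and the helper names are assumptions for illustration.

```python
import numpy as np


def pca_svd(X, k):
    """Project cells onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]                      # U_k Sigma_k, equal to Xc @ Vt[:k].T


def knn_indices(Z, k):
    """Indices of each cell's k nearest neighbors (excluding itself)."""
    d = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d, np.inf)                  # a cell is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]


def knn_overlap(X_high, X_low, k=10):
    """Mean fraction of each cell's k nearest neighbors shared by both spaces."""
    a, b = knn_indices(X_high, k), knn_indices(X_low, k)
    return np.mean([len(set(r1) & set(r2)) / k for r1, r2 in zip(a, b)])


rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3))               # 3 underlying axes of variation
loadings = rng.normal(size=(3, 50))
X = latent @ loadings + 0.05 * rng.normal(size=(200, 50))  # 200 cells x 50 genes
Z = pca_svd(X, k=3)
```

Keeping three components should preserve most neighborhoods here, whereas keeping only the first component discards structure and lowers the overlap.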

Trajectory construction algorithms

Trajectory construction algorithms build topological structures, such as graphs or trees, from dimensionality-reduced single-cell data to represent cellular progression paths. They typically operate after an initial dimensionality reduction step, such as PCA or t-SNE, that captures the underlying manifold of cell states. Common approaches include graph-based, diffusion-based, and partition-based methods, each suited to different trajectory topologies, from linear to branched structures.

Graph-based approaches construct trajectories by first building a k-nearest neighbors (k-NN) graph from the reduced data, which encodes local cell similarities, and then deriving a minimum spanning tree (MST) that forms a linear or tree-like backbone for pseudotemporal ordering. This framework suits linear and tree-shaped trajectories, because the MST connects all nodes without cycles while minimizing total edge weight, providing a parsimonious path. For instance, Slingshot clusters cells, constructs an MST on cluster centroids, and fits principal curves to smooth the paths through clusters, enabling robust inference of branching lineages. Because principal curves follow the data's principal directions, Slingshot performs well on datasets with clear bifurcations.

Diffusion-based methods use diffusion maps or similar embeddings to propagate pseudotime along the leading eigenvectors of a cell-cell transition matrix, capturing global trajectory geometry. A related, embedding-based strategy is used by Monocle 2, which applies reversed graph embedding (RGE) to iteratively learn a principal graph aligned with the data, so that pseudotime can be ordered along tree-like branches in an unsupervised manner. By embedding cells onto a low-dimensional principal graph that minimizes reconstruction error, this approach resolves complex trajectories and suits datasets with multiple fate decisions.

Partition-based algorithms cluster cells first and then order the clusters to form the trajectory skeleton. TSCAN, for example, groups cells into clusters based on similarity, computes an MST connecting cluster centroids in reduced space, and orders cells within each branch by projecting them onto the MST edges along the cluster path. Reducing the problem to cluster-level connections makes the method robust to noisy data and yields interpretable trajectories for linear or mildly branched progressions.

To handle branching trajectories, methods such as PAGA first abstract the data into a coarse-grained graph by partitioning cells and estimating the connectivity between partitions from the k-NN graph, and then refine this into fine-grained paths while preserving topology. Potential bifurcations appear as partitions connected to several others in the abstracted graph, highlighting decision points where cellular states diverge. This abstraction reconciles clustering with continuity, enabling scalable analysis of high-dimensional data.

Recent extensions address multifurcating and cyclic topologies, which challenge traditional tree models. PHLOWER (2025) infers complex, multi-branching trees from single-cell data, including multimodal data combining transcriptomics and chromatin accessibility, by using Hodge decomposition to split cell connectivity into harmonic, gradient, and curl components, thereby capturing cycles and multifurcations in developmental processes. This improves accuracy on datasets with non-tree structures, such as looping trajectories in immune responses.
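A minimal sketch of partition-based construction in the spirit of TSCAN, assuming cluster labels are already available and the trajectory is a single unbranched path. It builds an MST over cluster centroids with Prim's algorithm, takes the tree path from an assumed root cluster to the farthest cluster, and assigns pseudotime as the normalized arc length of each cell's closest point on the polyline through the path's centroids, that is, \tau_i = \arg\min_t \|x_i - \mu(t)\|^2 with a piecewise-linear \mu. The function names are hypothetical and are not TSCAN's or Slingshot's API; a branching tree would repeat the projection for each root-to-leaf path.

```python
import numpy as np


def prim_mst(D):
    """Edges (i, j) of a minimum spanning tree for a dense distance matrix D."""
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()                 # cheapest known link of each node to the tree
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = D[j] < best           # nodes now reached more cheaply via j
        best[closer] = D[j][closer]
        parent[closer] = j
    return edges


def tree_path(edges, weights, root):
    """Node path from root to the farthest node (by summed edge length) in a tree."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    dist, prev, stack = {root: 0.0}, {root: None}, [root]
    while stack:
        node = stack.pop()
        for nb in adj.get(node, []):
            if nb not in dist:
                dist[nb] = dist[node] + weights[node, nb]
                prev[nb] = node
                stack.append(nb)
    end, path = max(dist, key=dist.get), []
    while end is not None:
        path.append(end)
        end = prev[end]
    return path[::-1]


def project_onto_path(points, vertices):
    """Normalized arc-length position of each point's closest point on a polyline."""
    seg_len = np.linalg.norm(np.diff(vertices, axis=0), axis=1)
    offsets = np.concatenate([[0.0], np.cumsum(seg_len)])
    tau = np.empty(len(points))
    for n, x in enumerate(points):
        best_d, best_t = np.inf, 0.0
        for k in range(len(vertices) - 1):
            a, b = vertices[k], vertices[k + 1]
            # fraction along segment a->b of the orthogonal projection, clipped to [0, 1]
            f = np.clip(np.dot(x - a, b - a) / np.dot(b - a, b - a), 0.0, 1.0)
            d = np.sum((x - (a + f * (b - a))) ** 2)
            if d < best_d:
                best_d, best_t = d, offsets[k] + f * seg_len[k]
        tau[n] = best_t
    return tau / offsets[-1]


rng = np.random.default_rng(2)
centres = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0]])
labels = np.repeat(np.arange(4), 30)
cells = centres[labels] + rng.normal(scale=0.1, size=(len(labels), 2))

# Labels are given here; in practice they would come from a clustering step.
centroids = np.array([cells[labels == c].mean(axis=0) for c in range(4)])
D = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
edges = prim_mst(D)
path = tree_path(edges, D, root=0)        # assume cluster 0 is the progenitor state
tau = project_onto_path(cells, centroids[path])
```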

Incorporation of prior knowledge

Trajectory inference can be enhanced by incorporating external biological knowledge, known as priors, to guide reconstruction and improve agreement with known cellular dynamics. Priors constrain the vast space of possible trajectories derived from data-driven approaches, reducing ambiguity in branching structures or pseudotime ordering. Common types of priors include gene regulatory networks (GRNs), which weight potential transitions between cell states by prioritizing edges supported by known regulatory interactions; known marker genes, used to identify the root of a trajectory from the expression of stem cell or progenitor markers; and time-series labels from longitudinal experiments, which anchor cells to specific developmental stages.

One prevalent integration strategy uses regularization frameworks, in which inference is formulated as minimizing a composite objective that balances data fidelity with adherence to priors:

\text{Cost} = \text{Data fit} + \lambda \cdot \text{Prior violation}

Here, the data-fit term measures agreement with the observed expression or spatial coordinates, the prior-violation term penalizes deviations from biological constraints such as spatial contiguity or regulatory strengths, and \lambda tunes the trade-off between them. This regularization makes trajectories respect domain-specific assumptions without relying entirely on data-driven inference.

Examples of prior incorporation include using lineage tracing data from clonal labeling experiments to validate inferred branch points, confirming that trajectory bifurcations align with experimentally observed cell fate divergences. Similarly, RNA velocity estimates, derived from the abundances of unspliced and spliced mRNA, provide directional priors by aligning velocity vectors with the trajectory manifold, orienting pseudotime toward predicted future states and resolving ambiguities in cyclic or multifurcating paths.

Advanced methods use Bayesian frameworks with hierarchical priors for more robust inference, particularly across multiple samples. For example, VITAE combines a latent hierarchical mixture model with variational autoencoders to infer trajectories jointly while accounting for sample-specific covariates, enabling scalable analysis of heterogeneous datasets such as brain cell atlases. Incorporating priors enhances the biological plausibility of trajectories, often yielding more interpretable and accurate reconstructions in complex systems such as hematopoiesis, where purely data-driven methods may overlook subtle regulatory cues. However, reliance on priors risks introducing bias if the external knowledge is incomplete or species-specific, potentially propagating erroneous assumptions and reducing generalizability across datasets.
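The composite cost can be illustrated with a deliberately simple instance. The sketch assumes the data-fit term is the squared deviation from a data-driven pseudotime and the prior-violation term is the squared deviation from experimental time labels (rescaled to [0, 1]) for the cells that carry them; because both terms are quadratic, the minimizer solves a linear system. Real methods use richer terms, and the function names are hypothetical.

```python
import numpy as np


def regularised_pseudotime(data_tau, prior_time, has_prior, lam):
    """Minimize ||tau - data_tau||^2 + lam * sum_{labeled i} (tau_i - prior_time_i)^2.

    data_tau:   pseudotime from a purely data-driven method (the data-fit term)
    prior_time: experimental time labels (ignored where has_prior is False)
    has_prior:  boolean mask of cells that carry a time label
    lam:        weight lambda trading data fit against prior violation
    """
    M = np.diag(has_prior.astype(float))
    A = np.eye(len(data_tau)) + lam * M          # normal equations of the quadratic cost
    b = data_tau + lam * M @ np.nan_to_num(prior_time)
    return np.linalg.solve(A, b)


def cost(tau, data_tau, prior_time, has_prior, lam):
    """Data fit plus lambda times prior violation, as in the objective above."""
    fit = np.sum((tau - data_tau) ** 2)
    violation = np.sum((tau[has_prior] - prior_time[has_prior]) ** 2)
    return fit + lam * violation


data_tau = np.array([0.1, 0.5, 0.4, 0.9])
prior_time = np.array([0.0, np.nan, 0.6, 1.0])   # cell 1 has no time label
has_prior = ~np.isnan(prior_time)
tau = regularised_pseudotime(data_tau, prior_time, has_prior, lam=1.0)
```

With \lambda = 1, cell 2 (data pseudotime 0.4, time label 0.6) ties with the unlabeled cell 1 at 0.5; raising \lambda to 4 moves it to 0.56, so the prior reorders it after cell 1. The choice of \lambda therefore controls how strongly labels can override the data-driven ordering.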

Applications

Developmental biology

Trajectory inference has been instrumental in reconstructing embryogenesis trajectories in model organisms. In zebrafish, scRNA-seq data from over 38,000 cells across 12 developmental stages revealed continuous transcriptional changes underlying somitogenesis, cell migration, and other morphogenetic processes, enabling cell fate transitions to be mapped without relying on prior lineage knowledge. Similarly, integration of embryonic scRNA-seq datasets spanning early developmental stages identified 56 distinct cell trajectories, highlighting spatiotemporal gene expression dynamics during tissue formation and lineage specification. Prominent examples include the inference of hematopoiesis trajectories, where branching structures delineate myeloid and lymphoid lineages from hematopoietic stem cells, revealing sequential differentiation waves in progenitors. In pancreatic development, trajectory analysis of fetal scRNA-seq data has delineated the divergence of endocrine progenitors into alpha and beta cells, capturing key transcriptional shifts that govern the maturation of insulin-producing cells. These applications have yielded critical insights, such as pinpointing tipping points in fate decisions, where cells commit irreversibly to specific lineages, and uncovering novel intermediate states that bridge progenitor and differentiated populations, thereby refining models of developmental progression. Notable case studies underscore these advances: a 2019 analysis of mouse gastrulation that integrated scRNA-seq profiles from embryonic days 6.5 to 8.5 reconstructed trajectories for epiblast diversification and subsequent lineage formation, identifying regulatory modules active at lineage boundaries. More recently, branching trajectory analyses of human retinal organoids derived from induced pluripotent stem cells have mapped differentiation paths, revealing off-target cell populations.
Multi-omics integration further enhances trajectory inference in developmental contexts, exemplified by combining scRNA-seq with the assay for transposase-accessible chromatin using sequencing (ATAC-seq) to trace epigenetic landscapes during human fetal hematopoiesis. This approach aligned transcriptional and accessibility profiles across approximately 8,500 cells and nuclei, revealing dynamic enhancer remodeling that coordinates myeloid and erythroid branching from common progenitors.

Disease progression and other fields

Trajectory inference has been instrumental in modeling disease progression by reconstructing dynamic cellular states from single-cell data, particularly in cancer, where it elucidates paths of tumor evolution within heterogeneous microenvironments. For instance, in several cancer types, trajectory analysis of single-cell transcriptomes has delineated progression from normal to metastatic states, identifying transcriptional changes along pseudotemporal paths that highlight driver genes and potential intervention points. Similarly, in solid tumors, alignment of single-cell trajectories across conditions has enabled the inference of branching paths representing divergent metastatic routes, integrating multi-omics data to reveal how tumor cells adapt to selective pressures in the microenvironment. These approaches not only map linear progression but also capture bifurcations that correspond to therapeutic resistance or metastatic spread, providing mechanistic insights.

In neurodegenerative diseases, trajectory inference has mapped transitions from healthy to pathological states, as exemplified by Alzheimer's disease (AD) studies around 2022 that reconstructed glial and neuronal trajectories influenced by disease pathology. These analyses ordered cell states in pseudotime, showing how glial activation and neuronal dysfunction contribute to neurodegeneration, with trajectories diverging according to genetic risk factors. More recent extensions have integrated perturbation data from precision medicine cohorts to infer causal paths of therapeutic efficacy, such as how targeted inhibitors alter branching in AD-related cellular models, thereby identifying responsive subpopulations for personalized interventions.

Beyond these diseases, trajectory inference extends to immune dynamics during viral infections, where it has characterized response trajectories by ordering immune cell states from activation to exhaustion in severe cases. For example, single-cell analyses have inferred epithelial-immune interaction paths, revealing how airway cells transition under viral challenge and how immune niches bifurcate into recovery or hyperinflammatory routes. In cellular reprogramming, optimal transport-based trajectory methods have uncovered heterogeneous paths from somatic to pluripotent states, highlighting asynchronous diversification and the regulatory networks that govern reprogramming efficiency. In microbial ecology, trajectory concepts have been applied to community succession, modeling bacterial shifts along environmental gradients and inferring successional paths shaped by nutrient availability and species interactions, although single-cell applications remain emerging.

A key advantage of trajectory inference in these contexts is its ability to pinpoint branch points as therapeutic windows where interventions could redirect pathological paths, as in cancer models where pseudotime analysis identifies vulnerable states for drug targeting. Integrating perturbation data, such as from genetic screens or drug exposures, further enables causal inference along trajectories, supporting precision medicine by predicting heterogeneity in treatment response. Extensions to spatial transcriptomics have advanced disease modeling by inferring tissue-level trajectories, for instance in tumors, where spatial pseudotime reveals invasion gradients, and in AD brains, where it maps regional progression from healthy to degenerative zones.

Software tools

Monocle

Monocle is an open-source R package (earlier versions distributed through Bioconductor) for analyzing single-cell RNA sequencing data, with a focus on trajectory inference to model dynamic changes during biological processes such as differentiation. Introduced in 2014 with version 1, Monocle pioneered the concept of pseudotime, an ordering of cells along a linear trajectory that reconstructs temporal progression from static snapshots of transcriptomes. Monocle 2, released in 2017, extended this to branching trajectories by employing reversed graph embedding to capture complex fate decisions without prior knowledge of the structure. Monocle 3, launched in 2019, improved scalability through a graph abstraction framework that enables efficient partitioning and trajectory construction across millions of cells.

A core feature of Monocle 2 is its use of DDRTree for low-dimensional embedding, which approximates a tree-structured manifold to position cells in a space that reflects their progression, building on discriminative dimensionality reduction. Monocle 3 incorporates partition-based trajectory learning: cells are first clustered into partitions representing potential lineages, and a principal graph is then learned over the cells in these partitions to form a cohesive trajectory. It integrates tightly with cell clustering algorithms, such as those based on Leiden or Louvain methods, to define the scope of trajectories and identify branch points corresponding to fate decisions.

The standard workflow in Monocle 3 begins with dimensionality-reduced data, such as PCA or UMAP coordinates computed from preprocessed expression matrices, followed by graph learning to fit the trajectory structure and pseudotime assignment to quantify each cell's position along the paths. This process supports visualization of trajectories in 2D or 3D embeddings and can incorporate RNA velocity information to orient branches and predict future states. Monocle 3 also allows trajectories to be learned within specific partitions or across multiple ones, facilitating modular analysis of complex datasets. Monocle's strengths lie in its ability to handle large single-cell datasets efficiently, processing up to millions of cells while producing interpretable 2D/3D trajectory visualizations that aid in exploring lineage relationships. However, the graph learning steps can be computationally intensive for very large datasets, and the method assumes predominantly tree-like topologies, potentially underperforming on datasets with cyclic or highly disconnected structures.
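Monocle 3 assigns pseudotime as distance along the learned principal graph from user-chosen root nodes. The sketch below illustrates that idea with a multi-source Dijkstra search on a small, hand-made principal graph, assuming node positions and edges are already learned and simplifying cell projection to the nearest graph node (whereas Monocle projects cells onto the graph itself). It does not use Monocle's R API; all names are hypothetical.

```python
import heapq

import numpy as np


def geodesic_from_roots(n_nodes, edges, roots):
    """Shortest-path distance from the nearest root over an undirected weighted graph."""
    adj = [[] for _ in range(n_nodes)]
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = np.full(n_nodes, np.inf)
    heap = []
    for r in roots:                    # multi-source Dijkstra: every root starts at 0
        dist[r] = 0.0
        heap.append((0.0, r))
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                   # stale heap entry
        for nb, w in adj[node]:
            if d + w < dist[nb]:
                dist[nb] = d + w
                heapq.heappush(heap, (d + w, nb))
    return dist


# A Y-shaped principal graph (nodes 0-4) plus a second partition (nodes 5-6) without a root.
node_xy = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 1.0], [3.0, -2.0],
                    [10.0, 10.0], [11.0, 10.0]])
pairs = [(0, 1), (1, 2), (2, 3), (2, 4), (5, 6)]
edges = [(i, j, float(np.linalg.norm(node_xy[i] - node_xy[j]))) for i, j in pairs]
node_time = geodesic_from_roots(len(node_xy), edges, roots=[0])

# Simplification: each cell inherits the pseudotime of its nearest graph node.
cells = np.array([[0.1, 0.1], [2.9, 0.8], [3.1, -1.9], [10.2, 9.9]])
nearest = np.argmin(np.linalg.norm(cells[:, None, :] - node_xy[None, :, :], axis=-1), axis=1)
cell_time = node_time[nearest]
```

In this sketch the nodes of the second partition are unreachable from the root, so their cells get infinite pseudotime; a root has to be chosen in every partition that should be ordered.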

Slingshot

Slingshot is an open-source developed for trajectory inference in single-cell sequencing data, first introduced in 2017. It models continuous developmental trajectories by leveraging pre-computed cell clusters in a low-dimensional space, fitting principal curves to represent and pseudotime. The method assumes that cells within clusters are more similar to each other than to those in other clusters, enabling robust lineage reconstruction without directly operating on high-dimensional data. A core strength of lies in its ability to simultaneously infer pseudotime and branching lineages, starting from user-specified root cells or automatically detecting them. It accommodates complex topologies, including multiple branches emanating from a common , which is particularly useful for capturing hierarchies in biological processes. Unlike methods that rely on continuous embeddings alone, Slingshot emphasizes discrete clustering as a , providing interpretable structures that align with known types. The workflow begins with dimensionality reduction techniques, such as (PCA) or uniform manifold approximation and projection (UMAP), to obtain cell coordinates, followed by clustering (e.g., using k-means or density-based methods). then constructs a (MST) connecting cluster centroids to define the global lineage topology. Principal curves are fitted along each path in the MST, projecting cells onto these curves to assign pseudotime values that reflect progression along inferred trajectories. This process is implemented via the slingshot() wrapper function, which outputs a SingleCellExperiment object compatible with downstream analyses. Slingshot excels in interpretability for branching trajectories, allowing visualization of lineages and pseudotime on reduced-dimensional plots using integrated plotting functions. 
Slingshot integrates directly with the tradeSeq package for differential expression analysis, enabling tests for genes that change along specific lineages or between branches, which enhances its utility for identifying regulatory dynamics. For instance, in hematopoiesis datasets such as that of Paul et al. (2015), Slingshot has been used to reconstruct myeloid and erythroid lineages from hematopoietic progenitors, revealing pseudotime-ordered expression patterns consistent with known transcriptional cascades. Recent package versions accept diverse input formats, including data derived from spatial transcriptomics, by treating spatial coordinates as additional dimensions in the embedding.
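The cluster-then-tree workflow described above can be sketched in a few lines of Python. This is not the slingshot package. The sketch assumes Euclidean distances between cluster centroids, finds the MST with Prim's algorithm, and reads off lineages as root-to-leaf paths. As a deliberate simplification, it replaces Slingshot's simultaneous principal curves with straight segments through the centroids when projecting cells to obtain pseudotime. All function names are hypothetical.

```python
import math
from collections import defaultdict


def centroids(coords, labels):
    """Mean coordinate of each cluster in the reduced space."""
    sums, counts = {}, defaultdict(int)
    for x, c in zip(coords, labels):
        s = sums.setdefault(c, [0.0] * len(x))
        for d, v in enumerate(x):
            s[d] += v
        counts[c] += 1
    return {c: tuple(v / counts[c] for v in s) for c, s in sums.items()}


def mst(cents):
    """Prim's algorithm over cluster centroids; returns an adjacency dict."""
    nodes = list(cents)
    in_tree, adj = {nodes[0]}, defaultdict(set)
    while len(in_tree) < len(nodes):
        # Cheapest edge joining the current tree to a cluster outside it.
        _, a, b = min(
            (math.dist(cents[u], cents[v]), u, v)
            for u in in_tree for v in nodes if v not in in_tree
        )
        adj[a].add(b)
        adj[b].add(a)
        in_tree.add(b)
    return adj


def lineages(adj, root):
    """Every root-to-leaf path in the tree, i.e. one lineage per leaf cluster."""
    paths, stack = [], [[root]]
    while stack:
        path = stack.pop()
        children = [n for n in adj[path[-1]] if n not in path]
        if not children:
            paths.append(path)
        for n in children:
            stack.append(path + [n])
    return sorted(paths)


def project_pseudotime(point, path_pts):
    """Arc length along a polyline at the point closest to `point`."""
    best_d, best_t, travelled = math.inf, 0.0, 0.0
    for p, q in zip(path_pts, path_pts[1:]):
        seg = [b - a for a, b in zip(p, q)]
        seg_len2 = sum(s * s for s in seg)
        seg_len = math.sqrt(seg_len2)
        # Orthogonal projection onto the segment, clamped to its endpoints.
        dot = sum((x - a) * s for x, a, s in zip(point, p, seg))
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, dot / seg_len2))
        foot = [a + t * s for a, s in zip(p, seg)]
        d = math.dist(point, foot)
        if d < best_d:
            best_d, best_t = d, travelled + t * seg_len
        travelled += seg_len
    return best_t


if __name__ == "__main__":
    # Toy 2D embedding: A -> B, then a split toward C and toward D.
    coords = [(-0.5, 0), (0.5, 0), (2, 0.5), (2, -0.5),
              (4, 2.5), (4, 1.5), (4, -1.5), (4, -2.5)]
    labels = ["A", "A", "B", "B", "C", "C", "D", "D"]
    cents = centroids(coords, labels)
    paths = lineages(mst(cents), root="A")
    print(paths)  # [['A', 'B', 'C'], ['A', 'B', 'D']]
    for path in paths:
        curve = [cents[c] for c in path]
        print(path[-1], [round(project_pseudotime(x, curve), 2) for x in coords])
```

With cluster A fixed as the root, the toy data yield two lineages that share A and B and then split toward C and D. Each cell receives one pseudotime value per lineage, which is analogous to the per-lineage pseudotimes that Slingshot reports.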

PAGA

PAGA (partition-based graph abstraction) is a method introduced in 2019 within the Scanpy ecosystem for single-cell sequencing analysis. It is designed to infer coarse-grained trajectory topologies by creating interpretable graph-like maps of cellular manifolds. It reconciles clustering with trajectory inference by abstracting high-dimensional data into a simplified graph that preserves global connectivity structure. Rather than assigning a full pseudotime, PAGA focuses on estimating the initial topology on which trajectories are built. PAGA first constructs a k-nearest neighbor (kNN) graph from high-dimensional cell neighborhoods, partitions cells into communities using algorithms such as Louvain clustering, and then estimates connectivities between these partitions to form an abstracted PAGA graph. In this graph, edge weights represent confidence in the presence of connections, enabling visualization of branching or linear trajectories while maintaining multi-resolution views of the data manifold. By prioritizing the preservation of global topology, PAGA avoids distortions common in low-dimensional embedding techniques, and it can abstract transition graphs derived from RNA velocity for directional inference. In a typical workflow, PAGA computes the kNN graph and partitions, abstracts them into the PAGA graph, and then facilitates pseudotime estimation with methods such as diffusion pseudotime (DPT), guided by the abstracted structure. This stepwise approach lets users refine trajectories iteratively, starting from a robust coarse map. PAGA's strengths include high scalability: it processed a dataset of 1.3 million neurons in approximately 90 seconds on standard hardware, making it suitable for large-scale analyses. It is robust to technical noise in single-cell data and provides intuitive visualizations of connectivity strengths through Scanpy's plotting functions.
These attributes enable efficient exploration of complex manifolds without excessive computational demands. Applications of PAGA span a range of biological systems, including trajectory inference in hematopoiesis across multiple datasets, regeneration studies in planaria, and embryonic patterning in zebrafish. It has also been employed in large-scale single-cell atlases, such as the 2023 human embryonic limb cell atlas, where it generated abstracted graphs to resolve spatial and temporal cellular progressions.
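The partition-connectivity step can be illustrated with a simplified, undirected version of the observed-versus-expected statistic used in PAGA. The number of kNN edges between two partitions is divided by the number expected if each partition's edges were attached to random cells, and the ratio is capped at 1. This sketch is an assumption for illustration: scanpy's `scanpy.tl.paga` differs in details such as edge direction and counting. In practice, one would run `sc.pp.neighbors`, a clustering step such as `sc.tl.louvain`, and then `sc.tl.paga` on an AnnData object. The function name below is hypothetical.

```python
from collections import Counter


def paga_connectivity(edges, labels):
    """Capped observed/expected ratio of kNN edges between each pair of groups.

    edges  -- undirected (i, j) cell-index pairs from a kNN graph
    labels -- group label (e.g. a Louvain cluster) for each cell
    """
    n = len(labels)
    size = Counter(labels)   # cells per group
    degree = Counter()       # edge endpoints falling in each group
    inter = Counter()        # observed edges between two distinct groups
    for i, j in edges:
        gi, gj = labels[i], labels[j]
        degree[gi] += 1
        degree[gj] += 1
        if gi != gj:
            inter[tuple(sorted((gi, gj)))] += 1

    groups = sorted(size)
    conn = {}
    for idx, a in enumerate(groups):
        for b in groups[idx + 1:]:
            # Edges expected between a and b if each group's edges hit random cells.
            expected = (degree[a] * size[b] + degree[b] * size[a]) / (n - 1)
            ratio = inter[(a, b)] / expected if expected else 0.0
            conn[(a, b)] = min(ratio, 1.0)
    return conn


if __name__ == "__main__":
    # Three groups of three cells; only one edge bridges groups a and b.
    labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5),
             (6, 7), (7, 8), (6, 8), (2, 3)]
    for pair, value in paga_connectivity(edges, labels).items():
        print(pair, round(value, 3))
```

Only the a-b pair receives a nonzero weight, so a PAGA-style graph of this toy data would draw a single edge, a to b, and leave group c disconnected.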

Other notable tools

TSCAN (Tools for Single-Cell ANalysis) is a cluster-based trajectory inference method that reconstructs pseudotime by ordering cell clusters along a minimum spanning tree (MST) built from pairwise distances between clusters, making it particularly suitable for linear trajectories in single-cell data. Introduced in 2016, TSCAN emphasizes simplicity and computational efficiency, enabling robust pseudotime estimation even in noisy datasets, and it incorporates quantitative metrics for evaluating trajectory quality.
Early graph- and diffusion-based approaches such as Wanderlust and Wishbone laid the groundwork for handling branching structures. Wanderlust, developed in 2014, infers a one-dimensional trajectory by propagating distances from a user-specified root cell across a k-nearest neighbor graph, effectively capturing progression in processes such as human B cell development. Wishbone, building on Wanderlust in 2016, uses diffusion maps to detect a bifurcation by identifying the branch point and assigning cells to branches, proving effective for developmental bifurcations such as hematopoietic lineage decisions. While influential, these methods are now considered outdated for large-scale datasets because of their limited scalability and their sensitivity to the choice of root cell.
More recent tools address complex topologies and multimodal data. scSAE, introduced in 2025, employs stacked autoencoders to learn nonlinear latent representations of single-cell data, enabling inference of intricate, nonlinear trajectories without assuming predefined structures. VITAE, published in PNAS, combines variational autoencoders with a latent hierarchical mixture model for joint Bayesian inference of trajectories and cell types, supporting multi-sample integration and outperforming prior methods on diverse topologies such as trees and cycles. PHLOWER, released in 2025, applies Hodge decomposition to single-cell multimodal data (e.g., RNA and protein) to infer complex, multi-branching differentiation trees, improving accuracy in settings with spatial or multi-omics inputs.
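The root-propagation idea attributed to Wanderlust above can be illustrated with shortest-path distances from a root cell on a kNN graph. This Python sketch is a simplification: Wanderlust additionally uses randomly chosen waypoints and repeated graph construction to refine the ordering, and none of that is modeled here. The function names are hypothetical. The toy cells lie on a U-shaped path, so distance along the graph follows the path even where straight-line distance would suggest a shortcut between the two arms.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric kNN graph as {cell: {neighbor: Euclidean distance}}."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_pseudotime(points, root, k=3):
    """Shortest-path distance from the root cell through the kNN graph (Dijkstra)."""
    graph = knn_graph(points, k)
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    # Cells with no path to the root (disconnected graph) get infinite pseudotime.
    return [dist.get(i, math.inf) for i in range(len(points))]


if __name__ == "__main__":
    # Cells along a U-shaped path: up the left arm, across, down the right arm.
    pts = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (3, 3),
           (4, 3), (4, 2), (4, 1), (4, 0)]
    print(geodesic_pseudotime(pts, root=0, k=2))
```

The last cell lies only 4 units from the root in straight-line distance but receives a pseudotime of 10. Measuring progression along the graph rather than in the ambient space is the point of this design, and it also shows why the result depends on the chosen root.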
For time-series data, IDTI computes trajectories by quantifying the increment of diversity between adjacent time points, which avoids the need to specify a root cell and improves reconstruction of dynamic processes such as embryonic development.
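The increment-of-diversity idea can be sketched under an explicit assumption: the classical measure from earlier bioinformatics work. In that measure, the diversity of a count vector with total N is D(X) = N log N - sum_i n_i log n_i, and the increment between two vectors is ID(X, Y) = D(X + Y) - D(X) - D(Y). ID is zero when the two profiles are proportional and grows as they diverge. This article does not state whether IDTI uses exactly this form or how it discretizes expression. The linking rule below, which connects each state to the earlier state with the smallest ID, is a hypothetical illustration.

```python
import math


def diversity(counts):
    """Diversity source D(X) = N log N - sum_i n_i log n_i (with 0 log 0 = 0)."""
    def xlogx(v):
        return v * math.log(v) if v > 0 else 0.0

    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); zero for proportional profiles."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def link_to_previous(prev_states, next_states):
    """Hypothetical rule: link each later state to the earlier state with the smallest ID."""
    return {
        name: min(prev_states, key=lambda p: increment_of_diversity(prev_states[p], prof))
        for name, prof in next_states.items()
    }


if __name__ == "__main__":
    # Toy expression counts over four genes at two adjacent time points.
    t0 = {"P1": [10, 1, 0, 0], "P2": [0, 0, 10, 1]}
    t1 = {"N1": [9, 2, 0, 0], "N2": [0, 1, 9, 2]}
    print(link_to_previous(t0, t1))  # {'N1': 'P1', 'N2': 'P2'}
```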
Tool | Key strength | Limitation
TSCAN | Fast cluster-based linear ordering via MST | Dependent on initial clustering quality
Wishbone | Detects single bifurcations in diffusion space | Limited to one bifurcation; scales poorly
Wanderlust | Simple root-to-end progression mapping | Assumes linear paths; root-sensitive
scSAE | Handles nonlinear paths with autoencoders | Deep architecture adds training overhead
VITAE | Bayesian integration across samples | Computationally intensive for large datasets
PHLOWER | Multimodal support for complex trees | Relies on modeling assumptions
IDTI | Tailored for time series without roots | Assumes temporal ordering in input
The dynverse framework, whose benchmark of 45 methods was published in 2019, provides standardized evaluation of trajectory inference tools against gold-standard synthetic and real datasets. Among its findings, cluster-based methods such as TSCAN are fast on linear topologies. Newer approaches such as VITAE and PHLOWER postdate that benchmark; their own publications report higher accuracy on branched and multimodal benchmarks, in some cases surpassing established tools in topological fidelity.

Challenges and future directions

Current limitations

Trajectory inference methods face significant technical challenges that limit their scalability and robustness on large-scale single-cell RNA sequencing (scRNA-seq) datasets. For instance, many algorithms struggle with ultra-large datasets comprising millions of cells because of high computational demands. This is particularly true when integrating multimodal data, such as transcriptomics combined with other molecular layers or spatial information, which increases memory and processing requirements. These methods are also highly sensitive to technical artifacts. Dropout events, in which expressed genes are falsely recorded as zero because of inefficient mRNA capture, and batch effects arising from differences in experimental protocols across samples can both distort pseudotime ordering and produce spurious branching structures. Biologically, trajectory inference often struggles to distinguish correlations in gene expression from true causal relationships, because stochastic fluctuations in cellular processes can mimic directional progression without implying causation. Inherent biological stochasticity, such as random state transitions or feedback loops in gene regulatory networks, further complicates trajectory reconstruction, since most methods assume smooth, deterministic paths that may not capture abrupt or reversible dynamics. In noisy datasets, this can lead to overfitting, in which algorithms infer excess branches that reflect artifacts rather than genuine bifurcations, as observed when trajectory tools are applied to heterogeneous tissues. Similarly, without explicit priors, these methods frequently fail to model cyclic processes such as cell-cycle oscillations, producing linearized trajectories that ignore periodicity. Validation of inferred trajectories remains problematic because ground truth is absent in most biological systems, making it difficult to assess accuracy beyond qualitative comparison with known markers. Common quantitative metrics are often dataset-specific and sensitive to preprocessing choices, which limits their generalizability across studies.
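When an external ordering is available, for example simulated ground truth or the collection time of each sample, agreement with an inferred pseudotime is often summarized with a rank correlation. The Python sketch below computes Spearman's rho by hand (`scipy.stats.spearmanr` computes the same statistic). The sampling days and pseudotimes are made-up values for illustration, and the function names are hypothetical. A score like this only checks monotonic agreement along one path and says nothing about branch topology, which is one reason such metrics transfer poorly between datasets.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when either input is constant
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical sampling day of each cell versus an inferred pseudotime.
    day = [0, 0, 1, 1, 2, 2]
    pseudotime = [0.1, 0.3, 0.2, 0.5, 0.9, 0.8]
    print(round(spearman(day, pseudotime), 3))  # 0.837
```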
From a 2025 perspective, biases introduced by uneven sampling, in which rare cell types are underrepresented, remain a key issue: they can skew trajectories toward dominant populations and amplify errors in multimodal integrations that require balanced representation. These limitations underscore the need for cautious interpretation in complex, real-world applications.
Recent advances in trajectory inference increasingly focus on integrating multiple data sources to improve the accuracy and biological relevance of inferred cellular paths. For instance, PHLOWER combines scRNA-seq with additional single-cell modalities and spatial data to reconstruct complex, multi-branching trajectories, using Hodge decomposition for spatially aware inference. This approach addresses limitations of unimodal analyses by capturing interactions across molecular layers, as demonstrated in kidney organoid models, where multimodal integration improved trajectory resolution. In parallel, machine learning innovations are enabling more interpretable, end-to-end trajectory modeling. Large language models (LLMs) are being incorporated into analysis pipelines; PyEvoCell, for example, applies LLM capabilities to the outputs of tools such as Monocle3 to suggest biologically relevant lineages and aid interpretation. Deep learning frameworks, exemplified by scSAE (single-cell stacked autoencoders), support trajectory inference directly from raw scRNA-seq data through unsupervised feature learning, reducing reliance on predefined pseudotime assumptions. Joint analysis techniques are improving cross-sample comparability: Genes2Genes provides a Bayesian information-theoretic dynamic programming framework for aligning gene expression trajectories between datasets, enabling the identification of conserved or divergent patterns in pseudotime. Such alignments support time-series augmentation strategies that refine pseudotime estimation by incorporating temporal replicates, improving robustness in heterogeneous samples.
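Trajectory alignment can be illustrated with classical dynamic time warping (DTW), used here as a simpler substitute. Genes2Genes itself uses a Bayesian information-theoretic dynamic programming scheme that also models mismatches, which this sketch does not. Each input is assumed to be one gene's mean expression in consecutive pseudotime bins, and the function name is hypothetical.

```python
import math


def dtw(a, b):
    """Dynamic time warping between two pseudotime-binned expression curves.

    Returns (total cost, alignment path as (i, j) bin-index pairs).
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: match, extra bin in a, extra bin in b.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Trace back from the end to recover which bins were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    return cost[n][m], path[::-1]


if __name__ == "__main__":
    # Mean expression of one gene in pseudotime bins for two datasets (toy data).
    reference = [0, 1, 2, 3, 3]
    query = [0, 0, 1, 2, 3]
    total, path = dtw(reference, query)
    print(total, path)
```

In this example, the query trajectory lingers one extra bin at the start and the reference one extra bin at the end. The optimal path therefore matches the two curves at zero cost by warping time rather than by changing expression values.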
Looking ahead, causal inference methods are emerging that incorporate experimental interventions. CASCAT, for instance, uses tree-shaped structural causal models on spatial transcriptomics data to infer directed cell differentiation paths and predict perturbation outcomes. Real-time trajectory analysis is also gaining traction for clinical applications, with machine learning integrations poised to enable on-the-fly pseudotime estimation in patient-derived samples for precision medicine. Community-driven standardization efforts support these developments through resources such as the dynverse packages, in which dyno provides a common interface to more than 50 trajectory inference methods and the associated benchmark evaluates methods on gold-standard datasets to guide method selection. Open-source ecosystems, including the repositories under dynverse, promote collaborative tool development and reproducible workflows.
