
Simultaneous localization and mapping

Simultaneous localization and mapping (SLAM) is the process by which a robot or other autonomous agent uses onboard sensors to construct a map of an unknown environment while simultaneously estimating its own location and orientation within that environment, without relying on any prior information about the surroundings. This dual task addresses the fundamental "chicken-and-egg" problem in robotics, where accurate mapping requires knowing the agent's pose, and precise pose estimation depends on a reliable map. The origins of SLAM trace back to the mid-1980s, with foundational work on probabilistic representations of spatial uncertainty presented by researchers including Smith, Self, and Cheeseman around the 1986 IEEE Robotics and Automation Conference. The term "SLAM" was coined in a 1995 survey paper by Durrant-Whyte and others, marking the formalization of the problem, while early implementations drew from Bayesian estimation techniques to handle sensor noise and motion errors. Key milestones in the 1990s and 2000s included the development of the extended Kalman filter (EKF) for real-time SLAM, as detailed in Dissanayake et al. (2001), and the introduction of graph-based formulations by Lu and Milios (1997), which enabled scalable optimization over large datasets. By the 2010s, advancements shifted toward robust, vision-aided systems and semantic integration, reflecting over three decades of evolution from theoretical foundations to practical deployment.

At its core, SLAM employs probabilistic models, such as maximum a posteriori (MAP) estimation, to fuse data from sensors like lidars, cameras, and inertial measurement units, often represented as factor graphs that capture spatial relationships between poses and landmarks. Challenges persist in areas like data association in feature-poor environments, handling dynamic objects, and achieving computational efficiency for long-term operation, but solutions like Rao-Blackwellized particle filters (e.g., FastSLAM) have improved consistency and accuracy. Widely regarded as a cornerstone of autonomous systems, SLAM enables applications in self-driving cars for real-time navigation, indoor robots for exploration and inspection, augmented reality for spatial anchoring, and underwater or aerial vehicles for mapping inaccessible areas.

Problem Definition

Mathematical Formulation

The SLAM problem is fundamentally a probabilistic estimation task that seeks to compute the posterior distribution over the robot's pose and the map given a sequence of observations and control inputs. Formally, at time t, this is expressed as p(x_t, m \mid z_{1:t}, u_{1:t}), where x_t denotes the robot's pose (typically including position and orientation), m represents the map of the environment, z_{1:t} = \{z_1, \dots, z_t\} is the history of observations from onboard sensors, and u_{1:t} = \{u_1, \dots, u_t\} is the sequence of control actions applied to the robot. This posterior encapsulates the uncertainty inherent in both localization and mapping due to noisy sensing and motion.

The derivation of this posterior relies on Bayesian filtering, recursively updating the estimate as new data arrives. Starting from the Chapman-Kolmogorov equation for the prediction step, the time-update integrates the motion model: p(x_t, m \mid z_{1:t-1}, u_{1:t}) = \int p(x_t \mid x_{t-1}, u_t) \, p(x_{t-1}, m \mid z_{1:t-1}, u_{1:t-1}) \, dx_{t-1}, where the motion model p(x_t \mid x_{t-1}, u_t) captures the probabilistic effect of control u_t on the pose transition, often modeled as a Markov process with independent noise. The measurement-update then applies Bayes' rule: p(x_t, m \mid z_{1:t}, u_{1:t}) = \frac{p(z_t \mid x_t, m) \, p(x_t, m \mid z_{1:t-1}, u_{1:t})}{p(z_t \mid z_{1:t-1}, u_{1:t})}, incorporating the observation model p(z_t \mid x_t, m), which relates the current measurement z_t to the pose and map under conditional-independence assumptions, with the normalizing denominator ensuring the posterior integrates to 1. This recursive factorization enables online computation, though the high dimensionality of the joint distribution poses computational challenges.

In landmark-based representations, the map m is typically a set of features m = \{m_i\}_{i=1}^N, where each m_i = (m_{i,x}, m_{i,y}) is the 2D position of a landmark in a global frame. For range-bearing sensors, common in mobile robotics, an observation z_t = (r, \phi) measures the relative range r and bearing \phi to a landmark m_i from the robot's pose x_t = (x, y, \theta). The expected measurement function is nonlinear: h(x_t, m_i) = \begin{pmatrix} \sqrt{(m_{i,x} - x)^2 + (m_{i,y} - y)^2} \\ \operatorname{atan2}(m_{i,y} - y, m_{i,x} - x) - \theta \end{pmatrix}, with actual observations modeled as z_t = h(x_t, m_i) + v_t, where v_t is zero-mean Gaussian noise with covariance R_t. This setup assumes known data association between z_t and m_i, allowing the likelihood p(z_t \mid x_t, m_i) = \mathcal{N}(z_t; h(x_t, m_i), R_t) to update both pose and map estimates jointly.

An alternative to explicit landmarks is the occupancy grid map, where m consists of a discrete grid of cells, each with an occupancy probability p(m_{i,j} = 1 \mid z_{1:t}, x_{1:t}) indicating the likelihood that cell (i,j) is occupied. Unlike landmark maps, this representation avoids feature extraction by directly modeling spatial occupancy via inverse sensor models, integrating observations into log-odds ratios for efficient updates: l(m_{i,j}) = \log \frac{p(m_{i,j}=1)}{p(m_{i,j}=0)}, updated recursively without requiring point correspondences. This grid-based approach is particularly suited for dense environments but increases storage demands compared to sparse landmark sets.
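To make the measurement and grid models above concrete, the following minimal Python sketch (assuming NumPy is available; the function names are illustrative, not from any particular SLAM library) computes the expected range-bearing measurement h(x_t, m_i) and performs a single log-odds occupancy update.

```python
import numpy as np

def range_bearing(pose, landmark):
    """Expected range-bearing measurement h(x_t, m_i) for pose (x, y, theta)."""
    x, y, theta = pose
    dx, dy = landmark[0] - x, landmark[1] - y
    r = np.hypot(dx, dy)                              # expected range
    phi = np.arctan2(dy, dx) - theta                  # expected bearing
    phi = (phi + np.pi) % (2 * np.pi) - np.pi         # wrap to [-pi, pi)
    return np.array([r, phi])

def log_odds_update(l_cell, p_occ_given_z):
    """Recursive log-odds update of one occupancy cell from an inverse sensor model."""
    return l_cell + np.log(p_occ_given_z / (1.0 - p_occ_given_z))

# Example: a landmark 3 m ahead and 1 m to the left of a robot at the origin.
z_expected = range_bearing((0.0, 0.0, 0.0), (3.0, 1.0))
l = log_odds_update(0.0, 0.7)   # cell observed as likely occupied (p = 0.7)
print(z_expected, l)
```

Wrapping the bearing to [-\pi, \pi) keeps measurement residuals well behaved near the angular discontinuity, which matters once these terms feed a filter or optimizer.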

Key Challenges

One of the primary challenges in simultaneous localization and mapping (SLAM) is the accumulation of errors over time, stemming from odometry noise and uncertainties in sensor observations, which leads to drift in pose estimates. Odometry, which relies on integrating relative motion measurements from sources like wheel encoders or inertial measurement units, inherently introduces errors that grow unbounded without correction, resulting in diverging pose and map estimates. This drift can cause the robot's perceived position to deviate significantly from its true location, compounding inaccuracies in subsequent mapping. Seminal work by Smith, Self, and Cheeseman highlighted how such uncertainties propagate through spatial relationships, necessitating probabilistic representations to model and mitigate error growth.

The data association problem further complicates SLAM by requiring the correct matching of current observations to existing map features, often amid ambiguities in feature correspondence. In environments with numerous similar landmarks, such as repeated patterns in urban settings, associating a new sensor reading with the appropriate map element becomes error-prone, potentially leading to incorrect updates that corrupt the entire map. This issue is exacerbated by noisy sensors or partial occlusions, where multiple landmarks may appear to be viable matches, demanding robust validation techniques to avoid catastrophic failures. Durrant-Whyte and Bailey emphasized the fragility of early SLAM formulations to such mismatches, particularly in the extended Kalman filter (EKF) approach.

Scalability poses a significant computational hurdle as the map size increases, with naive algorithms exhibiting O(n²) complexity for n landmarks due to the need to update correlations across the entire state space. As the map expands, maintaining and inverting the covariance matrix or its information-form equivalent becomes prohibitive in terms of memory and processing time, limiting applicability to large-scale or long-term operations. For instance, in outdoor deployments, maps can encompass thousands of features, rendering real-time execution infeasible without approximations. This growth was a focal point in early analyses, driving the development of sparse and hierarchical methods to manage complexity.

Perceptual aliasing arises when similar environmental features produce indistinguishable sensor signatures from different locations, leading to erroneous loop closures or improper map merges. In symmetric or repetitive spaces, such as hallways or grid-like structures, the system may falsely identify revisited areas, injecting inconsistencies that amplify drift or fragment the map. This perceptual ambiguity challenges the reliability of feature-based matching and requires contextual cues beyond raw appearance to disambiguate. Cadena et al. noted how perceptual aliasing contributes to outliers in data association, underscoring its role in robust SLAM.

Representing and propagating uncertainty in the high-dimensional state space, which jointly estimates robot poses and map features, remains a core difficulty due to the curse of dimensionality and nonlinear dynamics. The SLAM posterior distribution can involve thousands of variables, where correlations between all elements must be tracked, but approximations like Gaussian assumptions can fail under multi-modality or heavy-tailed noise. Propagating these uncertainties through motion and observation models demands efficient numerical methods to avoid computational explosion or loss of accuracy. Smith et al. introduced stochastic maps to handle such multivariate uncertainties, while Durrant-Whyte and Bailey discussed the impracticality of direct particle filtering in these spaces. Loop closure detection offers a partial remedy to drift by realigning trajectories upon reobservation, though it does not fully resolve the underlying representational challenges.

Sensing and Modeling

Sensor Technologies

Visual sensors, primarily cameras, form the backbone of many SLAM systems due to their ability to capture rich environmental detail through perspective projection, where light rays project onto an image plane to produce 2D intensity or color images. Monocular cameras provide a single viewpoint, enabling feature-based methods that extract invariant keypoints such as scale-invariant feature transform (SIFT) descriptors, which detect and describe local image features robust to scale and rotation changes, or oriented FAST and rotated BRIEF (ORB) features, offering computational efficiency for real-time processing in resource-constrained setups. However, monocular setups suffer from scale ambiguity, as depth information is not directly observable, limiting absolute positioning without additional cues. Stereo cameras address this by using two parallel viewpoints to compute disparity maps via stereo matching, yielding metric-scale reconstructions with advantages in textured environments but drawbacks in low-light or featureless scenes due to baseline-dependent accuracy. RGB-D cameras, such as those using structured light or time-of-flight principles, augment RGB images with per-pixel depth, facilitating direct dense mapping; they excel in indoor settings with high depth accuracy but are constrained by short range (typically under 5 meters) and sensitivity to ambient light.

Range sensors provide direct distance measurements, complementing visual data in challenging visibility conditions. LiDAR (Light Detection and Ranging) systems emit laser pulses and measure time-of-flight to generate 2D or 3D point clouds, with scanning mechanisms like mechanical rotation or solid-state arrays enabling high-resolution (centimeter-level) mapping over long ranges (up to hundreds of meters); advantages include precision in sparse environments and independence from lighting, though limitations arise from high cost, mechanical wear in rotating units, and sparsity in dynamic scenes. Ultrasonic sensors, operating on acoustic wave propagation, offer short-range (up to 5-10 meters) distance estimates via echo timing, ideal for low-cost obstacle avoidance in robotics; they provide robustness to dust or smoke but suffer from narrow beam angles (10-30 degrees), leading to coarse resolution and susceptibility to multipath reflections in cluttered spaces.

Inertial sensors, embodied in Inertial Measurement Units (IMUs), integrate accelerometers and gyroscopes to estimate motion without external references. Accelerometers detect linear accelerations along three axes, while gyroscopes measure angular rates, allowing dead reckoning through double integration for position and single integration for orientation; this provides high-frequency (hundreds of Hz) updates for bridging gaps in other sensor data but accumulates drift errors rapidly due to noise and bias instabilities. IMUs are compact and low-power, making them ubiquitous in mobile SLAM, yet they require fusion with exteroceptive sensors to mitigate unbounded error growth over time.

Other modalities extend capabilities in niche scenarios. Radar sensors, particularly Frequency Modulated Continuous Wave (FMCW) types, use radio waves for all-weather ranging and velocity estimation via Doppler shifts, offering penetration through fog or rain with ranges exceeding 100 meters; they enable robust outdoor mapping but produce sparse, noisy point clouds due to low angular resolution. Event cameras, neuromorphic sensors that asynchronously record per-pixel brightness changes (events) at microsecond resolution, capture high dynamic range (over 120 dB) and low-latency motion data, advantageous for high-speed or varying-light environments; however, their output requires specialized processing to reconstruct traditional images or features, and they lack absolute intensity information.

Sensor fusion combines these modalities to enhance robustness, as in visual-inertial odometry (VIO), where IMU data recovers scale and predicts motion between camera frames, reducing drift in monocular setups through tightly coupled estimation of poses and biases. This approach leverages complementary strengths, with cameras providing global consistency and IMUs short-term accuracy, while addressing individual limitations like visual occlusions or inertial drift via probabilistic models.

Kinematics and Dynamics Modeling

In simultaneous localization and mapping (SLAM), kinematic and dynamic models predict the robot's pose evolution based on control inputs, forming the prediction step in probabilistic frameworks. These models account for the robot's motion constraints and integrate odometry data to estimate state transitions, while propagating uncertainties to maintain accurate posterior distributions. Kinematic models assume instantaneous responses without inertial effects, suitable for low-speed operations, whereas dynamic models incorporate forces and accelerations for higher-fidelity predictions in non-holonomic systems.

Kinematic models commonly used in SLAM include the differential drive, unicycle, and Ackermann steering configurations. For differential drive and unicycle robots, the motion is parameterized by linear velocity v and angular velocity \omega, updating the pose (x, y, \theta) over time step \Delta t as follows: \begin{aligned} x_t &= x_{t-1} + \Delta t \cdot v \cos\left(\theta_{t-1} + \frac{\omega \Delta t}{2}\right), \\ y_t &= y_{t-1} + \Delta t \cdot v \sin\left(\theta_{t-1} + \frac{\omega \Delta t}{2}\right), \\ \theta_t &= \theta_{t-1} + \omega \Delta t. \end{aligned} This mid-heading formulation approximates the curved trajectory over each time step, providing a balance between simplicity and accuracy for wheeled platforms. For Ackermann vehicles, such as cars, the model extends to include a steering angle \phi, relating velocities at the front and rear axles while enforcing non-holonomic constraints that prevent sideways motion; the instantaneous center of rotation lies at the intersection of the extended wheel axes.

Dynamic models build on kinematics by incorporating inertial effects, accelerations, and external forces, particularly for non-holonomic systems where velocity limits and slip must be modeled. The Newton-Euler formulation recursively computes joint torques and accelerations by balancing forces and moments across the robot's links, enabling predictions that account for mass distribution and inertia. These models are essential for high-speed or uneven-terrain scenarios in SLAM, though computationally intensive.

Odometry integration estimates control inputs u_t = (v, \omega) from wheel encoders, which measure wheel rotations to compute relative displacements, or from inertial measurement units (IMUs) via dead reckoning of accelerations and angular rates. Noise in these estimates, such as encoder quantization or IMU drift, is typically modeled as zero-mean Gaussian perturbations added to the motion parameters, with covariances scaled by distance traveled to capture accumulating errors. In probabilistic SLAM, uncertainty propagation evolves the state covariance matrix through the motion model's Jacobian, as in the extended Kalman filter prediction step: P_{t|t-1} = \mathbf{F} P_{t-1} \mathbf{F}^T + \mathbf{Q}, where \mathbf{F} linearizes the nonlinear motion function and \mathbf{Q} is the process noise covariance; this ensures the filter tracks pose and map uncertainties consistently.
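As an illustration of the prediction step described above, the following Python sketch (NumPy assumed; the noise values and function names are arbitrary placeholders rather than values from any published system) implements the differential-drive update together with first-order covariance propagation P = F P F^T + Q.

```python
import numpy as np

def predict(pose, cov, v, omega, dt, Q):
    """Differential-drive motion update with first-order covariance propagation."""
    x, y, theta = pose
    mid = theta + 0.5 * omega * dt                      # mid-heading approximation
    new_pose = np.array([x + dt * v * np.cos(mid),
                         y + dt * v * np.sin(mid),
                         theta + omega * dt])
    # Jacobian of the motion model with respect to the pose (F = df/dx).
    F = np.array([[1.0, 0.0, -dt * v * np.sin(mid)],
                  [0.0, 1.0,  dt * v * np.cos(mid)],
                  [0.0, 0.0,  1.0]])
    new_cov = F @ cov @ F.T + Q                          # P = F P F^T + Q
    return new_pose, new_cov

pose, cov = np.zeros(3), np.diag([0.01, 0.01, 0.005])
Q = np.diag([0.02, 0.02, 0.01])                          # assumed process noise
pose, cov = predict(pose, cov, v=1.0, omega=0.2, dt=0.1, Q=Q)
```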

Core SLAM Techniques

Mapping and Localization Basics

In simultaneous localization and mapping (SLAM), the core processes involve iteratively estimating the agent's pose within an environment while concurrently constructing and refining a map of that environment. This joint estimation addresses the inherent coupling between pose and map, where inaccuracies in one directly impact the other, forming the basis of the SLAM posterior distribution over trajectories and maps. Traditional approaches begin with an initial pose estimate derived from odometry or prior knowledge, followed by alignment of sensor observations to update both the pose and the map incrementally.

Localization in SLAM refers to the task of estimating the agent's current pose (typically its position and orientation) relative to a pre-existing or partially built map. This is achieved through techniques such as scan matching, which aligns consecutive sensor scans (e.g., lidar returns or visual features) to compute relative transformations, or particle filters, which maintain a set of weighted hypotheses (particles) representing possible poses and propagate them based on motion models and observation likelihoods. Scan matching, often implemented via iterative closest point (ICP) algorithms, minimizes the distance between corresponding points in current and prior scans to yield a refined pose estimate, enabling robust alignment even in feature-sparse areas. Particle filters, particularly in Monte Carlo variants, handle multimodal pose distributions effectively by resampling particles according to their alignment with map features, providing a probabilistic framework that mitigates drift over time.

Mapping complements localization by incrementally updating the environmental representation using new observations aligned to the estimated pose. This typically involves adding detected features, such as landmarks or occupancy cells, to the map or refining existing structures based on new data, ensuring the map evolves without redundant storage. In feature-based mapping, sparse landmarks (e.g., corners or edges) are extracted and inserted if they meet criteria like distinctiveness and stability, forming a lightweight structure that supports efficient pose-to-observation associations. Grid-based mapping, conversely, updates probabilistic occupancy values in a voxel or 2D grid, where observations shift cell states from unknown toward occupied or free, facilitating collision avoidance but at higher memory cost.

SLAM systems often employ a frontend-backend architecture to balance real-time performance and accuracy. The frontend handles immediate processing for pose tracking and local map initialization, using fast but approximate methods like direct alignment or feature tracking to maintain operation at high frame rates. In contrast, the backend performs global optimization on accumulated data, refining poses and maps through least-squares minimization to correct local inconsistencies, though it operates less frequently to avoid computational bottlenecks. This split allows the frontend to prioritize speed for navigation, while the backend ensures long-term coherence.

SLAM can operate in online or offline modes, distinguished by their processing paradigms. Online SLAM processes data continuously as it arrives, enabling real-time decision-making in dynamic scenarios, but risks accumulating errors without full context. Offline SLAM, or full SLAM, delays optimization until the entire dataset is collected, allowing comprehensive adjustments like full bundle adjustment, which yields higher accuracy at the expense of immediacy and is well suited to post-mission analysis.

Map representations in SLAM trade off between sparsity and density to suit computational constraints and application needs. Sparse maps store only salient landmarks, reducing storage (e.g., to thousands of points) and enabling efficient probabilistic updates in methods like EKF-SLAM, but they may overlook fine details for tasks requiring full geometry. Dense maps, such as volumetric grids or surfels, capture comprehensive scene structure for applications like dense 3D reconstruction and obstacle avoidance, yet demand significantly more memory and processing (often scaling with volume rather than with features), limiting scalability in large environments. The choice depends on factors like sensor resolution and environment dynamics, with hybrid approaches emerging to combine the benefits of both.
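Since scan matching via ICP is central to the localization step described above, the following is a minimal 2D point-to-point ICP sketch in Python (assuming NumPy and SciPy are available); real systems add outlier rejection, point-to-plane metrics, and convergence checks, so this should be read as an illustrative simplification.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_2d(source, target, iters=20):
    """Minimal point-to-point ICP: aligns 'source' (N,2) to 'target' (M,2)."""
    R, t = np.eye(2), np.zeros(2)
    src = source.copy()
    tree = cKDTree(target)
    for _ in range(iters):
        _, idx = tree.query(src)                 # nearest-neighbor correspondences
        matched = target[idx]
        mu_s, mu_m = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_m)    # cross-covariance of centered points
        U, _, Vt = np.linalg.svd(H)
        if np.linalg.det(Vt.T @ U.T) < 0:        # guard against a reflection solution
            Vt[-1] *= -1
        R_step = Vt.T @ U.T
        t_step = mu_m - R_step @ mu_s
        src = src @ R_step.T + t_step            # apply the incremental transform
        R, t = R_step @ R, R_step @ t + t_step   # accumulate the total transform
    return R, t
```

The closed-form rotation per iteration comes from the SVD of the cross-covariance matrix, the same construction used in most scan-matching frontends.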

Loop Closure Detection

Loop closure detection is a critical component of simultaneous localization and mapping (SLAM) systems, enabling the identification of previously visited locations to mitigate accumulated drift in the estimated trajectory. By recognizing when a robot revisits a known area, loop closure allows for the correction of accumulated errors, ensuring global consistency in the map and pose estimates. This process typically involves candidate detection, verification of potential matches, and integration into the SLAM backend for error correction.

Detection methods for loop closure vary based on the sensor modalities and environmental assumptions. Appearance-based approaches, such as Bag-of-Words (BoW) models, represent scenes as histograms of visual words derived from features like ORB descriptors, facilitating rapid similarity searches between current and past keyframes. A seminal implementation, DBoW2, employs a hierarchical vocabulary tree built from binary features to enable efficient place recognition in large-scale visual SLAM, achieving real-time performance with high recall rates on datasets like New College. Geometric methods rely on scan matching to align current sensor data, such as lidar scans, with stored submaps, using techniques like branch-and-bound optimization to compute relative poses and detect overlaps in 2D environments. Probabilistic approaches, including sampling via Markov Chain Monte Carlo (MCMC), model loop closure as a sampling process over possible alignments, efficiently handling uncertainty in large loops by proposing and evaluating high-likelihood configurations.

Once candidates are identified, verification ensures robustness against outliers. The Random Sample Consensus (RANSAC) algorithm is widely used to estimate transformation parameters between matched frames while rejecting inconsistent correspondences, as implemented in systems like ORB-SLAM, where it refines pose hypotheses from BoW matches. Following verification, corrections are applied through pose graph relaxation, where detected loops add constraint edges to the graph, and nonlinear least-squares optimization propagates adjustments globally to minimize trajectory inconsistencies. Tools like g2o facilitate this relaxation by solving sparse systems efficiently, reducing drift by orders of magnitude in loop-rich trajectories.

To mitigate false positives, which can introduce erroneous constraints and degrade map quality, temporal and spatial consistency checks are employed. These validate candidates by ensuring sequential frame alignments and geometric plausibility within the current trajectory estimate, preventing premature matches in dynamic or repetitive environments. For instance, checks against recent paths discard implausible loops based on motion priors. Computational efficiency is paramount for real-time operation in large-scale settings, achieved through hierarchical indexing structures like those in DBoW2, which prune search spaces via multi-level vocabularies and enable sub-linear query times even for thousands of keyframes.
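A hedged sketch of the appearance-based candidate search described above: quantized feature indices are accumulated into normalized bag-of-words histograms, and past keyframes are scored by cosine similarity. The similarity threshold and the recent-frame exclusion window are illustrative assumptions, not values taken from DBoW2.

```python
import numpy as np

def bow_histogram(word_indices, vocab_size):
    """Build a unit-norm bag-of-words histogram from quantized feature indices."""
    hist = np.bincount(word_indices, minlength=vocab_size).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def loop_candidates(query_hist, keyframe_hists, threshold=0.8, skip_recent=30):
    """Return indices of older keyframes whose BoW similarity exceeds a threshold."""
    scores = keyframe_hists @ query_hist               # cosine similarity (unit vectors)
    return [i for i, s in enumerate(scores[:-skip_recent])
            if s > threshold]                           # ignore temporally adjacent frames
```

Candidates returned this way would still need geometric verification (e.g., RANSAC on feature matches) before a loop-closure edge is added to the pose graph.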

Exploration Strategies

Exploration strategies in simultaneous localization and mapping (SLAM) enable autonomous agents to plan paths that efficiently expand the map while minimizing uncertainty in pose and environmental estimates. These methods address the problem of navigating unknown spaces by balancing coverage of new areas with the need to reduce estimation errors, often leveraging the current map estimate to guide motion. In essence, exploration integrates path planning with SLAM's ongoing estimation process, ensuring that motion commands contribute to both discovery and accuracy.

Frontier-based exploration identifies boundaries, or frontiers, between known and unknown regions in the map, typically using occupancy grid representations to detect these edges as potential targets for expansion. Pioneered in the late 1990s, this approach directs the robot to move toward the closest or most promising frontier point, promoting systematic coverage without redundant traversal of mapped areas. For instance, frontiers are computed by finding grid cells where mapped free space meets unknown regions, allowing real-time selection of navigation goals that maximize information gain per step (a minimal sketch of frontier detection appears after this section). This method is computationally efficient and widely adopted for its simplicity in integrating with grid-based SLAM systems.

Information-theoretic approaches select exploration actions by quantifying and minimizing uncertainty through metrics like entropy or mutual information, prioritizing paths that yield the highest expected reduction in map and pose ambiguity. These strategies model the SLAM posterior as a probability distribution and evaluate candidate trajectories based on their ability to resolve ambiguities in the current estimate. A seminal formulation optimizes trajectories to maximize coverage while reducing localization error, often using approximations like expected information gain for scalability in real-time applications. Such methods are particularly effective in sparse or feature-poor environments, where passive SLAM might accumulate errors rapidly.

Coverage path planning focuses on generating trajectories that systematically traverse the entire known area, such as through lawn-mower patterns or recursive subdivision, to ensure complete observation for mapping. In SLAM contexts, this involves adapting offline coverage algorithms to online map updates, replanning paths as new regions are discovered to avoid gaps in coverage. Early integrations with SLAM demonstrated its utility for ground and aerial platforms, where uniform sweeps improve map completeness and accuracy. This approach excels in structured environments requiring thorough coverage, though it may require modifications to handle dynamic map growth.

Uncertainty-driven exploration selects actions that directly target regions of high pose or map covariance, using SLAM's error estimates to guide the agent toward viewpoints that best constrain the state. By propagating uncertainty through motion models, these methods compute utility functions that penalize paths increasing variance while rewarding those that gather corrective observations. Foundational work in stochastic mapping highlighted how active selection of observation points can bound error growth during exploration. This is crucial for long-duration missions, where unchecked uncertainty can lead to divergence in the SLAM backend.

The integration of exploration with SLAM, known as active SLAM, treats path planning as part of the estimation loop, where exploration goals influence and are influenced by the SLAM frontend and backend. In active SLAM, utility functions combine coverage, uncertainty reduction, and loop closure potential to compute optimal motions, often via receding-horizon optimization. Seminal frameworks demonstrated that actively choosing trajectories can improve map quality compared to reactive methods, enhancing both coverage and localization in unknown environments. This holistic approach ensures that exploration not only builds the map but also maintains its reliability throughout the process.
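The frontier computation referenced above can be illustrated with a small occupancy-grid sketch in Python (NumPy assumed); the cell labels and the 8-neighborhood rule are simplifying assumptions rather than a specific published implementation.

```python
import numpy as np

UNKNOWN, FREE, OCCUPIED = -1, 0, 1

def find_frontiers(grid):
    """Return (row, col) cells that are free and adjacent to at least one unknown cell."""
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            neighborhood = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if np.any(neighborhood == UNKNOWN):
                frontiers.append((r, c))
    return frontiers

# Example: a 5x5 grid with the left half mapped free and the right half unknown.
grid = np.full((5, 5), UNKNOWN)
grid[:, :3] = FREE
print(find_frontiers(grid))   # frontier cells lie along the column bordering the unknown area
```

A planner would then rank these cells, for example by travel cost or expected information gain, and send the robot toward the selected frontier.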

Advanced Algorithms

Handling Dynamic Environments

In dynamic environments, where moving objects such as pedestrians or vehicles can disrupt feature tracking and map consistency, SLAM systems require specialized adaptations to distinguish static landmarks from transient elements. Traditional methods, which assume a predominantly static world, often suffer from drift or failure when dynamic objects are incorrectly incorporated into the map. To address this, techniques focus on detecting and excluding dynamic features while preserving the reliability of the static environment representation.

Dynamic object detection in SLAM typically employs optical flow to identify motion inconsistencies across frames, pixel-wise segmentation using convolutional neural networks (CNNs) like Mask R-CNN or SegNet, or motion estimation from multi-frame depth and parallax analysis. For instance, optical flow computes pixel displacements between consecutive images to flag inconsistent movements, while CNN-based segmentation classifies regions as dynamic (e.g., people or cars) based on trained priors. Multi-frame estimation further refines this by comparing predicted feature positions with observed ones, using epipolar geometry and RANSAC for robust outlier rejection. These methods enable real-time identification of moving elements, with segmentation achieving high precision in cluttered scenes. Sensor fusion, such as integrating RGB-D data, can aid detection by providing depth cues for parallax-based verification.

Static map maintenance involves masking or removing detected dynamic features so that only reliable landmarks contribute to pose or map updates. Detected dynamic regions are excluded from keypoint extraction and matching, preventing erroneous associations that could corrupt the map. Background inpainting reconstructs occluded static areas using historical keyframes, maintaining a consistent representation of the environment. This filtering preserves landmark stability, reducing localization drift in sequences with up to 50% dynamic content.

Semantic SLAM enhances robustness by incorporating object classes into the mapping process, treating entities like cars or people as dynamic by default while building layered maps that separate static infrastructure from movable objects. Semantic labels from networks like SegNet are projected into 3D space and fused into volumetric map structures, where voxels are updated with log-odds probabilities to filter unstable dynamic elements. This approach not only improves map utility for high-level tasks, such as planning around known object types, but also boosts localization accuracy by rejecting semantically inconsistent features. For example, DS-SLAM integrates semantics with motion-consistency checks to create dense, class-aware maps suitable for indoor and outdoor use.

Trajectory prediction for dynamic obstacles often relies on Kalman filtering to forecast short-term motion, allowing SLAM systems to anticipate and avoid incorporating predicted dynamic paths into the static map. An extended or unscented Kalman filter models object states (position, velocity) based on sequential observations, excluding landmarks whose predicted trajectories deviate significantly from the robot's ego-motion. This enables proactive filtering, with motion-consistency thresholds identifying movers for separate tracking, thereby maintaining overall SLAM consistency in crowded settings.

Recent advances leverage deep learning for semantic segmentation in visual SLAM, significantly enhancing performance in urban scenes with heavy traffic or pedestrian activity. For example, integrating lightweight CNNs such as Fast-SCNN with ORB-SLAM variants has reduced average trajectory errors by 90-97% compared to baselines in dynamic urban and forest environments, enabling real-time or near-real-time performance. These methods combine instance segmentation with geometric verification, achieving up to 97% error reduction in highly dynamic sequences on benchmarks like TUM RGB-D.
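As a simple illustration of the masking step described above, the following sketch drops keypoints that fall inside a binary dynamic-object mask such as one produced by a segmentation network; the mask format and the (column, row) keypoint convention are assumptions made for illustration.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, dynamic_mask):
    """Keep only keypoints outside a binary dynamic-object mask (H x W, True = dynamic)."""
    kept = []
    h, w = dynamic_mask.shape
    for u, v in keypoints:                        # keypoints as (col, row) pixel coordinates
        r, c = int(round(v)), int(round(u))
        if 0 <= r < h and 0 <= c < w and not dynamic_mask[r, c]:
            kept.append((u, v))
    return kept

# Example: a mask marking the right half of a 480x640 image as dynamic.
mask = np.zeros((480, 640), dtype=bool)
mask[:, 320:] = True
print(filter_dynamic_keypoints([(100.0, 50.0), (500.0, 50.0)], mask))  # keeps only the left point
```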

Multi-Robot and Collaborative SLAM

Multi-robot simultaneous localization and mapping (SLAM) extends single-robot techniques to coordinate multiple agents in building a shared environmental map while estimating their poses, addressing challenges like limited individual coverage and communication constraints. In multi-robot systems, agents collaboratively localize and map by exchanging data, enabling faster exploration and robustness in large or complex environments.

Centralized architectures involve a central server that collects data and submaps from all robots to perform global pose-graph optimization, ensuring high accuracy and consistency for small teams of 5-10 agents. For instance, systems like LAMP use centralized fusion with tools such as GTSAM for joint optimization, achieving localization errors below 1% in underground settings. In contrast, decentralized architectures allow each robot to maintain and optimize its local map independently, incorporating inter-robot loop closures via data sharing to reduce central bottlenecks. This approach, as in CSIRO's system, yields errors around 22 cm in urban trials but risks misalignment without frequent closures. Distributed variants further minimize communication by enabling local optimizations with partial exchanges, though they remain less mature for real-time applications.

Map merging is essential for integrating local submaps into a global representation, typically involving relative pose estimation between robots and alignment via shared landmarks or geometric transforms. Probability-based methods estimate relative poses using probabilistic models when robots rendezvous, fusing maps by maximizing overlap consistency. Optimization techniques, such as those employing genetic algorithms, align maps by minimizing transformation errors between overlapping regions identified through features like SIFT or line segments. Widely adopted approaches include Hough-transform-based techniques for non-iterative alignment of linear features and mean-shift clustering for merging laser scan line segments, ensuring efficient data association without exhaustive computation.

Communication protocols facilitate submap sharing while managing bandwidth limitations in resource-constrained settings. Gossip-based protocols enable decentralized exchanges of keyframes or features, as in CoSLAM and DDF-SAM, where robots selectively broadcast summaries to propagate updates efficiently across the network. Tree-based protocols, used in hierarchical systems like CCM-SLAM, organize data flow through a structured topology, funneling submaps to aggregation nodes to minimize redundant transmissions. These methods handle bandwidth constraints by prioritizing processed data over raw sensor streams, with gossip protocols improving global map quality by up to 50% in stable network conditions.

Scalability in multi-robot SLAM varies by architecture, with centralized fusion suiting small teams of 2-4 robots by aggregating all data at a single node for precise coordination, though it demands high bandwidth. For larger swarms, hierarchical approaches distribute computation across layers, combining local optimizations with selective fusions as in Kimera-Multi and C2TAM, enabling adaptability in expansive environments. Recent developments include consensus algorithms for robust multi-robot loop closure, such as the Incremental Manifold Edge-based Separable ADMM (iMESA), which uses biased priors in local factor graphs and dual gradient ascent to enforce shared-variable consistency via sparse communications. This enables state estimation in teams operating over extended periods, with applications in search-and-rescue scenarios like disaster zones where communication is intermittent. Systems like LAMP 2.0 leverage such architectures for high-precision localization in underground rescue operations, supporting heterogeneous robot teams.
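As a minimal illustration of the map-merging step described above, the sketch below expresses one robot's SE(2) submap poses in another robot's frame, given an inter-robot relative transform such as one obtained from a rendezvous or a shared landmark; the function names and the example transform are illustrative assumptions.

```python
import numpy as np

def se2_matrix(x, y, theta):
    """Homogeneous transform for a 2D pose."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def merge_submap(poses_b, T_a_b):
    """Express robot B's poses in robot A's frame, given the inter-robot transform T_a_b."""
    merged = []
    for x, y, theta in poses_b:
        T = T_a_b @ se2_matrix(x, y, theta)              # compose transforms
        merged.append((T[0, 2], T[1, 2], np.arctan2(T[1, 0], T[0, 0])))
    return merged

# Example: robot B's frame is 5 m ahead of robot A and rotated by 90 degrees.
T_a_b = se2_matrix(5.0, 0.0, np.pi / 2)
print(merge_submap([(1.0, 0.0, 0.0)], T_a_b))
```

In a full system the same relative transform would enter the joint pose graph as an inter-robot constraint rather than being applied once, so that later loop closures can still correct it.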

Biological and Bio-Inspired Approaches

Biological and bio-inspired approaches to simultaneous localization and mapping (SLAM) draw from neural mechanisms observed in animals, aiming to replicate efficient, low-resource navigation in robotic systems. These methods emphasize navigation strategies and neural models that enable robust mapping and localization in uncertain environments, often prioritizing energy efficiency and adaptability over exhaustive computation.

Seminal work in this area includes the RatSLAM system, which models the rodent hippocampus to perform topological mapping, using place cells for recognizing locations and grid cells for estimating spatial layout. Place cells fire in response to specific positions, forming a cognitive map that supports loop closure by matching current sensory input to stored experiences, while grid cells provide a periodic representation of space to correct drift over large areas. This bio-mimetic framework has demonstrated persistent navigation in complex, large-scale environments, such as suburban areas, using only visual input from a single camera.

Insect navigation strategies offer additional inspiration, particularly through path integration and odometry derived from optic flow. Ants, for instance, maintain a home vector by integrating self-motion cues, such as stride counts and body orientation, alongside optic flow from environmental texture to estimate distance traveled and heading during excursions. This mechanism allows homing without continuous landmarks, addressing challenges in feature-sparse terrains. Robotic implementations, like the AntBot platform, replicate these processes using minimalist sensors: a low-resolution optic flow detector measures ground motion, while a celestial compass tracks skylight polarization for directional stability. In outdoor trials over distances up to 20 meters, AntBot achieved sub-meter accuracy (mean homing error of 0.67% of trajectory length) in pose estimation, highlighting the efficacy of insect-inspired path integration for lightweight navigation in GPS-denied settings.

Bio-inspired algorithms extend these principles through neuromorphic computing, which emulates spiking neural processing to handle asynchronous events, mimicking retinal cells for efficient sensing. Event-based SLAM leverages dynamic vision sensors that output pixel-level changes only upon brightness shifts, drastically reducing data volume and power draw compared to frame-based cameras. Spiking neural networks, implemented on neuromorphic hardware, enable real-time topological mapping in cluttered spaces by encoding spatial novelty via burst activity, with reported energy savings of up to 90% on edge devices. These systems excel in high-speed or low-light scenarios, where traditional methods falter due to motion blur or high latency.

Hybrid models integrate SLAM with paradigms that mimic animal foraging, using active inference to optimize exploration and map refinement. In this approach, agents treat mapping as a decision-making problem, where policies are learned to minimize expected free energy, balancing uncertainty reduction with goal-directed actions akin to resource-seeking behaviors in mammals. For example, generalized SLAM (G-SLAM) frameworks employ hierarchical inference to fuse sensory data and predict trajectories, enabling adaptive exploration during simulated foraging tasks. Such hybrids have shown improved sample efficiency in dynamic environments, converging on accurate maps with 20-30% fewer iterations than baseline probabilistic methods.

Despite these advances, bio-inspired neural models face scalability limitations relative to traditional methods. Hippocampal-inspired systems like RatSLAM struggle with scaling pose cell representations for expansive or highly detailed maps, leading to increased computational overhead beyond 1 km² areas. Neuromorphic approaches, while power-efficient, often exhibit drift accumulation in long-term operation due to sparse event generation in static scenes, requiring fusion with classical filters for robustness. Overall, these models prioritize biological plausibility and efficiency in constrained settings but lag in precision and real-time performance for large-scale, metric-accurate mapping compared to optimization-based techniques.
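The ant-inspired path integration described above can be sketched in a few lines: self-motion cues are accumulated into a displacement, and the negated displacement is the home vector. This is a toy illustration under idealized, noise-free assumptions, not a model of any specific platform.

```python
import numpy as np

def home_vector(strides):
    """Path integration: accumulate displacement from (step_length, heading) pairs
    and return the vector pointing back to the start."""
    position = np.zeros(2)
    for length, heading in strides:
        position += length * np.array([np.cos(heading), np.sin(heading)])
    return -position   # vector from the current position back to the origin (nest)

# Example: two legs of an outbound excursion (2 m east, then 1 m north).
print(home_vector([(2.0, 0.0), (1.0, np.pi / 2)]))
```

Real insects and robots must additionally cope with noisy stride and heading estimates, which is why celestial-compass corrections and landmark cues are combined with the integrator.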

Specialized SLAM Variants

Visual and Visual-Inertial SLAM

Visual simultaneous localization and mapping (SLAM) leverages camera sensors to estimate the trajectory of a moving agent and construct a map of the environment, primarily through image-based processing. In monocular setups, a single camera provides relative pose estimates but suffers from scale ambiguity, as the absolute distance to features cannot be determined without additional constraints. This ambiguity is resolved in stereo configurations by exploiting the known baseline between two cameras, which provides depth information via triangulation. Alternatively, in monocular systems, scale can be recovered through structure-from-motion techniques that accumulate observations over multiple views or by fusing with inertial measurements from an IMU.

Feature-based visual SLAM methods extract and track discrete keypoints from images, representing the scene as a sparse set of landmarks. A seminal example is ORB-SLAM, which operates in three parallel threads: tracking for pose estimation by matching ORB features between consecutive frames, local mapping to insert keyframes and perform local bundle adjustment for consistency, and loop closing to detect revisits and apply corrections via pose graph techniques. This pipeline enables robust real-time performance on monocular, stereo, and RGB-D cameras, achieving high accuracy in diverse environments. In contrast, direct methods minimize photometric errors directly on pixel intensities, avoiding explicit feature extraction for denser reconstructions. Direct Sparse Odometry (DSO) exemplifies this approach, jointly optimizing camera poses and sparse inverse depth maps by minimizing photometric residuals across a window of keyframes, while adaptively selecting active points to balance computation and accuracy. DSO produces semi-dense maps and demonstrates superior performance in texture-rich scenes compared to feature-based alternatives.

Visual-inertial SLAM integrates IMU data with visual observations to enhance robustness, particularly in low-texture or fast-motion scenarios. Tightly coupled fusion optimizes camera poses, landmark positions, and IMU biases in a single nonlinear least-squares problem, where IMU measurements are preintegrated between keyframes to form efficient relative motion constraints on the manifold of the special Euclidean group SE(3). Preintegration computes position, velocity, and rotation deltas while accounting for bias corrections via first-order approximations, reducing computational overhead in optimization. Systems like VINS-Mono implement this by initializing the IMU-camera extrinsics and scale, then performing marginalization of old keyframes to maintain a fixed-size sliding window for real-time operation.

Recent advances in visual and visual-inertial SLAM incorporate deep learning for semantic understanding, using neural networks to extract robust features and segment dynamic objects. Semantic visual SLAM employs deep features from models like SuperPoint for invariant keypoint detection, improving tracking in challenging conditions, while integrating object-level semantics via segmentation networks to filter out moving elements and enhance loop closure in dynamic scenes.
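A minimal monocular visual-odometry step in the spirit of the feature-based pipelines above can be sketched with OpenCV, assuming matched keypoints between two frames are already available; note that the recovered translation is defined only up to scale, which is exactly the ambiguity that stereo baselines or IMU fusion resolve.

```python
import cv2
import numpy as np

def relative_pose(pts_prev, pts_curr, K):
    """Estimate up-to-scale relative camera motion from matched keypoints.
    pts_prev / pts_curr: Nx2 float arrays of pixel coordinates, K: 3x3 intrinsics."""
    E, inliers = cv2.findEssentialMat(pts_prev, pts_curr, K,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, K, mask=inliers)
    return R, t    # rotation and unit-norm translation direction

# Usage sketch (assumes feature detection and matching were done upstream):
# R, t = relative_pose(matched_prev, matched_curr, K)
```

In a full pipeline this two-view estimate only initializes the tracking thread; keyframe-based bundle adjustment and loop closing then refine the trajectory and map.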

LiDAR and Radar SLAM

LiDAR-based SLAM leverages laser range finders to generate dense point clouds, enabling precise geometric mapping and localization in structured environments such as urban areas and indoor spaces. These systems process sequential scans to estimate poses and build map representations, often outperforming visual methods in low-texture or varying lighting conditions due to direct distance measurements. A core component of LiDAR SLAM is scan registration, where the iterative closest point (ICP) algorithm aligns consecutive point clouds by iteratively minimizing the distance between corresponding points. Introduced as a general method for shape registration, ICP operates in a point-to-point or point-to-plane mode, converging to a local minimum of the mean-square error metric, and has been widely adopted in LiDAR applications for its computational efficiency in estimating rigid transformations. For probabilistic matching, the Normal Distributions Transform (NDT) models point clouds as piecewise Gaussian distributions, dividing space into voxels and estimating parameters like mean and covariance for each, which allows robust alignment even with noise or partial overlaps in 2D and 3D settings. Extended to 3D, NDT facilitates scan-to-map matching by scoring transformations based on the negative log-likelihood, reducing sensitivity to outliers compared to ICP.

In autonomous vehicles, fusion of multi-layer 3D LiDAR with inertial measurement unit (IMU) data enhances robustness, particularly for high-speed motion where LiDAR scans suffer from distortion. Algorithms like LeGO-LOAM employ a two-stage approach: an odometry module segments point clouds into features (edges and planes) while using IMU data to undistort scans, followed by a mapping and optimization module that fuses these with IMU preintegration for pose estimation, achieving real-time performance with reduced drift in variable terrain. This tightly coupled fusion propagates the IMU's high-frequency attitude estimates to correct LiDAR's sparse temporal sampling, improving accuracy in dynamic scenarios like off-road driving.

Radar SLAM utilizes millimeter-wave sensors for all-weather operation, providing sparse detections influenced by beam patterns that determine angular resolution and range ambiguity. These patterns, typically conical with widths of 10-30 degrees, result in clustered point clouds with fewer than 100 points per scan, necessitating specialized handling to mitigate ambiguities from multipath reflections. Doppler-enabled velocity estimation exploits frequency shifts in radar returns to directly measure radial ego-motion, aiding initialization and reducing reliance on geometric features alone. Frameworks like Doppler-SLAM integrate this with inertial data, filtering dynamic clutter via velocity thresholds and aligning scans using intensity or range-bearing models, achieving sub-meter accuracy in adverse visibility.

Loop closure in point cloud-based SLAM detects revisits to correct accumulated drift, often employing global descriptors for efficient retrieval. Scan Context represents scans as 2D histograms of elevation-distance profiles, invariant to rotations and scalable for large vocabularies, enabling rapid matching via coarse-to-fine search with recall rates exceeding 90% in urban datasets. This descriptor captures vertical structures such as building facades, outperforming bag-of-words methods under viewpoint changes and integrating into pose-graph optimization for global consistency.

Post-2023 advancements in LiDAR-vision hybrids fuse dense geometric point clouds with semantic visual cues to mitigate long-term drift, enhancing robustness in feature-poor scenes. Systems like GSFusion employ 3D Gaussian splatting for joint optimization, where LiDAR provides scale-accurate geometry and vision adds loop closure semantics through surfel-based rendering. These approaches leverage LiDAR's precision for initialization while using visual semantics to resolve ambiguities, as demonstrated in urban driving benchmarks.
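The following sketch builds a simplified Scan Context-style descriptor (maximum point height per polar bin) and compares two descriptors with a shift-invariant column-wise distance; the bin counts, range limit, and brute-force shift search are illustrative simplifications of the published method rather than a faithful reimplementation.

```python
import numpy as np

def scan_context(points, num_rings=20, num_sectors=60, max_range=80.0):
    """Simplified Scan Context: a ring x sector grid of maximum heights (points: Nx3)."""
    desc = np.zeros((num_rings, num_sectors))
    ranges = np.hypot(points[:, 0], points[:, 1])
    angles = np.arctan2(points[:, 1], points[:, 0])
    for p, r, a in zip(points, ranges, angles):
        if r >= max_range:
            continue
        ring = int(r / max_range * num_rings)
        sector = int((a + np.pi) / (2 * np.pi) * num_sectors) % num_sectors
        desc[ring, sector] = max(desc[ring, sector], p[2])
    return desc

def context_distance(d1, d2):
    """Column-wise cosine distance, minimized over sector shifts to handle yaw changes."""
    best = np.inf
    for shift in range(d2.shape[1]):
        shifted = np.roll(d2, shift, axis=1)
        num = np.sum(d1 * shifted, axis=0)
        den = np.linalg.norm(d1, axis=0) * np.linalg.norm(shifted, axis=0) + 1e-9
        best = min(best, 1.0 - np.mean(num / den))
    return best
```

Candidate revisits found this way would then be verified geometrically (e.g., by ICP) before adding a loop-closure constraint to the pose graph.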

Acoustic and Audiovisual SLAM

Acoustic SLAM leverages sonar sensors, such as multibeam echosounders and side-scan sonar, to enable simultaneous localization and mapping in underwater environments where optical sensing and GPS are unavailable. These systems measure time-of-flight distances to construct acoustic range profiles or images, facilitating vehicle pose estimation and environmental mapping through scan matching techniques like iterative closest point (ICP) variants. Seminal work by Ribas et al. demonstrated early feasibility of acoustic SLAM in structured underwater settings using forward-looking sonar for feature-based mapping. More recent advancements, such as semi-direct sonar SLAM methods, adapt visual SLAM paradigms to acoustic data by minimizing photometric errors on sonar images, with alignment initialized via feature matching, achieving robust performance in real-time AUV operations.

Audiovisual SLAM integrates acoustic data from microphone arrays with visual inputs from cameras to create enriched maps that include both geometric structures and audio source positions, particularly useful in low-visibility or reverberant scenarios. Microphone arrays localize sound sources via direction-of-arrival (DOA) estimation using methods like relative transfer functions and Gaussian mixture models, which are then fused with camera-derived point clouds to track dynamic elements such as speakers. For instance, vision-audio fusion projects audio DOA estimates onto RGB-D images to detect and exclude moving sound-emitting obstacles, enhancing map consistency in cluttered indoor or underwater settings. This approach supports speaker tracking by associating audio cues with visual detections, enabling persistent 3D audio maps for applications like human-robot interaction.

Key challenges in acoustic and audiovisual SLAM include multipath propagation in acoustics, which causes signal reverberation and false echoes, and the inherently low resolution of sonar compared to high-fidelity visuals, leading to sparse and noisy maps. In underwater contexts, acoustic noise from currents and sound-speed gradients further degrades bearing accuracy, while audiovisual fusion must handle asynchronous data and varying lighting conditions. These issues are mitigated through probabilistic models that account for multipath via direct-path filtering, though they increase the computational demands of real-time processing.

Algorithms for acoustic-visual fusion often employ factor graphs to jointly optimize vehicle trajectories, landmarks, and sensor poses by incorporating factors for acoustic ranges, visual features, and audio DOA measurements. In speaker tracking scenarios, audiovisual systems use particle filters or graph-based optimization to maintain multi-hypothesis tracks, fusing DOA outputs with camera detections for robust localization. For example, pose-graph frameworks integrate acoustic and optical data in a unified optimization, leveraging complementary strengths to reduce drift in AUV navigation. Niche applications include autonomous underwater vehicle (AUV) navigation for bathymetric surveying and harbor inspection, where acoustic SLAM provides drift-corrected trajectories over extended missions. Recent advances in opti-acoustic semantic SLAM, such as a 2024 method for mapping unknown objects in underwater environments, enable object-level mapping without prior labeling, using graph-based scene representations.
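A toy direction-of-arrival estimate of the kind used in the audio pipelines above can be computed from the time difference of arrival across a two-element array; this assumes a far-field source and a single direct path, and the 1500 m/s sound speed is a typical underwater value rather than a universal constant.

```python
import numpy as np

def doa_from_tdoa(delay_s, spacing_m, speed_of_sound=1500.0):
    """Bearing (radians) to a far-field source from the time-difference-of-arrival
    between two microphones or hydrophones separated by spacing_m."""
    sin_theta = np.clip(speed_of_sound * delay_s / spacing_m, -1.0, 1.0)
    return np.arcsin(sin_theta)

# Example: a 0.1 ms arrival difference across a 0.5 m hydrophone baseline (~17.5 degrees).
print(np.degrees(doa_from_tdoa(1e-4, 0.5)))
```

Practical systems use many elements, generalized cross-correlation or relative transfer functions, and reverberation-aware filtering, but the geometric relationship is the same.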

Implementation Frameworks

Filter-Based Methods

Filter-based methods in simultaneous localization and mapping (SLAM) employ probabilistic recursive estimation techniques to maintain an estimate of the robot's pose and the map in real time, leveraging Bayesian filtering to handle uncertainty from noisy measurements and motion models. These approaches process measurements sequentially, updating the posterior over the joint state of pose and map at each timestep, which enables efficient computation suitable for online operation. The core idea is to represent the state estimate via a Gaussian or particle-based approximation, propagating it through prediction and correction steps derived from motion and observation models.

The extended Kalman filter (EKF) SLAM represents a foundational filter-based approach, in which the state vector is augmented to include both the robot's pose \mathbf{x}_r and the map features \mathbf{m}, forming \mathbf{x} = [\mathbf{x}_r^T, \mathbf{m}^T]^T. During the prediction step, the state mean and covariance are propagated using the nonlinear motion model, approximated linearly via the Jacobian \mathbf{F} of the motion function with respect to the pose; this yields the predicted state \hat{\mathbf{x}}_{k|k-1} = f(\hat{\mathbf{x}}_{k-1|k-1}, \mathbf{u}_k) and covariance \mathbf{P}_{k|k-1} = \mathbf{F} \mathbf{P}_{k-1|k-1} \mathbf{F}^T + \mathbf{Q}, where \mathbf{u}_k is the control input and \mathbf{Q} is the process noise covariance. The update step incorporates observations \mathbf{z}_k of landmarks, using the measurement Jacobian \mathbf{H} to compute the innovation covariance \mathbf{S} = \mathbf{H} \mathbf{P}_{k|k-1} \mathbf{H}^T + \mathbf{R}, Kalman gain \mathbf{K} = \mathbf{P}_{k|k-1} \mathbf{H}^T \mathbf{S}^{-1}, and corrected state \hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K} (\mathbf{z}_k - h(\hat{\mathbf{x}}_{k|k-1})), with \mathbf{R} as the measurement noise covariance and h the observation model. This formulation allows incremental map building but grows quadratically in complexity with the number of landmarks due to the size of the covariance matrix.

To address the linearization inaccuracies of the EKF in highly nonlinear settings, the unscented Kalman filter (UKF) SLAM uses sigma-point sampling to propagate the mean and covariance through the true nonlinear functions without explicit Jacobians. In the UKF, a set of deterministically chosen sigma points, typically 2n+1 for an n-dimensional state, are sampled from the current Gaussian approximation, transformed via the motion and observation models, and then used to compute weighted statistics for the predicted and updated distributions; this captures higher-order moments more accurately than Taylor-series linearization. For instance, the sigma points \mathcal{X}_i are generated as \mathcal{X}_0 = \hat{\mathbf{x}} and \mathcal{X}_i = \hat{\mathbf{x}} \pm (\sqrt{(n+\lambda) \mathbf{P}})_i for i=1,\dots,n, with \lambda a scaling parameter, enabling robust handling of non-Gaussian effects in pose and landmark updates. UKF-SLAM maintains computational scaling similar to the EKF but improves consistency in challenging environments such as those with wide-angle sensors.

Particle filter-based methods, exemplified by FastSLAM, extend recursive estimation to non-parametric representations through Rao-Blackwellization, factorizing the posterior p(\mathbf{x}_{1:t}, \mathbf{m} \mid \mathbf{z}_{1:t}, \mathbf{u}_{1:t}) into pose trajectory samples and conditional map estimates. In FastSLAM, a set of M particles represents the pose history \{\mathbf{x}_{1:t}^{(m)}\}_{m=1}^M, each augmented with independent EKFs maintaining the map \mathbf{m}^{(m)} conditioned on \mathbf{x}_{1:t}^{(m)}; motion updates sample new poses using a proposal distribution (often the motion model perturbed by noise), while observations update the individual landmark EKFs and resample particles based on their likelihoods to avoid degeneracy. This scales linearly with map size per particle, making it suitable for large environments, and the particle cloud approximates multimodal posteriors effectively.

Despite their efficiencies, filter-based methods face key limitations: EKF-SLAM suffers from linearization errors that accumulate, leading to inconsistent estimates in which the covariance underestimates the true uncertainty, particularly around loops or with sparse observations. Similarly, particle filters like FastSLAM are prone to particle depletion, where resampling concentrates weight on a few particles, reducing diversity and causing premature convergence to suboptimal modes, especially under significant ambiguity. To mitigate these issues, hybrid variants employ the EKF or UKF for small-scale, real-time operation in local maps, transitioning to batch optimization methods for global consistency as the map grows, for example by extracting submaps and refining them offline. These hybrids balance the sequential speed of filters with the accuracy of global adjustments, improving scalability in practical deployments.
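A compact sketch of the EKF-SLAM correction step described above, for a single range-bearing observation of a known landmark (NumPy assumed); data association is taken as given, and numerical safeguards such as the Joseph-form covariance update used in practice are omitted.

```python
import numpy as np

def ekf_update(mu, P, z, landmark_idx, R):
    """EKF-SLAM correction for one range-bearing observation of landmark `landmark_idx`.
    State mu = [x, y, theta, m1x, m1y, m2x, m2y, ...], covariance P."""
    x, y, theta = mu[:3]
    j = 3 + 2 * landmark_idx
    dx, dy = mu[j] - x, mu[j + 1] - y
    q = dx**2 + dy**2
    z_hat = np.array([np.sqrt(q), np.arctan2(dy, dx) - theta])    # expected measurement
    # Measurement Jacobian H (only the pose block and this landmark's block are nonzero).
    H = np.zeros((2, len(mu)))
    H[:, :3] = np.array([[-dx / np.sqrt(q), -dy / np.sqrt(q), 0.0],
                         [ dy / q,          -dx / q,         -1.0]])
    H[:, j:j + 2] = np.array([[ dx / np.sqrt(q), dy / np.sqrt(q)],
                              [-dy / q,          dx / q]])
    S = H @ P @ H.T + R                            # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
    innov = z - z_hat
    innov[1] = (innov[1] + np.pi) % (2 * np.pi) - np.pi   # wrap bearing residual
    mu_new = mu + K @ innov
    P_new = (np.eye(len(mu)) - K @ H) @ P
    return mu_new, P_new
```

Because the gain couples the pose block with every landmark through P, each update touches the full covariance, which is the source of the quadratic growth noted above.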

Optimization-Based Methods

Optimization-based methods in simultaneous localization and mapping (SLAM) formulate the problem as a nonlinear least-squares optimization over a graph structure, enabling global consistency by minimizing errors across all measurements and constraints. These approaches represent poses and landmarks as nodes in a pose graph, with edges encoding relative constraints derived from odometry, observations, or loop closures, contrasting with filter-based methods that provide local estimates for online operation. The optimization seeks the configuration that best explains the data, typically solved iteratively using sparse techniques to handle large-scale problems efficiently.

GraphSLAM, a foundational formulation, models the problem as a pose graph where nodes correspond to robot poses and edges to constraints, such as relative transformations from sensor measurements. The objective is to minimize the sum of squared residuals weighted by their uncertainties, formulated as: \arg\min_{\mathbf{x}} \sum_i \| \mathbf{e}_i(\mathbf{x}) \|^2_{\Sigma_i}, where \mathbf{x} is the vector of pose variables, \mathbf{e}_i are the error terms for each constraint, and \Sigma_i are the covariance matrices capturing their uncertainties. This least-squares formulation allows batch or incremental solving, improving accuracy in environments with accumulated odometry errors. Early implementations demonstrated reduced drift compared to extended Kalman filters, with pose graph optimization achieving sub-meter accuracy in large indoor datasets.

Bundle adjustment extends this framework specifically for visual SLAM by jointly optimizing camera poses and 3D landmarks to minimize reprojection errors of observed image features. In this process, landmark positions and pose estimates are refined together, incorporating geometric constraints from multiple views to reconstruct sparse maps. This method is central to systems like ORB-SLAM, where it corrects for both pose and structure inconsistencies, yielding precise 3D maps with errors below 1% of the environment scale in benchmark sequences. Unlike pose-graph-only optimization, bundle adjustment explicitly handles landmark covariances, enhancing robustness in feature-rich scenes.

Sparse solvers are essential for scalability in these optimizations, with the Levenberg-Marquardt algorithm providing a robust iterative scheme that blends gradient descent and Gauss-Newton steps to navigate nonlinearities. Libraries like g2o implement this for graph-based problems, supporting incremental solving through sparse Cholesky factorization and variable reordering to exploit graph sparsity, reducing computation from O(n^3) to near-linear in practice for pose graphs with thousands of nodes. g2o has been widely adopted in visual and LiDAR SLAM, enabling real-time optimization on consumer hardware with convergence in under 100 ms per iteration for medium-scale maps.

Optimization-based SLAM distinguishes between full batch optimization, which re-optimizes the entire history for global consistency, and incremental approaches that update only affected variables for online efficiency. The iSAM algorithm exemplifies incremental smoothing by maintaining a square-root information matrix and using Bayes tree factorization to perform targeted relinearization and elimination upon new measurements or loop closures, achieving up to 100-fold speedups over full batch methods while preserving near-optimal estimates. This enables continuous operation in dynamic applications, with demonstrated trajectory errors under 0.5 meters in urban driving scenarios spanning kilometers.

Recent enhancements incorporate semantic constraints into pose graphs to handle dynamic scenes, where object detections provide additional edges excluding moving elements from the optimization, improving robustness in cluttered environments with up to 30% fewer false positives in feature tracking. Post-2023 developments have focused on scalability, such as task-aware dense mapping using complex-step finite differences for differentiable optimization, allowing gradient-based refinement of large maps in real time at 30 Hz on GPUs, and frameworks that ensure consistent reconstructions across wide baselines with reduced drift in handheld systems. These advances support deployment in multi-robot and long-term mapping tasks, scaling to millions of nodes without proportional increases in compute.
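A small pose-graph optimization in the GraphSLAM style can be sketched with SciPy's nonlinear least-squares solver: four SE(2) poses around a square, odometry edges, one loop-closure edge, and a prior anchoring the first pose. Information weighting, robust kernels, and dedicated sparse solvers such as g2o are omitted for brevity, and all values are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def edge_residual(xi, xj, meas):
    """Residual of one relative-pose constraint between poses xi, xj = (x, y, theta)."""
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    c, s = np.cos(xi[2]), np.sin(xi[2])
    pred = np.array([c * dx + s * dy, -s * dx + c * dy, xj[2] - xi[2]])  # xj in xi's frame
    r = pred - meas
    r[2] = (r[2] + np.pi) % (2 * np.pi) - np.pi          # wrap the angular residual
    return r

def residuals(flat, edges, n):
    poses = flat.reshape(n, 3)
    res = [poses[0]]                                      # prior anchoring the first pose at 0
    for i, j, meas in edges:
        res.append(edge_residual(poses[i], poses[j], np.asarray(meas, dtype=float)))
    return np.concatenate(res)

# Odometry edges around a square plus a loop-closure edge from pose 3 back to pose 0.
edges = [(0, 1, (1, 0, np.pi / 2)), (1, 2, (1, 0, np.pi / 2)),
         (2, 3, (1, 0, np.pi / 2)), (3, 0, (1, 0, np.pi / 2))]
np.random.seed(0)
x0 = 0.1 * np.random.randn(4 * 3)                         # noisy initial guess
sol = least_squares(residuals, x0, args=(edges, 4))
print(sol.x.reshape(4, 3).round(2))                       # poses settle onto a consistent square
```

Production systems express the same objective with information-matrix weighting on each edge and exploit the sparsity of the resulting normal equations, which is what libraries like g2o and GTSAM provide.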

Learning-Based Methods

Learning-based methods in simultaneous localization and mapping (SLAM) integrate machine learning and deep neural network techniques to enhance perception, adaptation, and robustness, particularly in complex or data-scarce environments. These approaches leverage neural networks to learn representations directly from raw sensor data, moving beyond traditional handcrafted features and geometric models. By training on large datasets, they enable SLAM systems to handle variations in lighting, occlusions, and scene dynamics more effectively, often achieving superior performance in real-world scenarios where classical methods falter.

A key advancement is deep feature extraction using convolutional neural networks (CNNs), which replace manually designed descriptors with learned ones for improved invariance and matching accuracy in visual SLAM. For instance, SuperPoint employs a self-supervised framework to simultaneously detect interest points and compute dense descriptors from images, outperforming traditional methods like SIFT in repeatability and matching on challenging datasets such as KITTI. This integration allows SLAM pipelines to extract more robust features for pose estimation and loop closure, reducing drift in long-term trajectories. Similarly, end-to-end learning frameworks directly predict camera poses from sequential raw images without intermediate feature-extraction steps. DeepVO, a recurrent convolutional neural network model, estimates monocular visual odometry by processing image stacks through convolutional and LSTM layers, demonstrating lower absolute trajectory errors compared to geometric VO on the KITTI benchmark.

Semantic and hybrid SLAM further incorporates learning for higher-level understanding, such as using deep reinforcement learning (DRL) to optimize exploration paths in unknown environments and unsupervised networks for loop closure detection. DRL agents, trained via trial-and-error interactions, select viewpoints that maximize information gain for mapping, as surveyed in applications where they improve coverage by adapting to environmental uncertainties. Unsupervised deep networks, such as those based on autoencoders, detect loop closures by learning compact image representations for efficient retrieval, enabling correction of accumulated errors in large-scale SLAM without labeled data.

Recent developments from 2023 to 2025 have advanced dense mapping with neural radiance fields (NeRF), which implicitly represent scenes for photorealistic reconstruction and robust tracking in dynamic settings; for example, NeRF-SLAM variants achieve up to 25% lower absolute trajectory error in dynamic sequences by filtering outliers through radiance priors. Diffusion models have also emerged for uncertainty estimation, generating probabilistic pose distributions to quantify mapping uncertainty and enhance robustness in ambiguous scenarios. These innovations collectively boost robustness in dynamic environments by 25-40% in trajectory accuracy metrics on benchmarks like TUM RGB-D.

Despite these gains, learning-based SLAM faces challenges in generalization across unseen environments and high computational overhead. Models trained on specific datasets often underperform under domain shifts in unfamiliar conditions, requiring techniques such as domain adaptation or online adaptation to broaden applicability. Additionally, the inference demands of deep networks, especially for real-time NeRF rendering, can exceed resource limits on embedded hardware, prompting ongoing research into efficient architectures like lightweight CNNs and quantized models.
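As an illustration of the end-to-end regression idea behind DeepVO, the following is a toy recurrent-convolutional pose regressor written with PyTorch. The layer sizes, the 6-channel stacked-frame input, and the 6-DoF output head are assumptions chosen for brevity; the model is untrained and is not the published DeepVO network.

```python
import torch
import torch.nn as nn

class TinyDeepVO(nn.Module):
    """Toy DeepVO-style model: CNN encoder per frame pair, LSTM over time, pose head."""
    def __init__(self, hidden=256):
        super().__init__()
        # The encoder consumes two stacked RGB frames (6 channels), as in DeepVO-style VO.
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.pose_head = nn.Linear(hidden, 6)  # 3 translation + 3 rotation components

    def forward(self, frame_pairs):
        # frame_pairs: (batch, time, 6, H, W) stacked consecutive frames
        b, t = frame_pairs.shape[:2]
        feats = self.encoder(frame_pairs.flatten(0, 1)).view(b, t, 64)
        seq, _ = self.lstm(feats)              # temporal model accumulates motion context
        return self.pose_head(seq)             # (batch, time, 6) relative pose estimates

# Dummy forward pass on random data to show the shapes; this is not a trained model.
model = TinyDeepVO()
poses = model(torch.randn(2, 5, 6, 64, 64))
print(poses.shape)  # torch.Size([2, 5, 6])
```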

Historical Development

Early Foundations

The foundations of simultaneous localization and mapping (SLAM) emerged in the 1980s through pioneering work on probabilistic representations of spatial uncertainty in robotics. Researchers at SRI International, including Randall Smith, Matthew Self, and Peter Cheeseman, introduced the concept of the stochastic map, a framework for estimating uncertain spatial relationships between landmarks using Bayesian inference to propagate uncertainties in feature positions and robot poses. This approach laid the groundwork for handling the dual challenges of localization and mapping by modeling the environment as a network of probabilistic relations rather than deterministic coordinates. Concurrently, Hugh Durrant-Whyte developed consistent estimation techniques for integrating noisy sensor data into spatial models, emphasizing the importance of maintaining correlations in multi-sensor fusion for accurate map building in mobile systems. These efforts, often applied in early cartographic and navigation prototypes, established SLAM as a probabilistic problem solvable through statistical methods. The term "SLAM" itself was formally coined in a 1995 survey paper by Durrant-Whyte and colleagues.

In the 1990s, the integration of extended Kalman filters (EKF) marked a significant advancement, enabling implementations on mobile robots. Larry Matthies at NASA's Jet Propulsion Laboratory advanced stereo vision techniques for autonomous navigation, demonstrating how stereo disparity maps could provide dense environmental models to support localization and obstacle avoidance in unstructured terrain, as seen in planetary rover prototypes. Early EKF applications, such as those by Michel Moutarlier and Raja Chatila in 1989, incorporated evidence-based updates for feature-based mapping, while John Leonard and Hugh Durrant-Whyte's 1991 work on directed sensing formalized EKF-based SLAM for underwater and indoor robot navigation, treating maps as augmented state vectors to recursively estimate poses and landmarks. These developments shifted SLAM from offline estimation to online processing, though they were limited by the computational demands of maintaining full covariance matrices.

A pivotal formalization occurred in 2004 with Thrun and colleagues' introduction of sparse extended information filters (SEIFs) for EKF-style SLAM, which addressed the cost of maintaining inter-landmark correlations in traditional formulations by exploiting the sparse structure of information matrices to achieve near constant-time updates. This line of work showed that SLAM solutions converge to consistent estimates in the limit of infinite data, mitigating error accumulation from overlooked dependencies and enabling larger-scale maps without prohibitive memory use. Initial challenges persisted, particularly computational limits on 1990s hardware, which restricted implementations to sparse feature sets (typically tens of landmarks) and sequential processing to avoid inverting high-dimensional covariances. The seminal textbook Probabilistic Robotics by Thrun, Burgard, and Fox, published in 2005, synthesized these foundations into a comprehensive framework, detailing EKF-based algorithms and their probabilistic underpinnings as essential tools for robotic perception.

Modern Milestones

The 2010s marked a pivotal era for SLAM, with the emergence of real-time, open-source systems that enhanced scalability and accessibility. ORB-SLAM, introduced in 2015, represented a breakthrough in feature-based monocular visual SLAM, enabling robust tracking and mapping in diverse environments through oriented FAST and rotated BRIEF (ORB) features, loop closure, and relocalization, achieving sub-centimeter accuracy in indoor applications. Similarly, Google's Cartographer, released in 2016, advanced LiDAR-based SLAM with a graph-optimized approach incorporating loop closure via branch-and-bound scan matching, facilitating high-precision 2D and 3D mapping for indoor and outdoor environments.

The rise of multi-sensor fusion further propelled SLAM's robustness in the late 2010s. VINS-Mono, developed in 2017, integrated visual and inertial measurements in a tightly coupled optimization framework, delivering drift-free pose estimation and scale recovery suitable for aerial and ground robots, with demonstrated accuracy improvements over visual-only methods in challenging motion scenarios.

Entering the 2020s, deep learning integrations transformed SLAM by improving feature extraction and generalization. DROID-SLAM, proposed in 2021, leveraged recurrent neural networks for dense monocular, stereo, and RGB-D mapping, outperforming classical methods on benchmarks like TUM RGB-D by achieving lower absolute trajectory error through learned flow and depth prediction. Concurrently, frameworks like OpenVSLAM, introduced in 2019 and extended into the 2020s, supported semantic enhancements, incorporating object-level understanding for more interpretable maps in dynamic scenes.

From 2023 to 2025, SLAM evolved toward hybrid paradigms and practical deployment. LiDAR-vision fusion emerged as a key approach for robust perception in varied conditions, with methods combining geometric constraints from point clouds and semantic cues from cameras to achieve centimeter-level localization in adverse weather, as evidenced in comprehensive reviews of over 50 systems. Learning-based hybrids integrated neural radiance fields for dense, implicit scene representations, enabling photorealistic mapping and relocalization with reduced computational overhead compared to traditional grid maps, as surveyed in implicit SLAM advancements. Commercialization accelerated in autonomous vehicles, where SLAM underpins localization stacks in production systems from leading companies. Open-source ecosystems, particularly ROS packages, significantly boosted SLAM adoption by providing modular integrations like cartographer_ros and orb_slam3_ros, enabling rapid prototyping and community-driven improvements that lowered barriers for industrial and academic use, with over 500 million package downloads by 2020.

Applications

In Robotics and Autonomous Systems

Simultaneous localization and mapping (SLAM) plays a pivotal role in enabling robots and autonomous systems to navigate unknown environments by simultaneously estimating their pose and constructing environmental maps in real time. In mobile robotics, SLAM is essential for tasks requiring precise localization and path planning without reliance on external infrastructure like GPS. For instance, autonomous mobile robots (AMRs) in warehouse settings commonly employ 2D LiDAR-based SLAM to generate occupancy grid maps, facilitating efficient navigation around dynamic obstacles such as moving pallets or workers. This approach has been demonstrated to achieve localization errors below 5 cm in industrial environments, supporting commercial logistics automation.

In autonomous vehicles, SLAM extends to three-dimensional mapping using techniques that integrate LiDAR and camera data to create high-definition (HD) maps for urban driving scenarios. These systems process point clouds and visual features to handle challenges like occlusions from traffic and varying lighting, enabling safe trajectory planning and obstacle avoidance. A notable example is the use of LiDAR-camera SLAM in self-driving cars, where fusion algorithms reduce pose estimation drift to under 10 cm over kilometer-scale trajectories, as validated on datasets like KITTI. Such capabilities are critical for level 4 and 5 autonomy, with deployments in production vehicles from several manufacturers.

For unmanned aerial vehicles (UAVs) and drones, visual-inertial SLAM (VI-SLAM) is widely adopted to enable flight in GPS-denied environments, such as indoors or urban canyons, by combining camera imagery with inertial measurement unit (IMU) data for robust state estimation and obstacle avoidance. This method supports real-time operation, allowing drones to maintain trajectories with position accuracies of 1-2% of the flight distance, as shown in experiments with systems such as ORB-SLAM3. Applications include search-and-rescue operations and aerial surveying, where VI-SLAM enables collision-free navigation in cluttered spaces.

Autonomous underwater vehicles (AUVs) leverage acoustic SLAM to map seafloors and submerged structures in environments where optical sensors fail due to low visibility and light attenuation. By using sonar arrays for range and bearing measurements, these systems construct bathymetric maps while localizing the vehicle, achieving mapping resolutions on the order of meters in deep-sea surveys. For example, acoustic SLAM has been applied in AUVs for pipeline inspection, with demonstrated error reductions of up to 30% compared to dead-reckoning alone.

As of 2025, emerging trends in robotics highlight collaborative SLAM, where multiple robots share map data to accelerate large-scale mapping in expansive areas like disaster zones or agricultural fields. This multi-robot approach builds on centralized fusion techniques to distribute computational load and enhance robustness against individual sensor failures, with prototypes showing improved coverage in simulations and field tests.
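The occupancy grid maps mentioned above are typically maintained in log-odds form, with each LiDAR beam lowering the occupancy evidence of the cells it traverses and raising it at the cell where it terminates. The following is a minimal sketch of that per-beam update; the cell size, the log-odds increments, and the toy map dimensions are arbitrary illustrative values rather than parameters of any particular SLAM package.

```python
import numpy as np

def update_grid(log_odds, robot_xy, hit_xy, cell_size=0.05, l_occ=0.85, l_free=-0.4):
    """Update an occupancy grid in log-odds form for a single LiDAR beam.

    Cells crossed by the beam accumulate "free" evidence; the endpoint cell accumulates
    "occupied" evidence, following the standard inverse sensor model idea.
    """
    start = np.asarray(robot_xy) / cell_size
    end = np.asarray(hit_xy) / cell_size
    n = int(np.ceil(np.linalg.norm(end - start))) + 1
    for s in np.linspace(0.0, 1.0, n):
        cx, cy = np.floor(start + s * (end - start)).astype(int)
        log_odds[cx, cy] += l_free           # beam passed through this cell: evidence for free
    ex, ey = np.floor(end).astype(int)
    log_odds[ex, ey] += l_occ - l_free       # endpoint: net evidence for occupied
    return log_odds

grid = np.zeros((200, 200))                  # 10 m x 10 m map at 5 cm resolution
grid = update_grid(grid, robot_xy=(5.0, 5.0), hit_xy=(7.0, 5.0))
prob = 1.0 / (1.0 + np.exp(-grid))           # convert log-odds back to occupancy probability
print(prob[140, 100], prob[120, 100])        # endpoint cell vs. a cell along the beam
```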

In Augmented and Virtual Reality

Simultaneous localization and mapping (SLAM) plays a pivotal role in augmented reality (AR) and virtual reality (VR) by enabling devices to track user movements and map environments in real time, thus overlaying virtual elements onto the physical world with high fidelity. In AR applications, SLAM facilitates stable anchoring of virtual content, while in VR it supports seamless inside-out tracking without external sensors. This enhances immersion by allowing persistent virtual interactions that adapt to the user's surroundings.

In AR tracking, visual SLAM is employed in devices like the Microsoft HoloLens to enable precise anchor placement for holograms. The HoloLens leverages its onboard cameras and internal sensors to perform visual SLAM, creating a spatial mesh that the user's gaze can intersect for anchor creation. These anchors maintain coordinate systems over time, ensuring holograms remain fixed relative to the real world even as the user moves, which is essential for applications like remote collaboration or architectural visualization.

For VR locomotion, inside-out tracking combines SLAM with inertial measurement units (IMUs) and cameras to achieve room-scale mapping without base stations. In systems like the Oculus Quest (now Meta Quest), SLAM processes video feeds from headset-mounted cameras to detect environmental features and estimate six-degree-of-freedom pose, fused with IMU data for low-drift tracking during natural movements. This approach supports expansive play areas, such as in room-scale VR games, by continuously updating a lightweight map of the user's space to prevent collisions and enable intuitive navigation.

Handheld AR on smartphones utilizes visual SLAM for persistent overlays, as seen in Pokémon GO's evolutions and playground features. Niantic's Visual Positioning System (VPS), built on visual SLAM principles, processes camera frames against pre-built maps to anchor Pokémon with centimeter-level accuracy at real-world locations like PokéStops. This allows shared, session-persistent AR experiences where virtual elements remain stable across devices and visits, enhancing social gameplay without relying solely on GPS.

AR SLAM faces significant challenges, including stringent low-latency requirements and the need for lighting invariance to maintain tracking reliability. Low-latency demands arise from IMU limitations and real-time processing needs, where delays can cause jitter or desynchronization in dynamic scenes, as observed in indoor AR experiments spanning over 100 hours. Lighting variations further complicate feature detection, leading to tracking failures in low-light or changing conditions, necessitating robust fusion with complementary sensors for consistency.

Recent advancements from 2024 to 2025 have introduced semantic AR enhanced by deep learning for richer object interactions. Semantic visual SLAM integrates convolutional neural networks and transformers for object detection and segmentation, enabling context-aware mapping where virtual elements interact meaningfully with detected real-world objects, such as placing holograms on specific furniture. These methods, like those using 3D Gaussian splatting and foundation models (e.g., the Segment Anything Model), improve dynamic scene handling and open-vocabulary understanding, broadening the range of supported AR applications.
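The inside-out tracking loop described above combines high-rate inertial propagation with lower-rate visual pose fixes. The sketch below shows that loose-coupling idea in its simplest possible form, dead-reckoning from IMU samples and nudging the estimate toward each visual fix with a fixed gain; real trackers such as Oculus Insight use far more sophisticated, tightly coupled estimators, so every rate, gain, and trajectory value here is an illustrative assumption.

```python
import numpy as np

def imu_propagate(position, velocity, accel_world, dt):
    """Dead-reckon position and velocity from one world-frame acceleration sample."""
    velocity = velocity + accel_world * dt
    position = position + velocity * dt
    return position, velocity

def visual_correct(position, visual_position, gain=0.3):
    """Nudge the drifting inertial estimate toward a lower-rate visual SLAM fix."""
    return position + gain * (visual_position - position)

pos, vel = np.zeros(3), np.zeros(3)
dt_imu = 1.0 / 200.0                           # IMU samples at 200 Hz
for step in range(200):                        # simulate one second of motion
    accel = np.array([0.1, 0.0, 0.0])          # constant forward acceleration (illustrative)
    pos, vel = imu_propagate(pos, vel, accel, dt_imu)
    if step % 20 == 19:                        # a visual SLAM fix arrives at roughly 10 Hz
        t = (step + 1) * dt_imu
        visual_fix = np.array([0.5 * 0.1 * t**2, 0.0, 0.0])   # ground-truth position stands in for the fix
        pos = visual_correct(pos, visual_fix)
print(pos)                                     # stays close to the ~0.05 m true displacement
```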
