
Simultaneous localization and mapping

Simultaneous localization and mapping (SLAM) is the process by which a robot or other autonomous agent uses onboard sensors to construct a map of an unknown environment while simultaneously estimating its own location and orientation within that environment, without relying on any prior information about the surroundings. This dual task addresses the fundamental "chicken-and-egg" problem in robotics, where accurate mapping requires knowing the agent's pose, and precise pose estimation depends on a reliable map. The origins of SLAM trace back to the mid-1980s, with foundational work on probabilistic representations of spatial uncertainty presented by researchers including Smith, Self, and Cheeseman around the 1986 IEEE Robotics and Automation Conference. The term "SLAM" was coined in a 1995 survey paper by Durrant-Whyte and others, marking the formalization of the problem, while early implementations drew from Bayesian estimation techniques to handle sensor noise and motion errors. Key milestones in the 1990s and 2000s included the development of the extended Kalman filter (EKF) for real-time SLAM, as detailed in Dissanayake et al. (2001), and the introduction of graph-based formulations by Lu and Milios (1997), which enabled scalable optimization over large datasets. By the 2010s, advancements shifted toward robust, vision-aided systems and semantic integration, reflecting over three decades of evolution from theoretical foundations to practical deployment.

At its core, SLAM employs probabilistic models, such as maximum a posteriori (MAP) estimation, to fuse data from sensors like lidars, cameras, and inertial measurement units, often represented as factor graphs that capture spatial relationships between poses and landmarks. Challenges persist in areas like data association in feature-poor environments, handling dynamic objects, and achieving computational efficiency for long-term operation, but solutions like Rao-Blackwellized particle filters (e.g., FastSLAM) have improved consistency and accuracy. Widely regarded as a cornerstone of autonomous systems, SLAM enables applications in self-driving cars for real-time navigation, indoor robots for exploration and inspection, augmented reality for spatial anchoring, and underwater or aerial vehicles for mapping inaccessible areas.

Problem Definition

Mathematical Formulation

The SLAM problem is fundamentally a probabilistic estimation task that seeks to compute the posterior distribution over the robot's pose and the map given a sequence of observations and control inputs. Formally, at time t, this is expressed as p(x_t, m \mid z_{1:t}, u_{1:t}), where x_t denotes the robot's pose (typically including position and orientation), m represents the map of the environment, z_{1:t} = \{z_1, \dots, z_t\} is the history of observations from onboard sensors, and u_{1:t} = \{u_1, \dots, u_t\} is the sequence of control actions applied to the robot. This posterior encapsulates the uncertainty inherent in both localization and mapping due to noisy sensing and motion.

The derivation of this posterior relies on Bayesian filtering, recursively updating the estimate as new data arrives. Starting from the Chapman-Kolmogorov equation for the prediction step, the time-update integrates the motion model: p(x_t, m \mid z_{1:t-1}, u_{1:t}) = \int p(x_t \mid x_{t-1}, u_t) \, p(x_{t-1}, m \mid z_{1:t-1}, u_{1:t-1}) \, dx_{t-1}, where the motion model p(x_t \mid x_{t-1}, u_t) captures the probabilistic effect of control u_t on the pose transition, often modeled as a Markov process with independent noise. The measurement-update then applies Bayes' rule: p(x_t, m \mid z_{1:t}, u_{1:t}) = \frac{p(z_t \mid x_t, m) \, p(x_t, m \mid z_{1:t-1}, u_{1:t})}{p(z_t \mid z_{1:t-1}, u_{1:t})}, incorporating the observation model p(z_t \mid x_t, m), which relates the current measurement z_t to the pose and map under conditional-independence assumptions, with the normalizing denominator ensuring the posterior integrates to 1. This recursive factorization enables online computation, though the high dimensionality of the joint distribution poses computational challenges.

In landmark-based representations, the map m is typically a set of features m = \{m_i\}_{i=1}^N, where each m_i = (m_{i,x}, m_{i,y}) is the 2D position of a landmark in a global frame. For range-bearing sensors, common in mobile robotics, an observation z_t = (r, \phi) measures the relative range r and bearing \phi to a landmark m_i from the robot's pose x_t = (x, y, \theta). The expected measurement function is nonlinear: h(x_t, m_i) = \begin{pmatrix} \sqrt{(m_{i,x} - x)^2 + (m_{i,y} - y)^2} \\ \operatorname{atan2}(m_{i,y} - y, m_{i,x} - x) - \theta \end{pmatrix}, with actual observations modeled as z_t = h(x_t, m_i) + v_t, where v_t is zero-mean Gaussian noise with covariance R_t. This setup assumes known data association between z_t and m_i, allowing the likelihood p(z_t \mid x_t, m_i) = \mathcal{N}(z_t; h(x_t, m_i), R_t) to update both pose and map estimates jointly.

An alternative to explicit landmarks is the occupancy grid map, where m consists of a discrete grid of cells, each with an occupancy probability p(m_{i,j} = 1 \mid z_{1:t}, x_{1:t}) indicating the likelihood that cell (i,j) is occupied. Unlike landmark maps, this representation avoids feature extraction by directly modeling spatial occupancy via inverse sensor models, integrating observations into log-odds ratios for efficient updates: l(m_{i,j}) = \log \frac{p(m_{i,j}=1)}{p(m_{i,j}=0)}, updated recursively without requiring point correspondences. This grid-based approach is particularly suited for dense environments but increases storage demands compared to sparse landmark sets.
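To make the measurement and grid models above concrete, the following minimal Python sketch (assuming NumPy is available; the function names are illustrative, not from any particular SLAM library) computes the expected range-bearing measurement h(x_t, m_i) and performs a single log-odds occupancy update.

```python
import numpy as np

def range_bearing(pose, landmark):
    """Expected range-bearing measurement h(x_t, m_i) for pose (x, y, theta)."""
    x, y, theta = pose
    dx, dy = landmark[0] - x, landmark[1] - y
    r = np.hypot(dx, dy)                              # expected range
    phi = np.arctan2(dy, dx) - theta                  # expected bearing
    phi = (phi + np.pi) % (2 * np.pi) - np.pi         # wrap to [-pi, pi)
    return np.array([r, phi])

def log_odds_update(l_cell, p_occ_given_z):
    """Recursive log-odds update of one occupancy cell from an inverse sensor model."""
    return l_cell + np.log(p_occ_given_z / (1.0 - p_occ_given_z))

# Example: a landmark 3 m ahead and 1 m to the left of a robot at the origin.
z_expected = range_bearing((0.0, 0.0, 0.0), (3.0, 1.0))
l = log_odds_update(0.0, 0.7)   # cell observed as likely occupied (p = 0.7)
print(z_expected, l)
```

Wrapping the bearing to [-\pi, \pi) keeps measurement residuals well behaved near the angular discontinuity, which matters once these terms feed a filter or optimizer.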

Key Challenges

One of the primary challenges in simultaneous localization and mapping (SLAM) is the accumulation of errors over time, stemming from odometry noise and uncertainties in sensor observations, which leads to drift in pose estimates. Odometry, which relies on integrating relative motion measurements from sources like wheel encoders or inertial measurement units, inherently introduces errors that grow unbounded without correction, resulting in diverging pose and map estimates. This drift can cause the robot's perceived position to deviate significantly from its true location, compounding inaccuracies in subsequent mapping. Seminal work by Smith, Self, and Cheeseman highlighted how such uncertainties propagate through spatial relationships, necessitating probabilistic representations to model and mitigate error growth.

The data association problem further complicates SLAM by requiring the correct matching of current observations to existing map features, often amid ambiguities in feature correspondence. In environments with numerous similar landmarks, such as repeated patterns in urban settings, associating a new sensor reading with the appropriate map element becomes error-prone, potentially leading to incorrect updates that corrupt the entire map. This issue is exacerbated by noisy sensors or partial occlusions, where multiple landmarks may appear to be viable matches, demanding robust validation techniques to avoid catastrophic failures. Durrant-Whyte and Bailey emphasized the fragility of early SLAM formulations to such mismatches, particularly in the extended Kalman filter (EKF) approach.

Scalability poses a significant computational hurdle as the map size increases, with naive algorithms exhibiting O(n²) complexity for n landmarks due to the need to update correlations across the entire state space. As the map expands, maintaining and inverting the covariance matrix or its information-form equivalent becomes prohibitive in terms of memory and processing time, limiting applicability to large-scale or long-term operations. For instance, in outdoor deployments, maps can encompass thousands of features, rendering real-time execution infeasible without approximations. This growth was a focal point in early analyses, driving the development of sparse and hierarchical methods to manage complexity.

Perceptual aliasing arises when similar environmental features produce indistinguishable sensor signatures from different locations, leading to erroneous loop closures or improper map merges. In symmetric or repetitive spaces, such as hallways or grid-like structures, the system may falsely identify revisited areas, injecting inconsistencies that amplify drift or fragment the map. This perceptual ambiguity challenges the reliability of feature-based matching and requires contextual cues beyond raw appearance to disambiguate. Cadena et al. noted how perceptual aliasing contributes to outliers in data association, underscoring its role in robust SLAM.

Representing and propagating uncertainty in the high-dimensional state space, which jointly estimates robot poses and map features, remains a core difficulty due to the curse of dimensionality and nonlinear dynamics. The SLAM posterior distribution can involve thousands of variables, where correlations between all elements must be tracked, but approximations like Gaussian assumptions can fail under multi-modality or heavy-tailed noise. Propagating these uncertainties through motion and observation models demands efficient numerical methods to avoid computational explosion or loss of accuracy. Smith et al. introduced stochastic maps to handle such multivariate uncertainties, while Durrant-Whyte and Bailey discussed the impracticality of direct particle filtering in these spaces. Loop closure detection offers a partial remedy to drift by realigning trajectories upon reobservation, though it does not fully resolve the underlying representational challenges.

Sensing and Modeling

Sensor Technologies

Visual sensors, primarily cameras, form the backbone of many SLAM systems due to their ability to capture rich environmental detail through perspective projection, where light rays project onto an image plane to produce 2D intensity or color images. Monocular cameras provide a single viewpoint, enabling feature-based methods that extract invariant keypoints such as scale-invariant feature transform (SIFT) descriptors, which detect and describe local image features robust to scale and rotation changes, or oriented FAST and rotated BRIEF (ORB) features, offering computational efficiency for real-time processing in resource-constrained setups. However, monocular setups suffer from scale ambiguity, as depth information is not directly observable, limiting absolute positioning without additional cues. Stereo cameras address this by using two parallel viewpoints to compute disparity maps via stereo matching, yielding metric-scale reconstructions with advantages in textured environments but drawbacks in low-light or featureless scenes due to baseline-dependent accuracy. RGB-D cameras, such as those using structured light or time-of-flight principles, augment RGB images with per-pixel depth, facilitating direct dense mapping; they excel in indoor settings with high depth accuracy but are constrained by short range (typically under 5 meters) and sensitivity to ambient light.

Range sensors provide direct distance measurements, complementing visual data in challenging visibility conditions. LiDAR (Light Detection and Ranging) systems emit laser pulses and measure time-of-flight to generate 2D or 3D point clouds, with scanning mechanisms like mechanical rotation or solid-state arrays enabling high-resolution (centimeter-level) mapping over long ranges (up to hundreds of meters); advantages include precision in sparse environments and independence from lighting, though limitations arise from high cost, mechanical wear in rotating units, and sparsity in dynamic scenes. Ultrasonic sensors, operating on acoustic wave propagation, offer short-range (up to 5-10 meters) distance estimates via echo timing, ideal for low-cost obstacle avoidance in robotics; they provide robustness to dust or smoke but suffer from narrow beam angles (10-30 degrees), leading to coarse resolution and susceptibility to multipath reflections in cluttered spaces.

Inertial sensors, embodied in Inertial Measurement Units (IMUs), integrate accelerometers and gyroscopes to estimate motion without external references. Accelerometers detect linear accelerations along three axes, while gyroscopes measure angular rates, allowing dead reckoning through double integration for position and single integration for orientation; this provides high-frequency (hundreds of Hz) updates for bridging gaps in other sensor data but accumulates drift errors rapidly due to noise and bias instabilities. IMUs are compact and low-power, making them ubiquitous in mobile SLAM, yet they require fusion with exteroceptive sensors to mitigate unbounded error growth over time.

Other modalities extend capabilities in niche scenarios. Radar sensors, particularly Frequency Modulated Continuous Wave (FMCW) types, use radio waves for all-weather ranging and velocity estimation via Doppler shifts, offering penetration through fog or rain with ranges exceeding 100 meters; they enable robust outdoor mapping but produce sparse, noisy point clouds due to low angular resolution. Event cameras, neuromorphic sensors that asynchronously record per-pixel brightness changes (events) at microsecond resolution, capture high dynamic range (over 120 dB) and low-latency motion data, advantageous for high-speed or varying-light environments; however, their output requires specialized processing to reconstruct traditional images or features, and they lack absolute intensity information.

Sensor fusion combines these modalities to enhance robustness, as in visual-inertial odometry (VIO), where IMU data recovers scale and predicts motion between camera frames, reducing drift in monocular setups through tightly coupled estimation of poses and biases. This approach leverages complementary strengths, with cameras providing global consistency and IMUs short-term accuracy, while addressing individual limitations like visual occlusions or inertial drift via probabilistic models.

Kinematics and Dynamics Modeling

In simultaneous localization and mapping (SLAM), kinematic and dynamic models predict the robot's pose evolution based on control inputs, forming the prediction step in probabilistic frameworks. These models account for the robot's motion constraints and integrate odometry data to estimate state transitions, while propagating uncertainties to maintain accurate posterior distributions. Kinematic models assume instantaneous responses without inertial effects, suitable for low-speed operations, whereas dynamic models incorporate forces and accelerations for higher-fidelity predictions in non-holonomic systems.

Kinematic models commonly used in SLAM include the differential drive, unicycle, and Ackermann steering configurations. For differential drive and unicycle robots, the motion is parameterized by linear velocity v and angular velocity \omega, updating the pose (x, y, \theta) over time step \Delta t as follows: \begin{aligned} x_t &= x_{t-1} + \Delta t \cdot v \cos\left(\theta_{t-1} + \frac{\omega \Delta t}{2}\right), \\ y_t &= y_{t-1} + \Delta t \cdot v \sin\left(\theta_{t-1} + \frac{\omega \Delta t}{2}\right), \\ \theta_t &= \theta_{t-1} + \omega \Delta t. \end{aligned} This mid-heading formulation approximates the curved trajectory over each time step, providing a balance between simplicity and accuracy for wheeled platforms. For Ackermann vehicles, such as cars, the model extends to include a steering angle \phi, relating velocities at the front and rear axles while enforcing non-holonomic constraints that prevent sideways motion; the instantaneous center of rotation lies at the intersection of the extended wheel axes.

Dynamic models build on kinematics by incorporating inertial effects, accelerations, and external forces, particularly for non-holonomic systems where velocity limits and slip must be modeled. The Newton-Euler formulation recursively computes joint torques and accelerations by balancing forces and moments across the robot's links, enabling predictions that account for mass distribution and inertia. These models are essential for high-speed or uneven-terrain scenarios in SLAM, though computationally intensive.

Odometry integration estimates control inputs u_t = (v, \omega) from wheel encoders, which measure wheel rotations to compute relative displacements, or from inertial measurement units (IMUs) via dead reckoning of accelerations and angular rates. Noise in these estimates, such as encoder quantization or IMU drift, is typically modeled as zero-mean Gaussian perturbations added to the motion parameters, with covariances scaled by distance traveled to capture accumulating errors. In probabilistic SLAM, uncertainty propagation evolves the state covariance matrix through the motion model's Jacobian, as in the extended Kalman filter prediction step: P_{t|t-1} = \mathbf{F} P_{t-1} \mathbf{F}^T + \mathbf{Q}, where \mathbf{F} linearizes the nonlinear motion function and \mathbf{Q} is the process noise covariance; this ensures the filter tracks pose and map uncertainties consistently.
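As an illustration of the prediction step described above, the following Python sketch (NumPy assumed; the noise values and function names are arbitrary placeholders rather than values from any published system) implements the differential-drive update together with first-order covariance propagation P = F P F^T + Q.

```python
import numpy as np

def predict(pose, cov, v, omega, dt, Q):
    """Differential-drive motion update with first-order covariance propagation."""
    x, y, theta = pose
    mid = theta + 0.5 * omega * dt                      # mid-heading approximation
    new_pose = np.array([x + dt * v * np.cos(mid),
                         y + dt * v * np.sin(mid),
                         theta + omega * dt])
    # Jacobian of the motion model with respect to the pose (F = df/dx).
    F = np.array([[1.0, 0.0, -dt * v * np.sin(mid)],
                  [0.0, 1.0,  dt * v * np.cos(mid)],
                  [0.0, 0.0,  1.0]])
    new_cov = F @ cov @ F.T + Q                          # P = F P F^T + Q
    return new_pose, new_cov

pose, cov = np.zeros(3), np.diag([0.01, 0.01, 0.005])
Q = np.diag([0.02, 0.02, 0.01])                          # assumed process noise
pose, cov = predict(pose, cov, v=1.0, omega=0.2, dt=0.1, Q=Q)
```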

Core SLAM Techniques

Mapping and Localization Basics

In simultaneous localization and mapping (SLAM), the core processes involve iteratively estimating the agent's pose within an environment while concurrently constructing and refining a map of that environment. This joint estimation addresses the inherent coupling between pose and map, where inaccuracies in one directly impact the other, forming the basis of the SLAM posterior distribution over trajectories and maps. Traditional approaches begin with an initial pose estimate derived from odometry or prior knowledge, followed by alignment of sensor observations to update both the pose and the map incrementally.

Localization in SLAM refers to the task of estimating the agent's current pose (typically its position and orientation) relative to a pre-existing or partially built map. This is achieved through techniques such as scan matching, which aligns consecutive sensor scans (e.g., lidar returns or visual features) to compute relative transformations, or particle filters, which maintain a set of weighted hypotheses (particles) representing possible poses and propagate them based on motion models and observation likelihoods. Scan matching, often implemented via iterative closest point (ICP) algorithms, minimizes the distance between corresponding points in current and prior scans to yield a refined pose estimate, enabling robust alignment even in feature-sparse areas. Particle filters, particularly in Monte Carlo variants, handle multimodal pose distributions effectively by resampling particles according to their alignment with map features, providing a probabilistic framework that mitigates drift over time.

Mapping complements localization by incrementally updating the environmental representation using new observations aligned to the estimated pose. This typically involves adding detected features, such as landmarks or occupancy cells, to the map or refining existing structures based on new data, ensuring the map evolves without redundant storage. In feature-based mapping, sparse landmarks (e.g., corners or edges) are extracted and inserted if they meet criteria like distinctiveness and stability, forming a lightweight structure that supports efficient pose-to-observation associations. Grid-based mapping, conversely, updates probabilistic occupancy values in a voxel or 2D grid, where observations shift cell states from unknown toward occupied or free, facilitating collision avoidance but at higher memory cost.

SLAM systems often employ a frontend-backend architecture to balance real-time performance and accuracy. The frontend handles immediate processing for pose tracking and local map initialization, using fast but approximate methods like direct alignment or feature tracking to maintain operation at high frame rates. In contrast, the backend performs global optimization on accumulated data, refining poses and maps through least-squares minimization to correct local inconsistencies, though it operates less frequently to avoid computational bottlenecks. This split allows the frontend to prioritize speed for navigation, while the backend ensures long-term coherence.

SLAM can operate in online or offline modes, distinguished by their processing paradigms. Online SLAM processes data continuously as it arrives, enabling real-time decision-making in dynamic scenarios, but risks accumulating errors without full context. Offline SLAM, or full SLAM, delays optimization until the entire dataset is collected, allowing comprehensive adjustments like full bundle adjustment, which yields higher accuracy at the expense of immediacy and is well suited to post-mission analysis.

Map representations in SLAM trade off between sparsity and density to suit computational constraints and application needs. Sparse maps store only salient landmarks, reducing storage (e.g., to thousands of points) and enabling efficient probabilistic updates in methods like EKF-SLAM, but they may overlook fine details for tasks requiring full geometry. Dense maps, such as volumetric grids or surfels, capture comprehensive scene structure for applications like dense 3D reconstruction and obstacle avoidance, yet demand significantly more memory and processing (often scaling with volume rather than with features), limiting scalability in large environments. The choice depends on factors like sensor resolution and environment dynamics, with hybrid approaches emerging to combine the benefits of both.
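Since scan matching via ICP is central to the localization step described above, the following is a minimal 2D point-to-point ICP sketch in Python (assuming NumPy and SciPy are available); real systems add outlier rejection, point-to-plane metrics, and convergence checks, so this should be read as an illustrative simplification.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_2d(source, target, iters=20):
    """Minimal point-to-point ICP: aligns 'source' (N,2) to 'target' (M,2)."""
    R, t = np.eye(2), np.zeros(2)
    src = source.copy()
    tree = cKDTree(target)
    for _ in range(iters):
        _, idx = tree.query(src)                 # nearest-neighbor correspondences
        matched = target[idx]
        mu_s, mu_m = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_m)    # cross-covariance of centered points
        U, _, Vt = np.linalg.svd(H)
        if np.linalg.det(Vt.T @ U.T) < 0:        # guard against a reflection solution
            Vt[-1] *= -1
        R_step = Vt.T @ U.T
        t_step = mu_m - R_step @ mu_s
        src = src @ R_step.T + t_step            # apply the incremental transform
        R, t = R_step @ R, R_step @ t + t_step   # accumulate the total transform
    return R, t
```

The closed-form rotation per iteration comes from the SVD of the cross-covariance matrix, the same construction used in most scan-matching frontends.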

Loop Closure Detection

Loop closure detection is a critical component of simultaneous localization and mapping (SLAM) systems, enabling the identification of previously visited locations to mitigate accumulated drift in the estimated trajectory. By recognizing when a robot revisits a known area, loop closure allows for the correction of accumulated errors, ensuring global consistency in the map and pose estimates. This process typically involves candidate detection, verification of potential matches, and integration into the SLAM backend for error correction.

Detection methods for loop closure vary based on the sensor modalities and environmental assumptions. Appearance-based approaches, such as Bag-of-Words (BoW) models, represent scenes as histograms of visual words derived from features like ORB descriptors, facilitating rapid similarity searches between current and past keyframes. A seminal implementation, DBoW2, employs a hierarchical vocabulary tree built from binary features to enable efficient place recognition in large-scale visual SLAM, achieving real-time performance with high recall rates on datasets like New College. Geometric methods rely on scan matching to align current sensor data, such as lidar scans, with stored submaps, using techniques like branch-and-bound optimization to compute relative poses and detect overlaps in 2D environments. Probabilistic approaches, including sampling via Markov Chain Monte Carlo (MCMC), model loop closure as a sampling process over possible alignments, efficiently handling uncertainty in large loops by proposing and evaluating high-likelihood configurations.

Once candidates are identified, verification ensures robustness against outliers. The Random Sample Consensus (RANSAC) algorithm is widely used to estimate transformation parameters between matched frames while rejecting inconsistent correspondences, as implemented in systems like ORB-SLAM, where it refines pose hypotheses from BoW matches. Following verification, corrections are applied through pose graph relaxation, where detected loops add constraint edges to the graph, and nonlinear least-squares optimization propagates adjustments globally to minimize trajectory inconsistencies. Tools like g2o facilitate this relaxation by solving sparse systems efficiently, reducing drift by orders of magnitude in loop-rich trajectories.

To mitigate false positives, which can introduce erroneous constraints and degrade map quality, temporal and spatial consistency checks are employed. These validate candidates by ensuring sequential frame alignments and geometric plausibility within the current trajectory estimate, preventing premature matches in dynamic or repetitive environments. For instance, checks against recent paths discard implausible loops based on motion priors. Computational efficiency is paramount for real-time operation in large-scale settings, achieved through hierarchical indexing structures like those in DBoW2, which prune search spaces via multi-level vocabularies and enable sub-linear query times even for thousands of keyframes.
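A hedged sketch of the appearance-based candidate search described above: quantized feature indices are accumulated into normalized bag-of-words histograms, and past keyframes are scored by cosine similarity. The similarity threshold and the recent-frame exclusion window are illustrative assumptions, not values taken from DBoW2.

```python
import numpy as np

def bow_histogram(word_indices, vocab_size):
    """Build a unit-norm bag-of-words histogram from quantized feature indices."""
    hist = np.bincount(word_indices, minlength=vocab_size).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def loop_candidates(query_hist, keyframe_hists, threshold=0.8, skip_recent=30):
    """Return indices of older keyframes whose BoW similarity exceeds a threshold."""
    scores = keyframe_hists @ query_hist               # cosine similarity (unit vectors)
    return [i for i, s in enumerate(scores[:-skip_recent])
            if s > threshold]                           # ignore temporally adjacent frames
```

Candidates returned this way would still need geometric verification (e.g., RANSAC on feature matches) before a loop-closure edge is added to the pose graph.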

Exploration Strategies

Exploration strategies in simultaneous localization and mapping (SLAM) enable autonomous agents to plan paths that efficiently expand the map while minimizing uncertainty in pose and environmental estimates. These methods address the problem of navigating unknown spaces by balancing coverage of new areas with the need to reduce estimation errors, often leveraging the current map estimate to guide motion. In essence, exploration integrates path planning with SLAM's ongoing estimation process, ensuring that motion commands contribute to both discovery and accuracy.

Frontier-based exploration identifies boundaries, or frontiers, between known and unknown regions in the map, typically using occupancy grid representations to detect these edges as potential targets for expansion. Pioneered in the late 1990s, this approach directs the robot to move toward the closest or most promising frontier point, promoting systematic coverage without redundant traversal of mapped areas. For instance, frontiers are computed by finding grid cells where mapped free space meets unknown regions, allowing real-time selection of navigation goals that maximize information gain per step (a minimal sketch of frontier detection appears after this section). This method is computationally efficient and widely adopted for its simplicity in integrating with grid-based SLAM systems.

Information-theoretic approaches select exploration actions by quantifying and minimizing uncertainty through metrics like entropy or mutual information, prioritizing paths that yield the highest expected reduction in map and pose ambiguity. These strategies model the SLAM posterior as a probability distribution and evaluate candidate trajectories based on their ability to resolve ambiguities in the current estimate. A seminal formulation optimizes trajectories to maximize coverage while reducing localization error, often using approximations like expected information gain for scalability in real-time applications. Such methods are particularly effective in sparse or feature-poor environments, where passive SLAM might accumulate errors rapidly.

Coverage path planning focuses on generating trajectories that systematically traverse the entire known area, such as through lawn-mower patterns or recursive subdivision, to ensure complete observation for mapping. In SLAM contexts, this involves adapting offline coverage algorithms to online map updates, replanning paths as new regions are discovered to avoid gaps in coverage. Early integrations with SLAM demonstrated its utility for ground and aerial platforms, where uniform sweeps improve map completeness and accuracy. This approach excels in structured environments requiring thorough coverage, though it may require modifications to handle dynamic map growth.

Uncertainty-driven exploration selects actions that directly target regions of high pose or map covariance, using SLAM's error estimates to guide the agent toward viewpoints that best constrain the state. By propagating uncertainty through motion models, these methods compute utility functions that penalize paths increasing variance while rewarding those that gather corrective observations. Foundational work in stochastic mapping highlighted how active selection of observation points can bound error growth during exploration. This is crucial for long-duration missions, where unchecked uncertainty can lead to divergence in the SLAM backend.

The integration of exploration with SLAM, known as active SLAM, treats path planning as part of the estimation loop, where exploration goals influence and are influenced by the SLAM frontend and backend. In active SLAM, utility functions combine coverage, uncertainty reduction, and loop closure potential to compute optimal motions, often via receding-horizon optimization. Seminal frameworks demonstrated that actively choosing trajectories can improve map quality compared to reactive methods, enhancing both coverage and localization in unknown environments. This holistic approach ensures that exploration not only builds the map but also maintains its reliability throughout the process.
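The frontier computation referenced above can be illustrated with a small occupancy-grid sketch in Python (NumPy assumed); the cell labels and the 8-neighborhood rule are simplifying assumptions rather than a specific published implementation.

```python
import numpy as np

UNKNOWN, FREE, OCCUPIED = -1, 0, 1

def find_frontiers(grid):
    """Return (row, col) cells that are free and adjacent to at least one unknown cell."""
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            neighborhood = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if np.any(neighborhood == UNKNOWN):
                frontiers.append((r, c))
    return frontiers

# Example: a 5x5 grid with the left half mapped free and the right half unknown.
grid = np.full((5, 5), UNKNOWN)
grid[:, :3] = FREE
print(find_frontiers(grid))   # frontier cells lie along the column bordering the unknown area
```

A planner would then rank these cells, for example by travel cost or expected information gain, and send the robot toward the selected frontier.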

Advanced Algorithms

Handling Dynamic Environments

In dynamic environments, where moving objects such as pedestrians or vehicles can disrupt feature tracking and map consistency, SLAM systems require specialized adaptations to distinguish static landmarks from transient elements. Traditional methods, which assume a predominantly static world, often suffer from drift or failure when dynamic objects are incorrectly incorporated into the map. To address this, techniques focus on detecting and excluding dynamic features while preserving the reliability of the static environment representation.

Dynamic object detection in SLAM typically employs optical flow to identify motion inconsistencies across frames, pixel-wise segmentation using convolutional neural networks (CNNs) like Mask R-CNN or SegNet, or motion estimation from multi-frame depth and parallax analysis. For instance, optical flow computes pixel displacements between consecutive images to flag inconsistent movements, while CNN-based segmentation classifies regions as dynamic (e.g., people or cars) based on trained priors. Multi-frame estimation further refines this by comparing predicted feature positions with observed ones, using epipolar geometry and RANSAC for robust outlier rejection. These methods enable real-time identification of moving elements, with segmentation achieving high precision in cluttered scenes. Sensor fusion, such as integrating RGB-D data, can aid detection by providing depth cues for parallax-based verification.

Static map maintenance involves masking or removing detected dynamic features so that only reliable landmarks contribute to pose or map updates. Detected dynamic regions are excluded from keypoint extraction and matching, preventing erroneous associations that could corrupt the map. Background inpainting reconstructs occluded static areas using historical keyframes, maintaining a consistent representation of the environment. This filtering preserves landmark stability, reducing localization drift in sequences with up to 50% dynamic content.

Semantic SLAM enhances robustness by incorporating object classes into the mapping process, treating entities like cars or people as dynamic by default while building layered maps that separate static infrastructure from movable objects. Semantic labels from networks like SegNet are projected into 3D space and fused into volumetric map structures, where voxels are updated with log-odds probabilities to filter unstable dynamic elements. This approach not only improves map utility for high-level tasks, such as planning around known object types, but also boosts localization accuracy by rejecting semantically inconsistent features. For example, DS-SLAM integrates semantics with motion-consistency checks to create dense, class-aware maps suitable for indoor and outdoor use.

Trajectory prediction for dynamic obstacles often relies on Kalman filtering to forecast short-term motion, allowing SLAM systems to anticipate and avoid incorporating predicted dynamic paths into the static map. An extended or unscented Kalman filter models object states (position, velocity) based on sequential observations, excluding landmarks whose predicted trajectories deviate significantly from the robot's ego-motion. This enables proactive filtering, with motion-consistency thresholds identifying movers for separate tracking, thereby maintaining overall SLAM consistency in crowded settings.

Recent advances leverage deep learning for semantic segmentation in visual SLAM, significantly enhancing performance in urban scenes with heavy traffic or pedestrian activity. For example, integrating lightweight CNNs such as Fast-SCNN with ORB-SLAM variants has reduced average trajectory errors by 90-97% compared to baselines in dynamic urban and forest environments, enabling real-time or near-real-time performance. These methods combine instance segmentation with geometric verification, achieving up to 97% error reduction in highly dynamic sequences on benchmarks like TUM RGB-D.
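As a simple illustration of the masking step described above, the following sketch drops keypoints that fall inside a binary dynamic-object mask such as one produced by a segmentation network; the mask format and the (column, row) keypoint convention are assumptions made for illustration.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, dynamic_mask):
    """Keep only keypoints outside a binary dynamic-object mask (H x W, True = dynamic)."""
    kept = []
    h, w = dynamic_mask.shape
    for u, v in keypoints:                        # keypoints as (col, row) pixel coordinates
        r, c = int(round(v)), int(round(u))
        if 0 <= r < h and 0 <= c < w and not dynamic_mask[r, c]:
            kept.append((u, v))
    return kept

# Example: a mask marking the right half of a 480x640 image as dynamic.
mask = np.zeros((480, 640), dtype=bool)
mask[:, 320:] = True
print(filter_dynamic_keypoints([(100.0, 50.0), (500.0, 50.0)], mask))  # keeps only the left point
```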

Multi-Robot and Collaborative SLAM

Multi-robot simultaneous localization and mapping (SLAM) extends single-robot techniques to coordinate multiple agents in building a shared environmental map while estimating their poses, addressing challenges like limited individual coverage and communication constraints. In multi-robot systems, agents collaboratively localize and map by exchanging data, enabling faster exploration and robustness in large or complex environments.

Centralized architectures involve a central server that collects data and submaps from all robots to perform global pose-graph optimization, ensuring high accuracy and consistency for small teams of 5-10 agents. For instance, systems like LAMP use centralized fusion with tools such as GTSAM for joint optimization, achieving localization errors below 1% in underground settings. In contrast, decentralized architectures allow each robot to maintain and optimize its local map independently, incorporating inter-robot loop closures via data sharing to reduce central bottlenecks. This approach, as in CSIRO's system, yields errors around 22 cm in urban trials but risks misalignment without frequent closures. Distributed variants further minimize communication by enabling local optimizations with partial exchanges, though they remain less mature for real-time applications.

Map merging is essential for integrating local submaps into a global representation, typically involving relative pose estimation between robots and alignment via shared landmarks or geometric transforms. Probability-based methods estimate relative poses using probabilistic models when robots rendezvous, fusing maps by maximizing overlap consistency. Optimization techniques, such as those employing genetic algorithms, align maps by minimizing transformation errors between overlapping regions identified through features like SIFT or line segments. Widely adopted approaches include Hough-transform-based techniques for non-iterative alignment of linear features and mean-shift clustering for merging laser scan line segments, ensuring efficient data association without exhaustive computation.

Communication protocols facilitate submap sharing while managing bandwidth limitations in resource-constrained settings. Gossip-based protocols enable decentralized exchanges of keyframes or features, as in CoSLAM and DDF-SAM, where robots selectively broadcast summaries to propagate updates efficiently across the network. Tree-based protocols, used in hierarchical systems like CCM-SLAM, organize data flow through a structured topology, funneling submaps to aggregation nodes to minimize redundant transmissions. These methods handle bandwidth constraints by prioritizing processed data over raw sensor streams, with gossip protocols improving global map quality by up to 50% in stable network conditions.

Scalability in multi-robot SLAM varies by architecture, with centralized fusion suiting small teams of 2-4 robots by aggregating all data at a single node for precise coordination, though it demands high bandwidth. For larger swarms, hierarchical approaches distribute computation across layers, combining local optimizations with selective fusions as in Kimera-Multi and C2TAM, enabling adaptability in expansive environments. Recent developments include consensus algorithms for robust multi-robot loop closure, such as the Incremental Manifold Edge-based Separable ADMM (iMESA), which uses biased priors in local factor graphs and dual gradient ascent to enforce shared-variable consistency via sparse communications. This enables state estimation in teams operating over extended periods, with applications in search-and-rescue scenarios like disaster zones where communication is intermittent. Systems like LAMP 2.0 leverage such architectures for high-precision localization in underground rescue operations, supporting heterogeneous robot teams.
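As a minimal illustration of the map-merging step described above, the sketch below expresses one robot's SE(2) submap poses in another robot's frame, given an inter-robot relative transform such as one obtained from a rendezvous or a shared landmark; the function names and the example transform are illustrative assumptions.

```python
import numpy as np

def se2_matrix(x, y, theta):
    """Homogeneous transform for a 2D pose."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def merge_submap(poses_b, T_a_b):
    """Express robot B's poses in robot A's frame, given the inter-robot transform T_a_b."""
    merged = []
    for x, y, theta in poses_b:
        T = T_a_b @ se2_matrix(x, y, theta)              # compose transforms
        merged.append((T[0, 2], T[1, 2], np.arctan2(T[1, 0], T[0, 0])))
    return merged

# Example: robot B's frame is 5 m ahead of robot A and rotated by 90 degrees.
T_a_b = se2_matrix(5.0, 0.0, np.pi / 2)
print(merge_submap([(1.0, 0.0, 0.0)], T_a_b))
```

In a full system the same relative transform would enter the joint pose graph as an inter-robot constraint rather than being applied once, so that later loop closures can still correct it.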

Biological and Bio-Inspired Approaches

Biological and bio-inspired approaches to simultaneous localization and mapping (SLAM) draw from neural mechanisms observed in animals, aiming to replicate efficient, low-resource navigation in robotic systems. These methods emphasize navigation strategies and neural models that enable robust mapping and localization in uncertain environments, often prioritizing energy efficiency and adaptability over exhaustive computation.

Seminal work in this area includes the RatSLAM system, which models the rodent hippocampus to perform topological mapping, using place cells for recognizing locations and grid cells for estimating spatial layout. Place cells fire in response to specific positions, forming a cognitive map that supports loop closure by matching current sensory input to stored experiences, while grid cells provide a periodic representation of space to correct drift over large areas. This bio-mimetic framework has demonstrated persistent navigation in complex, large-scale environments, such as suburban areas, using only visual input from a single camera.

Insect navigation strategies offer additional inspiration, particularly through path integration and odometry derived from optic flow. Ants, for instance, maintain a home vector by integrating self-motion cues, such as stride counts and body orientation, alongside optic flow from environmental texture to estimate distance traveled and heading during excursions. This mechanism allows homing without continuous landmarks, addressing challenges in feature-sparse terrains. Robotic implementations, like the AntBot platform, replicate these processes using minimalist sensors: a low-resolution optic flow detector measures ground motion, while a celestial compass tracks skylight polarization for directional stability. In outdoor trials over distances up to 20 meters, AntBot achieved sub-meter accuracy (mean homing error of 0.67% of trajectory length) in pose estimation, highlighting the efficacy of insect-inspired path integration for lightweight navigation in GPS-denied settings.

Bio-inspired algorithms extend these principles through neuromorphic computing, which emulates spiking neural processing to handle asynchronous events, mimicking retinal cells for efficient sensing. Event-based SLAM leverages dynamic vision sensors that output pixel-level changes only upon brightness shifts, drastically reducing data volume and power draw compared to frame-based cameras. Spiking neural networks, implemented on neuromorphic hardware, enable real-time topological mapping in cluttered spaces by encoding spatial novelty via burst activity, with reported energy savings of up to 90% on edge devices. These systems excel in high-speed or low-light scenarios, where traditional methods falter due to motion blur or high latency.

Hybrid models integrate SLAM with paradigms that mimic animal foraging, using active inference to optimize exploration and map refinement. In this approach, agents treat mapping as a decision-making problem, where policies are learned to minimize expected free energy, balancing uncertainty reduction with goal-directed actions akin to resource-seeking behaviors in mammals. For example, generalized SLAM (G-SLAM) frameworks employ hierarchical inference to fuse sensory data and predict trajectories, enabling adaptive exploration during simulated foraging tasks. Such hybrids have shown improved sample efficiency in dynamic environments, converging on accurate maps with 20-30% fewer iterations than baseline probabilistic methods.

Despite these advances, bio-inspired neural models face scalability limitations relative to traditional methods. Hippocampal-inspired systems like RatSLAM struggle with scaling pose cell representations for expansive or highly detailed maps, leading to increased computational overhead beyond 1 km² areas. Neuromorphic approaches, while power-efficient, often exhibit drift accumulation in long-term operation due to sparse event generation in static scenes, requiring fusion with classical filters for robustness. Overall, these models prioritize biological plausibility and efficiency in constrained settings but lag in precision and real-time performance for large-scale, metric-accurate mapping compared to optimization-based techniques.
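The ant-inspired path integration described above can be sketched in a few lines: self-motion cues are accumulated into a displacement, and the negated displacement is the home vector. This is a toy illustration under idealized, noise-free assumptions, not a model of any specific platform.

```python
import numpy as np

def home_vector(strides):
    """Path integration: accumulate displacement from (step_length, heading) pairs
    and return the vector pointing back to the start."""
    position = np.zeros(2)
    for length, heading in strides:
        position += length * np.array([np.cos(heading), np.sin(heading)])
    return -position   # vector from the current position back to the origin (nest)

# Example: two legs of an outbound excursion (2 m east, then 1 m north).
print(home_vector([(2.0, 0.0), (1.0, np.pi / 2)]))
```

Real insects and robots must additionally cope with noisy stride and heading estimates, which is why celestial-compass corrections and landmark cues are combined with the integrator.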

Specialized SLAM Variants

Visual and Visual-Inertial SLAM

Visual simultaneous localization and mapping (SLAM) leverages camera sensors to estimate the trajectory of a moving agent and construct a map of the environment, primarily through image-based processing. In monocular setups, a single camera provides relative pose estimates but suffers from scale ambiguity, as the absolute distance to features cannot be determined without additional constraints. This ambiguity is resolved in stereo configurations by exploiting the known baseline between two cameras, which provides depth information via triangulation. Alternatively, in monocular systems, scale can be recovered through structure-from-motion techniques that accumulate observations over multiple views or by fusing with inertial measurements from an IMU.

Feature-based visual SLAM methods extract and track discrete keypoints from images, representing the scene as a sparse set of landmarks. A seminal example is ORB-SLAM, which operates in three parallel threads: tracking for pose estimation by matching ORB features between consecutive frames, local mapping to insert keyframes and perform local bundle adjustment for consistency, and loop closing to detect revisits and apply corrections via pose graph techniques. This pipeline enables robust real-time performance on monocular, stereo, and RGB-D cameras, achieving high accuracy in diverse environments. In contrast, direct methods minimize photometric errors directly on pixel intensities, avoiding explicit feature extraction for denser reconstructions. Direct Sparse Odometry (DSO) exemplifies this approach, jointly optimizing camera poses and sparse inverse depth maps by minimizing photometric residuals across a window of keyframes, while adaptively selecting active points to balance computation and accuracy. DSO produces semi-dense maps and demonstrates superior performance in texture-rich scenes compared to feature-based alternatives.

Visual-inertial SLAM integrates IMU data with visual observations to enhance robustness, particularly in low-texture or fast-motion scenarios. Tightly coupled fusion optimizes camera poses, landmark positions, and IMU biases in a single nonlinear least-squares problem, where IMU measurements are preintegrated between keyframes to form efficient relative motion constraints on the manifold of the special Euclidean group SE(3). Preintegration computes position, velocity, and rotation deltas while accounting for bias corrections via first-order approximations, reducing computational overhead in optimization. Systems like VINS-Mono implement this by initializing the IMU-camera extrinsics and scale, then performing marginalization of old keyframes to maintain a fixed-size sliding window for real-time operation.

Recent advances in visual and visual-inertial SLAM incorporate deep learning for semantic understanding, using neural networks to extract robust features and segment dynamic objects. Semantic visual SLAM employs deep features from models like SuperPoint for invariant keypoint detection, improving tracking in challenging conditions, while integrating object-level semantics via segmentation networks to filter out moving elements and enhance loop closure in dynamic scenes.
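A minimal monocular visual-odometry step in the spirit of the feature-based pipelines above can be sketched with OpenCV, assuming matched keypoints between two frames are already available; note that the recovered translation is defined only up to scale, which is exactly the ambiguity that stereo baselines or IMU fusion resolve.

```python
import cv2
import numpy as np

def relative_pose(pts_prev, pts_curr, K):
    """Estimate up-to-scale relative camera motion from matched keypoints.
    pts_prev / pts_curr: Nx2 float arrays of pixel coordinates, K: 3x3 intrinsics."""
    E, inliers = cv2.findEssentialMat(pts_prev, pts_curr, K,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, K, mask=inliers)
    return R, t    # rotation and unit-norm translation direction

# Usage sketch (assumes feature detection and matching were done upstream):
# R, t = relative_pose(matched_prev, matched_curr, K)
```

In a full pipeline this two-view estimate only initializes the tracking thread; keyframe-based bundle adjustment and loop closing then refine the trajectory and map.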

LiDAR and Radar SLAM

LiDAR-based SLAM leverages laser range finders to generate dense point clouds, enabling precise geometric mapping and localization in structured environments such as urban areas and indoor spaces. These systems process sequential scans to estimate poses and build map representations, often outperforming visual methods in low-texture or varying lighting conditions due to direct distance measurements. A core component of LiDAR SLAM is scan registration, where the iterative closest point (ICP) algorithm aligns consecutive point clouds by iteratively minimizing the distance between corresponding points. Introduced as a general method for shape registration, ICP operates in a point-to-point or point-to-plane mode, converging to a local minimum of the mean-square error metric, and has been widely adopted in LiDAR applications for its computational efficiency in estimating rigid transformations. For probabilistic matching, the Normal Distributions Transform (NDT) models point clouds as piecewise Gaussian distributions, dividing space into voxels and estimating parameters like mean and covariance for each, which allows robust alignment even with noise or partial overlaps in 2D and 3D settings. Extended to 3D, NDT facilitates scan-to-map matching by scoring transformations based on the negative log-likelihood, reducing sensitivity to outliers compared to ICP.

In autonomous vehicles, fusion of multi-layer 3D LiDAR with inertial measurement unit (IMU) data enhances robustness, particularly for high-speed motion where LiDAR scans suffer from distortion. Algorithms like LeGO-LOAM employ a two-stage approach: an odometry module segments point clouds into features (edges and planes) while using IMU data to undistort scans, followed by a mapping and optimization module that fuses these with IMU preintegration for pose estimation, achieving real-time performance with reduced drift in variable terrain. This tightly coupled fusion propagates the IMU's high-frequency attitude estimates to correct LiDAR's sparse temporal sampling, improving accuracy in dynamic scenarios like off-road driving.

Radar SLAM utilizes millimeter-wave sensors for all-weather operation, providing sparse detections influenced by beam patterns that determine angular resolution and range ambiguity. These patterns, typically conical with widths of 10-30 degrees, result in clustered point clouds with fewer than 100 points per scan, necessitating specialized handling to mitigate ambiguities from multipath reflections. Doppler-enabled velocity estimation exploits frequency shifts in radar returns to directly measure radial ego-motion, aiding initialization and reducing reliance on geometric features alone. Frameworks like Doppler-SLAM integrate this with inertial data, filtering dynamic clutter via velocity thresholds and aligning scans using intensity or range-bearing models, achieving sub-meter accuracy in adverse visibility.

Loop closure in point cloud-based SLAM detects revisits to correct accumulated drift, often employing global descriptors for efficient retrieval. Scan Context represents scans as 2D histograms of elevation-distance profiles, invariant to rotations and scalable for large vocabularies, enabling rapid matching via coarse-to-fine search with recall rates exceeding 90% in urban datasets. This descriptor captures vertical structures such as building facades, outperforming bag-of-words methods under viewpoint changes and integrating into pose-graph optimization for global consistency.

Post-2023 advancements in LiDAR-vision hybrids fuse dense geometric point clouds with semantic visual cues to mitigate long-term drift, enhancing robustness in feature-poor scenes. Systems like GSFusion employ 3D Gaussian splatting for joint optimization, where LiDAR provides scale-accurate geometry and vision adds loop closure semantics through surfel-based rendering. These approaches leverage LiDAR's precision for initialization while using visual semantics to resolve ambiguities, as demonstrated in urban driving benchmarks.
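The following sketch builds a simplified Scan Context-style descriptor (maximum point height per polar bin) and compares two descriptors with a shift-invariant column-wise distance; the bin counts, range limit, and brute-force shift search are illustrative simplifications of the published method rather than a faithful reimplementation.

```python
import numpy as np

def scan_context(points, num_rings=20, num_sectors=60, max_range=80.0):
    """Simplified Scan Context: a ring x sector grid of maximum heights (points: Nx3)."""
    desc = np.zeros((num_rings, num_sectors))
    ranges = np.hypot(points[:, 0], points[:, 1])
    angles = np.arctan2(points[:, 1], points[:, 0])
    for p, r, a in zip(points, ranges, angles):
        if r >= max_range:
            continue
        ring = int(r / max_range * num_rings)
        sector = int((a + np.pi) / (2 * np.pi) * num_sectors) % num_sectors
        desc[ring, sector] = max(desc[ring, sector], p[2])
    return desc

def context_distance(d1, d2):
    """Column-wise cosine distance, minimized over sector shifts to handle yaw changes."""
    best = np.inf
    for shift in range(d2.shape[1]):
        shifted = np.roll(d2, shift, axis=1)
        num = np.sum(d1 * shifted, axis=0)
        den = np.linalg.norm(d1, axis=0) * np.linalg.norm(shifted, axis=0) + 1e-9
        best = min(best, 1.0 - np.mean(num / den))
    return best
```

Candidate revisits found this way would then be verified geometrically (e.g., by ICP) before adding a loop-closure constraint to the pose graph.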

Acoustic and Audiovisual SLAM

Acoustic SLAM leverages sonar sensors, such as multibeam echosounders and side-scan sonar, to enable simultaneous localization and mapping in underwater environments where optical sensing and GPS are unavailable. These systems measure time-of-flight distances to construct acoustic range profiles or images, facilitating vehicle pose estimation and environmental mapping through scan matching techniques like iterative closest point (ICP) variants. Seminal work by Ribas et al. demonstrated early feasibility of acoustic SLAM in structured underwater settings using forward-looking sonar for feature-based mapping. More recent advancements, such as semi-direct sonar SLAM methods, adapt visual SLAM paradigms to acoustic data by minimizing photometric errors on sonar images, with alignment initialized via feature matching, achieving robust performance in real-time AUV operations.

Audiovisual SLAM integrates acoustic data from microphone arrays with visual inputs from cameras to create enriched maps that include both geometric structures and audio source positions, particularly useful in low-visibility or reverberant scenarios. Microphone arrays localize sound sources via direction-of-arrival (DOA) estimation using methods like relative transfer functions and Gaussian mixture models, which are then fused with camera-derived point clouds to track dynamic elements such as speakers. For instance, vision-audio fusion projects audio DOA estimates onto RGB-D images to detect and exclude moving sound-emitting obstacles, enhancing map consistency in cluttered indoor or underwater settings. This approach supports speaker tracking by associating audio cues with visual detections, enabling persistent 3D audio maps for applications like human-robot interaction.

Key challenges in acoustic and audiovisual SLAM include multipath propagation in acoustics, which causes signal reverberation and false echoes, and the inherently low resolution of sonar compared to high-fidelity visuals, leading to sparse and noisy maps. In underwater contexts, acoustic noise from currents and sound-speed gradients further degrades bearing accuracy, while audiovisual fusion must handle asynchronous data and varying lighting conditions. These issues are mitigated through probabilistic models that account for multipath via direct-path filtering, though they increase the computational demands of real-time processing.

Algorithms for acoustic-visual fusion often employ factor graphs to jointly optimize vehicle trajectories, landmarks, and sensor poses by incorporating factors for acoustic ranges, visual features, and audio DOA measurements. In speaker tracking scenarios, audiovisual systems use particle filters or graph-based optimization to maintain multi-hypothesis tracks, fusing DOA outputs with camera detections for robust localization. For example, pose-graph frameworks integrate acoustic and optical data in a unified optimization, leveraging complementary strengths to reduce drift in AUV navigation. Niche applications include autonomous underwater vehicle (AUV) navigation for bathymetric surveying and harbor inspection, where acoustic SLAM provides drift-corrected trajectories over extended missions. Recent advances in opti-acoustic semantic SLAM, such as a 2024 method for mapping unknown objects in underwater environments, enable object-level mapping without prior labeling, using graph-based scene representations.
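A toy direction-of-arrival estimate of the kind used in the audio pipelines above can be computed from the time difference of arrival across a two-element array; this assumes a far-field source and a single direct path, and the 1500 m/s sound speed is a typical underwater value rather than a universal constant.

```python
import numpy as np

def doa_from_tdoa(delay_s, spacing_m, speed_of_sound=1500.0):
    """Bearing (radians) to a far-field source from the time-difference-of-arrival
    between two microphones or hydrophones separated by spacing_m."""
    sin_theta = np.clip(speed_of_sound * delay_s / spacing_m, -1.0, 1.0)
    return np.arcsin(sin_theta)

# Example: a 0.1 ms arrival difference across a 0.5 m hydrophone baseline (~17.5 degrees).
print(np.degrees(doa_from_tdoa(1e-4, 0.5)))
```

Practical systems use many elements, generalized cross-correlation or relative transfer functions, and reverberation-aware filtering, but the geometric relationship is the same.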

Implementation Frameworks

Filter-Based Methods

Filter-based methods in simultaneous localization and mapping (SLAM) employ probabilistic recursive estimation techniques to maintain an estimate of the robot's pose and the map in real time, leveraging Bayesian filtering to handle uncertainty from noisy measurements and motion models. These approaches process measurements sequentially, updating the posterior over the joint state of pose and map at each timestep, which enables efficient computation suitable for online operation. The core idea is to represent the state estimate via a Gaussian or particle-based approximation, propagating it through prediction and correction steps derived from motion and observation models.

The extended Kalman filter (EKF) SLAM represents a foundational filter-based approach, in which the state vector is augmented to include both the robot's pose \mathbf{x}_r and the map features \mathbf{m}, forming \mathbf{x} = [\mathbf{x}_r^T, \mathbf{m}^T]^T. During the prediction step, the state mean and covariance are propagated using the nonlinear motion model, approximated linearly via the Jacobian \mathbf{F} of the motion function with respect to the pose; this yields the predicted state \hat{\mathbf{x}}_{k|k-1} = f(\hat{\mathbf{x}}_{k-1|k-1}, \mathbf{u}_k) and covariance \mathbf{P}_{k|k-1} = \mathbf{F} \mathbf{P}_{k-1|k-1} \mathbf{F}^T + \mathbf{Q}, where \mathbf{u}_k is the control input and \mathbf{Q} is the process noise covariance. The update step incorporates observations \mathbf{z}_k of landmarks, using the measurement Jacobian \mathbf{H} to compute the innovation covariance \mathbf{S} = \mathbf{H} \mathbf{P}_{k|k-1} \mathbf{H}^T + \mathbf{R}, Kalman gain \mathbf{K} = \mathbf{P}_{k|k-1} \mathbf{H}^T \mathbf{S}^{-1}, and corrected state \hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K} (\mathbf{z}_k - h(\hat{\mathbf{x}}_{k|k-1})), with \mathbf{R} as the measurement noise covariance and h the observation model. This formulation allows incremental map building but grows quadratically in complexity with the number of landmarks due to the size of the covariance matrix.

To address the linearization inaccuracies of the EKF in highly nonlinear settings, the unscented Kalman filter (UKF) SLAM uses sigma-point sampling to propagate the mean and covariance through the true nonlinear functions without explicit Jacobians. In the UKF, a set of deterministically chosen sigma points, typically 2n+1 for an n-dimensional state, are sampled from the current Gaussian approximation, transformed via the motion and observation models, and then used to compute weighted statistics for the predicted and updated distributions; this captures higher-order moments more accurately than Taylor-series linearization. For instance, the sigma points \mathcal{X}_i are generated as \mathcal{X}_0 = \hat{\mathbf{x}} and \mathcal{X}_i = \hat{\mathbf{x}} \pm (\sqrt{(n+\lambda) \mathbf{P}})_i for i=1,\dots,n, with \lambda a scaling parameter, enabling robust handling of non-Gaussian effects in pose and landmark updates. UKF-SLAM maintains computational scaling similar to the EKF but improves consistency in challenging environments such as those with wide-angle sensors.

Particle filter-based methods, exemplified by FastSLAM, extend recursive estimation to non-parametric representations through Rao-Blackwellization, factorizing the posterior p(\mathbf{x}_{1:t}, \mathbf{m} \mid \mathbf{z}_{1:t}, \mathbf{u}_{1:t}) into pose trajectory samples and conditional map estimates. In FastSLAM, a set of M particles represents the pose history \{\mathbf{x}_{1:t}^{(m)}\}_{m=1}^M, each augmented with independent EKFs maintaining the map \mathbf{m}^{(m)} conditioned on \mathbf{x}_{1:t}^{(m)}; motion updates sample new poses using a proposal distribution (often the motion model perturbed by noise), while observations update the individual landmark EKFs and resample particles based on their likelihoods to avoid degeneracy. This scales linearly with map size per particle, making it suitable for large environments, and the particle cloud approximates multimodal posteriors effectively.

Despite their efficiencies, filter-based methods face key limitations: EKF-SLAM suffers from linearization errors that accumulate, leading to inconsistent estimates in which the covariance underestimates the true uncertainty, particularly around loops or with sparse observations. Similarly, particle filters like FastSLAM are prone to particle depletion, where resampling concentrates weight on a few particles, reducing diversity and causing premature convergence to suboptimal modes, especially under significant ambiguity. To mitigate these issues, hybrid variants employ the EKF or UKF for small-scale, real-time operation in local maps, transitioning to batch optimization methods for global consistency as the map grows, for example by extracting submaps and refining them offline. These hybrids balance the sequential speed of filters with the accuracy of global adjustments, improving scalability in practical deployments.
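A compact sketch of the EKF-SLAM correction step described above, for a single range-bearing observation of a known landmark (NumPy assumed); data association is taken as given, and numerical safeguards such as the Joseph-form covariance update used in practice are omitted.

```python
import numpy as np

def ekf_update(mu, P, z, landmark_idx, R):
    """EKF-SLAM correction for one range-bearing observation of landmark `landmark_idx`.
    State mu = [x, y, theta, m1x, m1y, m2x, m2y, ...], covariance P."""
    x, y, theta = mu[:3]
    j = 3 + 2 * landmark_idx
    dx, dy = mu[j] - x, mu[j + 1] - y
    q = dx**2 + dy**2
    z_hat = np.array([np.sqrt(q), np.arctan2(dy, dx) - theta])    # expected measurement
    # Measurement Jacobian H (only the pose block and this landmark's block are nonzero).
    H = np.zeros((2, len(mu)))
    H[:, :3] = np.array([[-dx / np.sqrt(q), -dy / np.sqrt(q), 0.0],
                         [ dy / q,          -dx / q,         -1.0]])
    H[:, j:j + 2] = np.array([[ dx / np.sqrt(q), dy / np.sqrt(q)],
                              [-dy / q,          dx / q]])
    S = H @ P @ H.T + R                            # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
    innov = z - z_hat
    innov[1] = (innov[1] + np.pi) % (2 * np.pi) - np.pi   # wrap bearing residual
    mu_new = mu + K @ innov
    P_new = (np.eye(len(mu)) - K @ H) @ P
    return mu_new, P_new
```

Because the gain couples the pose block with every landmark through P, each update touches the full covariance, which is the source of the quadratic growth noted above.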

Optimization-Based Methods

Optimization-based methods in simultaneous localization and mapping (SLAM) formulate the problem as a nonlinear least-squares optimization over a graph structure, enabling global consistency by minimizing errors across all measurements and constraints. These approaches represent poses and landmarks as nodes in a pose graph, with edges encoding relative constraints derived from odometry, observations, or loop closures, contrasting with filter-based methods that provide local estimates for online operation. The optimization seeks the configuration that best explains the data, typically solved iteratively using sparse techniques to handle large-scale problems efficiently.

GraphSLAM, a foundational formulation, models the problem as a pose graph where nodes correspond to robot poses and edges to constraints, such as relative transformations from sensor measurements. The objective is to minimize the sum of squared residuals weighted by their uncertainties, formulated as: \arg\min_{\mathbf{x}} \sum_i \| \mathbf{e}_i(\mathbf{x}) \|^2_{\Sigma_i}, where \mathbf{x} is the vector of pose variables, \mathbf{e}_i are the error terms for each constraint, and \Sigma_i are the covariance matrices capturing their uncertainties. This least-squares formulation allows batch or incremental solving, improving accuracy in environments with accumulated odometry errors. Early implementations demonstrated reduced drift compared to extended Kalman filters, with pose graph optimization achieving sub-meter accuracy in large indoor datasets.

Bundle adjustment extends this framework specifically for visual SLAM by jointly optimizing camera poses and 3D landmarks to minimize reprojection errors of observed image features. In this process, landmark positions and pose estimates are refined together, incorporating geometric constraints from multiple views to reconstruct sparse maps. This method is central to systems like ORB-SLAM, where it corrects for both pose and structure inconsistencies, yielding precise 3D maps with errors below 1% of the environment scale in benchmark sequences. Unlike pose-graph-only optimization, bundle adjustment explicitly handles landmark covariances, enhancing robustness in feature-rich scenes.

Sparse solvers are essential for scalability in these optimizations, with the Levenberg-Marquardt algorithm providing a robust iterative scheme that blends gradient descent and Gauss-Newton steps to navigate nonlinearities. Libraries like g2o implement this for graph-based problems, supporting incremental solving through sparse Cholesky factorization and variable reordering to exploit graph sparsity, reducing computation from O(n^3) to near-linear in practice for pose graphs with thousands of nodes. g2o has been widely adopted in visual and LiDAR SLAM, enabling real-time optimization on consumer hardware with convergence in under 100 ms per iteration for medium-scale maps.

Optimization-based SLAM distinguishes between full batch optimization, which re-optimizes the entire history for global consistency, and incremental approaches that update only affected variables for online efficiency. The iSAM algorithm exemplifies incremental smoothing by maintaining a square-root information matrix and using Bayes tree factorization to perform targeted relinearization and elimination upon new measurements or loop closures, achieving up to 100-fold speedups over full batch methods while preserving near-optimal estimates. This enables continuous operation in dynamic applications, with demonstrated trajectory errors under 0.5 meters in urban driving scenarios spanning kilometers.

Recent enhancements incorporate semantic constraints into pose graphs to handle dynamic scenes, where object detections provide additional edges excluding moving elements from the optimization, improving robustness in cluttered environments with up to 30% fewer false positives in feature tracking. Post-2023 developments have focused on scalability, such as task-aware dense mapping using complex-step finite differences for differentiable optimization, allowing gradient-based refinement of large maps in real time at 30 Hz on GPUs, and frameworks that ensure consistent reconstructions across wide baselines with reduced drift in handheld systems. These advances support deployment in multi-robot and long-term mapping tasks, scaling to millions of nodes without proportional increases in compute.
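A small pose-graph optimization in the GraphSLAM style can be sketched with SciPy's nonlinear least-squares solver: four SE(2) poses around a square, odometry edges, one loop-closure edge, and a prior anchoring the first pose. Information weighting, robust kernels, and dedicated sparse solvers such as g2o are omitted for brevity, and all values are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def edge_residual(xi, xj, meas):
    """Residual of one relative-pose constraint between poses xi, xj = (x, y, theta)."""
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    c, s = np.cos(xi[2]), np.sin(xi[2])
    pred = np.array([c * dx + s * dy, -s * dx + c * dy, xj[2] - xi[2]])  # xj in xi's frame
    r = pred - meas
    r[2] = (r[2] + np.pi) % (2 * np.pi) - np.pi          # wrap the angular residual
    return r

def residuals(flat, edges, n):
    poses = flat.reshape(n, 3)
    res = [poses[0]]                                      # prior anchoring the first pose at 0
    for i, j, meas in edges:
        res.append(edge_residual(poses[i], poses[j], np.asarray(meas, dtype=float)))
    return np.concatenate(res)

# Odometry edges around a square plus a loop-closure edge from pose 3 back to pose 0.
edges = [(0, 1, (1, 0, np.pi / 2)), (1, 2, (1, 0, np.pi / 2)),
         (2, 3, (1, 0, np.pi / 2)), (3, 0, (1, 0, np.pi / 2))]
np.random.seed(0)
x0 = 0.1 * np.random.randn(4 * 3)                         # noisy initial guess
sol = least_squares(residuals, x0, args=(edges, 4))
print(sol.x.reshape(4, 3).round(2))                       # poses settle onto a consistent square
```

Production systems express the same objective with information-matrix weighting on each edge and exploit the sparsity of the resulting normal equations, which is what libraries like g2o and GTSAM provide.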

Learning-Based Methods

Learning-based methods in simultaneous localization and mapping (SLAM) integrate machine learning and deep neural network techniques to enhance perception, adaptation, and robustness, particularly in complex or data-scarce environments. These approaches leverage neural networks to learn representations directly from raw sensor data, moving beyond traditional handcrafted features and geometric models. By training on large datasets, they enable SLAM systems to handle variations in lighting, occlusions, and scene dynamics more effectively, often achieving superior performance in real-world scenarios where classical methods falter.

A key advancement is deep feature extraction using convolutional neural networks (CNNs), which replace manually designed descriptors with learned ones for improved invariance and matching accuracy in visual SLAM. For instance, SuperPoint employs a self-supervised framework to simultaneously detect interest points and compute dense descriptors from images, outperforming traditional methods like SIFT in repeatability and matching on challenging datasets such as KITTI. This integration allows SLAM pipelines to extract more robust features for pose estimation and loop closure, reducing drift in long-term trajectories. Similarly, end-to-end learning frameworks directly predict camera poses from sequential raw images without intermediate feature-extraction steps. DeepVO, a recurrent convolutional neural network model, estimates monocular visual odometry by processing image stacks through convolutional and LSTM layers, demonstrating lower absolute trajectory errors compared to geometric VO on the KITTI benchmark.

Semantic and hybrid SLAM further incorporates learning for higher-level understanding, such as using deep reinforcement learning (DRL) to optimize exploration paths in unknown environments and unsupervised networks for loop closure detection. DRL agents, trained via trial-and-error interactions, select viewpoints that maximize information gain for mapping, as surveyed in applications where they improve coverage by adapting to environmental uncertainties. Unsupervised deep networks, such as those based on autoencoders, detect loop closures by learning compact image representations for efficient retrieval, enabling correction of accumulated errors in large-scale SLAM without labeled data.

Recent developments from 2023 to 2025 have advanced dense mapping with neural radiance fields (NeRF), which implicitly represent scenes for photorealistic reconstruction and robust tracking in dynamic settings; for example, NeRF-SLAM variants achieve up to 25% lower absolute trajectory error in dynamic sequences by filtering outliers through radiance priors. Diffusion models have also emerged for uncertainty estimation, generating probabilistic pose distributions to quantify mapping uncertainty and enhance robustness in ambiguous scenarios. These innovations collectively boost robustness in dynamic environments by 25-40% in trajectory accuracy metrics on benchmarks like TUM RGB-D.

Despite these gains, learning-based SLAM faces challenges in generalization across unseen environments and high computational overhead. Models trained on specific datasets often underperform under domain shifts in unfamiliar conditions, requiring techniques such as domain adaptation or online adaptation to broaden applicability. Additionally, the inference demands of deep networks, especially for real-time NeRF rendering, can exceed resource limits on embedded hardware, prompting ongoing research into efficient architectures like lightweight CNNs and quantized models.
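As an illustration of the end-to-end regression idea behind DeepVO, the following is a toy recurrent-convolutional pose regressor written with PyTorch. The layer sizes, the 6-channel stacked-frame input, and the 6-DoF output head are assumptions chosen for brevity; the model is untrained and is not the published DeepVO network.

```python
import torch
import torch.nn as nn

class TinyDeepVO(nn.Module):
    """Toy DeepVO-style model: CNN encoder per frame pair, LSTM over time, pose head."""
    def __init__(self, hidden=256):
        super().__init__()
        # The encoder consumes two stacked RGB frames (6 channels), as in DeepVO-style VO.
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.pose_head = nn.Linear(hidden, 6)  # 3 translation + 3 rotation components

    def forward(self, frame_pairs):
        # frame_pairs: (batch, time, 6, H, W) stacked consecutive frames
        b, t = frame_pairs.shape[:2]
        feats = self.encoder(frame_pairs.flatten(0, 1)).view(b, t, 64)
        seq, _ = self.lstm(feats)              # temporal model accumulates motion context
        return self.pose_head(seq)             # (batch, time, 6) relative pose estimates

# Dummy forward pass on random data to show the shapes; this is not a trained model.
model = TinyDeepVO()
poses = model(torch.randn(2, 5, 6, 64, 64))
print(poses.shape)  # torch.Size([2, 5, 6])
```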

Historical Development

Early Foundations

The foundations of simultaneous localization and mapping (SLAM) emerged in the 1980s through pioneering work on probabilistic representations of spatial uncertainty in robotics. Researchers at SRI International, including Randall Smith, Matthew Self, and Peter Cheeseman, introduced the concept of the stochastic map, a framework for estimating uncertain spatial relationships between landmarks using Bayesian inference to propagate uncertainties in feature positions and robot poses. This approach laid the groundwork for handling the dual challenges of localization and mapping by modeling the environment as a network of probabilistic relations rather than deterministic coordinates. Concurrently, Hugh Durrant-Whyte developed consistent estimation techniques for integrating noisy sensor data into spatial models, emphasizing the importance of maintaining correlations in multi-sensor fusion for accurate map building in mobile systems. These efforts, often applied in early cartographic and navigation prototypes, established SLAM as a probabilistic problem solvable through statistical methods. The term "SLAM" itself was formally coined in a 1995 survey paper by Durrant-Whyte and colleagues.

In the 1990s, the integration of extended Kalman filters (EKF) marked a significant advancement, enabling implementations on mobile robots. Larry Matthies at NASA's Jet Propulsion Laboratory advanced stereo vision techniques for autonomous navigation, demonstrating how stereo disparity maps could provide dense environmental models to support localization and obstacle avoidance in unstructured terrain, as seen in planetary rover prototypes. Early EKF applications, such as those by Michel Moutarlier and Raja Chatila in 1989, incorporated evidence-based updates for feature-based mapping, while John Leonard and Hugh Durrant-Whyte's 1991 work on directed sensing formalized EKF-based SLAM for underwater and indoor robot navigation, treating maps as augmented state vectors to recursively estimate poses and landmarks. These developments shifted SLAM from offline estimation to online processing, though they were limited by the computational demands of maintaining full covariance matrices.

A pivotal formalization occurred in 2004 with Thrun and colleagues' introduction of sparse extended information filters (SEIFs) for EKF-style SLAM, which addressed the cost of maintaining inter-landmark correlations in traditional formulations by exploiting the sparse structure of information matrices to achieve near constant-time updates. This line of work showed that SLAM solutions converge to consistent estimates in the limit of infinite data, mitigating error accumulation from overlooked dependencies and enabling larger-scale maps without prohibitive memory use. Initial challenges persisted, particularly computational limits on 1990s hardware, which restricted implementations to sparse feature sets (typically tens of landmarks) and sequential processing to avoid inverting high-dimensional covariances. The seminal textbook Probabilistic Robotics by Thrun, Burgard, and Fox, published in 2005, synthesized these foundations into a comprehensive framework, detailing EKF-based algorithms and their probabilistic underpinnings as essential tools for robotic perception.

Modern Milestones

The 2010s marked a pivotal era for SLAM, with the emergence of real-time, open-source systems that enhanced scalability and accessibility. ORB-SLAM, introduced in 2015, represented a breakthrough in feature-based monocular visual SLAM, enabling robust tracking and mapping in diverse environments through oriented FAST and rotated BRIEF (ORB) features, loop closure, and relocalization, achieving sub-centimeter accuracy in indoor applications. Similarly, Google's Cartographer, released in 2016, advanced LiDAR-based SLAM with a graph-optimized approach incorporating loop closure via branch-and-bound scan matching, facilitating high-precision 2D and 3D mapping for indoor and outdoor environments.

The rise of multi-sensor fusion further propelled SLAM's robustness in the late 2010s. VINS-Mono, developed in 2017, integrated visual and inertial measurements in a tightly coupled optimization framework, delivering drift-free pose estimation and scale recovery suitable for aerial and ground robots, with demonstrated accuracy improvements over visual-only methods in challenging motion scenarios.

Entering the 2020s, deep learning integrations transformed SLAM by improving feature extraction and generalization. DROID-SLAM, proposed in 2021, leveraged recurrent neural networks for dense monocular, stereo, and RGB-D mapping, outperforming classical methods on benchmarks like TUM RGB-D by achieving lower absolute trajectory error through learned flow and depth prediction. Concurrently, frameworks like OpenVSLAM, introduced in 2019 and extended into the 2020s, supported semantic enhancements, incorporating object-level understanding for more interpretable maps in dynamic scenes.

From 2023 to 2025, SLAM evolved toward hybrid paradigms and practical deployment. LiDAR-vision fusion emerged as a key approach for robust perception in varied conditions, with methods combining geometric constraints from point clouds and semantic cues from cameras to achieve centimeter-level localization in adverse weather, as evidenced in comprehensive reviews of over 50 systems. Learning-based hybrids integrated neural radiance fields for dense, implicit scene representations, enabling photorealistic mapping and relocalization with reduced computational overhead compared to traditional grid maps, as surveyed in implicit SLAM advancements. Commercialization accelerated in autonomous vehicles, where SLAM underpins localization stacks in production systems from leading companies. Open-source ecosystems, particularly ROS packages, significantly boosted SLAM adoption by providing modular integrations like cartographer_ros and orb_slam3_ros, enabling rapid prototyping and community-driven improvements that lowered barriers for industrial and academic use, with over 500 million package downloads by 2020.

Applications

In Robotics and Autonomous Systems

Simultaneous localization and mapping (SLAM) plays a pivotal role in enabling robots and autonomous systems to navigate unknown environments by simultaneously estimating their pose and constructing environmental maps in real time. In mobile robotics, SLAM is essential for tasks requiring precise localization and path planning without reliance on external infrastructure like GPS. For instance, autonomous mobile robots (AMRs) in warehouse settings commonly employ 2D LiDAR-based SLAM to generate occupancy grid maps, facilitating efficient navigation around dynamic obstacles such as moving pallets or workers. This approach has been demonstrated to achieve localization errors below 5 cm in industrial environments, supporting commercial logistics automation.

In autonomous vehicles, SLAM extends to three-dimensional mapping using techniques that integrate LiDAR and camera data to create high-definition (HD) maps for urban driving scenarios. These systems process point clouds and visual features to handle challenges like occlusions from traffic and varying lighting, enabling safe trajectory planning and obstacle avoidance. A notable example is the use of LiDAR-camera SLAM in self-driving cars, where fusion algorithms reduce pose estimation drift to under 10 cm over kilometer-scale trajectories, as validated on datasets like KITTI. Such capabilities are critical for level 4 and 5 autonomy, with deployments in production vehicles from several manufacturers.

For unmanned aerial vehicles (UAVs) and drones, visual-inertial SLAM (VI-SLAM) is widely adopted to enable flight in GPS-denied environments, such as indoors or urban canyons, by combining camera imagery with inertial measurement unit (IMU) data for robust state estimation and obstacle avoidance. This method supports real-time operation, allowing drones to maintain trajectories with position accuracies of 1-2% of the flight distance, as shown in experiments with systems such as ORB-SLAM3. Applications include search-and-rescue operations and aerial surveying, where VI-SLAM enables collision-free navigation in cluttered spaces.

Autonomous underwater vehicles (AUVs) leverage acoustic SLAM to map seafloors and submerged structures in environments where optical sensors fail due to low visibility and light attenuation. By using sonar arrays for range and bearing measurements, these systems construct bathymetric maps while localizing the vehicle, achieving mapping resolutions on the order of meters in deep-sea surveys. For example, acoustic SLAM has been applied in AUVs for pipeline inspection, with demonstrated error reductions of up to 30% compared to dead-reckoning alone.

As of 2025, emerging trends in robotics highlight collaborative SLAM, where multiple robots share map data to accelerate large-scale mapping in expansive areas like disaster zones or agricultural fields. This multi-robot approach builds on centralized fusion techniques to distribute computational load and enhance robustness against individual sensor failures, with prototypes showing improved coverage in simulations and field tests.
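The occupancy grid maps mentioned above are typically maintained in log-odds form, with each LiDAR beam lowering the occupancy evidence of the cells it traverses and raising it at the cell where it terminates. The following is a minimal sketch of that per-beam update; the cell size, the log-odds increments, and the toy map dimensions are arbitrary illustrative values rather than parameters of any particular SLAM package.

```python
import numpy as np

def update_grid(log_odds, robot_xy, hit_xy, cell_size=0.05, l_occ=0.85, l_free=-0.4):
    """Update an occupancy grid in log-odds form for a single LiDAR beam.

    Cells crossed by the beam accumulate "free" evidence; the endpoint cell accumulates
    "occupied" evidence, following the standard inverse sensor model idea.
    """
    start = np.asarray(robot_xy) / cell_size
    end = np.asarray(hit_xy) / cell_size
    n = int(np.ceil(np.linalg.norm(end - start))) + 1
    for s in np.linspace(0.0, 1.0, n):
        cx, cy = np.floor(start + s * (end - start)).astype(int)
        log_odds[cx, cy] += l_free           # beam passed through this cell: evidence for free
    ex, ey = np.floor(end).astype(int)
    log_odds[ex, ey] += l_occ - l_free       # endpoint: net evidence for occupied
    return log_odds

grid = np.zeros((200, 200))                  # 10 m x 10 m map at 5 cm resolution
grid = update_grid(grid, robot_xy=(5.0, 5.0), hit_xy=(7.0, 5.0))
prob = 1.0 / (1.0 + np.exp(-grid))           # convert log-odds back to occupancy probability
print(prob[140, 100], prob[120, 100])        # endpoint cell vs. a cell along the beam
```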

In Augmented and Virtual Reality

Simultaneous localization and mapping (SLAM) plays a pivotal role in augmented reality (AR) and virtual reality (VR) by enabling devices to track user movements and map environments in real time, thus overlaying virtual elements onto the physical world with high fidelity. In AR applications, SLAM facilitates stable anchoring of virtual content, while in VR it supports seamless inside-out tracking without external sensors. This enhances immersion by allowing persistent virtual interactions that adapt to the user's surroundings.

In AR tracking, visual SLAM is employed in devices like the Microsoft HoloLens to enable precise anchor placement for holograms. The HoloLens leverages its onboard cameras and internal sensors to perform visual SLAM, creating a spatial mesh that the user's gaze can intersect for anchor creation. These anchors maintain coordinate systems over time, ensuring holograms remain fixed relative to the real world even as the user moves, which is essential for applications like remote collaboration or architectural visualization.

For VR locomotion, inside-out tracking combines SLAM with inertial measurement units (IMUs) and cameras to achieve room-scale mapping without base stations. In systems like the Oculus Quest (now Meta Quest), SLAM processes video feeds from headset-mounted cameras to detect environmental features and estimate six-degree-of-freedom pose, fused with IMU data for low-drift tracking during natural movements. This approach supports expansive play areas, such as in room-scale VR games, by continuously updating a lightweight map of the user's space to prevent collisions and enable intuitive navigation.

Handheld AR on smartphones utilizes visual SLAM for persistent overlays, as seen in Pokémon GO's evolutions and playground features. Niantic's Visual Positioning System (VPS), built on visual SLAM principles, processes camera frames against pre-built maps to anchor Pokémon with centimeter-level accuracy at real-world locations like PokéStops. This allows shared, session-persistent AR experiences where virtual elements remain stable across devices and visits, enhancing social gameplay without relying solely on GPS.

AR SLAM faces significant challenges, including stringent low-latency requirements and the need for lighting invariance to maintain tracking reliability. Low-latency demands arise from IMU limitations and real-time processing needs, where delays can cause jitter or desynchronization in dynamic scenes, as observed in indoor AR experiments spanning over 100 hours. Lighting variations further complicate feature detection, leading to tracking failures in low-light or changing conditions, necessitating robust fusion with complementary sensors for consistency.

Recent advancements from 2024 to 2025 have introduced semantic AR enhanced by deep learning for richer object interactions. Semantic visual SLAM integrates convolutional neural networks and transformers for object detection and segmentation, enabling context-aware mapping where virtual elements interact meaningfully with detected real-world objects, such as placing holograms on specific furniture. These methods, like those using 3D Gaussian splatting and foundation models (e.g., the Segment Anything Model), improve dynamic scene handling and open-vocabulary understanding, broadening the range of supported AR applications.
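The inside-out tracking loop described above combines high-rate inertial propagation with lower-rate visual pose fixes. The sketch below shows that loose-coupling idea in its simplest possible form, dead-reckoning from IMU samples and nudging the estimate toward each visual fix with a fixed gain; real trackers such as Oculus Insight use far more sophisticated, tightly coupled estimators, so every rate, gain, and trajectory value here is an illustrative assumption.

```python
import numpy as np

def imu_propagate(position, velocity, accel_world, dt):
    """Dead-reckon position and velocity from one world-frame acceleration sample."""
    velocity = velocity + accel_world * dt
    position = position + velocity * dt
    return position, velocity

def visual_correct(position, visual_position, gain=0.3):
    """Nudge the drifting inertial estimate toward a lower-rate visual SLAM fix."""
    return position + gain * (visual_position - position)

pos, vel = np.zeros(3), np.zeros(3)
dt_imu = 1.0 / 200.0                           # IMU samples at 200 Hz
for step in range(200):                        # simulate one second of motion
    accel = np.array([0.1, 0.0, 0.0])          # constant forward acceleration (illustrative)
    pos, vel = imu_propagate(pos, vel, accel, dt_imu)
    if step % 20 == 19:                        # a visual SLAM fix arrives at roughly 10 Hz
        t = (step + 1) * dt_imu
        visual_fix = np.array([0.5 * 0.1 * t**2, 0.0, 0.0])   # ground-truth position stands in for the fix
        pos = visual_correct(pos, visual_fix)
print(pos)                                     # stays close to the ~0.05 m true displacement
```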
