
Visual servoing

Visual servoing is a robotic technique that uses visual feedback from cameras to direct and adjust the motion of robots, integrating computer vision for feature extraction with control theory to minimize errors between current and desired visual configurations. This approach enables precise tasks such as positioning, tracking, and manipulation without relying solely on pre-programmed models or external sensors. The concept emerged in the late 1980s, with early foundational work including Weiss et al.'s 1987 demonstration of vision-guided robot control and subsequent feature-based schemes by Feddema and Mitchell in 1989, building toward a unified framework by the mid-1990s. A seminal tutorial by Hutchinson, Hager, and Corke in 1996 formalized visual servoing as the fusion of image processing, computer vision, and control theory to servo robots based on visual features. Over time, it has expanded from static environments to dynamic scenarios, incorporating real-time processing for higher speed and accuracy, and addressing challenges like calibration errors and feature occlusion.

Central to visual servoing are two primary paradigms: image-based visual servoing (IBVS), which directly regulates features in the image plane to avoid explicit pose reconstruction, and position-based visual servoing (PBVS), which estimates the camera's 3D pose relative to targets and controls motion in Cartesian space. Hybrid methods combining these, along with partitioned or switching schemes, further enhance robustness by decoupling translational and rotational motions or fusing visual data with other sensors. Camera configurations vary, including eye-in-hand (mounted on the end-effector) for dexterous manipulation and eye-to-hand (fixed) for broader workspace observation.

Applications span mobile robots for navigation and localization, aerial vehicles for obstacle avoidance, medical systems for minimally invasive procedures, and industrial manipulators for assembly tasks. Recent advances incorporate deep learning for feature detection in unstructured environments and reinforcement learning for optimal trajectories, improving adaptability to uncertainties like lighting variations or occlusions. These developments underscore visual servoing's role in enabling autonomous, vision-driven operation across diverse domains.

Introduction

Definition and principles

Visual servoing is a closed-loop control technique that employs visual feedback from cameras to direct robot motion, allowing the end-effector to attain a desired pose relative to a target object. This approach integrates vision data directly into the servo loop, enabling precise and adaptive control without relying on precomputed trajectories. At its core, visual servoing relies on image processing to extract visual features, such as points or contours, which are compared to desired values to generate corrective commands. These features feed into the control loop to minimize positioning errors, setting it apart from open-loop vision guidance methods that lack ongoing feedback and are prone to inaccuracies from calibration drift or environmental changes. The fundamental system architecture includes a vision sensor, typically a camera mounted on the end-effector (eye-in-hand) or fixed in the workspace, a feature extractor that identifies and tracks relevant image elements, a controller that processes errors to compute velocity commands, and actuators that implement the motions. Visual servoing surpasses traditional sensors, such as tactile or proprioceptive devices, by accommodating unstructured environments through direct use of visual data and by adapting to dynamic scenes via continuous feedback, thus enhancing robustness without needing full environmental models. For instance, a robot manipulator can employ visual servoing to adjust its gripper based on the target's position in the image, ensuring reliable grasping amid minor perturbations.
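The skeleton below illustrates the closed-loop architecture just described (sensor, feature extractor, controller, actuators). It is a minimal sketch with hypothetical stub functions standing in for real camera and robot interfaces, not a specific library's API.

```python
import numpy as np

def grab_image():
    """Stub for image acquisition: returns a placeholder frame."""
    return np.zeros((480, 640), dtype=np.uint8)

def extract_features(image):
    """Stub feature extractor: returns current 2D feature coordinates."""
    return np.array([0.12, -0.03])            # e.g. a tracked point (x, y)

def controller(error, gain=0.5):
    """Proportional corrective command from the visual error."""
    return -gain * error

def send_command(cmd):
    """Stub actuator interface: forwards a velocity command to the robot."""
    print("velocity command:", cmd)

desired = np.array([0.0, 0.0])                # desired feature location s*

for _ in range(3):                            # a few servo-loop iterations
    s = extract_features(grab_image())
    e = s - desired                           # visual error e = s - s*
    send_command(controller(e))               # drive the error toward zero
```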

Historical development

The origins of visual servoing trace back to the integration of computer vision and robotics in the 1970s, with early experiments focusing on visual feedback for robotic manipulation. In 1973, Shirai and Inoue demonstrated one of the first uses of visual feedback to guide a robot in assembly tasks, marking an initial step toward closed-loop vision-based control. By 1979, Hill and Park introduced the term "visual servoing" and developed a system using a mobile camera attached to a manipulator for hand-eye coordination, laying foundational concepts for eye-in-hand configurations. Throughout the 1980s, researchers advanced these ideas through taxonomies and control frameworks; notably, Sanderson and Weiss in 1980 classified visual servo systems into look-and-move and direct servo categories, while Weiss et al. in 1987 explored dynamic sensor-based control with visual feedback, emphasizing the need for robust integration of vision into the control loop.

The 1990s saw a surge in theoretical and practical developments, establishing core paradigms in visual servoing. Espiau, Chaumette, and Rives in 1992 proposed a seminal framework for image-based visual servoing (IBVS), deriving interaction matrices to directly regulate image features. Earlier, Weiss et al. in 1987 had distinguished position-based visual servoing (PBVS), which estimates pose from visual data to guide robotic motion. These contributions were synthesized in the influential 1996 tutorial by Hutchinson, Hager, and Corke, which formalized IBVS and PBVS as the primary control schemes and highlighted their implementation on standard hardware. Key figures like François Chaumette and Seth Hutchinson drove much of this progress, with Chaumette's work on interaction matrices and stability analysis becoming central to the field.

In the 2000s, advancements focused on hybrid methods and real-time capabilities, enabled by improved computational power. Malis, Chaumette, and Boudet in 1999 introduced 2.5D visual servoing, combining image features with partial 3D depth information to mitigate limitations of pure IBVS and PBVS. Researchers like Corke further refined these methods through open-source toolboxes, facilitating widespread adoption in robotic applications. Post-2010 developments integrated machine learning to enhance feature robustness and adaptability, particularly for dynamic environments like unmanned aerial vehicles (UAVs). For instance, Saxena et al. in 2017 proposed end-to-end visual servoing using convolutional neural networks to predict control commands directly from images, improving performance in unstructured settings. By the 2020s, ML-enhanced approaches, such as deep reinforcement learning for visual servoing, have addressed challenges in feature extraction and control robustness, with applications in UAV navigation and manipulation tasks.

Fundamentals

Visual feedback mechanisms

Visual feedback in visual servoing relies on specialized vision sensors to capture environmental data, which is then processed to guide robotic actions. The primary configurations include eye-in-hand systems, where the camera is mounted on the robot's end-effector, providing a dynamic viewpoint that moves with the manipulator for precise local tracking; eye-to-hand setups, featuring a fixed camera external to the robot that observes the workspace globally; and eye-in-body arrangements, typically used on mobile platforms like unmanned aerial vehicles, where the camera is attached to the vehicle's body frame to enable navigation and obstacle avoidance. The data flow begins with image acquisition, where the vision sensor captures sequential frames of the scene at high rates to ensure temporal continuity. Preprocessing follows, involving operations such as noise filtering through Gaussian smoothing or lens undistortion to mitigate distortions from sensor artifacts or environmental lighting. Feature detection then extracts relevant visual cues, such as edges using the Canny algorithm or corners via the Harris detector, which identifies points of high curvature by computing the autocorrelation matrix of image gradients to localize stable keypoints for tracking. In the feedback loop, these processed features continuously update estimates of the robot's pose relative to the target, forming a closed loop where visual errors drive corrective velocities. Systems qualitatively handle challenges like occlusions (where features are temporarily obscured) through predictive tracking or multi-view fusion, and lighting variations via adaptive thresholding or illumination-invariant descriptors, maintaining reliability without interrupting the control loop. Sensor fusion enhances feedback robustness by integrating visual data with complementary sensors, such as inertial measurement units (IMUs), which provide acceleration and angular-rate readings to compensate for visual drift or momentary losses in feature tracking, yielding more accurate pose estimates in dynamic environments. Real-time performance is critical, as latency, from acquisition delays to computation overhead, can destabilize the loop by introducing phase lags that amplify errors in high-speed tasks; mitigation strategies include parallel processing and predictive filtering to ensure loop closure rates exceeding 30 Hz for stable servoing.
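The following sketch illustrates the acquisition, preprocessing, and feature-detection stages of this pipeline with OpenCV. A synthetic frame stands in for a camera image, and the parameter values are illustrative assumptions rather than tuned settings.

```python
import cv2
import numpy as np

# Synthetic frame standing in for a camera image: a bright target rectangle.
frame = np.zeros((240, 320), dtype=np.uint8)
cv2.rectangle(frame, (100, 80), (220, 160), 255, -1)

# Preprocessing: Gaussian smoothing to suppress sensor noise.
smoothed = cv2.GaussianBlur(frame, (5, 5), 1.0)

# Feature detection: Shi-Tomasi corners, based on the same gradient
# autocorrelation (structure) matrix used by the Harris detector.
corners = cv2.goodFeaturesToTrack(smoothed, maxCorners=20,
                                  qualityLevel=0.01, minDistance=10)
corners = corners.reshape(-1, 2) if corners is not None else np.empty((0, 2))

# These pixel coordinates are the measurements fed back to the controller.
print(f"{len(corners)} corners detected:", corners[:4])
```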

Mathematical foundations

Visual servoing relies on well-defined coordinate systems to relate visual observations to robotic motion. The primary frames include the camera frame, attached to the optical center of the imaging sensor; the image plane, where two-dimensional pixel coordinates are measured; and the robot's Cartesian space, encompassing the base frame and end-effector frame. These systems enable the mapping of three-dimensional world points to image features, which is crucial for control.

The projection of three-dimensional points onto the image plane is typically modeled using the pinhole camera equation, which assumes an ideal perspective projection. For a point in homogeneous world coordinates \tilde{\mathbf{X}}_w = [X_w, Y_w, Z_w, 1]^T, the homogeneous image coordinates [u, v, 1]^T are given by s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{K} [\mathbf{R} | \mathbf{t}] \tilde{\mathbf{X}}_w, where s is a scaling factor, \mathbf{K} is the intrinsic camera matrix incorporating the focal lengths and principal point, and [\mathbf{R} | \mathbf{t}] represents the extrinsic parameters defining the rotation \mathbf{R} and translation \mathbf{t} from the world to the camera frame. This model forms the basis for interpreting visual data in visual servoing tasks.

Pose estimation in visual servoing involves determining the relative positions and orientations between the robot's end-effector, the camera, and the target object. This is achieved through homogeneous transformation matrices, which compactly represent rigid-body motions in SE(3). A homogeneous transformation \mathbf{T} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ 0 & 1 \end{bmatrix} describes the pose of one frame relative to another, such as from the robot base to the end-effector or from the camera to the target. Chains of these transformations link the robot's joint space to the visual observations, enabling pose reconstruction from image correspondences or direct measurements.

The interaction matrix, also known as the image Jacobian, bridges the gap between image feature motion and camera motion. For a feature vector s measured in the image, the time derivative \dot{s} relates to the camera velocity v = [v_x, v_y, v_z, \omega_x, \omega_y, \omega_z]^T via \dot{s} = \mathbf{L}_s v, where \mathbf{L}_s is the k \times 6 interaction matrix for a feature vector of dimension k. This matrix depends on the feature type and current image coordinates, allowing control commands expressed in image space to be mapped to three-dimensional camera velocities. Its computation is essential for ensuring the stability and convergence of servoing loops.

Robot kinematics integrate with visual data by combining forward and inverse kinematic models. Forward kinematics map joint velocities \dot{q} to end-effector velocities via the manipulator Jacobian: \dot{x} = \mathbf{J}(q) \dot{q}, where x is the Cartesian pose. In visual servoing, this is extended to include the camera transformation, often yielding a composite Jacobian relating joint velocities to image feature changes: \dot{s} = \mathbf{L}_s \mathbf{V}_c \mathbf{J}(q) \dot{q}, with \mathbf{V}_c transforming end-effector velocities to camera velocities. Inverse kinematics then solve for \dot{q} to achieve the desired visual motion, accommodating constraints like joint limits.

The visual error in servoing is defined as the discrepancy between current and desired feature configurations. In image-based approaches, the error is \mathbf{e} = \mathbf{s} - \mathbf{s}^*, where \mathbf{s} and \mathbf{s}^* are the current and desired image features, respectively. In position-based methods, it is formulated in three-dimensional space from the relative pose between the current pose \mathbf{T} and the desired pose \mathbf{T}^*, expressed via homogeneous transformations. This error drives the control law, with its minimization ensuring task completion.
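The following worked sketch evaluates the pinhole projection and the standard point-feature interaction matrix defined above. The intrinsic parameters, pose, and 3D point are made-up example values chosen only to illustrate the computation.

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],      # fx, 0, cx
              [0.0, 800.0, 240.0],      # 0, fy, cy
              [0.0, 0.0, 1.0]])
R = np.eye(3)                           # world-to-camera rotation
t = np.array([[0.0], [0.0], [0.5]])     # world-to-camera translation (m)

Xw = np.array([[0.1], [0.05], [1.0], [1.0]])   # homogeneous world point

# s [u v 1]^T = K [R | t] Xw  (pinhole projection)
P = K @ np.hstack((R, t))
uvw = P @ Xw
u, v = (uvw[:2] / uvw[2]).ravel()
print("pixel coordinates:", round(u, 1), round(v, 1))

def interaction_matrix_point(x, y, Z):
    """L_s (2x6) for a normalized image point (x, y) at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

# Normalized coordinates and depth of the same point in the camera frame.
Xc = (R @ Xw[:3] + t).ravel()
x, y, Z = Xc[0] / Xc[2], Xc[1] / Xc[2], Xc[2]
L = interaction_matrix_point(x, y, Z)
print("interaction matrix shape:", L.shape)    # (2, 6): sdot = L @ v_camera
```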

Taxonomy and Classification

Control schemes

Visual servoing control schemes are primarily classified based on the reference frame in which the control law operates, with the two foundational approaches being image-based visual servoing (IBVS) and position-based visual servoing (PBVS). These schemes determine how visual features are mapped to robot velocities or positions to achieve task convergence, balancing computational efficiency, robustness to modeling errors, and predictability. IBVS emerged in the late 1980s as a direct method to leverage raw image data, while PBVS relied on the pose estimation techniques available at the time.

In IBVS, control is performed directly in the image plane using pixel coordinates or other image features, without explicit 3D reconstruction of the environment. This decoupling from camera calibration and scene models makes IBVS robust to calibration errors and noise in pose estimates, as it operates solely on observable image measurements. However, it can suffer from nonlinear interactions between image features, leading to potential local minima and curved camera trajectories that may exit the robot's workspace for large initial errors. Early IBVS implementations, such as those using point features, demonstrated feasibility on robotic arms.

PBVS, in contrast, reconstructs the 3D relative pose between the camera and target using visual measurements, then controls the robot in the Cartesian task space to minimize this pose error. This approach allows for straight-line trajectories and global asymptotic stability when accurate 3D models are available, making it suitable for tasks requiring precise positioning. Its drawbacks include high sensitivity to calibration inaccuracies, depth estimation errors, and feature occlusions, which can propagate into unstable control if the pose computation fails. PBVS was among the first visual servoing methods proposed, building on pose estimation from photogrammetry and computer vision.

Hybrid schemes, such as 2.5D visual servoing, partition the control between image space and partial 3D information, often using image coordinates for in-plane motions and depth or pose components for out-of-plane adjustments. This partitioning mitigates the local minima of pure IBVS while reducing the calibration dependence of PBVS, enabling more predictable trajectories without full pose reconstruction. For instance, 2.5D methods employ logarithmic depth features alongside image-point projections to ensure convergence even from distant initial positions. These hybrids evolved in the late 1990s to address limitations of the basic schemes, incorporating techniques like homography estimation for uncalibrated environments.

Additional classifications distinguish schemes by control output and target motion. Velocity-based control, the most common approach, computes joint or end-effector velocities from visual errors, integrating robot kinematics for smooth motion in eye-in-hand configurations. Position-based output, less prevalent, directly optimizes positions, which is advantageous for avoiding velocity saturation but requires more complex stability guarantees. Regarding target motion, traditional schemes assume static objects for stability analysis, whereas extensions for dynamic targets incorporate predictive models or filtering to track moving features, though these are often treated separately from the core classification.

The evolution of these schemes reflects advancements in computing and vision: late-1980s work focused on PBVS for its intuitive 3D control, but in the early 1990s IBVS gained prominence for its calibration insensitivity, leading to hybrids like 2.5D visual servoing that combine strengths for practical applications. Modern variants include switching strategies that alternate between IBVS and PBVS based on error thresholds, enhancing robustness in unstructured environments; a minimal switching sketch follows the comparison table below.
| Scheme | Advantages | Disadvantages |
| --- | --- | --- |
| IBVS | Robust to calibration errors; uses direct image feedback for local stability. | Prone to local minima; nonlinear trajectories for large displacements. |
| PBVS | Global stability; enables Cartesian straight-line paths. | Sensitive to pose estimation and calibration inaccuracies. |
| Hybrid (e.g., 2.5D) | Balances robustness with predictability; avoids full pose reconstruction. | Requires partial depth estimation; increased computational complexity from partitioning. |
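As a companion to the table, the sketch below shows one simple way a switching scheme could select between the two controllers using an image-error threshold. The threshold, gain, and the two controller stubs are illustrative assumptions, not a specific published design.

```python
import numpy as np

SWITCH_THRESHOLD = 20.0   # mean image error, in pixels (assumed value)

def ibvs_velocity(s, s_star, L_pinv, lam=0.5):
    """Image-based law: v = -lambda * L^+ (s - s*)."""
    return -lam * L_pinv @ (s - s_star)

def pbvs_velocity(t_err, theta_u, lam=0.5):
    """Position-based law from translation and axis-angle rotation errors."""
    return -lam * np.hstack((t_err, theta_u))

def select_velocity(s, s_star, L_pinv, t_err, theta_u):
    """Large displacement: PBVS for predictable Cartesian motion.
    Near convergence: IBVS for calibration robustness."""
    pixel_error = np.mean(np.abs(s - s_star))
    if pixel_error > SWITCH_THRESHOLD:
        return pbvs_velocity(t_err, theta_u)
    return ibvs_velocity(s, s_star, L_pinv)
```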

Feature types

In visual servoing, visual features serve as the primitive elements extracted from image data to guide robotic control, categorized based on their geometric or photometric properties. These features can be two-dimensional image projections or reconstructed three-dimensional representations, enabling tasks from simple point tracking to complex pose estimation. The choice of feature type depends on the application's requirements for robustness, computational efficiency, and the degrees of freedom to be controlled.

Point features represent discrete locations in the image plane, typically as 2D coordinates (x, y) or polar forms (\rho, \theta), derived from the perspective projection of 3D points. They are extracted using corner detectors like Harris or Shi-Tomasi, or blob detection algorithms such as the Laplacian of Gaussian for centroids of uniform regions. For enhanced robustness to illumination and viewpoint changes, descriptors like SIFT (Scale-Invariant Feature Transform) or SURF (Speeded-Up Robust Features) are often matched across frames to track points reliably in dynamic environments.

Line features capture linear structures, parameterized by normal distance \rho and angle \theta in the image, often from edges of objects or environmental contours. Extraction commonly employs the Hough transform for detecting infinite lines or the Line Segment Detector (LSD) for finite segments, providing invariance to partial occlusions and suitability for controlling orientation in alignment or navigation tasks. These features are particularly effective in structured scenes, such as aligning a mobile robot with hallway lines.

Region features describe extended areas rather than isolated points, using image moments (e.g., zeroth-order for area, first-order for the centroid) or texture descriptors for textured surfaces. Computed via spatial integrals over pixel intensities, they offer scale and rotation invariance through normalized central moments, as in Hu's invariant moments, making them ideal for servoing on deformable or non-rigid objects like grasping tools.

3D features involve reconstructed spatial information, such as object poses or point coordinates (X, Y, Z), estimated from multiple views via structure-from-motion techniques or directly from depth sensors like RGB-D cameras. These enable full 6-DOF control but require accurate calibration and handling of estimation uncertainties, and they are often combined with 2D features in hybrid approaches.

Advanced features include optical flow fields, which quantify pixel velocities (\dot{x}, \dot{y}) from brightness constancy assumptions, useful for motion-based servoing in cluttered scenes, and vanishing points, derived from the intersections of projected parallel lines to infer scene structure and camera orientation. Optical flow supports dense tracking but is sensitive to lighting variations, while vanishing points provide geometric constraints for navigation in man-made environments.

Selection criteria for features emphasize properties like repeatability and invariance to ensure stable control under varying viewpoints, alongside low computational cost for real-time implementation; point features are typically fastest, while 3D reconstructions demand more processing. Robustness to noise and occlusions further guides choices, with combinations often optimizing performance across tasks.
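A short OpenCV sketch of region features via image moments follows, computing the area, centroid, and Hu invariants of a synthetic blob. The blob geometry is an arbitrary example; real systems would segment the target region from camera images first.

```python
import cv2
import numpy as np

# Synthetic binary region standing in for a segmented target blob.
mask = np.zeros((240, 320), dtype=np.uint8)
cv2.ellipse(mask, (160, 120), (60, 30), 25, 0, 360, 255, -1)

m = cv2.moments(mask, binaryImage=True)
area = m["m00"]                                     # zeroth-order: area
cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]   # first-order: centroid

hu = cv2.HuMoments(m).ravel()                       # Hu's invariant moments
print(f"area={area:.0f}, centroid=({cx:.1f}, {cy:.1f})")
print("first two Hu invariants:", hu[:2])
```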

Visual Servoing Methods

Image-based visual servoing

Image-based visual servoing (IBVS) operates directly in the image plane by using two-dimensional visual features extracted from camera images to compute control commands for the robot or camera system. Unlike methods that reconstruct three-dimensional pose, IBVS minimizes the error between the current positions of selected image features and their desired positions without requiring an explicit model of the environment. This approach leverages the interaction matrix of the camera to relate image feature velocities to the camera's relative motion, enabling real-time feedback control.

The feature vector in IBVS typically consists of 2D coordinates of geometric primitives such as points, lines, or moments of image regions, which are robustly tracked using image processing techniques. For instance, the coordinates of corner points on a target object serve as features to guide motion. The core algorithm computes the camera velocity \mathbf{v} to drive the feature error \mathbf{e} = \mathbf{s} - \mathbf{s}^* (where \mathbf{s} are current features and \mathbf{s}^* are desired features) to zero. This is achieved via the control law \mathbf{v} = -\lambda \mathbf{L}^+ \mathbf{e}, where \lambda > 0 is a positive gain, \mathbf{L} is the interaction matrix (also known as the image Jacobian) that maps camera velocities to feature velocities, and \mathbf{L}^+ is its pseudoinverse. The interaction matrix for a point feature (x, y) in the image is given by \mathbf{L} = \begin{bmatrix} -\frac{1}{Z} & 0 & \frac{x}{Z} & xy & -(1 + x^2) & y \\ 0 & -\frac{1}{Z} & \frac{y}{Z} & 1 + y^2 & -xy & -x \end{bmatrix}, where Z is the depth of the point relative to the camera; approximations or online estimates are often used for Z in practice.

A key advantage of IBVS is its robustness to camera calibration errors and the absence of need for a precise 3D model, as control remains effective even with coarsely calibrated cameras by relying solely on image measurements. This makes it suitable for unstructured environments where modeling the target is challenging. However, limitations include the potential for singularities in the interaction matrix when features align in certain configurations, leading to loss of rank and controllability, and the risk of local minima along nonlinear trajectories in the image space, which can cause convergence failures if the initial error is large.

In practical implementations, the task function approach addresses issues like coupling between translational and rotational motions by defining a task function \boldsymbol{\alpha}(\mathbf{s}) whose error is regulated as \dot{\boldsymbol{\alpha}} = -\lambda (\boldsymbol{\alpha} - \boldsymbol{\alpha}^*), allowing prioritization of primary tasks (e.g., alignment) while satisfying secondary constraints (e.g., joint limits). This helps mitigate unwanted rotations during pure translations. For example, in servoing a camera to center a planar target, four point features at its corners are selected; the control law adjusts the camera to move these points toward their centered desired positions, ensuring smooth convergence while keeping the target in the field of view.
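The sketch below evaluates the IBVS control law \mathbf{v} = -\lambda \mathbf{L}^+ \mathbf{e} for four point features using the point interaction matrix given above. The feature values, depths, and gain are illustrative assumptions; in practice the depths Z would be approximated or estimated online.

```python
import numpy as np

def L_point(x, y, Z):
    """Point-feature interaction matrix (2x6) at normalized (x, y), depth Z."""
    return np.array([
        [-1 / Z, 0, x / Z, x * y, -(1 + x ** 2), y],
        [0, -1 / Z, y / Z, 1 + y ** 2, -x * y, -x],
    ])

# Current and desired normalized image coordinates of four corner points,
# stored as (x1, y1, x2, y2, ...).
s = np.array([0.12, 0.10, -0.08, 0.11, -0.09, -0.07, 0.11, -0.06])
s_star = np.array([0.1, 0.1, -0.1, 0.1, -0.1, -0.1, 0.1, -0.1])
Z = [0.6, 0.6, 0.6, 0.6]            # assumed/estimated depths (m)

# Stack the 2x6 blocks into an 8x6 interaction matrix.
L = np.vstack([L_point(s[2 * i], s[2 * i + 1], Z[i]) for i in range(4)])

lam = 0.5
e = s - s_star
v_cam = -lam * np.linalg.pinv(L) @ e   # 6-vector: (vx, vy, vz, wx, wy, wz)
print("camera velocity command:", np.round(v_cam, 4))
```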

Position-based visual servoing

Position-based visual servoing (PBVS) is a control strategy that reconstructs the three-dimensional pose of a target object relative to a camera and regulates the camera's motion in Cartesian space to achieve a desired pose. In this approach, visual features extracted from the image, such as point correspondences between known model points and their projections, are used to estimate the current pose \mathbf{T} of the target. The control objective is then formulated in the task space, decoupling translational and rotational motions for intuitive Cartesian behavior.

The core algorithm of PBVS involves two main steps: pose estimation and velocity command generation. First, the 3D pose \mathbf{T} (comprising rotation and translation) is computed from the visual features using Perspective-n-Point (PnP) algorithms, which solve for the camera pose given at least four 3D-2D correspondences and known camera intrinsics. Efficient implementations like the EPnP algorithm achieve this in linear time O(n) for n points by expressing world points as a weighted combination of four virtual control points and solving a linear system followed by eigenvalue decomposition for pose recovery. Once the current pose \mathbf{T} and desired pose \mathbf{T}^* are obtained, the pose error is computed, and the camera linear velocity \mathbf{v}_c = -\lambda (\mathbf{t} - \mathbf{t}^*) and angular velocity \boldsymbol{\omega}_c = -\lambda \boldsymbol{\theta} \mathbf{u} are commanded, where \mathbf{t} and \mathbf{t}^* are the translational components of \mathbf{T} and \mathbf{T}^*, and \boldsymbol{\theta} \mathbf{u} is the axis-angle representation of the rotational error, with \lambda > 0 a gain ensuring exponential convergence to the target pose in Cartesian space.

A key advantage of PBVS is its ability to provide global asymptotic stability and decoupled control of the six degrees of freedom (6DOF) when pose estimates are accurate, allowing for straightforward integration with task-specific constraints like obstacle avoidance. This makes it particularly suitable for applications requiring precise 3D positioning without singularities in the control law. However, PBVS is highly sensitive to errors in pose estimation, which can arise from noisy feature detection, inaccurate camera calibration, or modeling mismatches, potentially leading to instability or large control errors. Accurate 3D models of the target and reliable calibration are thus essential prerequisites.

To mitigate some limitations of full pose estimation, hybrid variants such as 2.5D visual servoing incorporate partial depth information by using a combination of image coordinates and estimated depths (e.g., logarithmic depth ratios) as features, reducing sensitivity to full pose inaccuracies while retaining some Cartesian benefits. In this approach, the interaction matrix is adapted to handle the mixed feature set, enabling more robust behavior for planar or near-planar targets.

An illustrative example of PBVS is the positioning of a robotic arm to grasp a target object, where corner points of the object are detected in the camera image and PnP is applied to reconstruct the target's 3D pose relative to the arm's end-effector. The arm's joint velocities are then derived via inverse kinematics from the commanded Cartesian velocity, guiding the end-effector to align with the reconstructed target pose.
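The following sketch illustrates one PBVS step with OpenCV: estimating the target pose from four 3D-2D correspondences via solvePnP, then commanding camera velocities from the translation and axis-angle rotation errors. The object geometry, image points, intrinsics, and desired pose are example values, and the sign/frame conventions of the rotational error vary between formulations.

```python
import cv2
import numpy as np

# Known planar target model (10 cm square) and its detected image corners.
obj_pts = np.array([[-0.05, -0.05, 0], [0.05, -0.05, 0],
                    [0.05, 0.05, 0], [-0.05, 0.05, 0]], dtype=np.float64)
img_pts = np.array([[300, 220], [340, 222], [338, 262], [298, 260]],
                   dtype=np.float64)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])

# Pose estimation (default iterative PnP; EPnP is available for larger sets).
ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, None)

# Desired pose of the target in the camera frame: centered, 0.3 m ahead,
# no relative rotation (illustrative goal).
t_star = np.array([0.0, 0.0, 0.3])
R_star = np.eye(3)

R, _ = cv2.Rodrigues(rvec)                 # rotation matrix from rvec
t = tvec.ravel()

# Rotation error as axis-angle (theta * u) of the relative rotation.
r_err, _ = cv2.Rodrigues(R @ R_star.T)

lam = 0.5
v_c = -lam * (t - t_star)                  # linear velocity command
w_c = -lam * r_err.ravel()                 # angular velocity command
print("v =", np.round(v_c, 3), " w =", np.round(w_c, 3))
```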

Hybrid and advanced methods

Hybrid visual servoing methods integrate elements of image-based visual servoing (IBVS) and position-based visual servoing (PBVS) to leverage the strengths of both approaches, such as IBVS's robustness to calibration errors and PBVS's decoupling of translational and rotational motions. These hybrids often employ partitioning strategies, where image features handle positioning tasks while pose estimates manage orientation, reducing sensitivity to camera modeling errors. For instance, optimized hybrid decoupled schemes refine the feature-interaction design to achieve improved convergence in cluttered environments compared to pure IBVS or PBVS. Switching schemes within hybrid frameworks dynamically transition between IBVS and PBVS based on predefined error thresholds or visibility constraints, ensuring stability during feature occlusions or large initial displacements. This approach treats the controllers as complementary subsystems in a switched system, with transitions triggered by metrics like image Jacobian conditioning to avoid local minima. Model predictive control (MPC) further enhances hybrids by optimizing future visual feature trajectories under constraints, incorporating visual feedback to predict and correct deviations in real time. In DeepMPCVS, a deep network forecasts optical flow for MPC planning, enabling precise alignment in novel scenes with faster convergence than traditional methods.

Learning-based methods advance hybrid servoing through reinforcement learning (RL) for adaptive gain tuning and neural networks for end-to-end control, bypassing explicit interaction matrices. RL-based uncalibrated IBVS, for example, trains policies to estimate relative camera motion from image features, enhancing robustness to dynamic disturbances without calibration. Convolutional neural networks (CNNs) provide feature robustness in post-2015 advances, such as regressing 6-DOF poses from perturbed images to handle occlusions and lighting variations with sub-millimeter accuracy. CNN-based pose estimation further supports hybrid schemes by enabling predictive control in unstructured settings. As of 2025, further advances incorporate modern deep architectures to enhance PBVS performance and adaptive uncalibrated schemes with reinforcement learning for greater robustness.

A representative application is UAV landing using hybrid visual-inertial servoing, where visual tracking of infrared targets is fused with IMU data via an extended Kalman filter to estimate relative pose and predict ship motion for precise touchdown in GPS-denied environments. This integration reduces estimation errors and improves stability under dynamic conditions, as demonstrated in simulations achieving accurate state prediction.

Feature Selection and Interactions

Common visual features

Point features, such as corners or interest points, are among the most commonly employed primitives in visual servoing due to their distinctiveness and ease of tracking across image sequences. These features are typically extracted using corner detection algorithms like the Shi-Tomasi method, which identifies high-quality points by evaluating the eigenvalues of the local gradient autocorrelation matrix to select locations with sufficient texture and stability for reliable tracking. This approach achieves sub-pixel accuracy through techniques such as quadratic interpolation or least-squares fitting on the surrounding intensity gradients, enabling precise localization even under moderate motion or noise. In visual servoing applications, point features facilitate tasks like pose estimation and trajectory following by providing sparse yet robust 2D coordinates that can be directly integrated into control loops.

Line segments serve as effective visual primitives for servoing tasks involving structured environments, particularly planar targets where edges define object boundaries. Detection begins with edge extraction using the Canny algorithm, which applies Gaussian smoothing to reduce noise, computes intensity gradients, performs non-maximum suppression to thin edges, and uses hysteresis thresholding to connect weak edges to strong ones, yielding a clean edge map. These edges are then linked into line segments via methods like connectivity analysis or Hough-transform variants, ensuring continuity and accurate endpoint localization. In visual servoing, line segments are valuable for representing geometric constraints, such as aligning a robot end-effector with straight contours on manufactured parts, offering advantages under partial occlusion compared to point-based methods.

Image moments provide a versatile set of region-based features for visual servoing, capturing global properties of segmented image regions without relying on explicit contours. The zeroth-order moment corresponds to the area of the region, serving as a scale indicator insensitive to translation, while first-order moments yield the centroid coordinates, enabling position control decoupled from orientation. Higher-order moments, such as second-order moments for orientation and elongation, extend this to shape description, computed via integrals over pixel intensities weighted by powers of coordinates. These features are particularly suited for blob-like or symmetric targets in servoing, as they promote stability by avoiding singularities in the interaction matrix and handling scale effects through normalization.

Template patches are utilized in visual servoing for tracking textured or patterned objects where local image regions serve as features. Extraction involves selecting a template from an initial frame and matching it to subsequent frames using correlation-based metrics, such as the sum of squared differences or zero-mean normalized cross-correlation, to compute displacement vectors with sub-pixel precision via interpolation or optimization. This method excels in maintaining consistency for non-rigid or deformable targets, as the entire patch encodes richer contextual information than individual points, facilitating robust servoing in dynamic scenes.

Depth-enhanced features incorporate information from RGB-D sensors, such as the Microsoft Kinect, to augment image data with per-pixel depth maps for hybrid servoing. These sensors project structured light or use time-of-flight measurements to estimate depth, enabling features like 3D points or planes by back-projecting image detections using the camera intrinsic model and depth values. In visual servoing, this fusion supports tasks requiring accurate distance estimation, such as collision avoidance or precise grasping, by providing direct metric information that mitigates scale ambiguities in monocular setups.

Robustness of visual features to environmental variations, particularly illumination changes, is often achieved through techniques like zero-mean normalized cross-correlation, which normalizes template and image patches by their mean and variance to yield invariance to linear intensity shifts and gains. This metric computes similarity as a normalized correlation over overlapping regions, ensuring stable tracking in the varying lighting conditions common to real-world servoing deployments, such as indoor-outdoor transitions or shadowed workspaces.
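The sketch below demonstrates template tracking with a zero-mean normalized correlation score (OpenCV's TM_CCOEFF_NORMED subtracts the patch means, giving invariance to uniform brightness offsets and linear gains). The synthetic scene, template location, and lighting change are illustrative stand-ins for real camera frames.

```python
import cv2
import numpy as np

# Synthetic scene with a bright target patch, plus a reference template.
scene = np.random.randint(0, 50, (240, 320), dtype=np.uint8)
cv2.rectangle(scene, (140, 100), (180, 140), 200, -1)
template = scene[95:145, 135:185].copy()

# Simulated lighting change: linear gain and offset.
brighter = cv2.convertScaleAbs(scene, alpha=1.1, beta=20)

# Zero-mean normalized correlation still peaks at the target location.
score = cv2.matchTemplate(brighter, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(score)
print("match score:", round(max_val, 3), "at", max_loc)
```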

Performance impacts

The choice of visual features significantly affects the accuracy, speed, and reliability of visual servoing systems, with tradeoffs arising from feature richness and computational demands. Employing a larger number of features, such as multiple image points or regions, enhances accuracy and the conditioning of the control over all axes by providing redundant information that mitigates uncertainties in camera pose or target motion. However, this richness increases processing time, as feature extraction, tracking, and interaction matrix computation scale with the number of elements, potentially limiting performance in resource-constrained setups. For instance, in scenarios requiring six degrees-of-freedom control, using more points than the minimal configuration can improve precision but increases the computational load.

Point features exhibit high sensitivity to noise, particularly in low-texture scenes where distinctive corners or blobs are scarce, leading to tracking failures or large localization errors. In such environments, point-based extraction methods like sum-of-squared-differences correlation yield mean squared errors up to 25 pixels and success rates as low as 40%, exacerbated by background clutter or illumination changes that obscure feature discriminability. Line features, by contrast, offer greater reliability for scenes with prominent straight edges, such as industrial parts or structural elements, as they aggregate edge information over segments, reducing noise impact and improving robustness in textured but non-point-rich areas. This makes lines preferable for tasks like alignment in manufacturing, where point detection might falter.

Degeneracy issues further compromise performance when features like points are collinear, causing the interaction matrix in image-based visual servoing to lose rank and resulting in ill-conditioned control that hinders accurate depth estimation. In configurations with five or more points, collinearity in subsets creates singularities where camera motions produce stationary image projections, leading to ambiguous pose estimates and potential system instability or divergence. These degeneracies are particularly problematic in planar or aligned structures, where they can amplify estimation errors in depth, necessitating careful feature placement to avoid the singular cylinders defined by the points' geometry.

Experimental evaluations highlight these impacts, with image-based visual servoing using moments as features showing improved performance under dynamic lighting variations compared to traditional point-based approaches, owing to moments' invariance properties that preserve the feature descriptors amid photometric changes. Moments integrate image information over regions, yielding exponential error decay and lower variance in feature trajectories during illumination shifts. Adaptation strategies, such as dynamic feature switching, mitigate these issues by changing features according to scene conditions, for example transitioning from point tracking to optical flow in high-motion scenarios, to maintain reliability without excessive computation. This approach ensures sustained performance in varying environments, like aerial navigation with occlusions.

Key performance metrics underscore these tradeoffs: convergence time measures how quickly features align to desired positions, often extended with richer but noisier feature sets; trajectory smoothness quantifies path regularity via metrics like jerk or curvature variance, improved by robust features like lines to avoid oscillations; and failure rates under perturbations reflect overall reliability, generally higher for points in noisy conditions compared to moments or adapted flows. These indicators guide feature optimization, prioritizing smoothness in precision tasks like grasping while tolerating longer convergence in exploratory applications. Recent advances as of 2025 incorporate learning for feature selection and extraction, enabling neural networks to detect keypoints in unstructured environments and reinforcement learning to adaptively choose features for robustness against occlusions or varying conditions. These methods enhance performance in dynamic scenarios, such as autonomous navigation, by learning optimal feature interactions without manual tuning.
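The degeneracy issue above can be checked numerically before servoing by inspecting the rank and condition number of the stacked interaction matrix. The sketch below compares a well-spread and a collinear point configuration; the coordinates and the common depth are illustrative values.

```python
import numpy as np

def L_point(x, y, Z):
    """Standard point-feature interaction matrix (2x6)."""
    return np.array([[-1 / Z, 0, x / Z, x * y, -(1 + x ** 2), y],
                     [0, -1 / Z, y / Z, 1 + y ** 2, -x * y, -x]])

def stack_L(points, Z=1.0):
    return np.vstack([L_point(x, y, Z) for x, y in points])

well_spread = [(-0.1, -0.1), (0.1, -0.1), (0.1, 0.1), (-0.1, 0.1)]
collinear = [(-0.15, 0.0), (-0.05, 0.0), (0.05, 0.0), (0.15, 0.0)]

for name, pts in [("well spread", well_spread), ("collinear", collinear)]:
    L = stack_L(pts)
    # Collinear 3D points leave a camera twist unobserved, so L loses rank.
    print(f"{name}: rank={np.linalg.matrix_rank(L)}, "
          f"condition number={np.linalg.cond(L):.1e}")
```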

Analysis and Design

Error propagation and stability

In visual servoing systems, errors can arise from various sources, including inaccuracies in camera intrinsics and extrinsics, noise in extracted image features, and time delays in the control loop. Calibration errors propagate through the control law, leading to deviations in the computed camera velocity and potential steady-state offsets in the feature error. Feature noise, often modeled as an additive zero-mean Gaussian disturbance with a standard deviation on the order of a few pixels, affects the accuracy of feature point tracking and can amplify discrepancies in the interaction matrix \mathbf{L}. Time delays, such as those exceeding one sample period in low-frame-rate systems (e.g., 30 Hz), can destabilize the closed-loop dynamics by introducing phase lags that erode stability margins. These errors propagate via the image Jacobian, or interaction matrix \mathbf{L}, which relates the feature error derivative \dot{\mathbf{e}} to the camera velocity \mathbf{v}_c through \dot{\mathbf{e}} = \mathbf{L} \mathbf{v}_c, with approximations in the estimated \hat{\mathbf{L}} exacerbating the issue in the control law \mathbf{v}_c = -\lambda \hat{\mathbf{L}}^+ \mathbf{e}.

Stability in visual servoing is typically analyzed using Lyapunov methods to ensure exponential convergence of the feature error \mathbf{e} to zero. A common Lyapunov function candidate is the quadratic form V = \frac{1}{2} \mathbf{e}^T \mathbf{e}, which is positive definite for \mathbf{e} \neq 0. Its time derivative is \dot{V} = \mathbf{e}^T \dot{\mathbf{e}} = -\lambda \mathbf{e}^T \mathbf{L} \hat{\mathbf{L}}^+ \mathbf{e}, and for asymptotic stability this must satisfy \dot{V} < 0, which holds when \mathbf{L} \hat{\mathbf{L}}^+ > 0, guaranteeing convergence under bounded uncertainties. This criterion holds locally around the desired pose but requires full-rank conditions on \mathbf{L} to avoid singularities.

Image-based visual servoing (IBVS) exhibits local asymptotic stability but is prone to local minima, where the error decreases initially yet converges to suboptimal points due to nonlinear image projections, limiting its suitability for large initial displacements. In contrast, position-based visual servoing (PBVS) offers global asymptotic stability for significant pose errors when pose estimates are accurate, as the 3D error components align directly with the task space. However, PBVS remains sensitive to calibration errors that corrupt the pose estimate. To mitigate singularities from an ill-conditioned \mathbf{L}, partitioned control strategies split the interaction matrix into decoupled components, for example using cylindrical image coordinates or image moments to separate rotation about and translation along the optical axis from the remaining degrees of freedom, thereby maintaining invertibility and preventing control failures. Experimental validations through simulations demonstrate bounded errors under noise; for instance, in IBVS setups with a robotic arm such as the Viper, adding zero-mean Gaussian noise to the feature points results in bounded tracking errors, confirming the Lyapunov-predicted behavior under moderate noise levels. Robustness to uncertainties, such as varying depth or unmodeled dynamics, is enhanced by adaptive gains that adjust the control parameters online; for example, Lyapunov-stable adaptive laws update depth estimates to bound errors under persistent disturbances, achieving convergence despite calibration errors.
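A small simulation illustrates the behavior described above: zero-mean Gaussian noise is added to the measured features while the Lyapunov function V = \frac{1}{2}\mathbf{e}^T\mathbf{e} is monitored. The depths, gain, noise level, and time step are illustrative assumptions, and the depth is held fixed for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def L_point(x, y, Z=1.0):
    return np.array([[-1 / Z, 0, x / Z, x * y, -(1 + x ** 2), y],
                     [0, -1 / Z, y / Z, 1 + y ** 2, -x * y, -x]])

def stack_L(s, Z=1.0):
    return np.vstack([L_point(s[i], s[i + 1], Z) for i in range(0, len(s), 2)])

s = np.array([0.15, 0.12, -0.11, 0.13, -0.12, -0.10, 0.14, -0.09])
s_star = np.array([0.1, 0.1, -0.1, 0.1, -0.1, -0.1, 0.1, -0.1])
lam, dt, sigma = 0.8, 0.05, 0.002        # gain, time step, feature noise

for k in range(100):
    s_meas = s + rng.normal(0.0, sigma, s.shape)       # noisy measurement
    e = s_meas - s_star
    v = -lam * np.linalg.pinv(stack_L(s_meas)) @ e     # control from noisy data
    s = s + dt * (stack_L(s) @ v)                      # true feature evolution
    if k % 25 == 0:
        V = 0.5 * float((s - s_star) @ (s - s_star))   # Lyapunov function
        print(f"step {k:3d}: V = {V:.6f}")
```

With moderate noise the printed values of V decay toward a small bounded residual rather than exactly zero, matching the qualitative prediction of the Lyapunov analysis.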

Control laws and optimization

The design of control laws in visual servoing aims to regulate the motion of a robot or camera based on visual feedback, typically by minimizing the error between current and desired visual features. The most fundamental approach employs a proportional control law, where the velocity command \mathbf{u} is given by \mathbf{u} = -\lambda \mathbf{L}^+ (\mathbf{s} - \mathbf{s}^*), with \lambda > 0 as the proportional gain, \mathbf{L} the interaction matrix relating feature velocity to camera velocity, and \mathbf{s} - \mathbf{s}^* the feature error. This law drives the system toward the desired configuration but can exhibit steady-state errors due to modeling inaccuracies or disturbances. To address this, proportional-integral (PI) controllers extend the basic form by incorporating an integral term, yielding \mathbf{u} = -\lambda \mathbf{L}^+ [(\mathbf{s} - \mathbf{s}^*) + k_i \int (\mathbf{s} - \mathbf{s}^*) \, dt], where k_i > 0 accumulates past errors to eliminate offsets and improve robustness.

Advanced control laws integrate additional mechanisms to handle robotic constraints and nonlinear dynamics. Inverse kinematics is often incorporated to map visual velocity commands to joint velocities, ensuring feasible motions within the robot's workspace, as in \dot{\mathbf{q}} = \mathbf{J}^\# \mathbf{u}, where \mathbf{J} is the robot Jacobian and \# denotes the pseudoinverse. Gain scheduling adapts the proportional gain \lambda dynamically based on operating conditions, such as feature depth or error magnitude, to mitigate nonlinearities like those arising from perspective projection; for instance, \lambda may decrease as features approach singular configurations to prevent oscillations.

Optimization techniques enhance control laws by incorporating constraints and multi-objective criteria. Quadratic programming (QP) is commonly used to satisfy joint limits, velocity bounds, or singularity avoidance while minimizing visual error, formulated as \min_{\mathbf{u}} \frac{1}{2} \mathbf{u}^T \mathbf{H} \mathbf{u} + \mathbf{f}^T \mathbf{u} subject to \mathbf{A} \mathbf{u} \leq \mathbf{b}, where the cost prioritizes feature convergence. Cost functions may also weight visual errors against secondary tasks, such as obstacle avoidance, to balance performance. Predictive control employs models to forecast feature trajectories over a horizon, optimizing future commands via \min \sum_{k=1}^N \| \mathbf{s}_{t+k} - \mathbf{s}^* \|^2 + \| \mathbf{u}_{t+k} \|^2 under constraints, enabling anticipation of occlusions or rapid motions.

Tuning methods ensure desirable closed-loop behavior. Pole placement designs gains to assign specific eigenvalues for desired response speeds and damping, applied to linearized visual servoing models. The linear quadratic regulator (LQR) optimizes gains by minimizing a quadratic cost \int ( \mathbf{x}^T \mathbf{Q} \mathbf{x} + \mathbf{u}^T \mathbf{R} \mathbf{u} ) dt, providing optimal feedback for systems like mobile robots under visual guidance. A representative example is the weighted task function approach for multi-objective servoing, where the primary visual task \mathbf{e}_p = C(\mathbf{s} - \mathbf{s}^*) is augmented with secondary tasks \mathbf{e}_s and solved via null-space projection, e.g. \dot{\mathbf{x}} = -\lambda (\mathbf{I} - \mathbf{N} \mathbf{N}^T) \dot{\mathbf{e}}_p - \mu \mathbf{N} \mathbf{N}^T \dot{\mathbf{e}}_s, with \mathbf{N} the null-space projector and weights \lambda, \mu prioritizing objectives like joint-limit avoidance alongside feature tracking.
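As a concrete illustration of constrained optimization, the sketch below poses the proportional objective min ||\mathbf{L}\mathbf{u} + \lambda\mathbf{e}||^2 with per-axis velocity bounds as a box-constrained least-squares problem (a special case of the QP formulation above) and solves it with SciPy. The feature coordinates, depths, gain, and velocity limit are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import lsq_linear

def L_point(x, y, Z):
    return np.array([[-1 / Z, 0, x / Z, x * y, -(1 + x ** 2), y],
                     [0, -1 / Z, y / Z, 1 + y ** 2, -x * y, -x]])

pts = [(0.12, 0.10), (-0.08, 0.11), (-0.09, -0.07), (0.11, -0.06)]
L = np.vstack([L_point(x, y, 0.6) for x, y in pts])            # 8x6
e = np.array([0.02, 0.00, 0.02, 0.01, 0.01, 0.03, 0.01, 0.04]) # feature error
lam, u_max = 0.8, 0.05                                         # gain, limit

# Bounded least squares: min ||L u - (-lam * e)||^2 s.t. |u_i| <= u_max.
res = lsq_linear(L, -lam * e, bounds=(-u_max, u_max))
print("constrained command:  ", np.round(res.x, 4))

# Unconstrained pseudoinverse solution for comparison.
print("unconstrained command:", np.round(-lam * np.linalg.pinv(L) @ e, 4))
```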

Applications

Industrial robotics

Visual servoing has been widely adopted in industrial robotics for tasks involving fixed manipulators in manufacturing environments, enabling precise operations without reliance on fixed fixtures. In bin picking and assembly applications, image-based visual servoing (IBVS) is commonly employed to grasp irregular objects from cluttered bins, where cameras mounted on the robot end-effector track image features such as centroids or edges to guide the gripper toward targets. This approach compensates for variations in object pose and lighting, allowing robots to handle non-rigid or randomly oriented parts in automotive and electronics assembly lines. For instance, a heterogeneous distributed visual servoing system has demonstrated real-time bin picking of complex industrial objects by integrating multiple cameras for robust feature extraction.

In welding and inspection tasks, position-based visual servoing (PBVS) facilitates precise alignment by estimating the 3D pose of workpieces from stereo or monocular vision, guiding the robot tool along seams in automotive production. PBVS is particularly suited for these applications due to its ability to provide metric accuracy for path planning, ensuring weld torches or inspection probes follow curved surfaces with minimal deviation. A robust visual servo control system for double-head welding robots has shown effectiveness in tracking narrow seams under dynamic conditions, reducing misalignment errors in real time. Similarly, structured-light-based visual servoing has been applied to robotic pipe welding, achieving sub-millimeter precision in industrial settings.

Reported case studies include ABB industrial robots integrated with open-source visual servoing libraries for part mating tasks, where eye-in-hand cameras enable hybrid visual-force control to align components such as shafts into housings during assembly. These systems, tested on ABB IRB series manipulators, used IBVS to achieve contact-free initial positioning followed by force feedback for insertion, demonstrating reliability in electronics and automotive assembly. The platform's modular architecture supported simulation and deployment, with reported success in both simulated and physical setups for peg-in-hole operations.

The primary benefits of visual servoing in industrial robotics include enhanced flexibility compared to traditional fixture-based methods, allowing adaptation to variable part geometries and reducing setup times in high-mix production. Error rates have been significantly lowered, with positioning accuracies often below 1 mm, enabling reliable operations in precision-demanding tasks like insertion and alignment. For example, one visual servoing method achieved 100% success rates with position errors under 1 mm in robotic assembly experiments, while a dynamic accuracy enhancement technique confined pose errors to less than 0.10 mm in position and 0.05 degrees in orientation for industrial manipulators.

Integration challenges arise when incorporating visual servoing with programmable logic controllers (PLCs) and adhering to standards like ISO 10218, which mandates risk assessments for collaborative environments and limits robot speeds near humans. Synchronizing vision feedback loops with PLC-driven factory automation requires low-latency communication protocols, often leading to issues with determinism and timing. Compliance with ISO 10218-1 and -2 involves additional safeguards, such as emergency stops and speed monitoring, complicating system validation in human-robot collaborative cells. A vision-based inspection setup highlighted these hurdles, emphasizing the need for updated safety classifications under the 2025 revisions of the standard.

Post-2020 developments have incorporated learning-based augmentation to enhance visual servoing for adaptive manufacturing, where models predict feature trajectories or compensate for occlusions in dynamic environments. For instance, imitation learning integrated with direct visual servoing uses the large projection formulation for faster convergence in manipulation tasks, improving robustness to novel objects. In AR-assisted assembly, AI-driven perception augments visual servoing by overlaying virtual guidance, enabling robots to adjust to variations in part placement. These advancements support flexible, reconfigurable production lines, as reviewed in studies on AI and AR integration for manufacturing applications. As of 2025, applications also include visual servoing for drawer retrieval and storage operations with robotic manipulators, enhancing precision in such manipulation tasks.

Mobile and aerial robots

Visual servoing has been integral to ground robot navigation, particularly through integration with simultaneous localization and mapping (SLAM) techniques for rovers operating in unstructured terrains. In planetary exploration, NASA's Mars Exploration Rovers, Spirit and Opportunity, employed visual target tracking, essentially visual servoing, to mitigate slippage and odometry errors during autonomous mobility, enabling precise approaches to scientific targets like rocks despite terrain irregularities. Earlier prototypes like the Rocky 7 rover demonstrated visual servoing on elevation maps derived from stereo vision for autonomous rock acquisition, allowing the rover to approach and position for manipulation from over one meter away without relying on pre-mapped environments. For wheeled mobile robots, switched visual servoing schemes address nonholonomic constraints, using cameras to track features while compensating for limited maneuverability in indoor or outdoor settings.

In aerial robotics, visual servoing enables unmanned aerial vehicles (UAVs) and drones to perform target tracking and autonomous landing, often leveraging optical flow for maintaining hover in dynamic conditions. Image-based visual servoing (IBVS) with fiducial markers, such as AprilTags, allows drones to localize and follow moving targets in real time, as demonstrated in systems using particle filters for robust detection during approach and descent phases. Optical-flow-based methods provide ego-motion estimates to stabilize hover and adjust position, crucial for operations in windy or cluttered environments where traditional inertial measurements alone are insufficient. Low-cost quadrotors such as the Parrot AR.Drone serve as a seminal example, where IBVS facilitates indoor tracking of moving objects by controlling the quadrotor's velocity based on feature errors, achieving pursuit without external positioning aids.

Case studies from the late 2010s onward highlight visual servoing's role in vision-based autonomy during robotics challenges, such as the DARPA Subterranean (SubT) Challenge, where multi-robot teams integrated visual SLAM for mapping and artifact detection in GPS-denied underground environments like tunnels and caves. Teams like CoSTAR used visual-inertial odometry fused with servoing for coordinated exploration, enabling ground and aerial robots to navigate unknown terrains collaboratively. The AR.Drone platform further exemplified practical deployment, with extensions to outdoor tracking and following using forward-facing cameras for object pursuit in vision-only setups.

Adaptations for mobile and aerial robots emphasize handling ego-motion disturbances and sensor fusion to enhance reliability. Visual servoing compensates for rapid ego-motion in drones by estimating pose from 2D features, often integrating with inertial measurement units (IMUs) for short-term stability during aggressive maneuvers. Fusion with GPS and IMUs, as in VINS-Fusion frameworks, combines visual odometry with inertial data for robust state estimation in hybrid navigation, allowing seamless transitions between GPS-available and GPS-denied modes. Performance in GPS-denied environments underscores visual servoing's precision, with landing systems achieving average errors of approximately 19 cm using multi-sensor fiducial tracking, sufficient for safe touchdown on unprepared surfaces. Recent advancements in the 2020s extend this to swarm servoing for multi-UAV coordination, where image-based visual servoing enables precise relative positioning and formation control among vehicles, facilitating collaborative tasks like target encirclement in cluttered spaces. As of 2025, improved DETR-based visual servoing has been applied to satellite tracking, enhancing robustness in space environments.

Challenges and Future Directions

Limitations and robustness issues

Visual servoing systems are highly sensitive to environmental variations, particularly changes in lighting conditions that can alter image feature visibility and lead to tracking failures. For instance, variations in illumination cause edges, corners, and color-based cues to become unreliable, resulting in loss of feature detection and subsequent control instability. Occlusions, whether partial or complete, further exacerbate this by temporarily hiding critical features, causing error spikes in image-based approaches where depth estimation relies on continuous visibility. Motion blur from rapid camera or object movement introduces additional distortions, degrading feature extraction accuracy and often leading to divergent trajectories if not mitigated.

System-level constraints also limit the practical deployment of visual servoing, with high computational demands posing a primary barrier to real-time operation. Image processing tasks, such as feature detection and velocity estimation, require significant processing power, often resulting in latencies that exceed control-loop requirements in dynamic scenarios. Calibration drift over time, due to mechanical wear, temperature changes, or sensor inaccuracies, introduces cumulative errors in camera intrinsics and extrinsics, amplifying pose estimation inaccuracies without periodic recalibration.

Failure modes are particularly pronounced in unstructured environments, where systems may diverge due to insufficient feature richness or unexpected perturbations, as seen in stability analyses of eye-in-hand configurations. Studies report high failure rates for standard image-based methods in low-contrast or low-light conditions with partial occlusions, reaching as high as 100% in some grasping tasks. In these cases, error propagation can amplify discrepancies as features are lost, leading to non-convergent behavior without adaptive measures. Robustness gaps persist in handling dynamic obstacles, as conventional visual servoing lacks inherent predictive capabilities, relying instead on reactive tracking that fails against fast-moving interferences. This often results in collision risks or task abandonment in cluttered scenes, underscoring the need for supplementary sensing or prediction to maintain performance. While basic mitigations like multi-camera setups or hybrid control can provide fallback behavior, they do not fully resolve these issues without increasing complexity.

Emerging technologies

The integration of machine learning and artificial intelligence has advanced visual servoing toward end-to-end control paradigms, particularly through deep reinforcement learning (DRL) techniques that enable adaptive control policies without explicit modeling. For instance, DRL-based visual servoing for unmanned aerial vehicles (UAVs) dynamically adjusts servo gains in real time to handle field-of-view constraints, achieving stable tracking in dynamic environments as demonstrated in simulations and hardware experiments. Similarly, deep deterministic policy gradient (DDPG) variants have been applied to UAV servoing tasks, optimizing continuous action spaces for precise target following while mitigating issues like partial observability. These approaches, prominent in post-2018 research, enhance autonomy in aerial robotics by learning robust policies from image data alone, with improvements in task success over classical methods in cluttered scenarios.

Event-based vision, leveraging neuromorphic sensors, represents a breakthrough for low-latency visual servoing in high-speed applications, where traditional frame-based cameras struggle with motion blur and bandwidth limitations. These sensors output asynchronous events triggered by pixel-level brightness changes, enabling sub-millisecond response times suitable for robotic manipulation and UAV flight. A neuromorphic eye-in-hand visual servoing controller, for example, has been validated on industrial manipulators, reducing positioning errors to 0.183 during high-speed tasks. Experimental results show that event-based methods maintain tracking under lighting variations and occlusions, outperforming conventional pipelines by factors of 10 in latency for dynamic object tracking.

Multi-modal fusion integrates visual servoing with complementary sensors like LiDAR and emerging low-latency wireless networks to bolster robustness in outdoor deployments, addressing challenges such as GPS denial or adverse weather. In agricultural settings, LiDAR-assisted visual servoing fuses point clouds with image features for precise inter-zone navigation in greenhouses, achieving mean path deviations around 4-6 cm at low speeds (0.2-0.4 m/s) under variable lighting. When combined with low-latency wireless links for distributed computation, this supports coordinated servoing in multi-robot systems, enhancing cooperation in outdoor tasks like precision agriculture or search-and-rescue. As of 2025, research continues to explore advanced optimization techniques for robotics control, alongside collaborative human-robot servoing through bio-inspired AI frameworks that incorporate human intent via shared visual cues, enabling safer industrial interactions with notable reductions in collision risks in shared workspaces.

Research frontiers emphasize scalable swarms, where AI-driven visual servoing coordinates drone collectives for collective perception, as seen in agentic UAV systems that adaptively navigate using distributed vision policies. Ethical considerations in vision-based control are gaining traction, with frameworks embedding transparency and accountability to mitigate biases in AI-perceived environments, ensuring equitable deployment in surveillance and autonomous operations. Learning-based features have demonstrated improved robustness against perturbations in visual servoing, based on benchmarks showing reduced error variance for learned feature extractors compared to hand-crafted ones.

Software Tools

Open-source frameworks

One prominent open-source framework for visual servoing is the Visual Servoing Platform (ViSP), a modular C++ library with Python bindings that facilitates prototyping and development of applications involving visual tracking and servoing techniques. ViSP supports both image-based visual servoing (IBVS) and position-based visual servoing (PBVS), enabling real-time control of robots using camera feedback, and is cross-platform, building on Linux, Windows, macOS, and other systems. As of July 2025, the latest release is version 3.7.0. It includes simulation capabilities for testing servoing algorithms without physical hardware and provides interfaces to robotic systems, including low-level controllers used in experimental setups.

ViSP integrates seamlessly with OpenCV, the widely used open-source computer vision library, through dedicated bridging tools that allow conversion of images, camera parameters, and features between the two libraries for enhanced feature tracking in custom visual servoing pipelines. OpenCV's modules for detecting and tracking keypoints, such as corners or blobs, serve as foundational components in these pipelines, supporting the real-time processing essential for servoing tasks. This integration has been utilized in various prototypes, including those combining visual feedback with hardware actuation.

Within the Robot Operating System (ROS) ecosystem, ViSP is extended through packages like visp_auto_tracker, which wraps model-based trackers for automated detection and pose estimation of patterns such as QR codes or blobs on objects. This package is particularly suited for robot arms, with examples demonstrating integration with Universal Robots arms like the UR5 for tasks involving visual guidance and servoing. The vision_visp ROS stack further enables interfacing ViSP with ROS nodes for broader robotic applications, supporting both ROS 1 and ROS 2.

Another relevant open-source tool is ORB-SLAM, a feature-based simultaneous localization and mapping (SLAM) library that provides robust monocular, stereo, or RGB-D pose estimation, often incorporated into visual servoing loops for camera-to-object positioning. ORB-SLAM's ORB (Oriented FAST and Rotated BRIEF) features enable loop closure and relocalization, making it valuable for servoing in dynamic environments where initial pose estimates are needed. Frameworks like ViSP have been combined with ORB-SLAM variants to bridge SLAM outputs directly into servoing controllers.

These frameworks benefit from active open-source communities, with ViSP maintained since the early 2000s by Inria's Rainbow team (formerly Lagadic), offering over 125 tutorials and 515 example codes for beginners and advanced users alike. Community contributions via GitHub ensure ongoing updates, including enhancements for real-time performance and hardware compatibility.

Simulation and implementation tools

Simulation environments play a crucial role in the development of visual servoing systems, allowing researchers and engineers to test algorithms in virtual settings that replicate real-world physics, lighting, and dynamics without risking hardware damage. These tools facilitate the integration of vision feedback with motion control, enabling iterative design of image-based (IBVS) and position-based (PBVS) servoing strategies. Popular open-source simulators include the Gazebo simulator (in its modern iteration, succeeding Gazebo Classic, which reached end-of-life in January 2025) integrated with the Robot Operating System 2 (ROS 2), which supports virtual testing of eye-in-hand and eye-to-hand configurations for manipulators and mobile platforms. For instance, Gazebo has been used to simulate five-degree-of-freedom visual servoing robots, easing debugging and validation of control laws in dynamic environments. Similarly, CoppeliaSim (formerly V-REP) provides physics-based simulation for vision-guided tasks, incorporating accurate rendering of camera perspectives and object interactions, as demonstrated in visual servoing experiments with Franka Emika robots where it enables dynamic model validation.

Commercial software further supports visual servoing implementation, particularly in simulation and control design contexts. MathWorks' Simulink offers dedicated blocks for modeling visual servoing controllers through toolboxes such as the Robotics System Toolbox and the Computer Vision Toolbox, allowing users to simulate camera-robot interactions and tune parameters like feature extraction and velocity commands before hardware deployment. For industrial applications, Cognex VisionPro provides robust machine vision tools that can be adapted for IBVS on production lines, supporting image processing for pose estimation and feedback control in fixed-camera setups.

Hardware kits and integrated platforms bridge simulation and real-world execution in visual servoing. The Franka Emika Panda robot, equipped with vision plugins via libraries such as ViSP, serves as a standard hardware kit for testing servoing algorithms, supporting eye-in-hand PBVS with depth cameras such as the RealSense for precise end-effector positioning. Isaac Sim, built on NVIDIA Omniverse, facilitates AI-enhanced visual servoing simulations, leveraging GPU-accelerated physics and synthetic data generation for training models in complex scenarios like multi-robot coordination.

Implementation aids like Peter Corke's Robotics Toolbox streamline prototyping by providing functions for visual servoing, including camera models, interaction matrices, and estimation of camera poses relative to targets. This toolbox supports rapid development of control schemes, from basic PBVS to advanced methods, with built-in visualization for analysis. These tools offer significant advantages, such as rapid iteration cycles that substantially reduce the number of physical trials required, minimizing wear on hardware and accelerating development timelines in resource-constrained settings. In recent years, Unity-based simulations have also emerged for UAV visual servoing, enabling high-fidelity rendering of aerial environments for tasks like target tracking, as explored in event-based servoing frameworks during the 2020s.
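
As a rough illustration of how such a simulated or real setup is wired together under ROS 2, the minimal rclpy node below subscribes to a camera topic and publishes velocity commands computed from a toy image error; the topic names, the gain, and the centroid-based "feature" are assumptions made for the sketch, not part of Gazebo, ViSP, or any specific package.

    import rclpy
    from rclpy.node import Node
    from sensor_msgs.msg import Image
    from geometry_msgs.msg import Twist
    from cv_bridge import CvBridge
    import cv2
    import numpy as np

    class ServoNode(Node):
        """Minimal eye-in-hand visual-servoing loop: image in, twist out."""

        def __init__(self):
            super().__init__('visual_servo_node')
            self.bridge = CvBridge()
            # Assumed topic names; remap them to match the simulator or robot.
            self.sub = self.create_subscription(Image, '/camera/image_raw',
                                                self.on_image, 10)
            self.pub = self.create_publisher(Twist, '/servo/cmd_vel', 10)
            self.gain = 0.5  # placeholder proportional gain

        def on_image(self, msg):
            frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

            # Toy "feature": centroid of bright pixels; a real controller would
            # extract tracked features and apply an IBVS/PBVS control law.
            ys, xs = np.nonzero(gray > 200)
            if xs.size == 0:
                return
            err_x = (xs.mean() - gray.shape[1] / 2) / gray.shape[1]
            err_y = (ys.mean() - gray.shape[0] / 2) / gray.shape[0]

            cmd = Twist()
            cmd.linear.x = float(-self.gain * err_x)
            cmd.linear.y = float(-self.gain * err_y)
            self.pub.publish(cmd)

    def main():
        rclpy.init()
        rclpy.spin(ServoNode())
        rclpy.shutdown()

    if __name__ == '__main__':
        main()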

References

  1. [1]
    [PDF] Chapter 34 - Visual Servoing - Hal-Inria
    Visual servoing uses computer vision data to control a robot's motion, using techniques from image processing, computer vision, and control theory.
  2. [2]
    Visual Servoing in Robotics - MDPI
    Visual servoing guides robots using visual information, combining image processing, robotics, and control theory to control robot motion.
  3. [3]
  4. [4]
    [PDF] Handbook of Robotics Chapter 24: Visual servoing and visual tracking
    This chapter introduces visual servo control, using computer vision data in the servo loop to control the motion of a robot. We first describe the basic ...
  5. [5]
    An onboard-eye-to-hand visual servo and task coordination control ...
    This paper proposes an onboard-eye-to-hand visual servo and task coordination control for aerial manipulators using a spherical model, controlling both the UAV ...
  6. [6]
    [PDF] A Tutorial on Visual Servo Control - Robotics and Automation, IEEE ...
    Visual servoing is the fusion of results from many elemental areas including high-speed image processing, kinematics, dynamics, control theory, and real-time ...
  7. [7]
    [PDF] Features tracking for visual servoing purpose - Hal-Inria
    Jan 12, 2009 · In this paper we give an overview of a few tracking algorithms developed for visual servoing experiments at IRISA-INRIA Rennes. I. MOTIVATION.
  8. [8]
    [PDF] FlowControl: Optical Flow Based Visual Servoing
    The pose estimation algorithms must estimate this relative pose. We evaluate ... effects, such as lighting changes and partial occlusion. This helps it ...
  9. [9]
    [PDF] REAL-TIME IMAGE BASED VISUAL SERVOING ARCHITECTURE ...
    A commonly cited problem in real-time applications is the bounded bandwidth communication between the visual sensor and the robot which induce a latency in ...
  10. [10]
    [PDF] Visual servoing - HAL Inria
    Jun 19, 2014 · – Ls is the interaction matrix of s defined such that ṡ = Ls v where v is the relative velocity between the camera and the environment. In the ...
  11. [11]
    [PDF] Visual servo control, Part I: Basic approaches - Hal-Inria
    Jan 6, 2009 · In this article, we will see two very different approaches. First, we describe image-based visual servo control (IBVS), in which s consists of ...
  12. [12]
    [PDF] 2 1/2 D Visual Servoing - Hal-Inria
    Jan 13, 2009 · Malis, François Chaumette, S. Boudet. 2 1/2 D Visual Servoing. IEEE Transactions on Robotics and Automation, 1999, 15 (2), pp.238-250.
  13. [13]
    [PDF] Visual Servo Control - Hal-Inria
    Mar 3, 2007 · This article is the second of a two-part tutorial on visual servo control. In Part I (IEEE Robotics and. Automation Magazine, vol. 13, no.
  14. [14]
  15. [15]
    [PDF] Visual Servoing - HAL Inria
    Nov 18, 2020 · Visual servoing schemes mainly differ in the way that the visual features are designed. As represented on Fig. 2, the two most classical ...
  16. [16]
    [PDF] Visual servoing Franc¸ois Chaumette IRISA / INRIA Rennes, France ...
    3D visual features with one camera. Based on pose estimation p(t) from Fc to Fo using • an image of the object: x(t) • the knowledge of the object 3D CAD ...
  17. [17]
    Robotic visual servoing system based on SIFT features | Request PDF
    This paper presents a robotic visual servoing system based on SIFT features and its implementation for realtime control of a robot manipulator.
  18. [18]
    [PDF] A new approach to visual servoing in robotics - l'IRISA
    [26]. “Adaptive visual servo control of robots,” in Robot Vision, A. Pugh, Ed. Bedford, UK: IFS Pub. Ltd., 1983, pp. 107-116.
  19. [19]
    [PDF] Potential problems of stability and convergence in image-based and ...
    Jan 13, 2009 · The aim of this paper is to emphasize these problems by considering an eye-in-hand system and a positioning task with respect to a static target ...
  20. [20]
    [PDF] A new hybrid image-based visual servo control scheme
    we present a new partitioned visual servo control scheme that overcomes a number of the performance problems faced by classical IBVS but with less computation ...
  21. [21]
    [PDF] Combining IBVS and PBVS to ensure the visibility constraint - Hal-Inria
    Abstract— In this paper we address the issue of hybrid 2D/3D visual servoing. Contrary to popular approaches, we consider the position-based visual servo as ...
  22. [22]
    Optimized hybrid decoupled visual servoing with supervised learning
    This study proposes an optimized hybrid visual servoing approach to overcome the imperfections of classical two-dimensional, three-dimensional and hybrid ...
  23. [23]
    [PDF] Hybrid PBVS-IBVS Model Predictive Visual Servoing - Pedro Roque
    It starts with the background of autonomous rendezvous and docking (ARD), highlighting how important visual servoing methods are in space robotics. It is ...
  24. [24]
    DeepMPCVS: Deep Model Predictive Control for Visual Servoing
    May 3, 2021 · We present a deep model predictive visual servoing framework that can achieve precise alignment with optimal trajectories and can generalize to novel ...
  25. [25]
    Classical and Deep Learning based Visual Servoing Systems
    The visual servoing module requires estimating an interaction matrix that maps observed image features on to robot velocities, which in turn requires 3-D ...
  26. [26]
    Deep Reinforcement Learning-Based Uncalibrated Visual Servoing ...
    In this article, we put forward a brand-new uncalibrated image-based visual servoing (IBVS) method. It is designed for monocular hand–eye manipulators with ...
  27. [27]
    [PDF] Visual Servoing from Deep Neural Networks - HAL Inria
    The paper describes how to create a dataset simulating various perturbations. (occlusions and lighting conditions) from a single real-world image of the scene.
  28. [28]
    A visual/inertial integrated landing guidance method for UAV ...
    Aug 7, 2025 · This paper presents a visual/inertial integrated guidance method for UAV shipboard landing. The airborne vision system is utilized to track ...
  29. [29]
    [PDF] Good Features to Track - Duke Computer Science
    This PDF file was recreated from the original LaTeX file for technical report TR 93-1399,. Cornell University. The only changes were this note and the ...
  30. [30]
    [PDF] A Shape Tracking Algorithm for Visual Servoing - l'IRISA
    A 1D Canny edge detector is applied to each measurement line and the points of local maximum are adopted as detected features. The measurement procedure.
  31. [31]
    A General and Useful Set of Features for Visual Servoing
    Aug 6, 2025 · While moments of orders zero up to three are used to represent gross level image ... A first step toward visual servoing using image moments.
  32. [32]
    [PDF] Point-based and region-based image moments for visual servoing of ...
    Jan 12, 2009 · In this paper, we present improvements in image-based visual servo using image moments. First, the analytical form of the interaction matrix.
  33. [33]
    A TSR Visual Servoing System Based on a Novel Dynamic ... - MDPI
    Improved Template Matching. We have introduced some common correlation-based similarity functions in the section above, namely SSD, NCC, SAD and SHD. In ...
  34. [34]
    [PDF] A dense and direct approach to visual servoing using depth maps
    Apr 28, 2015 · Our approach has been validated in various servoing experiments using the depth information from a low cost RGB-D sensor. Positioning tasks are ...
  35. [35]
    [PDF] Direct visual servoing using ZNCC criterion - Hal-Inria
    Jan 16, 2018 · Abstract—This paper proposes a direct visual scheme. In direct visual servoing approaches, the goal is to consider all the image as a whole.
  36. [36]
    [PDF] Robust Visual Servoing
    A visual servoing task in general includes some form of (i) positioning, such as aligning the robot/gripper with the target, and (ii) tracking, updating the ...
  37. [37]
    Singularities in the Image-Based Visual Servoing of Five Points
    Nov 5, 2020 · ... collinear ... finite number of singular positions of the camera. The goal is to find the configurations of five non-degenerate points and the camera.
  38. [38]
    Adaptive Visual Servoing for Obstacle Avoidance of Micro ... - MDPI
    Nov 25, 2021 · A vision-based adaptive switching controller that uses optical flow information to avoid obstacles for micro unmanned aerial vehicles (MUAV) ...
  39. [39]
    [PDF] Camera Modelling for Visual Servo Control Applications
    In this paper, we present a detailed camera model which can be used in the design and analysis of visual servo systems. Using the free-standing acrobat as a ...
  40. [40]
    [PDF] Adaptive visual servo control to simultaneously stabilize image and ...
    There are many methods of visual servo control that are classi- cally grouped into image based visual servoing (IBVS) and position based visual servoing (PBVS) ...
  41. [41]
    [PDF] Visual Servo Control Part I: Basic Approaches - l'IRISA
    Dec 2, 2006 · Visual servo control refers to the use of computer vision data to control the motion of a robot. The vision data may be acquired from a camera ...
  42. [42]
  43. [43]
    [PDF] Visual servoing in an optimization framework for the whole-body ...
    Dec 22, 2016 · visual servoing. In fact, the exact same method can be found in ... Quadratic programming objective. Recall that a general optimization ...
  44. [44]
    [PDF] Visual Servoing via Nonlinear Predictive Control
    A nonlinear global model and a local model based on the interaction matrix are considered. Advantages and drawbacks of both models are pointed out. Finally, ...
  45. [45]
  46. [46]
    [PDF] A heterogeneous distributed visual servoing system for real-time ...
    The algorithms we use for model-based matching involve assessment of scene measurement and the pose estimation uncertainties, and feature correspondence search ...
  47. [47]
    A robust visual servo control system for narrow seam double head ...
    Aug 5, 2025 · In this paper, an image-based visual servo control system is developed and integrated into a double head welding robot for CO2 gas shielded ...
  48. [48]
    Visual Servoing Control Based on EGM Interface of an ABB Robot
    A new visual servoing controller based on the (External Guided Motion, EGM) interface of ABB robots is introduced, with the emphasis on its adaptability to ...
  49. [49]
    Research on a Visual Servoing Control Method Based on ... - MDPI
    Nov 18, 2022 · This study presents a visual servo control method based on perspective transformation to transport a workpiece to an unmarked spatial ...
  50. [50]
    Visual Servoing-Based Dynamic Accuracy Enhancement of ...
    Jun 5, 2024 · In the third part, a practical dynamic path tracking (DPT) scheme for industrial robots is elaborated for improving the path tracking accuracy.
  51. [51]
    [PDF] Human-Robot Collaboration for a Vision-Based Quality Inspection
    Jun 16, 2025 · The updated standard ISO 10218-1:2025, Robotics - Safety requirements - Part 1: Industrial robots [2] and ISO 10218-2:2025, Robotics - Safety.
  52. [52]
    Imitation learning-based Direct Visual Servoing using the large ...
    This study introduces a dynamical system-based imitation learning for direct visual servoing. It leverages off-the-shelf deep learning-based perception modules.
  53. [53]
    (PDF) Artificial intelligence (AI) in augmented reality (AR)-assisted ...
    Dec 28, 2020 · This research work provides a review of current AR strategies, critical appraisal for these strategies, and potential AI solutions for every component of the ...
  54. [54]
    [PDF] Overview of the Mars Exploration Rovers' Autonomous Mobility and ...
    Both of these effects can result in the rover not quite reaching its target, but both are mitigated by Visual Target Tracking (also known as Visual Servoing), ...
  55. [55]
    [PDF] Autonomous Rock Tracking and Acquisition from a Mars Rover
    Our algorithms perform visual servoing on an elevation map instead of image features, because the latter are subject to abrupt scale changes during the approach ...
  56. [56]
    Switched visual servo control of nonholonomic mobile robots with ...
    Nov 25, 2015 · This paper presents a novel scheme for visual servoing of a nonholonomic mobile robot equipped with a monocular camera in consideration of ...
  57. [57]
    Towards autonomous tracking and landing on moving target
  58. [58]
    [PDF] Practical Challenges in Landing a UAV on a Dynamic Target - arXiv
    Sep 28, 2022 · 2) Marker Based Landing: With Optical Flow and Machine learning based methods, the drone can theoretically track and land on any target ...
  59. [59]
    Autonomous indoor object tracking with the Parrot AR.Drone
  60. [60]
    UAVs Beneath the Surface: Cooperative Autonomy for Subterranean ...
    Jun 16, 2022 · This paper presents a novel approach for autonomous cooperating UAVs in search and rescue operations in subterranean domains with complex topology.
  61. [61]
    How JPL's Team CoSTAR Won the DARPA SubT Challenge: Urban ...
    Nov 16, 2020 · The SubT Challenge is designed to encourage progress in four distinct robotics domains: mobility (how to get around), perception (how to make ...
  62. [62]
  63. [63]
    Visual Navigation Features Selection Algorithm Based on Instance ...
    Jan 2, 2020 · ABSTRACT Ego-motion estimation, as one of the core technologies of unmanned systems, is widely used in autonomous robot navigation, unmanned ...
  64. [64]
    [PDF] A Survey of Simultaneous Localization and Mapping with an ... - arXiv
    Feb 14, 2020 · Furthermore, VINS-Fusion supports multiple visual-inertial sensor types (GPS, mono camera + IMU, stereo cameras + IMU, even stereo cameras only) ...
  65. [65]
    Drone Landing Accuracy Using Visual Servoing or Fiducial Markers in GPS-Denied Environments
  66. [66]
    Precise Interception Flight Targets by Image-based Visual Servoing ...
    Sep 26, 2024 · His main research interests include UAV swarm navigation, multicopter visual servo control, and cooperative guidance.
  67. [67]
    Full article: Guest Editorial: Special Issue on Visual Servoing
    Jul 14, 2010 · In particular, visual-servoing has found many applications in positioning, localization, and tracking of objects under dynamic and uncertain ...
  68. [68]
    What Matters in Constructing a Visual Servoing Scheme
    By combining with global localization and mapping in large scenes, VS techniques provide local matching calibration and pose optimization, refining global.
  69. [69]
    QP-based Visual Servoing Under Motion Blur-Free Constraint
    Sep 1, 2024 · This work proposes a QP-based visual servoing scheme for limiting motion blur during the achievement of a visual task.
  70. [70]
    Robot Closed-Loop Grasping Based on Deep Visual Servoing ...
    In the robot's camera view, the low-light environment results in diminished brightness and contrast, with noticeable noise in the upper regions of the image.
  71. [71]
    (PDF) A Tutorial on Visual Servo Control - ResearchGate
    Aug 10, 2025 · This article provides a tutorial introduction to visual servo control of robotic manipulators. Since the topic spans many disciplines our goal is limited to ...
  72. [72]
    [PDF] Data-efficient Unsupervised Recalibrating Visual Servoing via ...
    Feb 8, 2022 · In this work, we present a method for unsupervised learning of visual servoing that does not require any prior calibration and is ... extrinsics ...
  73. [73]
    A Visual Servoing approach for road lane following with obstacle ...
    This paper presents a local navigation strategy with obstacle avoidance applied to autonomous robotic automobiles in urban environments, based on the ...
  74. [74]
    Robust and Cooperative Image-Based Visual Servoing System ...
    The work presented in this paper is based on our previous works [19,20]. This paper presents a robust visual servoing based on a redundant and cooperative 2D ...
  75. [75]
    Deep RL for UAV Visual Servoing Control w/ FOV
    Visual servoing control for UAVs based on the deep reinforcement learning (DRL) method is proposed, which dynamically adjusts the servo gain in real time.
  76. [76]
    (PDF) Deep Reinforcement Learning for the Visual Servoing Control ...
    Jun 3, 2023 · Visual servoing is a control method that utilizes image feedback to control robot motion, and it has been widely applied in unmanned aerial ...
  77. [77]
    Progress in artificial intelligence-based visual servoing of ...
    This work comprehensively examines the application and advancements of AI-enhanced visual servoing in autonomous UAV systems, covering critical control tasks.
  78. [78]
    Neuromorphic vision based control for the precise positioning of ...
    We propose a novel neuromorphic vision based controller for robotic machining applications to enable faster and more reliable operation.
  79. [79]
    [2004.07398] Neuromorphic Eye-in-Hand Visual Servoing - arXiv
    Apr 15, 2020 · The event based visual servoing (EVBS) method is validated experimentally using a commercial robot manipulator in an eye-in-hand configuration.
  80. [80]
    A LiDAR SLAM and Visual-Servoing Fusion Approach to Inter-Zone ...
    To address these challenges, this study presents an integrated localization-navigation framework for mobile robots in multi-span glass greenhouses. In the ...
  81. [81]
  82. [82]
    Titans of Tomorrow: Quantum Computing and Robotics on the Brink ...
    Jan 16, 2025 · In 2025, two titans of technology stand at the forefront of innovation: quantum computing and robotics. Each offers a vision of a future transformed.
  83. [83]
    (PDF) Artificial Intelligence-Driven and Bio-Inspired Control ...
    Jul 28, 2025 · This systematic review analyzes 160 peer-reviewed industrial robotics control studies (2023–2025), including an expanded bio-inspired/human- ...
  84. [84]
    UAVs Meet Agentic AI: A Multidomain Survey of Autonomous Aerial ...
    Jun 8, 2025 · Advanced path planning algorithms allow UAVs to avoid obstacles, reconfigure missions on-the-fly, and coordinate with other agents ...
  85. [85]
    Privacy, ethics, transparency, and accountability in AI systems for ...
    This article introduces a data-driven methodological framework that embeds transparency, accountability, and regulatory alignment across all stages of AI ...
  86. [86]
    Learning Visual Servoing with Deep Features and Fitted Q-Iteration
    Feb 6, 2017 · Our approach is based on servoing the camera in the space of learned visual features, rather than image pixels or manually-designed keypoints.
  87. [87]
    ViSP
    Open source visual servoing platform library · Easy integration. ViSP provides simple ways to integrate and validate new algorithms with already existing tools.
  88. [88]
    lagadic/visp: Open Source Visual Servoing Platform - GitHub
    ViSP is a cross-platform library (Linux, Windows, MacOS, iOS, Android) that allows prototyping and developing applications using visual tracking and visual ...
  89. [89]
    Visual Servoing Platform: ViSP 3.6.1 main page - Inria
    Written in C++, ViSP is based on open-source cross-platform libraries (such as OpenCV) and builds with CMake. Several platforms are supported, including OSX, ...
  90. [90]
    Visual Servoing using a Webcam, Arduino and OpenCV
    Jun 4, 2015 · Install OpenCV as instructed here. Connect the webcam; Connect arduino and load it with the code for the microcontroller; Build the tracking ...
  91. [91]
    Tutorial: Bridge over OpenCV - Visual Servoing Platform - Inria
    In this tutorial we explain how to convert data such as camera parameters or images from ViSP to OpenCV or vice versa.
  92. [92]
    visp_auto_tracker - ROS Wiki
    Aug 16, 2016 · Overview. This package wraps an automated pattern barcode based tracker using ViSP library. The tracker estimates the pattern position and ...
  93. [93]
    How to use ViSP example for visual servoing?
    Feb 2, 2018 · I want to do visual servoing with real UR5 arm, I can control UR5 with ur_modern_driver properly and I can use visp_auto_tracker to detect the QR-code and ...
  94. [94]
    lagadic/vision_visp: ViSP stack for ROS - GitHub
    ROS 2 vision_visp contains packages to interface ROS 2 with ViSP which is a library designed for visual-servoing and visual tracking applications.
  95. [95]
    UZ-SLAMLab/ORB_SLAM3: ORB-SLAM3: An Accurate ... - GitHub
    ORB-SLAM3 is the first real-time SLAM library able to perform Visual, Visual-Inertial and Multi-Map SLAM with monocular, stereo and RGB-D cameras.
  96. [96]
    (PDF) Bridging the Gap Between Visual Servoing and Visual SLAM
    Mar 10, 2021 · The SLAM module provides feedback signals for the servo controller; meanwhile, velocities designed by the servo controller are utilized for the ...
  97. [97]
    [PDF] ViSP for visual servoing: a generic software platform with a ...
    A fully functional modular architecture that allows fast development of visual servoing applications, ViSP (Visual Servoing Platform), which takes the form ...