
Metropolis-adjusted Langevin algorithm

The Metropolis-adjusted Langevin algorithm (MALA) is a Markov chain Monte Carlo (MCMC) sampling method designed to generate samples from a target probability density π on a continuous state space, typically in high dimensions, by incorporating gradient information from the log-density to propose informed moves while ensuring exact invariance through a Metropolis-Hastings correction step. It builds on the dynamics of the Langevin diffusion, dX_t = \frac{1}{2} \nabla \log \pi(X_t) \, dt + dW_t, where W_t is a standard Brownian motion, whose stationary distribution is π, and discretizes this dynamics to form proposals that are then accepted or rejected to correct for discretization bias. Originally proposed by Julian Besag in 1994 as a way to improve the efficiency of MCMC for continuous multivariate distributions by using vector proposals guided by the gradient, MALA was rigorously analyzed for its ergodicity and convergence rates by Roberts and Tweedie in 1996, establishing conditions under which it achieves geometric ergodicity. In each iteration, starting from a current state x, a proposal y is generated as y = x + \frac{\epsilon}{2} \nabla \log \pi(x) + \sqrt{\epsilon} Z, where ε > 0 is a step size parameter and Z \sim N(0, I_d) in d dimensions; this Euler-Maruyama discretization of the Langevin diffusion biases proposals towards regions of higher π density. The proposal is accepted with probability \alpha(x, y) = \min\left\{1, \frac{\pi(y) q(x|y)}{\pi(x) q(y|x)}\right\}, where q(\cdot|\cdot) is the Gaussian proposal kernel, ensuring detailed balance and thus that the chain is reversible with respect to π regardless of ε. If rejected, the chain remains at x. This adjustment step is crucial because the unadjusted Langevin algorithm (ULA) alone converges only approximately to π, with bias that diminishes as ε → 0 but requires prohibitively small steps in practice. MALA exhibits superior mixing properties compared to random-walk Metropolis algorithms on smooth, high-dimensional targets, as the gradient-informed proposals reduce the dependence of mixing on dimension and can achieve optimal scaling with acceptance rates around 0.574 asymptotically, even in high dimensions. It is geometrically ergodic under conditions where the potential −log π is strongly convex or grows subquadratically in the tails, but geometric ergodicity fails for heavy-tailed targets or super-quadratic potentials. Widely implemented in libraries like TensorFlow Probability, MALA remains a foundational tool in Bayesian inference, statistical physics, and machine learning for sampling from complex posteriors, often serving as a building block for more advanced methods like Hamiltonian Monte Carlo.

Introduction

Overview

The Metropolis-adjusted Langevin algorithm (MALA) is a Markov chain Monte Carlo (MCMC) method that operates within the Metropolis-Hastings framework, utilizing proposals derived from the Euler-Maruyama discretization of the Langevin diffusion to sample from a target distribution \pi(\mathbf{x}). This integration of Langevin dynamics provides a structured proposal mechanism informed by the target's gradient, while the Metropolis-Hastings adjustment ensures the resulting Markov chain is invariant to \pi. MALA plays a central role in statistical computation by generating samples from intricate distributions, which is essential for approximating integrals or expectations under \pi, particularly in high-dimensional Bayesian posterior estimation where direct sampling is infeasible. Its design makes it suitable for applications requiring robust exploration of multimodal or non-Gaussian densities. The algorithm was first introduced by Julian Besag in 1994, in a discussion contribution on representations of knowledge in complex systems, with initial motivations tied to spatial statistics problems. A primary strength of MALA lies in its use of gradient information from the log-density of \pi, which allows for more directed and efficient traversal of the parameter space compared to isotropic proposals in standard Metropolis-Hastings algorithms.

Motivation

Sampling from complex target distributions π(x), particularly those that are multimodal or high-dimensional, poses significant challenges in fields such as Bayesian inference and statistical physics, where direct sampling methods become inefficient or intractable due to the curse of dimensionality and the presence of multiple modes separated by low-density regions. These difficulties often lead to poor exploration of the state space, with chains getting trapped in local modes, resulting in biased estimates and slow convergence. The basic Metropolis algorithm, a foundational Markov chain Monte Carlo (MCMC) method, exacerbates these issues by proposing symmetric, directionless steps from a Gaussian distribution centered at the current state, leading to slow mixing times, especially in high dimensions. Optimal scaling analysis reveals that its mixing time scales as O(d), where d is the dimension, with an optimal acceptance rate of approximately 0.234, highlighting the inefficiency of undirected random walks in complex landscapes. The Metropolis-adjusted Langevin algorithm (MALA) addresses these limitations by building on the Metropolis-Hastings framework and incorporating the score function ∇ log π(x) to generate proposals that align with the gradient of the log-density, effectively guiding the chain toward regions of higher probability and reducing unnecessary random wandering. This informed proposal mechanism enables more efficient traversal of multimodal structures, improving overall sampling performance. Empirical studies confirm MALA's advantages, showing substantially reduced integrated autocorrelation times relative to random walk Metropolis in moderate dimensions (e.g., 10–100), where the algorithm's optimal scaling yields an acceptance rate of about 0.574 and a mixing time of O(d^{1/3}).

Background

The overdamped Langevin dynamics provide a continuous-time stochastic process that serves as the foundation for proposal mechanisms in algorithms like the Metropolis-adjusted Langevin algorithm (MALA). This process is modeled by the stochastic differential equation (SDE) dX_t = \nabla \log \pi(X_t) \, dt + \sqrt{2} \, dW_t, where X_t \in \mathbb{R}^d is the state at time t, \pi(x) is the target probability density function, \nabla \log \pi(X_t) denotes the score function (gradient of the log-density), and W_t is a standard d-dimensional Brownian motion. The drift term \nabla \log \pi(X_t) \, dt acts as a deterministic force that pushes the process toward regions of higher density under \pi, effectively guiding the trajectory uphill in the log-probability to concentrate sampling in probable areas. Meanwhile, the diffusion term \sqrt{2} \, dW_t introduces isotropic random perturbations, enabling exploration of the state space and preventing the process from getting trapped in local modes. This balance between directed movement and stochastic noise makes the dynamics suitable for approximating the target distribution over long times. The scaling factor of \sqrt{2} in the diffusion term balances the noise against the drift so that the process equilibrates to \pi (assuming unit temperature), a convention rooted in statistical-mechanics analogies. Under mild regularity conditions, such as \log \pi being twice continuously differentiable and \pi(x) > 0 for all x \in \mathbb{R}^d, the overdamped Langevin process is ergodic and admits \pi as its unique invariant (stationary) distribution. This invariance arises from the reversibility of the dynamics with respect to \pi, verifiable through the Fokker-Planck equation associated with the SDE, which has \pi as its steady-state solution. Additional growth conditions on \nabla \log \pi, such as |\nabla \log \pi(x)| \leq C(1 + |x|) for some constant C > 0, ensure non-explosion of the process and global existence of solutions. To generate discrete-time samples approximating draws from \pi, the continuous dynamics are discretized using the Euler-Maruyama scheme with time step \tau > 0, yielding an update X_{n+1} = X_n + \tau \nabla \log \pi(X_n) + \sqrt{2\tau} Z_n, where Z_n \sim \mathcal{N}(0, I_d). This approximation introduces bias for finite \tau, motivating the use of a Metropolis-Hastings correction to ensure exact invariance to \pi in the discrete chain.
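
As an illustration, the Euler-Maruyama update above (the unadjusted Langevin algorithm) can be written in a few lines of Python; this is a minimal NumPy sketch with an illustrative standard-Gaussian target, not a reference implementation.

    import numpy as np

    def ula_step(x, grad_log_pi, tau, rng):
        # Euler-Maruyama step: X_{n+1} = X_n + tau * grad log pi(X_n) + sqrt(2 tau) * Z_n
        z = rng.standard_normal(x.shape)
        return x + tau * grad_log_pi(x) + np.sqrt(2.0 * tau) * z

    # Example: standard Gaussian target, for which grad log pi(x) = -x
    rng = np.random.default_rng(0)
    x = np.zeros(2)
    for _ in range(1000):
        x = ula_step(x, lambda v: -v, tau=0.1, rng=rng)

For any fixed \tau these iterates are biased away from \pi, which is exactly what the Metropolis-Hastings correction described next removes.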

Metropolis-Hastings algorithm

The Metropolis-Hastings (MH) algorithm is a Markov chain Monte Carlo (MCMC) method that generates a sequence of random samples from a target distribution π, even when direct sampling is infeasible. Introduced as a generalization of the earlier Metropolis method, it allows for flexible, potentially asymmetric proposal distributions while ensuring the generated Markov chain has π as its invariant distribution. In the MH framework, starting from a current state x, a candidate state y is proposed according to a conditional distribution q(y \mid x), which may depend on x and can be asymmetric. The proposal is then accepted with probability \alpha(x, y) = \min\left(1, \frac{\pi(y) q(x \mid y)}{\pi(x) q(y \mid x)}\right), and the chain moves to y if accepted; otherwise, it remains at x. This acceptance rule ensures the Markov chain is reversible, meaning the transition probabilities satisfy the detailed balance condition \pi(x) q(y \mid x) \alpha(x, y) = \pi(y) q(x \mid y) \alpha(y, x) for all states x and y, which in turn guarantees that π is invariant under the chain's transitions. The purpose of this construction is to produce a Markov chain that converges to the target distribution π, allowing samples to approximate expectations under π regardless of the proposal's particular form, as long as q is properly defined and irreducible transitions are possible. A key advantage is its independence from the normalizing constants of the proposal and target densities: the algorithm relies only on the ratios \pi(y)/\pi(x) and q(x \mid y)/q(y \mid x), so unnormalized forms of π suffice, making it applicable to complex distributions like posterior densities in Bayesian inference. In the context of the Metropolis-adjusted Langevin algorithm, the MH framework provides the acceptance mechanism to correct proposals derived from Langevin dynamics, ensuring invariance to π for the resulting chain.
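
A generic MH transition, written against an arbitrary (possibly asymmetric) proposal, might be sketched as follows in Python; the function names and calling convention are illustrative assumptions rather than a fixed API.

    import numpy as np

    def mh_step(x, log_pi, propose, log_q, rng):
        """One Metropolis-Hastings transition.

        log_pi(x)       -- unnormalized log target density
        propose(x, rng) -- draws y ~ q(. | x)
        log_q(a, b)     -- evaluates log q(a | b)
        """
        y = propose(x, rng)
        log_alpha = (log_pi(y) + log_q(x, y)) - (log_pi(x) + log_q(y, x))
        return y if np.log(rng.uniform()) < min(0.0, log_alpha) else x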

Algorithm description

Proposal mechanism

The proposal mechanism in the Metropolis-adjusted Langevin algorithm (MALA) generates candidate states through a discrete-time approximation of the continuous-time Langevin diffusion, which serves as the underlying inspiration for informed sampling steps. Specifically, starting from the current state x_k, the proposed state \tilde{x}_{k+1} is obtained via the Euler-Maruyama discretization scheme, yielding \tilde{x}_{k+1} = x_k + \tau \nabla \log \pi(x_k) + \sqrt{2\tau} Z, where Z \sim \mathcal{N}(0, I_d) is a standard Gaussian random vector in d dimensions, \pi denotes the target density, and \tau > 0 is the step size parameter. This construction results in a Gaussian proposal distribution q(\tilde{x} \mid x) = \mathcal{N}(\tilde{x}; x + \tau \nabla \log \pi(x), 2\tau I_d), which incorporates gradient information from the target log-density to bias proposals toward regions of higher probability, thereby improving efficiency compared to symmetric random-walk proposals. The mean of the proposal shifts according to the local gradient direction, while the covariance 2\tau I_d introduces isotropic noise scaled by the step size, mimicking the diffusive behavior of the continuous process. The proposal is inherently asymmetric, as the reverse density q(x \mid \tilde{x}) centers at \tilde{x} + \tau \nabla \log \pi(\tilde{x}) with the same covariance, depending on the gradient evaluated at the proposed state rather than the current one; this asymmetry necessitates a corrective acceptance probability in the Metropolis-Hastings framework to maintain detailed balance with respect to \pi. The step size \tau plays a crucial role in balancing the approximation quality and sampling efficiency: small values of \tau reduce discretization bias by closely tracking the continuous Langevin dynamics but lead to proposals with low variance and thus slower mixing, whereas larger \tau increases variance for broader exploration at the cost of greater bias from the Euler approximation.
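
The proposal draw and the two proposal log-densities needed later for the acceptance ratio can be computed together, as in this minimal NumPy sketch (the shared Gaussian normalizing constant is omitted since it cancels in the ratio; function names are illustrative):

    import numpy as np

    def mala_propose(x, grad_log_pi, tau, rng):
        """Draw a MALA proposal y and return log q(y|x) and log q(x|y) up to a common constant."""
        mean_fwd = x + tau * grad_log_pi(x)
        y = mean_fwd + np.sqrt(2.0 * tau) * rng.standard_normal(x.shape)
        mean_rev = y + tau * grad_log_pi(y)
        log_q_fwd = -np.sum((y - mean_fwd) ** 2) / (4.0 * tau)   # log q(y | x) + const
        log_q_rev = -np.sum((x - mean_rev) ** 2) / (4.0 * tau)   # log q(x | y) + const
        return y, log_q_fwd, log_q_rev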

Acceptance step

In the Metropolis-adjusted Langevin algorithm (MALA), the acceptance step follows the standard Metropolis-Hastings (MH) framework to ensure that the Markov chain has the target distribution \pi as its invariant measure. Given a current state x and a proposed state \tilde{x} generated from the Langevin-based proposal distribution q(\tilde{x} \mid x), the proposal is accepted with probability \alpha(x, \tilde{x}) = \min\left\{1, \frac{\pi(\tilde{x}) q(x \mid \tilde{x})}{\pi(x) q(\tilde{x} \mid x)}\right\}. If accepted, the chain moves to \tilde{x}; otherwise, it remains at x. This acceptance ratio enforces detailed balance, preserving \pi exactly despite the proposal's approximation to the underlying Langevin diffusion. Substituting the Gaussian form of the proposal distribution specific to MALA yields a more explicit expression for the acceptance probability. Assuming the Euler-Maruyama discretization of the Langevin diffusion with step size \tau and proposal covariance 2\tau I (to match the diffusion term), the proposal densities are multivariate Gaussians: q(\tilde{x} \mid x) = \mathcal{N}(\tilde{x}; x + \tau \nabla \log \pi(x), 2\tau I) and q(x \mid \tilde{x}) = \mathcal{N}(x; \tilde{x} + \tau \nabla \log \pi(\tilde{x}), 2\tau I). The resulting acceptance probability simplifies to \alpha(x, \tilde{x}) = \min\left\{1, \exp\left( \log \frac{\pi(\tilde{x})}{\pi(x)} + \frac{1}{4\tau} \left[ \|\tilde{x} - x - \tau \nabla \log \pi(x)\|^2 - \|x - \tilde{x} - \tau \nabla \log \pi(\tilde{x})\|^2 \right] \right) \right\}, where the exponent arises from the ratio of the Gaussian densities combined with \pi(\tilde{x})/\pi(x). This form highlights the correction terms that account for the asymmetry in the proposals due to the drift term. The MH adjustment in this step is crucial for correcting the discretization error inherent in the Euler-Maruyama approximation of the continuous Langevin dynamics, ensuring that the discrete chain remains invariant to \pi regardless of the step size \tau > 0. Without this acceptance mechanism, the unadjusted Langevin algorithm would introduce bias and fail to converge to the exact target distribution. Computationally, evaluating \alpha(x, \tilde{x}) requires assessing \pi and \nabla \log \pi at both x and \tilde{x}, doubling the cost of the proposal generation per iteration compared to simpler MCMC methods like random walk Metropolis.
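
In code, the logarithm of this ratio can be evaluated directly from the quadratic form above, working in log-space for numerical stability (a minimal sketch; the function name is illustrative):

    import numpy as np

    def mala_log_accept_ratio(x, y, log_pi, grad_log_pi, tau):
        """Log Metropolis-Hastings ratio for a MALA proposal y from state x."""
        fwd = np.sum((y - x - tau * grad_log_pi(x)) ** 2)   # ||y - x - tau grad log pi(x)||^2
        rev = np.sum((x - y - tau * grad_log_pi(y)) ** 2)   # ||x - y - tau grad log pi(y)||^2
        return log_pi(y) - log_pi(x) + (fwd - rev) / (4.0 * tau)

    # Accept with probability min(1, exp(log_ratio)), e.g. accept if log(U) < log_ratio for U ~ Uniform(0, 1).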

Theoretical properties

Optimal scaling

The optimal scaling of the Metropolis-adjusted Langevin algorithm (MALA) addresses the choice of the step size parameter \tau to maximize sampling efficiency in high-dimensional settings. In the asymptotic regime as the dimension d \to \infty, Roberts and Rosenthal (1998) established that the optimal \tau scales as \tau \propto l^{2} d^{-1/3}, where l > 0 is a target-dependent constant that tunes the proposal variance to achieve an asymptotic acceptance rate of approximately 0.574. This scaling ensures that the discrete-time proposals approximate the underlying continuous Langevin diffusion while maintaining detailed balance through the Metropolis adjustment. The efficiency of MALA under optimal scaling is quantified by the speed measure, which minimizes the asymptotic variance of estimators derived from the chain. This involves balancing the trade-off between the average acceptance probability (which decreases with larger \tau) and the proposal step size (which increases the displacement per accepted move). By maximizing this speed, MALA attains a cost that grows only as O(d^{1/3}) iterations per unit of effective exploration, reflecting improved performance over random-walk Metropolis methods, whose cost grows as O(d). In the high-dimensional limit, the optimally scaled MALA chain converges weakly to a Langevin diffusion whose speed is determined by l and the acceptance rate. This diffusion limit provides a continuous-time description of the discrete chain's behavior, facilitating theoretical analysis of its convergence and mixing properties. In practical implementations, the theoretical acceptance rate of 0.574 serves as a guideline, with empirical tuning often targeting rates between 50% and 60% to balance efficiency across finite dimensions and non-i.i.d. targets.

Convergence analysis

The Metropolis-adjusted Langevin algorithm (MALA) generates a Markov chain that is ergodic under standard conditions for Metropolis-Hastings algorithms, ensuring convergence in total variation distance to the target distribution \pi. Specifically, the chain is Harris recurrent and positive Harris, hence ergodic, provided the proposal distribution is absolutely continuous with respect to Lebesgue measure (due to the Gaussian noise in the Langevin proposal) and the acceptance probability is positive with probability one, which guarantees irreducibility and aperiodicity. Geometric ergodicity of MALA, which implies exponential convergence rates in appropriate norms, holds under tail conditions on \pi that ensure the drift term dominates appropriately. For distributions \pi in the class \mathcal{B}(P, \gamma) with 1 < P < 2 (tails decaying between exponential and Gaussian rates), MALA is V_s-uniformly geometrically ergodic for a suitable Lyapunov function V_s(x) = e^{s \|x\|} with s < 2h \gamma, where h is the discretization step size. In contrast, for strongly log-concave \pi (where the negative Hessian of \log \pi is positive definite with condition number \kappa), MALA exhibits geometric ergodicity with mixing time \tilde{O}(\kappa \sqrt{d}) in d dimensions from a warm start, enabling efficient sampling via exponential decay of the total variation distance to stationarity. However, for lighter-tailed distributions such as Gaussians (P = 2) or super-Gaussian tails (P > 2), geometric ergodicity can fail, and convergence occurs at subgeometric rates, such as polynomial in some cases, depending on the step size and tail behavior. Under geometric ergodicity, MALA satisfies a central limit theorem for ergodic averages, establishing asymptotic normality of Monte Carlo estimators \bar{g}_n = n^{-1} \sum_{i=1}^n g(X_i) for bounded functions g with \mathbb{E}_\pi[g(X)] = \mu. Specifically, \sqrt{n} (\bar{g}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2) as n \to \infty, where the asymptotic variance \sigma^2 is finite and given by \sigma^2 = \mathrm{Var}_\pi(g(X)) + 2 \sum_{k=1}^\infty \mathrm{Cov}_\pi(g(X_0), g(X_k)), depending on the integrated autocorrelation time of the chain. This result holds uniformly over starting points under the V-uniform ergodicity conditions, providing error bounds of order O(1/\sqrt{n}) for posterior means or expectations in Bayesian settings. The proposals in MALA arise from the Euler-Maruyama discretization of the overdamped Langevin diffusion dX_t = \nabla \log \pi(X_t) \, dt + \sqrt{2} dW_t, which converges weakly to the continuous-time dynamics as the step size \tau \to 0. While the unadjusted algorithm introduces discretization bias for finite \tau, the Metropolis-Hastings correction ensures the invariant distribution remains exactly \pi regardless of \tau > 0, preserving convergence guarantees without requiring infinitesimal steps.
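
In practice the CLT variance \sigma^2 is estimated from the chain itself; one simple approach is batch means, sketched below under the assumption that the g(X_i) values are supplied as a one-dimensional array (an illustrative estimator, not the only one):

    import numpy as np

    def batch_means_se(g_values, num_batches=20):
        """Batch-means estimate of the Monte Carlo standard error of the ergodic average."""
        g = np.asarray(g_values, dtype=float)
        n = (len(g) // num_batches) * num_batches          # trim so batches are equal-sized
        batch_size = n // num_batches
        batch_means = g[:n].reshape(num_batches, batch_size).mean(axis=1)
        sigma2_hat = batch_size * batch_means.var(ddof=1)  # estimates sigma^2 in the CLT
        return np.sqrt(sigma2_hat / n)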

Implementation

Pseudocode

The Metropolis-adjusted Langevin algorithm (MALA) generates a Markov chain targeting a distribution \pi by iteratively proposing states via an Euler-Maruyama discretization of the underlying Langevin diffusion and correcting for discretization bias using a Metropolis-Hastings step. The process begins by initializing the current state x_0 from \pi or an appropriate initial distribution, then proceeds for N iterations: at each step k, compute the proposal \tilde{x}_k using the current state x_{k-1} and the gradient of the log-target density, evaluate the acceptance probability \alpha_k, and set x_k = \tilde{x}_k with probability \alpha_k or retain x_{k-1} otherwise, appending each x_k to the chain. The following pseudocode implements vanilla MALA in d dimensions, assuming access to the unnormalized log-density \log \pi and its gradient \nabla \log \pi, with step size \tau > 0 and Gaussian noise drawn from \mathcal{N}(0, I_d). For numerical stability, evaluations of \pi are performed in log-space, and the acceptance ratio is computed using the explicit quadratic form derived from the asymmetric proposal densities to avoid direct density evaluations where possible.
Algorithm MALA(ℓ = log π, ∇ℓ = ∇ log π, τ > 0, N ∈ ℕ, x₀ ∈ ℝᵈ):
    x ← x₀
    chain ← [x]
    for k = 1 to N do:
        μ ← x + τ ⋅ ∇ℓ(x)                          // Proposal mean (drift-adjusted)
        ̃x ← μ + √(2τ) ⋅ Z, where Z ∼ 𝒩(0, I_d)    // Proposal via Euler step
        ν ← ̃x + τ ⋅ ∇ℓ(̃x)                         // Reverse proposal mean
        ℓ_x ← ℓ(x)
        ℓ_̃x ← ℓ(̃x)
        δ₁ ← ‖̃x - μ‖²                              // ‖̃x - x - τ ∇ℓ(x)‖²
        δ₂ ← ‖x - ν‖²                              // ‖x - ̃x - τ ∇ℓ(̃x)‖²
        log α ← ℓ_̃x - ℓ_x + (δ₁ - δ₂) / (4τ)      // Log acceptance ratio
        α ← min{1, exp(log α)}
        if U ∼ Uniform(0,1) < α then:
            x ← ̃x
        chain.append(x)
    return chain
This formulation ensures the chain is reversible with respect to \pi and geometrically ergodic under suitable tail conditions on \pi. In practice, \tau is tuned to achieve an acceptance rate near 0.574 for optimal asymptotic efficiency in high dimensions, as detailed in parameter tuning discussions.
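
A direct translation of this pseudocode into runnable Python might look as follows; this is a minimal NumPy sketch (the function name mala and the illustrative Gaussian target are assumptions for the example, not a library API):

    import numpy as np

    def mala(log_pi, grad_log_pi, tau, n_iters, x0, rng=None):
        """Vanilla MALA mirroring the pseudocode above; returns the chain and the acceptance rate."""
        rng = np.random.default_rng() if rng is None else rng
        x = np.asarray(x0, dtype=float)
        lx, gx = log_pi(x), grad_log_pi(x)
        chain, accepts = [x.copy()], 0
        for _ in range(n_iters):
            mu = x + tau * gx                                  # proposal mean (drift-adjusted)
            y = mu + np.sqrt(2.0 * tau) * rng.standard_normal(x.shape)
            ly, gy = log_pi(y), grad_log_pi(y)
            nu = y + tau * gy                                  # reverse proposal mean
            d1 = np.sum((y - mu) ** 2)                         # ||y - x - tau grad(x)||^2
            d2 = np.sum((x - nu) ** 2)                         # ||x - y - tau grad(y)||^2
            log_alpha = ly - lx + (d1 - d2) / (4.0 * tau)      # log acceptance ratio
            if np.log(rng.uniform()) < log_alpha:
                x, lx, gx = y, ly, gy
                accepts += 1
            chain.append(x.copy())
        return np.array(chain), accepts / n_iters

    # Example: sample a 2-d standard Gaussian
    rng = np.random.default_rng(1)
    chain, rate = mala(lambda v: -0.5 * np.sum(v ** 2), lambda v: -v,
                       tau=0.5, n_iters=5000, x0=np.zeros(2), rng=rng)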

Parameter tuning

Tuning the parameters of the Metropolis-adjusted Langevin algorithm (MALA) is crucial for achieving efficient sampling, with the step size \tau (or equivalently, the scaling parameter l = \sqrt{\tau}) being the primary focus to balance proposal movement and acceptance. Adaptive methods begin with a small initial \tau and iteratively adjust it based on the running acceptance rate, aiming to target the theoretically optimal rate of 0.574 derived from high-dimensional scaling analysis. One common approach employs stochastic approximation, such as a Robbins-Monro recursion, to update \tau proportionally to the difference between the current acceptance rate and the target, ensuring convergence to optimal performance while maintaining ergodicity under mild conditions. To assess tuning effectiveness, practitioners monitor key diagnostics including the acceptance rate, effective sample size (ESS), and trace plots. The acceptance rate provides an immediate indicator of proposal quality, with values near 0.574 signaling good mixing in high dimensions, while deviations suggest over- or under-scaling of \tau. ESS quantifies the number of independent samples effectively obtained, accounting for autocorrelation, and should be maximized relative to chain length for reliable inference; low ESS relative to total iterations indicates poor tuning. Trace plots visualize chain evolution, revealing mixing behavior through rapid exploration of the state space without trends or stuck regions, helping confirm that \tau enables adequate traversal of the target distribution. After tuning, post-processing involves burn-in and thinning to mitigate initial bias and autocorrelation. Burn-in discards the first 10-20% of samples to allow the chain to reach stationarity, with the exact fraction determined by diagnostics like trace plots showing stabilization. Thinning retains every m-th sample (e.g., m = 10) to reduce dependence between retained points, improving ESS per stored sample, though excessive thinning wastes computation and is avoided unless storage constraints apply. In high-dimensional settings (d \gg 1), smaller \tau values are necessary, as the optimal scaling l \propto d^{-1/6} implies \tau \propto d^{-1/3}, leading to slower diffusion but better acceptance. Preliminary short runs can estimate this scaling by fitting acceptance rates to the theoretical curve, guiding the choice of \tau before full sampling.
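
A stochastic-approximation update of the step size during burn-in could be sketched as follows (an illustrative Robbins-Monro-style recursion on log \tau; the decay exponent and function name are assumptions):

    import numpy as np

    def adapt_step_size(tau, accepted, iteration, target_rate=0.574, kappa=0.6):
        """Nudge tau toward the target acceptance rate; the decaying gain makes the adaptation vanish."""
        gamma = (iteration + 1) ** (-kappa)                  # diminishing adaptation
        log_tau = np.log(tau) + gamma * (float(accepted) - target_rate)
        return float(np.exp(log_tau))

    # Typical use: apply this rule after each burn-in iteration, then freeze tau for the main run.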

Extensions and variations

Preconditioned MALA

The vanilla Metropolis-adjusted Langevin algorithm (MALA) assumes isotropic noise in its proposals, which leads to inefficient exploration when sampling from target distributions π with ill-conditioned densities, such as those exhibiting strong correlations between variables or elongated geometries. This inefficiency arises because the fixed identity covariance fails to adapt to the local curvature of log π, resulting in proposals that either undersample the target or require excessively small step sizes for acceptable acceptance rates. Preconditioned MALA addresses this limitation by incorporating a positive definite preconditioning matrix G into the Langevin dynamics, allowing the proposals to align better with the target's geometry. The proposal is generated as \tilde{x} = x + \tau G^{-1} \nabla \log \pi(x) + \sqrt{2\tau} \, G^{-1/2} Z, where τ > 0 is the step size, Z ∼ 𝒩(0, I_d) is standard multivariate noise in d dimensions, and G serves as the preconditioner (e.g., an approximation to the Fisher information matrix or the negative Hessian of log π). This yields a multivariate Gaussian proposal q(\tilde{x} \mid x) = 𝒩(μ_x, Σ), with μ_x = x + τ G^{-1} ∇ log π(x) and Σ = 2τ G^{-1}. In the acceptance step, the Metropolis-Hastings ratio is updated to account for the preconditioned proposal's asymmetry: \alpha(x, \tilde{x}) = \min\left(1, \frac{\pi(\tilde{x}) \, q(x \mid \tilde{x})}{\pi(x) \, q(\tilde{x} \mid x)}\right), where the proposal density ratio q(x \mid \tilde{x}) / q(\tilde{x} \mid x) simplifies to an explicit form involving the difference between ∇ log π(x) and ∇ log π(\tilde{x}), with the quadratic exponents weighted by the precision matrix G / (2τ). This adjustment ensures detailed balance while leveraging the preconditioner to enhance proposal quality without altering the target distribution. Preconditioners like the Fisher information matrix, estimated as the expected outer product of ∇ log π under π, or Hessian approximations provide effective adaptations to the local scale and orientation of π, particularly in Bayesian settings with correlated posteriors. The approach yields significant benefits in non-elliptical or high-dimensional targets, reducing the condition number of the effective geometry and improving asymptotic efficiency toward optimal scaling (e.g., O(d^{1/3}) mixing time) when log π is quadratic and G matches the inverse target covariance. For instance, in ill-conditioned Gaussian targets, preconditioning can achieve near-perfect acceptance rates and faster convergence compared to vanilla MALA. This fixed or approximately constant G distinguishes preconditioned MALA from Riemannian extensions like manifold MALA, which employ position-dependent metrics.
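
One step of preconditioned MALA with a fixed matrix G might be sketched as follows in Python (illustrative only; in practice G^{-1} and its Cholesky factor would be computed once rather than at every step):

    import numpy as np

    def pmala_step(x, log_pi, grad_log_pi, G, tau, rng):
        """One preconditioned MALA step with a fixed positive definite preconditioner G."""
        G_inv = np.linalg.inv(G)
        L = np.linalg.cholesky(G_inv)                 # L @ L.T = G^{-1}, so L plays the role of G^{-1/2}
        mu_x = x + tau * G_inv @ grad_log_pi(x)
        y = mu_x + np.sqrt(2.0 * tau) * (L @ rng.standard_normal(x.shape))
        mu_y = y + tau * G_inv @ grad_log_pi(y)
        # Both proposal kernels share the covariance 2*tau*G^{-1}; normalizing constants cancel.
        log_q_fwd = -(y - mu_x) @ G @ (y - mu_x) / (4.0 * tau)
        log_q_rev = -(x - mu_y) @ G @ (x - mu_y) / (4.0 * tau)
        log_alpha = log_pi(y) - log_pi(x) + log_q_rev - log_q_fwd
        return y if np.log(rng.uniform()) < log_alpha else x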

Manifold MALA

The manifold Metropolis-adjusted Langevin algorithm (MMALA), also known as the Riemannian manifold Langevin algorithm, extends the standard MALA to sampling on a Riemannian manifold equipped with a position-dependent metric tensor G(\theta), enabling more efficient exploration of target densities with complex geometries or high correlations. Introduced by Girolami and Calderhead in 2011, this variant incorporates the manifold's geometry into the proposal mechanism, where the metric G(\theta) is typically defined as the expected Fisher information matrix augmented by the negative Hessian of the log-prior, reflecting the local curvature of the parameter space. The proposal distribution is generated via an Euler-Maruyama discretization of the corresponding Langevin diffusion on the manifold: \theta^* = \theta + \epsilon G(\theta)^{-1} \nabla \log p(\theta) + \sqrt{2\epsilon} G(\theta)^{-1/2} z, where \theta is the current state, \epsilon > 0 is the step size, p(\theta) is the target density, and z \sim \mathcal{N}(0, I_d) is a standard multivariate normal variate in d dimensions. Here, the drift term \epsilon G(\theta)^{-1} \nabla \log p(\theta) follows the natural gradient adjusted by the inverse metric, promoting movement along geodesics that account for varying scales, while the diffusion term \sqrt{2\epsilon} G(\theta)^{-1/2} z introduces isotropic noise in the tangent space scaled by the metric's Cholesky factor. The acceptance step employs the standard Metropolis-Hastings ratio, adapted to the manifold setting: \alpha = \min\left(1, \frac{p(\theta^*) q(\theta \mid \theta^*)}{p(\theta) q(\theta^* \mid \theta)}\right), where the proposal density q(\theta^* \mid \theta) is Gaussian with mean and covariance informed by G(\theta), leading to an explicit dependence on the factor |\det G(\theta)|^{1/2} in the ratio through the normalizing constant of the multivariate normal. This factor arises from the volume element on the manifold, ensuring the chain targets the correct distribution while correcting for the non-symmetric proposal induced by the position-dependent metric. Unlike simpler preconditioned variants that use a static preconditioning matrix, MMALA dynamically updates G(\theta) at each step, providing a more faithful representation of the underlying geometry. MMALA is particularly suited to Bayesian inference problems involving constrained parameter spaces or densities with heterogeneous curvature, such as those requiring log-transforms for positive variance parameters or arising in log-Gaussian Cox processes. For instance, in models fitted with limited data, the adaptive metric improves mixing by aligning proposals with the local curvature, reducing the effective dimensionality and autocorrelation of the samples compared to Euclidean MALA. However, the computational overhead is notable, scaling as O(d^3) per iteration due to metric tensor inversion and gradient evaluations, though sparsity in G(\theta) can mitigate this in structured models like time-series data. Despite the added cost, MMALA demonstrates superior efficiency in low-data regimes where curvature variations dominate, often achieving faster convergence to the target distribution.
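
A simplified manifold MALA step (using the drift written above, which omits the additional curvature-correction terms of the full MMALA proposal) could be sketched as follows; SciPy's multivariate normal is used to evaluate the two proposal densities with their position-dependent covariances, and the function names are illustrative:

    import numpy as np
    from scipy.stats import multivariate_normal

    def simplified_mmala_step(theta, log_p, grad_log_p, metric, eps, rng):
        """One simplified-MMALA step with a position-dependent metric G(theta)."""
        G_inv = np.linalg.inv(metric(theta))
        mean_fwd = theta + eps * G_inv @ grad_log_p(theta)
        cov_fwd = 2.0 * eps * G_inv
        prop = rng.multivariate_normal(mean_fwd, cov_fwd)
        G_prop_inv = np.linalg.inv(metric(prop))
        mean_rev = prop + eps * G_prop_inv @ grad_log_p(prop)
        cov_rev = 2.0 * eps * G_prop_inv
        # The |det G|^{1/2} factors enter automatically through the Gaussian log-densities.
        log_q_fwd = multivariate_normal.logpdf(prop, mean_fwd, cov_fwd)
        log_q_rev = multivariate_normal.logpdf(theta, mean_rev, cov_rev)
        log_alpha = log_p(prop) - log_p(theta) + log_q_rev - log_q_fwd
        return prop if np.log(rng.uniform()) < log_alpha else theta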

Applications

Bayesian inference

The Metropolis-adjusted Langevin algorithm (MALA) is widely employed in Bayesian inference to sample from the posterior π(θ) ∝ L(θ | data) p(θ), where θ denotes the parameters of interest, L(θ | data) is the likelihood, and p(θ) is the prior. The algorithm's proposal mechanism leverages the gradient of the log-posterior, ∇ log π(θ) = ∇ log L(θ | data) + ∇ log p(θ), which serves as a data-informed score function guiding proposals toward higher-density regions of the parameter space. This setup ensures that MALA targets the full posterior while correcting for discretization bias in the underlying Langevin dynamics via the Metropolis-Hastings acceptance step. In high-dimensional parameter estimation, MALA excels in models like Bayesian logistic regression with many covariates, where the posterior concentrates in narrow ridges due to the interplay of features and regularization from the prior. For instance, sampling the coefficient vector benefits from the gradient's incorporation of the model's linear predictor and link nonlinearity, enabling better traversal of skewed or correlated posteriors. A primary advantage of MALA in Bayesian settings is its integration with automatic differentiation in frameworks like TensorFlow Probability, which computes the required ∇ log π(θ) efficiently even for intricate likelihoods, streamlining proposal generation without manual gradient derivation. This feature is especially valuable for user-defined models, as it supports rapid prototyping and scaling to datasets where hand-derived gradients would be prohibitive. As a case study, consider Bayesian Gaussian process regression, where MALA samples hyperparameters like lengthscales and noise variance from the marginal posterior after integrating out the latent functions. Preconditioned variants, which adjust the proposal using local curvature estimates, achieve higher effective sample size (ESS) per unit time than random walk Metropolis by exploiting information from the covariance kernel's influence on the posterior geometry; for example, in regression tasks with Gaussian process priors, such methods yield ESS values up to several times larger for equivalent iterations, reducing overall computation time for posterior summaries.
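
As a concrete example of the gradient MALA needs, the log-posterior and score of a Bayesian logistic regression with an isotropic Gaussian prior can be coded directly (an illustrative NumPy sketch; X is an n-by-d design matrix and y holds 0/1 labels, and the function names are assumptions):

    import numpy as np

    def make_logistic_posterior(X, y, prior_var=10.0):
        """Return log pi(beta) and grad log pi(beta) for Bayesian logistic regression."""
        def log_post(beta):
            z = X @ beta
            log_lik = np.sum(y * z - np.logaddexp(0.0, z))   # sum_i [y_i z_i - log(1 + e^{z_i})]
            log_prior = -0.5 * np.sum(beta ** 2) / prior_var
            return log_lik + log_prior
        def grad_log_post(beta):
            p = 1.0 / (1.0 + np.exp(-(X @ beta)))            # predicted probabilities
            return X.T @ (y - p) - beta / prior_var
        return log_post, grad_log_post

    # The pair (log_post, grad_log_post) plugs directly into a MALA sampler such as the one sketched earlier.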

Statistical physics

In statistical physics, the Metropolis-adjusted Langevin algorithm (MALA) is utilized to generate samples from the Boltzmann distribution, defined as π(x) ∝ exp(-U(x)/T), where U(x) represents the potential energy of the system and T denotes the temperature. This distribution encapsulates the probabilities of microscopic configurations in physical systems, enabling the computation of macroscopic thermodynamic properties such as free energies and correlation functions. The algorithm's proposal step incorporates the gradient of the log-density, ∇ log π(x) = -∇U(x)/T, which corresponds to the physical force on the configuration scaled by the inverse temperature. MALA's efficacy in this domain stems from its ability to produce proposals that follow the local slope of the energy landscape, improving exploration compared to random-walk methods in high-dimensional spaces typical of physical systems. The gradient information exploits the structure of physical models, where forces derive from differentiable potentials, facilitating faster mixing within the metastable regions separated by energy barriers. This makes MALA particularly valuable for overdamped dynamics approximations in canonical ensembles. Representative applications include simulations of polymer chain configurations in three dimensions, where MALA samples entangled or self-avoiding walks under energy potentials, yielding insights into conformational statistics without exhaustive enumeration. These uses highlight MALA's role in bridging microscopic interactions to ensemble averages in complex physical systems. The development of MALA traces back to Besag's 1994 proposal, which addressed the bias of unadjusted Langevin updates with respect to the invariant distribution. This foundation has since informed its adoption in physics for ensuring exact sampling from energy-based distributions.
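
For illustration, a toy Boltzmann target with a double-well potential can be set up as follows and handed to the MALA sampler sketched earlier (the potential and its parameters are purely illustrative):

    import numpy as np

    def double_well_energy(x, a=1.0):
        """Toy potential U(x) = a * sum_i (x_i^2 - 1)^2 with minima at x_i = +/-1."""
        return a * np.sum((x ** 2 - 1.0) ** 2)

    def double_well_force(x, a=1.0):
        """-grad U(x); the score of the Boltzmann density is this force divided by T."""
        return -4.0 * a * x * (x ** 2 - 1.0)

    T = 0.5                                              # temperature
    log_pi = lambda x: -double_well_energy(x) / T        # log pi(x) = -U(x)/T (up to a constant)
    grad_log_pi = lambda x: double_well_force(x) / T     # grad log pi(x) = -grad U(x)/T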

Comparisons with other methods

Random walk Metropolis

The random walk Metropolis (RWM) algorithm serves as a baseline for understanding the advantages of the Metropolis-adjusted Langevin algorithm (MALA), as both are Metropolis-Hastings methods but differ in how proposals are generated. In RWM, proposals are drawn by perturbing the current state x with isotropic Gaussian noise: y = x + \sigma Z, where Z \sim \mathcal{N}(0, I_d) and \sigma > 0 is a step size tuned for efficiency. The proposal density q(y \mid x) is symmetric, so the Metropolis acceptance probability simplifies to \alpha(x, y) = \min\left\{1, \frac{\pi(y)}{\pi(x)}\right\}, where \pi denotes the unnormalized target density. A key theoretical distinction arises in high-dimensional settings, where optimal scaling analysis reveals characteristic performance for each method. For RWM, the optimal average acceptance rate is 0.234, achieved when the step size scales as \sigma \sim \ell / \sqrt{d} with \ell constant, leading to a speed (measured by expected squared jump distance) of O(1/d). In comparison, MALA incorporates gradient information from \pi, yielding an optimal acceptance rate of 0.574 and a step size scaling as \sigma \sim \ell d^{-1/6}, which results in a faster speed of O(d^{-1/3}). This improved scaling makes MALA substantially more efficient than RWM in dimensions d \gtrsim 10, with the relative speedup growing as d^{2/3}. Despite MALA's advantages, RWM remains preferable in low-dimensional problems (d \lesssim 5) or when the target \pi is available only as a black box without accessible gradients, as RWM requires only density evaluations. Further enhancements, such as Hamiltonian Monte Carlo, build on MALA's gradient-based proposals for even greater efficiency in complex targets.
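
For contrast with the MALA step sketched earlier, a random walk Metropolis transition needs only the log-density, since the symmetric proposal cancels in the acceptance ratio (a minimal sketch):

    import numpy as np

    def rwm_step(x, log_pi, sigma, rng):
        """One random walk Metropolis step with isotropic Gaussian proposals."""
        y = x + sigma * rng.standard_normal(x.shape)
        return y if np.log(rng.uniform()) < log_pi(y) - log_pi(x) else x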

Hamiltonian Monte Carlo

Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) method that enhances sampling efficiency by incorporating auxiliary variables and simulating Hamiltonian dynamics to generate distant proposals in the state space. In HMC, an auxiliary momentum variable \mathbf{p} is introduced, sampled from a standard Gaussian distribution independent of the target density \pi(\mathbf{q}), where \mathbf{q} denotes the position variables of interest. The joint density over (\mathbf{q}, \mathbf{p}) is then proportional to \pi(\mathbf{q}) \exp\left( -\frac{1}{2} \mathbf{p}^\top \mathbf{p} \right), defining a Hamiltonian H(\mathbf{q}, \mathbf{p}) = -\log \pi(\mathbf{q}) + \frac{1}{2} \mathbf{p}^\top \mathbf{p}. Proposals are generated by numerically integrating Hamilton's equations for a fixed number of steps using the leapfrog integrator, which preserves the symplectic structure and approximately conserves the Hamiltonian, enabling exploration along trajectories that respect the geometry of the target distribution. In contrast to the Metropolis-adjusted Langevin algorithm (MALA), which relies on first-order Langevin dynamics for local, gradient-informed proposals, HMC employs second-order dynamics to produce more global moves by simulating longer trajectories in the augmented phase space. MALA's proposals are essentially Euler-Maruyama discretizations of overdamped Langevin diffusion, limiting steps to short, local explorations, whereas HMC's leapfrog steps allow for extended paths that can traverse multiple modes or correlated regions without excessive rejection. This difference often results in HMC achieving higher acceptance rates and effective sample sizes (ESS) in practice, though at the cost of multiple gradient evaluations per proposal—typically L times more than MALA's single evaluation, where L is the number of leapfrog steps. Both methods outperform random walk Metropolis (RWM) by leveraging gradient information, but HMC's trajectory-based proposals reduce random-walk behavior more substantially than MALA's incremental steps. Performance comparisons indicate that HMC generally excels over MALA in very high-dimensional settings or for smooth target densities, where its ability to follow low-curvature paths yields superior efficiency in terms of gradient evaluations needed for mixing. For instance, in log-concave distributions, HMC requires fewer total computations to achieve the same mixing accuracy as MALA, with empirical advantages becoming pronounced beyond moderate dimensions (e.g., d > 100). However, MALA offers simpler implementation and faster per-step computation for moderate dimensions (d \approx 10-100), making it preferable when setup time or resource constraints limit trajectory lengths in HMC. In scenarios with non-smooth or ill-conditioned targets, MALA's local nature can sometimes provide more robust initial exploration before tuning HMC parameters like step size and trajectory length. MALA can be viewed as a limiting case of HMC when the trajectory length is reduced to a single leapfrog step, effectively collapsing the momentum augmentation into a gradient-informed first-order update akin to the Langevin proposal. This perspective highlights a spectrum of methods where short-trajectory HMC recovers MALA's behavior, allowing hybrid approaches to interpolate between local efficiency and global exploration as needed.
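
The relationship is easy to see in code: with a single leapfrog step of size \epsilon and a fresh Gaussian momentum, the position update below reduces to a MALA proposal with \tau = \epsilon^2/2 (a minimal sketch with a unit mass matrix):

    import numpy as np

    def leapfrog(q, p, grad_log_pi, eps, n_steps):
        """Leapfrog integration of H(q, p) = -log pi(q) + p.p/2."""
        p = p + 0.5 * eps * grad_log_pi(q)             # initial half step in momentum
        for step in range(n_steps):
            q = q + eps * p                            # full step in position
            if step < n_steps - 1:
                p = p + eps * grad_log_pi(q)           # full momentum steps between position steps
        p = p + 0.5 * eps * grad_log_pi(q)             # final half step in momentum
        return q, p

    # With n_steps = 1 and p ~ N(0, I), the new position is
    # q' = q + (eps**2 / 2) * grad_log_pi(q) + eps * p,
    # i.e. exactly a MALA proposal with step size tau = eps**2 / 2.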

References

  1. [1]
    [PDF] Markov Chain Monte Carlo for Statistical Inference - UW CSSS
    4.2 Langevin–Hastings algorithm. The Langevin–Hastings algorithm, introduced in Besag (1994), provides a rigorous method of simulating from a continuous ...
  2. [2]
    [PDF] Exponential Convergence of Langevin Distributions and Their ...
    Jan 22, 2008 · We then consider a 'Metropolis-adjusted' version of the algorithm, and find conditions under which this also converges at an exponential rate: ...
  3. [3]
    tfp.mcmc.MetropolisAdjustedLangevinAlgorithm - TensorFlow
    Nov 21, 2023 · Metropolis-adjusted Langevin algorithm (MALA) is a Markov chain Monte Carlo (MCMC) algorithm that takes a step of a discretised Langevin diffusion as a ...
  4. [4]
  5. [5]
    [PDF] Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm ...
    Metropolis-adjusted Langevin algorithm (MALA) was original proposed by Besag (1994) and its properties were examined in detail by Roberts and Tweedie (1996a) ...
  6. [6]
    Representations of Knowledge in Complex Systems
    The aim is to build algorithms for automatic hypothesis formation concerning the number, location, orientation and shape of mitochondria and membranes. For this ...
  7. [7]
    A framework for adaptive MCMC targeting multimodal distributions
    Aug 6, 2025 · Standard MCMC algorithms struggle to sample multi-modal distributions because the area of low probability density between modes acts as a ...
  8. [8]
    Weak convergence and optimal scaling of random walk Metropolis ...
    This paper considers the problem of scaling the proposal distribution of a multidimensional random walk Metropolis algorithm in order to maximize the ...
  9. [9]
    Variance Reduction using Nonreversible Langevin Samplers - arXiv
    Jun 16, 2015 · Title:Variance Reduction using Nonreversible Langevin Samplers. Authors:A. B. Duncan, T. Lelievre, G. A. Pavliotis. View a PDF of the paper ...
  10. [10]
    [PDF] Non-asymptotic mixing of the MALA algorithm
    Aug 20, 2010 · The goal of this article is to quantify the non-asymptotic behavior of MALA. The main result of this paper states that the convergence of MALA ...
  11. [11]
    [PDF] MATH 598: TOPICS IN STATISTICS
    There is some theory to suggest that the optimal acceptance rate for the MALA algorithm is around 0.57. We examine the variation in the ESS and acceptance rate ...
  12. [12]
    [PDF] Log-concave sampling: Metropolis-Hastings algorithms are fast
    With this notation, we have the following mixing time bound for the. MALA algorithm for a strongly-log concave measure from a warm start. ... Geometric ergodicity ...
  13. [13]
  14. [14]
    Metropolis-Adjusted Langevin Algorithm (MALA) MCMC
    This method involves simulating Langevin diffusion such that the solution to the time evolution equation (the Fokker-Planck PDE) is a stationary distribution ...
  15. [15]
    An Adaptive Version for the Metropolis Adjusted Langevin Algorithm ...
    Aug 8, 2006 · This paper extends some adaptive schemes that have been developed for the Random Walk Metropolis algorithm to more general versions of the Metropolis-Hastings ...
  16. [16]
    [PDF] On thinning of chains in MCMC - UNL Digital Commons
    Markov chain Monte Carlo (MCMC) is a technique (or more correctly, a family of techniques) for sampling probability distributions. Typical applications are in ...
  17. [17]
    [PDF] Quantifying the Effectiveness of Linear Preconditioning in Markov ...
    Abstract. We study linear preconditioning in Markov chain Monte Carlo. We consider the class of well-conditioned distributions, for which several mixing ...
  18. [18]
    Optimal Preconditioning and Fisher Adaptive Langevin Sampling
    May 23, 2023 · The paper defines optimal preconditioning for Langevin diffusion using an inverse Fisher information covariance matrix, and applies it to an ...
  19. [19]
    Riemann manifold Langevin and Hamiltonian Monte Carlo methods
    Mar 3, 2011 · Summary. The paper proposes Metropolis adjusted Langevin and Hamiltonian Monte Carlo sampling methods defined on the Riemann manifold to resolve ...
  20. [20]
    [PDF] A Metropolized Adjusted Newton Algorithm for Markov Chain Monte ...
    Jul 25, 2012 · with that of MALA, namely, σ² = 1.65² n^{−1/3} and the optimal acceptance rate is about 0.574. Here, ℓ = 1.65 is the stationary value for ...
  21. [21]
    [PDF] Auxiliary gradient-based sampling algorithms
    i Gaussian process models in machine learning and statistics (e.g. multi-class ... (i) Preconditioned Metropolis-adjusted Langevin (pMALA) q(y|x) = N(y|x +.
  22. [22]
    Exponential convergence of Langevin distributions and their ...
    We then consider a `Metropolis-adjusted' version of the algorithm, and ®nd conditions under which this also converges at an exponential rate: perhaps ...
  23. [23]
    Efficient Langevin and Monte Carlo sampling algorithms: The case ...
    Mar 20, 2023 · SMC is also known as the Metropolis-adjusted Langevin algorithm or simply Langevin Monte Carlo. The related hybrid Monte Carlo method23 ...
  24. [24]
    Convergence of Position-Dependent MALA with Application to ...
    This paper establishes conditions for geometric convergence of position-dependent MALA algorithms, and applies it to conditional simulation in GLMMs.
  25. [25]
    Optimal scaling of discrete approximations to Langevin diffusions
    Jan 6, 2002 · Abstract. We consider the optimal scaling problem for proposal distributions in Hastings–Metropolis algorithms derived from Langevin diffusions.
  26. [26]
    [PDF] When does Metropolized Hamiltonian Monte Carlo provably ... - arXiv
    Jun 8, 2023 · If too few integration steps are taken, HMC can behave similarly to MALA, whereas too many steps can cause HMC to wander back to the initial ...
  27. [27]
    [PDF] Fast mixing of Metropolized Hamiltonian Monte Carlo
    Hamiltonian Monte Carlo (HMC) is a state-of-the-art Markov chain Monte Carlo sampling algo- rithm for drawing samples from smooth probability densities over ...
  28. [28]
    Scaling of MCMC with dimension (experiments) - Statisfaction
    May 15, 2018 · So on this i.i.d. Gaussian target, HMC scales better than MALA, which scales better than RWMH. The literature contains various generalizations ...
  29. [29]
    [PDF] Optimal tuning of the hybrid Monte Carlo algorithm - Andrew Stuart
    The latter three papers reported that optimal scaling results for RWM and MALA type algorithms extend directly to target measures that have a density with ...