
Information geometry

Information geometry is a branch of mathematics that applies differential geometry to the study of probability distributions and statistical models, viewing families of probability distributions as points on a statistical manifold equipped with the Fisher information metric. This framework endows the space of distributions with geometric structures such as geodesics, curvatures, and dual affine connections, enabling the analysis of statistical inference through geometric quantities such as distances and divergence measures. Pioneered by C. R. Rao, whose 1945 paper introduced the Fisher information as a Riemannian metric for parameter estimation, the field was formalized and named by Shun-ichi Amari in the 1980s, building on earlier work by Nikolai Chentsov in the 1960s. At its core, information geometry revolves around statistical manifolds, where the tangent space at each point is spanned by the score functions, and the Fisher-Rao metric g_{ij}(\theta) = \mathbb{E} \left[ \frac{\partial \log p}{\partial \theta^i} \frac{\partial \log p}{\partial \theta^j} \right] quantifies the distinguishability between nearby distributions. A hallmark is the use of dual connections, such as the exponential and mixture connections, which are conjugate with respect to the metric and lead to dually flat structures in exponential and mixture families of distributions, analogous to flat spaces in Euclidean geometry. These structures permit inference to be analyzed in terms of divergence functions, like the Kullback-Leibler divergence, connecting geometric distances to information loss in statistical estimation. Chentsov's theorem, a foundational result, characterizes the Fisher metric as the unique invariant metric (up to scaling) and the invariant affine connections as the one-parameter family determined by the Amari-Chentsov tensor. The field's applications span statistics, where it underpins bounds like the Cramér-Rao inequality via geometric interpretations, and extend to machine learning for optimizing neural networks and clustering via natural gradients. In physics, it models thermodynamics and optimal transport, while recent advances integrate it with machine learning and quantum information theory, evidenced by over 6,000 publications since 2019. Amari and Hiroshi Nagaoka's seminal book Methods of Information Geometry (2000) provides the rigorous foundation, emphasizing alpha-geometries for flexible divergence families. Rao's contributions were recognized with the 2023 International Prize in Statistics, underscoring the enduring impact of information geometry on modern statistics and data science.

Introduction and Overview

Definition

Information geometry is the study of invariant geometrical structures inherent in families of probability distributions, which are modeled as points on a manifold equipped with Riemannian and affine geometries. This field bridges statistics and differential geometry by treating parameterized collections of probability densities as geometric objects, allowing the application of tools such as metrics and connections to analyze statistical properties in a coordinate-independent manner. At its foundation, information geometry considers a statistical model defined by a parameter space \Theta, where each \theta \in \Theta specifies a probability density p(x|\theta) over a sample space \mathcal{X}. The set of all such distributions \{p(x|\theta) : \theta \in \Theta\} forms a manifold \mathcal{M}, typically finite-dimensional when \Theta is so, embedded within the infinite-dimensional space of all probability densities on \mathcal{X}. This manifold \mathcal{M} is endowed with geometric structures, including a Riemannian metric and affine connections, which provide a framework for measuring distances, angles, and curvatures among distributions in a way that respects the underlying probabilistic nature. Statistical models in this context are families of distributions that arise naturally in inference problems, such as exponential families, and the geometric approach applies because many statistical quantities—such as divergences or estimators—exhibit invariance under reparameterization of \Theta. Reparameterization changes the coordinates on \mathcal{M} but preserves intrinsic geometric features, enabling analyses that are robust to the choice of parameterization and facilitating comparisons across different models. This invariance underscores why differential geometry is apt for statistics: it captures essential symmetries and structures independent of arbitrary coordinate systems. A representative example is the family of univariate Gaussian distributions, parameterized by mean \mu \in \mathbb{R} and standard deviation \sigma > 0, which forms a two-dimensional manifold embedded in the infinite-dimensional space of all probability densities on \mathbb{R}. Points on this manifold correspond to densities p(x|\mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), and the geometry reveals hyperbolic structure: under the Fisher metric the manifold is isometric (up to a constant rescaling) to the hyperbolic half-plane, highlighting how variations in \mu and \sigma induce specific curvatures and geodesics.

Scope and Importance

Information geometry encompasses the study of probability distributions as geometric objects, forging deep connections between information theory—particularly through divergences like the Kullback-Leibler divergence—differential geometry via Riemannian metrics such as the Fisher information metric, and statistics in areas like parameter estimation. This interdisciplinary framework treats families of probability distributions as manifolds, allowing the application of geometric invariants to analyze statistical models beyond traditional Euclidean assumptions. The importance of information geometry lies in its provision of invariant tools for model comparison, optimization algorithms, and insights into statistical efficiency, including geometric interpretations of phenomena such as phase transitions in probabilistic models. By equipping data spaces with dual affine connections and metrics that respect the intrinsic structure of distributions, it facilitates a unified understanding of inference procedures, notably bridging frequentist and Bayesian approaches through shared geometric lenses. One of its key benefits is the ability to handle the curse of dimensionality in high-dimensional data spaces, yielding more robust and efficient algorithms for tasks like gradient-based learning and clustering, which outperform conventional methods in capturing statistical invariances. This geometric perspective not only enhances theoretical insights but also drives practical advancements across sciences by emphasizing structure over ad-hoc parametrizations.

Historical Development

Early Foundations

The foundations of information geometry trace back to early 20th-century developments in statistical theory, particularly Ronald A. Fisher's introduction of the information measure now bearing his name in 1922. In his seminal paper, Fisher proposed that the method of maximum likelihood provides estimators with desirable properties, such as efficiency, and defined the information matrix—derived from the expected second derivatives of the log-likelihood function—as a measure of the precision or variability in parameter estimates. This matrix quantified the amount of information about parameters contained in the data, serving as a foundational tool for assessing estimator performance without yet framing it in geometric terms. A pivotal advancement occurred in 1945 with C. Radhakrishna Rao's paper, where he explicitly interpreted the Fisher information matrix as a Riemannian metric on the manifold of parameter spaces for statistical models. Rao demonstrated that this metric induces a Riemannian geometry on the space of probability distributions, enabling the measurement of distances and angles between parametric families in a way analogous to classical differential geometry. This insight established the metric structure of statistics, allowing for the quantification of estimation accuracy and parameter sensitivity, and it directly generalized Fisher's information concept to multiparameter settings. Rao's formulation highlighted how the metric's positive definiteness ensures it defines a valid inner product for tangent vectors in the parameter space, laying the groundwork for subsequent geometric interpretations in statistics. In the following decades, from the late 1940s to the 1960s, these ideas connected to broader statistical developments, particularly in multivariate analysis and the geometric interpretation of the Cramér-Rao bound. Rao independently derived the Cramér-Rao lower bound on the variance of unbiased estimators in his 1945 work, showing that it arises naturally from the inverse of the Fisher information matrix, which geometrically represents the minimal uncertainty ellipsoid around parameter estimates. This bound, also obtained by Harald Cramér in 1946, was increasingly viewed through the lens of the Fisher-Rao metric, illustrating how the geometry constrains achievable precision in multiparameter problems. Rao extended these concepts to multivariate settings in subsequent papers, such as his 1948 exploration of distances between populations using this metric, which facilitated applications in classification and discrimination within high-dimensional data. In the 1960s, Nikolai Chentsov further advanced the theory by developing the geometry of statistical manifolds, proving the uniqueness (up to scaling) of the Fisher information metric—known as Chentsov's theorem—and introducing invariant affine connections parameterized by the Amari-Chentsov tensor, which provided the full differential geometric structure for statistical inference. These efforts emphasized the metric's role in quantifying statistical efficiency and established the core manifold theory, including affine connections, setting the stage for later syntheses.

Modern Formulation

The modern formulation of information geometry emerged in the 1970s through the pioneering efforts of Shun-ichi Amari, who built upon Chentsov's foundations to introduce the concepts of dual affine connections and dually flat structures, providing a rigorous geometric framework for statistical inference. These ideas endowed probability distributions with differential-geometric tools, such as torsion-free affine connections dual with respect to the Fisher information metric, enabling the analysis of exponential families as flat spaces. Amari's foundational work culminated in his 1985 monograph, which systematically developed these structures for parametric statistical models. During the 1980s and 1990s, Amari and Hiroshi Nagaoka further formalized information geometry as a comprehensive theory, emphasizing the duality between exponential and mixture connections and their role in optimization and inference. This period saw the incorporation of α-divergences, a parameterized family of divergences generalizing the Kullback-Leibler divergence, which generate the α-connections and facilitate invariant statistical procedures. Their collaborative textbook, published in 2000, synthesized these advances into a cohesive exposition, establishing dually flat manifolds as central to the field and influencing applications in statistics and machine learning. From the 2000s onward, information geometry expanded beyond parametric models to non-parametric settings, where divergences define infinite-dimensional manifolds without assuming finite-dimensional embeddings, as explored in works on non-parametric information geometry. Concurrently, computational tools emerged to approximate geometric quantities like geodesics and curvatures on high-dimensional probability spaces, enabling practical implementations in algorithms such as natural gradient methods. These developments were supported by the launch of the dedicated journal Information Geometry in 2018, which has fostered interdisciplinary research. Recent milestones include the Further Developments in Information Geometry (FDIG) conferences, with FDIG 2025 at the University of Tokyo addressing open problems such as lumpability in Markov chains, which concerns aggregating states while preserving geometric invariants. Advancements in time-reversibility geometries have also gained traction, with 2024 studies developing differential-geometric frameworks for time-reversed Markov processes on manifolds, extending duality to non-equilibrium systems. These efforts highlight ongoing theoretical refinements and applications in stochastic modeling.

Mathematical Framework

Probability Manifolds and Coordinates

In information geometry, the collection of all positive probability density functions with respect to a dominating measure μ on a sample space 𝒳 forms an infinite-dimensional manifold, often denoted as 𝒮. This manifold structure arises because densities can be parameterized locally, and operations like addition and scalar multiplication of tangent vectors correspond to pointwise operations on functions. However, for tractable analysis in statistical applications, focus is placed on finite-dimensional submanifolds 𝒫 = {p_θ : θ ∈ Θ ⊂ ℝ^k}, where p_θ(x) denotes a smooth family of probability densities, and Θ is an open connected set ensuring regularity conditions such as positivity and integrability. The parameters θ serve as a natural coordinate system on the manifold 𝒫, allowing points to be identified with distributions via θ ↦ p_θ. In the special case of exponential families, which admit the canonical form p_θ(x) = h(x) exp(θ · T(x) - ψ(θ)) with sufficient statistics T(x) = (T_1(x), ..., T_k(x)), fixed base measure h(x), and log-partition function ψ(θ) = log ∫ h(x) exp(θ · T(x)) μ(dx), a dual coordinate system η emerges, known as the expectation parameters. These η coordinates are affine in the mixture representation and provide a complementary parameterization, enabling dually flat structures essential for geometric interpretations of divergences. The duality between θ and η is captured by the relation η = ∇_θ ψ(θ), where ∇_θ denotes the gradient with respect to θ. To derive this, consider the normalization condition ∫ p_θ(x) μ(dx) = 1, which implies ψ(θ) = log ∫ h(x) exp(θ · T(x)) μ(dx). Differentiating both sides with respect to θ_i yields ∂ψ/∂θ_i = ∫ T_i(x) p_θ(x) μ(dx) = E_{p_θ}[T_i(X)], since ∂/∂θ_i log p_θ(x) = T_i(x) - ∂ψ/∂θ_i and taking expectations gives E[∂/∂θ_i log p_θ] = 0 = E[T_i(X)] - ∂ψ/∂θ_i. Thus, the i-th component satisfies η_i = E_{p_θ}[T_i(X)] = ∂ψ/∂θ_i, establishing the coordinate transformation via the convex potential ψ. The inverse relation θ = ∇_η φ(η) holds through the Legendre-Fenchel dual potential φ(η) = sup_θ (θ · η - ψ(θ)), ensuring a one-to-one correspondence in the interior of the domain. The tangent space T_p 𝒫 at a point p = p_θ consists of all functions f : 𝒳 → ℝ that are square-integrable with respect to p (i.e., E_p[f²] < ∞) and satisfy the centering condition E_p[f] = ∫ f(x) p(x) μ(dx) = 0, forming a Hilbert space under the inner product ⟨f, g⟩_p = E_p[f g]. A natural coordinate frame for this tangent space is provided by the score functions l_i(x; θ) = ∂/∂θ_i log p_θ(x), i = 1, ..., k, which are random variables representing infinitesimal changes in the log-likelihood and span T_p 𝒫 under mild regularity assumptions ensuring linear independence. These score functions satisfy E_p[l_i] = 0, as derived from differentiating the normalization ∫ p_θ μ(dx) = 1, and in the exponential family case, they take the explicit form l_i(x; θ) = T_i(x) - η_i, highlighting their role in bridging the dual coordinates.
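
The derivation above can be checked numerically. The following is a minimal sketch (plain NumPy; the Bernoulli family and all numerical values are chosen only for illustration) of the θ–η duality for the Bernoulli distribution viewed as an exponential family with T(x) = x and ψ(θ) = log(1 + e^θ): the gradient of ψ reproduces the expectation parameter, and the gradient of the Legendre dual potential recovers θ.

```python
# A minimal sketch (not from the article) of the dual coordinates theta <-> eta
# for the Bernoulli family as an exponential family
# p_theta(x) = exp(theta*x - psi(theta)), with T(x) = x.
import numpy as np

def psi(theta):
    """Log-partition function of the Bernoulli exponential family."""
    return np.log1p(np.exp(theta))

def eta_from_theta(theta):
    """Expectation parameter eta = dpsi/dtheta = E_theta[T(X)] = P(X = 1)."""
    return 1.0 / (1.0 + np.exp(-theta))

def phi(eta):
    """Legendre dual potential phi(eta) = sup_theta (theta*eta - psi(theta)),
    which equals eta*log(eta) + (1 - eta)*log(1 - eta) (negative entropy)."""
    return eta * np.log(eta) + (1 - eta) * np.log(1 - eta)

theta = 0.7
h = 1e-6

# Numerical derivative of psi matches the expectation parameter eta.
dpsi = (psi(theta + h) - psi(theta - h)) / (2 * h)
eta = eta_from_theta(theta)
print(dpsi, eta)          # both ~ 0.668: eta = grad psi(theta)

# The inverse map theta = grad phi(eta) recovers the natural parameter.
dphi = (phi(eta + h) - phi(eta - h)) / (2 * h)
print(dphi, theta)        # both ~ 0.7: theta = grad phi(eta)
```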

Fisher Information Metric

The Fisher information metric, also known as the Fisher-Rao metric, provides the fundamental Riemannian structure on a statistical manifold parameterized by θ, where the metric tensor is given by g_{ij}(\theta) = \int \left( \frac{\partial \log p(x|\theta)}{\partial \theta_i} \right) \left( \frac{\partial \log p(x|\theta)}{\partial \theta_j} \right) p(x|\theta) \, dx = \mathbb{E} \left[ \frac{\partial \log p(X|\theta)}{\partial \theta_i} \frac{\partial \log p(X|\theta)}{\partial \theta_j} \right], with the expectation taken over X ∼ p(·|θ). This expression represents the covariance matrix of the score functions ∂ log p / ∂θ_i, which quantify the sensitivity of the log-likelihood to parameter changes. The metric arises naturally from the second-order Taylor expansion of the Kullback-Leibler divergence between nearby distributions p_θ and p_{θ + δθ}, where D_{KL}(p_θ || p_{θ + δθ}) ≈ (1/2) δθ^T g(θ) δθ, capturing the local curvature of divergence in parameter space. Alternatively, it emerges as the covariance of the score functions under regularity conditions ensuring differentiability and integrability of the densities. A key property is its invariance under reparameterization of θ and under transformations via sufficient statistics, as established by Chentsov's theorem, which characterizes the Fisher metric as the unique (up to positive scaling) Riemannian metric on statistical models preserving this invariance. The Fisher metric is positive semi-definite, becoming positive definite on identifiable models, and defines a Riemannian distance via the infimum of curve lengths ∫ √(θ̇^T g(θ(t)) θ̇) dt, where θ̇ = dθ/dt, along paths θ(t) connecting distributions. In non-parametric settings over densities on a domain, the metric embeds isometrically into the unit sphere in L^2 space via the map p ↦ 2√p, where the induced geometry matches the spherical L^2 metric up to scaling. For the univariate Gaussian family N(μ, σ^2) with both parameters unknown, the metric in (μ, σ) coordinates is diagonal: g = diag(1/σ^2, 2/σ^2), reflecting the distinct scaling of uncertainty in location and scale parameters. This example illustrates how the metric quantifies parameter distinguishability, with larger values indicating higher information content for estimation. In his seminal 1945 work, C.R. Rao first interpreted the Fisher information as a metric tensor measuring the attainable accuracy and uncertainty in estimating statistical parameters from data.
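
As a hedged illustration of the definition, the sketch below (NumPy only; the sample size and parameter values are arbitrary) estimates the Fisher metric of the univariate Gaussian as the covariance of the score functions and compares it with the closed form g = diag(1/σ², 2/σ²) quoted above.

```python
# A minimal sketch checking the closed-form Fisher metric for N(mu, sigma)
# in (mu, sigma) coordinates against a Monte Carlo estimate of the
# score-function covariance.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8
n = 200_000
x = rng.normal(mu, sigma, size=n)

# Score functions d/d(mu) log p and d/d(sigma) log p of the Gaussian density.
score_mu = (x - mu) / sigma**2
score_sigma = (x - mu) ** 2 / sigma**3 - 1.0 / sigma

scores = np.stack([score_mu, score_sigma], axis=1)
g_mc = scores.T @ scores / n             # Monte Carlo E[score score^T]
g_exact = np.diag([1 / sigma**2, 2 / sigma**2])
print(np.round(g_mc, 3))
print(np.round(g_exact, 3))              # the two matrices agree up to sampling noise
```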

Affine Connections and Duality

In information geometry, affine connections provide the structure for parallel transport on statistical manifolds, enabling the definition of geodesics and covariant derivatives in the space of probability distributions. These connections are represented by Christoffel symbols \Gamma^k_{ij}, which quantify how coordinate vectors are transported along curves without torsion, ensuring a consistent notion of "straight lines" in curved spaces of distributions. A central feature is the duality between two torsion-free affine connections, denoted \nabla^{(1)} (the exponential connection) and \nabla^{(-1)} (the mixture connection), which are jointly compatible with the Fisher information metric g: although neither satisfies \nabla g = 0 individually, parallel transport by one connection paired with parallel transport by its dual preserves the metric. The duality relation is expressed as X g(Y, Z) = g(\nabla_X Y, Z) + g(Y, \nabla^*_X Z), where \nabla^* is the dual connection to \nabla, linking the exponential and mixture perspectives in a symmetric manner. Dually flat manifolds arise when both connections are flat, meaning there exist dual coordinate systems (e.g., \theta for \nabla^{(1)} and \eta for \nabla^{(-1)}) in which the respective Christoffel symbols vanish, simplifying geodesics to straight lines. In such spaces, the Pythagorean theorem holds for orthogonal projections onto affine subspaces: for points p, q, r such that the \nabla-geodesic from p to r and the \nabla^*-geodesic from r to q meet orthogonally at r, the divergence satisfies D(p \| q) = D(p \| r) + D(r \| q), reflecting a geometric additivity akin to that of Euclidean spaces. The family of \alpha-connections generalizes this duality, with Christoffel symbols given by \Gamma^{(\alpha) k}_{ij} = \frac{1 + \alpha}{2} \Gamma^{(1) k}_{ij} + \frac{1 - \alpha}{2} \Gamma^{(-1) k}_{ij}, where \alpha = 1 recovers the exponential connection and \alpha = -1 the mixture connection; these interpolate between the dual extremes while maintaining the joint compatibility with the metric. This framework of dual affine connections was formalized by Amari in 1982, unifying the mixture and exponential viewpoints on statistical inference through geometric duality. The joint compatibility of the Fisher metric with these connections ensures that the geometry remains invariant under reparameterization, extending the Riemannian structure to a fuller affine picture.

Core Concepts

Dual Geometries: Exponential and Mixture

In information geometry, exponential families provide a fundamental structure where probability distributions are parameterized in a form that endows the manifold with flatness under the exponential affine connection \nabla^{(1)}. Specifically, an exponential family is defined by densities of the form p(x|\theta) = \exp(\theta \cdot t(x) - \psi(\theta)), where \theta are the natural parameters, t(x) is the sufficient statistic vector, and \psi(\theta) = \log \int \exp(\theta \cdot t(x)) \, dx is the log-partition function ensuring normalization. This parameterization makes the manifold flat with respect to \nabla^{(1)}, meaning parallel transport along the connection preserves the exponential structure, and the natural parameters \theta serve as affine coordinates. Dually, mixture families consist of convex combinations of fixed base distributions, given by p(x|w) = \sum_i w_i p_i(x) where w = (w_i) are mixture weights satisfying \sum_i w_i = 1 and w_i \geq 0. These families are flat under the mixture affine connection \nabla^{(-1)}, with the mixture parameters w acting as affine coordinates, allowing straight-line interpolations in the mixture space to correspond to geodesics. The duality between exponential and mixture structures arises through the Legendre transform, which relates the log-partition (cumulant generating) function \psi(\theta) = \log \int \exp(\theta \cdot t(x)) \, dx to its convex conjugate \psi^*(\eta), where \eta = \nabla_\theta \psi(\theta) represents the expectation parameters; this transform establishes a bijection between the natural \theta-coordinates and the dual \eta-coordinates, highlighting the interplay between the two geometries. Key properties of these dual geometries include the distinction between m-geodesics, which are straight lines in the mixture connection and preserve convex combinations, and e-geodesics, which are straight in the exponential connection and follow exponential interpolations. A remarkable feature is the orthogonality of m- and e-geodesics with respect to the Fisher information metric, ensuring that projections along one geodesic are perpendicular to the other, which underpins geometric interpretations of divergences and projections in statistical inference. An illustrative example is the multinomial distribution, which can be expressed both as an exponential family with natural parameters \theta_k = \log(p_k / p_K) for categories k = 1, \dots, K-1 (flat under \nabla^{(1)}) and as a mixture family of Dirac measures on the categories (flat under \nabla^{(-1)}), demonstrating how the dual structures unify discrete probability models within the information geometric framework.
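
The two flat structures can be made concrete on the probability simplex. The sketch below (NumPy; the two categorical distributions are arbitrary) traces the m-geodesic and the e-geodesic between the same pair of distributions and shows that they pass through different intermediate points, reflecting the distinct mixture and exponential affine structures.

```python
# A minimal sketch contrasting the m-geodesic (affine in probabilities) with the
# e-geodesic (affine in log-probabilities, renormalized) between two categoricals.
import numpy as np

p = np.array([0.7, 0.2, 0.1])
r = np.array([0.1, 0.3, 0.6])

def m_geodesic(t):
    """Mixture geodesic: straight line in the mixture (probability) coordinates."""
    return (1 - t) * p + t * r

def e_geodesic(t):
    """Exponential geodesic: straight line in log-probabilities, then renormalized."""
    q = np.exp((1 - t) * np.log(p) + t * np.log(r))
    return q / q.sum()

for t in (0.0, 0.5, 1.0):
    print(t, np.round(m_geodesic(t), 3), np.round(e_geodesic(t), 3))
# At t = 0.5 the two curves pass through different distributions, reflecting the
# two distinct flat (affine) structures on the simplex.
```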

Bregman Divergences and Alpha-Connections

Bregman divergences form a fundamental class of measures in information geometry, generated by a strictly convex and differentiable potential function F. For probability distributions p and q represented in coordinates where F is defined, the Bregman divergence is given by D_F(p \parallel q) = F(p) - F(q) - \nabla F(q) \cdot (p - q), where \nabla F(q) denotes the gradient of F at q, and \cdot is the inner product. This expression captures the difference between the function value at p and its first-order Taylor approximation at q, ensuring D_F(p \parallel q) \geq 0 with equality if and only if p = q, due to the convexity of F. In the context of statistical manifolds, Bregman divergences induce dual affine connections compatible with the Fisher information metric, providing a geometrical structure distinct from other divergence families. A prominent special case is the Kullback-Leibler (KL) divergence, which arises when F is the negentropy (negative Shannon entropy), F(\theta) = \sum_i \theta_i \log \theta_i for probability coordinates \theta on the simplex. This yields D_F(p \parallel q) = \sum_i p_i \log (p_i / q_i), the standard KL divergence, highlighting how Bregman forms encompass key information-theoretic measures. Alpha-connections generalize the affine connections on statistical manifolds, parameterized by \alpha \in \mathbb{R}, and arise from alpha-mixtures of distributions. The connection \nabla^{(\alpha)} interpolates between the exponential connection at \alpha = 1, which is flat on exponential families, and the mixture connection at \alpha = -1, which is flat on mixture families. For \alpha = 0, it reduces to the Levi-Civita connection of the Fisher information metric. This family provides a unified framework for deforming the geometry of probability manifolds while preserving duality with respect to the Fisher metric. The relationship between Bregman divergences and alpha-connections is mediated by the third-order cumulant tensor, also known as the Amari-Chentsov tensor, which governs the difference between dual connections. A Bregman divergence induces an alpha-connection through this tensor, with the parameter \alpha controlling the interpolation; the connections exhibit monotonicity in \alpha, meaning geodesics and parallel transport vary continuously and increasingly from mixture-like to exponential-like structures as \alpha increases. The alpha-divergence, central to this interplay, is defined for \alpha \neq \pm 1 as D^{(\alpha)}(p \parallel q) = \frac{4}{1 - \alpha^2} \left[ 1 - \int p^{\frac{1 + \alpha}{2}} q^{\frac{1 - \alpha}{2}} \, dx \right], recovering the KL divergence (in direct or reversed form) in the limits \alpha \to 1 and \alpha \to -1. This form ensures compatibility with alpha-connections, as its infinitesimal version aligns with the cubic tensor. Alpha-divergences uniquely unify the classes of f-divergences and Bregman divergences, as they satisfy the conditions for both families simultaneously. This unification facilitates multi-parameter generalizations, extending the alpha-family to higher-order structures while preserving the dual flatness essential for information projections and statistical inference.
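
A short numerical check (NumPy; the example distributions are arbitrary) illustrates the definitions: a generic Bregman divergence generated by the negative-entropy potential reproduces the Kullback-Leibler divergence on the simplex, and the alpha-divergence approaches the same value as α → 1.

```python
# A minimal sketch of a Bregman divergence generated by a convex potential F,
# checking that the negative-entropy potential reproduces KL divergence,
# and that the alpha-divergence approaches KL as alpha -> 1.
import numpy as np

def bregman(F, grad_F, p, q):
    """D_F(p || q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - grad_F(q) @ (p - q)

F = lambda v: np.sum(v * np.log(v))          # negative Shannon entropy
grad_F = lambda v: np.log(v) + 1.0           # componentwise gradient

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

kl = np.sum(p * np.log(p / q))
print(bregman(F, grad_F, p, q), kl)          # identical for normalized p, q

alpha = 0.999
d_alpha = 4 / (1 - alpha**2) * (1 - np.sum(p**((1 + alpha) / 2) * q**((1 - alpha) / 2)))
print(d_alpha)                               # approaches KL(p || q) as alpha -> 1
```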

Invariant Structures and Curvature

In information geometry, the invariant structures of statistical manifolds include the torsion-free property of the affine connections and the metric compatibility of the Fisher information metric with the Levi-Civita connection. The connections \nabla^{(\alpha)} and their duals \nabla^{(- \alpha)} are torsion-free, meaning \nabla_X Y - \nabla_Y X = [X, Y] for vector fields X, Y, ensuring no intrinsic twisting in the geometry. The Fisher metric g is compatible with the \alpha = 0 (Levi-Civita) connection, X g(Y, Z) = g(\nabla^{(0)}_X Y, Z) + g(Y, \nabla^{(0)}_X Z), equivalently \nabla^{(0)} g = 0, which preserves lengths and angles under parallel transport along \nabla^{(0)}. These properties, along with the invariance of the Fisher metric and higher-order tensors under sufficient statistics and Markov embeddings, form the core invariants that distinguish information geometry from classical Riemannian geometry. The curvature of a statistical manifold is quantified by the Riemann curvature tensor R^{(\alpha)}(X, Y)Z = \nabla^{(\alpha)}_X \nabla^{(\alpha)}_Y Z - \nabla^{(\alpha)}_Y \nabla^{(\alpha)}_X Z - \nabla^{(\alpha)}_{[X, Y]} Z associated with each \alpha-connection, measuring the deviation from flatness by capturing how parallel transport fails to commute. This tensor determines sectional curvatures, Ricci curvatures, and the scalar curvature, with the latter providing a measure of overall information loss in curved exponential families compared to their flat embeddings. The Amari-Chentsov tensor, a totally symmetric cubic tensor C_{ijk}, arises as the difference between the dual connections, C(X, Y, Z) = g(\nabla^{(-1)}_X Y - \nabla^{(1)}_X Y, Z), and quantifies the skewness of the structure. In exponential flat coordinates, where the exponential connection \nabla^{(1)} has vanishing Christoffel symbols, it takes the form C_{ijk} = \partial_k g_{ij}, reflecting its role as the third derivative of the potential function \psi underlying the metric g_{ij} = \partial_i \partial_j \psi. As a tensor it is invariant under coordinate transformations and, by Chentsov's theorem, is unique up to scaling, similar to the Fisher metric itself. The non-metricity induced by C for general \alpha-connections is (\nabla^{(\alpha)}_X g)(Y, Z) = \alpha \, C(X, Y, Z), linking the tensor to the departure from metric compatibility. The \alpha-curvature, or the Riemann tensor R^{(\alpha)} of the \alpha-connection, depends parametrically on \alpha, and the dual curvatures are related by g(R^{(\alpha)}(X, Y)Z, W) = -g(Z, R^{(-\alpha)}(X, Y)W), so that an \alpha-connection is flat precisely when its dual (-\alpha)-connection is flat. In dually flat manifolds, where both \nabla^{(1)} and \nabla^{(-1)} have vanishing curvature (R^{(\pm 1)} = 0), this flatness enables Pythagorean relations in projections. A key fact in information geometry is that exponential families in their natural parameterization are dually flat and, more generally, every dually flat manifold is generated by a Bregman divergence associated with a convex potential.

Methods and Tools

Natural Gradient Descent

In information geometry, natural gradient descent is an optimization method that leverages the Riemannian structure of the statistical manifold to perform steepest descent in a geometrically informed manner. Unlike standard gradient descent, which treats the parameter space \theta as Euclidean and updates via \theta_{t+1} = \theta_t - \epsilon \nabla_\theta L(\theta_t) where L is the loss function, the natural gradient preconditions the update to account for the curvature induced by the Fisher information metric g(\theta). This preconditioning equalizes the effective step sizes across directions, preventing inefficient progress in regions where parameters are highly correlated or the manifold is ill-conditioned. The natural gradient \tilde{\nabla}_\theta L = g(\theta)^{-1} \nabla_\theta L arises from the geometry of probability distributions, where the Fisher metric defines distances that reflect the distinguishability of distributions. Its derivation stems from seeking the direction of steepest descent of the loss for a fixed Riemannian step length on the manifold; specifically, the second-order approximation of the KL divergence between nearby distributions p_\theta and p_{\theta + d\theta} is \frac{1}{2} d\theta^\top g(\theta) d\theta, leading to the natural gradient as the direction of greatest decrease under the constraint \|d\theta\|_g = 1. This aligns with geodesic flow, as the natural gradient points along the initial direction of the geodesic, the shortest path on the manifold, thereby ensuring updates follow the intrinsic geometry rather than arbitrary coordinates. The corresponding update rule for natural gradient descent is given by \theta_{t+1} = \theta_t - \epsilon \, g(\theta_t)^{-1} \nabla_\theta L(\theta_t), where \epsilon > 0 is the learning rate and g(\theta_t) is the Fisher information matrix at \theta_t. This formulation is particularly effective for online learning, where it achieves Fisher efficiency, meaning the parameter estimates converge at the same asymptotic rate as the optimal batch estimator using the full dataset. Key properties of natural gradient descent include its invariance to reparameterization: if \theta = \phi(\psi) for a smooth invertible map \phi, the metric transforms as g_\psi = J^\top g_\theta J where J = \partial \phi / \partial \psi, so that the natural gradients satisfy \tilde{\nabla}_\psi L = J^{-1} \tilde{\nabla}_\theta L and the descent direction in distribution space is unchanged. Additionally, it exhibits faster convergence in curved parameter spaces, such as those encountered in neural network training, by mitigating issues like plateaus and slow adaptation in highly nonlinear models. As an illustrative example, consider softmax regression, where parameters represent weights in a single-layer network with a softmax output. Standard gradient descent treats the weights independently, but they exhibit correlations due to the normalization implicit in the probability outputs; the natural gradient, by inverting the Fisher matrix, adjusts for these correlations, promoting more balanced updates and reducing the number of iterations needed for convergence compared to the Euclidean approach. Recent extensions as of 2025 include stochastic natural gradient descent for efficient optimization in multi-layer models and quantum natural gradients for variational quantum algorithms, addressing scalability and non-Euclidean parameter spaces.
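
As a hedged illustration, the sketch below (NumPy; the toy problem, learning rate, and step counts are chosen only for demonstration) fits a univariate Gaussian by gradient descent on the negative log-likelihood and compares the plain update with the natural-gradient update that uses the closed-form inverse Fisher matrix diag(σ², σ²/2).

```python
# A minimal sketch of natural gradient descent for fitting N(mu, sigma) by
# maximum likelihood, using the closed-form Fisher metric g = diag(1/sigma^2,
# 2/sigma^2) from the previous section.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(3.0, 2.0, size=5_000)

def neg_log_lik_grad(mu, sigma):
    """Gradient of the average negative log-likelihood w.r.t. (mu, sigma)."""
    d_mu = -(data - mu).mean() / sigma**2
    d_sigma = 1.0 / sigma - ((data - mu) ** 2).mean() / sigma**3
    return np.array([d_mu, d_sigma])

def fit(natural, steps=200, lr=0.1):
    theta = np.array([0.0, 1.0])                             # (mu, sigma) initial guess
    for _ in range(steps):
        grad = neg_log_lik_grad(*theta)
        if natural:
            g_inv = np.diag([theta[1] ** 2, theta[1] ** 2 / 2])  # inverse Fisher matrix
            grad = g_inv @ grad                               # precondition: natural gradient
        theta = theta - lr * grad
        theta[1] = max(theta[1], 1e-3)                        # keep sigma positive
    return theta

print("natural :", np.round(fit(True), 3))
print("vanilla :", np.round(fit(False), 3))
# Both converge toward (3, 2); the natural-gradient iterates are insensitive to
# the scale of sigma and typically approach the optimum in far fewer steps.
```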

Information Projections

In information geometry, information projections provide a method for mapping a probability distribution onto a submanifold of a statistical manifold while minimizing a divergence measure, leveraging the affine structures to ensure geometric consistency. The standard information projection, or I-projection, of a distribution p onto a convex set \mathcal{C} is defined as the unique element q \in \mathcal{C} that minimizes the Kullback-Leibler (KL) divergence D(p \| q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx. This projection satisfies a Pythagorean relation when the target set is flat with respect to the exponential connection: if q is the projection of p onto an e-flat submanifold S, then for every r \in S, D(p \| r) = D(p \| q) + D(q \| r), with the geodesic from p to q meeting S orthogonally under the Fisher information metric. The dual geometry introduces two primary types of projections corresponding to the mixture and exponential families. The m-projection minimizes D(p \| q) over q in an e-flat submanifold, affine in the natural coordinates \theta (linear constraints on the log-densities), and reaches its target along an m-geodesic q_t(x) = (1-t) p(x) + t r(x). Conversely, the e-projection minimizes the reverse divergence D(q \| p) over q in an m-flat submanifold, affine in the expectation coordinates \eta (linear constraints on expectations of sufficient statistics), and follows an e-geodesic q_t(x) = \exp\left( (1-t) \log p(x) + t \log r(x) - c(t) \right), where c(t) normalizes the density. These projections exploit the duality: the m-projection of p onto an e-flat set is dual to the e-projection, which minimizes the reverse divergence D(q \| p) onto an m-flat set, ensuring uniqueness, and the Pythagorean decomposition holds due to the orthogonality of e- and m-geodesics. Generalizations extend to \alpha-projections using \alpha-connections, where the \alpha-divergence D^{(\alpha)}(p \| q) = \frac{4}{1-\alpha^2} \left( 1 - \sum_x p(x)^{\frac{1+\alpha}{2}} q(x)^{\frac{1-\alpha}{2}} \right) (for \alpha \neq \pm 1) is minimized onto \alpha-flat subspaces; the m-projection corresponds to \alpha = -1 (mixture-like) and the e-projection to \alpha = 1 (exponential-like). For computing e-projections onto linear constraints on expectations (defining m-flat subspaces), such as E_q[s(X)] = \mu for sufficient statistics s(X), the solution is an exponential tilting of p, q_\eta(x) \propto p(x) \exp(\eta \cdot s(x)), with \eta determined by \nabla \psi(\eta) = \mu, where \psi(\eta) = \log \int p(x) \exp(\eta \cdot s(x)) \, \mu(dx) is the log-partition function; this condition arises from the Lagrange multipliers of the constrained minimization. Algorithms for these projections often rely on iterative methods grounded in the dual potentials. Iterative scaling, for instance, computes such projections by successively adjusting the tilting parameters to match the moment constraints through multiplicative updates q^{(k+1)}(x) \propto q^{(k)}(x) \exp\left( \sum_j \lambda_j^{(k)} s_j(x) \right) followed by renormalization, converging to the projection under suitable regularity. Mirror descent variants use Bregman divergences derived from the potential \psi and its dual \phi to perform generalized projections, with steps mirroring the geometry of the manifold. A notable application is the expectation-maximization (EM) algorithm, which in exponential families corresponds to alternating e-projections (E-step: projection onto the submanifold of distributions consistent with the observed data) and m-projections (M-step: projection onto the model submanifold), ensuring monotonic decrease in the free energy and convergence to a local maximum likelihood estimate. 
As of 2025, these projections have been extended in machine learning contexts, such as alpha-projections for variational inference in generative models, enhancing robustness in high-dimensional spaces.
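
The exponential-tilting construction and the Pythagorean identity can be verified on a small discrete example. The sketch below (NumPy with a simple bisection solver; the support, statistic values, and target moment are hypothetical) computes the e-projection of p onto the m-flat set {q : E_q[s] = μ₀} and checks that D(r‖p) = D(r‖q*) + D(q*‖p) for another member r of the constraint set.

```python
# A minimal sketch of the e-projection of p onto the m-flat set
# C = {q : E_q[s] = mu0}, computed by exponentially tilting p, followed by a
# numerical check of the Pythagorean relation D(r||p) = D(r||q*) + D(q*||p).
import numpy as np

def kl(a, b):
    return np.sum(a * np.log(a / b))

s = np.array([0.0, 1.0, 2.0, 3.0, 4.0])           # sufficient statistic values
p = np.array([0.40, 0.25, 0.15, 0.12, 0.08])      # distribution to project
mu0 = 2.5                                         # target expectation E_q[s]

def tilt(base, lam):
    """Exponential tilting q(x) proportional to base(x) * exp(lam * s(x))."""
    w = base * np.exp(lam * s)
    return w / w.sum()

def project(base, target, lo=-20.0, hi=20.0, iters=200):
    """Solve E_{tilt(base, lam)}[s] = target for lam by bisection
    (the moment map is monotone increasing in lam)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilt(base, mid) @ s < target:
            lo = mid
        else:
            hi = mid
    return tilt(base, 0.5 * (lo + hi))

q_star = project(p, mu0)                          # e-projection of p onto C
r = project(np.ones(5) / 5, mu0)                  # a different element of C

print(q_star @ s, r @ s)                          # both ~ 2.5: constraints hold
print(kl(r, p), kl(r, q_star) + kl(q_star, p))    # Pythagorean identity (equal)
```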

Geometric Statistical Inference

Geometric statistical inference leverages the Riemannian structure of probability manifolds to analyze estimation and testing procedures, providing frameworks for understanding statistical efficiency and asymptotic behavior. In this context, maximum likelihood estimation (MLE) emerges as a natural geometric operation: given an i.i.d. sample from an unknown distribution, the empirical distribution \hat{p} defines a point in the ambient space of distributions, and for an exponential family model, which forms an e-flat submanifold, the MLE corresponds to the m-projection of this empirical measure onto the model manifold, minimizing the Kullback-Leibler divergence D(\hat{p} \| p_\theta) where p_\theta belongs to the parametric family. This projection property ensures that the MLE achieves minimal information loss within the model, aligning with the dual affine connections of the manifold. Contrast functions play a central role in defining invariant measures for estimation tasks within information geometry. A contrast function C(\theta, \theta') on the parameter space is a smooth, non-negative function vanishing only when \theta = \theta', typically a divergence D(\theta \| \theta'), which generates the metric and the dual connections via its second- and third-order derivatives evaluated on the diagonal. These functions are invariant under reparametrization of the model, as their induced geometric structures—the metric from second derivatives, affine connections from third derivatives—depend solely on the intrinsic differential properties of the family, ensuring that estimation procedures remain consistent across coordinate choices. Efficiency bounds in geometric statistical inference are derived from the Fisher-Rao metric, yielding a geometric interpretation of the Cramér-Rao lower bound (CRLB). The CRLB states that for an unbiased estimator \hat{\theta} of a parameter \theta, the covariance matrix satisfies \operatorname{Cov}(\hat{\theta}) \succeq g^{-1}(\theta)/n, where g(\theta) is the Fisher information metric tensor and n is the sample size; the bound is geometrically the inverse metric scaled by sample size, reflecting the minimal volume of the uncertainty ellipsoid in parameter estimation. Asymptotically, the MLE attains this bound, with its covariance approaching g^{-1}(\theta)/n, but higher-order corrections involving the manifold's curvature—such as the Amari \alpha-curvatures—account for bias and refine the approximation for finite samples, ensuring asymptotic invariance under smooth reparametrizations. The use of dual coordinates further enhances geometric inference, particularly for Bayesian updates. In exponential families, the natural and dual (expectation) coordinates parameterize the manifold so that conjugate priors and their posteriors remain within the family, simplifying updates as m-projections onto e-flat submanifolds defined by the data; this duality interchanges the exponential and mixture descriptions, allowing Bayesian updating to be viewed as an information projection that preserves the geometric structure without coordinate-dependent computations. Recent advances as of 2025 integrate these inference methods with large-scale machine learning, including geometric approaches to approximate inference in deep models.
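
As a hedged numerical illustration of the geometric Cramér-Rao bound (NumPy; the Bernoulli parameter, sample size, and number of repetitions are arbitrary), the sketch below compares the Monte Carlo variance of the MLE with g⁻¹(θ)/n, which the Bernoulli mean attains.

```python
# A minimal sketch of the geometric Cramer-Rao bound Cov(theta_hat) >= g^{-1}(theta)/n:
# the MLE of a Bernoulli parameter attains the bound, whose inverse metric is
# g^{-1}(theta) = theta * (1 - theta).
import numpy as np

rng = np.random.default_rng(2)
theta, n, trials = 0.3, 400, 20_000

samples = rng.binomial(1, theta, size=(trials, n))
mle = samples.mean(axis=1)               # MLE of theta in each repeated experiment

empirical_var = mle.var()
crlb = theta * (1 - theta) / n           # g^{-1}(theta)/n with g = 1/(theta(1-theta))
print(empirical_var, crlb)               # the two values agree closely
```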

Applications

Statistics and Hypothesis Testing

Information geometry provides a framework for understanding model selection in statistics through the lens of divergence minimization on statistical manifolds. The Akaike information criterion (AIC) emerges naturally as an approximation to the expected Kullback-Leibler divergence between the true distribution and the fitted model, incorporating a penalty term of twice the number of free parameters, which in the geometric picture is the trace of the product of the inverse expected information and the observed information when the model is well specified, quantifying the local dimensionality of the parameter space. Similarly, the Bayesian information criterion (BIC) penalizes model complexity using the number of parameters multiplied by the log of the sample size, reflecting the volume scaling of the parameter manifold under Laplace approximations for regular models. These criteria leverage the Riemannian metric induced by the Fisher information to balance goodness-of-fit against complexity, ensuring asymptotic consistency in model ranking for well-specified, regular statistical families. In singular statistical models, such as overparameterized ones where the Fisher information matrix degenerates (e.g., reduced-rank regression or mixture models with identical components), standard AIC and BIC fail due to non-identifiability and irregular asymptotics. Information geometry addresses this via resolution of singularities, drawing from algebraic geometry to blow up singular points in the parameter space, transforming the degenerate manifold into a smooth one where asymptotic expansions can be rigorously derived. Sumio Watanabe's singular learning theory applies Hironaka's resolution theorem to decompose the Bayesian stochastic complexity into terms involving the real log canonical threshold \lambda (a measure of singularity resolution difficulty) and the multiplicity m of the resolved fiber, yielding generalized criteria such as the widely applicable information criterion (WAIC) and the widely applicable Bayesian information criterion (WBIC), which replace the parameter count with these geometric invariants (for instance, the Bayes free energy grows as -\log L + \lambda \log n rather than -\log L + (d/2) \log n), enabling accurate model selection in irregular cases. This approach resolves overfitting in high-dimensional models by accounting for the effective dimensionality near singularities, as demonstrated in neural networks where \lambda \leq d/2 with d the parameter count. Hypothesis testing in information geometry reframes classical procedures geometrically, interpreting the Neyman-Pearson lemma as a minimization of divergences subject to error rate constraints on the statistical manifold. The optimal test threshold corresponds to the level set of the log-likelihood ratio, which minimizes the Kullback-Leibler divergence from the null to the alternative while controlling the type I error, aligning with the geodesic projection in the dual affine structure. The generalized likelihood ratio test (GLRT) acquires a geometric interpretation via alpha-connections, where the test statistic approximates the squared geodesic distance between nested submanifolds, enabling higher-order corrections to the null distribution through the Amari-Chentsov tensor and curvature terms. For misspecified models, where the true distribution lies outside the assumed family, the Takeuchi information criterion (TIC) extends AIC by replacing the parameter count with \operatorname{tr}(J^{-1} I), where J is the negative expected Hessian of the log-likelihood and I the covariance of the score under the true distribution; geometrically, this captures the mismatch between the quasi-Fisher metric and the true geometry, providing a divergence-based penalty that adjusts for projection errors onto the misspecified manifold. A representative application is testing for normality using alpha-divergences of varying order, which parameterize a family of f-divergences inducing dual alpha-connections on the manifold of distributions. 
The alpha-divergence of order \alpha, defined as D_\alpha(p \| q) = \frac{4}{1-\alpha^2} \left(1 - \int p^{(1+\alpha)/2} q^{(1-\alpha)/2} \, dx \right) for \alpha \in (-1,1), yields test statistics whose asymptotic behavior under the null hypothesis of normality varies with \alpha, allowing robust goodness-of-fit assessment; for instance, \alpha = 0 yields the squared Hellinger distance with the usual \chi^2 limiting behavior, while values of \alpha near -1 approach the reverse Kullback-Leibler divergence and emphasize tail discrepancies, enhancing power against heavy-tailed alternatives. This geometric ordering facilitates selection of the alpha-value that minimizes the higher-order bias in the test's approximation, tying directly to the curvature invariants of the statistical manifold in which the null family is embedded.
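
A minimal sketch of the misspecification adjustment discussed above (NumPy; the Student-t data-generating distribution and sample size are hypothetical choices) contrasts the AIC penalty 2k with the TIC penalty 2·tr(J⁻¹I) when a Gaussian model is fit to heavy-tailed data; under correct specification the two penalties would approximately coincide.

```python
# A minimal sketch comparing the AIC penalty 2k with the TIC penalty
# 2 * tr(J^{-1} I) for a Gaussian model fitted to heavy-tailed (Student-t) data.
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_t(df=5, size=50_000)        # heavy-tailed data, Gaussian model

# Gaussian MLE: mu = sample mean, v = sample variance (v denotes sigma^2).
mu, v = x.mean(), x.var()
d = x - mu

# Per-observation scores of log N(x | mu, v) and the averaged Hessian.
score = np.stack([d / v, -0.5 / v + d**2 / (2 * v**2)], axis=1)
I = score.T @ score / len(x)                 # covariance of the score
H = np.array([[-1 / v,                (-(d / v**2)).mean()],
              [(-(d / v**2)).mean(),  0.5 / v**2 - (d**2).mean() / v**3]])
J = -H                                       # negative expected Hessian (empirical)

k = 2
print("AIC penalty:", 2 * k)                                 # = 4
print("TIC penalty:", 2 * np.trace(np.linalg.solve(J, I)))   # > 4 for heavy-tailed data
```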

Machine Learning and Optimization

Information geometry has significantly influenced machine learning by providing a framework for optimizing models on statistical manifolds, particularly through the natural gradient method applied to neural networks. Shun-ichi Amari proposed the natural gradient in 1998 as an efficient learning algorithm that accounts for the Riemannian structure induced by the Fisher information metric, enabling faster convergence compared to standard gradient descent in high-dimensional parameter spaces of neural nets. This approach preconditions the gradient using the inverse Fisher matrix, which captures the local curvature of the loss landscape and reduces sensitivity to the choice of parameterization. Implementations like Kronecker-Factored Approximate Curvature (KFAC) approximate this second-order information for large-scale neural networks, achieving near-natural gradient performance with reduced computational cost by factorizing the Fisher matrix into Kronecker products of layer-specific matrices, as sketched below. In variational inference, information geometry introduces geometric priors on latent spaces by endowing them with the Fisher-Rao metric pulled back from the observation space, allowing for non-Euclidean structures that better model complex data distributions in generative models. This perspective facilitates the design of priors that respect the intrinsic geometry of probability distributions, improving posterior approximations in Bayesian neural networks. Fisher GAN uses a Fisher-information-based constraint on the critic for stable training, while Wasserstein GAN employs optimal transport distances to improve mode coverage and reduce vanishing gradients. Geodesic distances derived from information geometry enhance clustering and dimensionality reduction by preserving the manifold structure of data, as seen in manifold-learning variants that use the Fisher-Rao metric to compute shortest paths on statistical manifolds rather than Euclidean distances. This approach uncovers nonlinear embeddings that align with the intrinsic geometry of high-dimensional datasets, such as in bioinformatics. In reinforcement learning, natural gradients optimize policy parameters by following trust regions defined by the Fisher information metric, as in the natural policy gradient method, which maximizes expected rewards while constraining policy updates to geodesically feasible steps, improving sample efficiency in continuous control tasks. Recent advances in 2024-2025, such as trajectory information geometry, derive universal response inequalities for non-stationary Markov processes, with potential applications to dynamical systems in machine learning like recurrent neural networks. This framework generalizes linear response theory beyond steady states, enabling more reliable predictions in time-dependent optimization scenarios.
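
The Kronecker factorization mentioned above can be illustrated for a single linear layer. The sketch below (NumPy; layer sizes, batch size, and damping are arbitrary, and it omits the running averages and damping schedules of a full KFAC implementation) shows that solving with the full factored Fisher kron(G, A) gives the same update as the two small solves G⁻¹ ∇W A⁻¹.

```python
# A minimal sketch of the Kronecker factorization behind KFAC for one linear
# layer: the layer's Fisher block is approximated as kron(G, A) with
# A = E[a a^T] (input covariance) and G = E[g g^T] (output-gradient covariance),
# so the natural-gradient step F^{-1} vec(dW) reduces to G^{-1} dW A^{-1}.
import numpy as np

rng = np.random.default_rng(4)
m, n, batch = 3, 4, 10_000                   # output dim, input dim, batch size

a = rng.normal(size=(batch, n))              # layer inputs
g = rng.normal(size=(batch, m))              # backpropagated output gradients

A = a.T @ a / batch + 1e-3 * np.eye(n)       # input second-moment factor (damped)
G = g.T @ g / batch + 1e-3 * np.eye(m)       # gradient second-moment factor (damped)

dW = (g[:, :, None] * a[:, None, :]).mean(0) # average layer gradient g a^T, shape (m, n)

# Full Kronecker-factored Fisher versus the factored natural-gradient update
# (row-major vec convention matches numpy's flatten()).
F = np.kron(G, A)                            # (m*n, m*n) matrix
step_full = np.linalg.solve(F, dW.flatten()).reshape(m, n)
step_kfac = np.linalg.solve(G, dW) @ np.linalg.inv(A)

print(np.allclose(step_full, step_kfac))     # True: same update, no large matrix needed
```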

Physics and Signal Processing

In statistical physics, information geometry provides a framework for analyzing phase transitions within exponential families of statistical models, where external parameters such as temperature or external fields induce geometric structures on the manifold of equilibrium states. These families, parameterized by sufficient statistics, exhibit dually flat geometries that reveal critical points as singularities or divergences of the curvature, marking the onset of phase transitions like those in the Ising model. For instance, the Fisher information metric quantifies the sensitivity of thermodynamic potentials to parameter variations, enabling a geometric characterization of criticality and scaling phenomena in systems described by generalized Gibbs distributions. The Fisher metric also plays a central role in relating fluctuations to dissipation, extending the classical fluctuation-dissipation theorem to non-equilibrium settings through information-geometric covariances. In this context, the metric measures the infinitesimal distinguishability of states under perturbations, linking thermodynamic response functions to the geometry of probability distributions and providing bounds on linear susceptibilities in driven systems. Recent advancements, including 2025 studies on universal response inequalities, leverage trajectory geometries—paths on the statistical manifold traced by time-evolving distributions—to derive inequalities for non-stationary processes, generalizing fluctuation-dissipation relations beyond steady states in non-equilibrium physics. These inequalities, derived from the dually flat structure of the underlying spaces, constrain how systems respond to external drives, with applications to driven systems such as colloidal dynamics. In quantum information theory, the Fisher-Rao metric extends to the space of density matrices, defining a Riemannian geometry that preserves distinguishability under quantum channels. This metric, induced by the symmetric logarithmic derivative, quantifies the information content of quantum states and facilitates geometric formulations of quantum estimation and hypothesis testing. Among the family of monotone quantum metrics, the Bogoliubov-Kubo-Mori (BKM) metric stands out for its connection to quantum relative entropy, offering a non-commutative analog of classical divergences and enabling the study of quantum phase transitions through curvature singularities. The BKM metric's monotonicity ensures it contracts under completely positive trace-preserving maps, making it invariant under unitary evolutions and useful for analyzing entanglement and state distinguishability in finite-dimensional Hilbert spaces. Applications in signal processing utilize information geometry for modeling time series through hidden Markov models (HMMs), where the geometric structure of the parameter manifold guides parameter estimation and state smoothing. By embedding HMM transition probabilities in a statistical manifold equipped with the Fisher-Rao metric, geometric methods enhance smoothing algorithms, interpolating hidden states along geodesics to reduce estimation variance in sequential data. This approach proves effective for non-stationary signals, such as speech or sensor outputs, by projecting observations onto the manifold of emission distributions. A representative example is covariance interpolation in Gaussian processes for signal denoising, where the metric on the space of covariance matrices defines shortest paths between noisy and clean distributions, yielding smoothed reconstructions that preserve signal structure while suppressing additive noise.
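
As a hedged illustration of the covariance-interpolation example (NumPy and SciPy; the two covariance matrices are hypothetical stand-ins for noisy and clean signal segments), the sketch below evaluates points on the affine-invariant geodesic between two symmetric positive-definite matrices, which corresponds, up to scale, to the Fisher-Rao geodesic between zero-mean Gaussians.

```python
# A minimal sketch of geodesic interpolation between two covariance matrices
# under the affine-invariant metric:
# Sigma(t) = S1^{1/2} (S1^{-1/2} S2 S1^{-1/2})^t S1^{1/2}.
import numpy as np
from scipy.linalg import sqrtm, fractional_matrix_power, inv

S1 = np.array([[2.0, 0.3], [0.3, 0.5]])     # e.g. covariance of a noisy segment
S2 = np.array([[1.0, -0.2], [-0.2, 1.5]])   # e.g. covariance of a clean reference

def geodesic(t):
    """Point at parameter t on the affine-invariant geodesic from S1 to S2."""
    s1_half = sqrtm(S1)
    s1_inv_half = inv(s1_half)
    middle = fractional_matrix_power(s1_inv_half @ S2 @ s1_inv_half, t)
    return s1_half @ middle @ s1_half

for t in (0.0, 0.5, 1.0):
    print(t)
    print(np.round(np.real(geodesic(t)), 4))
# t = 0.5 gives the matrix geometric mean of S1 and S2; intermediate points trace
# the shortest path under the affine-invariant (Fisher-Rao) metric and remain
# symmetric positive definite.
```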

Emerging Fields

In biology and ecology, information geometry facilitates the modeling of population dynamics through information projections, which enable the analysis of replicator equations on statistical manifolds to describe species interactions and evolutionary stability. These projections, such as ray-projections onto the probability simplex, link replicator dynamics to natural gradient flows, preserving diversity and revealing stable orbits in ecological communities. For instance, the Shahshahani metric, derived from the Fisher information, models cycling in two-locus genetic systems, connecting nonlinear dynamics to self-regulating biological populations. Geometric models of evolution further apply this framework to allometric scaling laws across multi-kingdom organisms, using E8×E8 structures in geometric information field theory to predict metabolic exponents (e.g., 0.683 observed vs. 0.687 empirical) and fractal protein configurations in 65.1% of analyzed species. In finance, alpha-divergences from information geometry provide robust risk measures by quantifying dissimilarity between Lévy processes, such as tempered stable or variance gamma models, which capture heavy-tailed asset returns. These divergences induce the Fisher information matrix and α-connections on the manifold of financial processes, enabling sensitivity analysis for risk assessment in volatile markets. Portfolio optimization leverages statistical manifolds to balance diversity and volatility, with L(α)-divergences guiding multiplicatively generated strategies on the market simplex; for example, diversity-weighted portfolios interpolate between equal- and market-weighted allocations to maximize relative performance. Beyond classical settings, non-commutative geometry extends information geometry to quantum systems, generalizing dually flat manifolds via faithful normal states on von Neumann algebras and Araki's relative entropy for infinite-dimensional settings. Recent 2024 advances address lumpability in Markov kernels, allowing aggregation of quantum states while preserving geometric structures for parameter estimation in dynamical quantum processes. Open problems in information geometry, as discussed at the Further Developments of Information Geometry (FDIG) 2025 conference held March 17-21 in Tokyo, include the efficiency of maximum likelihood estimation (MLE) in curved exponential families, where asymptotic optimality may fail due to manifold curvature, and the characterization of time-reversibility structures in non-reversible Markov chains on statistical manifolds. Integration with artificial intelligence for causal inference has advanced through methods like flow-based information-geometric causal inference (FIGCI), which refines causal direction detection by minimizing divergences on the manifold of joint distributions, enhancing AI models' ability to infer geometric causality from observational data.

Key Contributors and Resources

Pioneering Researchers

Calyampudi Radhakrishna Rao (1920–2023), an influential statistician, pioneered the geometric approach to statistics through his seminal 1945 paper, where he introduced the Fisher-Rao metric as a fundamental tool for measuring distances in parameter spaces of probability distributions. His work established the Riemannian structure underlying statistical manifolds, influencing later geometric interpretations of inference and earning him recognitions such as the Guy Medal in Gold from the Royal Statistical Society for his broad contributions to statistical theory. Shun'ichi Amari (born 1936), a mathematician and engineer, advanced information geometry in the 1970s by developing the theory of dually flat structures and affine connections, providing a framework to analyze divergences and projections in statistical models. With over 300 publications, Amari connected information geometry to neural networks and optimization, earning the IEEE Neural Networks Pioneer Award in 1992 for his foundational role in neurocomputing and information geometry. Hiroshi Nagaoka, an information theorist, collaborated with Amari on the influential 2000 textbook Methods of Information Geometry, which systematized the field's mathematical foundations and extended its concepts to quantum settings through analyses of monotone metrics and quantum divergences. Other notable contributors include Nikolai Chentsov, who in 1972 proved the invariance theorem characterizing the Fisher metric as the unique statistical invariant up to scaling, solidifying the geometric invariance of information measures. Steffen Lauritzen integrated information geometry with the theory of graphical models, leveraging exponential families to model conditional independencies in probabilistic networks. Collectively, Rao provided the metric cornerstone, Amari forged interdisciplinary connections, and subsequent researchers like Nagaoka, Chentsov, and Lauritzen expanded the field's applicability without delving into overlapping technical derivations.

Notable Publications and Conferences

One of the foundational texts in information geometry is Methods of Information Geometry by Shun-ichi Amari and Hiroshi Nagaoka, originally published in Japanese in 1993 and translated into English in 2000, which provides a comprehensive mathematical framework for the field, including dual affine connections and statistical manifolds. Another key book is Information Geometry by Nihat Ay, Jürgen Jost, Hông Vân Lê, and Lorenz Schwachhöfer (2017), which develops the field's foundations using parametrized measure models; a companion proceedings volume from the IGAIA IV conference, held in honor of Amari's 80th birthday, compiles contributions exploring applications in areas such as machine learning and physics. Seminal works include Amari's Differential-Geometrical Methods in Statistics from 1985, published as part of Springer's Lecture Notes in Statistics series, which introduced differential geometric tools for parametric inference on curved exponential families. More recently, the 2025 preprint "Open problems in information geometry: a discussion at FDIG 2025" by Tomonari Sei and Hiroshi Matsuzoe collects unresolved challenges in the field, such as characterizing exponential connections and moduli spaces of affine connections, arising from conference discussions. Dedicated journals have supported the field's growth, notably Information Geometry, launched by Springer in 2018 as an interdisciplinary outlet for theoretical and computational advances in Fisher-Rao metrics and related structures. Complementing this, the journal Entropy published by MDPI has featured ongoing special issues on information geometry since at least 2014, including "Information Geometry II" and collections on applications, fostering interdisciplinary contributions. Conferences play a central role in advancing the discipline, with the Further Developments of Information Geometry (FDIG) series providing a platform for emerging topics; for instance, FDIG 2025, held at the University of Tokyo from March 17-21, emphasized Markov kernels in stochastic processes and related open problems. Post-2018, the field has seen accelerated publication activity, including 2024-2025 works on trajectory information geometry for non-stationary Markov processes and steady-state transitions in stochastic thermodynamics, as evidenced by papers deriving response inequalities and geometric characterizations of time-reversal symmetries.

  25. [25]
    Information Geometry and Its Applications | SpringerLink
    In stockManifold, Divergence and Dually Flat Structure. Shun-ichi Amari. Pages 3-30. Exponential Families and Mixture Families of Probability Distributions. Shun-ichi ...
  26. [26]
    Differential Geometry of Curved Exponential Families-Curvatures ...
    The duality connected by the Legendre transformation is thus extended to include two kinds of affine connections and two kinds of curvatures. The second-order ...
  27. [27]
    [PDF] Bregman divergences, dual information geometry, and generalized ...
    Information geometry & Bregman divergences. • Bregman divergences are ... Amari, Differential geometry of curved exponential families-curvatures and ...
  28. [28]
    [PDF] Information geometry of divergence functions - Semantic Scholar
    Mar 1, 2010 · This article studies the differential-geometrical structure of a manifold induced by a divergence function, which consists of a Riemannian ...
  29. [29]
  30. [30]
    [PDF] THE Lp-FISHER-RAO METRIC AND AMARI-˘CENCOV α ... - HAL
    Jul 30, 2023 · In 1945, Rao [37] showed that the Fisher information could be used to define a. Riemannian metric on this space, and in 1982, ˘Cencov [15] ...
  31. [31]
    [PDF] An Elementary Introduction to Information Geometry - Frank Nielsen
    Professor Shun-ichi Amari, the founder of modern information geometry, defined information geometry in the preface of his latest textbook [2] as follows: “ ...
  32. [32]
    Differential Geometry of Curved Exponential Families-Curvatures ...
    The present paper intends to give a differential-geometrical framework for analyzing statistical problems by the use of the one-parameter family of affine ...
  33. [33]
    Natural Gradient Works Efficiently in Learning | Neural Computation
    Feb 15, 1998 · Information geometry is used for calculating the natural gradients in the parameter space of perceptrons, the space of matrices (for blind ...<|control11|><|separator|>
  34. [34]
    [PDF] Natural Gradient Descent for Training Multi-Layer Perceptrons
    Sep 27, 2013 · The main focus of this paper is to present a fast algorithm to compute the inverse of the Fisher information matrix. Several issues related to ...
  35. [35]
    New Insights and Perspectives on the Natural Gradient Method
    New Insights and Perspectives on the Natural Gradient Method. James Martens; 21(146):1−76, 2020. Abstract. Natural gradient descent is an optimization method ...
  36. [36]
  37. [37]
    [PDF] Information Geometry of α-Projection in Mean Field Approximation
    A family of divergence measures named the α-divergence is defined invariantly in the manifold of probability distributions (Amari, 1985; Amari and Nagaoka, 2000) ...
  38. [38]
    [PDF] Iterative Algorithms with an Information Geometry Background I ...
    Iterative projection algorithms that minimize Kullback information divergence (I-divergence) include iterative scaling, the EM algorithm, Cover's portfolio ...
  39. [39]
    Global Geometry of Bayesian Statistics - MDPI
    In this regard, we can say that the aim of Bayesian updating is a geometric setting of a dynamical system. In particular, a Bayesian updating in the conjugate ...
  40. [40]
    [PDF] Information criteria for model selection
    Dec 25, 2022 · Infor- mation criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), have been developed as a ...
  41. [41]
    [PDF] Algebraic Information Geometry for Learning Machines with ...
    In this. Page 2. paper, we show that the asymptotic form of the Bayesian stochastic complexity is rigorously obtained by resolution of singularities. The ...Missing: overparametrized | Show results with:overparametrized
  42. [42]
    Interpreting Kullback–Leibler divergence with the Neyman–Pearson ...
    Kullback–Leibler divergence is the expected log-likelihood ratio, and the Neyman–Pearson lemma is about error rates of likelihood ratio tests.
  43. [43]
    Information criteria for model selection - Zhang - 2023
    Feb 20, 2023 · This review article will revisit information criteria by summarizing their key concepts, evaluation metrics, fundamental properties, interconnections, recent ...Missing: geometry curvature
  44. [44]
    (PDF) Information geometry of divergence functions - ResearchGate
    Aug 7, 2025 · The α-divergence is a special class of f -divergences. This is unique, sitting at the intersection of the f -divergence and Bregman divergence ...
  45. [45]
    Optimizing Neural Networks with Kronecker-factored Approximate ...
    Mar 19, 2015 · K-FAC is an efficient method for approximating natural gradient descent in neural networks, using an invertible approximation of the Fisher ...
  46. [46]
    Information Geometry and Manifold Learning: A Novel Framework ...
    The edges between nodes are weighted based on geodesic distances derived from information geometry, specifically using the Fisher Information metric. This ...
  47. [47]
  48. [48]
    Exponential Families with External Parameters - MDPI
    In this paper we introduce a class of statistical models consisting of exponential families depending on additional parameters, called external parameters.
  49. [49]
    Information Geometry, Phase Transitions, and Widom Lines - arXiv
    Nov 29, 2011 · We study information geometry of the thermodynamics of first and second order phase transitions, and beyond criticality, in magnetic and liquid ...Missing: exponential families<|separator|>
  50. [50]
    Information Geometry, Fluctuations, Non-Equilibrium ... - NIH
    For Gaussian processes, the Fisher metric is inversely proportional to the covariance matrices of fluctuations in the systems. ... theorems for different ...
  51. [51]
    [PDF] arXiv:2409.18944v1 [quant-ph] 27 Sep 2024
    Sep 27, 2024 · We propose a fluctuation-dissipation theorem in open quantum systems from an information-theoretic per- spective.
  52. [52]
    Universal response inequalities beyond steady states via trajectory ...
    Jul 25, 2025 · This letter presents a complete trajectory information geometric framework that generalizes response theory for nonstationary Markov processes.
  53. [53]
    Information geometry of density matrices and state estimation - arXiv
    Sep 6, 2010 · The associated Fisher-Rao information measure is used to define a unitary invariant Riemannian metric on the space of density matrices. An ...
  54. [54]
    From classical to quantum information geometry: a guide for physicists
    Aug 8, 2023 · Using our definition of the Fisher-Rao metric we arrive at the classical Fisher information metric (CFIM), which we give in both discrete ...
  55. [55]
    [PDF] Monotone metric tensors in Quantum Information Geometry - arXiv
    Sep 12, 2023 · We review some geometrical aspects pertaining to the world of monotone quantum metrics in finite dimensions. Particular emphasis is given to an ...
  56. [56]
    On the monotonicity of scalar curvature in classical and quantum ...
    Jan 18, 2005 · The Bogoliubov–Kubo–Mori ( BKM ) metric is a distinguished element among the monotone metrics which are the quantum analog of Fisher information ...
  57. [57]
    Information Geometry Approach to Parameter Estimation in Hidden ...
    May 17, 2017 · Abstract:We consider the estimation of the transition matrix of a hidden Markovian process by using information geometry with respect to ...Missing: time series geometric smoothing
  58. [58]
    Geometric Learning of Hidden Markov Models via a Method ... - MDPI
    We present a novel algorithm for learning the parameters of hidden Markov models (HMMs) in a geometric setting where the observations take values in Riemannian ...Missing: smoothing | Show results with:smoothing
  59. [59]
    Geodesic Gaussian Processes for the Parametric Reconstruction of ...
    Aug 21, 2020 · We show how the proposed geodesic Gaussian process (GGP) approach better reconstructs the true surface, filtering the measurement noise, than ...
  60. [60]
    [PDF] Information Geometry and Evolutionary Game Theory
    We introduce a new kind of projection dynamics by employing a ray-projection both locally and globally. ... Evolutionary Games and Population Dynamics · J.Missing: emerging | Show results with:emerging
  61. [61]
    GIFT-Bio: Geometric Information Field Theory Applied to Biological ...
    Sep 19, 2025 · PDF | This work investigates potential connections between the geometric information structures of the GIFT (Geometric Information Field ...
  62. [62]
    Information geometry of Lévy processes and financial models
    ### Summary of Alpha-Divergences in Financial Models, Risk Measures, and Connection to Information Geometry
  63. [63]
    (PDF) Information Geometry in Portfolio Theory - ResearchGate
    May 31, 2019 · We review some recent developments in stochastic portfolio theory motivated by information geometry, present illustrative examples and an extension of ...
  64. [64]
    Open problems in information geometry: a discussion at FDIG 2025
    Sep 2, 2025 · Open problems in information geometry are collected and discussed in the conference ``Further Developments of Information Geometry (FDIG) 2025'' ...
  65. [65]
    FIGCI: Flow-Based Information-Geometric Causal Inference
    Aug 7, 2025 · Information-Geometric Causal Inference (IGCI) is a well-established method to identify the causal direction between two variables. It assumes ...Missing: geometry | Show results with:geometry
  66. [66]
    Information Geometry
    Information Geometry is an interdisciplinary journal focused on the mathematical foundations of information science, covering concepts like the Fisher-Rao ...Volumes and issues · Contact the journal · Editorial board · Aims and scopeMissing: 2018 | Show results with:2018
  67. [67]
    Special Issue : Information Geometry - Entropy - MDPI
    Information geometry studies the dually flat structure of a manifold, highlighted by the generalized Pythagorean theorem. The present paper studies a class of ...Special Issue Editors · Special Issue Information · Benefits of Publishing in a...
  68. [68]
    Universal Response Inequalities Beyond Steady States via ... - arXiv
    Mar 16, 2024 · This letter presents a complete trajectory information geometric framework that generalizes response theory for non-stationary Markov processes.