Algebraic statistics

Algebraic statistics is an interdisciplinary field that integrates tools from algebraic geometry, commutative algebra, and combinatorics to analyze statistical models, particularly those defined by polynomial equations and involving discrete random variables. The field emerged prominently in the late 1990s, with foundational work by Diaconis and Sturmfels in 1998 introducing Markov bases for log-linear models of contingency tables, which enabled exact goodness-of-fit tests via algebraic methods. This development bridged classical statistics with computational algebra, allowing the study of likelihood equations and model structures through ideals and varieties in polynomial rings. Key concepts include toric ideals, which parametrize probability distributions in models such as graphical models and hidden Markov models, and Gröbner bases, used to solve systems of polynomial equations arising in statistical inference.

Algebraic statistics has applications across diverse areas, including computational biology—such as phylogenetic tree reconstruction using likelihood methods like Felsenstein's pruning algorithm and parametric alignment—and the social sciences, where it models ranked preferences or voting patterns through polytope algebras and combinatorial methods. Recent extensions include applications to machine learning and network analysis. In network analysis, it examines conditional independencies in multi-variable systems using tensor representations, as seen in studies of clinical trials or medical experiments assessing variable interactions. Computational tools like 4ti2 for Markov bases and Singular for ideal computations have made these methods practical, while ongoing research explores secant varieties and phylogenetic invariants to enhance model identifiability. The field's growth, documented in influential texts like Pachter and Sturmfels' 2005 book Algebraic Statistics for Computational Biology, underscores its role in advancing both theoretical inference and applied data analysis.

Overview

Definition and Scope

Algebraic statistics is an interdisciplinary field that applies tools from computational algebra, algebraic geometry, and combinatorics to the study and analysis of statistical models, with a particular emphasis on those featuring polynomial parametrizations. These tools enable the algebraic description of probability distributions and inference procedures, transforming traditional statistical problems into questions about algebraic structures such as ideals and varieties. The field emerged as a response to the need for exact computational methods in complex probabilistic settings, distinguishing it from classical statistics by prioritizing symbolic and combinatorial techniques over numerical approximations.

The scope of algebraic statistics centers on discrete statistical models, such as those involving contingency tables and hidden Markov models, where the underlying data-generating processes correspond to algebraic varieties defined over polynomial rings. Unlike broader statistical approaches that often rely on asymptotic approximations or simulation-based methods, algebraic statistics focuses on finite-sample exactness through algebraic invariants, making it especially suited for models with rational or monomial map parametrizations that capture dependencies in categorical data. This delineation excludes purely continuous or non-algebraic models, emphasizing instead the intersection of statistics with exact computability in discrete spaces.

At its core, algebraic statistics involves parametrizing probability distributions through rational maps from parameter spaces to probability simplices, allowing the representation of models as images of these maps. It further examines ideals generated by conditional independence statements, which encode the structural constraints of graphical models, and employs Gröbner bases to facilitate exact inference tasks like hypothesis testing and parameter estimation. These components provide a rigorous framework for deriving algebraic certificates of statistical properties, such as identifiability and likelihood maximizers, without resorting to iterative numerical solvers.

The interdisciplinary scope of algebraic statistics extends to combinatorics through the study of polytopes associated with toric models, to machine learning via secant varieties that describe mixture models, and to optimization by integrating semidefinite programming for relaxations in likelihood computations. These connections highlight its role in bridging pure mathematics with applied statistics, fostering advancements in fields like computational biology and machine learning.

Importance and Interdisciplinary Connections

Algebraic statistics plays a vital role in modern data analysis by enabling exact symbolic computations for key statistical tasks such as model selection, hypothesis testing, and parameter estimation, particularly in high-dimensional settings where numerical methods are prone to failure due to instability or intractability. This approach leverages algebraic structures to provide rigorous guarantees that are absent in approximate numerical techniques, ensuring precise solutions even for complex discrete data distributions. A key advantage over classical statistical methods lies in its ability to handle non-identifiable models through primary decomposition of polynomial ideals, which breaks down the solution set into irreducible components that reveal underlying identifiability structures. Furthermore, it delivers certificates of optimality, for instance via the Hilbert series of toric ideals, which encodes the degrees of freedom in statistical models by yielding a Hilbert polynomial whose leading term corresponds to the model's dimension.

The field's interdisciplinary connections underscore its broad impact across the sciences. In biology, algebraic statistics facilitates phylogenetic reconstruction by modeling evolutionary processes on trees using toric varieties and invariants, allowing inference of ancestral relationships from genetic sequence data with algebraic guarantees of consistency. Within machine learning, it intersects with tensor decompositions to address latent variable models, where algebraic tools ensure parameter identifiability and efficient recovery of low-rank structures in multi-way arrays representing high-dimensional data. Algebraic methods inspired by algebraic statistics connect to game theory in economics through the representation of equilibria—such as Nash or dependency equilibria—as points on algebraic varieties, enabling the use of commutative algebra to study stability and multiplicity in strategic interactions.

Algebraic statistics also tackles pressing challenges in big data, including computational complexity, by deploying specialized algebraic algorithms to resolve issues like moment problems. For example, Fourier-Motzkin elimination projects systems of linear inequalities arising in such problems onto lower dimensions, providing exact feasibility certificates and bounding feasible parameter regions without exhaustive enumeration; a small implementation is sketched below. This method proves particularly effective in scaling algebraic techniques to large-scale datasets, offering polynomial-time solutions in certain structured cases where heuristic optimizations falter. Recent developments as of 2024 include applications to environmental and ecological modeling for climate change analysis, as well as enhancements to methods like Bayesian additive regression trees using algebraic and combinatorial tools.
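To make the elimination step concrete, the following minimal sketch implements a single round of Fourier-Motzkin elimination on a system of linear inequalities \sum_k a_k x_k \le b. The toy system and the function name fourier_motzkin_step are illustrative assumptions, not drawn from any particular statistical application.

```python
from itertools import product

def fourier_motzkin_step(rows, j):
    """Eliminate variable j from inequalities sum_k a[k]*x[k] <= b,
    given as (a, b) pairs. Returns the projected system."""
    pos = [(a, b) for a, b in rows if a[j] > 0]
    neg = [(a, b) for a, b in rows if a[j] < 0]
    out = [(a, b) for a, b in rows if a[j] == 0]
    # Pair every positive-coefficient row with every negative one: the
    # combination cancels x_j and yields a valid projected inequality.
    for (ap, bp), (an, bn) in product(pos, neg):
        lam, mu = -an[j], ap[j]   # positive multipliers
        a_new = [lam * ap[k] + mu * an[k] for k in range(len(ap))]
        out.append((a_new, lam * bp + mu * bn))
    return out

# Example: project {x + y <= 4, -x + y <= 2, -y <= 0} onto the y-axis.
system = [([1, 1], 4), ([-1, 1], 2), ([0, -1], 0)]
for a, b in fourier_motzkin_step(system, 0):
    print(a, "<=", b)   # yields 2y <= 6 and -y <= 0, i.e. 0 <= y <= 3
```

Repeating the step variable by variable projects the feasible region onto any coordinate subspace, which is how exact feasibility certificates for moment-type constraints are obtained, at the cost of a combinatorial growth in the number of inequalities.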

Historical Development

Early Foundations (Pre-1990s)

The roots of algebraic statistics trace back to early developments in probability theory during the pre-20th century, where polynomial representations began to emerge as tools for modeling probabilistic outcomes. Jacob Bernoulli's seminal work in Ars Conjectandi (1713) laid foundations for combinatorial probability calculations using polynomial expansions, providing a systematic way to encode sequences of probabilities into power series, implicitly highlighting the polynomial structure inherent in discrete distributions long before explicit algebraic frameworks were applied in statistics.

In the 20th century, statistical theory advanced with contributions that revealed implicit algebraic structures in probabilistic models. Ronald A. Fisher's 1922 paper on the mathematical foundations of theoretical statistics formalized the method of maximum likelihood and explored its properties within families of distributions that later became known as exponential families, where the log-likelihood takes a linear form in the natural parameters, suggesting an underlying algebraic structure without explicit polynomial ideals at the time. This work emphasized sufficiency and efficiency, providing the probabilistic backbone for models analyzable via algebraic methods. Building on such ideas, Shelby J. Haberman's research in the 1970s focused on log-linear models for analyzing contingency tables, employing iterative methods to estimate parameters under multinomial sampling, which effectively solved systems of polynomial equations arising from marginal constraints in multi-way tables. Haberman's approach highlighted the multiplicative structure of expected frequencies, a key precursor to algebraic representations of these models.

Key milestones in the pre-1990s period included early explorations of combinatorial tools for statistical testing. John N. Darroch and Dorothy Ratcliff's 1972 development of generalized iterative scaling for log-linear models offered an algorithmic solution for maximum likelihood estimation in exponential families defined on contingency tables, with connections to chi-squared goodness-of-fit tests through the evaluation of model adequacy via deviance measures. This method anticipated later combinatorial moves for exploring the fiber of observed data under fixed margins, akin to precursors of Markov bases used in exact conditional tests. In the 1990s, Pistone and Riccomagno initiated applications of Gröbner bases to statistical problems, particularly in experimental design and polynomial regression models, where these bases facilitated the computation of ideal membership for toric ideals arising from factorial structures.

The transition toward explicitly algebraic views of statistical models gained traction in the 1980s through work on graphical models. Steffen L. Lauritzen's contributions, including his collaborations on mixed graphical association models, recognized the toric structure underlying log-linear parameterizations in undirected graphical models, where conditional independence constraints correspond to binomial generators in the variety's defining ideal. This perspective bridged probabilistic factorization with commutative algebra, setting the stage for viewing inference as solving equations on toric varieties without delving into modern computational tools.

Modern Advances (1990s-Present)

The emergence of algebraic statistics as a distinct field in the 1990s was marked by the seminal work of Persi Diaconis and Bernd Sturmfels, who in their 1998 paper developed algebraic algorithms for sampling from conditional distributions in discrete exponential families, with a focus on contingency tables. This approach leveraged toric ideals to construct Markov bases, enabling exact goodness-of-fit tests by facilitating uniform sampling over conditional tables without relying on approximations.

In the 2000s, the discipline expanded through foundational texts and computational tools that bridged algebra and applied statistics. The book Algebraic Statistics for Computational Biology by Lior Pachter and Bernd Sturmfels (2005) synthesized these methods for biological applications, such as sequence alignment and phylogenetic reconstruction, emphasizing toric varieties and Gröbner bases for likelihood inference. Complementing this theoretical advancement, the software package 4ti2 was introduced to compute Markov bases and perform related algebraic operations on integer lattices, making these techniques accessible for practical statistical modeling.

From the 2010s to the present, algebraic statistics has increasingly intersected with machine learning and data science. Advances in tropical geometry have further enhanced causal inference, with 2020s research utilizing min-plus algebras to derive efficient algorithms for learning causal structures in extreme value models, outperforming traditional methods on benchmark datasets. Prominent contributors have shaped these developments: Seth Sullivant advanced phylogenetic algebraic geometry by studying the ideals and varieties arising from tree-based evolutionary models, providing tools for model selection and parameter estimation in evolutionary biology. Mathias Drton pioneered algebraic techniques for Gaussian graphical models, developing methods to test conditional independences and recover sparse structures from covariance data using determinantal ideals. As of 2023, emerging trends include non-commutative extensions toward quantum statistics, exemplified by the study of free semialgebraic sets, which extends classical varieties to quantum states for entanglement analysis and quantum information processing.

Algebraic Foundations

Key Algebraic Structures

In algebraic statistics, polynomial rings serve as the foundational algebraic objects for modeling probability distributions. A polynomial ring over the rationals, denoted \mathbb{Q}[p_1, \dots, p_m], consists of multivariate polynomials in indeterminates p_1, \dots, p_m with coefficients in \mathbb{Q}, where these indeterminates typically represent probability parameters. These rings parametrize statistical models by encoding the algebraic relations among probabilities, particularly in discrete settings like contingency tables or hidden Markov models. A key example is the representation of exponential families as toric rings, where the probability of a state u takes the form p_u = \exp((A^T \alpha)_u) / Z(\alpha) for a design matrix A and normalizing constant Z(\alpha), capturing log-linear dependencies in graphical models.

Ideals within these polynomial rings define the constraints of statistical models, while the corresponding varieties describe the feasible parameter spaces. An ideal I \subseteq \mathbb{Q}[p_1, \dots, p_m] is a subset closed under addition and multiplication by ring elements, generated by polynomials that enforce model equations, such as those arising from conditional independences. Prime ideals play a crucial role in representing irreducible models, as the quotient ring \mathbb{Q}[p_1, \dots, p_m]/I being an integral domain ensures the model's algebraic structure cannot be decomposed further, corresponding to irreducible toric varieties. The affine variety V(I), defined as the zero set \{ p \in \mathbb{C}^m \mid f(p) = 0 \ \forall f \in I \}, geometrically realizes the parameter space of feasible probability distributions, often intersected with the probability simplex to bound positive probabilities. For instance, in log-linear models, the variety V_A is the Zariski closure of the image of the parametrization map, delineating the manifold of attainable marginal distributions.

Gröbner bases provide an algorithmic tool for manipulating ideals to analyze model complexity. A Gröbner basis of an ideal I with respect to a monomial order is a generating set \{g_1, \dots, g_s\} such that the leading terms generate the leading-term ideal, allowing reduction of any polynomial in I to zero via multivariate division, effectively reducing the ideal to a canonical form. This structure enables computation of the dimension of the variety V(I), which equals the Krull dimension of the coordinate ring and indicates the number of free parameters in the model, as well as the degree, measuring the number of intersection points with a generic linear subspace and quantifying intrinsic complexity. In practice, software like Macaulay2 or 4ti2 implements these computations for statistical ideals, such as determining the dimension of phylogenetic models.

Lattice polytopes arise in the combinatorial aspects of algebraic statistics, particularly for discrete models involving integer constraints. A lattice polytope is the convex hull of finitely many lattice points in \mathbb{Z}^d, and in this context it connects to the marginal polytope of a contingency table model, which is the convex hull of all possible marginal probability vectors (sufficient statistics) attainable under the model, encoding the feasible supports for observed data. Fiber polytopes for fixed margins are also lattice polytopes, and the Ehrhart polynomial L_P(t) of such a polytope P is a polynomial of degree \dim P that counts the number of lattice points in the t-dilate tP \cap \mathbb{Z}^d, providing an exact formula for enumerating the integer points in model fibers—essential for exact hypothesis testing and sampling in contingency tables without approximation. For example, in toric models, the coefficients of the Ehrhart polynomial relate to volumes and symmetries of the polytope, facilitating precise computation of the number of contingency tables with given margins.
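The ideal-theoretic operations described above can be carried out in general-purpose computer algebra systems. The following minimal sketch uses SymPy's groebner routine on the toric ideal of the 2×2 independence model; the polynomial f tested for membership is an arbitrary illustrative choice.

```python
from sympy import symbols, groebner

# Indeterminates standing for the cell probabilities of a 2x2 table.
p11, p12, p21, p22 = symbols('p11 p12 p21 p22')

# The independence model's toric ideal is generated by one binomial.
G = groebner([p11*p22 - p12*p21], p11, p12, p21, p22, order='lex')

# Ideal membership via reduction: a zero remainder means the polynomial
# vanishes on the model variety V(I).
f = p11*p12*p22 - p12**2*p21        # equals p12 * (p11*p22 - p12*p21)
_, remainder = G.reduce(f)
print(remainder)                    # 0, so f lies in the ideal
```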

Toric Ideals and Markov Bases

In algebraic statistics, toric ideals arise as the kernel of a monomial parametrization map, encoding the algebraic relations imposed by the sufficient statistics. Specifically, for a log-linear model with design matrix A \in \mathbb{Z}^{d \times n} (d sufficient statistics dimensions, n table cells), the toric ideal I_A is the kernel of the map \pi: \mathbb{K}[x_1, \dots, x_n] \to \mathbb{K}[t_1, \dots, t_d] given by \pi(x_i) = \prod_{j=1}^d t_j^{A_{j i}} for i = 1, \dots, n. This kernel consists of binomials \mathbf{x}^{\mathbf{u}^+} - \mathbf{x}^{\mathbf{u}^-} such that \mathbf{u}^+ - \mathbf{u}^- \in \ker(\mathbb{Z}^n \to \mathbb{Z}^d), where \mathbf{u}^\pm \in \mathbb{N}^n have disjoint supports, reflecting the linear constraints of the model. The generators of this ideal directly correspond to the conditional independences or other structural constraints in the underlying probability model, providing an algebraic certificate for the dimension and variety of the model.

Markov bases extend this framework by providing a combinatorial tool for exact conditional inference in log-linear models. A Markov basis is a finite set of integer vectors \mathcal{B} \subset \ker(A) (where A is the design matrix of the model) such that adding multiples of elements from \mathcal{B} connects any two points in a fiber \{ \mathbf{y} \in \mathbb{N}^n : A\mathbf{y} = A\mathbf{x} \} via lattice walks that preserve the sufficient statistics. A key theorem of Diaconis and Sturmfels states that if \mathcal{B} is a Markov basis, then the fiber graph—vertices as tables in the fiber, edges via moves in \mathcal{B}—is connected, enabling sampling from the conditional distribution via Markov chain Monte Carlo methods without rejection, as the chain is irreducible and aperiodic under suitable choices. This connectivity ensures that the stationary distribution is uniform over the fiber, facilitating exact goodness-of-fit tests for models like log-linear families.

Computing minimal Markov bases involves solving integer programs to find generating sets for the lattice ideal, often using algorithms that compute Gröbner bases of the toric ideal or directly optimize over the syzygies via branch-and-bound methods in software like 4ti2. However, determining a Markov basis is NP-hard in general, even for simple hierarchical models, due to the exponential growth in the number of generators with model dimension. A classic example is the independence model for a 2 \times 2 contingency table with cells labeled a, b, c, d, where the toric ideal is generated by the binomial ad - bc, corresponding to the constraint on row and column margins. The associated Markov basis consists of the vector (1, -1, -1, 1), which generates moves that preserve margins and connect all tables with fixed totals, such as transitioning between observed and expected counts under the independence model.
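For the 2×2 example just described, the Markov move can be recovered directly from the integer kernel of the design matrix. The sketch below does this with SymPy; the matrix layout is an illustrative choice, and for larger models one would use dedicated software such as 4ti2's markov command.

```python
from sympy import Matrix

# Design matrix of the 2x2 independence model: rows record the row sums
# and column sums of a table with cells ordered (u11, u12, u21, u22).
A = Matrix([[1, 1, 0, 0],
            [0, 0, 1, 1],
            [1, 0, 1, 0],
            [0, 1, 0, 1]])

# The integer kernel of A yields the moves preserving all margins;
# here it is one-dimensional, spanned by a single primitive vector.
for v in A.nullspace():
    move = (v / max(abs(x) for x in v)).T   # rescale to a primitive vector
    print(move)                             # Matrix([[1, -1, -1, 1]]) up to sign
```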

Core Methods

Maximum Likelihood Estimation

In algebraic statistics, maximum likelihood estimation (MLE) involves solving the score equations derived from the log-likelihood function, \frac{\partial \log L}{\partial \theta} = 0, which often yield a system of polynomial equations for parametric models with discrete or mixed supports. These equations arise naturally in models such as multinomial distributions or hidden Markov models, where the parameters \theta parameterize the probability distribution, transforming the estimation problem into finding the critical points of an algebraic function. The multiplicity of solutions to these polynomial systems is analyzed using Bézout's theorem, which provides an upper bound on the number of intersection points of hypersurfaces in projective space, reflecting the geometric complexity of the likelihood landscape. The number of complex critical points for generic data, known as the maximum likelihood degree (ML degree), quantifies the algebraic difficulty of MLE and remains invariant under reparameterizations of the model.

Algebraic methods address these polynomial systems using Gröbner bases to triangularize the system, enabling the identification of critical points including the maximum. For instance, in low-dimensional cases, Gröbner bases simplify the system into univariate polynomials whose roots correspond to candidate MLEs, with numerical refinement used to select the likelihood-maximizing solution. The expectation-maximization (EM) algorithm, commonly used for models with hidden variables, admits an algebraic reformulation where the E-step computes expectations over latent varieties and the M-step solves restricted likelihood equations, often involving toric ideals for discrete supports. This perspective reveals convergence properties through the geometry of secant varieties associated with incomplete data.

Identifiability in MLE is assessed via Jacobian analysis, where the model is locally identifiable if the Jacobian of the parametrization map has full rank at a point, ensuring a unique local maximum. In mixture models, such as Gaussian mixtures, the likelihood equations define points on secant varieties, and non-identifiability occurs when these varieties have deficient dimensions, leading to multiple parameter sets yielding the same observed distribution. For example, in bivariate Gaussian mixtures, algebraic conditions on the moment structure determine when the secant variety fills the ambient space, guaranteeing identifiability up to label switching.

A canonical example arises in exponential families: for multinomial sampling with n trials, the log-likelihood takes the form \ell(\theta) = \sum_j x_j \log \theta_j - n \log \sum_j \theta_j, whose score equations are solvable algebraically in small dimensions using resultants to yield explicit rational expressions for the MLE. This highlights how the natural parameters \theta_j balance observed counts x_j against normalization, with Gröbner bases providing a computational route for higher dimensions. Recent advances as of 2024 employ numerical algebraic geometry, such as homotopy continuation methods, to compute MLE solutions in complex models with high ML degrees.
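As a small worked instance of solving score equations symbolically, the sketch below computes the MLE of the 2×2 independence model with SymPy. The counts are made up, and the closed-form answer (row and column proportions) reflects the fact that this model has ML degree one.

```python
from sympy import symbols, log, diff, solve

a, b = symbols('a b', positive=True)
u = {(1, 1): 25, (1, 2): 15, (2, 1): 10, (2, 2): 50}   # made-up counts

# Independence parametrization p_ij = a_i * b_j with a2 = 1 - a, b2 = 1 - b.
p = {(1, 1): a*b, (1, 2): a*(1 - b),
     (2, 1): (1 - a)*b, (2, 2): (1 - a)*(1 - b)}

# Score equations: set the partial derivatives of the log-likelihood to zero.
ll = sum(u[k] * log(p[k]) for k in u)
print(solve([diff(ll, a), diff(ll, b)], [a, b], dict=True))
# [{a: 2/5, b: 7/20}] -- the row and column proportions of the data
```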

Method of Moments

The method of moments in algebraic statistics formulates parameter estimation as the solution to equations that equate theoretical moments of a model to their empirical counterparts computed from data. These moment equations arise from expressing expected values, such as \mathbb{E}[X^k] for a random variable X, in terms of model parameters, resulting in a system of polynomial relations. For instance, in a Gaussian model, the second-order moment satisfies the equation \mathbb{E}[X^2] = \mu^2 + \sigma^2, where \mu is the mean and \sigma^2 is the variance; higher-order moments yield additional polynomials that must hold simultaneously.

When the number of sample moments exceeds the number of parameters, the system becomes overdetermined, and solutions are sought by projecting onto the variety defined by the moment equations, often using least-squares minimization constrained to the variety. This geometric approach leverages tools from computational algebra to identify feasible parameter sets, ensuring consistency by testing membership in the moment variety via Gröbner bases or elimination techniques. In mixture models, such as Gaussian mixtures, the moment variety G_{n,d} parameterizes all moment vectors up to order d in n dimensions, and secant varieties \sigma_k(G_{n,d}) capture mixtures of k components; solving the moment equations involves intersecting sample moments with these varieties.

Algebraic resolution of these systems employs syzygies—relations among the generators of the moment ideal—to derive minimal sets of independent conditions necessary for identifiability and estimation. The syzygy module of the ideal provides a basis for redundant relations, reducing computational effort in solving for parameters; for example, in homoscedastic Gaussian mixtures, syzygies among matrix minors enforce the necessary constraints.

In non-parametric settings, algebraic techniques address the moment problem, which determines whether a given sequence of real numbers corresponds to the moments of a positive measure on \mathbb{R}. This is resolved by checking positive semidefiniteness of associated Hankel matrices and representing the measure via sums-of-squares (SOS) decompositions of positive polynomials, ensuring the sequence lies in the moment cone. SOS methods, implemented via semidefinite programming, provide certificates of positivity and enable dual optimization over moment sequences, with applications to density estimation and goodness-of-fit testing in algebraic frameworks. Recent work as of 2025 uses numerical algebraic geometry for the method of moments in high-dimensional settings, improving scalability for mixture models.
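The sketch below illustrates both steps on a univariate Gaussian: solving the first two moment equations symbolically, and checking the Hankel positive-semidefiniteness condition from the moment problem. The simulated data and numerical tolerance are illustrative assumptions.

```python
import numpy as np
from sympy import symbols, solve

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=100_000)        # toy sample
m1, m2 = x.mean(), (x**2).mean()

# Moment equations of the Gaussian model: E[X] = mu, E[X^2] = mu^2 + sigma^2.
mu, sigma2 = symbols('mu sigma2', real=True)
print(solve([mu - m1, mu**2 + sigma2 - m2], [mu, sigma2], dict=True))
# mu close to 2.0, sigma2 close to 2.25

# Moment problem: a genuine moment sequence (1, m1, ..., m4) must have a
# positive semidefinite Hankel matrix; empirical moments pass by construction.
m3, m4 = (x**3).mean(), (x**4).mean()
H = np.array([[1, m1, m2], [m1, m2, m3], [m2, m3, m4]])
print(np.all(np.linalg.eigvalsh(H) >= -1e-9))           # True
```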

Applications in Modeling

Graphical Models

Graphical models provide a framework for representing multivariate probability distributions through graphs, where vertices correspond to random variables and edges encode conditional dependencies or independencies. In algebraic statistics, these models are analyzed using polynomial ideals that capture the constraints imposed by the graph structure on the moments, cumulants, or probabilities of the joint distribution. This algebraic approach facilitates the study of conditional independence, parameter estimation, and model selection by leveraging tools from commutative algebra and algebraic geometry.

Conditional independence ideals form the algebraic core of graphical models, encoding the separation criteria derived from the graph. In directed acyclic graphs (DAGs), d-separation determines conditional independencies: two nodes are d-separated given a conditioning set if every path between them is blocked, meaning non-colliders are in the set or colliders and their descendants are not. The generators of these ideals consist of polynomials vanishing on the distributions satisfying these independencies, such as binomial relations in discrete models or determinantal ideals in Gaussian cases. For instance, in a simple chain graph representing the conditional independence between variables X_1 and X_3 given X_2, the ideal is generated by binomials of the form u_{i j k} u_{i' j k'} - u_{i j k'} u_{i' j k}, where u_{i j k} denotes the joint probability for states i of X_1, j of X_2, and k of X_3, capturing the Markov properties through the factorization of the joint distribution.

For Gaussian Markov random fields, which correspond to undirected graphs, the vanishing ideals are defined by constraints on the inverse covariance matrix, known as the concentration matrix. Entries in the concentration matrix K = \Sigma^{-1} are zero for pairs of variables separated in the graph, reflecting conditional independencies. These ideals are generated by quadratic determinantal polynomials in the entries of the covariance matrix \Sigma.

Parameter estimation in these models involves solving algebraic equations subject to the graph's constraints. For chordal graphs, which admit perfect elimination orderings, the concentration matrix can be decomposed into sums over maximal cliques, enabling explicit maximum likelihood estimation via clique-marginal formulas or iterative proportional scaling algorithms that enforce consistency across cliques. This decomposition exploits the graph's structure to reduce the problem to estimating parameters on cliques and propagating them globally.

Model selection in graphical models benefits from algebraic techniques like primary decomposition of conditional independence ideals, which reveals the irreducible components corresponding to possible graph structures. Score-based tests compare observed data against these decompositions to select the model with the highest posterior probability or lowest discrepancy, using Gröbner bases to compute the necessary statistics efficiently. This approach is particularly useful for testing nested models or identifying minimal sets of independencies.
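The concentration-matrix characterization can be checked numerically. In the sketch below, a covariance matrix consistent with the chain X_1 - X_2 - X_3 (illustrative values, chosen so that \sigma_{13} = \sigma_{12}\sigma_{23}) has a vanishing (1,3) entry in K = \Sigma^{-1}, matching the quadratic determinantal constraint.

```python
import numpy as np

# Covariance for the chain X1 -- X2 -- X3, so X1 and X3 are conditionally
# independent given X2; the values are illustrative.
Sigma = np.array([[1.00, 0.50, 0.25],
                  [0.50, 1.00, 0.50],
                  [0.25, 0.50, 1.00]])

K = np.linalg.inv(Sigma)                 # concentration matrix
print(np.round(K, 10))                   # K[0, 2] == 0: the missing edge
# The same fact as a determinantal (quadratic) constraint on Sigma:
print(Sigma[0, 2] * Sigma[1, 1] - Sigma[0, 1] * Sigma[1, 2])   # 0.0
```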

Phylogenetics

Phylogenetic invariants are polynomial functions that vanish on the algebraic varieties parameterizing probability distributions generated by specific evolutionary models, enabling the identification of tree topologies without estimating numerical parameters such as branch lengths. These invariants provide a non-parametric approach to phylogenetic inference, distinguishing between different topologies by evaluating whether observed site pattern frequencies satisfy the polynomials associated with a hypothesized tree. Seminal work introduced these invariants for discrete-state models, demonstrating their utility in resolving evolutionary relationships from sequence data.

A prominent example involves quartet invariants for four taxa, which help determine the topology among the three possible resolved quartets. For taxa labeled A, B, C, D under models like the general Markov model, the invariant is derived from the expected frequencies of site patterns. Specifically, for the topology ((A,B),(C,D)), the key invariant is given by D_{AB} D_{CD} - D_{AC} D_{BD} - D_{AD} D_{BC} = 0, where D_{XY} = \det(M_{XY}) and M_{XY} is the 4 \times 4 matrix with entries p_{st}^{XY}, the joint probability of observing state s at taxon X and t at taxon Y. This polynomial vanishes on the variety for the correct topology but not for alternatives. Quartet invariants extend to group-based models, where Fourier transforms linearize the parameterization, facilitating the computation of higher-degree invariants via toric ideals.

Distance-based methods in algebraic phylogenetics employ metrics defined on tree spaces to infer topologies, leveraging the additive property of tree metrics under certain models. These metrics, such as those derived from log-det transformations for aligned sequences, embed trees into metric spaces while preserving additivity, allowing distance matrices to be tested for tree-likeness via invariants like the four-point condition. The Cavender-Farris-Neyman (CFN) model, a symmetric two-state substitution model, parameterizes distributions as a toric variety in Fourier coordinates, where the invariants form a toric ideal generated by binomials corresponding to cycle structures in the tree's parameterization. This toric structure enables efficient computation of generators and Gröbner bases on claw trees and balanced trees.

Maximum likelihood estimation on phylogenetic trees involves optimizing branch lengths to maximize the probability of observed data under a specified model, often requiring solutions to likelihood equations that resemble Poisson processes for substitution rates. In continuous-time Markov models, branch lengths represent expected substitutions, estimated by solving transcendental equations derived from the matrix exponential parameterization, such as \mathbf{P}(t) = \exp(\mathbf{Q} t), where \mathbf{Q} is the rate matrix and t the branch length. Software like PAUP* implements these via numerical solvers, including Newton-Raphson iterations on the score equations, to compute maximum likelihood estimates for tree parameters under models like Jukes-Cantor or general time-reversible. Algebraic approaches further analyze the ML degree of phylogenetic varieties, quantifying the number of critical points in the optimization landscape.
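To illustrate the distance-based approach, the following sketch builds exact CFN joint-pattern matrices on a quartet tree with assumed branch lengths, recovers the additive tree metric through the log-det transformation, and exhibits the four-point condition. All numerical branch lengths are illustrative.

```python
import numpy as np
from itertools import combinations

def cfn_joint(t):
    """Joint pattern-frequency matrix for two leaves at path distance t
    under the CFN model with uniform root distribution."""
    q = 0.5 * (1 - np.exp(-2 * t))          # flip probability along the path
    return 0.5 * np.array([[1 - q, q], [q, 1 - q]])

# Quartet tree ((A,B),(C,D)): pendant branches plus one internal branch.
pend = {'A': 0.10, 'B': 0.30, 'C': 0.25, 'D': 0.15}
internal = 0.20

def path_len(x, y):
    cherry = {x, y} in ({'A', 'B'}, {'C', 'D'})
    return pend[x] + pend[y] + (0.0 if cherry else internal)

# Log-det distance: -(1/2) log(4 det M_XY) recovers the path length exactly.
d = {frozenset(p): -0.5 * np.log(4 * np.linalg.det(cfn_joint(path_len(*p))))
     for p in combinations('ABCD', 2)}

sums = {'AB+CD': d[frozenset('AB')] + d[frozenset('CD')],
        'AC+BD': d[frozenset('AC')] + d[frozenset('BD')],
        'AD+BC': d[frozenset('AD')] + d[frozenset('BC')]}
print({k: round(v, 10) for k, v in sums.items()})
# Four-point condition: the two largest sums (AC+BD and AD+BC) coincide.
```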

Emerging and Advanced Areas

Tropical Statistics

Tropical statistics emerges as a robust framework within algebraic statistics, leveraging tropical geometry to handle uncertainty and outliers in probabilistic inference. Tropicalization transforms classical algebraic varieties associated with statistical models into piecewise-linear objects by replacing addition with minimization and multiplication with addition, known as the min-plus algebra. This shift facilitates the analysis of log-likelihood functions and enables computations that are more resilient to perturbations in data. The core operation in this algebra is the tropical sum, defined as a \oplus b = \min(a, b), with tropical multiplication given by a \otimes b = a + b. These operations extend to vectors and matrices, forming a semiring that underpins tropical varieties—the loci where the minimum in a tropical polynomial is attained at least twice—which arise as the limits of classical varieties under logarithmic scaling.

In statistical modeling, tropical varieties represent piecewise-linear approximations of exponential families, capturing the geometry of maximum likelihood estimation in a combinatorial manner. For instance, the tropicalization of a graphical model's variety yields a fan structure that encodes conditional independence constraints as linear inequalities.

A key application is tropical principal component analysis (PCA), which adapts classical PCA to the tropical projective torus for dimension reduction and outlier detection. Tropical PCA identifies the "closest" tropical linear space or tropical polytope to observed data points using a tropical metric, such as d_{\text{tr}}(v, w) = \max_{i,j} |v_i - w_i - v_j + w_j|, making it robust to outliers because the max-norm emphasizes extreme deviations without being overly influenced by noise. This method has been applied to phylogenetic data, where it projects tree spaces into lower dimensions, revealing clusters and removing anomalous trees to improve model fit, as demonstrated on datasets with 252 sequences from eight Apicomplexa species.

In hierarchical clustering, tropical statistics employs ultrametrics derived from tropical distances to construct dendrograms that reflect evolutionary or hierarchical structures. Ultrametrics satisfy the strong triangle inequality d(x, z) \leq \max(d(x, y), d(y, z)), aligning naturally with the min-plus algebra and enabling divisive analysis clustering algorithms over the tropical projective torus. Such methods outperform classical clustering in accuracy when applied to simulated phylogenetic trees under multi-species coalescent models, achieving higher recovery rates for true clusterings in datasets of 1000 gene trees sampled from 20 individuals. This approach links to tropical convexity by modeling tree metrics as tropical convex hulls.

For statistical inference, the tropical likelihood is formulated as the maximum of the negative log-likelihood, or "max-log," which tropicalizes the classical likelihood by evaluating \ell_{\text{trop}}(u) = \bigoplus_i u_i \otimes \theta_i = \min_i (u_i + \theta_i) over parameters \theta. This perspective simplifies optimization in discrete models, such as those for contingency tables, where tropical ideals are generated by bases of tropical binomials corresponding to conditional independences. These tropical exponential families generalize classical ones by parameterizing distributions via tropical cumulants, offering scalable tools for robust testing in high-dimensional data.
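A minimal sketch of the min-plus operations and the tropical metric used by tropical PCA and clustering follows. The sample vectors are arbitrary, and the final line checks the metric's invariance under tropical scalar multiplication (adding a constant to all coordinates).

```python
import numpy as np

def trop_add(a, b):
    """Tropical sum: a (+) b = min(a, b), applied coordinatewise."""
    return np.minimum(a, b)

def trop_mul(a, b):
    """Tropical product: a (x) b = a + b."""
    return a + b

def trop_dist(v, w):
    """Tropical projective metric max_{i,j} |v_i - w_i - v_j + w_j|."""
    diff = np.asarray(v, float) - np.asarray(w, float)
    return float(diff.max() - diff.min())

v, w = np.array([0.0, 2.0, 5.0]), np.array([1.0, 1.0, 7.0])
print(trop_add(v, w))                    # coordinatewise min
print(trop_mul(3.0, w))                  # tropical scaling: add 3 everywhere
print(trop_dist(v, w))                   # 3.0
print(trop_dist(trop_mul(10.0, v), w))   # still 3.0: projective invariance
```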

Statistical Learning Theory

Algebraic statistics intersects with statistical learning theory by employing tools from algebraic geometry and commutative algebra to analyze model complexity, identifiability, and generalization in machine learning settings. These methods provide rigorous frameworks for understanding phenomena like overfitting and regularization, often through the parameterization of statistical models as algebraic varieties. For instance, the geometry of these varieties reveals bounds on learning capacity, while tensor decompositions ensure unique recovery of latent structures in high-dimensional data.

In algebraic regularization, techniques such as the Lasso penalty promote sparsity by encouraging solutions where many coefficients vanish, akin to testing membership in ideals generated by sparse polynomials in a polynomial ring. The ideal membership problem—determining whether a given polynomial lies in an ideal spanned by sparse generators—mirrors the sparsity constraints in Lasso regression, allowing algebraic algorithms like Gröbner bases to certify sparse solutions efficiently. This perspective extends to broader sparse modeling, where the support of the solution corresponds to the minimal generators of the ideal, providing a combinatorial certificate for regularization outcomes. Furthermore, the algebraic geometry of the Vapnik-Chervonenkis (VC) dimension quantifies the shattering capacity of hypothesis classes; for discrete Markov networks, the VC dimension equals the dimension of the associated toric variety, linking statistical capacity to the algebraic complexity of the model. This connection enables bounds on learnability using the existential theory of the reals to compute dimensions via semialgebraic descriptions.

Tensor methods in algebraic statistics leverage canonical polyadic (CP) decomposition to address identifiability in topic models, where the observed word co-occurrence tensor is decomposed into a sum of rank-1 components representing latent topics. Uniqueness of this decomposition, guaranteed under Kruskal's conditions on the factor matrices, ensures that model parameters can be recovered from moments of the data distribution, facilitating efficient learning algorithms like tensor power iteration. In independent component analysis (ICA), secant varieties of moment varieties play a central role in establishing identifiability; the generic points on the k-th secant variety correspond to mixtures of k independent components, and non-degeneracy conditions (e.g., non-Gaussianity) ensure unique decomposition of the observed covariance structure. This geometric framework, rooted in the closure of unions of linear spans through variety points, provides algebraic certificates for the recoverability of mixing matrices in blind source separation.

Generalization bounds in kernel methods benefit from algebraic extensions, such as C*-algebraic structures in reproducing kernel Hilbert modules (RKHMs), which refine complexity measures to account for operator-valued kernels. These bounds show that the expected generalization gap decays as O(1/\sqrt{n}) for n samples, with the complexity term controlled by the module's algebraic structure, offering tighter estimates than standard analyses.

A key concept linking algebra to learning is overfitting, viewed through the degree of the model variety: higher-degree varieties parameterizing the model can interpolate training data more flexibly, but their increased algebraic complexity amplifies variance, leading to poor out-of-sample performance as quantified by the variety's embedding dimension. Similarly, model capacity ties to the Hilbert function of the coordinate ring, which tracks the growth rate of basis elements in graded degrees and serves as an algebraic proxy for the function class's richness, bounding the expected supremum deviation over random labels. This integration highlights how algebraic invariants like the Hilbert function predict generalization gaps in overparameterized models.
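The following sketch computes the Hilbert function of the coordinate ring of the 2×2 independence model by counting standard monomials, i.e. monomials not divisible by the lexicographic leading term p_{11}p_{22}; the brute-force enumeration is an illustrative choice that only works for small degrees.

```python
from itertools import product

def hilbert_function(t):
    """Degree-t standard monomials of Q[p11,p12,p21,p22]/<p11*p22 - p12*p21>:
    monomials of total degree t not divisible by the lex leading term p11*p22."""
    count = 0
    for e in product(range(t + 1), repeat=4):
        if sum(e) == t and not (e[0] >= 1 and e[3] >= 1):
            count += 1
    return count

for t in range(5):
    print(t, hilbert_function(t))   # 1, 4, 9, 16, 25 = (t + 1)**2
```

The quadratic growth (t + 1)^2 reflects a projective variety of dimension two and degree two, showing concretely how the Hilbert function quantifies the capacity of the model's function class.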

Algebraic Analysis and Abstract Inference

Algebraic analysis in statistics reframes classical problems using tools from commutative and non-commutative algebra, treating statistical models as algebraic varieties or semialgebraic sets defined by polynomial equations and inequalities. In this framework, abstract inference views the process of drawing conclusions from data as ring homomorphisms between rings of observables, where the observable structure is represented by a polynomial ring generated by random variables, and the parameter space maps to the probability simplex via a parametrization homomorphism whose kernel defines the model's toric ideal. Sufficiency emerges algebraically through quotient rings: a statistic is sufficient if the conditional distribution factors through a quotient by the ideal generated by functions invariant under the sufficient statistic, preserving the structure of the model while reducing dimensionality, as detailed in the geometry of sufficient statistics cones for exponential families, where maximum likelihood estimates exist when data points lie in the cone's interior.

Decision theory receives an algebraic treatment by encoding Bayesian updates and risk measures into polynomial structures. Algebraic Bayes factors can be computed as ratios derived from the cells of Gröbner fans associated with the ideals of marginal and conditional likelihoods, providing a combinatorial framework for model comparison in discrete settings like contingency tables, where the fan's regular subdivision corresponds to optimal posterior approximations. Invariant estimation over quotient spaces addresses equivariant problems under group actions, such as recovering signals up to symmetry; here, invariants from the invariant ring R^G (where a group G acts on \mathbb{R}^p) allow estimation of orbits from noisy observations, with sample complexity scaling as \Theta(\sigma^{2d^*}), where d^* is the minimal degree of separating invariants, achieving optimal rates for problems like cryo-electron microscopy.

Non-commutative extensions carry this framework to quantum statistics, replacing commutative polynomial rings with free algebras to model entangled quantum systems. In free algebraic statistics, observables are non-commuting operators, and statistical models are defined over free semialgebraic sets where positivity constraints become positive-semidefiniteness of Hermitian matrices; for instance, independence models lift to positive operator-valued measures (POVMs) with tensor-product decompositions, capturing quantum correlations in non-local games via the free convex hull of rank-one parametrizations. This approach facilitates inference in entangled models by evaluating polynomials on matrix tuples, enabling algebraic optimization for quantum hypothesis testing and parameter estimation beyond classical limits.

A key result in this algebraic paradigm is the algebraization of the Neyman-Pearson lemma through orthogonal decompositions: the most powerful test for simple hypotheses corresponds to the quotient by an orthogonal ideal in the ring of test functions, where orthogonality ensures the test statistic separates the null and alternative measures maximally, generalizing the classical lemma to invariant and equivariant settings via decompositions of the hypothesis ideals. This provides a computational route to critical regions, ensuring type I error control while maximizing power in algebraic models of inference.
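A toy instance of invariant-theoretic orbit estimation is sketched below for the sign group G = \{\pm 1\} acting on a scalar signal: the first moment is uninformative, so recovery must go through the degree-2 invariant, in line with the \sigma^{2d^*} sample-complexity scaling. All parameter values are illustrative.

```python
import numpy as np

# Z/2 orbit recovery: observe x = s * theta + noise with a random sign s,
# so theta is identifiable only up to sign (its orbit under G = {+1, -1}).
rng = np.random.default_rng(1)
theta, sigma, n = 1.7, 0.5, 200_000
s = rng.choice([-1.0, 1.0], size=n)
x = s * theta + sigma * rng.normal(size=n)

# The first moment E[x] = 0 carries no information; the lowest-degree
# separating invariant is the second moment E[x^2] = theta^2 + sigma^2,
# so d* = 2 governs the sigma^(2 d*) sample complexity.
theta_hat = np.sqrt(max((x**2).mean() - sigma**2, 0.0))
print(theta_hat)   # close to 1.7, recovering the orbit {+theta, -theta}
```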

Partially Ordered Sets and Lattices

In algebraic statistics, partially ordered sets (posets) provide a natural framework for modeling preference orders in ranking data, where the states of a statistical model correspond to orderings of a finite set of items represented as maximal chains in a graded poset. These models parameterize families of probability distributions over rankings using rational functions, leading to algebraic varieties that capture dependencies among preferences; for instance, the Plackett-Luce model is non-toric, while toric examples include the Birkhoff and Bradley-Terry models, whose toric ideals and Markov bases are analyzed via the poset's combinatorial structure. The Dedekind-MacNeille completion extends such posets to complete lattices by embedding them into the smallest complete lattice containing the original order, facilitating closure operators that ensure the statistical model's hierarchical structure is fully deductive and handles incomplete rankings through ideal completions.

Lattice theory plays a central role in algebraic statistics through distributive lattices, which model marginalization operations in multi-way contingency tables under hierarchical log-linear models. In these models, the set of allowable interactions among categorical variables forms a distributive lattice ordered by refinement of marginal constraints, where the join and meet operations correspond to combining or intersecting marginal distributions, enabling exact tests via toric ideals. Birkhoff's representation theorem asserts that every finite distributive lattice is isomorphic to the lattice of order ideals of its poset of join-irreducible elements, providing an algebraic decomposition for such marginal lattices in contingency table analysis; this representation aids in enumerating sufficient statistics and computing Gröbner bases for exact inference in sparse data settings.

For inference in these structures, algebraic tests assess compatibility by verifying whether marginal or conditional distributions derive consistently from a global distribution within the lattice of Markov kernels, using projections and conditionings as lattice operations to check dominance relations. Möbius inversion from the incidence algebra of the lattice inverts summation over order ideals to perform inclusion-exclusion on probabilities, computing query probabilities as P(\Phi) = \sum_{v \leq \hat{1}} \mu_L(v, \hat{1}) P(\lambda(v)), where \mu_L is the Möbius function, thus enabling efficient Bayesian updates and avoiding overcounting in hierarchical models.

A key example is the Boolean lattice for power set models, where the poset of subsets ordered by inclusion models independence structures in multi-dimensional binary data, such as in latent class analysis; the zeta function is defined by \zeta(S,T) = 1 if S \subseteq T and 0 otherwise, facilitating Möbius inversion to compute exact probabilities over subset events via the incidence algebra.
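The zeta/Möbius pair on the Boolean lattice can be coded directly. The sketch below verifies Möbius inversion on an arbitrary test function over the subsets of a three-element set; the function f is a made-up example.

```python
from itertools import combinations

def subsets(S):
    """All subsets of the iterable S, as frozensets."""
    S = list(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def zeta(S, T):
    """Zeta function of the Boolean lattice: 1 if S is a subset of T."""
    return 1 if S <= T else 0

def mobius(S, T):
    """Its Moebius inverse: (-1)^{|T \\ S|} on comparable pairs, else 0."""
    return (-1) ** len(T - S) if S <= T else 0

universe = frozenset('abc')
f = {S: len(S) ** 2 for S in subsets(universe)}             # toy values
g = {T: sum(zeta(S, T) * f[S] for S in subsets(universe))   # summation ...
     for T in subsets(universe)}
f_back = {T: sum(mobius(S, T) * g[S] for S in subsets(universe))
          for T in subsets(universe)}                       # ... inverted
print(f_back == f)   # True: Moebius inversion undoes summation over the lattice
```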

Illustrative Examples

Basic Example: Independence Models in Contingency Tables

A fundamental illustration of algebraic statistics arises in testing independence between two binary variables using a 2×2 contingency table, where observed counts \mathbf{u} = (u_{11}, u_{12}, u_{21}, u_{22}) follow a multinomial distribution with total sample size n = \sum u_{ij} and unknown probabilities \mathbf{p} = (p_{11}, p_{12}, p_{21}, p_{22}) \in \Delta_3, the 3-simplex. Under the null hypothesis of independence, the probabilities satisfy p_{ij} = a_i b_j for parameters a_1, a_2, b_1, b_2 > 0 with \sum a_i = \sum b_j = 1, forming a toric model whose algebraic variety is the rank-1 surface in \mathbb{R}^4 defined by the condition that the probability matrix has rank at most 1. This model preserves fixed row and column marginals, and the corresponding toric ideal in the homogeneous coordinate ring is generated by the single binomial I = \langle p_{11} p_{22} - p_{12} p_{21} \rangle, which encodes the independence constraints algebraically.

To test the hypothesis conditional on the observed marginals, algebraic methods employ a Markov basis for the integer kernel of the design matrix encoding the marginal constraints, enabling conditional inference via Monte Carlo sampling. For the 2×2 model, the minimal Markov basis consists of a single essential move up to sign: \mathbf{m} = (1, -1, -1, 1), corresponding to adding or subtracting 1 from the diagonal entries while adjusting the off-diagonals to preserve margins; this basis connects all tables in the fiber \mathcal{F}_\mathbf{u} = \{ \mathbf{v} \in \mathbb{N}^4 : A\mathbf{v} = A\mathbf{u} \}, where A is the design matrix recording row and column sums. The test statistic is the Pearson chi-squared value X^2(\mathbf{v}) = \sum_{i,j} \frac{(v_{ij} - \hat{v}_{ij})^2}{\hat{v}_{ij}}, with expected counts \hat{v}_{ij} computed from the marginals, evaluated at tables \mathbf{v} in the fiber.

A step-by-step exact test proceeds as follows: first, generate the Markov basis using computational software, which for this model yields \{\pm \mathbf{m}\}; then, starting from the observed \mathbf{u}, perform a random walk on \mathcal{F}_\mathbf{u} by adding integer multiples of basis elements while staying non-negative, approximating the uniform distribution over the fiber after sufficient steps (mixing time is rapid here due to the simple basis). The p-value is the proportion of sampled tables with X^2(\mathbf{v}) \geq X^2(\mathbf{u}), providing a non-asymptotic alternative to the standard chi-squared approximation, which assumes large n and follows a \chi^2_1 distribution. For small samples, the exact test avoids the approximation's inaccuracies, as the number of tables in the fiber grows rapidly with n but remains tractable for small tables.

The degrees of freedom for the asymptotic test equal the codimension of the model variety, which is 1 here, reflecting the single algebraic constraint reducing the 3-dimensional simplex to a 2-dimensional surface (after accounting for the normalization \sum p_{ij} = 1). This aligns with the statistical degrees of freedom (2-1)(2-1) = 1, linking the geometric dimension of the model to classical asymptotic theory.
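The step-by-step procedure above translates almost line by line into code. This sketch runs the fiber walk for an assumed observed table, using the uniform-target scheme described in the text; the counts, walk length, and burn-in are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
u = np.array([8, 3, 2, 9])              # observed table (u11, u12, u21, u22)
move = np.array([1, -1, -1, 1])         # the minimal Markov basis element
n = u.sum()

def chi_sq(v):
    """Pearson chi-squared statistic against margin-based expected counts."""
    r1, c1 = v[0] + v[1], v[0] + v[2]
    exp = np.array([r1 * c1, r1 * (n - c1),
                    (n - r1) * c1, (n - r1) * (n - c1)]) / n
    return ((v - exp) ** 2 / exp).sum()

# Random walk on the fiber: propose +move or -move, reject proposals that
# leave the non-negative orthant; valid moves are always accepted, so the
# stationary distribution is uniform over the fiber.
x, steps, burn, stat0, extreme = u.copy(), 20_000, 1_000, chi_sq(u), []
for i in range(steps):
    prop = x + rng.choice([-1, 1]) * move
    if prop.min() >= 0:
        x = prop
    if i >= burn:
        extreme.append(chi_sq(x) >= stat0 - 1e-12)
print(np.mean(extreme))                 # Monte Carlo estimate of the p-value
```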

Advanced Example: Phylogenetic Invariants

In phylogenetic reconstruction, an advanced application of algebraic statistics involves using invariants to infer quartet trees from distance data under the additive tree model. Consider four taxa labeled A, B, C, and D, where pairwise distances d_{ij} represent the expected evolutionary distances between taxa i and j, assumed to follow an additive model on an unrooted binary tree. Under this model, the distances satisfy the four-point condition: for any quartet, the two largest of the three possible sums d_{AB} + d_{CD}, d_{AC} + d_{BD}, and d_{AD} + d_{BC} are equal, corresponding to the path lengths crossing the internal branch of the true tree topology.

The key invariants are constructed as linear polynomials in the distances, each vanishing precisely on the variety defined by one of the three possible quartet topologies. Specifically, the six polynomials consist of the three primary differences and their negatives: \delta_{AB|CD} = d_{AC} + d_{BD} - d_{AD} - d_{BC}, \delta_{AC|BD} = d_{AB} + d_{CD} - d_{AD} - d_{BC}, \delta_{AD|BC} = d_{AB} + d_{CD} - d_{AC} - d_{BD}, and -\delta_{AB|CD}, -\delta_{AC|BD}, -\delta_{AD|BC}. For the true topology AB|CD, \delta_{AB|CD} = 0 holds exactly, while the other two \deltas are negative and equal (each minus twice the length of the internal branch); the negatives ensure the ideal is generated symmetrically for computational purposes in software. These polynomials derive from the edge invariants of the tree metric, providing a coordinate-free description of the model variety.

To reconstruct the tree, evaluate the observed distances on these polynomials and identify the topology via the sign patterns or magnitudes. Under the additive model with no noise, exactly one \delta vanishes (equals zero), and the corresponding split defines the topology; the other two \deltas share the same negative sign. With noisy data, the inferred topology is the one minimizing the absolute value of its \delta, or more algebraically, solve the system \delta_k = 0 for each k using resultants to eliminate variables and test membership in the ideal (though for linear cases, direct evaluation suffices; resultants are crucial for higher-degree invariants in sequence-based models). This approach extends the basic contingency table models by handling continuous distance parameters and multivariate dependencies.

These invariants offer robustness to mild model misspecification, such as heterogeneous rates across sites, as the linear vanishing condition persists under certain transformations (e.g., gamma-distributed rates), unlike higher-degree invariants that may fail. However, severe long-branch attraction can distort sign patterns, requiring additional checks like \delta-plots to detect outliers. Software implementations, such as PAUP*'s distance-based methods or Macaulay2 for ideal membership testing, facilitate practical computation; for instance, PAUP* computes the distances and applies the four-point metric to score topologies efficiently.
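The reconstruction rule is easy to execute numerically. The sketch below evaluates the three \delta invariants on the additive distance matrix of a quartet tree with the same illustrative branch lengths as the CFN sketch earlier (pendant branches 0.1, 0.3, 0.25, 0.15 and internal branch 0.2) and selects the vanishing one.

```python
import numpy as np

def quartet_deltas(d):
    """Evaluate the three linear quartet invariants on a 4x4 distance
    matrix with taxa ordered (A, B, C, D)."""
    dAB, dAC, dAD = d[0, 1], d[0, 2], d[0, 3]
    dBC, dBD, dCD = d[1, 2], d[1, 3], d[2, 3]
    return {
        'AB|CD': dAC + dBD - dAD - dBC,
        'AC|BD': dAB + dCD - dAD - dBC,
        'AD|BC': dAB + dCD - dAC - dBD,
    }

# Additive distances on the tree ((A,B),(C,D)) with the branch lengths above.
d = np.array([[0.00, 0.40, 0.55, 0.45],
              [0.40, 0.00, 0.75, 0.65],
              [0.55, 0.75, 0.00, 0.40],
              [0.45, 0.65, 0.40, 0.00]])

deltas = quartet_deltas(d)
print(deltas)   # AB|CD vanishes; the other two equal -2 * 0.2 = -0.4
print(min(deltas, key=lambda k: abs(deltas[k])))   # 'AB|CD'
```

With noisy distances the same selection rule applies, choosing the split whose \delta is smallest in absolute value, exactly as described in the text.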

References

  1. [1]
    [PDF] Lectures on Algebraic Statistics - Berkeley Math
    Jan 3, 2014 · Algebraic statistics is concerned with the development of techniques in algebraic geometry, commutative algebra, and combinatorics, ...
  2. [2]
    [PDF] Algebraic Statistics - TUHH Open Research
    Algebraic statistics brings together ideas from algebraic geometry, commutative algebra, and combina- torics to address problems in statistics and its ...
  3. [3]
    [PDF] AN INTRODUCTION TO ALGEBRAIC STATISTICS - diism@unisi
    May 3, 2016 · Algebraic Statistics is rarely interested in situations where just one random variable is concerned. Instead, networks containing several random ...
  4. [4]
    [PDF] 1. What is Algebraic Statistics? - UPCommons
    What is Algebraic Statistics? Algebraic statistics is an interdisciplinary field that uses tools from computational al- gebra, algebraic geometry ...
  5. [5]
    Algebraic statistics - ACM Digital Library
    Jul 22, 2012 · Algebraic statistics advocates polynomial algebra as a tool for addressing problems in statistics and its applications.
  6. [6]
    Open Problems in Algebraic Statistics - SpringerLink
    Algebraic statistics is concerned with the study of probabilistic models and techniques for statistical inference using methods from algebra and geometry.
  7. [7]
    Algebraic Statistics for Computational Biology
    Algebraic Statistics for Computational Biology. Algebraic Statistics for Computational Biology ... By Lior Pachter, Bernd Sturmfels. You have access Access. PDF ...
  8. [8]
    [PDF] ALGEBRAIC STATISTICAL MODELS
    Abstract: Many statistical models are algebraic in that they are defined in terms of polynomial constraints, or in terms of polynomial or rational ...
  9. [9]
    [PDF] Tensor Decompositions for Learning Latent Variable Models
    The relevance of tensor analysis to latent variable modeling has been long recognized in the field of algebraic statistics (Pachter and Sturmfels, 2005), and ...
  10. [10]
    [PDF] Geometry of Dependency Equilibria - UCLA Statistics & Data Science
    Conclusion: We presented a new direction at the interface of game theory and algebraic statistics. It has many open problems. Aside: causal inference is a hot ...
  11. [11]
    Mathematisches Forschungsinstitut Oberwolfach Algebraic Statistics
    For example, it was used in algebraic statistics by Sturmfels–Welker [11] to study linear ordering polytopes, which can be seen as the model polytopes of ...
  12. [12]
    jacob bernoulli and his works on probability and law of large numbers
    Around 328 years ago, Jacob Bernoulli worked out on the game of chances (from 1684 to 1689) which forms the background of the theory of probability. His book ' ...
  13. [13]
    [PDF] The Bernoulli Family: Their Massive Contributions to Mathematics ...
    Jacob Bernoulli was one of the pioneers of the mathematical theory of probability. His first paper on probability theory was published in 1685. The greatest ...
  14. [14]
    On the mathematical foundations of theoretical statistics - Journals
    On the mathematical foundations of theoretical statistics. R. A. Fisher ... DasGupta A (2011) The Exponential Family and Statistical Applications Probability ...
  15. [15]
    [PDF] Sufficient Statistics and Likelihood Equations Shelby J. Haberman ...
    Oct 2, 2007 · 1. Introduction. Log-linear models for contingency tables have received con- siderable attention in recent years; however, with a few exceptions ...Missing: iterative proportional
  16. [16]
    Log-Linear Fit for Contingency Tables
    S. J. Haberman; Log-Linear Fit for Contingency Tables, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 21, Issue 2, 1 June 19.Missing: source | Show results with:source
  17. [17]
    Gröbner bases and factorisation in discrete probability and Bayes
    Aug 7, 2025 · Grbner bases, elimination theory and factorization may be used to perform calculations in elementary discrete probability and more complex ...
  18. [18]
    [PDF] On the toric algebra of graphical models - Berkeley Math
    The first approach is to define graphical models by spec- ifying a graph according to which a probability distribution must factor in order to belong to the ...Missing: Steffen | Show results with:Steffen
  19. [19]
    [PDF] Mixed graphical association models - Oxford statistics department
    The paper surveys the mathematical and statistical theory of mixed graphical association models us introduced by Lauritzen & Wermuth (1984), concerned with ...Missing: toric | Show results with:toric
  20. [20]
    Algebraic algorithms for sampling from conditional distributions
    We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic.
  21. [21]
    4ti2--A software package for algebraic, geometric and combinatorial ...
    Aug 29, 2018 · A software package for algebraic, geometric and combinatorial problems on linear spaces · Fixed bug in `markov' for computing with inhomogeneous ...
  22. [22]
    [2207.10227] The tropical geometry of causal inference for extremes
    Jul 20, 2022 · In this paper, we review recent work where insights from tropical geometry were used to develop new, efficient learning algorithms with leading performance.Missing: plus algebras 2020s
  23. [23]
    [math/0407033] Phylogenetic Algebraic Geometry - arXiv
    Jul 2, 2004 · Abstract: Phylogenetic algebraic geometry is concerned with certain complex projective algebraic varieties derived from finite trees.
  24. [24]
    Advancing mathematics by guiding human intuition with AI - Nature
    Dec 1, 2021 · We propose a process of using machine learning to discover potential patterns and relations between mathematical objects, understanding them with attribution ...
  25. [25]
    [PDF] Algebraic Statistics - TUHH Open Research
    1.8 Toric Ideals. Toric ideals represent algebraic relations between monomials and arise naturally in algebraic statistics. Let K be a field and let X1 ...
  26. [26]
    Markov Bases in Algebraic Statistics - SpringerLink
    In stockIn this book we take up this topic and present a detailed summary of developments following the seminal work of Diaconis and Sturmfels.
  27. [27]
    [PDF] Computing Markov bases, Gröbner bases, and extreme rays
    The worst case complexity of Gröbner basis methods is not fully known but it is at least exponential as is expected since integer programming is NP-hard.
  28. [28]
    [PDF] a simplification of computing markov bases for graphical models ...
    De Loera and Onn (2006) showed that computation of a Markov basis for a log-linear model is an NP-hard problem. However, Markov bases for some special ...
  29. [29]
    [PDF] Solving the Likelihood Equations - arXiv
    Solving the Likelihood Equations. Serkan Hosten, Amit Khetan and Bernd Sturmfels. Abstract. Given a model in algebraic statistics and data, the likelihood func-.
  30. [30]
    [PDF] Maximum Likelihood - UC Berkeley math
    Feb 14, 2023 · The ML degree measures the algebraic complexity of the MLE. Theorem 5 says that the ML degree is a topological invariant.
  31. [31]
    Algebraic Approach to Maximum Likelihood Factor Analysis
    Sep 15, 2025 · In our algebraic algorithm, we employ the theory of Gröbner bases to get simplified sub-problems for a given system of algebraic equations.
  32. [32]
    The EM Algorithm for Hidden Markov Models (Chapter 12)
    As discussed in Chapter 1, the EM algorithm is an iterative procedure used to obtain maximum likelihood estimates (MLEs) for the parameters of statistical ...Missing: reformulation | Show results with:reformulation
  33. [33]
    [PDF] Likelihood Geometry - Princeton Math
    Sturmfels, Algebraic Statistics for Computational. Biology (Cambridge University Press, Cambridge, 2005). [Rai]. C. Raicu, Secant varieties of Segre–Veronese ...
  34. [34]
    [PDF] Algebraic Identifiability of Gaussian Mixtures - arXiv
    Apr 4, 2017 · The following well-known result on higher secant varieties of a variety X allows us to show that X is not k-defective for any k by proving ...
  35. [35]
    [PDF] Moment Varieties of Gaussian Mixtures - arXiv
    Oct 15, 2015 · 3 Pearson's Crabs: Algebraic Statistics in 1894. The method of moments in statistics was introduced by Pearson in his 1894 paper [10]. In our ...
  36. [36]
    Moment Identifiability of Homoscedastic Gaussian Mixtures
    Jul 6, 2020 · In the case of moments of mixture distributions, there is a natural correspondence to secant varieties of the moment varieties. Studying ...
  37. [37]
    [PDF] Moment Identifiability of Homoscedastic Gaussian Mixtures - arXiv
    May 15, 2020 · In the context of algebraic statistics [19], moments of probability distributions have recently been explored from an algebraic and ...
  38. [38]
    [PDF] sums of squares, moment matrices and optimization over polynomials
    Feb 5, 2008 · This paper considers minimizing a polynomial over a semialgebraic set, using sums of squares and moment matrices, and semidefinite relaxations.
  39. [39]
    [PDF] On the toric algebra of graphical models - Microsoft
    The three-variable-chain undirected graphical model with graph G equal to X1 − X2 − X3 has generators G = {{X1,X2},{X2,X3}} ... toric ideal I_A provides a set ...
  40. [40]
    [PDF] Gaussian Graphical Models: An Algebraic and Geometric Perspective
    Jul 13, 2017 · Gaussian graphical models are used throughout the natural sciences, social sciences, and economics to model the statistical relationships ...
  41. [41]
    A rate-independent technique for analysis of nucleic acid sequences
    A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. J. A. Lake.
  42. [42]
    Invariants of phylogenies in a simple case with discrete states
    Cavender, J.A., Felsenstein, J. Invariants of phylogenies in a simple case with discrete states. Journal of Classification 4, 57–71 (1987). https://doi.org ...
  43. [43]
    [q-bio/0402015] Toric ideals of phylogenetic invariants - arXiv
    Feb 7, 2004 · Their phylogenetic invariants form a toric ideal in the Fourier coordinates. We determine generators and Gröbner bases for these toric ideals.
  44. [44]
    Algebraic Methods in Phylogenetics | Bulletin of Mathematical Biology
    Dec 19, 2018 · The works in this volume showcase the varied directions in which algebra is playing a role in current phylogenetic research.
  45. [45]
    Toric geometry of the Cavender-Farris-Neyman model with a ...
    We give a combinatorial description of the toric ideal of invariants of the Cavender-Farris-Neyman model with a molecular clock (CFN-MC) on a rooted binary ...
  46. [46]
    [PDF] MOLECULAR PHYLOGENETICS FROM AN ALGEBRAIC VIEWPOINT
    Allman and Rhodes (2006) used phylogenetic invariants to obtain the first tree identifiability results for general sorts of mixtures. A general-Markov-like ...
  47. [47]
    [PDF] paupmanual.pdf - PhyloSolutions
    The last section provides background information concerning parsimony, maximum likelihood, and distance methods. When you need more specific information ...
  48. [48]
    Tropical geometry of statistical models - PNAS
    This article presents a unified mathematical framework for probabilistic inference with statistical models, such as graphical models.
  49. [49]
    Clustering Methods over the Tropical Projective Torus - MDPI
    Aug 7, 2023 · Hierarchical Clustering over the Space of Ultrametrics. In this section, we apply tropical hierarchical clustering methods to the space of ...
  50. [50]
    [PDF] Sparse Polynomials in Polynomial Ideals Bachelorarbeit - Finn Wiersig
    Also, it is possible to solve the ideal membership problem: given a polynomial f ∈ K[x0,...,xn] and an ideal I ⊆ K[x0,...,xn], decide whether f ∈ I (a worked sketch appears after this reference list). This ...
  51. [51]
    [2201.05218] The Ideal Membership Problem and Abelian Groups
    Jan 13, 2022 · In this paper we consider IMPs arising from CSPs over `affine' constraint languages, in which constraints are subgroups (or their cosets) of direct products of ...
  52. [52]
    Complexity of concept classes induced by discrete Markov networks ...
    We show that VC dimension, Euclidean dimension and dimension of the toric ideal corresponding to a nontrivial discrete MN are identical.
  53. [53]
    [PDF] Using Existential Theory of the Reals to Bound VC Dimension
    Aug 25, 2022 · We use a connection between algebraic geometry and classic circuit-based approaches of bounding the VC dimension to derive our results.
  54. [54]
    Uniqueness of Tensor Decompositions with Applications to ... - arXiv
    Apr 30, 2013 · This paper proves a robust version of Kruskal's theorem, enabling approximate tensor decomposition recovery and polynomial identifiability for ...
  55. [55]
    Tensors in Algebraic Statistics - Scribd
    secant variety is the Zariski closure of this set of tensors). ... order moment data, non-Gaussianity, and is heavily based on the ICA identifiability ...
  56. [56]
    [PDF] Learning in RKHM: a C*-Algebraic Twist for Kernel Machines
    We investigated the generalization bound and computational complexity for RKHM learning and showed the connection with existing methods. RKHMs enable us to ...
  57. [57]
    [PDF] Generalized Principal Component Analysis for Algebraic Varieties
    Dec 30, 2020 · However, a higher-degree variety approximates the data points better and consequently has a smaller error. In statistical literature, this ...
  58. [58]
    [PDF] Algebraic Geometry And Statistical Learning Theory
    Many models in machine learning are inherently algebraic, or can be approximated by algebraic varieties: Neural networks: Certain architectures can be viewed as ...
  59. [59]
    Algebraic statistics [draft ed.] - DOKUMEN.PUB
    The method of moments estimators can often lead to interesting algebraic problems [AFS16]. ... Algebraic statistics for computational biology, Cambridge ...
  60. [60]
    Algebraic Statistics vol. 14 (2023), no. 2 - MSP
    Sullivant, “Toric geometry of the Cavender–Farris–Neyman model with a molecular clock”, Adv. ... evolutionary models, this variety is a projective toric variety.
  61. [61]
    [PDF] Estimation under group actions: recovering orbits from invariants
    Jun 13, 2023 · Estimation under group actions: recovering orbits from invariants. Afonso S. Bandeira∗1, Ben Blum-Smith†2, Joe Kileel‡3, Jonathan Niles-Weed ...
  62. [62]
    [PDF] foundations of free algebraic statistics
    The goal of this project is to merge these two approaches, and thereby develop the foundations of free algebraic statistics. This is not only a technical ...
  63. [63]
  64. [64]
    [PDF] algebraic statistics and contingency table problems: log-linear ...
    Diaconis and B. Sturmfels (1998). Algebraic algorithms for sampling from conditional distributions, Annals of Statistics, 26, 363–397.
  65. [65]
    [PDF] Computing Query Probability with Incidence Algebras - CSE Home
    The Möbius inversion formula corresponds to the inclusion/exclusion principle, which is ubiquitous in probabilistic inference: the connection between the two in ...
  66. [66]
    Lectures on Algebraic Statistics - SpringerLink
    Lectures on Algebraic Statistics. Authors: Mathias Drton, Bernd Sturmfels, Seth Sullivant. Series Title: Oberwolfach Seminars. DOI: https://doi.org/10.1007/978 ...
  67. [67]
    Phylogenetics - Charles Semple; Mike Steel - Oxford University Press
    This book is intended for biologists interested in the mathematical theory behind phylogenetic methods, and for mathematicians, statisticians, and computer ...
  68. [68]
    δ Plots: A Tool for Analyzing Phylogenetic Distance Data
    It is well known that d is additive (i.e., can be represented by a weighted tree labeled by X) if and only if every quartet q = {x, y, u, v} in X satisfies this ...
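
To make the Gröbner-basis test cited in [50] concrete, here is a minimal sketch in Python, assuming SymPy is available; the binomial ideal and the test polynomial are illustrative choices only, not drawn from any of the works above. Membership of f in an ideal I is decided by reducing f modulo a Gröbner basis of I and checking that the remainder vanishes.

    # Minimal sketch: ideal membership via a Groebner basis (cf. [50]).
    # The ideal and the test polynomial are illustrative examples only.
    from sympy import symbols, groebner

    x, y, z = symbols("x y z")

    # A binomial (toric-style) ideal I = <x*z - y**2>, the kind of relation
    # among monomials that toric ideals in algebraic statistics encode.
    G = groebner([x*z - y**2], x, y, z, order="lex")

    # f lies in I exactly when its remainder on division by the
    # Groebner basis is zero.
    f = x**2 * z**2 - y**4  # equals (x*z - y**2)*(x*z + y**2), so f is in I
    quotients, remainder = G.reduce(f)
    print(remainder == 0)  # prints: True

The same reduction underlies the exact computations referenced in [27] and [31]: once a Gröbner basis has been computed, membership and normal-form questions reduce to polynomial division.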