
Gram matrix

In linear algebra, the Gram matrix (also known as the Gramian matrix) of a set of vectors in an inner product space is the Hermitian matrix whose entries are given by the inner products of pairs of those vectors. It is named after the Danish mathematician and actuary Jørgen Pedersen Gram (1850–1916), who contributed to its development in the context of orthogonalization processes and least squares approximations. For a set of m vectors \{v_1, \dots, v_m\} in \mathbb{R}^n, the Gram matrix G is the m \times m matrix with entries G_{ij} = v_i^T v_j, or more generally G_{ij} = \langle v_i, v_j \rangle in an arbitrary inner product space.

The Gram matrix is always positive semi-definite, meaning all its eigenvalues are nonnegative, and it is positive definite if and only if the vectors are linearly independent. Its rank equals the dimension of the span of the vectors, providing a way to determine linear dependence: the vectors are linearly independent if and only if the determinant of the Gram matrix (known as the Gram determinant) is nonzero. If the vectors are represented as the columns of a matrix A, then the Gram matrix is G = A^* A (or A^T A in the real case), which preserves key algebraic properties such as the nullity and invertibility relations between A and G.

Gram matrices play a fundamental role in several areas of mathematics and applications. In the Gram-Schmidt orthogonalization process, they facilitate the construction of orthonormal bases from linearly independent sets by enabling projections and normalizations. They are essential in least squares problems, where the normal equations involve solving G x = A^T b for approximations in overdetermined systems. In signal processing and statistics, Gram matrices underpin analyses of vector correlations and covariance structures. Beyond linear algebra, they are central to kernel methods in machine learning, where the Gram (or kernel) matrix encodes pairwise similarities in high-dimensional feature spaces without explicit computation of the features, enabling algorithms such as support vector machines and kernel principal component analysis.

Fundamentals

Definition

In mathematics, particularly in the field of linear algebra, an inner product space is a vector space V over the real numbers \mathbb{R} or the complex numbers \mathbb{C} equipped with an inner product \langle \cdot, \cdot \rangle: V \times V \to \mathbb{R} (or \mathbb{C}). The inner product is a map that satisfies specific axioms: it is linear in the first argument, conjugate symmetric (i.e., \langle u, v \rangle = \overline{\langle v, u \rangle} for complex spaces), and positive definite (\langle v, v \rangle \geq 0 for all v \in V, with equality if and only if v = 0). These properties generalize the notion of length and angle from Euclidean geometry, enabling geometric interpretations in abstract settings.

Given a finite set of vectors \{v_1, \dots, v_n\} in an inner product space V, the Gram matrix G associated with this set is the n \times n matrix whose (i,j)-th entry is the inner product G_{ij} = \langle v_i, v_j \rangle. This construction captures the pairwise "similarities" or projections between the vectors via the inner product. In the real case, where the inner product is symmetric (\langle u, v \rangle = \langle v, u \rangle), the resulting Gram matrix G is symmetric. In the complex case, the conjugate symmetry of the inner product ensures that G is Hermitian, meaning G_{ij} = \overline{G_{ji}}.

To illustrate the construction, consider two vectors in the Euclidean plane \mathbb{R}^2 equipped with the standard dot product \langle u, v \rangle = u_1 v_1 + u_2 v_2. Let v_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} and v_2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. The entries are computed as follows:
G_{11} = \langle v_1, v_1 \rangle = 1 \cdot 1 + 1 \cdot 1 = 2,
G_{12} = \langle v_1, v_2 \rangle = 1 \cdot 1 + 1 \cdot 0 = 1,
G_{21} = \langle v_2, v_1 \rangle = 1 \cdot 1 + 0 \cdot 1 = 1,
G_{22} = \langle v_2, v_2 \rangle = 1 \cdot 1 + 0 \cdot 0 = 1.
Thus, the Gram matrix is
G = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}.
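The computation above can be reproduced in a few lines of code. The following NumPy sketch (the choice of NumPy and the variable names are incidental, not part of the definition) forms the Gram matrix as V^T V, where the columns of V are the example vectors:

```python
import numpy as np

# Columns of V are the example vectors v_1 = (1, 1) and v_2 = (1, 0).
V = np.array([[1.0, 1.0],
              [1.0, 0.0]])

# Gram matrix of the columns: G = V^T V, so G[i, j] = <v_i, v_j>.
G = V.T @ V
print(G)  # [[2. 1.]
          #  [1. 1.]]
```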

Historical Background

The Gram matrix is named after Danish mathematician and actuary Jørgen Pedersen Gram (1850–1916), who played a pivotal role in its early conceptualization through his work on orthogonalization and least squares approximations in the late 19th century. Born on June 27, 1850, in Nustrup, Denmark, Gram earned a master's degree in mathematics from the University of Copenhagen in 1873 and a doctorate in 1879. His career focused on applied mathematics in insurance, beginning as an assistant at the Hafnia Insurance Company in 1875, where he contributed to probability theory and numerical methods; he later founded the Skjold Insurance Company in 1884 and served as its director until 1910, while also chairing the Danish Insurance Council from 1910 to 1916. Gram's practical engagements in insurance mathematics and his development of a mathematical model for forest management around 1876 influenced his innovative applications of linear algebraic tools, including those leading to the Gram matrix.

The matrix's origins trace to Gram's foundational contributions of the 1880s, emerging within studies of orthogonalization and inner products as tools for expanding real functions in series via the method of least squares. In his 1883 paper "Ueber die Entwickelung reeller Funktionen in Reihen mittelst der Methode der kleinsten Quadrate," published in the Journal für die reine und angewandte Mathematik, Gram outlined a procedure for constructing orthogonal bases from linearly independent sets using inner products, implicitly relying on the matrix of pairwise inner products that would later bear his name. This approach built on earlier ideas of mathematicians such as Laplace and Cauchy but represented the first systematic algorithmic framework for such orthogonalization in function spaces. The concept predated the formal abstract theory of inner product spaces, which was developed by Maurice Fréchet and others in the early 1900s. Gram's method gained further prominence through its connection to the orthogonalization process now known as Gram-Schmidt, which he introduced in the 1880s and which was independently refined by Erhard Schmidt in 1907.

The first systematic applications of these ideas appeared in Gram's early 20th-century investigations into probability distributions and analytic number theory, including his contributions to the Gram-Charlier series for expansions of asymmetric distributions beyond the normal Gaussian form, and his 1903 paper "Note sur les zéros de la fonction ζ(s) de Riemann," published in Acta Mathematica, in which he analyzed sign changes related to the Riemann zeta function on the critical line and defined the points now called Gram points. There is no single invention date for the Gram matrix, as it evolved organically from Gram's orthogonalization techniques amid broader late-19th-century advances in linear algebra.

Examples and Applications

Illustrative Examples

A simple example illustrates the construction of a Gram matrix in \mathbb{R}^2 using the standard dot product. Consider the vectors \mathbf{v}_1 = (1, 0) and \mathbf{v}_2 = (1, 1). The Gram matrix G is formed by the inner products: G_{11} = \mathbf{v}_1 \cdot \mathbf{v}_1 = 1, G_{12} = G_{21} = \mathbf{v}_1 \cdot \mathbf{v}_2 = 1, and G_{22} = \mathbf{v}_2 \cdot \mathbf{v}_2 = 2. Thus, G = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}. The entries of G directly represent these inner products, capturing the squared lengths on the diagonal and the covariation between vectors off the diagonal.

To demonstrate the connection to linear dependence, consider the collinear vectors \mathbf{v}_1 = (1, 0) and \mathbf{v}_2 = (2, 0) = 2\mathbf{v}_1. The Gram matrix has entries G_{11} = 1, G_{12} = G_{21} = 2, and G_{22} = 4, yielding G = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}. This matrix is singular, as its determinant is 1 \cdot 4 - 2 \cdot 2 = 0, reflecting the linear dependence of the vectors.

For orthogonal vectors in \mathbb{R}^3, take \mathbf{v}_1 = (1, 0, 0), \mathbf{v}_2 = (0, 2, 0), and \mathbf{v}_3 = (0, 0, 3), which form an orthogonal basis. The Gram matrix is diagonal: G_{11} = 1, G_{22} = 4, G_{33} = 9, and all off-diagonal entries are zero since \mathbf{v}_i \cdot \mathbf{v}_j = 0 for i \neq j. Thus, G = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 9 \end{pmatrix}. The diagonal entries are the squared Euclidean norms \|\mathbf{v}_i\|^2, while the zero off-diagonal entries confirm orthogonality.

In general, the entries of a Gram matrix encode similarities between vectors: the off-diagonal entry G_{ij} measures their mutual dependence via the inner product, and for unit vectors, G_{ij} = \cos \theta_{ij}, where \theta_{ij} is the angle between \mathbf{v}_i and \mathbf{v}_j. For instance, in the first example, \cos \theta = 1/\sqrt{2} after normalizing the vectors. These examples also exhibit positive-semidefiniteness, with all eigenvalues non-negative.
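These three examples can be checked numerically. The sketch below (the helper gram is a hypothetical convenience function, not a standard library routine) builds each Gram matrix from the standard dot product and confirms the singularity of the collinear case:

```python
import numpy as np

def gram(*vectors):
    """Gram matrix of the given vectors under the standard dot product."""
    V = np.column_stack(vectors)
    return V.T @ V

# Independent pair: positive definite Gram matrix.
print(gram([1, 0], [1, 1]))                   # [[1 1], [1 2]]
# Collinear pair: singular Gram matrix (zero determinant).
print(np.linalg.det(gram([1, 0], [2, 0])))    # 0.0
# Orthogonal triple: diagonal Gram matrix of squared norms.
print(gram([1, 0, 0], [0, 2, 0], [0, 0, 3]))  # diag(1, 4, 9)
```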

Practical Applications

In linear algebra and numerical analysis, the Gram matrix provides a means to assess the linear independence of a set of vectors, since the rank of the Gram matrix equals the dimension of the span of those vectors. This allows one to determine whether vectors form a basis without explicitly computing the rank of the original coordinate matrix. Additionally, the square root of the determinant of the Gram matrix gives the volume of the parallelepiped spanned by the vectors, offering a geometric interpretation tied to the Gram determinant.

In machine learning, particularly within kernel methods, the Gram matrix serves as the kernel matrix with entries K_{ij} = k(x_i, x_j), where k is a kernel function. This formulation enables the application of linear algorithms, such as support vector machines (SVMs) and kernel principal component analysis (kernel PCA), in high-dimensional feature spaces without explicit computation of the features, facilitating non-linear classification and dimensionality reduction.

In control theory, the controllability Gramian for linear time-invariant systems quantifies the extent to which inputs can steer the state from zero to any desired point, with its eigenvalues indicating the ease of control along principal directions. Similarly, the observability Gramian measures how well the system's outputs reveal the internal state, again assessed through its eigenvalues, which inform the design of state estimators and filters.

In quantum chemistry, the overlap matrix for basis sets of atomic orbitals functions as a Gram matrix, with entries given by the inner products between the basis functions. This matrix is essential in computational simulations for solving the generalized eigenvalue problem to obtain orthonormal molecular orbitals.

Gram matrices appear in other domains as well. In Riemannian geometry, the metric tensor at a point acts as a Gram matrix with respect to a local coordinate basis, defining distances and angles on manifolds. In finite element methods, Gram matrices arise in least-squares formulations, contributing to the construction of positive-definite stiffness matrices for solving partial differential equations. In signal processing, they relate to covariance matrices, aiding in tasks such as estimation and detection through orthogonalization techniques. A modern application in deep learning involves using Gram matrices for out-of-distribution detection, where discrepancies between Gram matrices of training and test data highlight anomalous inputs, as demonstrated in methods from around 2020.
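As a concrete illustration of the kernel (Gram) matrix used in kernel methods above, the following sketch builds a Gaussian (RBF) kernel matrix for a handful of points; the function name rbf_kernel_gram and the bandwidth parameter gamma are illustrative choices, and any positive definite kernel would serve the same role:

```python
import numpy as np

def rbf_kernel_gram(X, gamma=1.0):
    """Kernel (Gram) matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.clip(sq_dists, 0.0, None))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])     # three sample points
K = rbf_kernel_gram(X)
print(K)                                         # pairwise similarities, ones on the diagonal
print(np.all(np.linalg.eigvalsh(K) > -1e-12))    # positive semidefinite up to rounding
```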

Mathematical Properties

Positive-Semidefiniteness

A Gram matrix G with entries G_{ij} = \langle v_i, v_j \rangle, where v_1, \dots, v_n are vectors in a complex inner product space, is positive semi-definite: x^* G x \geq 0 for all x \in \mathbb{C}^n, with equality holding if and only if x lies in the kernel of the linear map sending the standard basis to the v_i's (i.e., \sum_i x_i v_i = 0). To prove this, consider any x \in \mathbb{C}^n. Then x^* G x = \sum_{i,j=1}^n \overline{x_i} x_j \langle v_i, v_j \rangle = \left\langle \sum_i x_i v_i, \sum_j x_j v_j \right\rangle = \left\| \sum_i x_i v_i \right\|^2 \geq 0 by the properties of the inner product, with equality precisely when \sum_i x_i v_i = 0. This positive-semidefiniteness implies that all eigenvalues of G are non-negative.

Additionally, the trace of G equals the sum of the squared norms of the vectors, \operatorname{tr}(G) = \sum_{i=1}^n \|v_i\|^2, since the diagonal entries are G_{ii} = \|v_i\|^2 and the trace is their sum; equivalently, the trace is the squared Frobenius norm of the matrix with columns v_i. The matrix G is invertible (hence positive definite) if and only if the vectors v_1, \dots, v_n are linearly independent, as this ensures the kernel is trivial and x^* G x > 0 for all x \neq 0. In a finite-dimensional inner product space of dimension n, strict positive-definiteness holds if and only if the vectors form a basis for the space.
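The identity x^* G x = \left\| \sum_i x_i v_i \right\|^2 underlying this proof can be observed numerically. A minimal sketch, assuming real vectors drawn at random (an arbitrary choice made only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((5, 3))     # three vectors in R^5, stored as columns
G = V.T @ V                         # their Gram matrix

x = rng.standard_normal(3)
quadratic_form = x @ G @ x
norm_squared = np.linalg.norm(V @ x) ** 2        # || sum_i x_i v_i ||^2
print(np.isclose(quadratic_form, norm_squared))  # True: the two quantities agree
print(np.all(np.linalg.eigvalsh(G) > -1e-12))    # all eigenvalues nonnegative
```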

Vector Realizations

A positive semidefinite matrix G \in \mathbb{R}^{n \times n} can be realized as the Gram matrix of a set of vectors in an inner product space through the Cholesky decomposition, which factors G = LL^T where L is a lower triangular matrix with non-negative diagonal entries. The rows of L (transposed to column vectors) then serve as the coordinate representations of the vectors \mathbf{v}_1, \dots, \mathbf{v}_n in \mathbb{R}^n, satisfying G_{ij} = \mathbf{v}_i^T \mathbf{v}_j. This construction embeds the vectors in Euclidean space, with the rank of G, denoted \operatorname{rank}(G) = r \leq n, determining the minimal embedding dimension \mathbb{R}^r. To compute such a realization, perform the Cholesky factorization of G; if G has full rank r = n, the vectors lie in \mathbb{R}^n, while for deficient rank r < n one can discard the zero columns of L (or use a rank-revealing factorization) to obtain an equivalent embedding in the lower-dimensional space \mathbb{R}^r. This method assumes G is positive semidefinite, which ensures that the decomposition exists; when G is positive definite, the factorization is unique once the diagonal entries of L are taken to be positive.

For illustration, consider the matrix G = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, which admits the Cholesky factorization L = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}. The rows of L yield vectors \mathbf{v}_1 = (1, 0)^T and \mathbf{v}_2 = (1, 1)^T in \mathbb{R}^2, verifying \mathbf{v}_1^T \mathbf{v}_1 = 1, \mathbf{v}_1^T \mathbf{v}_2 = 1, and \mathbf{v}_2^T \mathbf{v}_2 = 2.

In the more general setting, any positive semidefinite matrix G arises as the Gram matrix of vectors in some Hilbert space by the Moore-Aronszajn theorem, which guarantees the existence of a reproducing kernel Hilbert space in which the vectors are the feature maps corresponding to the kernel defined by G. Such an embedding may be infinite-dimensional in general, although for a finite matrix G a realization in dimension equal to its rank always exists.
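The example above corresponds directly to a numerical Cholesky factorization. A short sketch (NumPy's cholesky requires strict positive definiteness; a rank-deficient positive semidefinite G would instead call for an eigendecomposition or pivoted factorization):

```python
import numpy as np

G = np.array([[1.0, 1.0],
              [1.0, 2.0]])

L = np.linalg.cholesky(G)        # lower triangular factor with G = L L^T
print(L)                         # [[1. 0.]
                                 #  [1. 1.]]

# Rows of L are coordinates of realizing vectors: G[i, j] = <row_i, row_j>.
print(np.allclose(L @ L.T, G))   # True
```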

Uniqueness of Realizations

The vector realizations of a Gram matrix G \in \mathbb{R}^{n \times n} that is positive definite (full rank) are unique up to orthogonal transformation. Specifically, if two sets of vectors \{v_1, \dots, v_n\} and \{w_1, \dots, w_n\} in \mathbb{R}^k satisfy \langle v_i, v_j \rangle = \langle w_i, w_j \rangle = G_{ij} for all i, j, then there exists an orthogonal matrix U (i.e., U^T U = I) such that w_i = U v_i for all i. This invariance follows from the polar decomposition or singular value decomposition of the matrix whose columns are the vectors, since the Gram matrix fixes the singular values and relates the singular vectors via orthogonality.

When \operatorname{rank}(G) = r < n, realizations are not unique in dimensions greater than r, as the vectors span an r-dimensional subspace that can be arbitrarily positioned within the higher-dimensional ambient space while preserving inner products. However, the minimal embedding dimension is uniquely r, and realizations in this minimal dimension remain unique up to orthogonal transformation. For example, the Cholesky decomposition provides one such minimal realization, and applying any orthogonal transformation yields another equivalent set. The geometry of any realization is fully determined by G: the lengths \|v_i\| = \sqrt{G_{ii}} and angles \cos \theta_{ij} = G_{ij} / \sqrt{G_{ii} G_{jj}} between vectors are fixed, and the span of the vectors is isometric to the column space of G^{1/2}, where G^{1/2} is the unique positive semidefinite square root of G.

A precise characterization is given by the following theorem: two sets of vectors \{v_i\}_{i=1}^n and \{w_i\}_{i=1}^n in a Euclidean space realize the same Gram matrix G if and only if there exists a linear isometry T of the ambient space such that w_i = T(v_i) for all i. This holds because isometries preserve inner products, and conversely, equal inner products imply the existence of such a distance-preserving map between the configurations.
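The invariance under orthogonal transformations is easy to verify numerically: multiplying a realization by any orthogonal matrix leaves the Gram matrix unchanged. A minimal sketch, with a random configuration chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.standard_normal((3, 4))      # four vectors in R^3, stored as columns
G = V.T @ V

# A random orthogonal matrix U obtained from a QR factorization.
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))
W = U @ V                            # rotated/reflected realization

print(np.allclose(W.T @ W, G))       # True: inner products are preserved
```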

Additional Properties

The Gram matrix G associated with a finite collection of vectors \{v_1, \dots, v_n\} in an inner product space is Hermitian, meaning G = G^*, where G^* denotes the conjugate transpose; in the real case, it is symmetric. The diagonal entries of G are given by G_{ii} = \langle v_i, v_i \rangle = \|v_i\|^2, representing the squared norms of the individual vectors. The rank of the Gram matrix equals the dimension of the span of the vectors \{v_1, \dots, v_n\}, as G can be expressed as V^* V where the columns of V are the vectors, preserving the rank. Moreover, the kernel of G consists of coefficient vectors corresponding to linear dependencies among the v_i, since if \sum c_i v_i = 0, then c^* G c = \|\sum c_i v_i\|^2 = 0. The trace of G is the sum of the squared norms of the vectors: \operatorname{tr}(G) = \sum_{i=1}^n \|v_i\|^2. The squared Frobenius norm of G equals the sum of the squared absolute values of all inner products: \|G\|_F^2 = \sum_{i=1}^n \sum_{j=1}^n |\langle v_i, v_j \rangle|^2. By the Hadamard inequality, the determinant of a Gram matrix satisfies \det G \leq \prod_{i=1}^n G_{ii}, with equality if and only if the vectors are pairwise orthogonal. The map from the vectors \{v_1, \dots, v_n\} to their Gram matrix is continuous with respect to the inner product topology on the vectors, as the inner product function is continuous in its arguments.
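Several of these properties (the trace identity, the Frobenius norm identity, and Hadamard's inequality) can be confirmed numerically for a randomly generated Gram matrix; the sketch below is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)
V = rng.standard_normal((6, 4))      # four vectors in R^6, stored as columns
G = V.T @ V

# Trace equals the sum of squared norms of the vectors.
print(np.isclose(np.trace(G), np.sum(np.linalg.norm(V, axis=0) ** 2)))
# Squared Frobenius norm equals the sum of squared inner products.
print(np.isclose(np.linalg.norm(G, 'fro') ** 2, np.sum(G ** 2)))
# Hadamard's inequality: det G <= product of diagonal entries.
print(np.linalg.det(G) <= np.prod(np.diag(G)) + 1e-12)
```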

Gram Determinant

The Gram determinant of a finite set of vectors \{v_1, \dots, v_n\} in an inner product space is defined as \det G = \det(\langle v_i, v_j \rangle_{i,j=1}^n), where G denotes the associated Gram matrix. For vectors in real Euclidean space, the Gram determinant equals the square of the n-dimensional volume of the parallelepiped spanned by the vectors, i.e., \det G = \operatorname{vol}(v_1, \dots, v_n)^2; this value is zero if and only if the vectors are linearly dependent. This geometric interpretation follows from the fact that the Gram matrix satisfies G = A^\top A for a coordinate matrix A with columns v_i; when A is square (the ambient dimension equals n), \det G = (\det A)^2, where |\det A| gives the volume of the parallelepiped. When the ambient space has dimension greater than n, the Cauchy-Binet formula expresses the Gram determinant as \det G = \sum_{\sigma} \det(A_\sigma)^2, where the sum runs over all choices \sigma of n coordinates from the higher-dimensional embedding and A_\sigma is the corresponding n \times n submatrix of A; each term \det(A_\sigma)^2 represents the squared volume of the projection onto the corresponding coordinate subspace.

The Gram determinant inherits non-negativity from the positive-semidefiniteness of the Gram matrix, ensuring \det G \geq 0. It vanishes exactly under linear dependence of the vectors, providing a criterion for testing linear independence in applications such as numerical linear algebra and statistical modeling of vector sets. For an illustrative example, consider a set of pairwise orthogonal vectors; the Gram matrix is then diagonal with entries \|v_i\|^2, yielding \det G = \prod_{i=1}^n \|v_i\|^2. This reduces the volume computation to the product of the individual lengths, highlighting the role of orthogonality in geometric measure theory.
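For a numerical example of the volume interpretation, take two vectors spanning a parallelogram inside \mathbb{R}^3; the square root of the Gram determinant recovers its area. A minimal sketch:

```python
import numpy as np

# Two vectors spanning a parallelogram embedded in R^3.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([1.0, 2.0, 0.0])
A = np.column_stack([v1, v2])

G = A.T @ A                         # 2 x 2 Gram matrix
area = np.sqrt(np.linalg.det(G))    # 2-volume from the Gram determinant
print(area)                         # 2.0: base 1 times height 2
```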

Orthonormal Basis Construction

The Gram-Schmidt process provides a systematic method to construct an orthonormal basis from a linearly independent set of vectors in a Euclidean space, leveraging the inner products encoded in the Gram matrix to perform the necessary projections and normalizations. Given a set of vectors \{ \mathbf{v}_1, \dots, \mathbf{v}_n \} in \mathbb{R}^m with Gram matrix G, where G_{ij} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle, the process normalizes the first vector and then iteratively subtracts from each subsequent vector its projections onto the previously constructed basis vectors. This orthogonalization step ensures the new basis vectors are mutually orthogonal, and normalization yields unit length, with all computations relying on inner products that can be accessed via G.

The classical Gram-Schmidt algorithm proceeds as follows. Set \mathbf{u}_1 = \mathbf{v}_1 / \|\mathbf{v}_1\|, where \|\mathbf{v}_1\|^2 = G_{11}. For k = 2, \dots, n, \mathbf{w}_k = \mathbf{v}_k - \sum_{j=1}^{k-1} \langle \mathbf{v}_k, \mathbf{u}_j \rangle \mathbf{u}_j, \quad \mathbf{u}_k = \mathbf{w}_k / \|\mathbf{w}_k\|, with \|\mathbf{w}_k\|^2 computed from the residual inner product after projection, and the coefficients \langle \mathbf{v}_k, \mathbf{u}_j \rangle derived from entries of G and prior orthogonalization steps (specifically, through the upper triangular matrix R whose entries r_{jk} = \langle \mathbf{v}_k, \mathbf{u}_j \rangle satisfy the relation G = R^T R). The modified Gram-Schmidt variant improves numerical stability by updating the projections sequentially, subtracting each \langle \mathbf{v}_k, \mathbf{u}_j \rangle \mathbf{u}_j immediately from \mathbf{v}_k before computing the next inner product; here, updates to the residual vectors involve solving small subsystems derived from submatrices of G to maintain accuracy in finite precision.

A key role of the Gram matrix arises in the QR decomposition framework, where the matrix A with columns \mathbf{v}_i factors as A = QR, with Q having orthonormal columns (the desired basis) and R upper triangular. Since G = A^T A = R^T R, the Cholesky decomposition of G directly yields R (up to transpose), allowing recovery of Q = A R^{-1}; this approach is particularly useful when G is available or cheaply updated, providing an alternative to direct orthogonalization. This construction enables stable numerical orthogonalization in applications such as least squares solving and spectral analysis, where the relation to the Cholesky decomposition facilitates efficient QR computation without explicit inner product evaluations beyond forming G. For instance, in scenarios requiring robustness to ill-conditioning, variants like shifted Cholesky QR adjust the diagonal of G before decomposition to enhance stability while preserving the orthonormal basis. The overall complexity is O(n^2 m + n^3) when forming G and performing the Cholesky factorization (dominant for m \gg n), but incremental versions, which update G and its factorization as vectors arrive, support streaming data processing at O(n m) per update, useful in online algorithms.
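The Cholesky-QR connection described above can be sketched in a few lines; the helper name cholesky_qr is hypothetical, and the approach assumes A has full column rank and modest conditioning (forming G = A^T A squares the condition number):

```python
import numpy as np

def cholesky_qr(A):
    """Thin QR factorization via the Gram matrix: G = A^T A = R^T R, Q = A R^{-1}."""
    G = A.T @ A
    R = np.linalg.cholesky(G).T           # upper triangular Cholesky factor of G
    Q = np.linalg.solve(R.T, A.T).T       # Q = A R^{-1} without forming an inverse
    return Q, R

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 4))
Q, R = cholesky_qr(A)
print(np.allclose(Q.T @ Q, np.eye(4)))    # orthonormal columns
print(np.allclose(Q @ R, A))              # A = Q R
```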

Generalizations and Extensions

In Broader Inner Product Spaces

The Gram matrix concept extends naturally to infinite-dimensional Hilbert spaces, where it generalizes from finite matrices to bounded operators or infinite matrices defined via inner products. In a separable Hilbert space H with a countable orthonormal basis \{e_k\}_{k=1}^\infty, for a sequence of vectors \{v_j\}_{j=1}^\infty \subset H, the Gram "matrix" becomes an infinite matrix G with entries G_{ij} = \langle v_i, v_j \rangle_H, or equivalently the Gram operator G: \ell^2 \to \ell^2 given by G(\alpha)_i = \sum_j G_{ij} \alpha_j. This operator is always positive semidefinite on its domain, and it is a bounded operator on \ell^2, with \|G\| \leq B < \infty, precisely when the sequence is a Bessel sequence with bound B.

In reproducing kernel Hilbert spaces (RKHS), a key subclass of Hilbert spaces consisting of functions on a set X, the Gram matrix arises from a positive definite kernel k: X \times X \to \mathbb{R}. For distinct points x_1, \dots, x_n \in X, the finite Gram matrix is G_{ij} = k(x_i, x_j) = \langle k(\cdot, x_i), k(\cdot, x_j) \rangle_H, where H is the RKHS with reproducing property \langle f, k(\cdot, x) \rangle_H = f(x) for all f \in H. This ensures G is positive semidefinite, and the infinite-dimensional analog is the integral operator T_k f(x) = \int_X k(x, y) f(y) \, d\mu(y) on L^2(X, \mu), which is self-adjoint, positive semidefinite, and compact if k is continuous and X compact. The positive semidefiniteness of G extends to T_k being a bounded positive operator with spectrum in [0, \|T_k\|].

Mercer's theorem provides a spectral representation bridging discrete Gram matrices and continuous operators: for a symmetric, continuous, positive definite kernel k on a compact domain with measure \mu, there exist eigenvalues \lambda_m > 0 (decreasing to 0) and orthonormal eigenfunctions \{\psi_m\} in L^2(X, \mu) such that k(x, y) = \sum_{m=1}^\infty \lambda_m \psi_m(x) \psi_m(y), with uniform and L^2 convergence. The associated RKHS is then H = \left\{ \sum_m c_m \sqrt{\lambda_m} \psi_m \mid \{c_m\} \in \ell^2 \right\}, with inner product \langle f, g \rangle_H = \sum_m c_m d_m if f = \sum_m c_m \sqrt{\lambda_m} \psi_m and g = \sum_m d_m \sqrt{\lambda_m} \psi_m. This realizes the kernel as inner products in the feature space spanned by \{\sqrt{\lambda_m} \psi_m\}, generalizing finite-dimensional realizations to trace-class operators.

Examples abound in function spaces. In the space L^2([0,1]) of square-integrable functions, the Gram operator for a set of functions \{f_j\} has entries G_{ij} = \int_0^1 f_i(t) \overline{f_j(t)} \, dt, and it is trace-class if \sum_j \|f_j\|_{L^2}^2 < \infty, preserving positive semidefiniteness since \langle G \alpha, \alpha \rangle_{\ell^2} = \left\| \sum_j \alpha_j f_j \right\|_{L^2}^2 \geq 0. In Sobolev spaces W^{r,2}([0,1]) used in partial differential equations, kernels like the unanchored Sobolev kernel of order r \geq 1, k_r^{\text{Sob}}(x,y) = 1 + \sum_{k=1}^r \frac{B_k(x) B_k(y)}{(k!)^2} + \frac{(-1)^{r+1}}{(2r)!} B_{2r}(|x-y|) (with Bernoulli polynomials B_k), induce RKHS norms equivalent to the Sobolev seminorm |f|_{W^{r,2}}^2 = \int_0^1 |f^{(r)}(t)|^2 \, dt. The corresponding Gram matrices are positive semidefinite Mercer kernels with eigenvalues decaying as O(1/m^{2r}), ensuring realizations in feature spaces of smoothness r. These properties maintain the finite-dimensional analogs, such as the characterization of linear independence, now checked via injectivity of the Gram operator rather than invertibility of a finite matrix.
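A discretized version of the L^2([0,1]) Gram operator makes the function-space case concrete. The sketch below approximates \langle f_i, f_j \rangle = \int_0^1 f_i(t) f_j(t) \, dt by averaging over a fine grid (a crude quadrature chosen only for illustration) and checks positive semidefiniteness:

```python
import numpy as np

# Inner product <f, g> = integral of f(t) g(t) over [0, 1], approximated on a uniform grid.
t = np.linspace(0.0, 1.0, 10001)
funcs = [np.ones_like(t), t, t**2, np.sin(np.pi * t)]   # f_1, ..., f_4 sampled on the grid

G = np.array([[np.mean(fi * fj) for fj in funcs] for fi in funcs])

print(round(G[0, 1], 3))                         # 0.5, i.e. the integral of t over [0, 1]
print(np.all(np.linalg.eigvalsh(G) > -1e-12))    # positive semidefinite, as expected
```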

Modern Uses in Computation and Machine Learning

In deep neural networks, Gram matrices derived from layer representations have been employed to characterize generalization performance, particularly through analysis of their spectral properties. Recent studies demonstrate that the spectrum of these effective Gram matrices can predict generalization risks by modeling the evolution of the generalization gap during training, providing tighter bounds on the difference between training and test errors. For instance, in overparameterized networks, the eigenvalues of the Gram matrix influence the convergence rate of optimization, offering insights into why wide networks generalize well despite interpolating the data.

The neural tangent kernel (NTK) framework further integrates Gram matrices into the theoretical understanding of deep learning dynamics. In the infinite-width limit, the NTK corresponds to a Gram matrix constructed from the inner products of gradients at initialization, enabling the analysis of training as kernel regression and yielding generalization guarantees based on the kernel's eigenvalue decay. This approach has been pivotal in explaining the lazy training regime of deep networks, where the Gram matrix's positive-semidefiniteness ensures linear convergence to global minima under certain conditions.

In functional data analysis, Gram matrices facilitate dimension reduction via multivariate functional principal component analysis (MFPCA), where they capture covariances between multiple functional variables observed over continuous domains. A 2023 method leverages the Gram matrix to compute principal components directly from inner products of discretized functional data, enabling efficient handling of high-dimensional curves without explicit basis expansions and improving scalability for large datasets. This technique preserves the functional structure while reducing dimensionality, as the eigenvectors of the Gram matrix yield scores that summarize shared variability across variables.

For out-of-distribution (OOD) detection in deep networks, Gram matrices of layer activations serve as detectors by quantifying deviations in feature correlations. Introduced in 2019, this approach computes the Gram matrix of pairwise inner products of hidden layer outputs and flags inputs whose Gram entries fall outside the empirical ranges observed on in-distribution training data, achieving high AUROC scores on benchmarks such as CIFAR-10 versus SVHN without requiring labeled anomalies. Subsequent techniques have since extended this by incorporating spectral norms or trace statistics of the Gram matrix to enhance sensitivity to distributional shifts in vision and tabular data.

In bioinformatics, Gram matrices provide a compact representation of molecular 3D conformations, encoding pairwise distances between atoms via inner products in an embedded space for structure prediction and conformer generation tasks. A 2024 pretraining model uses the Gram matrix as both input features and a learning objective, enabling neural networks to reconstruct molecular geometries from SMILES strings with improved accuracy on datasets like QM9, where it outperforms coordinate-based methods by focusing on rotation-invariant properties. This formulation supports efficient generation of conformers by solving distance-geometry problems via Gram matrix decompositions, aiding drug-discovery pipelines.

Lattice-based cryptography utilizes Gram root computations to enable precise Gaussian sampling over integer lattices, circumventing floating-point precision issues in post-quantum protocols. Developed in 2019, an algorithm computes an integral square root of the Gram matrix to generate samples from discrete Gaussians without approximations, ensuring constant-time operations and exact distribution matching for schemes like BLISS signatures. This method has been adopted in implementations since 2020 to support homomorphic encryption and zero-knowledge proofs, where the Gram root's integer nature guarantees security reductions without numerical errors.

Regarding numerical stability in computational algorithms, the total positivity of Gram matrices in the Bernstein basis ensures accurate evaluations and inversions for polynomial approximations. A 2022 study establishes that these Gram matrices are strictly totally positive on [0,1], allowing bidiagonal factorizations that keep rounding errors near machine-precision levels, far superior to monomial bases in ill-conditioned scenarios like high-degree interpolation. This property underpins stable computations in computer-aided geometric design, where Gram matrix manipulations maintain shape preservation and avoid oscillations.
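To make the out-of-distribution idea concrete, the following sketch scores inputs by how far the entries of an activation Gram matrix fall outside ranges recorded on in-distribution data. It is a simplified illustration of the general approach, not the published method: the helper names, the synthetic activations, and the single-layer setup are all assumptions made for brevity.

```python
import numpy as np

def gram_features(activations):
    """Upper-triangular entries of the Gram matrix of one layer's activations.

    `activations` has shape (channels, features); a simplified stand-in for the
    per-layer feature maps used in Gram-matrix-based OOD detection.
    """
    g = activations @ activations.T
    return g[np.triu_indices_from(g)]

def fit_ranges(train_activations):
    """Elementwise min/max of Gram features over in-distribution samples."""
    feats = np.stack([gram_features(a) for a in train_activations])
    return feats.min(axis=0), feats.max(axis=0)

def deviation_score(activations, lo, hi):
    """Total amount by which a test input's Gram features leave the training ranges."""
    f = gram_features(activations)
    return float(np.sum(np.maximum(lo - f, 0.0) + np.maximum(f - hi, 0.0)))

rng = np.random.default_rng(4)
train = [rng.standard_normal((8, 32)) for _ in range(200)]    # synthetic in-distribution activations
lo, hi = fit_ranges(train)
print(deviation_score(rng.standard_normal((8, 32)), lo, hi))        # near zero: in-distribution-like
print(deviation_score(5.0 * rng.standard_normal((8, 32)), lo, hi))  # large: anomalous scale
```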
