Hessenberg matrix
In linear algebra, a Hessenberg matrix is a special kind of square matrix that is almost triangular. An upper Hessenberg matrix has all entries zero below the first subdiagonal (i.e., a_{ij} = 0 for i > j + 1), and a lower Hessenberg matrix has all entries zero above the first superdiagonal (i.e., a_{ij} = 0 for j > i + 1).[1] A matrix that satisfies both conditions is tridiagonal.[1] These matrices are named after the German mathematician and electrical engineer Karl Hessenberg (1904–1959), who introduced the upper Hessenberg form in his 1940 technical report on solving linear eigenvalue problems using the Hamilton-Cayley equation.[2][3]
Hessenberg matrices possess several properties that make them computationally advantageous. An upper Hessenberg matrix remains in Hessenberg form under QR iterations, and if it is unreduced (meaning all entries on the first subdiagonal are nonzero), each of its eigenvalues has geometric multiplicity one, which simplifies the analysis of its eigenvalues and eigenvectors.[1][3] Any square matrix can be reduced to upper Hessenberg form by an orthogonal similarity transformation, typically built from Householder reflections; the reduction requires O(n^3) operations for an n \times n matrix and preserves the eigenvalues.[3] Hessenberg matrices also admit stable factorizations, such as LU and QR decompositions, with less fill-in than a general dense matrix.[3]
Hessenberg matrices are fundamental in numerical linear algebra, particularly for eigenvalue computation. The QR algorithm, a cornerstone method for finding all eigenvalues of a matrix, is applied after reduction to Hessenberg form to achieve quadratic convergence and computational efficiency, as each QR step costs only O(n^2) operations on such matrices.[3] They also arise naturally in Krylov subspace methods for solving large-scale eigenproblems and in the companion matrices used for polynomial root-finding.[3] Beyond eigenvalues, Hessenberg forms appear in applications like block algorithms for parallel computing and the analysis of totally positive matrices in approximation theory.[4]
Upper Hessenberg Matrix
An upper Hessenberg matrix is a square matrix A = (a_{ij}) of size n \times n over the real or complex numbers that satisfies the condition a_{ij} = 0 for all indices i > j + 1. This structure ensures that all entries below the first subdiagonal are zero, leaving non-zero entries possible only on the main diagonal, the first subdiagonal, and all superdiagonals.[5][6]
The general form of an n \times n upper Hessenberg matrix can be represented as
H = \begin{pmatrix}
h_{11} & h_{12} & h_{13} & \cdots & h_{1n} \\
h_{21} & h_{22} & h_{23} & \cdots & h_{2n} \\
0 & h_{32} & h_{33} & \cdots & h_{3n} \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & h_{n,n-1} & h_{nn}
\end{pmatrix},
where the h_{ij} denote arbitrary (possibly zero) entries, and zeros occupy all positions strictly below the first subdiagonal. This structure highlights the matrix's near-triangular nature: its lower bandwidth is one, so only the first subdiagonal extends below the main diagonal.[5][1]
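The defining condition can be tested mechanically. The following NumPy sketch (the helper name is_upper_hessenberg is illustrative, not a library function) checks that a_{ij} = 0 for i > j + 1 on a small example:

import numpy as np

def is_upper_hessenberg(A, tol=0.0):
    # True if A has only zeros (up to tol) below the first subdiagonal.
    A = np.asarray(A)
    n, m = A.shape
    assert n == m, "square matrix expected"
    # np.tril(A, -2) keeps exactly the entries with i > j + 1
    return np.all(np.abs(np.tril(A, -2)) <= tol)

H = np.array([[1., 4., 2., 3.],
              [3., 4., 1., 7.],
              [0., 2., 3., 4.],
              [0., 0., 1., 3.]])
print(is_upper_hessenberg(H))      # True
print(is_upper_hessenberg(H.T))    # False: the transpose is lower Hessenberg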
The term "Hessenberg matrix" honors Karl Hessenberg (1904–1959), a German mathematician and electrical engineer whose investigations into eigenvalue problems introduced this form in a 1940 technical report on solving linear eigenproblems using the Hamilton-Cayley equation.[2] The lower Hessenberg form is equivalently defined as the transpose of an upper Hessenberg matrix.[6]
Lower Hessenberg Matrix
A lower Hessenberg matrix is defined as a square matrix A = (a_{i,j}) \in \mathbb{R}^{n \times n} (or \mathbb{C}^{n \times n}) such that a_{i,j} = 0 whenever j > i + 1, meaning all entries strictly above the first superdiagonal are zero.[7] This structure confines non-zero entries to the main diagonal, the subdiagonals, and the first superdiagonal, resulting in a nearly lower triangular form with upper bandwidth equal to one.[7]
The general template for an n \times n lower Hessenberg matrix can be represented as follows, where asterisks denote potentially non-zero entries:
H = \begin{pmatrix}
* & * & 0 & 0 & \cdots & 0 \\
* & * & * & 0 & \cdots & 0 \\
* & * & * & * & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
* & * & * & * & \cdots & * \\
* & * & * & * & \cdots & *
\end{pmatrix}
This form arises symmetrically to the upper Hessenberg matrix, which has zeros below the first subdiagonal; specifically, the transpose of an upper Hessenberg matrix is lower Hessenberg.[7] The lower Hessenberg structure is particularly useful in numerical algorithms where similarity transformations preserve this band form, facilitating efficient computations in linear algebra.[7]
An upper Hessenberg matrix is called unreduced if all entries on its first subdiagonal are nonzero, that is, if h_{i+1,i} \neq 0 for i=1,\dots,n-1.[8] Similarly, a lower Hessenberg matrix is unreduced if all entries on its first superdiagonal are nonzero, that is, if h_{i,i+1} \neq 0 for i=1,\dots,n-1.[9]
The term "unreduced" distinguishes these matrices from reducible Hessenberg matrices, where one or more zeros on the relevant off-diagonal allow the matrix to be permuted into a block-triangular form via similarity transformations.[8] In such reducible cases, the eigenvalues of the original matrix are simply the union of the eigenvalues of the diagonal blocks, enabling separate computation for each block.[8]
This unreduced condition is crucial in eigenvalue problems: it ensures the matrix cannot be split into independent blocks, preserving the coupled structure of the spectrum and requiring the full matrix to be processed in algorithms like the QR iteration.[8] For instance, every eigenvalue of an unreduced upper Hessenberg matrix has geometric multiplicity exactly one, which simplifies convergence analysis in iterative methods.[10]
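The block decoupling described above can be illustrated numerically. In the following NumPy sketch (an arbitrary 4×4 matrix chosen so that h_{32} = 0, i.e., a reduced Hessenberg matrix), the spectrum equals the union of the spectra of the two diagonal blocks:

import numpy as np

# A 4x4 upper Hessenberg matrix with h_32 = 0: it is block upper triangular
# with a 2x2 leading block and a 2x2 trailing block.
H = np.array([[1., 4., 2., 3.],
              [3., 4., 1., 7.],
              [0., 0., 3., 4.],
              [0., 0., 1., 3.]])

eig_full  = np.sort_complex(np.linalg.eigvals(H))
eig_union = np.sort_complex(np.concatenate([np.linalg.eigvals(H[:2, :2]),
                                            np.linalg.eigvals(H[2:, 2:])]))
print(np.allclose(eig_full, eig_union))   # True: spectrum = union of block spectra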
Examples and Properties
Matrix Examples
An example of an upper Hessenberg matrix is the following 4×4 matrix:
\begin{pmatrix}
1 & 4 & 2 & 3 \\
3 & 4 & 1 & 7 \\
0 & 2 & 3 & 4 \\
0 & 0 & 1 & 3
\end{pmatrix}
This matrix has zeros in all positions below the first subdiagonal, specifically at entries (3,1), (4,1), and (4,2), while the entries on and above the main diagonal, as well as the subdiagonal, may be nonzero.[5][11]
Similarly, an example of a lower Hessenberg matrix is:
\begin{pmatrix}
1 & 2 & 0 & 0 \\
5 & 2 & 3 & 0 \\
3 & 4 & 3 & 7 \\
5 & 6 & 1 & 1
\end{pmatrix}
Here, zeros appear above the first superdiagonal, at positions (1,3), (1,4), and (2,4), with potential nonzero entries on and below the main diagonal and the superdiagonal.[5]
Both examples are unreduced, as the upper Hessenberg matrix has nonzero subdiagonal entries (3, 2, 1) and the lower Hessenberg matrix has nonzero superdiagonal entries (2, 3, 7).[11]
A special case is the tridiagonal matrix, which is both upper and lower Hessenberg. For instance, the 3×3 matrix
\begin{pmatrix}
a & b & 0 \\
c & d & e \\
0 & f & g
\end{pmatrix}
has zeros outside the main diagonal and the first super- and subdiagonals.[5][11]
Key Properties
A Hessenberg matrix exhibits several intrinsic algebraic properties that stem from its banded structure. All 1 \times 1 and 2 \times 2 square matrices are both upper and lower Hessenberg, as a 1 \times 1 matrix has no entries below the main diagonal, and a 2 \times 2 matrix has at most one entry below the main diagonal, which lies on the subdiagonal.[1] Similarly, a square matrix that is simultaneously upper Hessenberg and lower Hessenberg must be tridiagonal, with nonzero entries confined to the main diagonal and the diagonals immediately adjacent to it.[1]
A key closure property is that the product of an upper Hessenberg matrix H and an upper triangular matrix U is again upper Hessenberg. To see this, consider the (i,j)-entry of the product HU, given by
(HU)_{ij} = \sum_{k=1}^n h_{ik} u_{kj}.
For i > j+1, the terms h_{ik} = 0 whenever k \leq i-2 (by the upper Hessenberg structure of H), so the sum is over k \geq i-1. However, the upper triangular structure of U implies u_{kj} = 0 whenever k > j. Since i > j+1 yields i-1 > j, the intersection of k \geq i-1 > j and k \leq j is empty, forcing (HU)_{ij} = 0. This preserves the zero structure below the first subdiagonal.[12][13]
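This closure property is easy to check numerically. The following NumPy sketch (random test matrices, illustrative only) builds an upper Hessenberg H and an upper triangular U and confirms that HU has no entries below the first subdiagonal:

import numpy as np

rng = np.random.default_rng(0)
n = 6
H = np.triu(rng.standard_normal((n, n)), k=-1)   # upper Hessenberg: zeros below the first subdiagonal
U = np.triu(rng.standard_normal((n, n)))         # upper triangular

P = H @ U
# Entries with i > j + 1 of the product vanish, so HU is again upper Hessenberg.
print(np.allclose(np.tril(P, -2), 0.0))   # True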
An unreduced upper Hessenberg matrix, defined as one with all subdiagonal entries nonzero, implies strong irreducibility properties with respect to invariant subspaces. Specifically, no permutation similarity transformation can reveal a nontrivial block upper triangular structure, meaning there are no proper invariant subspaces aligned with a reordered basis that would allow decomposition into smaller blocks.[13] This follows from the absence of zero subdiagonals, which prevents the existence of deflatable subspaces under such reorderings.[13]
Reduction Techniques
Householder transformations provide a numerically stable method to reduce a general square matrix to upper Hessenberg form through a sequence of unitary similarity transformations.[11] The process applies a product of Householder reflectors Q = H_1 H_2 \cdots H_{n-2}, where each H_k is unitary, such that H = Q^H A Q is upper Hessenberg, preserving the eigenvalues of A.[14] This reduction is essential as a preprocessing step for eigenvalue algorithms like the QR iteration, transforming the matrix into a form where subsequent computations are more efficient.[15]
The algorithm proceeds column by column for k = 1 to n-2. In step k, a Householder reflector is constructed to zero out the entries below the subdiagonal in column k of the current matrix, specifically targeting positions (k+2:n, k). The reflector is applied from both the left and right to maintain similarity: first, A := H_k A to eliminate the subcolumn entries, then A := A H_k^H to update the trailing submatrix. The Householder vectors are typically stored in the lower triangle of A below the subdiagonal, with scalar parameters stored separately for efficient application.[11] This in-place storage avoids extra memory overhead.[14]
A Householder reflector is defined as
H = I - \frac{2 \mathbf{v} \mathbf{v}^H}{\mathbf{v}^H \mathbf{v}},
where \mathbf{v} is chosen so that H maps the target vector \mathbf{x} (the subcolumn of column k starting at row k+1) to a multiple of the first standard basis vector, thereby zeroing the desired entries while leaving the zeros created in earlier columns intact.[15] The vector \mathbf{v} is normalized for computational convenience, often with \|\mathbf{v}\| = 1 or with first component equal to one, and the transformation can be applied using Level-2 BLAS operations for the panel and Level-3 BLAS for the trailing updates in blocked implementations.[14]
The computational cost of the reduction is approximately \frac{10}{3} n^3 floating-point operations for an n \times n matrix (without accumulating Q), dominated by the two-sided updates of the trailing submatrix.[11] Each step k costs on the order of n(n-k) operations, leading to the cubic total, and blocked variants in libraries like LAPACK achieve high performance on modern hardware by exploiting cache locality.[14]
Orthogonal transformations like Householder reflectors ensure backward stability, meaning the computed H is the exact Hessenberg form of a slightly perturbed input matrix, with the perturbation bounded independently of conditioning.[15] This stability is crucial for subsequent eigenvalue solvers, as it avoids magnifying rounding errors in the spectrum.[11]
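The procedure can be summarized in a short NumPy sketch (an illustrative, unblocked textbook variant for real matrices; the function name hessenberg_householder is not from any library, and production implementations such as LAPACK's DGEHRD work in place and use blocking):

import numpy as np

def hessenberg_householder(A):
    # Reduce A to upper Hessenberg form H with A = Q H Q^T, Q orthogonal.
    A = np.array(A, dtype=float)
    n = A.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = A[k+1:, k].copy()
        # Householder vector v so that (I - 2 v v^T) x = -sign(x_1) ||x|| e_1
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        if np.linalg.norm(v) == 0.0:      # column already in Hessenberg form
            continue
        v /= np.linalg.norm(v)
        # Similarity transformation: reflector applied from the left and the right.
        A[k+1:, k:] -= 2.0 * np.outer(v, v @ A[k+1:, k:])
        A[:, k+1:]  -= 2.0 * np.outer(A[:, k+1:] @ v, v)
        Q[:, k+1:]  -= 2.0 * np.outer(Q[:, k+1:] @ v, v)   # accumulate Q
    return A, Q

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
H, Q = hessenberg_householder(A)
print(np.allclose(np.tril(H, -2), 0.0))   # H is upper Hessenberg
print(np.allclose(Q @ H @ Q.T, A))        # A = Q H Q^T
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(H))))   # eigenvalues preserved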
Givens Rotations
Givens rotations, also known as Jacobi rotations, provide an alternative to Householder transformations for reducing a matrix to upper Hessenberg form through orthogonal similarity transformations. These rotations are particularly effective when targeting specific entries to zero out while minimizing disruption to the matrix structure. A Givens rotation in the plane of rows p and q is defined by an orthogonal matrix G_{p,q}(\theta) that differs from the identity only in the entries at the intersections of rows and columns p and q:
G_{p,q}(\theta) = \begin{pmatrix}
I_{p-1} & & & \\
& \cos \theta & \sin \theta & \\
& -\sin \theta & \cos \theta & \\
& & & I_{n-q}
\end{pmatrix},
where blank blocks are zero, the identity blocks have the indicated sizes (the display corresponds to q = p + 1), and \theta is chosen such that premultiplication by G_{p,q}(\theta)^T zeros a selected entry below the first subdiagonal.[11]
The algorithm employs a sequence of such rotations to systematically zero all entries below the first subdiagonal in an n \times n matrix A. For each column j = 1, \dots, n-2, rotations are applied starting from the bottom: a rotation G_{n,j+1}^T zeros a_{n,j}, followed by G_{n-1,j+1}^T to zero a_{n-1,j}, and continuing upward until G_{j+2,j+1}^T zeros a_{j+2,j}. This process introduces a nonzero "bulge" below the subdiagonal in subsequent columns, which is then chased downward by applying additional rotations in planes (i, i-1) for i = j+3, \dots, n to restore the Hessenberg form without fill-in above the subdiagonal. The overall transformation is H = Q^T A Q, where Q is the product of all Givens rotations, ensuring H is upper Hessenberg. The procedure requires approximately 5n^3 flops for dense matrices.[16]
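For illustration, the following NumPy sketch (with an illustrative helper named givens and one arbitrary 4×4 matrix; sign conventions for the rotation vary between texts) constructs a single rotation that zeros one entry below the subdiagonal and applies it as a similarity transformation:

import numpy as np

def givens(a, b):
    # Return (c, s) with c = cos(theta), s = sin(theta) such that
    # [[c, s], [-s, c]] @ [a, b] = [r, 0].
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

# Zero the (4,1) entry by rotating in the plane of rows/columns 3 and 4 (1-based);
# the right multiplication only mixes columns 3 and 4, so the new zero survives.
c, s = givens(A[2, 0], A[3, 0])
G = np.eye(4)
G[2, 2], G[2, 3], G[3, 2], G[3, 3] = c, s, -s, c

A = G @ A @ G.T    # similarity transformation: eigenvalues are unchanged
print(np.isclose(A[3, 0], 0.0))   # True: the targeted entry is now zero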
In sparse or banded matrices, Givens rotations offer advantages over Householder methods by affecting only two rows or columns at a time, thereby preserving sparsity patterns and minimizing fill-in during the zero-chasing process. For banded matrices with bandwidth b, the complexity reduces to O(n b^2) operations, making it suitable for structured problems where Householder reflections would introduce unnecessary computations across the entire subcolumn.[17]
However, for dense matrices, the accumulation of numerous individual rotations can lead to greater rounding error propagation compared to the fewer, broader Householder reflections, potentially reducing numerical stability in practice.
Decompositions and Algorithms
Hessenberg Decomposition
The Hessenberg decomposition theorem states that every square matrix A \in \mathbb{C}^{n \times n} admits a unitary similarity transformation to upper Hessenberg form: there exists a unitary matrix Q such that A = Q H Q^H, where H is an upper Hessenberg matrix with h_{ij} = 0 for all i > j + 1.[11]
This decomposition preserves the spectrum of A, as unitary similarities do not alter eigenvalues.[11]
The existence of the decomposition follows from a constructive proof that builds the unitary matrix Q via successive orthogonal transformations applied as similarity operations. Specifically, starting with the original matrix, one applies transformations to eliminate entries below the subdiagonal in the first column (leaving the (2,1) entry intact), then in the second column below the subdiagonal (leaving (3,2)), and so on up to the (n-1)th column; each step preserves previously zeroed entries due to the banded structure that emerges progressively. Householder reflections or Givens rotations serve as the orthogonal building blocks, ensuring the overall Q is unitary as a product of such factors.[11]
For real matrices A \in \mathbb{R}^{n \times n}, the theorem holds with a real orthogonal matrix Q (i.e., Q^H = Q^T) and a real upper Hessenberg matrix H, avoiding complex numbers entirely; this real form relates to the real Schur decomposition as an intermediate step toward quasi-triangular structure for handling complex conjugate eigenvalue pairs.[11]
The decomposition is not unique in general. However, by the implicit Q theorem, once the first column of Q is fixed and H is unreduced (all subdiagonal entries of H nonzero), Q and H are essentially unique, up to the signs of those subdiagonal elements, which can be controlled by choices in the orthogonal transformations.[18] It is also possible to select Q such that \det(Q) = 1, for instance by adjusting phases in the unitary factors in the complex case or by ensuring an even number of reflections with negative determinant in the real orthogonal case.
Connection to QR Algorithm
The Hessenberg form is essential to the practical implementation of the QR algorithm, an iterative method for computing the eigenvalues of a square matrix, as it serves as a prerequisite transformation that enables efficient subsequent iterations. A general matrix is first reduced to upper Hessenberg form via orthogonal similarity transformations, preserving its eigenvalues while simplifying the structure for the QR steps.[19]
In the QR algorithm applied to an upper Hessenberg matrix H_k, each iteration begins with a QR decomposition H_k = Q_k R_k, where Q_k is orthogonal and R_k is upper triangular, followed by the update H_{k+1} = R_k Q_k, which yields another upper Hessenberg matrix and thus preserves the form. This preservation allows each QR iteration to be performed in O(n^2) floating-point operations for an n \times n matrix, a substantial improvement over the O(n^3) cost for dense matrices without the Hessenberg structure.[19]
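The preservation of the Hessenberg form under a QR step can be verified directly; the following sketch (using SciPy and NumPy on a random example, unshifted for simplicity) performs one iteration and checks both the structure and the spectrum:

import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(3)
H = hessenberg(rng.standard_normal((6, 6)))   # start from an upper Hessenberg matrix

Q, R = np.linalg.qr(H)        # H_k = Q_k R_k
H_next = R @ Q                # H_{k+1} = R_k Q_k, similar to H_k

print(np.allclose(np.tril(H_next, -2), 0.0))  # still upper Hessenberg
print(np.allclose(np.sort_complex(np.linalg.eigvals(H)),
                  np.sort_complex(np.linalg.eigvals(H_next))))  # same spectrum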
The algorithm converges to a form where the subdiagonal elements tend to zero, isolating eigenvalues on the diagonal (or in 2×2 blocks for complex conjugate pairs), enabling deflation and revealing the spectrum progressively from the corners.[19]
For real matrices, which may have complex conjugate eigenvalue pairs, the double-shift (Francis) variant of the QR algorithm is employed to avoid complex arithmetic; it applies two shifts simultaneously, chosen as the two eigenvalues of the trailing 2×2 principal submatrix (the single-shift analogue is the Wilkinson shift, the eigenvalue of that submatrix closer to the (n,n) entry), yielding rapid, typically quadratic, convergence on unreduced Hessenberg blocks.[19]
This integration of Hessenberg form with the QR algorithm, originally proposed by John G. F. Francis in 1961–1962 and independently by Vera N. Kublanovskaya in 1961, became a cornerstone of practical eigenvalue solvers in the 1960s through refinements by J. H. Wilkinson, including shift strategies that enhanced reliability and speed.[20]
Applications and Implementations
Role in Eigenvalue Computation
Hessenberg matrices play a pivotal role in numerical eigenvalue computation by enabling significant efficiency gains in solving the standard eigenvalue problem for dense matrices. The initial reduction of a general n \times n matrix to upper Hessenberg form via orthogonal transformations requires O(n^3) floating-point operations, after which subsequent iterative steps, such as those in the QR algorithm, can be performed at a reduced cost of O(n^2) operations per iteration, making the overall process computationally tractable for moderate-sized problems. This structure exploitation is essential for practical implementations, as it avoids the full O(n^3) cost per iteration that would otherwise be incurred on a dense matrix without Hessenberg structure.[1][21]
The Francis QR algorithm, introduced in the early 1960s, builds directly on the Hessenberg form to compute the full eigensystem, including eigenvalues and eigenvectors, for nonsymmetric matrices. By applying implicit double-shifted QR iterations to the Hessenberg matrix, the algorithm converges to the Schur form, from which eigenvalues are extracted from the diagonal blocks and eigenvectors can be recovered through back-substitution. This method has become the cornerstone of modern eigenvalue solvers due to its quadratic convergence properties and robustness for a wide range of matrix classes.[22][11]
Orthogonal reductions to Hessenberg form provide numerical stability, particularly for non-normal matrices where eigenvalues are sensitive to perturbations. The Householder-based reduction is backward stable, meaning the computed Hessenberg matrix is the exact form for a matrix close to the original, with the perturbation bounded independently of conditioning; this avoids error amplification that could occur with non-orthogonal methods, ensuring reliable eigenvalue approximations even when the matrix exhibits ill-conditioning.
In generalized eigenvalue problems of the form Ax = \lambda Bx, where A and B are n \times n matrices, Hessenberg reduction is applied to transform A into upper Hessenberg form while simultaneously reducing B to upper triangular form via generalized orthogonal transformations. This paired structure facilitates the QZ algorithm, an extension of the QR method, to compute the generalized eigenvalues and eigenvectors efficiently and stably. Recent advancements in large-scale eigenproblems, such as those in quantum simulations, continue to leverage Hessenberg-based techniques for improved stability and accuracy.[23]
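The following sketch illustrates the generalized Schur (QZ) form computed by SciPy for a random pencil (A, B); the ratios of the diagonal entries of the two triangular factors are the generalized eigenvalues (illustrative example only):

import numpy as np
from scipy.linalg import qz, eigvals

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

# Generalized (complex) Schur form: A = Q AA Z^H and B = Q BB Z^H,
# with AA, BB upper triangular and Q, Z unitary.
AA, BB, Q, Z = qz(A, B, output='complex')
gen_eigs_qz = np.diag(AA) / np.diag(BB)

gen_eigs_ref = eigvals(A, B)   # reference generalized eigenvalues

print(np.allclose(Q @ AA @ Z.conj().T, A) and np.allclose(Q @ BB @ Z.conj().T, B))
# Each QZ eigenvalue should match one of the reference eigenvalues.
print(all(np.min(np.abs(lam - gen_eigs_ref)) < 1e-8 * (1 + abs(lam)) for lam in gen_eigs_qz))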
Software and Numerical Implementations
The LAPACK library includes dedicated routines for Hessenberg reduction, such as the double-precision routine DGEHRD, which computes an orthogonal similarity transformation to convert a real general matrix to upper Hessenberg form using Householder reflections.[24] This implementation stores the Householder vectors in factored form (in the lower triangle of A, together with scalar factors) so that the transformation matrix Q can be reconstructed later if needed, and it supports workspace queries for optimal performance.[25]
The computational complexity of the DGEHRD routine is approximately \frac{10}{3} n^3 floating-point operations for an n \times n matrix, dominated by Level-3 BLAS updates for efficiency on modern hardware.[14] Regarding numerical stability, the Householder-based reduction is backward stable: the computed Hessenberg form \tilde{H} and transformation \tilde{Q} satisfy \tilde{Q} \tilde{H} \tilde{Q}^T = A + \Delta A with \frac{\|\Delta A\|}{\|A\|} = O(\epsilon), where \epsilon is the machine precision, ensuring reliable results for well-conditioned problems.[26]
Modern numerical frameworks build on LAPACK for accessible implementations. In Python, SciPy's scipy.linalg.hessenberg function performs the decomposition by interfacing with LAPACK routines, returning the Hessenberg matrix and optionally the transformation matrix, with support for both real and complex inputs.[27] The C++ Eigen library provides the HessenbergDecomposition class, which computes the form for dense matrices using a blocked Householder approach for better cache performance and scalability. For GPU acceleration, libraries like MAGMA (version 2.9.0 as of 2024) extend LAPACK-style reductions to NVIDIA and AMD GPUs via cuBLAS, cuSOLVER, and ROCm/HIP backends, with historical speedups up to 25× over CPU baselines reported for large matrices on hybrid systems.[28][29]
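A minimal usage example of the SciPy interface mentioned above (illustrative, with a random test matrix) is:

import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))

H, Q = hessenberg(A, calc_q=True)   # H upper Hessenberg, Q orthogonal

print(np.allclose(np.tril(H, -2), 0.0))   # Hessenberg structure
print(np.allclose(Q @ H @ Q.T, A))        # A = Q H Q^T (similarity)
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(H))))   # spectrum preserved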
A high-level pseudocode outline for the classical Householder reduction algorithm, as implemented in LAPACK and similar libraries, proceeds column-by-column as follows:
for k = 1 to n-2 do
    // Extract the subcolumn of column k below the diagonal
    x = A[k+1:n, k]
    // Generate a Householder reflector that zeros x[2:end]
    v = householder_vector(x)   // scaled so that v[1] = 1 (LAPACK convention)
    beta = 2 / (v^T v)
    // Apply the reflector from the left to rows k+1:n
    A[k+1:n, k:n] -= beta * outer(v, v^T * A[k+1:n, k:n])
    // Apply the reflector from the right to columns k+1:n (all rows)
    A[1:n, k+1:n] -= beta * outer(A[1:n, k+1:n] * v, v)
    // Store v (and the scalar tau = beta in LAPACK) for later reconstruction of Q
end
This snippet assumes an unnormalized v with v[1] = 1 and includes the beta factor of the reflector; it captures the core similarity transformations that preserve eigenvalues while introducing zeros below the subdiagonal.[11]
Recent developments have integrated Hessenberg reductions into machine learning frameworks for handling tensor eigenvalue problems, particularly in differentiable programming environments. For instance, JAX's jax.scipy.linalg.hessenberg (introduced in November 2022) supports GPU-accelerated computations via XLA compilation, enabling autograd-compatible reductions for higher-order tensor decompositions in applications like quantum simulation and neural network spectral analysis.[31] Libraries built on JAX, such as matfree (2023), extend this to matrix-free Hessenberg factorizations for large sparse tensors, reducing memory footprint while supporting autodiff for gradient-based optimization.[32]
Hessenberg Operators
Definition and Construction
In functional analysis, a Hessenberg operator is defined as a bounded linear operator H on the separable Hilbert space \ell^2(\mathbb{N}) whose matrix representation (h_{i,j}) in the standard orthonormal basis \{e_n\}_{n=1}^\infty, with h_{i,j} = \langle H e_j, e_i \rangle, satisfies h_{i,j} = 0 for all i > j + 1.[33] This condition ensures that the matrix is upper Hessenberg, meaning all entries strictly below the first subdiagonal vanish, analogous to the finite-dimensional case where such matrices serve as nearly triangular forms for computational purposes.[34]
The general structure of a Hessenberg operator can be expressed via its infinite matrix form:
H = \begin{pmatrix}
h_{11} & h_{12} & h_{13} & h_{14} & \cdots \\
h_{21} & h_{22} & h_{23} & h_{24} & \cdots \\
0 & h_{32} & h_{33} & h_{34} & \cdots \\
0 & 0 & h_{43} & h_{44} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix},
where the h_{i,j} are complex coefficients; boundedness of the operator requires, in particular, that the \ell^2 norms of the rows and columns be uniformly bounded.[35] A canonical example is the unilateral (forward) shift operator S on \ell^2(\mathbb{N}), defined by S e_n = e_{n+1} for n \geq 1, which has matrix entries s_{i,j} = \delta_{i,j+1} (ones on the first subdiagonal and zeros elsewhere), making it an isometric upper Hessenberg operator.[36]
Hessenberg operators commonly arise through the construction of compressions of multiplication operators in reproducing kernel Hilbert spaces of analytic functions, such as the Hardy space H^2 or Bergman space A^2 on the unit disk. In these spaces, the multiplication operator M_z by the coordinate function z is represented in an orthonormal basis of polynomials \{p_n\}_{n=0}^\infty by expanding z p_n(z) = \sum_{k=0}^\infty a_{k,n} p_k(z); the coefficients a_{k,n} form the entries of an infinite upper Hessenberg matrix due to the degree structure, ensuring a_{k,n} = 0 for k > n+1.[37] For instance, for a probability measure on the unit circle, the Verblunsky coefficients of the associated orthogonal polynomials parameterize a unitary Hessenberg representation of multiplication by z, closely related to the CMV matrix.[38] Similarly, in the Bergman space A^2, the orthogonal polynomials yield a Hessenberg representation for M_z, often banded depending on the underlying measure.[39]
A key relation exists between Hessenberg operators and systems of orthogonal polynomials, particularly in the framework of moment problems. On the real line, the Jacobi matrix, a tridiagonal (hence Hessenberg) operator that is symmetric in the orthonormal normalization, encodes the three-term recurrence relation for monic orthogonal polynomials \{p_n\} associated with a positive Borel measure \mu via x p_n(x) = p_{n+1}(x) + b_n p_n(x) + a_n p_{n-1}(x), where the coefficients a_n > 0 and b_n derive from the moments of \mu.[40] This matrix represents the compressed multiplication-by-x operator on L^2(\mathbb{R}, \mu) restricted to the polynomials, and solving the Hamburger moment problem involves determining \mu from the operator's spectral measure. In the complex plane or on the unit circle, generalizations yield non-symmetric or unitary Hessenberg forms tied to analytic orthogonal polynomials and associated moment functionals.[41]
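Written out in the monic basis \{p_n\}_{n \geq 0} (a sketch of the matrix implied by the recurrence above, with columns indexed from n = 0), the multiplication-by-x operator takes the explicit form
J = \begin{pmatrix}
b_0 & a_1 & 0 & 0 & \cdots \\
1 & b_1 & a_2 & 0 & \cdots \\
0 & 1 & b_2 & a_3 & \cdots \\
0 & 0 & 1 & b_3 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix},
in which column n lists the coefficients of x p_n(x) = p_{n+1}(x) + b_n p_n(x) + a_n p_{n-1}(x); rescaling to the orthonormal polynomials symmetrizes this matrix into the standard Jacobi form.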
The notion of Hessenberg operators builds on Karl Hessenberg's work in the first half of the 20th century on finite Hessenberg matrices for determinant theory and eigenvalue problems, extended to the infinite-dimensional setting in operator theory through studies of continued fractions and moment problems by H. S. Wall in the mid-20th century.
Properties in Infinite Dimensions
In infinite-dimensional Hilbert spaces, such as \ell^2(\mathbb{N}), Hessenberg operators are bounded linear operators represented by infinite Hessenberg matrices, where all entries below the first subdiagonal vanish for the upper Hessenberg form (h_{i,j} = 0 for i > j + 1), and all entries above the first superdiagonal vanish for the lower Hessenberg form (h_{i,j} = 0 for j > i + 1).[42] These operators exhibit subnormality when they admit a normal extension on a larger Hilbert space, a property that facilitates the analysis of their spectral behavior and ensures the existence of a spectral measure supported on the spectrum.[43] For compact Hessenberg operators, as for compact operators in general, the spectrum consists of 0 and a countable set of eigenvalues accumulating only at 0, which can be analyzed using the structure of the Hessenberg form.[44]
Hessenberg operators are intimately connected to orthogonal polynomials through the three-term recurrence relations satisfied by the polynomials in the associated basis. Specifically, the action of the operator on the orthonormal basis \{p_n\}_{n=0}^\infty generates the recurrence x p_n = a_n p_n + b_n p_{n-1} + c_n p_{n+1}, where the coefficients a_n, b_n, c_n populate the diagonal and off-diagonals of the Hessenberg matrix, linking the operator's structure directly to the moments of the underlying measure.[45] In this context, the characteristic equation for the eigenvalues of finite truncations approximates the spectral measure, expressed in terms of the moments \mu_k = \int x^k \, d\mu(x) via the determinant of the associated Hankel matrix truncated to the relevant order.[43]
Boundedness of a Hessenberg operator on \ell^2 requires, in particular, that the diagonal entries be bounded and that the off-diagonal entries remain uniformly bounded, e.g., \sup_n |d_{n,n+1}| < \infty for the lower Hessenberg form, so that the operator norm stays finite.[42] Decay of the entries, such as |d_{n,n+1}| = O(1/n), can render the operator compact, further simplifying spectral analysis.[44]
These properties underpin applications in approximation theory, where Hessenberg operators model multiplication by the independent variable in the orthogonal polynomial basis, enabling efficient computation of expansions and error estimates. In quadrature theory, the eigenvalues of finite Hessenberg truncations yield Gaussian quadrature nodes and weights, converging to the support of the spectral measure of the infinite operator as the dimension increases.[45]
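As a concrete instance of this quadrature connection, the following sketch (a minimal Golub-Welsch computation, assuming the Legendre weight on [-1, 1]; the recurrence coefficients k/\sqrt{4k^2 - 1} are those of the orthonormal Legendre polynomials) recovers Gauss-Legendre nodes and weights from the eigendecomposition of a finite Jacobi truncation and checks them against NumPy's reference routine:

import numpy as np

n = 6
# Symmetric Jacobi matrix for the (orthonormal) Legendre polynomials on [-1, 1]:
# diagonal entries are 0, off-diagonal entries are k / sqrt(4k^2 - 1).
k = np.arange(1, n)
beta = k / np.sqrt(4.0 * k**2 - 1.0)
J = np.diag(beta, 1) + np.diag(beta, -1)

nodes, vecs = np.linalg.eigh(J)
weights = 2.0 * vecs[0, :]**2          # mu_0 = integral of the weight over [-1, 1] is 2

ref_nodes, ref_weights = np.polynomial.legendre.leggauss(n)
print(np.allclose(nodes, ref_nodes), np.allclose(weights, ref_weights))   # True True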