Bidiagonal matrix
A bidiagonal matrix is a sparse matrix in linear algebra characterized by non-zero entries confined to the main diagonal and exactly one adjacent off-diagonal, either the superdiagonal (resulting in an upper bidiagonal matrix) or the subdiagonal (resulting in a lower bidiagonal matrix).[1] This structure is a special case of banded matrices with bandwidth 1, which keeps the storage and arithmetic cost of many matrix operations low.
Bidiagonal matrices are fundamental in numerical linear algebra, serving as intermediates in key decomposition algorithms for dense and sparse matrices.[2] Notably, the Golub-Kahan-Lanczos bidiagonalization process reduces a general matrix to bidiagonal form as a precursor to computing its singular value decomposition (SVD), enabling stable and efficient eigenvalue analysis for applications in data compression, principal component analysis, and pseudoinverse computation.[3] They also arise in QR-like reductions based on Householder transformations, whose repeated application yields the bidiagonal form used for solving least squares problems and in eigenvalue computations.[4] Beyond general matrices, bidiagonal forms facilitate specialized solutions for structured systems, such as totally positive matrices and Vandermonde-like arrays, by exploiting factorizations that maintain numerical stability and accuracy in high-precision environments.[5] Products of bidiagonal matrices often exhibit desirable properties like total nonnegativity, enhancing their utility in optimization and approximation theory.[2]
Fundamentals
Definition
A bidiagonal matrix is a sparse matrix with all entries equal to zero except those on the main diagonal and exactly one adjacent off-diagonal band, either the superdiagonal (for upper bidiagonal matrices) or the subdiagonal (for lower bidiagonal matrices).[2] This structure results in at most 2k - 1 nonzero entries for a square k \times k matrix, making it a particularly efficient form for certain numerical computations.[2]
Unlike tridiagonal matrices, which permit nonzero entries on both the superdiagonal and subdiagonal in addition to the main diagonal, a bidiagonal matrix restricts nonzeros to only one off-diagonal band, yielding a narrower bandwidth.[6] Bidiagonal matrices form a subclass of banded matrices, where nonzeros are confined within a fixed distance from the main diagonal, but with only two nonzero bands: the main diagonal and a single adjacent diagonal.[6]
The concept extends naturally to rectangular matrices. For an upper bidiagonal matrix B \in \mathbb{R}^{m \times n}, the general entry satisfies b_{ij} = 0 if |i - j| > 1 or i > j, so nonzeros appear only on the main diagonal (for i = j \leq \min(m,n)) and the superdiagonal (for i = j - 1, up to the dimensions).[4] An analogous definition holds for lower bidiagonal matrices, with nonzeros on the main diagonal and subdiagonal (i = j + 1).[4]
Notation and Examples
A bidiagonal matrix is specified by its non-zero entries along the main diagonal and one adjacent off-diagonal, either the superdiagonal (for upper bidiagonal) or subdiagonal (for lower bidiagonal). In standard notation, an upper bidiagonal matrix B \in \mathbb{R}^{m \times n} (with m \geq n) has diagonal elements d_i for i = 1, \dots, n and superdiagonal elements e_i for i = 1, \dots, n-1, with all other entries zero.[7] Similarly, a lower bidiagonal matrix has diagonal elements d_i and subdiagonal elements f_i occupying positions (i, i-1), again with all other entries zero.[7]
For a square 3 \times 3 upper bidiagonal matrix, the structure takes the form
\begin{pmatrix}
a & b & 0 \\
0 & c & d \\
0 & 0 & e
\end{pmatrix},
where a, c, e are the diagonal entries and b, d are the superdiagonal entries.[8] A corresponding 3 \times 3 lower bidiagonal matrix is
\begin{pmatrix}
a & 0 & 0 \\
f & c & 0 \\
0 & g & e
\end{pmatrix},
with f, g on the subdiagonal.[8]
Rectangular bidiagonal matrices follow analogous patterns, with the off-diagonal truncated at the matrix boundaries. For example, a 4 \times 3 upper bidiagonal matrix is
\begin{pmatrix}
a & b & 0 \\
0 & c & d \\
0 & 0 & e \\
0 & 0 & 0
\end{pmatrix},
where the superdiagonal ends in the second row because the matrix has only three columns, and the fourth row is entirely zero.[7] In larger matrices, this bidiagonal structure implies a sparse, band-like form concentrated along the specified diagonals, facilitating efficient storage and computation.[8]
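As an illustration of this storage efficiency, the 4 \times 3 example above can be assembled from just two short vectors holding its diagonal and superdiagonal entries. The following minimal sketch, assuming NumPy and arbitrary placeholder values for a, b, c, d, e, shows one way to do this; it is not tied to any dedicated bidiagonal routine.

import numpy as np

# Diagonal and superdiagonal entries of the 4 x 3 example (placeholder values).
d = np.array([1.0, 2.0, 3.0])   # a, c, e
e = np.array([4.0, 5.0])        # b, d

# Assemble the 3 x 3 upper bidiagonal block, then append the zero fourth row.
B = np.vstack([np.diag(d) + np.diag(e, k=1), np.zeros((1, 3))])

print(B)
print(np.count_nonzero(B))      # only 2*3 - 1 = 5 entries are nonzero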
Properties
Structural Properties
A bidiagonal matrix is a sparse matrix with non-zero entries confined to the main diagonal and one adjacent off-diagonal (either the superdiagonal for upper bidiagonal or subdiagonal for lower bidiagonal). For an n \times n square bidiagonal matrix, there are exactly 2n - 1 non-zero entries: n on the main diagonal and n-1 on the off-diagonal.[9] This sparsity pattern gives bidiagonal matrices a bandwidth of 1, meaning the non-zeros are limited to positions where the row and column indices differ by at most 1.[9]
The rank of an n \times n square bidiagonal matrix is at most n, and the matrix has full rank exactly when every diagonal element is non-zero, since it is triangular and its determinant is the product of the diagonal entries.[9] Rank deficiency therefore occurs when one or more diagonal elements are zero, reducing the overall rank accordingly.[9]
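This rank behaviour is easy to check numerically; the sketch below assumes NumPy and uses arbitrary illustrative entries.

import numpy as np

# Upper bidiagonal matrix with every diagonal entry nonzero: full rank.
B_full = np.diag([2.0, 3.0, 4.0]) + np.diag([1.0, 1.0], k=1)
print(np.linalg.matrix_rank(B_full))        # 3

# Zeroing one diagonal entry makes the matrix rank deficient.
B_deficient = np.diag([2.0, 0.0, 4.0]) + np.diag([1.0, 1.0], k=1)
print(np.linalg.matrix_rank(B_deficient))   # 2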
Bidiagonal matrices are not closed under multiplication: the product of an upper and a lower bidiagonal matrix is in general tridiagonal, with non-zeros on the main diagonal and both adjacent off-diagonals, while the product of two upper (or two lower) bidiagonal matrices is triangular with an additional non-zero band two positions from the diagonal.[9] However, certain operations preserve the bidiagonal structure; for instance, the product of a bidiagonal matrix and a diagonal matrix remains bidiagonal, since the diagonal matrix scales the rows or columns without introducing new non-zeros.[9]
The trace of a square bidiagonal matrix is simply the sum of its n main diagonal entries.[9] Since a square bidiagonal matrix is triangular (upper or lower), its determinant is the product of its diagonal elements; the off-diagonal entries do not affect it.[9]
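Both the multiplication behaviour and the determinant formula above can be verified directly; this sketch assumes NumPy and arbitrary entries.

import numpy as np

U = np.diag([1.0, 2.0, 3.0]) + np.diag([4.0, 5.0], k=1)     # upper bidiagonal
L = np.diag([6.0, 7.0, 8.0]) + np.diag([9.0, 10.0], k=-1)   # lower bidiagonal

print(U @ L)                            # tridiagonal
print(U @ U)                            # upper triangular with a second superdiagonal
print(U @ np.diag([2.0, 3.0, 4.0]))     # still upper bidiagonal

# Determinant equals the product of the diagonal entries: 1 * 2 * 3 = 6.
print(np.linalg.det(U))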
Spectral Properties
A real symmetric matrix with nonzeros restricted to the main diagonal and the adjacent off-diagonals (that is, a symmetric tridiagonal matrix) possesses real eigenvalues, a fundamental property of all real symmetric matrices; such matrices arise from bidiagonal matrices through the products described below. These eigenvalues can be determined using specialized numerical methods that exploit the banded structure, though such methods are beyond the scope of the theoretical properties discussed here.[9]
For a general bidiagonal matrix B, the singular values \sigma_i(B) are defined as the square roots of the eigenvalues of either B^\top B or B B^\top, both of which are symmetric tridiagonal matrices whose spectral structure inherits the sparsity of B. This connection facilitates analysis of the singular value distribution through the well-studied eigenvalues of tridiagonal matrices.[9] If all entries of a bidiagonal matrix are nonzero, its singular values are distinct.[10]
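This relationship is straightforward to check numerically; the sketch below assumes NumPy and arbitrary nonzero entries, comparing the singular values of B with the square roots of the eigenvalues of B^\top B.

import numpy as np

B = np.diag([3.0, 1.0, 4.0, 2.0]) + np.diag([0.5, 0.7, 0.9], k=1)   # upper bidiagonal

T = B.T @ B                                            # symmetric tridiagonal
sigma = np.linalg.svd(B, compute_uv=False)
roots = np.sqrt(np.sort(np.linalg.eigvalsh(T))[::-1])

print(np.allclose(sigma, roots))   # True up to roundoff
print(sigma)                       # all entries of B are nonzero, so these are distinct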
The Gershgorin circle theorem provides a localization for the eigenvalues of any bidiagonal matrix: each eigenvalue lies within at least one of the disks in the complex plane centered at a diagonal entry a_i with radius equal to the absolute value of the single adjacent off-diagonal entry in that row (or column, by equivalence). For an n \times n upper bidiagonal matrix with superdiagonal entries e_i, the disks are centered at a_i with radius |e_i| for i = 1, \dots, n-1, and at a_n with radius 0; this often yields tight bounds due to the minimal off-diagonal support.[9]
In the simple case of a 2 \times 2 upper bidiagonal matrix
\begin{pmatrix}
a & b \\
0 & c
\end{pmatrix},
the matrix is upper triangular, so its eigenvalues are precisely the diagonal entries a and c.[9]
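Because an upper bidiagonal matrix of any size is triangular, the same conclusion holds in general: the eigenvalues are exactly the diagonal entries, each sitting at the center of its Gershgorin disk. A short check, assuming NumPy and arbitrary entries:

import numpy as np

B = np.diag([5.0, -2.0, 3.0]) + np.diag([1.5, 0.25], k=1)   # upper bidiagonal

print(np.sort(np.linalg.eigvals(B).real))   # [-2.  3.  5.], the diagonal entries

# Gershgorin data: centers a_i and radii |e_i| (the last radius is 0).
print(np.diag(B), np.append(np.abs(np.diag(B, k=1)), 0.0))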
For an upper bidiagonal matrix with positive diagonal entries, the matrix is nonsingular and all of its singular values are positive; if the superdiagonal entries are also nonzero, the singular values are distinct and therefore strictly decreasing when listed in the conventional nonincreasing order.[10]
The structural sparsity of bidiagonal matrices aids in efficient spectral computations, since each step of the associated algorithms needs to touch only O(n) nonzero entries.[9]
Algorithms
Bidiagonalization
Bidiagonalization refers to the process of reducing a general m \times n matrix A (with m \geq n) to upper bidiagonal form through a series of orthogonal transformations, primarily to simplify the computation of its singular value decomposition (SVD). This reduction preserves the singular values of A and is a foundational step in numerical linear algebra algorithms for SVD. The resulting bidiagonal matrix facilitates subsequent iterative methods for singular value extraction by concentrating the non-zero structure along the main diagonal and the superdiagonal.[11]
The seminal Golub-Kahan bidiagonalization algorithm, introduced in 1965, achieves this reduction using Householder reflections applied alternately from the left and right sides of the matrix. It produces orthogonal matrices U \in \mathbb{R}^{m \times m} and V \in \mathbb{R}^{n \times n}, along with an upper bidiagonal matrix B \in \mathbb{R}^{m \times n}, such that
A = U B V^T,
where B has non-zero entries only on its main diagonal and superdiagonal. The process begins with the first column of A: a left Householder transformation U_1 is constructed to zero out all entries below the (1,1) position in that column, effectively performing an initial QR-like step. Then, a right Householder transformation V_1 is applied to the first row of the transformed matrix to zero out the entries to the right of the (1,2) position, leaving only the superdiagonal element. This alternation continues iteratively for subsequent columns and rows: for each stage k = 1, \dots, n, a left reflector U_k zeros the elements below the diagonal in column k, and for k \leq n-2 a right reflector V_k zeros the elements beyond the superdiagonal in row k. These transformations are accumulated to form the full U and V.[11][12]
The algorithm's steps can be outlined in pseudocode as follows, assuming real arithmetic and focusing on the core Householder applications:
for k = 1 to n
    Form the unit Householder vector u_k from the subcolumn A(k:m, k) so that the reflection zeros the entries below the diagonal
    Apply left reflection: A(k:m, k:n) ← (I - 2 u_k u_k^T) A(k:m, k:n)
    Accumulate U ← U (I - 2 u_k u_k^T), with the reflector embedded in the m × m identity
    if k ≤ n - 2
        Form the unit Householder vector v_k from the subrow A(k, k+1:n) so that the reflection zeros the entries beyond the superdiagonal
        Apply right reflection: A(k:m, k+1:n) ← A(k:m, k+1:n) (I - 2 v_k v_k^T)
        Accumulate V ← V (I - 2 v_k v_k^T), with the reflector embedded in the n × n identity
    end if
end for
This iterative, QR-like reduction builds the bidiagonal structure progressively while maintaining numerical stability through the orthogonal nature of Householder reflectors. The computational complexity of the Golub-Kahan procedure is O(m n^2) floating-point operations for m \geq n, dominated by the application of the reflectors to the trailing submatrices, with additional work if U and V are accumulated explicitly.[11][12][13]
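A compact implementation makes the alternation of left and right reflectors concrete. The following NumPy sketch follows the pseudocode above under the stated assumptions (real arithmetic, m \geq n); the function and variable names are illustrative, and the reflectors are formed explicitly for clarity rather than efficiency, so this is a teaching sketch rather than a production routine.

import numpy as np

def householder(x):
    # Return v, beta with (I - beta v v^T) x equal to a multiple of the first unit vector
    # (equivalent to I - 2 u u^T with the unit vector u = v / ||v||).
    v = x.astype(float).copy()
    norm_x = np.linalg.norm(x)
    if norm_x == 0.0:
        return v, 0.0
    alpha = -np.sign(x[0]) * norm_x if x[0] != 0 else -norm_x
    v[0] -= alpha
    return v, 2.0 / np.dot(v, v)

def golub_kahan_bidiagonalize(A):
    # Reduce A (m >= n) to upper bidiagonal B with A = U @ B @ V.T.
    A = np.array(A, dtype=float)
    m, n = A.shape
    U, V = np.eye(m), np.eye(n)
    for k in range(n):
        # Left reflector: zero the entries below the diagonal in column k.
        v, beta = householder(A[k:, k])
        Hk = np.eye(m - k) - beta * np.outer(v, v)
        A[k:, k:] = Hk @ A[k:, k:]
        U[:, k:] = U[:, k:] @ Hk
        if k < n - 2:
            # Right reflector: zero the entries beyond the superdiagonal in row k.
            w, beta = householder(A[k, k + 1:])
            Gk = np.eye(n - k - 1) - beta * np.outer(w, w)
            A[:, k + 1:] = A[:, k + 1:] @ Gk
            V[:, k + 1:] = V[:, k + 1:] @ Gk
    return U, A, V

M = np.random.default_rng(0).standard_normal((5, 3))
U, B, V = golub_kahan_bidiagonalize(M)
print(np.allclose(U @ B @ V.T, M))              # the factorization holds
print(np.allclose(B, np.triu(np.tril(B, 1))))   # B is upper bidiagonal (up to roundoff)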
While the focus here is on general rectangular matrices, a related procedure for symmetric matrices reduces them to symmetric tridiagonal form by applying Householder reflectors from both sides as a similarity transformation; the Golub-Kahan method remains the standard reduction for the nonsymmetric and rectangular case.[11]
Singular Value Computation
Computing the singular value decomposition (SVD) of a bidiagonal matrix represents the primary computational bottleneck in the full SVD of a general matrix after bidiagonalization, often accounting for a significant portion of the total runtime due to the need for high accuracy across all singular values.[14] For an n \times n upper bidiagonal matrix B with diagonal entries d_1, d_2, \dots, d_n and superdiagonal entries e_1, e_2, \dots, e_{n-1}, the SVD takes the form B = U \Sigma V^T, where U and V are orthogonal matrices, and \Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_n) contains the nonnegative singular values \sigma_i in nonincreasing order. Equivalently, this is expressed via iterative orthogonal transformations as Q^T B P = \Sigma, where Q and P accumulate the left and right singular vectors, respectively.[15]
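As a small illustration of this factorization, the sketch below, assuming NumPy, builds an upper bidiagonal B from its diagonal and superdiagonal and verifies B = U \Sigma V^T together with the orthogonality of the factors; a general dense SVD routine is used purely for demonstration, not the specialized bidiagonal solvers discussed next.

import numpy as np

d = np.array([4.0, 3.0, 2.0, 1.0])   # diagonal entries d_1, ..., d_n
e = np.array([0.9, 0.6, 0.3])        # superdiagonal entries e_1, ..., e_{n-1}
B = np.diag(d) + np.diag(e, k=1)

U, s, Vt = np.linalg.svd(B)

print(np.allclose(B, U @ np.diag(s) @ Vt))   # B = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(4)), np.allclose(Vt @ Vt.T, np.eye(4)))
print(s)                                     # nonincreasing singular values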
The classical algorithm for this computation is a bidiagonal variant of the QR iteration, which applies successive QR decompositions with implicit shifts to deflate and converge to the singular values. In each iteration, a shift is chosen (often the Wilkinson shift computed from the trailing 2 \times 2 submatrix) to accelerate convergence, followed by a bulge-chasing phase that propagates the transformation through the bidiagonal structure, leading to deflation when superdiagonal elements become negligibly small. This process isolates singular values one by one from the bottom, with the accumulated transformations yielding the singular vectors upon request. The method ensures quadratic convergence near the singular values and is implemented in the LAPACK routine DBDSQR, which handles both real and complex cases.[16][15]
To improve accuracy in floating-point arithmetic, particularly for small singular values prone to underflow or cancellation errors, zero-shift QR iterations are employed, avoiding explicit shifts and relying on the inherent deflation properties of the bidiagonal form. These zero-shift variants compute singular values to high relative accuracy independent of their magnitude, as demonstrated in analyses showing bounded relative errors after a fixed number of steps. For enhanced performance, especially on parallel architectures, divide-and-conquer approaches recursively split the bidiagonal matrix into smaller subproblems, solve them independently, and combine the results through low-rank updates, while the MR³ algorithm, based on multiple relatively robust representations, offers another fast alternative. These methods, building on the earlier zero-shift techniques, are competitive with QR iteration in speed while maintaining accuracy.[17][18]
Overall, these algorithms compute the singular values of an n \times n bidiagonal matrix in O(n^2) operations, dominated by O(n) iterations each costing O(n) time thanks to the sparse structure, in contrast with the O(n^3) cost of the preceding bidiagonalization for a square matrix. LAPACK's DBDSDC routine implements a divide-and-conquer variant that is typically faster than the QR-based DBDSQR when singular vectors are also required. Numerical experiments confirm that these methods deliver singular values accurate to nearly machine precision across a wide range of matrix conditionings.[19][20]
Applications
In Numerical Linear Algebra
Bidiagonal matrices serve as a crucial intermediate form in the computation of the singular value decomposition (SVD) of general matrices, particularly through the Golub-Reinsch algorithm, which reduces an arbitrary matrix to bidiagonal form using Householder transformations before applying iterative methods to extract the singular values. This approach enables stable and efficient SVD computation by simplifying the problem to a form where only the diagonal and one superdiagonal (or subdiagonal) elements are nonzero, reducing the complexity of subsequent iterations.[15] The foundational work on this bidiagonalization procedure and its application to SVD was developed by Golub and Kahan in the 1960s, with their 1965 paper introducing methods to calculate singular values and the pseudoinverse directly from the bidiagonal form.[21]
In the context of eigenvalue problems, bidiagonal matrices integrate with variants of the QR algorithm, as the singular values of a bidiagonal matrix correspond to the eigenvalues of an associated symmetric tridiagonal matrix, allowing eigenvalue techniques to be adapted for SVD computation on the bidiagonal form. This connection facilitates the use of QR iterations on the tridiagonal counterpart, enhancing the efficiency of singular value refinement while maintaining the structure's sparsity.[21]
For least squares problems, the bidiagonal form aids in solving overdetermined systems by leveraging the SVD, where the reduced matrix allows for straightforward computation of the minimum-norm solution via the pseudoinverse, avoiding direct formation of the normal equations that can amplify conditioning issues. This is particularly valuable in applications requiring the solution to \min \| Ax - b \|_2, as the SVD of the bidiagonal intermediate yields the necessary decompositions with controlled error propagation.[15]
The use of bidiagonal forms preserves numerical accuracy for ill-conditioned matrices, as the Householder-based reduction ensures backward stability, with perturbations bounded by machine epsilon times the matrix norm, preventing loss of information in nearly singular cases. For instance, consider a bidiagonal matrix B with SVD B = U \Sigma V^T; the pseudoinverse is then B^+ = V \Sigma^+ U^T, where \Sigma^+ inverts the nonzero singular values, enabling direct computation of the least squares solution x = B^+ b for a corresponding system, often with relative errors on the order of the condition number times machine precision. This example illustrates how the bidiagonal SVD not only simplifies pseudoinverse calculation but also supports stable handling of rank-deficient problems in numerical linear algebra.[22][21]
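To make the pseudoinverse construction above concrete, the sketch below, assuming NumPy and an arbitrary rectangular bidiagonal B with right-hand side b, forms B^+ = V \Sigma^+ U^T from the SVD and checks it against a generic least squares solver; it illustrates the mechanism rather than the optimized routines used in practice.

import numpy as np

d = np.array([3.0, 2.0, 1.0])
e = np.array([0.5, 0.4])
B = np.vstack([np.diag(d) + np.diag(e, k=1), np.zeros((1, 3))])   # 4 x 3 upper bidiagonal
b = np.array([1.0, 2.0, 3.0, 4.0])

U, s, Vt = np.linalg.svd(B, full_matrices=False)
B_pinv = Vt.T @ np.diag(1.0 / s) @ U.T   # Sigma^+ inverts the nonzero singular values

x = B_pinv @ b                           # least squares solution via the pseudoinverse
x_ref, *_ = np.linalg.lstsq(B, b, rcond=None)
print(np.allclose(x, x_ref))             # matches the generic solver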
In Optimization and Other Fields
Bidiagonal matrices play a significant role in trust-region methods for solving large-scale sparse nonlinear least squares problems. In these methods, the LSQR algorithm, which relies on bidiagonalization of the Jacobian matrix, is used to approximately solve the trust-region subproblem by generating an efficient path for the step computation. This approach ensures numerical stability and satisfies the necessary conditions for global convergence, outperforming alternatives like the conjugate gradient least squares (CGLS) method in terms of iteration count and residual reduction on benchmark problems.[23]
In modern machine learning, particularly in online convex optimization for training deep neural networks, bidiagonal matrices facilitate structured sparsity in second-order methods such as the Sparsified Online Newton (SONew) algorithm. Here, a unit lower bidiagonal matrix forms part of the tridiagonal approximation in minimizing the LogDet divergence, enabling linear time and space complexity while achieving optimal regret bounds of O(\sqrt{T}) for T iterations. This structure enhances scalability to billion-parameter models, yielding up to 30% faster convergence and 3.4% accuracy improvements over first-order optimizers like Adam on tasks involving vision transformers and graph neural networks.[24]
In signal processing, bidiagonal factorizations of Fourier matrices enable systolic array implementations for discrete Fourier transforms (DFTs), requiring 2n-1 cycles on arrays of n² processors while maintaining accuracy for spectral analysis tasks.[25]