
QR decomposition

QR decomposition, also known as QR factorization, is a fundamental technique in linear algebra that decomposes a real or complex matrix A of size m \times n (with m \geq n) into the product A = QR, where Q is an m \times n matrix with orthonormal columns (i.e., Q^H Q = I_n, with ^H denoting the conjugate transpose) and R is an n \times n upper triangular matrix. This factorization preserves the column space of A in Q and captures the triangular structure in R, making it particularly useful for orthogonalization in numerical computations. Its systematic use as a stable numerical tool for solving linear least squares problems was developed by Gene H. Golub in 1965, leveraging orthogonal transformations to avoid the error amplification common in direct methods.

QR decomposition can be computed via the classical Gram-Schmidt process, which orthogonalizes the columns of A sequentially, or more stably using Householder reflections or Givens rotations, which introduce zeros below the diagonal of R through a series of unitary transformations. For full-rank matrices, Q can be extended to an m \times m orthogonal/unitary matrix, and variants like thin QR or rank-revealing QR handle rectangular or rank-deficient cases by incorporating column pivoting to reveal the numerical rank.

Key applications of QR decomposition include solving overdetermined linear systems Ax = b by transforming them into triangular systems Rx = Q^H b, an approach that is backward stable and avoids the ill-conditioning incurred by forming the normal equations. It also forms the basis for the QR algorithm in eigenvalue computation, where iterative applications yield the Schur decomposition for finding eigenvalues and eigenvectors of general matrices. Additional uses span least squares optimization in data fitting, signal processing for MIMO systems, and preconditioning in iterative solvers for large-scale problems, highlighting its role in modern numerical linear algebra libraries such as LAPACK.

Definitions and Formulations

Square Matrices

For an n \times n invertible matrix A, the QR decomposition provides an orthogonal-triangular factorization A = QR, where Q is an n \times n orthogonal matrix satisfying Q^T Q = I_n and R is an n \times n upper triangular matrix with positive diagonal entries. The columns of Q form an orthonormal basis for \mathbb{R}^n, spanning the entire space and preserving lengths and angles under the transformation defined by Q. This leverages the properties of orthogonal matrices to simplify various operations while maintaining numerical stability.

The uniqueness of the QR decomposition for square invertible matrices follows directly from the conditions imposed: if A = Q_1 R_1 = Q_2 R_2 with both R_1 and R_2 upper triangular and having positive diagonals, then Q_1 = Q_2 and R_1 = R_2. This uniqueness arises because the orthogonal factor Q is determined by successively orthogonalizing the columns of A, and the positive-diagonal convention removes sign ambiguities on the diagonal of R, ensuring a single canonical factorization.

Geometrically, the QR decomposition interprets A as a composition of an orthogonal transformation followed by a triangular one: Q rotates or reflects the standard basis of \mathbb{R}^n to align with an orthonormal basis for the column space of A, while R captures the relative scalings along these directions and the shearing effects between them. This view highlights how the decomposition separates the isometric (length-preserving) component from the volumetric changes encoded in R. The QR decomposition for square matrices, based on the Gram-Schmidt orthogonalization process, was developed as a numerically stable tool in mid-20th-century numerical analysis, with Gene H. Golub advancing its applications to least squares problems in 1965.
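As a quick illustration of these properties, the following sketch (Python with NumPy, an assumption of this example) factors a generic square matrix, normalizes the signs so that R has a positive diagonal, and checks orthogonality of Q and reconstruction of A.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))        # a generic (almost surely invertible) 4x4 matrix

Q, R = np.linalg.qr(A)                 # A = Q R with Q orthogonal, R upper triangular

# Enforce the convention of positive diagonal entries in R (sign normalization),
# which makes the factorization unique for invertible A.
signs = np.sign(np.diag(R))
Q, R = Q * signs, signs[:, None] * R

print(np.allclose(Q.T @ Q, np.eye(4)))   # True: columns of Q are orthonormal
print(np.allclose(Q @ R, A))             # True: the product reconstructs A
print(np.all(np.diag(R) > 0))            # True: positive-diagonal convention
```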

Rectangular Matrices

For a full-rank m \times n matrix A with m \geq n, the QR decomposition takes the form A = QR, where Q is an m \times m orthogonal matrix and R is an m \times n matrix whose first n rows form an upper triangular block and whose remaining m - n rows are zero. This full factorization extends the square case by embedding the decomposition into a larger orthogonal factor, preserving the property that Q^T Q = I_m. In practice, the economy or reduced QR decomposition is often preferred for efficiency, expressed as A = \hat{Q} \hat{R}, where \hat{Q} is an m \times n matrix with orthonormal columns (satisfying \hat{Q}^T \hat{Q} = I_n) and \hat{R} is an n \times n upper triangular matrix. This thin form avoids storing the last m - n columns of Q and the corresponding zero rows of R, reducing storage and computation costs when m \gg n. If A has full column rank, \hat{R} is invertible, enabling direct computation of least squares solutions via back-substitution.

For the case m < n (a wide matrix), the full QR decomposition is A = QR with Q an m \times m orthogonal matrix and R an m \times n upper trapezoidal matrix. Here, no reduced form is typically emphasized, as the orthogonal factor is already square and minimal in rows. In contrast, for tall matrices (m > n), the thin QR is the standard variant, focusing on the column space without excess dimensions. The columns of Q (or \hat{Q} in the reduced form) form an orthonormal basis for the column space of A, capturing its range exactly. The upper triangular structure of R (or \hat{R}) facilitates forward or backward substitution in applications such as solving overdetermined systems. For the thin QR, the equation A = \hat{Q}\hat{R} holds with \hat{Q}^T \hat{Q} = I_n and \hat{R} upper triangular, ensuring uniqueness up to signs on the diagonal of \hat{R} for full-rank A.

The QL, RQ, and LQ decompositions form a family of orthogonal-triangular factorizations analogous to the standard QR decomposition, differing primarily in the placement of the orthogonal and triangular factors. These variants arise naturally in numerical linear algebra when alternative triangular structures suit specific algorithmic needs, such as processing matrices from the bottom up or row-wise. In the QL decomposition, an m \times n matrix A is factored as A = QL, where Q is an m \times m orthogonal matrix and L is an m \times n lower trapezoidal matrix. For square matrices, the diagonal elements of L are conventionally chosen to be positive. This form is computed by applying orthogonal transformations in reverse order compared to QR, effectively orthogonalizing columns from the last to the first. QL proves useful in eigenvalue computations, particularly in the QL step of algorithms for symmetric tridiagonal matrices, where it aids implicit shifting for improved convergence. The RQ decomposition factors A = RQ, with R an m \times n upper trapezoidal matrix and Q an n \times n orthogonal matrix. It can be obtained by transposing the QL decomposition of A^T, and it relates to row-oriented elimination when rows are prioritized. RQ is advantageous for row-wise processing in certain structured algorithms, where the upper triangular factor aligns with row operations. The LQ decomposition expresses A = LQ, where L is an m \times n lower trapezoidal matrix and Q is an n \times n orthogonal matrix. As the dual of QR, it is obtained via the QR factorization of A^T followed by transposition. LQ is particularly suited to underdetermined systems, enabling the computation of minimum-norm solutions by providing an orthonormal basis for the row space.
For a square matrix A \in \mathbb{R}^{n \times n}, the QL form is A = QL with L lower triangular and positive diagonal entries, mirroring the QR convention but inverted in triangular orientation. The following table summarizes the factor positions across these decompositions; a short code sketch relating the variants follows the table.
| Decomposition | Form | Orthogonal factor | Triangular factor | Primary utility |
|---|---|---|---|---|
| QR | A = QR | Left-multiplied | Upper, right | Column space basis |
| QL | A = QL | Left-multiplied | Lower, right | Bottom-up eigenvalue steps |
| RQ | A = RQ | Right-multiplied | Upper, left | Row-wise or echelon processing |
| LQ | A = LQ | Right-multiplied | Lower, left | Minimum-norm solutions |
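The sketch below (Python with NumPy and SciPy assumed; scipy.linalg.rq supplies the RQ form) illustrates how the variants relate: it computes the thin and full QR of a tall matrix and derives an LQ factorization from the QR of the transpose.

```python
import numpy as np
from scipy.linalg import rq

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))          # tall matrix, m > n

# Thin (economy) QR: Q_hat is 5x3 with orthonormal columns, R_hat is 3x3.
Q_hat, R_hat = np.linalg.qr(A, mode="reduced")

# Full QR: Q is 5x5 orthogonal, R is 5x3 with zero rows below row 3.
Q_full, R_full = np.linalg.qr(A, mode="complete")

# LQ of A obtained as the transpose of the QR factorization of A^T.
Q_t, R_t = np.linalg.qr(A.T, mode="reduced")
L, Q_lq = R_t.T, Q_t.T                    # A = L @ Q_lq with L lower trapezoidal

# RQ factorization directly from SciPy for comparison.
R_rq, Q_rq = rq(A)

print(np.allclose(Q_hat @ R_hat, A), np.allclose(Q_full @ R_full, A))
print(np.allclose(L @ Q_lq, A), np.allclose(R_rq @ Q_rq, A))
```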

Algorithms for Computation

Gram-Schmidt Orthogonalization

The classical Gram-Schmidt orthogonalization process provides a direct method to compute the QR decomposition of a matrix A with linearly independent columns. The algorithm begins by setting the columns of A as \mathbf{a}_1, \dots, \mathbf{a}_n, and proceeds to generate the columns of Q as \mathbf{q}_1, \dots, \mathbf{q}_n through successive projections and normalizations. Specifically, initialize \mathbf{q}_1 = \mathbf{a}_1 / \|\mathbf{a}_1\|_2, where \|\cdot\|_2 denotes the Euclidean norm. For each subsequent column k = 2, \dots, n, compute the projection coefficients r_{jk} = \mathbf{q}_j^T \mathbf{a}_k for j = 1, \dots, k-1, subtract the sum of these projections from \mathbf{a}_k to obtain \mathbf{u}_k = \mathbf{a}_k - \sum_{j=1}^{k-1} r_{jk} \mathbf{q}_j, and normalize \mathbf{q}_k = \mathbf{u}_k / \|\mathbf{u}_k\|_2 with the diagonal entry r_{kk} = \|\mathbf{u}_k\|_2. The upper triangular matrix R is then formed with off-diagonal entries r_{jk} for j < k and zeros below the diagonal. This process ensures that Q has orthonormal columns and A = QR, as each \mathbf{q}_k is orthogonal to the previous basis vectors by construction.

To illustrate, consider the 2 \times 2 matrix A = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}. First, \mathbf{a}_1 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}, so \|\mathbf{a}_1\|_2 = \sqrt{10} and \mathbf{q}_1 = \begin{pmatrix} 3/\sqrt{10} \\ 1/\sqrt{10} \end{pmatrix}, with r_{11} = \sqrt{10}. For \mathbf{a}_2 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, the projection coefficient is r_{12} = \mathbf{q}_1^T \mathbf{a}_2 = 5/\sqrt{10}. Then \mathbf{u}_2 = \mathbf{a}_2 - r_{12} \mathbf{q}_1 = \begin{pmatrix} -0.5 \\ 1.5 \end{pmatrix}, \|\mathbf{u}_2\|_2 = \sqrt{5/2}, and \mathbf{q}_2 = \begin{pmatrix} -0.5 / \sqrt{5/2} \\ 1.5 / \sqrt{5/2} \end{pmatrix} = \begin{pmatrix} -1/\sqrt{10} \\ 3/\sqrt{10} \end{pmatrix}, with r_{22} = \sqrt{5/2}. Thus, Q = \begin{pmatrix} 3/\sqrt{10} & -1/\sqrt{10} \\ 1/\sqrt{10} & 3/\sqrt{10} \end{pmatrix} and R = \begin{pmatrix} \sqrt{10} & 5/\sqrt{10} \\ 0 & \sqrt{5/2} \end{pmatrix}.

The classical Gram-Schmidt algorithm is simple and intuitive, as it explicitly builds an orthonormal basis from the original columns through projections, making it easy to implement and understand conceptually. It directly produces the Q factor with orthonormal columns, which is advantageous for applications requiring an explicit orthogonal basis. However, the algorithm suffers from numerical instability in finite-precision arithmetic, primarily due to cancellation in the subtraction steps of the projections; the loss of orthogonality in the computed Q grows roughly with the square of the condition number of A. This instability arises because errors in earlier \mathbf{q}_j propagate and amplify in subsequent orthogonalizations, potentially making the columns of Q only approximately orthogonal for ill-conditioned matrices. A variant known as modified Gram-Schmidt mitigates this by interleaving the projections and subtractions one vector at a time, and applying the process to the rows of A (equivalently, to the columns of A^T) yields an LQ decomposition A = LQ with L lower triangular. For improved stability, methods such as Householder reflections are preferred in practice.
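A minimal sketch of the modified Gram-Schmidt variant in Python (NumPy assumed), which removes each projection immediately rather than all at once; on the 2 \times 2 example above it reproduces the factors derived by hand.

```python
import numpy as np

def mgs_qr(A):
    """QR via modified Gram-Schmidt for a matrix with linearly independent columns."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = A.copy()
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])          # r_kk = ||u_k||
        Q[:, k] /= R[k, k]                          # normalize q_k
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ Q[:, j]             # project remaining columns onto q_k
            Q[:, j] -= R[k, j] * Q[:, k]            # and remove that component immediately
    return Q, R

A = np.array([[3.0, 1.0], [1.0, 2.0]])
Q, R = mgs_qr(A)
print(Q)                      # approx [[3, -1], [1, 3]] / sqrt(10)
print(R)                      # approx [[sqrt(10), 5/sqrt(10)], [0, sqrt(5/2)]]
print(np.allclose(Q @ R, A))  # True
```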

Householder Reflections

The Householder method for QR decomposition relies on a sequence of orthogonal Householder reflections to triangularize the matrix. For an m \times n matrix A with m \geq n, the algorithm applies n Householder matrices H_k (for k = 1, \dots, n) from the left such that H_n H_{n-1} \cdots H_1 A = R, where R is upper triangular and each H_k introduces zeros below the diagonal in the k-th column of the current matrix. Each H_k acts on rows k to m and is defined to reflect the subvector x = A(k:m, k) onto a multiple of the first standard basis vector e_1, thereby zeroing out the subdiagonal entries in column k. The product Q = H_1 H_2 \cdots H_n is orthogonal, yielding the decomposition A = QR. This approach was introduced by Alston S. Householder in his 1958 paper on unitary triangularization.

The core of each reflection is the Householder transformation, an orthogonal matrix of the form H = I - \frac{2 u u^T}{u^T u}, where u is chosen to map the target subvector appropriately. Specifically, for the subvector x \in \mathbb{R}^{m-k+1}, compute \sigma = \|x\|_2; set \beta = -\operatorname{sign}(x_1) \sigma to avoid cancellation; then u = x - \beta e_1. The transformation H_k = I - \frac{2 u u^T}{u^T u} satisfies H_k x = \beta e_1, ensuring the desired zeros while preserving the Euclidean norm. In practice, the reflector is applied to the trailing submatrix without forming H_k explicitly, using O(m n^2) floating-point operations overall for m \geq n, which reduces to O(n^3) for square matrices. This efficiency stems from the rank-one update structure, allowing vectorized implementations in libraries such as LAPACK.

To illustrate, consider the 3 \times 2 matrix A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}. For the first column, x = \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}, \|x\|_2 = \sqrt{35}, \beta = -\sqrt{35} (since \operatorname{sign}(1) > 0), and u = \begin{pmatrix} 1 + \sqrt{35} \\ 3 \\ 5 \end{pmatrix}. The matrix H_1 = I - \frac{2 u u^T}{u^T u} is applied to A, zeroing the entries below the (1,1) position and giving R_{11} = -\sqrt{35}, while the second column is updated accordingly. For the second column of the updated matrix, take the subvector from row 2 onward, y = \begin{pmatrix} y_2 \\ y_3 \end{pmatrix}, compute its Householder vector v = y - \gamma e_1 with \gamma = -\operatorname{sign}(y_2) \|y\|_2, and apply H_2 (a 3 \times 3 matrix with a 1 in the top-left corner and the 2 \times 2 reflector in the bottom-right block) to zero the entry below the (2,2) position, yielding the upper triangular R. The resulting Q can be formed explicitly if needed, though in applications it is often used implicitly via the stored reflectors. This example demonstrates how each step progressively triangularizes one column at a time.

The method is backward stable, meaning the computed \hat{R} satisfies \hat{Q} \hat{R} = A + \Delta A with \|\Delta A\|_2 \leq O(n)\,\epsilon \|A\|_2 under standard rounding-error assumptions, where \epsilon is the unit roundoff (machine epsilon); this stability arises because each reflection is orthogonal and thus preserves norms exactly in exact arithmetic, with rounding errors bounded componentwise. It also inherently maintains orthogonality in Q, avoiding the loss of orthogonality that can plague projection-based methods such as classical Gram-Schmidt. These properties make it the preferred choice for dense matrices in numerical software. However, storing the Householder vectors requires O(n^2) space for an n \times n matrix (each of the n vectors has length decreasing from n to 1), which is comparable to storing Q explicitly but more compact.
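The following Python sketch (NumPy assumed) implements the unblocked Householder procedure described above, applying each reflector to the trailing submatrix rather than forming H_k explicitly, and accumulating Q for verification.

```python
import numpy as np

def householder_qr(A):
    """Unblocked Householder QR: returns Q (m x m) and R (m x n) with A = Q @ R."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    R = A.copy()
    Q = np.eye(m)
    for k in range(min(n, m - 1)):
        x = R[k:, k]
        sigma = np.linalg.norm(x)
        if sigma == 0.0:
            continue                                   # column already zero below the diagonal
        beta = -np.sign(x[0]) * sigma if x[0] != 0 else -sigma
        u = x.copy()
        u[0] -= beta                                   # u = x - beta * e1
        u /= np.linalg.norm(u)
        # Apply H_k = I - 2 u u^T to the trailing submatrix and accumulate into Q.
        R[k:, k:] -= 2.0 * np.outer(u, u @ R[k:, k:])
        Q[:, k:]  -= 2.0 * np.outer(Q[:, k:] @ u, u)
    return Q, R

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))
print(R)   # upper triangular in its first two rows; R[0, 0] is approximately -sqrt(35)
```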
The sequential algorithm can also be less cache-friendly, since applying each reflector to the trailing submatrix is a memory-bound rank-one (level-2 BLAS) update, leading to potential performance bottlenecks on modern processors. To address this, blocked variants group multiple reflectors into a single block transformation, enabling level-3 BLAS operations for better data locality and parallelism on multicore systems; the WY representation, in which a product of k Householder matrices is expressed as Q_k = I + W Y^T with W and Y block matrices, facilitates this by converting sequences of rank-one updates into matrix multiplications. For distributed-memory settings, particularly tall-skinny matrices (m \gg n), the TSQR algorithm applies Householder QR in a tree-reduction fashion across processors, reducing the number of messages to O(\log p) for p processors while maintaining stability, and it allows reconstruction of the implicit Householder vectors after the factorization. These extensions enhance scalability without sacrificing the core method's numerical reliability.

Givens Rotations

Givens rotations provide an alternative approach to computing the QR decomposition by successively applying a series of plane rotations to the original matrix A, transforming it into upper triangular form R, such that Q^T A = R where Q is the product of the transposed rotations. This method, introduced by Wallace Givens, relies on orthogonal transformations that zero out one subdiagonal entry at a time, typically proceeding column by column from left to right and row by row from bottom to top within each column.

A Givens rotation matrix G_{i,j}(\theta) is an orthogonal matrix that differs from the identity matrix only in the i-th and j-th rows and columns, forming a 2 \times 2 rotation block:
G_{i,j}(\theta) = \begin{pmatrix} I_{i-1} & & & & \\ & c & & s & \\ & & I_{j-i-1} & & \\ & -s & & c & \\ & & & & I_{m-j} \end{pmatrix},
where c = \cos \theta and s = \sin \theta. To zero the entry a_{k,l} (with k > l) in column l using a left multiplication by G_{l,k}(\theta), the parameters are chosen as r = \sqrt{a_{l,l}^2 + a_{k,l}^2}, c = a_{l,l}/r, and s = a_{k,l}/r (with appropriate sign conventions to avoid numerical cancellation). Each such rotation affects only the two specified rows, updating the relevant entries in the current matrix. The full algorithm applies approximately n(n-1)/2 such rotations for a dense n \times n matrix, accumulating the transformations to form Q. For implementation, the rotations are often stored explicitly or as parameters to reconstruct Q later, and the process yields R in place of A. Unlike bulk transformations, this incremental zeroing allows targeted control over sparsity patterns.

Consider the 3 \times 3 matrix A = \begin{pmatrix} 1 & 2 & 0 \\ 3 & 4 & 5 \\ 0 & 6 & 7 \end{pmatrix}. First, apply a Givens rotation in planes 1 and 2 to zero the (2,1) entry: r = \sqrt{1^2 + 3^2} = \sqrt{10}, c = 1/\sqrt{10}, s = 3/\sqrt{10}. The updated matrix becomes \begin{pmatrix} \sqrt{10} & \frac{7\sqrt{10}}{5} & \frac{3\sqrt{10}}{2} \\ 0 & -\frac{\sqrt{10}}{5} & \frac{\sqrt{10}}{2} \\ 0 & 6 & 7 \end{pmatrix}. Next, apply a rotation in planes 2 and 3 to zero the new (3,2) entry, using the current values in column 2 below the diagonal. This process yields an upper triangular R. For banded matrices like this, adjacent-plane rotations (i.e., |i - j| = 1) maintain the bandwidth throughout.

This method excels for banded or sparse matrices, as rotations between adjacent indices introduce minimal fill-in, preserving structure and reducing storage and computation compared to dense methods. Additionally, independent rotations (e.g., in non-overlapping row pairs) enable straightforward parallelization across processors or threads. However, for fully dense matrices, the need for O(n^2) rotations, each applied in O(n) time, results in an overall O(n^3) complexity with a larger constant factor than Householder reflections, making it less efficient in such cases.
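A compact Python sketch (NumPy assumed) of Givens-based QR, zeroing one subdiagonal entry at a time and accumulating the rotations into Q; it reproduces the 3 \times 3 example above.

```python
import numpy as np

def givens_qr(A):
    """QR via Givens rotations: returns Q, R with A = Q @ R."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    Q = np.eye(m)
    for l in range(n):                      # for each column
        for k in range(m - 1, l, -1):       # zero entries from the bottom up
            a, b = R[l, l], R[k, l]
            if b == 0.0:
                continue
            r = np.hypot(a, b)               # hypot avoids overflow/underflow
            c, s = a / r, b / r
            # Apply the rotation to rows l and k of R, and accumulate into Q.
            G = np.array([[c, s], [-s, c]])
            R[[l, k], :] = G @ R[[l, k], :]
            Q[:, [l, k]] = Q[:, [l, k]] @ G.T
    return Q, R

A = np.array([[1.0, 2.0, 0.0], [3.0, 4.0, 5.0], [0.0, 6.0, 7.0]])
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))
print(np.round(R, 3))   # upper triangular
```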

Numerical Properties and Stability

Relation to Determinants and Eigenvalues

For a square matrix A with QR decomposition A = QR, where Q is an orthogonal matrix and R is upper triangular, the determinant satisfies \det(A) = \det(Q) \det(R). Since Q is orthogonal, \det(Q) = \pm 1, and \det(R) is the product of its diagonal entries, so \det(A) = \pm \prod_{i=1}^n r_{ii}. This relation shows how the QR factorization encodes the determinant of A as a signed product of the diagonal elements of R. Furthermore, the absolute value of the determinant is preserved under orthogonal transformations, yielding |\det(A)| = \prod_{i=1}^n |r_{ii}|. This equation links the volume-scaling factor of A to the magnitudes of the diagonal entries of R, as Q preserves volumes while R provides the triangular scaling. The eigenvalues of A connect to the QR decomposition through the determinant: the product of the eigenvalues of A equals \det(A), so up to the sign from \det(Q), it matches \prod_{i=1}^n r_{ii}, and in absolute terms, \prod_{i=1}^n |\lambda_i| = \prod_{i=1}^n |r_{ii}|. This property provides theoretical insight into eigenvalue magnitudes via the QR factors. The QR iteration, originating in the work of John G. F. Francis in 1961–1962, leverages repeated QR decompositions to converge toward a triangular form whose diagonal entries approximate the eigenvalues of the original matrix. In this process, the product of the evolving R diagonals remains tied to the eigenvalue product (up to sign), connecting the algorithm to the Schur decomposition.
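As a small numerical check (Python with NumPy assumed), the absolute value of the determinant and the product of eigenvalue magnitudes can both be recovered from the diagonal of R:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))

Q, R = np.linalg.qr(A)

abs_det_from_qr = np.prod(np.abs(np.diag(R)))                 # |det(A)| = prod |r_ii|
print(np.isclose(abs_det_from_qr, abs(np.linalg.det(A))))     # True

# Eigenvalue magnitudes: prod |lambda_i| equals the same quantity.
eigvals = np.linalg.eigvals(A)
print(np.isclose(np.prod(np.abs(eigvals)), abs_det_from_qr))  # True
```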

Column Pivoting for Stability

Column pivoting enhances the numerical stability of the QR decomposition by strategically reordering the columns of the matrix before applying orthogonal transformations. In the column-pivoted QR decomposition, the factorization takes the form A \Pi = QR, where A is an m \times n matrix with m \geq n, \Pi is an n \times n permutation matrix, Q is an m \times n matrix with orthonormal columns, and R is an n \times n upper triangular matrix. At each step of the algorithm, the permutation selects the column of the current trailing submatrix with the largest Euclidean norm as the pivot column, which mitigates the propagation of rounding errors and prevents premature loss of orthogonality in Q. This approach was first proposed by Businger and Golub in 1965 as a means to improve the reliability of least squares computations on ill-conditioned systems.

The pivoting strategy is seamlessly integrated into standard QR algorithms, such as those based on Householder reflections or Givens rotations. Before the orthogonal transformation is applied to zero out the subdiagonal elements in the pivot column, the columns of the submatrix are permuted to place the column with the largest norm in the leading position. This process repeats for each column, ensuring that the diagonal elements of R are as large as possible relative to the off-diagonal elements in the trailing submatrices. Analysis shows that column pivoting bounds the growth of the elements of R, provides a computed factorization that is backward stable with respect to small perturbations in A, and effectively controls the conditioning of the leading principal submatrices of R. In the rank-revealing QR (RRQR) variant, stronger guarantees are achieved by selecting pivots so that the smallest singular value of the leading k \times k subfactor R_{11} remains within a moderate factor f(k, n) of \sigma_k, and the largest singular value of the trailing block stays within a similar factor of \sigma_{k+1}, where \sigma_i denotes the i-th singular value of A.

A primary benefit of column pivoting is its capacity to reveal the numerical rank of A, which is crucial for handling rank-deficient matrices. After factorization, the numerical rank r is estimated by inspecting the diagonal elements of R: specifically, r is the largest index such that |r_{ii}| \geq \tau \|A\|_F for i = 1, \dots, r, where \tau is a user-defined tolerance, often set to a small multiple of machine epsilon. Small |r_{ii}| for i > r indicate linear dependence among the columns, allowing reliable detection of rank deficiency without computing the full SVD. This feature is particularly valuable in rank-deficient problems, where unpivoted QR may fail to isolate the effective rank due to error accumulation.

The column-pivoted QR decomposition thus provides a rank-revealing view of the matrix structure for low-rank approximation. The factorization can be written A = Q R \Pi^T, where Q has orthonormal columns spanning an approximation of the column space and R is upper triangular. For numerical rank r, the leading r \times r submatrix of R is well-conditioned, and the trailing part of R (with small diagonals) captures dependencies corresponding to the null space dimension. The first r columns of Q span the approximate column space, while an orthonormal basis for the null space can be obtained by solving the underdetermined system involving the trailing part of R and the corresponding permuted standard basis vectors. This enables reliable rank determination and stable computations for underdetermined systems.
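In practice, a pivoted factorization is available in SciPy; the sketch below (Python with NumPy and SciPy assumed, and a heuristic threshold rather than the exact rule above) builds an exactly rank-5 matrix and estimates its numerical rank from the diagonal of R.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(3)
# Build a 50 x 8 matrix of rank 5 (in exact arithmetic); rounding makes the
# trailing diagonal entries of R tiny rather than exactly zero.
B = rng.standard_normal((50, 5))
C = rng.standard_normal((5, 8))
A = B @ C

Q, R, piv = qr(A, mode="economic", pivoting=True)    # A[:, piv] = Q @ R

# A common heuristic tolerance: machine epsilon scaled by dimension and |R[0, 0]|.
tol = np.finfo(float).eps * max(A.shape) * abs(R[0, 0])
numerical_rank = int(np.sum(np.abs(np.diag(R)) > tol))

print(numerical_rank)                  # 5
print(np.allclose(A[:, piv], Q @ R))   # True
```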
While permutations introduce sign changes in the determinant (an even number of transpositions preserves the sign, odd reverses it), this effect is accounted for in applications requiring determinant evaluation. The decomposition is essential for robust numerical algorithms, as it ensures bounded error amplification in ill-conditioned or rank-deficient scenarios.

Applications

Solving Linear Systems

QR decomposition provides an effective method for solving linear systems Ax = b, where A is an m \times n matrix. For square invertible matrices (m = n), the factorization A = QR with Q orthogonal and R upper triangular transforms the system into Rx = Q^T b. The vector Q^T b is computed by exploiting the orthogonality of Q, for example by applying the sequence of Householder reflections, and x is then obtained by back-substitution on the triangular system Rx = c, where c = Q^T b. This approach requires O(n^2) operations for the solve phase once the QR factorization has been computed.

In practice, explicit storage of Q is avoided to reduce memory usage; instead, the Householder vectors from the QR computation are stored, allowing Q^T b to be evaluated in O(mn) operations by applying a sequence of reflections. The solution can thus be expressed as x = R^{-1} (Q^T b), with back-substitution on R costing O(n^2) flops. This storage-efficient form is used in implementations such as LAPACK.

For overdetermined systems (m > n), assuming A has full column rank, the thin QR decomposition A = QR (with Q an m \times n matrix having orthonormal columns and R an n \times n upper triangular matrix) yields the solution minimizing \|Ax - b\|_2 as x = R^{-1} (Q^T b). Here, Q^T b projects b onto the column space of A, followed by back-substitution, with the same computational costs as the square case for the solve.

The numerical stability of this method stems from the backward stability of QR factorizations computed via Householder or Givens transformations, ensuring the computed \hat{x} satisfies (A + \Delta A) \hat{x} = b + \Delta b with \|\Delta A\|_2 / \|A\|_2 = O(\epsilon) and \|\Delta b\|_2 / \|b\|_2 = O(\epsilon), where \epsilon is machine epsilon. The forward error \|x - \hat{x}\|_2 / \|x\|_2 is then bounded by approximately \kappa_2(A) times the backward error, where \kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 is the 2-norm condition number; thus, well-conditioned systems yield accurate solutions. Column pivoting in QR can further enhance robustness for ill-conditioned A.
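A brief sketch of the solve phase (Python with NumPy and SciPy assumed): factor once, apply Q^T to the right-hand side, then back-substitute on R.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

Q, R = qr(A)                       # A = Q R, Q orthogonal, R upper triangular
c = Q.T @ b                        # apply Q^T to the right-hand side
x = solve_triangular(R, c)         # back-substitution on the triangular system R x = c

print(np.allclose(A @ x, b))       # True
```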

Least Squares Problems

The linear least squares problem seeks the vector x \in \mathbb{R}^n that minimizes the Euclidean norm of the residual \|Ax - b\|_2, where A is an m \times n matrix with m > n and full column rank, and b \in \mathbb{R}^m. Given the thin QR decomposition A = QR, where Q is an m \times n matrix with orthonormal columns and R is an n \times n upper triangular matrix, the least squares problem reduces to minimizing \|Rx - Q^T b\|_2. Since the columns of Q are orthonormal, this minimization is equivalent to the original problem, as \|Ax - b\|_2 = \|Q(Rx - Q^T b) - (I - QQ^T)b\|_2 = \sqrt{ \|Rx - Q^T b\|_2^2 + \|(I - QQ^T)b\|_2^2 }, where the two terms are orthogonal and the second term is the fixed component of b orthogonal to the column space of A. The solution x is obtained by solving the upper triangular system Rx = Q^T b via back-substitution, and the minimum residual norm is \sqrt{ \|b\|_2^2 - \|Q^T b\|_2^2 }. This QR-based approach is numerically more stable than forming and solving the normal equations A^T A x = A^T b, because the condition number of A^T A is \kappa(A)^2, which squares the sensitivity to perturbations in A, whereas the accuracy of the QR method is governed primarily by \kappa(A).

In the rank-deficient case, where A does not have full column rank, a column-pivoted QR decomposition A \Pi = QR (with permutation matrix \Pi) reveals the numerical rank r by identifying the number of significantly large diagonal entries in R before a sharp drop-off. A basic least squares solution is then computed by solving the reduced upper triangular system R_{11} x_{\mathcal{B}} = (Q^T b)_{1:r} for the basic variables corresponding to the first r pivoted columns, setting the remaining variables to zero, where R_{11} is the leading r \times r submatrix of R; obtaining the minimum-norm solution generally requires a further step, such as a complete orthogonal decomposition.

For illustration, consider A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}, b = \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix}. The thin QR decomposition is Q = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} \\ 0 & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \end{pmatrix}, R = \begin{pmatrix} \sqrt{2} & \frac{1}{\sqrt{2}} \\ 0 & \sqrt{\frac{3}{2}} \end{pmatrix} (up to signs), and solving Rx = Q^T b yields x = \begin{pmatrix} 4/3 \\ 4/3 \end{pmatrix}. The residual is Ax - b = \begin{pmatrix} 1/3 \\ 1/3 \\ -1/3 \end{pmatrix}, with norm 1/\sqrt{3} \approx 0.577.
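The worked example above can be reproduced directly (Python with NumPy and SciPy assumed):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])

Q, R = qr(A, mode="economic")          # thin QR: Q is 3x2, R is 2x2
x = solve_triangular(R, Q.T @ b)       # solve R x = Q^T b by back-substitution

print(x)                                # [1.3333..., 1.3333...]  (i.e. 4/3, 4/3)
print(np.linalg.norm(A @ x - b))        # 0.5773...  (1/sqrt(3))
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches lstsq
```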

Other Uses in Numerical Linear Algebra

The QR algorithm serves as a cornerstone for computing the eigenvalues and Schur decomposition of nonsymmetric matrices in numerical linear algebra. Introduced by Francis, the method begins with a matrix A_0 = A and iteratively computes the QR factorization A_k = Q_k R_k, followed by the update A_{k+1} = R_k Q_k, which preserves the eigenvalues of A. Without modifications, the basic iteration converges, often slowly, toward an upper triangular form whose diagonal reveals the eigenvalues; incorporating shifts, such as Wilkinson shifts, accelerates convergence to the real Schur form, in which the matrix is quasi-triangular with 1×1 or 2×2 blocks on the diagonal corresponding to real eigenvalues or complex conjugate eigenvalue pairs. This shifted variant, also due to Francis, achieves quadratic convergence near the eigenvalues and is the basis for implementations in libraries like LAPACK.

To enhance computational efficiency before applying QR iterations, matrices are typically reduced to upper Hessenberg form, in which all entries below the first subdiagonal are zero, using a sequence of Householder reflections. This reduction, which requires approximately \frac{10}{3}n^3 floating-point operations for an n \times n matrix, preserves eigenvalues and minimizes fill-in during subsequent iterations, as each QR step on a Hessenberg matrix costs only O(n^2) operations. The process applies n-2 transformations to eliminate subdiagonal elements column by column, accumulating the reflectors into an orthogonal matrix Q_H such that A = Q_H H Q_H^T, where H is upper Hessenberg; for symmetric matrices, this yields tridiagonal form. This preprocessing step is integral to the practical QR algorithm and was refined in early implementations to ensure numerical stability.

In the computation of the singular value decomposition (SVD), QR techniques play a key role through the Golub-Kahan bidiagonalization process, which reduces the matrix to a form amenable to efficient iteration. The algorithm applies alternating Householder reflections from the left and right to reduce a general m \times n matrix A to upper bidiagonal form B = P^T A Q, where P and Q are orthogonal, using O(mn^2) operations; this step leverages QR factorization principles to introduce structure while preserving singular values. Subsequent QR-like iterations, often with shifts, are then applied to the bidiagonal matrix to deflate and compute the singular values on its diagonal, converging rapidly due to the reduced bandwidth. This approach, foundational to SVD algorithms in numerical software, enables accurate computation even for ill-conditioned matrices by isolating small singular values early.

QR decompositions can also be updated efficiently for dynamic matrices in applications requiring incremental modifications, such as adaptive simulations or recursive least-squares estimation. For rank-one updates, where a column or row is added or modified, stable algorithms reorthogonalize the affected columns using Gram-Schmidt processes with selective reorthogonalization to maintain orthogonality without full recomputation, achieving O(n^2) cost per update. Downdating, i.e., removing a column or row, is handled similarly by solving for the inverse transformation, ensuring backward stability in finite-precision arithmetic. These methods, exemplified in the Daniel-Gragg-Kaufman-Stewart framework, are crucial for maintaining QR factors in evolving least squares problems without excessive overhead.
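The basic iteration is easy to demonstrate on a small symmetric matrix (Python with NumPy assumed); this sketch is unshifted for simplicity, whereas production implementations add Hessenberg reduction, shifts, and deflation.

```python
import numpy as np

def qr_iteration(A, iters=200):
    """Basic unshifted QR iteration: A_{k+1} = R_k Q_k preserves the eigenvalues of A."""
    Ak = np.array(A, dtype=float)
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q                      # similarity transform: A_{k+1} = Q^T A_k Q
    return Ak

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # symmetric, so the eigenvalues are real
Ak = qr_iteration(A)

print(np.round(np.sort(np.diag(Ak)), 6))            # approximate eigenvalues from the diagonal
print(np.round(np.sort(np.linalg.eigvalsh(A)), 6))  # reference values for comparison
```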

Generalizations and Extensions

Complex and Sparse Matrices

QR decomposition extends naturally to matrices with complex entries, where the factorization A = QR has Q a unitary matrix satisfying Q^H Q = I (with ^H denoting the Hermitian transpose) and R upper triangular. The standard algorithms, such as those based on Householder reflections or Givens rotations, adapt to the complex case by replacing transposes with Hermitian transposes in the relevant computations, ensuring the unitarity property while retaining numerical behavior similar to the real case. For example, in the Householder approach, the reflection vector is chosen to zero out subdiagonal elements, with the reflector defined using complex conjugation to preserve the unitary structure.

For sparse matrices, QR decomposition must address the challenge of preserving the nonzero pattern to avoid excessive computational cost and storage overhead from fill-in during factorization. Algorithms apply Householder reflections or Givens rotations selectively, only to the nonzeros that need elimination, rather than to the entire matrix, thereby minimizing disruptions to sparsity. Multifrontal methods assemble the factorization through frontal matrices corresponding to elimination trees, enabling efficient exploitation of sparsity via block operations on dense submatrices while controlling fill-in through ordering techniques. Supernodal methods further optimize this by grouping columns with identical nonzero patterns into supernodes, reducing overhead in sparse BLAS operations.

Software implementations support these extensions effectively. The LAPACK routine ZGEQRF computes the QR factorization of dense complex matrices, storing the essential information in a compact form for subsequent reconstruction of Q. For sparse cases, SuiteSparseQR provides a multifrontal, multithreaded implementation that uses Householder reflections within frontal matrices to achieve high performance while supporting rank detection. A key challenge in sparse QR is fill-in minimization, since additional nonzeros arise during elimination; this is mitigated by preordering rows and columns using heuristics such as minimum degree or minimum local fill to predict and reduce the growth of the sparsity pattern of R. Such orderings keep the factorization efficient for large-scale problems, though the choice depends on the matrix structure and the desired trade-off between fill-in and ordering cost.
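A short check of the complex case (Python with NumPy assumed), verifying unitarity in the Hermitian inner product:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))   # complex 4x3 matrix

Q, R = np.linalg.qr(A)           # thin QR; Q has orthonormal columns under the Hermitian inner product

print(np.allclose(Q.conj().T @ Q, np.eye(3)))   # Q^H Q = I
print(np.allclose(Q @ R, A))                     # A = Q R
print(np.allclose(np.tril(R, -1), 0))            # R is upper triangular
```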

Block and Parallel Variants

To enhance performance on modern computer architectures with deep memory hierarchies, the classical QR factorization has been extended to a blocked variant that partitions the matrix into panels of columns, allowing multiple Householder transformations to be applied as a single block reflector via BLAS level-3 operations such as matrix-matrix multiplication (GEMM). This blocking strategy reduces memory accesses by promoting data reuse in cache, achieving up to several times higher flop rates compared to unblocked level-2 BLAS implementations. The LAPACK library implements this in the generic routine xGEQRF (e.g., DGEQRF for double precision), which computes the QR factorization while storing the Householder vectors and scalars compactly for subsequent explicit formation of Q if needed.

For large-scale matrices where the number of rows greatly exceeds the number of columns (m ≫ n), the tall-skinny QR (TSQR) algorithm provides a communication-optimal variant that minimizes data movement in distributed environments. TSQR recursively applies QR factorizations to row-partitioned blocks of the matrix, forming local R factors that are then combined via a tree reduction to yield the global R, with the overall Q reconstructed from the tree of transformations. This approach greatly reduces the number of messages and the communication volume compared with standard parallel Householder QR, making it suitable for systems with limited bandwidth or high latency, while maintaining the same numerical stability as classical QR. The algorithm was introduced by Demmel et al. in their work on communication-avoiding linear algebra.

In distributed-memory systems using MPI, the ScaLAPACK library extends blocked QR to parallel settings through the routine PDGEQRF, which employs a block-cyclic distribution of the matrix across a processor grid to balance load and minimize communication. The algorithm proceeds by performing local QR panel factorizations on process columns, broadcasting the panel factors, and updating the trailing submatrix via parallel matrix multiplications and reductions, scaling efficiently to thousands of processors for very large matrices. This implementation builds on the blocked approach but incorporates collective operations such as broadcasts and all-reduces to handle the data distribution, as detailed in the foundational ScaLAPACK design by Dongarra and colleagues.

For GPU acceleration, NVIDIA's cuSOLVER library provides high-performance implementations of blocked QR via the cusolverDngeqrf routine, which mirrors LAPACK's interface and leverages GPU kernels for panel factorizations and trailing updates, achieving high performance for dense matrices with dimensions in the thousands. Additionally, cuBLAS supports batched QR through cublas<t>geqrfBatched (e.g., cublasDgeqrfBatched) for simultaneously factorizing many small square or rectangular matrices, enabling efficient processing of independent problems that arise in batched workloads and simulations, with significant speedups, often 10x or more, over CPU baselines depending on problem size and GPU model. These GPU variants preserve the numerical properties of the blocked algorithm while exploiting massive thread parallelism in the inner loops. As of 2025, recent releases such as cuSOLVER 13.0 support newer GPU architectures such as Blackwell.
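For illustration, the compact LAPACK-style storage produced by xGEQRF can be exercised from Python through SciPy's low-level wrappers; this is a sketch under the assumption that the dgeqrf and dorgqr wrappers are available in scipy.linalg.lapack, and it is not specific to any blocked or GPU implementation.

```python
import numpy as np
from scipy.linalg import lapack

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 3))

# xGEQRF stores R in the upper triangle and the Householder vectors below it,
# together with the scalar factors tau.
qr_compact, tau, work, info = lapack.dgeqrf(A)
assert info == 0

R = np.triu(qr_compact[:3, :3])            # the 3x3 upper triangular factor

# xORGQR expands the compact representation into an explicit Q with orthonormal columns.
Q, work, info = lapack.dorgqr(qr_compact, tau)
assert info == 0

print(np.allclose(Q @ R, A))               # True: A = Q R
print(np.allclose(Q.T @ Q, np.eye(3)))     # True: orthonormal columns
```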