Schur decomposition
The Schur decomposition, also known as Schur triangulation, is a matrix decomposition in linear algebra; the underlying theorem states that every square complex matrix is unitarily similar to an upper triangular matrix.[1] Specifically, for any n \times n complex matrix A, there exist a unitary matrix Q (satisfying Q^* Q = I, where Q^* denotes the conjugate transpose) and an upper triangular matrix T such that A = Q T Q^*, with the diagonal entries of T being the eigenvalues of A.[2] This decomposition, named after the German mathematician Issai Schur (1875–1941), guarantees the existence of such a triangular form for any square matrix over the complex numbers, providing a canonical way to reveal the eigenvalues without requiring diagonalizability.[3]
Key properties of the Schur decomposition include its preservation of eigenvalues and algebraic multiplicities through unitary similarity, which maintains the spectrum of the original matrix.[2] The decomposition is not unique: the eigenvalues on the diagonal of T can be arranged in any order, and for eigenvalues with multiplicity greater than one, the corresponding upper-triangular block can be modified by right-multiplication with a unitary matrix while preserving triangularity.[2] For real matrices, a real Schur decomposition exists, where A = Q T Q^T with Q orthogonal and T upper quasi-triangular, featuring 1×1 blocks for real eigenvalues and 2×2 blocks for complex conjugate eigenvalue pairs.[3] When A is normal (satisfying A^* A = A A^*), the Schur form reduces to the spectral decomposition, with T diagonal.[3]
The Schur decomposition plays a central role in numerical linear algebra, enabling efficient eigenvalue computations via algorithms like the QR iteration, which typically requires O(n^3) arithmetic operations.[1] It facilitates the evaluation of matrix functions, solving Sylvester equations through methods like the Bartels–Stewart algorithm, and assessing matrix normality by measuring the norm of the strictly upper triangular part of T.[3] As a generalization of diagonalization that applies to all matrices, it underpins stability analysis in dynamical systems and serves as a building block for more advanced decompositions in applied mathematics and engineering.[2]
Background and Statement
Historical Context
The Schur decomposition theorem originated in the work of Issai Schur, a German mathematician whose contributions spanned representation theory, group theory, and linear algebra. In 1909, Schur published the seminal result in his paper "Über die charakteristischen Wurzeln einer linearen Substitution mit einer Anwendung auf die Theorie der Integralgleichungen," appearing in Mathematische Annalen. There, he established that any square matrix over the complex numbers admits a unitary similarity transformation to an upper triangular form, with the diagonal elements corresponding to the matrix's eigenvalues. This decomposition provided a fundamental tool for analyzing linear transformations while preserving the Euclidean norm through the unitary factor.
Schur's theorem built directly on foundational advances by his doctoral advisor, Ferdinand Georg Frobenius, who had explored matrix canonical forms in the late 19th century. Frobenius's 1878 paper "Über lineare Substitutionen und bilineare Formen" demonstrated that commuting families of matrices could be simultaneously triangularized via similarity transformations, laying groundwork for understanding joint spectral properties. Earlier influences included Arthur Cayley's 1858 Cayley-Hamilton theorem, which linked matrix powers to their characteristic polynomial and informed Schur's approach to eigenvalue placement on the diagonal. These developments occurred within the broader context of invariant theory, where figures like David Hilbert and Frobenius investigated algebraic invariants under linear group actions during the 1890s.
The theorem's emergence aligned with rapid progress in representation theory around 1900–1920, driven by efforts to classify finite group actions on vector spaces. Schur, motivated by problems in symmetric functions and determinant theory—areas central to his earlier work on group characters—sought a canonical form that facilitated computations of traces and other invariants. His 1909 result not only resolved key questions in linear algebra but also supported applications to integral equations, reflecting the era's growing interplay between finite-dimensional matrix theory and infinite-dimensional operator methods. This period marked a shift toward unitary transformations, enhancing numerical and theoretical stability over general similarities.
Theorem Statement
The Schur decomposition theorem concerns square matrices over the complex numbers. Consider an n \times n complex matrix A. A unitary matrix U satisfies U^* U = I_n, where U^* denotes the Hermitian transpose (conjugate transpose) of U, and I_n is the n \times n identity matrix. An upper triangular matrix T has all entries below the main diagonal equal to zero, and the diagonal entries of T are the eigenvalues of A.
The theorem asserts that for every such matrix A, there exists a unitary matrix U and an upper triangular matrix T such that
A = U T U^*,
where the diagonal entries of T are the eigenvalues of A (counted with algebraic multiplicity).[4]
To illustrate the form, consider a general 2 \times 2 complex matrix A. Its Schur decomposition is
A = U \begin{pmatrix} \lambda_1 & t_{12} \\ 0 & \lambda_2 \end{pmatrix} U^*,
where \lambda_1 and \lambda_2 are the eigenvalues of A, and t_{12} is some complex scalar.[4]
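The theorem can also be checked numerically. The following sketch uses NumPy and SciPy's scipy.linalg.schur (an implementation choice assumed here, not part of the theorem itself) to compute a complex Schur form of a small matrix and verify the defining properties.

```python
import numpy as np
from scipy.linalg import schur

# An arbitrary 2x2 complex matrix.
A = np.array([[1.0 + 2.0j, 3.0],
              [4.0j, -1.0]])

# Complex Schur decomposition: A = U T U*, with T upper triangular.
T, U = schur(A, output="complex")

assert np.allclose(U.conj().T @ U, np.eye(2))   # U is unitary
assert np.allclose(U @ T @ U.conj().T, A)       # A = U T U*
# The diagonal of T carries the eigenvalues of A.
assert np.allclose(np.sort_complex(np.diag(T)),
                   np.sort_complex(np.linalg.eigvals(A)))
```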
Proof and Properties
Proof Outline
The proof of the Schur decomposition theorem proceeds by mathematical induction on the dimension n of the square complex matrix A \in \mathbb{C}^{n \times n}.[5]
For the base case n=1, any 1 \times 1 matrix is already upper triangular, as it consists of a single scalar entry, which is trivially the eigenvalue of A.[5]
Assume the statement holds for all matrices of dimension n-1, where n \geq 2. By the fundamental theorem of algebra, the characteristic polynomial \det(A - \lambda I) of degree n has at least one root \lambda \in \mathbb{C}, so A possesses an eigenvalue \lambda with a corresponding nonzero eigenvector v. Normalizing v yields a unit eigenvector with \|v\|_2 = 1.[5]
To construct the unitary matrix, extend the set \{v\} to an orthonormal basis \{v, u_2, \dots, u_n\} for \mathbb{C}^n using the Gram-Schmidt process applied to v and any basis for its orthogonal complement. Let Q \in \mathbb{C}^{n \times n} be the unitary matrix whose columns are these basis vectors, so the first column is v.[5]
Consider the unitary similarity transformation T = Q^* A Q. The first column of T is Q^* A v = Q^* (\lambda v) = \lambda e_1, where e_1 is the first standard basis vector. The submatrix in rows 2 through n and column 1 is Q_1^* A v = \lambda Q_1^* v = 0, where Q_1 denotes the last n-1 columns of Q. Thus, T takes the block upper triangular form
T = \begin{pmatrix} \lambda & w^* \\ 0 & A' \end{pmatrix},
where w^* \in \mathbb{C}^{1 \times (n-1)} and A' = Q_1^* A Q_1 \in \mathbb{C}^{(n-1) \times (n-1)}.[5]
By the induction hypothesis, there exists a unitary matrix U' \in \mathbb{C}^{(n-1) \times (n-1)} such that U'^* A' U' = T', where T' is upper triangular. Define the block diagonal unitary matrix U'' = \operatorname{diag}(1, U'). Then the full transformation matrix U = Q U'' is unitary, and
U^* A U = \begin{pmatrix} \lambda & (w^* U') \\ 0 & T' \end{pmatrix},
which is upper triangular with \lambda and the diagonal entries of T' on its main diagonal. A key lemma ensures that the eigenvalues of A' are precisely the eigenvalues of A excluding \lambda, as unitary similarity transformations preserve eigenvalues (including multiplicities).[5]
This completes the induction, establishing the existence of a unitary U and upper triangular T such that A = U T U^*, with the diagonal of T comprising the eigenvalues of A.[5]
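The induction is constructive and can be transcribed directly into code. The sketch below is a deliberately naive NumPy illustration (the helper name schur_by_induction is ours, and an eigen-solver is assumed available at each step); it mirrors the proof rather than the production QR-based algorithms described later.

```python
import numpy as np

def schur_by_induction(A: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (U, T) with A = U T U*, following the inductive construction in the proof.

    Naive illustration only: it calls an eigen-solver at every step and is not how
    production QR-based routines work.
    """
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex), A.astype(complex)

    # One eigenpair (lambda, v) with ||v|| = 1.
    eigvals, eigvecs = np.linalg.eig(A)
    lam, v = eigvals[0], eigvecs[:, 0]
    v = v / np.linalg.norm(v)

    # Extend {v} to an orthonormal basis of C^n: QR of [v | I] keeps v's direction as column 1.
    Q, _ = np.linalg.qr(np.column_stack([v, np.eye(n, dtype=complex)]))

    # Unitary similarity; the block below the (1,1) entry vanishes by construction.
    B = Q.conj().T @ A @ Q
    A1 = B[1:, 1:]

    # Recurse on the (n-1) x (n-1) trailing block and assemble U = Q * diag(1, U').
    U1, T1 = schur_by_induction(A1)
    U2 = np.block([[np.ones((1, 1), dtype=complex), np.zeros((1, n - 1))],
                   [np.zeros((n - 1, 1)), U1]])
    U = Q @ U2
    T = U.conj().T @ A @ U
    return U, np.triu(T)   # discard round-off below the diagonal

A = np.random.default_rng(0).standard_normal((4, 4)) + 0j
U, T = schur_by_induction(A)
assert np.allclose(U @ T @ U.conj().T, A)
```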
Uniqueness and Properties
The Schur decomposition of a square matrix A \in \mathbb{C}^{n \times n} is not unique, as there are multiple choices for the unitary matrix U and the upper triangular matrix T that satisfy A = U T U^H. The columns of U, known as Schur vectors, are not uniquely determined except for the first column corresponding to a specific eigenvalue; subsequent vectors depend on the selection process, leading to different possible T matrices even for the same eigenvalue ordering.[3][6]
However, the decomposition is unique up to permutation of the eigenvalues on the diagonal of T. The diagonal entries of T are precisely the eigenvalues of A, which can appear in any order, but their multiset is invariant. The strictly upper triangular elements of T, which capture interactions between eigenvalues, are not unique and can vary depending on the choice of Schur vectors.[7][6]
Key properties of the Schur decomposition include the preservation of the spectrum of A, as the eigenvalues lie on the diagonal of T. If A is normal (satisfying A^H A = A A^H), then T is diagonal, reducing the decomposition to a unitary diagonalization where the Schur vectors are eigenvectors. In contrast to the Jordan canonical form, which provides a block-diagonal structure revealing the full algebraic and geometric multiplicities via Jordan blocks, the Schur form achieves a weaker upper triangularization that does not necessarily expose the minimal polynomial or invariant subspaces as explicitly, though it maintains numerical advantages due to the unitary transformation.[3][7][6]
For practical purposes, eigenvalues are often ordered on the diagonal of T by decreasing magnitude or by argument to enhance numerical stability in subsequent computations, such as deflation or reordering.[7]
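The normal case is easy to observe numerically. In the sketch below (assuming SciPy's scipy.linalg.schur), a Hermitian, hence normal, matrix yields a Schur factor T whose strictly upper triangular part is at round-off level, so the decomposition coincides with the spectral decomposition.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)

# A Hermitian (hence normal) matrix: its Schur form collapses to the spectral decomposition.
H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = H + H.conj().T
T, U = schur(H, output="complex")

# The strictly upper triangular part of T is at round-off level, so T is numerically
# diagonal and the columns of U are orthonormal eigenvectors of H.
print(np.linalg.norm(np.triu(T, k=1)))
assert np.allclose(H @ U, U @ np.diag(np.diag(T)))
```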
Numerical Computation
Algorithms for Computation
The computation of the Schur decomposition typically begins with a reduction of the general matrix to upper Hessenberg form using Householder reflections, which requires approximately \frac{10}{3}n^3 floating-point operations for an n \times n matrix and preserves the eigenvalues.[8] This step is followed by iterative QR iterations to triangularize the Hessenberg matrix, where each iteration involves a QR factorization of the current matrix followed by a reverse multiplication to form a similarity transformation, gradually driving subdiagonal elements toward zero.[8] The basic unshifted QR algorithm converges slowly, with an overall complexity of O(n^4) for dense matrices, but shifts accelerate convergence to O(n^3).[8]
The Francis QR step, an implicit variant of the QR iteration, enhances efficiency by avoiding explicit computation of the orthogonal factor Q through bulge chasing techniques.[8] In this method, a double-shift strategy is employed for real matrices to handle complex conjugate eigenvalue pairs without introducing complex arithmetic: shifts are based on the eigenvalues of the trailing 2x2 submatrix, and a bulge is introduced and chased down the subdiagonal using Householder reflectors or Givens rotations.[8] The QR iteration phase using double-shift Francis steps requires in total about \frac{20}{3}n^3 operations when computing only the Schur form (for the real case), enabling the full decomposition in O(n^3) time.[8]
Practical implementations are provided in numerical libraries such as LAPACK, where the driver routine xGEES computes the Schur decomposition of a general matrix A, returning an orthogonal/unitary matrix Z and a quasi-upper triangular matrix T such that A = Z T Z^T (real) or A = Z T Z^H (complex), optionally ordering selected eigenvalues on the diagonal of T.[9] The routine internally performs Hessenberg reduction via xGEHRD followed by QR iterations using xHSEQR, with parameters including the input matrix A, an optional eigenvalue selection function SELCTG, and outputs for the Schur form T, Schur vectors in Z, and workspace for computations.[9] Related expert routines like xGEESX include error bounds and balancing for improved conditioning.[9]
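Through SciPy, the xGEES driver is reached via scipy.linalg.schur; its sort argument plays a role analogous to the SELCTG selection function, moving selected eigenvalues to the leading block and reporting their count. The snippet below is an illustrative use of that interface, not an excerpt from the LAPACK documentation.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[ 1.0,  2.0,  0.0],
              [-2.0, -3.0,  1.0],
              [ 0.5,  0.0, -1.0]])

# Real Schur form A = Z T Z^T with T quasi-upper triangular.  The sort option 'lhp'
# moves left-half-plane eigenvalues to the leading block and returns their count sdim,
# analogous to selecting eigenvalues with SELCTG in xGEES.
T, Z, sdim = schur(A, output="real", sort="lhp")

print(sdim)                                   # number of eigenvalues with negative real part
print(np.allclose(Z @ T @ Z.T, A))            # A = Z T Z^T
print(np.allclose(Z.T @ Z, np.eye(3)))        # Z is orthogonal
```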
For illustration, the high-level steps of a basic QR iteration on a small Hessenberg matrix are as follows:
Input: Hessenberg matrix H ∈ ℂ^{n×n}, tolerance ε
Output: Approximate Schur form T ≈ upper triangular, unitary U such that H ≈ U T U*
1. Initialize U ← I_n, A ← H
2. While not converged (e.g., max |A_{i+1,i}| > ε for i=1 to n-1):
a. Compute QR factorization: A = Q R (using Householder reflections or Givens rotations)
b. Update: A ← R Q
c. U ← U Q
3. Return T ← A, U
This pseudocode represents the unshifted process; in practice, shifts and the implicit Francis formulation are used for efficiency on larger matrices.[8]
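A direct NumPy transcription of this pseudocode might look as follows; the helper name unshifted_qr is ours, and the routine is the plain unshifted iteration, so it converges slowly and only under the usual assumptions (for example, eigenvalues of distinct moduli).

```python
import numpy as np

def unshifted_qr(H: np.ndarray, tol: float = 1e-10, max_iter: int = 10_000):
    """Unshifted QR iteration: returns (T, U) with H ≈ U T U* and T ≈ upper triangular."""
    n = H.shape[0]
    A = H.astype(complex).copy()
    U = np.eye(n, dtype=complex)
    for _ in range(max_iter):
        # Converged when all subdiagonal entries are negligible relative to ||A||.
        if np.max(np.abs(np.diag(A, k=-1))) < tol * np.linalg.norm(A):
            break
        Q, R = np.linalg.qr(A)     # A = Q R
        A = R @ Q                  # similarity transform: R Q = Q* A Q
        U = U @ Q                  # accumulate the unitary factor
    return A, U

# Example: a Hessenberg matrix with well-separated real eigenvalues.
H = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.0, 0.5, 1.0]])
T, U = unshifted_qr(H)
print(np.round(np.diag(T), 6))                 # approximate eigenvalues
print(np.allclose(U @ T @ U.conj().T, H))      # H ≈ U T U*
```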
Numerical Stability
The QR algorithm for computing the Schur decomposition is backward stable, meaning that the computed Schur form T and unitary matrix Q satisfy (A + E) Q = Q T, where the perturbation E satisfies \|E\|_2 \leq c n u \|A\|_2 for some modest constant c, dimension n, and unit roundoff u.[10] This bound ensures that the algorithm introduces only small relative perturbations on the order of machine epsilon times the matrix norm, preserving the essential numerical properties of the original matrix throughout the iterations.[10]
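Backward stability can be probed empirically by measuring the relative residual of the computed factors; the check below (using scipy.linalg.schur as an assumed implementation) typically reports values within a modest multiple of the unit roundoff.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(2)
n = 200
A = rng.standard_normal((n, n))

T, Q = schur(A, output="real")
u = np.finfo(float).eps

# Relative residual ||A - Q T Q^T|| / ||A|| and departure of Q from orthogonality:
# both should be a modest multiple of the unit roundoff u.
res = np.linalg.norm(A - Q @ T @ Q.T) / np.linalg.norm(A)
orth = np.linalg.norm(Q.T @ Q - np.eye(n))
print(f"residual = {res:.2e}, orthogonality = {orth:.2e}, u = {u:.2e}")
```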
The sensitivity of the computed Schur vectors and eigenvalues to perturbations in A depends critically on the separation of the eigenvalues; clustered eigenvalues lead to a larger condition number, amplifying errors in the output.[10] For instance, the condition number of a simple eigenvalue is \kappa(\lambda) = 1 / |y^H x|, where x and y are the normalized right and left eigenvectors, while the accuracy of computed invariant subspaces is governed by bounds of the form \|\sin \Theta\|_F \leq 4 \|E_{21}\|_F / \operatorname{sep}(T_{11}, T_{22}), highlighting increased vulnerability when the spectra of the diagonal blocks T_{11} and T_{22} are close.[10]
Rounding errors in floating-point arithmetic primarily affect the superdiagonal elements of the computed Schur form, with perturbations bounded by O(u \|A\|_2), though these can propagate if not mitigated.[10] Balancing the matrix prior to applying the QR algorithm—via a diagonal similarity transformation—scales the rows and columns to reduce the condition number \kappa_2(A) and equalize matrix norms, thereby minimizing the impact of these errors and improving overall accuracy for ill-conditioned problems.[10]
Post-2000 advancements in high-performance computing have incorporated block algorithms into the QR iteration, processing submatrices in parallel to enhance scalability on multicore and distributed systems while maintaining backward stability with error bounds analogous to the classical case, O(\varepsilon \|A\|).[11]
The triangular structure of the Schur form contributes to its superior numerical stability compared to direct computation of the Jordan form, which requires identifying sensitive Jordan chains and is prone to severe ill-conditioning even for modest multiple eigenvalues.[10]
Applications
Eigenvalue and Eigenvector Computation
The Schur decomposition of a square matrix A \in \mathbb{C}^{n \times n} expresses it as A = Q T Q^H, where Q is unitary and T is upper triangular, enabling direct extraction of eigenvalues from the diagonal entries of T. These diagonal elements \lambda_1, \lambda_2, \dots, \lambda_n are precisely the eigenvalues of A, listed in the order determined by the decomposition algorithm, with multiplicities preserved. This approach ensures all eigenvalues are computed simultaneously and reliably, without requiring prior knowledge of their locations.[12]
For eigenvector computation, the columns of Q, known as Schur vectors, provide orthonormal bases for nested invariant subspaces: for each k, the first k Schur vectors span an invariant subspace of A associated with the first k diagonal entries of T. The first Schur vector is always an eigenvector, and when A is normal every Schur vector is an eigenvector; in general, an eigenvector for a simple eigenvalue \lambda_k is recovered by back-substitution on the leading k \times k block of T and then mapping back through Q. In defective cases, where the algebraic multiplicity exceeds the geometric multiplicity, the Schur vectors associated with a repeated eigenvalue span an invariant subspace within its generalized eigenspace; the true eigenvectors form a basis within this subspace, and generalized eigenvectors satisfying chains like (A - \lambda I) v_{k} = v_{k-1} can be constructed within it. This structure allows reconstruction of the full eigensystem by solving within the blocks of T.[12]
Deflation exploits negligible subdiagonal entries that emerge during the iteration to decouple the problem into smaller independent subproblems. When the subdiagonal entry at position (i+1, i) of the Hessenberg iterate becomes numerically zero, the leading i \times i principal block (together with the corresponding columns of Q) spans an invariant subspace, so the eigenvalues for that block can be isolated and subsequent iterations restricted to the remaining trailing block. This process, performed incrementally during computation, enhances efficiency by avoiding redundant iterations on converged portions.[12]
Compared to iterative methods like the power method, which converge to dominant eigenvalues via repeated matrix-vector multiplications and may require separate runs for each eigenvalue, the Schur decomposition offers global convergence for dense matrices, computing the entire spectrum in a single process with guaranteed numerical stability under unitary transformations. The power method excels for extreme eigenvalues in sparse settings but struggles with interior or clustered eigenvalues, whereas Schur-based algorithms (e.g., via QR iterations) handle dense cases robustly, often converging in O(n^3) time with high reliability.[12]
Consider a non-diagonalizable 2×2 matrix A = \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}, which has a repeated eigenvalue \lambda = 3 with algebraic multiplicity 2 but geometric multiplicity 1. Its Schur decomposition is A = Q T Q^H with T = A (already upper triangular) and Q = I, so the diagonal yields \lambda = 3 twice. The first column of Q is the eigenvector v_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, while the second Schur vector v_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} is a generalized eigenvector satisfying (A - 3I) v_2 = v_1. Deflation on the full block confirms the single invariant subspace \mathbb{C}^2.[12]
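A short sketch of the extraction procedures described above, assuming SciPy's complex Schur form and simple (distinct) eigenvalues: the helper eigvec_from_schur is illustrative, recovering each eigenvector by back-substitution on the leading triangular block of T and mapping back through Q.

```python
import numpy as np
from scipy.linalg import schur, solve_triangular

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

T, Q = schur(A, output="complex")
eigvals = np.diag(T)                     # eigenvalues read off the diagonal of T

def eigvec_from_schur(T, Q, k):
    """Eigenvector of A = Q T Q* for the (assumed simple) eigenvalue T[k, k]."""
    lam = T[k, k]
    n = T.shape[0]
    w = np.zeros(n, dtype=complex)
    w[k] = 1.0
    if k > 0:
        # Back-substitution on the leading k x k block: (T[:k,:k] - lam I) w[:k] = -T[:k, k]
        w[:k] = solve_triangular(T[:k, :k] - lam * np.eye(k), -T[:k, k])
    v = Q @ w
    return v / np.linalg.norm(v)

for k in range(len(eigvals)):
    v = eigvec_from_schur(T, Q, k)
    assert np.allclose(A @ v, eigvals[k] * v)
```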
Matrix Exponentiation and Functions
The Schur decomposition A = Q T Q^*, where Q is unitary and T is upper triangular, enables the computation of a matrix function f(A) via the relation f(A) = Q f(T) Q^*. The triangular structure of T simplifies the evaluation of f(T), as its entries can be determined recursively using forward substitution or by solving small Sylvester equations for off-diagonal blocks in the Schur-Parlett algorithm.[10]
For the matrix exponential \exp(A), the Schur form allows approximation of \exp(T) through power series truncation or Padé approximants applied directly to the triangular matrix, with the scaling-and-squaring technique reducing computational cost while maintaining accuracy. This approach requires approximately 2(q + j + 1/3) n^3 flops for a diagonal Padé approximant of degree q combined with j squarings, where n is the matrix dimension.[10][13]
A key advantage of this method is that it circumvents the need for complete diagonalization, which fails for non-diagonalizable matrices; instead, the quasi-triangular form of T (with 2×2 blocks for complex conjugate eigenvalue pairs) accommodates defective eigenspaces while preserving numerical stability.[10] For the matrix logarithm \log(A), particularly when A is positive definite and Hermitian (yielding a diagonal T), the computation simplifies to \log(A) = Q \log(D) Q^*, where D is the diagonal of T and logarithms are taken element-wise; in general cases, inverse scaling and squaring or the Denman-Beavers iteration is applied to T.[14]
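A minimal sketch of the f(A) = Q f(T) Q^* route, assuming SciPy: the exponential is evaluated on the triangular Schur factor, and for a symmetric positive definite matrix the logarithm reduces to element-wise logarithms of the (diagonal) Schur form.

```python
import numpy as np
from scipy.linalg import schur, expm, logm

rng = np.random.default_rng(4)

# f(A) = Q f(T) Q*: evaluate the function on the triangular factor.
A = rng.standard_normal((5, 5))
T, Q = schur(A, output="complex")
expA = Q @ expm(T) @ Q.conj().T          # exponential via the upper triangular Schur factor
assert np.allclose(expA, expm(A))

# Symmetric positive definite case: T is diagonal, so log(A) = Q log(D) Q^T element-wise.
B = rng.standard_normal((5, 5))
S = B @ B.T + 5 * np.eye(5)              # symmetric positive definite
T2, Q2 = schur(S)                        # real Schur form of a symmetric matrix is diagonal
logS = Q2 @ np.diag(np.log(np.diag(T2))) @ Q2.T
assert np.allclose(logS, logm(S))
```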
In control theory, the Schur decomposition supports stability analysis by facilitating solutions to Lyapunov equations A X + X A^* + W = 0, where the Bartels-Stewart algorithm transforms the equation into a block-triangular system solvable via back-substitution in O(n^3) operations, enabling efficient assessment of system controllability and quadratic performance metrics.[15]
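For illustration, SciPy exposes a Schur-based, Bartels–Stewart style Lyapunov solver as scipy.linalg.solve_continuous_lyapunov; the example below, with an assumed stable test matrix, solves A X + X A^T + W = 0 and checks the expected symmetry and positive definiteness of X.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(5)
n = 4

# A stable matrix (eigenvalues in the open left half-plane) and a positive definite W.
A = rng.standard_normal((n, n)) - 3 * n * np.eye(n)
W = np.eye(n)

# Solve A X + X A^T + W = 0, i.e. A X + X A^T = -W (real case, so A^* = A^T).
X = solve_continuous_lyapunov(A, -W)
print(np.allclose(A @ X + X @ A.T + W, np.zeros((n, n))))    # True

# For a stable A and W positive definite, the solution X is symmetric positive definite.
print(np.allclose(X, X.T), np.all(np.linalg.eigvalsh(X) > 0))
```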
As an illustrative example, consider the 2×2 Jordan block J = \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}, which coincides with its Schur form T = J. The exponential is given by
\exp(J) = e^{\lambda} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},
obtained by exponentiating the equal diagonal entries and determining the superdiagonal via the nilpotent part's series expansion \exp(N) = I + N for the unit upper shift N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.[13]
Generalizations
Real Schur Decomposition
For real matrices, the Schur decomposition is adapted to maintain real arithmetic throughout the process, resulting in the real Schur form. Specifically, any real square matrix A \in \mathbb{R}^{n \times n} can be decomposed as A = Q T Q^T, where Q is a real orthogonal matrix (Q^T Q = I) and T is a real block upper triangular matrix, known as quasi-triangular. The diagonal blocks of T are either 1×1 (corresponding to real eigenvalues) or 2×2 (corresponding to pairs of complex conjugate eigenvalues).[7][3]
The construction of the real Schur form handles complex conjugate eigenvalue pairs by placing 2×2 blocks on the diagonal of T. Each such block has complex conjugate eigenvalues \alpha \pm i \beta with \alpha, \beta \in \mathbb{R} and \beta \neq 0, and is similar (though in general not orthogonally similar) to the standardized form \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}. The elements above these blocks maintain the block upper triangular structure, while the factorization as a whole preserves the eigenvalues of A in the diagonal blocks.[7]
Key properties of the real Schur form include its preservation of the real nature of the input matrix, avoiding any complex numbers in the decomposition. The 1×1 blocks directly give real eigenvalues, and the 2×2 blocks encapsulate complex conjugate pairs without explicit complex entries. This form embeds naturally into the complex Schur decomposition by viewing the real orthogonal Q as unitary over the complexes and the quasi-triangular T as upper triangular in the complex sense.[3][7]
In numerical software, the real Schur form is preferred for real input matrices to avoid complex arithmetic, which simplifies implementation and reduces floating-point operations. For instance, the QR algorithm is used to compute this form, producing real outputs even when complex eigenvalues are present.[16][17]
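The quasi-triangular structure is visible directly in the output of a real Schur routine; the sketch below (using scipy.linalg.schur with output='real' on a small illustrative matrix) produces one 2×2 block for the complex conjugate pair and one 1×1 block for the real eigenvalue.

```python
import numpy as np
from scipy.linalg import schur

# Real matrix whose spectrum is {2, 1 + 2i, 1 - 2i}.
A = np.array([[ 1.0,  2.0,  0.0],
              [-2.0,  1.0,  1.0],
              [ 0.0,  0.0,  2.0]])

T, Q = schur(A, output="real")           # A = Q T Q^T with Q orthogonal, T quasi-triangular
print(np.round(T, 3))                    # one 2x2 block (pair 1 ± 2i) and one 1x1 block (2)
print(np.allclose(Q @ T @ Q.T, A))       # True
print(np.allclose(Q.T @ Q, np.eye(3)))   # True
```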
Generalized Schur Decomposition
The generalized Schur decomposition extends the classical Schur decomposition to the generalized eigenvalue problem involving a pair of square matrices A and B \in \mathbb{C}^{n \times n}.[18] It seeks unitary matrices Q and Z such that
Q^* A Z = S, \quad Q^* B Z = T,
where S and T are both upper triangular.[18] The diagonal entries yield the generalized eigenvalues \lambda_i = S_{ii} / T_{ii} whenever T_{ii} \neq 0 (a zero T_{ii} corresponds to an infinite eigenvalue of the pencil); the finite \lambda_i are the roots of the equation \det(A - \lambda B) = 0.[18]
This decomposition is computed using the QZ algorithm, a double-sided orthogonal transformation method analogous to the QR algorithm for the standard eigenvalue problem.[18] Introduced by Moler and Stewart in 1973, the QZ algorithm proceeds iteratively by applying generalized Givens rotations from both the left and right to progressively triangularize the pencil (A, B), handling cases where B may be singular through preliminary reductions.[18] The algorithm's convergence relies on shifts based on the Rayleigh quotients, ensuring quadratic convergence for non-defective pencils under suitable ordering of eigenvalues.[18]
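In SciPy, the QZ algorithm is exposed as scipy.linalg.qz; the hedged sketch below requests complex output, so both factors are upper triangular and the ratios of their diagonal entries reproduce the generalized eigenvalues.

```python
import numpy as np
from scipy.linalg import qz, eigvals

rng = np.random.default_rng(6)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Generalized Schur form: Q* A Z = S and Q* B Z = T, both upper triangular.
S, T, Q, Z = qz(A, B, output="complex")

assert np.allclose(Q @ S @ Z.conj().T, A)
assert np.allclose(Q @ T @ Z.conj().T, B)

# Generalized eigenvalues lambda_i = S_ii / T_ii (finite when T_ii != 0),
# the roots of det(A - lambda B) = 0.
lam = np.diag(S) / np.diag(T)
ref = eigvals(A, B)                      # reference generalized eigenvalues
assert np.allclose(np.sort_complex(lam), np.sort_complex(ref))
```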
Regarding uniqueness, the generalized Schur form is unique up to permutations of the diagonal blocks and unitary similarities within blocks corresponding to clusters of equal generalized eigenvalues, mirroring the properties of the standard Schur decomposition.[18] This non-uniqueness arises from the freedom in ordering the eigenvalues and selecting bases for degenerate eigenspaces, but the triangular structure and diagonal ratios remain invariant under such transformations.[18]
Applications of the generalized Schur decomposition include solving the generalized eigenvalue problem in control systems, where matrix pencils model descriptor systems of the form E \dot{x} = A x + B u with E possibly singular, facilitating pole placement and stability analysis.[19] It also underpins computations of transmission zeros and residue derivatives in linear control theory, providing a robust framework for system identification and design.[19]