
Matrix exponential

The matrix exponential of a square matrix A, denoted \exp(A) or e^A, is defined by the power series \sum_{n=0}^\infty \frac{A^n}{n!}, which converges absolutely for every square matrix A with entries in the real or complex numbers. This construction extends the familiar scalar exponential e^z = \sum_{n=0}^\infty \frac{z^n}{n!} to the non-commutative setting of matrices, preserving key algebraic properties such as \exp(0) = I (the identity matrix) and \exp(A)^{-1} = \exp(-A). A fundamental property is that if two matrices A and B commute (i.e., AB = BA), then \exp(A + B) = \exp(A) \exp(B); this identity fails in general because matrix multiplication is non-commutative. Computationally, the matrix exponential can be evaluated using the eigenvalues and eigenvectors of A: if A = PDP^{-1} with D diagonal, then \exp(A) = P \exp(D) P^{-1}, where \exp(D) is the diagonal matrix of scalar exponentials of the eigenvalues. The concept originated in the late 19th century, introduced by James Joseph Sylvester, Edmond Laguerre, and Giuseppe Peano as part of early developments in matrix theory. In the theory of differential equations, the matrix exponential is essential for solving linear systems of ordinary differential equations of the form \dot{x} = Ax, where the unique solution satisfying x(0) = x_0 is x(t) = \exp(tA) x_0. More broadly, it serves as the exponential map in Lie group theory, mapping elements of the Lie algebra (the tangent space at the identity) to the Lie group itself for matrix Lie groups such as GL(n, \mathbb{C}) or SO(n), facilitating the study of continuous symmetries and group structures. Notable applications include control theory for system stability analysis, numerical methods for time-dependent simulations, Markov chain modeling in probability, and quantum mechanics in physics, where it yields time evolution operators.

Definition and Basic Properties

Power series definition

The matrix exponential of an n \times n complex matrix A is defined via the power series expansion \exp(A) = \sum_{k=0}^{\infty} \frac{A^k}{k!}, where A^0 = I is the identity matrix and A^k = A \cdot A^{k-1} for k \geq 1. This definition extends the familiar Taylor series for the scalar exponential function e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}, which converges for all complex scalars x. For matrices, the series is interpreted entrywise, with each entry of \exp(A) given by the corresponding power series in the entries of A. The power series for \exp(A) converges absolutely to a well-defined matrix for every finite-dimensional square matrix A over the complex numbers \mathbb{C} or real numbers \mathbb{R}. Absolute convergence follows from bounding the matrix norms: if \|A\| denotes a submultiplicative matrix norm, then \|A^k / k!\| \leq \|A\|^k / k!, and the scalar series \sum \|A\|^k / k! converges for all finite \|A\|, implying term-by-term convergence in any finite dimension. The convergence is uniform on compact sets in the space of matrices, and the limit is independent of the chosen norm. The same argument applies over the reals, as real matrices embed into complex ones. When n=1, the matrix A reduces to a scalar a, and \exp(A) coincides exactly with the scalar e^a, confirming that the matrix definition generalizes the classical case without alteration. This scalar limit underscores the power series as a natural and consistent extension. The power series definition for matrices was introduced in the late 19th century by James Joseph Sylvester, Edmond Laguerre, and Giuseppe Peano. Sophus Lie developed the related exponential map in the context of continuous transformation groups, now known as Lie groups, where it connects Lie algebras to their groups. It gained significant application in the 1930s through John von Neumann's work in quantum mechanics, particularly for time evolution operators on Hilbert spaces.
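
As a concrete illustration, the definition can be evaluated directly by summing the series until the terms become negligible. The following minimal Python sketch (assuming NumPy and SciPy are available; the test matrix is an arbitrary illustrative choice) compares such a truncated sum with SciPy's general-purpose routine scipy.linalg.expm.

```python
import numpy as np
from scipy.linalg import expm

def expm_series(A, tol=1e-14, max_terms=200):
    """Sum the power series sum_k A^k / k! until the next term is negligible."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, max_terms):
        term = term @ A / k          # term now equals A^k / k!
        result += term
        if np.linalg.norm(term, 1) < tol * np.linalg.norm(result, 1):
            break
    return result

A = np.array([[0.0, 1.0], [-2.0, -3.0]])      # small norm, so the series converges quickly
print(np.allclose(expm_series(A), expm(A)))   # expected: True
```

For matrices of large norm, many terms are needed and rounding errors accumulate, which motivates the scaling and squaring and Padé techniques discussed later in the article.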

Elementary properties

The matrix exponential function satisfies several fundamental algebraic properties that parallel those of the scalar exponential, holding for square matrices over the real or complex numbers. The exponential of the zero matrix is the identity matrix: \exp(\mathbf{0}) = \mathbf{I}. This result follows directly from substituting \mathbf{A} = \mathbf{0} into the power series definition, where all terms \mathbf{A}^k/\,k! for k \geq 1 vanish, leaving only the k = 0 term \mathbf{I}. For any real square matrix \mathbf{A}, the transpose of the matrix exponential equals the exponential of the transpose: \exp(\mathbf{A})^\top = \exp(\mathbf{A}^\top). To verify this, note that (\mathbf{A}^k)^\top = (\mathbf{A}^\top)^k for each k, since transposition reverses products and here all factors are equal. Applying the transpose to the power series for \exp(\mathbf{A}) term by term yields \exp(\mathbf{A})^\top = \sum_{k=0}^\infty \frac{(\mathbf{A}^k)^\top}{k!} = \sum_{k=0}^\infty \frac{(\mathbf{A}^\top)^k}{k!} = \exp(\mathbf{A}^\top). This property extends analogously to the conjugate transpose for complex matrices. The matrix exponential is always invertible, with the inverse given explicitly by \exp(\mathbf{A})^{-1} = \exp(-\mathbf{A}). This holds because \mathbf{A} and -\mathbf{A} always commute (since [\mathbf{A}, -\mathbf{A}] = -\mathbf{A}\mathbf{A} + \mathbf{A}\mathbf{A} = \mathbf{0}), and thus \exp(\mathbf{A} + (-\mathbf{A})) = \exp(\mathbf{A})\exp(-\mathbf{A}). The left side simplifies to \exp(\mathbf{0}) = \mathbf{I}, confirming the inverse relationship. For any scalar c \in \mathbb{R} and square matrix \mathbf{A}, the scaling property states that \exp(c\mathbf{A}) = [\exp(\mathbf{A})]^c, where the right side denotes the c-th power of the matrix \exp(\mathbf{A}), defined for integer c by repeated multiplication and for non-integer c via the functional calculus (when a suitable matrix logarithm exists). For integer c the identity follows from repeated use of the commuting-sum rule and the inverse property; more generally, when \log(\exp(\mathbf{A})) = \mathbf{A} is well-defined, one has [\exp(\mathbf{A})]^c = \exp(c \log(\exp(\mathbf{A}))) = \exp(c\mathbf{A}). A central algebraic identity is that if two square matrices \mathbf{A} and \mathbf{B} commute, meaning [\mathbf{A}, \mathbf{B}] = \mathbf{A}\mathbf{B} - \mathbf{B}\mathbf{A} = \mathbf{0}, then \exp(\mathbf{A} + \mathbf{B}) = \exp(\mathbf{A}) \exp(\mathbf{B}). To prove this, expand both sides using the power series definition. The right side is \exp(\mathbf{A}) \exp(\mathbf{B}) = \left( \sum_{m=0}^\infty \frac{\mathbf{A}^m}{m!} \right) \left( \sum_{n=0}^\infty \frac{\mathbf{B}^n}{n!} \right) = \sum_{m=0}^\infty \sum_{n=0}^\infty \frac{\mathbf{A}^m \mathbf{B}^n}{m! \, n!}. Since \mathbf{A} and \mathbf{B} commute, \mathbf{A}^m \mathbf{B}^n = \mathbf{B}^n \mathbf{A}^m for all m, n, allowing arbitrary rearrangement of powers in products. Now consider the left side: \exp(\mathbf{A} + \mathbf{B}) = \sum_{k=0}^\infty \frac{(\mathbf{A} + \mathbf{B})^k}{k!}. The binomial theorem gives (\mathbf{A} + \mathbf{B})^k = \sum_{j=0}^k \binom{k}{j} \mathbf{A}^j \mathbf{B}^{k-j}, which is valid precisely because \mathbf{A} and \mathbf{B} commute. Substituting yields \exp(\mathbf{A} + \mathbf{B}) = \sum_{k=0}^\infty \frac{1}{k!} \sum_{j=0}^k \binom{k}{j} \mathbf{A}^j \mathbf{B}^{k-j} = \sum_{k=0}^\infty \sum_{j=0}^k \frac{\mathbf{A}^j \mathbf{B}^{k-j}}{j! \, (k-j)!}. Reindexing the double sum with m = j and n = k - j, so that as k runs from 0 to \infty and j from 0 to k, m and n run independently over all nonnegative integers, gives exactly \sum_{m=0}^\infty \sum_{n=0}^\infty \frac{\mathbf{A}^m \mathbf{B}^n}{m! \, n!} = \exp(\mathbf{A}) \exp(\mathbf{B}). Thus, the identity holds under the commutativity assumption. This property is essential for many applications, such as solving linear systems of differential equations whose coefficient matrices commute.
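
The commuting-sum identity and its failure for non-commuting matrices are easy to observe numerically. A minimal sketch, assuming NumPy/SciPy; the matrices below are illustrative (two polynomials in the same matrix always commute, while the two nilpotent matrices chosen for contrast do not):

```python
import numpy as np
from scipy.linalg import expm

M = np.array([[1.0, 2.0], [0.0, 3.0]])
A, B = M, M @ M                                         # polynomials in M commute with M
print(np.allclose(A @ B, B @ A))                        # True
print(np.allclose(expm(A + B), expm(A) @ expm(B)))      # True: identity holds

C = np.array([[0.0, 1.0], [0.0, 0.0]])
D = np.array([[0.0, 0.0], [1.0, 0.0]])                  # C and D do not commute
print(np.allclose(expm(C + D), expm(C) @ expm(D)))      # False: identity fails in general
```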

Determinant properties

One fundamental property of the matrix exponential is that for any A \in \mathbb{C}^{n \times n}, the determinant satisfies \det(e^A) = e^{\operatorname{tr}(A)}. This identity links the determinant, a multiplicative functional on matrices, to the trace, an additive functional, through the exponential map. To establish this for the case where A is diagonalizable, suppose A = PDP^{-1} with D = \operatorname{diag}(\lambda_1, \dots, \lambda_n), where \lambda_i are the eigenvalues of A. Then e^A = P e^D P^{-1}, where e^D = \operatorname{diag}(e^{\lambda_1}, \dots, e^{\lambda_n}). The determinant is invariant under similarity transformations, so \det(e^A) = \det(e^D) = \prod_{i=1}^n e^{\lambda_i} = e^{\sum_{i=1}^n \lambda_i}. Since the trace \operatorname{tr}(A) = \sum_{i=1}^n \lambda_i, it follows that \det(e^A) = e^{\operatorname{tr}(A)}. The result extends to all square matrices, including non-diagonalizable ones, by the density of diagonalizable matrices in the space of all matrices over \mathbb{C}, together with the continuity of the exponential, the determinant, and the trace. For real matrices A \in \mathbb{R}^{n \times n}, the identity \det(e^A) = e^{\operatorname{tr}(A)} holds exactly, as the trace is real and the eigenvalues of A (possibly complex) occur in conjugate pairs, ensuring the product \prod e^{\lambda_i} is real and positive, matching the positive real value e^{\operatorname{tr}(A)}.
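
The identity is straightforward to check numerically. A minimal sketch, assuming NumPy/SciPy and an arbitrary random test matrix:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))             # arbitrary real test matrix

lhs = np.linalg.det(expm(A))
rhs = np.exp(np.trace(A))
print(np.isclose(lhs, rhs))                 # expected: True
```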

Properties for Special Matrix Classes

Symmetric and Hermitian matrices

Real symmetric matrices possess real eigenvalues and can be orthogonally diagonalized as A = Q D Q^T, where Q is an orthogonal matrix and D is a diagonal matrix with the eigenvalues \lambda_i on the diagonal. The matrix exponential then satisfies \exp(A) = Q \exp(D) Q^T, where \exp(D) = \operatorname{diag}(\exp(\lambda_1), \dots, \exp(\lambda_n)). Consequently, \exp(A) is also real symmetric, as it inherits symmetry from conjugation by the orthogonal matrix Q. Moreover, if A is positive definite, then all \lambda_i > 0, ensuring \exp(A) is positive definite with eigenvalues \exp(\lambda_i) > 1. For complex Hermitian matrices, the situation is analogous but uses unitary diagonalization: A = U D U^*, where U is unitary and D is diagonal with real eigenvalues \lambda_i. The exponential is \exp(A) = U \exp(D) U^*, which is Hermitian since U^* = U^{-1}. The eigenvalues of \exp(A) are \exp(\lambda_i), which are strictly positive, confirming that \exp(A) is always positive definite for any Hermitian A. In the real case, where Hermitian reduces to symmetric, this agrees with the symmetric result above. Positive definite matrices, being a special case of Hermitian (or symmetric) matrices with positive eigenvalues, admit a unique Hermitian (or symmetric) matrix logarithm. Specifically, if B is positive definite, there exists a unique Hermitian matrix L such that \exp(L) = B. This uniqueness follows from the bijective mapping of the exponential on the space of Hermitian matrices onto the positive definite cone.
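
Because a Hermitian (or real symmetric) matrix can be diagonalized by a unitary (orthogonal) matrix, its exponential is usually computed from the eigendecomposition. A minimal sketch, assuming NumPy/SciPy, with a randomly generated symmetric matrix:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
S = rng.standard_normal((4, 4))
A = (S + S.T) / 2                            # symmetrize to obtain a real symmetric matrix

w, Q = np.linalg.eigh(A)                     # real eigenvalues w, orthogonal eigenvectors Q
expA = (Q * np.exp(w)) @ Q.T                 # Q diag(e^w) Q^T without forming diag explicitly

print(np.allclose(expA, expm(A)))            # matches the general routine
print(np.allclose(expA, expA.T))             # exp(A) is symmetric
print(np.all(np.linalg.eigvalsh(expA) > 0))  # and positive definite
```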

Inequalities for Hermitian exponentials

For Hermitian matrices A and B, the Golden-Thompson inequality provides an upper bound on the trace of the exponential of their sum in terms of the trace of the product of their individual exponentials: \operatorname{tr}(e^{A + B}) \le \operatorname{tr}(e^{A} e^{B}). This inequality, established independently by Golden and Thompson, highlights the non-commutative nature of matrix exponentials; the trace of the exponential is also a convex function on the space of Hermitian matrices. The Araki-Lieb-Thirring inequality extends such trace bounds to more general settings involving powers of positive semidefinite operators, which can be applied to exponentials of Hermitian matrices by substitution (e.g., setting the operators to forms like e^{X/2}). Specifically, for positive semidefinite matrices A, B \geq 0, q \geq 0, and 0 \leq r \leq 1, \operatorname{tr}[(A^r B^r A^r)^q] \leq \operatorname{tr}[(A B A)^{r q}], with the inequality reversing for r \geq 1. This provides refined control on traces of exponentials through log-majorization and operator monotonicity properties, useful for bounding \operatorname{tr} e^{A + B} in non-commuting cases. A fundamental norm inequality for the matrix exponential of a Hermitian matrix A is \|e^A\| \leq e^{\|A\|}, where \|\cdot\| denotes the operator norm (spectral norm). Equality holds, for example, when A is positive semidefinite (so that \|e^A\| = e^{\lambda_{\max}(A)} = e^{\|A\|}), and the bound follows from the power series expansion together with submultiplicativity of the norm, or equivalently from the spectral characterization of the norm for Hermitian matrices. These inequalities find key applications in quantum statistical mechanics, where the partition function \operatorname{tr} e^{-\beta H} for a Hamiltonian H (a Hermitian operator) requires bounding traces of exponentials to estimate free energies and thermodynamic potentials, often via variational principles like the Gibbs-Bogoliubov inequality.
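
A quick numerical check of the Golden-Thompson inequality on randomly generated Hermitian matrices (a minimal sketch, assuming NumPy/SciPy; random examples do not prove the inequality, they merely illustrate it):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)

def random_hermitian(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

A, B = random_hermitian(3), random_hermitian(3)
lhs = np.trace(expm(A + B)).real
rhs = np.trace(expm(A) @ expm(B)).real       # tr(e^A e^B) is real and positive here
print(lhs <= rhs + 1e-10)                    # Golden-Thompson: expected True
```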

Exponentials of Matrix Sums

Lie product formula

The Lie product formula, also known as the Trotter product formula, expresses the matrix exponential of a sum A + B as a limit of products of individual exponentials, providing an approximation even when A and B do not commute. For square matrices A and B over \mathbb{C} and scalar t \in \mathbb{R}, the formula states \exp(t(A + B)) = \lim_{n \to \infty} \left[ \exp\left( \frac{t A}{n} \right) \exp\left( \frac{t B}{n} \right) \right]^n, where the limit holds in any matrix norm. If [A, B] = 0, the equality is exact for all n \geq 1. This result, originally due to Sophus Lie for the finite-dimensional matrix case, underpins splitting approximations in semigroup theory and quantum simulation. In the more general setting of strongly continuous contraction semigroups on a Banach space, the Trotter product formula extends the approximation to operator sums. For generators A and B generating \{e^{tA}\}_{t \geq 0} and \{e^{tB}\}_{t \geq 0}, the semigroup \{e^{t(A+B)}\}_{t \geq 0} satisfies e^{t(A+B)} = \lim_{n \to \infty} \left( e^{tA/n} e^{tB/n} \right)^n = \lim_{n \to \infty} \left( e^{tB/n} e^{tA/n} \right)^n, with the limits taken in the strong operator topology and under suitable conditions on the domains. The Trotter-Kato approximation refines this for pairs of contraction semigroups, ensuring the limit holds when (the closure of) A + B generates a contraction semigroup, by incorporating consistent approximations to the resolvents. A proof sketch for the matrix case proceeds via the Baker-Campbell-Hausdorff formula applied to the logarithm of the product term. Specifically, \log \left( \exp(tA/n) \exp(tB/n) \right) = t(A + B)/n + O(1/n^2) from the BCH expansion; exponentiating and raising to the nth power then yields the result, since the accumulated O(1/n^2) corrections vanish in the limit n \to \infty. Alternatively, Duhamel's principle can be used by interpreting \exp(t(A+B)) as the solution to Y' = (A+B)Y with Y(0) = I, where the product approximates the Picard iteration for this initial value problem. For finite n, the approximation incurs an error of order O(t^2 / n) in the operator norm, arising from the leading commutator term [A, B] in the BCH expansion. More precisely, for the first-order Trotter step of size \tau = t/n, \left\| \exp(\tau (A + B)) - \exp(\tau A) \exp(\tau B) \right\| \leq \frac{\tau^2}{2} \| [A, B] \| \, e^{\tau (\|A\| + \|B\|)}, and the total error over n steps is bounded similarly, enabling controlled approximations in applications like quantum simulation. Higher-order variants, such as the symmetric (Strang) splitting \exp(\tau A/2)\exp(\tau B)\exp(\tau A/2), reduce this to O(t^3 / n^2), but the basic formula suffices for the limit.
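
The first-order Trotter error and its O(1/n) decay can be observed directly. A minimal sketch, assuming NumPy/SciPy, with arbitrary non-commuting random matrices:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
exact = expm(A + B)

for n in (1, 10, 100, 1000):
    step = expm(A / n) @ expm(B / n)
    approx = np.linalg.matrix_power(step, n)
    print(n, np.linalg.norm(approx - exact))   # error shrinks roughly like O(1/n)
```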

Baker-Campbell-Hausdorff formula

The Baker-Campbell-Hausdorff (BCH) formula provides an explicit expression in a Lie algebra \mathfrak{g} for the Lie algebra element corresponding to the product of two elements in the associated Lie group G, via the exponential map \exp: \mathfrak{g} \to G. For A, B \in \mathfrak{g}, \log(\exp(A) \exp(B)) = A + B + \frac{1}{2}[A, B] + \sum_{k=3}^\infty C_k(A, B), where [A, B] = AB - BA denotes the Lie bracket (commutator), and the higher-order terms C_k(A, B) are homogeneous Lie polynomials of degree k formed from nested commutators of A and B with rational coefficients, given explicitly by Dynkin's formula; the first few are C_3(A, B) = \tfrac{1}{12}\big([A, [A, B]] - [B, [A, B]]\big) and C_4(A, B) = -\tfrac{1}{24}[B, [A, [A, B]]]. This infinite series lies entirely within \mathfrak{g}, establishing a formal logarithm for products near the identity in G. The BCH formula originated in the late 19th and early 20th centuries through independent contributions addressing the structure of Lie groups and algebras. John Edward Campbell introduced foundational ideas in 1897 while studying canonical forms for systems of linear differential equations, showing that solutions could be expressed via exponentials with corrections involving commutators. Henry Frederick Baker extended this in 1902, applying it to hypercomplex numbers and finite groups to derive series expansions for products of exponentials. Felix Hausdorff provided the first complete and rigorous proof in 1906, demonstrating convergence in associative algebras and clarifying the role of nested commutators. For matrices, the series converges absolutely in a neighborhood of the origin, for instance whenever \|A\| + \|B\| < \log 2 with respect to a submultiplicative norm; this ensures the arguments remain within the domain where the matrix logarithm is well-defined and analytic. For practical computations, especially in numerical simulations of Lie group flows, finite truncations of the BCH series are employed, often up to order 4 or 6 depending on the required accuracy. A related expansion is the Zassenhaus formula, which inverts the BCH relation to express the exponential of a sum as an infinite product \exp(A + B) = \exp(A)\,\exp(B) \prod_{k=2}^\infty \exp(C'_k(A, B)), where the C'_k for k \geq 2 are nested commutator terms beginning with C'_2(A, B) = -\tfrac{1}{2}[A, B], allowing disentanglement into ordered exponentials useful for quantum simulations and Lie group integrators. The Zassenhaus formula, attributed to Hans Zassenhaus, converges under similar small-norm conditions as the BCH series.
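
Low-order truncations of the BCH series can be checked against the exact matrix logarithm of the product. A minimal sketch, assuming NumPy/SciPy and small-norm random matrices so that the series is well inside its convergence region:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(4)
A = 0.1 * rng.standard_normal((3, 3))
B = 0.1 * rng.standard_normal((3, 3))

def comm(X, Y):
    return X @ Y - Y @ X

Z = logm(expm(A) @ expm(B))                        # exact log of the product
bch2 = A + B + 0.5 * comm(A, B)                    # BCH truncated after the commutator term
bch3 = bch2 + (comm(A, comm(A, B)) - comm(B, comm(A, B))) / 12.0

print(np.linalg.norm(Z - bch2))                    # dominated by the omitted degree-3 terms
print(np.linalg.norm(Z - bch3))                    # smaller: only degree-4 and higher remain
```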

The Exponential Map

General properties

The matrix exponential induces the exponential map \exp: \mathfrak{gl}(n,\mathbb{C}) \to \mathrm{GL}(n,\mathbb{C}), where \mathfrak{gl}(n,\mathbb{C}) denotes the Lie algebra of all n \times n complex matrices under the commutator bracket, and \mathrm{GL}(n,\mathbb{C}) is the Lie group of invertible n \times n complex matrices under multiplication. This map associates to each element A \in \mathfrak{gl}(n,\mathbb{C}) the group element \exp(A) = \sum_{k=0}^\infty \frac{A^k}{k!}, providing a passage from the additive structure of the Lie algebra to the multiplicative structure of the Lie group. The exponential map is surjective onto \mathrm{GL}(n,\mathbb{C}): every invertible complex matrix possesses a matrix logarithm, as can be seen from the Jordan form, so every element of \mathrm{GL}(n,\mathbb{C}) lies in the image of \exp. The exponential map is smooth, and it serves as a local diffeomorphism in a neighborhood of the origin in \mathfrak{gl}(n,\mathbb{C}). Specifically, the differential d\exp_0: T_0 \mathfrak{gl}(n,\mathbb{C}) \to T_I \mathrm{GL}(n,\mathbb{C}) is the identity isomorphism, where I is the identity matrix, ensuring that \exp preserves the tangent space structure at the identity. By the inverse function theorem, \exp is therefore bijective and smooth (with smooth inverse) on a sufficiently small open ball around 0 in the Lie algebra, facilitating the identification of the Lie algebra with the tangent space at the identity. Near the identity the exponential map is thus a diffeomorphism onto its image within the injectivity radius—the maximal \rho > 0 such that \exp is a diffeomorphism from the ball of radius \rho in \mathfrak{gl}(n,\mathbb{C}) (endowed with a suitable norm) onto an open neighborhood of I in \mathrm{GL}(n,\mathbb{C}). Globally, however, \exp is not injective; for example, \exp(2\pi i\, I) = I even though 2\pi i\, I \neq 0. The injectivity radius depends on the chosen norm on the Lie algebra but is positive and finite, bounding the region where distinct algebra elements map to distinct group elements. The curves \gamma(t) = \exp(tA) for A \in \mathfrak{gl}(n,\mathbb{C}) and t \in \mathbb{R} form one-parameter subgroups of \mathrm{GL}(n,\mathbb{C}), which are integral curves of the left-invariant vector fields generated by A. When a matrix Lie group is equipped with a bi-invariant metric (where one exists), these one-parameter subgroups coincide with the geodesics through the identity. This geodesic interpretation underscores the exponential map's role in connecting the algebraic structure of the Lie algebra to the geometric structure of the Lie group.
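
Two of these statements are easy to verify numerically: the one-parameter subgroup property \exp((s+t)A) = \exp(sA)\exp(tA), and the failure of global injectivity. A minimal sketch, assuming NumPy/SciPy, with illustrative matrices:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 2.0]])
s, t = 0.3, 1.1
print(np.allclose(expm((s + t) * A), expm(s * A) @ expm(t * A)))   # one-parameter subgroup

K = 2 * np.pi * np.array([[0.0, -1.0], [1.0, 0.0]])   # nonzero generator of a full rotation
print(np.allclose(expm(K), np.eye(2)))                # exp(K) = I, so exp is not injective
```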

Directional derivatives on Hermitian matrices

The directional derivative of the matrix exponential at a Hermitian matrix H in the direction of a skew-Hermitian matrix K is captured by the Fréchet derivative D\exp_H(K), which quantifies the first-order sensitivity of \exp(H) to perturbations along K. This specialization is relevant to the geometry of the unitary group, as Hermitian matrices parameterize the Lie algebra via multiplication by i, mapping to unitaries through the exponential. The explicit integral representation is D\exp_H(K) = \exp(H) \int_0^1 \exp(-s H) K \exp(s H) \, ds. When H is Hermitian, \exp(H) is positive definite Hermitian, and the integrand is the conjugation (similarity action) of K by \exp(s H). This formula can be derived by differentiating the power series expansion of \exp(H + t K) with respect to t at t = 0. The resulting expression is \sum_{n=1}^\infty \frac{1}{n!} \sum_{j=0}^{n-1} H^j K H^{n-1-j}, a double sum over non-commuting matrix powers that resums to the integral form, which can be viewed as the continuous version of this discrete average. Alternatively, applying Duhamel's principle to the perturbed ODE Y'(t) = (H + \epsilon K) Y(t) with Y(0) = I and differentiating with respect to \epsilon at \epsilon = 0 yields the variation \delta Y(1) = \int_0^1 \exp((1-s) H) K \exp(s H) \, ds, which rewrites to the given form by factoring \exp(H) out on the left. A key simplification occurs when [H, K] = 0, in which case \exp(-s H) K \exp(s H) = K for all s, reducing the integral to K and thus D\exp_H(K) = \exp(H) K. This commuting case facilitates exact computations in models where the perturbation aligns with the unperturbed operator's eigenspaces. In quantum mechanics, this directional derivative underpins perturbation theory for unitary evolution operators generated by perturbed Hamiltonians. For a base Hermitian Hamiltonian H_0 with a perturbation direction K (arising in contexts like infinitesimal gauge transformations or effective non-Hermitian shifts in open systems), the first-order expansion of \exp(-i (H_0 + \epsilon K)) uses the formula to approximate state evolution and transition amplitudes, avoiding costly full exponentiation. This approach is closely related to the Magnus expansion, which expresses the logarithm of the time-ordered exponential as a series of nested integrals of commutators for numerical integration of Schrödinger equations with small perturbations.
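
The integral representation can be compared against SciPy's built-in Fréchet-derivative routine. A minimal sketch, assuming NumPy/SciPy (scipy.linalg.expm_frechet and scipy.integrate.quad_vec), with randomly generated symmetric H and skew-symmetric K:

```python
import numpy as np
from scipy.linalg import expm, expm_frechet
from scipy.integrate import quad_vec

rng = np.random.default_rng(5)
X = rng.standard_normal((3, 3))
H = X + X.T                                    # Hermitian (real symmetric) base point
Y = rng.standard_normal((3, 3))
K = Y - Y.T                                    # skew-symmetric perturbation direction

# Integral representation: D exp_H(K) = exp(H) * int_0^1 exp(-sH) K exp(sH) ds
integrand = lambda s: (expm(-s * H) @ K @ expm(s * H)).ravel()
integral, _ = quad_vec(integrand, 0.0, 1.0)
deriv_integral = expm(H) @ integral.reshape(3, 3)

_, deriv_ref = expm_frechet(H, K)              # reference Frechet derivative from SciPy
print(np.linalg.norm(deriv_integral - deriv_ref))   # near the quadrature tolerance
```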

Applications to Linear Differential Equations

Homogeneous systems

The matrix exponential arises naturally as the solution operator for the homogeneous linear system of ordinary differential equations \dot{\mathbf{x}}(t) = A \mathbf{x}(t), where A is an n \times n constant matrix and \mathbf{x}(t) \in \mathbb{R}^n. Given an initial condition \mathbf{x}(0) = \mathbf{x}_0, the unique solution is \mathbf{x}(t) = e^{tA} \mathbf{x}_0. To verify this, consider the power series definition of the matrix exponential, e^{tA} = \sum_{k=0}^{\infty} \frac{(tA)^k}{k!}, which converges absolutely for all t \in \mathbb{R}. Differentiating term by term with respect to t gives \frac{d}{dt} e^{tA} = \sum_{k=1}^{\infty} \frac{k t^{k-1} A^k}{k!} = \sum_{k=1}^{\infty} \frac{t^{k-1} A^k}{(k-1)!} = A \sum_{k=0}^{\infty} \frac{t^k A^k}{k!} = A e^{tA}. Thus, \mathbf{x}(t) = e^{tA} \mathbf{x}_0 satisfies \dot{\mathbf{x}}(t) = A \mathbf{x}(t). Moreover, evaluating at t=0 yields e^{0 \cdot A} = I, the identity matrix, so \mathbf{x}(0) = I \mathbf{x}_0 = \mathbf{x}_0, confirming the initial condition. The uniqueness of this solution follows from the Picard-Lindelöf existence and uniqueness theorem applied to the system, as the right-hand side f(\mathbf{x}) = A \mathbf{x} is continuously differentiable (hence locally Lipschitz) on \mathbb{R}^n. The theorem guarantees a unique local solution, which extends to all t \in \mathbb{R} for linear systems since no finite escape time occurs. In the context of dynamical systems, e^{tA} serves as the time-t evolution operator, or flow map, \Phi(t, \mathbf{x}_0) = e^{tA} \mathbf{x}_0, which propagates initial states along integral curves of the linear vector field \mathbf{x} \mapsto A \mathbf{x}. This interpretation highlights its role in describing the complete phase flow of the system.
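
The flow-map interpretation suggests a direct numerical cross-check: propagate an initial state with e^{tA} and with a general-purpose ODE integrator. A minimal sketch, assuming NumPy/SciPy, with an illustrative stable matrix:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
x0 = np.array([1.0, 0.0])
t_final = 2.0

x_expm = expm(t_final * A) @ x0                        # flow map applied to the initial state

sol = solve_ivp(lambda t, x: A @ x, (0.0, t_final), x0, rtol=1e-10, atol=1e-12)
print(x_expm, sol.y[:, -1])                            # the two results agree to tolerance
```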

Homogeneous example

Consider the damped harmonic oscillator governed by the second-order equation x''(t) + 3x'(t) + 2x(t) = 0, where the coefficients correspond to a unit mass, damping constant 3, and spring constant 2. This equation models an overdamped system, where the damping exceeds the critical value, leading to exponential decay without oscillation. To solve this using the matrix exponential, rewrite the equation as a first-order system. Let \mathbf{y}(t) = \begin{pmatrix} x(t) \\ x'(t) \end{pmatrix}, so \mathbf{y}'(t) = A \mathbf{y}(t) with A = \begin{pmatrix} 0 & 1 \\ -2 & -3 \end{pmatrix}. The general solution is \mathbf{y}(t) = e^{tA} \mathbf{y}(0). The matrix A has eigenvalues \lambda_1 = -1 and \lambda_2 = -2, found from the characteristic equation \det(A - \lambda I) = \lambda^2 + 3\lambda + 2 = 0. Since the eigenvalues are distinct and real, A is diagonalizable: A = P D P^{-1} where D = \operatorname{diag}(-1, -2), P = \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix}, and P^{-1} = \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix}. Thus, e^{tA} = P e^{tD} P^{-1} = \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix} \begin{pmatrix} e^{-t} & 0 \\ 0 & e^{-2t} \end{pmatrix} \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix} = \begin{pmatrix} 2e^{-t} - e^{-2t} & e^{-t} - e^{-2t} \\ -2e^{-t} + 2e^{-2t} & -e^{-t} + 2e^{-2t} \end{pmatrix}. To verify, consider initial conditions x(0) = 1, x'(0) = 0, so \mathbf{y}(0) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. Then \mathbf{y}(t) = e^{tA} \mathbf{y}(0) = \begin{pmatrix} 2e^{-t} - e^{-2t} \\ -2e^{-t} + 2e^{-2t} \end{pmatrix}, and x(t) = 2e^{-t} - e^{-2t}. Differentiating gives x'(t) = -2e^{-t} + 2e^{-2t} and x''(t) = 2e^{-t} - 4e^{-2t}. Substituting into the original ODE yields x''(t) + 3x'(t) + 2x(t) = (2e^{-t} - 4e^{-2t}) + 3(-2e^{-t} + 2e^{-2t}) + 2(2e^{-t} - e^{-2t}) = 0, confirming the solution satisfies the equation. The initial conditions hold: x(0) = 2 - 1 = 1 and x'(0) = -2 + 2 = 0. The eigenvalues \lambda_1 = -1 < 0 and \lambda_2 = -2 < 0 imply that all solutions decay exponentially to zero as t \to \infty, indicating asymptotic stability of the equilibrium at the origin.
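
The closed-form e^{tA} derived above can be confirmed against a general-purpose routine at several times. A minimal sketch, assuming NumPy/SciPy:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])

def etA_closed_form(t):
    return np.array([[ 2*np.exp(-t) -   np.exp(-2*t),   np.exp(-t) -   np.exp(-2*t)],
                     [-2*np.exp(-t) + 2*np.exp(-2*t),  -np.exp(-t) + 2*np.exp(-2*t)]])

for t in (0.0, 0.5, 1.0, 3.0):
    print(t, np.allclose(etA_closed_form(t), expm(t * A)))   # True at every sampled time
```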

Inhomogeneous systems

The solution to the inhomogeneous linear system of equations \mathbf{x}'(t) = A \mathbf{x}(t) + \mathbf{f}(t) with \mathbf{x}(0) = \mathbf{x}_0, where A is a constant n \times n matrix and \mathbf{f}(t) is a given forcing function, combines the homogeneous solution with a particular term. The full solution is given by \mathbf{x}(t) = e^{tA} \mathbf{x}_0 + \int_0^t e^{(t-s)A} \mathbf{f}(s) \, ds, where e^{tA} is the matrix exponential. This formula, known as Duhamel's principle in the context of linear evolution equations, expresses the particular solution as the convolution of the forcing term with the fundamental solution e^{(t-s)A}. To derive this using the variation of constants method, assume a solution of the form \mathbf{x}(t) = e^{tA} \mathbf{u}(t), where \mathbf{u}(t) is a function to be determined. Differentiating yields \mathbf{x}'(t) = A e^{tA} \mathbf{u}(t) + e^{tA} \mathbf{u}'(t). Substituting into the original equation gives A e^{tA} \mathbf{u}(t) + e^{tA} \mathbf{u}'(t) = A e^{tA} \mathbf{u}(t) + \mathbf{f}(t), which simplifies to e^{tA} \mathbf{u}'(t) = \mathbf{f}(t). Multiplying both sides by e^{-tA} results in \mathbf{u}'(t) = e^{-tA} \mathbf{f}(t). Integrating from 0 to t and using the initial condition \mathbf{u}(0) = \mathbf{x}_0 produces \mathbf{u}(t) = \mathbf{x}_0 + \int_0^t e^{-sA} \mathbf{f}(s) \, ds. Thus, \mathbf{x}(t) = e^{tA} \left( \mathbf{x}_0 + \int_0^t e^{-sA} \mathbf{f}(s) \, ds \right) = e^{tA} \mathbf{x}_0 + \int_0^t e^{(t-s)A} \mathbf{f}(s) \, ds. The integral term is a convolution \left( e^{tA} * \mathbf{f} \right)(t), which inherits properties from the semigroup generated by A, such as causality: the value of \mathbf{x}(t) depends only on \mathbf{f}(s) for s \leq t, reflecting the forward propagation of influences in time. When \mathbf{f}(t) = \mathbf{b} is a constant vector and A is invertible, the integral can be evaluated explicitly as \int_0^t e^{(t-s)A} \, ds \, \mathbf{b} = A^{-1} \left( e^{tA} - I \right) \mathbf{b}, yielding the closed-form solution \mathbf{x}(t) = e^{tA} \mathbf{x}_0 + A^{-1} \left( e^{tA} - I \right) \mathbf{b}.
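
For constant forcing, the closed form above is easy to test against direct numerical integration. A minimal sketch, assuming NumPy/SciPy, with illustrative A, b, and x_0:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-2.0, -3.0]])      # invertible, so the closed form applies
b = np.array([0.0, 1.0])                       # constant forcing vector
x0 = np.array([1.0, 0.0])
t = 1.5

etA = expm(t * A)
x_closed = etA @ x0 + np.linalg.solve(A, (etA - np.eye(2)) @ b)   # e^{tA}x0 + A^{-1}(e^{tA}-I)b

sol = solve_ivp(lambda s, x: A @ x + b, (0.0, t), x0, rtol=1e-10, atol=1e-12)
print(x_closed, sol.y[:, -1])                  # should agree to solver tolerance
```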

Variation of parameters method

The variation of parameters method extends the classical technique for scalar equations to systems of linear ordinary differential equations with time-varying coefficients, addressing the inhomogeneous system \mathbf{x}'(t) = A(t) \mathbf{x}(t) + \mathbf{f}(t), where \mathbf{x}(t) and \mathbf{f}(t) are n-dimensional vector-valued functions and A(t) is an n \times n matrix-valued function. This approach assumes a fundamental matrix \Phi(t) for the corresponding homogeneous system \mathbf{x}'(t) = A(t) \mathbf{x}(t) is available, satisfying \Phi'(t) = A(t) \Phi(t) with \Phi(0) = I, the n \times n identity matrix. The method constructs a particular solution by varying the constants in the homogeneous solution, leading to an integral representation that incorporates the forcing term \mathbf{f}(t). To apply the method, posit a particular solution of the form \mathbf{x}_p(t) = \Phi(t) \mathbf{u}(t), where \mathbf{u}(t) is an unknown vector function. Differentiating yields \mathbf{x}_p'(t) = \Phi'(t) \mathbf{u}(t) + \Phi(t) \mathbf{u}'(t) = A(t) \Phi(t) \mathbf{u}(t) + \Phi(t) \mathbf{u}'(t). Substituting into the original equation gives A(t) \Phi(t) \mathbf{u}(t) + \Phi(t) \mathbf{u}'(t) = A(t) \Phi(t) \mathbf{u}(t) + \mathbf{f}(t), simplifying to \Phi(t) \mathbf{u}'(t) = \mathbf{f}(t). Thus, \mathbf{u}'(t) = \Phi(t)^{-1} \mathbf{f}(t), where \Phi(t) is invertible because it is a fundamental matrix. Integrating from 0 to t produces \mathbf{u}(t) = \int_0^t \Phi(s)^{-1} \mathbf{f}(s) \, ds, so the particular solution is \mathbf{x}_p(t) = \Phi(t) \int_0^t \Phi(s)^{-1} \mathbf{f}(s) \, ds. The general solution then becomes \mathbf{x}(t) = \Phi(t) \left( \mathbf{x}(0) + \int_0^t \Phi(s)^{-1} \mathbf{f}(s) \, ds \right). The computational algorithm follows these steps when \Phi(t) is known: (1) solve the homogeneous system to obtain \Phi(t) with \Phi(0) = I; (2) compute \Phi(t)^{-1} at the required points (or solve the corresponding linear systems); (3) form \mathbf{u}'(t) = \Phi(t)^{-1} \mathbf{f}(t); (4) numerically or analytically integrate \mathbf{u}(t) = \int_0^t \mathbf{u}'(s) \, ds; (5) assemble the solution \mathbf{x}(t) = \Phi(t) (\mathbf{x}(0) + \mathbf{u}(t)). This process is particularly useful in applications where the homogeneous solution is tractable, such as through series expansions or known closed forms. When the coefficient matrix A(t) is constant, the fundamental matrix simplifies to the matrix exponential \Phi(t) = e^{At}, connecting the method directly to properties of matrix exponentials; in this case, the integral expression reduces to the Duhamel integral for constant-coefficient systems. Compared to the method of undetermined coefficients, variation of parameters offers greater flexibility for non-constant A(t) and arbitrary continuous \mathbf{f}(t), without requiring specific functional forms for the forcing term, though it demands prior knowledge or computation of \Phi(t). This makes it essential in control engineering and time-dependent physical models where coefficients vary with time.
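
The five-step algorithm carries over directly to code for a genuinely time-varying A(t). The sketch below, assuming NumPy/SciPy, uses hypothetical illustrative choices of A(t) and f(t), obtains Φ(t) by integrating the matrix ODE, forms u(t) by quadrature, and cross-checks the assembled solution against direct integration of the inhomogeneous system:

```python
import numpy as np
from scipy.integrate import solve_ivp, quad_vec

def A(t):                                       # illustrative time-varying coefficient matrix
    return np.array([[0.0, 1.0], [-(1.0 + 0.5 * np.sin(t)), -0.3]])

def f(t):                                       # illustrative forcing term
    return np.array([0.0, np.cos(2.0 * t)])

x0 = np.array([1.0, 0.0])
T = 5.0

# Step 1: fundamental matrix Phi(t) from Phi' = A(t) Phi, Phi(0) = I (columns integrated together).
phi_rhs = lambda t, y: (A(t) @ y.reshape(2, 2)).ravel()
phi_sol = solve_ivp(phi_rhs, (0.0, T), np.eye(2).ravel(), dense_output=True,
                    rtol=1e-10, atol=1e-12)
Phi = lambda t: phi_sol.sol(t).reshape(2, 2)

# Steps 2-4: u(T) = int_0^T Phi(s)^{-1} f(s) ds (a linear solve replaces the explicit inverse).
integrand = lambda s: np.linalg.solve(Phi(s), f(s))
u_T, _ = quad_vec(integrand, 0.0, T)

# Step 5: assemble x(T) = Phi(T) (x0 + u(T)).
x_T = Phi(T) @ (x0 + u_T)

# Cross-check against direct integration of x' = A(t) x + f(t).
ref = solve_ivp(lambda t, x: A(t) @ x + f(t), (0.0, T), x0, rtol=1e-10, atol=1e-12)
print(x_T, ref.y[:, -1])                        # the two results agree to solver tolerance
```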

Computing the Matrix Exponential

Diagonalizable matrices

A matrix A \in \mathbb{C}^{n \times n} is diagonalizable if there exists an invertible matrix P and a diagonal matrix D = \operatorname{diag}(\lambda_1, \dots, \lambda_n) such that A = P D P^{-1}, where the \lambda_i are the eigenvalues of A. In this case, the matrix exponential \exp(A) admits a closed form derived from the spectral decomposition: \exp(A) = P \exp(D) P^{-1}, where \exp(D) = \operatorname{diag}(e^{\lambda_1}, \dots, e^{\lambda_n}) applies the scalar exponential entrywise to the diagonal elements. This formula follows directly from the power series definition of the matrix exponential, as the series commutes with similarity transformations and reduces to the scalar case on the diagonal. The condition for diagonalizability requires that A possesses a complete set of n linearly independent eigenvectors, which form the columns of P. This is guaranteed if A has n distinct eigenvalues, since distinct eigenvalues yield linearly independent eigenvectors; however, diagonalizability can also hold when eigenvalues are repeated, provided the geometric multiplicity (dimension of the eigenspace) equals the algebraic multiplicity (from the characteristic polynomial) for each eigenvalue. To compute \exp(A) via this approach, first solve the eigenvalue problem to obtain the eigenvalues \lambda_i and corresponding eigenvectors v_i, forming D from the \lambda_i and P from the v_i as columns. Then, compute \exp(D) by exponentiating each diagonal entry, invert P to obtain P^{-1}, and finally assemble \exp(A) = P \exp(D) P^{-1}. This method is exact in principle but can suffer from numerical instability in finite precision, as the accuracy of \exp(A) is sensitive to perturbations in the computed eigenvectors; specifically, the relative error is bounded by a factor involving the condition number \kappa(P) = \|P\| \|P^{-1}\|, which grows large if the eigenvectors are ill-conditioned (e.g., when eigenvalues are close). For non-normal matrices with clustered eigenvalues, this conditioning can lead to substantial loss of accuracy, making alternative methods preferable in practice.
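
The recipe translates into a few lines of code. A minimal sketch, assuming NumPy/SciPy; the random test matrix has distinct eigenvalues with probability one, and the accuracy of the result is limited by \kappa(P):

```python
import numpy as np
from scipy.linalg import expm

def expm_eig(A):
    """Matrix exponential via eigendecomposition; assumes A is diagonalizable."""
    w, P = np.linalg.eig(A)
    return (P * np.exp(w)) @ np.linalg.inv(P)   # P diag(e^w) P^{-1}; may carry a tiny imaginary part

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
print(np.linalg.norm(expm_eig(A) - expm(A)))    # small, but limited by kappa(P)
```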

Diagonalizable example

To illustrate the computation of the matrix exponential for a diagonalizable matrix, consider the 2×2 matrix A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. This matrix has distinct eigenvalues \lambda_1 = \frac{5 + \sqrt{33}}{2} and \lambda_2 = \frac{5 - \sqrt{33}}{2}, confirming it is diagonalizable by the criterion that matrices with distinct eigenvalues are diagonalizable. The corresponding eigenvectors are v_1 = \begin{pmatrix} 2 \\ \frac{3 + \sqrt{33}}{2} \end{pmatrix} for \lambda_1 and v_2 = \begin{pmatrix} 2 \\ \frac{3 - \sqrt{33}}{2} \end{pmatrix} for \lambda_2, obtained by solving (A - \lambda_i I) v_i = 0 for each i. Let s = \sqrt{33}, \alpha = \frac{3 + s}{2}, and \beta = \frac{3 - s}{2}. The matrix of eigenvectors is then P = \begin{pmatrix} 2 & 2 \\ \alpha & \beta \end{pmatrix}, with determinant \det(P) = -2s. The inverse is P^{-1} = \frac{1}{-2s} \begin{pmatrix} \beta & -2 \\ -\alpha & 2 \end{pmatrix}. The diagonal matrix of exponentials is \exp(D) = \operatorname{diag}(e^{\lambda_1}, e^{\lambda_2}). Thus, e^A = P \begin{pmatrix} e^{\lambda_1} & 0 \\ 0 & e^{\lambda_2} \end{pmatrix} P^{-1}. Let e_1 = e^{\lambda_1} and e_2 = e^{\lambda_2}. Performing the matrix multiplication yields the explicit entries: e^A = \begin{pmatrix} \frac{e_2 \alpha - e_1 \beta}{s} & \frac{2(e_1 - e_2)}{s} \\[1em] \frac{3(e_1 - e_2)}{s} & \frac{\alpha e_1 - \beta e_2}{s} \end{pmatrix}. Note that \alpha \beta = -6 and \alpha - \beta = s, which follow from the quadratic relations of the eigenvalues. To verify, numerical evaluation gives \lambda_1 \approx 5.372, \lambda_2 \approx -0.372, e_1 \approx 215.35, e_2 \approx 0.689, \alpha \approx 4.372, and \beta \approx -1.372, so that e^A \approx \begin{pmatrix} 51.97 & 74.74 \\ 112.10 & 164.07 \end{pmatrix}. The partial sums of the power series e^A = \sum_{k=0}^\infty \frac{A^k}{k!} converge to this value; for instance, the sum up to k=3 is approximately \begin{pmatrix} 11.67 & 16.00 \\ 24.00 & 35.67 \end{pmatrix}, and further terms approach the diagonalized result. Additionally, A e^A = e^A A holds numerically to machine precision, as expected since e^A commutes with A. Regarding numerical conditioning, the approach is stable here because the eigenvalue separation |\lambda_1 - \lambda_2| = s \approx 5.745 is large, yielding a modest condition number \kappa_2(P) \approx 2 for this choice of eigenvector scaling, which limits error propagation in the computation of e^A. In general, closer eigenvalues increase \kappa(P), amplifying errors in P and P^{-1}, though the underlying problem of computing e^A for this particular non-normal A remains well-conditioned.
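
The symbolic entries and the quoted numerical values can be reproduced directly. A minimal sketch, assuming NumPy/SciPy:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0], [3.0, 4.0]])
s = np.sqrt(33.0)
l1, l2 = (5 + s) / 2, (5 - s) / 2
alpha, beta = (3 + s) / 2, (3 - s) / 2
e1, e2 = np.exp(l1), np.exp(l2)

closed = np.array([[(e2 * alpha - e1 * beta) / s, 2 * (e1 - e2) / s],
                   [3 * (e1 - e2) / s,            (alpha * e1 - beta * e2) / s]])

print(np.allclose(closed, expm(A)))    # True: the closed form matches expm
print(closed)                          # approximately [[51.97, 74.74], [112.10, 164.07]]
```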

Nilpotent matrices

A nilpotent matrix A satisfies A^k = 0 for some positive integer k, where the minimal such k is known as the index of nilpotency. For such matrices, the matrix exponential \exp(A) admits an exact evaluation via the power series definition, which terminates after finitely many terms: \exp(A) = \sum_{m=0}^{k-1} \frac{A^m}{m!}, as all higher powers of A vanish. This finite sum is a polynomial in A of degree at most k-1, facilitating straightforward computation without approximation. A canonical example is the nilpotent Jordan block N of size n \times n, featuring 1's on the superdiagonal and zeros elsewhere, for which N^n = 0. The exponential is then \exp(N) = \sum_{m=0}^{n-1} \frac{N^m}{m!}, where each N^m (for m < n) places 1's on the m-th superdiagonal. Consequently, \exp(N) is an upper-triangular matrix with 1's on the main diagonal and \frac{1}{m!} on the m-th superdiagonal. The exponential of a nilpotent matrix N is always invertible, since \det(\exp(N)) = \exp(\operatorname{tr}(N)) = \exp(0) = 1. Furthermore, \exp(N) is unipotent, possessing all eigenvalues equal to 1, as the eigenvalues of N are zero and the eigenvalues of \exp(N) are the exponentials of those of N. In the theory of algebraic and Lie groups, unipotent elements of the general linear group \mathrm{GL}(n, \mathbb{C}) are exactly the exponentials of nilpotent elements of its Lie algebra \mathfrak{gl}(n, \mathbb{C}), with the exponential map providing a bijective correspondence between nilpotent and unipotent matrices.
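
Because the series terminates, the exponential of a nilpotent matrix can be computed exactly from a finite sum. A minimal sketch, assuming NumPy/SciPy, using the 4×4 nilpotent Jordan block:

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

n = 4
N = np.diag(np.ones(n - 1), k=1)           # nilpotent Jordan block: 1's on the superdiagonal, N^n = 0

expN = sum(np.linalg.matrix_power(N, m) / factorial(m) for m in range(n))
print(np.allclose(expN, expm(N)))           # True: the finite sum is exact
print(expN)                                 # 1/m! appears on the m-th superdiagonal
```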

General matrices via Jordan form

For a general square matrix A \in \mathbb{C}^{n \times n}, the matrix exponential \exp(A) can be computed using the Jordan canonical form, which decomposes A as A = P J P^{-1}, where P is invertible and J is a block-diagonal matrix consisting of Jordan blocks. Since the matrix exponential is defined by a power series, it respects similarity transformations, yielding \exp(A) = P \exp(J) P^{-1}. Each Jordan block in J corresponds to an eigenvalue \lambda of A, and \exp(J) is computed blockwise. Consider a Jordan block of size m \times m given by J_\lambda = \lambda I_m + N, where N is the nilpotent Jordan block with 1's on the superdiagonal and zeros elsewhere, satisfying N^m = 0. The exponential of this block is \exp(J_\lambda) = e^\lambda \exp(N) = e^\lambda \sum_{k=0}^{m-1} \frac{N^k}{k!}, as higher powers of N vanish. The nilpotent part \exp(N) is an upper-triangular Toeplitz matrix with \frac{1}{k!} on the k-th superdiagonal. To implement this approach, first compute the eigenvalues of A using standard algorithms such as the QR algorithm (or the QZ method for generalized eigenvalue problems). Then, determine the Jordan chains by solving (A - \lambda I)v_k = v_{k-1} for generalized eigenvectors v_k starting from actual eigenvectors v_1, and assemble the columns of P from these chains for each block. The full P and J are formed by collecting the chains across all eigenvalues and blocks. However, this method is often numerically unstable in practice, as the condition number of P can be extremely large for matrices with nearly defective eigenvalues or clustered spectra, leading to severe rounding errors. For this reason, more robust alternatives like the scaling and squaring method are preferred in numerical software.
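
The blockwise recipe can be exercised on a matrix with a known Jordan structure. A minimal sketch, assuming NumPy/SciPy; the matrix A is manufactured as P J P^{-1} from an arbitrary invertible P, so the Jordan data are known by construction rather than computed (which, as noted, is numerically delicate):

```python
import numpy as np
from math import factorial
from scipy.linalg import expm, block_diag

def expm_jordan_block(lam, m):
    """exp of an m x m Jordan block: e^lam * sum_{k<m} N^k / k!."""
    N = np.diag(np.ones(m - 1), k=1)
    return np.exp(lam) * sum(np.linalg.matrix_power(N, k) / factorial(k) for k in range(m))

# Known Jordan structure: one 2x2 block for eigenvalue 2, one 1x1 block for eigenvalue -1.
J = block_diag(np.array([[2.0, 1.0], [0.0, 2.0]]), np.array([[-1.0]]))
rng = np.random.default_rng(7)
P = rng.standard_normal((3, 3))             # arbitrary invertible similarity
A = P @ J @ np.linalg.inv(P)

expJ = block_diag(expm_jordan_block(2.0, 2), expm_jordan_block(-1.0, 1))
expA = P @ expJ @ np.linalg.inv(P)
print(np.linalg.norm(expA - expm(A)))       # small; the discrepancy grows with kappa(P)
```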

Special cases: Projections and rotations

A projection matrix P satisfies P^2 = P, making it idempotent. For such matrices, the matrix exponential admits a simple closed form: \exp(tP) = I + (e^t - 1)P. This formula can be derived using the power series definition of the matrix exponential, \exp(tP) = \sum_{n=0}^\infty \frac{(tP)^n}{n!}. Since P^n = P for all n \geq 1, the series simplifies to I + P \sum_{n=1}^\infty \frac{t^n}{n!} = I + (e^t - 1)P. Alternatively, the result follows from the minimal polynomial of P, which divides \lambda(\lambda - 1). Any analytic function f applied to P can be expressed as f(P) = f(0)(I - P) + f(1)P. Setting f(\lambda) = e^{t\lambda} yields \exp(tP) = e^{0}(I - P) + e^{t}P = I + (e^t - 1)P. Skew-symmetric matrices generate rotations via the matrix exponential. If K is skew-symmetric (K^T = -K), then \exp(K) is an orthogonal matrix with determinant 1, belonging to the special orthogonal group SO(n). In two dimensions, rotations are explicitly given by the exponential of a scaled skew-symmetric matrix. Consider K = \theta \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, where \theta is the rotation angle. Then, \exp(K) = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}. This follows from the power series expansion, where K^2 = -\theta^2 I, K^3 = -\theta^2 K, and higher powers alternate similarly, mirroring the Taylor series for cosine and sine: \exp(K) = (\cos \theta) I + \frac{\sin \theta}{\theta} K (for \theta \neq 0).
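
Both closed forms are immediate to confirm. A minimal sketch, assuming NumPy/SciPy, with an illustrative oblique projection and an arbitrary rotation angle:

```python
import numpy as np
from scipy.linalg import expm

# Projection: P^2 = P, so exp(tP) = I + (e^t - 1) P.
P = np.array([[1.0, 1.0], [0.0, 0.0]])      # idempotent (an oblique projection)
t = 0.7
print(np.allclose(expm(t * P), np.eye(2) + (np.exp(t) - 1) * P))   # True

# Rotation: exp of a scaled 2x2 skew-symmetric generator is rotation by theta.
theta = 0.9
K = theta * np.array([[0.0, -1.0], [1.0, 0.0]])
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(expm(K), R))                                     # True
```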

Series-based methods

One practical approach to computing the matrix exponential involves truncating the power series expansion, which converges absolutely for all finite matrices. The truncated approximation is \exp(A) \approx \sum_{k=0}^{N} \frac{A^k}{k!}, where N is chosen sufficiently large to achieve the desired accuracy. The remainder R_N = \exp(A) - \sum_{k=0}^{N} \frac{A^k}{k!} satisfies the bound \|R_N\| \leq \frac{\|A\|^{N+1}}{(N+1)!} \exp(\|A\|), derived from the remainder term of the scalar exponential series applied to the matrix norm. This method is particularly effective when \|A\| is small, as fewer terms are needed for a given accuracy, or when A has structured sparsity that allows efficient computation of successive matrix powers. To accelerate convergence of the series, techniques such as the Euler-Maclaurin formula can be adapted from the scalar case to provide asymptotic expansions that improve the rate of convergence of the partial sums. For matrices, analogous acceleration methods have been explored to reduce the number of terms required, though they are less commonly implemented than direct truncation because of the added complexity of matrix arithmetic. In cases involving a shift, such as computing \exp(zI - B) where B is invertible and z is a scalar, a Laurent-type expansion can be employed around z = \infty, incorporating negative powers of z to capture the behavior for large |z|. This representation facilitates evaluation when the spectrum of B leads to rapid growth or decay in the standard series, though it requires careful handling of the principal part to ensure convergence.
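
The remainder bound can be used to pick the truncation order N a priori. A minimal sketch, assuming NumPy/SciPy, with an illustrative small-norm matrix:

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

def expm_taylor(A, tol=1e-12):
    """Truncate the series once the a priori bound ||A||^{N+1}/(N+1)! * e^{||A||} drops below tol."""
    normA = np.linalg.norm(A, 2)
    N = 0
    while normA ** (N + 1) / factorial(N + 1) * np.exp(normA) > tol:
        N += 1
    result, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, N + 1):
        term = term @ A / k
        result += term
    return result, N

A = np.array([[0.1, 0.4], [-0.3, 0.2]])        # small norm: a handful of terms suffices
approx, N = expm_taylor(A)
print(N, np.linalg.norm(approx - expm(A)))     # modest N, error below the requested tolerance
```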

Sylvester's formula approach

Sylvester's formula offers an explicit expression for the matrix exponential of a diagonalizable matrix with distinct eigenvalues, derived from the holomorphic functional calculus via contour integration. For a square matrix A that is diagonalizable with distinct eigenvalues \lambda_1, \dots, \lambda_n, the matrix exponential is given by \exp(A) = \sum_{i=1}^n \exp(\lambda_i) \, Z_i, where each Z_i is the spectral projection matrix (Frobenius covariant) onto the eigenspace corresponding to \lambda_i, defined as Z_i = \prod_{\substack{j=1 \\ j \neq i}}^n \frac{A - \lambda_j I}{\lambda_i - \lambda_j}. These projections satisfy Z_i Z_k = \delta_{ik} Z_i and \sum_i Z_i = I, ensuring the representation aligns with the spectral decomposition A = \sum_i \lambda_i Z_i. This formula arises from the more general contour integral representation of the holomorphic functional calculus. For an analytic function f defined on a domain containing the spectrum of A, f(A) = \frac{1}{2\pi i} \oint_\Gamma f(z) (zI - A)^{-1} \, dz, where \Gamma is a positively oriented contour enclosing the spectrum of A but no other singularities of the integrand. Specializing to f(z) = \exp(z), which is entire, yields \exp(A) = \frac{1}{2\pi i} \oint_\Gamma \exp(z) (zI - A)^{-1} \, dz. By the residue theorem, since the resolvent (zI - A)^{-1} has simple poles at the eigenvalues \lambda_i with residues Z_i, the integral reduces to the sum \sum_i \exp(\lambda_i) Z_i, recovering Sylvester's formula. The contour-integral approach is particularly advantageous for matrix functions that have poles or branch cuts, as the contour can be selected to enclose the spectrum while avoiding singularities of f, allowing evaluation even when direct series expansions or other methods fail. For the exponential specifically, the formula provides a closed-form evaluation when the eigenvalues are known, avoiding full diagonalization while using only scalar exponentials at the eigenvalues.
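
Sylvester's formula is short to implement once the eigenvalues are available. A minimal sketch, assuming NumPy/SciPy and distinct eigenvalues (the formula is not applicable otherwise):

```python
import numpy as np
from scipy.linalg import expm

def expm_sylvester(A):
    """Sylvester's formula sum_i e^{lam_i} Z_i; assumes the eigenvalues of A are distinct."""
    lam = np.linalg.eigvals(A)
    n = A.shape[0]
    result = np.zeros((n, n), dtype=complex)
    for i in range(n):
        Z = np.eye(n, dtype=complex)            # build the Frobenius covariant Z_i
        for j in range(n):
            if j != i:
                Z = Z @ (A - lam[j] * np.eye(n)) / (lam[i] - lam[j])
        result += np.exp(lam[i]) * Z
    return result

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.allclose(expm_sylvester(A).real, expm(A)))   # True (imaginary part is roundoff)
```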

Numerical and Advanced Computation

Scaling and squaring method

The scaling and squaring method is a classical numerical technique for computing the exponential e^A of an n \times n matrix A, particularly effective for matrices with large norm, where direct series summation would be inefficient. The approach leverages the identity e^A = (e^{A/2^m})^{2^m} for m \geq 0, allowing the problem to be reduced to approximating the exponential of a scaled matrix with small norm, followed by repeated squaring to restore the original scale. To implement the algorithm, first determine the smallest integer m such that \|2^{-m} A\| \leq 1, where \|\cdot\| denotes a suitable matrix norm (often the 1-norm for implementation convenience). Then, compute an approximation E \approx e^{2^{-m} A} using a truncated Taylor series or a Padé approximant suited to small arguments, as the reduced norm ensures rapid convergence. Finally, iteratively square E exactly m times via matrix multiplication to obtain the approximation X = E^{2^m} \approx e^A. This process minimizes errors from the initial approximation while distributing computational effort across the squaring steps. A key advantage of the method lies in its backward error analysis, which demonstrates backward stability: the computed X satisfies X = e^{A + \Delta A} for some \Delta A with \|\Delta A\| = O(\epsilon \|A\|), where \epsilon is the unit roundoff of the floating-point arithmetic, assuming exact arithmetic in the squaring phase. This backward stability ensures that the result is accurate for a slightly perturbed input, making the method robust for practical computations even with ill-conditioned matrices. The analysis relies on bounding truncation errors in the approximant and their propagation through squaring, with optimal choices of m and approximant degree balancing forward and backward errors. The scaling and squaring algorithm is widely implemented in numerical software libraries, including MATLAB's expm function, which employs a variant with Padé approximants and balancing for enhanced accuracy. For dense matrices, the computational cost is O(n^3 \log \|A\|), arising from approximately m + c matrix multiplications (where c accounts for the approximant evaluation and m \approx \log_2 \|A\| governs the squaring iterations), each costing O(n^3) via standard algorithms such as those in BLAS/LAPACK. This makes it efficient for moderate-sized dense matrices whose norm is not excessively large.
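
The structure of the algorithm is compact enough to sketch directly. The version below, assuming NumPy/SciPy, uses a truncated Taylor series on the scaled matrix purely for simplicity; production implementations such as expm use Padé approximants at this step:

```python
import numpy as np
from scipy.linalg import expm

def expm_scaling_squaring(A, taylor_terms=12):
    """Scale so that ||A / 2^m|| <= 1, approximate exp on the scaled matrix, square m times."""
    norm = np.linalg.norm(A, 1)
    m = int(np.ceil(np.log2(norm))) if norm > 1 else 0
    B = A / (2 ** m)
    E, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, taylor_terms + 1):        # truncated Taylor series on the scaled matrix
        term = term @ B / k
        E += term
    for _ in range(m):                          # undo the scaling by repeated squaring
        E = E @ E
    return E

A = 10.0 * np.random.default_rng(8).standard_normal((4, 4))   # sizeable norm
rel_err = np.linalg.norm(expm_scaling_squaring(A) - expm(A)) / np.linalg.norm(expm(A))
print(rel_err)                                  # small relative error
```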

Padé approximant methods

Padé approximants provide a rational approximation to the matrix exponential that matches the Taylor expansion to a higher order than a polynomial of the same degree, offering improved accuracy and stability compared to polynomial truncation. Specifically, the [m/m] diagonal Padé approximant to the exponential \exp(x) is a rational function r_{m,m}(x) = p_m(x)/q_m(x), where p_m(x) and q_m(x) are polynomials of degree at most m such that \exp(x) - r_{m,m}(x) = O(x^{2m+1}) as x \to 0. For a matrix A, this extends to \exp(A) \approx p_m(A) q_m(A)^{-1}, where the polynomials are applied via matrix powers and addition. This rational form is particularly advantageous for numerical computation because it converges faster than a polynomial approximation of comparable degree and remains better behaved for larger arguments than a truncated Taylor series. The inversion q_m(A)^{-1} is typically not formed explicitly; instead, the approximant is obtained by solving the linear system q_m(A) X = p_m(A), with the core cost being efficient polynomial evaluation in A. For practical use in double precision arithmetic, optimal [m/m] orders are selected based on the matrix norm \|A\|; common choices include [3/3] for small norms and [13/13] for higher accuracy requirements, as these balance computational cost with relative error near machine epsilon for modest \|A\|. These orders ensure the approximation error is dominated by roundoff rather than truncation for typical problems. Padé methods are often combined with the scaling and squaring technique: first, scale A by 2^{-s} to reduce its norm to O(1), compute the [m/m] Padé approximant of the scaled matrix, then square the result s times to recover \exp(A). This hybrid approach minimizes both approximation error and roundoff propagation, with backward error bounds showing stability even for matrices with widely varying eigenvalues. Compared to truncated Taylor series methods, Padé approximants exhibit superior stability for stiff problems or matrices with large-magnitude eigenvalues, as the denominator effectively damps oscillatory components without amplifying errors during evaluation. Error analysis confirms that the relative backward error is O(\epsilon) times a modest factor, making the approach reliable across a broad range of matrix classes. In modern implementations, such as MATLAB's expm function, Higham's refined scaling and squaring algorithm with adaptive Padé orders (selecting from [3/3], [5/5], [7/7], [9/9], or [13/13] based on \|A\| and the unit roundoff) is employed, achieving high accuracy with O(n^3) complexity for n \times n dense matrices.
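
The [m/m] coefficients have a simple closed form, so a basic (unscaled) diagonal Padé evaluation fits in a few lines. A minimal sketch, assuming NumPy/SciPy and a modest-norm illustrative matrix; robust implementations combine this with scaling and squaring as described above:

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

def pade_expm(A, m=6):
    """Diagonal [m/m] Pade approximant to exp(A); accurate when ||A|| is modest."""
    n = A.shape[0]
    c = [factorial(2 * m - j) * factorial(m) /
         (factorial(2 * m) * factorial(j) * factorial(m - j)) for j in range(m + 1)]
    P, Q, Ak = np.zeros((n, n)), np.zeros((n, n)), np.eye(n)
    for j in range(m + 1):                       # build p_m(A) and q_m(A) = p_m(-A)
        P += c[j] * Ak
        Q += c[j] * ((-1) ** j) * Ak
        Ak = Ak @ A
    return np.linalg.solve(Q, P)                 # q_m(A)^{-1} p_m(A)

A = np.array([[0.5, 1.2], [-0.7, 0.3]])
print(np.linalg.norm(pade_expm(A) - expm(A)))    # small for this modest-norm example
```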

Matrix-matrix exponentials

In matrix theory, exponentials of matrices constructed from multiple component matrices often admit closed-form expressions that exploit structural properties, such as block forms or Kronecker products. These identities are particularly valuable in applications requiring efficient computation or analysis of large-scale systems without explicit formation of the full matrix. Consider the 2×2 block matrix M = \begin{pmatrix} 0 & B \\ C & 0 \end{pmatrix}, where B and C are square matrices of the same dimension that commute (BC = CB) and for which the matrix square root \sqrt{BC} is defined. Under these conditions, the matrix exponential is given by \exp(M) = \begin{pmatrix} \cosh(\sqrt{BC}) & B (\sqrt{BC})^{-1} \sinh(\sqrt{BC}) \\ C (\sqrt{BC})^{-1} \sinh(\sqrt{BC}) & \cosh(\sqrt{BC}) \end{pmatrix}, where \cosh and \sinh denote the matrix hyperbolic cosine and sine functions, respectively. This formula generalizes the scalar case: the even powers of M fill the diagonal blocks with powers of BC, while the odd powers fill the off-diagonal blocks, reproducing the series for \cosh and \sinh. If B and C do not commute, the exponential requires more general methods, such as Jordan form decomposition. In special cases where additional structure exists (e.g., B and C positive semidefinite), the diagonal blocks can be related to products involving square roots of individual exponentials, such as \exp(B)^{1/2} \exp(C)^{1/2}, under compatibility conditions on the eigenvalues. A fundamental identity arises for Kronecker sums, which combine matrices via tensor products. For an m \times m matrix A and an n \times n matrix B, the Kronecker sum is defined as A \oplus B = A \otimes I_n + I_m \otimes B. The matrix exponential satisfies \exp(A \oplus B) = \exp(A) \otimes \exp(B). This follows from the power series definition together with the fact that A \otimes I_n and I_m \otimes B commute, giving \exp(A \oplus B) = \exp(A \otimes I_n)\exp(I_m \otimes B) = (\exp(A) \otimes I_n)(I_m \otimes \exp(B)) = \exp(A) \otimes \exp(B); no commutativity assumption on A and B themselves is required. These multi-matrix exponentials find key applications in control theory and the numerical solution of matrix equations. For the continuous Lyapunov equation A X + X A^T + Q = 0 with stable (Hurwitz) A, vectorization with the column-stacking convention yields (I \otimes A + A \otimes I) \operatorname{vec}(X) = -\operatorname{vec}(Q), i.e., (A \oplus A) \operatorname{vec}(X) = -\operatorname{vec}(Q). The unique solution is X = \int_0^\infty \exp(A t) Q \exp(A^T t) \, dt, linking the equation directly to Kronecker-structured exponentials for verification or low-rank approximation. In bilinear systems of the form \dot{x} = A x + \sum_{i=1}^m N_i x u_i, where the u_i are inputs, the state transition maps involve flows generated by Lie brackets of matrices, and discretized or approximated solution schemes can lead to exponentials of Kronecker-structured operators such as A \oplus N_i. Computing such exponentials efficiently avoids forming the potentially huge mn \times mn Kronecker-sum matrix. Vectorization transforms the problem into a large linear system, solvable via direct methods like Bartels-Stewart for moderate sizes, but for large dimensions, structure-exploiting algorithms are essential. These include the mixed-product property of the Kronecker product: if v = \operatorname{vec}(V) is the column-stacked vectorization of an n \times m matrix V, then \exp(A \oplus B)\, v = \operatorname{vec}\big( \exp(B)\, V \exp(A)^T \big), computable by applying the smaller exponentials \exp(A) and \exp(B) to the factors of V without tensor expansion.
For block matrices like M above, the commuting case permits direct evaluation of the block exponential using Padé approximants applied to \sqrt{BC}, scaled to the block size. Iterative methods, such as the alternating direction implicit (ADI) scheme, further exploit the separable Kronecker structure for Lyapunov-related computations, achieving rapid convergence under suitable shift choices.
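
The Kronecker-sum identity, the vec trick, and the Lyapunov connection can all be verified on small examples. A minimal sketch, assuming NumPy/SciPy (scipy.linalg.solve_continuous_lyapunov solves A X + X A^H = Q); the matrices are random illustrative choices, and A is shifted to make it Hurwitz:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov
from scipy.integrate import quad_vec

rng = np.random.default_rng(9)
A = rng.standard_normal((3, 3))                 # m x m
B = rng.standard_normal((2, 2))                 # n x n
Im, In = np.eye(3), np.eye(2)

# Kronecker sum identity: exp(A (+) B) = exp(A) (x) exp(B).
K = np.kron(A, In) + np.kron(Im, B)
print(np.allclose(expm(K), np.kron(expm(A), expm(B))))             # True

# vec trick (column-stacking): exp(A (+) B) vec(V) = vec(exp(B) V exp(A)^T) for V of size n x m.
V = rng.standard_normal((2, 3))
lhs = expm(K) @ V.flatten(order="F")
rhs = (expm(B) @ V @ expm(A).T).flatten(order="F")
print(np.allclose(lhs, rhs))                                       # True

# Lyapunov connection: for Hurwitz A, X = int_0^inf e^{At} Q e^{A^T t} dt solves A X + X A^T + Q = 0.
A_stable = A - 10.0 * Im                        # shift the spectrum into the left half-plane
Q = Im
X = solve_continuous_lyapunov(A_stable, -Q)     # SciPy solves A X + X A^H = Q, so pass -Q
print(np.allclose(A_stable @ X + X @ A_stable.T + Q, np.zeros((3, 3))))   # True

integrand = lambda t: (expm(A_stable * t) @ Q @ expm(A_stable.T * t)).ravel()
X_int, _ = quad_vec(integrand, 0.0, 5.0)        # the integrand decays fast, so a finite horizon suffices
print(np.allclose(X_int.reshape(3, 3), X, atol=1e-8))                     # True
```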

  30. [30]
    one-parameter subgroup and geodesics on Lie group - MathOverflow
    Nov 22, 2011 · Given a Matrix Lie Group, I would like to know if the one-parameter subgroups (which can be written as exptX) are the same as the geodesics ( ...Geodesics equation on Lie groups with left invariant metricsTotally geodesic subgroups in Lie groups - MathOverflowMore results from mathoverflow.net
  31. [31]
    [PDF] Notes on the Riemannian Geometry of Lie Groups
    Familiar examples of Lie groups include Rn under addition and the elementary matrix groups (GL(n,R), SL(n,R), O(n), U(n), etc.). In fact, all of the matrix ...Missing: radius | Show results with:radius
  32. [32]
    [PDF] The Magnus expansion and some of its applications - arXiv
    Oct 30, 2008 · Lemma 2 The derivative of a matrix exponential can be written alternatively as. (a) d dt exp(Ω(t)) = d expΩ(t)(Ω′(t)) exp(Ω(t)),. (33). (b) d.
  33. [33]
    [PDF] inhomogeneous linear systems of differential equations
    The general solution of the inhomogeneous system of equations (1) is x(t) = xh(t) + xp(t), where xh(t) is the general solution to the homogeneous equation, and ...
  34. [34]
    [PDF] Notes on Variation of Parameters for Nonhomogeneous Linear ...
    Variation of Parameters Consider differential equation x0 = P(t)x + f(t), where xc = c1x1(t) + ··· + cnxn(t). The fundamental matrix of the solutions is. Φ(t) ...<|control11|><|separator|>
  35. [35]
    Diagonalization
    Diagonalizable matrices are similar if and only if they have the same characteristic polynomial, or equivalently, the same eigenvalues with the same algebraic ...
  36. [36]
    Functions of Matrices: Theory and Computation
    Detailed treatment of the matrix sign function, matrix roots, the polar decomposition, and transcendental matrix functions (exponential, logarithm, cosine, sine) ...
  37. [37]
    [PDF] Unit 16: Diagonalization
    16.9. How do we decide whether a matrix is diagonalizable? Theorem: A is diagonalizable ⇔ malg(λ) = mgeom(λ) for all eigenvalues. If A is diagonalizable, then ...
  38. [38]
    [PDF] n × n matrix. A is diagonalizable if and only if it has eigenvectors
    [B'] If A is an n×n matrix with n distinct eigenvalues, then A is diagonalizable. Fact. If one chooses linearly independent sets of eigenvectors ...
  39. [39]
    EIG-0050: Diagonalizable Matrices and Multiplicity - Ximera
    Let be an matrix . Then is diagonalizable if and only if for each eigenvalue of , the algebraic multiplicity of is equal to the geometric multiplicity of . ...
  40. [40]
    [PDF] I Jordan canonical form I generalized modes I Cayley ... - EE263
    Exponential of Jordan block. I consider a single jordan block J = I + N of size k k. I N is the nilpotent matrix with ones on the 1st upper diagonal. I the ...
  41. [41]
    [PDF] Lie Groups and Algebraic Groups - UCSD Math
    Thus the exponential function is a bijective polynomial map from the nilpotent ele- ments in Mn(C) onto the unipotent elements in GL(n, C), with polynomial ...
  42. [42]
    Nineteen Dubious Ways to Compute the Exponential of a Matrix ...
    Matrix-algebra-based calculations of the time evolution of the binary spin ... Gilbert Strang. SIAM Journal on Numerical Analysis, Vol. 51, No. 6 | 19 ...<|control11|><|separator|>
  43. [43]
    Matrix Computations - 4th Edition | SIAM Publications Library
    Van Loan's classic is an essential reference for computational scientists and engineers in addition to researchers in the numerical linear algebra community.
  44. [44]
    The Equivalence of Definitions of a Matric Function - jstor
    R. F. RINEHART, Case Institute of Technology. 1. Introduction. The general field of matric analysis, which is finding ever- increasing applicability in ...
  45. [45]
    The Scaling and Squaring Method for the Matrix Exponential Revisited
    The scaling and squaring method is the most widely used method for computing the matrix exponential, not least because it is the method implemented in MATLAB's ...Missing: properties | Show results with:properties
  46. [46]
    [PDF] Numerical methods for Lyapunov equations
    This is a vectorized Lyapunov equation. It can consequently be solved efficiently with, e.g., Bartels-Stewart method. A Lyapunov equation approach can be useful ...