Matrix exponential
The matrix exponential of a square matrix A, denoted \exp(A) or e^A, is defined by the power series \sum_{n=0}^\infty \frac{A^n}{n!}, which converges absolutely to an invertible matrix for every square matrix A with entries in the real or complex numbers.[1] This construction extends the familiar scalar exponential function e^z = \sum_{n=0}^\infty \frac{z^n}{n!} to the non-commutative setting of matrices, preserving key algebraic properties such as \exp(0) = I (the identity matrix) and \exp(A)^{-1} = \exp(-A).[1] A fundamental property is that if two matrices A and B commute (i.e., AB = BA), then \exp(A + B) = \exp(A) \exp(B); however, this fails in general due to the non-commutativity of matrix multiplication.[2] Computationally, the matrix exponential can be evaluated using the eigenvalues and eigenvectors of A: if A = PDP^{-1} with D diagonal, then \exp(A) = P \exp(D) P^{-1}, where \exp(D) is the diagonal matrix of scalar exponentials of the eigenvalues.[3] The concept originated in the late 19th century, introduced by the mathematicians James Joseph Sylvester, Edmond Laguerre, and Giuseppe Peano as part of early developments in matrix theory.[4] In applied mathematics, the matrix exponential is essential for solving linear systems of ordinary differential equations of the form \dot{x} = Ax, where the unique solution satisfying the initial condition x(0) = x_0 is x(t) = \exp(tA) x_0.[5] More broadly, it serves as the exponential map in Lie group theory, mapping elements of the Lie algebra (the tangent space at the identity) to the Lie group itself for matrix Lie groups such as GL(n, \mathbb{C}) or SO(n), facilitating the study of continuous symmetries and group structures.[6] Notable applications include control theory for system stability analysis, numerical methods for time-dependent simulations, Markov chain modeling in probability, nuclear magnetic resonance spectroscopy in physics, and time evolution operators in quantum mechanics.[7]
Definition and Basic Properties
Power series definition
The matrix exponential of an n \times n complex matrix A is defined via the power series expansion \exp(A) = \sum_{k=0}^{\infty} \frac{A^k}{k!}, where A^0 = I is the identity matrix and A^k = A \cdot A^{k-1} for k \geq 1. This definition extends the familiar Taylor series for the scalar exponential function e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}, which converges for all complex scalars x. For matrices, the series converges entrywise, with each entry of \exp(A) given by a convergent power series in the entries of A.[8]

The power series for \exp(A) converges absolutely to a well-defined matrix for every finite-dimensional square matrix A over the complex numbers \mathbb{C} or real numbers \mathbb{R}. Absolute convergence follows from bounding the matrix norms: if \|A\| denotes a submultiplicative matrix norm, then \|A^k / k!\| \leq \|A\|^k / k!, and the scalar series \sum \|A\|^k / k! converges for all finite \|A\|, implying convergence of the matrix series in any finite dimension. Because all norms on a finite-dimensional space are equivalent, the limit is independent of the chosen norm, and the convergence is uniform on compact sets of matrices. The same argument applies over the reals, as real matrices embed into complex ones.[9][8][10]

When n = 1, the matrix A reduces to a scalar a, and \exp(A) coincides exactly with the scalar exponential e^a, confirming that the matrix definition generalizes the classical case without alteration. This scalar limit underscores the power series as a natural and consistent extension.[8]

The power series definition for matrices was introduced in the late 19th century by James Joseph Sylvester, Edmond Laguerre, and Giuseppe Peano. Sophus Lie developed the related exponential map in the context of continuous transformation groups, now known as Lie groups, where it connects Lie algebras to groups. The matrix exponential gained significant application in the 1930s through John von Neumann's work in quantum mechanics, particularly for time evolution operators in the Heisenberg picture.[4]
Elementary properties
The matrix exponential function satisfies several fundamental algebraic properties that parallel those of the scalar exponential, holding for square matrices over the real or complex numbers. The exponential of the zero matrix is the identity matrix: \exp(\mathbf{0}) = \mathbf{I}. This result follows directly from substituting \mathbf{A} = \mathbf{0} into the power series definition, where all terms \mathbf{A}^k/k! for k \geq 1 vanish, leaving only the \mathbf{I} term.[11]

For any real square matrix \mathbf{A}, the transpose of the matrix exponential equals the exponential of the transpose: \exp(\mathbf{A})^\top = \exp(\mathbf{A}^\top). To verify this, note that (\mathbf{A}^k)^\top = (\mathbf{A}^\top)^k for each k, since transposition reverses products and all factors here are equal. Applying the transpose to the power series for \exp(\mathbf{A}) term by term yields \exp(\mathbf{A})^\top = \sum_{k=0}^\infty \frac{(\mathbf{A}^k)^\top}{k!} = \sum_{k=0}^\infty \frac{(\mathbf{A}^\top)^k}{k!} = \exp(\mathbf{A}^\top). This property extends analogously to the conjugate transpose for complex matrices.[12]

The matrix exponential is always invertible, with the inverse given explicitly by \exp(\mathbf{A})^{-1} = \exp(-\mathbf{A}). This holds because \mathbf{A} and -\mathbf{A} always commute (since [\mathbf{A}, -\mathbf{A}] = -\mathbf{A}\mathbf{A} + \mathbf{A}\mathbf{A} = \mathbf{0}), and thus \exp(\mathbf{A} + (-\mathbf{A})) = \exp(\mathbf{A})\exp(-\mathbf{A}). The left side simplifies to \exp(\mathbf{0}) = \mathbf{I}, confirming the inverse relationship.[13]

For any scalar c \in \mathbb{R} and square matrix \mathbf{A}, the scaling property states that \exp(c\mathbf{A}) = [\exp(\mathbf{A})]^c. For integer c the right side is the usual matrix power, and the identity follows by repeated multiplication together with the inverse property. For non-integer c, the right side is defined via a matrix logarithm, [\exp(\mathbf{A})]^c = \exp(c \log(\exp(\mathbf{A}))), and the identity holds whenever \log(\exp(\mathbf{A})) = \mathbf{A}, which is the case, for example, when the eigenvalues of \mathbf{A} have imaginary parts in (-\pi, \pi).[14]

A central algebraic identity is that if two square matrices \mathbf{A} and \mathbf{B} commute, meaning [\mathbf{A}, \mathbf{B}] = \mathbf{A}\mathbf{B} - \mathbf{B}\mathbf{A} = \mathbf{0}, then \exp(\mathbf{A} + \mathbf{B}) = \exp(\mathbf{A}) \exp(\mathbf{B}). To prove this, expand both sides using the power series definition. The right side is \exp(\mathbf{A}) \exp(\mathbf{B}) = \left( \sum_{m=0}^\infty \frac{\mathbf{A}^m}{m!} \right) \left( \sum_{n=0}^\infty \frac{\mathbf{B}^n}{n!} \right) = \sum_{m=0}^\infty \sum_{n=0}^\infty \frac{\mathbf{A}^m \mathbf{B}^n}{m! \, n!}. Since \mathbf{A} and \mathbf{B} commute, \mathbf{A}^m \mathbf{B}^n = \mathbf{B}^n \mathbf{A}^m for all m, n, allowing arbitrary rearrangement of powers in products. Now consider the left side: \exp(\mathbf{A} + \mathbf{B}) = \sum_{k=0}^\infty \frac{(\mathbf{A} + \mathbf{B})^k}{k!}. The binomial theorem gives (\mathbf{A} + \mathbf{B})^k = \sum_{j=0}^k \binom{k}{j} \mathbf{A}^j \mathbf{B}^{k-j}, where the terms commute because \mathbf{A} and \mathbf{B} do.
Substituting yields \exp(\mathbf{A} + \mathbf{B}) = \sum_{k=0}^\infty \frac{1}{k!} \sum_{j=0}^k \binom{k}{j} \mathbf{A}^j \mathbf{B}^{k-j} = \sum_{k=0}^\infty \sum_{j=0}^k \frac{\mathbf{A}^j \mathbf{B}^{k-j}}{j! \, (k-j)!}. Reindex the double sum by letting m = j and n = k - j, so as k runs from 0 to \infty and j from 0 to k, m and n run independently over all nonnegative integers. This gives exactly \sum_{m=0}^\infty \sum_{n=0}^\infty \frac{\mathbf{A}^m \mathbf{B}^n}{m! \, n!} = \exp(\mathbf{A}) \exp(\mathbf{B}). Thus, the identity holds under the commutativity assumption. This property is essential for many applications, such as solving linear systems of differential equations where coefficient matrices commute.[15]
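As a concrete numerical illustration (not drawn from the cited sources), the following Python sketch, assuming NumPy and SciPy's scipy.linalg.expm are available, checks the identity for a commuting pair and shows that it generally fails for a non-commuting pair.

```python
import numpy as np
from scipy.linalg import expm

# Commuting pair: any polynomial in A commutes with A.
A = np.array([[1.0, 2.0], [0.0, 3.0]])
B = 2.0 * A + np.eye(2)
assert np.allclose(A @ B, B @ A)
print(np.allclose(expm(A + B), expm(A) @ expm(B)))   # True

# Non-commuting pair: the identity fails in general.
C = np.array([[0.0, 1.0], [0.0, 0.0]])
D = np.array([[0.0, 0.0], [1.0, 0.0]])
print(np.allclose(expm(C + D), expm(C) @ expm(D)))   # False
```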
Determinant properties
One fundamental property of the matrix exponential is that for any square matrix A \in \mathbb{C}^{n \times n}, the determinant satisfies \det(e^A) = e^{\operatorname{tr}(A)}. This identity links the determinant, a multiplicative functional on matrices, to the trace, an additive functional, through the exponential map.[16] To establish this for the case where A is diagonalizable, suppose A = PDP^{-1} with D = \operatorname{diag}(\lambda_1, \dots, \lambda_n), where \lambda_i are the eigenvalues of A. Then e^A = P e^D P^{-1}, where e^D = \operatorname{diag}(e^{\lambda_1}, \dots, e^{\lambda_n}). The determinant is invariant under similarity transformations, so \det(e^A) = \det(e^D) = \prod_{i=1}^n e^{\lambda_i} = e^{\sum_{i=1}^n \lambda_i}. Since the trace \operatorname{tr}(A) = \sum_{i=1}^n \lambda_i, it follows that \det(e^A) = e^{\operatorname{tr}(A)}. The result extends to all square matrices, including non-diagonalizable ones, as the identity holds by continuity: diagonalizable matrices are dense in the space of all matrices over \mathbb{C}, and the exponential, determinant, and trace are continuous functions.[16] For real matrices A \in \mathbb{R}^{n \times n}, the identity \det(e^A) = e^{\operatorname{tr}(A)} holds exactly, as the trace is real and the eigenvalues of A (possibly complex) occur in conjugate pairs, ensuring the product \prod e^{\lambda_i} is real and positive, matching the positive real value e^{\operatorname{tr}(A)}.
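The determinant-trace identity is easy to spot-check numerically; the short Python sketch below (an illustration, assuming NumPy and SciPy) compares \det(e^A) with e^{\operatorname{tr}(A)} for a random real matrix.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))              # a generic real square matrix

lhs = np.linalg.det(expm(A))
rhs = np.exp(np.trace(A))
print(np.isclose(lhs, rhs))                  # True: det(e^A) = e^{tr(A)}
```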
Properties for Special Matrix Classes
Symmetric and Hermitian matrices
Real symmetric matrices possess real eigenvalues and can be orthogonally diagonalized as A = Q D Q^T, where Q is an orthogonal matrix and D is a diagonal matrix with the eigenvalues \lambda_i on the diagonal. The matrix exponential then satisfies \exp(A) = Q \exp(D) Q^T, where \exp(D) = \operatorname{diag}(\exp(\lambda_1), \dots, \exp(\lambda_n)). Consequently, \exp(A) is also real symmetric, as it inherits the symmetry from the orthogonal similarity transformation. Moreover, if A is positive definite, then all \lambda_i > 0, ensuring \exp(A) is positive definite with eigenvalues \exp(\lambda_i) > 1.

For complex Hermitian matrices, the situation is analogous but uses unitary diagonalization: A = U D U^*, where U is unitary and D is diagonal with real eigenvalues \lambda_i. The exponential is \exp(A) = U \exp(D) U^*, which is Hermitian since U^* = U^{-1}. The eigenvalues of \exp(A) are \exp(\lambda_i), which are positive real numbers, confirming that \exp(A) is always positive definite for any Hermitian A. In the real case, where Hermitian reduces to symmetric, this aligns with the orthogonal diagonalization result.

Positive definite matrices, being a special case of Hermitian (or symmetric) matrices with positive eigenvalues, admit a unique Hermitian (or symmetric) matrix logarithm. Specifically, if B is positive definite, there exists a unique Hermitian matrix L such that \exp(L) = B. This uniqueness follows from the bijective mapping of the exponential on the space of Hermitian matrices onto the positive definite cone.
Inequalities for Hermitian exponentials
For Hermitian matrices A and B, the Golden-Thompson inequality provides an upper bound on the trace of the exponential of their sum in terms of the trace of the product of their individual exponentials: \operatorname{tr}(e^{A + B}) \le \operatorname{tr}(e^{A} e^{B}). This inequality, established independently by Golden and Thompson, highlights the non-commutative nature of matrix exponentials and complements the convexity of A \mapsto \operatorname{tr}(e^{A}) on the space of Hermitian matrices.

The Araki-Lieb-Thirring inequality extends such trace bounds to more general settings involving powers of positive semidefinite operators, which can be applied to exponentials of Hermitian matrices by substitution (e.g., setting operators to forms like e^{X/2}). Specifically, for positive semidefinite matrices A, B \geq 0, q \geq 0, and 0 \leq r \leq 1, \operatorname{tr}[(A^r B^r A^r)^q] \leq \operatorname{tr}[(A B A)^{r q}], with the inequality reversing for r \geq 1. This provides refined control on sums of exponentials through log-majorization and operator monotonicity properties, useful for bounding \operatorname{tr} e^{A + B} in non-commuting cases.[17][18]

A fundamental norm inequality for the matrix exponential of a Hermitian matrix A is \|e^A\| \leq e^{\|A\|}, where \|\cdot\| denotes the operator norm (spectral norm). Equality holds, for example, when A is positive semidefinite, and the bound arises from the power series expansion of the exponential together with the submultiplicativity of consistent matrix norms, or equivalently from the spectral characterization \|e^A\| = e^{\lambda_{\max}(A)} for Hermitian A.[19]

These inequalities find key applications in quantum statistical mechanics, where the partition function \operatorname{tr} e^{-\beta H} for a Hamiltonian H (a Hermitian operator) requires bounding traces of exponentials to estimate free energies and thermodynamic potentials, often via variational principles like the Gibbs-Bogoliubov inequality.[20]
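As a hedged numerical illustration of the Golden-Thompson inequality (not taken from the cited references), the following Python sketch, assuming NumPy and SciPy, compares the two traces for randomly generated Hermitian matrices.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def random_hermitian(n):
    # Hermitian matrix built from a random complex matrix
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

A, B = random_hermitian(4), random_hermitian(4)
lhs = np.trace(expm(A + B)).real
rhs = np.trace(expm(A) @ expm(B)).real   # real, since e^A e^B is similar to a positive definite matrix
print(lhs <= rhs + 1e-10)                # Golden-Thompson: tr e^{A+B} <= tr(e^A e^B)
```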
Exponentials of Matrix Sums
Lie product formula
The Lie product formula, also known as the Trotter product formula, expresses the matrix exponential of a sum A + B as a limit of products of individual exponentials, providing an approximation even when A and B do not commute. For square matrices A and B over \mathbb{C} and scalar t \in \mathbb{R}, the formula states \exp(t(A + B)) = \lim_{n \to \infty} \left[ \exp\left( \frac{t A}{n} \right) \exp\left( \frac{t B}{n} \right) \right]^n, where convergence holds in the operator norm.[21] If [A, B] = 0, the equality is exact for all n \geq 1.[22] This result, originally due to Sophus Lie in the finite-dimensional matrix case, underpins approximations in Lie group theory and numerical analysis.[23]

In the more general setting of strongly continuous contraction semigroups on a Banach space, the Trotter product formula extends the approximation to operator sums. For generators A and B generating semigroups \{e^{tA}\}_{t \geq 0} and \{e^{tB}\}_{t \geq 0}, the semigroup \{e^{t(A+B)}\}_{t \geq 0} satisfies e^{t(A+B)} = \lim_{n \to \infty} \left( e^{tA/n} e^{tB/n} \right)^n = \lim_{n \to \infty} \left( e^{tB/n} e^{tA/n} \right)^n, with convergence in the strong operator topology under suitable conditions on the domains.[24] The Kato approximation refines this for pairs of self-adjoint contraction semigroups, ensuring the limit holds when A + B generates a semigroup, by incorporating consistent approximations to the resolvents.[25]

A proof sketch for the matrix case proceeds via the Baker-Campbell-Hausdorff formula applied to the logarithm of the product term. Specifically, \log \left( \exp(tA/n) \exp(tB/n) \right) = t(A + B)/n + O(1/n^2) from the BCH expansion, since the higher-order terms vanish in the limit as n \to \infty; exponentiating and raising to the nth power then yields the result.[26] Alternatively, Duhamel's principle can be used by interpreting \exp(t(A+B)) as the solution to the ODE Y' = (A+B)Y with Y(0) = I, where the product approximates the Picard iteration for the corresponding integral equation.[21]

For finite n, the approximation incurs an error of order O(t^2 / n) in the operator norm, arising from the leading commutator term [A, B] in the BCH expansion. More precisely, for the first-order Trotter step of size \tau = t/n, \left\| \exp(\tau (A + B)) - \exp(\tau A) \exp(\tau B) \right\| \leq \frac{\tau^2}{2} \left( \| [A, B] \| e^{\tau (\|A\| + \|B\|)} + O(\tau) \right), with the total error over n steps bounded similarly, enabling controlled approximations in applications like quantum simulation. Higher-order variants, such as Strang splitting, reduce this to O(t^3 / n^2), but the basic formula suffices for the limit.[21]
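To make the limit concrete, the following illustrative Python sketch (assuming NumPy and SciPy) measures the operator-norm error of the Trotter product for a simple non-commuting pair, showing the roughly O(1/n) decay discussed above.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])        # A and B do not commute
t = 1.0
exact = expm(t * (A + B))

for n in (1, 10, 100, 1000):
    step = expm(t * A / n) @ expm(t * B / n)
    approx = np.linalg.matrix_power(step, n)
    print(n, np.linalg.norm(approx - exact, 2))   # error shrinks roughly like O(1/n)
```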
Baker-Campbell-Hausdorff formula
The Baker-Campbell-Hausdorff (BCH) formula provides an explicit expression in a Lie algebra \mathfrak{g} for the Lie algebra element corresponding to the product of two elements in the associated Lie group G, via the exponential map \exp: \mathfrak{g} \to G. For A, B \in \mathfrak{g}, \log(\exp(A) \exp(B)) = A + B + \frac{1}{2}[A, B] + \frac{1}{12}[A, [A, B]] - \frac{1}{12}[B, [A, B]] + \cdots, where [A, B] = AB - BA denotes the Lie bracket (commutator), and the higher-order terms of degree k \geq 4 are homogeneous polynomials formed from nested commutators of A and B with rational coefficients that can be generated recursively (for example by Dynkin's formula).[27] This infinite series lies entirely within \mathfrak{g}, establishing a formal logarithm for products near the identity in G.[28]

The BCH formula originated in the late 19th and early 20th centuries through independent contributions addressing the structure of Lie groups and algebras. John Edward Campbell introduced foundational ideas in 1897 while studying canonical forms for systems of linear differential equations, showing that solutions could be expressed via exponentials with corrections involving commutators.[29] Henry Frederick Baker extended this in 1902, applying it to hypercomplex numbers and finite groups to derive series expansions for products of exponentials.[29] Felix Hausdorff provided the first complete and rigorous proof in 1906, demonstrating convergence in associative algebras and clarifying the role of nested commutators.[29]

The series converges absolutely in a neighborhood of the origin in \mathfrak{g}, for instance whenever \|A\| + \|B\| < \log 2 with respect to a submultiplicative norm on \mathfrak{g}.[28] This ensures the arguments remain within the domain where the matrix logarithm is well-defined and analytic. For practical computations, especially in numerical simulations of Lie group flows, finite truncations of the BCH series are employed, often up to order 4 or 6 depending on the required accuracy.

One prominent companion result is the Zassenhaus formula, which inverts the BCH relation to express \exp(A + B) as an infinite product \exp(A + B) = \exp(A) \exp(B) \prod_{k=2}^\infty \exp(C_k(A, B)), where each C_k(A, B) is a homogeneous Lie polynomial of degree k in A and B, beginning with C_2(A, B) = -\frac{1}{2}[A, B], allowing disentanglement into ordered exponentials useful for quantum simulations and Lie group integrators. The Zassenhaus formula, introduced by Hans Zassenhaus in 1935 for determining structure constants of finite groups, converges under similar small-norm conditions as the BCH series.
The Exponential Map
General properties
The matrix exponential induces the exponential map \exp: \mathfrak{gl}(n,\mathbb{C}) \to \mathrm{GL}(n,\mathbb{C}), where \mathfrak{gl}(n,\mathbb{C}) denotes the Lie algebra of all n \times n complex matrices under the commutator bracket, and \mathrm{GL}(n,\mathbb{C}) is the Lie group of invertible n \times n complex matrices under multiplication. This map sends each element A \in \mathfrak{gl}(n,\mathbb{C}) to the group element \exp(A) = \sum_{k=0}^\infty \frac{A^k}{k!}, and for each fixed A it restricts to a one-parameter group homomorphism t \mapsto \exp(tA) from (\mathbb{R}, +) into \mathrm{GL}(n,\mathbb{C}). Over the complex numbers the exponential map is surjective: every invertible complex matrix possesses a matrix logarithm, as can be seen from its Jordan form, so every element of \mathrm{GL}(n,\mathbb{C}) lies in the image of \exp.[30]

The exponential map is smooth, and it serves as a local diffeomorphism in a neighborhood of the origin in \mathfrak{gl}(n,\mathbb{C}). Specifically, the differential d\exp_0: T_0 \mathfrak{gl}(n,\mathbb{C}) \to T_I \mathrm{GL}(n,\mathbb{C}) is the identity isomorphism, where I is the identity matrix, so \exp preserves the tangent space structure at the identity. This property implies that \exp is bijective and smooth (with smooth inverse) on a sufficiently small open ball around 0 in the Lie algebra, facilitating the identification of the Lie algebra with the tangent space at the identity.[31]

Near the identity, \exp is a diffeomorphism from the ball of radius \rho in \mathfrak{gl}(n,\mathbb{C}) (endowed with a suitable norm) onto an open neighborhood of I in \mathrm{GL}(n,\mathbb{C}), where \rho is the injectivity radius, the maximal radius for which this holds. Globally, however, \exp is not injective (for example, \exp(2\pi i I) = I), and for n \geq 2 it is not a covering map, reflecting the nontrivial topology of \mathrm{GL}(n,\mathbb{C}) and the points where the differential of \exp degenerates. The injectivity radius depends on the chosen norm on the Lie algebra but is positive and finite, bounding the region where distinct algebra elements map to distinct group elements.[32]

The curves \gamma(t) = \exp(tA) for A \in \mathfrak{gl}(n,\mathbb{C}) and t \in \mathbb{R} form one-parameter subgroups of \mathrm{GL}(n,\mathbb{C}), which are integral curves of the left-invariant vector fields generated by A. When a Lie group is equipped with a bi-invariant metric (as is possible for compact groups such as \mathrm{U}(n) or \mathrm{SO}(n)), the one-parameter subgroups coincide with the geodesics through the identity, tracing distinguished paths in the group manifold. This geodesic interpretation underscores the exponential map's role in connecting the algebraic structure of the Lie algebra to the geometric structure of the Lie group.[33]
Directional derivatives on Hermitian matrices
The directional derivative of the matrix exponential at a Hermitian matrix H in the direction of a matrix K is captured by the Fréchet derivative D\exp_H(K), which quantifies the first-order sensitivity of \exp(H) to perturbations along K. The case of Hermitian or skew-Hermitian directions is relevant to the geometry of the unitary group, as Hermitian matrices parameterize its Lie algebra via multiplication by i, mapping to unitaries through the exponential. The explicit integral representation is D\exp_H(K) = \exp(H) \int_0^1 \exp(-s H) K \exp(s H) \, ds. When H is Hermitian, \exp(H) is positive definite Hermitian, and the integrand involves the adjoint action of \exp(s H) on K, which preserves the Hermitian or skew-Hermitian structure of the direction K.

This formula can be derived by differentiating the power series expansion of \exp(H + t K) with respect to t at t = 0. The resulting expression is \sum_{n=1}^\infty \frac{1}{n!} \sum_{j=0}^{n-1} H^j K H^{n-1-j}, a double sum over non-commuting matrix powers that resums to the integral form. Alternatively, applying Duhamel's principle to the family Y_\epsilon(t) solving Y_\epsilon'(t) = (H + \epsilon K) Y_\epsilon(t) with Y_\epsilon(0) = I and differentiating at \epsilon = 0 yields the variation \delta Y(1) = \int_0^1 \exp((1-s) H) K \exp(s H) \, ds, which equals the stated form after factoring \exp(H) out on the left.[34]

A key simplification occurs when [H, K] = 0, in which case \exp(-s H) K \exp(s H) = K for all s, reducing the integral to K and thus D\exp_H(K) = \exp(H) K. This commuting case facilitates exact computations in models where the perturbation aligns with the unperturbed operator's eigenspaces.

In quantum mechanics, this directional derivative underpins perturbation theory for unitary evolution operators generated by perturbed Hamiltonians. For a base Hermitian Hamiltonian H_0 with a perturbation direction K (arising in contexts like infinitesimal gauge transformations or effective non-Hermitian shifts in open systems), the first-order expansion of \exp(-i (H_0 + \epsilon K)) uses the formula to approximate state evolution and transition amplitudes, avoiding costly full exponentiation. This approach also connects to the Magnus expansion, which expresses the logarithm of the time-ordered exponential for numerical integration of Schrödinger equations with small perturbations.[34]
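The integral representation can be checked numerically; the sketch below is illustrative only, assuming NumPy and SciPy, whose scipy.linalg.expm_frechet routine computes the Fréchet derivative of the matrix exponential. It approximates the integral with a midpoint rule for a real symmetric H and compares the two results.

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
H = (M + M.T) / 2                      # real symmetric (Hermitian) base point
K = rng.standard_normal((3, 3))        # perturbation direction

# Integral representation: exp(H) * int_0^1 exp(-sH) K exp(sH) ds,
# approximated by a midpoint quadrature rule with N nodes.
N = 400
s = (np.arange(N) + 0.5) / N
integral = sum(expm(-si * H) @ K @ expm(si * H) for si in s) / N
D_quad = expm(H) @ integral

# SciPy's built-in Frechet derivative of the matrix exponential
_, D_ref = expm_frechet(H, K)
print(np.linalg.norm(D_quad - D_ref) / np.linalg.norm(D_ref))  # small, quadrature-limited
```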
Applications to Linear Differential Equations
Homogeneous systems
The matrix exponential arises naturally as the solution operator for the homogeneous linear system of ordinary differential equations \dot{\mathbf{x}}(t) = A \mathbf{x}(t), where A is an n \times n constant matrix and \mathbf{x}(t) \in \mathbb{R}^n. Given an initial condition \mathbf{x}(0) = \mathbf{x}_0, the unique solution is \mathbf{x}(t) = e^{tA} \mathbf{x}_0.

To verify this, consider the power series definition of the matrix exponential, e^{tA} = \sum_{k=0}^{\infty} \frac{(tA)^k}{k!}, which converges absolutely for all t \in \mathbb{R}. Differentiating term by term with respect to t gives \frac{d}{dt} e^{tA} = \sum_{k=1}^{\infty} \frac{k t^{k-1} A^k}{k!} = \sum_{k=1}^{\infty} \frac{t^{k-1} A^k}{(k-1)!} = A \sum_{k=0}^{\infty} \frac{t^k A^k}{k!} = A e^{tA}. Thus, \mathbf{x}(t) = e^{tA} \mathbf{x}_0 satisfies \dot{\mathbf{x}}(t) = A \mathbf{x}(t). Moreover, evaluating at t=0 yields e^{0 \cdot A} = I, the identity matrix, so \mathbf{x}(0) = I \mathbf{x}_0 = \mathbf{x}_0, confirming the initial condition.

The uniqueness of this solution follows from the Picard–Lindelöf theorem applied to the system, as the right-hand side f(\mathbf{x}) = A \mathbf{x} is continuously differentiable (hence locally Lipschitz continuous) on \mathbb{R}^n. The theorem guarantees a unique local solution, which extends to all t \in \mathbb{R} for linear systems, since no finite escape time occurs. In the context of dynamical systems, e^{tA} serves as the time-t evolution operator, or flow map, \Phi(t, \mathbf{x}_0) = e^{tA} \mathbf{x}_0, which propagates initial states along integral curves of the linear vector field \mathbf{x} \mapsto A \mathbf{x}. This interpretation highlights its role in describing the complete phase flow of the system.
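As an illustrative cross-check (not part of the original text), the Python sketch below, assuming NumPy and SciPy, compares the closed-form solution e^{tA} x_0 against a general-purpose ODE integrator for an arbitrary sample matrix.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-1.0, -0.5]])        # an arbitrary constant matrix
x0 = np.array([1.0, 0.0])
t_final = 2.0

# Closed-form solution via the matrix exponential
x_exp = expm(t_final * A) @ x0

# Reference: direct numerical integration of x' = A x
sol = solve_ivp(lambda t, x: A @ x, (0.0, t_final), x0, rtol=1e-10, atol=1e-12)
print(np.allclose(x_exp, sol.y[:, -1], atol=1e-6))   # True
```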
Homogeneous example
Consider the damped harmonic oscillator governed by the second-order linear differential equation x''(t) + 3x'(t) + 2x(t) = 0, where the coefficients correspond to a unit mass, damping constant 3, and spring constant 2. This equation models an overdamped system, where the damping exceeds the critical value, leading to exponential decay without oscillation. To solve this using the matrix exponential, rewrite the equation as a first-order system. Let \mathbf{y}(t) = \begin{pmatrix} x(t) \\ x'(t) \end{pmatrix}, so \mathbf{y}'(t) = A \mathbf{y}(t) with A = \begin{pmatrix} 0 & 1 \\ -2 & -3 \end{pmatrix}. The general solution is \mathbf{y}(t) = e^{tA} \mathbf{y}(0).

The matrix A has eigenvalues \lambda_1 = -1 and \lambda_2 = -2, found from the characteristic equation \det(A - \lambda I) = \lambda^2 + 3\lambda + 2 = 0. Since the eigenvalues are distinct and real, A is diagonalizable: A = P D P^{-1} where D = \operatorname{diag}(-1, -2), P = \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix}, and P^{-1} = \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix}. Thus, e^{tA} = P e^{tD} P^{-1} = \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix} \begin{pmatrix} e^{-t} & 0 \\ 0 & e^{-2t} \end{pmatrix} \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix} = \begin{pmatrix} 2e^{-t} - e^{-2t} & e^{-t} - e^{-2t} \\ -2e^{-t} + 2e^{-2t} & -e^{-t} + 2e^{-2t} \end{pmatrix}.

To verify, consider initial conditions x(0) = 1, x'(0) = 0, so \mathbf{y}(0) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. Then \mathbf{y}(t) = e^{tA} \mathbf{y}(0) = \begin{pmatrix} 2e^{-t} - e^{-2t} \\ -2e^{-t} + 2e^{-2t} \end{pmatrix}, and x(t) = 2e^{-t} - e^{-2t}. Differentiating gives x'(t) = -2e^{-t} + 2e^{-2t} and x''(t) = 2e^{-t} - 4e^{-2t}. Substituting into the original ODE yields x''(t) + 3x'(t) + 2x(t) = (2e^{-t} - 4e^{-2t}) + 3(-2e^{-t} + 2e^{-2t}) + 2(2e^{-t} - e^{-2t}) = 0, confirming the solution satisfies the equation. The initial conditions hold: x(0) = 2 - 1 = 1 and x'(0) = -2 + 2 = 0.

The eigenvalues \lambda_1 = -1 < 0 and \lambda_2 = -2 < 0 imply that all solutions decay exponentially to zero as t \to \infty, indicating asymptotic stability of the equilibrium at the origin.
Inhomogeneous systems
The solution to the inhomogeneous linear system of ordinary differential equations \mathbf{x}'(t) = A \mathbf{x}(t) + \mathbf{f}(t) with initial condition \mathbf{x}(0) = \mathbf{x}_0, where A is a constant n \times n matrix and \mathbf{f}(t) is a given forcing function, combines the homogeneous solution with a particular integral term.[5] The full solution is given by \mathbf{x}(t) = e^{tA} \mathbf{x}_0 + \int_0^t e^{(t-s)A} \mathbf{f}(s) \, ds, where e^{tA} is the matrix exponential.[5][35] This formula, known as Duhamel's principle in the context of linear systems, expresses the particular solution as the convolution of the forcing term with the fundamental matrix solution e^{(t-s)A}.[5]

To derive this using the variation of constants method, assume a solution of the form \mathbf{x}(t) = e^{tA} \mathbf{u}(t), where \mathbf{u}(t) is a vector function to be determined.[35] Differentiating yields \mathbf{x}'(t) = A e^{tA} \mathbf{u}(t) + e^{tA} \mathbf{u}'(t). Substituting into the original equation gives A e^{tA} \mathbf{u}(t) + e^{tA} \mathbf{u}'(t) = A e^{tA} \mathbf{u}(t) + \mathbf{f}(t), which simplifies to e^{tA} \mathbf{u}'(t) = \mathbf{f}(t). Multiplying both sides by e^{-tA} results in \mathbf{u}'(t) = e^{-tA} \mathbf{f}(t). Integrating from 0 to t and using the initial condition \mathbf{u}(0) = \mathbf{x}_0 produces \mathbf{u}(t) = \mathbf{x}_0 + \int_0^t e^{-sA} \mathbf{f}(s) \, ds. Thus, \mathbf{x}(t) = e^{tA} \left( \mathbf{x}_0 + \int_0^t e^{-sA} \mathbf{f}(s) \, ds \right) = e^{tA} \mathbf{x}_0 + \int_0^t e^{(t-s)A} \mathbf{f}(s) \, ds.[35]

The integral term represents a convolution \left( e^{tA} * \mathbf{f} \right)(t), which inherits properties from the semigroup generated by A, such as causality: the value of \mathbf{x}(t) depends only on \mathbf{f}(s) for s \leq t, reflecting the forward propagation of influences in time.[5] When \mathbf{f}(t) = \mathbf{b} is a constant vector and A is invertible, the integral can be evaluated explicitly as \int_0^t e^{(t-s)A} \, ds \, \mathbf{b} = A^{-1} \left( e^{tA} - I \right) \mathbf{b}, yielding the closed-form solution \mathbf{x}(t) = e^{tA} \mathbf{x}_0 + A^{-1} \left( e^{tA} - I \right) \mathbf{b}.[5][35]
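A minimal numerical check of the constant-forcing closed form is sketched below, assuming NumPy and SciPy are available; the matrix and vectors are arbitrary sample data.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-2.0, -3.0]])    # invertible constant matrix
b = np.array([1.0, 0.0])                    # constant forcing f(t) = b
x0 = np.array([0.0, 1.0])
t = 1.5

# Closed form: x(t) = e^{tA} x0 + A^{-1} (e^{tA} - I) b
closed = expm(t * A) @ x0 + np.linalg.solve(A, (expm(t * A) - np.eye(2)) @ b)

# Reference solution of x' = A x + b
sol = solve_ivp(lambda s, x: A @ x + b, (0.0, t), x0, rtol=1e-10, atol=1e-12)
print(np.allclose(closed, sol.y[:, -1], atol=1e-6))   # True
```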
Variation of parameters method
The variation of parameters method extends the classical technique for scalar equations to systems of linear ordinary differential equations with time-varying coefficients, addressing the inhomogeneous system \mathbf{x}'(t) = A(t) \mathbf{x}(t) + \mathbf{f}(t), where \mathbf{x}(t) and \mathbf{f}(t) are n-dimensional vector functions and A(t) is an n \times n matrix function. This approach assumes a fundamental matrix \Phi(t) for the corresponding homogeneous system \mathbf{x}'(t) = A(t) \mathbf{x}(t) is available, satisfying \Phi'(t) = A(t) \Phi(t) with initial condition \Phi(0) = I, the n \times n identity matrix. The method constructs a particular solution by varying the constants in the homogeneous solution, leading to an integral representation that incorporates the forcing term \mathbf{f}(t).[36]

To apply the method, posit a particular solution of the form \mathbf{x}_p(t) = \Phi(t) \mathbf{u}(t), where \mathbf{u}(t) is an unknown vector function. Differentiating yields \mathbf{x}_p'(t) = \Phi'(t) \mathbf{u}(t) + \Phi(t) \mathbf{u}'(t) = A(t) \Phi(t) \mathbf{u}(t) + \Phi(t) \mathbf{u}'(t). Substituting into the original equation gives A(t) \Phi(t) \mathbf{u}(t) + \Phi(t) \mathbf{u}'(t) = A(t) \Phi(t) \mathbf{u}(t) + \mathbf{f}(t), simplifying to \Phi(t) \mathbf{u}'(t) = \mathbf{f}(t). Thus, \mathbf{u}'(t) = \Phi(t)^{-1} \mathbf{f}(t), assuming \Phi(t) is invertible, which holds for fundamental matrices.[36] Integrating from 0 to t produces \mathbf{u}(t) = \int_0^t \Phi(s)^{-1} \mathbf{f}(s) \, ds, so the particular solution is \mathbf{x}_p(t) = \Phi(t) \int_0^t \Phi(s)^{-1} \mathbf{f}(s) \, ds.
The general solution then becomes \mathbf{x}(t) = \Phi(t) \left( \mathbf{x}(0) + \int_0^t \Phi(s)^{-1} \mathbf{f}(s) \, ds \right). The computational algorithm follows these steps when \Phi(t) is known: (1) solve the homogeneous system to obtain \Phi(t) with \Phi(0) = I; (2) compute the inverse \Phi(t)^{-1} (or solve the corresponding linear systems) at the required points; (3) form \mathbf{u}'(t) = \Phi(t)^{-1} \mathbf{f}(t); (4) integrate, analytically or numerically, \mathbf{u}(t) = \int_0^t \mathbf{u}'(s) \, ds; (5) assemble the solution \mathbf{x}(t) = \Phi(t) (\mathbf{x}(0) + \mathbf{u}(t)). This process is particularly useful in applications where the homogeneous solution is tractable, such as through series expansions or known closed forms.[36]

When the coefficient matrix A(t) is constant, the fundamental matrix simplifies to the matrix exponential \Phi(t) = e^{At}, connecting the method directly to properties of matrix exponentials. In this case, the integral expression reduces to the Duhamel integral for constant-coefficient systems. Compared to the method of undetermined coefficients, variation of parameters offers greater flexibility for non-constant A(t) and arbitrary continuous \mathbf{f}(t), without requiring specific functional forms for the forcing term, though it demands prior knowledge or computation of \Phi(t). This makes it essential in control theory and time-dependent physical models where coefficients vary with time.[36]
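The steps above can be sketched in a few lines of Python; the following illustrative example assumes a constant coefficient matrix so that \Phi(t) = e^{At} is available via scipy.linalg.expm, and approximates the integral in step (4) with a simple midpoint rule. The forcing term and data are arbitrary sample choices.

```python
import numpy as np
from scipy.linalg import expm

# Variation-of-parameters sketch for constant A, where Phi(t) = e^{At}.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
f = lambda t: np.array([np.sin(t), 0.0])      # forcing term (sample choice)
x0 = np.array([1.0, 0.0])
T, N = 2.0, 2000                              # horizon and number of quadrature nodes

Phi = lambda t: expm(A * t)                   # fundamental matrix, Phi(0) = I

# Step (4): u(T) = int_0^T Phi(s)^{-1} f(s) ds, via a midpoint rule;
# each term solves Phi(s) y = f(s) instead of forming the inverse.
s = (np.arange(N) + 0.5) * T / N
u = sum(np.linalg.solve(Phi(si), f(si)) for si in s) * (T / N)

x_T = Phi(T) @ (x0 + u)                       # Step (5): x(T) = Phi(T) (x(0) + u(T))
print(x_T)
```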
Computing the Matrix Exponential
Diagonalizable matrices
A matrix A \in \mathbb{C}^{n \times n} is diagonalizable if there exists an invertible matrix P and a diagonal matrix D = \operatorname{diag}(\lambda_1, \dots, \lambda_n) such that A = P D P^{-1}, where the \lambda_i are the eigenvalues of A.[37] In this case, the matrix exponential \exp(A) admits a closed-form expression derived from the spectral decomposition: \exp(A) = P \exp(D) P^{-1}, where \exp(D) = \operatorname{diag}(e^{\lambda_1}, \dots, e^{\lambda_n}) applies the scalar exponential function entrywise to the diagonal elements.[38] This formula follows directly from the power series definition of the matrix exponential, as the series commutes with similarity transformations and reduces to the scalar case on the diagonal.[38]

The condition for diagonalizability requires that A possess a complete set of n linearly independent eigenvectors, which form the columns of P.[39] This is guaranteed if A has n distinct eigenvalues, since distinct eigenvalues yield linearly independent eigenvectors; however, diagonalizability can also hold when eigenvalues are repeated, provided the geometric multiplicity (dimension of the eigenspace) equals the algebraic multiplicity (from the characteristic polynomial) for each eigenvalue.[40][41]

To compute \exp(A) via this approach, first solve the eigenvalue problem to obtain the eigenvalues \lambda_i and corresponding eigenvectors v_i, forming D from the \lambda_i and P from the v_i as columns. Then, compute \exp(D) by exponentiating each diagonal entry, invert P to obtain P^{-1}, and finally assemble \exp(A) = P \exp(D) P^{-1}.[11] This method is exact in exact arithmetic but can suffer from numerical instability in finite precision, as the accuracy of \exp(A) is sensitive to perturbations in the computed eigenvectors; specifically, the relative error is bounded by a factor involving the condition number \kappa(P) = \|P\| \|P^{-1}\|, which grows large if the eigenvectors are ill-conditioned (e.g., when eigenvalues are close).[19] For non-normal matrices with clustered eigenvalues, this conditioning can lead to substantial loss of accuracy, making alternative methods preferable in practice.[19]
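A minimal Python sketch of this eigendecomposition route, assuming NumPy and SciPy and a diagonalizable input, is shown below; the helper name expm_via_diagonalization is illustrative only.

```python
import numpy as np
from scipy.linalg import expm

def expm_via_diagonalization(A):
    """exp(A) = P diag(e^{lambda_i}) P^{-1} for a diagonalizable matrix A."""
    eigvals, P = np.linalg.eig(A)                      # columns of P are eigenvectors
    return (P * np.exp(eigvals)) @ np.linalg.inv(P)    # scales column i of P by e^{lambda_i}

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.allclose(expm_via_diagonalization(A), expm(A)))   # True
```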
Diagonalizable example
To illustrate the computation of the matrix exponential for a diagonalizable matrix, consider the 2×2 matrix A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. This matrix has distinct eigenvalues \lambda_1 = \frac{5 + \sqrt{33}}{2} and \lambda_2 = \frac{5 - \sqrt{33}}{2}, confirming it is diagonalizable, since any matrix with distinct eigenvalues is diagonalizable.[11] The corresponding eigenvectors are v_1 = \begin{pmatrix} 2 \\ \frac{3 + \sqrt{33}}{2} \end{pmatrix} for \lambda_1 and v_2 = \begin{pmatrix} 2 \\ \frac{3 - \sqrt{33}}{2} \end{pmatrix} for \lambda_2, obtained by solving (A - \lambda_i I) v_i = 0 for each i.

Let s = \sqrt{33}, \alpha = \frac{3 + s}{2}, and \beta = \frac{3 - s}{2}. The matrix of eigenvectors is then P = \begin{pmatrix} 2 & 2 \\ \alpha & \beta \end{pmatrix}, with determinant \det(P) = -2s. The inverse is P^{-1} = \frac{1}{-2s} \begin{pmatrix} \beta & -2 \\ -\alpha & 2 \end{pmatrix}. The diagonal matrix of exponentials is \exp(D) = \operatorname{diag}(e^{\lambda_1}, e^{\lambda_2}). Thus, e^A = P \begin{pmatrix} e^{\lambda_1} & 0 \\ 0 & e^{\lambda_2} \end{pmatrix} P^{-1}. Let e_1 = e^{\lambda_1} and e_2 = e^{\lambda_2}. Performing the matrix multiplication yields the explicit entries: e^A = \begin{pmatrix} \frac{e_2 \alpha - e_1 \beta}{s} & \frac{2(e_1 - e_2)}{s} \\[1em] \frac{3(e_1 - e_2)}{s} & \frac{\alpha e_1 - \beta e_2}{s} \end{pmatrix}. Note that \alpha \beta = -6 and \alpha - \beta = s, which follow from the quadratic relations of the eigenvalues.[11]

To verify, numerical evaluation provides \lambda_1 \approx 5.372, \lambda_2 \approx -0.372, e_1 \approx 215.35, e_2 \approx 0.689, \alpha \approx 4.372, and \beta \approx -1.372, giving e^A \approx \begin{pmatrix} 51.97 & 74.74 \\ 112.11 & 164.07 \end{pmatrix}. The partial sums of the power series e^A = \sum_{k=0}^\infty \frac{A^k}{k!} converge to this value; for instance, the sum up to k=3 is approximately \begin{pmatrix} 11.67 & 16.00 \\ 24.00 & 35.67 \end{pmatrix}, and further terms approach the diagonalized result. Additionally, A e^A = e^A A holds numerically to machine precision, as expected since e^A commutes with A.[11]

Regarding numerical conditioning, the diagonalization approach is stable here because the eigenvalue separation |\lambda_1 - \lambda_2| = s \approx 5.744 is sufficiently large, yielding a small condition number \kappa_2(P) \approx 2.0 (the 2-norm condition number of the P written above), which bounds error propagation in the computation of e^A. In general, closer eigenvalues increase \kappa(P), amplifying rounding errors in P and P^{-1}, though the exponential map itself remains well-conditioned for this non-normal A.
Nilpotent matrices
A nilpotent matrix A satisfies A^k = 0 for some positive integer k, where the smallest such k is known as the index of nilpotency.[42] For such matrices, the matrix exponential \exp(A) admits an exact closed-form expression via the power series definition, which terminates after finitely many terms: \exp(A) = \sum_{m=0}^{k-1} \frac{A^m}{m!}, as all higher powers of A vanish. This finite sum yields a matrix polynomial in A of degree at most k-1, facilitating straightforward computation without approximation.[42]

A canonical example is the nilpotent Jordan block N of size n \times n, featuring 1's on the superdiagonal and zeros elsewhere, for which N^n = 0. The exponential is then \exp(N) = \sum_{m=0}^{n-1} \frac{N^m}{m!}, where each N^m (for m < n) places 1's on the m-th superdiagonal. Consequently, \exp(N) is an upper-triangular matrix with 1's on the main diagonal and \frac{1}{m!} on the m-th superdiagonal.[42]

The exponential of a nilpotent matrix N is always invertible, since \det(\exp(N)) = \exp(\operatorname{tr}(N)) = \exp(0) = 1. Furthermore, \exp(N) is unipotent, possessing all eigenvalues equal to 1, as the eigenvalues of N are zero and the eigenvalues of \exp(N) are the exponentials of those of N.[43] In Lie theory, the unipotent elements of the general linear group \mathrm{GL}(n, \mathbb{C}) are exactly the exponentials of nilpotent elements in its Lie algebra \mathfrak{gl}(n, \mathbb{C}), with the exponential map providing a bijective polynomial correspondence between these sets.[43]
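Because the series terminates, the exponential of a nilpotent matrix can be computed exactly from a finite sum; the short Python sketch below (illustrative, assuming NumPy and SciPy) does this for a 4 \times 4 nilpotent Jordan block.

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

n = 4
N = np.diag(np.ones(n - 1), k=1)       # nilpotent Jordan block: 1's on the superdiagonal, N^n = 0

# Terminating series: exp(N) = sum_{m=0}^{n-1} N^m / m!
E = sum(np.linalg.matrix_power(N, m) / factorial(m) for m in range(n))

print(np.allclose(E, expm(N)))         # True
print(E)                               # unit upper triangular, 1/m! on the m-th superdiagonal
```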
General matrices via Jordan form
For a general square matrix A \in \mathbb{C}^{n \times n}, the matrix exponential \exp(A) can be computed using the Jordan canonical form, which decomposes A as A = P J P^{-1}, where P is invertible and J is a block-diagonal matrix consisting of Jordan blocks.[38] Since the matrix exponential is an entire function, it respects similarity transformations, yielding \exp(A) = P \exp(J) P^{-1}.[38]

Each Jordan block in J corresponds to an eigenvalue \lambda of A, and \exp(J) is computed blockwise. Consider a Jordan block of size m \times m given by J_\lambda = \lambda I_m + N, where N is the nilpotent Jordan block with 1's on the superdiagonal and zeros elsewhere, satisfying N^m = 0. The exponential of this block is \exp(J_\lambda) = e^\lambda \exp(N) = e^\lambda \sum_{k=0}^{m-1} \frac{N^k}{k!}, as higher powers of N vanish.[38] The nilpotent part \exp(N) is an upper-triangular Toeplitz matrix with \frac{1}{k!} on the k-th superdiagonal.[44]

To implement this approach, first compute the eigenvalues of A using standard algorithms such as the QR algorithm (or the QZ method for generalized eigenvalue problems). Then, determine the Jordan chains by solving (A - \lambda I)v_k = v_{k-1} for generalized eigenvectors v_k starting from actual eigenvectors v_1, and assemble the columns of P from these chains for each block.[45] The full P and J are formed by collecting the chains across all eigenvalues and blocks. However, this method is often numerically unstable in practice, as the condition number of P can be extremely large for matrices with nearly defective eigenvalues or clustered spectra, leading to severe rounding errors.[44] For this reason, more robust alternatives like the Schur decomposition are preferred in numerical software.[38]
Special cases: Projections and rotations
A projection matrix P satisfies P^2 = P, making it idempotent. For such matrices, the matrix exponential admits a simple closed-form expression: \exp(tP) = I + (e^t - 1)P. This formula can be derived using the power series definition of the matrix exponential, \exp(tP) = \sum_{n=0}^\infty \frac{(tP)^n}{n!}. Since P^n = P for all n \geq 1, the series simplifies to I + P \sum_{n=1}^\infty \frac{t^n}{n!} = I + (e^t - 1)P. Alternatively, the result follows from the minimal polynomial of P, which divides \lambda(\lambda - 1). Any analytic function f applied to P can then be expressed as f(P) = f(0)(I - P) + f(1)P. Setting f(\lambda) = e^{t\lambda} yields \exp(tP) = e^{0}(I - P) + e^{t}P = I + (e^t - 1)P.

Skew-symmetric matrices generate rotations via the matrix exponential. If K is skew-symmetric (K^T = -K), then \exp(K) is an orthogonal matrix with determinant 1, belonging to the special orthogonal group SO(n). In two dimensions, rotations are explicitly given by the exponential of a scaled skew-symmetric matrix. Consider K = \theta \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, where \theta is the rotation angle. Then, \exp(K) = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}. This follows from the power series expansion, where K^2 = -\theta^2 I, K^3 = -\theta^2 K, and higher powers alternate similarly, mirroring the Taylor series for cosine and sine: \exp(K) = (\cos \theta) I + \frac{\sin \theta}{\theta} K.
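Both closed forms are easy to verify numerically; the following Python sketch (illustrative, assuming NumPy and SciPy) checks the projection formula for a sample idempotent matrix and the 2D rotation formula.

```python
import numpy as np
from scipy.linalg import expm

# Projection: P^2 = P, so exp(tP) = I + (e^t - 1) P
P = np.array([[1.0, 0.0], [1.0, 0.0]])         # an idempotent (oblique projection)
assert np.allclose(P @ P, P)
t = 0.7
print(np.allclose(expm(t * P), np.eye(2) + (np.exp(t) - 1.0) * P))   # True

# Rotation: the exponential of a scaled 2x2 skew-symmetric matrix
theta = 0.3
K = theta * np.array([[0.0, -1.0], [1.0, 0.0]])
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(expm(K), R))                                        # True
```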
Series-based methods
One practical approach to computing the matrix exponential involves truncating the Taylor power series expansion, which converges absolutely for all finite matrices. The approximation is given by \exp(A) \approx \sum_{k=0}^{N} \frac{A^k}{k!}, where N is chosen sufficiently large to achieve the desired accuracy. The truncation error R_N = \exp(A) - \sum_{k=0}^{N} \frac{A^k}{k!} satisfies the bound \|R_N\| \leq \frac{\|A\|^{N+1}}{(N+1)!} \exp(\|A\|), derived from the remainder term in the scalar exponential series applied to the matrix norm. This method is particularly effective when \|A\| is small, as fewer terms are needed for convergence, or when A has structured sparsity that allows efficient computation of successive matrix powers.

To accelerate convergence of the series, techniques such as the Euler-Maclaurin formula can be adapted from the scalar case to provide asymptotic expansions that improve the rate of approximation for the partial sums. For matrices, analogous summation acceleration methods have been explored to reduce the number of terms required, though they are less commonly implemented than direct truncation due to added complexity in matrix arithmetic.

In cases involving a shift, such as computing \exp(zI - B) where B is invertible and z is a scalar, a Laurent series expansion can be employed around z = \infty, incorporating negative powers to capture the behavior for large |z|. This representation facilitates evaluation when the spectrum of B leads to rapid growth or decay in the standard power series, though it requires careful handling of the principal part to ensure numerical stability.
Sylvester's formula approach
Sylvester's formula offers an explicit expression for the matrix exponential of a diagonalizable matrix with distinct eigenvalues, derived from the holomorphic functional calculus via contour integration. For a square matrix A that is diagonalizable with distinct eigenvalues \lambda_1, \dots, \lambda_n, the matrix exponential is given by \exp(A) = \sum_{i=1}^n \exp(\lambda_i) \, Z_i, where each Z_i is the spectral projection matrix onto the eigenspace corresponding to \lambda_i, defined as Z_i = \prod_{\substack{j=1 \\ j \neq i}}^n \frac{A - \lambda_j I}{\lambda_i - \lambda_j}. These projections satisfy Z_i Z_k = \delta_{ik} Z_i and \sum_i Z_i = I, ensuring the representation aligns with the spectral decomposition A = \sum_i \lambda_i Z_i.[46]

This formula arises from the more general contour integral representation in the holomorphic functional calculus. For an analytic function f defined in a domain containing the spectrum of A, f(A) = \frac{1}{2\pi i} \oint_\Gamma f(z) (zI - A)^{-1} \, dz, where \Gamma is a positively oriented contour enclosing the spectrum of A but no other singularities of the integrand. Specializing to f(z) = \exp(z), which is entire, yields \exp(A) = \frac{1}{2\pi i} \oint_\Gamma \exp(z) (zI - A)^{-1} \, dz. By the residue theorem, since the resolvent (zI - A)^{-1} has simple poles at the eigenvalues \lambda_i with residues Z_i, the integral reduces to the sum \sum_i \exp(\lambda_i) Z_i, recovering Sylvester's formula.[46]

The approach is particularly advantageous for computing matrix functions that may have poles or branch cuts, as the contour can be selected to enclose the spectrum while avoiding singularities of f, allowing evaluation even when direct series expansions or other methods fail. For the exponential specifically, the formula provides a closed-form expression when eigenvalues are known, avoiding the need for full diagonalization while leveraging polynomial interpolation at the eigenvalues.
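A direct implementation of Sylvester's formula for matrices with distinct eigenvalues takes only a few lines; the Python sketch below is illustrative (the function name expm_sylvester is not standard), assuming NumPy and SciPy for comparison.

```python
import numpy as np
from scipy.linalg import expm

def expm_sylvester(A):
    """Sylvester's formula, valid when A has distinct eigenvalues."""
    lam = np.linalg.eigvals(A)
    n = len(lam)
    I = np.eye(n, dtype=complex)
    E = np.zeros((n, n), dtype=complex)
    for i in range(n):
        Z = I.copy()                              # builds the Frobenius covariant Z_i
        for j in range(n):
            if j != i:
                Z = Z @ (A - lam[j] * I) / (lam[i] - lam[j])
        E += np.exp(lam[i]) * Z
    return E

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.allclose(expm_sylvester(A), expm(A)))    # True
```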
Numerical and Advanced Computation
Scaling and squaring method
The scaling and squaring method is a classical numerical technique for computing the matrix exponential e^A of an n \times n matrix A, particularly effective for matrices with large norms where direct series summation would be inefficient. The approach leverages the identity e^A = (e^{A/2^m})^{2^m} for integer m \geq 0, allowing the problem to be reduced to approximating the exponential of a scaled matrix with small norm, followed by repeated squaring to restore the original scale.[47]

To implement the algorithm, first determine the smallest integer m such that \|2^{-m} A\| \leq 1, where \|\cdot\| denotes a suitable matrix norm (often the infinity norm for implementation convenience). Then, compute an approximation E \approx e^{2^{-m} A} using a truncated Taylor series or a Padé approximant suitable for small arguments, as the reduced norm ensures rapid convergence. Finally, square E exactly m times via matrix multiplication to obtain the approximation X = E^{2^m} \approx e^A. This process minimizes truncation errors from the initial approximation while distributing computational effort across the squaring steps.[47][19]

A key advantage of the method lies in its backward error analysis, which demonstrates numerical stability: the computed X satisfies X = e^{A + \Delta A} for some perturbation \Delta A with \|\Delta A\| = O(\epsilon \|A\|), where \epsilon is the unit roundoff of the floating-point arithmetic, assuming exact arithmetic in the squaring phase. This backward stability ensures that the result is accurate for the slightly perturbed input, making the method robust for practical computations even with ill-conditioned matrices. The analysis relies on bounding truncation errors in the approximant and their propagation through squaring, with optimal choices of m and approximant degree balancing forward and backward errors.[47]

The scaling and squaring algorithm is widely implemented in numerical software libraries, including MATLAB's expm function, which employs a variant with Padé approximants and balancing for enhanced accuracy. For dense matrices, the computational complexity is O(n^3 \log \|A\|), arising from approximately m + c matrix multiplications (where c accounts for the approximant evaluation and m \approx \log_2 \|A\| governs the squaring iterations), each costing O(n^3) via standard algorithms like those in LAPACK. This makes it efficient for moderate-sized matrices where the norm is not excessively large.[47]
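The following simplified Python sketch illustrates the idea, using a truncated Taylor series for the scaled matrix rather than the Padé approximants used in production implementations; the function name, the term count, and the test matrix are illustrative choices only.

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

def expm_scaling_squaring(A, taylor_terms=16):
    """Sketch of scaling and squaring: scale so that ||A / 2^m|| <= 1,
    approximate the exponential of the scaled matrix by a truncated
    Taylor series, then square the result m times.  Production codes
    use Pade approximants for the scaled exponential instead."""
    norm = np.linalg.norm(A, np.inf)
    m = int(np.ceil(np.log2(norm))) if norm > 1.0 else 0
    B = A / 2.0 ** m
    E = sum(np.linalg.matrix_power(B, k) / factorial(k)
            for k in range(taylor_terms + 1))
    for _ in range(m):                 # undo the scaling by repeated squaring
        E = E @ E
    return E

A = 5.0 * np.random.default_rng(3).standard_normal((4, 4))
err = np.linalg.norm(expm_scaling_squaring(A) - expm(A)) / np.linalg.norm(expm(A))
print(err)                             # small relative difference from SciPy's expm
```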
Padé approximant methods
Padé approximants provide a rational function approximation to the matrix exponential that matches the Taylor series expansion up to a higher order than the degree of the polynomials involved, offering improved accuracy and numerical stability compared to polynomial methods. Specifically, the [m/m] diagonal Padé approximant to the scalar exponential function \exp(x) is a ratio r_{m,m}(x) = p_m(x)/q_m(x), where p_m(x) and q_m(x) are polynomials of degree at most m such that \exp(x) - r_{m,m}(x) = O(x^{2m+1}) as x \to 0.[44] For a matrix A, this extends to \exp(A) \approx p_m(A) q_m(A)^{-1}, where the polynomials are applied via matrix powers and addition.[47]

This rational form is particularly advantageous for numerical computation because it converges faster than a truncated Taylor series of comparable cost and remains better behaved for larger arguments. In practice the inverse q_m(A)^{-1} is not formed explicitly; instead, the approximant is obtained by solving the linear system q_m(A) X = p_m(A), with the core cost lying in efficient evaluation of the matrix polynomials.[44]

For practical use in double precision arithmetic, optimal [m/m] orders are selected based on the matrix norm \|A\|; common choices include [3/3] for small norms and [13/13] for higher accuracy requirements, as these balance computational cost with a truncation error below the unit roundoff for suitably bounded \|A\|. These orders ensure the approximation error is dominated by roundoff rather than truncation for typical problems.[47]

Padé methods are often combined with the scaling and squaring technique: first, scale A by 2^{-s} to reduce its norm to O(1), compute the [m/m] Padé approximant of the scaled matrix, then square the result s times to recover \exp(A). This hybrid approach minimizes both approximation error and roundoff propagation, with backward error bounds showing stability even for matrices with widely varying eigenvalues.[47]

Compared to Taylor series methods, Padé approximants exhibit superior stability for stiff problems or matrices with large-magnitude eigenvalues, as the denominator polynomial effectively damps large-magnitude components without amplifying errors during evaluation. Error analysis confirms that the relative backward error is O(\epsilon) times a modest condition number factor, making the approach reliable across a broad range of matrix classes.[44][47]

In modern implementations, such as MATLAB's expm function, Higham's refined scaling and squaring algorithm with adaptive Padé orders (selecting from [3/3], [5/5], [7/7], [9/9], or [13/13] based on \|A\| and the unit roundoff) is employed, achieving high accuracy with O(n^3) complexity for n \times n dense matrices.[47]