
Transition matrix

A transition matrix is a square matrix in mathematics that encodes transformations or probabilities between discrete states or coordinate systems. In the context of probability and stochastic processes, particularly Markov chains, it is a nonnegative matrix in which the entry in row i and column j represents the probability of transitioning from state i to state j, with each row summing to 1 to preserve total probability. In linear algebra, a transition matrix, also known as a change-of-basis matrix, is the matrix whose columns are the coordinate vectors of the new basis vectors expressed in the old basis, facilitating the conversion of vector coordinates between two bases of a vector space. In Markov chains, the transition matrix enables the computation of multi-step transition probabilities through matrix powers, where the n-th power gives the probabilities after n steps, and it often converges to a limiting matrix under conditions such as regularity or irreducibility. These matrices model diverse real-world phenomena, such as queueing systems and random walks, by representing state evolutions over discrete time. For a change of basis, the transition matrix P satisfies [v]_C = P [v]_B, where B and C are bases, and it is crucial for simplifying matrix representations of linear transformations across different coordinates. Transition matrices also appear in control theory as state-transition matrices for linear dynamical systems, solving differential equations such as ẋ(t) = Ax(t) via the matrix exponential e^{At}, which propagates initial states forward in time. Across these domains, their properties, such as stochasticity, invertibility, and spectral characteristics, underpin analytical tools like eigenvalue decomposition for long-term behavior prediction.

In Probability and Markov Chains

Definition and Basic Properties

A Markov chain is a discrete-time stochastic process with the Markov property, which states that the probability distribution for the next state depends only on the current state and is independent of the sequence of states prior to it. This memoryless characteristic allows Markov chains to model a wide variety of systems where future behavior is determined solely by the present configuration. In the context of a Markov chain with a finite state space of size n, the transition matrix P is defined as an n × n stochastic matrix whose entry P_{ij} denotes the probability of moving from state i to state j in a single time step. This matrix encapsulates the one-step transition probabilities of the chain, formally expressed as \Pr(X_{t+1} = j \mid X_t = i) = P_{ij}, where X_t represents the state at time t. The basic properties of a transition matrix P include non-negativity, ensuring P_{ij} \geq 0 for all i and j, which reflects that probabilities cannot be negative. Additionally, P is row-stochastic, meaning the entries in each row sum to 1: \sum_{j=1}^n P_{ij} = 1 for every i, guaranteeing that the process transitions to some state with certainty. Column-stochastic variants, where columns sum to 1, arise in the analysis of time-reversed Markov chains. The concept of the transition matrix originated in the early 1900s through Andrey Markov's pioneering studies of dependent sequences of random variables, particularly his analysis of letter sequences in Russian poetry.
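
The defining properties are straightforward to verify numerically. The following minimal sketch, assuming NumPy and a small hypothetical three-state matrix, checks non-negativity and unit row sums and reads off a one-step transition probability.

```python
import numpy as np

# Hypothetical 3-state transition matrix; rows index the current state,
# columns the next state.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.0, 0.5, 0.5],
])

# Non-negativity: P_ij >= 0 for all i, j.
assert np.all(P >= 0)

# Row-stochasticity: each row sums to 1, so the chain must move somewhere.
assert np.allclose(P.sum(axis=1), 1.0)

# One-step transition probability Pr(X_{t+1} = j | X_t = i) is simply P[i, j].
print("Pr(next = 2 | current = 0) =", P[0, 2])
```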

Construction and Examples

To construct a transition matrix for a Markov chain, the state space must first be defined as a finite or countable set representing all possible system configurations. Transition probabilities p_{ij} are then determined for each pair of states i and j, either through empirical estimation from observed data sequences or by theoretical assignment based on the model's behavioral assumptions. These probabilities are arranged into a right-stochastic matrix P, where the entry in row i and column j is p_{ij}, and each row sums to 1 to ensure the total probability of departing any state is unity. A basic illustration is a two-state Markov chain modeling daily weather as "sunny" (state 1) or "rainy" (state 2). If the chain transitions from sunny to sunny with probability 0.9 and to rainy with 0.1, while from rainy it goes to sunny with 0.5 and stays rainy with 0.5, the transition matrix takes the form P = \begin{pmatrix} 0.9 & 0.1 \\ 0.5 & 0.5 \end{pmatrix}, where rows correspond to the current state and columns to the next state. Another example arises in the gambler's ruin problem, where a gambler begins with 1 unit of capital and plays until reaching 0 (ruin) or 2 units (success), assuming equal win/loss probabilities of 0.5 per fair game. The states are 0, 1, and 2, with 0 and 2 as absorbing states. From state 1, the chain moves to 0 with probability 0.5 or to 2 with 0.5, while states 0 and 2 remain there with probability 1. The resulting transition matrix is P = \begin{pmatrix} 1 & 0 & 0 \\ 0.5 & 0 & 0.5 \\ 0 & 0 & 1 \end{pmatrix}. This structure captures the absorbing nature by placing 1s on the diagonal for the boundary states and distributing probabilities only among the transient states. For multi-step transitions, the matrix power P^k yields the probabilities of moving between states in exactly k steps. In the weather model above, P^2 = \begin{pmatrix} 0.86 & 0.14 \\ 0.7 & 0.3 \end{pmatrix}, indicating, for instance, a 0.86 probability of sunny weather two days after a sunny day. In continuous-time Markov chains, the analog involves constructing an infinitesimal generator matrix Q rather than a one-step transition matrix, where off-diagonal entries q_{ij} (for i \neq j) represent instantaneous transition rates from state i to j, and the diagonal q_{ii} = -\sum_{j \neq i} q_{ij} ensures row sums of zero. The time-t transition probabilities then derive from Q via the matrix exponential, with the Kolmogorov forward equation governing the evolution of state probabilities over time.
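
A short sketch, assuming NumPy, reproduces the two examples above and the two-step probabilities quoted for the weather model.

```python
import numpy as np

# Two-state weather chain: state 0 = sunny, state 1 = rainy.
P_weather = np.array([[0.9, 0.1],
                      [0.5, 0.5]])

# Two-step transition probabilities: P^2 gives Pr of moving i -> j in 2 steps.
P2 = np.linalg.matrix_power(P_weather, 2)
print(P2)    # [[0.86 0.14]
             #  [0.70 0.30]]

# Gambler's ruin with states 0 (ruin), 1, 2 (success); 0 and 2 are absorbing.
P_ruin = np.array([[1.0, 0.0, 0.0],
                   [0.5, 0.0, 0.5],
                   [0.0, 0.0, 1.0]])

# After many steps the transient state 1 has been absorbed, half the time each way.
print(np.linalg.matrix_power(P_ruin, 50)[1])   # approx. [0.5, 0.0, 0.5]
```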

Stationary Distributions

In Markov chains, a stationary distribution \pi is a probability distribution that remains invariant under the transition matrix P, satisfying the equation \pi P = \pi with \sum_i \pi_i = 1. This distribution represents the long-run proportion of time the chain spends in each state, assuming the chain converges to it. For an ergodic chain, defined as irreducible and aperiodic, there exists a unique stationary distribution, and the chain converges to it regardless of the initial state distribution. Ergodicity ensures that this unique \pi captures the limiting behavior as the number of steps approaches infinity. To compute the stationary distribution, solve the linear system (I - P^T) \pi = 0 subject to the normalization \sum_i \pi_i = 1, treating \pi as a column vector. An iterative approach, such as the power iteration method, starts with an initial distribution and repeatedly multiplies it by P until convergence to \pi. For reversible Markov chains, the stationary distribution satisfies the detailed balance equations \pi_i P_{ij} = \pi_j P_{ji} for all states i and j, ensuring the chain's dynamics are symmetric in equilibrium. Consider a two-state weather model with states sunny (S) and rainy (R), where the transition matrix is P = \begin{pmatrix} 0.9 & 0.1 \\ 0.5 & 0.5 \end{pmatrix}, meaning that from sunny the weather stays sunny with probability 0.9, and from rainy it transitions to sunny with probability 0.5. Solving \pi P = \pi with \pi_S + \pi_R = 1 yields \pi_S = 5/6 and \pi_R = 1/6, indicating that the long-run proportion of sunny days is five times that of rainy days.
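
Both computational routes described above can be sketched in a few lines of NumPy: the direct linear solve of \pi P = \pi with normalization, and power iteration from an arbitrary initial distribution. The weather matrix is the one from the text.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
n = P.shape[0]

# Direct solve: stack (I - P^T) pi = 0 with the normalization sum(pi) = 1.
A = np.vstack([np.eye(n) - P.T, np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)                      # approx. [0.8333, 0.1667] = [5/6, 1/6]

# Power iteration: repeatedly apply P to an initial distribution.
mu = np.array([0.5, 0.5])
for _ in range(100):
    mu = mu @ P
print(mu)                      # converges to the same stationary distribution
```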

In Linear Algebra and Dynamical Systems

State-Transition Matrix

In linear algebra and dynamical systems, the state-transition matrix provides a fundamental tool for describing the evolution of the state vector in linear time-invariant (LTI) systems, distinct from the probabilistic transition matrices used in Markov chains. For a continuous-time LTI system defined by the state equation \dot{x}(t) = A x(t), where x(t) \in \mathbb{R}^n is the state vector and A \in \mathbb{R}^{n \times n} is the constant system matrix, the solution expressing the state at time t in terms of the initial state is x(t) = \Phi(t) x(0). Here, \Phi(t) denotes the state-transition matrix, which maps the initial state x(0) to the current state x(t). It is given explicitly by the matrix exponential \Phi(t) = e^{A t}, defined via its power series expansion e^{A t} = I + A t + \frac{(A t)^2}{2!} + \frac{(A t)^3}{3!} + \cdots, with I the identity matrix. This form arises as the unique solution to the differential equation \dot{\Phi}(t) = A \Phi(t) subject to the initial condition \Phi(0) = I. In the discrete-time counterpart, the LTI system is governed by the difference equation x_{k+1} = A x_k, where k indexes discrete time steps and x_k \in \mathbb{R}^n. The state at step k is then x_k = \Phi(k) x_0, with the state-transition matrix \Phi(k) = A^k. This power form follows directly from iterative application of the system dynamics, starting from \Phi(0) = I. Both continuous and discrete formulations share the property that \Phi(0) = I, ensuring the state remains unchanged at the initial time. Additionally, in the continuous case the state-transition matrix satisfies \Phi(t + s) = \Phi(t) \Phi(s) for all t, s \geq 0, reflecting the additive structure of time in linear time-invariant dynamics. This property underscores the matrix's role in composing evolutions over successive intervals. The conceptual framework of the state-transition matrix emerged within modern control theory during the mid-20th century, particularly through Rudolf E. Kalman's development of state-space representations for linear systems in the late 1950s and early 1960s. Kalman's seminal work formalized these matrices as central to analyzing and predicting system behavior, enabling recursive solutions for estimation and control problems in noisy environments. This approach shifted focus from transfer functions to internal state dynamics, laying the groundwork for applications in engineering and beyond.
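
A brief numerical check, assuming NumPy and SciPy and a hypothetical 2 × 2 system matrix, illustrates the defining properties \Phi(0) = I and \Phi(t+s) = \Phi(t)\Phi(s) in continuous time and the power form \Phi(k) = A^k in discrete time.

```python
import numpy as np
from scipy.linalg import expm

# A stable 2x2 system matrix (hypothetical example).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])

# Continuous time: Phi(t) = e^{At}; Phi(0) = I and Phi(t+s) = Phi(t) Phi(s).
t, s = 0.7, 1.3
assert np.allclose(expm(A * 0.0), np.eye(2))
assert np.allclose(expm(A * (t + s)), expm(A * t) @ expm(A * s))

# Discrete time: Phi(k) = A^k propagates x_k = Phi(k) x_0.
x0 = np.array([1.0, 0.0])
k = 5
xk = np.linalg.matrix_power(A, k) @ x0
print(xk)
```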

Computation and Properties

In linear time-invariant dynamical systems, the state-transition matrix for continuous-time systems, \Phi(t) = e^{At} where A is the system matrix, is fundamentally computed using the matrix exponential. The matrix exponential is defined by its power series expansion e^{At} = \sum_{k=0}^{\infty} \frac{(At)^k}{k!} = I + At + \frac{(At)^2}{2!} + \frac{(At)^3}{3!} + \cdots, which converges for all finite t and all square matrices A. This series provides an exact representation, analogous to the scalar exponential, and directly yields the state evolution \mathbf{x}(t) = e^{At} \mathbf{x}(0). An alternative analytical method leverages the Cayley-Hamilton theorem, which states that A satisfies its own characteristic polynomial p(\lambda) = \det(\lambda I - A) = 0, so that p(A) = 0. This reduces the infinite series to a finite linear combination of powers of A up to degree n-1 for an n \times n matrix, enabling closed-form computation when the characteristic polynomial is known. For numerical evaluation, especially in high dimensions, Padé approximation offers an efficient alternative to the series, representing e^{z} as a ratio of polynomials that matches the Taylor expansion up to a specified order; post-1980s implementations combine scaling, squaring, and [m/m] Padé approximants for improved accuracy and stability. For discrete-time systems, the state-transition matrix over k steps is \Phi(k) = A^k, computed via direct matrix powering for small k or, if A is diagonalizable as A = PDP^{-1} with D diagonal, by \Phi(k) = PD^k P^{-1}, where D^k involves only scalar exponentiations of eigenvalues. This decomposition simplifies iteration, though it requires solving the eigenvalue problem; eigenvector-based methods for the continuous case follow the same principle but are detailed elsewhere. Key properties of the state-transition matrix include invertibility, with \Phi(-t) = \Phi(t)^{-1} for all t, which ensures reversibility of state evolution in linear systems. For time-invariant systems, \Phi(t, \tau) = \Phi(t - \tau), reflecting independence from absolute time and dependence only on the elapsed duration.
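
The sketch below, assuming NumPy and SciPy, compares a truncated power series against scipy.linalg.expm (which internally uses a scaling-and-squaring Padé scheme) and computes A^k through diagonalization; it is an illustration of the methods named above, not a production implementation.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
t = 1.0

# Truncated power series I + At + (At)^2/2! + ... (adequate only for modest ||At||).
Phi_series = np.zeros_like(A)
term = np.eye(2)
for k in range(25):
    Phi_series = Phi_series + term
    term = term @ (A * t) / (k + 1)

# SciPy's expm uses a scaling-and-squaring Pade approximation internally.
Phi_pade = expm(A * t)
print(np.allclose(Phi_series, Phi_pade))   # True for this well-behaved A

# Discrete time: if A = V D V^{-1} is diagonalizable, A^k = V D^k V^{-1}.
evals, V = np.linalg.eig(A)
k = 8
Ak = (V @ np.diag(evals**k) @ np.linalg.inv(V)).real
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True
```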

Applications in Control Theory

In control theory, the state-transition matrix \Phi(t) plays a pivotal role in stability analysis of linear time-invariant (LTI) systems described by \dot{x}(t) = Ax(t). The eigenvalues of \Phi(t) = e^{At} are determined by those of the system matrix A, since the solution to the homogeneous equation is x(t) = e^{At} x(0), and the exponential decomposes into terms e^{\lambda_i t} associated with each eigenvalue \lambda_i of A. A system is asymptotically stable if all eigenvalues of A have negative real parts (i.e., \operatorname{Re}(\lambda_i) < 0 for all i), ensuring that the state trajectories converge to the origin as t \to \infty, regardless of initial conditions. The state-transition matrix also facilitates assessments of controllability and observability in LTI systems of the form \dot{x}(t) = Ax(t) + Bu(t), y(t) = Cx(t). Controllability, which asks whether any initial state can be driven to the origin in finite time using the input u(t), requires the controllability Gramian W_c(t_0, t_f) = \int_{t_0}^{t_f} \Phi(t_0, \tau) B(\tau) B(\tau)^T \Phi(t_0, \tau)^T \, d\tau to have full rank (i.e., to be positive definite in the time-invariant case). Similarly, observability, which asks whether the initial state can be reconstructed from the outputs y(t), demands that the observability Gramian W_o(t_0, t_f) = \int_{t_0}^{t_f} \Phi(t, t_0)^T C(t)^T C(t) \Phi(t, t_0) \, dt be positive definite. These rank conditions, derived from the fundamental solution involving \Phi, underpin structural analyses in system design. In optimal control, particularly the linear quadratic regulator (LQR) problem of minimizing \int_0^T (x^T Q x + u^T R u) \, dt subject to \dot{x} = Ax + Bu, the state-transition matrix aids in solving the associated differential Riccati equation (DRE) \dot{P} = -PA - A^T P + P B R^{-1} B^T P - Q. For finite-horizon LQR, the solution P(t) can be expressed using the state-transition matrix of the Hamiltonian system \dot{z} = H z, where H = \begin{bmatrix} A & -B R^{-1} B^T \\ -Q & -A^T \end{bmatrix} and z = \begin{bmatrix} x \\ p \end{bmatrix}, with P(t) = \Phi_{21}(t, T) \Phi_{11}(t, T)^{-1}; this linearizes the quadratic DRE into a larger linear matrix equation, enabling efficient numerical computation. A representative example is the mass-spring-damper system, modeled in state space as \dot{x} = \begin{bmatrix} 0 & 1 \\ -2 & -3 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u, where x = \begin{bmatrix} y \\ \dot{y} \end{bmatrix}, y is the displacement, and the parameters correspond to unit mass, spring constant 2, and damping coefficient 3 (an overdamped case). The state-transition matrix is \Phi(t) = e^{At}, computed via modal decomposition using the eigenvalues -1 and -2. Stability follows from \operatorname{Re}(\lambda) < 0, confirming asymptotic convergence to equilibrium through exponential decay without oscillation. For nonlinear systems \dot{x} = f(x, u), modern extensions approximate local behavior via linearization at an equilibrium (\bar{x}, \bar{u}) where f(\bar{x}, \bar{u}) = 0, yielding \dot{\delta x} = A \, \delta x + B \, \delta u with A = \frac{\partial f}{\partial x} \big|_{\bar{x}, \bar{u}} (the Jacobian of f with respect to the state) and B = \frac{\partial f}{\partial u} \big|_{\bar{x}, \bar{u}}. The state-transition matrix then becomes \Phi(t) = e^{A t}, enabling stability analysis, controllability checks, and LQR design in the linearized vicinity, valid for small perturbations \delta x. This Jacobian-based approach underpins applications where full nonlinear solutions are intractable.
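
A minimal sketch of these checks for the mass-spring-damper example, assuming NumPy and SciPy; for the time-invariant case, the Gramian rank condition is equivalent to the familiar rank test on the controllability matrix [B, AB, ..., A^{n-1}B] used below.

```python
import numpy as np
from scipy.linalg import expm

# Mass-spring-damper from the text: unit mass, spring constant 2, damping 3.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])

# Stability: all eigenvalues of A must have negative real part.
evals = np.linalg.eigvals(A)
print(evals, np.all(evals.real < 0))        # [-1, -2], True

# Controllability (time-invariant case): rank of [B, AB] must equal n = 2;
# this is equivalent to the controllability Gramian being positive definite.
ctrb = np.hstack([B, A @ B])
print(np.linalg.matrix_rank(ctrb) == 2)     # True

# State-transition matrix at t = 1 propagates any initial condition forward.
Phi = expm(A * 1.0)
x0 = np.array([1.0, 0.0])
print(Phi @ x0)                              # decays toward the origin
```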

In Geometry and Transformations

Transition Matrices for Möbius Transformations

Möbius transformations are rational functions of the form f(z) = \frac{az + b}{cz + d}, where a, b, c, d \in \mathbb{C} and the condition ad - bc \neq 0 ensures the transformation is non-constant. These transformations are represented by 2 \times 2 complex matrices of the form \begin{pmatrix} a & b \\ c & d \end{pmatrix}, where the representation is defined up to nonzero scalar multiplication, meaning that matrices differing by a scalar multiple induce the same transformation. This matrix association arises from the projective linear action on the complex projective line \mathbb{CP}^1, where z corresponds to the homogeneous coordinates [z : 1], and the matrix acts linearly before dehomogenization. The set of all Möbius transformations forms a group under composition, which is isomorphic to the projective special linear group \mathrm{PSL}(2, \mathbb{C}), obtained as the quotient of the special linear group \mathrm{SL}(2, \mathbb{C}) by its center \{ \pm I \}. The special linear group \mathrm{SL}(2, \mathbb{C}) consists of the 2 \times 2 complex matrices with determinant 1, providing a normalization condition \det(M) = ad - bc = 1 that eliminates the scalar ambiguity while preserving the group structure. Composition of transformations corresponds directly to matrix multiplication: if f and g are represented by matrices M_f and M_g, then f \circ g is represented by M_f M_g (up to scalar). This matrix representation of Möbius transformations aligns with Felix Klein's Erlangen program, introduced in the 1870s, which characterizes geometries by their groups of transformations, unifying diverse geometric structures under a group-theoretic lens. In this framework, Möbius transformations generate the conformal geometry of the extended complex plane, highlighting invariants preserved under these projective actions.
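
The correspondence between composition and matrix multiplication can be illustrated directly; the sketch below, assuming Python complex arithmetic with NumPy and two arbitrarily chosen invertible matrices, checks that M_f M_g represents f \circ g.

```python
import numpy as np

def mobius(M, z):
    """Apply the Mobius transformation represented by the 2x2 complex matrix M."""
    a, b = M[0]
    c, d = M[1]
    return (a * z + b) / (c * z + d)

# Two invertible complex matrices (hypothetical example).
Mf = np.array([[1, 1j], [0, 1]], dtype=complex)     # f(z) = z + i
Mg = np.array([[2, 0], [0, 1]], dtype=complex)      # g(z) = 2z

z = 0.3 + 0.4j
# Composition f(g(z)) corresponds to the matrix product Mf @ Mg.
print(np.isclose(mobius(Mf @ Mg, z), mobius(Mf, mobius(Mg, z))))   # True
```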

Representation on the Riemann Sphere

The Riemann sphere, denoted \hat{\mathbb{C}} = \mathbb{C} \cup \{\infty\}, is the extended complex plane obtained by adjoining the point at infinity to the finite complex plane \mathbb{C}. This topological space can be visualized as the unit sphere S^2 in \mathbb{R}^3 via stereographic projection, which maps the north pole (0,0,1) to \infty and the equatorial plane to \mathbb{C}. Transition matrices, specifically non-singular 2 \times 2 complex matrices M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} with \det M = ad - bc \neq 0, induce Möbius transformations that act bijectively on the Riemann sphere. The action of such a matrix M on a point z \in \hat{\mathbb{C}} is given by the Möbius transformation f_M(z) = \frac{az + b}{cz + d}, interpreted in homogeneous coordinates as M \begin{pmatrix} z \\ 1 \end{pmatrix} = \begin{pmatrix} az + b \\ cz + d \end{pmatrix}, where the image is the ratio of the components provided the denominator is non-zero. For finite z with cz + d \neq 0, this yields the standard fractional linear form; at the pole z = -d/c (if c \neq 0), f_M(z) = \infty; and at infinity, f_M(\infty) = a/c if c \neq 0, or \infty if c = 0. This representation ensures the transformation is well defined and bijective on the entire Riemann sphere, with composition of transformations corresponding to matrix multiplication up to scalar factors. Fixed points of the transformation, where f_M(z) = z, satisfy \frac{az + b}{cz + d} = z, or equivalently c z^2 + (d - a) z - b = 0 after clearing the denominator, a quadratic equation in z. These fixed points correspond to the projective eigenvectors of M, as solving f_M(z) = z is equivalent to finding eigenvectors (z, 1) such that M \begin{pmatrix} z \\ 1 \end{pmatrix} = \lambda \begin{pmatrix} z \\ 1 \end{pmatrix} for some \lambda \neq 0, with z being the projective coordinate. Every non-identity Möbius transformation has exactly two fixed points on the Riemann sphere, counted with multiplicity. A representative example is the family of rotations of the Riemann sphere, given by matrices of the form M = \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix} with |a|^2 + |b|^2 = 1, corresponding to elements of the special unitary group SU(2), which double-covers the rotation group SO(3). Via stereographic projection, these transformations correspond to rigid rotations of the sphere, preserving circles and angles. Möbius transformations induced by transition matrices are conformal, meaning they preserve oriented angles between curves, as they are holomorphic functions with non-vanishing derivative on the Riemann sphere. The derivative is f_M'(z) = \frac{ad - bc}{(cz + d)^2} \neq 0 for finite z \neq -d/c, and analogous non-zero expressions hold at the pole and at infinity via a change of variables, ensuring angle preservation globally. This conformality follows from the fact that such maps are bijective holomorphic automorphisms of \hat{\mathbb{C}}.
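
The link between fixed points and projective eigenvectors can be checked numerically; the sketch below, assuming NumPy and a hypothetical real matrix M, compares the roots of the fixed-point quadratic with the component ratios of the eigenvectors of M.

```python
import numpy as np

# Mobius transformation f(z) = (2z + 1) / (z + 2), represented by M.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Fixed points from the quadratic c z^2 + (d - a) z - b = 0.
a, b = M[0]
c, d = M[1]
roots = np.roots([c, d - a, -b])
print(roots)                       # [1, -1]

# Same points as ratios of eigenvector components (projective coordinates).
_, V = np.linalg.eig(M)
print(V[0] / V[1])                 # also 1 and -1, in some order
```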

Geometric Interpretations

In geometry, transition matrices often arise as change-of-basis matrices that facilitate coordinate transformations, providing a geometric lens on how linear maps alter spatial configurations. Affine transformations in the 2D plane are represented using 3×3 matrices in homogeneous coordinates, which embed translations alongside linear operations such as scaling, shearing, and rotation. This formulation allows a single matrix to encode shifts: a point (x, y) is augmented to (x, y, 1) and transformed via matrix multiplication, yielding the new coordinates after dehomogenization by dividing by the third component. Such matrices preserve collinearity and parallelism but generally distort distances and angles unless specialized. Similarity transformations, a subclass of affine maps, are captured by matrices of the form sQ, where s > 0 is a uniform scale factor and Q is an orthogonal matrix representing a rotation or reflection. Orthogonal matrices Q satisfy Q^T Q = I, ensuring they preserve lengths and angles up to the scale factor s, thus maintaining shapes while allowing size changes. Geometrically, this interprets the transition matrix as a rigid motion scaled uniformly, ideal for modeling conformal mappings in the plane. Transition matrices also relate to Möbius transformations through embeddings in hyperbolic geometry, where such maps preserve the hyperbolic metric when restricted to models like the Poincaré disk, effectively transitioning between Euclidean and hyperbolic coordinate systems; this connection highlights how projective extensions of affine transitions embed non-Euclidean structures. A concrete example is the transition matrix for a glide reflection in the plane, an isometry combining reflection over a line (say, the x-axis) with a translation parallel to it (by the vector (t, 0)). In homogeneous coordinates, the reflection matrix is \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} and the translation matrix is \begin{pmatrix} 1 & 0 & t \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}; their product \begin{pmatrix} 1 & 0 & t \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} yields the glide reflection, preserving distances while reversing orientation. Visually, these matrices alter figures by shearing (distorting shapes without changing area), scaling (stretching distances proportionally), or rotating (preserving both), with orthogonal components maintaining metric properties such as perpendicularity, while general affine maps carry parallelograms to parallelograms but warp circles into ellipses.
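
The glide-reflection example translates directly into homogeneous-coordinate arithmetic; the sketch below, assuming NumPy and a hypothetical translation distance t = 2, forms the product matrix and applies it to a sample point.

```python
import numpy as np

t = 2.0   # translation distance along the x-axis (hypothetical value)

# Reflection across the x-axis and translation by (t, 0), in homogeneous coords.
reflect = np.array([[1, 0, 0],
                    [0, -1, 0],
                    [0, 0, 1]], dtype=float)
translate = np.array([[1, 0, t],
                      [0, 1, 0],
                      [0, 0, 1]], dtype=float)

glide = translate @ reflect
print(glide)
# [[ 1.  0.  2.]
#  [ 0. -1.  0.]
#  [ 0.  0.  1.]]

# Apply to the point (x, y) = (1, 3): augment to (1, 3, 1), multiply, drop last coord.
p = np.array([1.0, 3.0, 1.0])
print(glide @ p)    # [3., -3., 1.] -> image point (3, -3)
```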

General Mathematical Properties

Eigenvalues and Spectral Gap

For a stochastic transition matrix P, the eigenvalues \lambda satisfy |\lambda| \leq 1, since the spectral radius is bounded by the row sums, which equal 1. Moreover, \lambda = 1 is always an eigenvalue, corresponding to a left eigenvector that represents the stationary distribution of the associated Markov chain. The eigenvalues are the roots of the characteristic polynomial \det(P - \lambda I) = 0, which determines the spectral properties of P. The Perron-Frobenius theorem applies to transition matrices with strictly positive entries, guaranteeing that \lambda = 1 is the unique dominant eigenvalue, with algebraic multiplicity one and a corresponding strictly positive right eigenvector. For irreducible non-negative matrices, the theorem extends to ensure that 1 is the eigenvalue of largest modulus, and all other eigenvalues lie strictly inside the unit disk if the matrix is aperiodic. In cases where the transition matrix is not diagonalizable, the Jordan canonical form may reveal nontrivial Jordan blocks for eigenvalues other than 1. Although irreducible transition matrices have a simple eigenvalue 1, periodicity produces additional eigenvalues on the unit circle, resulting in oscillatory convergence behavior. The spectral gap, defined as 1 - |\lambda_2| where \lambda_2 is the eigenvalue of second-largest modulus, quantifies the rate of convergence to the stationary distribution in reversible Markov chains. A larger spectral gap implies a shorter mixing time, with the relaxation time given by 1/(1 - |\lambda_2|), providing a key measure of how quickly the chain approaches equilibrium.
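
These spectral quantities are easy to compute for a small chain; the sketch below, assuming NumPy and the two-state weather matrix used earlier, extracts the eigenvalue 1 and its stationary left eigenvector and evaluates the spectral gap.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

evals, V = np.linalg.eig(P.T)          # right eigenvectors of P^T = left eigenvectors of P
print(np.sort(evals.real))             # [0.4, 1.0]: lambda = 1 plus |lambda| < 1

# Stationary distribution = left eigenvector for lambda = 1, normalized to sum to 1.
idx = np.argmin(np.abs(evals - 1.0))
pi = V[:, idx].real
pi = pi / pi.sum()
print(pi)                               # approx. [5/6, 1/6]

# Spectral gap 1 - |lambda_2| and the relaxation time 1 / (1 - |lambda_2|).
lam2 = sorted(np.abs(evals))[-2]
print("spectral gap:", 1 - lam2, "relaxation time:", 1 / (1 - lam2))
```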

Similarity to Other Matrices

Transition matrices, particularly those arising in Markov chains, are closely related to stochastic matrices, which are square matrices with non-negative entries in which each row sums to 1. This property ensures that each row represents a valid probability distribution over the possible transitions from one state to the others. A special case is the doubly stochastic matrix, in which both rows and columns sum to 1, implying that the uniform distribution is invariant under the transition dynamics. While all doubly stochastic matrices are stochastic, the converse does not hold; transition matrices are typically row-stochastic but not necessarily column-stochastic unless the chain is reversible or symmetric in a specific way. In spectral graph theory, transition matrices bear a direct resemblance to normalized adjacency matrices, particularly in the context of random walks on undirected graphs. The transition matrix P for a simple random walk is given by P = D^{-1}A, where A is the adjacency matrix (with 1s indicating edges between vertices) and D is the diagonal degree matrix. This normalization makes P row-stochastic, mirroring the probabilistic interpretation of moving to each neighboring vertex with equal probability. The normalized Laplacian L_{\text{norm}} = I - D^{-1/2} A D^{-1/2} is closely related to P, since L_{\text{norm}} = I - S where S = D^{-1/2} A D^{-1/2} is similar to P, so the eigenvalues of L_{\text{norm}} are 1 minus those of P, highlighting their shared role in analyzing graph connectivity and diffusion processes. Permutation matrices represent an extreme case of transition matrices, serving as the vertices of the Birkhoff polytope formed by all doubly stochastic matrices. A permutation matrix is a 0-1 matrix with exactly one 1 in each row and column, corresponding to a deterministic reassignment of states without probabilistic mixing. These matrices are both row- and column-stochastic, since every row and column sum is exactly 1, and they are orthogonal, satisfying P^T P = I, which preserves the Euclidean norm under transformation. This orthogonality distinguishes them from general stochastic matrices, which do not necessarily preserve vector lengths but do preserve the \ell_1-norm of probability vectors. Transition matrices often function as contraction operators in appropriate norms, facilitating convergence analyses in dynamical systems. For instance, on the space of probability distributions equipped with the total variation distance, the action of a transition matrix induces a map with Lipschitz constant at most 1, and strictly less than 1 (possibly after finitely many steps) for irreducible aperiodic chains, enabling the contraction mapping theorem to guarantee a unique stationary distribution. In the \ell_1-norm, stochastic matrices have induced norm exactly 1 on probability vectors, reflecting preservation of total mass for non-negative vectors, while the induced map on deviations from stationarity contracts under suitable mixing assumptions. To illustrate the distinctions, consider a comparison with rotation matrices, which are orthogonal but generally not stochastic. A rotation by angle \theta, such as \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}, preserves the Euclidean norm (R^T R = I) and represents a rigid geometric transformation. However, its entries may be negative and its rows need not sum to 1 (for \theta = 90^\circ, the row sums are -1 and 1), violating the non-negativity and normalization required of transition matrices, which model probabilistic state changes rather than deterministic rigid motions.
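
The contrasts drawn above can be made concrete; the sketch below, assuming NumPy, builds the random-walk matrix P = D^{-1}A for a small path graph, checks the orthogonality of a permutation matrix, and shows why a rotation matrix fails the stochasticity requirements.

```python
import numpy as np

# Simple undirected graph on 3 vertices: a path 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))

# Random-walk transition matrix P = D^{-1} A is row-stochastic.
P = np.linalg.inv(D) @ A
print(P.sum(axis=1))                           # [1., 1., 1.]

# A permutation matrix is doubly stochastic and orthogonal (P^T P = I).
perm = np.array([[0, 1, 0],
                 [0, 0, 1],
                 [1, 0, 0]], dtype=float)
print(np.allclose(perm.T @ perm, np.eye(3)))   # True

# A 90-degree rotation is orthogonal but not stochastic: it has negative
# entries and its rows do not sum to 1.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
print(R.sum(axis=1))                           # [-1., 1.]
```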

Extensions and Generalizations

Time-inhomogeneous Markov chains generalize the standard discrete-time case by allowing the transition probabilities to depend explicitly on time, so the transition matrix P_n at step n may differ from those at other steps. The probability distribution after k steps is then obtained from the ordered product \prod_{i=1}^k P_i rather than from powers of a fixed matrix. In continuous-time Markov chains, the transition matrix P(t) describes the probabilities of state changes over a continuous time interval [0, t] and is generated by an infinitesimal rate matrix Q via P(t) = e^{Qt}, where the exponential is the matrix exponential. This formulation satisfies the Kolmogorov forward and backward differential equations, \frac{d}{dt} P(t) = P(t) Q and \frac{d}{dt} P(t) = Q P(t), respectively, which govern the time evolution of the transition probabilities. For infinite state spaces, transition matrices extend to bounded linear operators on infinite-dimensional spaces; in quantum generalizations, classical probabilities are replaced by density operators and transitions by completely positive trace-preserving super-operators. In quantum Markov chains, for instance, the state space involves a separable Hilbert space \mathcal{H}, and the transition kernel consists of super-operators E_{ij}: \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}) acting on the algebra of bounded operators, enabling the modeling of quantum correlations in infinite-dimensional systems. Non-square generalizations of transition matrices arise when the input and output state spaces have different dimensions, such as in latent Markov models where the number of latent classes varies over time. In these rectangular latent Markov models, the transition matrix from time t-1 to t has dimensions k_{t-1} \times k_t if the number of classes changes from k_{t-1} to k_t, allowing subjects to be reassigned across groupings while maintaining probabilistic consistency. Recent developments in quantum transition matrices focus on open quantum systems, where the evolution of the density operator \rho(t) on a Hilbert space follows the Lindblad master equation \frac{d}{dt} \rho(t) = -i [H, \rho(t)] + \sum_k \left( L_k \rho(t) L_k^\dagger - \frac{1}{2} \{ L_k^\dagger L_k, \rho(t) \} \right), generalizing classical Markovian dynamics to account for decoherence and dissipation through jump operators L_k. This framework, prominent since the early 2000s, underpins analyses of quantum channels as quantum Markov semigroups, with applications in quantum information processing.
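
A short sketch, assuming NumPy and SciPy, illustrates two of these generalizations: the ordered product of step-dependent matrices for a time-inhomogeneous chain (with hypothetical matrices P1 and P2) and the matrix exponential P(t) = e^{Qt} for a continuous-time generator Q.

```python
import numpy as np
from scipy.linalg import expm

# Time-inhomogeneous chain: the distribution after k steps uses the ordered product.
P1 = np.array([[0.9, 0.1], [0.5, 0.5]])
P2 = np.array([[0.7, 0.3], [0.2, 0.8]])
mu0 = np.array([1.0, 0.0])
print(mu0 @ P1 @ P2)                     # distribution after two (different) steps

# Continuous-time chain: generator Q has zero row sums; P(t) = exp(Qt).
Q = np.array([[-1.0, 1.0],
              [ 2.0, -2.0]])
Pt = expm(Q * 0.5)
print(Pt.sum(axis=1))                    # rows of P(t) still sum to 1
```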
