
Doubly stochastic matrix

A doubly stochastic matrix is a square matrix of non-negative real entries such that the sum of the entries in each row equals 1 and the sum of the entries in each column equals 1. The set of all n \times n doubly stochastic matrices forms a convex polytope known as the Birkhoff polytope, which is the convex hull of the n! permutation matrices and has dimension (n-1)^2. The Birkhoff–von Neumann theorem states that every doubly stochastic matrix can be decomposed as a finite convex combination of permutation matrices, providing a canonical representation that connects linear algebra with combinatorics. This theorem implies that the Birkhoff polytope is the convex hull of its vertices, the permutation matrices, and highlights the integral structure underlying these matrices despite their continuous nature. Doubly stochastic matrices exhibit several notable algebraic properties: the product of two such matrices is again doubly stochastic, the transpose of a doubly stochastic matrix is doubly stochastic, and any convex combination of doubly stochastic matrices is doubly stochastic. Their eigenvalues lie within the unit disk, with 1 an eigenvalue corresponding to the all-ones eigenvector, and the spectral radius equals 1. In probability theory, doubly stochastic matrices serve as transition probability matrices for time-homogeneous Markov chains on finite state spaces for which the uniform distribution is a stationary distribution. Applications extend to combinatorial optimization, such as fair allocation problems and network flows, as well as the computation of permanents, where the van der Waerden conjecture (now proved) bounds the minimum permanent of an n \times n doubly stochastic matrix below by n!/n^n.

Definition and Properties

Definition

A non-negative matrix is a square matrix whose entries are all greater than or equal to zero. A doubly stochastic matrix is a non-negative matrix A = (a_{ij}) of order n such that \sum_{j=1}^n a_{ij} = 1 \quad \text{for each } i = 1, \dots, n and \sum_{i=1}^n a_{ij} = 1 \quad \text{for each } j = 1, \dots, n. This means both the rows and columns sum to unity, distinguishing it from other classes of stochastic matrices. A row-stochastic matrix (also called right-stochastic) is a non-negative matrix in which only the row sums equal 1, while the column sums may vary. Conversely, a column-stochastic matrix (also called left-stochastic) requires only the column sums to equal 1, with row sums potentially differing. The set of all n \times n doubly stochastic matrices is commonly denoted by D_n. Permutation matrices are a fundamental special case of doubly stochastic matrices.
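As a quick sanity check, the definition translates directly into code. The sketch below is illustrative, not from the source; the helper name `is_doubly_stochastic` is made up. It verifies non-negativity and unit row and column sums with numpy.

```python
import numpy as np

def is_doubly_stochastic(A, tol=1e-12):
    """Check that A is square, non-negative, and has all row and column sums equal to 1."""
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    return (n == m
            and bool(np.all(A >= -tol))
            and np.allclose(A.sum(axis=1), 1.0, atol=tol)   # row sums
            and np.allclose(A.sum(axis=0), 1.0, atol=tol))  # column sums

A = np.array([[0.5, 0.5], [0.5, 0.5]])   # doubly stochastic
R = np.array([[0.2, 0.8], [0.5, 0.5]])   # row-stochastic only
print(is_doubly_stochastic(A))  # True
print(is_doubly_stochastic(R))  # False: column sums are 0.7 and 1.3
```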

Fundamental Properties

A doubly stochastic matrix A = (a_{ij}) of order n has the property that each row and each column sums to 1 with nonnegative entries, so every row vector and every column vector is a probability vector, i.e., a unit vector in the \ell_1 norm. The eigenvalues of a doubly stochastic matrix satisfy |\lambda| \leq 1 for all eigenvalues \lambda, with 1 being an eigenvalue of algebraic multiplicity at least 1, as the all-ones vector is both a left and right eigenvector corresponding to the eigenvalue 1. This bound follows from the Gershgorin circle theorem applied to the row sums of 1, confining all eigenvalues to the unit disk in the complex plane. The permanent of an n \times n doubly stochastic matrix A satisfies \frac{n!}{n^n} \leq \operatorname{per}(A) \leq 1. The lower bound, known as the van der Waerden conjecture, was proved independently by Egorychev and Falikman, with equality holding for the uniform matrix where all entries are \frac{1}{n}. The upper bound of 1 is achieved precisely when A is a permutation matrix. Applying the Hadamard inequality to the columns of a doubly stochastic matrix yields |\det(A)| \leq 1, since the \ell_2 norm of each column is at most 1 (being bounded by the \ell_1 norm of 1). Equality holds if and only if A is a permutation matrix, in which case the columns are orthonormal vectors. The set of all n \times n doubly stochastic matrices forms a convex set, meaning it is closed under convex combinations: if A_k are doubly stochastic for k=1,\dots,m and \lambda_k \geq 0 with \sum_k \lambda_k = 1, then \sum_k \lambda_k A_k is also doubly stochastic. This follows directly from the linearity of row and column sums. Doubly stochastic matrices are a special case of stochastic matrices, being both row-stochastic and column-stochastic.
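The spectral and determinant bounds are easy to observe numerically. The following illustrative numpy sketch (not from the source) builds a doubly stochastic matrix as a random convex combination of permutation matrices and checks that the spectrum lies in the unit disk, that the all-ones vector is fixed, and that |\det(A)| \leq 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Build a random doubly stochastic matrix as a convex combination of
# permutation matrices (np.eye(n)[p] permutes the identity's rows by p).
perms = [rng.permutation(n) for _ in range(8)]
weights = rng.dirichlet(np.ones(8))          # nonnegative, sums to 1
A = sum(w * np.eye(n)[p] for w, p in zip(weights, perms))

eigvals = np.linalg.eigvals(A)
print(np.max(np.abs(eigvals)) <= 1 + 1e-12)           # spectrum in the unit disk
print(np.isclose(A @ np.ones(n), np.ones(n)).all())   # all-ones vector is fixed
print(abs(np.linalg.det(A)) <= 1 + 1e-12)             # Hadamard-type bound |det| <= 1
```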

Examples and Constructions

Permutation Matrices as Extremes

A permutation matrix is an n \times n matrix obtained by permuting the rows of the identity matrix according to a permutation \sigma of the set \{1, \dots, n\}, resulting in entries P_{i,j} = 1 if j = \sigma(i) and 0 otherwise. This ensures exactly one 1 in each row and each column, with all other entries being 0. Every permutation matrix is doubly stochastic because its entries are non-negative, and both the row sums and column sums equal 1, as each row and column contains precisely one 1. For n=2, the two permutation matrices correspond to the identity permutation and the transposition, given by: \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. For n=3, there are 3! = 6 permutation matrices, representing all permutations of \{1,2,3\}. These are: \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}. Permutation matrices serve as the extreme points (vertices) of the set of all n \times n doubly stochastic matrices, meaning no permutation matrix can be expressed as a non-trivial convex combination of other doubly stochastic matrices. The full characterization of this convex set as the Birkhoff polytope follows from these vertices. Historically, permutation matrices connect to the study of magic squares, where a semi-magic square (with equal row and column sums) of weight d can be expressed as the sum of d permutation matrices. This has aided in enumerating and analyzing magic squares since their early classifications.
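The n=3 enumeration above can be reproduced programmatically. This illustrative sketch (not from the source) builds all 3! permutation matrices with itertools and confirms each is doubly stochastic.

```python
import numpy as np
from itertools import permutations

n = 3
perm_matrices = []
for sigma in permutations(range(n)):
    P = np.zeros((n, n), dtype=int)
    for i, j in enumerate(sigma):
        P[i, j] = 1          # P[i, sigma(i)] = 1, all other entries 0
    perm_matrices.append(P)

print(len(perm_matrices))  # 6, i.e. 3! vertices of the Birkhoff polytope
# Every permutation matrix is doubly stochastic: one 1 per row and per column.
print(all((P.sum(axis=0) == 1).all() and (P.sum(axis=1) == 1).all()
          for P in perm_matrices))  # True
```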

Convex Combinations and Averaging

A fundamental method for constructing doubly stochastic matrices beyond the extremal permutation matrices involves taking convex combinations of the latter. Consider k distinct permutation matrices P_1, P_2, \dots, P_k. Their uniform average A = \frac{1}{k} \sum_{i=1}^k P_i is doubly stochastic because each P_i has non-negative entries with row and column sums equal to 1, so A inherits these properties: its entries are non-negative, and the row and column sums are \frac{1}{k} \sum_{i=1}^k 1 = 1. The entry a_{ij} of A equals \frac{m_{ij}}{k}, where m_{ij} counts how many of the P_i have a 1 in position (i,j), with 0 \leq m_{ij} \leq k. This construction highlights the convex nature of the set of doubly stochastic matrices. A concrete example arises for n=2. The two permutation matrices are the identity I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} and the swap matrix S = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. Their average is \frac{1}{2}(I + S) = \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}, which has all row and column sums equal to 1 and uniform entries, demonstrating a non-extremal point in the set. In Birkhoff's original 1946 work, a notable construction averages over all perfect matchings in the complete bipartite graph K_{n,n}, equivalent to uniform averaging over all n! permutation matrices. This yields the matrix J/n, where J is the all-ones matrix, with every entry exactly 1/n; for n=3, it is \frac{1}{3} \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}. Each position (i,j) receives a 1 in exactly (n-1)! permutations, so the average entry is (n-1)!/n! = 1/n. Doubly stochastic matrices also arise naturally as bistochastic maps that preserve the uniform distribution: if u is the vector with all entries 1/n, then for any doubly stochastic matrix A, Au = u because the row sums are 1, so the output remains uniform. This preservation property underscores their role in probabilistic interpretations of averaging over permutations.
To generate doubly stochastic matrices with rational entries, a straightforward approach selects a finite set of permutations and averages their permutation matrices. For a set of size k, the resulting matrix has entries that are rational multiples of 1/k, providing an explicit, ad-hoc way to populate the interior of the Birkhoff polytope without invoking advanced decomposition techniques. This method is particularly useful for illustrating points with controlled denominators in educational or computational contexts.
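A minimal sketch of this construction (illustrative; variable names are made up), using exact `Fraction` arithmetic so that every entry is visibly a rational multiple of 1/k:

```python
import random
from fractions import Fraction
from itertools import permutations

n, k = 3, 4
random.seed(1)
chosen = random.sample(list(permutations(range(n))), k)  # k distinct permutations

# Average the k permutation matrices exactly: entry (i, j) becomes m_ij / k,
# where m_ij counts the chosen permutations sending i to j.
A = [[Fraction(0) for _ in range(n)] for _ in range(n)]
for sigma in chosen:
    for i, j in enumerate(sigma):
        A[i][j] += Fraction(1, k)

print([[str(x) for x in row] for row in A])
print(all(sum(row) == 1 for row in A))                              # rows sum to 1
print(all(sum(A[i][j] for i in range(n)) == 1 for j in range(n)))   # columns too
```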

Geometric Perspective

The Birkhoff Polytope

The Birkhoff polytope B_n, named after Garrett Birkhoff, is formally defined as the convex hull of all n \times n permutation matrices, that is, B_n = \conv \{ P_\sigma \mid \sigma \in S_n \}, where S_n denotes the symmetric group consisting of all permutations of \{1, \dots, n\}, and P_\sigma is the permutation matrix associated with \sigma, having entries (P_\sigma)_{i,j} = 1 if \sigma(i) = j and 0 otherwise. This polytope lies in \mathbb{R}^{n^2} but has dimension (n-1)^2, as the permutation matrices satisfy n row sum constraints and n column sum constraints that are linearly dependent by one relation (the total sum). The vertices of B_n are precisely the n! permutation matrices, making it a 0/1-polytope with a combinatorial structure tied to the symmetric group. A key feature of B_n is its equivalence to the set D_n of all n \times n doubly stochastic matrices, which are nonnegative matrices with all row and column sums equal to 1. This identification follows from the Birkhoff–[von Neumann](/page/Von_Neumann) theorem, establishing that every doubly [stochastic](/page/Stochastic) matrix is a [convex combination](/page/Convex_combination) of [permutation](/page/Permutation) matrices, and thus B_n = D_n. The elements of B_n can therefore be expressed as [convex combination](/page/Convex_combination)s \sum_{\sigma \in S_n} \lambda_\sigma P_\sigma where \lambda_\sigma \geq 0 and \sum_{\sigma \in S_n} \lambda_\sigma = 1.
The volume of B_n, measured as the (n-1)^2-dimensional volume in its affine span (with the standard normalization in which the fundamental parallelepiped has volume 1), admits an exact combinatorial formula: \vol(B_n) = \frac{1}{((n-1)^2)!} \sum_{\sigma \in S_n} \sum_{T \in \Arb(n,n)} \frac{\langle c, \sigma \rangle^{(n-1)^2}}{\prod_{e \notin E(T)} \langle c, W_{T,e} \sigma \rangle}, where \Arb(n,n) is the set of arborescences on n labeled vertices (with |\Arb(n,n)| = n^{n-2}), c \in \mathbb{R}^{n^2} is a vector chosen so that all denominators are nonzero, \langle \cdot, \cdot \rangle denotes the standard inner product, and W_{T,e} is the vector corresponding to the unique cycle formed by adding edge e to arborescence T. This expression, derived using generating functions over transportation polytopes, allows computation for small n; for example, \vol(B_3) = 9/8. The Birkhoff polytope exhibits rich symmetry under the action of the group S_n \times S_n, where the first factor permutes the rows and the second permutes the columns of any matrix in B_n; this action preserves the set of doubly stochastic matrices and thus maps B_n to itself. The full combinatorial automorphism group of B_n is the wreath product S_n \wr C_2 \cong (S_n \times S_n) \rtimes C_2 for n \geq 3, the extra factor C_2 acting by transposition, which exchanges the roles of rows and columns. Regarding vertices, explicit enumeration amounts to generating all elements of S_n, which is computationally feasible only for small n due to the factorial growth; exact volume computations are known up to n=10 (where B_{10} has 10! = 3{,}628{,}800 vertices) but become intractable beyond that without specialized algorithms.

Structure of the Polytope

The Birkhoff polytope B_n, defined as the convex hull of all n \times n permutation matrices, possesses a well-defined combinatorial structure arising from its connection to the symmetric group S_n. Its vertices correspond precisely to the n! permutation matrices, each representing a perfect matching in the complete bipartite graph K_{n,n}. These vertices lie in the affine subspace of \mathbb{R}^{n^2} where all row and column sums equal 1, ensuring the polytope's embedding within the set of doubly stochastic matrices. The edges of B_n connect pairs of vertices P_\sigma and P_\tau for which \sigma^{-1}\tau is a single cycle; the simplest case is a transposition, where the two matrices agree everywhere except in a 2 \times 2 submatrix, one having 1s on the diagonal and the other on the anti-diagonal. Such an edge consists of the convex combinations supported on that single cycle. For instance, in B_3 every non-identity permutation is a single cycle, so every pair of vertices is adjacent, yielding a graph with 6 vertices and 15 edges. More generally, the 1-skeleton captures adjacency of permutations differing by a single cycle, with the number of edges scaling combinatorially with n. The facets of B_n (for n \geq 3) are defined by the n^2 inequalities x_{ij} \geq 0 for 1 \leq i,j \leq n, each corresponding to a supporting hyperplane on which one entry is fixed at zero while the row and column sum constraints are maintained. These inequalities, combined with the affine equalities, provide a complete H-description of the polytope, confirming that no additional inequalities are needed to bound it tightly. Each such facet is the polytope of doubly stochastic matrices vanishing at one prescribed position, a polytope of dimension (n-1)^2 - 1. The dimension of B_n is (n-1)^2. To see this, observe that the n^2 variables are subject to 2n linear equalities from the row and column sums equaling 1, but these are affinely dependent (their sum is n on both sides), yielding 2n - 1 independent constraints and thus an affine span of dimension n^2 - 2n + 1 = (n-1)^2.
The polytope achieves full dimensionality in this affine span, as can be verified by checking that the differences between permutation matrices span the full (n-1)^2-dimensional space—for example, by restricting to the top-left (n-1) \times (n-1) submatrix, whose entries determine the remaining row and column once the sum constraints are imposed. For small n, the structure can be described explicitly. B_2 is a 1-dimensional segment with 2 vertices and 2 facets (each a point), so all its faces are simplices. B_3 is a 4-dimensional polytope with 6 vertices, 15 edges, and 9 facets, whose proper faces include simplices and products of simplices, aligning with classifications of its combinatorial face types. These explicit low-dimensional descriptions facilitate computational enumeration and highlight B_n's role in broader polytope theory.

Core Theorem

Birkhoff–von Neumann Theorem

The Birkhoff–von Neumann theorem states that every n \times n doubly stochastic matrix A can be decomposed as a convex combination of permutation matrices: A = \sum_{k=1}^m \lambda_k P_k, where m is finite, \lambda_k > 0 for each k, \sum_{k=1}^m \lambda_k = 1, and each P_k is an n \times n permutation matrix. This decomposition establishes that the Birkhoff polytope, the convex hull of all n \times n permutation matrices, coincides exactly with the set of all n \times n doubly stochastic matrices. The theorem was originally proved by Garrett Birkhoff in 1946 as part of his work on linear algebra, where it arose in the context of doubly stochastic transformations and their relation to permanents. Independently, John von Neumann provided a proof in 1953 while studying zero-sum games and the optimal assignment problem, offering a more direct algorithmic perspective that generalized Birkhoff's insights to optimization settings. Regarding the number of terms m in the decomposition, Carathéodory's theorem applied to the Birkhoff polytope's dimension n^2 - 2n + 1 implies that m \leq n^2 - 2n + 2 suffices for any doubly stochastic matrix. For fully indecomposable matrices with \tau nonzero entries, Brualdi strengthened this to m \leq \tau - 2n + 2.

Implications and Interpretations

The Birkhoff–von Neumann theorem asserts that every doubly stochastic matrix can be expressed as a convex combination of permutation matrices. Combinatorially, the theorem interprets doubly stochastic matrices as fractional perfect matchings in the complete bipartite graph K_{n,n}, where each permutation matrix corresponds to an integral perfect matching, and the weights represent the fractional contributions. This perspective directly connects to Hall's marriage theorem, which ensures the existence of such perfect matchings in the support graph of the matrix whenever the Hall condition holds for subsets of rows and columns, thereby linking linear algebraic decompositions to graph-theoretic matching problems. Algorithmically, the theorem facilitates the decomposition of doubly stochastic matrices into permutation matrices, which supports optimization over the Birkhoff polytope \mathcal{D}_n by reducing linear objectives to evaluations at vertices, and aids in approximating solutions to combinatorial tasks like scheduling, where sparse representations minimize the number of terms. While the decomposition itself does not directly compute the permanent, it enables bounding techniques for the permanent of doubly stochastic matrices, as the convex structure implies permanents lie between known minima and maxima achieved at extreme points. The Birkhoff–von Neumann decomposition is generally not unique, as multiple distinct sets of permutation matrices and weights can yield the same doubly stochastic matrix, though the extreme points of \mathcal{D}_n—precisely the permutation matrices—form a unique vertex set. In the context of total positivity, the theorem underpins decompositions of oscillation matrices, a class of totally nonnegative matrices; Karlin utilized such decompositions to analyze totally nonnegative kernels in stochastic processes, showing how doubly stochastic matrices inherit positivity through convex combinations of permutation matrices.
Regarding majorization, the theorem clarifies how doubly stochastic matrices interact with the majorization order: for any doubly stochastic matrix P and any vector x, the image satisfies Px \prec x, and by the Hardy–Littlewood–Pólya theorem, y \prec x holds if and only if y = Px for some doubly stochastic P. This characterization underscores the role of doubly stochastic matrices in inequality theory and the study of Schur-convex functions.

Proof Techniques

Key Lemmas and Preliminaries

The proof of the Birkhoff–von Neumann theorem relies on foundational results from bipartite matching theory and specific properties of the support graph associated with a doubly stochastic matrix A, defined as the bipartite graph G_A with row vertices U = \{1, \dots, n\}, column vertices V = \{1, \dots, n\}, and edges \{i,j\} where a_{ij} > 0. These lemmas ensure the existence of perfect matchings in G_A and enable an iterative reduction of A via subtraction of scaled permutation matrices. Hall's marriage theorem provides the condition for perfect matchings in bipartite graphs. It states that G_A admits a perfect matching if and only if, for every subset S \subseteq U, the neighborhood N(S) \subseteq V satisfies |N(S)| \geq |S|, with a symmetric condition for subsets of V. For a doubly stochastic matrix A, the row and column sums of 1 imply that no Hall violator exists: if |N(S)| < |S| for some S \subseteq U, the total mass from rows in S would exceed the capacity of columns in N(S), contradicting the column sums. Thus, G_A always has a perfect matching, corresponding to a permutation \sigma such that a_{i,\sigma(i)} > 0 for all i. This lemma is essential for selecting permutations in the decomposition process. König's theorem establishes the equality between the maximum matching size \nu(G_A) and the minimum vertex cover size \tau(G_A) in bipartite graphs. Combined with Hall's theorem showing \nu(G_A) = n, König's theorem confirms \tau(G_A) = n and aids in analyzing subgraphs during the decomposition, ensuring no bottlenecks in matching sizes below n. A key reduction lemma addresses positive entries in non-permutation doubly stochastic matrices. If A is doubly stochastic but not a permutation matrix, then there exist a scalar \alpha \in (0,1) and a permutation matrix P_\sigma such that A' = (A - \alpha P_\sigma)/(1 - \alpha) is nonnegative, doubly stochastic, and has strictly fewer positive entries than A. To obtain it, apply Hall's theorem to find a perfect matching in G_A, yielding a permutation \sigma with a_{i,\sigma(i)} > 0 for all i; set \alpha = \min_i a_{i,\sigma(i)} > 0.
Then A - \alpha P_\sigma has all row and column sums equal to 1 - \alpha (since P_\sigma has unit sums), remains nonnegative (since \alpha is the minimum along the matching), and has fewer positive entries because at least the minimizing entry becomes zero; dividing by 1 - \alpha restores the doubly stochastic matrix A'. This enables inductive decomposition by reducing the support size. The cycle lemma handles the special case of cyclic supports. If the support graph G_A consists of a single even cycle of length 2k (bipartite graphs have no odd cycles), then A decomposes as a convex combination \beta Q_1 + (1 - \beta) Q_2 of two permutation matrices Q_1, Q_2, where Q_1 and Q_2 correspond to the two alternating perfect matchings along the [cycle](/page/Cycle), and \beta is chosen to match the entry values while preserving sums (e.g., by solving the system along the cycle edges). This simple two-term decomposition serves as a base case for handling connected cyclic components in more general supports during the iterative process. The greedy selection lemma facilitates efficient choice of matchings by ensuring flexibility in including specific edges. In G_A, every edge \{i,j\} with a_{ij} > 0 lies in some perfect matching. To see this, pre-match i to j and apply Hall's theorem to the reduced graph on U \setminus \{i\}, V \setminus \{j\}; the original sums of 1 ensure no violator arises, as any potential bottleneck would contradict the full matching existence. This allows greedy selection of a perfect matching containing the smallest positive entry in A, minimizing \alpha in the reduction lemma and bounding the decomposition length by the number of positive entries.
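The Hall-condition argument can be checked by brute force for small n. The sketch below is illustrative (the helper name `hall_condition_holds` is made up); it enumerates all row subsets of the support graph and compares neighborhood sizes.

```python
from itertools import combinations

def hall_condition_holds(A, tol=1e-12):
    """Brute-force check of Hall's condition on the support graph of A:
    every set S of rows must see at least |S| columns via positive entries."""
    n = len(A)
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            neighborhood = {j for i in S for j in range(n) if A[i][j] > tol}
            if len(neighborhood) < size:
                return False
    return True

# A doubly stochastic matrix always satisfies Hall's condition...
A = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
print(hall_condition_holds(A))  # True
# ...while a (non-doubly-stochastic) matrix whose first two rows are
# confined to a single column exhibits a Hall violator.
B = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
print(hall_condition_holds(B))  # False
```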

Detailed Proof Construction

The proof of the Birkhoff–von Neumann theorem can be constructed inductively, starting with the base case n=1. For an n \times n doubly stochastic matrix A = (a_{ij}) with a_{ij} \geq 0, \sum_j a_{ij} = 1, and \sum_i a_{ij} = 1 for all i,j, the 1 \times 1 case is trivial since A = (1), which is the unique 1 \times 1 permutation matrix. Assume the theorem holds for all (n-1) \times (n-1) doubly stochastic matrices. For the n \times n case, if A is a permutation matrix, the decomposition is immediate. Otherwise, consider the bipartite graph G with bipartition (R, C) where R = \{r_1, \dots, r_n\} represents rows and C = \{c_1, \dots, c_n\} represents columns, with an edge between r_i and c_j if and only if a_{ij} > 0. This support graph G has no isolated vertices since each row and column sum is 1. To ensure a perfect matching exists in G, apply Hall's marriage theorem: for any subset S \subseteq R, the neighborhood N(S) \subseteq C satisfies |N(S)| \geq |S|, because the total entry sum from rows in S is |S| and cannot exceed the column sum capacity of N(S), which is at most |N(S)|; a symmetric argument holds for subsets of C. Thus, there exists a perfect matching corresponding to a permutation \sigma such that a_{i,\sigma(i)} > 0 for all i. Let \lambda = \min_{1 \leq i \leq n} a_{i,\sigma(i)} > 0, and let P^\sigma be the permutation matrix with 1s at positions (i, \sigma(i)). Define the residual matrix B = A - \lambda P^\sigma. Then B has nonnegative entries (since \lambda is the minimum along the support of P^\sigma), and both row and column sums of B are 1 - \lambda. Normalizing gives the doubly stochastic matrix A' = B / (1 - \lambda). This yields the decomposition step A = \lambda P^\sigma + (1 - \lambda) A'. The matrix A' has strictly fewer positive entries than A because the positions where a_{i,\sigma(i)} = \lambda become zero in B (and remain zero in A').
If A' has a zero row or column (which can occur if subtracting \lambda P^\sigma eliminates an entire row or column support), the problem reduces to an (n-1) \times (n-1) submatrix by removing that row and column, applying the induction hypothesis, and embedding the result back into n \times n form. Otherwise, repeat the process on A'. The process terminates because each iteration reduces the number of positive entries, and a doubly stochastic matrix has at most n^2 positive entries initially. More tightly, applying Carathéodory's theorem to the Birkhoff polytope's affine dimension n^2 - 2n + 1 recovers the Marcus–Ree bound: any doubly stochastic matrix admits a decomposition into at most n^2 - 2n + 2 permutation matrices, ensuring at most that many iterations. The full decomposition is then the convex combination accumulated over these steps: A = \sum_{k=1}^m \lambda_k P_k where \sum \lambda_k = 1 and each P_k is a permutation matrix. In numerical implementations of this algorithmic proof, floating-point arithmetic introduces stability challenges, as small rounding errors can create tiny negative entries (violating nonnegativity) or spurious small positives (altering the support and potentially leading to non-terminating or incorrect matchings). To mitigate this, a tolerance \epsilon > 0 (e.g., 10^{-10}) is typically used: treat entries with |a_{ij}| < \epsilon as zero when determining the support, clip negatives to zero, and renormalize residuals after each subtraction. This ensures the computed decomposition approximates the exact one within \mathcal{O}(\epsilon n^2) Frobenius norm error, though excessive iterations may amplify accumulated errors, necessitating refinement via methods like the Hungarian algorithm on the final residual.
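A toy implementation of this greedy procedure with the tolerance-based support handling described above (illustrative only: it finds each perfect matching by brute force over permutations, which is viable only for small n; a practical version would use a bipartite matching algorithm):

```python
import numpy as np
from itertools import permutations

def birkhoff_decomposition(A, eps=1e-10):
    """Greedy Birkhoff-von Neumann decomposition for a small doubly
    stochastic matrix A. Returns a list of (weight, permutation matrix)."""
    A = np.array(A, dtype=float)   # work on a copy
    n = A.shape[0]
    terms = []
    while A.max() > eps:
        # Find a permutation sigma with A[i, sigma(i)] > eps for all i
        # (guaranteed to exist by Hall's theorem for the scaled residual).
        sigma = next(p for p in permutations(range(n))
                     if all(A[i, p[i]] > eps for i in range(n)))
        lam = min(A[i, sigma[i]] for i in range(n))
        P = np.zeros((n, n))
        for i in range(n):
            P[i, sigma[i]] = 1.0
        terms.append((lam, P))
        A = np.clip(A - lam * P, 0.0, None)  # clip tiny negatives from rounding
    return terms

A = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.25, 0.50],
              [0.25, 0.25, 0.50]])
terms = birkhoff_decomposition(A)
print(np.isclose(sum(lam for lam, _ in terms), 1.0))     # weights sum to 1
print(np.allclose(sum(lam * P for lam, P in terms), A))  # reconstructs A
```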

Applications

Combinatorial Optimization

Doubly stochastic matrices play a central role in the linear programming relaxation of the assignment problem, where the objective is to minimize the cost \langle C, X \rangle over all X in the Birkhoff polytope \mathcal{B}_n, the convex hull of n \times n permutation matrices. This relaxation is equivalent to finding a minimum-cost perfect matching in a complete bipartite graph with cost matrix C, as the optimum is attained at a vertex of \mathcal{B}_n, which corresponds to a permutation matrix. The Hungarian algorithm solves this problem efficiently by iteratively adjusting dual potentials to satisfy complementary slackness, effectively navigating the structure of the polytope. The Birkhoff-von Neumann decomposition further aids this process in algorithmic variants, where successive applications of the Hungarian algorithm extract permutation matrices from the residual doubly stochastic matrix, enabling sparse approximations or exact solutions in related optimization settings. A natural generalization of the Birkhoff polytope is the transportation polytope, consisting of all m \times n nonnegative matrices with prescribed row sums r \in \mathbb{R}^m_+ and column sums s \in \mathbb{R}^n_+ such that \sum r_i = \sum s_j. When m = n and r = s = \mathbf{1}_n (the all-ones vector), this reduces precisely to the set of n \times n doubly stochastic matrices, forming \mathcal{B}_n. Extreme points of general transportation polytopes are characterized by network flow properties, generalizing the permutation matrix vertices of \mathcal{B}_n, and enable formulations for unbalanced assignment or supply-demand optimization problems. Recent applications leverage doubly stochastic relaxations in integer programming for scheduling tasks.
In high-speed circuit switch scheduling, traffic demands are modeled as doubly stochastic matrices, and the Birkhoff-von Neumann decomposition yields sparse sequences of permutation matrices approximating the demand, minimizing reconfiguration overhead while achieving near-optimal throughput (e.g., 7% improvement over prior methods for low-error regimes). Similarly, doubly stochastic models have been explored as convex relaxations for quadratic assignment problems, where integer solutions are recovered via decomposition, although computational costs pose challenges for large-scale instances. For example, minimizing \langle C, X \rangle over X \in \mathcal{B}_n directly solves the bipartite minimum-cost matching problem, as the optimal X is a permutation matrix encoding the assignment.
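The vertex-optimality claim can be illustrated numerically: because the objective \langle C, X \rangle is linear, no convex combination of permutation matrices can beat the best single permutation. A small sketch (illustrative; it uses brute force over permutations rather than the Hungarian algorithm):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
n = 4
C = rng.random((n, n))  # random cost matrix

# Exact assignment optimum: minimize <C, P> over the n! permutation
# matrices, i.e. over the vertices of the Birkhoff polytope.
best = min(sum(C[i, p[i]] for i in range(n)) for p in permutations(range(n)))

# Sample random doubly stochastic matrices (convex combinations of
# permutation matrices) and confirm none achieves a lower cost.
ok = True
for _ in range(100):
    perms = [rng.permutation(n) for _ in range(5)]
    w = rng.dirichlet(np.ones(5))
    X = sum(wi * np.eye(n)[p] for wi, p in zip(w, perms))
    ok = ok and (C * X).sum() >= best - 1e-12
print(ok)  # True: the LP relaxation attains its optimum at a vertex
```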

Probability and Markov Chains

In probability theory, doubly stochastic matrices serve as transition probability matrices for Markov chains in which both rows and columns sum to one, ensuring that the uniform distribution over the state space remains invariant under the chain's dynamics. Specifically, if P is a doubly stochastic transition matrix for a finite-state Markov chain, then the uniform probability vector \pi = (1/n, \dots, 1/n) satisfies \pi P = \pi, making it a stationary distribution. This property holds because the column sums of P being one implies that the total probability mass is preserved equally across states. For irreducible and aperiodic chains with such matrices, the uniform distribution is the unique stationary measure, facilitating analysis in symmetric systems like random walks on regular graphs. Doubly stochastic matrices also play a central role in coupling probability measures within optimal transport theory, particularly when coupling distributions with identical marginals. In the discrete setting with uniform marginals on finite sets of equal size n, a coupling \gamma between two probability measures \mu and \nu (both uniform on \{1, \dots, n\}) corresponds, after scaling by n, to a doubly stochastic matrix, where \gamma_{ij} represents the joint probability mass transported from the i-th point of the first set to the j-th point of the second. The set of all such couplings is a scaled copy of the Birkhoff polytope, and optimal transport problems over this set minimize the expected cost \sum_{i,j} \gamma_{ij} c_{ij} subject to the marginal constraints. This framework is essential for comparing distributions while preserving marginal uniformity, with applications in statistics and machine learning.
Regarding entropy and mixing properties, doubly stochastic matrices achieve maximum entropy among stochastic matrices with a fixed support pattern: the iterative proportional fitting procedure (also known as the Sinkhorn–Knopp algorithm) yields the unique doubly stochastic matrix that minimizes the Kullback–Leibler divergence to the uniform matrix over the support, thereby maximizing the entropy H(P) = -\sum_{i,j} p_{ij} \log p_{ij}. This maximum-entropy configuration promotes rapid mixing in Markov chains, as the uniform transitions on the support minimize information loss per step while maintaining the doubly stochastic structure. In random walks, this leads to efficient convergence to the uniform stationary distribution, quantified by the second-largest eigenvalue modulus being bounded away from 1, as in expander graphs. Applications extend to random walks on finite groups, where the Cayley graph's adjacency matrix, when normalized by the degree, yields a doubly stochastic transition matrix if the generating set is symmetric, enabling derandomization techniques via expander constructions. For instance, in the 2020s, deterministic approximations of random walk probabilities on expanders have advanced the derandomization of algorithms, allowing logarithmic-space computation of mixing behaviors without full randomness, as shown in high-precision estimation methods for undirected graphs. These results leverage the spectral gap of doubly stochastic expanders to simulate pseudorandom walks efficiently in space-bounded computation. In machine learning, doubly stochastic attention mechanisms in transformer models, such as those based on expected sliced transport plans, enforce balanced attention probabilities to enhance training stability and avoid token over-concentration (as of 2025).
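A minimal sketch of the iterative proportional fitting (Sinkhorn–Knopp) procedure mentioned above, alternately normalizing rows and columns of a strictly positive matrix (illustrative; the helper name `sinkhorn` is made up):

```python
import numpy as np

def sinkhorn(A, iters=500):
    """Alternately normalize rows and columns (Sinkhorn-Knopp iteration).
    For a matrix with strictly positive entries this converges to a
    doubly stochastic matrix with the same support pattern."""
    A = np.array(A, dtype=float)
    for _ in range(iters):
        A /= A.sum(axis=1, keepdims=True)  # make each row sum to 1
        A /= A.sum(axis=0, keepdims=True)  # make each column sum to 1
    return A

rng = np.random.default_rng(3)
A = rng.random((4, 4)) + 0.1   # strictly positive input
S = sinkhorn(A)
print(np.allclose(S.sum(axis=1), 1.0))  # True: rows sum to 1
print(np.allclose(S.sum(axis=0), 1.0))  # True: columns sum to 1
```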
A representative example is the uniform mixing matrix for the complete graph K_n, where the transition matrix P for the simple random walk has p_{ij} = 1/(n-1) for i \neq j and p_{ii} = 0; this is doubly stochastic since both row and column sums equal 1, and it exhibits rapid mixing to the uniform distribution from any starting state, illustrating ideal entropy maximization and rapid convergence.
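This example is easy to verify numerically. The sketch below (illustrative, not from the source) builds the K_n transition matrix and checks both double stochasticity and the rapid convergence of its powers to the uniform matrix J/n.

```python
import numpy as np

n = 6
# Transition matrix of the simple random walk on the complete graph K_n:
# off-diagonal entries 1/(n-1), zero diagonal.
P = (np.ones((n, n)) - np.eye(n)) / (n - 1)

print(np.allclose(P.sum(axis=0), 1.0) and np.allclose(P.sum(axis=1), 1.0))  # doubly stochastic

# Powers converge rapidly to J/n: the nontrivial eigenvalue is -1/(n-1),
# so the deviation decays geometrically in the number of steps.
Pk = np.linalg.matrix_power(P, 20)
print(np.allclose(Pk, np.full((n, n), 1 / n), atol=1e-10))  # True
```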

Generalizations and Extensions

Higher-Order or Multi-Marginal Cases

Multi-marginal bistochastic tensors, also known as multistochastic tensors, generalize doubly stochastic matrices to higher-order settings with more than two indices. These are non-negative tensors of order k \geq 3 and size n \times \cdots \times n (with k dimensions) such that the sum of the entries in each (k-1)-dimensional slice (obtained by fixing a single index to a single value) equals 1, ensuring uniform marginals on each index. The set of such tensors forms a polytope whose structure extends the Birkhoff polytope of doubly stochastic matrices. For k=3 (triply stochastic or plane-stochastic tensors), the polytope P_n has dimension n^3 - 3n + 2 and n^3 facets, reflecting increased complexity compared to the matrix case, where the dimension is (n-1)^2. A key combinatorial question involves the convex hull of permutation tensors, which are the higher-order analogs of permutation matrices, with exactly one 1 in each "line" across all indices and 0s elsewhere. Unlike the k=2 case, where the Birkhoff–von Neumann theorem states that every doubly stochastic matrix is a convex combination of permutation matrices, the extreme points of the multistochastic polytope for k > 2 include additional non-permutation tensors. A necessary and sufficient condition for a multistochastic tensor to decompose into a finite convex combination of permutation tensors is provided by a generalized Birkhoff–von Neumann theorem, which identifies extremal elements beyond simple permutations. This extension highlights that the polytope's vertices grow super-exponentially with n and k, with facet complexity scaling as O(n^k) in general, complicating algorithmic decomposition. The Ryser extension generalizes the Birkhoff theorem to settings with multi-set marginals using inclusion-exclusion principles, originally developed for term ranks and permanents of non-negative matrices with prescribed row and column sums.
In higher-order contexts, the inclusion-exclusion approach of Jurkat and Ryser enables decomposition of polystochastic arrays (arrays with multiple uniform sums per index) into combinations of generalized hypercubes, by counting transversals and applying zero-permanent conditions derived from inclusion-exclusion. Recent developments (2020–2025) have applied multi-marginal bistochastic structures to optimal transport problems in machine learning, particularly through entropic regularization to mitigate computational challenges. Entropic multi-marginal optimal transport penalizes transport plans with Boltzmann–Shannon entropy terms, transforming the problem into a tensor-scaling problem solvable by multi-marginal Sinkhorn iterations, which converge at rates depending on the regularization parameter. This has enabled applications in multimodal representation learning, where multi-marginal matching losses align multiple data views more effectively than pairwise methods, improving self-supervised learning on multimodal benchmark datasets. A concrete example arises in the 3-marginal case with n=2, corresponding to the polytope P_2 of dimension 8 - 6 + 2 = 4 with 6 vertices, which relates to quantum information theory for three qubits. Here, doubly normalized tensors represent joint measurements with uniform marginals on each qubit's outcomes, and their decomposition into permutation tensors corresponds to jointly measurable observables, with non-decomposable cases illustrating incompatibility in quantum tomography.
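The multi-marginal Sinkhorn iteration can be sketched in a few lines. This is a toy version under our own assumptions (a random cost tensor, uniform target marginals 1/n, fixed regularization eps and iteration count), not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 4, 0.5

# Entropic kernel exp(-C/eps) for a random 3-marginal cost tensor C.
C = rng.random((n, n, n))
K = np.exp(-C / eps)

# Target: the uniform marginal 1/n on each of the three indices.
r = np.full(n, 1.0 / n)
u = [np.ones(n) for _ in range(3)]

def plan(u):
    # Scaled plan P[i,j,k] = K[i,j,k] * u0[i] * u1[j] * u2[k].
    return K * u[0][:, None, None] * u[1][None, :, None] * u[2][None, None, :]

# Multi-marginal Sinkhorn: cyclically rescale each index so that its
# one-dimensional marginal matches r.
for _ in range(200):
    for ax in range(3):
        other = tuple(a for a in range(3) if a != ax)
        u[ax] *= r / plan(u).sum(axis=other)

P = plan(u)
for ax in range(3):
    other = tuple(a for a in range(3) if a != ax)
    assert np.allclose(P.sum(axis=other), r, atol=1e-8)
```

Since each one-dimensional marginal of the converged plan is 1/n, the rescaled tensor n*P has all slice sums equal to 1, i.e., it is triply stochastic.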

Infinite-Dimensional Settings

In infinite-dimensional settings, doubly stochastic operators generalize the finite-dimensional concept to function spaces such as L^1([0,1]) or \ell^p spaces, where they preserve the relevant measure, such as Lebesgue measure or counting measure. A prototypical example is an integral operator Tf(x) = \int_0^1 K(x,y) f(y)\, dy defined by a doubly stochastic kernel K, a non-negative measurable function satisfying \int_0^1 K(x,y)\, dy = 1 for almost every x \in [0,1] and \int_0^1 K(x,y)\, dx = 1 for almost every y \in [0,1]. Such operators map the constant function 1 to itself and preserve the integral of functions, so they act as Markov operators on the space of probability densities. Von Neumann proposed a continuous analog of the Birkhoff–von Neumann theorem for these settings, suggesting that every doubly stochastic measure—a Borel probability measure \pi on [0,1] \times [0,1] whose marginals are both the Lebesgue measure—can be decomposed as a convex integral of "permutation measures," measures supported on the graphs of measure-preserving transformations of [0,1]. However, this conjecture fails in general: counterexamples exist in which extremal doubly stochastic measures cannot be expressed as such integrals, their supports being more complex than single graphs. These counterexamples show that the set of extreme points includes nontrivial measures constructed from measure-preserving transformations that are not invertible or bipartite. In ergodic theory, doubly stochastic operators play a key role when induced by measure-preserving transformations, linking to the Birkhoff ergodic theorem, which guarantees convergence of ergodic averages for powers of such operators under suitable mixing conditions. For instance, if the operator arises from a transformation T preserving Lebesgue measure, the theorem implies that time averages converge to spatial averages, facilitating the study of mixing properties in infinite-dimensional dynamical systems.
This connection underscores the operators' utility in analyzing long-term behavior without finite support assumptions. Modern extensions in quantum information theory consider non-commutative doubly stochastic maps on von Neumann algebras, defined as completely positive, trace-preserving maps that also preserve the identity. These generalize classical doubly stochastic matrices and have been characterized in terms of joint majorization for finite algebras, with applications to entanglement measures and stochastic processes in non-commutative L_p-spaces. Recent work establishes that such maps on semifinite algebras admit decompositions analogous to the classical case but adapted to operator-valued settings, preserving quantum entropies under certain conditions. Challenges in these infinite-dimensional frameworks include the absence of a finite bound on the number of terms in a decomposition, since supports can be uncountably infinite, and a heavy reliance on measure disintegration theorems to express kernels in terms of conditional probabilities. Unlike the finite case, extremal operators may not correspond directly to permutations, complicating algorithmic constructions and requiring advanced tools from descriptive set theory.
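The marginal conditions on a doubly stochastic kernel are easy to verify numerically. The kernel below, K(x,y) = 1 + (1/2)cos(2\pi x)cos(2\pi y), is our own illustrative choice: it is non-negative, and each cosine factor integrates to 0 over [0,1], so both marginals are identically 1. A midpoint-rule sketch in NumPy:

```python
import numpy as np

# Illustrative doubly stochastic kernel on [0,1]^2 (our own choice):
# K(x,y) = 1 + 0.5*cos(2*pi*x)*cos(2*pi*y) >= 0.5 > 0, and each cosine
# integrates to 0 over [0,1], so both marginals are identically 1.
def K(x, y):
    return 1.0 + 0.5 * np.cos(2 * np.pi * x) * np.cos(2 * np.pi * y)

# Midpoint-rule check of the two marginal conditions.
m = 2000
t = (np.arange(m) + 0.5) / m
X, Y = np.meshgrid(t, t, indexing="ij")
Kxy = K(X, Y)

assert Kxy.min() >= 0.0
assert np.allclose(Kxy.mean(axis=1), 1.0, atol=1e-10)  # int K(x,y) dy = 1
assert np.allclose(Kxy.mean(axis=0), 1.0, atol=1e-10)  # int K(x,y) dx = 1
```

The associated integral operator fixes constant functions and preserves the integral of f, so it acts as a Markov operator on probability densities in the sense described above.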
