Matrix chain multiplication

Matrix chain multiplication is a classic optimization problem that involves determining the most efficient way to compute the product of a sequence of two or more matrices by finding the order of multiplication that minimizes the total number of scalar multiplications required.
Matrix multiplication is associative, allowing different parenthesizations of the product, such as ((A₁A₂)A₃) versus (A₁(A₂A₃)), but the computational cost varies significantly with the dimensions of the matrices involved, since multiplying two matrices of sizes p×q and q×r requires pqr scalar multiplications.
For example, multiplying four matrices with dimensions 30×1, 1×40, 40×10, and 10×25 can cost as few as 1,400 or as many as 41,200 multiplications depending on the order chosen.
The problem is efficiently solved using dynamic programming, which leverages the optimal substructure property: the optimal cost for multiplying a subchain of matrices can be reused to compute the cost for larger chains.
Let M[i,j] denote the minimum number of scalar multiplications needed to compute the product of matrices Aᵢ through Aⱼ; the recurrence is M[i,j] = min_{i ≤ k < j} (M[i,k] + M[k+1,j] + d_{i-1} · d_k · d_j), where d is the array of matrix dimensions.
A bottom-up dynamic programming algorithm fills an n×n table in O(n³) time by iterating over chain lengths and possible split points, while also tracking the optimal parenthesization for reconstruction.
As a foundational example of dynamic programming, matrix chain multiplication illustrates key concepts like overlapping subproblems and memoization, and it has practical applications in areas such as compiler optimization for expression evaluation, database query processing, and scientific computing tasks involving chained linear algebra operations like random walks and matrix factorization.

Fundamentals

Matrix Multiplication Basics

Matrix multiplication is an operation that combines two matrices to produce a third, provided the number of columns in the first matrix matches the number of rows in the second. For matrices A of dimensions p \times q and B of dimensions q \times r, the product C = AB is a p \times r matrix. Each element c_{ij} of C is the dot product of the i-th row of A and the j-th column of B:

c_{ij} = \sum_{k=1}^{q} a_{ik} b_{kj}.

The computational cost of this naive algorithm is O(p q r) scalar multiplications and a comparable number of additions, since each of the p r elements requires q multiplications and q-1 additions. For square matrices with p = q = r = n, this simplifies to O(n^3).

A key property of matrix multiplication is its associativity: for compatible matrices A, B, and C, (AB)C = A(BC). Because the intermediate dimensions always align, the result is independent of how the product is parenthesized, even though the cost of computing it is not.

To illustrate, consider the 2×2 matrices

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}.

The product C = AB has elements

c_{11} = 1 \cdot 5 + 2 \cdot 7 = 19, \quad c_{12} = 1 \cdot 6 + 2 \cdot 8 = 22, \quad c_{21} = 3 \cdot 5 + 4 \cdot 7 = 43, \quad c_{22} = 3 \cdot 6 + 4 \cdot 8 = 50.

Thus, C = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}. Computing this product requires 2^3 = 8 multiplications and 4 additions, consistent with the O(n^3) cost for n = 2.
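A minimal sketch of this definition in Python (the function name and the plain nested-loop structure are illustrative choices, not a reference implementation) makes the pqr cost explicit: the innermost loop runs q times for each of the p·r output entries.

```python
def matmul(A, B):
    """Multiply a p x q matrix A by a q x r matrix B using the definition
    c[i][j] = sum_k a[i][k] * b[k][j]; cost is p*q*r scalar multiplications."""
    p, q, r = len(A), len(B), len(B[0])
    assert all(len(row) == q for row in A), "inner dimensions must match"
    C = [[0] * r for _ in range(p)]
    for i in range(p):
        for j in range(r):
            for k in range(q):
                C[i][j] += A[i][k] * B[k][j]
    return C

# The 2x2 example from the text: the product is [[19, 22], [43, 50]].
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```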

Chain Multiplication Problem

The chain multiplication problem arises when computing the product of a sequence of two or more matrices, where the order in which the pairwise multiplications are performed can dramatically change the total computational cost because the matrices in the chain generally have different dimensions. Given a chain of n matrices A_1 A_2 \cdots A_n, where A_i has dimensions d_{i-1} \times d_i for i = 1 to n, the objective is to find a parenthesization of the product that minimizes the number of scalar multiplications required. This optimization is possible because matrix multiplication is associative, so the operations can be grouped in many ways, each yielding potentially different intermediate matrix sizes and thus different costs.

A naive strategy is to perform the multiplications strictly from left to right, as in ((A_1 A_2) A_3) \cdots A_n. Under this approach the first multiplication yields a d_0 \times d_1 result, and each subsequent step multiplies the growing d_0 \times d_k intermediate by A_{k+1} (of size d_k \times d_{k+1}), incurring d_0 d_k d_{k+1} scalar multiplications for k = 1 to n-1, for a total cost of \sum_{k=1}^{n-1} d_0 d_k d_{k+1}. This may be far from optimal, since alternative groupings can keep the intermediate dimensions smaller. The total number of possible parenthesizations is exponential: it equals the (n-1)th Catalan number C_{n-1} = \frac{1}{n} \binom{2(n-1)}{n-1}, reflecting the combinatorial explosion created by associativity.

Consider three matrices A (10 \times 30), B (30 \times 5), and C (5 \times 60). The left-associative grouping (AB)C first computes AB (a 10 \times 5 matrix) at a cost of 10 \cdot 30 \cdot 5 = 1500 multiplications, then multiplies the result by C at a cost of 10 \cdot 5 \cdot 60 = 3000, for a total of 4500. In contrast, A(BC) first computes BC (a 30 \times 60 matrix) at 30 \cdot 5 \cdot 60 = 9000, then multiplies by A at 10 \cdot 30 \cdot 60 = 18000, totaling 27000, a factor of six more than the better ordering.

This problem is motivated by practical scenarios in linear algebra and numerical computation, such as successive matrix transformations in computer graphics, robotics, and scientific simulations, where matrix dimensions often vary irregularly across the chain, making an efficient ordering essential for performance. The standard approach to solving it efficiently is dynamic programming.
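The cost comparison above can be checked with a short brute-force sketch (illustrative code, not from the article's sources): it enumerates every parenthesization of a chain with a given dimension array and reports the cheapest and most expensive total costs.

```python
from functools import lru_cache

def chain_costs(d):
    """Enumerate all parenthesizations of matrices A_1..A_n, where A_i is
    d[i-1] x d[i], and return the minimum and maximum scalar-multiplication cost."""
    n = len(d) - 1

    @lru_cache(maxsize=None)
    def costs(i, j):
        # All possible costs of computing A_i ... A_j (1-indexed, inclusive).
        if i == j:
            return (0,)
        out = []
        for k in range(i, j):
            for left in costs(i, k):
                for right in costs(k + 1, j):
                    out.append(left + right + d[i - 1] * d[k] * d[j])
        return tuple(out)

    all_costs = costs(1, n)
    return min(all_costs), max(all_costs)

# A (10x30), B (30x5), C (5x60): best grouping costs 4500, worst costs 27000.
print(chain_costs([10, 30, 5, 60]))
# The four matrices 30x1, 1x40, 40x10, 10x25 from the introduction: (1400, 41200).
print(chain_costs([30, 1, 40, 10, 25]))
```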

Dynamic Programming Solution

Problem Formulation

The matrix chain multiplication problem is formally defined for a sequence of n matrices A_1, A_2, \dots, A_n, where each matrix A_i has dimensions d_{i-1} \times d_i for i = 1, 2, \dots, n, with the dimension sequence given by d_0, d_1, \dots, d_n. The goal is to determine the optimal parenthesization of the product A_1 A_2 \dots A_n that minimizes the total number of scalar multiplications required. This optimization arises because matrix multiplication is associative, permitting multiple ways to group the operations, while the computational cost varies with the grouping because of the dimensional constraints. This dynamic programming approach was first described by Godbole in 1973.

Let m(i,j) denote the minimum number of scalar multiplications needed to compute the matrix product of the subchain from A_i to A_j, for 1 \leq i \leq j \leq n. The base case is m(i,i) = 0 for all i, since no multiplication is required for a single matrix. The recurrence relation for the optimal cost is

m(i,j) = \min_{k=i}^{j-1} \left[ m(i,k) + m(k+1,j) + d_{i-1} \cdot d_k \cdot d_j \right]

for j > i, where the term d_{i-1} \cdot d_k \cdot d_j is the cost of multiplying the matrices produced by the subchains A_i \dots A_k and A_{k+1} \dots A_j. This formulation captures the optimal substructure of the problem: the minimum cost for a subchain depends only on the minimum costs of its partitions.

To reconstruct the optimal parenthesization corresponding to this minimum cost, an auxiliary array s(i,j) records the optimal split point, defined as s(i,j) = k^*, where k^* is the value of k that achieves the minimum in the recurrence for m(i,j). The full parenthesization can then be derived recursively from the s values, starting from s(1,n). This tracking ensures that the solution not only computes the minimum cost but also specifies the exact grouping of matrix multiplications.
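To make the recurrence concrete, the following worked instance (a small illustration added here, using the three-matrix chain from the previous section with dimension sequence 10, 30, 5, 60) evaluates m(1,3):

```latex
% Worked instance of the recurrence for A_1 (10x30), A_2 (30x5), A_3 (5x60),
% with dimension sequence d_0..d_3 = 10, 30, 5, 60 (requires amsmath).
\begin{aligned}
m(1,2) &= d_0 d_1 d_2 = 10 \cdot 30 \cdot 5 = 1500,\\
m(2,3) &= d_1 d_2 d_3 = 30 \cdot 5 \cdot 60 = 9000,\\
m(1,3) &= \min\begin{cases}
m(1,1) + m(2,3) + d_0 d_1 d_3 = 0 + 9000 + 18000 = 27000 & (k=1)\\
m(1,2) + m(3,3) + d_0 d_2 d_3 = 1500 + 0 + 3000 = 4500 & (k=2)
\end{cases} = 4500.
\end{aligned}
```

So s(1,3) = 2 and the optimal grouping is (A_1 A_2) A_3, matching the 4500-multiplication ordering found in the earlier example.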

Recursive Solution

The recursive solution employs a top-down approach to compute the minimum number of scalar multiplications required to multiply a chain of matrices from index i to j, denoted m[i, j]. A recursive function tries every possible split point k between i and j-1, recursively solves the subchains A_i \cdots A_k and A_{k+1} \cdots A_j, and adds the cost of multiplying the two resulting matrices, which is p_{i-1} \times p_k \times p_j, where p is the array of matrix dimensions. The function returns the minimum over all such k:

$$m[i, j] = \begin{cases} 0 & \text{if } i = j \\ \min_{i \leq k < j} \left( m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \right) & \text{if } i < j \end{cases}$$

This naive recursive implementation has exponential running time, $\Omega(2^n)$, because it recomputes the same subproblems many times due to the overlapping substructure of the recursion tree. To optimize, memoization is introduced using a 2D table $m[1..n, 1..n]$ initialized to undefined values; each computed $m[i, j]$ is stored and retrieved on subsequent calls, avoiding redundant recursion. This reduces the time complexity to $O(n^3)$ while using $O(n^2)$ space, since there are $O(n^2)$ unique subproblems, each requiring $O(n)$ work. The following pseudocode illustrates the memoized recursive function:

```
MATRIXCHAIN(i, j)
    if m[i, j] ≠ undefined
        return m[i, j]
    if i == j
        m[i, j] ← 0
    else
        m[i, j] ← ∞
        for k ← i to j-1
            q ← MATRIXCHAIN(i, k) + MATRIXCHAIN(k+1, j) + p_{i-1} * p_k * p_j
            if q < m[i, j]
                m[i, j] ← q
    return m[i, j]
```

To invoke, call `MATRIXCHAIN(1, n)` for $n$ matrices.

Consider an example with four matrices $A(10 \times 20)$, $B(20 \times 30)$, $C(30 \times 40)$, $D(40 \times 30)$, so the dimension array is $p = [10, 20, 30, 40, 30]$. The initial call `MATRIXCHAIN(1,4)` tries splits at $k=1,2,3$:

- For $k=1$: `MATRIXCHAIN(1,1)` $= 0$ plus `MATRIXCHAIN(2,4)` plus $10 \times 20 \times 30 = 6000$. Computing `MATRIXCHAIN(2,4)`: for $k=2$, the cost is $0 +$ `MATRIXCHAIN(3,4)` $+ 20 \times 30 \times 30 = 18000$, where `MATRIXCHAIN(3,4)` $= 30 \times 40 \times 30 = 36000$, giving $36000 + 18000 = 54000$; for $k=3$, the cost is `MATRIXCHAIN(2,3)` $+ 0 + 20 \times 40 \times 30 = 24000$, where `MATRIXCHAIN(2,3)` $= 20 \times 30 \times 40 = 24000$, giving $24000 + 24000 = 48000$. So $m[2,4] = 48000$ and the total for $k=1$ is $0 + 48000 + 6000 = 54000$.
- For $k=2$: `MATRIXCHAIN(1,2)` $+$ `MATRIXCHAIN(3,4)` $+ 10 \times 30 \times 30 = 9000$, with $m[1,2] = 10 \times 20 \times 30 = 6000$ and $m[3,4] = 36000$ as above, for a total of $6000 + 36000 + 9000 = 51000$.
- For $k=3$: `MATRIXCHAIN(1,3)` $+ 0 + 10 \times 40 \times 30 = 12000$. Computing `MATRIXCHAIN(1,3)`: for $k=1$, $0 + m[2,3] + 10 \times 20 \times 40 = 0 + 24000 + 8000 = 32000$; for $k=2$, $m[1,2] + 0 + 10 \times 30 \times 40 = 6000 + 0 + 12000 = 18000$; so $m[1,3] = 18000$ and the total for $k=3$ is $18000 + 0 + 12000 = 30000$.

With [memoization](/page/Memoization), subproblems such as $(2,3)$ and $(3,4)$ are computed once and reused (for example, $(3,4)$ appears on several paths), yielding the overall minimum of 30000 for the full chain. The [recursion](/page/Recursion) tree prunes redundant branches via table lookups, filling the table progressively.
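The memoized recursion above translates directly into Python; the sketch below is an illustrative implementation (with functools.lru_cache standing in for the explicit table) and reproduces the 30000 result for the four-matrix example.

```python
from functools import lru_cache

def matrix_chain_cost(p):
    """Minimum scalar multiplications for A_1..A_n, where A_i is p[i-1] x p[i]."""
    n = len(p) - 1

    @lru_cache(maxsize=None)
    def m(i, j):
        if i == j:
            return 0
        # Try every split point k, mirroring MATRIXCHAIN(i, j) in the pseudocode.
        return min(m(i, k) + m(k + 1, j) + p[i - 1] * p[k] * p[j]
                   for k in range(i, j))

    return m(1, n)

# A(10x20), B(20x30), C(30x40), D(40x30): optimal cost is 30000.
print(matrix_chain_cost([10, 20, 30, 40, 30]))
```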
Bottom-Up Algorithm

The bottom-up dynamic programming algorithm for the matrix chain multiplication problem computes the minimum multiplication cost and the corresponding optimal parenthesization by iteratively filling two two-dimensional tables: one for the costs and one for the split points. Let the input be a sequence of $n$ matrices $A_1, A_2, \dots, A_n$ with dimensions given by the [array](/page/ARRAY) $p = [p_0, p_1, \dots, p_n]$, where $A_i$ is $p_{i-1} \times p_i$. The algorithm initializes an $n \times n$ cost table $m$ and a split table $s$ such that $m[i,i] = 0$ for $i = 1$ to $n$ (no cost for a single matrix) and all entries of $s$ are [undefined](/page/Undefined).[](https://ieeexplore.ieee.org/document/5009182)

The tables are filled in order of increasing chain length $l$, starting from $l = 2$ up to $l = n$. For each $l$, the algorithm iterates over all possible starting indices $i = 1$ to $n - l + 1$, setting $j = i + l - 1$. For each subchain from $A_i$ to $A_j$, it computes the minimum cost by trying all possible split points $k$ from $i$ to $j-1$:

$$m[i,j] = \min_{i \leq k < j} \left( m[i,k] + m[k+1,j] + p_{i-1} \cdot p_k \cdot p_j \right)$$

and records the optimal $k$ in $s[i,j]$. This ensures that shorter subchains are solved before longer ones, building up the solution without [recursion](/page/Recursion).[](https://ieeexplore.ieee.org/document/5009182)

Once the tables are filled, the optimal parenthesization for the full chain (from $i=1$ to $j=n$) is reconstructed from the $s$ table by a recursive procedure PRINT-OPTIMAL-PARENTHESIZATION($i, j$): if $i = j$, print "$A_i$"; otherwise, let $k = s[i,j]$, recursively print the parenthesization for $(i, k)$, print a [multiplication sign](/page/Multiplication_sign), recursively print the parenthesization for $(k+1, j)$, and enclose the result in parentheses. This traces back the optimal splits to produce the fully parenthesized expression.[](https://ieeexplore.ieee.org/document/5009182)

To illustrate, consider the example with dimensions $p = [30, 35, 15, 5, 10, 20, 25]$ for six matrices $A_1$ to $A_6$. The completed cost table $m$ (rows give $i$, columns give $j$) is:

|       | 1 | 2     | 3    | 4    | 5     | 6     |
|-------|---|-------|------|------|-------|-------|
| **1** | 0 | 15750 | 7875 | 9375 | 11875 | 15125 |
| **2** |   | 0     | 2625 | 4375 | 7125  | 10500 |
| **3** |   |       | 0    | 750  | 2500  | 5375  |
| **4** |   |       |      | 0    | 1000  | 3500  |
| **5** |   |       |      |      | 0     | 5000  |
| **6** |   |       |      |      |       | 0     |

The split table $s$ records values such as $s[1,2] = 1$, $s[1,3] = 1$, $s[1,4] = 3$, $s[1,5] = 3$, $s[1,6] = 3$, and $s[4,6] = 5$. The minimum cost for the full chain is $m[1,6] = 15125$, and the optimal parenthesization is $((A_1 (A_2 A_3)) ((A_4 A_5) A_6))$.[](http://cs.bme.hu/thalg/dynamic2.pdf)

This bottom-up method corresponds to the memoized recursive approach but eliminates function-call overhead by computing all subproblems directly in a predetermined order.[](https://ieeexplore.ieee.org/document/5009182)
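A compact Python rendering of this bottom-up procedure, including the reconstruction step, is sketched below (illustrative code, not taken from the cited sources); run on the dimension array above it reports the cost 15125 and the parenthesization ((A1 (A2 A3)) ((A4 A5) A6)).

```python
def matrix_chain_order(p):
    """Bottom-up DP: A_i has dimensions p[i-1] x p[i], for i = 1..n.
    Returns (m, s): minimum costs m[i][j] and optimal split points s[i][j]."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):          # chain length l
        for i in range(1, n - length + 2):  # start index
            j = i + length - 1              # end index
            m[i][j] = float("inf")
            for k in range(i, j):           # split point
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j], s[i][j] = q, k
    return m, s

def parenthesization(s, i, j):
    """Rebuild the optimal grouping from the split table, as in PRINT-OPTIMAL-PARENTHESIZATION."""
    if i == j:
        return f"A{i}"
    k = s[i][j]
    return f"({parenthesization(s, i, k)} {parenthesization(s, k + 1, j)})"

p = [30, 35, 15, 5, 10, 20, 25]
m, s = matrix_chain_order(p)
print(m[1][6])                       # 15125
print(parenthesization(s, 1, 6))     # ((A1 (A2 A3)) ((A4 A5) A6))
```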
Complexity Analysis

The dynamic programming solution for the matrix chain multiplication problem runs in $O(n^3)$ time, where $n$ is the number of matrices, stemming from the three nested loops in the bottom-up implementation: one over chain lengths $l$ from 2 to $n$, one over starting indices $i$ from 1 to $n-l+1$, and one over split points $k$ from $i$ to $i+l-2$, with each loop bounded by $O(n)$ iterations.[](https://ieeexplore.ieee.org/document/5009182/) This cubic dependence makes the computation efficient for moderate chain lengths.

The [space complexity](/page/Space_complexity) is $O(n^2)$, primarily due to the two-dimensional arrays $m[1\dots n, 1\dots n]$ storing minimum multiplication costs and $s[1\dots n, 1\dots n]$ recording optimal parenthesization splits.[](https://ieeexplore.ieee.org/document/5009182/) Unlike the naive recursive formulation, which incurs exponential [time complexity](/page/Time_complexity) by repeatedly solving identical subproblems without [memoization](/page/Memoization), the dynamic programming approach exploits [optimal substructure](/page/Optimal_substructure) (optimal solutions to subchains compose into global optima) and [overlapping subproblems](/page/Overlapping_subproblems) to eliminate redundancy, yielding the [polynomial](/page/Polynomial) bound.[](https://arxiv.org/abs/2104.01777) In practice, for $n = 100$ the $O(n^3)$ operations total roughly $10^6$, which is easily feasible on standard hardware. The method is [polynomial](/page/Polynomial) in $n$; the cost entries themselves involve products of potentially large dimensions, though arithmetic operations are typically assumed constant-time.[](https://www.cs.umd.edu/class/spring2025/cmsc451-0101/Lects/lect10-dp-mat-mult.pdf) Space optimizations exist, such as in-place updates or a one-dimensional array for costs alone (reducing to $O(n)$ space when the parenthesization itself is not needed), though the standard implementation retains $O(n^2)$ for full reconstruction.[](https://www.cs.utexas.edu/~vlr/papers/soda06.pdf)

Advanced Algorithms

Hu and Shing Method

The Hu and Shing method provides an $O(n \log n)$ [algorithm](/page/Algorithm) for optimally parenthesizing a [chain](/page/Chain) of matrices. The cost function satisfies the convex quadrangle inequality, which guarantees the existence of a unique optimal solution structure amenable to efficient computation.[](https://epubs.siam.org/doi/10.1137/0211028) A central [insight](/page/Insight) is that optimal multiplication splits are non-crossing, mirroring the triangulation of a [convex polygon](/page/Convex_polygon) whose vertices correspond to the dimensions. The [algorithm](/page/Algorithm) models the chain as a [convex polygon](/page/Convex_polygon) and uses efficient data structures, such as priority queues, to select and process candidate splits, merging subchains by repeatedly choosing the lowest-cost compatible merge and updating costs accordingly. This continues until the full [triangulation](/page/Triangulation) is obtained, yielding the optimal parenthesization.[](https://epubs.siam.org/doi/10.1137/0211028) The [time complexity](/page/Time_complexity) is $O(n \log n)$, dominated by priority queue operations (insertions and extractions) using a [binary heap](/page/Binary_heap) to manage up to $O(n)$ candidate merges.[](https://epubs.siam.org/doi/10.1137/0211028)

**Example**: Consider the dimensions $[1, 2, 3, 5, 8]$ for matrices $A_1(1 \times 2)$, $A_2(2 \times 3)$, $A_3(3 \times 5)$, $A_4(5 \times 8)$. The algorithm identifies the initial lowest-cost merge as $A_1$ and $A_2$ (cost $1 \cdot 2 \cdot 3 = 6$), producing a $1 \times 3$ submatrix. Next, it merges this with $A_3$ (cost $1 \cdot 3 \cdot 5 = 15$), yielding a $1 \times 5$ submatrix.
Finally, it merges the result with $A_4$ (cost $1 \cdot 5 \cdot 8 = 40$), for a total cost of 61 and the parenthesization $((A_1 A_2) A_3) A_4$.[](https://epubs.siam.org/doi/10.1137/0211028)

Other O(n log n) Algorithms

In addition to the Hu and Shing method, which applies specifically to cases with [convex](/page/Convex) cost structures, alternative exact algorithms achieve $O(n \log n)$ [time complexity](/page/Time_complexity) for the general matrix chain multiplication problem using divide-and-conquer paradigms that exploit the inherent properties of the [cost function](/page/Cost_function).[](https://epubs.siam.org/doi/10.1137/18M1195401) A prominent example is the simplified algorithm developed by [Wang](/page/Wang), Zhu, and [Tian](/page/Tian), which recursively partitions the chain and identifies the optimal split point $k$ for each subchain via binary search guided by the quadrangle [inequality](/page/Inequality).[](https://ieeexplore.ieee.org/document/6553999) The quadrangle [inequality](/page/Inequality) states that the [cost function](/page/Cost_function) satisfies $C(i,j) + C(i',j') \leq C(i,j') + C(i',j)$ for $i \leq i' \leq j \leq j'$, implying monotonicity of the optimal split positions, which allows binary search to narrow down the range of candidate $k$ values efficiently.[](https://epubs.siam.org/doi/10.1137/18M1195401) This approach evaluates only $O(\log n)$ candidate splits per recursion level, with a [recursion](/page/Recursion) depth of $O(\log n)$, yielding the overall $O(n \log n)$ complexity without requiring assumptions beyond the standard [matrix multiplication](/page/Matrix_multiplication) costs.[](https://ieeexplore.ieee.org/document/6553999)

To further optimize, these methods precompute tight bounds on the optimal $k$ for each subchain by leveraging the monotonicity of costs, specifically the fact that the optimal split $k(i,j)$ is non-decreasing in $j$ for fixed $i$, reducing the search space and avoiding exhaustive enumeration.[](https://epubs.siam.org/doi/10.1137/18M1195401) This technique applies to arbitrary dimension sequences, including non-monotonic ones where dimensions do not increase steadily. For illustration, consider a short chain with dimensions $p = [10, 1, 100, 50]$, corresponding to matrices $A$ (10×1), $B$ (1×100), and $C$ (100×50). The two possible groupings are $A(BC)$ with cost $1 \times 100 \times 50 + 10 \times 1 \times 50 = 5000 + 500 = 5500$ and $(AB)C$ with cost $10 \times 1 \times 100 + 10 \times 100 \times 50 = 100 + 50000 = 50100$. In a recursive divide-and-conquer setting for longer chains containing this subchain, binary search on $k$ (from 1 to 2) quickly identifies the minimum at $k=1$ by evaluating midpoints and using monotonicity to prune, demonstrating how the method avoids linear scans even when dimensions drop sharply, as from 10 to 1 here.[](https://epubs.siam.org/doi/10.1137/18M1195401) Compared to earlier methods like Hu and Shing, these divide-and-conquer variants offer cleaner implementations with equivalent asymptotic performance but incorporate explicit logarithmic overhead from binary search, making them more accessible while preserving exact optimality for the general case.[](https://ieeexplore.ieee.org/document/6553999)

Approximation Techniques

Approximation algorithms for the matrix chain multiplication problem provide efficient ways to find near-optimal parenthesizations when the exact dynamic programming solution, which runs in $O(n^3)$ time, is impractical for very large $n$. These methods trade optimality for speed, offering multiplicative guarantees on the computation cost relative to the optimal solution.
Seminal contributions include early heuristics that achieve constant-factor approximations in linear time, making them suitable for scenarios such as [streaming data](/page/Streaming_data) processing or preliminary computations where speed is prioritized over precision.[](https://arxiv.org/pdf/2303.17352)

One of the earliest [approximation](/page/Approximation) algorithms is due to [Chandra](/page/Chandra), who in 1975 proposed an O(n) method that guarantees a cost at most twice the optimal. The algorithm identifies the position m of the smallest [dimension](/page/Dimension) p_m among the chain dimensions p_0, p_1, ..., p_n. It then computes the product by parenthesizing the prefix chain from the first [matrix](/page/Matrix) to the m-th as a right-associative product (multiplying right-to-left) and the suffix from the (m+1)-th to the last as a left-associative product (left-to-right), before combining the two partial results. This greedy-like strategy prioritizes merging around the smallest dimension to bound the intermediate costs. The [approximation](/page/Approximation) ratio of 2 arises because each partial product cost is at most the optimal cost for that subchain plus an additional factor from the final multiplication, as analyzed in subsequent works.[](https://arxiv.org/pdf/2303.17352)[](https://www.cs.huji.ac.il/~odedsc/papers/SICOMP19-Revisiting.pdf)

Chin improved upon this in 1978 with an O(n) algorithm that achieves a better approximation ratio of at most 1.25. The method involves four phases: initializing data structures and finding the smallest dimension; scanning the chain forward to associate matrices based on conditions that avoid high-cost merges (using lemmas on dimension ratios); nibbling from both ends to further group low-cost pairs; and finally applying Chandra's algorithm to the reduced chain. This preprocessing eliminates many suboptimal cases, leading to tighter bounds. For example, for a chain with dimensions p = [40, 10, 20, 5, 30, 15], corresponding to matrices M₁ (40×10), M₂ (10×20), M₃ (20×5), M₄ (5×30), M₅ (30×15), the optimal parenthesization ((M₁ (M₂ M₃)) (M₄ M₅)) costs 8250 scalar multiplications, and Chin's guarantee ensures the heuristic's output is within 25% of that cost.[](https://dl.acm.org/doi/10.1145/359545.359556)[](https://arxiv.org/pdf/2303.17352)

Hu and Shing reformulated and analyzed Chin's approach in [1981](/page/1981) by reducing the matrix chain problem to optimal triangulation of a [convex polygon](/page/Convex_polygon), where each diagonal cost corresponds to a multiplication cost. Their analysis of the O(n) heuristic proves a tighter approximation ratio of 1.155, showing it is near-optimal in the worst case. Recent analyses as of 2025 have further improved variants to ratios under 1.143.[](https://www.sciencedirect.com/science/article/pii/0196677481900146)[](https://arxiv.org/pdf/2303.17352) This analysis confirms the method's reliability without altering the core steps, and it has influenced subsequent improvements.
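A sketch of Chandra's split-at-the-smallest-dimension heuristic as described above (an illustrative reading of that description, not code from the cited papers): the prefix up to the smallest dimension is multiplied right-to-left, the suffix left-to-right, and the two partial products are then combined.

```python
def chandra_cost(p):
    """Cost of the heuristic parenthesization (sketch): split the chain at the
    position m of the smallest dimension in p, multiply the prefix A_1..A_m
    right-to-left, the suffix A_{m+1}..A_n left-to-right, then combine."""
    n = len(p) - 1
    m = min(range(n + 1), key=lambda i: p[i])   # index of the smallest dimension

    def right_assoc(i, j):
        # Cost of A_i (A_{i+1} ( ... A_j )): each step multiplies into a p[k] x p[j] result.
        return sum(p[k - 1] * p[k] * p[j] for k in range(i, j))

    def left_assoc(i, j):
        # Cost of ((A_i A_{i+1}) ... ) A_j: each step extends a p[i-1] x p[k] result.
        return sum(p[i - 1] * p[k] * p[k + 1] for k in range(i, j))

    cost = right_assoc(1, m) + left_assoc(m + 1, n)
    if 0 < m < n:
        cost += p[0] * p[m] * p[n]              # combine the two partial products
    return cost

# Dimensions from the Chin example above; here the heuristic happens to attain
# the optimum of 8250 scalar multiplications.
print(chandra_cost([40, 10, 20, 5, 30, 15]))
```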
Together, these techniques provide practical alternatives to exact methods, especially in applications like [compiler](/page/Compiler) optimization or large-scale scientific [computing](/page/Computing) where n exceeds thousands and exact solutions exceed time budgets.[](https://arxiv.org/pdf/2303.17352) Other approximation strategies include [greedy](/page/Greedy) heuristics based on dimension ratios, such as repeatedly merging adjacent pairs where the ratio of outer to inner dimensions is minimized in order to keep intermediate sizes small. These can achieve ratios better than 2 in practice but lack formal guarantees without additional preprocessing. [Linear programming](/page/Linear_programming) relaxations have also been explored for related chain optimization problems, yielding ratios like 4/3 in restricted cases, though they are less common for the standard matrix chain problem because of its structure. These methods are particularly useful in streaming settings, where matrices arrive sequentially and full dynamic programming tables cannot be maintained.[](https://arxiv.org/pdf/2303.17352)

Extensions

Generalizations

The matrix chain multiplication problem can be generalized to other associative binary operations beyond standard [scalar multiplication](/page/Scalar_multiplication), such as in algebraic structures where the operation defines a cost for combining elements of a [sequence](/page/Sequence).[](https://arxiv.org/pdf/1804.04021) Constrained variants of the problem incorporate additional [resource](/page/Resource) limitations, such as limited [parallelism](/page/Parallel) or [storage](/page/Storage), to optimize not just scalar operations but also practical [implementation](/page/Implementation) factors. In [parallel](/page/Parallel) settings, the goal shifts to minimizing the depth of the multiplication tree under constraints on the number of processors, where dynamic programming is augmented with scheduling heuristics to balance load and communication overhead, achieving near-optimal performance on distributed systems.[](https://link.springer.com/chapter/10.1007/978-3-031-74430-3_7) For storage-constrained environments, algorithms focus on reducing temporary matrix allocations during computation, using techniques like in-place operations or checkpointing to bound peak memory usage while maintaining minimal multiplication counts, which is particularly useful in [embedded](/page/Embedded) or memory-limited [hardware](/page/Hardware).[](https://umu.diva-portal.org/smash/get/diva2:1766984/FULLTEXT01.pdf)

Higher-dimensional generalizations extend the problem to tensor [chain](/page/chain) multiplication, where sequences of multi-way arrays (tensors) are contracted along compatible modes to minimize computational [cost](/page/Cost), analogous to the matrix case but with costs scaling exponentially in the number of dimensions.
For sparse tensors, [parallel](/page/parallel) algorithms leverage [hardware](/page/Hardware) accelerators like Tensor [Core](/page/Core) Units (TCUs) to optimize mode contractions, reducing [time complexity](/page/time_complexity) from cubic to near-linear in sparsity patterns for applications in [machine learning](/page/machine_learning).[](https://ieeexplore.ieee.org/document/10159508)

The matrix chain problem is isomorphic to optimal polygon triangulation: triangulating a convex polygon with $n+1$ vertices corresponds to parenthesizing $n$ matrices via a dimensional mapping (vertex labels to matrix dimensions), and the cost of adding a diagonal equals the corresponding matrix multiplication cost, allowing the same $O(n^3)$ dynamic programming solution.[](https://arxiv.org/abs/2104.01777) It also relates to constructing optimal binary search trees (BSTs), where the DP formulation mirrors matrix chain ordering by minimizing weighted path lengths instead of multiplication costs, with subproblem overlaps enabling efficient computation from search frequencies.[](https://www.eecs.umich.edu/courses/eecs380/ALG/mat_chain.html)

Theoretical extensions generalize the framework to algebraic [semiring](/page/Semiring)s, replacing standard addition and multiplication with semiring operations (e.g., min-plus for shortest paths or max-product for probabilistic inference), preserving associativity so that the [chain](/page/Chain) ordering [DP](/page/DP) applies to evaluating sequences in tropical or other semirings.[](https://www.ifi.uzh.ch/dam/jcr:06d7fd07-b484-4ce1-b38a-1e129ba513ef/EA21-2-Semirings.pdf) In [quantum computing](/page/Quantum_computing), the problem adapts to subroutine design for quantum [matrix multiplication](/page/Matrix_multiplication) circuits, using amplitude encoding and Chebyshev approximations to achieve faster evaluation of [chain](/page/Chain) products on quantum [hardware](/page/Hardware), with complexity scaling polylogarithmically in [matrix](/page/Matrix) dimensions for fault-tolerant implementations.[](https://www.nature.com/articles/s41598-025-13092-2)

Applications

Matrix chain multiplication finds practical applications across various computational domains, where optimizing the order of matrix products is crucial for reducing computational costs in large-scale calculations. The problem was first systematically formulated in the context of algorithm design in the [1970s](/page/1970s), drawing on [operations research](/page/Operations_research) principles for sequencing operations efficiently, such as the scheduling and [resource allocation](/page/Resource_allocation) tasks prevalent in the [1960s](/page/1960s). This foundational approach has since influenced optimization strategies in diverse fields, emphasizing minimal scalar multiplications for chains of compatible matrices.

In compiler optimization, [matrix](/page/Matrix) chain multiplication is employed to determine the optimal parenthesization of [matrix](/page/Matrix) expression trees during [code generation](/page/Code_generation) for numerical libraries.
For instance, libraries like [Armadillo](/page/Armadillo) use expression templates to automatically reorder chained [matrix](/page/Matrix) multiplications, minimizing temporary allocations and operation counts by choosing an efficient evaluation sequence at [compile time](/page/Compile_time).[](https://arma.sourceforge.net/docs.html) This technique is particularly valuable in scientific computing, where complex linear algebra expressions in user code are rewritten to exploit hardware accelerators like GPUs, achieving significant speedups in high-performance applications.

In [computer graphics](/page/Computer_graphics) and [robotics](/page/Robotics), the method optimizes chains of transformation matrices representing rotations, scalings, and translations applied to vertices or joint configurations. In graphics pipelines, such as rendering scenes with hierarchical models, efficient ordering reduces the number of floating-point operations when composing multiple affine transformations, improving [real-time](/page/Real-time) [performance](/page/Performance). Similarly, in robotic [kinematics](/page/Kinematics), forward chain calculations for manipulator arms benefit from dynamic programming to find the minimal-cost multiplication order, enabling faster trajectory planning and simulation in control systems.[](https://www.researchgate.net/publication/368227120_Matrix_Chain_Multiplication_and_Equivalent_Reduced-Order_Parallel_Calculation_Method_for_a_Robotic_Arm)

Machine learning applications leverage matrix chain multiplication to streamline computations in neural network forward passes and tensor-based models. For example, optimizing the order of layer-wise matrix products in deep networks minimizes intermediate tensor sizes, reducing memory usage during training on large datasets. This extends to tensor contractions in models like transformers, where generalizations of the algorithm, such as those implemented in opt_einsum, find near-optimal contraction paths to accelerate operations in scientific ML workflows.

Implementations of matrix chain multiplication appear in numerical libraries and custom solvers for large-scale simulations. While core libraries like [NumPy](/page/NumPy) rely on BLAS for pairwise multiplications and evaluate chained expressions sequentially by default, extensions and specialized tools incorporate dynamic programming for order optimization in domains like [climate](/page/Climate) modeling or quantum simulations, where chains of hundreds of matrices demand precise cost minimization to handle exascale computations efficiently.[](https://dl.acm.org/doi/10.1145/3654959)
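As a concrete library-level illustration (the specific shapes below reuse the four-matrix example from the introduction), NumPy exposes this ordering optimization through `numpy.linalg.multi_dot`, which runs the chain-ordering dynamic program internally before dispatching the pairwise products to BLAS.

```python
import numpy as np

rng = np.random.default_rng(0)
# The four-matrix chain from the introduction: 30x1, 1x40, 40x10, 10x25.
A1 = rng.standard_normal((30, 1))
A2 = rng.standard_normal((1, 40))
A3 = rng.standard_normal((40, 10))
A4 = rng.standard_normal((10, 25))

# Plain chained @ evaluates left to right, costing 20,700 scalar multiplications.
sequential = A1 @ A2 @ A3 @ A4

# multi_dot applies the chain-ordering DP and evaluates the cheapest grouping,
# A1((A2 A3) A4), costing 1,400 scalar multiplications.
ordered = np.linalg.multi_dot([A1, A2, A3, A4])

print(np.allclose(sequential, ordered))   # True: same product either way
```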