In linear algebra, a matrix norm is a function that extends the notion of a vector norm to the space of matrices, assigning a non-negative real number to each matrix to quantify its magnitude. Formally, for matrices A, B \in \mathbb{C}^{m \times n} and scalars \alpha \in \mathbb{C}, a matrix norm \|\cdot\|: \mathbb{C}^{m \times n} \to [0, \infty) satisfies three axioms: non-negativity, where \|A\| \geq 0 and \|A\| = 0 if and only if A = 0; absolute homogeneity, where \|\alpha A\| = |\alpha| \|A\|; and the triangle inequality, where \|A + B\| \leq \|A\| + \|B\|.[1][2] Many matrix norms additionally satisfy submultiplicativity, ensuring \|AB\| \leq \|A\| \|B\| whenever the product AB is defined, which makes them compatible with matrix multiplication.[2][1]

Matrix norms are essential tools in numerical linear algebra, enabling the analysis of perturbation effects, stability of algorithms, and convergence rates in iterative methods.[3][2] They facilitate the computation of condition numbers, such as \kappa(A) = \|A\| \cdot \|A^{-1}\| for invertible matrices, which quantify sensitivity to input errors.[2] A key property is that all matrix norms on finite-dimensional spaces are equivalent, meaning any two norms differ by at most a bounded constant factor, though specific norms are chosen for their computational advantages or theoretical insights.[2] Furthermore, for any consistent matrix norm, the spectral radius \rho(A) (the largest modulus of an eigenvalue) satisfies \rho(A) \leq \|A\|, providing bounds on eigenvalue magnitudes.[1][2]

Common types of matrix norms include induced norms, which derive from vector norms and measure the maximum gain in transforming vectors, such as the operator p-norm \|A\|_p = \sup_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p}.[1][2] Notable induced norms are the 1-norm (maximum absolute column sum), the \infty-norm (maximum absolute row sum), and the 2-norm (largest singular value, \sigma_{\max}(A)), the latter being computationally intensive but central to spectral theory.[1][2] The Frobenius norm, defined as \|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2}, is the Euclidean vector norm of the matrix viewed in vectorized form and is easy to compute, requiring O(mn) operations; it is also unitarily invariant, preserving its value under unitary (or orthogonal) transformations.[3][2] These norms find applications in optimization, signal processing, and machine learning, where they regularize models or assess data transformations.[3]
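For concreteness, the following minimal sketch (assuming NumPy is available; the sample matrix is arbitrary) computes the common norms and the 2-norm condition number mentioned above.

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

print(np.linalg.norm(A, 1))       # induced 1-norm: max absolute column sum -> 6.0
print(np.linalg.norm(A, np.inf))  # induced inf-norm: max absolute row sum -> 7.0
print(np.linalg.norm(A, 2))       # spectral norm: largest singular value
print(np.linalg.norm(A, 'fro'))   # Frobenius norm: sqrt(1 + 4 + 9 + 16) = sqrt(30)
print(np.linalg.cond(A, 2))       # condition number kappa_2(A) = ||A||_2 * ||A^{-1}||_2
```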
Definition and Basic Properties
Definition
In linear algebra, a matrix norm is a function \|\cdot\| that assigns to each m \times n matrix A with real or complex entries a non-negative real number \|A\|, satisfying the axioms of a norm on the vector space of such matrices: positivity, where \|A\| \geq 0 and \|A\| = 0 if and only if A is the zero matrix; absolute homogeneity, where \|cA\| = |c| \|A\| for any scalar c \in \mathbb{R} or \mathbb{C}; and the triangle inequality, where \|A + B\| \leq \|A\| + \|B\| for any conformable matrices A and B.[4][5]

Unlike vector norms, which quantify the magnitude of elements in a single vector space, matrix norms generalize this idea to matrices regarded as linear transformations between finite-dimensional vector spaces, providing a measure of their "size" or amplification effect on vectors.[2]
Axiomatic Properties
A matrix norm on the space of m \times n matrices over the real or complex numbers is a function \|\cdot\|: \mathbb{C}^{m \times n} \to [0, \infty) that satisfies three fundamental axioms. These axioms ensure that the norm behaves consistently as a measure of matrix size, analogous to vector norms but adapted to the algebraic structure of matrices.[6][2]

The first axiom is positivity: \|A\| \geq 0 for every matrix A, with equality if and only if A = 0 (the zero matrix). This ensures that the norm distinguishes the zero matrix from all nonzero matrices and provides a non-negative scalar value, reflecting the intuitive notion of "size." A basic consequence follows directly: for any nonzero matrix A \neq 0, \|A\| > 0, since positivity gives \|A\| \neq 0 while \|A\| \geq 0. This property is crucial for applications such as error analysis in numerical linear algebra, where nonzero perturbations must yield positive norm increases.[6][7]

The second axiom is absolute homogeneity: for any scalar \alpha \in \mathbb{C} and matrix A, \|\alpha A\| = |\alpha| \|A\|. This scaling property implies that multiplying a matrix by a scalar amplifies its norm proportionally to the scalar's modulus, preserving the norm's role as a size measure under rescaling. A simple consequence follows from repeated application: for any integer k \geq 1, \|\alpha^k A\| = |\alpha|^k \|A\|, proved by induction on k using homogeneity at each step; in particular, \|-A\| = \|A\|.[6][2]

The third axiom is the triangle inequality: for any matrices A and B of compatible dimensions, \|A + B\| \leq \|A\| + \|B\|. This subadditivity ensures that the norm of a sum is no larger than the sum of the norms, facilitating bounds on matrix operations like additions in algorithms. A key consequence is subadditivity for finite sums: by induction, \left\| \sum_{i=1}^k A_i \right\| \leq \sum_{i=1}^k \|A_i\| for compatible matrices A_i. The base case k=1 is trivial, and assuming the bound holds for k-1, the inductive step follows as \left\| \sum_{i=1}^k A_i \right\| = \left\| \left( \sum_{i=1}^{k-1} A_i \right) + A_k \right\| \leq \left\| \sum_{i=1}^{k-1} A_i \right\| + \|A_k\| \leq \sum_{i=1}^k \|A_i\|. This is essential for controlling error propagation in matrix summations.[6][7]
Induced Matrix Norms
Norms Induced by Vector p-Norms
A matrix norm induced by a vector p-norm, for 1 \leq p \leq \infty, quantifies the maximum factor by which a matrix A \in \mathbb{C}^{m \times n} stretches vectors measured in the p-norm. Formally, it is defined as

\|A\|_p = \sup_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p,

where \|\cdot\|_p denotes the vector p-norm on \mathbb{C}^n and \mathbb{C}^m.[8] This definition captures the operator interpretation, as it bounds the amplification of any nonzero input vector x.[8]

For p=1, the vector 1-norm is \|x\|_1 = \sum_{i=1}^n |x_i|, and the induced matrix norm simplifies to the maximum absolute column sum:

\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^m |a_{ij}|.

This value is computed by summing the absolute entries in each column of A and selecting the largest such sum.[8] The formula holds because the supremum is achieved when x is the standard basis vector corresponding to the column with the maximum sum.[8]

For p=\infty, the vector \infty-norm is \|x\|_\infty = \max_{1 \leq i \leq n} |x_i|, yielding the induced matrix norm as the maximum absolute row sum:

\|A\|_\infty = \max_{1 \leq i \leq m} \sum_{j=1}^n |a_{ij}|.

Computation involves summing the absolute entries in each row of A and taking the maximum.[8] Here, the supremum is attained for x with components chosen to align with the signs of the entries in the row of largest sum.[8]

The case p=2 corresponds to the spectral norm, equal to the largest singular value of A.[8]

All induced p-norms are submultiplicative, satisfying \|AB\|_p \leq \|A\|_p \|B\|_p for compatible matrices A \in \mathbb{C}^{m \times k} and B \in \mathbb{C}^{k \times n}.[8] To prove this, recall the definition \|A\|_p = \sup_{\|x\|_p=1} \|Ax\|_p. For any x with \|x\|_p = 1,

\|ABx\|_p = \|A(Bx)\|_p \leq \|A\|_p \|Bx\|_p,

since \|A\|_p bounds the action of A on any vector. Further,

\|Bx\|_p \leq \|B\|_p \|x\|_p = \|B\|_p,

as \|x\|_p = 1. Combining these gives \|ABx\|_p \leq \|A\|_p \|B\|_p. Taking the supremum over all such x yields

\|AB\|_p = \sup_{\|x\|_p=1} \|ABx\|_p \leq \|A\|_p \|B\|_p.

This establishes submultiplicativity for all induced norms.[8]
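The column-sum and row-sum formulas can be checked numerically; the sketch below (assuming NumPy, with a randomly generated matrix) compares them against NumPy's built-in induced norms and samples the supremum definition.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Closed-form expressions for the induced 1- and inf-norms.
col_sum = np.max(np.abs(A).sum(axis=0))   # max absolute column sum
row_sum = np.max(np.abs(A).sum(axis=1))   # max absolute row sum

assert np.isclose(col_sum, np.linalg.norm(A, 1))
assert np.isclose(row_sum, np.linalg.norm(A, np.inf))

# Crude check of the supremum definition ||A||_p = sup ||Ax||_p / ||x||_p
# by sampling random directions; the sampled ratio never exceeds the norm.
for _ in range(1000):
    x = rng.standard_normal(3)
    ratio = np.linalg.norm(A @ x, 1) / np.linalg.norm(x, 1)
    assert ratio <= col_sum + 1e-12
```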
Spectral Norm
The spectral norm of a matrix A, also known as the operator norm induced by the Euclidean vector norm, is defined as \|A\|_2 = \sigma_{\max}(A), where \sigma_{\max}(A) denotes the largest singular value of A.[9] This norm quantifies the maximum scaling factor by which A can stretch a vector in Euclidean space, specifically \|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2.[9] Equivalently, for a complex matrix A \in \mathbb{C}^{m \times n}, it is given by \|A\|_2 = \sqrt{\lambda_{\max}(A^* A)}, where A^* is the conjugate transpose of A and \lambda_{\max} denotes the largest eigenvalue.[9] For real matrices, the conjugate transpose reduces to the ordinary transpose.

To compute the spectral norm, one typically performs the singular value decomposition (SVD) A = U \Sigma V^*, where U and V are unitary matrices and \Sigma is a diagonal matrix containing the singular values \sigma_1 \geq \sigma_2 \geq \cdots \geq 0 along its diagonal; thus \|A\|_2 is the maximum entry on this diagonal. For square symmetric (or Hermitian) matrices, the spectral norm simplifies to the maximum absolute eigenvalue, \|A\|_2 = \max_i |\lambda_i(A)|, since the singular values coincide with the absolute values of the eigenvalues in this case.[9] Algorithms for the SVD, such as the Golub-Reinsch method, enable efficient computation, with complexity O(\min(mn^2, m^2 n)) for an m \times n matrix.

In applications, the spectral norm provides the tightest bound \|Ax\|_2 \leq \|A\|_2 \|x\|_2 for all vectors x, making it essential for analyzing the growth of errors in linear systems and iterative methods.[9] It plays a key role in the stability analysis of numerical algorithms, such as bounding the condition number \kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 for invertible matrices, which measures sensitivity to perturbations. In control theory and signal processing, for instance, the spectral norm assesses system stability by evaluating the maximum gain of linear operators.[9]
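A brief sketch (assuming NumPy; the matrix is random and purely illustrative) relating the spectral norm to the SVD and to the largest eigenvalue of A^* A:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

sigma = np.linalg.svd(A, compute_uv=False)       # singular values, descending
spectral = sigma[0]                              # ||A||_2 = sigma_max(A)

# Equivalent characterisation via the largest eigenvalue of A^* A.
lam_max = np.linalg.eigvalsh(A.T @ A).max()
assert np.isclose(spectral, np.sqrt(lam_max))
assert np.isclose(spectral, np.linalg.norm(A, 2))

# Tightest bound ||Ax||_2 <= ||A||_2 ||x||_2 (sampled check).
x = rng.standard_normal(3)
assert np.linalg.norm(A @ x) <= spectral * np.linalg.norm(x) + 1e-12
```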
Norms Induced by Other Vector Norms
In addition to the standard p-norms, matrix norms can be induced by other vector norms, such as weighted Euclidean norms or mixed norms employing distinct vector norms on the input and output spaces. These are particularly advantageous for rectangular matrices, where the domain and range have different dimensions, allowing for tailored measures of operator size that account for varying scaling in each space. For an m \times n matrix A, the induced norm is defined as \|A\|_{\alpha,\beta} = \sup_{x \neq 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha}, where \|\cdot\|_\alpha is a vector norm on \mathbb{R}^n and \|\cdot\|_\beta is a vector norm on \mathbb{R}^m. This framework generalizes the induced p-norms by permitting asymmetry between input and output, which is essential for analyzing rectangular systems in numerical linear algebra.[1]

A prominent example is the \beta-norm, defined for a vector x \in \mathbb{R}^n with positive weights \beta = (\beta_1, \dots, \beta_n)^T > 0 as \|x\|_\beta = \sqrt{\sum_{i=1}^n \beta_i |x_i|^2}, a weighted Euclidean norm that emphasizes components according to the weights. The induced matrix \beta-norm for a square matrix is submultiplicative, satisfying \|AB\|_\beta \leq \|A\|_\beta \|B\|_\beta, and extends naturally to rectangular matrices when compatible weights are used on the respective spaces. Explicit formulas for certain mixed cases simplify analysis; for instance, \|A\|_{1,\beta} = \max_{1 \leq j \leq n} \|A(:,j)\|_\beta, where A(:,j) denotes the j-th column, giving a bound tied to the largest weighted column length. Similarly, \|A\|_{\alpha,\infty} = \max_{1 \leq i \leq m} \|A(i,:)\|_\alpha^D, with \|\cdot\|_\alpha^D the dual norm, offers efficient evaluation for infinity-based outputs. These formulas facilitate bounds involving \min(m,n), such as scaling factors in error estimates for least squares problems.[10][1]

The \alpha-norm follows analogously, using weights \alpha > 0 in \|x\|_\alpha = \sqrt{\sum_{i=1}^n \alpha_i |x_i|^2}, yielding an induced norm \|A\|_\alpha that is submultiplicative for square matrices and adaptable to rectangular ones via mixed \alpha-\beta pairings. Unlike p-norms, which treat all components uniformly, these weighted induced norms provide greater flexibility for ill-conditioned problems by incorporating domain-specific scaling, such as in stability analysis of iterative solvers or perturbation bounds for underdetermined systems. For example, choosing weights inversely proportional to singular values can mitigate amplification in rectangular pseudoinverses, enhancing numerical robustness without altering the underlying operator. Seminal developments in these norms trace to foundational work on subordinate norms, with explicit rectangular extensions formalized in modern numerical texts.[10]
Entrywise Matrix Norms
Frobenius Norm
The Frobenius norm of an m \times n matrix A = (a_{ij}) is defined as

\|A\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2},

the square root of the sum of the squared absolute values of all entries of A.[11] It can equivalently be expressed using the trace of the matrix product A^* A, where A^* denotes the conjugate transpose of A:

\|A\|_F = \sqrt{\operatorname{trace}(A^* A)}.[9]

The Frobenius norm arises naturally as the Euclidean norm on the vector space of matrices when they are vectorized, treating the matrix entries as components of a vector in \mathbb{R}^{mn} or \mathbb{C}^{mn}.[9]

The Frobenius norm possesses several important properties that make it useful in matrix analysis. It is unitarily invariant, meaning that for any unitary matrices U and V (of appropriate dimensions), \|U A V^*\|_F = \|A\|_F.[9] Additionally, it is submultiplicative with respect to matrix multiplication: for compatible matrices A and B,

\|A B\|_F \leq \|A\|_F \|B\|_F.[9]

The Frobenius norm also satisfies inequalities relating it to the spectral norm \|A\|_2, the largest singular value of A:

\|A\|_2 \leq \|A\|_F \leq \sqrt{\operatorname{rank}(A)} \, \|A\|_2.

These bounds place the Frobenius norm between the spectral norm and a rank-scaled multiple of it.[9]

Computationally, the Frobenius norm connects directly to the singular value decomposition (SVD) A = U \Sigma V^*, where \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r) contains the singular values \sigma_i \geq 0 (with r = \min(m,n)). In this case,

\|A\|_F = \sqrt{\sum_{i=1}^r \sigma_i^2}.

This expression underscores the norm's role in quantifying the overall "energy" of the matrix through its singular values, independent of the unitary factors in the SVD.[9] The Frobenius norm is efficiently computable without a full SVD, simply by summing the squared entries, making it practical for numerical applications.[11]
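The entrywise, trace, and singular-value characterizations coincide, as the following minimal sketch (assuming NumPy, with a random test matrix) verifies, along with the sandwich between the spectral norm and its rank-scaled multiple.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))

fro = np.linalg.norm(A, 'fro')

# Equivalent characterisations from the text.
assert np.isclose(fro, np.sqrt(np.sum(np.abs(A)**2)))    # entrywise definition
assert np.isclose(fro, np.sqrt(np.trace(A.T @ A)))       # trace formula
sigma = np.linalg.svd(A, compute_uv=False)
assert np.isclose(fro, np.sqrt(np.sum(sigma**2)))        # sum of squared singular values

# Sandwich between the spectral norm and a rank-scaled multiple of it.
rank = np.linalg.matrix_rank(A)
assert sigma[0] <= fro <= np.sqrt(rank) * sigma[0] + 1e-12
```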
Max Norm
The max norm of an m \times n matrix A = (a_{ij}), also referred to as the entrywise infinity norm, is defined as

\|A\|_{\max} = \max_{1 \leq i \leq m,\, 1 \leq j \leq n} |a_{ij}|,

the largest absolute value among all entries of A.[12] This norm is computationally straightforward, requiring only a single pass through the matrix elements to identify the maximum absolute entry, making it valuable where bounding the magnitude of individual elements is essential, such as in perturbation analysis or error estimation in matrix approximations.[12]

As a matrix norm, \|A\|_{\max} satisfies the axioms of non-negativity (\|A\|_{\max} \geq 0), definiteness (\|A\|_{\max} = 0 if and only if A = 0), absolute homogeneity (\|\alpha A\|_{\max} = |\alpha| \|A\|_{\max} for scalar \alpha), and the triangle inequality (\|A + B\|_{\max} \leq \|A\|_{\max} + \|B\|_{\max}).[12] However, it does not generally satisfy submultiplicativity (\|AB\|_{\max} \leq \|A\|_{\max} \|B\|_{\max}). A counterexample is the 2 \times 2 matrix A with all entries equal to 1, where \|A\|_{\max} = 1, but A^2 has all entries equal to 2, so \|A^2\|_{\max} = 2 > 1^2.[12]

The max norm relates to the Frobenius norm, which aggregates entry magnitudes via summation, through the inequalities

\|A\|_{\max} \leq \|A\|_F \leq \sqrt{mn} \, \|A\|_{\max}

for any m \times n matrix A, with the lower bound following from the Frobenius norm exceeding the largest entry and the upper bound arising from at most mn entries, each of magnitude at most \|A\|_{\max}.[2] These bounds highlight how the max norm controls peak entry size, in contrast to the Frobenius norm's overall energy measure.[13]
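The counterexample above and the Frobenius-norm bounds are easy to reproduce; the following is a minimal sketch assuming NumPy.

```python
import numpy as np

# The all-ones 2x2 matrix from the counterexample above.
A = np.ones((2, 2))
max_norm = lambda M: np.max(np.abs(M))

print(max_norm(A))        # 1.0
print(max_norm(A @ A))    # 2.0 > 1.0 * 1.0, so the max norm is not submultiplicative

# Relation to the Frobenius norm: ||A||_max <= ||A||_F <= sqrt(mn) ||A||_max.
m, n = A.shape
fro = np.linalg.norm(A, 'fro')
assert max_norm(A) <= fro <= np.sqrt(m * n) * max_norm(A)
```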
Lp,q Norms
The \ell_{p,q} norms form a family of entrywise matrix norms for an m \times n matrix A = (a_{ij}), defined for 1 \leq p, q \leq \infty as

\|A\|_{p,q} = \left( \sum_{i=1}^m \left( \sum_{j=1}^n |a_{ij}|^p \right)^{q/p} \right)^{1/q},

with the understanding that the inner or outer sum is replaced by a maximum when p = \infty or q = \infty, respectively. This construction applies the \ell_p norm to each row of A to obtain a vector of length m, then takes the \ell_q norm of that vector.

A prominent special case is the \ell_{2,1} norm, given explicitly by

\|A\|_{2,1} = \sum_{i=1}^m \sqrt{ \sum_{j=1}^n |a_{ij}|^2 },

which sums the Euclidean norms of the rows. This norm promotes row sparsity and is widely used in machine learning for multi-task feature selection via group lasso regularization, where rows correspond to feature groups across tasks and entire rows are encouraged to be zeroed out simultaneously.[14]

These norms satisfy the standard axioms of a norm and are thus convex functions on the space of matrices. However, they are not always submultiplicative, meaning \|AB\|_{p,q} \leq \|A\|_{p,q} \|B\|_{p,q} does not hold in general for matrix products AB. The Frobenius norm corresponds to the special case \|A\|_{2,2}. As all norms on the finite-dimensional space of matrices are equivalent, there exist dimension-dependent constants c_1, c_2 > 0 such that c_1 \|A\|_F \leq \|A\|_{p,q} \leq c_2 \|A\|_F; for the \ell_{2,1} norm specifically, \|A\|_F \leq \|A\|_{2,1} \leq \sqrt{m} \, \|A\|_F.
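A short sketch (assuming NumPy; norm_pq is an illustrative helper, not a library function) computes \ell_{p,q} norms row-wise as defined above and checks the stated relations to the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4))

def norm_pq(A, p, q):
    """Entrywise l_{p,q} norm: p-norm of each row, then q-norm of the results."""
    row_norms = np.linalg.norm(A, ord=p, axis=1)
    return np.linalg.norm(row_norms, ord=q)

l21 = norm_pq(A, 2, 1)                                          # sum of Euclidean row norms
assert np.isclose(norm_pq(A, 2, 2), np.linalg.norm(A, 'fro'))   # l_{2,2} is the Frobenius norm

# Equivalence with the Frobenius norm: ||A||_F <= ||A||_{2,1} <= sqrt(m) ||A||_F.
m = A.shape[0]
fro = np.linalg.norm(A, 'fro')
assert fro <= l21 + 1e-12 and l21 <= np.sqrt(m) * fro + 1e-12
```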
Operator and Schatten Norms
Schatten p-Norms
The Schatten p-norm of an m \times n matrix A, denoted \|A\|_p, is defined as the \ell_p-norm of its singular values \sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq \sigma_{\min(m,n)}(A) \geq 0, given by

\|A\|_p = \left( \sum_{i=1}^{\min(m,n)} \sigma_i(A)^p \right)^{1/p}

for 1 \leq p < \infty, while for p = \infty it is the largest singular value \sigma_1(A). This family of norms generalizes several important matrix norms and arises in the study of ideals of compact operators on Hilbert spaces.[15]

Special cases include the nuclear norm for p=1, also known as the trace norm, defined as \|A\|_1 = \sum_{i=1}^{\min(m,n)} \sigma_i(A), which measures the total "mass" of the singular values. For p=2, the Schatten 2-norm coincides with the Frobenius norm, \|A\|_2 = \left( \sum_{i=1}^{\min(m,n)} \sigma_i(A)^2 \right)^{1/2} = \left( \operatorname{trace}(A^* A) \right)^{1/2}. The case p=\infty recovers the spectral norm, the largest singular value of A.

Schatten p-norms are unitarily invariant, satisfying \|U A V\|_p = \|A\|_p for any unitary matrices U and V of appropriate dimensions, a property that follows from the invariance of singular values under unitary transformations. They are also submultiplicative, meaning \|A B\|_p \leq \|A\|_p \|B\|_p for compatible matrices A and B, with the case p=1 (the nuclear norm) arising in applications involving trace-class operators. In low-rank matrix approximation, the Schatten p-norms achieve their minimum distance to rank-k matrices via truncation of the singular value decomposition, as established by the generalization of the Eckart-Young-Mirsky theorem to unitarily invariant norms.[16]
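The following sketch (assuming NumPy; schatten is an illustrative helper) computes Schatten p-norms from the singular values and checks the special cases and the unitary invariance discussed above.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))

def schatten(A, p):
    """Schatten p-norm: the l_p norm of the singular value vector."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.max(s) if np.isinf(p) else np.sum(s**p) ** (1.0 / p)

assert np.isclose(schatten(A, 1), np.linalg.norm(A, 'nuc'))     # nuclear / trace norm
assert np.isclose(schatten(A, 2), np.linalg.norm(A, 'fro'))     # Frobenius norm
assert np.isclose(schatten(A, np.inf), np.linalg.norm(A, 2))    # spectral norm

# Unitary invariance: singular values are unchanged by orthogonal factors.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
assert np.isclose(schatten(Q @ A, 1), schatten(A, 1))
```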
Monotone Norms
In the context of Hermitian matrices, a matrix norm \|\cdot\| is said to be monotone if, whenever A \geq B \geq 0 in the Loewner partial order (meaning A - B is positive semidefinite), it follows that \|A\| \geq \|B\|. This property ensures that the norm respects the natural ordering on the cone of positive semidefinite matrices, reflecting increases in "size" under the Loewner order. Monotone norms are particularly relevant for Hermitian matrices, where the Loewner order aligns with the ordering of eigenvalues, and they extend naturally to unitarily invariant norms on general matrices via singular values.

Prominent examples of monotone norms include the spectral norm and the Ky Fan k-norms. The spectral norm \|A\|_2 = \lambda_{\max}(A) for positive semidefinite A is monotone, as the min-max theorem implies that \lambda_{\max}(A) \geq \lambda_{\max}(B) whenever A \geq B \geq 0. The Ky Fan k-norm, defined for 1 \leq k \leq n and an n \times n positive semidefinite matrix A as

\|A\|_{(k)} = \sum_{j=1}^k \lambda_j(A),

where \lambda_1(A) \geq \cdots \geq \lambda_n(A) \geq 0 are the eigenvalues of A, is also monotone. This follows from the variational characterization of eigenvalues: if A \geq B \geq 0, then \lambda_j(A) \geq \lambda_j(B) for each j = 1, \dots, k, so the sum is non-decreasing. Notably, the spectral norm coincides with the Ky Fan 1-norm, linking these examples directly. Schatten p-norms restricted to Hermitian positive semidefinite matrices exhibit similar monotonicity, as the p-norm of the eigenvalue vector is increasing in each component.

Monotone norms possess useful properties when restricted to positive semidefinite matrices, including submultiplicativity: for positive semidefinite A, B, \|AB\| \leq \|A\| \|B\|. This holds for the spectral norm due to its general operator-norm submultiplicativity, and for Ky Fan k-norms via inequalities on products of singular values (or eigenvalues in the Hermitian case). Such norms are instrumental in optimization over convex cones, particularly in semidefinite programming, where they enable tractable convex relaxations. For instance, minimizing the nuclear norm (the Ky Fan n-norm, or Schatten 1-norm) promotes low-rank solutions in problems over the positive semidefinite cone, as in rank minimization heuristics.
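The monotonicity of the spectral and Ky Fan k-norms under the Loewner order can be illustrated numerically; below is a minimal sketch assuming NumPy, with randomly generated positive semidefinite matrices satisfying A \geq B \geq 0 by construction.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 5, 3

# Build B PSD and A = B + (PSD perturbation), so that A >= B >= 0 in the Loewner order.
X = rng.standard_normal((n, n)); B = X @ X.T
Y = rng.standard_normal((n, n)); A = B + Y @ Y.T

def ky_fan(M, k):
    """Ky Fan k-norm of a PSD matrix: sum of its k largest eigenvalues."""
    return np.sort(np.linalg.eigvalsh(M))[::-1][:k].sum()

assert ky_fan(A, 1) >= ky_fan(B, 1)   # spectral norm (Ky Fan 1-norm) is monotone
assert ky_fan(A, k) >= ky_fan(B, k)   # Ky Fan k-norm is monotone
```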
Cut Norms
Cut norms, also known as cut semi-norms, arise in combinatorial optimization and graph theory as a measure emphasizing the maximum discrepancy in submatrix sums over subsets of rows and columns. For an n \times n real matrix A = (a_{ij}) with |a_{ij}| \leq 1 for all i, j \in [n], the cut norm is defined as

\|A\|_c = \sup_{S, T \subseteq [n]} \frac{1}{n^2} \left| \sum_{i \in S, j \in T} a_{ij} \right|,

where the supremum is taken over all subsets S and T of [n] = \{1, 2, \dots, n\}. This normalization ensures that the norm scales appropriately for dense matrices, bounding it between 0 and 1, and highlights structural imbalances rather than overall magnitude. The concept was introduced by Frieze and Kannan in their work on quick approximations for matrices, providing a foundation for analyzing dense graph problems through low-rank decompositions.

Unlike many standard matrix norms, the cut norm is not submultiplicative, meaning \|AB\|_c \not\leq \|A\|_c \|B\|_c in general, which limits its use in certain algebraic settings but enhances its utility in discrete optimization where subset selections model cuts. It relates to the spectral norm via inequalities such as \|A\|_c \leq \|A\|_2 \leq 4 \|A\|_c for matrices with bounded entries and zero row/column sums, allowing spectral methods to bound cut norm approximations. A key property stems from the Grothendieck inequality, which bounds the integrality gap for semidefinite relaxations of cut norm maximization, enabling constant-factor approximations via randomized rounding. This connection, exploited by Alon et al., yields a constant-factor approximation algorithm for computing the cut norm, improving on earlier polynomial-time methods.[17][18]

In applications, cut norms are pivotal for approximating the max-cut problem in graphs, where for an adjacency matrix A derived from graph G, \mathrm{MAXCUT}(G) = \frac{1}{4} n^2 \|A\|_c, facilitating efficient algorithms that find induced subgraphs approximating the maximum cut value within a constant factor. More broadly, cut norms support approximation schemes for quadratic programs of the form \max \sum_{i,j} a_{ij} x_i y_j over \{-1, 1\}-vectors x, y, by reducing to semidefinite programs whose solutions can be rounded using Grothendieck-based techniques. These tools, developed in the late 1990s and early 2000s, have influenced regularity lemmas and structural graph theory, enabling quasi-polynomial time algorithms for dense instances.[17][18]
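Because the definition ranges over exponentially many subset pairs, the cut norm is expensive to evaluate exactly; the following brute-force sketch (assuming NumPy; cut_norm_exact is an illustrative helper, practical only for tiny matrices) implements the definition directly.

```python
import numpy as np
from itertools import product

def cut_norm_exact(A):
    """Brute-force cut norm: (1/n^2) * max over subsets S, T of |sum_{S x T} a_ij|.
    Exponential in n; intended only to illustrate the definition on tiny matrices."""
    n = A.shape[0]
    best = 0.0
    for s_mask in product([0, 1], repeat=n):
        for t_mask in product([0, 1], repeat=n):
            s = np.array(s_mask, dtype=bool)
            t = np.array(t_mask, dtype=bool)
            best = max(best, abs(A[np.ix_(s, t)].sum()))
    return best / n**2

rng = np.random.default_rng(6)
A = rng.uniform(-1.0, 1.0, size=(4, 4))   # entries bounded by 1, as in the definition
print(cut_norm_exact(A))
```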
Submultiplicative and Consistent Norms
A submultiplicative matrix norm satisfies \|AB\| \leq \|A\| \|B\| for all matrices A, B where the product AB is defined. This property is crucial for bounding powers of matrices and analyzing convergence in series expansions. For square matrices, submultiplicativity often follows from consistency with a vector norm, as shown below.[19]
Consistent Norms
A matrix norm \|\cdot\|_m on the space of m \times n matrices is said to be consistent with a vector norm \|\cdot\|_v on \mathbb{C}^n (or \mathbb{R}^n) if, for every matrix A \in \mathbb{C}^{m \times n} and every vector x \in \mathbb{C}^n,

\|Ax\|_v \leq \|A\|_m \|x\|_v.

This property ensures that the matrix norm bounds the amplification of the vector norm under the linear transformation induced by A. The concept originates in the study of normed linear spaces and is fundamental for analyzing stability and conditioning in numerical linear algebra.[19]

The operator norm (or induced norm) \|\cdot\| generated by \|\cdot\|_v, defined as

\|A\| = \sup_{\|x\|_v = 1} \|Ax\|_v = \sup_{x \neq 0} \frac{\|Ax\|_v}{\|x\|_v},

is the minimal matrix norm consistent with \|\cdot\|_v. Specifically, it satisfies the consistency inequality by construction, and for any other matrix norm \|\cdot\|_m consistent with \|\cdot\|_v, the inequality \|A\|_m \geq \|A\| holds for all A. This minimality follows from the definition of the induced norm as the exact supremum of the operator's action on unit vectors. Thus, consistent norms provide upper bounds on the induced norm, which quantifies the maximum stretch factor of the linear map.[19]

Examples of consistent norms include the induced p-norms on matrices, which are consistent with the corresponding vector p-norms for 1 \leq p \leq \infty. For instance, the matrix 2-norm \|A\|_2 = \sigma_{\max}(A), the largest singular value of A, is consistent with the Euclidean vector norm. Another example is the Frobenius norm \|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2} = \sqrt{\operatorname{trace}(A^* A)}, which is consistent with the vector 2-norm via the inequality \|Ax\|_2 \leq \|A\|_F \|x\|_2, provable by applying the Cauchy-Schwarz inequality row by row or by using the singular value decomposition. The Frobenius norm, while not induced, offers computational advantages due to its equivalence to the Euclidean norm on vectorized matrices.[19][20]
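A small sketch (assuming NumPy, with random data) illustrating consistency: both the induced 2-norm and the Frobenius norm bound \|Ax\|_2, with the induced norm giving the tighter bound.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

# The induced 2-norm is consistent with the Euclidean vector norm by construction ...
assert np.linalg.norm(A @ x) <= np.linalg.norm(A, 2) * np.linalg.norm(x) + 1e-12

# ... and the Frobenius norm, though not induced, is also consistent with it,
# but gives a looser bound since ||A||_2 <= ||A||_F.
assert np.linalg.norm(A @ x) <= np.linalg.norm(A, 'fro') * np.linalg.norm(x) + 1e-12
assert np.linalg.norm(A, 2) <= np.linalg.norm(A, 'fro') + 1e-12
```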
Compatible Norms
In the context of linear operators between normed vector spaces, a matrix norm \|\cdot\| is compatible with a vector norm \|\cdot\|_m on the domain space and a vector norm \|\cdot\|_n on the codomain space if it satisfies

\|Ax\|_n \leq \|A\| \cdot \|x\|_m

for every matrix A and every vector x in the domain.[21] This property ensures that the matrix norm respects the underlying vector norms in measuring the amplification of vectors under linear transformations. Induced (or operator) norms, defined as

\|A\| = \sup_{x \neq 0} \frac{\|Ax\|_n}{\|x\|_m},

are compatible by construction and achieve equality in the inequality for at least one nonzero x.[22]

When the domain and codomain are the same finite-dimensional space equipped with the same vector norm \|\cdot\|, compatibility can be applied twice: \|A(Bx)\| \leq \|A\| \cdot \|Bx\| \leq \|A\| \cdot \|B\| \cdot \|x\|. Taking the supremum over x \neq 0 shows that the norm induced by \|\cdot\| satisfies \|AB\| \leq \|A\| \cdot \|B\| whenever \|A\| and \|B\| are computed with a compatible norm; in particular, every induced norm on square matrices is compatible, consistent, and submultiplicative.[21][23]

For square matrices, compatible norms are closely related to induced norms, which quantify the maximum gain of the operator relative to the vector norm, as in \|A\| = \sup_{x \neq 0} \|Ax\| / \|x\|. Induced norms for square matrices often arise in this compatible framework when analyzing stability in systems like quadratic forms or iterative methods.[21]
Energy Norms for Square Matrices
For Hermitian positive semidefinite square matrices, the spectral norm \|A\|_2 = \lambda_{\max}(A) = \sup_{\|x\|_2 = 1} x^* A x (the largest eigenvalue, given by the maximum of the Rayleigh quotient) plays a key role in energy-related applications. This coincides with the operator norm induced by the Euclidean vector norm and satisfies \|Ax\|_2 \leq \|A\|_2 \|x\|_2 for all vectors x, ensuring consistency with the 2-norm. It is also submultiplicative: \|AB\|_2 \leq \|A\|_2 \|B\|_2 for square matrices A, B.[9]

In applications, particularly within finite element methods, this spectral norm bounds energy functionals associated with positive semidefinite stiffness matrices, enabling estimates of interpolation errors in the discrete energy norm \|e\|_A = \sqrt{e^* A e} (a vector norm when A is positive definite) via the maximum eigenvalue or trace of local stiffness matrices.[24]
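A minimal sketch (assuming NumPy; the tridiagonal matrix is a hypothetical stiffness-like example) evaluating the discrete energy norm and its bound by the largest eigenvalue.

```python
import numpy as np

# A small symmetric positive definite "stiffness-like" matrix (hypothetical example).
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

energy_norm = lambda e: np.sqrt(e @ A @ e)      # discrete energy norm ||e||_A

e = np.array([0.3, -0.2, 0.5])
lam_max = np.linalg.eigvalsh(A).max()           # spectral norm of a PSD matrix

# The energy norm is controlled by the largest eigenvalue: ||e||_A^2 <= lambda_max * ||e||_2^2.
assert energy_norm(e)**2 <= lam_max * np.dot(e, e) + 1e-12
```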
Equivalence of Matrix Norms
General Equivalence Theorem
In the space of all m \times n real or complex matrices, which is a finite-dimensional vector space of dimension mn, any two matrix norms \|\cdot\|_a and \|\cdot\|_b are equivalent. Specifically, there exist positive constants c and C, depending only on m, n, and the norms chosen, such that c \|\mathbf{A}\|_a \leq \|\mathbf{A}\|_b \leq C \|\mathbf{A}\|_a for every matrix \mathbf{A}.[2][25]

To prove this, it suffices to show that every norm is equivalent to one fixed norm, such as the entrywise \ell_1-norm \|\mathbf{A}\|_{\ell_1} = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|, since equivalence is transitive. Consider the standard basis matrices \mathbf{E}_{kl} for k=1,\dots,m and l=1,\dots,n, where \mathbf{E}_{kl} has a 1 in position (k,l) and zeros elsewhere. Any matrix \mathbf{A} = (a_{ij}) can be expressed as \mathbf{A} = \sum_{k=1}^m \sum_{l=1}^n a_{kl} \mathbf{E}_{kl}. By the triangle inequality and homogeneity, an arbitrary norm \|\cdot\| satisfies \|\mathbf{A}\| \leq \sum_{k=1}^m \sum_{l=1}^n |a_{kl}| \|\mathbf{E}_{kl}\| \leq \left( \max_{k,l} \|\mathbf{E}_{kl}\| \right) \|\mathbf{A}\|_{\ell_1}, establishing an upper bound. For the reverse inequality, note that the unit sphere \{\mathbf{A} : \|\mathbf{A}\|_{\ell_1} = 1\} is compact in the topology induced by \|\cdot\|_{\ell_1}. The norm \|\cdot\| is continuous with respect to \|\cdot\|_{\ell_1} (by the above bound), so by the extreme value theorem it attains a positive minimum c > 0 on this compact set, yielding c \|\mathbf{A}\|_{\ell_1} \leq \|\mathbf{A}\|.[25][2]

This equivalence implies that all matrix norms induce the same topology on the finite-dimensional space of m \times n matrices, so a sequence of matrices converges with respect to one norm if and only if it converges with respect to any other. However, the equivalence constants c and C generally grow with the dimensions m and n, affecting quantitative analyses of stability or conditioning in applications.[2]
Specific Examples of Equivalence
The general equivalence theorem for matrix norms guarantees that any two norms on the space of m \times n matrices are equivalent, with explicit constants depending on the dimensions and the norm definitions. These constants can be computed for common induced and entrywise norms, providing tight bounds that illustrate the theorem's implications for practical computations and error analysis.[9]

For the induced 1-norm and \infty-norm on an m \times n matrix A, the equivalence is given by the pair of inequalities

\|A\|_1 \leq m \|A\|_\infty \quad \text{and} \quad \|A\|_\infty \leq n \|A\|_1.

These bounds arise from the column-sum and row-sum definitions of the respective norms, with the factors m and n reflecting the number of rows entering each column sum and the number of columns entering each row sum, respectively. The constants are sharp, achieved for example by matrices with all entries equal to 1 in one column and zeros elsewhere (attaining \|A\|_1 / \|A\|_\infty = m) or all 1s in one row (attaining \|A\|_\infty / \|A\|_1 = n).[9]

The spectral norm (induced 2-norm) and Frobenius norm satisfy

\|A\|_2 \leq \|A\|_F \leq \sqrt{\min(m,n)} \, \|A\|_2

for an m \times n matrix A. The lower bound follows from the spectral norm being the largest singular value, while the upper bound uses the fact that the Frobenius norm is the Euclidean norm of the vector of singular values, each of which is at most \|A\|_2 and at most \min(m,n) of which are nonzero. This relation is particularly useful in singular value decomposition contexts, where the Frobenius norm provides a computationally accessible proxy for the spectral norm.[9]

The entrywise maximum norm and Frobenius norm are related by

\|A\|_{\max} \leq \|A\|_F \leq \sqrt{mn} \, \|A\|_{\max}

for an m \times n matrix A, where \|A\|_{\max} = \max_{i,j} |a_{ij}|. The lower bound holds by the definition of the Frobenius norm as the \ell_2 norm of the entries, and the upper bound follows because each of the mn squared entries is at most \|A\|_{\max}^2. These inequalities highlight the maximum norm's role in entrywise perturbation analysis.[9]

Finally, the nuclear norm (Schatten 1-norm) and spectral norm satisfy

\|A\|_2 \leq \|A\|_* \leq \min(m,n) \, \|A\|_2

for an m \times n matrix A. This follows directly from the nuclear norm being the sum of the singular values and the spectral norm the largest singular value; the upper bound is attained when all singular values are equal, as for unitary matrices, while the lower bound is attained by rank-1 matrices. The inequality underscores the nuclear norm's utility as a convex surrogate for low-rank promotion in optimization problems.[9]
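These equivalence bounds can all be checked numerically; the sketch below (assuming NumPy, with a random rectangular matrix) verifies each pair of inequalities.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 5, 7
A = rng.standard_normal((m, n))

one = np.linalg.norm(A, 1)        # induced 1-norm
inf = np.linalg.norm(A, np.inf)   # induced inf-norm
two = np.linalg.norm(A, 2)        # spectral norm
fro = np.linalg.norm(A, 'fro')    # Frobenius norm
nuc = np.linalg.norm(A, 'nuc')    # nuclear norm
mx  = np.max(np.abs(A))           # entrywise max norm

tol = 1e-12
assert one <= m * inf + tol and inf <= n * one + tol
assert two <= fro + tol and fro <= np.sqrt(min(m, n)) * two + tol
assert mx <= fro + tol and fro <= np.sqrt(m * n) * mx + tol
assert two <= nuc + tol and nuc <= min(m, n) * two + tol
```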