
Channel capacity

In information theory, channel capacity represents the maximum rate at which information can be reliably communicated over a noisy channel using an optimal coding scheme, measured in bits per channel use or, for continuous-time channels, in bits per second. This concept, introduced by Claude E. Shannon in 1948, establishes a fundamental limit on the achievable data transmission rate without errors exceeding an arbitrarily small probability, even in the presence of noise or interference. It serves as the theoretical foundation for designing efficient communication systems, ensuring that rates below this capacity allow for essentially error-free transmission through proper encoding, while rates above it inevitably lead to uncorrectable errors. For discrete memoryless channels, channel capacity C is formally defined as the maximum mutual information between the input X and output Y, expressed as C = \max_{p(x)} I(X; Y), where the maximization is over all possible input probability distributions p(x). Shannon's noisy-channel coding theorem asserts that reliable communication is possible at rates up to C, with the equivocation (the residual uncertainty about the input given the output) reducible to arbitrarily near H(X) - C when the source entropy satisfies H(X) > C. In continuous-time channels with bandwidth W, signal power P, and noise power N, the capacity simplifies to C = W \log_2 \left(1 + \frac{P}{N}\right), known as the Shannon-Hartley theorem, which highlights the trade-off between bandwidth, signal-to-noise ratio, and achievable rate. Channel capacity has profound implications across telecommunications, data storage, and networking, guiding the development of error-correcting codes like turbo and LDPC codes that approach these limits in practice. Extensions to modern scenarios, including multiple-access and broadcast channels, further refine capacity bounds under constraints like power or interference, enabling advancements in wireless systems and 5G/6G technologies. Despite these theoretical ideals, real-world factors such as fading, multi-user interference, and computational complexity often require approximations, yet Shannon's framework remains the benchmark for reliability and efficiency.

Core Concepts

Formal Definition

In information theory, the channel capacity C of a communication channel represents the supremum of the rates at which information can be transmitted reliably over that channel, in the limit of arbitrarily long codewords. For a discrete memoryless channel (DMC) characterized by a transition probability p(y|x) from input alphabet \mathcal{X} to output alphabet \mathcal{Y}, the capacity is formally defined as the maximum mutual information between the input X and output Y: C = \max_{p(x)} I(X; Y) = \max_{p(x)} \left[ H(Y) - H(Y|X) \right], where the maximization is over all possible input distributions p(x) on \mathcal{X}, I(X; Y) is the mutual information, H(Y) is the entropy of the output, and H(Y|X) is the conditional entropy of the output given the input. This quantity is typically expressed in bits per channel use. For continuous-time channels, such as those with bandwidth limitations and additive noise, the capacity is defined analogously as the maximum of the mutual information rate. In the seminal case of a bandlimited channel with bandwidth W hertz, average signal power P, and additive white Gaussian noise with power spectral density N, the capacity is C = W \log_2 \left(1 + \frac{P}{N W}\right) bits per second, achieved by a Gaussian input distribution. This formula highlights how capacity scales with signal-to-noise ratio and bandwidth, establishing a fundamental limit independent of specific signaling schemes.
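As an illustration of the discrete definition, the following sketch (assuming NumPy; the helper names are made up for this example) computes I(X; Y) for an arbitrary DMC transition matrix and brute-force maximizes it over a grid of binary input distributions for a binary symmetric channel, recovering the known closed form C = 1 - H_b(p).

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in bits for input distribution p_x and transition matrix W[x, y]."""
    p_xy = p_x[:, None] * W          # joint distribution p(x, y)
    p_y = p_xy.sum(axis=0)           # output marginal p(y)
    mask = p_xy > 0
    ratio = p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask]
    return float((p_xy[mask] * np.log2(ratio)).sum())

# Binary symmetric channel with crossover probability p.
p = 0.1
W = np.array([[1 - p, p],
              [p, 1 - p]])

# Brute-force maximization over binary input distributions (grid search).
grid = np.linspace(1e-6, 1 - 1e-6, 2001)
C_numeric = max(mutual_information(np.array([q, 1 - q]), W) for q in grid)

# Closed form: C = 1 - H_b(p).
H_b = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
print(C_numeric, 1 - H_b)   # both ~0.531 bits per channel use
```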

Noisy-Channel Coding Theorem

The noisy-channel coding theorem, a cornerstone of information theory, establishes the limits of reliable communication over noisy channels. Formulated by Claude E. Shannon in 1948, the theorem asserts that for a discrete memoryless channel with capacity C bits per channel use, there exists an encoding and decoding scheme capable of transmitting information at any rate R < C with an arbitrarily small probability of error as the code block length n approaches infinity. Conversely, for any rate R > C, no such scheme exists, and the error probability remains bounded below by a positive constant. This result fundamentally separates the challenges of source compression and error correction, showing that noise does not preclude reliable communication provided the rate stays below capacity. The channel capacity C is rigorously defined as the maximum of the mutual information I(X; Y) over all possible input distributions p(x), expressed as C = \max_{p(x)} I(X; Y) = \max_{p(x)} \left[ H(Y) - H(Y \mid X) \right], where X is the input symbol, Y the output symbol, H(Y) the entropy of the output, and H(Y \mid X) the conditional entropy representing noise-induced uncertainty. Equivalently, C = \max_{p(x)} \left[ H(X) - H(X \mid Y) \right], with H(X \mid Y) denoting the equivocation, or residual uncertainty about the input given the output. The theorem's achievability is established via a random coding argument: by generating 2^{nR} random codewords and using a maximum-likelihood (or jointly typical) decoder, the average error probability over the codebook ensemble decays exponentially with n for R < C, relying on the asymptotic equipartition property for typical-set decoding. The converse relies on Fano's inequality, which bounds the error probability from below for rates exceeding C, ensuring the capacity bound is tight. In the continuous-time setting, particularly for bandlimited channels with additive white Gaussian noise, Shannon derived the corresponding capacity formula in 1948. For a channel of bandwidth W hertz, with average transmitted power constrained to P watts and noise power spectral density N watts per hertz, the capacity becomes C = W \log_2 \left(1 + \frac{P}{N W}\right) bits per second. This expression highlights the power-bandwidth trade-off: increasing signal power or bandwidth expands capacity, but noise fundamentally limits the rate, with Gaussian signaling achieving the maximum. The theorem implies that practical codes, such as convolutional, turbo, or LDPC codes, can approach this limit asymptotically, though finite-block-length effects introduce a small gap. These results underpin modern digital communication systems, from wireless networks to data storage, by quantifying the reliability achievable against noise.
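As a rough numerical illustration of the random-coding argument (a minimal sketch, assuming NumPy; the crossover probability, rate, block lengths, and trial count are arbitrary choices, and the `random_coding_error` helper is written for this example), the code below draws random binary codebooks for a binary symmetric channel with p = 0.11 (capacity about 0.5 bits/use), decodes by minimum Hamming distance, and shows the empirical error probability shrinking with block length at a fixed rate R = 0.25 < C.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_coding_error(n, R, p, trials=300):
    """Empirical error probability of a random code of rate R on a BSC(p),
    decoded by minimum Hamming distance (equivalent to ML for p < 1/2)."""
    M = 2 ** int(np.floor(n * R))            # number of codewords
    errors = 0
    for _ in range(trials):
        codebook = rng.integers(0, 2, size=(M, n))
        x = codebook[0]                      # by symmetry, send message 0
        noise = rng.random(n) < p            # BSC flips each bit with probability p
        y = x ^ noise.astype(int)
        dists = (codebook != y).sum(axis=1)  # Hamming distances to all codewords
        if dists.argmin() != 0:              # error if another codeword is at least as close
            errors += 1
    return errors / trials

p, R = 0.11, 0.25                            # C = 1 - H_b(0.11) ~ 0.50 > R
for n in (16, 24, 32, 40, 48):
    # Error probability should decrease with n; more trials give smoother estimates.
    print(n, random_coding_error(n, R, p))
```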

Theoretical Properties

Additivity of Channel Capacity

In classical information theory, the additivity of channel capacity refers to the property that the capacity of a product of independent channels equals the sum of their individual capacities. For two discrete memoryless channels \mathcal{N}_1: \mathcal{X}_1 \to \mathcal{Y}_1 and \mathcal{N}_2: \mathcal{X}_2 \to \mathcal{Y}_2, the capacity C(\mathcal{N}_1 \otimes \mathcal{N}_2) of the joint channel satisfies C(\mathcal{N}_1 \otimes \mathcal{N}_2) = C(\mathcal{N}_1) + C(\mathcal{N}_2), where C(\mathcal{N}) = \max_{p(x)} I(X; Y) is the maximum mutual information over input distributions. This holds because the mutual information for independent uses factorizes: if X_1 and X_2 are independent and the channels are memoryless, then I(X_1, X_2; Y_1, Y_2) = I(X_1; Y_1) + I(X_2; Y_2). The proof relies on the chain rule for entropy and the memoryless property of the channels. Specifically, for independent inputs p(x_1, x_2) = p(x_1) p(x_2) and p(y_1, y_2 | x_1, x_2) = p(y_1 | x_1) p(y_2 | x_2), the joint mutual information simplifies as follows: I(X_1, X_2; Y_1, Y_2) = H(Y_1, Y_2) - H(Y_1, Y_2 | X_1, X_2). The conditional entropy term factors: H(Y_1, Y_2 | X_1, X_2) = H(Y_1 | X_1) + H(Y_2 | X_2), and the joint entropy satisfies H(Y_1, Y_2) = H(Y_1) + H(Y_2) due to the independence of the outputs. Thus, maximizing over product distributions yields the sum of individual maxima, eliminating the need for regularization in the capacity formula. This additivity extends to n independent uses, where the total capacity is nC, enabling the coding theorem to apply directly without additional complexity. For continuous channels like the additive white Gaussian noise (AWGN) channel, additivity holds under power constraints with optimal allocation, such as water-filling across parallel Gaussian subchannels. The capacity becomes \sum_{i=1}^k \frac{1}{2} \log_2 \left(1 + \frac{P_i}{N_i}\right), where P_i is the power allocated to the i-th subchannel with noise variance N_i, confirming the additive structure. However, additivity fails in settings with interference, such as multiple-access channels, where the capacity region involves joint mutual informations like R_1 + R_2 \leq I(X_1, X_2; Y) rather than a simple sum of individual capacities. This property, established in Shannon's foundational work and elaborated in subsequent analyses, simplifies capacity computations for memoryless systems and underpins reliable communication protocols over independent links.
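To make the parallel-Gaussian case concrete, here is a minimal water-filling sketch (assuming NumPy; the power budget, noise levels, and the `waterfill` helper are made up for this illustration) that allocates a total power across independent subchannels by bisection on the water level and sums the per-subchannel capacities \frac{1}{2}\log_2(1 + P_i/N_i).

```python
import numpy as np

def waterfill(noise, P_total, iters=100):
    """Water-filling power allocation over parallel Gaussian subchannels.
    Returns per-subchannel powers P_i maximizing sum 0.5*log2(1 + P_i/N_i)."""
    noise = np.asarray(noise, dtype=float)
    lo, hi = 0.0, P_total + noise.max()          # bracket for the water level mu
    for _ in range(iters):                       # bisection on sum(max(mu - N_i, 0)) = P_total
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - noise, 0.0).sum() > P_total:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.5 * (lo + hi) - noise, 0.0)

noise = np.array([0.5, 1.0, 2.0, 4.0])           # subchannel noise variances N_i
P = waterfill(noise, P_total=3.0)
C = 0.5 * np.log2(1.0 + P / noise).sum()         # additive capacity of the parallel channels
print(P, P.sum(), C)
```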

Shannon Capacity of a Graph

The Shannon capacity of a graph arises in the study of zero-error communication over noisy channels, where the goal is to transmit information with no possibility of decoding error. In 1956, Claude Shannon introduced the zero-error capacity C_0 of a discrete memoryless channel (DMC) as the supremum of rates (in bits per channel use) at which reliable communication is possible without any error probability. For a DMC with input alphabet X, the zero-error capacity is modeled using the confusability graph G, where the vertices V(G) = X and there is an edge between distinct inputs x, y \in X if x and y are confusable, meaning there exists an output that can arise from both (i.e., their output supports overlap). A valid codeword set for one channel use corresponds to an independent set in G, as no two codewords can be confusable. The size of the maximum independent set, denoted \alpha(G), thus gives the maximum number of distinguishable messages in a single use. For n independent uses of the channel, the effective confusability graph becomes the strong product G^{\boxtimes n} (also called the normal product), where two distinct n-tuples of inputs are adjacent if, in every position, the corresponding symbols are equal or confusable. An independent set in G^{\boxtimes n} represents a valid zero-error code for n uses, and \alpha(G^{\boxtimes n}) is the maximum code size. The zero-error capacity is then C_0 = \lim_{n \to \infty} \frac{1}{n} \log_2 \alpha(G^{\boxtimes n}), and by Fekete's lemma (due to the supermultiplicativity of \alpha under the strong product), this equals \sup_n \frac{1}{n} \log_2 \alpha(G^{\boxtimes n}) = \log_2 \Theta(G), where the Shannon capacity of the graph is defined as \Theta(G) = \sup_{n \geq 1} \alpha(G^{\boxtimes n})^{1/n}. Thus, \Theta(G) quantifies the asymptotic growth rate of the largest independent sets in the strong powers of G, normalized appropriately; it always satisfies \alpha(G) \leq \Theta(G) \leq |V(G)|. Computing \Theta(G) is notoriously difficult, as it requires evaluating independence numbers of arbitrarily high powers, and the limit may not be achieved at finite n. A trivial lower bound is \Theta(G) \geq \alpha(G), from single-use codes, but equality holds only for certain graphs, such as complete graphs (\Theta(K_m) = 1) or empty graphs (\Theta(\overline{K_m}) = m). In 1979, László Lovász introduced the Lovász theta function \vartheta(G) as a relaxation that provides a computable upper bound: \Theta(G) \leq \vartheta(G). It is defined via orthonormal representations: unit vectors \{u_i\}, one per vertex, with u_i^\top u_j = 0 whenever i and j are non-adjacent, together with a unit handle vector c; \vartheta(G) is the minimum over such representations of \max_i 1/(c^\top u_i)^2, and it also admits a dual characterization as a semidefinite program. This bound is tight for perfect graphs (where \alpha(G) = \chi(\overline{G}), the chromatic number of the complement), and \vartheta is multiplicative: \vartheta(G \boxtimes H) = \vartheta(G) \vartheta(H). A seminal example where \alpha(G) < \Theta(G) is the 5-cycle graph C_5, with \alpha(C_5) = 2 but \Theta(C_5) = \sqrt{5} \approx 2.236, achieved via the limit over powers (specifically, \alpha(C_5^{\boxtimes 2}) = 5, so \Theta(C_5) \geq 5^{1/2}, and upper bounded by \vartheta(C_5) = \sqrt{5}). For the Petersen graph P, \alpha(P) = 4 and \Theta(P) = 4 = \vartheta(P). Self-complementary vertex-transitive graphs satisfy \Theta(G) = \sqrt{|V(G)|}.
Despite these insights, no general algorithm for computing \Theta(G) is known and even its decidability is open, while approximating the underlying independence number within a factor |V(G)|^{1-\epsilon} is NP-hard; moreover, the gap \vartheta(G)/\alpha(G) can be large. Recent work has derived exact values for particular families such as q-Kneser graphs, but many cases, including odd cycles such as C_7, remain open.
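The C_5 example can be checked directly by brute force. The sketch below (pure Python, no external dependencies; the recursive maximum-independent-set routine is a simple helper written for this illustration) builds the strong product C_5 \boxtimes C_5 and confirms \alpha(C_5) = 2 and \alpha(C_5^{\boxtimes 2}) = 5, giving the lower bound \Theta(C_5) \geq \sqrt{5}.

```python
from itertools import product

def cycle_adj(n):
    """Adjacency sets of the cycle graph C_n."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def strong_product(adj1, adj2):
    """Strong product: distinct pairs are adjacent iff each coordinate is equal or adjacent."""
    verts = list(product(adj1, adj2))
    adj = {v: set() for v in verts}
    for (a, b) in verts:
        for (c, d) in verts:
            if (a, b) != (c, d) and (a == c or c in adj1[a]) and (b == d or d in adj2[b]):
                adj[(a, b)].add((c, d))
    return adj

def max_independent_set(adj, verts=None):
    """Size of a maximum independent set, by branching on a highest-degree vertex."""
    if verts is None:
        verts = set(adj)
    if not verts:
        return 0
    v = max(verts, key=lambda u: len(adj[u] & verts))
    if not adj[v] & verts:                     # no edges remain: all of verts is independent
        return len(verts)
    # Either exclude v, or include v and discard its neighbours.
    return max(max_independent_set(adj, verts - {v}),
               1 + max_independent_set(adj, verts - {v} - adj[v]))

c5 = cycle_adj(5)
c5_sq = strong_product(c5, c5)
print(max_independent_set(c5))      # 2
print(max_independent_set(c5_sq))   # 5, so Theta(C5) >= sqrt(5)
```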

Specific Channel Models

Additive White Gaussian Noise Channel

The additive white Gaussian noise (AWGN) channel models a continuous-time communication system where the received signal is the sum of the transmitted signal and noise, with the noise being stationary, Gaussian-distributed, and having a flat power spectral density across the band. This model assumes the noise process Z(t) is white Gaussian with two-sided power spectral density N_0/2, making it uncorrelated over time and of maximal entropy for a given variance. The output is thus Y(t) = X(t) + Z(t) for t \in [0, T], subject to an average power constraint \frac{1}{T} \int_0^T X^2(t) \, dt \leq P on the input signal X(t), and a bandwidth limitation to W Hz. The capacity of the AWGN channel, representing the supremum of reliable information rates, is given by C = W \log_2 \left(1 + \frac{P}{N_0 W}\right) bits per second, where P/(N_0 W) is the signal-to-noise ratio (SNR). This formula was first derived by Claude Shannon in his seminal 1948 paper, establishing that reliable communication is possible at rates up to C with error probability approaching zero as block length increases, but impossible above C. The derivation involves discretizing the continuous channel into 2W real dimensions per second (via the Nyquist sampling theorem) and maximizing the mutual information I(X; Y) over input distributions subject to the power constraint; the Gaussian distribution for X achieves this maximum, yielding C = \frac{1}{2} \log_2 (1 + \text{SNR}) bits per real dimension, or W \log_2 (1 + \text{SNR}) bits per second overall. Capacity-achieving codes for the AWGN channel can be constructed using random Gaussian codebooks with typical-set decoding, where the error probability decays exponentially with block length N for rates below C, leveraging the asymptotic equipartition property and Chernoff bounds on atypical sets. The AWGN model serves as a baseline for evaluating practical systems, such as wireline and early wireless communications, where thermal noise is approximately Gaussian due to the aggregation of many independent sources; deviations, such as non-Gaussian noise or interference with non-white spectra, generally reduce practical throughput below this limit. For example, in a telephone channel with W = 4 kHz and SNR = 30 dB (P/(N_0 W) = 1000), the capacity is approximately 40 kbps, highlighting the theorem's role in bounding achievable data rates.
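The closing numerical example can be reproduced directly; the short sketch below (assuming NumPy, with the telephone-channel parameters taken from the text and an illustrative helper name) evaluates the Shannon-Hartley formula.

```python
import numpy as np

def awgn_capacity(W_hz, snr_db):
    """Shannon-Hartley capacity C = W * log2(1 + SNR) in bits per second."""
    snr = 10.0 ** (snr_db / 10.0)        # convert dB to linear scale
    return W_hz * np.log2(1.0 + snr)

print(awgn_capacity(4000, 30.0))         # ~39,900 bits/s for W = 4 kHz, SNR = 30 dB
```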

Fading Channels

In fading channels, the signal experiences random fluctuations in amplitude and phase due to multipath propagation, shadowing, and other environmental effects, which significantly impact the achievable capacity compared to static channels. These variations are modeled by a random process for the channel gain h(t), often assumed to be complex Gaussian, as in Rayleigh or Rician scenarios. The capacity analysis distinguishes between fast fading, where the channel changes rapidly within a codeword (allowing time averaging), and slow fading, where the channel remains constant over the codeword duration (leading to potential outages). Seminal work established that channel state information (CSI) availability at the transmitter and/or receiver fundamentally alters the capacity expressions, enabling adaptive strategies to mitigate fading effects. For fast fading channels with perfect CSI at the receiver but none at the transmitter, the channel capacity is the ergodic capacity, representing the long-term average rate achievable by coding over many fading states. This is given by C = \mathbb{E}_h \left[ \log_2 \left(1 + \frac{P |h|^2}{N_0} \right) \right], where the expectation is taken over the distribution of the channel gain h, P is the transmit power, and N_0 is the noise power spectral density. With constant transmit power (no power adaptation) and independent identically distributed (i.i.d.) fading, transmitter CSI provides no gain in the ergodic rate. For Rayleigh fading, where |h| follows a Rayleigh distribution, the integral form requires numerical evaluation, but it scales logarithmically with SNR at high values, approaching the AWGN capacity offset by a constant loss due to fading variability. When perfect CSI is available at both transmitter and receiver, the capacity increases through optimal power and rate adaptation, akin to waterfilling over the time-varying SNR levels \gamma(t) = |h(t)|^2 P / N_0. The resulting capacity is C = \int_{\gamma_0}^\infty \log_2 \left( \frac{\gamma}{\gamma_0} \right) p(\gamma) \, d\gamma, where \gamma_0 is the cutoff SNR solving \int_{\gamma_0}^\infty \left( \frac{1}{\gamma_0} - \frac{1}{\gamma} \right) p(\gamma) \, d\gamma = 1, and p(\gamma) is the SNR distribution; power is allocated in proportion to \frac{1}{\gamma_0} - \frac{1}{\gamma} for \gamma above the cutoff \gamma_0 and is zero otherwise. This temporal waterfilling maximizes throughput by boosting power during good states while idling during deep fades, yielding higher rates than fixed-power schemes, especially in correlated fading. In slow fading channels, where the fading is quasi-static over the codeword, the ergodic capacity is not achievable due to lack of averaging; instead, the outage capacity applies, defining reliable communication at rate R with outage probability \epsilon. The instantaneous mutual information is \log_2(1 + \gamma), and outage occurs if this falls below R, yielding P_o(R) = \Pr \left( \log_2 \left(1 + \frac{P |h|^2}{N_0} \right) < R \right) = F_{|h|^2} \left( \frac{2^R - 1}{P/N_0} \right), where F_{|h|^2} is the cumulative distribution function of |h|^2. The \epsilon-outage capacity is the supremum of rates R such that P_o(R) \leq \epsilon, and diversity techniques (e.g., multiple antennas) are typically required to reduce outage for a fixed R. For Rayleigh fading, P_o(R) decays exponentially with the diversity order, highlighting the diversity-outage tradeoff central to reliable slow-fading communication. Suboptimal strategies like channel inversion, which equalizes the received SNR at the expense of spending large amounts of power in deep fades, achieve lower rates: C = \log_2 \left(1 + \frac{1}{\mathbb{E}[1/\gamma]} \right), but truncated inversion improves this by avoiding transmission below a cutoff \gamma_0.
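As a quick check on these expressions, the following Monte Carlo sketch (assuming NumPy; the SNR and target rate are arbitrary example values) estimates the ergodic capacity of a Rayleigh-fading channel with receiver-only CSI and the outage probability of a quasi-static link at a fixed rate, comparing the latter against the closed form for Rayleigh fading.

```python
import numpy as np

rng = np.random.default_rng(1)

snr_db = 10.0
snr = 10.0 ** (snr_db / 10.0)

# Rayleigh fading: h is zero-mean complex Gaussian, so |h|^2 is exponential with mean 1.
gain = rng.exponential(scale=1.0, size=1_000_000)

# Ergodic capacity with receiver CSI only (constant transmit power).
C_ergodic = np.mean(np.log2(1.0 + snr * gain))

# AWGN capacity at the same average SNR, for comparison (fading costs capacity here).
C_awgn = np.log2(1.0 + snr)

# Outage probability of a quasi-static link at target rate R bits/s/Hz.
R = 2.0
P_out = np.mean(np.log2(1.0 + snr * gain) < R)
# Closed form for Rayleigh: F(x) = 1 - exp(-x), evaluated at (2^R - 1)/snr.
P_out_exact = 1.0 - np.exp(-(2.0 ** R - 1.0) / snr)

print(C_ergodic, C_awgn, P_out, P_out_exact)
```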

Multi-User Channels

In multi-user channels, multiple transmitters and receivers interact over a shared medium, leading to interference and resource-sharing challenges that generalize the single-user capacity to multidimensional capacity regions—the sets of achievable rate vectors for reliable communication. These models capture essential aspects of wireless networks, such as cellular systems, where resources like power and bandwidth are allocated among users. Seminal developments focus on characterizing these regions for specific topologies, often using random coding arguments extended from Shannon's single-user theory. A primary model is the multiple-access channel (MAC), where k independent senders transmit to a common receiver through a shared medium. The capacity region for the discrete memoryless MAC consists of all rate vectors (R_1, \dots, R_k) such that \sum_{i \in \mathcal{S}} R_i \leq I(X_{\mathcal{S}}; Y \mid X_{\mathcal{S}^c}) for every subset \mathcal{S} \subseteq \{1, \dots, k\}, where the mutual informations are evaluated under independent input distributions p(x_1) \cdots p(x_k) and the full region is the convex closure of the union over such distributions. This pentagonal region for the two-user case was independently derived by Ahlswede and Liao using random coding and successive decoding techniques. For the Gaussian MAC with power constraints P_i and noise variance N, the capacity region simplifies to individual rates R_i \leq \frac{1}{2} \log \left(1 + \frac{P_i}{N}\right) and sum rate \sum R_i \leq \frac{1}{2} \log \left(1 + \frac{\sum P_i}{N}\right), achieved via superposition coding. The broadcast channel (BC) reverses the MAC setup, with a single transmitter sending independent messages to multiple receivers over degraded or non-degraded links. For the two-user degraded memoryless BC, where one receiver sees a noisier version of the other's observation, the capacity region is R_1 \leq I(X; Y_1 \mid U), \quad R_2 \leq I(U; Y_2), maximized over p(u) p(x \mid u), using superposition coding that layers the message for the weaker receiver first. This result, establishing the capacity region for degraded BCs, was obtained by Cover and later refined by Bergmans and Gallager for broader classes. In the Gaussian BC with noise variances N_1 < N_2, allocating power \alpha P to the strong user and (1-\alpha)P to the weak user yields rates R_1 = \frac{1}{2} \log \left(1 + \frac{\alpha P}{N_1}\right) and R_2 = \frac{1}{2} \log \left(1 + \frac{(1-\alpha)P}{N_2 + \alpha P}\right), optimized over \alpha, with the strong user cancelling the weak user's layer before decoding its own. The general BC remains open, with Marton's inner bound using binning to induce correlation between the auxiliary codewords intended for the receivers. The interference channel (IC) models two sender-receiver pairs with cross-link interference, capturing uncoordinated spectrum sharing. The capacity region is unknown in general, but in the weak ("noisy") interference regime of the symmetric Gaussian case, treating interference as noise achieves the sum capacity. The Han-Kobayashi scheme provides the best known inner bound, splitting each message into common and private parts: the common message is decoded by both receivers to mitigate interference, while private messages use Gaussian signaling treating the residual interference as noise. This achieves rates within a constant gap of the capacity region in many cases; for the two-user Gaussian interference channel, the capacity region is known to within 1 bit per user. Strong-interference cases, where the cross links are stronger than the direct links, have exactly known capacity regions, achieved by having each receiver fully decode the interfering message. These models form the foundation for more complex network topologies, including relay channels where a helper node aids transmission, with partial results via decode-and-forward or compress-and-forward strategies. Ongoing research addresses extensions to multi-antenna, fading, and large-scale systems, emphasizing practical approximations for 5G networks and beyond.
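For concreteness, the sketch below (assuming NumPy; the powers and noise level are arbitrary example values) evaluates the three constraints defining the two-user Gaussian MAC pentagon and its corner points, which are achievable by successive interference cancellation.

```python
import numpy as np

def c(x):
    """Gaussian capacity function: 0.5 * log2(1 + x) bits per channel use."""
    return 0.5 * np.log2(1.0 + x)

P1, P2, N = 10.0, 5.0, 1.0

R1_max = c(P1 / N)                 # single-user bound for user 1
R2_max = c(P2 / N)                 # single-user bound for user 2
R_sum = c((P1 + P2) / N)           # sum-rate bound

# Corner points of the pentagon, achieved by successive interference cancellation:
# decode user 2 first (treating user 1 as noise), then user 1 interference-free...
corner_a = (c(P1 / N), c(P2 / (N + P1)))
# ...or decode user 1 first (treating user 2 as noise), then user 2 interference-free.
corner_b = (c(P1 / (N + P2)), c(P2 / N))

print(R1_max, R2_max, R_sum)
print(corner_a, sum(corner_a))     # each corner point meets the sum-rate bound
print(corner_b, sum(corner_b))
```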

Computation and Estimation

Channel Capacity Estimation

Channel capacity estimation refers to numerical methods for approximating the Shannon capacity, defined as the supremum C = \max_{p(x)} I(X; Y), when analytical solutions are unavailable or infeasible. This arises for many practical channels, including asymmetric discrete memoryless channels (DMCs) and continuous channels with complex distributions, where optimizing over input distributions requires iterative computation. For DMCs with finite alphabets, the seminal Blahut-Arimoto algorithm provides an efficient iterative procedure that computes the capacity to within machine precision. Independently developed by Arimoto and Blahut in 1972, it alternates between two steps: (1) computing an auxiliary posterior q(x|y) from the transition probabilities W(y|x) and the current input distribution p(x), and (2) updating p(x) to maximize a lower bound on I(X; Y) for that fixed posterior, ensuring a monotonic increase of the bound until convergence to the optimal capacity. The algorithm's per-iteration complexity is polynomial in the alphabet sizes, making it practical for channels of moderate dimension, such as those arising in error-correcting code design. In continuous channels, where integrals over infinite support prevent direct application of discrete algorithms, estimation relies on stochastic approximation techniques to evaluate and maximize I(X; Y) = h(Y) - h(Y|X). A key approach extends Blahut-Arimoto principles using sequential Monte Carlo (SMC) integration to approximate entropy terms via particle sampling, combined with gradient-based optimization over parameterized input densities (e.g., Gaussian mixtures). This method samples channel outputs from the current input distribution, resamples particles to estimate densities non-parametrically, and updates parameters to ascend the mutual information surface, achieving accuracies within 0.01 bits/channel use for channels like additive non-Gaussian noise after 10^4–10^5 iterations. For high-dimensional or ergodic channels, such as Rayleigh fading, Monte Carlo simulation estimates capacity by averaging mutual information over multiple channel realizations, often with importance sampling to reduce variance in rare-event regimes near capacity. These techniques provide tight lower bounds on capacity from achievability-style simulations, while upper bounds from convex optimization over dual problems certify accuracy; for instance, in MIMO systems, they confirm capacities scaling as \min(n_t, n_r) \log \mathrm{SNR} bits/s/Hz for n_t transmit and n_r receive antennas at high SNR.
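As a simple instance of Monte Carlo capacity estimation for a continuous-output channel, the sketch below (assuming NumPy; the SNR values and the `bpsk_awgn_mi` helper are illustrative choices) estimates the mutual information of an AWGN channel with equiprobable BPSK inputs by averaging \log_2 p(y|x)/p(y) over simulated samples, which lower-bounds the unconstrained Gaussian-input capacity.

```python
import numpy as np

rng = np.random.default_rng(2)

def bpsk_awgn_mi(snr_db, n_samples=500_000):
    """Monte Carlo estimate of I(X;Y) for equiprobable BPSK over real AWGN."""
    snr = 10.0 ** (snr_db / 10.0)
    sigma = np.sqrt(1.0 / snr)                      # unit-energy symbols, noise variance 1/SNR
    x = rng.choice([-1.0, 1.0], size=n_samples)
    y = x + sigma * rng.normal(size=n_samples)

    def lik(y, s):                                  # conditional density p(y | x = s)
        return np.exp(-(y - s) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

    p_y_given_x = lik(y, x)
    p_y = 0.5 * lik(y, 1.0) + 0.5 * lik(y, -1.0)    # output density under uniform inputs
    return np.mean(np.log2(p_y_given_x / p_y))      # I(X;Y) = E[log2 p(y|x)/p(y)]

for snr_db in (0.0, 5.0, 10.0):
    # BPSK MI saturates at 1 bit/use; Gaussian-input capacity 0.5*log2(1+SNR) is an upper reference.
    print(snr_db, bpsk_awgn_mi(snr_db), 0.5 * np.log2(1 + 10 ** (snr_db / 10.0)))
```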

Numerical Solution Techniques

Numerical solution techniques are essential for computing channel capacity in cases where closed-form expressions are unavailable or impractical, such as for memoryless channels with large input alphabets or continuous channels with complex conditional distributions. The capacity C = \max_{p(x)} I(X; Y), where I(X; Y) is the mutual information, requires optimizing over the input distribution p(x), often involving high-dimensional or infinite-dimensional problems that demand iterative algorithms for accurate approximation. These methods ensure convergence to the optimal value while handling computational constraints efficiently. For discrete memoryless channels, the Blahut–Arimoto algorithm stands as the foundational numerical technique, independently developed by Arimoto and Blahut in 1972. This iterative procedure computes the capacity by alternately optimizing the input distribution and evaluating the resulting mutual information. Starting with an input distribution p^{(0)}(x), the algorithm updates as follows: p^{(k+1)}(x) = \frac{p^{(k)}(x) \exp\left( D(W(\cdot | x) \| p^{(k)}(y)) \right)}{\sum_{x'} p^{(k)}(x') \exp\left( D(W(\cdot | x') \| p^{(k)}(y)) \right)}, where D(\cdot \| \cdot) denotes the Kullback–Leibler divergence, W(y | x) is the channel transition probability, and p^{(k)}(y) = \sum_x p^{(k)}(x) W(y | x). The mutual information I^{(k)}(X; Y) is then computed using p^{(k)}(x) and p^{(k)}(y), increasing monotonically until convergence to C. The iterates converge to the capacity value, with the input distributions approaching a capacity-achieving distribution, and typical implementations achieve high precision (e.g., within 10^{-6} bits) after 50–100 iterations for modest alphabet sizes. Extensions of the Blahut–Arimoto algorithm address continuous memoryless channels, where direct computation is hindered by the need to evaluate integrals over infinite support. A prominent approach discretizes the input space into a finite grid and applies the discrete algorithm, refining the grid for better accuracy at the cost of increased computation; for example, this yields capacities accurate to about 0.01 bits/channel use for certain additive noise channels with 100–1000 grid points. More sophisticated methods avoid full discretization by employing particle-based approximations. Dauwels (2005) introduced an extension using sequential Monte Carlo methods to estimate expectations in the mutual information functional, combined with gradient ascent on the input density parameters. This particle Blahut–Arimoto variant samples input particles from a parametric family (e.g., Gaussian mixtures), propagates them through the channel, and updates the parameters via gradients, enabling capacity computation for non-Gaussian channels like optical intensity-modulated links, where it converges within 20–50 iterations using 10^3–10^4 particles. In power-constrained continuous channels, such as vector Gaussian models, numerical techniques often leverage convex optimization frameworks. The capacity optimization reduces to a convex problem solvable via interior-point methods or projected gradient ascent on the input covariance, incorporating waterfilling-like power allocations across dimensions. For instance, in MIMO channels with n antennas, algorithms iteratively solve semidefinite programs to find the optimal input covariance, achieving convergence in O(n^3 \log(1/\epsilon)) time for tolerance \epsilon, as demonstrated in applications to systems with power constraints. These methods are particularly effective for multi-user scenarios, where Lagrangian duality decomposes the problem into parallel subproblems. Recent advancements incorporate variance-reduction techniques such as importance sampling to reduce the variance of Monte Carlo estimates, improving efficiency for high-dimensional or fading channels by factors of 10–100 in computational effort.
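A minimal NumPy implementation of the discrete Blahut–Arimoto iteration described above is sketched below (function and variable names are illustrative); it accepts any row-stochastic transition matrix W[x, y] and, applied to a binary symmetric channel, reproduces C = 1 - H_b(p). The stopping rule uses the standard bracket in which \max_x D(W(\cdot|x)\|p^{(k)}(y)) upper-bounds C and \log_2 \sum_x p^{(k)}(x) 2^{D(W(\cdot|x)\|p^{(k)}(y))} lower-bounds it.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-9, max_iter=10_000):
    """Capacity (bits/use) and an optimal input distribution for a DMC with
    transition matrix W[x, y] = p(y | x), via the Blahut-Arimoto iteration."""
    W = np.asarray(W, dtype=float)
    n_in = W.shape[0]
    p = np.full(n_in, 1.0 / n_in)                 # start from the uniform input
    for _ in range(max_iter):
        q_y = p @ W                               # output marginal p(y)
        # d[x] = D(W(.|x) || q_y) in bits, with 0*log(0) treated as 0.
        with np.errstate(divide="ignore", invalid="ignore"):
            logs = np.where(W > 0, np.log2(W / q_y), 0.0)
        d = (W * logs).sum(axis=1)
        p_new = p * np.exp2(d)                    # multiplicative update
        # Capacity bracket: log2(sum_x p(x) 2^d(x)) <= C <= max_x d(x).
        gap = d.max() - np.log2(p_new.sum())
        p = p_new / p_new.sum()
        if gap < tol:
            break
    q_y = p @ W
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.where(W > 0, np.log2(W / q_y), 0.0)
    C = float((p[:, None] * W * logs).sum())
    return C, p

# Binary symmetric channel check: C = 1 - H_b(0.1) ~ 0.531 bits/use.
eps = 0.1
W_bsc = np.array([[1 - eps, eps], [eps, 1 - eps]])
print(blahut_arimoto(W_bsc))
```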

Advanced Topics

Feedback Capacity

Feedback capacity is the maximum reliable communication rate over a channel model where the transmitter receives perfect, instantaneous feedback of the receiver's output after each channel use. This allows the encoding strategy to depend causally on previous outputs, potentially enabling adaptive transmission that exploits the channel's past behavior. The feedback capacity C_{\text{fb}} is formally defined as the supremum over all rates R such that there exists a feedback coding scheme achieving error probability approaching zero as block length n \to \infty. In general, C_{\text{fb}} is characterized using directed information, introduced by Massey, as C_{\text{fb}} = \lim_{n \to \infty} \frac{1}{n} \sup I(X^n \to Y^n), where I(X^n \to Y^n) = \sum_{i=1}^n I(X^i; Y_i | Y^{i-1}) is the directed information, with the supremum over causally conditioned input distributions p(x^n \| y^{n-1}) satisfying power or other constraints. This framework generalizes the standard capacity and shows that feedback cannot decrease capacity, i.e., C_{\text{fb}} \geq C. For discrete memoryless channels (DMCs), Shannon established that feedback provides no capacity gain, so C_{\text{fb}} = C = \max_{p(x)} I(X; Y). The proof shows that the directed information per symbol cannot exceed the standard mutual information under optimal input distributions, using Fano's inequality for the converse; achievability follows from the non-feedback coding theorem. Feedback can, however, increase the zero-error capacity beyond its no-feedback value. The finding underscores that, while feedback aids in error-exponent improvement or simpler decoding, it does not expand the fundamental rate limit for memoryless discrete channels. In continuous-alphabet channels like the additive white Gaussian noise (AWGN) channel with power constraint P and noise variance N, the feedback capacity equals the no-feedback capacity, C_{\text{fb}} = \frac{1}{2} \log_2 \left(1 + \frac{P}{N}\right). Cover and Pombra characterized the feedback capacity of Gaussian channels with memory via a multi-letter directed-information expression involving the covariances of the inputs and noise. Although the capacity remains unchanged in the memoryless case, feedback dramatically enhances reliability: the Schalkwijk-Kailath scheme achieves error probability decaying doubly exponentially with block length, far superior to the single-exponential decay achievable without feedback. For channels with memory, such as finite-state or multiplicative channels, feedback can strictly increase capacity, as adaptive encoding exploits temporal correlations. Computing C_{\text{fb}} remains challenging in general, often requiring numerical optimization or bounds from information stability properties.

Extensions to Quantum Channels

In quantum information theory, the concept of channel capacity extends from classical communication channels to quantum channels, which are completely positive trace-preserving maps acting on density operators of quantum systems. A quantum channel \mathcal{N}: \mathcal{D}(\mathcal{A}) \to \mathcal{D}(\mathcal{B}) models the evolution of quantum states under noisy physical processes, such as decoherence or photon loss. Unlike classical channels, quantum channels can transmit both classical and quantum information, leading to multiple distinct capacity notions that account for quantum effects like superposition, entanglement, and no-cloning. These capacities quantify the maximum reliable information rate per channel use, often requiring regularization over multiple uses due to superadditivity phenomena absent in classical theory. The classical capacity C(\mathcal{N}) of a quantum channel is the supremum of rates at which classical bits can be reliably transmitted, where the sender encodes messages into quantum states and the receiver performs measurements. It is given by the regularized Holevo quantity, C(\mathcal{N}) = \lim_{n \to \infty} \frac{1}{n} \chi(\mathcal{N}^{\otimes n}), where \chi(\mathcal{N}) = \max_{\{p_i, \rho_i\}} \left[ S\left(\mathcal{N}\left(\textstyle\sum_i p_i \rho_i\right)\right) - \sum_i p_i S(\mathcal{N}(\rho_i)) \right] and S denotes the von Neumann entropy. This characterization, established by the Holevo-Schumacher-Westmoreland theorem, requires regularization because the Holevo quantity can be superadditive, so optimal encoding may use inputs entangled across multiple channel uses, unlike the additive classical case. For example, the depolarizing channel \mathcal{N}(\rho) = (1-p)\rho + p \frac{I}{d} in dimension d has C(\mathcal{N}) = \log_2 d + \left(1 - p + \frac{p}{d}\right) \log_2 \left(1 - p + \frac{p}{d}\right) + (d-1) \frac{p}{d} \log_2 \left(\frac{p}{d}\right), highlighting how noise reduces capacity below the noiseless \log_2 d bits. The quantum capacity Q(\mathcal{N}) measures the rate of reliable qubit transmission, preserving quantum coherence and entanglement. It equals the entanglement generation capacity and is defined as Q(\mathcal{N}) = \lim_{n \to \infty} \frac{1}{n} \max_{\rho} I_c(\rho, \mathcal{N}^{\otimes n}), where the coherent information is I_c(\rho, \mathcal{N}) = S(\mathcal{N}(\rho)) - S(\rho, \mathcal{N}), with the entropy exchange S(\rho, \mathcal{N}) = S((\mathrm{id} \otimes \mathcal{N})(\phi_\rho)) and \phi_\rho a purification of \rho. Achievability of the coherent-information rate was conjectured by Lloyd and proven rigorously by Shor and Devetak, with the matching converse holding in the regularized form. For degradable channels, like the amplitude damping channel, Q(\mathcal{N}) simplifies to the single-letter coherent information without regularization. However, computing Q(\mathcal{N}) remains challenging for general channels due to the need for optimization over entangled inputs. Entanglement-assisted capacities further extend these notions by allowing pre-shared entanglement between sender and receiver. The entanglement-assisted classical capacity C_E(\mathcal{N}) is given by C_E(\mathcal{N}) = \max_{\phi_{RA}} I(R ; B)_{\omega}, where \omega = (\mathrm{id}_R \otimes \mathcal{N}_A)(\phi_{RA}) is the output state and I(R ; B) is the quantum mutual information; it equals twice the entanglement-assisted quantum capacity Q_E(\mathcal{N}). Bennett et al. derived this quantum mutual information formula, showing that it is additive and exactly computable.
For the erasure channel \mathcal{N}(\rho) = (1-\epsilon)\rho + \epsilon |e\rangle\langle e|, C_E(\mathcal{N}) = 2(1-\epsilon) \log_2 d, twice the unassisted classical capacity (1-\epsilon) \log_2 d, since pre-shared entanglement lets the channel convey information more efficiently. The private capacity, for secure classical communication against an eavesdropper holding the channel environment, parallels the quantum capacity and is given by the regularized private information, P(\mathcal{N}) = \lim_{n \to \infty} \frac{1}{n} \max_{\{p_x, \rho_x\}} \left[ I(X; B) - I(X; E) \right], the difference between the classical information delivered to the receiver B and that leaked to the environment E. These quantum extensions reveal fundamental differences from classical information theory: superadditivity means single-letter formulas can underestimate the true capacity, entanglement assistance can substantially boost rates, and the no-cloning theorem limits quantum transmission. Seminal results apply to specific models like bosonic Gaussian channels, where capacities are expressed in terms of the symplectic eigenvalues of covariance matrices, but computation for general channels is intractable. Ongoing research focuses on additivity questions and numerical bounds for practical quantum networks.
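To make the depolarizing-channel expression concrete, the short sketch below (assuming NumPy; the helper names and the chosen noise levels are illustrative) evaluates the classical-capacity formula quoted above for dimension d across a few values of p, recovering \log_2 d = 1 bit at p = 0 and 0 at the fully depolarizing point for qubits.

```python
import numpy as np

def xlog2x(x):
    """x * log2(x), with the convention 0 * log2(0) = 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x * np.log2(np.where(x > 0, x, 1.0)), 0.0)

def depolarizing_classical_capacity(p, d=2):
    """Classical capacity (bits/use) of the d-dimensional depolarizing channel,
    using the formula quoted in the text."""
    a = 1.0 - p + p / d          # weight on the transmitted basis state
    b = p / d                    # weight on each of the other d - 1 basis states
    return np.log2(d) + xlog2x(a) + (d - 1) * xlog2x(b)

for p in (0.0, 0.1, 0.5, 1.0):
    print(p, depolarizing_classical_capacity(p))   # 1.0 at p = 0, 0.0 at p = 1 for qubits
```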