Noisy-channel coding theorem

The noisy-channel coding theorem, also known as Shannon's channel coding theorem, is a fundamental result in information theory that delineates the limits of reliable communication over a channel subject to noise. It asserts that for any discrete memoryless channel with capacity C (measured in bits per channel use), there exist encoding and decoding schemes that enable the transmission of information at any rate R < C with an arbitrarily small probability of error as the block length increases, but no such reliable communication is possible for rates R > C. Formulated by Claude E. Shannon in his seminal 1948 paper "A Mathematical Theory of Communication," the theorem revolutionized the field by demonstrating that noise does not preclude error-free communication up to a nonzero threshold, overturning the prior belief that reliable transmission required infinite redundancy to combat interference. The theorem's proof relies on probabilistic methods, particularly random coding arguments, which show the existence of good codes without explicitly constructing them, though practical codes such as convolutional, turbo, and low-density parity-check (LDPC) codes have since approached these limits in applications such as wireless networks and data storage. Key concepts include the channel capacity C, defined as the supremum of achievable rates and computed by maximizing the mutual information between input and output; for specific channel models such as the binary symmetric channel, C = 1 - H(p), where H(p) is the binary entropy function for crossover probability p. The result applies broadly to discrete and continuous channels, influencing modern digital communications by establishing that error-correcting codes can reliably operate near capacity with sufficient computational resources.
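As a concrete illustration of the formula quoted above, the following minimal Python sketch evaluates C = 1 - H(p) for a binary symmetric channel; the crossover probability used is an arbitrary illustrative value, not one taken from the text.

```python
import numpy as np

def binary_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def bsc_capacity(p: float) -> float:
    """Capacity of the binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

# Example: a BSC that flips about 11% of transmitted bits still has roughly
# half a bit of capacity per channel use.
print(bsc_capacity(0.11))  # ~0.500
```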

Introduction and Background

Overview

The noisy-channel coding theorem, introduced by Claude Shannon in his 1948 paper "A Mathematical Theory of Communication," demonstrates that reliable transmission of information over a channel corrupted by noise is achievable as long as the data rate remains below the channel's capacity. Intuitively, the theorem shows that noise, which introduces random errors into the signal, can be counteracted through clever encoding schemes that add redundancy to the message, allowing the receiver to reconstruct the original information with arbitrarily high accuracy despite the interference. This result revolutionized communication engineering by proving that near-perfect reliability is theoretically attainable, even in the presence of unavoidable distortions, without needing error-free channels. A noisy channel encompasses any transmission medium prone to errors, such as wireless radio links susceptible to interference and fading, or optical fibers affected by attenuation and dispersion. The theorem's core insight is that the channel capacity, defined as the supremum of the mutual information between input and output, serves as the fundamental limit for error-free communication. This theorem forms the theoretical bedrock for contemporary error-correcting codes, notably turbo codes, which approach capacity limits through iterative decoding, and low-density parity-check (LDPC) codes, which leverage sparse parity-check matrices for efficient near-optimal performance. In real-world applications, it enables robust digital television broadcasting, where concatenated coding schemes protect against multipath fading in terrestrial signals, and reliable data transmission over noisy network links, mitigating bit errors in packet delivery.

Historical Context

The noisy-channel coding theorem emerged from foundational ideas in communication engineering during the early 20th century. In 1928, Ralph Hartley introduced a measure of information quantity based on the logarithm of the number of possible message choices, providing an early framework for assessing communication efficiency that directly influenced subsequent theories. That same year, Harry Nyquist published work on telegraph transmission limits, establishing key concepts related to channel bandwidth and signaling rates that shaped understandings of reliable data transfer over constrained media. Claude Shannon formalized the noisy-channel coding theorem in his seminal 1948 paper, "A Mathematical Theory of Communication," published in the Bell System Technical Journal. In this work, Shannon demonstrated that reliable communication is possible over noisy channels at rates up to the channel capacity, provided sufficiently long codewords are used, thereby establishing information theory as a rigorous discipline. This theorem addressed the fundamental limits of error-free transmission in the presence of noise, building explicitly on Hartley's and Nyquist's contributions. Following Shannon's breakthrough, practical coding techniques rapidly advanced to realize the theorem's implications. In 1950, Richard Hamming developed error-detecting and error-correcting codes at Bell Laboratories, introducing the Hamming code as an efficient method to correct single-bit errors in binary data streams. By 1955, Peter Elias proposed convolutional codes, which encode data streams continuously using shift registers, enabling better performance near capacity for certain noisy environments. The evolution continued into the 1960s with innovations bridging theory and application. In 1960, Irving S. Reed and Gustave Solomon introduced Reed-Solomon codes, non-binary cyclic codes particularly effective for burst error correction in high-reliability systems like space communications. In 1967, Andrew J. Viterbi devised an efficient decoding algorithm for convolutional codes, using dynamic programming to find the most likely transmitted sequence, which became essential for practical implementations. Shannon's contributions received widespread recognition, including the National Medal of Science and the IEEE Medal of Honor in 1966, and the Kyoto Prize in Basic Sciences in 1985 for founding information theory. His work profoundly influenced later luminaries, such as Elwyn Berlekamp, whose advancements in algebraic coding theory earned the 1993 Claude E. Shannon Award.

Core Concepts and Definitions

Channel Models

In the context of the noisy-channel coding theorem, the primary channel model considered is the discrete memoryless channel (DMC), which abstracts the communication process over a noisy medium with finite symbol sets. A DMC is defined by a finite input alphabet \mathcal{X}, a finite output alphabet \mathcal{Y}, and a transition matrix specified by conditional probabilities p(y|x) for all x \in \mathcal{X} and y \in \mathcal{Y}, where p(y|x) denotes the probability that symbol y is received when symbol x is transmitted. The memoryless property ensures that the noise affecting each channel use is independent of prior uses, so the output Y_n at the nth use depends solely on the input X_n at that time, with the joint distribution over multiple uses factoring as \prod_n p(y_n | x_n). The channel operates as a probabilistic mapping from input to output, governed by the conditional distribution P_{Y|X}, which captures the randomness introduced by noise without any deterministic relationship between transmitted and received symbols. Stationarity is assumed, meaning the probabilities p(y|x) remain identical and time-invariant across all channel uses, allowing the model to represent repeated transmissions under fixed conditions. A representative example of a DMC is the binary symmetric channel (BSC), where both input and output alphabets are \{0, 1\}, and noise flips the bit with fixed crossover probability p < 1/2, such that p(y = x | x) = 1 - p and p(y \neq x | x) = p for x, y \in \{0, 1\}. Another illustrative case is the binary erasure channel (BEC), introduced as a model for channels where symbols may be lost rather than corrupted; here, the input alphabet is \{0, 1\}, the output alphabet is \{0, 1, e\} with e indicating erasure, and the transition probabilities are p(y = x | x) = 1 - \alpha, p(y = e | x) = \alpha for x \in \{0, 1\}, and p(y = 1 - x | x) = 0, where \alpha is the erasure probability. DMCs can be generalized to additive noise channels, where the output is formed by superimposing independent noise on the input, typically expressed as Y = X + N (modulo the alphabet structure if discrete), with N denoting a noise random variable independent of X. This additive structure simplifies analysis for certain symmetric noises while preserving the memoryless and stationary assumptions. The capacity of a DMC, central to the theorem, is achieved by maximizing the mutual information I(X; Y) over all possible input distributions on \mathcal{X}.
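The BSC and BEC transition matrices described above can be written down directly and fed into a mutual-information computation. The sketch below does this for a uniform input distribution; the crossover and erasure probabilities are assumed values chosen purely for illustration.

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in bits for input distribution p_x and channel matrix W[x, y] = p(y|x)."""
    p_xy = p_x[:, None] * W                  # joint distribution p(x, y)
    p_y = p_xy.sum(axis=0)                   # output marginal p(y)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask])))

p = 0.1      # BSC crossover probability (illustrative)
alpha = 0.2  # BEC erasure probability (illustrative)

W_bsc = np.array([[1 - p, p],
                  [p, 1 - p]])
W_bec = np.array([[1 - alpha, 0.0, alpha],   # columns correspond to y = 0, y = 1, y = e
                  [0.0, 1 - alpha, alpha]])

uniform = np.array([0.5, 0.5])
print(mutual_information(uniform, W_bsc))  # ~1 - H(0.1) ≈ 0.531 bits
print(mutual_information(uniform, W_bec))  # ~1 - alpha = 0.8 bits
```

For both channels the uniform input happens to be capacity-achieving, so these values coincide with the capacities 1 - H(p) and 1 - \alpha.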

Information-Theoretic Primitives

The entropy H(X) of a discrete random variable X with probability mass function p(x) measures the average uncertainty or information content associated with its outcomes, defined as H(X) = -\sum_{x} p(x) \log_2 p(x), where the sum is over all possible values of x with p(x) > 0, and the logarithm is base 2 for units in bits. This quantity is non-negative, H(X) \geq 0, with equality if and only if X is deterministic (i.e., one outcome has probability 1), and it achieves its maximum value of \log_2 |\mathcal{X}| when X is uniformly distributed over its alphabet \mathcal{X}. The conditional entropy H(Y|X) quantifies the average remaining uncertainty in a random variable Y after observing X, given by H(Y|X) = -\sum_{x,y} p(x,y) \log_2 p(y|x), where p(y|x) is the conditional probability mass function. It satisfies 0 \leq H(Y|X) \leq H(Y), indicating that conditioning on X cannot increase the entropy of Y, and it equals H(Y) if X and Y are independent. A key relation is the chain rule for joint entropy: H(X,Y) = H(X) + H(Y|X). Mutual information I(X;Y) measures the amount of information that one random variable contains about another, defined as the difference between the entropy of Y and its conditional entropy given X: I(X;Y) = H(Y) - H(Y|X) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)p(y)}. It is symmetric, I(X;Y) = I(Y;X), and non-negative, I(X;Y) \geq 0, with equality if and only if X and Y are independent, and it satisfies the chain rule I(X;Y,Z) = I(X;Y) + I(X;Z|Y). These properties make mutual information a fundamental measure of dependence between variables. The channel capacity C represents the supremum of the mutual information over all possible input distributions, defined as C = \max_{p(x)} I(X;Y), where the maximum is taken over probability distributions p(x) on the input alphabet, and Y is the channel output. This quantity sets the fundamental limit on the rate at which information can be reliably transmitted over the channel.
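The definitions above translate directly into code. The following minimal Python sketch computes H(X), H(Y|X), and I(X;Y) from a joint distribution; the example joint distribution corresponds to a uniform bit passed through a BSC with crossover 0.1 and is assumed only for illustration.

```python
import numpy as np

def entropy(p):
    """H(X) = -sum p(x) log2 p(x), summed over outcomes with p(x) > 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(p_xy):
    """H(Y|X) computed from a joint distribution p_xy[x, y]."""
    p_x = p_xy.sum(axis=1)
    h = 0.0
    for x in range(p_xy.shape[0]):
        if p_x[x] > 0:
            h += p_x[x] * entropy(p_xy[x] / p_x[x])   # p(x) * H(Y | X = x)
    return h

def mutual_information(p_xy):
    """I(X;Y) = H(Y) - H(Y|X)."""
    return entropy(p_xy.sum(axis=0)) - conditional_entropy(p_xy)

# Illustrative joint distribution: uniform bit through a BSC with crossover 0.1.
p_xy = np.array([[0.45, 0.05],
                 [0.05, 0.45]])
print(entropy(p_xy.sum(axis=1)))   # H(X) = 1 bit
print(conditional_entropy(p_xy))   # H(Y|X) = H(0.1) ≈ 0.469 bits
print(mutual_information(p_xy))    # I(X;Y) ≈ 0.531 bits
```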

Mathematical Statement

Formal Theorem

The noisy-channel coding theorem applies to discrete memoryless channels (DMCs), which are channels where the output symbol Y depends only on the current input symbol X according to a fixed transition probability W(y|x), with input and output alphabets \mathcal{X} and \mathcal{Y}, respectively. The capacity C of such a channel is the maximum mutual information C = \max_{p(x)} I(X;Y), where I(X;Y) measures the reduction in uncertainty about the input given the output, achieved by optimizing over input distributions p(x). The theorem states that reliable communication over a DMC is possible whenever the communication rate is below the capacity, and impossible above it. Formally: For any rate R < C and any \epsilon > 0, there exists a sufficiently large block length n and an (n, M, \epsilon)-code with M \geq 2^{nR} codewords such that the average probability of decoding error satisfies P_e^{(n)} < \epsilon. This achievability part guarantees the existence of codes that transmit information at rates up to but not exceeding the capacity with arbitrarily low error probability as n grows. Conversely, for any rate R > C, every code with rate R has average error probability P_e^{(n)} bounded below by a positive constant that does not depend on n, implying that reliable communication is impossible above the capacity. The strong converse extends this to show that P_e^{(n)} > 1 - \epsilon for any \epsilon > 0 and sufficiently large n. The achievability proof relies on the asymptotic equipartition property (AEP), which partitions the space of output sequences into a typical set, whose sequences each occur with probability roughly 2^{-nH(Y)}, and an atypical set of vanishing probability as n \to \infty; for input-output pairs, an independently generated pair is jointly typical with probability approximately 2^{-nI(X;Y)}, enabling random coding arguments to bound the number of distinguishable codewords and achieve low-error decoding via typical-set decoding.
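Computing C = \max_{p(x)} I(X;Y) for a given transition matrix is a concave maximization, and the Blahut-Arimoto algorithm (not part of the theorem itself, named here as the standard numerical method) solves it iteratively. The sketch below is a minimal, assumption-laden implementation, checked against the known BSC capacity.

```python
import numpy as np

def blahut_arimoto(W, iters=200):
    """Estimate C = max_{p(x)} I(X;Y) for a channel matrix W[x, y] = p(y|x)."""
    n_in = W.shape[0]
    p = np.full(n_in, 1.0 / n_in)                 # start from the uniform input distribution
    for _ in range(iters):
        q = p[:, None] * W
        q = q / q.sum(axis=0, keepdims=True)      # posterior q(x|y)
        logs = np.where(W > 0, np.log(np.where(q > 0, q, 1.0)), 0.0)
        r = np.exp(np.sum(W * logs, axis=1))      # unnormalized update for p(x)
        p = r / r.sum()
    p_y = (p[:, None] * W).sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where((W > 0) & (p_y[None, :] > 0), W / p_y[None, :], 1.0)
    return float(np.sum(p[:, None] * W * np.log2(ratio))), p  # I(X;Y) in bits, optimizing p(x)

eps = 0.1
W_bsc = np.array([[1 - eps, eps], [eps, 1 - eps]])
C, p_opt = blahut_arimoto(W_bsc)
print(C)      # ≈ 1 - H(0.1) ≈ 0.531 bits per channel use
print(p_opt)  # ≈ [0.5, 0.5], the capacity-achieving input distribution
```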

Key Notations and Assumptions

In the context of the noisy-channel coding theorem, block coding refers to the process of encoding messages into fixed-length sequences of symbols for transmission over a noisy channel, distinct from source coding, which focuses on efficient compression of information sources without considering transmission errors. An (n, M) block code consists of an encoding function that maps each of M possible messages from the set \{1, 2, \dots, M\} to a unique codeword in the n-fold product of the input alphabet, denoted as x^n(m) \in \mathcal{X}^n for message m, where \mathcal{X} is the finite input alphabet and n is the block length or number of channel uses. The codebook is the collection of these M codewords \{x^n(1), x^n(2), \dots, x^n(M)\}, and the corresponding decoding function maps the received output sequence y^n \in \mathcal{Y}^n, where \mathcal{Y} is the finite output alphabet, back to an estimated message \hat{m} = g(y^n). The error probability for an (n, M) code, denoted P_e^{(n)}, is the average probability that the decoded message differs from the transmitted one, computed as P_e^{(n)} = \frac{1}{M} \sum_{m=1}^M \Pr(g(Y^n) \neq m \mid X^n = x^n(m)), where the probability is taken over the channel's randomness assuming uniform selection of messages. The rate R of the code measures the information transmitted per channel use and is defined as R = \frac{\log_2 M}{n} in bits per channel use. The theorem applies to discrete memoryless channels (DMCs), which assume finite input and output alphabets \mathcal{X} and \mathcal{Y}, respectively. The channel is memoryless, meaning that the output Y_i at each time i depends only on the current input X_i according to the transition probability p(y_i \mid x_i), with the joint distribution for n uses given by p(y^n \mid x^n) = \prod_{i=1}^n p(y_i \mid x_i). Additionally, the channel is stationary, implying that the transition probabilities p(y \mid x) remain constant across channel uses, ensuring identical statistical behavior over time.
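To make the notions of rate and average error probability concrete, the sketch below estimates P_e^{(n)} for the simplest nontrivial block code, the (n, M = 2) repetition code over a BSC with majority-vote decoding. The block length, crossover probability, and trial count are illustrative choices, not part of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_repetition_code(n=5, p=0.1, trials=100_000):
    """Estimate P_e^{(n)} for the (n, M = 2) repetition code over a BSC(p)
    with majority-vote decoding (n should be odd to avoid ties)."""
    M = 2
    rate = np.log2(M) / n                                  # R = log2(M)/n bits per channel use
    messages = rng.integers(0, M, size=trials)             # uniformly selected messages
    codewords = np.repeat(messages[:, None], n, axis=1)    # x^n(m): the message bit repeated n times
    flips = (rng.random((trials, n)) < p).astype(int)      # BSC noise pattern
    received = codewords ^ flips
    decoded = (received.sum(axis=1) > n / 2).astype(int)   # majority-vote decoder g(y^n)
    return rate, float(np.mean(decoded != messages))

rate, p_err = simulate_repetition_code()
print(rate, p_err)   # R = 0.2 bits/use; P_e is roughly 0.0086 for n = 5, p = 0.1
```

Repeating the bit drives the error probability down only by lowering the rate toward zero; the theorem's point is that far better trade-offs exist at all rates below capacity.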

Proofs for Discrete Memoryless Channels

Achievability

The achievability of the noisy-channel coding theorem for discrete memoryless channels (DMCs) is established through a constructive existence proof using random coding, which shows that any rate R < C, where C is the channel capacity defined as the maximum mutual information between input and output, can be achieved with vanishing error probability as the block length n grows large. In this approach, a random codebook \mathcal{C}_n is generated by drawing M = 2^{nR} codewords \mathbf{x}(m) \in \mathcal{X}^n, for m = 1, \dots, M, independently and identically distributed (i.i.d.) according to the capacity-achieving input distribution p(x) that maximizes I(X;Y). This random selection ensures that, with high probability over the codebook ensemble, the codewords are sufficiently spread out to allow low-error decoding despite channel noise. For decoding, the receiver employs joint typicality decoding: upon observing the channel output sequence \mathbf{y}^n, it searches for a codeword \mathbf{x}(m) such that the pair (\mathbf{x}(m), \mathbf{y}^n) is jointly typical with respect to the distribution p(x,y) = p(x) W(y|x), where W is the channel transition probability; if such a unique codeword exists, it decodes to m, otherwise it declares an error. This decoder leverages the asymptotic equipartition property (AEP), which implies that for large n, the jointly typical set \mathcal{A}_\epsilon^{(n)}(X^n, Y^n) captures nearly all the probability mass of the joint input-output distribution, concentrating the typical output sequences around those generated from typical input codewords. The use of joint typicality ensures that correct decoding occurs when the noise does not distort the received sequence far from the transmitted codeword's typical neighborhood. The error probability analysis proceeds by bounding the average error probability \bar{P}_e^{(n)} over the random code ensemble. For the transmitted codeword \mathbf{x}(m_1), the probability of decoding error is dominated by two events: the output \mathbf{y}^n not being jointly typical with \mathbf{x}(m_1), which has probability at most \epsilon by the AEP for large n, or some other codeword \mathbf{x}(m_2), m_2 \neq m_1, also being jointly typical with \mathbf{y}^n. The probability of the latter is controlled using the union bound over the M-1 incorrect codewords, yielding \bar{P}_e^{(n)} \leq \epsilon + (M-1) \cdot 2^{-n(I(X;Y) - \delta_n)}, where \delta_n \to 0 as n \to \infty. For R < C = \max_{p(x)} I(X;Y), choosing the optimal p(x) ensures I(X;Y) > R, so the second term vanishes exponentially, and thus \bar{P}_e^{(n)} \to 0. More precisely, Gallager's random coding bound gives \bar{P}_e^{(n)} \leq 2^{-n E_r(R)}, where the random coding exponent E_r(R) = \max_{0 \leq \rho \leq 1} \max_{p(x)} \left[ E_0(\rho, p) - \rho R \right], with E_0(\rho, p) = -\log_2 \sum_y \left( \sum_x p(x) W(y|x)^{1/(1+\rho)} \right)^{1+\rho}, is strictly positive for all R < C. Since the average error over random codes vanishes, there exists at least one code in the ensemble achieving P_e^{(n)} \to 0. Intuitively, this achievability aligns with a sphere-packing perspective: each codeword is surrounded by a "noise ball" of typical output sequences whose Hamming or divergence distance from the codeword corresponds to the channel's noise level; for rates below capacity, the random codewords' noise balls cover the typical output space without significant overlap, ensuring that the decoder can unambiguously identify the transmitted codeword from the received sequence in its ball.
This non-overlapping coverage is overwhelmingly probable because the typical output space contains roughly 2^{nH(Y)} sequences while each noise ball contains roughly 2^{nH(Y|X)}, so about 2^{nI(X;Y)} codewords can be accommodated; together with the concentration of measure in the typical set, this provides a geometric justification for the random coding bound.
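The random-coding argument can be imitated numerically. The sketch below draws an i.i.d. uniform codebook for a BSC and decodes with a nearest-codeword (minimum Hamming distance) rule, which is maximum likelihood for a BSC with p < 1/2 and stands in here for the joint-typicality decoder of the proof; all parameters (n, R, p, number of trials) are assumed values chosen so the experiment runs quickly.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_coding_demo(n=60, R=0.2, p=0.1, trials=500):
    """Monte Carlo sketch of random coding over a BSC(p): an i.i.d. Bernoulli(1/2)
    codebook with M = 2^{nR} codewords, decoded by minimum Hamming distance."""
    M = int(2 ** (n * R))                        # number of codewords
    codebook = rng.integers(0, 2, size=(M, n))   # random codebook from the uniform input distribution
    errors = 0
    for _ in range(trials):
        m = rng.integers(0, M)                   # uniformly chosen message
        noise = (rng.random(n) < p).astype(int)  # BSC bit flips
        y = codebook[m] ^ noise                  # received word
        distances = np.count_nonzero(codebook != y, axis=1)
        m_hat = int(np.argmin(distances))        # nearest-codeword decoder
        errors += int(m_hat != m)
    return errors / trials

# With R = 0.2 well below C = 1 - H(0.1) ≈ 0.531 bits/use, the empirical error
# rate is small and shrinks further as n grows at the same rate R.
print(random_coding_demo())
```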

Converse Proofs

The converse proofs for the noisy-channel coding theorem establish that the channel capacity C represents a fundamental upper limit on the rate R at which information can be reliably transmitted over a discrete memoryless channel (DMC). These proofs demonstrate the impossibility of achieving arbitrarily low error probabilities for rates exceeding C, complementing the achievability results by showing that C is indeed the exact capacity. Two main types of converse proofs exist: the weak converse and the strong converse, each providing a different level of tightness in bounding the error probability P_e. The weak converse, first established by Shannon in his foundational work on communication theory, asserts that reliable communication at a rate R > C is impossible in the asymptotic sense. Specifically, for any sequence of codes with rate R and error probability P_e^{(n)} \to 0 as the block length n \to \infty, it must hold that R \leq C. To derive this, the proof relies on Fano's inequality, which provides an upper bound on the conditional entropy of the message given the channel output: H(M \mid Y^n) \leq h(\epsilon) + \epsilon \log_2 (|M| - 1), where \epsilon = P_e^{(n)} is the average error probability, h(\cdot) denotes the binary entropy function, and M is the message with H(M) = nR. Since \log_2 (|M| - 1) \leq n R, this implies H(M \mid Y^n) \leq h(\epsilon) + \epsilon n R. Combining this with the identity H(M \mid Y^n) = H(M) - I(M; Y^n) yields I(M; Y^n) \geq n R (1 - \epsilon) - h(\epsilon). Since the data processing inequality gives I(M; Y^n) \leq I(X^n; Y^n) \leq nC for memoryless channels, substituting gives n R (1 - \epsilon) - h(\epsilon) \leq n C. Dividing by n, R (1 - \epsilon) - h(\epsilon)/n \leq C. As n \to \infty and \epsilon \to 0, h(\epsilon)/n \to 0, so R \leq C. This bound shows that rates above C cannot achieve vanishing error, though for finite n, small errors may occur even slightly above C. The strong converse provides a sharper impossibility result, stating that if R > C, then the error probability P_e^{(n)} \to 1 exponentially fast as n \to \infty. Initially proven by Wolfowitz for channels without memory using asymptotic equipartition property arguments, the result was extended by Arimoto to general DMCs via a change-of-measure technique involving the Kullback-Leibler divergence. In Arimoto's approach, the probability of correct decoding is bounded using the divergence between the joint output distribution under the code and a product distribution achieving capacity, leading to an exponentially small term e^{-n \delta} for some \delta > 0 when R > C. Alternative proofs employ the blowing-up lemma, which enlarges typical sets to show that atypical outputs dominate for supercritical rates. This exponential convergence distinguishes the strong converse from weaker bounds. A key tool in both converse proofs is the data processing inequality, which states that mutual information cannot increase under further processing: for a Markov chain X^n \to Y^n \to Z^n, I(X^n; Z^n) \leq I(X^n; Y^n). In the channel coding context, combined with memorylessness this yields I(X^n; Y^n) \leq \sum_{i=1}^n I(X_i; Y_i) \leq nC, directly bounding the information transferable through the channel and limiting the number of reliably distinguishable messages to approximately 2^{nC}. Unlike the weak converse, which permits non-vanishing but small errors above C for finite block lengths, the strong converse enforces that any excess rate forces the error probability to approach one, providing a more rigorous impossibility result for supercritical regimes.
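The weak-converse derivation above rearranges into an explicit lower bound on the error probability: using h(\epsilon) \leq 1 and \log_2 M \leq nR, one gets P_e^{(n)} \geq 1 - C/R - 1/(nR). The short sketch below evaluates this bound for a rate above capacity; the numerical values are illustrative.

```python
def weak_converse_lower_bound(R, C, n):
    """Fano-based lower bound on the average error probability for R > C:
    from nR <= 1 + P_e * nR + nC, rearrange to P_e >= 1 - C/R - 1/(nR)."""
    return max(0.0, 1.0 - C / R - 1.0 / (n * R))

C_bsc = 0.531  # approximate capacity of a BSC with crossover 0.1
for n in (10, 100, 1000, 10_000):
    print(n, weak_converse_lower_bound(R=0.8, C=C_bsc, n=n))
# The bound approaches 1 - C/R ≈ 0.336: above capacity the error probability
# cannot be driven to zero no matter how large the block length.
```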

Extensions and Variants

Non-Stationary Memoryless Channels

In non-stationary memoryless channels, the transition probabilities P(Y_i \mid X_i) depend on the time index i, but the channel maintains the memoryless property, meaning the outputs Y_1, \dots, Y_n are conditionally independent given the inputs X_1, \dots, X_n. The joint conditional distribution thus factors as P(Y^n \mid X^n) = \prod_{i=1}^n P_i(Y_i \mid X_i), where each P_i may differ across time steps. The capacity of such a channel is given by the generalized limit C = \lim_{n \to \infty} \frac{1}{n} \max_{P_{X^n}} I(X^n; Y^n), where the maximum is over all distributions on input sequences X^n. This limit exists under mild conditions, such as when the channel is information-stable, and represents the supremum of rates at which reliable communication is possible as the block length n grows. If the channel-state process is stationary and ergodic, the capacity simplifies to the ergodic form C = \sup_{P_X} \mathbb{E}[I(X; Y \mid S)], where S denotes the time-varying state determining P_i. Achievability for rates R < C follows from an extension of random coding, where codebooks are generated using time-varying input distributions P_{X^n} = \prod_{i=1}^n P_i(X_i) selected to maximize I(X^n; Y^n)/n. For large n, these distributions ensure that the empirical mutual information concentrates around its expectation via the law of large numbers, allowing typical-set decoding to achieve vanishing error probability, analogous to the stationary case but adapted for varying statistics. The converse establishes that no code can achieve rates exceeding C with vanishing error, proved by bounding the achievable rate using Fano's inequality: for any code with rate R and error probability \epsilon_n \to 0, R \leq \frac{I(X^n; Y^n) + h(\epsilon_n)}{n(1 - \epsilon_n)}, where the correction terms vanish as n \to \infty. The chain rule I(X^n; Y^n) = \sum_{i=1}^n I(X^n; Y_i \mid Y^{i-1}), combined with memorylessness, yields I(X^n; Y^n) \leq \sum_{i=1}^n I(X_i; Y_i), confirming that the upper bound aligns with the definition of C. These channels model practical scenarios like fading channels in wireless systems, where transition probabilities vary over time due to fluctuating noise or interference levels, such as a time-varying binary symmetric channel with crossover probabilities p_i evolving according to environmental factors. The capacity then becomes C = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n (1 - H_2(p_i)), enabling reliable communication up to this average binary-entropy rate.
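For the time-varying BSC example just given, the capacity expression is a simple average of per-use binary-entropy terms. The sketch below evaluates it for a hypothetical, slowly drifting sequence of crossover probabilities; the sinusoidal drift is an assumed stand-in for a fading-like variation and is not taken from the text.

```python
import numpy as np

def binary_entropy(p):
    """Elementwise H2(p) in bits, clipped away from 0 and 1 for numerical safety."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def time_varying_bsc_capacity(crossover_probs):
    """C_n = (1/n) * sum_i (1 - H2(p_i)) for a non-stationary memoryless BSC."""
    return float(np.mean(1.0 - binary_entropy(crossover_probs)))

# Hypothetical crossover probabilities drifting slowly over 1000 channel uses.
i = np.arange(1000)
p_i = 0.10 + 0.05 * np.sin(2 * np.pi * i / 250)   # varies between 0.05 and 0.15
print(time_varying_bsc_capacity(p_i))              # ≈ 0.54 bits per channel use for these p_i
```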

Continuous and Gaussian Channels

The noisy-channel coding theorem extends to continuous channels, where both input and output alphabets are uncountably infinite subsets of the real numbers \mathbb{R}, rather than finite discrete sets. In this setting, the channel capacity is defined as the maximum mutual information I(X;Y) over all admissible input distributions p_X, expressed using differential entropies as C = \max_{p_X} [h(Y) - h(Y|X)], where h(Y) is the differential entropy of the output and h(Y|X) is the conditional differential entropy. This formulation arises because continuous random variables are characterized by probability density functions rather than probability mass functions, and differential entropy h(X) = -\int p_X(x) \log p_X(x) \, dx measures uncertainty in a manner analogous to discrete entropy but depends on the coordinate system. Unlike discrete channels, which rely on finite symbol probabilities, continuous channels require handling densities and often involve constraints like average power limits instead of per-symbol restrictions. A canonical example is the additive white Gaussian noise (AWGN) channel, modeled as Y = X + Z, where X is the input signal, Z is independent Gaussian noise with zero mean and variance N > 0 (i.e., Z \sim \mathcal{N}(0, N)), and the input is subject to an average power constraint \mathbb{E}[X^2] \leq P. Here, the conditional differential entropy h(Y|X) = h(Z) is fixed at \frac{1}{2} \ln (2\pi e N) nats (or \frac{1}{2} \log_2 (2\pi e N) bits), since the noise is independent of the input. The capacity simplifies to maximizing h(Y) subject to the power constraint on X, and by the maximum-entropy principle, this maximum occurs when Y is Gaussian with variance P + N. Thus, the capacity is C = \frac{1}{2} \log_2 \left(1 + \frac{P}{N}\right) bits per real dimension (or channel use), achieved by a Gaussian input X \sim \mathcal{N}(0, P). This result establishes that reliable communication is possible at rates below C with error probability approaching zero as the block length increases, while rates above C are impossible. The achievability proof for the AWGN channel follows by approximating the continuous channel with discrete constellations and taking the limit as the number of points grows, leveraging the discrete noisy-channel coding theorem; Gaussian inputs ensure the mutual information approaches the capacity bound. For practical implementation, random Gaussian codes achieve capacity in the information-theoretic sense, but structured codes like lattice codes provide low-complexity alternatives that approach or attain capacity under lattice decoding. Specifically, nested lattice codes, combining a fine lattice for dithered quantization and a coarse lattice for shaping, can achieve the full capacity C for any rate below it, with decoding complexity scaling favorably for high dimensions. The converse proof relies on the maximum-entropy property: for any input distribution satisfying the power constraint, h(Y) \leq \frac{1}{2} \log_2 (2\pi e (P + N)), with equality only for Gaussian Y, leading to I(X;Y) \leq C. This bound holds via Fano's inequality and data processing arguments adapted to continuous variables, ensuring no coding scheme can exceed C without error. In contrast to discrete memoryless channels, the continuous AWGN case eliminates finite-alphabet constraints, allowing optimization over entire density functions, but introduces subtleties such as the need for care in differential entropy calculations, since differential entropy can be negative and is not invariant under changes of coordinates.
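The AWGN capacity formula and its differential-entropy derivation can be checked numerically. The sketch below evaluates C = \frac{1}{2}\log_2(1 + P/N) directly and via h(Y) - h(Z) for Gaussian Y and Z; the power and noise values are arbitrary illustrative choices.

```python
import numpy as np

def awgn_capacity(P, N):
    """C = 0.5 * log2(1 + P/N) bits per real channel use."""
    return 0.5 * np.log2(1.0 + P / N)

def gaussian_diff_entropy(var):
    """Differential entropy of a Gaussian with the given variance, in bits."""
    return 0.5 * np.log2(2 * np.pi * np.e * var)

P, N = 4.0, 1.0   # assumed power constraint and noise variance
print(awgn_capacity(P, N))                                      # 0.5*log2(5) ≈ 1.161 bits/use
print(gaussian_diff_entropy(P + N) - gaussian_diff_entropy(N))  # same value, via h(Y) - h(Z)
```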

References

  1. [1]
    [PDF] Shannon's Noisy Coding Theorem 1 Channel Coding
    May 14, 2015 · In a groundbreaking paper in 1948, Claude. Shannon showed that this was not true. What Shannon showed was that every channel has a capacity C.
  2. [2]
    [PDF] Shannon's Noisy Coding Theorem 16.1 Defining a Channel
    Channel coding theorem promises the existence of block codes that allow us to transmit information at rates below capacity with an arbitrary small probability ...
  3. [3]
    [PDF] A Mathematical Theory of Communication
    In the present paper we will extend the theory to include a number of new factors, in particular the effect of noise in the channel, and the savings possible ...
  4. [4]
    Channel Capacity - Bits and Binary Digits
    The fact that information can be transmitted essentially error-free at capacity is called the Noisy Channel Coding Theorem of Shannon. Before 1948, it was ...
  5. [5]
    [PDF] Near Optimum Error Correcting Coding And Decoding: Turbo-Codes
    Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes,” in ICC'93, Geneva,. Switzerland, May 93, pp.
  6. [6]
    [PDF] Low-Density Parity-Check Codes Robert G. Gallager 1963
    Chapter 1 sets the background of the study, summarizes the results, and briefly compares low-density coding with other coding schemes. Chapter 2 analyzes the ...
  7. [7]
    [PDF] EN 300 744 - V1.6.1 - Digital Video Broadcasting (DVB ... - ETSI
    The present document describes a baseline transmission system for digital terrestrial TeleVision (TV) broadcasting. It specifies the channel coding/modulation ...
  8. [8]
    Hartley's Law - History of Information
    Hartley's law eventually became one of the elements of Claude Shannon's Mathematical Theory of Communication.Missing: measure influence
  9. [9]
    [PDF] Error Bounds for Convolutional Codes and an Asymptotically ...
    69-72, February. 1967. Error Bounds for Convolutional Codes and an Asymptotically Optimum. Decoding Algorithm. ANDREW J. VITERBI,. SENIOR MEMBER, IEEE. Ahstraci ...
  10. [10]
    Claude Elwood Shannon | Kyoto Prize - 京都賞
    Prof. Claude Elwood Shannon has given a mathematical scientific basis to the development of communication technology.Missing: 1966 | Show results with:1966
  11. [11]
    [PDF] Appendix B Information theory from first principles - Stanford University
    The binary erasure channel has binary input and ternary output = 0 1 = 0 1 e . The transition probabilities are p 0 0 = p 1 1 = 1 − p e 0 = p e 1 = . Here, ...
  12. [12]
    [PDF] coding for two noisy channels - MIT
    Elias has shown that it is possible to signal at rates arbitrarily close to the capacity of the binary symmetric channel with arbitrarily small probability of.
  13. [13]
    [PDF] A bit of information theory - UCSD Math
    Mutual information is nonnegative, i.e. I(X;Y ) ≥ 0. Equivalently,. H(X|Y ) ≤ H(X). Hence conditioning one random variable on another can only decrease.
  14. [14]
    [PDF] Notes 3: Stochastic channels and noisy coding theorem bound
    We now turn to the basic elements of Shannon's theory of communication over an intervening noisy channel. 1 Model of information communication and noisy channel.
  15. [15]
    [PDF] shannon.pdf - ESSRL
    In the present paper we will extend the theory to include a number of new factors, in particular the effect of noise in the channel, and the savings possible ...
  16. [16]
    [PDF] EE 376A: Information Theory Lecture Notes
    Feb 25, 2016 · 3.1 Asymptotic Equipartition Property (AEP) . ... channel coding theorem is that random coding lets us attain this upper bound.
  17. [17]
  18. [18]
    [PDF] Chapter 16: Linear Codes. Channel Capacity. - MIT OpenCourseWare
    We saw that C = Ci for stationary memoryless channels, but what other channels does this hold for? And what about non-stationary channels? To answer this ...
  19. [19]
    [PDF] A General Formula for Channel Capacity - MIT
    This was achieved by a converse whose proof involves Fano's and Chebyshev's inequalities plus a generalized Shannon-McMillan Theo- rem for periodic measures.
  20. [20]
    [PDF] Capacity Of Fading Channels With Channel Side Information
    The capacity of a fading channel with transmitter/receiver side information is achieved with "water-pouring" in time. Receiver-only information has lower ...
  21. [21]
    [PDF] Achieving log(1 + SNR) on the AWGN Channel With Lattice ... - MIT
    We then show that capacity may also be achieved using nested lattice codes, the coarse lattice serving for shaping via the modulo-lattice transformation, the.Missing: primary | Show results with:primary