
Conditional entropy

Conditional entropy, a fundamental concept in information theory, quantifies the average uncertainty or remaining information about one random variable given knowledge of another. Formally, for discrete random variables X and Y with joint probability mass function p(x,y), it is defined as H(X|Y) = -\sum_{x,y} p(x,y) \log_2 p(x|y), where p(x|y) is the conditional probability of X given Y = y. This measure extends to dependent variables and can equivalently be expressed via the chain rule as H(X|Y) = H(X,Y) - H(Y), where H(X,Y) is the joint entropy and H(Y) is the marginal entropy of Y. Introduced by Claude Shannon in his seminal 1948 paper "A Mathematical Theory of Communication," conditional entropy captures the reduction in uncertainty about X upon observing Y, playing a central role in analyzing noisy channels and data dependencies.

Key properties include non-negativity, H(X|Y) \geq 0, with equality when X is a deterministic function of Y; and an upper bound, H(X|Y) \leq H(X), with equality if and only if X and Y are independent, indicating no shared information. It also satisfies the chain rule for multiple variables: H(X_1, \dots, X_n) = \sum_{i=1}^n H(X_i | X_1, \dots, X_{i-1}), enabling decomposition of joint entropies in sequential processes. Conditional entropy is intimately linked to mutual information, defined as I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X), which measures the information shared between X and Y. This connection underpins applications in source coding, where it helps determine the minimal number of bits needed to represent X given side information Y, and in channel capacity calculations, such as C = \max_p [H(Y) - H(Y|X)], quantifying reliable transmission rates over noisy channels.

For continuous random variables, the definition generalizes to the conditional differential entropy h(X|Y) = -\iint p(x,y) \log_2 \frac{p(x,y)}{p(y)} \, dx \, dy, which retains similar properties in the continuous setting. In modern extensions, it informs entropy rates for stochastic processes and machine learning tasks involving conditional modeling.

Fundamentals

Definition

Conditional entropy quantifies the average uncertainty remaining in a discrete random variable X given knowledge of another discrete random variable Y. To define it, first recall the entropy of a single discrete random variable X with probability mass function p_X(x) over a finite or countable space: H(X) = -\sum_x p_X(x) \log_2 p_X(x). This quantity, introduced by Claude Shannon, measures uncertainty in bits and is always non-negative. The conditional entropy H(X|Y) is then given by the expectation of the conditional entropy H(X|Y=y) with respect to the marginal distribution p_Y(y) of Y: H(X|Y) = \sum_y p_Y(y) \, H(X|Y=y), where H(X|Y=y) = -\sum_x p_{X|Y}(x|y) \log_2 p_{X|Y}(x|y). Equivalently, it can be expressed in joint form using the joint probability mass function p_{X,Y}(x,y): H(X|Y) = -\sum_{x,y} p_{X,Y}(x,y) \log_2 p_{X|Y}(x|y). Here, the logarithms are base-2 to measure uncertainty in bits, and the sums run over the supports of the random variables.
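
The two equivalent forms above can be checked numerically on any small joint distribution. The following Python sketch (the joint table and helper names are arbitrary illustrations, not drawn from a particular source) computes H(X|Y) both as an average of per-outcome entropies and directly from the joint form:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (0 log 0 treated as 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Arbitrary example joint pmf p_{X,Y}(x, y): rows index x, columns index y.
p_xy = np.array([[0.25, 0.10],
                 [0.05, 0.30],
                 [0.10, 0.20]])

p_y = p_xy.sum(axis=0)                      # marginal p_Y(y)

# Form 1: H(X|Y) = sum_y p_Y(y) H(X|Y=y), with p_{X|Y}(x|y) = p_{X,Y}(x,y) / p_Y(y)
h_cond_avg = sum(p_y[j] * entropy(p_xy[:, j] / p_y[j]) for j in range(p_xy.shape[1]))

# Form 2: H(X|Y) = -sum_{x,y} p_{X,Y}(x,y) log2 p_{X|Y}(x|y)
p_x_given_y = p_xy / p_y                    # divides each column by its marginal
mask = p_xy > 0
h_cond_joint = -np.sum(p_xy[mask] * np.log2(p_x_given_y[mask]))

print(h_cond_avg, h_cond_joint)             # the two forms agree
```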

Motivation

Conditional entropy provides an intuitive measure of the average uncertainty remaining in a random variable Y even after observing another random variable X, in contrast to the unconditional entropy H(Y), which quantifies the total uncertainty without any side information. This concept captures how much additional information is needed to describe Y when X is known, reflecting the persistent randomness or unpredictability in Y despite the conditioning. To illustrate, consider the outcome of a die roll (Y) given the day of the week (X). If the die is fair and its outcome is independent of the day, the conditional entropy H(Y|X) equals H(Y) = \log_2 6 \approx 2.585 bits, indicating no reduction in uncertainty from knowing X. However, if the die is fair on weekdays but biased toward even numbers on weekends, observing X reduces the uncertainty, yielding H(Y|X) < H(Y), as the side information from X makes Y's distribution more predictable on average. The reduction in uncertainty from observing X, given by H(Y) - H(Y|X), corresponds to the mutual information I(X;Y), often termed information gain, which quantifies the shared information between X and Y. This relation highlights conditional entropy's role in assessing how much one variable reveals about another. Introduced by Claude Shannon in his seminal 1948 paper "A Mathematical Theory of Communication," conditional entropy (also called equivocation) emerged to model communication channels where side information, such as a noisy received signal, affects the uncertainty of the original message. Shannon motivated it as the "average ambiguity in the received signal," essential for determining effective transmission rates in the presence of noise. Conditional entropy is crucial in data compression, where it bounds the number of bits needed to encode sources with side information, as in Slepian–Wolf coding. In cryptography, it measures the remaining uncertainty in plaintext or keys given ciphertext or eavesdropper knowledge, underpinning security analyses such as those based on conditional min-entropy for randomness extraction. In machine learning, it supports feature selection and decision trees via information gain, enhancing predictability in models with interdependent variables.
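
As a concrete version of the die illustration, the sketch below assumes a hypothetical weekend bias toward even faces (the specific bias values are invented for illustration) and computes H(Y), H(Y|X), and the resulting information gain H(Y) - H(Y|X):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# X = day type: weekday with probability 5/7, weekend with probability 2/7.
p_weekday, p_weekend = 5 / 7, 2 / 7

# Y = die outcome. Fair on weekdays; hypothetical bias toward even faces on weekends.
p_die_weekday = np.full(6, 1 / 6)
p_die_weekend = np.array([0.05, 0.30, 0.05, 0.30, 0.05, 0.25])

# Marginal distribution of Y, then the unconditional and conditional entropies.
p_y = p_weekday * p_die_weekday + p_weekend * p_die_weekend
H_Y = entropy(p_y)
H_Y_given_X = p_weekday * entropy(p_die_weekday) + p_weekend * entropy(p_die_weekend)

print(f"H(Y)   = {H_Y:.3f} bits")
print(f"H(Y|X) = {H_Y_given_X:.3f} bits")
print(f"information gain I(X;Y) = {H_Y - H_Y_given_X:.3f} bits")
```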

Properties of Discrete Conditional Entropy

Non-Negativity and Zero Conditional Entropy

The conditional entropy H(Y|X) satisfies H(Y|X) \geq 0 for any joint probability distribution over discrete random variables X and Y. This non-negativity arises because the conditional entropy can be expressed as the expectation H(Y|X) = \sum_x p(x) H(Y \mid X = x), where each term H(Y \mid X = x) \geq 0 by the non-negativity of entropy for a fixed conditional distribution, and p(x) \geq 0 with \sum_x p(x) = 1. More formally, each summand -p \log_2 p is non-negative for p \in [0,1], so every H(Y \mid X = x) is non-negative, and hence their weighted average over the distribution of X cannot be negative. Equality holds, i.e., H(Y|X) = 0, if and only if Y is a deterministic function of X, meaning that for every x with p(x) > 0, the conditional distribution p_{Y|X}(\cdot \mid x) is degenerate (concentrated on a single outcome). In this case, knowing X completely resolves the uncertainty in Y, as there is no remaining randomness in the conditional distributions. For example, if Y = f(X) for some deterministic function f, then H(Y|X) = 0, since Y is fully determined by X with probability 1. This property underscores the role of conditional entropy in quantifying residual uncertainty after conditioning, with zero indicating perfect predictability.
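
A quick numerical check of the zero-entropy case: in the sketch below, with an arbitrarily chosen deterministic map f(x) = x mod 2, every conditional distribution of Y given X = x is degenerate, so H(Y|X) evaluates to zero:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# X uniform on {0, 1, 2, 3}; Y = f(X) = X mod 2 is a deterministic function of X.
p_x = np.full(4, 0.25)
f = lambda x: x % 2

# H(Y|X) = sum_x p(x) H(Y|X=x); each conditional distribution is degenerate.
H_Y_given_X = 0.0
for x, px in enumerate(p_x):
    p_y_given_x = np.zeros(2)
    p_y_given_x[f(x)] = 1.0          # all mass on the single value f(x)
    H_Y_given_X += px * entropy(p_y_given_x)

print(H_Y_given_X)  # 0.0: knowing X removes all uncertainty about Y
```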

Behavior Under Independence

When random variables X and Y are statistically independent, the conditional entropy H(Y|X) simplifies to the unconditional entropy H(Y). This result indicates that knowledge of X provides no reduction in the uncertainty about Y, as the side information from X is irrelevant to predicting outcomes of Y. The proof follows directly from the definition of conditional entropy. Independence implies that the conditional probability p_{Y|X}(y|x) = p_Y(y) for all x and y. Substituting into the conditional entropy formula yields: H(Y|X) = -\sum_x p_X(x) \sum_y p_{Y|X}(y|x) \log p_{Y|X}(y|x) = -\sum_x p_X(x) \sum_y p_Y(y) \log p_Y(y) = -\sum_y p_Y(y) \log p_Y(y) = H(Y), where the summation over x factors out because \sum_x p_X(x) = 1. This property has broader implications in information theory, as it establishes that the mutual information I(X;Y) = H(Y) - H(Y|X) vanishes exactly when H(Y|X) = H(Y), confirming that independence corresponds to zero information sharing between the variables. For example, consider Y as the outcome of a fair coin flip (heads or tails, each with probability 1/2) and X as the local weather condition (e.g., sunny or rainy), where the two are independent. Here, H(Y) = 1 bit, and observing the weather X does not alter the uncertainty about the coin flip, so H(Y|X) = 1 bit as well.
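
The coin-and-weather example can be reproduced directly; in the sketch below the weather marginal is an arbitrary choice, since under independence any marginal for X leaves H(Y|X) = H(Y):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Independent X (weather) and Y (fair coin): the joint pmf is the outer product.
p_x = np.array([0.7, 0.3])        # e.g. sunny / rainy (any marginal works)
p_y = np.array([0.5, 0.5])        # heads / tails
p_xy = np.outer(p_x, p_y)         # independence: p(x, y) = p(x) p(y)

H_Y = entropy(p_y)
H_Y_given_X = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(len(p_x)))

print(H_Y, H_Y_given_X)  # both equal 1.0 bit: X tells us nothing about Y
```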

Chain Rule

The chain rule for entropy expresses the joint entropy of two random variables X and Y as the sum of the entropy of X and the conditional entropy of Y given X: H(X,Y) = H(X) + H(Y|X). This relation also holds symmetrically: H(X,Y) = H(Y) + H(X|Y). To derive this, start from the definition of joint entropy: H(X,Y) = -\sum_{x,y} p(x,y) \log p(x,y). Substitute the chain rule for probability, p(x,y) = p(x) p(y|x), into the logarithm: \log p(x,y) = \log p(x) + \log p(y|x). Thus, H(X,Y) = -\sum_{x,y} p(x,y) [\log p(x) + \log p(y|x)] = -\sum_{x,y} p(x,y) \log p(x) - \sum_{x,y} p(x,y) \log p(y|x). The first term simplifies to H(X), and the second to H(Y|X), yielding the chain rule. This rule extends to multiple random variables X_1, \dots, X_n: H(X_1, \dots, X_n) = H(X_1) + \sum_{i=2}^n H(X_i \mid X_1, \dots, X_{i-1}). The extension follows by iterative application of the two-variable case. The chain rule is particularly useful in applications such as computing the entropy rate of stochastic processes, where it decomposes the uncertainty in predicting future outcomes based on past observations, and in modeling dependencies within Markov chains, where conditional entropies capture transition uncertainties. In general, the rule holds for any finite number of discrete random variables, facilitating recursive computation of joint entropies from conditional components.
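
The two-variable chain rule is easy to verify numerically. The following sketch, using an arbitrary joint distribution, compares H(X,Y) against H(X) + H(Y|X):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (flattened automatically)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Arbitrary joint pmf over (X, Y); rows index x, columns index y.
p_xy = np.array([[0.20, 0.15, 0.05],
                 [0.10, 0.30, 0.20]])

p_x = p_xy.sum(axis=1)
H_joint = entropy(p_xy)
H_X = entropy(p_x)
H_Y_given_X = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(len(p_x)))

print(H_joint, H_X + H_Y_given_X)  # the two sides of H(X,Y) = H(X) + H(Y|X) agree
```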

Relation to Bayes' Rule and Mutual Information

Conditional entropy plays a central role in defining mutual information, a measure of the shared information between two random variables X and Y. Specifically, the mutual information I(X; Y) is given by the difference between the marginal entropy of Y and its conditional entropy given X: I(X; Y) = H(Y) - H(Y \mid X). This expression quantifies the reduction in uncertainty about Y upon learning X. Due to the symmetry in the underlying joint distribution, mutual information can equivalently be expressed using the conditional entropy of X given Y: I(X; Y) = H(X) - H(X \mid Y). This symmetry highlights that mutual information captures the bidirectional dependence between the variables. Furthermore, I(X; Y) is always non-negative, I(X; Y) \geq 0, with equality holding precisely when X and Y are independent, in which case the conditional entropy equals the marginal entropy. An alternative formulation arises from the chain rule for joint entropy, H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y), leading to I(X; Y) = H(X) + H(Y) - H(X, Y). This form emphasizes mutual information as the amount by which the sum of the marginal entropies exceeds the joint entropy, reflecting the dependence structure. Bayes' rule, which relates conditional probabilities via P(X \mid Y) = \frac{P(Y \mid X) P(X)}{P(Y)}, underpins the probabilistic conditioning in these entropy measures, enabling the computation of posteriors that inform the conditional distributions used in the definitions. The interpretation of mutual information as the entropy reduction due to conditioning is fundamental to methods like the information bottleneck, which seeks to compress input data while preserving relevant information about an output by minimizing I(X; T) subject to a constraint on I(T; Y), where T is a compressed representation. This approach balances compression and predictive power, with applications in feature extraction and neural network compression. As an illustrative example, consider a binary symmetric channel (BSC) with input X \in \{0, 1\} drawn uniformly and crossover probability p < 0.5, where the output Y equals X with probability 1 - p and flips with probability p. The mutual information I(X; Y) measures the transmitted information and equals 1 - h_2(p), where h_2(p) = -p \log_2 p - (1-p) \log_2 (1-p) is the binary entropy function. For p = 0, I(X; Y) = 1 bit (perfect transmission), while for p = 0.5, I(X; Y) = 0 (no information transmitted). This capacity expression demonstrates how conditional entropy H(Y \mid X) = h_2(p) limits the reliable information flow.
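
The BSC example can be reproduced with a few lines of Python; the sketch below (helper names are illustrative) evaluates I(X;Y) = H(Y) - H(Y|X) for a uniform input and confirms it matches 1 - h_2(p):

```python
import numpy as np

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_mutual_information(p):
    """I(X;Y) for a binary symmetric channel with uniform input and crossover p."""
    H_Y = 1.0            # a uniform input keeps the output uniform, so H(Y) = 1 bit
    H_Y_given_X = h2(p)  # given X = x, the output is flipped with probability p
    return H_Y - H_Y_given_X

for p in (0.0, 0.1, 0.25, 0.5):
    print(p, bsc_mutual_information(p), 1 - h2(p))  # the two columns agree
```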

Additional Properties

One key property of discrete conditional entropy is its monotonicity under additional conditioning. Specifically, for random variables X, Y, and Z, the inequality H(Y \mid X, Z) \leq H(Y \mid X) holds, indicating that conditioning on more information (via Z) cannot increase the uncertainty in Y given X. This follows from the non-negativity of the conditional mutual information I(Y; Z \mid X), where Z represents additional information potentially relevant to Y. Equality is achieved when Y and Z are conditionally independent given X. Another important inequality is the subadditivity of conditional entropy. For random variables Y_1, Y_2, and X, it satisfies H(Y_1, Y_2 \mid X) \leq H(Y_1 \mid X) + H(Y_2 \mid X), meaning the conditional entropy of the joint distribution is bounded above by the sum of the individual conditional entropies. This property arises from the chain rule for entropy and holds with equality if and only if Y_1 and Y_2 are conditionally independent given X. It plays a role in multi-user coding scenarios, such as Slepian–Wolf coding. Conditioning also reduces entropy on average, as expressed by H(Y \mid X) \leq H(Y), with equality if and only if X and Y are independent. This fundamental inequality reflects that knowledge of X decreases the uncertainty in Y by an amount equal to their mutual information, I(X; Y) \geq 0. It underpins many results in source coding and rate-distortion theory. Regarding measure-theoretic subtleties, the conditional entropy H(Y \mid X) is well defined because the underlying conditional distribution P_{Y \mid X} is unique up to sets of measure zero. This ensures that the entropy, computed as an expectation over these distributions, remains invariant under such null-set modifications.
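
The first inequality, H(Y \mid X, Z) \leq H(Y \mid X), can be spot-checked on randomly generated distributions, as in the sketch below (the alphabet sizes and random seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy in bits of a probability array (flattened automatically)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def cond_entropy(p_joint, target_axis):
    """H(target | other variables) = H(all) - H(others) for a joint pmf array."""
    p_others = p_joint.sum(axis=target_axis)   # joint pmf of the conditioning variables
    return entropy(p_joint) - entropy(p_others)

# Random joint pmf over (X, Y, Z) with small alphabets.
p_xyz = rng.random((3, 4, 2))
p_xyz /= p_xyz.sum()

p_xy = p_xyz.sum(axis=2)                       # marginalize out Z

H_Y_given_XZ = cond_entropy(p_xyz, target_axis=1)   # H(Y | X, Z)
H_Y_given_X = cond_entropy(p_xy, target_axis=1)     # H(Y | X)

print(H_Y_given_XZ, H_Y_given_X)  # extra conditioning never increases the entropy
assert H_Y_given_XZ <= H_Y_given_X + 1e-12
```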

Conditional Differential Entropy

Definition

The conditional differential entropy extends the concept of conditional entropy from discrete random variables to continuous ones, measuring the average uncertainty in a continuous random variable Y given knowledge of another continuous random variable X. To define it, first recall the differential entropy of a single continuous random variable Y with probability density function p_Y(y) over a continuous space such as \mathbb{R}^n: h(Y) = -\int p_Y(y) \log_2 p_Y(y) \, dy. This quantity, introduced by Shannon, differs from the discrete entropy in that it is defined using integrals rather than sums and can take negative values, reflecting the relative nature of densities in continuous spaces. The conditional differential entropy h(Y|X) is then given by the expectation of the conditional differential entropy h(Y|X=x) with respect to the density p_X(x) of X: h(Y|X) = \int p_X(x) \, h(Y|X=x) \, dx, where h(Y|X=x) = -\int p_{Y|X}(y|x) \log_2 p_{Y|X}(y|x) \, dy. Equivalently, it can be expressed in joint form using the joint density p_{X,Y}(x,y): h(Y|X) = -\iint p_{X,Y}(x,y) \log_2 p_{Y|X}(y|x) \, dx \, dy. Here, the logarithms are base-2 to measure uncertainty in bits, and the integrals are over the supports of the densities in continuous spaces like \mathbb{R}^n.
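
For jointly Gaussian variables the conditional differential entropy has a well-known closed form, h(Y|X) = \tfrac{1}{2}\log_2\big(2\pi e\,\sigma_Y^2(1-\rho^2)\big), which the sketch below (with unit variances and an arbitrary correlation \rho) checks against the chain-rule expression h(X,Y) - h(X):

```python
import numpy as np

def gaussian_diff_entropy(cov):
    """Differential entropy in bits of a multivariate Gaussian with covariance cov."""
    cov = np.atleast_2d(cov)
    n = cov.shape[0]
    return 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(cov))

# Jointly Gaussian (X, Y) with unit variances and correlation rho.
rho = 0.8
cov_xy = np.array([[1.0, rho],
                   [rho, 1.0]])

h_joint = gaussian_diff_entropy(cov_xy)            # h(X, Y)
h_x = gaussian_diff_entropy(cov_xy[:1, :1])        # h(X)
h_y_given_x = h_joint - h_x                        # chain rule: h(Y|X) = h(X,Y) - h(X)

# Closed form: h(Y|X) = 0.5 * log2(2*pi*e * var(Y|X)) with var(Y|X) = 1 - rho^2.
closed_form = 0.5 * np.log2(2 * np.pi * np.e * (1 - rho ** 2))

print(h_y_given_x, closed_form)  # the two expressions agree
```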

Key Properties

The conditional differential entropy h(Y \mid X) shares several properties with its discrete counterpart H(Y \mid X), but exhibits distinct behaviors due to the continuous nature of the underlying distributions, assuming the joint distribution of X and Y admits a density with respect to a product measure (absolute continuity). A fundamental property is the chain rule, which states that the joint differential entropy equals the marginal plus the conditional: h(X, Y) = h(X) + h(Y \mid X) = h(Y) + h(X \mid Y). This holds under the absolute continuity condition and mirrors the discrete chain rule H(X, Y) = H(X) + H(Y \mid X). The conditional differential entropy relates directly to the joint and marginal entropies via h(Y \mid X) = h(X, Y) - h(X), analogous to the discrete relation H(Y \mid X) = H(X, Y) - H(X). If X and Y are independent, then h(Y \mid X) = h(Y), reflecting that knowledge of X provides no additional information about Y. Unlike the discrete case, where H(Y \mid X) \geq 0, the conditional differential entropy h(Y \mid X) can be negative. This occurs when the conditional density f_{Y \mid X} is highly concentrated, such as for a uniform distribution on an interval of length less than 1; for example, if Y \mid X = x is uniform on [0, a] with a < 1, then h(Y \mid X = x) = \log_2 a < 0. Conditioning generally reduces uncertainty, so h(Y \mid X, Z) \leq h(Y \mid X), following from the non-negativity of I(Y; Z \mid X) \geq 0; equality holds when Y and Z are conditionally independent given X, but, because differential entropies can be negative, the inequality does not provide the same absolute bounds as in the discrete setting. The conditional differential entropy is translation invariant: h(Y + c \mid X) = h(Y \mid X) for any constant c, as shifting Y does not alter the shape of the density in the entropy integral. However, it depends on units of measurement, scaling as h(aY \mid X) = h(Y \mid X) + \log_2 |a| for scalar a \neq 0, which highlights its sensitivity to the choice of reference measure, unlike the discrete entropy, which is invariant under bijective relabeling of outcomes.
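
The negativity and scaling behaviors are visible in the uniform example; the sketch below evaluates the differential entropy \log_2 a of a Uniform[0, a] density for a few interval lengths and checks the \log_2|s| shift under scaling (the particular values of a and s are arbitrary):

```python
import numpy as np

# Differential entropy of a Uniform[0, a] random variable is log2(a) bits,
# which is negative whenever a < 1 (a narrow, concentrated density).
for a in (2.0, 1.0, 0.25):
    print(f"a = {a}: h = {np.log2(a):+.3f} bits")

# Scaling behavior: multiplying by s shifts the entropy by log2|s|.
a, s = 0.25, 4.0
h_original = np.log2(a)
h_scaled = np.log2(s * a)        # s * Uniform[0, a] is Uniform[0, s*a]
print(h_scaled - h_original, np.log2(s))   # the difference equals log2|s|
```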

Relation to Estimation Error

In the context of estimating a continuous random variable Y from an observation X, the minimum mean squared error (MMSE) is defined as \mathrm{MMSE} = \mathbb{E}[(Y - \mathbb{E}[Y|X])^2] = \mathbb{E}[\mathrm{Var}(Y|X)]. A fundamental lower bound from information theory states that \mathrm{MMSE} \geq \frac{2^{2 h(Y|X)}}{2 \pi e}. This inequality arises because, for any random variable Z, the variance satisfies \mathrm{Var}(Z) \geq \frac{2^{2 h(Z)}}{2 \pi e}, with equality if and only if Z is Gaussian; applying this conditionally to the error Y - \mathbb{E}[Y|X] and using Jensen's inequality on the convex function 2^{2h} yields the bound. An alternative derivation leverages de Bruijn's identity, which connects the evolution of differential entropy under additive Gaussian noise to the Fisher information J: \frac{d}{dt} h(X + \sqrt{t} N) = \frac{1}{2} J(X + \sqrt{t} N), where N \sim \mathcal{N}(0, I). Combined with the Cramér-Rao bound, which lower-bounds the estimation variance by the reciprocal of the Fisher information, this establishes that higher conditional entropy corresponds to greater inherent uncertainty, limiting the accuracy of any estimator. Thus, the conditional differential entropy quantifies a fundamental limit on estimation precision, independent of the specific estimation method. Consider the additive Gaussian noise channel Y = X + N, where N \sim \mathcal{N}(0, \sigma^2) is independent of X. Here, h(Y|X) = h(N) = \frac{1}{2} \log_2(2\pi e \sigma^2), and the MMSE equals \sigma^2, achieving equality in the bound since the conditional error is Gaussian. This example illustrates how noise variance directly ties to conditional entropy and MSE in linear estimation settings. Beyond direct estimation, the bound informs rate-distortion theory for source coding with side information at the decoder, where the minimal rate to achieve distortion D (e.g., MSE) is R(D) = \min I(X; \hat{X} \mid Z), with the minimum taken over conditional distributions satisfying \mathbb{E}[d(X, \hat{X})] \leq D and Z as side information; the bound constrains the achievable D relative to h(X|Z). Such connections were pioneered in the 1960s–1970s by Pinsker and contemporaries, applying information measures to estimation and statistical problems.
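
The sketch below evaluates the bound \mathrm{MMSE} \geq 2^{2 h(Y|X)}/(2 \pi e) in two simple additive-noise settings: Gaussian noise, where the bound is tight, and (as an extra illustration not taken from the text) uniform noise, where it is strict; the noise parameters are arbitrary:

```python
import numpy as np

def mmse_lower_bound(h_cond_bits):
    """Entropy-based lower bound on MMSE: 2^(2 h(Y|X)) / (2*pi*e)."""
    return 2 ** (2 * h_cond_bits) / (2 * np.pi * np.e)

# Case 1: additive Gaussian noise, Y = X + N with N ~ N(0, sigma^2).
sigma2 = 0.5
h_cond = 0.5 * np.log2(2 * np.pi * np.e * sigma2)   # h(Y|X) = h(N)
print(sigma2, mmse_lower_bound(h_cond))             # bound is tight: both equal sigma^2

# Case 2: additive uniform noise, Y = X + U with U ~ Uniform[0, w] (non-Gaussian error).
w = 1.0
h_cond = np.log2(w)                                  # h(Y|X) = h(U) = log2(w)
mmse = w ** 2 / 12                                   # conditional variance of U
print(mmse, mmse_lower_bound(h_cond))                # bound is strict: ~0.083 > ~0.059
```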

Quantum Conditional Entropy

Definition in Quantum Information Theory

In quantum information theory, the conditional entropy generalizes the classical notion to quantum systems described by density operators. For a bipartite quantum state represented by the density operator \rho_{AB} acting on the tensor product Hilbert space \mathcal{H}_A \otimes \mathcal{H}_B, the quantum conditional entropy of subsystem A given subsystem B is defined as H(A|B)_{\rho} = H(\rho_{AB}) - H(\rho_B), where H(\cdot) denotes the von Neumann entropy, given by H(\rho) = -\operatorname{Tr}(\rho \log \rho) for a density operator \rho, and \rho_B = \operatorname{Tr}_A(\rho_{AB}) is the reduced density operator on \mathcal{H}_B obtained via the partial trace over \mathcal{H}_A. This definition parallels the classical conditional entropy H(Y|X), with subsystems A and B playing roles analogous to the random variables Y and X, respectively. The von Neumann entropy itself serves as the quantum analog of the Shannon entropy, quantifying the uncertainty or mixedness in a quantum state. In the classical limit, where \rho_{AB} is diagonal in a product basis (corresponding to a classical joint probability distribution), the quantum conditional entropy reduces precisely to the classical conditional entropy H(Y|X). This recovery ensures consistency between the quantum and classical frameworks when quantum superpositions and coherences are absent. A key distinction from the classical case arises because the quantum conditional entropy H(A|B)_{\rho} can take negative values, which occurs only for entangled states and signifies stronger-than-classical correlations between subsystems A and B. Such negativity has no direct classical analog and highlights the role of entanglement in quantum information processing.
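
A minimal numerical sketch of the definition, assuming two-qubit systems and using an illustrative partial-trace helper, computes H(A|B)_\rho for a maximally entangled Bell state (giving -1) and for a classically correlated mixture (giving 0):

```python
import numpy as np

def von_neumann_entropy(rho):
    """Von Neumann entropy in bits (qubits): -Tr(rho log2 rho)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return -np.sum(evals * np.log2(evals))

def partial_trace_A(rho_ab, dim_a=2, dim_b=2):
    """Reduced state rho_B = Tr_A(rho_AB) for a bipartite system."""
    rho = rho_ab.reshape(dim_a, dim_b, dim_a, dim_b)   # indices (a, b, a', b')
    return np.einsum('ijik->jk', rho)                  # sum over a = a'

def conditional_entropy(rho_ab):
    """H(A|B) = H(AB) - H(B)."""
    return von_neumann_entropy(rho_ab) - von_neumann_entropy(partial_trace_A(rho_ab))

# Maximally entangled Bell state |Phi+> = (|00> + |11>)/sqrt(2).
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho_bell = np.outer(phi_plus, phi_plus.conj())

# Classically correlated state: equal mixture of |00><00| and |11><11|.
rho_cc = 0.5 * np.diag([1.0, 0, 0, 0]) + 0.5 * np.diag([0, 0, 0, 1.0])

print(conditional_entropy(rho_bell))  # -1.0: negative, a signature of entanglement
print(conditional_entropy(rho_cc))    #  0.0: consistent with the classical case
```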

Distinct Properties and Interpretations

One distinctive feature of quantum conditional entropy is its capacity to take negative values, unlike its classical counterpart, which is always non-negative. For a bipartite quantum state \rho_{AB}, the conditional entropy H(A|B) = H(AB) - H(B) can be negative only if \rho_{AB} is entangled, since separable states always yield non-negative values; for pure states, it is negative if and only if the state is entangled, and negativity in general implies that the entanglement is distillable. This negativity arises because the joint von Neumann entropy H(AB) can be smaller than the marginal entropy H(B), implying that the correlations in \rho_{AB} reduce the overall uncertainty beyond what the subsystem B alone suggests; such a phenomenon is impossible in classical systems, so negativity serves as an entanglement witness. The negative conditional entropy quantifies "quantum partial information," indicating that subsystem A provides more information about B than is possible classically, facilitating tasks like quantum state merging, where a negative value means the merging can be completed without sending qubits while pure entanglement is gained. The negativity of H(A|B) is intimately linked to the coherent information, defined for a state \rho_{AB} as I_c(A \rangle B) = H(B) - H(AB) = -H(A|B). This equivalence positions negative conditional entropy as a measure of the potential for quantum communication: maximized over channel inputs, the coherent information gives an achievable rate for reliably transmitting quantum information through noisy channels. In entangled systems, the negative value signals that correlations enable distillation of pure entanglement, enhancing communication efficiency beyond classical limits. Quantum conditional entropy satisfies strong subadditivity, which in conditional form reads H(A \mid B, C) \leq H(A \mid B) for any tripartite state \rho_{ABC}. This inequality, proven using operator inequalities for density matrices, ensures that conditioning on additional subsystems cannot increase the conditional entropy, reflecting the monotone behavior of quantum correlations under partial tracing. It plays a foundational role in entropy inequalities and is equivalent to the non-negativity of the quantum conditional mutual information, I(A:C|B) \geq 0. The chain rule for quantum conditional entropy holds exactly as H(A,B|C) = H(A|C) + H(B|A,C), mirroring the classical form but applicable to non-commuting quantum observables. This additivity allows decomposition of multipartite entropies, essential for analyzing complex quantum networks without additional quantum-specific corrections. In quantum communication, negative conditional entropies bound channel capacities; the quantum capacity of a channel \mathcal{N} is given (in regularized form) by maximizing the coherent information, Q(\mathcal{N}) = \max_{\rho} I_c(A \rangle B), where B is the channel output, directly leveraging -H(A|B) to quantify achievable transmission rates. For quantum error correction, conditional entropy characterizes code performance: a code can correct errors if it preserves low conditional entropy between the logical and physical degrees of freedom, ensuring information recovery with rates tied to entropy deficits. In entanglement theory, squashed entanglement, defined as E_{sq}(A:B) = \frac{1}{2} \inf I(A:B|E) over extensions \rho_{ABE}, uses conditional mutual information derived from conditional entropies to measure secure entanglement, providing monogamy bounds relevant to quantum key distribution. Recent applications in quantum thermodynamics highlight conditional entropy's role in open quantum systems, where it quantifies irreversibility via conditional entropy production, capturing dissipative information exchange between a system S and a reference R interacting indirectly through the environment. In non-equilibrium Gaussian processes, negative conditional entropies enable fluctuation theorems that bound work extraction, revealing thermodynamic costs of maintaining quantum correlations in open dynamics. These insights, emerging post-2010, extend to collisional models and related protocols, where conditional entropy production can remain positive even at steady state, signaling hidden informational nonequilibria.
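
Strong subadditivity in the conditional form H(A|BC) \leq H(A|B) can be spot-checked numerically; the sketch below draws a random three-qubit density matrix (the Ginibre-style construction and helper names are illustrative choices) and compares the two conditional entropies:

```python
import numpy as np

rng = np.random.default_rng(1)

def von_neumann_entropy(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return -np.sum(evals * np.log2(evals))

def random_density_matrix(dim):
    """Random mixed state rho = G G† / Tr(G G†) from a complex Gaussian matrix G."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho)

# Random three-qubit state rho_ABC, reshaped so each subsystem has its own index pair.
rho_abc = random_density_matrix(8)
t = rho_abc.reshape(2, 2, 2, 2, 2, 2)                 # indices (a, b, c, a', b', c')

rho_bc = np.einsum('abcade->bcde', t).reshape(4, 4)   # trace out A
rho_ab = np.einsum('abcdec->abde', t).reshape(4, 4)   # trace out C
rho_b = np.einsum('abcadc->bd', t)                    # trace out A and C

H = von_neumann_entropy
H_A_given_BC = H(rho_abc) - H(rho_bc)
H_A_given_B = H(rho_ab) - H(rho_b)

print(H_A_given_BC, H_A_given_B)   # strong subadditivity: H(A|BC) <= H(A|B)
assert H_A_given_BC <= H_A_given_B + 1e-9
```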
