
Pointwise mutual information

Pointwise mutual information (PMI) is a measure of association between two events drawn from random variables, quantifying how much more (or less) information the occurrence of one event provides about the other compared to what would be expected if the events were independent. Formally defined as \text{PMI}(x; y) = \log_2 \frac{P(x,y)}{P(x) P(y)}, where P(x,y) is the joint probability of x and y, and P(x) and P(y) are their marginal probabilities, it was introduced by Kenneth Ward Church and Patrick Hanks in 1989 to statistically derive word association norms from large text corpora. Estimated using counts from a corpus—such as P(x) = f(x)/N, where f(x) is the frequency of x and N is the total number of observations—PMI enables objective analysis of co-occurrences, replacing subjective psychological association tests. In natural language processing (NLP), PMI serves as a foundational tool for identifying significant word co-occurrences, such as in collocation extraction (e.g., "strong tea" over "powerful tea") and constructing term co-occurrence matrices for distributional semantics, where higher PMI values indicate stronger semantic or syntactic associations. It has further applications in lexicography and corpus analysis, for example highlighting lexico-syntactic patterns like verb-preposition pairs within defined context windows (e.g., five words). Despite its utility, PMI can produce negative values for unlikely co-occurrences (indicating repulsion) and is biased toward rare events; to mitigate this, variants like positive PMI (PPMI), defined as \max(\text{PMI}(w,c), 0), discard negative values, while normalized PMI (NPMI), given by \text{NPMI}(x;y) = \frac{\text{PMI}(x;y)}{-\log_2 P(x,y)} and ranging from -1 to 1, provides a bounded scale with 0 for independence and better handling of low frequencies.

Introduction and Definition

Historical Background

Pointwise mutual information (PMI) traces its origins to the foundational work in information theory established by Claude Shannon in 1948, where mutual information was introduced as a measure of the average amount of information one random variable provides about another, quantified as an expectation of log-probability ratios over the joint distribution. This average formulation laid the groundwork for understanding dependencies in communication systems, but it was not until 1961 that Robert Fano explicitly formalized the pointwise counterpart in his book Transmission of Information: A Statistical Theory of Communications, where he described the instantaneous, event-specific measure of association between outcomes, initially referring to it under the term "mutual information" (now reserved for the average). Fano's contribution extended Shannon's framework by focusing on the log-ratio of joint to marginal probabilities for specific events, enabling applications in statistical communication analysis and early information-processing tasks during the 1960s and 1970s. Warren Weaver played a key role in interpreting and disseminating these concepts through his collaboration with Shannon, co-authoring The Mathematical Theory of Communication in 1949, which popularized information theory as a tool for analyzing redundancy and dependency in signals, thereby bridging technical theory with broader scientific interpretations. In the decades following, the PMI measure appeared in statistical contexts alongside other association metrics, such as log-likelihood ratios, for testing dependencies in contingency tables. The concept gained prominence in computational linguistics with the seminal paper by Kenneth Ward Church and Patrick Hanks in 1989, who coined the term "pointwise mutual information" and applied it to measure word associations in large corpora, demonstrating its utility for identifying collocations and informing lexicography by quantifying how much more often words co-occur than expected under independence. This work marked a pivotal adoption in natural language processing, shifting from theoretical statistics to practical, corpus-based methods in the 1990s, where PMI became integral to tasks such as phrase extraction and semantic analysis. Mutual information, as the expected value of PMI, underscores this historical progression from average dependencies to pointwise specificity.

Formal Definition

Pointwise mutual information (PMI) between two discrete events x and y is formally defined as \text{PMI}(x; y) = \log_2 \left( \frac{P(x,y)}{P(x) P(y)} \right), where P(x,y) denotes the joint probability that both events x and y occur simultaneously, and P(x) and P(y) are the respective marginal probabilities that x or y occurs alone. This formulation originates from information theory, where it serves as the pointwise contribution to the overall mutual information between random variables. PMI measures the deviation of the observed co-occurrence probability P(x,y) from the probability expected under statistical independence, P(x) P(y), on a per-event basis rather than as an average across all possible events. A positive value indicates that x and y co-occur more frequently than independence would predict, suggesting a positive association; a value of zero signifies exact independence; and a negative value implies co-occurrence less frequent than expected, indicating repulsion or negative association. The PMI can be expressed in terms of self-information as \text{PMI}(x; y) = I(x) + I(y) - I(x,y), where I(z) = -\log_2 P(z) is the self-information of event z. The range of PMI extends from -\infty, which occurs when P(x,y) = 0 but P(x) > 0 and P(y) > 0 (complete repulsion, as the events never co-occur despite individual occurrences), to a finite upper bound of \log_2 \left[ \min\left(1/P(x), 1/P(y)\right) \right], achieved when P(x,y) reaches its theoretical maximum of \min(P(x), P(y)) (maximum possible association). Because the defining formula is symmetric in x and y, \text{PMI}(x; y) = \text{PMI}(y; x). To illustrate, consider a small corpus of 100 word tokens where the target word "ice" appears 5 times, the context word "cream" appears 4 times, and the pair "ice cream" appears together 2 times. The probabilities are P(\text{ice}) = 0.05, P(\text{cream}) = 0.04, and P(\text{ice}, \text{cream}) = 0.02. Thus, \text{PMI}(\text{ice}; \text{cream}) = \log_2 \left( \frac{0.02}{0.05 \times 0.04} \right) = \log_2 10 \approx 3.32, indicating a strong positive association. For comparison, if the co-occurrence were exactly what independence predicts (P(\text{ice}, \text{cream}) = 0.002), then \text{PMI} = 0. The following contingency table summarizes the example counts (with "not cream" comprising the remaining 96 tokens):
|         | cream | not cream | Total |
|---------|-------|-----------|-------|
| ice     | 2     | 3         | 5     |
| not ice | 2     | 93        | 95    |
| Total   | 4     | 96        | 100   |
From this table, \text{PMI}(\text{ice}; \text{not cream}) = \log_2 \left( \frac{0.03}{0.05 \times 0.96} \right) \approx -0.68, reflecting a mild negative association.
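
The calculation above can be reproduced programmatically. The following is a minimal sketch; the `pmi_from_counts` helper is illustrative and the counts mirror the ice/cream table:

```python
import math

def pmi_from_counts(joint_count: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information (base 2) from raw co-occurrence counts."""
    p_xy = joint_count / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Counts from the "ice"/"cream" example: N = 100 tokens.
print(pmi_from_counts(2, 5, 4, 100))    # PMI(ice; cream)      = 3.32
print(pmi_from_counts(3, 5, 96, 100))   # PMI(ice; not cream)  = -0.68
```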

Theoretical Foundations

Relation to Mutual Information

Pointwise mutual information (PMI) serves as the building block for mutual information (MI), where MI represents the expected value of PMI over the joint distribution of two random variables X and Y. Specifically, the mutual information is given by I(X; Y) = \mathbb{E}[\text{PMI}(x; y)] = \sum_{x, y} P(x, y) \cdot \text{PMI}(x; y), where the summation weights each pointwise value by the joint probability, effectively averaging the local associations to yield a global measure of dependence. This relationship stems from the definitions of both quantities in information theory. The PMI for specific outcomes x and y is defined as \text{PMI}(x; y) = \log_2 \frac{P(x, y)}{P(x) P(y)}, which can be expressed in terms of self-information—the negative logarithm of the probability of an event—as \text{PMI}(x; y) = I(x) + I(y) - I(x, y), where I(x) = -\log_2 P(x) is the self-information of x, I(y) = -\log_2 P(y) for y, and I(x, y) = -\log_2 P(x, y) is the joint self-information. This formulation highlights how PMI quantifies the reduction in uncertainty about one event given the other at the instance level, mirroring the interpretive role of MI but without averaging. Both PMI and MI measure the degree of statistical dependence between variables, with positive values indicating associations stronger than expected under independence and values near zero suggesting independence; however, PMI can be negative for events less likely to co-occur than expected under independence, while MI is always non-negative, since it equals the Kullback-Leibler divergence between the joint distribution and the product of the marginals. In practice, MI assesses the overall dependence across the entire distribution, making it suitable for characterizing probabilistic relationships globally, whereas PMI focuses on specific pairs, often amplifying the signal for rare events because the logarithmic scaling magnifies deviations involving low-probability occurrences. To derive the connection, consider the definition of mutual information: I(X; Y) = \sum_{x, y} P(x, y) \log_2 \frac{P(x, y)}{P(x) P(y)}. Substituting the PMI expression directly yields the equivalence, as the sum is precisely the expected value of PMI under the joint distribution P(x, y); the weighting by P(x, y) ensures the average reflects the distribution's structure, providing a foundational link between local and global measures.
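
The identity I(X; Y) = \mathbb{E}[\text{PMI}(x; y)] can be checked numerically on any small joint distribution. A minimal sketch (the 2x2 joint table below is invented purely for illustration):

```python
import math

# Hypothetical joint distribution P(x, y) over two binary variables.
joint = {
    (0, 0): 0.40, (0, 1): 0.10,
    (1, 0): 0.15, (1, 1): 0.35,
}

# Marginals P(x) and P(y).
p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

def pmi(x, y):
    return math.log2(joint[(x, y)] / (p_x[x] * p_y[y]))

# Mutual information as the P(x,y)-weighted average of the pointwise values.
mi = sum(p * pmi(x, y) for (x, y), p in joint.items())
print({(x, y): round(pmi(x, y), 3) for (x, y) in joint})  # local associations (can be negative)
print(round(mi, 4))                                        # global dependence, always >= 0
```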

Chain Rule

The chain rule for pointwise mutual information (PMI) decomposes the association between a single event and a joint event into successive conditional associations, mirroring the structure of the chain rule in information theory. For random variables X, Y, and Z, this is expressed as \text{PMI}(X; YZ) = \text{PMI}(X; Y) + \text{PMI}(X; Z \mid Y), where the conditional PMI is defined as \text{PMI}(X; Z \mid Y) = \log_2 \frac{P(X,Z \mid Y)}{P(X \mid Y) P(Z \mid Y)}. This relation follows directly from the chain rule of probability applied to the joint distribution P(X,Y,Z), and it holds pointwise just as the analogous decomposition does for mutual information.[](https://www.wiley.com/en-us/Elements+of+Information+Theory%2C+2nd+Edition-p-9780471241959) The interpretation of this decomposition is that \text{PMI}(X; Y) captures the base association between X and Y, while \text{PMI}(X; Z \mid Y) quantifies the additional information that Z provides about X once Y is known, revealing sequential dependencies in the data.[](https://arxiv.org/abs/2507.15372) This mirrors the entropy chain rule, where joint uncertainty is broken into marginal and conditional components, but applied here to specific realizations rather than expectations.[](https://www.wiley.com/en-us/Elements+of+Information+Theory%2C+2nd+Edition-p-9780471241959) The chain rule extends iteratively to n variables, allowing a full decomposition of the association as \text{PMI}(X_1; X_2 \dots X_n) = \sum_{i=2}^n \text{PMI}(X_1; X_i \mid X_2 \dots X_{i-1}), by repeated application of the bivariate form.[](https://arxiv.org/abs/2507.15372) To illustrate, consider a simple three-event scenario analogous to word trigrams in a corpus, with events x, y, z having P(x) = 0.05, P(y) = 0.1, P(z) = 0.1, P(yz) = 0.05, P(xy) = 0.04, and P(xyz) = 0.025. Then \text{PMI}(x; y) = \log_2 \frac{0.04}{0.05 \times 0.1} = \log_2 8 = 3 bits. The conditional probabilities are P(x \mid y) = 0.4, P(z \mid y) = 0.5, and P(xz \mid y) = 0.25, so \text{PMI}(x; z \mid y) = \log_2 \frac{0.25}{0.4 \times 0.5} = \log_2 1.25 \approx 0.32 bits. Finally, \text{PMI}(x; yz) = \log_2 \frac{0.025}{0.05 \times 0.05} = \log_2 10 \approx 3.32 bits, confirming 3 + 0.32 = 3.32. This iterative decomposition proves useful for hierarchical models or sequential data analysis, where complex associations can be unraveled into layered conditional dependencies.[](https://core.ac.uk/download/pdf/40026500.pdf)
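
A quick numerical check of the decomposition, using the trigram-style probabilities from the example above (a minimal sketch; variable names are illustrative):

```python
import math

# Probabilities from the worked trigram-style example.
p_x, p_y = 0.05, 0.1
p_xy, p_yz, p_xyz = 0.04, 0.05, 0.025

pmi_x_y = math.log2(p_xy / (p_x * p_y))                      # PMI(x; y)      = 3.0
pmi_x_z_given_y = math.log2((p_xyz / p_y) /                   # PMI(x; z | y)  = 0.32
                            ((p_xy / p_y) * (p_yz / p_y)))
pmi_x_yz = math.log2(p_xyz / (p_x * p_yz))                    # PMI(x; yz)     = 3.32

# Chain rule: PMI(x; yz) = PMI(x; y) + PMI(x; z | y)
assert abs(pmi_x_yz - (pmi_x_y + pmi_x_z_given_y)) < 1e-9
print(pmi_x_y, pmi_x_z_given_y, pmi_x_yz)
```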

Variants

Positive PMI

Positive pointwise mutual information (PPMI) addresses a limitation of the standard pointwise mutual information (PMI) by transforming negative values to zero, thereby focusing exclusively on positive associations between events. Formally, it is defined as \text{PPMI}(x;y) = \max\left( \text{PMI}(x;y), 0 \right), where \text{PMI}(x;y) = \log_2 \frac{P(x,y)}{P(x) P(y)} measures the deviation from independence in the joint probability P(x,y) relative to the marginals P(x) and P(y). The motivation for PPMI arises from the fact that negative PMI values occur when P(x,y) < P(x) P(y), indicating co-occurrences less frequent than expected under independence; these often reflect mere independence or mutual exclusion rather than useful signals for association, potentially adding noise in tasks like word similarity computation or collocation detection. By thresholding at zero, PPMI discards such cases, yielding a measure that highlights only attractive dependencies while avoiding the interpretive challenges of negative scores. Key properties of PPMI include its non-negativity, ensuring all values are \geq 0, and symmetry, \text{PPMI}(x;y) = \text{PPMI}(y;x), inherited from PMI. Unlike PMI, which ranges from -\infty to a finite upper bound (namely -\log_2 \max(P(x), P(y)), attained when the joint probability equals the smaller marginal), PPMI is bounded below by 0 but retains the same upper bound; this truncation shifts the value distribution toward higher associations, enhancing sparsity in low-association pairs while preserving the relative ordering of positive ones. Computation of PPMI involves first estimating the PMI matrix from empirical probabilities derived from a corpus—typically via maximum likelihood, P(x,y) = \#(x,y)/N where N is the total count—then applying the max operation element-wise. PPMI came into wide use in natural language processing in the 2000s to filter unreliable negative associations from co-occurrence data. Further refinements, such as normalized PMI, build on this.
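
A minimal sketch of turning a word-context count matrix into a PPMI matrix with NumPy (the small count matrix here is invented for illustration):

```python
import numpy as np

# Hypothetical word-by-context co-occurrence counts.
counts = np.array([
    [10.0, 0.0, 3.0],
    [ 2.0, 8.0, 1.0],
    [ 0.0, 1.0, 6.0],
])

N = counts.sum()
p_xy = counts / N                      # joint probabilities P(w, c)
p_x = p_xy.sum(axis=1, keepdims=True)  # marginal P(w)
p_y = p_xy.sum(axis=0, keepdims=True)  # marginal P(c)

with np.errstate(divide="ignore"):     # zero counts give PMI = -inf
    pmi = np.log2(p_xy / (p_x * p_y))

ppmi = np.maximum(pmi, 0.0)            # clip negatives (and -inf) to zero
print(np.round(ppmi, 3))
```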

Normalized PMI

Normalized pointwise mutual information (NPMI) addresses limitations in the unbounded range of standard PMI by scaling the scores to a fixed interval, facilitating direct comparisons of association strengths across different event pairs. Defined as \text{NPMI}(x;y) = \frac{\text{PMI}(x;y)}{-\log_2 P(x,y)}, NPMI produces values between -1 and 1, where 1 denotes perfect co-occurrence (events $x$ and $y$ always appear together), 0 indicates statistical independence, and -1 signifies mutual exclusion (events never co-occur). This normalization divides the PMI by the joint self-information $-\log_2 P(x,y)$, which grows with the rarity of the joint event.

The primary motivation for NPMI arises from PMI's sensitivity to event probabilities: rare events yield disproportionately high PMI scores, complicating cross-pair comparisons and rankings, particularly in sparse data like collocation extraction. By normalizing against the joint probability's logarithm, NPMI yields a coefficient akin to correlation measures, offering consistent interpretability regardless of frequency—low-frequency pairs with strong associations receive appropriately scaled positive values without inflation. This makes NPMI particularly useful in applications requiring robust handling of sparsity, such as natural language processing tasks.

NPMI retains PMI's symmetry, so $\text{NPMI}(x;y) = \text{NPMI}(y;x)$, and it accommodates sparse distributions better than positive PMI variants by preserving negative values for dissociations while bounding extremes. Positive NPMI values signal attraction or co-occurrence beyond chance, zero denotes independence, and negative values highlight repulsion or avoidance. Unlike PPMI, which discards negatives and can distort rankings in low-count scenarios, NPMI maintains informational balance.

To illustrate, consider a corpus of 1000 documents where events $x$ and $y$ represent word occurrences, with marginal probabilities $P(x) = 0.1$ and $P(y) = 0.1$. The following contingency table shows joint counts for different association levels:

| Scenario              | $x \land y$ | $x \land \neg y$ | $\neg x \land y$ | $\neg x \land \neg y$ | Total |
|-----------------------|-------------|------------------|------------------|-----------------------|-------|
| Independence          | 10          | 90               | 90               | 810                   | 1000  |
| Association           | 50          | 50               | 50               | 850                   | 1000  |
| Perfect co-occurrence | 100         | 0                | 0                | 900                   | 1000  |

For independence, $P(x,y) = 0.01$, so $\text{PMI}(x;y) = 0$ and $\text{NPMI}(x;y) = 0$. For association, $P(x,y) = 0.05$, yielding $\text{PMI}(x;y) \approx 2.32$ and $\text{NPMI}(x;y) \approx 0.54$. For perfect co-occurrence, $P(x,y) = 0.1$, resulting in $\text{NPMI}(x;y) = 1$. This bounded output highlights NPMI's utility in scaling associations uniformly.[](https://svn.spraakdata.gu.se/repos/gerlof/pub/www/Docs/npmi-pfd.pdf)

For binary variables, NPMI provides a bounded measure of dependence broadly comparable to the Pearson correlation coefficient, agreeing with it on the extreme cases of independence and perfect co-occurrence. A common hybrid approach thresholds NPMI at zero, akin to positive PMI, to focus solely on positive [associations](/page/Association) while retaining the normalization benefits.
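
The three scenarios in the table can be recomputed directly; a minimal sketch whose counts mirror the table above:

```python
import math

def npmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Normalized PMI in [-1, 1]; assumes p_xy > 0."""
    pmi = math.log2(p_xy / (p_x * p_y))
    return pmi / (-math.log2(p_xy))

p_x = p_y = 0.1
for label, joint_count in [("independence", 10), ("association", 50), ("perfect", 100)]:
    p_xy = joint_count / 1000
    print(label, round(npmi(p_xy, p_x, p_y), 2))
# independence 0.0, association 0.54, perfect 1.0
```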

PMI^k Family

The PMI^k family is a parameterized extension of pointwise mutual information (PMI), designed to mitigate the bias of standard PMI toward rare co-occurrences by adjusting the influence of the joint probability through a parameter $k > 0$. Introduced as a set of [heuristic](/page/Heuristic) [association](/page/Association) measures for [terminology](/page/Terminology) and [collocation](/page/Collocation) extraction, the family modifies the PMI formula to incorporate powers of the joint probability $P(x,y)$. The definition is \text{PMI}^k(x;y) = \log_2 \left[ \frac{P(x,y)^k}{P(x) P(y)} \right] = \text{PMI}(x;y) + (k-1) \log_2 P(x,y), where the second equality expresses it in terms of standard PMI. This formulation allows tuning the measure's sensitivity to [frequency](/page/Frequency): standard PMI ($k=1$) strongly favors rare but highly associated events due to the logarithmic amplification of deviations from [independence](/page/Independence), whereas values of $k > 1$ reduce this bias because the extra term $(k-1)\log_2 P(x,y)$ penalizes low joint probabilities, effectively boosting scores for common co-occurrences. Conversely, $k < 1$ (e.g., $k=0.5$) amplifies the preference for rarity by raising low $P(x,y)$ values closer to 1 before division. The measure reduces to standard PMI at $k=1$, and as $k \to 0^+$ it tends to $-\log_2 [P(x) P(y)]$, which depends only on the marginal frequencies. The family is symmetric in $x$ and $y$ for all $k$. It provides a tunable parameter for domain-specific adjustments, such as balancing rarity in sparse corpora versus frequency in dense ones, and has been analyzed as a monotonic transformation related to geometric means of frequencies for $k=2$. Since the [1990s](/page/1990s), the PMI^k family has been applied in adjusted association measures for imbalanced data in [natural language processing](/page/Natural_language_processing) tasks like [collocation](/page/Collocation) extraction and concept mining, often with $k=2$ or $k=3$ to handle frequency biases in large-scale corpora. [Normalization](/page/Normalization) techniques may be applied afterward to bound the values.[](https://theses.fr/1994PA077353)
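
To see how $k$ shifts the balance between rare and frequent pairs, a minimal sketch comparing two hypothetical pairs (all probabilities are invented for illustration):

```python
import math

def pmi_k(p_xy: float, p_x: float, p_y: float, k: float) -> float:
    """PMI^k(x; y) = log2(P(x,y)^k / (P(x) P(y)))."""
    return math.log2(p_xy ** k / (p_x * p_y))

# A rare, strongly associated pair vs. a frequent, moderately associated pair.
rare = dict(p_xy=0.0005, p_x=0.001, p_y=0.001)    # 500x over independence
frequent = dict(p_xy=0.06, p_x=0.1, p_y=0.1)      # 6x over independence

for k in (1, 2, 3):
    print(k, round(pmi_k(**rare, k=k), 2), round(pmi_k(**frequent, k=k), 2))
# k=1 ranks the rare pair first; for k >= 2 the frequent pair overtakes it.
```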

Specific Correlation

Specific correlation generalizes pointwise mutual information to multiple variables, providing a measure of association for n events in a single realization. Defined as \text{SI}(x_1, \dots, x_n) = \log_2 \frac{P(x_1, \dots, x_n)}{\prod_{i=1}^n P(x_i)}, this quantity captures the logarithmic ratio of the joint probability to the product of the marginal probabilities for a particular outcome $(x_1, \dots, x_n)$. A value of SI greater than zero indicates that the variables exhibit synergistic dependence in that instance, exceeding what would be expected under independence; a value of zero signifies exact independence for those specific values; and negative values suggest repulsive or anti-correlated behavior. The measure is symmetric across all variables, as the formula remains unchanged under any permutation of the $x_i$. Under the chain rule of probability, SI decomposes additively into sequential conditional terms, facilitating analysis of higher-order interactions. For the bivariate case ($n=2$), SI reduces directly to the standard pointwise mutual information, $\text{PMI}(x,y) = \log_2 \frac{P(x,y)}{P(x)P(y)}$.

To illustrate, consider a trivariate example from medical diagnostics involving three symptoms—fever ($S_1$), [cough](/page/Cough) ($S_2$), and [fatigue](/page/Fatigue) ($S_3$)—observed in a specific [patient](/page/Patient) instance. Suppose the marginal probabilities are $P(S_1) = 0.2$, $P(S_2) = 0.3$, and $P(S_3) = 0.25$, while the joint probability is $P(S_1, S_2, S_3) = 0.03$. Then \text{SI}(S_1, S_2, S_3) = \log_2 \frac{0.03}{0.2 \times 0.3 \times 0.25} = \log_2 \frac{0.03}{0.015} = \log_2 2 = 1 \text{ bit}, indicating positive synergy among the symptoms for this case and suggesting they co-occur more frequently than they would independently.

In contrast to total correlation, which averages SI over the joint distribution to quantify overall multivariate dependence (i.e., total correlation $C(X_1; \dots; X_n) = \mathbb{E}[\text{SI}(X_1, \dots, X_n)]$), the pointwise SI evaluates dependence for particular realizations without averaging. This instance-specific nature makes SI valuable in causal inference, where it helps assess deviations from independence in targeted scenarios, such as identifying synergistic effects in observational data for individual cases.[](https://aclanthology.org/W11-1303.pdf)
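
A minimal sketch computing the trivariate specific correlation for the symptom example, using the probabilities given above:

```python
import math

def specific_correlation(joint_p, marginals):
    """SI(x1..xn) = log2( P(x1..xn) / prod_i P(xi) )."""
    return math.log2(joint_p / math.prod(marginals))

# Fever, cough, fatigue example: marginals 0.2, 0.3, 0.25; joint 0.03.
print(specific_correlation(0.03, [0.2, 0.3, 0.25]))  # 1.0 bit of positive synergy
```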

Properties and Limitations

Key Properties

Pointwise mutual information (PMI) exhibits symmetry with respect to its arguments, such that PMI(x; y) = PMI(y; x), as the joint probability P(x, y) is symmetric and the marginals P(x) and P(y) swap roles equivalently.[](https://www.researchgate.net/publication/227992567_Elements_of_Information_Theory) This symmetry arises directly from the definition \text{PMI}(x; y) = \log \frac{P(x, y)}{P(x) P(y)}. Furthermore, PMI can be rewritten in terms of conditional and marginal probabilities as \text{PMI}(x; y) = \log \frac{P(x \mid y)}{P(x)}, which is the pointwise log-ratio contributing to the [Kullback-Leibler (KL)](/page/Kullback-Leibler_divergence) divergence; specifically, the conditional KL divergence D_{\mathrm{KL}}(P_{X \mid Y=y} \Vert P_X) = \sum_x P(x \mid y) \cdot \mathrm{PMI}(x; y), providing an information-geometric interpretation where PMI quantifies local deviations from independence for specific outcomes.[](https://www.researchgate.net/publication/227992567_Elements_of_Information_Theory) A brief proof sketch: starting from the definition, substitute P(x, y) = P(x \mid y) P(y) to obtain \log \frac{P(x \mid y) P(y)}{P(x) P(y)} = \log \frac{P(x \mid y)}{P(x)}; taking the expectation under P(x \mid y) then yields the KL divergence, whose non-negativity follows from [Jensen's inequality](/page/Jensen's_inequality).

PMI demonstrates monotonicity with respect to dependence strength: for fixed marginals, as the association between x and y strengthens (i.e., P(x, y) increases relative to P(x) P(y)), PMI(x; y) is non-decreasing, reflecting greater deviation from independence. Additionally, PMI is invariant to the choice of logarithm base up to a positive scaling factor; for instance, using base-2 (bits) versus natural log (nats) multiplies all values by \log_2 e \approx 1.4427, preserving rankings and sign but altering absolute magnitudes. Like mutual information, PMI obeys a chain rule for multi-variable decompositions (\text{PMI}(x; y,z) = \text{PMI}(x; y) + \text{PMI}(x; z \mid y), as described above), but neither quantity is additive over unconditioned pairs: for arbitrary P(x, y, z), \text{PMI}(x; y,z) \neq \text{PMI}(x; y) + \text{PMI}(x; z) unless additional independence assumptions hold. This non-additivity stems from the logarithmic form capturing local associations without averaging, so multi-way decompositions must proceed through conditional terms rather than direct summation.[](https://www.researchgate.net/publication/227992567_Elements_of_Information_Theory)

In asymptotic behavior, as the sample size (e.g., corpus size) grows large, the empirical PMI estimate—computed from frequency-based probabilities \hat{P}(x, y) = \#(x, y)/N—converges [almost surely](/page/Almost_surely) to the true population PMI by the [law of large numbers](/page/Law_of_large_numbers) applied to the probability estimates.[](https://arxiv.org/pdf/2306.11078) However, in sparse data regimes with low event frequencies, the plug-in estimator introduces [bias](/page/Bias), often underestimating dependence for rare events due to unobserved co-occurrences and variance in small counts.[](https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/723924/3233_On_the_Properties_and_Est.pdf?sequence=1) PMI is typically estimated from empirical joint and marginal frequencies, \hat{\mathrm{PMI}}(x; y) = \log \frac{\hat{P}(x, y)}{\hat{P}(x) \hat{P}(y)}, but zero counts lead to undefined (-\infty) values; to address this, smoothing techniques such as add-one (Laplace) smoothing are applied by incrementing counts (e.g., \#(x, y) + 1) and adjusting the total to N + V (with V the number of possible pairs), yielding finite estimates while introducing minimal bias for large samples.[](https://web.stanford.edu/~jurafsky/slp3/J.pdf) Variants like normalized PMI adjust scaling for better comparability but alter properties such as range invariance.[](https://svn.spraakdata.gu.se/repos/gerlof/pub/www/Docs/npmi-pfd.pdf)
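
A minimal sketch of plug-in estimation with add-one (Laplace) smoothing over a toy bigram count table; the corpus, the helper name, and the exact smoothing scheme are illustrative assumptions, and practical systems often use fractional smoothing constants instead:

```python
import math
from collections import Counter

# Toy corpus of (word, context) observations.
pairs = [("strong", "tea")] * 8 + [("strong", "coffee")] * 1 + [("powerful", "computer")] * 6
word_counts = Counter(w for w, _ in pairs)
ctx_counts = Counter(c for _, c in pairs)
pair_counts = Counter(pairs)

V = len(word_counts) * len(ctx_counts)   # number of possible (word, context) cells
N = len(pairs)

def smoothed_pmi(w: str, c: str, alpha: float = 1.0) -> float:
    """Add-alpha smoothed plug-in PMI estimate; finite even for unseen pairs."""
    p_wc = (pair_counts[(w, c)] + alpha) / (N + alpha * V)
    p_w = (word_counts[w] + alpha) / (N + alpha * len(word_counts))
    p_c = (ctx_counts[c] + alpha) / (N + alpha * len(ctx_counts))
    return math.log2(p_wc / (p_w * p_c))

print(round(smoothed_pmi("strong", "tea"), 2))       # seen pair: positive
print(round(smoothed_pmi("powerful", "tea"), 2))     # unseen pair: finite, negative
```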

Limitations and Challenges

One major limitation of pointwise mutual information (PMI) arises from sparsity in [co-occurrence](/page/Co-occurrence) data, where unobserved word pairs result in zero joint probabilities, causing PMI values to approach negative [infinity](/page/Infinity) and rendering them undefined or unreliable. To address this, techniques such as Laplace [smoothing](/page/Smoothing), which adds a small constant (typically 0.1 to 3) to all counts before probability [estimation](/page/Estimation), are commonly applied to stabilize estimates. These methods prevent extreme values but can introduce slight biases toward independence.

PMI exhibits a bias toward rare events, as infrequent co-occurrences yield disproportionately high scores due to the logarithmic form, where even small deviations from expected probabilities for low-frequency items produce large positive values, often amplifying [noise](/page/Noise) over meaningful associations. This contrasts with frequency-based approaches that favor common patterns, and it can lead PMI to highlight spurious rare pairs as strongly associated; mitigation strategies include raising marginal probabilities to a power less than 1 or applying variants like positive PMI (PPMI), which clips negative values to zero.

Scalability poses a significant challenge for PMI computation over large vocabularies, as constructing the full [co-occurrence matrix](/page/Co-occurrence_matrix) and deriving pairwise scores incurs O(n²) time and [space complexity](/page/Space_complexity) for vocabulary size n, becoming prohibitive for corpora with millions of unique terms. Approximations such as [subsampling](/page/Subsampling) contexts, [sparse matrix](/page/Sparse_matrix) representations, or efficient [factorization](/page/Factorization) algorithms help reduce costs, enabling practical use in high-dimensional settings like word embeddings.

Interpretability of PMI scores is hindered by their logarithmic scale, which makes intuitive comparison difficult—values extend from negative [infinity](/page/Infinity) up to large positive values that depend on the marginals, without clear thresholds for "strong" association—and by the handling of negative scores, which indicate repulsion but are often unreliable due to sparse data and thus clipped in practice. This lack of boundedness complicates direct human assessment, prompting reliance on normalized variants for more intuitive ranges, though at the cost of added [complexity](/page/Complexity).

PMI lacks inherent measures of [statistical significance](/page/Statistical_significance), treating scores as point estimates without accounting for sampling variability or corpus size, which can lead to overconfident interpretations of associations in finite [data](/page/Data). Complementary tests, such as the chi-squared statistic for contingency tables or [bootstrapping](/page/Bootstrapping) to estimate confidence intervals, are typically required alongside PMI to validate whether observed co-occurrences deviate significantly from independence.
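
As a concrete illustration of pairing PMI with a significance check, the following minimal sketch computes PMI for a hypothetical 2x2 contingency table together with the Pearson chi-squared statistic (the counts are invented; 3.84 is the standard 5% critical value for one degree of freedom):

```python
import math
import numpy as np

# Hypothetical 2x2 co-occurrence table: rows = word w present/absent,
# columns = context c present/absent.
table = np.array([[30.0, 70.0],
                  [120.0, 780.0]])
N = table.sum()

# PMI for the (w present, c present) cell.
p_wc = table[0, 0] / N
p_w = table[0, :].sum() / N
p_c = table[:, 0].sum() / N
pmi = math.log2(p_wc / (p_w * p_c))

# Pearson chi-squared statistic for the same table (df = 1).
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / N
chi2 = ((table - expected) ** 2 / expected).sum()

print(round(pmi, 2), round(chi2, 1))
# chi2 > 3.84 suggests the positive PMI reflects a dependence
# unlikely to arise by chance at the 5% level.
```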

Applications

Natural Language Processing

Pointwise mutual information (PMI) has been a foundational tool in natural language processing since the 1990s for identifying collocations, which are words that co-occur more frequently than expected by chance, such as "strong tea" or "United States." Introduced by Church and Hanks in their seminal work on word association norms, PMI quantifies the association strength between word pairs in large corpora by computing the logarithm of the ratio of their joint probability to the product of their marginal probabilities, highlighting linguistically meaningful pairs like "doctor/nurse" with a PMI score of 10.7 in a 15-million-word Associated Press corpus.[](https://aclanthology.org/J90-1003.pdf) PMI has been applied to various corpora, including the British National Corpus (BNC), a 100-million-word collection of British English, to extract collocations such as proper-noun pairs like "New York" and "United States", where high PMI scores indicate idiomatic or proper-noun pairings that rule-based methods often miss.[](https://aclanthology.org/2021.paclic-1.21.pdf)[](https://aclanthology.org/J90-1003.pdf)

In [distributional semantics](/page/Distributional_semantics), PMI matrices derived from word co-occurrence counts serve as input to [singular value decomposition](/page/Singular_value_decomposition) (SVD) for generating low-dimensional word embeddings that capture [semantic similarity](/page/Semantic_similarity), predating neural methods like [Word2Vec](/page/Word2vec). Bullinaria and Levy demonstrated that applying positive PMI (PPMI), which thresholds negative values to zero, followed by SVD on a [co-occurrence matrix](/page/Co-occurrence_matrix) from a roughly 100-million-word [corpus](/page/Corpus), yields embeddings where [cosine similarity](/page/Cosine_similarity) between vectors correlates strongly with human judgments of word relatedness, for example grouping "doctor" near "nurse" and performing strongly on TOEFL synonym tests. These count-based models using PPMI provide interpretable baselines for semantic tasks, influencing pre-GloVe approaches by emphasizing high-association contexts over raw frequencies. In recent years, PMI has also been used to evaluate reasoning paths in large language models.[](https://arxiv.org/abs/2510.03632)

PMI also plays a key role in topic modeling, particularly in evaluating the [coherence](/page/Coherence) of topics generated by [Latent Dirichlet Allocation](/page/Latent_Dirichlet_allocation) (LDA), where it measures the average pairwise association among the top words in a topic using [co-occurrence](/page/Co-occurrence) statistics from reference corpora like [Wikipedia](/page/Wikipedia). Newman et al. showed that PMI-based [coherence](/page/Coherence) scores, computed over sliding windows in a 1-billion-word [Wikipedia](/page/Wikipedia) corpus, achieve Spearman correlations of up to 0.78 with human ratings of topic usefulness on a 3-point scale, outperforming baselines like word overlap and enabling automatic hyperparameter tuning in LDA variants.[](https://aclanthology.org/N10-1012.pdf) This application extends to [coherence](/page/Coherence) measures in LDA extensions, where PMI helps assess term associations for more interpretable topics in document collections.

| Word Pair    | PMI Score | Corpus Example                  |
|--------------|-----------|---------------------------------|
| doctor/nurse | 10.7      | 1987 [AP](/page/AP) (15M words) |
| drink/beer   | 9.9       | 1988 [AP](/page/AP) (44M words) |
| set/off      | 6.2       | 1988 [AP](/page/AP) (44M words) |
| save/from    | 4.4       | 1987 [AP](/page/AP) (15M words) |

The table above illustrates representative high-PMI collocations from early corpora, demonstrating PMI's ability to rank associations like "drink/beer" above less idiomatic pairs.[](https://aclanthology.org/J90-1003.pdf) From its origins in 1990s [collocation](/page/Collocation) extraction over corpora such as the [Associated Press](/page/Associated_Press) newswire, PMI evolved in the 2000s into count-based distributional models via [PPMI](/page/PMI) and [SVD](/page/SVD) for embeddings, and by the 2010s into evaluation metrics for probabilistic topic models like LDA. In modern neural hybrids, such as transformer-based systems, PMI remains an interpretable baseline for assessing association in tasks like [semantic role labeling](/page/Semantic_role_labeling), though it is often augmented with attention mechanisms for scalability.[](https://aclanthology.org/J90-1003.pdf)[](https://aclanthology.org/N10-1012.pdf)
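
A minimal sketch of the count-based pipeline described above (PPMI weighting followed by truncated SVD), using NumPy; the tiny co-occurrence matrix, the row labels, and the embedding dimension are invented for illustration:

```python
import numpy as np

# Hypothetical word-by-context co-occurrence counts (rows: words, cols: contexts).
counts = np.array([
    [12.0, 1.0, 0.0, 2.0],   # e.g. "doctor"
    [10.0, 2.0, 1.0, 1.0],   # e.g. "nurse"
    [0.0,  1.0, 9.0, 8.0],   # e.g. "car"
])

N = counts.sum()
p_xy = counts / N
p_x = p_xy.sum(axis=1, keepdims=True)
p_y = p_xy.sum(axis=0, keepdims=True)

with np.errstate(divide="ignore", invalid="ignore"):
    ppmi = np.maximum(np.log2(p_xy / (p_x * p_y)), 0.0)
ppmi = np.nan_to_num(ppmi)            # guard against degenerate 0/0 cells

# Truncated SVD: keep the top-d singular vectors as dense embeddings.
U, S, Vt = np.linalg.svd(ppmi, full_matrices=False)
d = 2
embeddings = U[:, :d] * S[:d]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings[0], embeddings[1]))  # "doctor" vs. "nurse": higher similarity
print(cosine(embeddings[0], embeddings[2]))  # "doctor" vs. "car": lower similarity
```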

Other Fields

In bioinformatics, pointwise mutual information (PMI) facilitates [gene](/page/Gene) co-expression analysis by quantifying nonlinear dependencies in expression data, enabling the construction of networks that reveal regulatory interactions. For example, the PMINR framework employs PMI to model network regressions, distinguishing changes in [gene](/page/Gene) correlations linked to [disease](/page/Disease) outcomes like cancer progression, where PMI edges highlight direct associations with fewer false positives than traditional [correlation](/page/Correlation) methods.[](https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2020.556259/full)[](https://pmc.ncbi.nlm.nih.gov/articles/PMC7594515/)

In chemistry, PMI has been used to profile chemical compounds by measuring associations through co-occurrence patterns of structural features in public databases like [PubChem](/page/PubChem) and [ChEMBL](/page/ChEMBL), supporting tasks such as compound profiling and assessing synthetic accessibility. Analyses of these databases show that PMI helps identify tightly associated features that correlate with ease of [synthesis](/page/Synthesis).[](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00483-y)

Within the social sciences, PMI aids network analysis by assessing tie strength from interaction logs, such as communication frequencies between individuals, where elevated PMI values signify robust connections beyond random encounters. In [epidemiology](/page/Epidemiology), it evaluates symptom co-occurrence in electronic health records or [social media](/page/Social_media) data to detect [disease](/page/Disease) clusters; for instance, PMI has identified strong associations between symptoms like pain and fatigue in autoimmune conditions, informing [comorbidity](/page/Comorbidity) models.[](http://snap.stanford.edu/class/cs224w-2015/slides/06-applicationsI.pdf)[](https://www.mdpi.com/2674-0621/5/4/14)[](https://arxiv.org/html/2402.04400v1)

In [machine learning](/page/Machine_learning), PMI supports [feature selection](/page/Feature_selection) by ranking variable dependencies, prioritizing those with high pointwise association to reduce dimensionality while preserving [predictive power](/page/Predictive_power). It also enhances [anomaly detection](/page/Anomaly_detection) by flagging deviations in pointwise dependencies, as in time-series models where low PMI between expected patterns signals outliers in multivariate data.[](https://link.springer.com/chapter/10.1007/11564126_27)[](https://arxiv.org/pdf/2510.18998)

A practical example arises in [ecology](/page/Ecology) for analyzing species co-habitation from survey data. Consider a habitat survey with 100 sites: species A observed at 40 sites, species B at 30 sites, and both co-occurring at 15 sites. The joint probability is $P(A \cap B) = 15/100 = 0.15$, with marginals $P(A) = 0.4$ and $P(B) = 0.3$, giving \text{PMI}(A, B) = \log_2 \left( \frac{P(A \cap B)}{P(A) P(B)} \right) = \log_2 \left( \frac{0.15}{0.4 \times 0.3} \right) = \log_2 (1.25) \approx 0.32. This positive value indicates co-habitation beyond chance, suggesting potential ecological interactions such as symbiosis.

PMI variants have also been applied in quantum information theory to provide pointwise security bounds analogous to mutual information measures, aiding privacy amplification in quantum key distribution protocols.[29]
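
As an illustration of the feature-selection use mentioned above, the following sketch ranks binary features by their PMI with a binary label; the tiny dataset and feature names are invented:

```python
import math

# Hypothetical binary dataset: each row is (feature_0, feature_1, label).
rows = [
    (1, 0, 1), (1, 1, 1), (1, 0, 1), (0, 1, 0),
    (0, 0, 0), (1, 1, 1), (0, 1, 0), (0, 0, 0),
]
N = len(rows)

def pmi_feature_label(feature_idx: int) -> float:
    """PMI between the events 'feature = 1' and 'label = 1'."""
    p_f = sum(1 for r in rows if r[feature_idx] == 1) / N
    p_l = sum(1 for r in rows if r[2] == 1) / N
    p_fl = sum(1 for r in rows if r[feature_idx] == 1 and r[2] == 1) / N
    return math.log2(p_fl / (p_f * p_l))

ranking = sorted(range(2), key=pmi_feature_label, reverse=True)
print([(f"feature_{i}", round(pmi_feature_label(i), 2)) for i in ranking])
# feature_0 perfectly tracks the label in this toy data, so it ranks first.
```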

References

  1. [1]
    [PDF] Word Association Norms, Mutual Information, and Lexicography
    Word Association Norms, Mutual Information, and Lexicography. Kenneth Ward Church. Bell Laboratories. Murray Hill, N.J.. Patrick Hanks. CoLlins Publishers.
  2. [2]
    [PDF] Pointwise Mutual Information (PMI)
    The pointwise mutual information between a target word w and a context word c (Church and Hanks 1989, Church and Hanks 1990) is then defined as: PMI(w,c) = log2.
  3. [3]
    [PDF] Normalized (Pointwise) Mutual Information in Collocation Extraction
    Mutual information (MI) is a measure of the information overlap between two random variables. In this section I will review definitions and properties of MI. A ...
  4. [4]
    [PDF] A Mathematical Theory of Communication
    379–423, 623–656, July, October, 1948. A Mathematical Theory of Communication. By C. E. SHANNON. INTRODUCTION. THE recent development of various methods of ...
  5. [5]
    [PDF] On Log-Likelihood-Ratios and the Significance of Rare Events
    We address the issue of judging the significance of rare events as it typically arises in statistical natural- language processing. We first define a general ap ...
  6. [6]
    Word Association Norms, Mutual Information, and Lexicography
    Word Association Norms, Mutual Information, and Lexicography. Kenneth Ward Church, Patrick Hanks. church-hanks-1990-word PDF
  7. [7]
    22.11. Information Theory - Dive into Deep Learning
    We can calculate self information as shown below. Before that ... mutual information, which is often referred to as the pointwise mutual information:.
  8. [8]
    On the Properties and Estimation of Pointwise Mutual Information ...
    Oct 16, 2023 · In this paper, we analytically describe the profiles of multivariate normal distributions and introduce a novel family of distributions, Bend and Mix Models.
  9. [9]
  10. [10]
    Approche mixte pour l'extraction de terminologie : statistique lexicale ...
    Approche mixte pour l'extraction de terminologie : statistique lexicale et filtres linguistiques ; Auteur / Autrice : Béatrice Daille ; Direction : Laurence ...
  11. [11]
    Microsoft Concept Graph: Mining Semantic Concepts for Short Text ...
    Jun 1, 2019 · denotes the pointwise mutual information, which ... Therefore, we further propose PMIk and Graph Traversal measures to tackle this problem.
  12. [12]
    [PDF] A re-examination of lexical association measures - NUS Computing
    PMI reduces the performance of PMI. As such, assigning more weight to ( ) f xy does not im- prove the AP performance of PMI. AM. VPCs LVCs Mixed. Best [M6] PMIk.
  13. [13]
    Elements of Information Theory - ResearchGate
    Elements of Information Theory. October 2001. DOI:10.1002/0471200611 ... MI can also be interpreted through decomposition into pointwise mutual information ...
  14. [14]
    Mutual Information Fundamentals | Eyal Kazin - Towards AI
    Apr 28, 2025 · This discrepancy motivates the definition of an additional quantity: the PMI, which was first introduced by Robert Fano in 1961, 13 years after ...
  15. [15]
    Mutual information - Wikipedia
    MI is the expected value of the pointwise mutual information (PMI). The quantity was defined and analyzed by Claude Shannon in his landmark paper "A ...
  16. [16]
    Introducing a differentiable measure of pointwise shared information
    Mar 25, 2021 · Thus the additivity of (mutual) information terms is incompatible with an additive separation of informative and misinformative exclusions ...
  17. [17]
    [PDF] Beyond Normal: On the Evaluation of Mutual Information Estimators
    Jun 19, 2023 · critic is not able to fully learn the pointwise mutual information (see Appendix E). ... Elements of Information Theory. Wiley-Interscience ...
  18. [18]
    [PDF] On the Properties and Estimation of Pointwise Mutual Information ...
    The pointwise mutual information profile, or simply profile, is the distribution of pointwise mutual information for a given pair of random variables.
  19. [19]
    None
    Summary of Pointwise Mutual Information (PMI) and Its Relation to Mutual Information (MI)
  20. [20]
    [PDF] Automatic Evaluation of Topic Coherence - ACL Anthology
    Evaluating topic coherence is a component of the larger question of what are good topics, what char- acteristics of a document collection make it more amenable ...
  21. [21]
    PMINR: Pointwise Mutual Information-Based Network Regression
    Oct 14, 2020 · We then proposed a PMI-based network regression (PMINR) model to differentiate patterns of network changes (in node or edge) linking a disease outcome.
  22. [22]
    PMINR: Pointwise Mutual Information-Based Network Regression
    Oct 15, 2020 · Materials and Methods. The PMI of two node variables X and Y can be defined as follows (Church and Hanks, 1990):. P ⁢ M ⁢ I ⁢ ( x , y ) = log ...
  23. [23]
    Profiling and analysis of chemical compounds using pointwise ...
    Jan 10, 2021 · Pointwise mutual information (PMI) is a measure of association used in information theory. In this paper, PMI is used to characterize ...
  24. [24]
    [PDF] SNA Applications I - SNAP: Stanford
    Networks, Information & Social Capital (formerly titled 'Network. Structure ... □ Weighted by pointwise mutual information. PMI = log (P(a,b) / P(a)P(b)).
  25. [25]
    Using Social Media Listening to Characterize the Flare Lexicon in ...
    Flare-associated clinical concepts (co-occurrence > 100 and PMI2 > 3) included SYMPTOMS (pain, fatigue, dryness of eye, xerostomia, arthralgia, stress) and BODY ...
  26. [26]
    CEHR-GPT: Generating Electronic Health Records with ... - arXiv
    Feb 6, 2024 · The degree of co-occurrence is calculated with prevalence, which evaluates the frequency of concept co-occurrence, and Pointwise Mutual ...
  27. [27]
    Weighted Average Pointwise Mutual Information for Feature ...
    We propose a variant of mutual information, called Weighted Average Pointwise Mutual Information (WAPMI) that avoids both problems.
  28. [28]
  29. [29]
    [PDF] Privacy Amplification in Quantum Key Distribution: Pointwise Bound ...
    bound on the actual, or pointwise mutual information as the pointwise privacy amplifcation bound, or PPA. In carrying out privacy amplification we must ...