
Co-occurrence

Co-occurrence refers to the joint presence or simultaneous appearance of two or more entities, such as events, items, words, or species, within a specific context, dataset, or environment, often analyzed to uncover patterns or associations beyond random chance. This concept is multidisciplinary, spanning fields like linguistics, statistics, natural language processing (NLP), data mining, and ecology, where it serves as a foundational tool for understanding relationships and structures in data.

In linguistics, co-occurrence primarily describes the tendency of words or lexical units to appear together in texts, forming collocations—predictable word combinations like "strong tea" that carry idiomatic or restricted meanings—and multi-word expressions with syntactic and semantic properties. These patterns help reveal grammatical constraints and semantic relations, such as verb-preposition pairings, and are essential for building dictionaries and understanding language use.

In statistics and data mining, co-occurrence is quantified through probabilistic measures like pointwise mutual information (PMI), which compares the joint probability of entities occurring together against their independent probabilities, or chi-squared tests, which assess deviation from expected frequencies; these metrics identify significant associations in datasets, such as frequent itemsets in transaction records. Algorithms like Apriori leverage co-occurrence to mine association rules and sequential patterns, enabling applications in recommendation systems and predictive modeling.

In natural language processing, co-occurrence analysis involves tracking words within syntactic windows, n-grams (contiguous sequences like bigrams), or skip-grams (with intervening words), forming the basis for vector-space models and word embeddings that capture semantic relationships from textual data. This approach underpins tasks like topic modeling by representing lexical relations statistically.

In ecology, co-occurrence denotes the shared presence of species in habitats or communities, analyzed probabilistically to detect non-random patterns influenced by environmental factors, phylogeny, or interactions, aiding in biodiversity assessment and conservation planning. Tools like co-occurrence networks model these relationships as graphs, revealing community structures and predicting species distributions.

Definition and Fundamentals

Core Definition

Co-occurrence refers to the simultaneous or proximal appearance of two or more entities—such as words, events, or species—within a shared context or dataset, where their joint presence occurs at a frequency higher than would be expected under conditions of statistical independence. This concept underpins analyses across disciplines by identifying patterns of association that suggest underlying relationships, rather than mere chance.

A key distinction in co-occurrence lies between strict adjacency, where entities must be immediate neighbors (e.g., consecutive words in a sentence), and broader proximity, which allows for separation within a defined window or unit (e.g., entities sharing the same document or habitat). For instance, in linguistics, the terms "strong" and "tea" exhibit co-occurrence as they frequently appear in proximity, forming a meaningful collocation beyond random pairing. Similarly, in ecology, multiple species sharing the same sample or site demonstrate co-occurrence when their overlap surpasses independent distribution expectations.

At its probabilistic foundation, co-occurrence is assessed by contrasting observed joint frequencies—how often entities actually appear together—with expected frequencies derived from their individual occurrence rates under an assumption of independence. This comparison reveals non-random dependencies, providing a basis for inferring potential interactions or structures in diverse datasets.
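To make the observed-versus-expected comparison concrete, the following minimal Python sketch contrasts an observed joint count with the count expected under independence. All counts here are hypothetical illustrations, not data from any study.

```python
# Minimal sketch (hypothetical counts): compare observed co-occurrence
# with the count expected if the two entities were independent.

n_units = 1000   # total contexts examined (e.g., documents or sites)
n_a = 120        # contexts containing entity A
n_b = 80         # contexts containing entity B
n_ab = 40        # contexts containing both A and B (observed joint count)

p_a, p_b = n_a / n_units, n_b / n_units
expected_ab = p_a * p_b * n_units  # expected joint count under independence

print(f"observed: {n_ab}, expected: {expected_ab:.1f}")
# observed (40) greatly exceeds expected (9.6), hinting at a real association
```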

Co-occurrence in Linguistics

Word Co-occurrence Patterns

Word co-occurrence patterns refer to the observable tendencies of words to appear together in linguistic contexts more frequently than expected by chance, providing insights into the structural and relational aspects of language. This concept, foundational to distributional approaches to language, was articulated by J.R. Firth, who posited that the meaning and usage of a word can be understood through its habitual associations with other words in discourse. In corpus linguistics, these patterns emerge from large-scale analyses of text collections, revealing how lexical items cluster to form the building blocks of sentences and utterances.

In textual corpora, co-occurrence patterns manifest in several forms, including adjacent pairings known as bigrams, where words appear directly next to each other, such as in phrases like "strong tea." Window-based patterns extend this to non-adjacent positions within a fixed span, for instance, up to five words to the left or right of a target word (5L-5R), capturing broader contextual links like modifiers or related nouns. Syntactic dependency patterns, derived from parse trees, focus on grammatical relations rather than mere proximity, such as a verb governing its subject or object, enabling the extraction of structured multi-word units. These approaches highlight how co-occurrence operates at different levels of linguistic organization, from linear sequences to hierarchical structures.

Illustrative examples include fixed phrases like "of course," where the words consistently pair due to their conventionalized linkage, contrasting with more variable patterns such as "coffee" co-occurring with qualifiers like "hot" or "cold" depending on context. Co-occurrence restrictions further delineate these patterns, prohibiting incompatible grammatical elements, such as mismatched agreements in languages with inflectional morphology (e.g., a masculine noun paired with a feminine adjective in a gendered language). Such restrictions underscore the non-random nature of word pairings, enforcing syntactic harmony.

In language analysis, these patterns play a crucial role in identifying idioms, where words form non-compositional units like "kick the bucket," detectable through their high-frequency, restricted co-occurrences in corpora. Frequency distributions of co-occurrences also reveal underlying rules, such as verb-argument preferences or phrase structures, aiding in the empirical modeling of syntactic constraints without relying solely on introspective judgment. Association measures can quantify the strength of these patterns, though their computational details are addressed in statistical frameworks.
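As an illustration of window-based pattern extraction, here is a short Python sketch that counts unordered word pairs within a fixed window; the sentence and window size are arbitrary choices for demonstration.

```python
from collections import Counter

def window_cooccurrences(tokens, window=5):
    """Count unordered token pairs that co-occur within `window` positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pair = tuple(sorted((word, tokens[j])))
            counts[pair] += 1
    return counts

tokens = "she drank strong tea and he drank strong coffee".split()
for pair, count in window_cooccurrences(tokens, window=2).most_common(3):
    print(pair, count)  # ('drank', 'strong') appears twice within the window
```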

Collocations and Semantic Associations

In linguistics, collocations are defined as habitual pairings of words that co-occur more frequently than would be expected by chance, often carrying idiomatic or restricted meanings that deviate from literal combinations. For instance, native speakers prefer "blond hair" over the semantically equivalent but unnatural "yellow hair," illustrating how collocations reflect conventional use rather than arbitrary synonymy. This phenomenon underscores the role of co-occurrence in shaping idiomatic expressions, where the whole acquires a cohesive meaning beyond its parts.

Collocations are broadly categorized into lexical and grammatical types. Lexical collocations involve content words, such as adjective-noun pairs like "strong tea" or verb-noun combinations like "commit a crime," which enhance fluency and idiomaticity in language use. Grammatical collocations, by contrast, pair a content word with a grammatical element, such as a verb with a preposition in "account for" or a verb with an infinitive in "decide to," where the grammatical element restricts the semantic interpretation. These distinctions highlight how co-occurrence fosters predictable yet non-compositional structures in language.

A foundational perspective on collocations comes from John Sinclair's idiom principle, which posits that language production relies heavily on semi-preconstructed phrases stored as holistic units, rather than purely creative word selection. Introduced in his 1991 work, this principle explains why collocations dominate textual output, influencing everything from everyday speech to formal writing by prioritizing recurrent patterns over open-choice combinations. Sinclair's insights, drawn from corpus analysis, emphasize that such habitual pairings encode deeper semantic and pragmatic associations.

Building on this, co-occurrence enables inferences about semantic proximity through the distributional hypothesis, which states that words appearing in similar contexts tend to share similar meanings. This idea, originating with Zellig Harris and popularized by J.R. Firth's dictum "you shall know a word by the company it keeps," underpins modern distributional models in natural language processing (NLP). In models like word2vec, co-occurrence patterns within text corpora are transformed into dense vector representations, where proximity in the vector space captures semantic relationships, such as linking "king" and "queen" through contextual similarities. These embeddings allow computational systems to infer nuanced meanings from word associations.

In applications, collocations and their semantic associations enhance performance in tasks like machine translation, where recognizing idiomatic pairings prevents literal translations that distort meaning, as seen in systems that prioritize collocational dictionaries for phrase-level alignment. Similarly, in sentiment analysis, collocations reveal subtle emotional nuances; for example, "bitter disappointment" signals stronger negativity than isolated terms, improving accuracy in opinion mining from reviews or social media. These uses demonstrate how co-occurrence-derived insights enable more context-aware technologies.
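The distributional intuition can be shown with a toy computation: words with similar co-occurrence profiles receive similar vectors and thus high cosine similarity. The context words and counts below are invented purely for illustration; real models such as word2vec learn dense representations from large corpora rather than using raw counts like these.

```python
import math

# Toy co-occurrence vectors over the contexts ["drink", "royal", "hot"]
# (invented counts, for illustration only).
vectors = {
    "tea":    [10, 0, 8],
    "coffee": [9, 0, 7],
    "king":   [0, 12, 1],
}

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(round(cosine(vectors["tea"], vectors["coffee"]), 3))  # near 1: shared contexts
print(round(cosine(vectors["tea"], vectors["king"]), 3))    # near 0: disjoint contexts
```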

Statistical Frameworks

Co-occurrence Matrices

A co-occurrence matrix is a fundamental data structure in statistical analysis that captures the joint frequencies of pairs of items, such as words or species, within a dataset. It is typically represented as an n \times n matrix, where n is the number of unique items, the rows and columns index these items, and each cell (i, j) contains the count of occurrences where item i and item j appear together according to a predefined context, such as proximity in a sequence or shared membership in a unit like a document. This construction enables the quantification of pairwise relationships as a prerequisite for further analysis.

The matrix can be symmetric or asymmetric depending on the nature of the co-occurrence definition. In symmetric matrices, the entry (i, j) equals (j, i), treating co-occurrence as undirected and suitable for cases where order does not matter, such as words appearing in the same sentence regardless of position. Asymmetric matrices, conversely, distinguish directionality, with (i, j) counting instances where j appears in the context of i (e.g., within a fixed window to the right), which is common in sequential data like text. These matrices are often built from linguistic corpora as input, where contexts are defined by sentence or document boundaries or by sliding windows.

Variants of co-occurrence matrices address different analytical needs and computational constraints. Binary matrices use 0 or 1 to indicate the presence or absence of co-occurrence for any pair, simplifying computation for pattern detection without regard to frequency. Weighted matrices, in contrast, record the exact counts of joint occurrences, providing richer information for frequency-based modeling; for instance, in large-scale constructions such as GloVe's, weights may incorporate distance-based decay functions to emphasize nearby co-occurrences. For efficiency with expansive datasets, where vocabularies can exceed tens of thousands of items leading to mostly zero entries, sparse representations store only non-zero values, reducing memory usage from O(n^2) to the actual number of observed pairs.

To illustrate, consider a minimal example with two words, "apple" and "fruit", across a small set of three sentences: (1) "I eat an apple.", (2) "The fruit is apple.", (3) "Bananas are fruit.". Assuming sentence-level co-occurrence (words co-occur if they appear in the same sentence), the raw 2x2 matrix is:

\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}

Here, the diagonal shows individual frequencies ("apple" in 2 sentences, "fruit" in 2), and the off-diagonals show joint occurrences (1 shared sentence). Normalization might proceed by dividing off-diagonal entries by the row marginals to yield conditional probabilities: for "apple", P(\text{fruit} \mid \text{apple}) = 1/2 = 0.5. This step facilitates comparative analysis across items.
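A small Python sketch, assuming the same sentence-level definition of co-occurrence used in the worked example (the tokenization is deliberately naive), reproduces the matrix above:

```python
from collections import Counter
from itertools import combinations
import string

sentences = ["I eat an apple.", "The fruit is apple.", "Bananas are fruit."]
targets = ["apple", "fruit"]

occur, joint = Counter(), Counter()
strip_punct = str.maketrans("", "", string.punctuation)
for sentence in sentences:
    words = set(sentence.lower().translate(strip_punct).split())
    present = [t for t in targets if t in words]
    occur.update(present)                      # individual frequencies
    for a, b in combinations(present, 2):      # symmetric joint counts
        joint[(a, b)] += 1
        joint[(b, a)] += 1

# Diagonal holds individual frequencies; off-diagonal holds joint counts.
matrix = [[occur[a] if a == b else joint[(a, b)] for b in targets] for a in targets]
print(matrix)                                      # [[2, 1], [1, 2]]
print(joint[("apple", "fruit")] / occur["apple"])  # P(fruit | apple) = 0.5
```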

Measures of Association

Measures of association provide quantitative tools to evaluate whether observed co-occurrences between events, such as words or species, deviate significantly from expectations under independence, thereby identifying non-random patterns. These measures are computed from probability estimates or contingency tables derived from co-occurrence matrices, enabling the distinction between chance joint occurrences and meaningful dependencies. Common approaches include information-theoretic metrics, statistical tests of independence, and overlap-based similarity coefficients, each suited to different aspects of association strength and reliability.

Pointwise Mutual Information (PMI) is a foundational measure originating from information theory, adapted for co-occurrence analysis to capture the informativeness of specific pairwise associations. Defined as \text{PMI}(x, y) = \log_2 \left( \frac{P(x, y)}{P(x) P(y)} \right), where P(x, y) is the joint probability of events x and y, and P(x) and P(y) are their marginal probabilities, PMI quantifies how much more likely x and y co-occur than if they were independent. A positive value indicates attraction (co-occurrence exceeds chance), zero suggests independence, and negative values denote repulsion (less co-occurrence than expected). The derivation stems from the mutual information concept, where the average PMI over a distribution yields the expected mutual information between random variables; for application, it directly assesses individual pairs without averaging.

To compute PMI, probabilities are estimated from frequencies in a corpus: P(x, y) = f(x, y) / N, P(x) = f(x) / N, and P(y) = f(y) / N, with f denoting counts and N the total observations. Challenges arise when P(x, y) = 0 or marginals are sparse, leading to undefined or negative-infinity values due to the logarithm. To address this, smoothed variants are employed, such as adding a small constant \alpha (e.g., 1 for Laplace smoothing) to all frequency counts before probability estimation, yielding \text{PMI}_\alpha(x, y) = \log_2 \left( \frac{f(x, y) + \alpha}{(f(x) + \alpha)(f(y) + \alpha) / N} \right). Alternatively, Positive PMI (PPMI) thresholds negative values to zero: \text{PPMI}(x, y) = \max(0, \text{PMI}(x, y)), preserving only attractive associations while mitigating sparsity effects. These adjustments prevent extreme penalties for unseen pairs and stabilize estimates in finite datasets.

PMI excels at emphasizing rare but highly specific co-occurrences, as the logarithmic scale amplifies deviations from independence for low-frequency events; however, it biases toward infrequent items, overvaluing sparse data prone to sampling noise, and performs poorly with imbalanced frequencies where common events dominate. For illustration, consider a corpus of 1,000,000 words where "ice" appears 100 times, "cream" 50 times, and the bigram "ice cream" 40 times. Then P(\text{ice}) = 100 / 1{,}000{,}000 = 10^{-4}, P(\text{cream}) = 5 \times 10^{-5}, and P(\text{ice}, \text{cream}) = 4 \times 10^{-5}, yielding \text{PMI}(\text{ice}, \text{cream}) = \log_2 \left( \frac{4 \times 10^{-5}}{10^{-4} \times 5 \times 10^{-5}} \right) = \log_2(8000) \approx 12.97, indicating strong positive association. If smoothed with \alpha = 1, the value adjusts minimally to approximately 12.96, demonstrating robustness to minor count perturbations.
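The PMI computation can be verified with a few lines of Python; the counts mirror the worked "ice cream" example, and the alpha parameter implements the additive smoothing variant given above.

```python
import math

N = 1_000_000                        # total observations in the corpus
f_ice, f_cream, f_joint = 100, 50, 40

def pmi(f_xy, f_x, f_y, n, alpha=0.0):
    """PMI from raw counts, with optional additive smoothing alpha."""
    return math.log2((f_xy + alpha) * n / ((f_x + alpha) * (f_y + alpha)))

raw = pmi(f_joint, f_ice, f_cream, N)
smoothed = pmi(f_joint, f_ice, f_cream, N, alpha=1)
ppmi = max(0.0, raw)                 # PPMI clamps negative values to zero

print(round(raw, 2), round(smoothed, 2))  # 12.97 12.96
```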
The chi-squared test for independence assesses whether co-occurrences in a contingency table significantly differ from random expectations, providing a hypothesis-testing framework. For a 2x2 contingency table with observed frequencies O_{ij} (e.g., co-occurrence and non-co-occurrence counts for two events), the test statistic is \chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, where the expected values are E_{ij} = \frac{R_i C_j}{N} for row total R_i, column total C_j, and grand total N, and the summation runs over all cells. Under the null hypothesis of independence, \chi^2 follows a chi-squared distribution with (r-1)(c-1) degrees of freedom, allowing significance determination (e.g., p < 0.05 rejects the null hypothesis). Introduced by Karl Pearson, this test evaluates overall table deviation rather than pairwise strength, making it well suited to validating associations across multiple categories. Its strengths include asymptotic validity for large samples, enabling reliable inference, but limitations involve sensitivity to small expected values (requiring Yates' continuity correction for 2x2 tables) and the assumption of fixed margins, which may not hold in sequential data like text.

The Dice coefficient offers a straightforward, normalized measure of overlap for binary co-occurrence representations, defined as \text{Dice}(X, Y) = \frac{2 |X \cap Y|}{|X| + |Y|}, where |X \cap Y| is the size of the intersection (shared occurrences), and |X| and |Y| are the individual set sizes. Ranging from 0 (no overlap) to 1 (identical sets), it weights shared elements twice as heavily as total coverage, derived from set theory to gauge similarity without penalizing absent co-occurrences. Originally proposed for ecological associations, it is computed directly from binary vectors or document-term incidences in co-occurrence contexts. The coefficient's simplicity and bounded range facilitate comparison, with strengths in handling sparse data symmetrically; however, it ignores non-overlaps and frequencies, underperforming on count-based data where abundance matters, and assumes binary presence rather than graded strength of association. These measures are often applied to co-occurrence matrices as inputs, transforming raw frequencies into interpretable association scores for further analysis.
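Both the chi-squared statistic and the Dice coefficient reduce to a few arithmetic operations over a contingency table or binary sets, as in this sketch with hypothetical counts:

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def dice(x, y):
    """Dice coefficient between two sets of occurrence contexts."""
    return 2 * len(x & y) / (len(x) + len(y))

# Rows: word A present/absent; columns: word B present/absent (hypothetical).
print(round(chi_squared_2x2([[40, 60], [10, 890]]), 1))  # ~286.5, far above chance
print(round(dice({1, 2, 3, 4}, {3, 4, 5}), 3))           # 2*2 / (4+3) ~ 0.571
```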

Applications Across Disciplines

Ecology and Biology

In ecology, species co-occurrence refers to the simultaneous presence of multiple species within the same habitat or site, often revealing underlying interactions such as competition, predation, or facilitation that shape community structure. To assess whether these patterns arise randomly or from biotic processes, researchers employ null model analyses, which generate randomized community matrices while preserving observed marginal totals (e.g., site richness or species frequencies) and compare them to empirical data to test for statistical significance. Seminal work by Jared Diamond (1975) examined co-occurrence among 141 land bird species across 76 islands in the Bismarck Archipelago, identifying non-random "forbidden" combinations and "checkerboard" distributions that implied competitive exclusion as a key assembly mechanism.

A common method for quantifying species overlap in co-occurrence studies is the Jaccard similarity index, defined as the ratio of species shared between two sites to the total unique species across them, providing a presence-absence measure of community resemblance. This index has been instrumental in community assembly analyses, including extensions of Diamond's assemblage research, where it helped evaluate how environmental gradients and dispersal limitations influence similarity among communities. For instance, in studies of avian distributions, Jaccard values highlighted greater overlap in similar habitats, supporting inferences of interaction-driven patterns beyond neutral processes.

In molecular biology, co-occurrence extends to genetic scales through gene co-expression networks, where simultaneous expression of genes across conditions suggests functional associations, such as shared regulatory pathways or protein interactions. The advent of DNA microarray technology in the mid-1990s facilitated large-scale expression profiling, enabling the construction of these networks from expression data in model organisms like yeast. Eisen et al. (1998) pioneered clustering approaches on genome-wide expression datasets, grouping co-expressed genes (e.g., those involved in cell cycle regulation) and linking them to biological processes, laying foundational methods for inferring gene interactions that proliferated in genomics throughout the following decade. Association measures from statistics, such as correlation coefficients, are routinely adapted for ecological and genomic data to evaluate co-occurrence strength while accounting for sampling variability.
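For example, the Jaccard index between two sites reduces to a pair of set operations; the species lists below are hypothetical.

```python
# Hypothetical presence-absence data for two survey sites.
site_a = {"warbler", "finch", "sparrow", "thrush"}
site_b = {"finch", "sparrow", "pigeon"}

# Jaccard similarity: shared species / total unique species across both sites.
jaccard = len(site_a & site_b) / len(site_a | site_b)
print(round(jaccard, 2))  # 2 shared of 5 unique species = 0.4
```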

Network and Social Analysis

In network and social analysis, co-occurrence refers to the joint presence or interaction of entities—such as nodes, edges, or attributes—within relational structures like graphs, revealing underlying patterns of connectivity and behavior. This approach is particularly valuable in multiplex networks, where entities participate in multiple types of relationships simultaneously, allowing analysts to study how edges co-occur across layers to infer dependencies or predict links. For instance, in link prediction, edge co-occurrence in multiplex networks captures scenarios where connections in one layer (e.g., friendship ties) align with those in another (e.g., professional collaborations), enabling the discovery of association rules that enhance prediction accuracy. Similarly, node co-occurrence in temporal graphs examines how vertices appear together over time, modeling dynamic evolutions such as changing alliances in social systems through embeddings that preserve temporal proximity.

Social applications of co-occurrence have long informed bibliometrics and sociology by mapping relational ties derived from joint activities. In bibliometrics, co-authorship networks treat shared publication credits as co-occurrences, highlighting clusters of collaboration that drive knowledge diffusion, as pioneered in Diana Crane's 1972 study of "invisible colleges"—informal groups of scientists whose joint work accelerates the spread of ideas in specialized fields. Crane's studies from the late 1960s and early 1970s demonstrated how these networks form around productive cores, with co-authorship frequency indicating influence and paradigm shifts in disciplines like biochemistry. In sociology, co-attendance at events serves as a proxy for co-occurrence in affiliation networks, where actors connected through shared participation in gatherings (e.g., meetings or protests) form ties of varying strength based on overlap frequency, revealing social structures like subgroups or coalitions. This two-mode projection from events to actors quantifies interpersonal bonds, as seen in analyses of organizational memberships where repeated co-attendance strengthens relational density.

Practical examples illustrate co-occurrence's role in modern social data analysis, such as detecting influence via Twitter user co-mentions, where frequent tagging of accounts signals propagation of ideas and authority within communities. Studies show that mention-based co-occurrence outperforms follower counts for identifying true influencers, as it captures active engagement rather than passive connections, with empirical data from millions of tweets confirming higher retweet cascades from co-mentioned users. For community detection, metrics like modularity optimize partitions in co-occurrence networks by measuring intra-group edge density relative to random expectations, a seminal method introduced by Newman in 2006 that has been applied to reveal modular structures in relational data such as collaboration graphs. Co-occurrence matrices, akin to adjacency representations, underpin these analyses by encoding pairwise joint appearances. Overall, these techniques prioritize relational patterns over isolated attributes, providing scalable insights into network dynamics.
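As a final sketch, the two-mode projection described above (actors linked by shared event attendance) can be computed by counting actor pairs per event; the attendance records and names below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical two-mode (actor-event) affiliation data.
attendance = {
    "meeting_1": ["ana", "bo", "chen"],
    "meeting_2": ["ana", "bo"],
    "protest_1": ["bo", "chen", "dia"],
}

# Project onto an actor-actor network: tie weight = number of shared events.
ties = Counter()
for event, actors in attendance.items():
    for a, b in combinations(sorted(actors), 2):
        ties[(a, b)] += 1

for (a, b), weight in ties.most_common():
    print(f"{a} -- {b}: {weight}")
# ana--bo and bo--chen each have weight 2 (two shared events), the strongest ties
```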

References

  1. [1] co-occurrence - APA Dictionary of Psychology, 2018.
  2. [2] How to Define Co-occurrence in a Multidisciplinary Context? - Agritrop (PDF).
  3. [3] The Role of Co-Occurrence Statistics in Developing Semantic ... - ResearchGate, 2020.
  4. [4] Co-occurrences and species distribution models show the ...
  5. [5]
  6. [6]
  7. [7] A corpus-driven approach to formulaic language in English (PDF).
  8. [8] Survey of Word Co-occurrence Measures for Collocation Detection.
  9. [9] Collocations in Corpus-Based Language Learning Research ..., 2017.
  10. [10] Extraction of Multi-Word Collocations Using Syntactic Bigram ... (PDF).
  11. [11]
  12. [12] Three (or four) levels of word cooccurence restriction - ScienceDirect.
  13. [13] Constructions are Patterns and so are Fixed Expressions (PDF).
  14. [14] 43. Corpora and grammar - Stefan Th. Gries (PDF).
  15. [15] The distributional hypothesis - ResearchGate.
  16. [16] Journal of Language and Linguistic Studies - ERIC (PDF).
  17. [17] Efficient Estimation of Word Representations in Vector Space - arXiv, 2013.
  18. [18] Collocation extraction for machine translation - ACL Anthology (PDF).
  19. [19] A Study of Collocations in Sentiment Analysis - IEEE Xplore.
  20. [20] A Case Study in Computing Word Co-occurrence Matrices with ... (PDF).
  21. [21] GloVe: Global Vectors for Word Representation - Stanford NLP Group (PDF).
  22. [22] Occurrence Matrix - an overview - ScienceDirect Topics.
  23. [23] Co-occurrence Matrices and Their Uses in NLP - Baeldung, 2025.
  24. [24] Word Association Norms, Mutual Information, and Lexicography - Church and Hanks (PDF).
  25. [25] Pointwise Mutual Information (PMI) (PDF).
  26. [26] Cluster analysis and display of genome-wide expression patterns - PNAS, 1998.
  27. [27] Introduction to social network methods: Chapter 17: Two-mode ...
  28. [28] Multiplex Graph Association Rules for Link Prediction - arXiv, 2020.
  29. [29] Fast Multiplex Graph Association Rules for Link Prediction - arXiv, 2022.
  30. [30] Node Embedding over Temporal Graphs - IJCAI (PDF).
  31. [31] Invisible colleges: diffusion of knowledge in scientific communities - Crane, Diana, 1972.
  32. [32] Coauthorship networks and patterns of scientific collaboration - PNAS, 2004.
  33. [33] The backbone of bipartite projections: Inferring relationships from co ...
  34. [34] Measuring User Influence in Twitter: The Million Follower Fallacy (PDF).
  35. [35] Everyone's an Influencer: Quantifying Influence on Twitter.
  36. [36] Modularity and community structure in networks - PMC.
    One highly effective approach is the optimization of the quality function known as “modularity” over the possible divisions of a network.