
Coreference

Coreference is a linguistic phenomenon in which two or more expressions within a text or discourse, such as noun phrases, pronouns, or proper names, refer to the same real-world entity, person, object, or event, thereby establishing referential identity across mentions. This relation is essential for discourse cohesion, allowing speakers or writers to avoid repetition by reusing references to previously introduced entities, and it operates on a continuum from full identity to partial overlap, influenced by context and pragmatics. In linguistic theory, coreference encompasses various subtypes, including anaphora, where a subsequent expression (the anaphor) links back to an earlier antecedent, and cataphora, a forward-pointing reference resolved by later context. Additional forms involve discourse deixis, which references prior segments of the text, and predicative relations, where expressions attribute properties to the same entity. Theoretical models, such as those drawing on mental space theory, explain shifts in coreference through cognitive operations like specification (adding details), refocusing (changing perspective), and neutralization (reducing specificity), highlighting its dynamic and context-dependent nature. Within natural language processing (NLP), coreference resolution denotes the computational task of automatically detecting and clustering these coreferential mentions to enhance text understanding. Originating in the 1970s with early heuristic systems, the field advanced through annotated corpora from initiatives like the Message Understanding Conferences (MUC) and Automatic Content Extraction (ACE) in the 1990s and 2000s, enabling supervised approaches such as mention-pair models and ranking algorithms. Key challenges persist, including ambiguity in non-referential phrases, handling singletons (entities with only one mention, comprising 60-86% of cases), and domain-specific issues such as temporal variation in specialized texts. Applications span information extraction, machine translation, summarization, and clinical natural language processing, where accurate resolution improves system performance despite ongoing needs for robust evaluation metrics like B³ for assessing clustering quality.

Fundamentals

Definition

Coreference is a linguistic relation in which two or more noun phrases (NPs) or other expressions within a discourse refer to the same real-world entity, abstract concept, or event. This enables efficient communication by linking expressions that share referential identity, such as a proper name and a subsequent pronoun. For instance, in the discourse "John entered the room. He sat down," the pronoun "He" corefers with the proper name "John," both denoting the same individual. Central to coreference are referring expressions, which include proper names (e.g., "John"), definite NPs (e.g., "the man"), and pronouns (e.g., "he"), all of which point to entities in the discourse model. These expressions are distinguished by their roles: an antecedent is the initial expression that introduces or establishes the entity, while subsequent coreferents refer back to it. Key terminology includes "mention," which denotes any surface-level expression (typically an NP) that refers to an entity; "entity," the underlying object, person, or concept being referenced; and "coreference chain," a sequence of two or more mentions linked as referring to the identical entity. Coreference plays a vital role in maintaining discourse cohesion, allowing speakers and writers to avoid repetition by substituting full NPs with shorter forms like pronouns, thereby facilitating smoother text flow and reader comprehension. This mechanism supports the construction of a shared mental model of the discourse, where entities persist across sentences. Coreference encompasses subtypes such as anaphora (backward reference) and cataphora (forward reference), though these are explored in greater detail elsewhere.
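The terminology above can be made concrete with a minimal data-structure sketch (the names `Mention` and `chain` are illustrative, not from any particular library): a mention is a text span, and a coreference chain is the set of mentions that share one entity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    start: int   # token index where the span starts (inclusive)
    end: int     # token index where the span ends (exclusive)
    text: str

tokens = ["John", "entered", "the", "room", ".", "He", "sat", "down", "."]

# One coreference chain: both mentions refer to the same entity (John).
chain = [Mention(0, 1, "John"), Mention(5, 6, "He")]

# The antecedent is the earliest mention of the entity in document order.
antecedent = min(chain, key=lambda m: m.start)
assert antecedent.text == "John"
```

Representing chains as sets of spans, rather than pairwise links, mirrors how annotated corpora typically store coreference.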

Historical Development

In the early 20th century, structuralism, pioneered by Ferdinand de Saussure, shifted focus to synchronic analysis of language as a system of signs, laying foundational ideas on how signifiers relate to signified entities and influencing later explorations of referential relations in discourse. The mid-20th century marked a pivotal advancement through generative grammar, as Noam Chomsky integrated coreference into syntactic theory during the 1950s and 1960s. In works like Syntactic Structures (1957) and Aspects of the Theory of Syntax (1965), Chomsky linked coreference to deep structure and transformational operations, introducing referential indices to track co-referential elements across sentences and distinguishing them from bound variables. This approach treated coreference as a derivational phenomenon, assuming syntactic identity often aligned with semantic equivalence, though challenges from quantifiers soon highlighted limitations. From the 1970s to the 1980s, attention turned to semantics and pragmatics, with scholars like Edward L. Keenan examining definite descriptions and their role in establishing coreference. Keenan's 1971 work examined two kinds of presupposition in natural language, including those carried by definite descriptions regarding existence and uniqueness. This period solidified coreference's place in semantic theory, as seen in Peter Sells' 1985 Lectures on Contemporary Syntactic Theories, which synthesized accounts of anaphora across frameworks like government-binding theory and generalized phrase structure grammar. By the 1990s, coreference transitioned into computational linguistics with the rise of natural language processing (NLP), shifting from theoretical description to applied systems for resolving references in text. Early supervised methods, such as those using decision-tree classifiers trained on annotated corpora, emerged around 1995, enabling automated clustering of mentions and marking a key evolution toward practical discourse understanding.

Types and Examples

Anaphoric Coreference

Anaphoric coreference, commonly referred to as anaphora, occurs when a linguistic expression, such as a pronoun or definite noun phrase, follows an antecedent in the discourse and derives its interpretation from that prior element. In this backward-referring relation, the anaphor points back to the antecedent to establish identity of reference, enabling efficient communication by avoiding repetition of full descriptions. For instance, in the sentence "The dog chased the cat. It was fast," the pronoun "it" serves as the anaphor referring to "the dog" as its antecedent, though ambiguity arises because "it" could plausibly refer to "the cat" in context, highlighting the need for pragmatic resolution to disambiguate coreferent from non-coreferent interpretations. Linguistic constraints on anaphora are primarily governed by binding theory, a framework in generative syntax that regulates the structural relations between antecedents and anaphors. Principle A of binding theory stipulates that anaphors, such as reflexives (e.g., "himself") or reciprocals (e.g., "each other"), must be bound by a c-commanding antecedent within a local domain, typically the same clause, ensuring the antecedent structurally dominates the anaphor. For example, in "John saw himself," "himself" is bound by "John" because "John" c-commands it from a higher structural position, whereas "*Himself saw John" violates Principle A due to the lack of such binding. These principles rule out illicit coreference, such as cases where a reflexive would have to be bound by a non-c-commanding element, distinguishing grammatical anaphora from ungrammatical attempts at coreference. In discourse, anaphora plays a crucial role in maintaining cohesion by linking sentences and reducing redundancy, allowing speakers to track entities across a text without restating them explicitly. This mechanism fosters textual unity, as seen in extended narratives where repeated pronouns or definite descriptions signal continuity of reference, enhancing readability and flow.
Anaphora thus contributes to the overall coherence of discourse by presupposing shared knowledge of antecedents, which listeners or readers recover to interpret subsequent expressions. Anaphora varies between surface and deep forms, distinguished by whether the anaphoric process relies on syntactic structure or deeper semantic interpretation. Surface anaphora, such as simple pronominal reference, requires a linguistically overt antecedent and is controlled by syntactic rules, whereas deep anaphora involves pragmatic or interpretive reconstruction without strict syntactic parallelism. A classic test case in this debate is verb phrase (VP) ellipsis, as in "John likes apples, and Mary does too," where the elided VP under "does" is interpreted as identical to "likes apples" through semantic recovery rather than surface form. This distinction underscores how anaphora can operate at different levels of linguistic processing to achieve coreference. In contrast to cataphora, which points forward to a forthcoming antecedent, anaphora is inherently backward-directed in its textual progression.
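The c-command relation at the heart of Principle A can be sketched computationally. The following toy code (hypothetical helpers, not a real parser) represents a parse tree as nested tuples and uses a common simplification: A c-commands B if neither dominates the other and B falls under A's immediate parent.

```python
def paths(tree, target, path=()):
    """Yield the path (tuple of child indices) to each node equal to target."""
    if tree == target:
        yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, target, path + (i,))

def c_commands(tree, a, b):
    """Toy c-command check: neither node dominates the other, and b is
    dominated by a's immediate parent (a simplification of the usual
    'first branching node' formulation)."""
    pa = next(paths(tree, a))
    pb = next(paths(tree, b))
    # Dominance in either direction blocks c-command.
    if pa == pb[:len(pa)] or pb == pa[:len(pb)]:
        return False
    return pb[:len(pa) - 1] == pa[:-1]

# "John saw himself": the subject NP c-commands the object reflexive,
# so Principle A is satisfied; the reverse does not hold.
tree = ("S", ("NP", "John"), ("VP", ("V", "saw"), ("NP", "himself")))
assert c_commands(tree, ("NP", "John"), ("NP", "himself"))
assert not c_commands(tree, ("NP", "himself"), ("NP", "John"))
```

The asymmetry of the two assertions is exactly what licenses "John saw himself" while ruling out "*Himself saw John."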

Cataphoric Coreference

Cataphoric coreference, or cataphora, occurs when a linguistic expression, such as a pronoun, precedes and refers forward to a subsequent antecedent that provides its full interpretation. This contrasts with the more prevalent anaphoric coreference, where reference points backward to a prior expression. Cataphora is less common than anaphora due to the cognitive processing demands it imposes on language users, as resolving the reference requires holding the initial expression in working memory until the antecedent appears. Syntactically, it is constrained and typically occurs in specific structural contexts, such as preposed subordinate clauses or lists, where the forward reference can be anticipated within a bounded domain. For instance, in relative clauses introduced early, cataphora facilitates integration without violating locality principles. A classic example is the sentence "When he arrived, John was tired," where the pronoun "he" cataphorically refers to the later noun "John." In more complex sentences, such as those involving embedded clauses, cataphora can lead to ambiguous readings if multiple potential antecedents follow, as in "If she wins the award, Mary will celebrate with her team," where "she" anticipates "Mary" but could momentarily suggest another entity. In discourse, cataphora serves anticipatory functions by building suspense or structuring information flow, particularly in stylistic writing like fiction or technical descriptions, where it signals upcoming details to engage the reader. For example, novelists use it to delay character introductions, heightening narrative tension. Identifying cataphora presents challenges due to its higher processing cost relative to anaphora, as the lack of an immediate antecedent increases the risk of misresolution during comprehension. Corpus analyses of literary texts, such as those in the Anaphoric Treebank, reveal that cataphoric instances are rarer and often context-dependent, complicating annotation and analysis in extended narratives like novels.

Theoretical Aspects

Relation to Bound Variables

In formal semantics, coreference refers to the relation in which two expressions denote the same individual in the domain of discourse, establishing referential identity independent of syntactic structure. In contrast, bound variables involve scope-dependent assignments where a pronoun or anaphor is interpreted as a variable governed by a quantifier or lambda operator, as seen in frameworks like Montague grammar and predicate logic. This distinction is crucial because bound variables do not imply coreference; instead, they allow the pronoun's interpretation to covary with the quantifier's scope without referring to a fixed individual. A key example illustrates this difference: in the sentence "John loves his wife," the pronoun "his" is coreferential with "John," denoting the same specific individual. However, in "Every man loves his wife," "his" is not coreferential with "every man" but functions as a bound variable, interpreted such that each man in the domain loves his own wife, with the pronoun's reference varying across instances. This bound reading arises from the quantifier's scope, avoiding a unified referential link. This theoretical framework originates in Montague grammar, developed in the 1970s, which integrates syntax with model-theoretic semantics to handle quantification and pronouns. Montague's approach treats pronouns systematically as bound variables when under quantifier scope, using translations that embed them within lambda abstractions or predicate-logic operators, without invoking coreference. For instance, the logical form of "Every man loves his wife" can be represented as:

\forall x \, [\text{man}(x) \to \exists y \, [\text{wife}(y,x) \land \text{love}(x,y)]]

Here, the pronoun "his" corresponds to the bound variable x, scoped by the universal quantifier, demonstrating how binding operates without referential identity to a single individual. This representation highlights the scope dependency, where the existential quantifier for the wife is subordinated to the universal quantifier over men.
The implications of this separation are significant for semantic analysis, as conflating coreference with binding can lead to errors in resolving ambiguities or quantifier interactions. By distinguishing referential links from operator scope, this framework ensures compositional meanings that accurately capture variable interpretations in quantified contexts, influencing broader formal semantic theories.
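The two readings can be written out side by side in the same predicate notation; here j is a constant denoting John (an assumption introduced for the comparison, not part of the original text):

```latex
% Coreferential reading: "his" picks out the same fixed individual j as "John".
\text{John loves his wife:} \quad \exists y\,[\text{wife}(y, j) \land \text{love}(j, y)]

% Bound reading: "his" is the variable x bound by the universal quantifier.
\text{Every man loves his wife:} \quad \forall x\,[\text{man}(x) \to \exists y\,[\text{wife}(y, x) \land \text{love}(x, y)]]
```

The contrast is visible in the pronoun's translation: a constant in the coreferential case, a bound occurrence of the quantified variable in the other.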

Coreference in Formal Semantics

In formal semantics, coreference is modeled through dynamic frameworks that represent how utterances update a shared discourse context, tracking entities and their relations across sentences. Discourse Representation Theory (DRT), introduced by Hans Kamp in 1981, provides a foundational approach by constructing Discourse Representation Structures (DRSs) to handle anaphoric relations. In DRT, coreferring expressions link to the same discourse referent, a variable representing an entity in the mental model of the discourse. This allows for systematic resolution of pronouns and definite descriptions by embedding conditions that equate or subordinate referents. DRT employs a box notation to visually represent these structures, where referents are listed above a horizontal line and conditions below it. For instance, the discourse "John entered the room. He sat down" begins with a DRS for the first sentence introducing referent x with the conditions John(x) \land entered(x, room):

\begin{array}{c} x \\ \hline John(x) \land entered(x, room) \end{array}

The second sentence updates this DRS by adding referent y with conditions he(y) and y = x, along with sat(y), yielding:

\begin{array}{c} x, y \\ \hline John(x) \land entered(x, room) \land he(y) \land y = x \land sat(y) \end{array}

The equation y = x captures coreference, ensuring the pronoun "he" refers back to John. A related framework, File Change Semantics, developed by Irene Heim in 1982, treats the discourse context as a "file" of indexed entries for entities, where coreferents update the same file card rather than introducing new ones. In this system, anaphors succeed only if their descriptive content matches an existing entry, promoting incremental context updates. Advanced mechanisms in these theories address unresolved coreference via accommodation, as elaborated in Heim's 1983 work on presupposition projection.
Accommodation permits the addition of presupposed referents to the context when they are not explicitly introduced, enabling felicitous coreference in underspecified discourses; for example, hearers can accommodate the referent of "His sister is waiting" even without prior mention of a sister. This process ensures semantic coherence by globally or locally adjusting the file to satisfy referential presuppositions. Cross-linguistically, formal semantic models like DRT adapt to variations in pronominal systems, such as in null-subject languages like Italian, where pro-drop allows omitted subjects to corefer via implicit arguments without overt pronouns. In such languages, null subjects preferentially resume topic-continuous antecedents, modeled by restricting referent resolution in the DRS to high-salience positions, unlike in non-pro-drop languages, which require explicit pronouns for the same links. This variation highlights how morphological options influence coreference resolution within unified dynamic frameworks.
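The incremental DRS updates described above can be sketched in a few lines of code. This is an illustrative toy (the `DRS` class is an assumption for exposition, not a DRT implementation): a DRS holds referents and conditions, and each sentence extends both, with the pronoun resolved by an equality condition.

```python
class DRS:
    """Toy Discourse Representation Structure: referents + conditions."""
    def __init__(self):
        self.referents = []
        self.conditions = []

    def add_referent(self, name):
        self.referents.append(name)
        return name

    def add_condition(self, cond):
        self.conditions.append(cond)

drs = DRS()

# "John entered the room": introduce referent x with its conditions.
x = drs.add_referent("x")
drs.add_condition("John(x)")
drs.add_condition("entered(x, room)")

# "He sat down": introduce y, then resolve the pronoun by equating y with x.
y = drs.add_referent("y")
drs.add_condition("he(y)")
drs.add_condition("y = x")   # the coreference link
drs.add_condition("sat(y)")

assert drs.referents == ["x", "y"]
assert "y = x" in drs.conditions
```

The equality condition is the computational analogue of drawing both mentions' arrows to one discourse referent in the box notation.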

Computational Approaches

Coreference Resolution Task

Coreference resolution is a fundamental task in natural language processing (NLP) that involves identifying all mentions (linguistic expressions such as noun phrases) in a document that refer to the same real-world entity and partitioning them into coreference chains. For example, in the sentence "John entered the room. He sat down," the chain would group "John" and "He" as referring to the same individual. The goal is to produce an output that accurately clusters these mentions while handling variations in form, such as pronouns, definite descriptions, or proper names. The task typically breaks into subtasks, beginning with mention detection, which identifies candidate spans in the text that could serve as referring expressions. This step is a crucial prerequisite, since coreference linking operates on detected mentions, and errors here propagate to subsequent clustering. Following detection, the core linking subtask groups compatible mentions into equivalence classes, effectively resolving which referents are identical without linking to external knowledge bases in the standard setup. Key datasets have standardized evaluation and training for the task. OntoNotes, introduced in the mid-2000s, provides a large-scale, multilingual corpus with coreference annotations integrated alongside other layers like syntactic parses and word senses, covering genres such as news and conversational text. The CoNLL-2011 shared task established benchmarks using OntoNotes data, focusing on unrestricted coreference in English, and its 2012 successor extended coverage to Arabic and Chinese, promoting consistent annotation schemes that include both nominal and pronominal mentions. These resources shifted the field toward end-to-end systems that jointly handle mention detection and resolution on diverse, real-world texts. Evaluation metrics assess the quality of predicted coreference chains against gold-standard annotations, emphasizing precision and recall in linking mentions.
The Message Understanding Conference (MUC) metric, introduced in 1995, measures link-based performance by counting the links that must be added or removed to align predicted and gold coreference sets, rewarding systems that avoid spurious merges while penalizing missed connections. B-Cubed, proposed in 1998, is a mention-centric approach that computes precision and recall for each individual mention based on the proportion of correctly clustered co-referents in its entity, then averages across all mentions for an overall F1 score. The Constrained Entity-Alignment F-measure (CEAF), from 2005, aligns predicted and gold chains via bipartite matching to optimize entity overlap, providing a more robust measure that accounts for both mention boundaries and partition structure. Shared tasks like CoNLL-2011 often report the average of these three F1 scores as a composite metric to balance their perspectives. Challenges in the task setup arise from inherent linguistic ambiguities, particularly in long-distance coreference where mentions are separated by intervening text, making contextual dependencies harder to capture. Nested entities, such as a noun phrase embedded within a larger one (e.g., "the man's mother," where the inner NP "the man" and the full NP denote different entities), complicate mention detection and partitioning by introducing overlapping spans that defy simple linear clustering. These issues underscore the need for models robust to structural complexity under annotation guidelines like those of OntoNotes.
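The B-Cubed computation is simple enough to sketch directly. The following is a simplified illustration (it assumes gold and predicted clusterings cover the same mention set, which real scorers do not require): for each mention, precision is the fraction of its predicted cluster that is genuinely co-referent with it, and recall is the analogous fraction of its gold cluster.

```python
def b_cubed(gold_clusters, pred_clusters):
    """B³ precision, recall, F1 over two clusterings of the same mentions."""
    gold_of = {m: frozenset(c) for c in gold_clusters for m in c}
    pred_of = {m: frozenset(c) for c in pred_clusters for m in c}
    mentions = list(gold_of)
    # Per-mention scores, averaged over all mentions.
    precision = sum(len(gold_of[m] & pred_of[m]) / len(pred_of[m])
                    for m in mentions) / len(mentions)
    recall = sum(len(gold_of[m] & pred_of[m]) / len(gold_of[m])
                 for m in mentions) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = [{"John", "he"}, {"Mary"}]
pred = [{"John", "he", "Mary"}]   # spurious merge of "Mary" into the chain
p, r, f = b_cubed(gold, pred)
# Recall stays perfect (every gold co-referent is recovered), while the
# spurious merge lowers every mention's precision.
```

On this example recall is 1.0 and precision drops to 5/9, showing how B³ penalizes over-merging per mention rather than per link.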

Algorithms and Models

Early methods for coreference resolution relied on rule-based systems that leveraged syntactic parsing to identify antecedents, particularly for pronouns. A seminal example is Hobbs' algorithm, introduced in 1978, which operates on surface parse trees by traversing the syntactic structure from the pronoun upward and then downward to find potential antecedents based on recency and grammatical constraints. This approach, developed in the 1970s and refined through the 1990s, achieved notable success in pronoun resolution without requiring deep semantic analysis, serving as a baseline for decades due to its simplicity and efficiency. In the 2000s, statistical approaches shifted the paradigm toward data-driven models, using classifiers to predict coreference links between noun phrases. A foundational work is Soon et al. (2001), who employed decision-tree classifiers trained on annotated corpora like MUC-7, incorporating features such as agreement in number and gender, proper-name match, and distance metrics; this system achieved approximately 60.4% F1 on the MUC-7 benchmark. These methods improved over rule-based systems by learning from examples, often outperforming them on diverse texts, though they were limited by pairwise decisions that required post-processing to form full chains. The advent of neural networks in the late 2010s introduced end-to-end models that jointly predict mentions and coreference clusters without a separate mention-detection pipeline. Lee et al. (2017) proposed the first such system, using a bidirectional LSTM encoder to score candidate antecedent spans for each mention, trained to maximize the marginal likelihood of gold clusters; on the OntoNotes benchmark, it attained 67.2% average F1, surpassing prior statistical methods by eliminating reliance on hand-crafted features or parsers. Building on this, transformer-based models from 2018 onward integrated pretrained architectures for contextual embeddings, enabling more nuanced antecedent prediction. For instance, Joshi et al. (2019) fine-tuned BERT with a coreference head atop its representations, yielding 76.9% F1 on OntoNotes and highlighting pretrained encoders' ability to capture long-range dependencies. Advanced techniques have further refined these neural foundations, incorporating graph-based partitioning to cluster mentions into chains and specialized embeddings for better span handling. Denis and Baldridge (2010) introduced graph partitioning for end-to-end clustering, where mentions form vertices and edges encode compatibility scores, allowing global inference of clusters in a single step. Integration of contextual embeddings like ELMo (in extensions of Lee et al., 2018, reaching 70.4% F1 on OntoNotes) and SpanBERT (Joshi et al., 2020, achieving 79.6% F1) has enhanced representation of variable-length spans, with SpanBERT's pretraining on masked spans proving particularly effective for coreference by emphasizing contiguous text units. Performance trends reflect this evolution, with systems advancing from around 60% F1 in the 2000s on benchmarks like MUC to over 80% in the 2020s on OntoNotes, driven by neural architectures and pretrained models that better capture context; recent efficient DeBERTa-based pipelines (Xia et al., 2024) report around 83.6% F1, with ongoing advances incorporating large language models as of 2025.
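To make the rule-based starting point concrete, here is a toy recency-plus-agreement baseline (illustrative only, and far simpler than Hobbs' tree-walking algorithm, which operates on parse trees): each pronoun is linked to the most recent preceding mention with compatible gender and number features.

```python
# Assumed toy lexicon of pronoun features (gender, number); None = unmarked.
PRONOUN_FEATURES = {
    "he": ("masc", "sg"), "she": ("fem", "sg"),
    "it": ("neut", "sg"), "they": (None, "pl"),
}

def resolve(mentions):
    """mentions: list of (text, gender, number) in document order.
    Returns {pronoun_index: antecedent_index} for each resolvable pronoun."""
    links = {}
    for i, (text, _gender, _number) in enumerate(mentions):
        feats = PRONOUN_FEATURES.get(text.lower())
        if feats is None:
            continue                       # not a pronoun, nothing to resolve
        p_gender, p_number = feats
        for j in range(i - 1, -1, -1):     # walk backward: most recent first
            _, a_gender, a_number = mentions[j]
            if p_gender in (None, a_gender) and p_number == a_number:
                links[i] = j
                break
    return links

mentions = [("John", "masc", "sg"), ("the room", "neut", "sg"), ("He", "masc", "sg")]
assert resolve(mentions) == {2: 0}   # "He" -> "John", skipping "the room"
```

Even this crude heuristic resolves the running example correctly; the statistical and neural models above can be seen as learned, soft versions of the same search over candidate antecedents.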

Applications and Challenges

Practical Applications

Coreference resolution plays a pivotal role in information extraction by linking multiple mentions of the same entity, thereby enhancing the accuracy of entity recognition and relation extraction across diverse texts. In search engines, it improves query understanding by resolving ambiguous references, such as pronouns in user inputs, allowing systems to better interpret intent and retrieve relevant results. In machine translation, coreference resolution supports coherence by correctly linking pronouns and entities across languages, particularly in neural MT systems where differences in gender marking or sentence structure can lead to errors. By identifying and preserving coreferential relationships in source texts, it helps systems maintain entity consistency in translated outputs, such as resolving "it" to the appropriate antecedent when translating from English into gendered languages like German or French. This integration has been shown to boost translation quality in document-level models. Dialogue systems benefit from coreference resolution through improved tracking of referents in multi-turn conversations, enabling chatbots to respond contextually to earlier mentions. Coreference helps resolve ellipses and anaphora in spoken dialogues, such as linking "that one" to a previously mentioned item, which enhances natural interaction and reduces misunderstanding in voice assistants. Coreference chains support text summarization by consolidating scattered mentions of entities into unified representations, allowing for more concise and coherent abstracts while preserving key identities. This is particularly useful in abstractive summarization, where unresolved references could lead to fragmented or repetitive outputs. In domain-specific applications, coreference resolution aids legal document analysis by linking entities and events in contracts, facilitating information extraction and compliance checks through datasets like LegalCore.
Similarly, in biomedical text mining, it connects mentions of proteins, genes, and diseases across scientific texts, improving knowledge extraction from literature such as the abstracts in the GENIA corpus.

Open Challenges

Despite significant progress in coreference resolution, theoretical gaps persist, particularly in handling implicit coreference that requires inferring connections from world knowledge beyond explicit textual cues. These implicit relationships challenge formal models, as they demand integration of external knowledge, which current semantic frameworks struggle to incorporate systematically. Similarly, multimodal coreference introduces unresolved issues in aligning coreferential elements across text and images, where fine-grained cross-modal associations and inherent ambiguities in visual descriptions hinder accurate resolution. For instance, resolving a pronoun in text to an object depicted in an accompanying image often fails due to limited annotated multimodal datasets and the complexity of encoding inter-modal dependencies. Computationally, coreference resolution faces challenges from imbalanced training data, leading to lower performance on less common mention types or reference patterns. Long-distance references pose another limitation, as models degrade in accuracy when antecedents and anaphors span large textual distances, complicating modeling over extended contexts. In low-resource languages, performance drops further owing to insufficient labeled data and linguistic disparities that generic models cannot adapt to effectively. Cross-lingual coreference faces substantial hurdles from variations in linguistic structures, notably in pro-drop languages like Chinese or Italian, where omitted subjects necessitate implicit inference not captured by annotation schemes designed for non-pro-drop languages. This is exacerbated by the scarcity of diverse, multilingual datasets that adequately represent such patterns, limiting the development of robust approaches. Ethical concerns arise from biases in coreference models, which can amplify societal stereotypes, particularly gender bias, where training data imbalances lead to preferential linking of occupational roles to male entities.
For example, models may erroneously corefer "doctor" with "he" more often than "she," perpetuating gender disparities in downstream applications like text summarization. Looking ahead, integrating large language models (LLMs) such as the GPT series offers promising directions for zero-shot coreference resolution, enabling inference without task-specific fine-tuning, though post-2020 advances reveal persistent issues like hallucination and suboptimal handling of complex chains. These developments underscore the need for hybrid approaches that combine LLM capabilities with specialized modules to address lingering gaps in robustness and inclusivity. Historical undercoverage of cataphora, where references precede antecedents, exemplifies a broader theoretical oversight in coreference research.

References

  1. [1]
    [PDF] Coreference: Theory, Annotation, Resolution and Evaluation - UB
    Coreference relations, as commonly defined, occur between linguistic expressions that refer to the same person, object or event.
  2. [2]
    [PDF] Coreference Resolution
    Coreference, anaphors, cataphors. • Coreference is when two mentions refer to the same entity in the world. • The relation of anaphora is when a term. (anaphor) ...
  3. [3]
    Coreference resolution: A review of general methodologies and ...
    Coreference resolution is the task of determining linguistic expressions that refer to the same real-world entity in natural language.
  4. [4]
    [PDF] Coreference Resolution and Entity Linking - Stanford University
    We introduce the four types of referring expressions (definite and indefinite NPs, pronouns, and names), describe how these are used to evoke and access ...
  5. [5]
    [PDF] A Machine Learning Approach to Coreference Resolution of Noun ...
    Coreference resolution is the process of determining whether two expressions in nat- ural language refer to the same entity in the world. It is an important ...
  6. [6]
    Coreference and Lexical Repetition: Mechanisms of Discourse ...
    Two linguistic expressions are said to be coreferential if they refer to the same semantic entity; the first expression (the antecedent) introduces the entity ...
  7. [7]
    [PDF] DISCOURSE REFERENTS - ACL Anthology
    Discourse referents are established when an indefinite noun phrase justifies a later pronoun or definite noun phrase reference. This is a linguistic problem.
  8. [8]
    [PDF] An outline of the history of linguistics - CSULB
    Saussure championed the idea that language is a system of arbitrary signs, and his conceptualisation of the sign (see Figure 1.1, p.6) has been highly ...Missing: anaphora coreference
  9. [9]
    [PDF] Lecture 7. Anaphora and its History. Reference, Coreference, and ...
    Apr 4, 2012 · Some of the strongest linguistic arguments for limiting sentence-grammar to dealing with bound variable anaphora came from Tanya Reinhart (1983a ...
  10. [10]
    [PDF] Lecture 11: Introduction to Pragmatics. Formal Semantics and ...
    Apr 28, 2011 · 3.1 Presuppositions of definite descriptions. ... presupposition (Keenan 1971, Levinson 1983), but arguments against considering it a ...
  11. [11]
    Supervised Noun Phrase Coreference Research: The First Fifteen ...
    The research focus of computational coreference resolution has exhibited a shift from heuristic approaches to machine learning approaches in the past decade ...
  12. [12]
    Anaphora - Stanford Encyclopedia of Philosophy
    Feb 24, 2004 · Anaphora is sometimes characterized as the phenomenon whereby the interpretation of an occurrence of one expression depends on the interpretation of an ...Unproblematic Anaphora · Recent Theories of... · Anaphora in Sign Language
  13. [13]
    [PDF] Lecture 10. Introduction to Issues in Anaphora
    Apr 21, 2011 · Binding theory is the branch of linguistic theory that explains the behavior of sentence- internal anaphora, which is labelled 'bound anaphora' ...
  14. [14]
    [PDF] A Note on the Binding Theory - Scholars at Harvard
    Chomsky assumes that PRO is a pronominal anaphor because it is on a par with both pronouns and anaphors. According to. (la) and (lb), if PRO has a governing ...
  15. [15]
    [PDF] What is the Right Binding Theory? - University of Delaware
    The “traditional” Binding Theory of Chomsky (1981): • Anaphors (reflexives and reciprocals) need a local c-commanding antecedent. • Pronouns may not have a ...
  16. [16]
    Handouts on Brown & Yule, Halliday & Hasan - MIT Media Lab
    Anaphora is a type of co-reference. COHESION is the internal continuity or network of points of continuity within a text. As Halliday & Hasan say, " The ...Missing: coreference | Show results with:coreference
  17. [17]
    [PDF] Cohesion - conceptualizations and systemic features of English and ...
    Within endophoric reference, Halliday & Hasan (1976) differentiate further between anaphoric reference (to preceding text) and cataphoric reference (to ...Missing: coreference | Show results with:coreference
  18. [18]
    [PDF] Deep and Surface Anaphora
    Summary of Arguments In this article we investigate this difference between syntactically and pragmatically controlled anaphora, and show that anaphoric ...
  19. [19]
    [PDF] Yet another look at deep and surface anaphora
    Merchant 2001; (41)-(42) represent P-stranding languages (as seen in the (b) controls), while (43)-(45) illustrate non-P-stranding languages.
  20. [20]
    [PDF] Sameness, Ellipsis and Anaphora - UC Berkeley Linguistics
    4 In this example VPE (VP ellipsis) is acceptable, in addition to the other VP anaphora forms. VPE is not acceptable for example (1), presumably because of ...<|control11|><|separator|>
  21. [21]
    [PDF] Deep and surface properties of the Dutch dat doen -anaphor - CRISSP
    Jan 23, 2023 · the DDA) has properties of both a deep and a surface anaphor, according to the distinction made by Hankamer and Sag (1976).
  22. [22]
    What is a Cataphora | Glossary of Linguistic Terms - SIL Global
    Definition: Cataphora is the coreference of one expression with another expression which follows it. The following expression provides the information ...
  23. [23]
    On the Motivations and Pragmatic Functions of Cataphora in Natural ...
    May 27, 2023 · 1. Introduction. Cataphora, also termed “backward anaphora”, is “the process or result of a linguistic unit referring forward to another unit” ...
  24. [24]
    (PDF) Processing Differences for Anaphoric and Cataphoric Pronouns
    Aug 6, 2025 · The results showed that anaphoric pronouns were resolved more rapidly than cataphoric pronouns when a co-referent interpretation was possible, ...
  25. [25]
    When is cataphoric reference recognised? - ScienceDirect.com
    When a pronoun appears in a preposed subordinate clause (as in, Before she began to sing, Susan stood up), incremental interpretation is suspended.
  26. [26]
    Structural constraints on cataphora - Language Log
    Jan 6, 2011 · If he refers to another, then (d), of course, is a fine sentence. Linguists can supply the requisite labels of cataphora, exophora, etc., at ...
  27. [27]
    Definition and Examples of Cataphora in English Grammar
    Jun 19, 2019 · In English grammar, cataphora is the use of a pronoun or other linguistic unit to refer ahead to another word in a sentence (i.e., the referent).
  28. [28]
    (PDF) The Influence of Morphological Information on Cataphoric ...
    Oct 9, 2025 · potential coreference relations between the cataphoric pronoun and a following noun phrase, such as NP2 assignment, are computed simultaneously ...
  29. [29]
    [PDF] On the Motivations and Pragmatic Functions of Cataphora in Natural ...
    May 27, 2023 · Abstract. This paper examines the motivations and pragmatic functions of cataphora in natural conversations. It is found ...
  30. [30]
    [PDF] Cataphora, backgrounding and accessibility in discourse
    Cataphora is when pronouns (he, she, it, they) precede their referents, and this paper examines discourse factors and accessibility related to it.
  31. [31]
    [PDF] The Value of an Annotated Corpus in The Investigation of Anaphoric ...
    This thesis investigates English personal pronoun reference in particular focusing on cataphora (backwards anaphora), using the Anaphoric Treebank (AT), which ...
  32. [32]
    [PDF] The Innateness of Binding and Coreference
    The intuition behind (20) is that if the structure could allow bound variable anaphora, coreference is preferred only if it is motivated; in other words, only if ...
  33. [33]
    Montague Semantics - Stanford Encyclopedia of Philosophy
    Nov 7, 2011 · Montague semantics is a theory of natural language semantics and of its relation with syntax. It was originally developed by the logician Richard Montague.
  34. [34]
    [PDF] The Proper Treatment of Quantification in Ordinary English
    Also, by S1 and S2, every man ∈ P_T; and hence, by S14, every man loves a woman such that she loves him ∈ P_t. ... those arguments are understood to be the ...
  35. [35]
    Discourse Representation Theory
    May 22, 2007 · 1. Introduction. This article concerns Discourse Representation Theory narrowly defined as work in the tradition descending from Kamp (1981a).
  36. [36]
  37. [37]
    On Coreference Resolution Performance Metrics - ACL Anthology
    Xiaoqiang Luo. 2005. On Coreference Resolution Performance Metrics. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods ...
  38. [38]
    OntoNotes: The 90% Solution - ACL Anthology
    Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel. 2006. OntoNotes: The 90% Solution. In Proceedings of the Human Language ...
  39. [39]
    [PDF] A Joint Framework for Coreference Resolution and Mention Head ...
    Abstract. In coreference resolution, a fair amount of research treats mention detection as a pre-processed step and focuses on developing ...
  40. [40]
    CoNLL-2011 Shared Task: Modeling Unrestricted Coreference in ...
    2011. CoNLL-2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes. In Proceedings of the Fifteenth Conference on Computational Natural Language ...
  41. [41]
    A Model-Theoretic Coreference Scoring Scheme - ACL Anthology
    Marc Vilain, John Burger, John Aberdeen, Dennis Connolly, and Lynette Hirschman. 1995. A Model-Theoretic Coreference Scoring Scheme. In Sixth Message ...
  42. [42]
    [PDF] Review of coreference resolution in English and Persian - arXiv
    Coreference resolution (CR) is a critical task in NLP, defined as "the problem of identifying all noun phrases or in-text mentions that refer to the same real- ...
  43. [43]
    A Machine Learning Approach to Coreference Resolution of Noun ...
    Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim. 2001. A Machine Learning Approach to Coreference Resolution of Noun Phrases. Computational Linguistics, ...
  44. [44]
    End-to-end Neural Coreference Resolution - ACL Anthology
    We introduce the first end-to-end coreference resolution model and show that it significantly outperforms all previous work without using a syntactic parser.
  45. [45]
    BERT for Coreference Resolution: Baselines and Analysis
    Abstract. We apply BERT to coreference resolution, achieving a new state of the art on the GAP (+11.5 F1) and OntoNotes (+3.9 F1) benchmarks.
  46. [46]
  47. [47]
    Coreference resolution - NLP-progress
    Coreference resolution is the task of clustering mentions in text that refer to the same underlying real world entities.
  48. [48]
    [PDF] State-of-the-art NLP Approaches to Coreference Resolution
    Aug 2, 2009 · Applications and future directions. We present an overview of NLP applications which have been shown to profit from coreference information, ...
  49. [49]
    Evaluating and Improving the Coreference Capabilities of Machine ...
    We develop an evaluation methodology that derives coreference clusters from MT output and evaluates them without requiring annotations in the target language.
  50. [50]
    CREAD: Combined Resolution of Ellipses and Anaphora in Dialogues
    In this work, we propose a novel joint learning framework of modeling coreference resolution and query rewriting for complex, multi-turn dialogue understanding.
  51. [51]
    Online Coreference Resolution for Dialogue Processing: Improving ...
    This paper suggests a direction of coreference resolution for online decoding on actively generated input such as dialogue.
  52. [52]
    [PDF] Using Coreference Chains for Text Summarization - ACL Anthology
    Coreference resolution is carried out by attempting to merge each newly added instance with instances already present in the discourse model. The basic ...
  53. [53]
    A Dataset for Event Coreference Resolution in Legal Documents
    In this paper, we present the first dataset for the legal domain, LegalCore, which has been annotated with comprehensive event and event coreference ...
  54. [54]
    Coreference Resolution for the Biomedical Domain: A Survey
    Abstract. Issues with coreference resolution are one of the most frequently mentioned challenges for information extraction from the biomedical literature.
  55. [55]
    [PDF] A brief survey on recent advances in coreference resolution
    The task of resolving repeated objects in natural languages is known as coreference resolution, and it is an important part of modern natural language ...
  56. [56]
    [PDF] Using an Implicit Method for Coreference Resolution and Ellipsis ...
    From a technical standpoint, it is difficult mainly because it requires natural language understanding, which is a challenging task because it requires world ...
  57. [57]
    Semi-supervised multimodal coreference resolution in image ...
    This poses significant challenges due to fine-grained image-text alignment, inherent ambiguity present in narrative language, and unavailability of large ...
  58. [58]
    A brief survey on recent advances in coreference resolution
    Predicting coreference connections and identifying mentions/triggers are the major challenges in coreference resolution, because these implicit relationships ...
  59. [59]
    Coreference Resolution Based on High-Dimensional Multi-Scale ...
    This paper first compares the impact of commonly used methods to improve the global information collection ability of the model on the BERT encoding ...
  60. [60]
    (PDF) Coreference Resolution: Toward End-to-End and Cross ...
    Jan 28, 2020 · ... challenges of less-resourced languages. Finally, we discussed the main challenges and open issues faced by coreference resolution systems.
  61. [61]
  62. [62]
    Toward Gender-Inclusive Coreference Resolution: An Analysis of ...
    Nov 3, 2021 · Gender bias in NLP has been considered more broadly than just in coreference resolution, including, for instance, natural language inference ...
  63. [63]
    Gender Bias in Coreference Resolution: Evaluation and Debiasing ...
    Apr 18, 2018 · We introduce a new benchmark, WinoBias, for coreference resolution focused on gender bias. Our corpus contains Winograd-schema style sentences.
  64. [64]
    [PDF] Cataphora detection and resolution: Advancements and Challenges ...
    Cataphora is when a pronoun or noun phrase points forward to a yet-to-be-mentioned entity, the reverse of anaphora.