Named-entity recognition
Named-entity recognition (NER), also referred to as entity identification or entity extraction, is a core subtask of information extraction in natural language processing (NLP) that identifies and classifies specific spans of text—known as named entities—into predefined categories such as persons, organizations, locations, times, dates, and monetary values.[1] The concept of named entities originated in the mid-1990s during the Message Understanding Conference (MUC) evaluations, where it was formalized as a task to detect and categorize rigid designators in text, marking the beginning of systematic research in this area.[1][2]

NER serves as a foundational step for numerous NLP applications, enabling the transformation of unstructured text into structured data that supports tasks such as question answering, information retrieval, relation extraction, coreference resolution, and topic modeling. In question answering systems, for instance, NER helps pinpoint entities relevant to user queries, while in information retrieval it improves search accuracy by indexing entity types. Beyond general domains, NER has domain-specific variants, such as biomedical NER for identifying genes, proteins, and diseases, or legal NER for extracting case names and statutes, highlighting its adaptability across fields like healthcare, finance, and journalism.[3][4]

Historically, early NER systems from the 1990s depended on rule-based methods using hand-crafted patterns and gazetteers, followed by statistical approaches such as hidden Markov models (HMMs) and maximum entropy models in the early 2000s.[2] The adoption of machine learning, particularly conditional random fields (CRFs), marked a significant advance around the CoNLL-2003 shared task, achieving higher accuracy through feature engineering.[2] In recent years, deep learning has transformed NER: recurrent neural networks (RNNs) such as long short-term memory (LSTM) units combined with CRFs outperformed prior methods, and transformer-based architectures such as BERT and its variants set new benchmarks by leveraging contextual embeddings. Large language models (LLMs) such as the GPT series have pushed boundaries further, enabling few-shot and zero-shot NER in low-resource scenarios, though challenges persist in handling nested entities, ambiguity, and multilingual texts. Ongoing research focuses on improving robustness across languages and domains, with hybrid models integrating graph neural networks and reinforcement learning to address these issues.[5]

Fundamentals
Definition and Scope
Named-entity recognition (NER), also known as named-entity identification, is a subtask of information extraction within natural language processing that aims to locate and classify named entities in unstructured text into predefined categories such as persons, organizations, locations, and temporal or numerical expressions.[6][7] The term was coined during the Sixth Message Understanding Conference (MUC-6) in 1995, where it was formalized as a core component for extracting structured information from free-form text such as news articles.[1] This process transforms raw textual data into a more analyzable form by tagging entities with their types, enabling further semantic understanding without requiring full sentence parsing.[7]

NER differs from related tasks such as part-of-speech tagging, which assigns broad grammatical categories (e.g., noun, verb) to individual words regardless of semantic content; NER instead targets semantically specific entities, which often span multiple words.[7] It is likewise distinct from coreference resolution, which links different mentions of the same entity across a text (e.g., connecting "the president" to a previously named person) rather than merely detecting and categorizing the entities themselves.[7] These distinctions highlight NER's emphasis on entity-level semantics over syntactic structure or discourse linkage.

The basic process of NER typically begins with tokenization, which segments the input text into words or subword units, followed by entity boundary detection to identify the start and end positions of potential entity spans, and concludes with classification to assign each detected entity to a predefined category.[7] This sequential approach ensures precise localization and typing, often leveraging contextual clues to disambiguate ambiguous cases.

The scope of NER is generally limited to predefined entity types, as established in early frameworks like MUC-6, in contrast with open-domain extraction methods that aim to identify entities and relations without fixed categories or schemas.[6][7][8] Reliance on such predefined sets facilitates consistent evaluation and integration into structured knowledge bases, but may overlook novel or domain-specific entities outside the schema.
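As a concrete illustration of this pipeline's output, the short Python sketch below groups BIO-tagged tokens (a tagging scheme discussed further under Evaluation Metrics) back into typed entity spans; the sentence, tags, and helper function are illustrative assumptions rather than the output of any particular NER system.

```python
# Minimal sketch of the end of the NER pipeline: tokens carry BIO tags
# (B- = entity begins, I- = entity continues, O = outside any entity),
# and contiguous tagged tokens are grouped into typed entity spans.
# The sentence and tags are hand-written for illustration only.

tokens = ["Barack", "Obama", "visited", "New", "York", "in", "2015", "."]
tags   = ["B-PER",  "I-PER", "O",       "B-LOC", "I-LOC", "O", "B-DATE", "O"]

def spans_from_bio(tokens, tags):
    """Group BIO-tagged tokens into (start, end, type, text) entity spans."""
    entities, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):      # sentinel "O" flushes the last entity
        if start is not None and (tag == "O" or tag.startswith("B-")):
            entities.append((start, i, etype, " ".join(tokens[start:i])))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities

print(spans_from_bio(tokens, tags))
# [(0, 2, 'PER', 'Barack Obama'), (3, 5, 'LOC', 'New York'), (6, 7, 'DATE', '2015')]
```

Real systems differ mainly in how the tags themselves are produced, whether by hand-crafted rules, statistical sequence models, or neural networks.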
Entity Types and Categories
Named entity recognition systems typically identify a core set of standard categories derived from early benchmarks such as the Seventh Message Understanding Conference (MUC-7), which grouped entities under ENAMEX for proper names, covering persons (PER) (e.g., "John Smith"), organizations (ORG) (e.g., "Microsoft Corporation"), and locations (LOC) (e.g., "New York"); NUMEX for numerical expressions such as money (e.g., "$100 million") and percentages (e.g., "25%"); and TIMEX for temporal expressions such as dates (e.g., "July 4, 1776") and times (e.g., "3:00 PM"). These categories emphasize referential and quantitative entities central to information extraction in general-domain text.[7]

Subsequent benchmarks introduced hierarchical schemes to capture nested structures, where entities can contain sub-entities of different types. In the Automatic Content Extraction (ACE) program, entities are organized into seven main types—person, organization, location, facility, weapon, vehicle, and geo-political entity (GPE)—with subtypes and nesting, such as a geo-political entity embedded in an organization mention (e.g., "Bank of England", where "England" is a GPE inside the ORG span). The OntoNotes 5.0 corpus, by contrast, uses a flat but much broader inventory of 18 entity types, including person, organization, GPE, location, facility, NORP (nationalities, religious, or political groups), event, work of art, law, language, date, time, money, percent, quantity, ordinal, cardinal, and product, extending coverage well beyond the classic proper-name categories. Nested schemes such as ACE enable recognition of complex, overlapping entities beyond flat structures, improving coverage for real-world texts.

Domain-specific NER adapts these categories to specialized vocabularies. In biomedical texts, common types include genes and proteins (e.g., "BRCA1"), diseases (e.g., "Alzheimer's disease"), chemicals and drugs (e.g., "aspirin"), cell types and lines (e.g., "HeLa cells"), and DNA/RNA sequences, as seen in datasets like JNLPBA and BC5CDR, which focus on molecular and clinical entities for tasks such as literature mining.[9] In legal documents, entity types extend to statutes (e.g., "Section 230 of the Communications Decency Act"), courts (e.g., "Supreme Court"), petitioners and respondents (e.g., party names in cases), provisions, precedents, judges, and witnesses, tailored to extracting structured information from judgments and contracts.[10]

Categorization in NER has thus evolved from the flat, non-overlapping spans of early systems like MUC to nested representations in ACE and richer type inventories in OntoNotes, accommodating real-world complexities such as embedded entities and multi-type overlaps.[11] This progression reflects a shift toward more expressive models capable of handling ambiguity and granularity, and it influences evaluation by requiring metrics that account for nesting depth and type hierarchies.[12]
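Because nested schemes such as ACE allow one entity to contain another, annotations are usually stored as offset spans rather than as flat per-token tags. The brief sketch below shows such a span-based representation; the example text, offsets, and labels are invented for illustration and do not come from a real corpus.

```python
# Hedged sketch: representing nested entity annotations as character-offset
# spans, in the spirit of ACE-style hierarchical schemes.

text = "The New York City Council approved the budget."

# Each annotation is (start_char, end_char, entity_type); spans may nest.
annotations = [
    (4, 25, "ORG"),   # "New York City Council"
    (4, 17, "GPE"),   # "New York City", nested inside the ORG span
]

def contains(outer, inner):
    """True if the inner span lies entirely within the outer span."""
    return outer[0] <= inner[0] and inner[1] <= outer[1] and outer != inner

for outer in annotations:
    for inner in annotations:
        if contains(outer, inner):
            print(f"{text[inner[0]:inner[1]]!r} ({inner[2]}) "
                  f"is nested within {text[outer[0]:outer[1]]!r} ({outer[2]})")
```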
Challenges
Inherent Difficulties
Named-entity recognition (NER) faces significant ambiguity in determining entity boundaries and types, as the same word or phrase can refer to different kinds of entities depending on context. For instance, the term "Washington" may denote a person (e.g., George Washington), a location (e.g., Washington state or D.C.), or an organization, requiring precise boundary detection and contextual typing to avoid misclassification. This ambiguity arises because natural language lacks explicit markers for entity spans, making it difficult for models to consistently identify the correct start and end positions without additional contextual cues.[13][14]

Contextual dependencies further complicate NER, as entity identification often relies on coreference resolution and disambiguation that demand extensive world knowledge. Coreference occurs when multiple mentions refer to the same entity (e.g., "the president" and "Biden" in a sentence), necessitating understanding of prior references to accurately tag subsequent spans. Disambiguation, meanwhile, involves resolving polysemous terms using external knowledge, such as distinguishing "Apple" the company from the fruit based on surrounding discourse or real-world associations. These processes highlight NER's dependence on broader linguistic and encyclopedic understanding, beyond mere pattern matching.[15][16]

Nested and overlapping entities pose another inherent challenge, where one entity is embedded within another, complicating span extraction. For example, in the phrase "New York City Council", the geo-political entity "New York City" is nested inside the larger span, which as a whole names an organization; traditional flat NER models struggle to capture such hierarchies without losing precision on either the inner or the outer boundary. Nesting occurs frequently in real-world texts, such as legal documents or news, where entities like persons (PER) embedded in organization (ORG) names overlap, demanding models capable of handling multi-level structures.[17][18]

Processing informal text exacerbates these issues, as abbreviations, typos, and code-switching introduce variability not present in standard corpora. Abbreviations like "Dr." for doctor or "NYC" for New York City require expansion or normalization to match entity patterns, while typos (e.g., "Washingtin" for Washington) can evade detection altogether. In multilingual contexts, code-switching—alternating between languages mid-sentence, common on social media—disrupts entity continuity, as seen in Hindi-English mixes where entity spans cross linguistic boundaries. These characteristics of user-generated content demand robust preprocessing and adaptability, underscoring NER's sensitivity to text quality.[19][20][21]

Evaluation Metrics
The performance of named entity recognition (NER) systems is primarily assessed using precision, recall, and the F1-score, which quantify the accuracy of entity detection and classification. These metrics are derived from counts of true positives (TP, correctly identified entities), false positives (FP, incorrectly identified entities), and false negatives (FN, missed entities). Precision measures the proportion of predicted entities that are correct:
P = \frac{TP}{TP + FP}
Recall measures the proportion of actual entities that are detected:
R = \frac{TP}{TP + FN}
The F1-score, as the harmonic mean of precision and recall, balances these measures and is the most commonly reported metric in NER evaluations:[22][23]
F1 = \frac{2PR}{P + R}

Evaluations can occur at the entity level or the token level, with entity-level scoring being standard for NER because it emphasizes complete entity identification rather than isolated word tags. In entity-level assessment, a prediction is correct only if its full span (boundaries) and type exactly match the gold annotation, often using the BIO tagging scheme—where "B" denotes the beginning of an entity, "I" the inside, and "O" outside any entity—to delineate boundaries precisely. Token-level evaluation, by contrast, scores each tag independently, which may inflate performance by rewarding partial boundary accuracy while failing to penalize incomplete entities. The CoNLL shared tasks, for instance, adopted entity-level F1 with exact matching to ensure robust boundary detection.[22][23]

Prominent benchmarks for NER include the CoNLL-2003 dataset, a foundational English resource drawn from Reuters news articles that annotates four entity types (person, location, organization, miscellaneous) across approximately 300,000 tokens (training, development, and test sets combined); it remains the de facto standard for flat, non-nested NER, with state-of-the-art systems reporting F1 scores around 90-93%. OntoNotes 5.0 extends this with a larger, multi-genre corpus (over 2 million words) annotated in English, Chinese, and Arabic across 18 entity types, enabling evaluation in domains such as broadcast news and web text. The WNUT series, particularly WNUT-17, targets emerging entities in noisy social media text (e.g., Twitter), with six entity types (person, location, corporation, product, creative work, group); reported F1 scores there are markedly lower than on newswire benchmarks because of the informal, rapidly changing language.[22][24][23]

For datasets with nested entities, such as those from the ACE program, metrics distinguish strict matching—requiring exact span and type agreement for credit—from partial matching, which awards partial credit for approximate boundaries or for detecting only the inner or outer entity, so as to better capture system capabilities in hierarchical scenarios. Strict matching aligns with flat benchmarks like CoNLL-2003, ensuring conservative scores, while partial variants (e.g., relaxed F1) are used in nested contexts to evaluate boundary tolerance without overpenalizing near-misses.[23][25]
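As a worked example of strict, entity-level scoring, the sketch below counts a prediction as a true positive only when both its span and type exactly match a gold annotation, then applies the formulas above; the gold and predicted spans are invented for illustration.

```python
# Strict entity-level evaluation: a predicted entity counts as a true
# positive only if its (start, end, type) triple exactly matches a gold
# annotation. Gold and predicted spans below are invented for illustration.

gold = {(0, 2, "PER"), (5, 7, "ORG"), (9, 10, "LOC")}
pred = {(0, 2, "PER"), (5, 6, "ORG"), (9, 10, "LOC"), (12, 13, "DATE")}

tp = len(gold & pred)          # exact span-and-type matches
fp = len(pred - gold)          # predicted entities with no exact gold match
fn = len(gold - pred)          # gold entities the system missed

precision = tp / (tp + fp) if tp + fp else 0.0
recall    = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
# TP=2, FP=2, FN=1  ->  P=0.50 R=0.67 F1=0.57
```

Note that the ORG prediction with a shortened span earns no credit under strict matching; a relaxed, partial-match variant would score it differently.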
Methodologies
Classical Approaches
Classical approaches to named entity recognition (NER) relied primarily on rule-based systems, which employed hand-crafted patterns and linguistic rules to identify and classify entities in text. These systems operated deterministically, matching predefined templates against input text to detect entity boundaries and types, such as person names or locations, without requiring training data. For instance, patterns could specify syntactic structures like capitalized words following verbs of attribution to flag potential person names.[26]

A key component of these systems was the use of gazetteers, curated lists of known entities such as city names or organization titles, used for exact or fuzzy matching against text spans. Gazetteers enhanced precision by providing lexical resources for entity lookup, often integrated with part-of-speech tagging to filter candidates. In specialized domains like biomedicine, gazetteers drawn from synonym dictionaries helped recognize protein or gene names by associating text mentions with database entries.[27][28]

Boundary detection in rule-based NER frequently used regular expressions to capture patterns indicative of entities, such as sequences of capitalized words or specific punctuation, and finite-state transducers to model sequential dependencies in entity spans. Regular expressions, for example, could define patterns like [A-Z][a-z]+ for proper nouns, while finite-state transducers processed text as automata to recognize multi-word entities like "New York City" as a single location. These tools allowed efficient scanning of text for potential entity starts and ends, as illustrated in the sketch below.[26][28]
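A minimal sketch of such regex-based boundary detection follows; the pattern and test sentence are illustrative assumptions rather than a reconstruction of any particular MUC-era system.

```python
import re

# Hedged sketch of rule-based boundary detection: a regular expression
# flags runs of capitalized words as candidate named-entity spans.
# The pattern and sentence are illustrative, not from a specific system.

CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

sentence = "The president flew from New York City to San Francisco with John Smith."

for match in CANDIDATE.finditer(sentence):
    print(match.group(), match.span())
# Matches: "The", "New York City", "San Francisco", "John Smith".
# The sentence-initial "The" is a false positive that later rules or a
# gazetteer lookup would need to filter out, illustrating why such
# patterns over-generate candidate spans.
```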
Classification often involved integrating dictionaries—structured collections of entity terms—with heuristics, such as contextual clues like preceding prepositions or domain-specific triggers, to assign entity types. Dictionaries supplemented gazetteers by providing broader lexical coverage, and heuristics resolved ambiguities by prioritizing rules based on confidence scores derived from pattern specificity. This combination enabled systems to handle basic entity categorization in controlled environments, as formalized in early evaluations like those from the Message Understanding Conference.[29][26]
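A hedged sketch of this dictionary-plus-heuristics style of classification follows: gazetteer lookup is tried first, then contextual trigger words serve as fallback rules. The gazetteer entries, trigger lists, and examples are all invented for illustration.

```python
# Hedged sketch of rule-based entity classification: candidates are typed
# by gazetteer lookup first, then by simple contextual heuristics.
# All gazetteers, trigger words, and examples are invented for illustration.

GAZETTEER = {
    "New York City": "LOC",
    "San Francisco": "LOC",
    "Microsoft Corporation": "ORG",
}
PERSON_TITLES = {"Mr.", "Mrs.", "Dr.", "President"}
ORG_SUFFIXES = {"Inc.", "Corp.", "Ltd.", "Corporation"}

def classify(candidate, preceding_word=""):
    """Assign an entity type to a candidate span using lookup and heuristics."""
    if candidate in GAZETTEER:                       # 1. exact dictionary match
        return GAZETTEER[candidate]
    if preceding_word in PERSON_TITLES:              # 2. title word before the span
        return "PER"
    if candidate.split()[-1] in ORG_SUFFIXES:        # 3. corporate suffix inside the span
        return "ORG"
    return "UNKNOWN"                                 # no rule fired

print(classify("New York City"))                     # LOC (gazetteer)
print(classify("Smith", preceding_word="Dr."))       # PER (title heuristic)
print(classify("Acme Corp."))                        # ORG (suffix heuristic)
```

Ordering the rules from most to least specific is itself a heuristic for resolving conflicts, echoing the confidence-based prioritization described above.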
Despite their interpretability and high precision on well-defined patterns, rule-based systems suffered from significant limitations, including poor scalability to new domains due to the need for extensive manual rule engineering and an inability to generalize beyond explicit patterns. Creating and maintaining rules was labor-intensive, limiting their applicability to diverse or evolving text corpora. The best early systems achieved F1-scores around 90-93% on MUC-era benchmark tasks but struggled with recall on unseen variations.[28][29]