Part-of-speech tagging
Part-of-speech tagging (POS tagging) is the process of assigning a syntactic category, such as noun, verb, adjective, adverb, or preposition, to each word in a given text, drawing on both the word's inherent definition and its contextual usage to resolve ambiguities like the dual role of "book" as a noun or verb.[1] This task forms a foundational step in natural language processing (NLP), enabling the disambiguation of word senses and the identification of grammatical structures within sentences.[1][2]

The concept of POS tagging traces its origins to ancient linguistics, with Dionysius Thrax around 100 B.C. outlining eight parts of speech for Greek that profoundly influenced categorization in European languages for over two millennia.[1] Early computational efforts in the mid-20th century relied on manual rule-based systems, such as the 1950s TDAP tagger and the early-1970s TAGGIT, but these were labor-intensive and limited in scalability.[1] The field advanced significantly in the 1980s and 1990s with probabilistic models such as hidden Markov models (HMMs), applied to large-scale tagging by researchers including Kenneth Church in the late 1980s, which leveraged statistical probabilities to automate tagging on large corpora.[1] By the 2000s, discriminative approaches like Maximum Entropy Markov Models (MEMMs) and Conditional Random Fields (CRFs) emerged, further improving accuracy, while the 2010s saw the integration of deep learning techniques such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks to capture long-range dependencies. In the 2020s, transformer-based models like BERT have pushed accuracies beyond 98% on benchmarks and advanced multilingual and low-resource tagging.[1][2][3]

POS tagging plays a crucial role in numerous NLP applications, serving as a prerequisite for higher-level tasks including syntactic parsing, named entity recognition, machine translation, information extraction, and speech synthesis.[1][4] It reveals syntactic relationships between words, facilitating word sense disambiguation and enhancing the performance of downstream systems like question answering and sentiment analysis.[5] Standard tagsets, such as the Penn Treebank's 45-tag system used in corpora like the Wall Street Journal and Brown Corpus, provide consistent frameworks for annotation and evaluation.[1]

Early rule-based methods achieved modest accuracies but were overtaken by stochastic approaches; for instance, HMM-based taggers reached 96.7% accuracy on the Penn Treebank, while support vector machines (SVMs) hit 97.16% on English text.[1][4] Transformation-based learning, as in Eric Brill's 1992 tagger, iteratively refines rules from annotated data to boost performance.[5] Modern neural methods, including bidirectional LSTMs and CRFs, often exceed 97% accuracy on benchmark datasets, though challenges persist for low-resource and morphologically rich languages.[1][2]
Fundamentals
Definition and Purpose
Part-of-speech (POS) tagging is the process of assigning a grammatical category, such as noun, verb, adjective, or determiner, to each word in a text corpus based on both its lexical definition and its contextual usage within the sentence.[1] This task resolves ambiguities inherent in words that can belong to multiple categories, such as "book," which functions as a noun (e.g., "a book") or verb (e.g., "to book a flight").[1] POS tagging relies on predefined tag sets that standardize these categories across languages and applications.[6]

The primary purpose of POS tagging is to facilitate syntactic analysis by revealing the structural roles of words in a sentence, which aids in understanding grammatical relationships and sentence meaning.[1] It also disambiguates word senses by clarifying usage in context, for instance, distinguishing the pronunciation of "content" as a noun (CONtent) versus an adjective (conTENT) in speech synthesis systems.[1] As a foundational preprocessing step in natural language processing (NLP), POS tagging supports higher-level tasks such as dependency parsing, sentiment analysis, and information extraction by providing tagged sequences that inform subsequent algorithms.[6]

In a typical workflow, POS tagging begins with tokenization of the input text into individual words, followed by the assignment of POS labels to each token, yielding a sequence of word-tag pairs as output.[6] For example, the sentence "The cat sleeps" is tokenized into ["The", "cat", "sleeps"] and tagged using the Penn Treebank tag set as The/DT cat/NN sleeps/VBZ, where DT denotes determiner, NN noun, and VBZ verb in third-person singular present.[7][6]

POS tagging is distinct from related NLP tasks like lemmatization, which normalizes words to their base or dictionary form (e.g., "sleeps" to "sleep") without assigning grammatical categories, and named entity recognition (NER), which specifically identifies and classifies entities such as persons, organizations, or locations rather than broad syntactic roles.[1][6]
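The tokenize-then-tag workflow illustrated above can be reproduced with an off-the-shelf tagger. The following sketch uses NLTK's default English tagger, which outputs Penn Treebank tags; it assumes the nltk package is installed and that the tokenizer and tagger models have been downloaded (resource names vary slightly across NLTK versions).

```python
import nltk

# One-time model downloads; newer NLTK releases may use the names
# "punkt_tab" and "averaged_perceptron_tagger_eng" instead.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "The cat sleeps"
tokens = nltk.word_tokenize(sentence)   # ['The', 'cat', 'sleeps']
tagged = nltk.pos_tag(tokens)           # Penn Treebank tags
print(tagged)                           # [('The', 'DT'), ('cat', 'NN'), ('sleeps', 'VBZ')]
```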
Importance in Natural Language Processing
Part-of-speech (POS) tagging serves as a foundational preprocessing step in natural language processing (NLP) pipelines, enabling the extraction of syntactic features that enhance the performance of higher-level tasks such as machine translation, information extraction, and speech recognition.[3] By assigning grammatical categories to words, POS tagging provides essential structural information that informs subsequent analyses, facilitating more accurate parsing and semantic interpretation across diverse applications.[1]

One key benefit of POS tagging lies in its ability to resolve lexical ambiguities inherent in natural language, where a single word form can function in multiple grammatical roles depending on context—for instance, distinguishing "run" as a noun (e.g., a short excursion) versus a verb (e.g., to sprint).[8] This syntactic disambiguation improves the precision of downstream NLP systems by supplying contextual cues that guide word sense disambiguation and dependency parsing, ultimately boosting overall task accuracies in areas like question answering and sentiment analysis.[3]

Historically, POS tagging emerged as a benchmark task in NLP, with early systems achieving high accuracies—such as 97% or more on English corpora like the Penn Treebank—demonstrating the feasibility of automated grammatical analysis and inspiring advancements in statistical and machine learning approaches to language processing.[9] Seminal work, including Brill's rule-based tagger, highlighted the potential for efficient, high-performance tagging without exhaustive rule sets, paving the way for broader adoption in computational linguistics.[10]

POS tagging also bridges interdisciplinary domains, integrating traditional linguistic principles of grammar and morphology with computational modeling to support AI-driven systems that mimic human language understanding.[8] This fusion has enabled applications in corpus annotation efforts, such as the Penn Treebank, which standardized tag sets for consistent cross-linguistic and cross-domain analysis.[11]
Tag Sets
Common Tag Sets and Standards
One of the earliest influential tag sets for English part-of-speech (POS) tagging was developed for the Brown Corpus, a million-word collection of American English texts compiled in the 1960s. This tag set consisted of 87 simple tags, allowing for the formation of compound tags to capture detailed morphological and syntactic distinctions, such as verb forms (e.g., VB for base, VBD for past tense).[12] The Brown tag set laid foundational groundwork for subsequent standards by emphasizing systematic annotation of diverse text genres.[12]

The Penn Treebank tag set, widely adopted for English POS tagging since the 1990s, comprises 36 primary tags that form a hierarchical structure distinguishing major syntactic categories from minor subcategories. Major categories include nouns (N), verbs (V), adjectives (J), and adverbs (R), while minor distinctions specify attributes like number or tense; for example, NN denotes a singular noun, NNS a plural noun, VB a base-form verb, and VBD a past-tense verb.[13] This design balances syntactic detail with annotator efficiency, enabling consistent labeling across large corpora.[13] Derived partly from the Brown tag set, the Penn system simplified certain lexical redundancies to focus on contextually relevant syntactic roles.[12]

For cross-linguistic applications, the Universal Dependencies (UD) framework introduces a standardized set of 17 coarse-grained POS tags to promote consistency across languages. These tags cover core categories such as NOUN (common nouns), VERB (verbs), ADJ (adjectives), ADV (adverbs), and others like PRON (pronouns), DET (determiners), and PUNCT (punctuation), with additional features for finer morphological properties.[14] The UD tag set prioritizes universality by mapping language-specific tags to these shared labels, facilitating multilingual model training and comparison.[14]

Standards for POS tag sets have been shaped by organizations like the Linguistic Data Consortium (LDC), which provides detailed annotation guidelines to ensure reproducibility and interoperability. The LDC's guidelines for the Penn Treebank, for instance, specify rules for handling context-dependent words such as "one", which is tagged CD (cardinal number) in numeric uses but NN (noun) in pronominal uses.[13] These standards influence corpus development by promoting uniform practices that support downstream NLP tasks.[13]

Tag set granularity involves trade-offs between detail and performance: fine-grained sets like the Penn Treebank's offer nuanced distinctions that aid syntactic analysis but increase data sparsity, often reducing tagger accuracy due to fewer training examples per tag.[15] In contrast, coarse-grained sets like UD's 17 tags achieve higher tagging accuracy by grouping similar categories, though they sacrifice specificity for broader applicability and easier cross-language transfer.[15] Empirical studies show that introducing finer distinctions can yield marginal gains in targeted scenarios but generally complicates generalization without proportional benefits.[15]
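The fine-versus-coarse trade-off can be managed by collapsing fine-grained tags onto coarse universal categories. The following sketch shows an illustrative, partial mapping from Penn Treebank tags to UD universal POS tags; the official conversion tables cover the full tag set, so the dictionary contents here are assumptions for demonstration rather than the normative mapping.

```python
# Partial, illustrative mapping from Penn Treebank tags to UD universal POS tags.
PTB_TO_UPOS = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "PROPN", "NNPS": "PROPN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "DT": "DET", "PRP": "PRON", "IN": "ADP", "CD": "NUM", ".": "PUNCT",
}

def coarsen(tagged_tokens):
    """Collapse fine-grained Penn tags to coarse UD-style tags ('X' if unmapped)."""
    return [(word, PTB_TO_UPOS.get(tag, "X")) for word, tag in tagged_tokens]

print(coarsen([("The", "DT"), ("cats", "NNS"), ("slept", "VBD")]))
# [('The', 'DET'), ('cats', 'NOUN'), ('slept', 'VERB')]
```

Coarsening in this way trades syntactic detail for denser statistics per tag, mirroring the granularity trade-off described above.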
Multilingual and Domain-Specific Variations
Part-of-speech (POS) tagging tag sets must be adapted for languages with complex morphological structures, such as morphologically rich languages that feature extensive inflectional paradigms. For instance, Finnish, which has 15 grammatical cases for nouns, requires tag sets that incorporate detailed morphological features like case markers to accurately disambiguate word forms during tagging.[16] Similarly, agglutinative languages like Turkish demand subword-level tagging or morphological analysis integrated into POS schemes, as suffixes can alter word categories and meanings in ways that standard word-based tagging cannot capture without prior segmentation.[17]

Domain-specific variations in POS tagging often involve extending or customizing tag sets to handle specialized terminology and syntactic patterns not prevalent in general corpora. In the biomedical domain, taggers are adapted to recognize domain-unique terms, such as gene names or medical abbreviations, which may require additional tags like "BIOMEDNOUN" for biological entities to improve accuracy over general-purpose tags.[18] Legal texts, for example, benefit from custom labels for jargon such as contract clauses or statutory terms, enabling taggers to differentiate between homonyms that carry distinct legal implications in context.[19] Technical domains, like automotive documentation, similarly employ tailored tags for precise annotation of components and procedures, enhancing downstream tasks such as error detection in manuals.[20]

Cross-lingual standards aim to harmonize POS tagging across diverse languages despite variations in grammar and word order. The Universal Dependencies (UD) framework plays a central role by providing a consistent set of 17 universal POS tags and morphological features for over 180 languages (as of November 2025).[21][22] However, challenges persist with syntactic differences, such as subject-object-verb (SOV) order in Japanese, which necessitates adjustments in dependency relations linked to POS tags to maintain parsing consistency across languages.[14]

Practical examples illustrate these adaptations in real-world applications. The CLAWS tagger, designed for British English, uses a fine-grained C7 tag set to capture regional variants and idiomatic expressions, achieving high accuracy on corpora like the British National Corpus.[23] In multilingual settings involving code-switching, such as English-Spanish texts, hybrid taggers combine resources from both languages to assign POS labels, addressing ambiguities where words from one language embed in another's sentence structure.[24]
Tagging Methods
Rule-Based Tagging
Rule-based part-of-speech tagging employs hand-crafted linguistic rules to assign tags to words in a sentence, drawing on morphological patterns, contextual cues, and lexical resources such as dictionaries. These systems typically begin by analyzing each word's form—such as suffixes or prefixes—to generate candidate tags from a lexicon, then apply a series of deterministic rules to resolve ambiguities based on surrounding words or syntactic structures. For instance, a rule might specify that if a word ends in "-ed" and is preceded by an auxiliary verb, it should be tagged as a past tense verb (VBD) rather than an adjective. This approach ensures unambiguous cases are handled precisely without relying on probabilistic inference.[10][25]

A prominent example is the Brill tagger, which uses transformation-based error-driven learning to generate and apply contextual rules. It starts by assigning each word its most frequent tag from a lexicon, then iteratively applies ordered transformation rules—such as changing a noun tag to a verb when the preceding word is the infinitive marker "to"—to correct errors based on local context. These rules are typically of the form "change tag A to tag B in environment C," where environment C might involve adjacent words or their tags. Another key system is ENGTWOL, which integrates finite-state transducers for morphological analysis to produce multiple possible tags per word, followed by constraint grammar rules that eliminate incompatible tags through syntactic and morphological restrictions, such as prohibiting certain adjective-noun sequences.[10][25]

Rule-based taggers offer high precision for straightforward, rule-covered cases, as the explicit linguistic knowledge allows for targeted disambiguation without computational overhead from training. Their interpretability is a significant strength, enabling linguists to trace tagging decisions directly to specific rules, which facilitates debugging and customization. Additionally, they require no annotated training corpus, making them suitable for resource-scarce languages where data is unavailable. However, developing and maintaining these systems is labor-intensive, as crafting comprehensive rules demands deep linguistic expertise and can involve thousands of manual entries for lexicons and constraints. They are often brittle, performing poorly on exceptions, idioms, or domain-specific vocabulary not anticipated by the rules, leading to cascading errors in complex sentences. Scalability to new languages or dialects is limited, as rule sets must be largely rewritten, hindering adaptation without substantial reinvestment.[25]

In contrast to probabilistic methods that incorporate statistical probabilities to manage uncertainty, rule-based tagging depends entirely on predefined deterministic rules for all decisions.
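A minimal flavor of such hand-written rules can be reproduced with NLTK's RegexpTagger, which assigns tags from ordered regular-expression patterns over word forms. This sketch covers only suffix and closed-class heuristics (a drastic simplification of full systems such as ENGTWOL or the Brill tagger); the pattern list is an illustrative assumption, not a complete rule set.

```python
from nltk.tag import RegexpTagger

# Ordered, hand-written rules; the first matching pattern wins, the last is a default.
patterns = [
    (r"^(the|a|an)$", "DT"),   # determiners (closed-class lexicon rule)
    (r".*ing$", "VBG"),        # gerunds / present participles
    (r".*ed$", "VBD"),         # simple past verbs
    (r".*ly$", "RB"),          # adverbs
    (r".*s$", "NNS"),          # plural nouns
    (r".*", "NN"),             # default: singular noun
]

rule_tagger = RegexpTagger(patterns)
print(rule_tagger.tag("the dogs barked loudly".split()))
# [('the', 'DT'), ('dogs', 'NNS'), ('barked', 'VBD'), ('loudly', 'RB')]
```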
Probabilistic and Statistical Tagging
Probabilistic and statistical tagging methods represent a shift from hand-crafted rules to data-driven approaches, where part-of-speech tags are assigned based on probability distributions derived from annotated corpora. These techniques model the likelihood of a tag sequence for a given word sequence by factoring in contextual dependencies among tags and the compatibility between words and their potential tags. Early implementations, such as those using stochastic bigram models, achieved accuracies around 95% on unrestricted English text by selecting the most probable tag sequence via dynamic programming.[26]

At the core of these methods are foundational probabilistic concepts, including emission probabilities and tag transition probabilities. The emission probability, P(w_i \mid t_i), quantifies the likelihood of observing word w_i under tag t_i, estimated from the relative frequency of word-tag pairs in training data. Transition probabilities capture contextual dependencies, such as bigrams P(t_i \mid t_{i-1}) or trigrams P(t_i \mid t_{i-1}, t_{i-2}), which model how likely a tag is given one or two preceding tags, respectively; trigram models, in particular, improve accuracy by accounting for longer-range syntactic patterns, reaching up to 96.5% on standard benchmarks like the Penn Treebank. These probabilities are jointly used to compute the overall sequence probability P(\mathbf{t} \mid \mathbf{w}) \propto \prod_i P(w_i \mid t_i) \cdot P(t_i \mid t_{i-1}, t_{i-2}), enabling disambiguation of ambiguous words like "can" (verb or noun) based on surrounding context.[26]

Statistical parameters in these models are typically estimated via maximum likelihood estimation (MLE) from large annotated corpora, such as the Brown Corpus or Penn Treebank, where P(t_i \mid t_{i-1}, t_{i-2}) = \frac{\#(t_{i-2}, t_{i-1}, t_i)}{\#(t_{i-2}, t_{i-1})} reflects empirical frequencies. However, sparse data from unseen tag or word-tag combinations leads to zero probabilities, which smoothing techniques address; deleted interpolation, a method that interpolates higher-order n-gram estimates with lower-order ones using weights optimized on held-out data, effectively handles such cases by reserving portions of the corpus for weight estimation, improving robustness without overfitting. For instance, in trigram taggers, this smoothing can boost performance by 1-2% on out-of-vocabulary words.[26]

n-gram models form the backbone of many statistical taggers, with unigram taggers simply assigning the most frequent tag per word (yielding about 80-90% accuracy), bigram taggers incorporating one prior tag for contextual refinement (around 95%), and trigram taggers using two priors for finer disambiguation (up to 96-97%). These models treat the tag sequence as a Markov chain of order n-1, prioritizing empirical patterns from corpora over linguistic rules. Building on rule-based precursors that relied on fixed dictionaries and heuristics, probabilistic n-gram approaches marked the empirical revolution by leveraging statistical evidence for scalable tagging.[26]

Hybrid statistical-rule systems enhance pure probabilistic methods by combining lexicon-based initial assignments (e.g., dictionary lookups for unambiguous words) with statistical disambiguation for ambiguities, often achieving accuracies exceeding 97% on domain-specific texts.
In such setups, rules provide deterministic tags for high-confidence cases, while probabilities resolve the rest via n-gram scoring; for example, a dictionary might list "running" as either a verb form or a gerund, with bigram context probabilistically selecting the appropriate tag. This integration mitigates data sparsity in low-resource scenarios and has been pivotal in early hybrid taggers for languages like English and German.[27]
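The count-based MLE estimates described above are straightforward to compute from a tagged corpus. The sketch below estimates bigram transition and emission probabilities from a two-sentence toy corpus (bigrams rather than trigrams for brevity, and without smoothing); the corpus and the sentence-start symbol "<s>" are assumptions for illustration.

```python
from collections import Counter

# Toy tagged corpus; real estimates would come from a large annotated treebank.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

trans = Counter()       # (prev_tag, tag) counts
context = Counter()     # prev_tag counts (conditioning context for transitions)
emit = Counter()        # (tag, word) counts
tag_totals = Counter()  # tag counts (denominator for emissions)

for sent in corpus:
    prev = "<s>"                      # sentence-start pseudo-tag
    for word, tag in sent:
        trans[(prev, tag)] += 1
        context[prev] += 1
        emit[(tag, word)] += 1
        tag_totals[tag] += 1
        prev = tag

def p_trans(tag, prev):               # MLE: count(prev, tag) / count(prev)
    return trans[(prev, tag)] / context[prev] if context[prev] else 0.0

def p_emit(word, tag):                # MLE: count(tag, word) / count(tag)
    return emit[(tag, word)] / tag_totals[tag] if tag_totals[tag] else 0.0

print(p_trans("NN", "DT"))   # 1.0 -- every DT is followed by NN in the toy data
print(p_emit("dog", "NN"))   # 0.5 -- NN emits "dog" in half of its occurrences
# Unseen events receive probability 0 under pure MLE; smoothing such as
# deleted interpolation redistributes probability mass to avoid these zeros.
```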
Machine Learning and Neural Tagging
Machine learning approaches to part-of-speech (POS) tagging typically rely on supervised learning, where models are trained on annotated corpora to predict tags for input sequences. These methods extract hand-crafted features such as word shapes (e.g., capitalization patterns), prefixes, suffixes, and surrounding context to represent tokens, which are then fed into classifiers like support vector machines (SVMs) or decision trees. For instance, SVM-based taggers use lexicalized features including word length and n-gram patterns to achieve robust performance on diverse datasets. Decision trees, as explored in early machine learning applications, build hierarchical rules from features to disambiguate tags, offering interpretability alongside competitive accuracy.[28][29]

A key advancement in supervised sequence labeling is the use of conditional random fields (CRFs), which model the joint probability of an entire tag sequence given the input, capturing dependencies between adjacent tags more effectively than independent classifiers. CRFs treat POS tagging as a structured prediction task, incorporating features like those mentioned above into a graphical model that optimizes global consistency, often outperforming earlier probabilistic methods on benchmark corpora.

Neural approaches build on this by leveraging recurrent architectures, particularly bidirectional long short-term memory (BiLSTM) networks, which process sequences in both forward and backward directions to incorporate full contextual information for each token. Seminal work demonstrated that BiLSTM models, often combined with a CRF layer, significantly improve tag prediction by learning distributed representations without relying heavily on manual features. Transformer-based models, such as BERT, have further elevated neural POS tagging through fine-tuning on labeled data, where the pre-trained encoder's attention mechanisms capture long-range dependencies. Fine-tuned BERT variants achieve over 98% accuracy on Universal Dependencies (UD) datasets for high-resource languages, surpassing traditional neural models by adapting contextual embeddings to the tagging task. Recent advances extend this to large language models (LLMs) like GPT variants, enabling zero-shot POS tagging via prompting without task-specific training; for example, GPT-4 demonstrates accuracies around 80-90% in low-resource or cross-lingual settings through natural language instructions. Multilingual models like mBERT facilitate POS tagging in low-resource languages by transferring knowledge from high-resource ones, improving performance by 5-10% on UD subsets for understudied languages through cross-lingual embeddings.[30][31][32][33]

Training paradigms for these neural taggers emphasize sequence labeling objectives, where models predict tag distributions per token using softmax activation and optimize via cross-entropy loss to minimize prediction errors across the sequence. Transfer learning plays a central role, initializing models with pre-trained embeddings (e.g., from Word2Vec or contextual ones like those in BERT) before fine-tuning on POS data, which reduces the need for large annotated corpora and boosts generalization, especially in low-resource scenarios. This approach has become standard, enabling efficient adaptation of general-purpose representations to the structured nature of tagging.[34][35]
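The per-token softmax and cross-entropy objective described above can be made concrete with a minimal BiLSTM tagger in PyTorch. Everything here (vocabulary size, a tag set size of 17 to match UD's coarse tags, random toy batches) is an assumption for illustration; a practical tagger would add padding masks, pre-trained embeddings, and often a CRF layer on top.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Minimal sequence tagger: embeddings -> BiLSTM -> per-token tag logits."""
    def __init__(self, vocab_size, tagset_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, tagset_size)

    def forward(self, token_ids):                 # (batch, seq_len)
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden)                   # (batch, seq_len, tagset_size) logits

model = BiLSTMTagger(vocab_size=100, tagset_size=17)
loss_fn = nn.CrossEntropyLoss()                   # softmax + negative log-likelihood per token
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

tokens = torch.randint(0, 100, (2, 6))            # toy batch: 2 sentences of 6 token ids
gold_tags = torch.randint(0, 17, (2, 6))          # toy gold tag ids

logits = model(tokens)
loss = loss_fn(logits.reshape(-1, 17), gold_tags.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()                                  # one supervised training step
```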
Key Algorithms and Techniques
Hidden Markov Models
Hidden Markov Models (HMMs) serve as a core probabilistic framework for part-of-speech (POS) tagging, modeling the underlying sequence of tags as hidden states that generate observed words while capturing dependencies between consecutive tags.[36] This approach addresses the ambiguity in word-tag assignments by leveraging statistical patterns derived from training data, enabling robust tagging even for words with multiple possible POS categories.[36]

In the HMM formulation for POS tagging, the states represent POS tags (e.g., noun, verb), and the observations are the input words. The model is parameterized by the initial state distribution \pi, where \pi_i = P(\text{tag}_1 = i); the transition probability matrix A, where a_{ij} = P(\text{tag}_t = j \mid \text{tag}_{t-1} = i); and the emission probability matrix B, where b_j(w) = P(\text{word}_t = w \mid \text{tag}_t = j).[37] These components allow the model to represent how tags follow one another in natural language sequences and how likely each word is to appear under a given tag.[1]

For supervised training on a tagged corpus, parameters are typically estimated via maximum likelihood using frequency counts: transitions from co-occurring tag pairs and emissions from word-tag pairs.[1] An alternative supervised approach employs Viterbi training, which approximates parameter estimation by assigning tags via the most likely paths and updating counts accordingly.[38] In unsupervised scenarios with untagged text, the Baum-Welch algorithm—an expectation-maximization procedure—iteratively estimates parameters by computing expected state occupancies and transitions.[39]

The probability of an observation sequence O = o_1, o_2, \dots, o_T given the model \lambda = (A, B, \pi) is:

P(O \mid \lambda) = \sum_Q \pi_{q_1} b_{q_1}(o_1) \prod_{t=2}^T a_{q_{t-1} q_t} b_{q_t}(o_t)

where the summation is over all possible state sequences Q = q_1 q_2 \dots q_T.[37] This formulation enables POS tagging by evaluating the joint likelihood of words and their latent tag sequences, with the most probable tagging obtained via efficient inference.[36] The Viterbi algorithm, a dynamic programming method, finds this optimal sequence (detailed under Dynamic Programming Approaches below).[37]
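The likelihood formula above can be evaluated literally by summing over every possible tag sequence, which is only feasible for toy-sized models but makes the definition concrete. The parameters below (two tags, a handful of words, hand-picked probabilities) are assumptions for illustration; real parameters would be estimated from a corpus as described above.

```python
from itertools import product

tags = ["DT", "NN"]
pi = {"DT": 0.7, "NN": 0.3}                           # initial tag distribution
A = {"DT": {"DT": 0.1, "NN": 0.9},                    # transition probabilities a_ij
     "NN": {"DT": 0.4, "NN": 0.6}}
B = {"DT": {"the": 0.8, "dog": 0.0, "barks": 0.2},    # emission probabilities b_j(w)
     "NN": {"the": 0.1, "dog": 0.5, "barks": 0.4}}

obs = ["the", "dog"]

# P(O | lambda): sum over all tag sequences Q of pi * emissions * transitions.
total = 0.0
for q in product(tags, repeat=len(obs)):
    p = pi[q[0]] * B[q[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]
    total += p

print(total)   # 0.261 for this toy model
```

Because the number of sequences grows exponentially with sentence length, practical taggers replace this enumeration with the dynamic programming algorithms described in the next subsection.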
Dynamic Programming Approaches
Dynamic programming techniques play a crucial role in part-of-speech (POS) tagging by enabling efficient inference over probabilistic models that assign tags to sequences of words, optimizing global sequence probabilities rather than local decisions. These approaches, rooted in the principles of dynamic programming, avoid the exponential cost of evaluating all possible tag combinations by building solutions incrementally through recursion and memoization. In POS tagging, they are particularly vital for models where tag assignments depend on contextual probabilities, such as transitions between tags and emissions of words given tags.[40]

The Viterbi algorithm exemplifies dynamic programming for POS tagging by identifying the most likely tag sequence that maximizes the joint probability of the tags and observations in a Hidden Markov Model (HMM). Originally developed for decoding convolutional codes, it was applied to statistical POS disambiguation in the late 1980s. The recursion defines the probability of the best path ending in tag k at position t as V_t(k) = \max_j \left[ V_{t-1}(j) \cdot a_{jk} \right] \cdot b_k(o_t), where a_{jk} represents the transition probability from tag j to k, and b_k(o_t) is the probability of observing word o_t given tag k. Pointers track the maximizing predecessor j for each V_t(k), allowing backtracking to reconstruct the optimal path after computing the final values. This ensures exact global optimization for first-order models.[41][42][40]

Complementing Viterbi, the forward-backward algorithm computes marginal (posterior) probabilities for each tag at each position, facilitating applications like error analysis, confidence scoring, or parameter smoothing in HMM-based taggers. It proceeds in two passes: the forward pass calculates the probability of reaching each state from the sequence start, while the backward pass computes the probability of completing the sequence from each state to the end. These are combined to yield posteriors as \gamma_t(k) = \alpha_t(k) \cdot \beta_t(k) / P(O), where \alpha_t and \beta_t are the forward and backward values, and P(O) is the total observation probability. This method supports probabilistic insights without path reconstruction.[40]

Standard implementations of Viterbi and forward-backward for first-order models exhibit O(T N^2) time complexity, with T as the sentence length and N as the tag set size, arising from maximizing or summing over prior states at each of T steps. For computationally intensive scenarios, such as higher-order HMMs or large N (e.g., fine-grained tag sets with hundreds of tags), beam search approximates these by retaining only the top-B partial paths at each step, reducing effective complexity to O(T B N) while preserving near-optimal accuracy in practice.[40]

These dynamic programming methods underpin inference in diverse POS tagging frameworks. In HMMs, they directly optimize generative probabilities; in Conditional Random Fields (CRFs), Viterbi decoding finds the maximum conditional likelihood tag path, addressing label bias issues in maximum entropy Markov models. Neural architectures, such as bi-directional LSTM-CNNs with CRF layers, employ Viterbi or beam search for structured output decoding, integrating deep representations with global normalization for superior performance on benchmarks like the Penn Treebank.[43]
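A minimal Viterbi decoder following the recursion above fits in a few lines; the toy parameters reuse the style of the previous sketch and are assumptions for illustration (a production decoder would work in log space and handle unseen words).

```python
def viterbi(obs, tags, pi, A, B):
    """Return the most likely tag sequence for obs under a first-order HMM."""
    V = [{t: pi[t] * B[t].get(obs[0], 0.0) for t in tags}]      # V_1(k)
    back = [{}]
    for i in range(1, len(obs)):
        V.append({})
        back.append({})
        for k in tags:
            # V_t(k) = max_j V_{t-1}(j) * a_{jk} * b_k(o_t), remembering the argmax j
            best_j = max(tags, key=lambda j: V[i - 1][j] * A[j][k])
            V[i][k] = V[i - 1][best_j] * A[best_j][k] * B[k].get(obs[i], 0.0)
            back[i][k] = best_j
    # Backtrack from the best final tag.
    last = max(tags, key=lambda k: V[-1][k])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

tags = ["DT", "NN"]
pi = {"DT": 0.7, "NN": 0.3}
A = {"DT": {"DT": 0.1, "NN": 0.9}, "NN": {"DT": 0.4, "NN": 0.6}}
B = {"DT": {"the": 0.8, "dog": 0.0}, "NN": {"the": 0.1, "dog": 0.5}}

print(viterbi(["the", "dog"], tags, pi, A, B))   # ['DT', 'NN']
```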
Unsupervised and Transformation-Based Methods
Unsupervised part-of-speech (POS) tagging approaches aim to induce POS tags from unlabeled text corpora by leveraging patterns in word distributions and contexts, without relying on annotated training data. These methods typically cluster words based on their distributional similarity, where words appearing in similar linguistic contexts are grouped into potential POS classes. For instance, Brown clustering, introduced as a hierarchical clustering technique for class-based n-gram language modeling, groups words by iteratively merging classes to maximize the likelihood of a bigram model, effectively capturing syntactic categories through contextual co-occurrences. This approach has been foundational for POS induction, as it allows for the discovery of tag-like clusters solely from raw text, such as distinguishing nouns from verbs based on preceding or following word types.

Another key technique in unsupervised tagging involves the expectation-maximization (EM) algorithm for tag induction, which iteratively estimates hidden POS labels to maximize the likelihood of observed word sequences under a probabilistic model like a hidden Markov model (HMM). In this process, the E-step computes posterior probabilities of tags given current parameters, while the M-step updates emission and transition probabilities to improve the model fit. Seminal work has shown that EM, when applied to HMMs, can induce coherent POS categories, achieving many-to-one mappings where multiple induced classes align with traditional tags like nouns or adjectives.[39] These methods often incorporate priors, such as Dirichlet processes, to prevent overfitting and encourage linguistically plausible tag inventories.[44]

Transformation-based learning, exemplified by the Brill tagger, provides a rule-iteration paradigm that begins with a simple baseline tagger—such as one assigning unambiguous tags to known words or most likely tags to ambiguous ones—and then applies successive transformations to correct errors. These transformations are learned in an error-driven manner from partially or fully labeled data, using contextual predicates like "the preceding word is tagged as NN" or "the following word is a proper noun" to specify rule templates. The algorithm greedily selects the transformation that reduces the most errors at each iteration, resulting in a compact set of ordered rules that achieve high accuracy with minimal supervision. For example, on English text, the Brill tagger starts with rules for unambiguous cases and refines via templates involving adjacent tags, yielding performance competitive with statistical methods at the time (a minimal rule-application sketch is shown at the end of this subsection).[45]

Semi-supervised variants bridge unsupervised and supervised paradigms by bootstrapping from small labeled seeds, using techniques like co-training or self-training to iteratively expand the training set with pseudo-labels. In co-training, two independent views of the data—such as left and right contexts for a word—are tagged separately, and confidently predicted labels from one view are added to train the other, propagating information across iterations. Self-training, a simpler form, applies an initial tagger to unlabeled data, selects high-confidence predictions as new labeled examples, and retrains until convergence.
These methods, applied to POS tagging, start with a few thousand labeled sentences and leverage millions of unlabeled tokens to refine tag boundaries, proving particularly effective for resolving ambiguities in closed-class words.[46]

The primary advantages of unsupervised and transformation-based methods lie in their ability to reduce annotation costs and enable tagging for low-resource languages where labeled corpora are scarce or nonexistent. By relying on abundant unlabeled text, these approaches facilitate POS induction in under-resourced settings, such as indigenous languages, where even small seed data can bootstrap effective taggers through iterative refinement. This label efficiency contrasts with fully supervised models, making them particularly valuable for multilingual NLP pipelines in diverse linguistic environments.[47]
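The transformation-rule mechanism described above can be illustrated with a single hand-picked rule applied on top of a most-frequent-tag baseline; in the actual Brill tagger such rules are learned greedily by measuring error reduction on training data, so both the lexicon and the rule below are assumptions for demonstration.

```python
# Most-frequent-tag baseline, then a Brill-style contextual correction pass.
sentence = ["to", "book", "a", "flight"]
baseline = {"to": "TO", "book": "NN", "a": "DT", "flight": "NN"}   # toy unigram lexicon
tags = [baseline[word] for word in sentence]

# Ordered transformations of the form (from_tag, to_tag, condition on previous tag).
rules = [("NN", "VB", lambda prev: prev == "TO")]   # "change NN to VB when preceded by TO"

for from_tag, to_tag, applies in rules:
    for i in range(1, len(tags)):
        if tags[i] == from_tag and applies(tags[i - 1]):
            tags[i] = to_tag

print(list(zip(sentence, tags)))
# [('to', 'TO'), ('book', 'VB'), ('a', 'DT'), ('flight', 'NN')]
```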
Historical Development
Early and Rule-Based Era (Pre-1990s)
The origins of part-of-speech (POS) tagging trace back to the 1950s, when computational linguistics emerged alongside early efforts in machine translation and syntactic analysis. Zellig Harris, a pioneering linguist, introduced distributional analysis as a method to identify word classes based on their co-occurrence patterns in text, laying foundational concepts for automated tagging. In 1958–1959, Harris developed one of the earliest automated POS taggers as part of the Transformations and Discourse Analysis Project (TDAP) at the University of Pennsylvania, employing 14 handwritten rules implemented via finite-state transducers to assign parts of speech and perform basic parsing.[48] This system prefigured modern tagging by using local context rules for disambiguation, though it was limited to small-scale English texts. Concurrently, institutions like IBM advanced computational linguistics through projects on syntactic processing in the late 1950s and 1960s, focusing on rule-driven morphological and grammatical coding to support machine translation, which indirectly influenced early tagging techniques.[49]

During the 1960s, manual and semi-automated tagging efforts gained momentum with the creation of foundational corpora. A landmark milestone was the Brown Corpus, compiled in 1961 by W. Nelson Francis and Henry Kučera at Brown University, consisting of approximately 1 million words from 500 samples of American English across diverse genres. The corpus was POS-tagged in 1979 using a set of around 80 categories, including parts of speech, punctuation, and inflectional features, establishing a standardized resource for linguistic research. Early automated attempts, such as the Computational Grammar Coder (CGC) by Sheldon Klein and Robert F. Simmons in 1963, combined dictionary lookups with about 500 hand-crafted context rules for disambiguation, achieving initial grammatical coding on unrestricted English text but with limited accuracy due to reliance on local heuristics.[50]

In the 1970s, rule-based POS tagging evolved with more sophisticated systems emphasizing dictionary-based assignment followed by morphological and contextual rules. The TAGGIT tagger, developed by Barbara B. Greene and Gerald M. Rubin in 1971, applied a rule-based approach using an 87-tag set to the Brown Corpus, automatically tagging 77% of words correctly before manual correction of ambiguities.[51] This system highlighted the potential of finite-state automata for efficient rule application in tagging pipelines. A key milestone was the Lancaster-Oslo/Bergen (LOB) Corpus, compiled in the 1970s as a 1-million-word counterpart to the Brown Corpus representing British English from 1961 texts, which became one of the first major tagged resources through rule-augmented processing in the early 1980s.[52] These developments shifted focus from purely manual annotation to hybrid rule systems, enabling larger-scale analysis while exposing challenges like ambiguity resolution in morphologically rich contexts.
Probabilistic Revolution (1990s-2000s)
The 1990s marked a pivotal shift in part-of-speech (POS) tagging from rule-based systems to probabilistic and statistical approaches, enabled by increasing computational power and the availability of large annotated corpora such as the Wall Street Journal (WSJ) section of the Penn Treebank. Hidden Markov Models (HMMs) emerged as a dominant framework, modeling tag sequences as Markov chains and leveraging Viterbi decoding for efficient inference. A seminal implementation, the practical HMM-based tagger developed by Cutting et al., achieved 96.0% accuracy on WSJ test data, demonstrating the viability of stochastic methods for unrestricted text. Concurrently, maximum entropy models advanced probabilistic tagging by incorporating diverse contextual features beyond simple n-grams; Ratnaparkhi's MXPOST tagger, for instance, attained 96.6% accuracy on unseen Penn Treebank data through feature selection and iterative parameter estimation.[53] These developments, including stochastic taggers in early language technology toolkits, emphasized empirical training over hand-crafted rules, significantly improving robustness and scalability.

In the 2000s, the field saw further refinements in handling feature dependencies and sequence modeling. Conditional Random Fields (CRFs), introduced by Lafferty et al., addressed limitations in HMMs and maximum entropy Markov models by directly modeling the conditional probability of tag sequences given observations, accommodating non-independent features without label bias issues. This enabled more accurate incorporation of rich linguistic contexts, such as surrounding words and tags, leading to state-of-the-art performance in sequence labeling tasks including POS tagging. Advancements in n-gram modeling, particularly with smoothing techniques like deleted interpolation, mitigated data sparsity in higher-order tag transitions, enhancing generalization for trigram and higher-order HMM variants.[53] At the University of Pennsylvania, Eric Brill's transformation-based learning approach bridged rule-based and statistical paradigms by iteratively learning corrective transformations from tagged data, achieving approximately 95% accuracy on WSJ while maintaining interpretability.[54]

Key events standardized evaluation and broadened applicability. The Conference on Computational Natural Language Learning (CoNLL) shared tasks, beginning in 1999 with NP bracketing—which presupposed reliable POS tagging—fostered consistent benchmarks and cross-system comparisons, accelerating progress in statistical methods.[55] Overall, these innovations elevated English POS tagging accuracies from around 90% in early stochastic systems to 97% in refined models by the mid-2000s.[53] Extension to European languages was facilitated by the EAGLES guidelines, which in 1996 proposed a harmonized morphosyntactic tagset encoding core POS categories and features adaptable across languages like French, German, and Italian, promoting corpus interoperability and multilingual tagger development.[56]
Neural and Modern Advances (2010s-2025)
In the 2010s, the adoption of recurrent neural networks (RNNs), particularly long short-term memory (LSTM) architectures, transformed part-of-speech (POS) tagging by enabling better handling of sequential dependencies compared to earlier statistical methods. A pivotal advancement was the bidirectional LSTM-CRF model introduced by Huang et al. in 2015, which processes input sequences in both forward and backward directions before applying a conditional random field layer for joint tag prediction, achieving superior accuracy on benchmarks like the Penn Treebank. This approach outperformed prior feature-engineered models by learning contextual representations directly from data, marking a shift toward representation learning in POS tagging.[57] Pre-transformer attention mechanisms further refined these neural models during the mid-2010s, allowing taggers to weigh relevant contextual elements dynamically within RNN frameworks. For instance, attention-augmented BiLSTM models, as explored in works like those by Lample et al. adapted for sequence labeling, improved performance on ambiguous tagging scenarios by focusing on informative parts of the input sequence, setting the stage for more scalable architectures.

The late 2010s and 2020s brought transformer-based models to the forefront, with BERT's 2018 release by Devlin et al. enabling fine-tuning for POS tagging through deep bidirectional contextual embeddings, often yielding accuracies exceeding 97% on high-resource languages like English. This pre-training and task-specific adaptation paradigm reduced reliance on hand-crafted features, allowing models to capture nuanced syntactic patterns. Multilingual extensions, such as XLM-R introduced by Conneau et al. in 2020, extended these benefits to over 100 languages via cross-lingual transfer learning, achieving robust POS tagging in low-resource settings with minimal annotated data. By 2025, large language models (LLMs) like GPT-4o facilitated zero-shot POS tagging through prompting techniques, in which models infer tags without task-specific training; they have shown particular promise for low-resource languages, though performance still degrades in data-scarce scenarios. Hybrid neuro-symbolic systems emerged as a complementary trend, integrating neural encoders with symbolic rule-based components to enhance interpretability and correct neural errors in edge cases, as demonstrated in applications combining LLMs with grammatical constraints for more reliable tagging.

Recent surveys from 2023 to 2025 highlight deep learning's dominance, with transformer and LLM-based taggers consistently surpassing traditional methods by 5-10% on multilingual benchmarks, though they note persistent challenges in ultra-low-resource contexts. A key trend is the integration of POS tagging into end-to-end NLP pipelines, where models like those based on T5 or PaLM perform tagging implicitly during higher-level tasks such as parsing or generation, diminishing the need for discrete POS steps. Ethical concerns have also gained prominence, particularly biases in tag sets that embed cultural or dialectal preferences, potentially perpetuating inequities in multilingual applications unless mitigated through diverse training data.[58][59][60]
Evaluation and Challenges
Accuracy Metrics and Datasets
The primary metric for evaluating part-of-speech (POS) taggers is tag accuracy, which measures the percentage of words correctly assigned their POS tags in a test set, often serving as the baseline for performance comparison across models.[3] Error rate, the complement of accuracy (i.e., 1 - accuracy), quantifies tagging mistakes and is particularly useful for highlighting degradation in challenging scenarios like domain shifts.[1] For datasets with imbalanced tag distributions, such as those where rare tags like interjections appear infrequently, the F1-score—harmonic mean of precision and recall per tag, macro-averaged across classes—provides a more balanced assessment than accuracy alone.[3]

Key benchmark datasets for POS tagging include the Wall Street Journal (WSJ) portion of the Penn Treebank, comprising approximately 1 million words of newswire text annotated with 45 tags, widely used since the 1990s for English evaluation. The Universal Dependencies (UD) framework, in its latest version 2.16 released in 2025, offers over 300 treebanks across 170+ languages with consistent Universal POS tags (17 coarse-grained categories), enabling cross-lingual comparisons and multilingual model training.[22] CoNLL-2003 provides a multilingual dataset covering English and German, with about 21,000 English sentences annotated for POS and named entity recognition, serving as a standard for joint task evaluations.

Standard evaluation protocols emphasize robust generalization, such as 10-fold cross-validation, where the dataset is partitioned into 10 subsets, training on 9 and testing on 1 iteratively to average performance and reduce overfitting bias.[61] Handling out-of-vocabulary (OOV) words—those absent from training data—is critical, with protocols often reporting separate accuracies for OOV subsets to assess morphological generalization, as OOV rates can exceed 5% in low-resource settings.[62] Inter-annotator agreement, measured via Cohen's Kappa score (accounting for chance agreement), ensures dataset quality, typically targeting values above 0.8 for POS annotations to confirm reliability before model training.[63]

State-of-the-art neural POS taggers, leveraging transformer architectures, achieve approximately 98% tag accuracy on the English UD treebank as of 2025, reflecting advances in contextual embeddings for high-resource languages.[64] Recent evaluations also explore large language models for zero-shot POS tagging, often approaching 95-97% accuracy on English UD without fine-tuning.[31] In contrast, low-resource languages often see accuracies around 85%, limited by sparse training data, though transfer learning from multilingual models can narrow this gap.[34]

| Dataset | Language Focus | Size | POS Tag Set | Key Use |
|---|---|---|---|---|
| Penn Treebank (WSJ) | English | ~1M tokens | 45 tags | Newswire benchmarking, supervised training |
| Universal Dependencies (v2.16) | Multilingual (170+) | ~300 treebanks, varying sizes | 17 Universal POS tags | Cross-lingual evaluation, dependency integration |
| CoNLL-2003 | English, German | ~300K tokens (English) | Penn Treebank style | Joint POS-NER tasks, multilingual baselines |
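The token-level accuracy and macro-averaged F1 described above are simple to compute directly from aligned gold and predicted tag sequences. The following hand-rolled sketch is illustrative (libraries such as scikit-learn provide equivalent metrics); the example sequences are assumptions for demonstration.

```python
from collections import Counter

def tag_metrics(gold, pred):
    """Token accuracy and macro-averaged F1 for two aligned tag sequences."""
    assert len(gold) == len(pred)
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    f1_per_tag = []
    for tag in set(gold) | set(pred):
        precision = tp[tag] / (tp[tag] + fp[tag]) if (tp[tag] + fp[tag]) else 0.0
        recall = tp[tag] / (tp[tag] + fn[tag]) if (tp[tag] + fn[tag]) else 0.0
        f1_per_tag.append(2 * precision * recall / (precision + recall)
                          if (precision + recall) else 0.0)
    return accuracy, sum(f1_per_tag) / len(f1_per_tag)

gold = ["DT", "NN", "VBZ", "DT", "NN"]
pred = ["DT", "NN", "VBZ", "DT", "JJ"]
print(tag_metrics(gold, pred))   # accuracy 0.8, macro-F1 ≈ 0.67
```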