
Lexical density

Lexical density is a key metric in linguistics and stylistics that measures the proportion of content words—typically nouns, verbs, adjectives, and adverbs—to the total number of words in a given text or discourse, serving as an indicator of the text's informational density and structural complexity. Introduced by Jean Ure in 1971, it is calculated as the ratio of lexical (content) items to running words, often expressed as a percentage, with values generally ranging from below 40% in casual speech to over 50% in formal written texts. The concept gained prominence through M.A.K. Halliday's work in systemic functional linguistics, particularly in his 1985 analysis of spoken and written language, where he emphasized that higher lexical density reflects greater semantic loading and compaction in written registers compared to spoken ones, which rely more on grammatical structures and repetition. This distinction arises because spoken language often includes function words (e.g., articles, prepositions, pronouns) and fillers for real-time interaction, while written language prioritizes precision and efficiency. Lexical density is widely applied in fields such as second language acquisition, where it assesses learners' writing proficiency and lexical richness; educational linguistics, to evaluate textbook readability; and natural language processing, for automated text complexity scoring. For instance, studies show that advanced writers exhibit lexical densities closer to native speakers, around 50-60% in academic prose, signaling improved ability to convey complex ideas with fewer function words. Variations in measurement exist, with some approaches including or excluding certain word classes (e.g., Halliday's clause-based variant, which counts lexical items per ranking clause rather than per running word), but the core Ure-inspired formula remains standard, highlighting lexical density's role in distinguishing registers and predicting text comprehensibility.

Overview

Definition

Lexical density is a linguistic metric that quantifies the proportion of lexical words, also known as content words, to the total number of words in a text, thereby assessing the degree of informational content relative to grammatical structure. Lexical words belong to open-class categories, primarily including nouns, verbs, adjectives, and adverbs, which convey substantive meaning and contribute to the semantic load of the text. In contrast, function words, or grammatical words, form closed-class sets such as articles, prepositions, pronouns, conjunctions, and auxiliary verbs, which primarily serve structural roles with minimal semantic contribution. To illustrate, consider the simple sentence "The cat sat on the mat," where lexical words ("cat," "sat," "mat") comprise about 50% of the total words, reflecting moderate density due to a balance of content and function elements. In comparison, a denser construction like "Sleek predator crouched stealthily amid undergrowth" achieves higher lexical density, with content words ("sleek," "predator," "crouched," "stealthily," "undergrowth") dominating at around 83%, packing more descriptive information into fewer words. Such variations highlight how lexical density captures the compactness of meaning in language use. Higher lexical density generally signals greater text complexity, as it indicates a heavier reliance on content words to convey ideas efficiently, a characteristic often observed in written language compared to spoken forms, where function words proliferate due to interactive demands. This metric is also applied in evaluating language proficiency, particularly in academic contexts, to gauge a speaker's or writer's ability to produce information-rich discourse.
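The content/function split can be made concrete in a few lines of code. The following is a minimal, illustrative sketch: the hard-coded FUNCTION_WORDS set is a toy stand-in for a real part-of-speech tagger and covers only the words in these examples.

```python
# Minimal sketch of the content/function word split on the two example
# phrases above. The FUNCTION_WORDS set is illustrative, not exhaustive;
# real analyses classify words with a part-of-speech tagger instead.
FUNCTION_WORDS = {"the", "a", "an", "on", "in", "at", "over", "amid",
                  "of", "and", "or", "is", "are", "was", "were"}

def lexical_density(text: str) -> float:
    """Percentage of words not found in the function-word list."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    lexical = [w for w in words if w not in FUNCTION_WORDS]
    return 100 * len(lexical) / len(words)

print(lexical_density("The cat sat on the mat."))  # 50.0
print(lexical_density("Sleek predator crouched stealthily amid undergrowth."))  # ~83.3
```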

Significance

Lexical density serves as a key indicator of linguistic sophistication: by quantifying the proportion of content words to total words, it reveals the maturity of a text or utterance in terms of informational content versus grammatical structure. This balance highlights how effectively a text conveys meaning without excessive reliance on function words, often indicating the cognitive demands placed on producers and receivers during communication. For instance, higher lexical density correlates with greater writing proficiency, as it reflects an ability to pack more semantic content into fewer grammatical frames, a hallmark of advanced language development. In comparisons across language modes, written texts typically exhibit higher lexical density, ranging from 40% to 55%, compared to speech, which averages around 40% or lower, primarily because the planning time available in writing allows for more concise expression. This difference underscores the structural adaptations of spoken discourse, where real-time production favors grammatical fillers for fluency over dense content delivery. Such variations emphasize lexical density's role in distinguishing modes of communication and their respective efficiencies. The implications for communication are notable: low lexical density often signals redundancy or simplified structures suited to casual or interactive contexts, while high density promotes precision and informativeness but can increase processing load and compromise comprehension if overly compact. Greater lexical density demands more processing effort from audiences, as it intensifies the informational burden per unit of text, potentially leading to comprehension challenges in high-stakes or rapid exchanges. Lexical density complements other metrics such as lexical diversity, which measures the ratio of unique words to total words and captures vocabulary breadth rather than the content-function balance; together they provide a fuller picture of lexical richness without overlap in their analytical scopes.
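To make the density/diversity contrast concrete, the short sketch below computes both metrics on one sentence; the tiny function-word list and the example sentence are illustrative assumptions, not a standard implementation.

```python
# Sketch contrasting lexical density (content-word ratio) with lexical
# diversity (type-token ratio); the function-word set is illustrative only.
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in"}

def density_and_diversity(text: str) -> tuple[float, float]:
    words = [w.strip(".,").lower() for w in text.split()]
    density = 100 * sum(w not in FUNCTION_WORDS for w in words) / len(words)
    diversity = 100 * len(set(words)) / len(words)  # unique words / total words
    return density, diversity

# Repetitive but content-heavy: high density (~85.7) yet low diversity (~57.1),
# showing that the two metrics capture different aspects of lexical richness.
print(density_and_diversity("Dogs chase cats and cats chase dogs"))
```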

Historical Background

Origins

The concept of lexical density emerged from mid-20th-century efforts to quantify text complexity, rooted in the empirical traditions of structural linguistics that prioritized systematic analysis of language forms and functions over introspective methods. Structural linguists, building on the descriptive frameworks established in the 1930s and 1940s, sought to measure linguistic features objectively to understand variation in language use, laying groundwork for later quantitative metrics. This shift was facilitated by emerging computational tools in the late 1950s and early 1960s, which enabled linguists to process large samples of data for patterns in word types and structures. Initial motivations for developing such measures arose from practical needs in language teaching and literary analysis, particularly in distinguishing formal from informal registers amid growing interest in English as a global language after World War II. In language education, teachers required tools to assess text difficulty and informational richness to aid non-native learners, while literary scholars aimed to compare stylistic features objectively across genres and authors. These drives were evident in early corpus projects, where quantifying lexical elements helped evaluate how different modes conveyed meaning efficiently. For instance, analyses of varying registers highlighted how denser lexical content correlated with greater conceptual precision in academic or literary contexts compared to conversational styles. Precursors to formalized lexical density appeared in 1960s computational approaches to language research, such as the creation of major corpora that facilitated counts of lexical versus grammatical elements. The Survey of English Usage, initiated in 1959, collected samples of natural spoken English to study its structural properties, revealing patterns in word distribution that informally proxied informational load. Similarly, the Brown Corpus, compiled in 1961 from written American English texts, provided frequency data distinguishing open-class content words (nouns, verbs, adjectives, adverbs) from closed-class function words (articles, prepositions, pronouns), enabling early comparisons of lexical saturation across text types. These studies from the 1950s and 1960s established density as an intuitive indicator of a text's capacity to pack substantive information, particularly when contrasting spoken and written modes, where speech often showed lower ratios due to repetitive function words.

Key Developments

In the 1970s, the emergence of corpus-based approaches prompted the development of systematic metrics for lexical density, shifting focus toward empirical, data-driven assessments of linguistic complexity in texts. This period marked a transition from qualitative analyses to quantifiable measures, enabling comparisons across registers and genres through large-scale language samples. A foundational milestone was Jean Ure's 1971 chapter "Lexical Density and Register Differentiation" in Applications of Linguistics, edited by G.E. Perren and J.L.M. Trim, which introduced lexical density as a tool for analyzing register differentiation, particularly in educational contexts, where it highlighted variations between spoken and written production. Ure's work emphasized the proportion of lexical words to total words, laying groundwork for its application in evaluating register and stylistic differences. Systemic functional linguistics (SFL), pioneered by M.A.K. Halliday, further integrated lexical density into theoretical models of complexity and variation, viewing it as a key indicator of how language functions in social contexts. Within SFL, lexical density distinguishes modes of discourse, with higher values typical of written registers due to denser packing of lexical items, and lower values in spoken ones reflecting interactive grammatical structures. Halliday's 1985 book Spoken and Written Language elaborated this by linking lexical density to grammatical intricacy, proposing it as a measure of a text's informational load relative to its structural elements. Post-1980s developments saw lexical density adapted for computational tools in large-scale text analysis, facilitating automated processing of corpora and extending its utility beyond English to multilingual frameworks. Software such as the Lexical Complexity Analyzer and Sketch Engine enabled efficient calculation across languages, supporting cross-linguistic studies of complexity in second-language acquisition and translation. These advancements, building on corpus methodologies, allowed for broader empirical investigations into lexical patterns in diverse linguistic environments.

Calculation Methods

Ure's Measure

Ure's measure of lexical density, introduced by linguist Jean Ure in 1971, defines it as the proportion of lexical (content) words to the total number of words in a text, expressed as a percentage. Lexical words include nouns, main verbs, adjectives, and adverbs, which carry semantic content, while function words—such as determiners, pronouns, prepositions, conjunctions, and auxiliary verbs—are excluded as they primarily serve grammatical roles. The formula is:

\text{Lexical Density} = \left( \frac{\text{Number of lexical words}}{\text{Total number of words}} \right) \times 100

This measure originated in Ure's analysis of register differences between spoken and written English, examining 34 spoken texts and 30 written texts totaling about 21,000 words each, to highlight how spoken language tends to be less dense due to its higher reliance on grammatical structures. Although initially developed for register studies, it has been widely adopted in educational contexts to evaluate spoken English proficiency among learners, where lower density often indicates developmental stages in language acquisition. To calculate lexical density using Ure's method, first classify all words in the text by part of speech. Lexical words are counted as those with independent meaning (e.g., nouns like "dog," verbs like "run," adjectives like "quick," adverbs like "quickly"), while function words are omitted (e.g., "the," "is," "and," "of"). Divide the count of lexical words by the total word count, then multiply by 100 to obtain the percentage. For example, consider the sample sentence: "The quick brown fox jumps over the lazy dog." Here, the total is 9 words. Lexical words are "quick," "brown," "fox," "jumps," "lazy," "dog" (6 words), excluding the function words "the" (appearing twice) and "over." Thus, lexical density = (6 / 9) × 100 ≈ 66.7%. This process can be done manually for short texts or programmatically for larger corpora by tagging parts of speech. One key strength of Ure's measure lies in its simplicity, making it suitable for manual counting or basic automated analysis without requiring complex syntactic parsing. In Ure's original data, spoken texts typically showed lexical densities below 40%, while written texts reached 40% or higher, reflecting the more concise packing of information in writing compared to speech; comparable analyses report densities ranging from roughly 35% to 50%.
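The classification step can be automated with an off-the-shelf POS tagger. The sketch below uses NLTK's Penn Treebank tags as a rough proxy for Ure's lexical/function distinction; it is an approximation, since, for instance, auxiliary verbs also receive VB* tags and are counted here, whereas a strict Ure-style count would exclude them.

```python
# Approximate Ure-style density with NLTK's Penn Treebank POS tags.
# First run may require:
#   nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import nltk

# Penn tag prefixes for nouns, verbs, adjectives, and adverbs. Caveat:
# auxiliaries also receive VB* tags, so this slightly overcounts relative
# to a strict manual classification.
LEXICAL_PREFIXES = ("NN", "VB", "JJ", "RB")

def ure_lexical_density(text: str) -> float:
    tokens = [t for t in nltk.word_tokenize(text) if t.isalpha()]
    tagged = nltk.pos_tag(tokens)
    lexical = [word for word, tag in tagged if tag.startswith(LEXICAL_PREFIXES)]
    return 100 * len(lexical) / len(tokens)

print(round(ure_lexical_density("The quick brown fox jumps over the lazy dog."), 1))
# ≈ 66.7, matching the worked example above (exact output depends on tagger accuracy)
```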

Halliday's Measure

Halliday's measure of lexical density, developed within systemic functional linguistics, emphasizes the role of clausal structure in assessing informational density in texts. Introduced in his 1985 analysis of spoken and written language, this approach links lexical density to grammatical complexity, particularly noting how written registers pack more content into fewer ranking clauses compared to spoken ones. Halliday distinguishes this from grammatical intricacy, which evaluates the elaboration of clause complexes rather than lexical content per clause. The formula for Halliday's lexical density is:

\text{Lexical density} = \frac{\text{number of lexical items}}{\text{number of ranking clauses}}

Here, ranking clauses refer to the primary structural units of a text, identified by finite verbal processes and excluding embedded or rank-shifted clauses that function as constituents within them. Lexical items encompass content-bearing words such as nouns, full verbs, adjectives, and qualifying adverbs (e.g., those denoting manner or extent), counted across the entire text regardless of embedding. To compute the measure, analysts first parse the text to delineate ranking clauses, often relying on finite verbs as markers. Lexical items are then tallied, including those in subordinate or embedded structures. For instance, in the complex sentence "The researcher analyzed the data that had been collected from various sources, concluding that trends emerged clearly," there are two ranking clauses: the main clause ("The researcher analyzed the ... sources") and the projected clause ("trends emerged clearly"). Lexical items include "researcher," "analyzed," "data," "collected," "sources," "concluding," "trends," "emerged," and "clearly" (nine total), yielding a density of 9 / 2 = 4.5 lexical items per ranking clause, a relatively high value reflecting the embedding typical of written prose. This process reveals how syntactic embedding amplifies density without inflating the clause count. One key advantage of Halliday's measure is its sensitivity to syntactic embedding, allowing it to capture the structural sophistication of texts where additional lexical content is integrated via subordination rather than coordination. In written academic prose, values typically range from about 4 to 6 lexical items per clause, signifying dense informational loading, whereas spoken texts often register closer to 2 due to their simpler clausal organization. Unlike word-ratio alternatives that overlook clausal structure, this clause-based method provides nuanced insights into register-specific complexity.
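Because identifying ranking clauses requires grammatical analysis, automating this measure is harder than for word-ratio measures. The sketch below assumes the analyst supplies the clause segmentation by hand and uses NLTK tagging only for the lexical-item count; it is an approximation, since auxiliaries tagged as verbs inflate the tally relative to a careful manual Halliday-style count.

```python
# Halliday-style density given hand-segmented ranking clauses; NLTK tags
# approximate the lexical-item count (auxiliaries tagged VB* are included,
# so the result runs higher than a careful manual count).
import nltk

LEXICAL_PREFIXES = ("NN", "VB", "JJ", "RB")

def halliday_density(ranking_clauses: list[str]) -> float:
    """Average number of lexical items per ranking clause."""
    items = 0
    for clause in ranking_clauses:
        tagged = nltk.pos_tag(nltk.word_tokenize(clause))
        items += sum(tag.startswith(LEXICAL_PREFIXES) for _, tag in tagged)
    return items / len(ranking_clauses)

clauses = [
    "The researcher analyzed the data that had been collected from various sources",
    "concluding that trends emerged clearly",
]
# The automated count includes auxiliaries ("had", "been"), giving ~6.0;
# the manual count in the worked example above yields 4.5.
print(halliday_density(clauses))
```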

Other Variants

One prominent extension of lexical density calculations is Xiaofei Lu's multidimensional framework for lexical richness, which integrates lexical density with measures of lexical diversity (e.g., type-token ratio variants) and lexical sophistication (e.g., proportion of advanced words) to provide a more comprehensive assessment of text quality, particularly in second language (L2) writing and oral narratives. This approach, implemented in tools like the Lexical Complexity Analyzer, allows for automated analysis across these dimensions, revealing correlations between higher density scores and improved L2 proficiency ratings in empirical studies. Computational variants have enabled automated computation of lexical density through part-of-speech tagging, facilitating real-time analysis of large corpora. For instance, Coh-Metrix uses POS tagging to calculate lexical density as the ratio of content words (nouns, verbs, adjectives, adverbs) to total words, incorporating additional metrics for broader text evaluation in educational and linguistic research. Similarly, the Tool for the Automatic Analysis of Lexical Sophistication (TAALES) complements density measures by focusing on sophistication indices derivable from tagged corpora, supporting automated profiling in assessment tools. Multilingual adaptations address structural differences in non-Indo-European languages. In Chinese, a character-based writing system lacking clear word boundaries, lexical density is computed after automated word segmentation to distinguish content from function elements, as implemented in tools like AlphaLexChinese, which yields density metrics comparable to English while accounting for logographic features in L2 EFL writing analysis. For agglutinative languages like Turkish, where words incorporate multiple morphemes via suffixes, adjustments involve fine-grained morphological analysis during POS tagging to avoid inflating density scores through affixation; studies on Turkish EFL essays demonstrate that such refinements reveal developmental patterns in lexical usage without overcounting derived forms. Hybrid formulas combine lexical density with syntactic measures like T-unit length (the average number of words per minimal terminable unit) to profile overall text maturity, as in the sketch below. For example, integrating density ratios with mean T-unit length in writing corpora highlights how denser content within longer units correlates with higher proficiency, as evidenced in automated tools assessing argumentative essays.
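A hybrid profile of this kind can be sketched as follows; the T-unit boundaries and the is_lexical predicate are assumed inputs (reliable automatic T-unit segmentation needs a syntactic parser, which is out of scope here), and the class and function names are invented for illustration.

```python
# Hypothetical hybrid lexico-syntactic profile: lexical density plus mean
# T-unit length. T-units arrive pre-segmented as lists of words, and
# is_lexical is a caller-supplied predicate (e.g., backed by a POS tagger).
from dataclasses import dataclass
from typing import Callable

@dataclass
class LexicoSyntacticProfile:
    lexical_density: float     # percentage of content words
    mean_t_unit_length: float  # average words per minimal terminable unit

def profile(t_units: list[list[str]],
            is_lexical: Callable[[str], bool]) -> LexicoSyntacticProfile:
    words = [w for unit in t_units for w in unit]
    density = 100 * sum(map(is_lexical, words)) / len(words)
    return LexicoSyntacticProfile(density, len(words) / len(t_units))

# Toy usage with a trivial predicate standing in for a real tagger.
units = [["students", "revised", "their", "drafts"],
         ["the", "final", "version", "improved", "markedly"]]
print(profile(units, lambda w: w not in {"the", "their"}))
```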

Influencing Factors

Textual Characteristics

Lexical density varies across text genres primarily due to differences in stylistic demands, with narrative prose generally exhibiting lower levels, around 45%, compared to academic prose, which averages approximately 55%. This disparity arises because narrative writing often incorporates extensive dialogue and descriptive sequences rich in grammatical words like pronouns and prepositions, mimicking spoken patterns, whereas academic writing prioritizes argumentative structures that pack in more content words to convey complex ideas efficiently. Note that reported values can vary depending on the calculation method, such as word-based ratios versus clause-based measures. Sentence complexity significantly influences lexical density, as longer sentences with embedded clauses allow a greater concentration of lexical items within fewer grammatical frameworks. Embedded clauses enable writers to integrate additional content words—nouns, verbs, adjectives, and adverbs—without proportionally increasing function words, thereby elevating the overall informational density of the text. This feature is particularly evident in formal writing, where syntactic complexity supports nuanced argumentation and detailed exposition. Vocabulary choices play a key role in boosting lexical density, especially through the use of nominalizations and Latinate terms prevalent in formal texts. Nominalizations convert verbs or adjectives into nouns (e.g., "decide" to "decision"), increasing the proportion of content words and allowing denser packing of meaning within clauses. Latinate vocabulary facilitates this by providing richer morphological resources for nominalization than Germanic-based everyday terms, enhancing informational density in academic and professional registers. Differences in mode between spoken and written language profoundly affect lexical density, with written texts typically achieving higher levels due to opportunities for revision and planning. Spoken language features interruptions, fillers (e.g., "um," "you know"), and repetitions that inflate the count of grammatical words, resulting in lower densities of around 25-40%. In contrast, written language minimizes such elements through editing, concentrating on content words to achieve densities of 50-60%, as seen in planned discourses like essays or reports.

Contextual Variables

Lexical density varies with speaker proficiency: more expert or proficient speakers produce higher density than novices, owing to greater vocabulary range and reduced reliance on function words. In spoken English, adults typically exhibit lexical densities of approximately 27-28% in narrative and expository contexts, reflecting their ability to pack more content into their utterances. In contrast, children around age 12 show lower values, around 20-24%, in similar tasks, indicating less mature lexical control. For even younger speakers, under 5 years, the associated child-directed speech (adult speech to children) averages about 29% lexical density. Audience characteristics also modulate lexical density, as speakers adapt their language to perceived listener needs, increasing density for formal or expert audiences to convey precision and decreasing it for casual ones to enhance accessibility. Lectures and presentations to professional audiences often display higher lexical density, approaching levels seen in written texts (over 40%), due to the emphasis on informational content and reduced fillers. Conversely, casual conversations exhibit lower density (under 40%), with more function words and repetitions facilitating interactive flow. Cultural factors and register choices further influence lexical density: specialized terminology in professional domains elevates it by prioritizing content-heavy terms, while non-standard dialects may reduce it through idiomatic repetitions and contextual redundancies. Legal texts, for instance, demonstrate high lexical density due to dense nominalizations and technical vocabulary, often exceeding 50% to ensure precision in argumentation. Technological platforms introduce additional variation, with social media texts typically showing medium lexical density: character limits encourage concise wording, while abbreviations, hashtags, and non-lexical elements like emojis pull in the other direction. This balance reflects the hybrid nature of digital communication, blending informal brevity with expressive content.

Applications

In Education

Lexical density serves as a valuable marker for evaluating writing proficiency and development in second-language learners, particularly in ESL contexts. Studies tracking ESL students' essays over time show consistent increases in density as proficiency grows; for example, among EFL undergraduates, average lexical density rose from 49.82% in first-year writing samples to 53.56% in fourth-year samples, reflecting an improved ability to incorporate content words. Similarly, among EFL beginners, density progressed from 41.37% at grade 7 to 43.93% at grade 9, indicating a shift toward more mature, written-like registers. These metrics, derived from variants like Ure's measure, enable educators to quantify advances in lexical sophistication without relying solely on holistic scoring. To foster higher lexical density, pedagogical tools emphasize targeted exercises that encourage the integration of content words and the reduction of function words. Nominalization activities, for instance, guide ESL students to convert verbal processes into nominal ones (e.g., "The teacher explained the lesson" to "The teacher's explanation of the lesson") to condense meaning and boost density, as demonstrated in EFL writing interventions. Vocabulary expansion exercises, such as synonym drills or word-replacement tasks, further support this by prompting learners to diversify lexical choices, helping them move beyond simple grammatical structures toward more informative prose. Research underscores lexical density's role in broader language skill integration, with studies revealing its correlation to reading comprehension in ESL learners; texts with moderately high density enhance comprehension when aligned with proficiency levels, while excessive density can impede it. In curriculum design, density informs the creation of balanced registers, ensuring materials scaffold from low-density, spoken-like inputs to higher-density written outputs suitable for progressive skill-building. Case studies of learner corpora often reveal notable density gaps between spoken and written assignments, highlighting mode effects in ESL production. For example, one analysis of L2 opinion responses showed written samples with a mean lexical density of 44.1%, significantly higher than the 38.6% of spoken samples, attributing the disparity to the planning time and revision opportunities afforded by writing. Such findings guide targeted interventions to bridge these gaps and improve overall proficiency.

In Computational Linguistics

In computational linguistics, lexical density is integrated into natural language processing (NLP) pipelines to quantify text complexity at scale, typically relying on part-of-speech (POS) taggers to distinguish lexical from grammatical words across large corpora, as in the sketch below. Automated tools compute density metrics during preprocessing stages, enabling efficient analysis of vast datasets for tasks like readability assessment or genre classification. This approach facilitates trend analysis in corpora, such as monitoring lexical density variations in academic writing over decades, revealing shifts toward greater informational density in specialized domains. In stylometry, lexical density serves as a feature for authorship attribution, particularly in verifying disputed documents where density patterns reflect an author's characteristic vocabulary richness. Studies have shown that lexical density, calculated via automated POS-based methods, discriminates between authors by capturing consistent lexical-to-grammatical ratios, as demonstrated in analyses of texts like the Federalist Papers. This computational application extends to modern forensic cases, where density profiles help identify authorship in anonymous or contested writings by comparison against known corpora. Within natural language generation and machine learning, lexical density is incorporated as a feature in text generation models to emulate human-like linguistic patterns, guiding outputs toward balanced informational content rather than repetitive or overly simplistic structures. For example, during the training or evaluation of generative models, density metrics inform adjustments to lexical choice, steering generated text toward human norms of around 40-50% lexical content. This enhances model performance in producing coherent, varied prose, as evidenced by comparative studies in which higher density correlates with perceived naturalness of outputs. Recent advancements in the 2020s have explored lexical density in machine translation (MT) systems to boost output naturalness, with studies revealing that neural MT often produces lower density than human translations, leading to simplified phrasing. Researchers have proposed density-aware post-editing techniques using generative AI assistants, which increase lexical ratios in learner translations and improve naturalness without sacrificing accuracy. These methods, applied to genres ranging from journalistic to literary texts, demonstrate that elevating density through targeted constraints enhances the stylistic fidelity of MT, bridging gaps in cross-lingual complexity.
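A minimal corpus-scale pass might look like the following sketch, using spaCy's coarse universal POS tags; the CONTENT_POS set shown (which includes proper nouns and excludes auxiliaries, tagged AUX in spaCy) is one common but not universal choice.

```python
# Corpus-scale density pass with spaCy's coarse universal POS tags.
# CONTENT_POS includes proper nouns and excludes auxiliaries (tagged AUX
# in spaCy); this is one common convention, not a universal standard.
import spacy

nlp = spacy.load("en_core_web_sm")
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}

def corpus_densities(texts: list[str]) -> list[float]:
    densities = []
    for doc in nlp.pipe(texts):  # batched processing suits large corpora
        tokens = [t for t in doc if t.is_alpha]
        content = sum(t.pos_ in CONTENT_POS for t in tokens)
        densities.append(100 * content / max(len(tokens), 1))
    return densities

print(corpus_densities(["The cat sat on the mat.",
                        "Dense academic prose packs numerous content words."]))
```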
