
Guthrie classification of Bantu languages

The Guthrie classification of Bantu languages is a foundational system for organizing the approximately 500 Bantu languages spoken across sub-Saharan Africa, developed by British linguist Malcolm Guthrie in his 1948 monograph The Classification of the Bantu Languages. It divides these languages into 16 geographic zones labeled A through S (excluding I, O, and Q), primarily based on their spatial distribution from the northwest to the southeast, while incorporating lexical comparisons to identify regional clusters of similarity. Each zone encompasses subgroups and individual languages, assigned unique alphanumeric codes such as A10 for northwestern varieties or S40 for southeastern ones, enabling precise referencing in linguistic research. Guthrie's method emphasized practical utility over strict genetic phylogeny, drawing on earlier surveys like those by Johnston and Meinhof, but innovating through a zone-based framework that mapped language boundaries against ethnographic and historical data. In his later multi-volume Comparative Bantu (1967–1971), he refined the system by expanding the inventory to over 440 languages and providing detailed comparative vocabularies, which supported reconstructions of Proto-Bantu roots and illuminated patterns of divergence. This approach highlighted the Bantu family's internal diversity, with zones like A and B representing early expansions near the supposed homeland in the Cameroon-Nigeria border region, and zones S and G reflecting later southern and eastern migrations. Despite its geographical bias, which sometimes groups languages that are not closely related because of proximity rather than shared descent, the Guthrie classification has profoundly influenced Bantu studies by standardizing nomenclature and facilitating interdisciplinary work on the Bantu expansion, a prehistoric dispersal that shaped Africa's linguistic, cultural, and demographic landscape over millennia.
Modern revisions, such as Jouni Filip Maho's New Updated Guthrie List (2009), integrate phylogenetic evidence from computational methods while retaining the zonal structure, ensuring its ongoing relevance in scholarly publications on Bantu languages.

Overview

The zoning system

The Guthrie classification divides the approximately 250 Narrow Bantu languages into 16 zones labeled A through S, deliberately skipping the letters I, O, and Q to prevent confusion with numerals and vowels. This zoning framework groups languages primarily by geographical distribution, reflecting broad areal patterns rather than a purely genetic phylogeny. The zones form three geographical clusters: zones A–C encompass the northwestern Bantu languages; zones D–G cover the west-central area; and zones H–S are distributed across southern and southeastern Africa. These clusters highlight territorial contiguity and shared areal features, such as lexical and phonological similarities influenced by proximity, aiding in the practical organization of the diverse Bantu speech communities. The primary purpose of this zoning system is to provide a convenient tool for identification and comparative study, rather than to establish a rigorous genetic classification. Guthrie emphasized its utility for fieldwork and reference, allowing scholars to locate languages within a spatial framework without implying strict historical relationships. Within each zone, further subdivision occurs into up to 10 subgroups using a numbering system (for instance, zone A includes subgroups A10 and A20), enabling finer-grained categorization while maintaining the overall zonal structure.
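The zone lettering described above can be sketched in a few lines of Python. This is only an illustration of the scheme as stated in this section (A through S, skipping I, O, and Q, with the three clusters A–C, D–G, and H–S); the names `ZONES`, `CLUSTERS`, and `cluster_of` are hypothetical and not part of any published tool.

```python
import string

# Letters the zoning scheme skips, as stated in the text.
SKIPPED = {"I", "O", "Q"}

# A through S is the first 19 letters of the alphabet; drop the skipped ones.
ZONES = [c for c in string.ascii_uppercase[:19] if c not in SKIPPED]

# Cluster labels follow this article's wording; the letter ranges are inclusive.
CLUSTERS = [
    ("A", "C", "northwestern"),
    ("D", "G", "west-central"),
    ("H", "S", "southern and southeastern"),
]


def cluster_of(zone: str) -> str:
    """Return the article's geographical cluster label for a zone letter."""
    if zone not in ZONES:
        raise ValueError(f"not a zone letter: {zone!r}")
    for first, last, label in CLUSTERS:
        if first <= zone <= last:
            return label
    raise ValueError(f"no cluster covers zone {zone!r}")
```

Counting `ZONES` gives the 16 zones mentioned in the text, and a letter such as `"I"` is rejected rather than silently mapped to a cluster.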

Coding convention

The Guthrie classification employs an alphanumeric coding system to uniquely identify languages, consisting of a zone letter (A through S) followed by a two-digit number, such as A10 for Zone A, group 10. This structure facilitates precise referencing within the geographical zones that form the basis of the classification. Within each zone, the tens digit designates a group (10, 20, and so on up to 90) and the units digit identifies an individual language within that group, so that Duala (A24) belongs to group A20; a lowercase letter may follow to mark a dialect, as in A72a, and the code 00 is reserved for languages that remain unclassified within a zone. This assignment gives the codes a hierarchical organization, allowing linguists to see at a glance which languages were grouped together without implying strict genetic phylogeny. The system originated as provisional codes in Guthrie's 1948 publication, where the 16 zones were labeled A through S (excluding I, O, and Q) and the numbering scheme was introduced to catalog the approximately 250 Narrow Bantu languages and dialects known from the data available at the time. It was refined and standardized in 1971 within Comparative Bantu, Volume 2, which expanded the inventory to over 440 languages and dialects while preserving the zonal structure and solidified the codes as a referential framework for Bantu studies. In practice, these codes serve as standard identifiers in linguistic research, enabling unambiguous reference to specific languages in place of potentially ambiguous ethnonyms; for instance, Kikuyu is consistently denoted as E51, aiding comparative analyses and cataloging efforts across disciplines. This convention has endured, with subsequent updates like the New Updated Guthrie List extending it to newly identified varieties while preserving the original structure.
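As a rough illustration of how such codes can be handled programmatically, the sketch below splits a code into its parts. It assumes the common reading in which the tens digit names the group and the remaining digits name the language, and it accepts the dotted and three-digit forms that appear elsewhere in this article (D.332, A801). The names `GuthrieCode` and `parse_guthrie` are hypothetical, and two-letter zone codes such as JD.60 are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional


class GuthrieCode(NamedTuple):
    zone: str    # zone letter, e.g. "A"
    group: str   # group code built from the tens digit, e.g. "A20"
    number: str  # digits exactly as written, e.g. "24" or "801"
    suffix: str  # optional lowercase dialect letter, e.g. "a"


# Zone letters A-S without I, O and Q; an optional dot; two or three digits;
# an optional lowercase letter. Three-digit numbers cover codes such as A801.
_CODE = re.compile(r"^([A-HJ-NPRS])\.?(\d{2,3})([a-z]?)$")


def parse_guthrie(code: str) -> Optional[GuthrieCode]:
    """Split a code like 'A72a' or 'D.332' into parts; None if it does not fit."""
    match = _CODE.match(code.strip())
    if match is None:
        return None
    zone, number, suffix = match.groups()
    # Assumption: the first digit identifies the group (A24 -> A20, D.332 -> D30).
    group = f"{zone}{number[0]}0"
    return GuthrieCode(zone, group, number, suffix)
```

Under these assumptions, `parse_guthrie("E51").group` is `"E50"` and `parse_guthrie("A72a").suffix` is `"a"`, while a code starting with a skipped letter, such as `"I10"`, yields `None`.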

Historical context

Guthrie's contributions

Malcolm Guthrie (1903–1972) was a British linguist renowned for his expertise in African languages, particularly those of the Bantu family. Born on 10 February 1903, he initially pursued a B.Sc. in metallurgy before training for the Baptist ministry, which led him to specialize in linguistics during his career. He served as Professor of Bantu Languages at the School of Oriental and African Studies (SOAS), University of London, and was Head of the Department of Africa there until 1970. Guthrie's early career involved extensive fieldwork in the Belgian Congo (now the Democratic Republic of the Congo) from 1932 to 1940, where he worked as a missionary for the Baptist Missionary Society while conducting linguistic research. During this period and subsequent study leave from 1942 to 1944, he traveled across Bantu-speaking regions, collecting data on over 180 languages that informed his comparative studies. This hands-on experience solidified his focus on Bantu comparative linguistics, emphasizing systematic analysis amid diverse dialects. Beyond his foundational classification work, Guthrie authored key texts such as The Bantu Languages of Western Equatorial Africa (1953), a detailed survey published as part of the International African Institute's Handbook of African Languages series, to which he contributed extensively. He also played a significant role in editing and compiling resources for the Institute, including handbooks that standardized descriptions of African linguistic structures. These broader contributions advanced the documentation and understanding of Bantu languages across academic and practical applications. Guthrie's motivation for developing a comprehensive Bantu classification stemmed from the prevailing disarray in the naming of Bantu languages, exacerbated by inconsistent colonial-era naming conventions and the absence of a unified standardization framework.
In his 1948 publication The Classification of the Bantu Languages, he addressed this by proposing criteria for identification and grouping, aiming to provide a reliable referential system for scholars navigating the family's complexity. This effort laid the groundwork for his later, more expansive zoning approach.

Key publications

Malcolm Guthrie's foundational work on Bantu classification began with his 1948 monograph The Classification of the Bantu Languages, published by Oxford University Press for the International African Institute, which established the initial geographic zoning system and provided a provisional inventory of 191 languages arranged into 16 zones. This publication laid the groundwork for subsequent refinements by emphasizing lexical and phonological criteria for grouping, while acknowledging the challenges of distinguishing dialects from distinct languages. Complementing this, Guthrie's 1953 The Bantu Languages of Western Equatorial Africa, issued as part of the Handbook of African Languages series, offered a detailed regional survey of Bantu varieties in the specified area, highlighting their structural features and relationships to broader Bantu patterns. Guthrie's most comprehensive contribution came with Comparative Bantu, a four-volume series published between 1967 and 1971 by Gregg International, which expanded the classification to encompass 440 varieties, refined the zonal framework using 250 representative core languages for comparative analysis, and provided extensive comparative vocabularies, including reconstructions of Common Bantu roots, to facilitate comparison across the family. Volume 1 (1967) set out the general principles, while the subsequent volumes (1970–1971) presented the detailed comparative material. These publications collectively established Guthrie's zoning and coding as the authoritative framework for Bantu linguistics, serving as the primary reference for scholars and directly informing the alphanumeric codes used to identify Bantu languages.

Zone summaries

Zone A

Zone A encompasses the northwesternmost group of Bantu languages in Malcolm Guthrie's classification system, primarily spoken in Cameroon along the coast and inland regions, with extensions into neighboring countries. This zone represents the northwestern periphery of the Bantu area, where languages exhibit close ties to the family's origins near the Cameroon-Nigeria border. The languages are organized into nine subgroups based on lexical and phonological similarities: A10 (Lundu-Balong), A20 (Duala), A30 (Bubi-Benga), A40 (Basaa), A50 (Bafia), A60 (Sanaga), A70 (Ewondo-Fang), A80 (Makaa-Njem), and A90 (Kaka). Guthrie's 1971 inventory focused on representative varieties within these subgroups, totaling around 27 distinct languages, though subsequent updates like Maho's New Updated Guthrie List expand this to approximately 92 languages and dialects. Key examples include Duala (A24), a coastal trade language historically influenced by European contact; Ewondo (A72a), widely spoken in central Cameroon with over 2 million speakers; and Fang (A75), a major language extending across the Cameroon-Gabon border and spoken by more than 1 million people. These languages typically feature tonal systems, agglutinative morphology, and the noun class agreement characteristic of Bantu, but with variations such as reduced gender distinctions in some varieties. Zone A languages retain several conservative phonological features from Proto-Bantu, including nasalized vowels in varieties like Fang (A75) and Gyele (A801), as well as logophoric pronouns in some A10-20 languages that may trace back to pre-Bantu stages. Lexical reconstructions often draw heavily from Zone A data due to these retentions, though overall similarity to Proto-Bantu reconstructions averages 70-80% in core vocabulary across sampled varieties.
Due to their geographical position, Zone A languages show influences from neighboring languages outside Narrow Bantu, particularly Grassfields Bantu (also Niger-Congo) and Ubangi, leading to innovations like animacy-based agreement and syntactic restructuring in noun class systems. For instance, contact with Gbaya (Ubangi) has contributed to optional patterns of this kind in languages like Kako (A93). This peripheral position has resulted in more divergent noun class systems compared to central Bantu zones, with some languages reducing the typical 10+ noun classes to fewer distinctions.

Zone B

Zone B comprises Bantu languages primarily distributed in western Central Africa, spanning Gabon, the Republic of the Congo, and the western Democratic Republic of the Congo. This zone represents a central-western cluster within Guthrie's geographical zoning system, reflecting migrations and settlements along river basins and coastal regions. The languages here exhibit close ties to other western Bantu groups but form a distinct areal unit due to shared phonological and lexical innovations. The zone is organized into eight main subgroups: B10 (Myene languages, such as Mpongwe), B20 (Kele languages, including Seki), B30 (Tsogo languages, like Tsogo and Kande), B40 (Shira-Punu languages, such as Punu), B50 (Nzebi languages, including Vili and Nzebi), B60 (Mbete languages, such as Mbete), B70 (Teke languages, like Teghe), and B80 (Tiene-Yanzi languages, such as Tiene). Guthrie's original classification identified around 25 languages in this zone, while updated referential lists expand this to approximately 47 varieties, accounting for dialects and newly documented forms. Like Bantu languages overall, those in Zone B feature strong noun class systems, with 10–20 classes marked by concordant prefixes that govern agreement across the sentence; for instance, classes often pair singular and plural forms like mu-ntu (person) and ba-ntu (people). Tonal variations are also prominent, with most languages using two-level tone systems (high and low) for lexical distinction and grammatical functions, though some exhibit contour tones influenced by neighboring non-Bantu languages. These traits underscore the zone's role in illustrating Bantu diversity in contact zones. Representative examples include Punu (B43), spoken by over 200,000 people in southern Gabon and the Republic of the Congo and known for its tonal verb conjugations, and Teke (B70), a cluster of dialects used by about 1 million speakers, with vocabulary reflecting agricultural and riverine life.
The coding convention assigns identifiers like B43 for Punu, linking each language to the broader referential system.

Zone C

Zone C in Guthrie's classification covers a region in Central Africa, primarily the eastern Democratic Republic of the Congo (DRC), with extensions into Uganda, Rwanda, and Burundi, forming part of the northwestern Bantu continuum adjacent to non-Bantu language areas. This zone highlights the transition between forest and savanna environments, where Bantu languages interact with neighboring groups such as Central Sudanic and Nilotic families, leading to some areal linguistic influences like borrowed vocabulary and phonological features. Guthrie identified around 27 languages in this zone, emphasizing their relative homogeneity in vocabulary and structure compared to more divergent areas; counts in the updated NUGL remain similar. The zone is divided into five main subgroups based on shared innovations and geographical proximity in the modern classification. The C10 subgroup includes forest languages like Mbole, Lengola, Mituku, and Genya, spoken in the Ituri region of eastern DRC. C20 comprises the Lega cluster, featuring languages such as Lega, Songola, Kumu, Zimba, Bangubangu, and Horohoro along the western Rift Valley. The C30 group covers Bira, Huma, and Peri in the northern Kivu area, while C40 includes Ruwenzori-Kivu languages like Konzo, Ndandi, and Nyanga in Uganda and eastern DRC. C50 features Bembe, Hunde, Havu, Nyabungu, Buyu, and Kabwari around Lake Kivu. Languages like Rwanda-Rundi (now JD.60) and Haya (JE.22) were historically associated with this area but are classified in Zone J in updates. Linguistically, Zone C languages are characterized by a seven-vowel system, single vowel quantity in roots, and lexical tone on roots and suffixes, with limited k/g alternations except after nasals. Nominal classes often mark diminutives (except in C10 and C20), and verbal derivations show rare passive forms without the common -u- extension; most groups use single independent nominal prefixes and distinct negative tenses.
Vocabulary overlap is high within the zone, reaching up to 60% between closely related pairs like Budgili and Bubadgi, reflecting simple phonological and grammatical systems adapted to the region's ecological and contact dynamics.

Zone D

Zone D of the Guthrie classification covers a group of Bantu languages spoken primarily in the western Democratic Republic of the Congo (DRC), particularly in forested regions around the central Congo Basin. These languages form part of the west-central cluster, with some extensions into eastern DRC and neighboring areas, though not significantly into the Republic of the Congo (Congo-Brazzaville). The zone includes over 30 languages, many of which are small-scale and underdocumented, reflecting the linguistic diversity of the equatorial forest environment. Key characteristics of Zone D languages include complex tonal systems with high, low, and sometimes mid or downstepped tones, as well as standard noun class systems featuring paired singular-plural prefixes that govern agreement across the sentence. Phonologically, several languages exhibit labiovelar stops like /kp/ and /gb/, alongside 7-vowel systems with advanced tongue root (ATR) harmony in some cases. The subgroups in Zone D are geographically and linguistically cohesive, often sharing innovations in verb morphology and lexicon due to their forest habitat and historical isolation from savanna Bantu varieties. The D.10 Mbole-Enya subgroup, for instance, comprises languages like Mbole (D.11), Lengola (D.12), and Enya (D.14), spoken around Kisangani in northern DRC, noted for their intricate tone patterns and reduced noun class inventories compared to core Bantu norms. Further south, the D.20 Lega-Holoholo subgroup includes Lega (D.25), with approximately 440,000 speakers in South Kivu province, featuring 19 noun classes, a 7-vowel inventory, and verb extensions for causation and passivization; other members like Holoholo (D.28) extend into Tanzania. The D.30 Bira-Nyali group, centered in Ituri province, encompasses Bira (D.32) and Nyali (D.33), alongside smaller varieties like Budu (D.332) and Ndaaka (D.333), which display vowel systems varying from 5 to 9 phonemes and close ties to neighboring Zone C languages through lexical borrowing.
Smaller subgroups round out Zone D, highlighting its internal diversity. The D.40 Nyanga subgroup consists mainly of Nyanga (D.43), spoken in North Kivu with a focus on tonal contours for grammatical distinctions. Similarly, D.50 includes Bembe (D.54) and Buyu (D.55), found along the DRC-Tanzania border, where noun classes integrate with locative prefixes for spatial reference, and some dialects show implosive consonants like /ɓ/ and /ɗ/. Overall, these languages exemplify the adaptive phonological and morphological traits of forest Bantu, with ongoing documentation efforts revealing their resilience amid regional multilingualism.

Zone E

Zone E in the Guthrie classification covers Bantu languages spoken primarily in Kenya and northern Tanzania, extending across highland and coastal environments. These languages form part of the eastern Bantu cluster, reflecting historical migrations and interactions with Nilotic and other non-Bantu languages. The zone is distinguished by its geographical positioning along the eastern edge of the Bantu area, where contact has influenced linguistic diversity. The zone comprises approximately 25 languages, organized into four main subgroups based on shared phonological, morphological, and lexical features as outlined in Guthrie's system and updates. Key subgroups include:
  • E50 (Kikuyu-Kamba): Located in central Kenya, including Kikuyu (E51) and Kamba (E55), prominent for their role in ethnic identities and agricultural societies.
  • E60 (Chaga): In the Kilimanjaro region of Tanzania, comprising Chaga dialects (E621, E622, E623), noted for highland isolation preserving archaic features.
  • E70 (Pokomo-Taita): Along the Kenyan-Tanzanian coast, featuring Pokomo (E71) and Taita (E74), influenced by coastal trade and contact.
  • E40 (smaller groups): Including Temi (E46) and related varieties in Tanzania, showing affinities with neighboring E50-E70 languages.
Linguistically, Zone E languages often exhibit a seven-vowel system and vowel harmony patterns, particularly in suffix alternations triggered by root vowels, as posited in early comparative studies. For instance, in Kikuyu (E51), advanced tongue root (ATR) harmony affects vowel quality across morphemes, contributing to phonetic cohesion in words. This harmony, bidirectional in some cases, distinguishes Zone E from adjacent zones with stricter five-vowel systems. Tonal complexity and other innovations further mark these languages, supporting their classification as a cohesive unit in Guthrie's geographical framework.

Zone F

Zone F in Malcolm Guthrie's classification encompasses Bantu languages spoken primarily in central Tanzania, including lakeshore areas and inland plateaus. This zone includes approximately 12 to 15 languages or distinct varieties, forming a geographically contiguous group in what was formerly Tanganyika. The zone is divided into three main subgroups based on shared linguistic features and proximity: F10 (Tongwe–Bende), F20 (Sukuma–Nyamwezi), and F30 (Nilamba–Rangi). The F10 subgroup comprises Tongwe (F11) and Bende (F12), small languages spoken near the Malagarasi River. The F20 subgroup is the largest, featuring Sukuma (F21), Nyamwezi (F22), Sumbwa (F23), Kimbu (F24), and Bungu (F25); Nyamwezi, for instance, is a representative example with over a million speakers historically noted in the region. The F30 subgroup includes Nilamba (F31), Nyaturu (F32), Rangi (F33), and Mbugwe (F34), concentrated further east toward the Rift Valley. These subgroup codes follow Guthrie's alphanumeric system, where the letter denotes the zone and the number the internal grouping. Linguistically, Zone F languages typically exhibit a five-vowel system, though seven vowel qualities are often distinguished phonetically, along with two degrees of vowel quantity in many roots. They display extensive consonant alternations in verbal radicals, with some languages like those in F20 showing up to 30 distinct consonants. Nominal structures often use single prefixes for independent forms, with additional series of prefixes serving functions such as emphasis, and lexical tone plays a key role in most varieties for distinguishing meaning. These traits highlight the zone's position as a transitional area between eastern and central Bantu innovations.

Zone G

Zone G constitutes the southernmost central zone in Malcolm Guthrie's geographic classification of Bantu languages, encompassing languages spoken primarily in Tanzania, with extensions into neighboring countries. These languages mark key stages in the Bantu expansion across eastern Africa, reflecting migrations along coastal and inland routes. The zone includes approximately 47 distinct languages, grouped into six main subgroups based on shared lexical and phonological features. Subgroup G10 features Gogo (G11), a tonal language with around 800,000 speakers. G20 includes Shambala (G23). G30 encompasses Zigua (G31), part of the Zigula-Zaramo cluster along the Tanzanian coast. G40 covers Swahili (G41-43), a major coastal trade language with numerous dialects. G50 highlights Pogolo (G51) and Ndamba (G52) in central Tanzania. G60 includes Bena (G63) and Hehe (G62), inland varieties with conservative traits. These languages generally retain core Bantu characteristics, such as the five-vowel system and prefixal noun classes, while showing innovations like lexical tone in inland varieties and loanwords in coastal ones due to historical contacts. Gogo (G11) serves as a prominent example, illustrating the zone's role in preserving Proto-Bantu verbal extensions amid regional diversification.

Zone H

Zone H in Malcolm Guthrie's classification encompasses a group of Bantu languages spoken primarily in the southwestern regions of the Democratic Republic of the Congo (DRC), Angola, and the Republic of the Congo, with some varieties extending into Cabinda (Angola). These languages are geographically contiguous, forming part of the broader Southwest Bantu cluster, and are characterized by their location along the lower Congo basin and adjacent areas, reflecting historical migrations and interactions in this riverine and coastal-influenced zone. The zone includes approximately 24 languages, organized into four main subgroups based on lexical and grammatical similarities, as outlined in Guthrie's referential classification and subsequent updates. These subgroups demonstrate high internal vocabulary relatedness, often exceeding 60% shared lexicon between closely related varieties, with phonological features such as a seven-vowel system and lexical tone typical of many languages in the region. The H10 subgroup, known as the Kikongo group, is the largest and most prominent, comprising around 15 varieties centered in the lower Congo area. Key examples include Kikongo (H16), a widely spoken language with dialects such as Yombe (H16c) and Fiote (part of H16d), used by millions across Angola, DRC, and the Republic of the Congo; Vili (H12) in the Republic of the Congo; and Bembe (H11) in the DRC. This subgroup is noted for its role in early contact and trade, influencing regional creoles and literatures. H20, the Kimbundu group, features languages mainly in central Angola, including Kimbundu (H21a), a major language with over 3 million speakers that is significant in Angolan history, and smaller varieties like Sama (H22). The H30 Yaka group includes Yaka (H31), spoken by about 1 million people in the Kwango region of DRC and in Angola, along with Suku (H32) and Mbangala (H34), which exhibit distinctive nominal prefixing and verbal extensions adapted to local environments.
Finally, H40, the Mbala-Hunganna group, consists of Mbala (H41) and Hunganna (H42), smaller languages in southwestern DRC, with limited documentation but sharing core noun class systems and tonal patterns. Overall, Zone H languages highlight the diversity of Bantu expansions into Atlantic-facing regions, with ongoing research emphasizing their conservative retention of Proto-Bantu features amid influences from non-Bantu groups.

Zone J

Zone J encompasses Bantu languages spoken in the interlacustrine region of East Africa, primarily in Uganda, western Kenya, northwestern Tanzania, Rwanda, Burundi, and the eastern Democratic Republic of the Congo. This zone was introduced subsequent to Malcolm Guthrie's original classification to group languages previously assigned to zones D and E, based on shared lexical and grammatical innovations indicating a closer genetic relationship. The languages are geographically concentrated around the Great Lakes, reflecting historical migrations and adaptations to highland and lakeshore environments. The Zone J languages are clickless, consistent with the majority of Bantu languages outside the southernmost varieties that have incorporated click consonants through contact. They number approximately 25 distinct languages or major varieties, though counts vary depending on whether dialects are treated as separate languages. These languages typically feature the canonical noun class system with 10-20 classes, agglutinative verb morphology including tense-aspect markers, and tonal systems for lexical and grammatical distinctions. Representative examples include Luganda (J15, spoken by over 4 million in Uganda), Rwanda (J61, with about 12 million speakers), and Rundi (J62, with around 10 million speakers). Phonological traits often include consonant spirantization in agentive derivations, contributing to their distinct profile within Eastern Bantu. Zone J is subdivided into six main groups based on lexical similarities and phonological correspondences:
  • J10: The Ganda subgroup, including Luganda, Lusoga, and Runyoro, primarily spoken in central Uganda along Lake Victoria. These languages show innovations in verbal extensions and noun class mergers.
  • J20: The Haya subgroup, encompassing Haya, Zinza, and Rashi, located in northwestern Tanzania near Lake Victoria. Notable for rich tonal systems and dialect continua across ethnic groups.
  • J30: The Luhya subgroup, comprising various dialects of Luyia, spoken in western Kenya and eastern Uganda. This group exhibits significant dialectal diversity, with over 15 million speakers collectively.
  • J40: The Konzo-Nande subgroup, including Konzo and Nande (Kinande), found in the Rwenzori Mountains straddling Uganda and DRC. Characterized by conservative phonology and highland-specific vocabulary.
  • J50: The Hunde-Shi subgroup, with Hunde, Shi, and Havu, spoken in eastern DRC near Lake Kivu. These languages display agent noun spirantization and adaptations to volcanic highland ecology.
  • J60: The Rwanda-Rundi subgroup, including Rwanda, Rundi, and Vinza, distributed across Rwanda, Burundi, and adjacent Tanzania and DRC. Known for close mutual intelligibility between Rwanda and Rundi, and extensive use in literature and administration.
This classification highlights Zone J's role as a potential genealogical unit within Bantu, supported by lexicostatistical studies showing higher cognate retention among its members compared to adjacent zones.

Zone K

Zone K encompasses Bantu languages spoken primarily in south-central Africa, in and around eastern Angola and western Zambia, forming part of the central southern Bantu branch. These languages are geographically clustered in the region around the upper Zambezi River and the western plateau areas. The zone is organized into four main subgroups in Guthrie's classification: K.10 (Chokwe-Luchazi), K.20 (Lozi), K.30 (Luyana), and K.40 (Subiya-Totela). The K.10 subgroup includes approximately ten languages such as Chokwe (K.11), Luimbi (K.12a), Ngangela (K.12b), Luchazi (K.13), Lwena (K.14), Mbunda (K.15), Nyengo (K.16), Mbwela (K.17), and Nkangala (K.18), primarily distributed in eastern Angola, western Zambia, and adjacent areas of the Democratic Republic of the Congo. The K.20 subgroup consists solely of Lozi (K.21), a widely spoken language serving as a regional lingua franca in western Zambia, with extensions into neighboring countries. The K.30 subgroup, the Luyana group, comprises languages including Luyana (K.31), Mbowe (K.32), Kwangali (K.33), Manyo (K.331), Mbukushu (K.332), Mashi (K.34), Simaa (K.35), Shanjo (K.36), and Kwangwa (K.37). The K.40 subgroup features Totela (K.41) and Subiya varieties such as Ikuhane (K.42). Overall, Zone K accounts for around 20 distinct languages, reflecting lexical tone systems, a five-vowel inventory, and shared morphological traits like gender prefixes for diminutives (e.g., ka-/tu- classes) typical of central southern Bantu varieties.

Zone L

Zone L in Malcolm Guthrie's classification encompasses a group of Bantu languages primarily spoken in the southern regions of the Democratic Republic of the Congo (DRC), with extensions into northern Zambia, eastern Angola, and adjacent areas. This zone, designated as L, includes approximately a dozen languages divided into six subgroups (L.10 to L.60), reflecting geographical contiguity and shared linguistic traits typical of Central Bantu varieties. The languages are characterized by a relatively conservative retention of Proto-Bantu features, including a five-vowel system and tonal distinctions, though with simplifications in grammatical structure compared to eastern zones. The subgroups are structured numerically within the zone: L.10 comprises the Pende group, including languages such as Pende (L.11), Samba (L.12), and Kwese (L.13), spoken mainly in the Kwango region of the DRC. L.20 covers the Songe group, with varieties like Kete (L.21), Binji (L.22), and Songe proper (L.23), located in the Kasai and Sankuru districts. The largest subgroup, L.30, includes the Luba languages, such as Luba-Kasai (L.31), Luba-Katanga (L.33), and Kanyoka (L.32), which are widely spoken across central DRC and noted for their use in literature and administration. L.40 consists of Kaonde (L.41), primarily in northwestern Zambia and southeastern DRC, while L.50 features Lunda languages like North Lunda (L.52) and Ruund (L.53) in the same border areas. Finally, L.60 includes the Nkoya cluster (L.62) and related varieties like Mbwera (L.61) in western Zambia. Grammatically, Zone L languages exhibit a single prefix for independent nominals, distinguishing them from zones with more complex prefix systems, and often employ a copular construction for equational sentences rather than direct nominal predication. Tense-aspect marking is relatively simple, with two past and two future distinctions, and the perfective *-ile is present in most varieties except Songe.
Phonologically, they maintain a seven-vowel system in some cases and feature tonal alternations on radicals, contributing to lexical differentiation. These traits underscore the zone's position as a transitional area between central and southern Bantu expansions, with Luba-Kasai serving as a prominent example due to its over five million speakers and role in regional communication.

Zone M

Zone M in the Guthrie classification encompasses Bantu languages spoken primarily in southern , northern , and northern , reflecting a geographical zone of central-southern . This zone was originally delineated by Malcolm Guthrie in 1948 based on territorial contiguity and shared phonological and morphological traits, such as the prevalence of five-vowel systems in several subgroups and the use of passive extensions like -u-. The languages here demonstrate typical characteristics, including agglutinative verb structures, systems with prefixes for singular and plural forms, and complex tonal patterns that distinguish lexical items and grammatical functions. The New Updated Guthrie List (NUGL) of 2009 organizes Zone M into six subgroups, totaling 37 languages, highlighting its relative diversity within the family. The M10 subgroup (Fipa-Mambwe group) includes five languages such as Pimbwe and Fipa, spoken in the of . M20 (Nyiha-Safwa group) comprises nine languages like Nyiha and Safwa, found in southwestern . M30 (Nyakyusa-Ngonde group) has three languages, including Nyakyusa-Ngonde along the -Malawi border. M40 (Bemba group) features four languages, notably Bemba, a major language in with over 3 million speakers serving as a regional . M50 (Lala-Bisa-Lamba group) lists eight languages such as Lala and Lamba in central , while M60 (Lenje-Tonga group) includes eight languages like and Ila, primarily in southern . These subgroups show internal genetic coherence, with innovations like double independent prefixes in verbs for some groups and special negative tense formations, though the zone as a whole lacks strong overarching unity. Linguistically, Zone M languages often exhibit seven-vowel systems in M30, contrasting with the five-vowel systems dominant in M40-, alongside features like the suffix -ile for perfective aspects (absent in M60) and extra locative prefixes such as pa-, ku-, and mu-. 
Tonal alternations on roots and nominal suffixes are common in M10-M30, contributing to phonological complexity. Representative examples include Bemba, which has influenced urban vernaculars in through its standardized form, and , known for its role in Zambezi valley communities with distinct dialectal variations. Overall, Zone M represents a transitional area in , bridging eastern and southern branches with evidence of historical migrations from the .

Zone N

Zone N in the Guthrie classification comprises a small group of spoken primarily in southern and adjacent areas of northern and , forming a minor zone in the eastern-southern transition. This zone includes approximately 10-15 languages, reflecting limited diversity but shared traits with neighboring P and M zones. The zone is divided into two main subgroups: N10 (Manda group) and N20 (possibly including related varieties). The N10 subgroup includes languages such as Manda (N11), Ngoni (N12), Matengo (N14), Mpoto (N13), (N15 of Malawi), Ndendeule (N101), and Nindi (N102), spoken in the Matengo highlands and coastal areas. These languages feature typical systems and tonal patterns, with innovations from contact with non-Bantu groups. Ndendeule (N101), for example, is a lesser-documented variety with around 20,000 speakers as of recent estimates. Zone N languages exhibit five- to seven-vowel systems and agglutinative morphology, distinguishing them from the larger cluster (Zone S40). The zone's compact nature highlights localized expansions, with lexical borrowing from and other eastern varieties.

Zone P

Zone P in Malcolm Guthrie's classification of comprises a group of languages spoken primarily in southeastern and adjacent areas of northern . This zone represents a transitional area between the central to the north and the southern expansions further south, with languages exhibiting phonological and morphological features typical of the broader family, such as systems and agglutinative verb structures. Guthrie identified Zone P as containing 14 languages across three subgroups in his 1948 classification, later refined in his 1971 work, emphasizing their geographical coherence along coastal and inland riverine zones. The languages of Zone P are divided into three main subgroups: P.10 (Matumbi group), P.20 ( group), and P.30 (Makua group). These subgroups reflect close lexical and grammatical affinities, with high percentages of shared vocabulary between adjacent groups, supporting geographic clustering approach. For instance, the P.20 and P.30 groups show innovations like the extension -u- for passive forms in verbs and neuter derivations, distinguishing them from neighboring zones. Phonologically, most languages in P.20 and P.30 feature a five-vowel system (/i, e, a, o, u/), while P.10 often has a seven-vowel system including /ɪ, ɛ, ɔ, ʊ/; lexical tones are absent in P.10 and P.30 but present with alternations in P.20 radicals and suffixes. forms vary across the zone, with preverbal particles common in P.30.
Subgroup | Code | Language | Primary Location | Notes
P.10 (Matumbi) | P.11 | Ndengereko | Southeastern | Spoken along the ; features double dependent prefixes in some forms.
P.10 (Matumbi) | P.12 | Rufiji (Ruihi) | Southeastern | Coastal variety with single independent nominal prefixes.
P.10 (Matumbi) | P.13 | Matumbi | Southeastern | Representative of the group; no lexical tones.
P.10 (Matumbi) | P.14 | Ngindo | Southeastern | Inland variety near the coast.
P.10 (Matumbi) | P.15 | Mbunga (Mbulga) | Southeastern | Limited documentation; shares vocabulary with P.20.
P.20 (Yao) | P.21 | Yao (ChiYao) | Southern , northern | Widely spoken; tonal with alternations; over 1 million speakers.
P.20 (Yao) | P.22 | Mwera | Southern | Closely related to Yao; agricultural communities.
P.20 (Yao) | P.23 | Makonde | Southeastern , northern | Known for matrilineal society; five-vowel system.
P.20 (Yao) | P.24 | Ndonde | Southeastern | Small speech community; shares passive extensions.
P.20 (Yao) | P.25 | Mabiha (Mavia) | Northern | Dialectal variation; tonal features.
P.30 (Makua) | P.31 | Makua (eMakua) | Northern | Largest in zone; no tones; over 4 million speakers; dialects include Central Makua.
P.30 (Makua) | P.32 | Lomwe (eLomwe) | Northern , southern border | Significant dialect chain; five-vowel system.
P.30 (Makua) | P.33 | Ngulu (Dgulu) | Northern | Inland variety; neuter forms in nouns.
P.30 (Makua) | P.34 | Cuabo (Echuwabo) | Northern coast | Preverbal negatives; close to Makua.
These languages illustrate the diversity within Zone P, with representative examples like (P.21) demonstrating tonal verb paradigms and Makua (P.31) showing simplified tone systems compared to northern zones. The zone's limited number of languages—approximately 14 in Guthrie's listing—highlights its role as a relatively compact linguistic area, influencing later classifications through its phonological conservatism.

Zone R

Zone R in Malcolm Guthrie's classification encompasses Bantu languages primarily spoken in southern Angola, northern Namibia, and adjacent areas of Botswana. These languages form part of the Southwest Bantu subgroup, characterized by their geographical concentration in the southwestern region of the Bantu-speaking area.

The zone is divided into four main subgroups. The R.10 subgroup, known as the Umbundu group, includes Umbundu (R.11), Ndombe (R.12), Nyaneka (R.13), and related varieties such as Khumbi (R.14), Kuvale (R.101), Kwisi (R.102, now extinct), and Mbali (R.103); these are mainly located in southern Angola. The R.20 subgroup, the Wambo or Ovambo group, comprises Kwanyama (R.21), Ndonga (R.22), Kwambi (R.23), Ngandjera (R.24), and dialects like Kafima and Evale, spoken across northern Namibia and southern Angola. The R.30 subgroup, the Herero group, features Herero (R.31) with dialects such as North-West Herero, Mbanderu (R.31b), and Cimba (R.31c), distributed in northern Namibia, southern Angola, and Botswana. The R.40 subgroup contains Yeyi (R.41), a single language spoken in the eastern Caprivi region of Namibia and Ngamiland in Botswana, often noted for transitional features attributed to Khoisan contact.

Linguistically, Zone R languages typically exhibit double independent nominal prefixes, a five-vowel system without quantity alternations in radical vowels, and common use of verbal extensions like -u- for passives. For instance, in Kwanyama (R.21), the verb root -dal- ("give birth to") becomes -dalu- ("be born") to indicate the passive. Extended radicals predominate, with fewer than 20% of vocabulary items featuring simple radicals in the R.10 and R.20 groups. Tonal systems vary, with some languages showing tonal alternations on radicals, and penultimate vowel lengthening appears in certain varieties such as Mbundu.

Zone S

Zone S in Malcolm Guthrie's classification represents the southernmost geographical zone of Bantu languages, primarily encompassing southeastern Africa, including South Africa, Lesotho, Eswatini (formerly Swaziland), Botswana, Zimbabwe, and southern Mozambique. This zone is distinguished as the largest in the Guthrie system, containing approximately 30 distinct languages or major varieties, reflecting a high degree of linguistic diversity shaped by historical migrations and interactions in the region. Unlike many other zones, which are predominantly geographical, Zone S exhibits stronger genetic coherence, often treated as a valid subgroup within the Southern Bantu branch. The zone is subdivided into six main groups: S10 (Shona cluster), S20 (Venda), S30 (Sotho-Tswana), S40 (Nguni), S50 (Tsonga or Tswa-Rhonga), and S60 (Chopi or Copi). The S10 Shona languages, spoken mainly in Zimbabwe and southern Mozambique, include varieties such as Manyika (S13) and Ndau (S15), forming a dialect continuum with around six major lects. S20 consists primarily of Venda (S21), a single language with dialects in northern South Africa and southern Zimbabwe, noted for its unique phonological features like dental clicks in some varieties. S30, the Sotho-Tswana group, is centered in , , and , featuring major languages such as Southern Sotho (S33), or Pedi (S32), and Tswana (S31), which together serve over 10 million speakers and are characterized by systems typical of but with innovations in tonal patterns. The S40 Nguni subgroup, prominent in , , and , includes influential languages like (S42) and (S41), both official languages in with click consonants borrowed from substrates; , for instance, has three main click series and is spoken by about 8 million people. S50 Tsonga languages extend across southern , , and , with key varieties like Tsonga proper (S52) and Ronga (S54), emphasizing agglutinative morphology and . 
Finally, S60, the smallest subgroup, comprises Chopi (S61) and related lects in southern , known for complex musical traditions intertwined with linguistic expression. Zone S languages share common traits such as the use of click sounds in Nguni and some Sotho varieties, derived from contact with non-Bantu peoples, and a general adherence to the Bantu noun class system with 10-20 classes, though with regional simplifications in verb conjugations compared to central zones. These languages play a central role in the cultural and political landscape of , with and serving as lingua francas in their respective areas.

Detailed language lists

1948 classification

In 1948, Malcolm Guthrie published The Classification of the Bantu Languages, an initial systematic inventory of Bantu languages based on geographical distribution and limited comparative data. It identified 191 groups representing approximately 250–300 varieties, organized into 16 zones labeled A through S (excluding I, O, and Q). This early inventory served as a foundational reference, employing alphanumeric codes in which the letter denotes the zone and the digits indicate the group and the specific language.

The classification was explicitly provisional, reflecting the fragmentary and variable data available from colonial-era surveys and missionary reports, with many entries designated as unclassified or tentative, such as A00 for undetermined languages in zone A. Guthrie emphasized that assignments were subject to revision as more linguistic evidence emerged, prioritizing a geographical framework over strict genetic relationships due to insufficient comparative materials. The publication included a fold-out geographical sketch map illustrating the approximate locations of these languages across Central, Eastern, and Southern Africa, keyed to the codes for visual reference.

The languages were grouped within each zone into numbered groups, with codes like A.10 denoting the first group in zone A and A.11 the first language within it. Below are representative zone-by-zone summaries with key examples of codes and primary names as assigned by Guthrie; alternative or dialectal names are noted in parentheses where specified. Full lists are available in the original publication.

Zone A (northwestern Congo Basin and Gabon, ~30 entries):
  • Examples: A.10 Dondo, A.11 Kele, A.12 Benga, A.13 Myene, A.20 Nkomi, A.30 , A.40 Kota (e.g., A.41 Bati), A.50 Maka (e.g., A.51 Nohu), A.60 Duala (e.g., A.61 Yombi), A.70 Mpongwe (e.g., A.71 Myene).
Zone B (western , ~40 entries):
  • Examples: B.10 (e.g., B.11 Vili, B.12 Bembe), B.20 Yombe (e.g., B.21 Ndongo), B.30 Teke (e.g., B.31 Fumu, B.32 Tio), B.40 Boma (e.g., B.41 Mfinu, B.42 Boma), B.80 Yaka (e.g., B.81 Suku).
Zone C (central Congo, ~60 entries):
  • Examples: C.10 Zande (e.g., C.11 Budza), C.20 Ngombe (e.g., C.21 Bangi, C.22 Mbole), C.30 (e.g., C.31 Kele), C.40 Bushong (e.g., C.41 Kuba), C.50 (e.g., C.51 Mbole), C.60 Tetela (e.g., C.71 Tetela), C.70 Luba (e.g., C.75 Luba), C.80 Lele (e.g., C.81 Lele).
Zone D (eastern Congo, ~30 entries):
Subgroup | Codes and Languages
10 | D.11 Mbole, D.12 Lengola, D.13 Mituku, D.14 Genya
20 | D.21 , D.22 Amba, D.23 Kumu, D.24 Songola, D.25 Lega, D.26 Zimba, D.27 Bangubangu, D.28 Horohoro
30 | D.31 Pere, D.32 Bira, D.33 Huma
40 | D.41 , D.42 Ndandi, D.43 Nyanga
50 | D.51 Hunde, D.52 Havu, D.53 Nyabungu, D.54 Bembe, D.55 Buyu, D.56 Kabwari
60 | D.61 Nyarwanda, D.62 Rundi, D.63 Fulero, D.64 , D.65 Vinza, D.66 ? (Hadza noted as non-Bantu), D.67 Shi
Zone E (Great Lakes region, ~70 entries):
Subgroup | Codes and Languages
10 | E.11 Nyoro, E.12 Toro, E.13 Nyankore, E.14 Chiga, E.15 Ganda (E.15a Sese), E.16 Soga, E.17 Gwere, E.18 Nyala
20 | E.21 Haya (e.g., E.22 Haya, E.22a Ziba), E.23 Zinza, E.24 Kerebe, E.25 Jita
30 | E.31 Masaba (e.g., E.31a Gisu), E.32 Logooli, E.33 Kuria, E.34 Saamia
40 | E.41 Gusii, E.42 Kuria, E.43 Logoli (varieties)
50 | E.51 Kikuyu, E.52 Embu, E.53 Meru, E.54 Kamba
60 | E.61 Rwa, E.62 Chaga (e.g., E.62a Machame), E.63 Pare, E.64 Gweno
70 | E.71 Pokomo, E.72 Segeju, E.73 Digo, E.74 Taita (e.g., E.74a Dawida)
Zone F (central Tanzania, ~20 entries):
Subgroup | Codes and Languages
10 | F.10 , F.11 Kagulu, F.12 Rangi
20 | F.21 Sukuma, F.22 Nyamwezi, F.23 Sumbwa
30 | F.31 Zigula, F.32 Zaramo, F.33 , F.34 Kutu
Zone G (southern Tanzania, ~35 entries):
Subgroup | Codes and Languages
10 | G.11 , G.12 Kagulu
20 | G.21 Asu, G.22 Sambaa, G.23 Bondei, G.24 Shambala
30 | G.31 Zigula, G.32 Mushunguli, G.33 Zaramo, G.34 Ngulu, G.35 Ruguru, G.36 , G.37 Kutu, G.38 Vidunda, G.39
40 | G.41 (e.g., G.42a Mrungu, G.42b Vumba), G.43 Pemba
50 | G.51 Pogolo, G.52 Ndamba
60 | G.61 Sagara, G.62 Hehe, G.63 Bena, G.64 Pangwa, G.65 Kinga, G.66 Wanji, G.67 Kisi
Zone H (Angola and , ~25 entries):
Subgroup | Codes and Languages
10 | H.11 Vili, H.12 Kunyi, H.13 Bembe, H.14 Nkundji, H.15 Mboka, H.16 (e.g., H.16a Yombe, H.16b Sundi)
20 | H.21 Ndongo, H.22 Mbamba, H.23 Sanga, H.24 Ngola, H.25 , H.26 Songo
30 | H.31 Yaka, H.32 Suku, H.33 Hungu, H.34 Tembo, H.35 Mbangala, H.36 Inji
40 | H.41 Mbala, H.42 Ndibu
Zone J (northeastern extensions; empty or provisional in 1948, later developed from D/E):
  • No major assignments in 1948; reserved for potential northeastern varieties.
Zone K (southern Congo and Angola, ~20 entries):
Subgroup | Codes and Languages
10 | K.11 Cokwe, K.12 Luvale, K.13 Lucazi, K.14 Luchazi, K.15 Mbunda, K.16 Nyengo, K.17 Mbwela, K.18 Nkangala
20 | K.21 Lozi
30 | K.31 Luyana, K.32 Mbunda, K.33 Mbalanhu, K.34 Mashasha, K.35 Mbala, K.36 Kwangwa
40 | K.41 Totela, K.42 Subia
Zone L (Kasai region, Congo, ~20 entries):
Subgroup | Codes and Languages
10 | L.11 Pende, L.12 Ngalangu, L.13 Kwese
20 | L.21 Kete, L.22 Binji, L.23 Sanga, L.24 Lunda
30 | L.31 Luba-Kasai, L.32 Lulua, L.33 Kanyoka, L.34 Luba-Katanga, L.35 Hemba
40 | L.41 Kaonde
50 | L.51 Salampasu, L.52 Lunda, L.53 Luvale
60 | L.61 Mbwela, L.62 Nkoya
Zone M (eastern , ~30 entries):
  • Examples: M.14 Chewa, M.21 Nsenga, M.31 Tumbuka, M.41 Bemba, M.51 Lomwe, M.62 Namwanga.
Zone N ( and , ~30 entries):
  • Examples: N.11 Nyanja, N.21 Tumbuka, N.31 Sena, N.41 Ngoni, N.42 Nyakyusa.
Zone P (southern , ~15 entries):
  • Examples: P.11 Ndengereko, P.13 Matumbi, P.21 Yao, P.23 Makonde, P.31 Makua, P.32 Lomwe.
Zone R (, ~10 entries):
  • Examples: R.11 Umbundu, R.12 Ndombe, R.13 Nyaneka, R.21 Kwanyama, R.22 Ndonga, R.31 Herero, R.41 Yeyi.
Zone S (, ~20 entries):
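  • Examples: S.10 Shona (e.g., S.13 Manyika, S.15 Ndau); S.20 Venda (S.21 Venda); S.30 Sotho-Tswana (S.31 Tswana, S.32 Northern Sotho, S.33 Southern Sotho); S.40 Nguni (S.41 Xhosa, S.42 Zulu, S.43 Swati, S.44 Ndebele); S.60 Chopi (S.61 Chopi).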
Note: The numbers approximate ~250–300 entries across all zones, including varieties. This structure allowed for easy reference and expansion in subsequent works, though many codes like A00 remained unassigned or provisional.
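The code structure described in this section (zone letter, group digit, language digit, optional lower-case suffix for a variety) can be illustrated with a small parser. This is a minimal sketch rather than an official tool: the names `GuthrieCode` and `parse_code` are hypothetical, and the pattern assumes the two-digit shape seen in examples such as A.11 or E.15a, with or without the dot.

```python
import re
from typing import NamedTuple, Optional

# The 16 zone letters named in the text: A through S, excluding I, O and Q.
ZONES = "ABCDEFGHJKLMNPRS"

# Assumed shape: zone letter, optional dot, two digits, optional lower-case suffix.
CODE_RE = re.compile(rf"^([{ZONES}])\.?(\d)(\d)([a-z]?)$")


class GuthrieCode(NamedTuple):
    zone: str               # e.g. "A"
    group: int              # tens digit times ten, e.g. 10 for A.11
    language: int           # units digit; 0 means the code names the group itself
    variety: Optional[str]  # lower-case suffix such as "a", or None


def parse_code(text: str) -> GuthrieCode:
    """Split a code such as 'A.11', 'S42' or 'E.15a' into its parts."""
    match = CODE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a two-digit Guthrie-style code: {text!r}")
    zone, tens, units, suffix = match.groups()
    return GuthrieCode(zone, int(tens) * 10, int(units), suffix or None)


if __name__ == "__main__":
    for sample in ("A.10", "A.11", "E.15a", "S42"):
        print(sample, parse_code(sample))
```

Codes whose letter is not one of the 16 zones (for example I.11) are rejected, which mirrors the gaps in the lettering scheme.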

1971 updates

In the multi-volume work Comparative Bantu (1967–1971), Malcolm Guthrie significantly expanded and refined his 1948 classification of , increasing the number of recognized varieties from approximately 250–300 to 440 while designating 250 as "Narrow Bantu" for focused comparative reconstruction, excluding more divergent northwestern forms. This update built on the earlier geographical zoning but incorporated new data from field reports and archival sources to better capture linguistic diversity across sub-Saharan Africa. Zone boundaries were adjusted to align more closely with observed linguistic and geographical patterns, including additions to Zone C such as expanded coverage of languages (e.g., under D.60 series, later influencing Zone J). Reassignments occurred for certain languages previously placed in Zone B, shifting them to Zone D based on shared innovations in and , reflecting a residual but cohesive grouping in the northwest. These changes emphasized practical referential over strict genetic subgrouping, prioritizing lexical similarities for zone membership. Subgroup refinements filled gaps in the , providing more precise identifiers for dialects and closely related languages; for instance, E.51 was assigned to specific Kamba dialects within the broader E.50 Kikuyu-Kamba group in eastern . Such updates allowed for finer-grained inventories, with over 200 new entries integrated into the zones while maintaining the alphanumeric structure (e.g., A.10, B.40). A key component of the 1971 updates was the Common Bantu vocabulary list, comprising around 1,600 lexical items (initially 100 core for comparisons) selected for cross-language comparisons to support proto-form reconstructions and establish shared heritage. These items focused on basic nouns and concepts with high retention rates across varieties, avoiding culturally specific terms. 
The list facilitated the identification of cognates and sound correspondences without delving into full etymologies in the classification itself. Representative examples include:
Proto-Bantu Root | Meaning
*mù-ntʊ̀ | person
*ì-n-tʊ̀ | thing
*mù-tí | tree
*ì-kʊ̀ | ear
*lɛ́-tɛ | tongue
These roots exemplify the stable core lexicon used to validate zone assignments and subgroup links.

2009 NUGL appendix

The 2009 appendix to the New Updated Guthrie List (NUGL), compiled by linguist Jouni Filip Maho, represents a significant expansion of Malcolm Guthrie's original 1971 classification system for Bantu languages. Building on the foundational zones (A through S) established by Guthrie, which cataloged approximately 440 languages, Maho's update incorporates over 500 additional languages and varieties, bringing the total inventory to around 950 distinct entries. This expansion notably includes previously unclassified Bantu varieties as well as marginal or peripheral languages that had been overlooked or debated in earlier classifications, such as those in the newly introduced J zone for certain northeastern Bantu expansions. Key changes in the 2009 NUGL involve an extension of the Guthrie coding scheme to accommodate the increased scope. For instance, new subgroup codes like A90c are introduced to denote finer subdivisions within zones, while extra-Bantu or unassigned varieties are grouped under codes such as A100, allowing for a more flexible referential framework. Additionally, the system prefixes marginal languages with a "J" (e.g., J E41 for certain varieties), integrating contributions from projects like the classification without altering the core zonal structure. These modifications ensure compatibility with emerging linguistic data while preserving the geographical orientation of Guthrie's approach. Of the 950 varieties, approximately 631 are treated as distinct languages (without subtype suffixes), emphasizing the list's role as a referential tool rather than a strictly genetic phylogeny. The 2009 NUGL is available primarily as an online resource, hosted through platforms associated with Brill and the NUGL project, with the version dated 4 June 2009 serving as a key milestone (the second major update following Maho's 2003 revision). 
It includes accompanying maps approximating language locations across and alignments with codes via identifiers (e.g., A.14 for certain A-zone languages), facilitating cross-referencing with global linguistic databases. This digital format enhances accessibility for researchers, enabling updates and annotations while maintaining the list's encyclopedic utility for studies.
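The distinction drawn above between distinct languages (entries without subtype suffixes) and the full set of varieties can be sketched as a simple tally. The sketch assumes that a trailing lower-case letter marks a variety and that NUGL additions may use three digits (as in N101 or A100); prefixed codes of the "J E41" kind are not handled. The function names and the sample list are illustrative only and are not taken from the NUGL itself.

```python
import re

# Assumed NUGL-style shape: zone letter, two or three digits, optional suffix letter.
NUGL_RE = re.compile(r"^([A-HJ-NPRS])(\d{2,3})([a-z]?)$")


def split_code(code):
    """Return (zone, digits, suffix) for a code such as 'R31b' or 'N101'."""
    match = NUGL_RE.match(code.replace(".", ""))
    if match is None:
        raise ValueError(f"unrecognised code: {code!r}")
    return match.groups()


def is_language_level(code):
    """True when the code carries no subtype suffix."""
    return split_code(code)[2] == ""


def tally(codes):
    """Count (language-level entries, all entries) in a list of codes."""
    codes = list(codes)
    languages = sum(is_language_level(c) for c in codes)
    return languages, len(codes)


if __name__ == "__main__":
    # Hypothetical sample drawn from codes mentioned in this article.
    sample = ["R31", "R31b", "R31c", "R41", "N101", "N102", "A90c"]
    print(tally(sample))
```

Applied to the full list, a tally of this kind would be one way to arrive at separate language and variety totals, although the NUGL's own counting rules may differ in detail.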

Updates and revisions

New Updated Guthrie List (NUGL)

The New Updated Guthrie List (NUGL) is a significant revision of Malcolm Guthrie's zonal classification of Bantu languages, initiated in the early 2000s by linguist Jouni Filip Maho. Building directly on Guthrie's Comparative Bantu: An Introduction to the Comparative Linguistics and Prehistory of the Bantu Languages (1967–1971), the project aimed to incorporate later advances in the field. Maho's first update appeared in 2003, followed by a comprehensive second edition in 2009, which integrated data from recent fieldwork across Central, Eastern, and Southern Africa to refine language inventories and subgroupings.

The NUGL is intended as a referential tool rather than a genetic classification, prioritizing practical utility for researchers. It expands Guthrie's original catalog, which listed around 440 varieties, by incorporating over 500 additional entries for previously underdocumented varieties, dialects, and lects, while addressing ambiguities in nomenclature such as variant spellings or disputed affiliations. This approach resolves longstanding uncertainties without proposing new evolutionary hierarchies, making it a stable framework for cross-referencing in Bantu studies. The 2009 version inventories 631 distinct languages and 950 varieties in total.

Structurally, the NUGL preserves Guthrie's geographic zoning system, dividing the Bantu languages into 16 zones labeled A through S, each subdivided by numerical and alphanumeric codes (e.g., A10 for a northwestern Bantu group). These codes facilitate quick location and comparison, with the 2009 online version providing an accessible, searchable format that includes alternative names, locations, and brief notes on status. Updates in the second edition refined existing codes and added provisional identifiers for emerging data, ensuring compatibility with prior scholarship while accommodating new discoveries.

The NUGL has established itself as the de facto standard for Bantu language classification in modern linguistics, widely adopted in academic publications, language atlases, and databases for its balance of tradition and empirical updates. Its integration into Glottolog, a comprehensive catalog of the world's languages, further amplifies its influence: there NUGL codes serve as the primary identifiers for Bantu entries, enabling geospatial mapping and bibliographic linkage across over 500 Bantu lects. This enduring impact underscores its value in ongoing fieldwork and theoretical analyses of Bantu diversity.

Modern phylogenetic approaches

Modern phylogenetic approaches to Bantu language classification have shifted from primarily geographical zoning to genetic (cladistic) methods that infer historical relationships from shared innovations and lexical similarities. These methods treat languages as evolving entities, analogous to biological species, using probabilistic models to reconstruct family trees that reflect divergence times and patterns. This contrasts with Guthrie's framework by prioritizing linguistic evidence over geography, though it often complements zonal groupings by revealing subgroups within them.

A key advancement involves Bayesian phylogenetics, which integrates cognate data (words with a common ancestor across languages) into models that estimate phylogenetic trees while accounting for uncertainties in data and evolutionary rates. Databases such as the Bantu Lexicon Network (BLN) provide standardized lexical datasets for over 400 languages, focusing on basic vocabulary like body parts and numerals to minimize borrowing effects. Software like BEAST (Bayesian Evolutionary Analysis by Sampling Trees) implements these models, applying Markov chain Monte Carlo (MCMC) sampling to generate dated trees from aligned cognate sets, often calibrated with archaeological or linguistic data.

Seminal studies using these techniques have produced influential phylogenies. For instance, Grollemund et al. (2015) constructed a dated phylogeny for approximately 400 Bantu languages, revealing two major expansion waves: an initial eastern route around 4,000 years before present (BP) avoiding dense rainforests, followed by a western branch around 2,500 BP, influenced by habitat barriers. This phylogeny dates Proto-Bantu to roughly 4,000–5,000 BP, aligning with archaeological evidence of farming dispersals from West-Central Africa. Similarly, Whiteley et al. (2019) applied sequence-optimization algorithms to basic vocabulary data from 95 Bantu languages (plus 10 Bantoid outgroups), yielding a revised tree that reorders traditional subgroups, such as placing Forest Bantu languages (e.g., those in zones A and B) in basal positions, suggesting they represent early divergences rather than peripheral offshoots.

These approaches highlight deeper time depths and non-linear expansions not evident in Guthrie's zones, with Forest Bantu emerging as a foundational group in multiple analyses, challenging prior assumptions of a uniform southward spread. By quantifying divergence rates (e.g., via relaxed molecular clocks adapted for linguistic data), they offer testable hypotheses for Bantu prehistory, such as habitat-driven pacing that slowed migrations through rainforests by up to 50% compared to savanna routes. Ongoing refinements incorporate larger datasets and hybrid models to address incomplete sampling, bridging lexical phylogenies with the New Updated Guthrie List's zonal updates. Recent studies, such as Barbieri et al. (2022), have used phylogeographic methods on lexical data from over 400 Bantu languages to confirm expansion through Central African rainforests around 3,000–4,000 BP, while Fan et al. (2023) integrated genomic data with linguistic phylogenies to trace genetic diversity gradients aligning with dated language trees.
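As a rough illustration of how coded cognate data feeds into such analyses, the sketch below computes a pairwise distance from the share of meanings for which two languages have cognate words. This is a deliberately simpler, distance-based stand-in and not Bayesian tree inference as implemented in BEAST; the language labels and cognate-class numbers are invented for the example.

```python
from itertools import combinations

# Hypothetical cognate-class labels: one label per meaning, in the same order
# for every language. Equal labels in the same position mean the two words
# are judged to be cognate.
COGNATE_CLASSES = {
    "lang_A": [1, 1, 1, 2, 1],
    "lang_B": [1, 1, 2, 2, 1],
    "lang_C": [1, 2, 3, 1, 2],
}


def shared_fraction(a, b):
    """Fraction of meanings for which two languages share a cognate class."""
    if len(a) != len(b):
        raise ValueError("both languages must cover the same meanings")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def distance_matrix(data):
    """Map each unordered language pair to 1 minus its shared-cognate fraction."""
    return {
        (p, q): 1.0 - shared_fraction(data[p], data[q])
        for p, q in combinations(sorted(data), 2)
    }


if __name__ == "__main__":
    for pair, dist in distance_matrix(COGNATE_CLASSES).items():
        print(pair, round(dist, 2))
```

In this toy data, lang_A and lang_B share four of five cognate classes and are therefore closer to each other than either is to lang_C. Bayesian methods start from the same kind of coding but model gain and loss of cognates along a tree instead of collapsing the data to a single distance.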

Criticisms and limitations

Geographical vs. genetic classification

Guthrie's classification system organizes the into 16 zones labeled A through S, primarily according to their geographical distribution across Central, Eastern, and , combined with observed lexical similarities rather than evidence of shared innovations that would indicate genetic relatedness. This zonal framework was designed as a practical referential tool, grouping languages based on areal contiguity and comparable vocabulary sets derived from comparative wordlists, but it explicitly prioritizes location over phylogenetic descent. For instance, languages within a zone were selected if they exhibited common linguistic features alongside spatial proximity, allowing for a convenient indexing system without implying strict genealogical hierarchies. A key limitation of this approach is its tendency to "lump" genetically distinct branches into the same due to their current locations, often masking true historical relationships and ignoring ancient migrations that reshaped linguistic distributions. For example, the system does not reflect monophyletic clades for broader divisions like "West " or "East ," leading to groupings that reflect contact and more than common ancestry in some areas. This geographical bias results in zones that capture areal but fail to delineate deeper genetic clades, complicating efforts to trace proto-Bantu divergences. Phylogenetic analyses using computational methods on lexical, phonological, and grammatical data have provided evidence contradicting the zonal structure, revealing non-geographical clades that cross Guthrie's boundaries. Notably, studies place some northwestern , such as those in Zone A (e.g., Ambele), in basal paraphyletic grades rather than monophyletic units with geographically adjacent groups, underscoring how migrations disrupted zonal alignments. These findings demonstrate that while can indicate broad relatedness, it alone does not resolve phylogeny without accounting for historical sound changes and innovations. 
Despite these shortcomings, zonation endures as a standard reference for cataloging , facilitating fieldwork and comparative studies, though it is widely recognized as inadequate for historical reconstruction due to its emphasis on geography over . Modern scholars advocate supplementing or replacing it with phylogenetically informed classifications to better reflect the family's evolutionary .
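Whether a zone corresponds to a genetic unit can be phrased as a monophyly test: a zone is monophyletic in a rooted tree if some subtree contains exactly that zone's languages and no others. The sketch below assumes a tree written as nested tuples with code-like strings as leaves; the tree shape is invented for illustration and is not a published phylogeny.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def is_monophyletic(tree, group):
    """True if some subtree's leaf set equals the group exactly."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


def zone_members(tree, zone):
    """All leaves whose label starts with the given zone letter."""
    return {leaf for leaf in leaves(tree) if leaf.startswith(zone)}


# Hypothetical rooted tree: the two zone A leaves branch off one after the
# other (a paraphyletic grade), while zones B and S each form a clade.
TOY_TREE = ("A10", (("A20", ("B10", "B20")), ("S10", "S20")))

if __name__ == "__main__":
    for zone in "ABS":
        print(zone, is_monophyletic(TOY_TREE, zone_members(TOY_TREE, zone)))
```

In the toy tree, zone A fails the test because its two members never share a subtree that excludes the other zones, which is the pattern the text describes for some northwestern languages.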

Issues with language assignment

One significant challenge in applying classification arises from , where gradual linguistic variations across regions blur the boundaries between distinct languages or subgroups, complicating precise assignment to zones and codes. For instance, the varieties form a spanning coastal , leading to the assignment of provisional codes such as G401, G402, and G403 in extensions of to accommodate closely related but geographically dispersed forms that do not fit neatly into standard series. Similarly, in southern , clusters like Shona (S10–S20) exhibit internal dialectal diversity that has prompted debates over whether certain varieties, such as those bordering zone J languages, should retain S-series codes or be re-evaluated based on shared innovations, highlighting the tension between geographical proximity and linguistic divergence. Extinct languages pose additional ambiguities, as limited documentation often prevents reliable placement within Guthrie's zones; many such varieties, particularly from early contact zones in , remain unassigned or tentatively linked to broader groups without confirmed codes. This issue is exacerbated for poorly attested extinct forms in northwest (zone A), where sparse lexical data hinders comparison with surviving relatives. Post-1948 revisions introduced re-coding confusions, with languages frequently shifted between zones or series in subsequent classifications, resulting in duplicates or inconsistencies across linguistic literature. A notable example is Ngangela, coded as K12b by Guthrie (1971) but reassigned to K19 in the Tervuren classification and K20 in SIL International's listings, reflecting methodological differences like versus phonological criteria and leading to persistent errors in cross-referencing older sources. 
Marginal cases further complicate assignments, particularly at the fringes of the Bantu domain where "Semi-Bantu" or exhibit partial Bantu-like features but fall outside strict criteria, such as systems or vocabulary overlap; Guthrie explicitly excluded such transitional forms, like certain West African varieties (e.g., those once termed Sub-Bantu in ), to maintain referential purity. In zone A, bordering Ubangi languages (Adamawa-Ubangi branch), contact-induced features like labial-velar stops in Bantu varieties such as Lingombe (C41) create ambiguous statuses, as Ubangi substrates influence Bantu without warranting reclassification. Fulani (an Atlantic ) was never considered for inclusion, underscoring Guthrie's focus on core Bantu traits despite geographical proximity in some areas. The New Updated Guthrie List (NUGL) addresses many of these issues by standardizing codes—retaining Guthrie's originals for established languages while assigning new alphanumeric extensions (e.g., Axxx for unplaced northwest varieties)—to over 500 entries, including extinct and marginal forms, thereby reducing re-coding variability. However, legacy errors from pre-NUGL literature and databases persist, as older publications continue to reference outdated zone shifts, perpetuating confusion in studies.
