
Hunspell

Hunspell is a free and open-source spell-checking library and command-line tool designed primarily for languages with rich morphology, complex word compounding, and diverse character encodings.^[1] It originated as an enhanced successor to MySpell, the spell checker used in early versions of OpenOffice.org, and was developed to address limitations in handling agglutinative languages like Hungarian, Finnish, and Turkish.^[1]
Key improvements include support for Unicode, advanced morphological analysis, stemming, and word generation, enabling more accurate spell-checking and suggestions through n-gram similarity matching and pronunciation-based corrections.^[1] Hunspell is licensed under a tri-license of GNU General Public License (GPL), GNU Lesser General Public License (LGPL), and Mozilla Public License (MPL), allowing flexible integration into both open-source and proprietary applications.^[2]
It maintains backward compatibility with MySpell dictionaries while supporting affix files for rule-based morphology, making it adaptable for over 100 languages through community-contributed dictionaries.^[1] Widely adopted in major software, Hunspell powers spell-checking in LibreOffice, Apache OpenOffice, Mozilla Firefox, Thunderbird, Google Chrome, macOS (since version 10.6), Adobe InDesign, Opera, and translation tools like SDL Trados and memoQ.^[1]
The library is implemented in C++ and offers bindings for numerous programming languages and interfaces, including Java, Python, Perl, Ruby, .NET, Android, and UNO, facilitating its use in diverse environments from desktop applications to embedded systems.^[1] Development of Hunspell has been led by László Németh since its inception around 2005, with sponsorship from organizations such as the FSF.hu Foundation, IMEDIA, Budapest University of Technology and Economics, OpenTaal Foundation, and the Dutch Language Union.^[1]
Ongoing maintenance occurs through the GitHub repository, where contributions focus on improving performance, adding dictionary support, and integrating with translation platforms like Weblate for collaborative localization.^[2]

Introduction

Overview

Hunspell is a free, open-source spell checker and morphological analyzer library, accompanied by a command-line tool, designed primarily to handle languages featuring rich morphology, complex compounding, and challenging character encodings.^[2]^[3] It excels in processing agglutinative languages such as Hungarian and Finnish, as well as compound-heavy languages like German, where traditional spell checkers often struggle with inflectional variations and word formation rules.^[1] The library provides robust support for morphological analysis, enabling stemming, generation, and detailed word breakdown, while its spell-checking capabilities include intelligent error detection and correction suggestions tailored to linguistic complexities.^[4] Released under a tri-license of LGPL, GPL, and MPL, Hunspell ensures broad compatibility and adoption in open-source ecosystems.^[2] Its current stable version, 1.7.2, was issued on December 29, 2022, with ongoing maintenance and minor updates continuing into 2025 to address compatibility and performance needs across distributions.^[5]^[6] Evolving from earlier tools like MySpell, it maintains backward compatibility with existing dictionaries while introducing enhancements for modern requirements.^[1] A key strength of Hunspell lies in its Unicode support, accommodating the first 65,535 Unicode characters for affix rules and enabling handling of diverse scripts and encodings beyond basic 8-bit limitations.^[1]^[7] This feature, combined with its morphological tools, positions it as a versatile solution for multilingual environments, powering spell checking in applications such as LibreOffice and Mozilla Firefox.^[3]

Design Principles

Hunspell was designed to address the limitations of earlier spell checkers in handling languages with rich morphology and complex compounding, such as agglutinative languages like Hungarian or those with intricate affixation like German.^[2]^[8] Its core innovation lies in advanced affix handling, including support for homonyms, circumfixes, fogemorphemes, and zero morphemes, which enable morphological generation and analysis far beyond basic dictionary lookups.^[2] Additionally, it incorporates twofold affix stripping to efficiently manage multiple layers of suffixes and prefixes, reducing the number of rules needed for complex word formations.^[8] Compounding rules allow recognition of arbitrarily long compounds and affixation within them, ensuring accurate spell checking for word-level writing systems.^[2] A key design principle is backward compatibility with Ispell and MySpell formats, facilitating seamless migration of existing dictionaries and minimizing adoption barriers for users transitioning from those systems.^[2]^[8] This compatibility is enhanced by innovations like alias compression for affix rules, which optimize storage without sacrificing functionality.^[2] Hunspell emphasizes high customizability through configurable suggestion algorithms, word-part replacement tables, and over 65,000 affix classes, allowing tailored implementations for diverse linguistic needs.^[8] Efficiency in suggestion generation and overall performance is prioritized via optimizations for large vocabularies and quick processing, making it suitable for real-time applications.^[2] As a library, it provides C++ and C APIs, shared library support, and bindings for multiple languages, promoting integration into varied software environments from desktop applications to web browsers.^[2] To broaden adoption, Hunspell employs a tri-license model under the Mozilla Public License (MPL), GNU Lesser General Public License (LGPL), and GNU General Public License (GPL), accommodating both open-source and proprietary uses.^[2]

History and Development

Origins from MySpell

Hunspell's development began around 2005 under the leadership of László Németh, a Hungarian developer, as a reimplementation and extension of MySpell to address shortcomings in spell checking for morphologically rich languages.^[1] MySpell itself was a C++ port of the Ispell spell checker, originally created for integration into OpenOffice.org to provide efficient affix compression and dictionary handling.^[2] A key motivation for Hunspell stemmed from MySpell's limitations, particularly its inadequate support for complex morphological rules and word compounding, which proved insufficient for languages like Hungarian that rely heavily on affixation and compound formation.^[1] To overcome these issues, Hunspell incorporated enhanced affix-based mechanisms, enabling more accurate analysis and generation of word forms while maintaining backward compatibility with MySpell dictionaries.^[2] The library saw its initial integration into OpenOffice.org with version 2.0.2, released in February 2006, where it fully replaced MySpell as the default spell checker.^[9] This adoption marked a significant step in improving multilingual support within the suite. Early efforts in Hunspell's creation emphasized support for the Hungarian language, driven by Németh's background and sponsorship from Hungarian organizations such as the FSF.hu Foundation and Budapest Technical University's Media Research Centre.^[1]

Key Milestones

Hunspell's development gained significant momentum following its initial transition from MySpell, with key adoptions marking its early integration into major open-source projects. In 2006, Hunspell was officially adopted as the default spell checker in OpenOffice.org 2.0.2, replacing MySpell and enabling enhanced support for complex morphologies in office productivity applications.^[10] By 2008, Hunspell saw broader adoption in web technologies through its integration into Mozilla Firefox 3 and Thunderbird, providing inline spell checking for email and browsing with improved handling of agglutinative languages.^[1] During the 2010s, the project advanced through the 1.2 to 1.3 version series, which introduced enhanced Unicode 6.0 support for broader character encoding compatibility and refined compounding rules to better manage word formation in languages like German and Finnish.^[11]^[1] A notable enhancement came in 2016 with the release of version 1.6.0, which optimized suggestion algorithms for faster performance, reducing generation times through improved n-gram matching and limiting overgeneration in compound words.^[12] In 2022, version 1.7.2 was released, incorporating the SPELLML XML API to enable runtime dictionary extensions and custom affix rules without recompilation, facilitating easier integration in dynamic environments. From 2023 to 2025, Hunspell underwent ongoing maintenance with bug fixes and compatibility updates, including a port to R via the hunspell package version 3.0.7, which extended its utility for statistical computing and text analysis in R environments; the project's GitHub repository maintained active development, accumulating over 265 open issues as of late 2025.^[13]^[14] Throughout its evolution, Hunspell solidified its role as a default spell checker in Google Chrome and LibreOffice, powering spell checking for billions of users across browsers and office suites.^[1]

Features

Spell Checking Capabilities

Hunspell's core spell checking functionality relies on dictionary lookup, where words are verified against a base dictionary file (.dic) containing valid word forms, augmented by affix rules (.aff) that enable stripping and reapplication of prefixes and suffixes to generate inflected or derived forms.^[2] This twofold affix processing allows efficient handling of morphological variations without enumerating every possible word in the dictionary, making it suitable for languages with rich inflection like Hungarian or Turkish.^[8] The library supports Unicode encoding via UTF-8, enabling spell checking of multilingual text and characters beyond basic ASCII, while also accommodating legacy 8-bit encodings such as ISO-8859-1 through configurable SET directives in affix files.^[8] For complex compounding, Hunspell employs recursive breaking and rule-based validation, supporting arbitrary-length compounds in languages like Dutch, Swedish, German, and Finnish via flags such as COMPOUNDFLAG and COMPOUNDRULE to define allowable combinations and prevent overgeneration.^[2] Additional options include ignore lists to exclude specific characters or patterns from checking, such as diacritics in Arabic script via the IGNORE directive, and support for personal dictionaries that allow users to add custom words with optional affixation.^[8] Case sensitivity is configurable, with features like KEEPCASE to restrict uppercase forms and CHECKCOMPOUNDCASE to enforce proper casing at compound word boundaries, accommodating language-specific rules such as German ß or Turkish dotted i.^[2] Hunspell integrates hyphenation capabilities through compatibility with the Hyphen library's pattern-based rules, using BREAK and COMPOUNDRULE options to identify hyphenation points and handle hyphenated compounds during spell checking.^[15] This extends basic error detection to include hyphenation-aware validation, though advanced morphological parsing for stem identification is handled separately.^[2]

Morphological Analysis

Hunspell's morphological analyzer decomposes input words into their base stems and associated affixes by applying rules defined in dictionary and affix files, enabling the processing of both inflectional morphology—such as tense, number, or case endings—and derivational morphology, including prefixes and suffixes that alter word class or meaning. This rule-based approach allows for precise linguistic breakdown, as seen in the analysis of "drinkable," which yields the stem "drink" with the derivational suffix flag "ds:able" and part-of-speech tag "po:verb."^[8] In generation mode, Hunspell constructs inflected or derived word forms from a given stem by applying specified affix rules, facilitating applications such as grammar checking where correct forms must be verified against expected paradigms. For instance, starting from the stem "foot," it can generate the plural "feet" using an inflectional rule flagged with "is:plural," ensuring compatibility with syntactic requirements in downstream processing.^[8] The system supports over 65,000 affix classes per dictionary, organized via flags that permit complex combinations of prefixes and suffixes, which is essential for handling the rich, agglutinative morphologies of languages like Turkish and Estonian. This capacity enables twofold affix stripping—applying multiple layers of suffixes in sequence—to parse highly compounded or inflected words without performance degradation.^[8] Output from the analyzer includes part-of-speech tags (e.g., "po:noun"), lemma extraction via the stem field (e.g., "st:foot" for irregular forms like "feet"), and full paradigm generation that enumerates all possible inflections for a given lemma. 
These formats are delivered as space- or tab-separated fields, supporting integration into natural language processing pipelines.^[8] Unlike simple stemming algorithms, which provide only approximate root forms through heuristic suffix removal or statistical methods, Hunspell's morphological analysis delivers a complete, rule-driven decomposition with explicit affix and feature annotations, preserving linguistic accuracy for morphologically complex languages. This depth enhances its utility beyond basic spell checking by enabling detailed error diagnosis in inflected forms.^[8]
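
The field-based output described above is easy to consume programmatically. The following is a minimal sketch, assuming each analysis arrives as a whitespace-separated list of `tag:value` fields such as the `st:`, `po:`, and `ds:` fields quoted in the text. The function name `parse_analysis` is hypothetical and is not part of Hunspell's API.

```python
from collections import defaultdict


def parse_analysis(line):
    """Split one analysis line into a dict mapping field tags to value lists."""
    fields = defaultdict(list)
    for token in line.split():
        # Each field looks like "st:drink"; tokens without a colon are skipped.
        tag, sep, value = token.partition(":")
        if sep:
            fields[tag].append(value)
    return dict(fields)


analysis = parse_analysis("st:drink po:verb ds:able")
print(analysis["st"], analysis["po"], analysis["ds"])
```

Values are kept in lists because a tag may plausibly occur more than once in a single analysis, for example when several derivational suffixes are stacked.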

Suggestion Algorithms

Hunspell employs a multi-stage approach to generate spelling suggestions for misspelled words, prioritizing efficiency and accuracy through targeted error correction strategies. The process begins with near-miss techniques that simulate common typing errors: swaps of adjacent letters, deletions, insertions, and replacements, with keyboard proximity taken from the affix file's KEY option.^[16] These edits are generated systematically and checked against the dictionary to identify valid words, with additional support for character movements and double swaps to capture more complex mistakes.^[17] Rule-based replacements further enhance this stage via REP tables in the affix file, which map frequent misspellings to corrections (e.g., "teh" to "the"), allowing customization for language-specific or user-defined errors.^[16] If near-miss edits yield insufficient results, Hunspell advances to n-gram-based similarity matching, where it scores the overlap between the misspelled word and dictionary entries. Two affix-file options tune this stage: MAXNGRAMSUGS sets the maximum number of n-gram suggestions, and MAXDIFF sets the required similarity on a scale from 0 to 10 (default 5), with lower values producing fewer n-gram suggestions.^[16] Phonetic encoding provides an additional layer for handling pronunciation-based errors, utilizing a table-driven transcription algorithm borrowed from Aspell via the PHONE directive in the affix file; this maps character sequences to phonetic equivalents, enabling suggestions for words misspelled the way they sound.^[16] For languages like English or those with phonetic dictionaries, this can approximate algorithms such as Double Metaphone, though Hunspell's implementation focuses on customizable PHONE tables for broader applicability.^[18] Suggestions are ranked primarily by the order of generation stages: REP replacements receive highest priority, followed by exact edit matches, n-gram similarities (weighted by edit distance and overlap length), and phonetic matches, while dictionary frequency is incorporated implicitly through stem selection and morphological fit for affixed forms.^[17] Compound word support allows word-part suggestions, breaking potential compounds and applying edits to segments, with limits like MAXCPDSUGS to prevent excessive computation.^[16] The LANG option activates language-specific behavior, such as special casing for Turkish and Azeri or Hungarian compounding conventions, so that suggestions respect the rules of the target language.^[16] Performance optimizations enable real-time use in applications like text editors, including caps on suggestion counts and early termination if sufficient high-quality candidates are found, reducing computational overhead in large dictionaries.^[17] This integration with morphological analysis allows stem-level suggestions, where corrections align with valid affixations for inflected languages.^[16]
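
The near-miss stage can be illustrated with a short sketch. It is not Hunspell's implementation: it assumes a plain set of words as the dictionary, a lowercase ASCII alphabet, and a REP table given as (wrong, right) pairs, and it omits keyboard proximity, n-gram scoring, and phonetic matching. The names `near_miss_candidates` and `suggest` are hypothetical.

```python
import string


def near_miss_candidates(word, rep_table=(), alphabet=string.ascii_lowercase):
    """Return candidate corrections: REP-based ones first, then single edits."""
    candidates = []
    # REP-style replacements: swap a known bad sequence for its correction.
    for bad, good in rep_table:
        start = word.find(bad)
        while start != -1:
            candidates.append(word[:start] + good + word[start + len(bad):])
            start = word.find(bad, start + 1)
    # Single edits at every position: swap, delete, replace, insert.
    for i in range(len(word) + 1):
        head, tail = word[:i], word[i:]
        if len(tail) > 1:
            candidates.append(head + tail[1] + tail[0] + tail[2:])
        if tail:
            candidates.append(head + tail[1:])
            candidates.extend(head + c + tail[1:] for c in alphabet if c != tail[0])
        candidates.extend(head + c + tail for c in alphabet)
    return candidates


def suggest(word, dictionary, rep_table=()):
    """Keep candidates found in the dictionary, in order, without duplicates."""
    seen, result = set(), []
    for candidate in near_miss_candidates(word, rep_table):
        if candidate in dictionary and candidate != word and candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


words = {"the", "then", "ten", "example"}
print(suggest("teh", words, rep_table=[("teh", "the")]))  # ['the', 'ten']
```

Generating candidates and filtering them against the dictionary keeps the cost proportional to the word length and alphabet size rather than to the dictionary size, which is one reason edit-based stages are typically tried before similarity scans over the whole word list.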

Technical Implementation

Dictionary Format

Hunspell employs a dual-file format for its dictionaries, consisting of a main dictionary file with the extension .dic and an accompanying affix file with the extension .aff. The .dic file serves as the primary repository of words, while the .aff file provides the rules and configurations necessary for processing those words, enabling morphological analysis and spell checking.^[19] The .dic file is structured as a plaintext list of words, one per line, beginning with an approximate word count on the first line to optimize hash memory allocation for efficient lookup. Each entry typically consists of a base word followed by optional numeric or character flags separated by a slash, which indicate the applicability of specific affix rules defined in the .aff file; for example, work/AB denotes that the word "work" can be modified by affixes associated with flags A and B. Slashes within words themselves are escaped using a backslash (e.g., word\/). Practical dictionaries commonly hold tens or hundreds of thousands of entries, which is why the count on the first line matters for sizing the hash table.^[19] Encoding for multilingual support is declared in the .aff file using the SET directive, such as SET UTF-8 for Unicode compatibility or SET ISO8859-1 for legacy 8-bit encodings, ensuring proper handling of characters across various languages including those with diacritics or non-Latin scripts. This declaration applies to both the .aff and associated .dic files, facilitating international dictionary development.^[19] Compound word permissions are managed through flags in the .dic file, which reference rules in the .aff file, such as the COMPOUNDRULE option that allows pattern matching for valid combinations (e.g., permitting "blackbird" based on predefined regex-like patterns for compounding). 
These flags enable flexible construction of compound forms without enumerating every possibility in the dictionary.^[19] Extension mechanisms include personal word lists, which can be appended directly to a .dic file as additional plaintext entries with optional flags, allowing users to customize dictionaries for specific needs like adding domain-specific terms (e.g., specialterm/C). Such additions override or supplement the base dictionary during runtime, supporting user-specific adaptations without altering core files.^[19]
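
The .dic layout described above (a count on the first line, then one `word/flags` entry per line, with `\/` escaping a literal slash) can be read with a few lines of code. This is a minimal sketch under simplifying assumptions: single-character flags, no morphological fields after the entry, and text already decoded to a Python string. The function name `parse_dic` is hypothetical.

```python
import re


def parse_dic(text):
    """Parse .dic content into (declared_count, {word: flags})."""
    lines = text.splitlines()
    # The first line holds the approximate number of entries.
    declared_count = int(lines[0].split()[0])
    entries = {}
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue
        # Split on the first slash that is not escaped with a backslash.
        parts = re.split(r"(?<!\\)/", line, maxsplit=1)
        word = parts[0].replace("\\/", "/")
        flags = parts[1] if len(parts) > 1 else ""
        entries[word] = flags
    return declared_count, entries


count, entries = parse_dic("3\nwork/AB\nhello\nkm\\/h\n")
print(count, entries)
```

Because the header is only an approximate count, a reader should treat it as a sizing hint rather than validate it strictly against the number of entries.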

Affix Rules

Hunspell employs affix rules to manage morphological inflections and derivations, enabling the spell checker to recognize and generate word forms from base stems through prefixes and suffixes. These rules are specified in the affix file (typically with a .aff extension) and support complex language morphologies, such as those in agglutinative languages like Hungarian or Finnish. The core affix types are prefixes (PFX) and suffixes (SFX), each defined with conditions that determine applicability to stems, allowing for efficient handling of derivations without exhaustively listing all variants in the dictionary.^[20] The syntax for prefix rules begins with a header line: PFX <flag> <cross_product> <number>, where <flag> identifies the affix class (e.g., a single character like 'A'), <cross_product> is 'Y' to permit combination with opposite affixes or 'N' to restrict it, and <number> indicates the count of following rules. Each subsequent rule line follows: PFX <flag> <stripping> <affix> [<condition> [<morphological_fields>]]. Here, <stripping> specifies characters removed from the stem's beginning (0 for none), <affix> is the prefix added (0 for none), <condition> is a regex-like pattern (e.g., . for any character or [^y] for not ending in 'y'), and optional <morphological_fields> provide additional data like part-of-speech tags. For example, a rule PFX A Y 1 followed by PFX A 0 re . adds the prefix "re-" to any stem, enabling forms like "rework" from "work". 
Suffix rules mirror this structure but apply to the end of the stem: SFX <flag> <stripping> <affix> [<condition> [<morphological_fields>]]. For example, the header SFX B Y 2 followed by the lines SFX B 0 ed [^y] and SFX B y ied y generates "worked" from "work" and "tried" from "try".^[20]^[21] Flags serve as identifiers for affix classes and support several formats: single 8-bit characters by default, UTF-8 characters (FLAG UTF-8), two-character "long" flags (FLAG long), or decimal numbers up to 65,000 (FLAG num), which is the source of the figure of roughly 65,000 distinct classes. The cross-product field (Y/N) controls whether a prefix and a suffix may be combined on the same stem, while continuation flags appended to the affix field (for example, ed/Y) allow a further affix class to attach to the already affixed form. For compound words, specific flags and options enhance validation: COMPOUNDFLAG marks allowable compound components, COMPOUNDBEGIN, COMPOUNDMIDDLE, and COMPOUNDEND restrict positions in sequences, COMPOUNDPERMITFLAG allows affixes inside compounds, and COMPOUNDFORBIDFLAG prohibits them. Additionally, COMPOUNDMIN sets the minimum length for compound parts (default 3), and CHECKCOMPOUNDREP rejects a compound when it could instead be a non-compound word containing an error listed in the REP table.^[8]^[20] Advanced options include COMPLEXPREFIXES, which switches to twofold prefix stripping (with single suffix stripping) for languages with right-to-left writing systems, and CIRCUMFIX, a flag marking affixes that are valid only when the matching prefix and suffix occur together on the same word. Compound validation extends via COMPOUNDRULE for pattern-based checks using regex-like expressions with flags, and options like CHECKCOMPOUNDCASE to forbid uppercase characters at word boundaries inside compounds or CHECKCOMPOUNDDUP to forbid repetition of a word within a compound. Language-specific adaptations, such as COMPOUNDSYLLABLE for syllable-based limits in Hungarian, integrate with these rules. 
Limitations include affix stripping that is at most twofold, affix rules that cannot strip an entire word unless FULLSTRIP is set, and a ceiling of about 65,000 flag values; the documentation also notes that the UTF-8 flag type does not work on ARM platforms. These features collectively provide morphological flexibility while maintaining computational efficiency.^[8]^[21]^[20]
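
To make the rule syntax concrete, the SFX B example above can be applied to stems with a small interpreter. This is a simplified sketch, not Hunspell's parser: it handles only SFX lines, treats the first line seen for each flag as the header, passes conditions straight to Python's regular expression engine, and ignores continuation flags and morphological fields. The names `parse_sfx` and `apply_suffixes` are hypothetical.

```python
import re

AFF_TEXT = """\
SFX B Y 2
SFX B 0 ed [^y]
SFX B y ied y
"""


def parse_sfx(aff_text):
    """Collect suffix rules per flag as (strip, add, condition) tuples."""
    rules = {}
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0] != "SFX":
            continue
        flag = parts[1]
        if flag not in rules:
            # The first line for a flag is the header (cross product, count).
            rules[flag] = []
            continue
        strip = "" if parts[2] == "0" else parts[2]
        add = "" if parts[3] == "0" else parts[3]
        condition = parts[4] if len(parts) > 4 else "."
        rules[flag].append((strip, add, condition))
    return rules


def apply_suffixes(stem, flag, rules):
    """Generate every suffixed form the rules of one flag allow for a stem."""
    forms = []
    for strip, add, condition in rules.get(flag, []):
        # A suffix condition is matched against the end of the stem.
        if re.search(condition + "$", stem) and stem.endswith(strip):
            forms.append(stem[:len(stem) - len(strip)] + add)
    return forms


rules = parse_sfx(AFF_TEXT)
print(apply_suffixes("work", "B", rules))  # ['worked']
print(apply_suffixes("try", "B", rules))   # ['tried']
```

Spell checking runs the same rules in reverse: the affix is removed, the stripped characters are restored, and the resulting stem is looked up together with its flag.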

Algorithm Overview

Hunspell's spell checking algorithm begins with tokenization of input text, where words are identified using predefined break characters such as hyphens and apostrophes to delineate boundaries, ensuring accurate segmentation even in languages with complex punctuation.^[22] Following tokenization, the system applies normalization to handle variations in case, encoding, and character representations, converting inputs to a canonical form compatible with the dictionary, such as UTF-8 or ISO8859-1, through optional input/output conversion tables.^[22] The core validation step involves affix stripping, where prefixes and suffixes are iteratively removed according to rules defined in the affix file (supporting up to twofold suffix stripping for agglutinative languages) to match the remaining stem against the dictionary; if a match is found, affixes are regenerated to confirm the original word's validity.^[21]^[22] For languages featuring compound words, Hunspell employs a recursive breakdown mechanism that decomposes potential compounds into subwords using compound flags to mark eligible dictionary entries, while enforcing minimum and maximum length rules as well as checks for duplicates and case sensitivity to prevent invalid formations.^[22] This process utilizes hash tables for dictionary lookups, enabling average O(1) time complexity for stem matching and efficient handling of large lexicons with minimal memory overhead through techniques like alias compression.^[21]^[22] Error tolerance in suggestion generation relies on a Levenshtein-like edit distance calculation, limited to a small number of operations such as insertions, deletions, substitutions, and swaps (typically capped at two changes) to identify plausible corrections, supplemented by replacement tables for common phonetic or typographical errors.^[21]^[22] The morphological analysis pipeline extends beyond simple stemming by first reducing words to their base forms via affix rules and then enumerating possible paradigms, including part-of-speech tags and inflectional details, when full analysis is requested through library functions like analyze.^[21]
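
The recursive compound breakdown can be sketched as follows. The example assumes that the compound-eligible entries (those that would carry a compound flag in a real dictionary) are available as a Python set, which gives the average constant-time lookups mentioned above. Only a minimum part length is enforced; COMPOUNDRULE patterns, case checks, and duplicate checks are left out. The names `split_compound` and `check` are hypothetical.

```python
def split_compound(word, compound_parts, min_len=3):
    """Return one split of `word` into allowed parts, or None if none exists."""
    if word in compound_parts:
        return [word]
    # Try every split point that leaves at least min_len letters on each side.
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        if head in compound_parts:
            rest = split_compound(word[i:], compound_parts, min_len)
            if rest is not None:
                return [head] + rest
    return None


def check(word, dictionary, compound_parts, min_len=3):
    """Accept plain dictionary words and compounds of two or more parts."""
    if word in dictionary:
        return True
    parts = split_compound(word, compound_parts, min_len)
    return parts is not None and len(parts) >= 2


parts = {"black", "bird", "house"}
print(split_compound("blackbirdhouse", parts))  # ['black', 'bird', 'house']
print(check("blackbrid", {"black"}, parts))     # False
```

The minimum-length guard plays the role of an option such as COMPOUNDMIN: without it, short dictionary entries would let almost any letter sequence pass as a compound.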

Applications and Usage

Integrated Software

Hunspell serves as the default spell-checking engine in several prominent open-source office suites. LibreOffice and Apache OpenOffice have integrated Hunspell since 2006, replacing the earlier MySpell component in OpenOffice.org version 2.0.2, with support for custom dictionaries that can be embedded directly into documents for personalized spell-checking needs.^[23]^[1] In web and email applications, Hunspell powers inline spell checking starting from Mozilla Firefox version 3 (2008) and Thunderbird version 3 (2009), enabling real-time correction of text in web forms, composition windows, and other editable fields.^[24]^[1]^[25] Google Chrome incorporates Hunspell for form-based and page-level spell checking, utilizing optimized binary dictionary files (.bdic) derived from standard Hunspell affix (.aff) and dictionary (.dic) formats to handle multilingual input efficiently.^[26]^[1] Beyond these core applications, Hunspell finds use in various other environments. On macOS, it has been available since version 10.6 and can be installed via Homebrew for integration into tools like text editors.^[1]^[27] Ports exist for Android, allowing embedding in mobile apps through JNI wrappers for on-device spell checking.^[1]^[28] Proprietary software such as SDL Trados Studio employs Hunspell as its primary spell checker, supporting custom and language-specific dictionaries for translation workflows.^[29]^[1] Overall, Hunspell's adoption extends to over 100 languages, facilitated by community-maintained dictionaries distributed through repositories like those for LibreOffice and Mozilla add-ons, ensuring broad accessibility across diverse linguistic contexts.^[30]^[1]

Command-Line Interface

Hunspell provides a standalone command-line interface for performing spell checking, morphological analysis, and related tasks on text files or standard input. The tool is invoked using the hunspell executable, which supports batch processing of files and interactive editing sessions. It is designed to be compatible with Ispell's interface, allowing seamless integration into scripts and text processing pipelines.^[31]^[2] The basic syntax for checking a file with a specified dictionary is hunspell -d <dictionary> <file>, where <dictionary> refers to the base name of the dictionary files (e.g., en_US for the American English dictionary, assuming .dic and .aff files are available). Without a file argument, Hunspell reads from standard input. For example, hunspell -d en_US textfile.txt processes the specified text file using the English dictionary and enters interactive mode by default if errors are found. Several dictionaries can be combined in one run by separating their names with commas, such as hunspell -d en_US,en_med medical.txt to add medical terminology on top of the base dictionary. The tool respects locale environment variables like LANG or LC_ALL to select default dictionaries if none are specified.^[31]^[2] Key options control input handling, output verbosity, and processing modes. The -l flag lists only misspelled words, one per line, making it suitable for piping into other tools: hunspell -d en_US -l textfile.txt. For pipe mode, -a enables reading from standard input and outputs a formatted stream with indicators like * for correct words, & for misspelled words followed by suggestion counts and alternatives (e.g., & exsample 4 0: example, examples, sampler, sample), - for compounds, and # for words with no suggestions. The -s option stems words to their root forms, while -m performs morphological analysis, outputting details like part-of-speech tags. Input encoding can be set with -i <encoding>, and special formats like HTML (-H), TeX (-t), or nroff (-n) are supported. 
Personal dictionaries for user-specific additions are managed via -p <path>, defaulting to $HOME/.hunspell_<dictionary>. The --check-url flag enables checking of URLs, e-mail addresses, and directory paths, which are otherwise skipped.^[31]^[2] In interactive mode, Hunspell prompts for each misspelled word, offering suggestions and commands for correction. Users can pick a suggestion by its number, type a replacement (R), accept the word for the rest of the session (A), insert it into the personal dictionary (I), or quit (Q). This mode facilitates on-the-fly editing, with changes applied to the input file if writable. For batch scripts, output can be redirected; for instance, a simple error-checking script might use hunspell -d en_US -l < input.txt > errors.txt to isolate issues for review. Integration with text processors is common, such as piping through aspell wrappers or embedding in Makefiles for document validation. The tool's Ispell compatibility ensures outputs align with legacy workflows, including suggestion formats like & word N offset: sug1, sug2.^[31]^[2]
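
Scripts that consume pipe-mode output need to decode the marker lines described above. The sketch below handles only the four markers named in the text and returns None for anything else. The layout assumed for `#` lines (`# word offset`) follows the Ispell convention and is an assumption here rather than something stated in the text. The function name `parse_pipe_line` is hypothetical.

```python
def parse_pipe_line(line):
    """Classify one line of pipe-mode output; return None for other lines."""
    line = line.rstrip("\n")
    if not line:
        return None
    marker = line[0]
    if marker == "*":
        return {"status": "ok"}
    if marker == "-":
        return {"status": "compound"}
    if marker == "&":
        # Layout: "& word count offset: sug1, sug2, ..."
        head, _, tail = line.partition(": ")
        _marker, word, _count, offset = head.split()
        suggestions = [s.strip() for s in tail.split(",") if s.strip()]
        return {"status": "miss", "word": word, "offset": int(offset),
                "suggestions": suggestions}
    if marker == "#":
        # Assumed layout: "# word offset"
        _marker, word, offset = line.split()
        return {"status": "unknown", "word": word, "offset": int(offset)}
    return None


result = parse_pipe_line("& exsample 4 0: example, examples, sampler, sample")
print(result["word"], result["suggestions"])
```

In practice the lines would come from the standard output of a hunspell process started with -a; the parser is kept separate from process handling so it can be tested without the tool installed.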

Dictionary Management

Hunspell dictionaries are created by compiling word lists into paired .dic and .aff files, which define the vocabulary and morphological rules respectively. The .dic file contains a header specifying the number of words followed by the word list, while the .aff file outlines affixation rules and flags; these can be generated manually using text editors or through specialized tools such as affixcompress for compressing affix data and wordforms for generating inflected forms from base words. For custom languages, users start with a basic word list sourced from corpora or existing resources, then iteratively refine the affix rules to handle derivations and compounds specific to the language's morphology.^[2]^[20] Over 100 language-specific Hunspell dictionaries are available, often distributed through LibreOffice extensions or the Hunspell project on SourceForge, supporting diverse scripts and features like full morphological analysis for agglutinative languages such as Hungarian, which includes complex compounding rules. These pre-built dictionaries can be extended by users for dialects or specialized terminologies, ensuring compatibility with applications like LibreOffice by placing the files in designated directories.^[32]^[3] Personal and temporary dictionaries allow runtime customization without altering core files; the command-line tool supports adding words via the -p flag, specifying a user-defined .dic file for session-specific additions, while persistent personal dictionaries are stored as simple word lists in user home directories for ongoing use across sessions.^[20] Management tools facilitate conversion and validation: Aspell dictionaries can be converted to Hunspell format by unzipping .cwl files to word lists, applying phonetic transformations if needed, and pairing with adapted affix rules. 
For rule consistency, the hunspell -m option performs morphological analysis on sample texts to verify dictionary integrity, and build-time make check tests ensure affix rules align with word entries during compilation.^[20]^[2] Best practices emphasize encoding verification to prevent mismatches—preferring UTF-8 for broad compatibility—and rigorous testing by running the command-line tool against representative sample texts from the target language to identify gaps in coverage or erroneous suggestions before deployment.^[20]^[2]
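
As a starting point for a custom dictionary, a raw word list can be turned into a minimal .aff and .dic pair. The sketch below only declares the encoding and writes the word count header; it adds no flags or affix rules, which would be refined by hand afterwards as described above. The function name `build_dictionary` and the returned text layout are illustrative assumptions, not the output of any Hunspell tool.

```python
def build_dictionary(words):
    """Return (aff_text, dic_text) for a flat, affix-free word list."""
    # Deduplicate, drop empty lines, and sort for stable output.
    unique = sorted({w.strip() for w in words if w.strip()})
    # The .aff file declares the encoding shared by both files.
    aff_text = "SET UTF-8\n"
    # The .dic file starts with the entry count; literal slashes are escaped.
    lines = [str(len(unique))]
    lines += [w.replace("/", "\\/") for w in unique]
    dic_text = "\n".join(lines) + "\n"
    return aff_text, dic_text


aff_text, dic_text = build_dictionary(["work", "km/h", "work", ""])
print(dic_text)
```

Both strings should be written to disk as UTF-8 so that the files match the SET declaration, in line with the encoding check recommended above.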

Licensing and Availability

License Terms

Hunspell is distributed under a tri-license: the GNU General Public License version 2.0 or later (GPL-2.0-or-later), the GNU Lesser General Public License version 2.1 or later (LGPL-2.1-or-later), and the Mozilla Public License version 1.1 (MPL-1.1).^[2] Recipients may choose whichever of the three best fits their project, which gives flexibility in integration.^[1]
Under the LGPL-2.1, the library can be linked into proprietary software without disclosing the application's own source code, provided that the library remains replaceable and modifiable and that the source of any changes to the library itself is made available.^[33] Under the GPL-2.0, by contrast, works derived from Hunspell must be distributed with full source code under the same license.^[34] The MPL-1.1 applies copyleft at the level of individual files: modified Hunspell files must remain available under the MPL, while other files in a larger work may carry different terms.^[35]
The project moved to the tri-license in 2006 to enable its inclusion in Mozilla products.^[36] This change broadened Hunspell's usability across software ecosystems: the LGPL and MPL options are what make it practical to ship Hunspell inside closed-source applications such as Google Chrome without releasing the full application source.

Distribution and Ports

Hunspell is primarily distributed through its official GitHub repository at hunspell/hunspell, where developers can access the source code, contribute, and follow development.^[2] Release archives and pre-compiled binaries have also been available from SourceForge since the project's inception.^[3]
For ease of installation, Hunspell is packaged for the common package managers, so it can be deployed without compiling from source. These include Homebrew on macOS (brew install hunspell) and apt on Debian-based Linux distributions such as Ubuntu.^[27]^[6] On Windows, binaries are available through Chocolatey (a portable package of version 1.7.0) and winget (winget install FSFhu.Hunspell).^[37]^[38] In Debian, package version 1.7.2+really1.7.2-11 migrated to testing in August 2025.^[6]
Hunspell has also been wrapped or ported for several programming languages and frameworks. NHunspell wraps the library for .NET applications; its latest release on NuGet is 1.2.5554.16953, from March 2015.^[39] In Python, pyhunspell offers bindings to the Hunspell engine for loading dictionaries and requesting suggestions, though its latest PyPI release dates from August 2018.^[40] For R, the hunspell package (version 3.0.6 as of March 2025) provides high-performance stemming, tokenization, and spell checking.^[41] WeCantSpell.Hunspell (version 6.0.3, updated September 2025) is a fully managed .NET port with no unmanaged dependencies; it supports concurrent queries and aims for competitive performance on modern .NET runtimes.^[42]
Dictionary bundles for Hunspell cover over 100 languages and are often included with applications such as LibreOffice, where they enable multilingual spell checking out of the box.^[43] Separate downloads are available from LibreOffice's dictionary repository and from community collections, covering spelling, hyphenation, and thesaurus data for languages ranging from major ones like English and Spanish to less common variants.^[44]^[45]
For custom builds, Hunspell uses an autotools-based system (autoconf, automake, libtool) that supports compilation on GNU/Linux, other Unix-like systems, macOS, and Windows via MinGW or Cygwin.^[2] The process consists of running autoreconf -vfi, ./configure (optionally with flags such as --with-ui, which enables the curses-based interactive interface), make, and make install. The official distribution does not ship a CMake build.^[2]
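The autotools sequence described above can be scripted. The following Python sketch is one way to do so under stated assumptions: the helper names build_commands and build are hypothetical, the source_dir argument is assumed to point at a checkout of the Hunspell sources, and --prefix is the standard autoconf installation-prefix option rather than anything specific to Hunspell. The script defaults to a dry run that only prints the commands.

```python
import subprocess


def build_commands(prefix=None, with_ui=False):
    """Return the build steps as argument lists, in execution order."""
    configure = ["./configure"]
    if prefix:
        configure.append(f"--prefix={prefix}")
    if with_ui:
        configure.append("--with-ui")
    return [
        ["autoreconf", "-vfi"],  # regenerate the configure script
        configure,               # detect the toolchain, apply options
        ["make"],                # compile the library and tools
        ["make", "install"],     # install under the chosen prefix
    ]


def build(source_dir, dry_run=True, **options):
    """Print each step; run it inside source_dir unless dry_run is set."""
    for command in build_commands(**options):
        print("$", " ".join(command))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(command, cwd=source_dir, check=True)


if __name__ == "__main__":
    build("hunspell", dry_run=True, with_ui=True)
```

Installing into a user-writable prefix avoids the elevated privileges that make install otherwise typically needs for system directories.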

Community and Maintenance

Primary Author

László Németh is the primary author and lead developer of Hunspell.^[1] A Hungarian national and a biologist by background, Németh is also a free software developer who has contributed to linguistic tools and office productivity software.^[46] He works as a lead programmer on free software, chiefly LibreOffice, focusing on internationalization, hyphenation, and spelling components.^[47] His experience with agglutinative languages such as Hungarian has shaped his technical approach.
Németh began developing Hunspell in the early 2000s, with primary sponsorship from 2003 to 2005 by the Budapest Technical University's Media Research Centre; this work laid the foundation for its affix-based morphology and compound word handling.^[1] He led the initial implementation from approximately 2002 to 2005, turning an extension of MySpell into a standalone library, and has since authored all major releases, adding support for Unicode and other character encodings.^[11] Throughout, he has paid particular attention to the Hungarian dictionary, refining its rules for suffixation, prefixation, and morphological generation to improve accuracy on inflected forms.^[47]
Beyond Hunspell, Németh has written several complementary open-source projects for document processing and language handling. These include LibreLogo, a turtle graphics programming environment embedded in LibreOffice; Numbertext, a cross-platform library for converting numbers to text in multiple languages; Lightproof, a rule-based grammar and style checker; and the Hungarian spelling dictionary used in LibreOffice.^[47] His broader portfolio reflects a commitment to accessible tools for education and productivity.
Németh's work is recognized within the free software community; he spoke at FOSDEM 2019 on interoperability and internationalization improvements in LibreOffice.^[47] The FSF.hu Foundation has provided ongoing support for Hunspell's releases.^[1] As of 2025, Németh remains the active maintainer of the project on GitHub, overseeing bug fixes, feature enhancements, and dictionary integrations.^[2]

Contributions and Future Directions

The Hunspell project benefits from an active open-source community that contributes translations, bug reports, and dictionaries. Translations are coordinated through Weblate, which lists 76 languages and lets volunteers improve the localization of user interfaces and documentation.^[48] Bug reports and feature requests are managed on the project's GitHub repository, which as of 2025 lists roughly 265 open issues.^[14] User-contributed dictionaries extend support to additional languages and vocabularies and are often shared through community repositories.^[1]
Several contributors shaped Hunspell's architecture. Kevin Hendricks wrote MySpell, the C++ spell-checking codebase on which Hunspell was built.^[49] Caolan McNamara implemented the original C API, enabling broader integration with applications such as OpenOffice.org.^[49] Ongoing patches come from community members who submit pull requests on GitHub, with László Németh acting as primary maintainer.^[2]
Hunspell remains maintained, with periodic releases addressing bugs, performance, and compatibility. This work is supported by foundations such as FSF.hu and keeps integrations like LibreOffice and Mozilla products stable.^[5] For newcomers, the GitHub repository provides build instructions and documentation for setting up a development environment.^[2]
Future directions emphasize extensibility and integration. Recent development has extended SPELLML, Hunspell's XML-based query interface: the December 2022 release (version 1.7.2) added SPELLML support for run-time dictionary extension, so that words can be added dynamically without rebuilding the dictionary files.^[5] Efforts are also under way to improve the suggestion algorithms for languages with complex morphology and to improve support for low-resource languages through community-driven dictionary tools.^[5]
Challenges in ongoing development include keeping pace with evolving Unicode standards so that new characters and encodings are handled efficiently, an area with a history of issues around non-UTF-8 support.^[8] Mobile platforms pose additional hurdles: in resource-constrained environments such as Android, performance tuning is needed to reduce memory usage and speed up checks without compromising accuracy.^[50]

References

[1]
Hunspell: About
Hunspell is the spell checker of LibreOffice, OpenOffice.org, Mozilla Firefox & Thunderbird, Google Chrome, and it is also used by proprietary software ...
[2]
hunspell/hunspell: The most popular spellchecking library. - GitHub
Hunspell is a free spell checker and morphological analyzer library and command-line tool, licensed under LGPL/GPL/MPL tri-license.
[3]
Hunspell download | SourceForge.net
Download Hunspell for free. Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and ...
[4]
Hunspell Spell Checking and Morphological Analysis - Docs
The hunspell function is a high-level wrapper for finding spelling errors within a text document. It takes a character vector with text (plain, latex, man, ...
[5]
Releases · hunspell/hunspell - GitHub
Dec 29, 2022 · Release notes, new features and bug fixes by László Németh, supported by FSF.hu Foundation: add SPELLML support for run-time dictionary extension.
[6]
hunspell - Debian Package Tracker
[2025-08-17] hunspell 1.7.2+really1.7.2-11 MIGRATED to testing (Debian testing watch); [2025-08-11] Accepted hunspell 1.7.2+really1.7.2-11 (source) into ...
[7]
Debian -- Details of package hunspell in sid
Main features: - Unicode support (first 65535 Unicode characters) - morphological analysis (in custom item and arrangement style) - Max. 65535 affix classes and ...
[8]
hunspell(4) - Linux man page
Hunspell uses dictionary and affix files to define language for spell checking. Dictionary files contain words, and affix files define special flags.
[9]
OOo 2.0 (SRC680/aka 2.0.x) - Apache OpenOffice
OpenOffice.org 2.0.4: September 2006 · OpenOffice.org 2.1: December 2006 · OpenOffice.org 2.2: March 2007 ...
[10]
Lingucomponent Project - Apache OpenOffice
MySpell has been replaced with hunspell starting with OpenOffice.org 2.0.2. Hunspell builds on MySpell but supports Unicode and adds several other useful ...
[11]
Hunspell / News - SourceForge
Hunspell 1.2.2 released. 2008-04-12: Hunspell 1.2.2 release: - extended dictionary support to use multiple base and special dictionaries ...
[12]
NEWS · apertis/1.7.0-3apertis0 · pkg / hunspell · GitLab
2016-12-22: Hunspell 1.6.0 release: - Library changes: - Performance improvement in ngsuggest(), suggestions should be faster. - Revert MAXWORDLEN to 100 as ...
[13]
hunspell: High-Performance Stemmer, Tokenizer, and Spell Checker
The package can analyze or check individual words as well as parse text, latex, html or xml documents. For a more user-friendly interface use the 'spelling' ...
[14]
Issues · hunspell/hunspell - GitHub
Is there a way to use the library to create a Hunspell instance without loading a dictionary? #1034 In hunspell/hunspell; · davidgiven opened on Jan 12
[15]
hunspell/hyphen - GitHub
A hyphenator with non standard hyphenation facilities based on extended Libhnj. The HyFo module is released in binary form as jar files and in source form as ...
[16]
hunspell(5) - Arch manual pages
The Hunspell algorithm currently allows any affixed form of words, which are lexically marked as potential members of compounds. Hunspell improved this, and its ...
[17]
algo.suggest: main suggestion algorithm — Spylls documentation
Note that Spylls's implementation takes one liberty comparing to Hunspell's: In Hunspell, ngram suggestions (select all words from dictionary that ngram-similar ...
[18]
algo.phonet_suggest: phonetical suggestions — Spylls documentation
Phonetical suggestion algorithm provides suggestions based on phonetical (prononication) similarity. It requires .aff file to define PHONE table – which, we ...
[19]
None
Summary of Dictionary and Affix File Formats (Hunspell)
[20]
format of Hunspell dictionaries and affix files - Ubuntu Manpage
Hunspell(1) Hunspell requires two files to define the way a language is being spell checked: a dictionary file containing words and applicable flags, and an ...
[21]
[PDF] Hunspell – The free spelling checker
Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and complex word compounding or character ...
[22]
Linux Manpages Online - man.cx manual pages
Summary of Hunspell Algorithm Details
[23]
Spell Checking and Dictionaries - Apache OpenOffice
The MySpell spell checker uses a modified version of Ispell's dictionaries and affix files (modified to permit fast parsing, to be case sensitive, etc.)
[24]
Using an external spell checker - Mozilla - MDN Web Docs
Jul 2, 2025 · Starting with Firefox 3 (as well as Thunderbird 3 and SeaMonkey 2), you can now install an external spell checker using an extension.
[25]
Editing the spell checking dictionaries - The Chromium Projects
Each hunspell dictionary comes in two files. The .dic file which is the list of words, and the .aff file which is a list of rules and other options.
[26]
hunspell - Homebrew Formulae
Install command: brew install hunspell. Spell checker and morphological analyzer. https://hunspell.github.io. License: MPL-1.1 or GPL-2.0-or-later or LGPL-2.1- ...
[27]
Hunspell compilled with JNI to be used in Android. - GitHub
Hunspell compilled with JNI to be used in Android. Hunspell is the spell checker of LibreOffice, OpenOffice.org, Mozilla Firefox 3 & Thunderbird, ...
[28]
Hunspell Spell Checker - Documentation Center
Hunspell is automatically selected as the default spell checker in SDL Trados Studio.
[29]
Add or remove Hunspell dictionaries - Adobe Help Center
Jul 10, 2023 · Extract the contents of the zip archive to a folder and locate an affix (.aff) file, a spelling dictionary (.dic) file or a hyphenation ...
[30]
hunspell(1) - Linux man page - Die.net
Hunspell is fashioned after the Ispell program. The most common usage is "hunspell" or "hunspell filename". Without filename parameter, hunspell checks ...
[31]
Development/Dictionaries - The Document Foundation Wiki
Apr 8, 2025 · Several types of dictionaries are bundled within LibreOffice: hunspell - basic spell check using the Hunspell engine; hyphen - words hyphenation ...
[32]
hunspell/COPYING.LESSER at master · hunspell/hunspell
[33]
hunspell/COPYING at master · hunspell/hunspell
Summary of GPL-2.0 License Terms for Hunspell
[34]
hunspell/COPYING.MPL at master · hunspell/hunspell
Summary of MPL-1.1 Terms (Based on Available Content)
[35]
Gray areas in software licensing - LWN.net
Feb 15, 2012 · Hunspell changed its license in 2006, to the MPL/GPL/LGPL tri-license to enable inclusion in Mozilla. It is used as the spell-checker for ...
[36]
Hunspell (Portable) 1.7.0 - Chocolatey Community
Chocolatey is software management automation for Windows that wraps installers, executables, zips, and scripts into compiled packages.
[37]
Fixing hunspell 1.7.0 for Emacs 29 on Windows - vxlabs
Nov 14, 2023 · It starts pretty well, when you are able to install hunspell with a simple winget install FSFhu.Hunspell, after which you download a set of English ...
[38]
NHunspell 1.2.5554.16953 - NuGet
Mar 17, 2015 · NHunspell is a spell check, hyphenation, word stemming and thesaurus library based on the Open Office spell check library Hunspell.
[39]
hunspell - PyPI
Aug 6, 2018 · PyHunspell itself is licensed under the LGPL version 3 or later, see lgpl-3.0.txt and gpl-3.0.txt. The files in the debian/ directory and setup.
[40]
[PDF] hunspell: High-Performance Stemmer, Tokenizer, and Spell Checker
The hunspell package is a low-level spell checker and morphological analyzer that finds spelling errors in text documents and parses words.
[41]
WeCantSpell.Hunspell 6.0.3 - NuGet
WeCantSpell.Hunspell is a .NET port of Hunspell that reads DIC/AFF files, checks/suggests words, and has no unmanaged dependencies.
[42]
Language/Support - The Document Foundation Wiki
Jun 11, 2024 · This page gives an overview of the level of language support of LibreOffice. Furthermore, links are provided to language-related add-ons and extensions.
[43]
LibreOffice/dictionaries - GitHub
Contains dictionaries related code and data. See https://wiki.documentfoundation.org/Development/Dictionaries for more information.
[44]
wachin/libreoffice-dictionaries-collection - GitHub
Sep 11, 2025 · Complete collection of multilingual dictionaries for LibreOffice (version 25.2.3) for spelling, synonyms, and hyphenation.
[45]
Laszlo Nemeth — English - LibreOffice Conference
I'm a biologist and a free software developer (39). My recent job as a lead programmer is related to also free softwares, especially to LibreOffice.
[46]
FOSDEM 2019 - László Németh
László Németh. LibreOffice developer. Author of Hunspell spell checker, LibreLogo, Numbertext, Lightproof and the Hungarian spelling dictionary. Worked for ...
[47]
Hunspell/Translations - Hosted Weblate
Hunspell is being translated into 76 languages using Weblate. Join the translation or start translating your own project.
[48]
hunspell(3) - Linux man page - Die.net
Author of MySpell is Kevin Hendricks. Author of Hunspell is László Németh. Author of the original C API is Caolan McNamara. Author of the Aspell table ...
[49]
Hunspell on Android - Stack Overflow
Feb 1, 2011 · Does anyone successfully implemented Hunspell spell-checker on Android platform? Is it even possible? Did you try it? What about the results ...