Hunspell
Hunspell is a free and open-source spell-checking library and command-line tool designed primarily for languages with rich morphology, complex word compounding, and diverse character encodings.[1] It originated as an enhanced successor to MySpell, the spell checker used in early versions of OpenOffice.org, and was developed to address limitations in handling agglutinative languages like Hungarian, Finnish, and Turkish.[1] Key improvements include support for Unicode, advanced morphological analysis, stemming, and word generation, enabling more accurate spell-checking and suggestions through n-gram similarity matching and pronunciation-based corrections.[1] Hunspell is licensed under a tri-license of GNU General Public License (GPL), GNU Lesser General Public License (LGPL), and Mozilla Public License (MPL), allowing flexible integration into both open-source and proprietary applications.[2]
It maintains backward compatibility with MySpell dictionaries while supporting affix files for rule-based morphology, making it adaptable for over 100 languages through community-contributed dictionaries.[1] Widely adopted in major software, Hunspell powers spell-checking in LibreOffice, Apache OpenOffice, Mozilla Firefox, Thunderbird, Google Chrome, macOS (since version 10.6), Adobe InDesign, Opera, and translation tools like SDL Trados and memoQ.[1]
The library is implemented in C++ and offers bindings for numerous programming languages and interfaces, including Java, Python, Perl, Ruby, .NET, Android, and UNO, facilitating its use in diverse environments from desktop applications to embedded systems.[1] Development of Hunspell has been led by László Németh since its inception around 2005, with sponsorship from organizations such as the FSF.hu Foundation, IMEDIA, Budapest University of Technology and Economics, OpenTaal Foundation, and the Dutch Language Union.[1]
Ongoing maintenance occurs through the GitHub repository, where contributions focus on improving performance, adding dictionary support, and integrating with translation platforms like Weblate for collaborative localization.[2]
Introduction
Overview
Hunspell is a free, open-source spell checker and morphological analyzer library, accompanied by a command-line tool, designed primarily to handle languages featuring rich morphology, complex compounding, and challenging character encodings.[2][3] It excels in processing agglutinative languages such as Hungarian and Finnish, as well as compound-heavy languages like German, where traditional spell checkers often struggle with inflectional variations and word formation rules.[1] The library provides robust support for morphological analysis, enabling stemming, generation, and detailed word breakdown, while its spell-checking capabilities include intelligent error detection and correction suggestions tailored to linguistic complexities.[4]
Released under a tri-license of LGPL, GPL, and MPL, Hunspell ensures broad compatibility and adoption in open-source ecosystems.[2] Its current stable version, 1.7.2, was issued on December 29, 2022, with ongoing maintenance and minor updates continuing into 2025 to address compatibility and performance needs across distributions.[5][6] Evolving from earlier tools like MySpell, it maintains backward compatibility with existing dictionaries while introducing enhancements for modern requirements.[1]
A key strength of Hunspell lies in its Unicode support, accommodating the first 65,535 Unicode characters for affix rules and enabling handling of diverse scripts and encodings beyond basic 8-bit limitations.[1][7] This feature, combined with its morphological tools, positions it as a versatile solution for multilingual environments, powering spell checking in applications such as LibreOffice and Mozilla Firefox.[3]
Design Principles
Hunspell was designed to address the limitations of earlier spell checkers in handling languages with rich morphology and complex compounding, such as agglutinative languages like Hungarian or those with intricate affixation like German.[2][8] Its core innovation lies in advanced affix handling, including support for homonyms, circumfixes, fogemorphemes, and zero morphemes, which enable morphological generation and analysis far beyond basic dictionary lookups.[2] Additionally, it incorporates twofold affix stripping to efficiently manage multiple layers of suffixes and prefixes, reducing the number of rules needed for complex word formations.[8] Compounding rules allow recognition of arbitrarily long compounds and affixation within them, ensuring accurate spell checking for word-level writing systems.[2]
A key design principle is backward compatibility with Ispell and MySpell formats, facilitating seamless migration of existing dictionaries and minimizing adoption barriers for users transitioning from those systems.[2][8] This compatibility is enhanced by innovations like alias compression for affix rules, which optimize storage without sacrificing functionality.[2]
Hunspell emphasizes high customizability through configurable suggestion algorithms, word-part replacement tables, and over 65,000 affix classes, allowing tailored implementations for diverse linguistic needs.[8] Efficiency in suggestion generation and overall performance is prioritized via optimizations for large vocabularies and quick processing, making it suitable for real-time applications.[2] As a library, it provides C++ and C APIs, shared library support, and bindings for multiple languages, promoting integration into varied software environments from desktop applications to web browsers.[2] To broaden adoption, Hunspell employs a tri-license model under the Mozilla Public License (MPL), GNU Lesser General Public License (LGPL), and GNU General Public License (GPL), accommodating both open-source and proprietary uses.[2]
History and Development
Origins from MySpell
Hunspell's development began around 2005 under the leadership of László Németh, a Hungarian developer, as a reimplementation and extension of MySpell to address shortcomings in spell checking for morphologically rich languages.[1] MySpell itself was a C++ port of the Ispell spell checker, originally created for integration into OpenOffice.org to provide efficient affix compression and dictionary handling.[2]
A key motivation for Hunspell stemmed from MySpell's limitations, particularly its inadequate support for complex morphological rules and word compounding, which proved insufficient for languages like Hungarian that rely heavily on affixation and compound formation.[1] To overcome these issues, Hunspell incorporated enhanced affix-based mechanisms, enabling more accurate analysis and generation of word forms while maintaining backward compatibility with MySpell dictionaries.[2]
The library saw its initial integration into OpenOffice.org with version 2.0.2, released in February 2006, where it fully replaced MySpell as the default spell checker.[9] This adoption marked a significant step in improving multilingual support within the suite. Early efforts in Hunspell's creation emphasized support for the Hungarian language, driven by Németh's background and sponsorship from Hungarian organizations such as the FSF.hu Foundation and Budapest Technical University's Media Research Centre.[1]
Key Milestones
Hunspell's development gained significant momentum following its initial transition from MySpell, with key adoptions marking its early integration into major open-source projects. In 2006, Hunspell was officially adopted as the default spell checker in OpenOffice.org 2.0.2, replacing MySpell and enabling enhanced support for complex morphologies in office productivity applications.[10] By 2008, Hunspell saw broader adoption in web technologies through its integration into Mozilla Firefox 3 and Thunderbird, providing inline spell checking for email and browsing with improved handling of agglutinative languages.[1]
During the 2010s, the project advanced through the 1.2 to 1.3 version series, which introduced enhanced Unicode 6.0 support for broader character encoding compatibility and refined compounding rules to better manage word formation in languages like German and Finnish.[11][1] A notable enhancement came in 2016 with the release of version 1.6.0, which optimized suggestion algorithms for faster performance, reducing generation times through improved n-gram matching and limiting overgeneration in compound words.[12]
In 2022, version 1.7.2 was released, incorporating the SPELLML XML API to enable runtime dictionary extensions and custom affix rules without recompilation, facilitating easier integration in dynamic environments. From 2023 to 2025, Hunspell underwent ongoing maintenance with bug fixes and compatibility updates, including a port to R via the hunspell package version 3.0.7, which extended its utility for statistical computing and text analysis in R environments; the project's GitHub repository maintained active development, accumulating over 265 open issues as of late 2025.[13][14]
Throughout its evolution, Hunspell solidified its role as a default spell checker in Google Chrome and LibreOffice, powering spell checking for billions of users across browsers and office suites.[1]
Features
Spell Checking Capabilities
Hunspell's core spell checking functionality relies on dictionary lookup, where words are verified against a base dictionary file (.dic) containing valid word forms, augmented by affix rules (.aff) that enable stripping and reapplication of prefixes and suffixes to generate inflected or derived forms.[2] This twofold affix processing allows efficient handling of morphological variations without enumerating every possible word in the dictionary, making it suitable for languages with rich inflection like Hungarian or Turkish.[8]
The library supports Unicode encoding via UTF-8, enabling spell checking of multilingual text and characters beyond basic ASCII, while also accommodating legacy 8-bit encodings such as ISO-8859-1 through configurable SET directives in affix files.[8] For complex compounding, Hunspell employs recursive breaking and rule-based validation, supporting arbitrary-length compounds in languages like Dutch, Swedish, German, and Finnish via flags such as COMPOUNDFLAG and COMPOUNDRULE to define allowable combinations and prevent overgeneration.[2]
Additional options include ignore lists to exclude specific characters or patterns from checking, such as diacritics in Arabic script via the IGNORE directive, and support for personal dictionaries that allow users to add custom words with optional affixation.[8] Case sensitivity is configurable, with features like KEEPCASE to restrict uppercase forms and CHECKCOMPOUNDCASE to enforce proper casing at compound word boundaries, accommodating language-specific rules such as German ß or Turkish dotted i.[2]
Hunspell integrates hyphenation capabilities through compatibility with the Hyphen library's pattern-based rules, using BREAK and COMPOUNDRULE options to identify hyphenation points and handle hyphenated compounds during spell checking.[15] This extends basic error detection to include hyphenation-aware validation, though advanced morphological parsing for stem identification is handled separately.[2]
Morphological Analysis
Hunspell's morphological analyzer decomposes input words into their base stems and associated affixes by applying rules defined in dictionary and affix files, enabling the processing of both inflectional morphology—such as tense, number, or case endings—and derivational morphology, including prefixes and suffixes that alter word class or meaning. This rule-based approach allows for precise linguistic breakdown, as seen in the analysis of "drinkable," which yields the stem "drink" with the derivational suffix flag "ds:able" and part-of-speech tag "po:verb."[8] In generation mode, Hunspell constructs inflected or derived word forms from a given stem by applying specified affix rules, facilitating applications such as grammar checking where correct forms must be verified against expected paradigms. For instance, starting from the stem "foot," it can generate the plural "feet" using an inflectional rule flagged with "is:plural," ensuring compatibility with syntactic requirements in downstream processing.[8] The system supports over 65,000 affix classes per dictionary, organized via flags that permit complex combinations of prefixes and suffixes, which is essential for handling the rich, agglutinative morphologies of languages like Turkish and Estonian. This capacity enables twofold affix stripping—applying multiple layers of suffixes in sequence—to parse highly compounded or inflected words without performance degradation.[8] Output from the analyzer includes part-of-speech tags (e.g., "po:noun"), lemma extraction via the stem field (e.g., "st:foot" for irregular forms like "feet"), and full paradigm generation that enumerates all possible inflections for a given lemma. 
These formats are delivered as space- or tab-separated fields, supporting integration into natural language processing pipelines.[8]
Unlike simple stemming algorithms, which provide only approximate root forms through heuristic suffix removal or statistical methods, Hunspell's morphological analysis delivers a complete, rule-driven decomposition with explicit affix and feature annotations, preserving linguistic accuracy for morphologically complex languages. This depth enhances its utility beyond basic spell checking by enabling detailed error diagnosis in inflected forms.[8]
Suggestion Algorithms
Hunspell employs a multi-stage approach to generate spelling suggestions for misspelled words, prioritizing efficiency and accuracy through targeted error correction strategies. The process begins with near-miss techniques that simulate common typing errors, such as swaps of adjacent letters, deletions, insertions, and replacements based on keyboard proximity defined in the affix file's KEY option.[16] These edits are generated systematically and checked against the dictionary to identify valid words, with additional support for character movements and double swaps to capture more complex mistakes.[17] Rule-based replacements further enhance this stage via REP tables in the affix file, which map frequent misspellings to corrections (e.g., "teh" to "the"), allowing customization for language-specific or user-defined errors.[16]
If near-miss edits yield insufficient results, Hunspell advances to n-gram-based similarity matching, where it computes overlaps between the misspelled word and dictionary entries. MAXNGRAMSUGS limits the number of n-gram candidates, and MAXDIFF sets the similarity factor that controls how many of them are kept (default 5, range 0-10).[16] Phonetic encoding provides an additional layer for handling pronunciation-based errors, utilizing a table-driven transcription algorithm borrowed from Aspell via the PHONE directive in the affix file; this maps characters to phonetic equivalents, enabling suggestions for languages with non-phonetic orthography or for noisy input.[16] For languages like English or those with phonetic dictionaries, this can approximate algorithms such as Double Metaphone, though Hunspell's implementation focuses on customizable PHONE tables for broader applicability.[18]
Suggestions are ranked primarily by the order of generation stages: REP replacements receive highest priority, followed by exact edit matches, n-gram similarities (weighted by edit distance and overlap length), and phonetic matches. Dictionary frequency is incorporated only implicitly, through stem selection and morphological fit for affixed forms.[17] Compound word support allows word-part suggestions, breaking potential compounds and applying edits to segments, with limits like MAXCPDSUGS to prevent excessive computation.[16] Language-specific handling, such as the LANG option for Hungarian vowel harmony rules, restricts invalid combinations during suggestion generation.[16]
Performance optimizations enable real-time use in applications like text editors, including caps on suggestion counts (such as MAXNGRAMSUGS and MAXCPDSUGS) and early termination if sufficient high-quality candidates are found, reducing computational overhead in large dictionaries.[17] This integration with morphological analysis allows stem-level suggestions, where corrections align with valid affixations for inflected languages.[16]
Technical Implementation
Dictionary Format
Hunspell employs a dual-file format for its dictionaries, consisting of a main dictionary file with the extension .dic and an accompanying affix file with the extension .aff. The .dic file serves as the primary repository of words, while the .aff file provides the rules and configurations necessary for processing those words, enabling morphological analysis and spell checking.[19]
The .dic file is structured as a plaintext list of words, one per line, beginning with an approximate word count on the first line to optimize hash memory allocation for efficient lookup. Each entry typically consists of a base word followed by optional numeric or character flags separated by a slash, which indicate the applicability of specific affix rules defined in the .aff file; for example, work/AB denotes that the word "work" can be modified by affixes associated with flags A and B. Slashes within words themselves are escaped using a backslash (e.g., word\/). Because the leading count lets the hash table be sized up front, lookups stay efficient even for large word lists.[19]
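The layout described above can be illustrated with a minimal Python sketch. The function name parse_dic and the sample entries are hypothetical, and the sketch assumes entries without morphological fields; it is an illustration of the file layout, not Hunspell's own loader.

```python
import re

def parse_dic(text):
    """Parse .dic-style text into (declared_count, {word: flag_string})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # First line: approximate word count used to size the hash table.
    declared_count = int(lines[0].split()[0])
    entries = {}
    for line in lines[1:]:
        # Split on the first slash that is NOT escaped with a backslash.
        parts = re.split(r"(?<!\\)/", line.strip(), maxsplit=1)
        word = parts[0].replace("\\/", "/")  # unescape literal slashes
        flags = parts[1] if len(parts) > 1 else ""
        entries[word] = flags
    return declared_count, entries

# Hypothetical sample: a flagged word, a bare word, a word with an escaped slash.
SAMPLE = "3\nwork/AB\nhello\nkm\\/h\n"
count, entries = parse_dic(SAMPLE)
print(count, entries)
```

The flag string is kept unsplit here because how it divides into individual flags depends on the flag format declared in the affix file.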
Encoding for multilingual support is declared in the .aff file using the SET directive, such as SET UTF-8 for Unicode compatibility or SET ISO8859-1 for legacy 8-bit encodings, ensuring proper handling of characters across various languages including those with diacritics or non-Latin scripts. This declaration applies to both the .aff and associated .dic files, facilitating international dictionary development.[19]
Compound word permissions are managed through flags in the .dic file, which reference rules in the .aff file, such as the COMPOUNDRULE option that allows pattern matching for valid combinations (e.g., permitting "blackbird" based on predefined regex-like patterns for compounding). These flags enable flexible construction of compound forms without enumerating every possibility in the dictionary.[19]
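A rough Python sketch of pattern-based compound validation follows. The toy lexicon, the flag letters A and B, and the function names are hypothetical, and Python's re module stands in for Hunspell's own pattern matcher (the two happen to agree for single-character flags combined with the star operator). A minimum part length of three letters is assumed.

```python
import re

# Hypothetical toy lexicon: each stem carries a single compound-class flag.
LEXICON = {"black": "A", "blue": "A", "bird": "B", "berry": "B"}

def segmentations(word, lexicon, min_len=3):
    """Yield every way to split `word` into lexicon entries of >= min_len letters."""
    if not word:
        yield []
        return
    for i in range(min_len, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            for rest in segmentations(word[i:], lexicon, min_len):
                yield [head] + rest

def matches_compound_rule(word, lexicon, rule, min_len=3):
    """True if some split of `word` has a flag sequence matching `rule`."""
    pattern = re.compile(rule)
    for parts in segmentations(word, lexicon, min_len):
        if len(parts) < 2:
            continue  # a single stem is not a compound
        flag_sequence = "".join(lexicon[p] for p in parts)
        if pattern.fullmatch(flag_sequence):
            return True
    return False

print(matches_compound_rule("blackbird", LEXICON, "A*B"))
print(matches_compound_rule("birdblack", LEXICON, "A*B"))
```

Matching on the flag sequence rather than on the words themselves is what lets one short rule cover every combination of suitably flagged stems.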
Extension mechanisms include personal word lists, which can be appended directly to a .dic file as additional plaintext entries with optional flags, allowing users to customize dictionaries for specific needs like adding domain-specific terms (e.g., specialterm/C). Such additions override or supplement the base dictionary during runtime, supporting user-specific adaptations without altering core files.[19]
Affix Rules
Hunspell employs affix rules to manage morphological inflections and derivations, enabling the spell checker to recognize and generate word forms from base stems through prefixes and suffixes. These rules are specified in the affix file (typically with an .aff extension) and support complex language morphologies, such as those in agglutinative languages like Hungarian or Finnish. The core affix types are prefixes (PFX) and suffixes (SFX), each defined with conditions that determine applicability to stems, allowing for efficient handling of derivations without exhaustively listing all variants in the dictionary.[20]
The syntax for prefix rules begins with a header line: PFX <flag> <cross_product> <number>, where <flag> identifies the affix class (e.g., a single character like 'A'), <cross_product> is 'Y' to permit combination with opposite affixes or 'N' to restrict it, and <number> indicates the count of following rules. Each subsequent rule line follows: PFX <flag> <stripping> <affix> [<condition> [<morphological_fields>]]. Here, <stripping> specifies characters removed from the stem's beginning (0 for none), <affix> is the prefix added (0 for none), <condition> is a regex-like pattern (e.g., . for any character or [^y] for not ending in 'y'), and optional <morphological_fields> provide additional data like part-of-speech tags. For example, a rule PFX A Y 1 followed by PFX A 0 re . adds the prefix "re-" to any stem, enabling forms like "rework" from "work". Suffix rules mirror this structure but apply to the end: SFX <flag> <stripping> <affix> [<condition> [<morphological_fields>]], such as SFX B Y 2 with lines SFX B 0 ed [^y] and SFX B y ied y to generate "worked" or "tried" from stems ending appropriately.[20][21]
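To make the strip/add/condition mechanics concrete, here is a small Python sketch that applies the two SFX B rules from the example above. The tuple layout and the function name apply_suffix_rules are hypothetical, and Python regular expressions are used as an approximation of Hunspell's condition syntax.

```python
import re

# (strip, add, condition) triples mirroring the SFX B example in the text:
#   SFX B 0 ed [^y]    and    SFX B y ied y
SFX_B = [("0", "ed", "[^y]"), ("y", "ied", "y")]

def apply_suffix_rules(stem, rules):
    """Return the forms produced by every rule whose condition matches the stem's end."""
    forms = []
    for strip, add, condition in rules:
        # The condition is tested against the END of the stem for suffixes.
        if not re.search(f"(?:{condition})$", stem):
            continue
        # "0" means "strip nothing" / "add nothing".
        base = stem if strip == "0" else stem[: len(stem) - len(strip)]
        forms.append(base + ("" if add == "0" else add))
    return forms

print(apply_suffix_rules("work", SFX_B))
print(apply_suffix_rules("try", SFX_B))
```

Checking a word runs the same rules in reverse: remove the added characters, restore the stripped ones, and look up the result in the dictionary.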
Flags serve as identifiers for affix classes and support multiple formats for flexibility: single 8-bit characters by default, UTF-8 characters for international scripts, two-character "long" flags, or numeric values up to 65,000 via the FLAG num directive. The cross-product mechanism (Y/N) facilitates generation of combinations, such as applying both prefixes and suffixes to a stem, while continuation flags (e.g., /Y in the affix field) allow a further affix to attach to an already affixed form. For compound words, specific flags and options enhance validation: COMPOUNDFLAG marks allowable compound components, COMPOUNDBEGIN, COMPOUNDMIDDLE, and COMPOUNDEND restrict positions in sequences, COMPOUNDPERMITFLAG allows affixes inside compounds, and COMPOUNDFORBIDFLAG prohibits them. Additionally, COMPOUNDMIN sets the minimum length for compound parts (default 3), and CHECKCOMPOUNDREP forbids compounds that could instead be a single word misspelled according to the REP table.[8][20]
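The flag formats can be sketched as a small Python helper that splits the flag field of a dictionary entry. The function name split_flags and the mode labels are hypothetical; the sketch assumes that long flags are exactly two characters and that numeric flags are comma-separated decimals.

```python
def split_flags(flag_field, mode="char"):
    """Split the flag part of a .dic entry according to the declared flag format."""
    if mode in ("char", "utf-8"):
        # One character per flag (a Python str is already decoded text).
        return list(flag_field)
    if mode == "long":
        # Two characters per flag.
        return [flag_field[i:i + 2] for i in range(0, len(flag_field), 2)]
    if mode == "num":
        # Comma-separated decimal numbers.
        return [int(n) for n in flag_field.split(",")]
    raise ValueError(f"unknown flag mode: {mode}")

print(split_flags("AB"))
print(split_flags("AaBb", "long"))
print(split_flags("1001,65000", "num"))
```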
Advanced syntax elements include COMPLEXPREFIXES, which switches to twofold prefix stripping (with single suffix stripping) for languages with right-to-left writing systems, and the CIRCUMFIX flag, which marks affixes that are valid only when a matching prefix and suffix appear on the word together. Compound validation extends via COMPOUNDRULE for pattern-based checks using regex-like expressions with flags, and options like CHECKCOMPOUNDCASE to enforce case consistency at boundaries or CHECKCOMPOUNDDUP to forbid repetitions. Language-specific adaptations, such as COMPOUNDSYLLABLE for syllable-based limits in Hungarian, integrate with these rules. Limitations include a default single-pass stripping per affix type (extendable via flags), stripping that cannot consume a whole word unless FULLSTRIP is set, and up to 65,000 classes for performance, though UTF-8 flags may underperform on certain architectures like ARM. These features collectively provide morphological flexibility while maintaining computational efficiency.[8][21][20]
Algorithm Overview
Hunspell's spell checking algorithm begins with tokenization of input text, where words are identified using predefined break characters such as hyphens and apostrophes to delineate boundaries, ensuring accurate segmentation even in languages with complex punctuation.[22] Following tokenization, the system applies normalization to handle variations in case, encoding, and character representations, converting inputs to a canonical form compatible with the dictionary, such as UTF-8 or ISO8859-1, through optional input/output conversion tables.[22] The core validation step involves affix stripping, where prefixes and suffixes are iteratively removed according to rules defined in the affix file (with up to twofold suffix stripping for agglutinative languages) to match the remaining stem against the dictionary; if a match is found, affixes are regenerated to confirm the original word's validity.[21][22]
For languages featuring compound words, Hunspell employs a recursive breakdown mechanism that decomposes potential compounds into subwords using compound flags to mark eligible dictionary entries, while enforcing minimum and maximum length rules as well as checks for duplicates and case sensitivity to prevent invalid formations.[22] This process utilizes hash tables for dictionary lookups, enabling average O(1) time complexity for stem matching and efficient handling of large lexicons with minimal memory overhead through techniques like alias compression.[21][22]
Error tolerance in suggestion generation relies on a Levenshtein-like edit distance calculation, limited to a small number of operations such as insertions, deletions, substitutions, and swaps (typically capped at two changes) to identify plausible corrections, supplemented by replacement tables for common phonetic or typographical errors.[21][22] The morphological analysis pipeline extends beyond simple stemming by first reducing words to their base forms via affix rules and then enumerating possible paradigms, including part-of-speech tags and inflectional details, when full analysis is requested through library functions like analyze.[21]
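The edit-based error tolerance described above can be sketched in Python as follows. This is a simplified illustration limited to a single edit over a lowercase ASCII alphabet, with replacement-table hits listed first; the names near_misses and suggest and the sample word set are hypothetical, and the ordering is an assumption rather than Hunspell's actual ranking.

```python
import string

def near_misses(word, alphabet=string.ascii_lowercase):
    """Return all strings exactly one edit (delete, swap, replace, insert) from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return (deletes | swaps | replaces | inserts) - {word}

def suggest(word, dictionary, rep_table=()):
    """Replacement-table hits first, then single-edit candidates found in the dictionary."""
    suggestions = []
    for wrong, right in rep_table:
        candidate = word.replace(wrong, right)
        if wrong in word and candidate in dictionary and candidate not in suggestions:
            suggestions.append(candidate)
    # Sorted only to make the output deterministic in this sketch.
    for candidate in sorted(near_misses(word) & set(dictionary)):
        if candidate not in suggestions:
            suggestions.append(candidate)
    return suggestions

WORDS = {"the", "ten", "tea", "example"}
print(suggest("teh", WORDS, rep_table=[("teh", "the")]))
print(suggest("exampel", WORDS))
```

Generating candidates and filtering them against a hashed word set keeps each lookup cheap, which is why the number of allowed edits is the main cost driver.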
Applications and Usage
Integrated Software
Hunspell serves as the default spell-checking engine in several prominent open-source office suites. LibreOffice and Apache OpenOffice have integrated Hunspell since 2006, replacing the earlier MySpell component in OpenOffice.org version 2.0.2, with support for custom dictionaries that can be embedded directly into documents for personalized spell-checking needs.[23][1]
In web and email applications, Hunspell powers inline spell checking starting from Mozilla Firefox version 3 (2008) and Thunderbird version 3 (2009), enabling real-time correction of text in web forms, composition windows, and other editable fields.[24][1][25] Google Chrome incorporates Hunspell for form-based and page-level spell checking, utilizing optimized binary dictionary files (.bdic) derived from standard Hunspell affix (.aff) and dictionary (.dic) formats to handle multilingual input efficiently.[26][1]
Beyond these core applications, Hunspell finds use in various other environments. On macOS, it has been available since version 10.6 and can be installed via Homebrew for integration into tools like text editors.[1][27] Ports exist for Android, allowing embedding in mobile apps through JNI wrappers for on-device spell checking.[1][28] Proprietary software such as SDL Trados Studio employs Hunspell as its primary spell checker, supporting custom and language-specific dictionaries for translation workflows.[29][1]
Overall, Hunspell's adoption extends to over 100 languages, facilitated by community-maintained dictionaries distributed through repositories like those for LibreOffice and Mozilla add-ons, ensuring broad accessibility across diverse linguistic contexts.[30][1]
Command-Line Interface
Hunspell provides a standalone command-line interface for performing spell checking, morphological analysis, and related tasks on text files or standard input. The tool is invoked using the hunspell executable, which supports batch processing of files and interactive editing sessions. It is designed to be compatible with Ispell's interface, allowing seamless integration into scripts and text processing pipelines.[31][2]
The basic syntax for checking a file with a specified dictionary is hunspell -d <dictionary> <file>, where <dictionary> refers to the base name of the dictionary files (e.g., en_US for the American English dictionary, assuming .dic and .aff files are available). Without a file argument, Hunspell reads from standard input. For example, hunspell -d en_US textfile.txt processes the specified text file using the English dictionary and enters interactive mode by default if errors are found. Several dictionaries can be combined in a comma-separated list so that additional dictionaries extend a base one, such as hunspell -d en_US,en_med medical.txt to include medical terminology. The tool respects locale environment variables like LANG or LC_ALL to select default dictionaries if none are specified.[31][2]
Key options control input handling, output verbosity, and processing modes. The -l flag lists only misspelled words, one per line, making it suitable for piping into other tools: hunspell -d en_US -l textfile.txt. For pipe mode, -a enables reading from standard input and outputs a formatted stream with indicators like * for correct words, & for misspelled words followed by suggestion counts and alternatives (e.g., & exsample 4 0: example, examples, sampler, sample), - for compounds, and # for words with no suggestions. The -s option stems words to their root forms, while -m performs morphological analysis, outputting details like part-of-speech tags. Input encoding can be set with -i <encoding>, and special formats like HTML (-H), TeX (-t), or nroff (-n) are supported. Personal dictionaries for user-specific additions are managed via -p <path>, defaulting to $HOME/.hunspell_<dictionary>. URLs, email addresses, and directory paths are skipped by default; the --check-url flag makes Hunspell check them as well.[31][2]
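The pipe-mode stream described above is easy to consume from a script. The following Python sketch parses individual result lines using only the markers listed in the text; the function name parse_pipe_line and the returned dictionary keys are hypothetical, and any line with an unrecognized marker is passed through untouched.

```python
def parse_pipe_line(line):
    """Parse one result line of Ispell-style pipe output into a dict (None if blank)."""
    line = line.rstrip("\n")
    if not line:
        return None  # blank lines separate input lines in the stream
    marker, _, rest = line.partition(" ")
    if marker == "*":
        return {"status": "ok"}
    if marker == "-":
        return {"status": "compound"}
    if marker == "&":
        # Format: & word count offset: sug1, sug2, ...
        head, _, tail = rest.partition(":")
        word, _count, offset = head.split()
        return {"status": "miss", "word": word, "offset": int(offset),
                "suggestions": [s.strip() for s in tail.split(",")]}
    if marker == "#":
        # Format: # word offset   (no suggestions available)
        word, offset = rest.split()
        return {"status": "miss", "word": word, "offset": int(offset),
                "suggestions": []}
    return {"status": "other", "raw": line}

result = parse_pipe_line("& exsample 4 0: example, examples, sampler, sample")
print(result)
```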
In interactive mode, Hunspell prompts for each misspelled word, offering suggestions and commands for correction. Users can pick a suggestion by typing its number, type a replacement of their own (R), accept the word for the rest of the session (A), accept it and insert it into the personal dictionary (I), or quit (Q). This mode facilitates on-the-fly editing, with changes applied to the input file if writable. For batch scripts, output can be redirected; for instance, a simple error-checking script might use hunspell -d en_US -l < input.txt > errors.txt to isolate issues for review. Integration with text processors is common, such as calling the tool through Ispell-compatible wrappers or embedding it in Makefiles for document validation. The tool's Ispell compatibility ensures outputs align with legacy workflows, including suggestion formats like & word N offset: sug1, sug2.[31][2]
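The batch pattern above can also be driven from Python. The sketch below wraps the same hunspell -d <dictionary> -l invocation with the standard subprocess module; the helper names are hypothetical, and it assumes a hunspell executable on the PATH with the named dictionary installed.

```python
import subprocess

def build_list_command(dictionary):
    """Build the argument list for `hunspell -d <dictionary> -l`."""
    return ["hunspell", "-d", dictionary, "-l"]

def list_misspellings(text, dictionary="en_US"):
    """Send `text` to hunspell on stdin and return the misspelled words it lists.

    Raises FileNotFoundError if hunspell is not installed, and
    subprocess.CalledProcessError if it exits with an error.
    """
    result = subprocess.run(
        build_list_command(dictionary),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    # -l prints one misspelled word per line.
    return [word for word in result.stdout.splitlines() if word]
```

A call such as list_misspellings("This is an exsample sentence.") would then return whatever words the installed dictionary rejects; the exact result depends on the dictionary in use.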
Dictionary Management
Hunspell dictionaries are created by compiling word lists into paired .dic and .aff files, which define the vocabulary and morphological rules respectively. The .dic file contains a header specifying the number of words followed by the word list, while the .aff file outlines affixation rules and flags; these can be generated manually using text editors or through specialized tools such as affixcompress for compressing affix data and wordforms for generating inflected forms from base words. For custom languages, users start with a basic word list sourced from corpora or existing resources, then iteratively refine the affix rules to handle derivations and compounds specific to the language's morphology.[2][20]
Over 100 language-specific Hunspell dictionaries are available, often distributed through LibreOffice extensions or the Hunspell project on SourceForge, supporting diverse scripts and features like full morphological analysis for agglutinative languages such as Hungarian, which includes complex compounding rules. These pre-built dictionaries can be extended by users for dialects or specialized terminologies, ensuring compatibility with applications like LibreOffice by placing the files in designated directories.[32][3]
Personal and temporary dictionaries allow runtime customization without altering core files; the command-line tool supports adding words via the -p flag, specifying a user-defined .dic file for session-specific additions, while persistent personal dictionaries are stored as simple word lists in user home directories for ongoing use across sessions.[20]
Management tools facilitate conversion and validation: Aspell dictionaries can be converted to Hunspell format by unzipping .cwl files to word lists, applying phonetic transformations if needed, and pairing with adapted affix rules. For rule consistency, the hunspell -m option performs morphological analysis on sample texts to verify dictionary integrity, and build-time make check tests ensure affix rules align with word entries during compilation.[20][2]
Best practices emphasize encoding verification to prevent mismatches—preferring UTF-8 for broad compatibility—and rigorous testing by running the command-line tool against representative sample texts from the target language to identify gaps in coverage or erroneous suggestions before deployment.[20][2]
Licensing and Availability
License Terms
Hunspell is distributed under a tri-license comprising the GNU Lesser General Public License version 2.1 (LGPL-2.1) or later for the library, the GNU General Public License version 2.0 (GPL-2.0) for the executable, and the Mozilla Public License version 1.1 (MPL-1.1) to enable file-level licensing choices.[2] This structure allows users to select the most appropriate license based on their project's needs, promoting flexibility in integration.[1]
Under the LGPL-2.1, the library can be dynamically linked into proprietary software without requiring the disclosure of the entire application's source code, provided that the library itself remains modifiable and its source is made available.[33] In contrast, the GPL-2.0 applies to the standalone executable, mandating that any derivative works or distributions include full source code availability to ensure copyleft compliance.[34] The MPL-1.1 facilitates per-file relicensing, allowing modified files to be dual-licensed under compatible terms while preserving the original file's open-source status.[35] These copyleft requirements mean that GPL-covered derivatives must offer source code, whereas LGPL permits proprietary linking via dynamic libraries without broader disclosure obligations.
The tri-license was adopted in 2006 to expand adoption beyond a GPL-only model, specifically to facilitate inclusion in projects like Mozilla products that required more permissive terms for proprietary components.[36] This change broadened Hunspell's usability in diverse ecosystems. For compliance examples, the LGPL provisions have enabled safe integration into closed-source applications such as Google Chrome, where the spell-checking library is dynamically linked without triggering full source release.
Distribution and Ports
Hunspell is primarily distributed through its official GitHub repository at hunspell/hunspell, where developers can access the source code, contribute, and follow development updates.[2] Pre-compiled binaries and archives are available via SourceForge, providing stable releases for download since the project's inception.[3] For ease of installation on various platforms, Hunspell is packaged in popular repository managers, including Homebrew for macOS (installable via brew install hunspell) and apt for Debian-based Linux distributions like Ubuntu.[27][6]
Pre-built binaries allow deployment without compilation. On Windows, users can obtain binaries through package managers such as Chocolatey (version 1.7.0 portable) or winget (via winget install FSFhu.Hunspell).[37][38] As of 2025, the Debian and Ubuntu repositories carry package version 1.7.2+really1.7.2-11.[6]
Hunspell has been ported to several programming languages and frameworks to enable integration in diverse environments. The C# port NHunspell provides spell-checking capabilities for .NET applications, with the latest stable version at 1.2.5554.[39] In Python, pyhunspell offers bindings to the Hunspell engine, allowing dictionary loading and word suggestions, though its last major update was in 2018.[40] For R, the hunspell package (version 3.0.6 as of March 2025) delivers high-performance stemming, tokenization, and spell-checking functionalities.[41] Additionally, the .NET port WeCantSpell.Hunspell (version 6.0.3, updated September 2025) is a fully managed implementation without unmanaged dependencies, supporting concurrent queries and competitive performance on modern .NET frameworks.[42]
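As an illustration of the Python bindings, the following minimal sketch loads a dictionary with pyhunspell and collects suggestions for unrecognized words. The helper names and the /usr/share/hunspell directory are assumptions for the example (dictionary locations vary by platform), and the hunspell module is imported lazily so that the path helper can be used even where the binding is not installed.

```python
def dictionary_files(lang, directory="/usr/share/hunspell"):
    """Return the (.dic, .aff) file pair for a language code such as 'en_US'."""
    # The directory is an assumed default; adjust it to the local installation.
    return (f"{directory}/{lang}.dic", f"{directory}/{lang}.aff")


def check_words(words, lang="en_US"):
    """Map each word to [] if accepted, or to its list of suggestions otherwise."""
    import hunspell  # pyhunspell binding; requires the Hunspell library

    dic_path, aff_path = dictionary_files(lang)
    # pyhunspell's constructor takes the .dic path first, then the .aff path.
    checker = hunspell.HunSpell(dic_path, aff_path)
    return {w: ([] if checker.spell(w) else checker.suggest(w)) for w in words}
```

Keeping one HunSpell object alive for many lookups, as check_words does, avoids reloading the dictionary for every word.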
Dictionary bundles for Hunspell support over 100 languages and are often included with applications like LibreOffice, where they enable multilingual spell-checking out of the box.[43] Separate downloads are available through LibreOffice's dictionary repository or community collections, covering spelling, hyphenation, and thesaurus data for languages ranging from major ones like English and Spanish to less common variants.[44][45]
For custom builds, Hunspell uses an autotools-based system (autoconf, automake, libtool) that supports cross-platform compilation on GNU/Linux, Unix-like systems, macOS, and Windows via MinGW or Cygwin.[2] The process consists of running autoreconf -vfi, ./configure (with options such as --with-ui for enhanced features), make, and make install, in that order. The official distribution does not include native CMake support.[2]
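The build sequence above can be scripted. The sketch below is a hypothetical helper, not part of the Hunspell distribution: build_commands and run_build are illustrative names, the source tree is assumed to be already checked out, and the default dry-run mode only prints the steps so they can be reviewed before anything is executed.

```python
import subprocess


def build_commands(configure_options=()):
    """Return the autotools build steps as argv lists, in execution order."""
    return [
        ["autoreconf", "-vfi"],
        ["./configure", *configure_options],
        ["make"],
        ["make", "install"],
    ]


def run_build(source_dir, configure_options=(), dry_run=True):
    """Print each step; execute it inside source_dir only when dry_run is False."""
    for argv in build_commands(configure_options):
        print(" ".join(argv))
        if not dry_run:
            # check=True aborts the sequence at the first failing step.
            subprocess.run(argv, cwd=source_dir, check=True)
```

For example, run_build("hunspell", ["--with-ui"]) prints the four commands without running them. The final make install step typically needs write access to the installation prefix.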