Hunspell

Hunspell is a free and open-source spell-checking library and command-line tool designed primarily for languages with rich morphology, complex word compounding, and diverse character encodings. It originated as an enhanced successor to MySpell, the spell checker used in early versions of OpenOffice.org, and was developed to address limitations in handling agglutinative languages like Hungarian, Finnish, and Turkish.
Key improvements include support for Unicode, advanced morphological analysis, stemming, and word generation, enabling more accurate spell-checking and suggestions through n-gram similarity matching and pronunciation-based corrections.
Hunspell is licensed under a tri-license of the GNU General Public License (GPL), GNU Lesser General Public License (LGPL), and Mozilla Public License (MPL), allowing flexible integration into both open-source and proprietary applications.
It maintains backward compatibility with MySpell dictionaries while supporting affix files for rule-based morphology, making it adaptable for over 100 languages through community-contributed dictionaries.
Widely adopted in major software, Hunspell powers spell-checking in LibreOffice, OpenOffice.org, Mozilla Firefox, Thunderbird, Google Chrome, macOS (since version 10.6), InDesign, Opera, and translation tools like SDL Trados and memoQ.
The library is implemented in C++ and offers bindings for numerous programming languages and interfaces, including Java, Perl, Python, Ruby, .NET, and R, facilitating its use in diverse environments from desktop applications to embedded systems.
Development of Hunspell has been led by László Németh since its inception around 2005, with sponsorship from organizations such as the FSF.hu Foundation, IMEDIA, Budapest University of Technology and Economics, and OpenTaal Foundation.
Ongoing maintenance occurs through the GitHub repository, where contributions focus on improving performance, adding dictionary support, and integrating with translation platforms like Weblate for collaborative localization.

Introduction

Overview

Hunspell is a free, open-source spell checker and morphological analyzer library, accompanied by a command-line tool, designed primarily to handle languages featuring rich morphology, complex compounding, and challenging character encodings. It excels in processing agglutinative languages such as Hungarian and Turkish, as well as compound-heavy languages like German, where traditional spell checkers often struggle with inflectional variations and word formation rules. The library provides robust support for morphological analysis, enabling stemming, generation, and detailed word breakdown, while its spell-checking capabilities include suggestions tailored to linguistic complexities.

Released under a tri-license of LGPL, GPL, and MPL, Hunspell ensures broad compatibility and adoption in open-source ecosystems. Its current stable version, 1.7.2, was issued on December 29, 2022, with ongoing maintenance and minor updates continuing into 2025 to address compatibility and performance needs across distributions. Evolving from earlier tools like MySpell, it maintains backward compatibility with existing dictionaries while introducing enhancements for modern requirements.

A key strength of Hunspell lies in its Unicode support, accommodating the first 65,535 Unicode characters for affix rules and enabling handling of diverse scripts and encodings beyond basic 8-bit limitations. This feature, combined with its morphological tools, positions it as a versatile solution for multilingual environments, powering spell checking in applications such as LibreOffice and Mozilla Firefox.

Design Principles

Hunspell was designed to address the limitations of earlier spell checkers in handling languages with rich morphology and complex compounding, such as agglutinative languages like Hungarian or those with intricate affixation like German. Its core innovation lies in advanced affix handling, including support for homonyms, circumfixes, fogemorphemes, and zero morphemes, which enable morphological generation and analysis far beyond basic dictionary lookups. Additionally, it incorporates twofold affix stripping to efficiently manage multiple layers of suffixes and prefixes, reducing the number of rules needed for complex word formations. Compounding rules allow recognition of arbitrarily long compounds and affixation within them, ensuring accurate spell checking for word-level writing systems.

A key design principle is backward compatibility with Ispell and MySpell formats, facilitating migration of existing dictionaries and minimizing adoption barriers for users transitioning from those systems. This compatibility is complemented by innovations like alias compression for affix rules, which optimizes storage without sacrificing functionality. Hunspell emphasizes high customizability through configurable algorithms, word-part tables, and over 65,000 affix classes, allowing tailored implementations for diverse linguistic needs. Efficiency in suggestion generation and overall performance is prioritized via optimizations for large vocabularies and quick processing, making it suitable for real-time applications.

As a library, it provides C++ and C APIs, shared library support, and bindings for multiple languages, promoting integration into varied software environments from desktop applications to web browsers. To broaden adoption, Hunspell employs a tri-license model under the Mozilla Public License (MPL), GNU Lesser General Public License (LGPL), and GNU General Public License (GPL), accommodating both open-source and proprietary uses.

History and Development

Origins from MySpell

Hunspell's development began around 2005 under the leadership of László Németh, a Hungarian developer, as a reimplementation and extension of MySpell to address shortcomings in spell checking for morphologically rich languages. MySpell itself was a C++ port of Ispell, originally created for integration into OpenOffice.org to provide efficient affix compression and dictionary handling. A key motivation for Hunspell stemmed from MySpell's limitations, particularly its inadequate support for complex morphological rules and word compounding, which proved insufficient for languages like Hungarian that rely heavily on affixation and compound formation. To overcome these issues, Hunspell incorporated enhanced affix-based mechanisms, enabling more accurate analysis and generation of word forms while maintaining backward compatibility with MySpell dictionaries.

The library saw its initial integration into OpenOffice.org with version 2.0.2, released in February 2006, where it fully replaced MySpell as the default spell checker. This adoption marked a significant step in improving multilingual support within the suite. Early efforts in Hunspell's creation emphasized support for the Hungarian language, driven by Németh's background and sponsorship from Hungarian organizations such as the FSF.hu Foundation and the Budapest Technical University's Media Research Centre.

Key Milestones

Hunspell's development gained momentum following its transition from MySpell, with early adoptions by major open-source projects. In 2006, Hunspell was adopted as the default spell checker in OpenOffice.org 2.0.2, replacing MySpell and enabling enhanced support for complex morphologies in office productivity applications. By 2008, Hunspell saw broader adoption in web technologies through its integration into Mozilla Firefox 3 and Thunderbird, providing inline spell checking for email and browsing with improved handling of agglutinative languages.

The project then advanced through the 1.2 to 1.3 version series, which broadened compatibility and refined the rules for complex word formation. A notable enhancement came in 2016 with the release of version 1.6.0, which optimized suggestion algorithms for faster performance, reducing generation times through improved n-gram matching and limiting overgeneration in compound words. In 2022, version 1.7.2 was released, incorporating the SPELLML XML interface to enable runtime dictionary extensions and custom affix rules without recompilation, making dictionaries easier to extend in dynamic environments.

From 2023 to 2025, Hunspell underwent ongoing maintenance with fixes and compatibility updates, including an R binding via the hunspell package version 3.0.7, which extended its utility for statistical computing and text analysis. The project's GitHub repository maintained active development, accumulating over 265 open issues as of late 2025. Throughout its evolution, Hunspell solidified its role as a default spell checker for browsers and office suites.

Features

Spell Checking Capabilities

Hunspell's core spell checking functionality relies on dictionary lookup, where words are verified against a base dictionary file (.dic) containing valid word forms, augmented by affix rules (.aff) that enable stripping and reapplication of prefixes and suffixes to generate inflected or derived forms. This twofold processing allows efficient handling of morphological variations without enumerating every possible word in the dictionary, making it suitable for languages with rich inflection like Hungarian or Turkish. The library supports Unicode text via UTF-8, enabling spell checking of multilingual text and characters beyond basic ASCII, while also accommodating legacy 8-bit encodings such as ISO-8859-1; the encoding is declared with the SET directive in the affix file.

For complex compounding, Hunspell employs recursive breaking and rule-based validation, supporting arbitrary-length compounds in compound-heavy languages via options such as COMPOUNDFLAG and COMPOUNDRULE, which define allowable combinations and prevent overgeneration. Additional options include the IGNORE directive, which excludes specified characters (such as optional diacritics) from checking, and support for personal dictionaries that allow users to add custom words with optional affixation. Case handling is configurable, with features like KEEPCASE to restrict uppercase forms and CHECKCOMPOUNDCASE to enforce proper casing at compound word boundaries, accommodating language-specific rules such as German ß or Turkish dotted i.

Hyphenated words are handled with the BREAK option, which defines the characters at which a word may be split so that each part can be checked on its own; pattern-based hyphenation itself is provided by a separate library. Advanced morphological parsing for stem identification is likewise handled separately.

Morphological Analysis

Hunspell's morphological analyzer decomposes input words into their base stems and associated affixes by applying rules defined in dictionary and affix files, enabling the processing of both inflectional morphology (such as tense, number, or case endings) and derivational morphology, including prefixes and suffixes that alter word class or meaning. This rule-based approach allows for precise linguistic breakdown, as seen in the analysis of "drinkable," which yields the stem "drink" with the derivational field "ds:able" and a part-of-speech tag in the "po:" field.

In generation mode, Hunspell constructs inflected or derived word forms from a given stem by applying specified affix rules, facilitating applications such as grammar checking where correct forms must be verified against expected paradigms. For instance, starting from the stem "foot," it can generate the plural "feet" using an inflectional rule flagged with "is:plural," ensuring compatibility with syntactic requirements in downstream processing.

The system supports over 65,000 affix classes per dictionary, organized via flags that permit complex combinations of prefixes and suffixes, which is essential for handling the rich, agglutinative morphologies of languages like Turkish and Estonian. This capacity enables twofold affix stripping, in which two layers of suffixes are removed in sequence, to parse highly inflected words without performance degradation. Output from the analyzer includes part-of-speech tags (the "po:" field), lemma extraction via the stem field (e.g., "st:foot" for irregular forms like "feet"), and full generation that enumerates all possible inflections for a given stem. These results are delivered as space- or tab-separated fields, supporting integration into text-processing pipelines.

Unlike simple stemming algorithms, which provide only approximate root forms through suffix removal or statistical methods, Hunspell's morphological analysis delivers a complete, rule-driven breakdown with explicit stem and feature annotations, preserving linguistic accuracy for morphologically complex languages. This depth enhances its utility beyond basic checking by enabling detailed error diagnosis in inflected forms.
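The field-based output described above lends itself to simple post-processing. The following is a minimal sketch in Python, assuming each analysis is a whitespace-separated run of fields with a two-letter tag, a colon, and a value; the function name parse_analysis is hypothetical, and the sample string is hand-written from the "feet" example rather than captured from a real Hunspell run.

```python
from collections import defaultdict


def parse_analysis(line):
    """Group the 'xx:value' fields of one analysis line by their two-letter tag."""
    fields = defaultdict(list)
    for token in line.split():          # fields may be space- or tab-separated
        tag, sep, value = token.partition(":")
        if sep and len(tag) == 2:       # skip anything that is not a tagged field
            fields[tag].append(value)
    return dict(fields)


# Hand-written sample following the "feet" example in the text (an assumption,
# not real analyzer output).
sample = " st:foot is:plural"
print(parse_analysis(sample))
```

Keeping every tag as a list allows a tag to occur more than once in a single analysis, which can happen when several affixes each contribute a field.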

Suggestion Algorithms

Hunspell employs a multi-stage approach to generate spelling suggestions for misspelled words, prioritizing efficiency and accuracy through targeted error correction strategies. The process begins with near-miss techniques that simulate common typing errors, such as swaps of adjacent letters, deletions, insertions, and replacements based on the keyboard proximity defined by the affix file's KEY option. These edits are generated systematically and checked against the dictionary to identify valid words, with additional support for character movements and double swaps to capture more complex mistakes. Rule-based replacements further enhance this stage via REP tables in the affix file, which map frequent misspellings to their corrections, allowing customization for language-specific or user-defined errors.

If near-miss edits yield insufficient results, Hunspell advances to n-gram-based similarity matching, where it computes overlaps between the misspelled word and dictionary entries; the MAXNGRAMSUGS option limits the number of such candidates, and a value of 0 disables n-gram suggestions. Phonetic encoding provides an additional layer for handling pronunciation-based errors, utilizing a table-driven transcription borrowed from Aspell via the PHONE directive in the affix file; this maps characters to phonetic equivalents, enabling suggestions for languages with irregular orthography or for noisy input. For languages like English, this can approximate algorithms such as Double Metaphone, though Hunspell's implementation relies on customizable PHONE tables for broader applicability.

Suggestions are ranked primarily by the order of the generation stages: REP replacements receive highest priority, followed by near-miss edits, n-gram similarities (weighted by overlap length), and phonetic matches. Compound word support allows word-part suggestions, breaking potential compounds and applying edits to segments, with limits like MAXCPDSUGS to prevent excessive computation. Language-specific behavior can be selected with the LANG option. Performance optimizations enable real-time use in applications like text editors, including caps on the number of suggestions and early termination once sufficient high-quality candidates are found, reducing computational overhead in large dictionaries. Integration with morphological analysis allows stem-level suggestions, where corrections align with valid affixations for inflected languages.
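The first two stages can be illustrated with a minimal Python sketch. It is not Hunspell's implementation: it assumes a plain set of words as the dictionary, a lowercase Latin alphabet, and a crude shared-bigram count as the similarity measure, and it leaves out REP tables, KEY-based proximity, and phonetic matching. All function names and the max_ngram parameter are hypothetical.

```python
def near_miss_candidates(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one adjacent swap, deletion, insertion or replacement away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    deletes = {a + b[1:] for a, b in splits if b}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    return (swaps | deletes | inserts | replaces) - {word}


def ngram_score(a, b, n=2):
    """Count the n-grams of a that also occur somewhere in b."""
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    return sum(a[i:i + n] in grams_b for i in range(len(a) - n + 1))


def suggest(word, dictionary, max_ngram=4):
    """Stage 1: near-miss edits; stage 2: n-gram ranking if stage 1 finds nothing."""
    hits = sorted(near_miss_candidates(word) & dictionary)
    if hits:
        return hits
    ranked = sorted(dictionary, key=lambda w: (-ngram_score(word, w), w))
    return ranked[:max_ngram]


words = {"example", "sample", "simple", "the"}
print(suggest("exmaple", words))                  # found by an adjacent swap
print(suggest("exammmple", words, max_ngram=1))   # falls back to n-gram ranking
```

Running the cheap edit stage first and ranking the whole dictionary only as a fallback mirrors the staged ordering described above; a real implementation would also bound the cost of the fallback on large dictionaries.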

Technical Implementation

Dictionary Format

Hunspell employs a dual-file format for its dictionaries, consisting of a main dictionary file with the extension .dic and an accompanying affix file with the extension .aff. The .dic file serves as the primary list of words, while the .aff file provides the affix rules and configuration needed to process those words, enabling morphological analysis and spell checking.

The .dic file is structured as a list of words, one per line, beginning with an approximate word count on the first line to optimize memory allocation for efficient lookup. Each entry typically consists of a base word followed by optional numeric or character flags separated by a slash, which indicate the applicability of specific rules defined in the .aff file; for example, work/AB denotes that the word "work" can be modified by affixes associated with flags A and B. Slashes within words themselves are escaped using a backslash (e.g., word\/), and the format scales to the large word lists used in practical spell-checking.

Encoding for multilingual support is declared in the .aff file using the SET directive, such as SET UTF-8 for Unicode compatibility or SET ISO8859-1 for legacy 8-bit encodings, ensuring proper handling of characters across various languages including those with diacritics or non-Latin scripts. This declaration applies to both the .aff and associated .dic files, facilitating international dictionary development.

Compound word permissions are managed through flags in the .dic file, which reference rules in the .aff file, such as the COMPOUNDRULE option that defines valid combinations using regex-like patterns over flags. These flags enable flexible construction of compound forms without enumerating every possibility in the dictionary. Extension mechanisms include personal word lists, which can be appended directly to a .dic file as additional plaintext entries with optional flags, allowing users to customize for specific needs like adding domain-specific terms (e.g., specialterm/C). Such additions override or supplement the base dictionary during runtime, supporting user-specific adaptations without altering core files.
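The .dic layout described above (a count line, then word/flags entries with backslash-escaped slashes) can be read with a few lines of Python. This is a minimal sketch under stated assumptions: entries carry no morphological fields after the flags, the text has already been decoded with the encoding named in the .aff file, and the function name parse_dic is hypothetical.

```python
import re


def parse_dic(text):
    """Return (declared_count, {word: flags}) from .dic-style text."""
    lines = text.splitlines()
    declared = int(lines[0].split()[0])       # first line: approximate word count
    entries = {}
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue
        # Split on the first slash that is NOT preceded by a backslash.
        parts = re.split(r"(?<!\\)/", line, maxsplit=1)
        word = parts[0].replace("\\/", "/")   # unescape slashes inside the word
        flags = parts[1] if len(parts) > 1 else ""
        entries[word] = flags
    return declared, entries


sample = "3\nwork/AB\nhello\nkm\\/h\n"        # the last entry is the word "km/h"
print(parse_dic(sample))
```

The declared count is returned rather than enforced, since the text describes it as approximate and intended only as a sizing hint.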

Affix Rules

Hunspell employs affix rules to manage morphological inflections and derivations, enabling the spell checker to recognize and generate word forms from base stems through prefixes and suffixes. These rules are specified in the affix file (typically with a .aff extension) and support complex morphologies, such as those of agglutinative languages like Hungarian or Turkish. The core affix types are prefixes (PFX) and suffixes (SFX), each defined with conditions that determine applicability to stems, allowing for efficient handling of derivations without exhaustively listing all variants in the dictionary.

The syntax for prefix rules begins with a header line: PFX <flag> <cross_product> <number>, where <flag> identifies the affix class (e.g., a single character like 'A'), <cross_product> is 'Y' to permit combination with affixes of the opposite type or 'N' to forbid it, and <number> indicates the count of following rules. Each subsequent rule line follows: PFX <flag> <stripping> <affix> [<condition> [<morphological_fields>]]. Here, <stripping> specifies characters removed from the stem's beginning (0 for none), <affix> is the prefix added (0 for none), <condition> is a regex-like pattern (e.g., . for any character or [^y] for any character other than 'y'), and optional <morphological_fields> provide additional data like part-of-speech tags. For example, the header PFX A Y 1 followed by the rule PFX A 0 re . adds the prefix "re" to any stem, enabling forms like "rework" from "work". Suffix rules mirror this structure but apply to the end of the stem: SFX <flag> <stripping> <affix> [<condition> [<morphological_fields>]], such as the header SFX B Y 2 with the rules SFX B 0 ed [^y] and SFX B y ied y, which generate "worked" from "work" and "tried" from "try".

Flags serve as identifiers for affix classes and support multiple formats for flexibility: single 8-bit characters by default, UTF-8 characters for international scripts, two-character "long" flags, or numeric values up to 65,000 via the FLAG num directive. The cross-product mechanism (Y/N) controls whether a prefix and a suffix may be applied to the same stem, while continuation flags appended to an affix (e.g., /Y in the affix field) enable chained affixation.

For compound words, specific flags and options enhance validation: COMPOUNDFLAG marks allowable compound components; COMPOUNDBEGIN, COMPOUNDMIDDLE, and COMPOUNDEND restrict positions within a compound; COMPOUNDPERMITFLAG allows affixes inside compounds; and COMPOUNDFORBIDFLAG prohibits them. Additionally, COMPOUNDMIN sets the minimum length for compound parts (default 3), and CHECKCOMPOUNDREP rejects compounds that could also be a non-compound word containing an error listed in the REP table. Advanced elements include COMPLEXPREFIXES, which enables twofold prefix stripping for languages with right-to-left writing systems, and the CIRCUMFIX flag, which marks affixes that must occur together as a prefix and suffix pair. Compound validation extends via COMPOUNDRULE for pattern-based checks using regex-like expressions over flags, and options like CHECKCOMPOUNDCASE to enforce case consistency at boundaries or CHECKCOMPOUNDDUP to forbid repetitions. Language-specific adaptations, such as COMPOUNDSYLLABLE for syllable-based limits in Hungarian, integrate with these rules.

Limitations include a default of single-pass stripping per affix type (extendable via continuation flags), stripping that must leave at least one character of the word unless FULLSTRIP is set, and an upper bound of roughly 65,000 classes; UTF-8 flags are also reported to be problematic on some architectures, such as ARM. These features collectively provide morphological flexibility while maintaining computational efficiency.
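The PFX and SFX examples above can be exercised with a minimal Python sketch that parses those rule lines and generates forms from a stem. It is an illustration, not Hunspell's algorithm: it assumes single-character flags, treats each condition as an ordinary regular expression, and ignores cross products, continuation flags, and morphological fields. The names parse_aff, apply_rule, and generate are hypothetical.

```python
import re

AFF = """\
PFX A Y 1
PFX A 0 re .
SFX B Y 2
SFX B 0 ed [^y]
SFX B y ied y
"""


def parse_aff(text):
    """Collect rules as {flag: [(kind, strip, add, condition), ...]}."""
    rules, remaining = {}, 0
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0] not in ("PFX", "SFX"):
            continue
        kind, flag = parts[0], parts[1]
        if remaining == 0:                      # header: flag, cross product, count
            remaining = int(parts[3])
            rules.setdefault(flag, [])
            continue
        strip = "" if parts[2] == "0" else parts[2]
        add = "" if parts[3] == "0" else parts[3]
        cond = parts[4] if len(parts) > 4 else "."
        rules[flag].append((kind, strip, add, cond))
        remaining -= 1
    return rules


def apply_rule(stem, rule):
    """Return the affixed form, or None if the rule does not apply to stem."""
    kind, strip, add, cond = rule
    if kind == "SFX":
        if not stem.endswith(strip) or not re.search("(?:" + cond + ")$", stem):
            return None
        return stem[:len(stem) - len(strip)] + add
    if not stem.startswith(strip) or not re.match(cond, stem):
        return None
    return add + stem[len(strip):]


def generate(stem, flags, rules):
    """The stem plus every form reachable by one rule of the given flags."""
    forms = {stem}
    for flag in flags:
        for rule in rules.get(flag, []):
            form = apply_rule(stem, rule)
            if form is not None:
                forms.add(form)
    return forms


RULES = parse_aff(AFF)
print(sorted(generate("work", "AB", RULES)))
print(sorted(generate("try", "B", RULES)))
```

Because cross products are left out, the sketch does not produce "reworked" even though both headers carry Y; supporting that would mean applying prefix rules to the suffixed forms as well.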

Algorithm Overview

Hunspell's spell checking algorithm begins with tokenization of input text, where words are identified using predefined break characters such as hyphens and apostrophes to delineate boundaries, ensuring accurate segmentation even in languages with complex punctuation. Following tokenization, the system applies normalization to handle variations in case, encoding, and character representations, converting inputs to a canonical form compatible with the dictionary, such as UTF-8 or ISO8859-1, through optional input/output conversion tables. The core validation step involves affix stripping, where prefixes and suffixes are iteratively removed according to rules defined in the affix file (with up to twofold suffix stripping for agglutinative languages) to match the remaining stem against the dictionary; if a match is found, affixes are regenerated to confirm the original word's validity.

For languages featuring compound words, Hunspell employs a recursive procedure that decomposes potential compounds into subwords using compound flags to mark eligible entries, while enforcing minimum and maximum length rules as well as duplicate checks to prevent invalid formations. This process uses hash tables for dictionary lookups, enabling average O(1) time for matching and efficient handling of large lexicons with minimal overhead through techniques like alias compression.

Error tolerance in suggestion generation relies on an edit-distance-style calculation, limited to a small number of operations such as insertions, deletions, substitutions, and swaps (typically capped at two changes) to identify plausible corrections, supplemented by replacement tables for common phonetic or typographical errors. The morphological analysis pipeline extends beyond simple spell checking by first reducing words to their base forms via affix rules and then enumerating possible paradigms, including part-of-speech tags and inflectional details, when full analysis is requested through library functions like analyze.
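The recursive compound decomposition can be sketched in a few lines of Python. This is a simplified illustration rather than Hunspell's procedure: it assumes that every entry in a plain set may appear in any compound position, uses a fixed minimum part length of 3 (the COMPOUNDMIN default mentioned earlier), and caps the number of parts with a hypothetical max_parts parameter; affixes inside compounds, duplicate checks, and COMPOUNDRULE patterns are not modeled.

```python
def split_compound(word, parts, min_len=3, max_parts=4):
    """Return one decomposition of word into entries of parts, or None."""
    if word in parts:                 # set membership is an average O(1) hash lookup
        return [word]
    if max_parts <= 1:
        return None
    # Try every split point that leaves at least min_len characters on each side.
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        if head in parts:
            rest = split_compound(word[i:], parts, min_len, max_parts - 1)
            if rest is not None:
                return [head] + rest
    return None


components = {"foot", "ball", "game"}
print(split_compound("footballgame", components))
print(split_compound("footbal", components))
```

Returning the first decomposition found is enough for validity checking; producing analyses or suggestions would require collecting all decompositions instead.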

Applications and Usage

Integrated Software

Hunspell serves as the default spell-checking engine in several prominent open-source office suites. OpenOffice.org has integrated Hunspell since 2006, when it replaced the earlier MySpell component in version 2.0.2, and LibreOffice uses it as well, with support for custom dictionaries for personalized spell-checking needs. In web and email applications, Hunspell powers inline spell checking starting from Mozilla Firefox version 3 (2008) and Thunderbird version 3 (2009), enabling real-time correction of text in web forms, composition windows, and other editable fields. Google Chrome incorporates Hunspell for form-based and page-level spell checking, utilizing optimized binary dictionary files (.bdic) derived from standard Hunspell affix (.aff) and dictionary (.dic) formats to handle multilingual input efficiently.

Beyond these core applications, Hunspell finds use in various other environments. On macOS, it has been available since version 10.6 and can be installed via Homebrew for integration into tools like text editors. Ports exist for Android, allowing embedding in mobile apps through JNI wrappers for on-device spell checking. Proprietary software such as Trados Studio employs Hunspell as its primary spell checker, supporting custom and language-specific dictionaries for translation workflows. Overall, Hunspell's adoption extends to over 100 languages, facilitated by community-maintained dictionaries distributed through repositories like those for LibreOffice and Firefox add-ons, ensuring broad accessibility across diverse linguistic contexts.

Command-Line Interface

Hunspell provides a standalone command-line tool for performing spell checking, morphological analysis, and related tasks on text files or standard input. The tool is invoked using the hunspell executable, which supports batch processing of files and interactive editing sessions. It is designed to be compatible with Ispell's interface, allowing integration into scripts and text processing pipelines.

The basic syntax for checking a file with a specified dictionary is hunspell -d <dictionary> <file>, where <dictionary> refers to the base name of the dictionary files (e.g., en_US for the American English dictionary, assuming .dic and .aff files are available). Without a file argument, Hunspell reads from standard input. For example, hunspell -d en_US textfile.txt processes the specified text file using the English dictionary and enters interactive mode by default if errors are found. Several dictionaries can be combined in a comma-separated list, such as hunspell -d en_US,en_med medical.txt to include medical terminology. The tool respects locale environment variables like LANG or LC_ALL to select default dictionaries if none are specified.

Key options control input handling, output verbosity, and processing modes. The -l option lists only misspelled words, one per line, making it suitable for piping into other tools: hunspell -d en_US -l textfile.txt. For pipe mode, -a enables reading from standard input and outputs a formatted stream with indicators like * for correct words, & for misspelled words followed by suggestion counts and alternatives (e.g., & exsample 4 0: example, examples, sampler, sample), - for compounds, and # for words with no suggestions. The -s option stems words to their root forms, while -m performs morphological analysis, outputting details like part-of-speech tags. Input encoding can be set with -i <encoding>, and special input formats like HTML (-H), TeX/LaTeX (-t), or nroff/troff (-n) are supported. Personal dictionaries for user-specific additions are managed via -p <path>, defaulting to $HOME/.hunspell_<dictionary>. URLs, e-mail addresses, and directory paths are skipped by default; the --check-url option enables checking them.

In interactive mode, Hunspell prompts for each misspelled word, offering suggestions and commands for correction. Users can replace the word (R, or the number of a suggestion), accept it for the rest of the session (A), insert it into the personal dictionary (I), or quit (Q). This mode facilitates on-the-fly editing, with changes applied to the input file if writable. For batch scripts, output can be redirected; for instance, a simple error-checking script might use hunspell -d en_US -l < input.txt > errors.txt to isolate issues for review. Integration with text processors is common, such as through aspell wrappers or in Makefiles for document validation. The tool's Ispell compatibility ensures outputs align with existing workflows, including suggestion formats like & word N offset: sug1, sug2.
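Scripts that consume pipe mode usually need to turn each result line into structured data. The following minimal Python sketch parses the line shapes described above; it assumes the Ispell-style layout given in the text (& word N offset: suggestions, and # word offset for misses without suggestions) and a banner line starting with @. The function name parse_pipe_line is hypothetical, and the sample lines are hand-written rather than captured from a real run.

```python
def parse_pipe_line(line):
    """Parse one line of Ispell-style pipe output into a dict, or None to skip."""
    line = line.rstrip("\n")
    if not line or line.startswith("@"):       # blank separator or version banner
        return None
    marker = line[0]
    if marker in "*-+":                        # word accepted (plain, compound, by root)
        return {"ok": True, "marker": marker}
    if marker == "&":                          # miss with suggestions
        head, _, tail = line.partition(": ")
        _, word, _count, offset = head.split()
        return {"ok": False, "word": word, "offset": int(offset),
                "suggestions": tail.split(", ")}
    if marker == "#":                          # miss without suggestions
        _, word, offset = line.split()
        return {"ok": False, "word": word, "offset": int(offset), "suggestions": []}
    return None


# Hand-written sample lines in the format described in the text.
for sample in ["*", "& exsample 4 0: example, examples, sampler, sample", "# xqzv 12"]:
    print(parse_pipe_line(sample))
```

In practice the input lines would come from the standard output of a hunspell -a process; that step is left out so the sketch runs without Hunspell installed.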

Dictionary Management

Hunspell dictionaries are created by compiling word lists into paired .dic and .aff files, which define the vocabulary and morphological rules respectively. The .dic file contains a header specifying the number of words followed by the word list, while the .aff file outlines affixation rules and flags; these can be written manually in a text editor or produced with specialized tools such as affixcompress for compressing affix data and wordforms for generating inflected forms from base words. For custom languages, users start with a basic word list sourced from corpora or existing resources, then iteratively refine the affix rules to handle derivations and compounds specific to the language's morphology.

Over 100 language-specific Hunspell dictionaries are available, often distributed through LibreOffice extensions or the Hunspell project on GitHub, supporting diverse scripts and features like full morphological analysis for agglutinative languages such as Hungarian, whose dictionary includes complex rules. These pre-built dictionaries can be extended by users for dialects or specialized terminologies, and are made available to applications by placing the files in the designated dictionary directories.

Personal and temporary dictionaries allow runtime customization without altering core files; the command-line tool supports adding words via the -p option, which specifies a user-defined .dic file for session-specific additions, while persistent personal dictionaries are stored as simple word lists in user home directories for ongoing use across sessions. Helper tools facilitate conversion and validation: Aspell dictionaries can be converted to Hunspell format by decompressing .cwl files to word lists, applying phonetic transformations if needed, and pairing them with adapted affix rules. For rule consistency, the hunspell -m option performs morphological analysis on sample texts to verify dictionary integrity, and build-time make check tests ensure rules align with word entries during compilation.

Best practices emphasize encoding verification to prevent mismatches (preferring UTF-8 for broad compatibility) and rigorous testing by running the command-line tool against representative sample texts from the target language to identify gaps in coverage or erroneous suggestions before deployment.
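The encoding check recommended above can be automated with a short Python sketch. It reads the SET line from the affix data, tries to decode the dictionary data with that encoding, and compares the count on the first line with the number of entries. The function names are hypothetical; the fallback to ISO8859-1 when no SET line is present is an assumption, and only encoding names that Python's codec registry recognizes are handled.

```python
def declared_encoding(aff_bytes, default="ISO8859-1"):
    """Return the encoding named on the SET line, or the assumed default."""
    for raw in aff_bytes.splitlines():
        parts = raw.split()
        if len(parts) >= 2 and parts[0] == b"SET":
            return parts[1].decode("ascii")
    return default


def verify_pair(aff_bytes, dic_bytes):
    """Check that the .dic bytes decode and that the count line matches."""
    encoding = declared_encoding(aff_bytes)
    try:
        lines = dic_bytes.decode(encoding).splitlines()
    except (UnicodeDecodeError, LookupError):   # wrong bytes or unknown codec name
        return {"encoding": encoding, "decodes": False, "count_matches": False}
    declared = lines[0].strip() if lines else ""
    entries = [ln for ln in lines[1:] if ln.strip()]
    matches = declared.isdigit() and int(declared) == len(entries)
    return {"encoding": encoding, "decodes": True, "count_matches": matches}


aff = b"SET UTF-8\nTRY abc\n"
good = "2\n\u00e1rv\u00edz\nwork/AB\n".encode("utf-8")
bad = "1\n\u00e1rv\u00edz\n".encode("latin-1")   # Latin-1 bytes under a UTF-8 declaration
print(verify_pair(aff, good))
print(verify_pair(aff, bad))
```

A count mismatch is better treated as a warning than an error, since the first line of a .dic file is described as an approximate count.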

Licensing and Availability

License Terms

Hunspell is distributed under a tri-license comprising the GNU Lesser General Public License version 2.1 (LGPL-2.1) or later for the library, the GNU General Public License version 2.0 (GPL-2.0) for the executable, and the Mozilla Public License version 1.1 (MPL-1.1) to enable file-level licensing choices. This structure allows users to select the most appropriate license based on their project's needs, promoting flexibility in integration.

Under the LGPL-2.1, the library can be dynamically linked into proprietary software without requiring the disclosure of the entire application's source code, provided that the library itself remains modifiable and its source is made available. In contrast, the GPL-2.0 applies to the standalone executable, mandating that any derivative works or distributions include full source code availability. The MPL-1.1 facilitates per-file relicensing, allowing modified files to be dual-licensed under compatible terms while preserving the original file's open-source status. In short, GPL-covered derivatives must offer their source code, whereas the LGPL permits dynamic linking without broader disclosure obligations.

The tri-license was adopted in 2006 to expand adoption beyond a GPL-only model, specifically to facilitate inclusion in projects like Mozilla products that required more permissive terms for proprietary components. This change broadened Hunspell's usability in diverse ecosystems. As a compliance example, the LGPL provisions have enabled integration into closed-source applications, where the spell-checking library is dynamically linked without triggering full source release.

Distribution and Ports

Hunspell is primarily distributed through its official repository at hunspell/hunspell, where developers can access the source code, contribute, and follow development updates. Pre-compiled binaries and archives are available via , providing stable releases for download since the project's inception. For ease of installation on various platforms, Hunspell is packaged in popular repository managers, including Homebrew for macOS (installable via brew install hunspell) and apt for Debian-based distributions like . Pre-built binaries facilitate quick deployment without compilation. On Windows, users can obtain binaries through package managers such as (version 1.7.0 portable) or winget (via winget install FSFhu.Hunspell). For Debian and in 2025, the package version is 1.7.2+really1.7.2-11, available directly from repositories. Hunspell has been ported to several programming languages and frameworks to enable integration in diverse environments. The C# port NHunspell provides spell-checking capabilities for .NET applications, with the latest stable version at 1.2.5554. In , pyhunspell offers bindings to the Hunspell engine, allowing dictionary loading and word suggestions, though its last major update was in 2018. For , the hunspell package (version 3.0.6 as of March 2025) delivers high-performance , tokenization, and spell-checking functionalities. Additionally, the .NET port WeCantSpell.Hunspell (version 6.0.3, updated September 2025) is a fully managed implementation without unmanaged dependencies, supporting concurrent queries and competitive performance on modern .NET frameworks. Dictionary bundles for Hunspell support over 100 languages and are often included with applications like , where they enable multilingual spell-checking out of the box. Separate downloads are available through 's dictionary repository or community collections, covering spelling, hyphenation, and thesaurus data for languages ranging from major ones like English and to less common variants. 
For custom builds, Hunspell uses an autotools-based build system (autoconf, automake, libtool) that supports compilation on GNU/Linux, other Unix-like systems, macOS, and Windows (via MSYS2 or Cygwin). The process consists of running autoreconf -vfi, ./configure (optionally with flags such as --with-ui, which enables the interactive terminal interface of the command-line tool), make, and make install. Building from source in this way also covers architectures for which the official distribution provides no pre-built binaries.
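The build sequence described above can be sketched as a small Python driver, which is convenient when the build is scripted as part of a larger pipeline. The function names build_commands and build are hypothetical, as is the idea of wrapping the steps in Python at all; the commands themselves are the ones listed above, and --prefix is the standard autoconf option for choosing the installation directory.

```python
import subprocess


def build_commands(with_ui=False, prefix=None):
    """Return the autotools build steps as argument lists, in execution order."""
    configure = ["./configure"]
    if with_ui:
        configure.append("--with-ui")  # interactive terminal interface
    if prefix:
        configure.append(f"--prefix={prefix}")  # standard autoconf install prefix
    return [
        ["autoreconf", "-vfi"],
        configure,
        ["make"],
        ["make", "install"],
    ]


def build(source_dir, **options):
    """Run each step inside source_dir, stopping at the first failing command."""
    for command in build_commands(**options):
        subprocess.run(command, cwd=source_dir, check=True)
```

Calling build("hunspell", with_ui=True, prefix="/opt/hunspell") from the parent directory of a source checkout would run the four steps in order; the final install step may need elevated privileges unless the prefix points to a user-writable location.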

Community and Maintenance

Primary Author

László Németh is the primary author and lead developer of Hunspell. A Hungarian national, Németh began his career as a biologist before moving into free software development, where he has contributed to linguistic tools and office productivity software. Since 2006, he has worked as a lead programmer for the LibreOffice project, focusing on internationalization, hyphenation, and spelling components of the suite. His expertise in handling complex agglutinative languages like Hungarian has been central to his technical approach.
Németh initiated the development of Hunspell in the early 2000s, with primary sponsorship until 2005 from the Technical University's Media Research Centre, laying the foundation for its affix-based morphology and compound word handling. He led the initial implementation from approximately 2002 to 2005, turning it from an extension of MySpell into a standalone library, and has since authored all major releases while maintaining compatibility with diverse encoding systems. Throughout the project's evolution, he has placed particular emphasis on the Hungarian dictionary, refining its rules for suffixation, prefixation, and morphological generation to improve accuracy for inflected forms.
Beyond Hunspell, Németh has authored several complementary open-source projects for document processing and language handling. These include LibreLogo, a turtle graphics programming environment embedded in LibreOffice; Numbertext, a cross-platform library for converting numbers to text in multiple languages; Lightproof, a rule-based grammar and style checker; and the Hungarian spelling dictionary. His broader portfolio reflects a commitment to accessible tools for education and productivity, often tailored to non-Latin scripts and European languages.
Németh's contributions have earned recognition within the open-source community, including a speaking engagement at FOSDEM 2019. The FSF.hu Foundation has provided ongoing support for Hunspell's releases. As of 2025, Németh continues as the active maintainer of the project on GitHub, overseeing bug fixes, feature enhancements, and dictionary integrations.

Contributions and Future Directions

The Hunspell project benefits from an active open-source community that contributes translations, bug reports, and dictionaries. Translations are coordinated via Weblate, which covers over 75 languages and lets volunteers improve the localization of user interfaces and documentation. Bug reports and feature requests are managed on the project's GitHub repository, which as of 2025 hosts more than 265 issues. User-contributed dictionaries extend support to additional languages and vocabularies and are often shared through community repositories.
Key contributors have shaped Hunspell's architecture and evolution. Kevin Hendricks developed MySpell, the C++ spell-checking library on which Hunspell was built. Caolan McNamara implemented the original C API, which made it easier to integrate Hunspell into other applications. Ongoing patches and improvements come from a diverse group of users who submit code via pull requests on GitHub, with Németh as the primary maintainer.
Hunspell remains maintained, with releases addressing bugs, performance, and compatibility. Updates are issued periodically with support from foundations such as FSF.hu, which helps keep downstream integrations stable. For newcomers, the repository provides build instructions and documentation for setting up a development environment, including configuration for C++ compilation.
Future directions emphasize extensibility and integration. Recent developments include expanded support for SPELLML, an XML-based interface introduced in version 1.7.0, which enables run-time dictionary extension for dynamic word lists without recompilation. Efforts are also underway to improve the suggestion algorithms for languages with complex morphology and to improve support for low-resource languages through community-driven dictionary tools.
Challenges in ongoing development include keeping pace with evolving Unicode standards so that new characters and encodings are handled efficiently, as seen in historical issues with non-UTF-8 support. Mobile use poses additional hurdles, particularly in resource-constrained environments like Android, where performance tuning is needed to reduce memory usage and speed up checks without compromising accuracy.
