
Apache SpamAssassin

Apache SpamAssassin is an open-source email filtering tool that identifies and blocks unsolicited bulk email, commonly known as spam, through a combination of heuristic, statistical, and network tests applied to email headers and body content. It assigns a spam score to incoming messages based on factors such as Bayesian filtering, DNS-based blocklists, and rule-based heuristics, enabling system administrators to classify and filter email effectively with minimal configuration.

Originally created by Justin Mason in 2001 as a rewrite of an earlier Perl-based filter called filter.plx by Mark Jeftovic, it was uploaded to SourceForge and quickly gained popularity for its extensibility and accuracy in combating rising spam volumes. In December 2003, SpamAssassin entered the Apache Incubator, and it became a top-level project of the Apache Software Foundation in June 2004, benefiting from the foundation's collaborative development model and licensing under the Apache License 2.0.

Written primarily in Perl, it integrates with major mail transfer agents such as Postfix, Sendmail, and Exim, and supports plugins for custom rules, which has made it a widely deployed open-source anti-spam solution. Key features include automatic ruleset updates via sa-update, Bayesian training through sa-learn on spam and ham examples, and support for modern protocols. The latest stable release, version 4.0.2, was issued on August 30, 2025, to address security enhancements and Perl 5.42 compatibility. Over more than two decades, SpamAssassin has received accolades, such as top honors in the anti-spam category at LinuxWorld in 2006, and it continues to evolve to counter sophisticated spam techniques while maintaining a focus on performance and on minimizing false positives.

Development History

Origins and Founding

Apache SpamAssassin originated as an open-source project initiated by software developer Justin Mason in 2001. Mason had been maintaining patches for an earlier Perl-based spam filter called filter.plx, originally created by Mark Jeftovic in 1997, and decided to rewrite the tool from scratch to address its limitations and bring it up to date. On April 20, 2001, he uploaded the initial codebase to SourceForge.net, marking the project's public debut as SpamAssassin, a Perl-implemented system designed to identify and score unsolicited bulk email using rules and thresholds.

The primary motivation behind SpamAssassin's creation was the escalating volume of spam that plagued inboxes following the rapid expansion of the Internet in the 1990s and early 2000s. By the early 2000s, spam had evolved into a widespread problem, with unsolicited commercial messages overwhelming legitimate communications and straining mail infrastructure; estimates indicated that spam constituted a significant portion of global email traffic, prompting the need for robust, customizable filtering solutions. Mason aimed to build a flexible tool that integrated existing collaborative anti-spam services, such as Vipul's Razor, a distributed checksum-based system for detecting known spam patterns, and Pyzor, which used similar digest matching to identify bulk messages across networks. These integrations allowed SpamAssassin to draw on community-sourced data for improved detection accuracy from its outset.

Early adoption was swift, with the project gaining traction among system administrators and open-source communities due to its modular design and effectiveness in reducing incoming spam. Hosted initially on SourceForge, SpamAssassin benefited from collaborative development, with Mason serving as the lead developer and overseeing the first public release, which featured basic rule sets for header analysis, body text evaluation, and scoring mechanisms.
In December 2003, the project entered the Apache Incubator to foster structured governance and broader participation, graduating as a top-level project in summer 2004 under the name Apache SpamAssassin. This transition provided enhanced legal protections, community support, and distribution channels, solidifying its role as a cornerstone of open-source email security.

Major Releases and Evolution

Apache SpamAssassin transitioned to the Apache Software Foundation in the summer of 2004, adopting the Apache License 2.0 and benefiting from the foundation's community-driven development model. This shift enabled broader collaboration and sustained maintenance under the ASF's governance.

Key early releases had already laid the groundwork for its anti-spam capabilities: version 2.0, released in September 2002, improved rule-based filtering and integration capabilities. Version 3.0, released on September 22, 2004, shortly after the Apache transition, enhanced the rule sets for better heuristic matching, integrated Bayesian statistical filtering as a core feature that learns from user feedback on spam and legitimate email, added support for the Sender Policy Framework (SPF), and made network checks more robust.

The 3.4 series, spanning the 2010s and into the early 2020s, emphasized stability and incremental improvements. Version 3.4.0, released on February 11, 2014, added native IPv6 support, refined DNS blocklist integration, and a Redis backend option for Bayesian filtering to handle larger-scale deployments. Later patches in this series, such as 3.4.6 on April 12, 2021, focused on security fixes and bug resolutions, after which the branch entered maintenance mode with no new features planned beyond critical updates. By 2025, the 3.4 branch received only security-related patches, reflecting the project's shift toward the newer versions.

A major milestone arrived with version 4.0.0 on December 17, 2022, which introduced full native UTF-8 handling throughout the codebase and an enhanced architecture intended to improve extensibility and compatibility with diverse formats.
Patch releases followed, including 4.0.1 on March 29, 2024, for Perl 5.38 compatibility and issue resolutions, and the latest, 4.0.2, on August 30, 2025, incorporating bug fixes, support for Perl 5.42, and a new Redirectors plugin for handling URL redirection services. Overall, SpamAssassin's evolution has progressed from reliance on basic rules to statistical methods such as Bayesian filtering, driven by the ASF community's contributions and a focus on adaptability to emerging threats.

Core Functionality

Operation and Scoring Mechanism

Apache SpamAssassin operates as a Perl-based filtering tool, functioning either as a command-line utility via the spamassassin executable or as a daemon, spamd, paired with the spamc client for efficient processing of multiple messages. It parses each incoming email's headers, body text, and attachments into a message object, then applies a series of tests through its modular plugin architecture to analyze the content for spam indicators. This plugin system allows integration of various evaluation methods, loading rules and configurations from standard directories such as /usr/share/spamassassin and from site-specific files, so scanning can be extended and customized without recompilation.

The core scoring mechanism aggregates points from individual tests: each rule or plugin, such as a heuristic check, the Bayesian classifier, or a network query, contributes a numerical score when it matches. Scores are real numbers, positive to indicate spam likelihood and negative to indicate legitimacy. If unspecified, a score defaults to 1.0 for most tests and 0.01 for tests whose names begin with T_ (testing rules); a score of 0 effectively disables a test. The total score is the sum of all hits, typically negative or near zero for ham and accumulating to larger positive values for spam, with no strict bounds enforced. A rule's score can differ depending on whether Bayesian filtering and network-based tests are enabled.

Filtering decisions are made by comparing the total against a configurable threshold, set by default to 5.0 via the required_score directive. Messages reaching or exceeding this value are classified as spam, and those below are deemed ham; there is no intermediate "uncertain" category unless custom thresholds are defined downstream. Outcomes include diagnostic headers added to the message: X-Spam-Status, which reports the verdict, the numerical score, the required threshold, and the names of the tests that hit; X-Spam-Level, which renders the score as a row of asterisks (e.g., ***** for 5.0); and X-Spam-Flag: YES on spam-marked messages, which simplifies integration with mail transfer agents and mail user agents. These headers enable downstream actions such as quarantine or rejection. Whether the body of a spam message is also rewritten depends on configuration, notably the report_safe setting.
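The summing-and-threshold logic described above can be sketched in a few lines of Python. This is a minimal illustration, not SpamAssassin's Perl implementation: the `classify` function, the `Verdict` type, and every rule name and score in the example are hypothetical, and the header layout only approximates what a real installation emits.

```python
from dataclasses import dataclass, field

REQUIRED_SCORE = 5.0  # mirrors the default required_score described above


@dataclass
class Verdict:
    score: float
    is_spam: bool
    headers: dict = field(default_factory=dict)


def classify(hits: dict, required: float = REQUIRED_SCORE) -> Verdict:
    """Sum the scores of all rules that hit and compare with the threshold."""
    # A score of 0 disables a rule, so such hits are dropped entirely.
    active = {name: s for name, s in hits.items() if s != 0}
    total = sum(active.values())
    is_spam = total >= required  # reaching the threshold counts as spam
    tests = ",".join(sorted(active)) or "none"
    headers = {
        # One asterisk per whole point of score.
        "X-Spam-Level": "*" * max(0, int(total)),
        "X-Spam-Status": (
            f"{'Yes' if is_spam else 'No'}, score={total:.1f} "
            f"required={required:.1f} tests={tests}"
        ),
    }
    if is_spam:
        headers["X-Spam-Flag"] = "YES"
    return Verdict(total, is_spam, headers)


# Hypothetical rule hits: positive scores suggest spam, negative suggest ham.
example = classify({
    "DEMO_SUBJ_ALL_CAPS": 1.6,
    "DEMO_URI_SUSPECT": 2.9,
    "DEMO_BAYES_MID": 0.8,
    "T_DEMO_TESTING": 0.01,
    "DEMO_KNOWN_SENDER": -0.3,
})
print(example.headers["X-Spam-Status"])
```

The negative score for the hypothetical `DEMO_KNOWN_SENDER` hit shows how ham indicators pull the total back down; without the other hits the message would stay well under the threshold.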

Configuration and Customization

Apache SpamAssassin is configured using traditional UNIX-style text files, which allow users to tailor the filter's behavior for site-wide or individual needs. The primary site-wide file is local.cf, typically located in /etc/mail/spamassassin/, where global settings such as scoring thresholds and network trusts are defined. For per-user customization, the user_prefs file is used, usually placed in ~/.spamassassin/, enabling overrides of site-wide policies to accommodate personal mail patterns. Both files hold one directive per line, a directive name followed by its value, with comments introduced by #, and can pull in other files via the include directive for modular organization.

Key directives control core behaviors. The required_score directive sets the numerical threshold for classifying a message as spam (default 5.0), allowing adjustments like 4.0 for more aggressive filtering or 10.0 for conservative operation. The welcomelist_from directive (formerly whitelist_from, deprecated but interchangeable until version 4.1) specifies trusted sender addresses or patterns (e.g., welcomelist_from *@trustedisp.com), which receive negative scores so that they bypass spam detection. Conversely, blocklist_from (formerly blacklist_from, deprecated but interchangeable until version 4.1) names senders to be flagged automatically, using the same address or wildcard syntax and applying positive scores regardless of content. These directives can be layered in local.cf for broad application or refined in user_prefs for specificity.

Customization extends beyond scoring thresholds, for example enabling plugins with the loadplugin directive (e.g., loadplugin Mail::SpamAssassin::Plugin::SPF for sender verification) or toggling features, such as use_bayes 1 to activate the Bayesian classifier.
The Bayes database, which learns from user-labeled emails, is trained using the sa-learn command-line tool; for example, sa-learn --spam /path/to/spam/folder classifies messages as spam, while sa-learn --ham /path/to/ham/folder trains on legitimate mail, with recommendations to use at least 1,000 examples each for reliable performance. Synchronization with --sync ensures database consistency, and --forget allows unlearning erroneous classifications. Policy application distinguishes between site-wide enforcement in local.cf, which sets defaults like global whitelists applicable to all users, and user-specific overrides in user_prefs, where individuals can personalize thresholds or lists without affecting others. This hierarchical approach ensures administrative control while supporting user autonomy, with site policies loaded first and user files overriding them during runtime. Integration with external collaborative tools, such as Razor for distributed spam signature checking, is achieved by loading the Razor2 plugin via loadplugin Mail::SpamAssassin::Plugin::Razor2 in the configuration file, provided the Razor2 Perl module is installed; additional settings like razor_timeout 5 control query limits to balance speed and accuracy. Best practices for configuration emphasize a minimal initial setup focused on essential directives like internal_networks to define trusted internal IPs, preventing false positives on legitimate mail relays, followed by enabling Bayes training for improved accuracy over time. For basic use, defaults suffice with periodic sa-learn sessions on curated datasets; advanced tuning for high-volume servers involves monitoring false positives/negatives to iteratively adjust required_score, testing changes with spamassassin --lint for syntax validation, and using trusted_networks to specify safe external hosts. Regular updates via sa-update and modular includes keep configurations maintainable without over-customization.
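The layering of site-wide and per-user settings can be illustrated with a small Python sketch. It is not SpamAssassin's own parser: `parse_cf`, `effective_settings`, and the choice of which directives accumulate are assumptions made for the example, and the two configuration snippets are invented.

```python
# Directives treated as accumulating lists in this sketch (an assumption).
LIST_DIRECTIVES = {"welcomelist_from", "blocklist_from"}

SITE_CF = """
# site-wide defaults, as they might appear in local.cf
required_score 5.0
use_bayes 1
welcomelist_from *@trustedisp.com
"""

USER_PREFS = """
# per-user overrides, as they might appear in user_prefs
required_score 4.0   # this user wants stricter filtering
welcomelist_from *@example.org
"""


def parse_cf(text: str) -> dict:
    """Parse 'directive value' lines, skipping blanks and # comments."""
    settings: dict = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key in LIST_DIRECTIVES:
            settings.setdefault(key, []).extend(value.split())
        else:
            settings[key] = value  # scalar directives: last one wins
    return settings


def effective_settings(site: str, user: str) -> dict:
    """Load site policy first, then let the user file override or extend it."""
    merged = parse_cf(site)
    for key, value in parse_cf(user).items():
        if key in LIST_DIRECTIVES:
            merged.setdefault(key, []).extend(value)
        else:
            merged[key] = value
    return merged


settings = effective_settings(SITE_CF, USER_PREFS)
print(settings)
```

In this sketch the user's required_score replaces the site default while the welcomelist entries from both files are combined, which matches the "site first, user overrides" order described above.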

Spam Detection Techniques

Since version 4.0.0, Apache SpamAssassin includes full native UTF-8 support, enhancing the accuracy of spam detection techniques for emails containing international characters and encodings.

Heuristic Rule-Based Filtering

Apache SpamAssassin employs heuristic rule-based filtering through a collection of predefined rules that analyze email headers, body content, and structural elements to identify spam characteristics. These rules are stored in plain-text configuration files with the .cf extension, such as 50_scores.cf, which is part of the official rules distribution and primarily holds score assignments for various tests. Each rule typically consists of three components: a describe directive providing a human-readable explanation; a test definition using a directive like header, body, or uri along with a regular expression or evaluation condition; and a score directive assigning a numerical value (positive for spam indicators, negative for ham indicators) that contributes to the overall message evaluation. For instance, a rule might match the pattern /viagra/i in the body text, be described as "Contains 'Viagra' in message body," and carry a score of 1.5.

The heuristics encompass diverse checks tailored to common spam traits; the rule names and patterns that follow are illustrative rather than taken from the stock ruleset. Header checks scrutinize fields like From for signs of forgery, such as mismatched domain patterns or suspicious sender formats, using a pattern like /mixed\@fake\.com/i against the From header. Body pattern matching identifies textual anomalies, including excessive HTML tags that suggest automated generation, via tests like rawbody DEMO_HTML /<html>.*<body>/i (rawbody rather than body, because body rules see the text after HTML has been rendered away). URI evaluation targets suspicious links by applying rules to extracted URLs, such as flagging those matching /short\.ly\/spam/i or known phishing domains. Additional examples include examinations of MIME boundaries for irregular formatting indicative of obfuscation, with rules like mimeheader BOUNDARY_STRANGE Content-Type =~ /boundary=.*[A-Z]{20,}/i, and subject-line indicators for promotional phrases, such as header SUBJECT_SPAM Subject =~ /WIN \$1000/i. These rule types enable deterministic pattern recognition without relying on probabilistic models.

Rule updates are delivered through the sa-update tool, which fetches the latest rules and configurations from official channels hosted at updates.spamassassin.org, so that the filter adapts to evolving spam tactics via periodic automated downloads. Administrators can also create custom rules in local .cf files, such as local.cf, to address site-specific threats like internal phishing campaigns, defining new tests and scores and overriding defaults if needed. This setup allows granular control, with rules loaded from directories like /etc/mail/spamassassin.

The primary strengths of this approach are its computational efficiency, since regex-based matching processes messages rapidly even on resource-constrained systems, and its extensibility, since users can add or modify rules without recompiling the software. These heuristics feed into SpamAssassin's broader scoring mechanism by accumulating points from matched rules, which are then compared against the spam threshold. Overall, the method provides a rule-driven foundation for spam detection that remains effective against pattern-based threats.
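The describe, test, and score triple that makes up a rule can be mimicked in Python to show how regex heuristics accumulate into a score. The sketch below is an analogy under stated assumptions: the `Rule` type, the `evaluate` function, the URL extractor, and all three demo rules are hypothetical, and real rules are written in SpamAssassin's own .cf syntax rather than Python.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    target: str          # "header:<Field>", "body", or "uri"
    pattern: "re.Pattern"
    score: float
    describe: str


# Hypothetical demo rules modelled on the examples in the text.
RULES = [
    Rule("DEMO_SUBJ_PRIZE", "header:Subject",
         re.compile(r"WIN \$1000", re.I), 2.0, "Subject promises a cash prize"),
    Rule("DEMO_BODY_PILLS", "body",
         re.compile(r"viagra", re.I), 1.5, "Contains 'Viagra' in message body"),
    Rule("DEMO_URI_SHORT", "uri",
         re.compile(r"short\.ly/spam", re.I), 2.5, "Links to a suspicious short URL"),
]

# Deliberately simple URL extractor; real URI parsing is far more thorough.
URI_RE = re.compile(r"https?://[^\s<>\"']+", re.I)


def evaluate(headers: dict, body: str, rules=RULES):
    """Return the rules that matched and the sum of their scores."""
    uris = URI_RE.findall(body)
    hits = []
    for rule in rules:
        if rule.target.startswith("header:"):
            texts = [headers.get(rule.target.split(":", 1)[1], "")]
        elif rule.target == "uri":
            texts = uris
        else:
            texts = [body]
        if any(rule.pattern.search(text) for text in texts):
            hits.append(rule)
    return hits, sum(rule.score for rule in hits)


hits, score = evaluate(
    {"Subject": "WIN $1000 today"},
    "Cheap viagra at http://short.ly/spam right now",
)
for rule in hits:
    print(f"{rule.score:4.1f} {rule.name}: {rule.describe}")
```

Keeping the target ("header", "body", "uri") separate from the pattern reflects the point made above: the same regular expression means different things depending on which part of the message it is applied to.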

Bayesian Statistical Filtering

Apache SpamAssassin incorporates a Bayesian statistical filtering component as a core technique for adaptive spam detection. This subsystem employs a probabilistic classifier that analyzes message content by extracting and evaluating tokens, which serve as features representing patterns indicative of spam or legitimate mail (ham). The classifier computes probabilities from the observed frequencies of these tokens in trained datasets, yielding a spam probability that contributes to the overall message evaluation. Unlike static rule-based methods, this approach allows SpamAssassin to evolve with changing spam tactics by learning from user-provided examples. In version 4.0.x, Bayesian filtering is implemented as a plugin, with improved UTF-8 handling for better tokenization of diverse languages.

The implementation tokenizes email messages into discrete units, including individual words from the body text, character n-grams (short sequences of 3 to 5 characters, intended to capture obfuscated terms like "v.i.a.g.r.a"), and elements from headers such as subject lines and sender fields. The tokenizer can draw on visible text, invisible elements (e.g., comments), URIs, and MIME parts, as selected by the bayes_token_sources directive, so that diverse linguistic and structural signals are captured for classification.

Training occurs via the sa-learn command-line tool, which builds and updates the Bayesian database using labeled examples of spam and ham messages. Users invoke it with options like sa-learn --spam <spam_directory> for spam samples or sa-learn --ham <ham_directory> for legitimate ones, feeding the content through SpamAssassin's parser to extract tokens and increment their counts. The database files, named bayes_toks, bayes_seen, and others, are stored by default in ~/.spamassassin/bayes_* for per-user training, though global or SQL-based storage is possible for shared environments. Bayes scoring activates only once the database holds at least 200 spam and 200 ham messages; with fewer, the Bayes rules contribute nothing. Periodic retraining with fresh data helps maintain accuracy against evolving threats.

For each token $t$, the classifier calculates the conditional probability $P(\text{spam} \mid t)$ from observed hit counts, with Laplace (add-one) smoothing to handle sparse data and avoid zero probabilities:

$$
P(\text{spam} \mid t) = \frac{\text{spam\_hits}(t) + 1}{\text{total\_hits}(t) + 2}
$$

Here, $\text{spam\_hits}(t)$ is the number of spam messages containing $t$, and $\text{total\_hits}(t)$ is the combined count from spam and ham. The additive terms (1 in the numerator, 2 in the denominator) implement add-one smoothing, assuming uniform priors for the two classes, so a token that has never been seen receives a neutral probability of 1/2. These per-token probabilities are then aggregated across a bounded set of the most informative tokens, using a combination inspired by Bayes' theorem and refined with a chi-square method, to derive an overall spam probability. That probability is mapped onto the BAYES_* rules (e.g., BAYES_99 for a spam probability of 99% or higher), each of which carries its own score, so a highly indicative result adds several points to the message total.

The Bayesian filter's primary advantages lie in its adaptability to novel spam patterns without manual rule updates and its support for auto-learning, where messages whose scores fall clearly on one side (e.g., below 0.1 for ham) are fed back into the database automatically. By adapting to a user's own mail corpus, it can achieve high precision and complements the heuristic rules. However, it requires ongoing training to counter adversarial obfuscation techniques.
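A short Python sketch makes the smoothed per-token estimate concrete. The `token_probability` function follows the add-one formula given above; the `combine` function is a plain naive Bayes log-odds combination that stands in for, and is simpler than, the chi-square method the text mentions. Both function names and all token counts are hypothetical.

```python
import math


def token_probability(spam_hits: int, total_hits: int) -> float:
    """Add-one smoothed estimate: (spam_hits + 1) / (total_hits + 2)."""
    if not 0 <= spam_hits <= total_hits:
        raise ValueError("need 0 <= spam_hits <= total_hits")
    return (spam_hits + 1) / (total_hits + 2)


def combine(probabilities) -> float:
    """Combine per-token probabilities in log-odds space.

    This is the textbook naive Bayes combination with equal priors,
    used here as a stand-in; it is NOT the chi-square combining method.
    Smoothing keeps every input strictly between 0 and 1, so the
    logarithms below are always defined.
    """
    log_odds = sum(math.log(p) - math.log(1.0 - p) for p in probabilities)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical counts: token -> (spam messages containing it, all messages containing it)
counts = {"viagra": (48, 50), "meeting": (1, 40), "free": (30, 50)}
per_token = {tok: token_probability(s, n) for tok, (s, n) in counts.items()}
overall = combine(per_token.values())

for tok, p in per_token.items():
    print(f"{tok:8s} P(spam|t) = {p:.3f}")
print(f"combined spam probability = {overall:.3f}")
```

Note how a single strongly ham-like token ("meeting") largely cancels a strongly spam-like one ("viagra"); this sensitivity to individual tokens is one reason practical filters restrict the combination to the most informative tokens.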

Network-Based Detection Methods

Apache SpamAssassin employs network-based detection methods that query external services and collaborative databases, enhancing its ability to identify spam through real-time lookups beyond local analysis. These methods involve DNS queries and connections to distributed networks, allowing the system to check sender reputations, message content signatures, and embedded URLs against shared intelligence from global contributors. By integrating these external checks, SpamAssassin can detect evolving spam patterns that might evade static rules, though such queries require network connectivity and introduce potential latency.

DNS-based methods form a cornerstone of SpamAssassin's network detection, primarily through queries to DNS blocklists (DNSBLs), also known as Real-time Blackhole Lists (RBLs). When processing a message, SpamAssassin extracts IP addresses from the message headers, such as the sender's originating address or intermediate relay hops, and looks each one up by prepending the address, with its octets reversed, to the DNSBL zone. For example, the conventional test address 127.0.0.2 is checked against the Spamhaus Zen list by querying 2.0.0.127.zen.spamhaus.org. A positive response, indicated by an A record (often 127.0.0.x, where x denotes the listing type), can be followed by a TXT query for additional details such as the listing reason. Services like Spamhaus provide broad coverage of known spam sources, including open proxies and botnets, and their return codes offer nuanced information to inform scoring decisions.

Collaborative services extend detection by drawing on community-submitted data about spam patterns, focusing on fuzzy matching to identify variants of known spam without exact text matches. Razor, also known as Vipul's Razor, operates as a distributed network to which users submit spam samples that are reduced to signatures (hashes) of message bodies or parts. SpamAssassin integrates it via the Razor2 plugin, querying Razor servers with computed signatures to learn whether they have been reported before, which enables detection of similar spam even if slightly altered. Pyzor complements this by computing fuzzy checksums of message bodies, resistant to minor changes like word insertions, and querying Pyzor servers for the prevalence of those checksums in reported spam; the plugin can be configured to require a minimum number of reports for a match. Similarly, the Distributed Checksum Clearinghouse (DCC) uses body and envelope checksums to track bulk mail volumes, querying DCC servers to assess whether a message's checksum appears in volumes indicative of bulk campaigns, thus catching widespread distributions that individual signatures might miss. Together these services improve variant detection, with Razor emphasizing signature-based collaboration, Pyzor focusing on body digests, and DCC on volume correlation.

URIBLs target URLs embedded in messages, a common vector for phishing and malware, by extracting domains from hyperlinks in the message and querying specialized URI blocklists. SpamAssassin uses its URIDNSBL plugin to look these domains up via DNS against services like URIBL (e.g., multi.uribl.com) or SURBL (surbl.org), where a listing returns an A record confirming the domain's association with spam sources such as malware hosts or fraudulent sites. This method scans both plain-text and HTML parts, normalizing URLs to focus on third-level domains and higher, and supports multiple URIBL providers for broader coverage. By isolating URI checks, SpamAssassin can flag messages promoting malicious links independently of its other tests.

Privacy considerations in these network methods prioritize minimal data exposure: services like Razor, Pyzor, and DCC receive only checksums or signatures rather than full message content, reducing risks to user data while still enabling collaboration. Participation in reporting to these networks is typically opt-in, allowing administrators to configure submission only from trusted environments.
To handle network unavailability or timeouts—such as DNS query failures after a configurable 15-second limit—SpamAssassin falls back to local tests without assigning network-based scores, ensuring continued operation without external dependencies. Caching nameservers are recommended to minimize repeated queries and respect service rate limits, further balancing effectiveness with resource efficiency.
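The name construction behind DNSBL and URIBL lookups is simple enough to show without touching the network. The sketch below only builds query names and interprets answers that a resolver would return; it performs no DNS traffic. The function names are hypothetical, and the rule that only 127.0.0.x answers count as listings is an assumption taken from the description above.

```python
import ipaddress

# Answers inside this range are treated as listings in this sketch; other
# loopback answers (some lists use them for error codes) are ignored.
LISTING_NET = ipaddress.ip_network("127.0.0.0/24")


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and prepend them to the blocklist zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 addresses only")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def uribl_query_name(domain: str, zone: str) -> str:
    """Prepend a (lower-cased) domain from a URL to the URI blocklist zone."""
    return f"{domain.lower().rstrip('.')}.{zone}"


def is_listed(a_records) -> bool:
    """True if any returned A record falls in 127.0.0.0/24."""
    return any(ipaddress.ip_address(rec) in LISTING_NET for rec in a_records)


print(dnsbl_query_name("127.0.0.2", "zen.spamhaus.org"))
print(uribl_query_name("Example.COM.", "multi.uribl.com"))
```

An empty answer list (NXDOMAIN or a timed-out query) yields `is_listed([]) == False`, which corresponds to the fallback behaviour described above: no network-based score is assigned and local tests carry on.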

Integration and Deployment

Usage Methods and Integration

Apache SpamAssassin can be invoked directly from the command line for standalone scanning and validation. The spamassassin command processes messages from standard input or specified files, applying its rule set to score and tag potential spam based on the configured tests. For instance, the --lint option performs a syntax check on configuration files and rules without processing any mail, helping administrators verify setups before deployment. Integration with local delivery agents like Procmail is achieved through simple recipes in the user's .procmailrc file, where incoming messages are piped to spamassassin or the faster spamc client for processing before delivery or forwarding. The Milter protocol enables tighter coupling with mail transfer agents (MTAs) by allowing real-time header modifications and spam rejection during SMTP sessions, often via the spamass-milter program.

Server-side deployment typically involves integrating SpamAssassin with popular MTAs to filter incoming mail at scale. For Postfix, configuration uses the smtpd_milters directive in main.cf to invoke a milter like spamass-milter, enabling pre-queue scanning and rejection of high-scoring messages. Sendmail supports similar integration through its native milter interface, where spamass-milter is specified in the sendmail.mc file to process mail during SMTP acceptance. Exim can incorporate SpamAssassin via content scanning options in its configuration, such as the spam ACL condition, or through dedicated modules like SA-Exim. On the client side, mail clients such as Thunderbird can leverage SpamAssassin by trusting server-added headers (e.g., X-Spam-Status) for classification, with optional add-ons such as SpamAssassin Coach allowing users to train Bayesian filters directly from the interface.

SpamAssassin offers flexible deployment modes to suit varying workloads. In batch mode, the spamassassin command is executed once per message, which suits low-volume scripts or one-off checks where the startup overhead of each invocation is acceptable. For high-throughput environments, the spamd daemon mode is preferred: it runs persistently and handles multiple concurrent requests from the spamc client, with tunable parameters like --max-children to optimize performance across dozens or hundreds of messages per second. Modern adaptations suit contemporary infrastructures; community-maintained container images facilitate containerized deployments, exposing the spamd port for integration. Native IPv6 support, introduced in version 3.4.0, ensures compatibility with dual-stack networks for DNS queries and network tests. In large-scale enterprise settings, SpamAssassin scales via clustered spamd instances and external databases for shared Bayesian learning, supporting thousands of users in deployments by ISPs and large organizations.
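Client-side tools that trust server-added headers need to read the verdict back out of the message. The following Python sketch parses an X-Spam-Status header with the standard library's email package. The `read_verdict` function and the sample message are hypothetical, and the header layout shown (verdict, score=, required=, tests=) is the commonly seen form, assumed here rather than guaranteed for every configuration.

```python
import re
from email import message_from_string

STATUS_RE = re.compile(
    r"^(?P<verdict>yes|no),\s*score=(?P<score>-?\d+(?:\.\d+)?)"
    r"\s+required=(?P<required>-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)
# The tests list runs until the next "key=" field or the end of the header.
TESTS_RE = re.compile(r"tests=(.*?)(?=\s+\w+=|$)")


def read_verdict(raw_message: str):
    """Return the parsed X-Spam-Status fields, or None if absent/unparseable."""
    status = message_from_string(raw_message).get("X-Spam-Status")
    if status is None:
        return None
    status = " ".join(status.split())  # undo header folding
    match = STATUS_RE.match(status)
    if match is None:
        return None
    tests_match = TESTS_RE.search(status)
    tests = re.split(r"[,\s]+", tests_match.group(1)) if tests_match else []
    return {
        "is_spam": match.group("verdict").lower() == "yes",
        "score": float(match.group("score")),
        "required": float(match.group("required")),
        "tests": [t for t in tests if t and t.lower() != "none"],
    }


# Invented sample; the header is folded across two lines, as mail often is.
SAMPLE = (
    "From: sender@example.com\n"
    "Subject: hello\n"
    "X-Spam-Flag: YES\n"
    "X-Spam-Status: Yes, score=7.3 required=5.0 tests=BAYES_99,HTML_MESSAGE,\n"
    "\tURIBL_BLACK autolearn=no version=4.0.2\n"
    "\n"
    "body text\n"
)

result = read_verdict(SAMPLE)
print(result)
```

A delivery script could use the returned dictionary to file the message into a junk folder when `is_spam` is true, without re-running any tests itself.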

Testing and Diagnostic Tools

Apache SpamAssassin provides several built-in command-line utilities for validating installations, training components, synchronizing rules, and debugging configurations. These tools enable administrators to test email scoring, check for errors, and analyze performance without affecting production environments.

The spamassassin command in test mode (-t) processes a single message from standard input and writes the spam score, the rules that hit, and diagnostic details to standard output, leaving the original message unchanged. This is particularly useful for verifying how specific messages are classified during setup or troubleshooting. For example, spamassassin -t < email.eml displays the total score and contributing rules, helping identify misconfigurations in rule weights or Bayesian data. Administrators are advised to use the Generic Test for Unsolicited Bulk Email (GTUBE) string, "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X", in test messages, as it triggers a predefined spam rule (GTUBE) with a score of 1000.0, ensuring consistent detection across installations.

To validate configurations, the --lint option scans all loaded rule and preference files for syntax errors, undefined variables, or invalid directives, reporting issues without processing any mail. Running spamassassin --lint before deployment prevents runtime failures and is a recommended practice for maintaining configuration integrity, especially after modifications.

The sa-update utility fetches and installs updates to SpamAssassin's rules, data, and plugins from official or third-party channels, ensuring the ruleset remains current against evolving spam techniques. By default, it verifies downloads using SHA-256/SHA-512 hashes and GPG signatures before installation, with options like --checkonly to preview availability without applying changes. Periodic execution, such as via cron jobs (e.g., daily at low-traffic times), is essential for good performance, as outdated rules can reduce detection accuracy.

For training the Bayesian statistical filter, sa-learn processes collections of spam or ham messages to build or refine the token database, improving classification over time. Usage involves commands like sa-learn --spam /path/to/spam/folder for spam examples or sa-learn --ham /path/to/ham/folder for legitimate mail, supporting formats such as mbox or Maildir. Best practices recommend an initial training set of at least 1000 spam and 1000 ham messages per user or globally, with ongoing training from user feedback (e.g., moving misclassified messages to dedicated folders and relearning them) to adapt to personal patterns; beyond roughly 5000 examples, additional training yields little further accuracy. The tool skips duplicates by default and supports --forget to remove prior incorrect classifications.

Log analysis is facilitated by sa-stats.pl, a Perl script that parses spamd syslog entries to generate reports on processed messages, average scores, hit rates, and performance metrics over specified intervals. Invoked as sa-stats.pl --logfile=/var/log/maillog --top=10, it outputs summaries like total messages scanned, spam percentage, and top-scoring rules, aiding in tuning thresholds or identifying underperforming rules. Regular analysis, such as weekly reviews, helps monitor system health and score breakdowns for optimization.
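A test message carrying the GTUBE string can be generated with Python's standard library and, where SpamAssassin is installed, piped through test mode. This is a sketch under stated assumptions: the helper names and the example.com addresses are hypothetical, the GTUBE constant is written in its published 68-character form including the asterisks, and `scan_with_spamassassin` simply returns None when no spamassassin executable is found on the PATH.

```python
import shutil
import subprocess
from email.message import EmailMessage

# The published 68-character GTUBE test string, asterisks included.
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def build_gtube_message(sender: str = "sender@example.com",
                        recipient: str = "recipient@example.com") -> bytes:
    """Build a small plain-text message whose body contains the GTUBE string."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "GTUBE test message"
    msg.set_content("This is a filter test.\n\n" + GTUBE + "\n")
    return msg.as_bytes()


def scan_with_spamassassin(raw: bytes):
    """Pipe a message through `spamassassin -t`; None if it is not installed."""
    exe = shutil.which("spamassassin")
    if exe is None:
        return None
    done = subprocess.run([exe, "-t"], input=raw,
                          capture_output=True, check=False)
    return done.stdout.decode("utf-8", errors="replace")


raw_message = build_gtube_message()
print(raw_message.decode("ascii"))
```

On a working installation, the report returned by `scan_with_spamassassin(raw_message)` should list the GTUBE rule among the hits, which makes this a convenient smoke test after configuration changes.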

Licensing and Extensions

Open-Source Licensing

Apache SpamAssassin is distributed under the Apache License, Version 2.0, which permits users to freely use, modify, and distribute the software, including for commercial purposes, provided that appropriate attribution is given through retention of copyright notices, inclusion of the license text, and documentation of any changes made to the original code. This permissive license ensures broad compatibility with other open-source projects and emphasizes royalty-free usage without requiring derivative works to adopt the same license.

Prior to its adoption by the Apache Software Foundation in 2004, SpamAssassin was released under a dual license consisting of the GNU General Public License (GPL) and the Perl Artistic License, which offered more restrictive terms compared to the Apache License. With the release of version 3.0.0 in 2004, coinciding with its entry into the Apache Software Foundation, the project unified its licensing under the Apache License 2.0 to align with foundation standards and enhance interoperability.

As an Apache project, SpamAssassin requires contributors to sign an Individual Contributor License Agreement (ICLA) or Corporate CLA, granting the foundation the rights to distribute and sublicense submitted code while allowing contributors to retain ownership. This governance model fosters a meritocratic community, with the project maintained by a self-selected team of committers who oversee development. The license includes a standard no-warranty provision, disclaiming any guarantees of merchantability, fitness for a particular purpose, or non-infringement, which is typical for open-source software and limits liability.

While the core software adheres to the Apache License, SpamAssassin can make use of third-party rulesets and blocklists, such as those from Spamhaus, which may impose additional terms; for instance, access to Spamhaus data requires compliance with their usage policy, which allows free non-commercial use and requires a paid subscription for higher volumes. Users must review these external terms to ensure full compliance when integrating such components.

Specialized Components and Extensions

Apache SpamAssassin includes specialized utility programs that enhance its core functionality, particularly for performance optimization and modular extensions. One key tool is sa-compile, which uses the re2c lexical analyzer generator to compile the site-wide parts of the ruleset into native code. User-specific preferences are excluded from compilation. The output consists of optimized .pmc files stored in a designated update directory, such as /var/lib/spamassassin/compiled/. To build these files, administrators run sa-compile, which requires a C compiler and the re2c tool; the generated code uses deterministic finite automata (DFA) for faster string matching during message scanning. On high-load servers, compilation can reduce CPU usage by accelerating the evaluation of body rules through the Mail::SpamAssassin::Plugin::Rule2XSBody plugin, which must be explicitly loaded in the v320.pre file. After a rule update, the .pmc files must be rebuilt manually and the spamd daemon restarted for the changes to take effect, because there is no automatic reloading mechanism.

SpamAssassin supports a range of plugins as modular extensions that add specialized detection capabilities. Introduced in version 4.0.2, the Mail::SpamAssassin::Plugin::Redirectors plugin identifies URLs in messages that have been shortened or redirected via common services, so that potentially obfuscated links can be flagged and spam detection accuracy improved. Other plugins integrate external tools, such as the ClamAV plugin, which submits emails to a local Clam AntiVirus server for scanning and adds to the score if a virus is detected. Plugins are loaded with loadplugin directives in .pre or .cf configuration files, so administrators can enable them selectively based on system needs.

The extension ecosystem for SpamAssassin revolves around custom Perl modules that users can develop and integrate as plugins, extending the core without modifying the main codebase. Installation involves placing the .pm module file in a directory such as /usr/share/perl5/Mail/SpamAssassin/Plugin/ or /etc/mail/spamassassin/, followed by a loadplugin directive in a configuration file to register it during initialization. Developers hook into SpamAssassin's plugin API to register evaluation rules, parse headers, or process message bodies at specific stages, such as in the check_start or check_end callbacks. Representative examples include plugins that perform advanced message decoding to extract embedded content for analysis, and plugins that apply fuzzy hashing to compare message digests against collaborative databases for near-duplicate identification. This modular approach fosters community contributions, with third-party plugins available under various licenses, though they should be tested with spamassassin --lint to confirm that the configuration and rules still parse cleanly.
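
Because compiled rules are not reloaded automatically, the rebuild cycle for sa-compile is often scripted. The following is a minimal sketch of that cycle, not an official procedure. The function names, the DRY_RUN switch, and the SPAMD_RESTART_CMD variable are illustrative assumptions, and the default restart command is a guess at a common systemd setup; the service name varies by distribution.

```shell
#!/bin/sh
# Sketch: lint, recompile, and restart after a ruleset or plugin change.
# Hypothetical helpers: run(), rebuild_compiled_rules(), DRY_RUN, SPAMD_RESTART_CMD.

# Command used to restart spamd (assumption; adjust to the local service name).
SPAMD_RESTART_CMD="${SPAMD_RESTART_CMD:-systemctl restart spamassassin}"

# Print a command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        return 0
    fi
    "$@"
}

rebuild_compiled_rules() {
    # 1. Check that the configuration, rules, and loaded plugins parse cleanly.
    run spamassassin --lint || { echo "lint failed; aborting" >&2; return 1; }

    # 2. Regenerate the compiled site-wide rules (needs re2c and a C compiler).
    run sa-compile || { echo "sa-compile failed; aborting" >&2; return 1; }

    # 3. Restart spamd so it picks up the new files.
    #    The variable is left unquoted on purpose so it splits into words.
    run $SPAMD_RESTART_CMD || { echo "restart failed" >&2; return 1; }
}
```

Calling `DRY_RUN=1 rebuild_compiled_rules` prints the three commands without executing them, which is a cheap way to review the sequence before running it with the privileges the compile step and restart need. Linting first means a broken rule or a misconfigured third-party plugin stops the cycle before spamd is restarted. A common pattern is to call the function only when sa-update reports that new rules were installed.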
