
Okapi BM25

Okapi BM25 is a probabilistic ranking function employed in information retrieval systems to assess the relevance of documents to a given search query by computing a score that balances term frequency, inverse document frequency, and document length normalization. Developed as part of the Okapi experimental testbed at City University London, it builds on the binary independence model from the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others, incorporating refinements to handle non-binary term occurrences and length biases effectively. The acronym "BM" denotes "Best Matching," while "25" signifies the iteration number in a sequence of weighting schemes devised by Stephen E. Robertson and collaborators. The core formula of Okapi BM25 sums, over each query term q, an inverse document frequency (IDF) weight multiplied by a saturated term frequency (TF) factor adjusted for document length:
\sum_{q \in Q} \mathrm{IDF}(q) \cdot \frac{\mathrm{TF}(q,d) \cdot (k_1 + 1)}{\mathrm{TF}(q,d) + k_1 \cdot (1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}})}
where \mathrm{TF}(q,d) is the frequency of q in d, |d| is the length of d, \mathrm{avgdl} is the average document length in the collection, and k_1 (typically 1.2–2.0) and b (often 0.75) are tunable parameters controlling term frequency saturation and document length normalization, respectively. This formulation addresses limitations in earlier models like BM15 by introducing non-linear TF saturation to prevent overemphasis on repeated terms and pivot-based length normalization that favors neither short nor overly long documents. Extensions such as BM25F adapt it for structured documents by weighting multiple fields (e.g., title, body, anchor text).
Since its introduction in 1994 at the Text REtrieval Conference (TREC-3), Okapi BM25 has demonstrated superior performance in empirical evaluations, outperforming vector space models and other probabilistic approaches on diverse corpora. It remains the de facto standard for lexical ranking in production search engines, including Elasticsearch and Apache Solr, due to its efficiency, interpretability, and robustness across languages and domains. Open-source implementations in toolkits like Lucene, Xapian, and MG4J have further facilitated its widespread adoption in academic and industrial applications, influencing hybrid systems that combine it with neural retrieval methods.

Introduction

Overview and Purpose

Okapi BM25 is a probabilistic ranking function employed in information retrieval systems to assess the relevance of documents to a given query, operating as a bag-of-words retrieval model that treats documents and queries as collections of independent terms. Developed as part of the Okapi information retrieval system at City University London during the 1980s and 1990s, BM25 builds upon foundational concepts like term frequency and inverse document frequency while introducing enhancements such as term saturation to mitigate the disproportionate influence of highly frequent terms and document length normalization to account for variations in document size. The primary purpose of BM25 in search engines is to compute a relevance score for each document based on the presence and frequency of query terms, enabling the effective ranking of results in descending order of estimated relevance. This scoring mechanism supports the Probability Ranking Principle, which posits that documents should be presented to users in order of decreasing probability of relevance to the query, thereby improving retrieval precision and user satisfaction in large-scale collections. BM25 addresses key challenges in ad hoc text retrieval, including the handling of varying document lengths, where longer documents might otherwise be unfairly favored, and the saturation of term frequencies, preventing repeated occurrences of a term from linearly inflating scores beyond the point of diminishing returns. By incorporating these adjustments, BM25 provides a more balanced estimate of relevance compared to simpler models like TF-IDF, where term frequency and inverse document frequency serve as core building blocks without such refinements.

Historical Background

The probabilistic foundations of BM25 emerged from early work in information retrieval during the 1970s and 1980s, particularly the binary independence model developed by Stephen E. Robertson and Karen Spärck Jones, which introduced relevance-based term weighting under assumptions of term independence. This framework was extended by Robertson, C. J. van Rijsbergen, and M. F. Porter through explorations of term frequency distributions, including the 2-Poisson model, which modeled document term occurrences as mixtures of "elite" and non-elite distributions to better approximate retrieval probabilities. These advancements built on prior probabilistic models, emphasizing the use of collection statistics for ranking without requiring full relevance judgments. In the 1980s, Robertson and colleagues at City University London implemented these principles in the Okapi system, an experimental platform initially designed for bibliographic databases and probabilistic ranking. The system's early weighting schemes, such as BM1, were tested in the inaugural Text REtrieval Conference (TREC-1) in 1992, focusing on basic relevance weights derived from the Robertson-Spärck Jones formula. By TREC-2 in 1993, the team introduced refined models like BM11 and BM15, which incorporated within-document term frequency saturation and length normalization to address limitations in handling variable document sizes and repeated terms. The culmination of this evolution came in 1994 with the formalization of BM25, a combination of BM11 and BM15 that balanced term frequency effects with tunable parameters for improved retrieval effectiveness, as detailed by Robertson and Steve Walker in their SIGIR paper and validated through TREC-3 evaluations. This version was quickly adopted within probabilistic frameworks, marking a milestone in practical term weighting and influencing subsequent large-scale search systems.

Core Components

Term Frequency Saturation

In the Okapi BM25 ranking function, term frequency saturation addresses the observation that repeated occurrences of a query term within a document contribute diminishing marginal relevance beyond an initial point, preventing linear scaling that could overly favor documents with excessive repetitions. This non-linear approach models the intuition that while higher term frequency generally indicates greater relevance, additional matches yield progressively less informational value, as supported by empirical analyses in probabilistic retrieval models. The mathematical form of the term frequency component incorporates saturation through the expression: \frac{(k_1 + 1) f}{f + k_1 \left(1 - b + b \frac{\mathrm{doc\_len}}{\mathrm{avg\_doc\_len}}\right)} where f denotes the raw term frequency in the document, k_1 (typically 1.2–2.0) is a tunable parameter controlling the saturation rate, b (often 0.75) modulates length-normalization effects, \mathrm{doc\_len} is the document's length, and \mathrm{avg\_doc\_len} is the collection's average document length. As f increases, the fraction approaches k_1 + 1 asymptotically, capping the contribution regardless of further repetitions. This design stems from the 2-Poisson distribution assumption in early probabilistic models, where term occurrences follow an S-shaped curve reflecting elite documents with higher baseline frequencies, validated through TREC evaluations showing superior performance over unsaturated TF-IDF variants. For instance, a query term appearing once in a document might yield a TF score of 1 (assuming k_1 = 2 and a unit length-normalization factor), while 10 occurrences yield approximately 2.5; linear TF would instead scale by a factor of 10, whereas the saturated value caps toward k_1 + 1 = 3. This saturation enhances ranking precision by emphasizing topical relevance over sheer volume, distinct from inter-document rarity captured by IDF.
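To make the saturation concrete, the following minimal Python sketch (not from the Okapi codebase; parameter values are illustrative) evaluates the saturated TF component for increasing raw frequencies, assuming a document of exactly average length:

```python
def saturated_tf(f, k1=2.0, b=0.75, doc_len=100, avg_doc_len=100):
    """BM25 term-frequency component with saturation and length normalization."""
    norm = 1 - b + b * (doc_len / avg_doc_len)
    return (f * (k1 + 1)) / (f + k1 * norm)

# With doc_len == avg_doc_len the normalization factor is 1, so the component
# rises from 1.0 at f=1 toward the asymptote k1 + 1 = 3.0 as f grows.
for f in (1, 2, 5, 10, 100):
    print(f, round(saturated_tf(f), 3))   # 1.0, 1.5, 2.143, 2.5, 2.941
```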

Inverse Document Frequency

The inverse document frequency (IDF) in Okapi BM25 quantifies the rarity of a query term q across the entire document collection, serving as a key weighting factor in the ranking function. It is formally defined as \text{IDF}(q) = \log \left( \frac{N - n(q) + 0.5}{n(q) + 0.5} \right), where N represents the total number of documents in the collection, and n(q) denotes the number of documents containing the term q. This logarithmic scaling ensures that the IDF value increases as the term becomes rarer, thereby assigning higher weights to terms that appear in fewer documents. The primary purpose of IDF is to diminish the influence of common terms, such as stop words like "the" or "and," which occur frequently across the corpus and offer little discriminatory power in distinguishing relevant documents from irrelevant ones. Conversely, it amplifies the contribution of rare or specific terms, which are more likely to indicate relevance to a particular query, enhancing the overall precision of retrieval results. In BM25, the IDF formula incorporates a specific adjustment by adding 0.5 to both the numerator and denominator, which acts as a smoothing mechanism to prevent division by zero when n(q) = 0 and to avoid undefined values in edge cases, such as a term appearing in every document; the weight can still become negative for terms present in more than half the collection, which practical implementations typically clamp or reformulate. This modification provides stability while approximating the classical IDF without requiring relevance information from the collection. This formulation builds directly on Karen Spärck Jones's foundational work, which introduced inverse document frequency as a measure of term specificity, defined via the logarithm of the ratio of total documents to containing documents, but BM25 adapts it for practical, non-relevance-based weighting in the Robertson-Spärck Jones probabilistic model.
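A small sketch of this weighting, comparing the smoothed BM25 IDF with the classic Spärck Jones form on a toy collection (collection size and document frequencies are illustrative), follows:

```python
import math

def bm25_idf(N, n_q):
    """Smoothed BM25 IDF: log((N - n(q) + 0.5) / (n(q) + 0.5))."""
    return math.log((N - n_q + 0.5) / (n_q + 0.5))

def classic_idf(N, n_q):
    """Classic IDF: log(N / n(q))."""
    return math.log(N / n_q)

N = 10_000
for n_q in (1, 10, 1_000, 6_000):
    print(n_q, round(bm25_idf(N, n_q), 2), round(classic_idf(N, n_q), 2))

# Rare terms receive large weights; a term present in more than half the
# documents yields a negative smoothed IDF, which implementations such as
# Lucene's BM25Similarity sidestep by computing log(1 + (N - n + 0.5)/(n + 0.5)).
```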

Document Length Normalization

Document length normalization in BM25 addresses potential biases arising from varying document sizes by scaling the contribution of term frequency relative to the document's length compared to the collection average. This is achieved through a normalization factor incorporated into the denominator of the term weighting component, given by (1 - b + b \cdot \frac{dl}{avdl}), where dl denotes the length of the document in question, avdl is the average document length across the entire collection, and b is a tunable parameter. The parameter b, constrained to the range [0, 1], governs the intensity of this normalization: when b = 0, document length is entirely disregarded, treating all documents equally regardless of size; when b = 1, full normalization is applied, effectively scaling term frequencies in proportion to document length. Intermediate values, often empirically set between 0.5 and 0.8, provide a balanced adjustment. This mechanism reflects the "verbosity hypothesis," under which longer documents might otherwise receive undue advantage due to more opportunities for term occurrences, while ensuring shorter documents with relevant terms are not overly penalized. Empirical evaluation on TREC datasets has validated this approach, demonstrating improved retrieval effectiveness by mitigating length-based distortions in ranking. For instance, consider a long document with a moderate count of a query term; its term frequency contribution will be downweighted more substantially than that of an average-length document exhibiting similar term density, promoting fairness in relevance assessment.
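A short sketch (illustrative lengths only) shows how the normalization factor behaves for a document twice the average length as b varies:

```python
def length_norm(doc_len, avg_doc_len, b):
    """BM25 length-normalization factor: 1 - b + b * (dl / avdl)."""
    return 1 - b + b * (doc_len / avg_doc_len)

avdl = 500
for b in (0.0, 0.5, 0.75, 1.0):
    # A document twice the average length; larger factors shrink the TF component.
    print(b, length_norm(1000, avdl, b))   # 1.0, 1.5, 1.75, 2.0
```

With b = 0 the factor stays at 1 regardless of length, while b = 1 doubles the denominator for a document twice the average length, halving the effective term frequency contribution.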

Ranking Function

Basic BM25 Formula

The BM25 ranking function computes a relevance score for a document D given a query Q by aggregating contributions from each query term q_i based on their weighted occurrences in the document. This function integrates term frequency saturation to diminish the impact of repeated terms, inverse document frequency to emphasize term rarity, and document length normalization to penalize overly long documents relative to the collection average. The resulting score prioritizes documents that match query terms proportionally without overemphasizing repetition or length extremes. The complete basic BM25 formula is given by: \text{Score}(D, Q) = \sum_{i} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)} where f(q_i, D) is the frequency of q_i in D, |D| is the length of D in terms, \text{avgdl} is the average document length in the collection, \text{IDF}(q_i) is the inverse document frequency of q_i, and k_1 > 0 and 0 \leq b \leq 1 are tunable parameters controlling term frequency saturation and length normalization, respectively. The formula multiplies, for each query term, the IDF weight by the saturated term frequency component \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}, which caps the influence of high frequencies while adjusting for document length via the denominator's normalization term, and then sums these products across all unique terms in Q. The overall score is thus an additive combination of per-term contributions, reflecting the model's reliance on within-document frequency, collection-level rarity via IDF, and length normalization to ensure fair comparison across varying document sizes. For multi-term queries, BM25 assumes term independence, computing the score as the sum over individual terms without modeling interactions like co-occurrence or order; while phrase-specific adjustments exist in extensions, the basic form relies solely on this additive bag-of-words aggregation. The underlying model further assumes a bag-of-words representation, ignoring positional information and treating documents as unordered multisets of terms to simplify computation and focus on frequency-based relevance.
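The formula translates directly into code. The following sketch is a simplified in-memory implementation for illustration, not a production indexer: tokenization is naive whitespace splitting, and the IDF uses the non-negative log(1 + ...) variant common in practice.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with basic BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter()                      # n(q): documents containing each term
    for terms in tokenized:
        df.update(set(terms))

    scores = []
    for terms in tokenized:
        tf, dl, score = Counter(terms), len(terms), 0.0
        for q in query.lower().split():
            if df[q] == 0:
                continue
            idf = math.log(1 + (N - df[q] + 0.5) / (df[q] + 0.5))
            f = tf[q]
            score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores

docs = [
    "the quick brown fox jumps over the lazy dog",
    "a quick introduction to probabilistic information retrieval",
    "bm25 is a probabilistic ranking function for information retrieval",
]
print(bm25_scores("probabilistic retrieval", docs))
```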

Parameter Tuning

The parameter k_1 in BM25 controls the saturation of term frequency contributions, with higher values permitting a more linear increase in scoring as term frequency grows, while lower values impose stronger saturation to reduce the impact of repeated terms. Typical values for k_1 range from 1.2 to 2.0, as these have been found effective across various collections in empirical evaluations. The parameter b governs the degree of document length normalization, where values closer to 1 apply stronger penalties to longer documents; a common default is 0.75 to balance normalization without over-penalizing extended content. Tuning k_1 and b typically involves empirical optimization using grid search or other multidimensional search methods over ranges like k_1 \in [0.5, 2.0] and b \in [0.3, 0.9], evaluated on relevance judgments from test collections such as TREC datasets. Performance is assessed via information retrieval metrics including Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG), often through cross-validation to ensure generalizability across queries. These processes can be computationally intensive, sometimes requiring weeks for large query sets, prompting alternatives like gradient-based optimization for faster convergence. Domain-specific adjustments are essential, as optimal parameters vary by corpus characteristics; for sparse corpora with low term frequencies, lower k_1 values (e.g., around 1.0) minimize overemphasis on repetitions. In collections with highly varied document lengths, higher b values (e.g., 0.75–0.9) strengthen normalization to prevent bias toward longer documents. Such tuning has been shown to improve retrieval performance in field-specific applications like web search.
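A grid search over these two parameters is straightforward to express; the sketch below assumes a hypothetical evaluate_map(k1, b) function that indexes and queries a test collection with the given parameters and returns MAP over its relevance judgments:

```python
import itertools

def tune_bm25(evaluate_map):
    """Exhaustive grid search over typical k1 and b ranges.

    `evaluate_map(k1, b)` is a user-supplied (hypothetical) evaluator that
    returns mean average precision for the given parameter setting.
    """
    k1_grid = [0.5, 0.9, 1.2, 1.5, 2.0]
    b_grid = [0.3, 0.5, 0.75, 0.9]
    return max(itertools.product(k1_grid, b_grid),
               key=lambda params: evaluate_map(*params))

# Stand-in evaluator for demonstration; real use would run an actual IR system.
print(tune_bm25(lambda k1, b: -abs(k1 - 1.2) - abs(b - 0.75)))   # (1.2, 0.75)
```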

Interpretations

Probabilistic Basis

The Okapi BM25 ranking function originates from the probabilistic retrieval framework, which models document relevance as the probability that a document satisfies a user's information need given a query. This approach, pioneered in the 1970s, posits that documents should be ranked in decreasing order of their estimated relevance probability to optimize retrieval effectiveness. A foundational element is the Binary Independence Model (BIM), developed by Stephen E. Robertson and Karen Spärck Jones, which assumes that term presence or absence in a document is independent of other terms, conditional on relevance, and treats relevance judgments as binary (relevant or non-relevant). The BIM derives relevance weights from the distribution of terms in relevant versus non-relevant documents, providing an early probabilistic basis for term weighting in information retrieval systems. BM25 extends this foundation by incorporating term frequencies, moving beyond the binary assumptions of the BIM through the 2-Poisson model, which posits that a term's within-document frequencies follow a mixture of two Poisson distributions, one for "elite" documents genuinely about the concept the term denotes and one for the rest, thereby capturing the term's contribution to relevance. Specifically, BM25 is derived as an approximation that ranks documents by the likelihood of relevance given observed term occurrences in the query and document, under the key assumption of term independence conditional on relevance status. This derivation simplifies the full probabilistic computation by focusing on the log-odds ratio of relevance, expressed as \log \frac{P(R=1 \mid Q, D)}{P(R=0 \mid Q, D)}, where R is the relevance indicator, Q the query, and D the document; the model estimates this ratio using empirical term statistics from the collection. The resulting score balances the evidence from term matches while avoiding exact Bayesian computations, which would be computationally intensive. Despite its strengths, the probabilistic basis of BM25 has limitations, notably the assumption of term independence, which ignores potential dependencies between terms that could influence relevance in real-world corpora. However, these simplifications have proven empirically robust, with BM25 demonstrating strong performance across diverse retrieval tasks due to its effective approximation of probabilistic relevance signals. This robustness stems from the framework's grounding in the Probability Ranking Principle, articulated by Robertson and van Rijsbergen, which justifies ordering documents by relevance probability for optimal user utility in non-interactive settings.
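Concretely, under the BIM the per-term contribution to this log-odds ratio is the Robertson-Spärck Jones weight; with the usual 0.5 pseudo-counts and no explicit relevance information (R = r = 0), it reduces to the BM25 IDF:
w(q) = \log \frac{(r + 0.5)\,/\,(R - r + 0.5)}{\bigl(n(q) - r + 0.5\bigr)\,/\,\bigl(N - n(q) - R + r + 0.5\bigr)} \;\xrightarrow{\;R = r = 0\;}\; \log \frac{N - n(q) + 0.5}{n(q) + 0.5}
where N is the collection size, n(q) the number of documents containing q, R the number of known relevant documents, and r the number of those that contain q.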

Information-Theoretic IDF

The inverse document frequency (IDF) component in BM25 can be interpreted through the lens of information theory as a measure of surprisal, or self-information, associated with a term's occurrence across the document collection. Specifically, the term \log \frac{N}{n(q)}, where N is the total number of documents and n(q) is the number of documents containing the query term q, approximates the self-information of the event that a document contains q. This self-information, defined as -\log P(q), quantifies the unexpectedness of the term's presence, with P(q) = \frac{n(q)}{N} representing the probability that a randomly selected document includes q. This interpretation positions rare terms, those with low n(q), as carrying greater informational value regarding a document's potential relevance to the query, as their occurrence reduces uncertainty more substantially than common terms. In essence, IDF reflects the negative logarithm of a probability modeling term presence, aligning with Shannon's concept of information as a measure of uncertainty reduction in the collection. By weighting rarer terms higher, BM25 emphasizes features that distinguish relevant documents from the broader collection, enhancing retrieval precision. To adapt this for practical use, BM25 modifies the classic IDF with additive smoothing: \log \frac{N - n(q) + 0.5}{n(q) + 0.5}. This +0.5 adjustment, akin to Laplace smoothing or add-one priors in probabilistic estimation, incorporates pseudo-counts to stabilize the logarithm, preventing undefined values when n(q) = 0 or extreme biases in sparse data scenarios. It thereby reduces sensitivity to small sample sizes in the corpus. Compared to the classic IDF \log \frac{N}{n(q)}, BM25's smoothed variant proves more robust to variations in collection size, as the priors mitigate overestimation of rarity in small or imbalanced datasets while preserving the core information-theoretic weighting. This adjustment maintains the surprise-based interpretation without introducing undue instability in real-world search systems.
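In symbols, with term presence modeled as an event of probability P(q) = n(q)/N, the classic weight is exactly the self-information of observing the term in a randomly drawn document:
\mathrm{IDF}_{\text{classic}}(q) = \log \frac{N}{n(q)} = -\log P(q), \qquad P(q) = \frac{n(q)}{N}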

Variants and Extensions

Field-Based Variants

BM25F is a variant of BM25 designed for structured documents with multiple fields, such as title, body, and anchors in web pages. It extends the original formula by applying separate length normalization and tunable weights to each field, allowing the model to emphasize more important fields (e.g., higher weight for titles). The score is a weighted sum over fields:
\sum_{i=1}^{n} w_i \cdot \sum_{q \in Q} \mathrm{IDF}(q) \cdot \frac{\mathrm{TF}_i(q,d) \cdot (k_{1i} + 1)}{\mathrm{TF}_i(q,d) + k_{1i} \cdot (1 - b_i + b_i \cdot \frac{|d_i|}{\mathrm{avgdl}_i})}
where subscript i denotes the field, w_i is the field weight (often >1 for key fields like title), and parameters can be field-specific. This addresses limitations of uniform treatment in standard BM25 for heterogeneous content.
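A minimal sketch of the per-field weighted sum shown above follows; note that the original BM25F formulation of Robertson, Zaragoza, and Taylor folds the field weights into a combined pseudo term frequency before a single saturation step, so the independent per-field saturation here is a common simplification. Field names, weights, and the shared k_1 and b values are illustrative assumptions.

```python
def bm25f_per_field(query_terms, doc_fields, idf, field_weights, avg_len,
                    k1=1.2, b=0.75):
    """Weighted per-field BM25 sum, following the formula above.

    doc_fields:    field name -> list of tokens in that field of the document
    avg_len:       field name -> average field length over the collection
    idf:           term -> precomputed IDF over the collection
    field_weights: field name -> weight w_i (e.g. title weighted above body)
    """
    score = 0.0
    for field, tokens in doc_fields.items():
        norm = 1 - b + b * len(tokens) / avg_len[field]
        for q in query_terms:
            f = tokens.count(q)
            if f == 0:
                continue
            score += field_weights[field] * idf[q] * (f * (k1 + 1)) / (f + k1 * norm)
    return score

doc = {"title": "okapi bm25 ranking".split(),
       "body": "bm25 is a probabilistic ranking function used in search".split()}
print(bm25f_per_field(["bm25", "ranking"], doc,
                      idf={"bm25": 2.0, "ranking": 1.5},
                      field_weights={"title": 2.5, "body": 1.0},
                      avg_len={"title": 4, "body": 20}))
```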

Frequency-Enhanced Variants

Frequency-enhanced variants of the Okapi BM25 ranking function address limitations in the original model's handling of term frequency (TF), particularly when terms appear infrequently or not at all in a document. These modifications introduce adjustments to the TF normalization component to provide a lower bound, preventing the contribution of a term from becoming effectively negative or overly suppressed due to document length effects. By doing so, they enhance retrieval performance, especially for queries with many terms where standard BM25 may undervalue documents with partial matches. A prominent example is BM25+, proposed by Lv and Zhai in 2011. This variant modifies the TF saturation formula in BM25 by adding a baseline boost parameterized by δ (typically set to 1.0), ensuring that even low or zero TF values contribute a minimal positive score rather than being completely discounted or penalized below the absent-term baseline. The updated TF component is the standard BM25 TF plus δ:
\frac{\mathrm{TF}(q,d) \cdot (k_1 + 1)}{\mathrm{TF}(q,d) + k_1 \cdot (1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}})} + \delta
where TF(q,d) = f is the raw term frequency in document d, |d| is the document length, avgdl is the average document length, and k_1 and b are the standard BM25 parameters. The full term contribution is then IDF(q) times this value. This lower-bounding mechanism mitigates an issue in the original BM25 where, for a term occurring only a few times in a very long document, the normalized TF component is suppressed toward zero, so that matching the term barely distinguishes the document from one that does not contain the term at all.
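The effect of the δ floor is easiest to see for a single occurrence in a very long document; the sketch below (illustrative lengths and default parameters) compares the per-term TF components, each of which would then be multiplied by IDF(q):

```python
def tf_component(f, doc_len, avgdl, k1=1.2, b=0.75):
    """Standard BM25 saturated TF component."""
    return (f * (k1 + 1)) / (f + k1 * (1 - b + b * doc_len / avgdl))

def tf_component_plus(f, doc_len, avgdl, k1=1.2, b=0.75, delta=1.0):
    """BM25+ adds a constant delta, lower-bounding contributions of present terms."""
    return tf_component(f, doc_len, avgdl, k1, b) + delta

# One occurrence in a document ten times the average length: the standard
# component is heavily suppressed, while BM25+ keeps a floor of delta.
print(round(tf_component(1, 5000, 500), 3))        # ~0.214
print(round(tf_component_plus(1, 5000, 500), 3))   # ~1.214
```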
The rationale for BM25+ stems from axiomatic analysis revealing that unconstrained TF normalization in BM25 violates desirable lower-bounding constraints, allowing scores to decay too aggressively with document length and leading to suboptimal scores for documents with sparse term occurrences. In the original BM25, a term absent from a document yields zero contribution, but for present terms with low frequency in long documents, the normalization can suppress the score disproportionately; BM25+ counters this by enforcing a floor, which is particularly beneficial for verbose queries containing rare or partially matching terms. Evaluations demonstrate BM25+'s effectiveness over standard BM25. On TREC collections (e.g., WT10G, WT2G, Terabyte, and Robust 2004), BM25+ achieved mean average precision (MAP) improvements of approximately 5-10% overall, with gains up to 15-20% on verbose query subsets (queries with more than 5 terms), while maintaining or slightly enhancing performance on short queries. These results highlight its robustness without increasing computational overhead, as the modification integrates seamlessly into existing BM25 implementations.

Applications

Use in Search Systems

BM25 has been the default ranking function in Elasticsearch since version 5.0, released in 2016, and in Solr since the adoption of Lucene 6.0 in the same year, enabling robust keyword-based retrieval in these widely used open-source search platforms. In Vespa, BM25 is supported as a dedicated rank feature for first-phase ranking, often integrated into custom rank profiles for production-scale applications. This adoption since the mid-2010s has made BM25 a cornerstone of lexical retrieval in production systems, where it powers efficient scoring over large corpora. Open-source implementations of BM25 are prominently featured in Apache Lucene's BM25Similarity class, which provides the core computation for term frequency saturation, inverse document frequency, and length normalization, serving as the foundation for Elasticsearch and Solr. Tuning BM25 for domain-specific corpora is facilitated through plugins like the Learning To Rank (LTR) module, available in both Elasticsearch and Solr, which allows machine-learned adjustments to parameters such as k_1 and b using relevance judgments derived from user interactions or explicit assessments. In performance evaluations, BM25 demonstrates effectiveness for keyword-based retrieval, significantly outperforming traditional TF-IDF weighting; for instance, in one comparison against Lucene's default TF-IDF similarity, BM25 reduced the percentage of queries failing to retrieve relevant documents in the top 10 results from 57.7% to 16.0%. Such improvements highlight BM25's ability to handle term saturation and document length, making it a reliable baseline in benchmarks like those from the Text REtrieval Conference (TREC). As of 2025, BM25 is increasingly combined with neural rerankers in hybrid retrieval pipelines at major search engines, including Bing and Google, where it provides lexical matching for initial candidate generation before neural models refine rankings based on semantic embeddings for enhanced relevance across diverse queries. This integration leverages BM25's speed and precision for sparse retrieval alongside dense vector search, improving overall system performance in production environments.

Comparisons with Alternatives

BM25 offers notable advantages over the classical TF-IDF weighting scheme through its incorporation of term frequency saturation via the k_1 parameter and explicit document length normalization via the b parameter, which mitigate biases toward longer documents and diminishing returns on repeated terms. These enhancements result in significant improvements in mean average precision (MAP) in comparative evaluations on benchmark datasets, such as TREC collections, where TF-IDF often overemphasizes raw term counts without such controls. However, like TF-IDF, BM25 remains a bag-of-words model lacking positional information, which limits its ability to capture phrase-level or structural relevance in documents. In contrast to neural ranking models such as BERT-based dense retrievers, BM25 excels in computational efficiency and interpretability, making it ideal for first-stage retrieval in large-scale systems where speed is paramount. On the MS MARCO dataset, BM25 achieves an MRR@10 of approximately 0.18, while BERT-based variants like ANCE reach 0.33, reflecting neural models' superior handling of semantic nuances but at the cost of higher latency and reduced transparency. Similarly, on TREC-2019, BM25 yields an NDCG@10 of 0.48 compared to 0.65 for dense approaches, underscoring BM25's role as a robust, lightweight baseline that underperforms on tasks requiring deep contextual understanding, such as resolving synonymy or contextual ambiguity. Compared to Divergence From Randomness (DFR) models, which share a probabilistic foundation by modeling deviations from random term distributions, BM25 is empirically simpler with fewer parameters to tune while achieving comparable or superior performance after optimization. Studies on term frequency normalization demonstrate that BM25's straightforward k_1 and b adjustments often yield more stable results than DFR's complex divergence measures, particularly for short queries where over-tuning DFR can degrade effectiveness. Despite these strengths, BM25 struggles with synonyms, contextual ambiguities, and non-lexical semantics, as it relies solely on exact term matching without embedding-based inference. In 2025 evaluations, such as those on BEIR and MS MARCO, BM25 continues to serve as the standard lexical baseline against which advanced models are benchmarked for zero-shot retrieval.
