Okapi BM25
Okapi BM25 is a probabilistic ranking function employed in information retrieval systems to assess the relevance of documents to a given search query by computing a score that balances term frequency, inverse document frequency, and document length normalization.[1] Developed as part of the Okapi experimental testbed at City University London, it builds on the binary independence model from the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others, incorporating refinements to handle non-binary term occurrences and length biases effectively.[2] The acronym "BM" denotes "Best Matching," while "25" signifies the iteration number in a sequence of weighting schemes devised by Stephen E. Robertson and collaborators.[1] The core formula of Okapi BM25 sums, over each query term q, an inverse document frequency (IDF) term multiplied by a saturated term frequency (TF) factor adjusted for document length:
\sum_{q \in Q} \mathrm{IDF}(q) \cdot \frac{\mathrm{TF}(q,d) \cdot (k_1 + 1)}{\mathrm{TF}(q,d) + k_1 \cdot (1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}})}
where \mathrm{TF}(q,d) is the frequency of q in document d, |d| is the length of d, \mathrm{avgdl} is the average document length, and k_1 (typically 1.2–2.0) and b (often 0.75) are tunable parameters controlling saturation and length normalization, respectively.[1] This formulation addresses limitations in earlier models like BM15 by introducing a non-linear TF saturation to prevent overemphasis on repeated terms and a pivot-based length normalization to favor neither short nor overly long documents.[2] Extensions such as BM25F adapt it for structured documents by weighting multiple fields (e.g., title, body).[2] Since its introduction in 1994 at the Text REtrieval Conference (TREC-3), Okapi BM25 has demonstrated superior performance in empirical evaluations, outperforming vector space models and other probabilistic approaches on diverse corpora.[1] It remains the de facto standard for lexical ranking in production search engines, including Elasticsearch and OpenSearch, due to its efficiency, interpretability, and robustness across languages and domains.[3] Open-source implementations in toolkits like Terrier, Xapian, and MG4J have further facilitated its widespread adoption in academic and industrial applications, influencing hybrid systems that combine it with neural retrieval methods.[2]
Introduction
Overview and Purpose
Okapi BM25 is a probabilistic ranking function employed in information retrieval systems to assess the relevance of documents to a given query, operating as a bag-of-words retrieval model that treats documents and queries as collections of independent terms. Developed as part of the Okapi information retrieval system at City University London during the 1980s and 1990s, BM25 builds upon foundational concepts like term frequency and inverse document frequency while introducing enhancements such as term saturation to mitigate the disproportionate influence of highly frequent terms and document length normalization to account for variations in document size.[2][4] The primary purpose of BM25 in search engines is to compute a relevance score for each document based on the presence and frequency of query terms, enabling the effective ranking of results in descending order of estimated relevance. This scoring mechanism supports the Probability Ranking Principle, which posits that documents should be presented to users in order of decreasing probability of relevance to the query, thereby improving retrieval precision and user satisfaction in large-scale collections.[2] BM25 addresses key challenges in information retrieval, including the handling of varying document lengths—where longer documents might otherwise be unfairly favored—and the saturation of term frequencies, preventing repeated occurrences of a term from linearly inflating scores beyond a point of diminishing returns. By incorporating these adjustments, BM25 provides a more balanced estimation of relevance compared to simpler models like TF-IDF, where term frequency and inverse document frequency serve as core building blocks without such refinements.[2]
Historical Background
The probabilistic foundations of BM25 emerged from early work in information retrieval during the 1970s and 1980s, particularly the binary independence model developed by Stephen E. Robertson and Karen Spärck Jones, which introduced relevance-based term weighting under assumptions of term independence.[5] This framework was extended by Robertson, C. J. van Rijsbergen, and M. F. Porter through explorations of term frequency distributions, including the 2-Poisson model, which modeled document terms as mixtures of "elite" (relevant) and non-elite distributions to better approximate retrieval probabilities.[6] These advancements built on prior probabilistic models, emphasizing the use of collection statistics for ranking without requiring full relevance judgments.[1] In the 1980s, Robertson and colleagues at City University London implemented these principles in the Okapi system, an experimental information retrieval platform initially designed for bibliographic databases and probabilistic ranking.[1] The system's early weighting schemes, such as BM1, were tested in the inaugural Text REtrieval Conference (TREC-1) in 1992, focusing on basic relevance weights derived from the Robertson-Spärck Jones formula.[1] By TREC-2 in 1993, the team introduced refined models like BM11 and BM15, which incorporated within-document term frequency saturation and length normalization to address limitations in handling variable document sizes and repeated terms.[7] The culmination of this evolution came in 1994 with the formalization of BM25, a hybrid of BM11 and BM15 that balanced term frequency effects with tunable parameters for improved retrieval effectiveness, as detailed by Robertson and Steve Walker in their SIGIR paper and validated through TREC-3 evaluations.[8][1] This version was quickly adopted within probabilistic IR frameworks, marking a milestone in practical term weighting and influencing subsequent large-scale search systems.[8]
Core Components
Term Frequency Saturation
In the Okapi BM25 ranking function, term frequency saturation addresses the observation that repeated occurrences of a query term within a document contribute diminishing marginal relevance beyond an initial point, preventing linear scaling that could overly favor documents with excessive repetitions.[1] This non-linear approach models the intuition that while higher term frequency generally indicates greater relevance, additional matches yield progressively less informational value, as supported by empirical analyses in probabilistic retrieval models.[2] The mathematical form of the term frequency component incorporates saturation through the expression:
\frac{(k_1 + 1) f}{f + k_1 \left(1 - b + b \frac{\mathrm{doc\_len}}{\mathrm{avg\_doc\_len}}\right)}
where f denotes the raw term frequency in the document, k_1 (typically 1.2–2.0) is a tunable parameter controlling the saturation rate, b (often 0.75) modulates length effects, \mathrm{doc\_len} is the document length, and \mathrm{avg\_doc\_len} is the collection's average document length.[1] As f increases, the fraction approaches k_1 + 1 asymptotically, capping the contribution regardless of further repetitions.[2] This design stems from the 2-Poisson distribution assumption in early probabilistic models, where term occurrences follow an S-shaped curve reflecting elite documents with higher baseline frequencies, validated through TREC evaluations showing superior performance over unsaturated TF-IDF variants.[1] For instance, with k_1 = 2 and unit length normalization, a query term appearing once in a document yields a TF component of 1, while 10 occurrences yield 2.5; linear TF would instead scale by a factor of 10, whereas the saturated component can never exceed k_1 + 1 = 3.[2] This saturation enhances ranking precision by emphasizing topical relevance over sheer volume, distinct from inter-document rarity captured by IDF.[1]
Inverse Document Frequency
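BM25's smoothed IDF weight can be computed directly from collection statistics; a minimal sketch in Python, using hypothetical document counts:

```python
import math

def bm25_idf(N, n_q):
    """Smoothed BM25 IDF: log((N - n(q) + 0.5) / (n(q) + 0.5))."""
    return math.log((N - n_q + 0.5) / (n_q + 0.5))

# Hypothetical collection of 1,000 documents:
print(bm25_idf(1000, 5))    # rare term -> large positive weight (~5.2)
print(bm25_idf(1000, 500))  # term in half the documents -> exactly 0
print(bm25_idf(1000, 990))  # near-ubiquitous term -> negative weight
print(bm25_idf(1000, 0))    # unseen term stays finite, unlike log(N/n)
```

Note that the 0.5 pseudo-counts keep the value finite at n(q) = 0 but do not prevent negative weights for very common terms, a point discussed below.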
The inverse document frequency (IDF) in Okapi BM25 quantifies the rarity of a term q across the entire document collection, serving as a key weighting factor in the ranking function. It is formally defined as \text{IDF}(q) = \log \left( \frac{N - n(q) + 0.5}{n(q) + 0.5} \right), where N represents the total number of documents in the collection, and n(q) denotes the number of documents containing the term q.[2] This logarithmic scaling ensures that the IDF value increases as the term becomes rarer, thereby assigning higher weights to terms that appear in fewer documents. The primary purpose of IDF is to diminish the influence of common terms, such as stop words like "the" or "and," which occur frequently across the corpus and offer little discriminatory power in distinguishing relevant documents from irrelevant ones. Conversely, it amplifies the contribution of rare or specific terms, which are more likely to indicate relevance to a particular query, enhancing the overall precision of retrieval results.[2] In BM25, the IDF formula incorporates a specific adjustment by adding 0.5 to both the numerator and denominator, which acts as a smoothing mechanism, preventing division by zero when n(q) = 0 and stabilizing the estimate when a term appears in nearly all or no documents. The smoothed value can still become negative when a term occurs in more than half of the documents, which practical systems often handle by flooring the weight at zero or adding 1 inside the logarithm. This modification provides numerical stability while approximating the classical IDF without requiring relevance information from the collection.[2] This formulation builds directly on Karen Spärck Jones's foundational 1972 work, which introduced IDF within a probabilistic retrieval framework to measure term specificity as the logarithm of the ratio of total to containing documents, but BM25 adapts it for practical, non-relevance-based weighting in the Robertson-Spärck Jones probabilistic model.[2]
Document Length Normalization
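The normalization factor at the heart of this component can be sketched numerically; the lengths and parameter values below are illustrative:

```python
def length_norm(dl, avdl, b=0.75):
    """BM25 length-normalization factor: 1 - b + b * (dl / avdl)."""
    return 1 - b + b * (dl / avdl)

# Illustrative lengths against an average of 100 terms:
print(length_norm(100, 100))       # 1.0   -> average-length document, no adjustment
print(length_norm(300, 100))       # 2.5   -> long document, TF denominator grows
print(length_norm(50, 100))        # 0.625 -> short document, TF contribution boosted
print(length_norm(300, 100, b=0))  # 1.0   -> b = 0 disables length normalization
```

Because this factor multiplies k_1 in the TF denominator, values above 1 shrink a term's saturated contribution and values below 1 enlarge it.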
Document length normalization in BM25 addresses potential biases arising from varying document sizes by scaling the contribution of term frequency relative to the document's length compared to the collection average. This is achieved through a normalization factor incorporated into the denominator of the term weighting component, given by (1 - b + b \cdot \frac{dl}{avdl}), where dl denotes the length of the document in question, avdl is the average document length across the entire collection, and b is a tunable parameter.[1][2] The parameter b, constrained to the range [0, 1], governs the intensity of this normalization: when b = 0, document length is entirely disregarded, treating all documents equally regardless of size; when b = 1, full normalization is applied, effectively scaling term frequencies proportionally to document length. Intermediate values, often empirically set between 0.5 and 0.8, provide a balanced adjustment.[1][2] This mechanism stems from the need to counteract the "verbosity hypothesis," where longer documents might otherwise receive undue advantage due to more opportunities for term occurrences, while ensuring shorter documents with relevant rare terms are not overly penalized. Empirical tuning on TREC datasets has validated this approach, demonstrating improved retrieval effectiveness by mitigating length-based distortions in ranking.[1][2] For instance, consider a long document with a moderate frequency of a query term; its score will be downweighted more substantially than that of an average-length document exhibiting similar term density, promoting fairness in relevance assessment.[2]
Ranking Function
Basic BM25 Formula
The BM25 ranking function computes a relevance score for a document D given a query Q by aggregating contributions from each query term q_i based on their weighted occurrences in the document. This function integrates term frequency saturation to diminish the impact of repeated terms, inverse document frequency to emphasize rarity, and document length normalization to penalize overly long documents relative to the collection average. The resulting score prioritizes documents that match query terms proportionally without overemphasizing length or frequency extremes.[1][2] The complete basic BM25 formula is given by:
\text{Score}(D, Q) = \sum_{i} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
where f(q_i, D) is the frequency of term q_i in document D, |D| is the length of D in terms, \mathrm{avgdl} is the average document length in the collection, \mathrm{IDF}(q_i) is the inverse document frequency of q_i, and k_1 > 0 and 0 \leq b \leq 1 are tunable parameters controlling saturation and length normalization, respectively.[1][2] This formula arises from multiplying, for each query term, the IDF weight by the saturated term frequency component \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}, which caps the influence of high frequencies while adjusting for document length via the denominator's normalization term, and then summing these products across all unique terms in Q.
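The scoring loop implied by this formula is short; a minimal sketch over a hypothetical toy collection (the whitespace tokenization and the k_1, b defaults are illustrative choices, not part of the definition):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Basic BM25: score every document (a list of tokens) against the query."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency n(q) for each distinct query term
    df = {q: sum(1 for d in docs if q in d) for q in set(query_terms)}
    scores = []
    for d in docs:
        tf = Counter(d)
        norm = 1 - b + b * len(d) / avgdl   # length normalization factor
        score = 0.0
        for q in set(query_terms):
            idf = math.log((N - df[q] + 0.5) / (df[q] + 0.5))
            f = tf[q]
            score += idf * (f * (k1 + 1)) / (f + k1 * norm)
        scores.append(score)
    return scores

docs = [
    "cats purr and sleep all day".split(),
    "dogs bark at the mailman".split(),
    "parrots mimic human speech".split(),
    "goldfish swim in small bowls".split(),
]
scores = bm25_scores(["cats", "sleep"], docs)
print(scores)  # only the first document matches, so only it scores above zero
```

The sum runs over distinct query terms, matching the bag-of-words formula above; production systems differ mainly in using inverted indexes rather than this linear scan.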
The derivation treats the overall score as an additive combination of per-term contributions, reflecting the integration of saturation to model diminishing returns on frequency, rarity via IDF, and length to ensure fair comparison across varying document sizes.[2] For multi-term queries, BM25 assumes term independence, computing the score as the sum over individual terms without modeling interactions like co-occurrence or order; while phrase-specific adjustments exist in extensions, the basic form relies solely on this additive bag-of-words aggregation. The underlying model further assumes a bag-of-words representation, ignoring positional information and treating documents as unordered multisets of terms to simplify computation and focus on frequency-based relevance.[1][2]
Parameter Tuning
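Tuning BM25's two free parameters is typically a direct search over a small grid; a minimal sketch, with a synthetic quality surface standing in for a real evaluation (in practice the metric would be MAP or NDCG computed from relevance judgments on a test collection):

```python
import itertools

def grid_search(metric_fn, k1_grid, b_grid):
    """Exhaustive search over (k1, b); metric_fn scores one parameter pair."""
    return max(itertools.product(k1_grid, b_grid),
               key=lambda pair: metric_fn(*pair))

# Synthetic stand-in metric peaking at the common defaults (illustrative only;
# a real metric_fn would run BM25 with these parameters and evaluate the ranking):
def toy_metric(k1, b):
    return -((k1 - 1.2) ** 2) - ((b - 0.75) ** 2)

best = grid_search(toy_metric,
                   k1_grid=[0.5, 1.0, 1.2, 1.5, 2.0],
                   b_grid=[0.3, 0.5, 0.75, 0.9])
print(best)  # (1.2, 0.75)
```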
The parameter k_1 in BM25 controls the saturation of term frequency contributions, with higher values permitting a more linear increase in scoring as term frequency grows, while lower values emphasize saturation to reduce the impact of repeated terms.[2] Typical values for k_1 range from 1.2 to 2.0, as these have been found effective across various collections in empirical evaluations.[9] The parameter b governs the degree of document length normalization, where values closer to 1 apply stronger penalties to longer documents, and a common default is 0.75 to balance relevance without over-penalizing extended content.[10] Tuning k_1 and b typically involves empirical optimization using grid search or multidimensional methods over ranges like k_1 \in [0.5, 2.0] and b \in [0.3, 0.9], evaluated on relevance judgments from test collections such as TREC datasets.[2] Performance is assessed via information retrieval metrics including Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG), often through cross-validation to ensure generalizability across queries.[11] These processes can be computationally intensive, sometimes requiring weeks for large query sets, prompting alternatives like gradient-based optimization for faster convergence.[10] Domain-specific adjustments are essential, as optimal parameters vary by corpus characteristics; for sparse corpora with low term frequencies, lower k_1 values (e.g., around 1.0) minimize overemphasis on rare repetitions.[10] In collections with highly varied document lengths, higher b values (e.g., 0.75–0.9) enhance normalization to prevent bias toward longer documents. Such tuning has been shown to improve retrieval performance in field-specific applications like web search.[10]
Interpretations
Probabilistic Basis
The Okapi BM25 ranking function originates from the probabilistic retrieval framework, which models document relevance as the probability that a document satisfies a user's information need given a query. This approach, pioneered in the 1970s, posits that documents should be ranked in decreasing order of their estimated relevance probability to optimize retrieval effectiveness. A foundational element is the Binary Independence Model (BIM), developed by Stephen E. Robertson and Karen Spärck Jones, which assumes that term presence or absence in a document is independent of other terms, conditional on relevance, and treats relevance judgments as binary (relevant or non-relevant).[12] The BIM derives relevance weights from the distribution of terms in relevant versus non-relevant documents, providing an early probabilistic basis for term weighting in information retrieval systems.[13] BM25 extends this foundation by incorporating term frequencies, moving beyond the binary assumptions of the BIM through the 2-Poisson model, which posits that term occurrences in relevant and non-relevant documents follow two distinct Poisson distributions to capture "eliteness" (the term's contribution to relevance).[8] Specifically, BM25 is derived as an approximation to maximize the likelihood of relevance given observed term occurrences in the query and document, under the key assumption of term independence conditional on relevance status. This derivation simplifies the full probabilistic computation by focusing on the log-odds ratio of relevance, expressed as \log \frac{P(R=1 \mid Q, D)}{P(R=0 \mid Q, D)}, where R is the relevance indicator, Q the query, and D the document; the model estimates this ratio using empirical term statistics from a collection.[2] The resulting score balances the evidence from term matches while avoiding exact Bayesian computations, which would be computationally intensive. 
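The key simplification can be made explicit. Under the conditional-independence assumption, the log-odds of relevance decomposes into a sum of per-term weights; a standard sketch of this step, where p_i and q_i denote a term's occurrence probabilities in the relevant and non-relevant populations:

```latex
% Under term independence, the log-odds of relevance decomposes per term.
% Let p_i = P(t_i \text{ present} \mid R=1), \quad q_i = P(t_i \text{ present} \mid R=0).
\log \frac{P(R=1 \mid Q, D)}{P(R=0 \mid Q, D)}
  \;\overset{\text{rank}}{=}\;
  \sum_{t_i \in Q \cap D} \log \frac{p_i \,(1 - q_i)}{q_i \,(1 - p_i)}
% Without relevance judgments, taking p_i as constant and estimating
% q_i \approx n(t_i)/N from collection statistics (with 0.5 pseudo-counts)
% recovers the Robertson–Spärck Jones weight used as BM25's IDF:
w_i = \log \frac{N - n(t_i) + 0.5}{n(t_i) + 0.5}
```

BM25 then replaces the binary "present/absent" evidence in this sum with the saturated, length-normalized term frequency component.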
Despite its strengths, the probabilistic basis of BM25 has limitations, notably the assumption of term independence, which ignores potential dependencies between terms that could influence relevance in real-world corpora. However, these simplifications have proven empirically robust, with BM25 demonstrating strong performance across diverse retrieval tasks due to its effective approximation of probabilistic relevance signals.[2] This robustness stems from the framework's grounding in the Probability Ranking Principle, articulated by Robertson and van Rijsbergen, which justifies ordering documents by relevance probability for optimal user utility in non-interactive settings.[13]
Information-Theoretic IDF
The inverse document frequency (IDF) component in BM25 can be interpreted through the lens of information theory as a measure of surprise or self-information associated with a term's occurrence across the document collection. Specifically, the term \log \frac{N}{n(q)}, where N is the total number of documents and n(q) is the number of documents containing the query term q, approximates the self-information of the event that a document contains q. This self-information, defined as -\log P(q), quantifies the unexpectedness of the term's presence, with P(q) = \frac{n(q)}{N} representing the probability that a randomly selected document includes q. This interpretation positions rare terms—those with low n(q)—as carrying greater informational value regarding a document's potential relevance to the query, as their occurrence reduces uncertainty more substantially than common terms. In essence, IDF reflects the negative entropy of a Bernoulli process modeling term presence, aligning with Shannon's concept of entropy as a measure of information content in the collection. By weighting rarer terms higher, BM25 emphasizes features that distinguish relevant documents from the broader corpus, enhancing retrieval precision. To adapt this for practical use, BM25 modifies the classic IDF with additive smoothing: \log \frac{N - n(q) + 0.5}{n(q) + 0.5}. This +0.5 adjustment, akin to Laplace smoothing or add-one priors in probabilistic estimation, incorporates pseudo-counts to stabilize the logarithm, preventing undefined values when n(q) = 0 or extreme biases in sparse data scenarios. It thereby reduces sensitivity to small sample sizes in the corpus.[1] Compared to the classic IDF \log \frac{N}{n(q)}, BM25's smoothed variant proves more robust to variations in collection size, as the priors mitigate overestimation of rarity in small or imbalanced datasets while preserving the core information-theoretic weighting.
This adjustment maintains the surprise-based intuition without introducing undue volatility in real-world search systems.[1]
Variants and Extensions
Field-Based Variants
BM25F is a variant of BM25 designed for structured documents with multiple fields, such as title, body, and anchors in web pages. It extends the original formula by applying separate length normalization and tunable weights to each field, allowing the model to emphasize more important fields (e.g., higher weight for titles). The score is a weighted sum over fields:
\sum_{i=1}^{n} w_i \cdot \sum_{q \in Q} \mathrm{IDF}(q) \cdot \frac{\mathrm{TF}_i(q,d) \cdot (k_{1i} + 1)}{\mathrm{TF}_i(q,d) + k_{1i} \cdot (1 - b_i + b_i \cdot \frac{|d_i|}{\mathrm{avgdl}_i})}
where subscript i denotes the field, w_i is the field weight (often >1 for key fields like title), and parameters can be field-specific. This addresses limitations of uniform treatment in standard BM25 for heterogeneous content.[2][14]
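The field-weighted sum above can be sketched as follows; the field names, weights, and IDF values are hypothetical, and a single shared k_1 and b are used across fields for brevity (the formula allows per-field parameters):

```python
from collections import Counter

def bm25f_score(query_terms, doc_fields, idf, avgdl, weights, k1=1.2, b=0.75):
    """Per-field weighted BM25 sum; doc_fields, avgdl, weights are dicts by field."""
    score = 0.0
    for field, tokens in doc_fields.items():
        tf = Counter(tokens)
        norm = 1 - b + b * len(tokens) / avgdl[field]   # per-field length normalization
        for q in query_terms:
            f = tf[q]
            score += weights[field] * idf.get(q, 0.0) * (f * (k1 + 1)) / (f + k1 * norm)
    return score

# Hypothetical statistics: a title hit outweighs the same hit in the body.
idf = {"bm25": 2.0}
avgdl = {"title": 3, "body": 7}
weights = {"title": 2.0, "body": 1.0}
title_hit = {"title": "okapi bm25 ranking".split(),
             "body": "an overview of retrieval models".split()}
body_hit = {"title": "probabilistic retrieval models".split(),
            "body": "an overview of okapi bm25 ranking".split()}
print(bm25f_score(["bm25"], title_hit, idf, avgdl, weights))  # title match scores higher
print(bm25f_score(["bm25"], body_hit, idf, avgdl, weights))   # same term in body scores lower
```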
Frequency-Enhanced Variants
Frequency-enhanced variants of the Okapi BM25 ranking function address limitations in the original model's handling of term frequency (TF), particularly when terms appear infrequently in a document. These modifications introduce a lower bound on the TF normalization component, preventing a term's contribution from being suppressed almost to zero by document length effects. By doing so, they enhance retrieval performance, especially for queries with many terms where standard BM25 may undervalue documents with partial matches.[15] A prominent example is BM25+, proposed by Lv and Zhai in 2011. This variant modifies the TF saturation formula in BM25 by adding a baseline boost parameterized by δ (typically set to 1.0), ensuring that any document containing a query term receives a TF contribution of at least δ, no matter how strongly length normalization suppresses the raw frequency. The updated TF component is the standard BM25 TF plus δ:
\frac{\mathrm{TF}(q,d) \cdot (k_1 + 1)}{\mathrm{TF}(q,d) + k_1 \cdot (1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}})} + \delta
where TF(q,d) = f is the raw term frequency in document d, |d| is the document length, avgdl is the average document length, and k₁ and b are the standard BM25 parameters. The full term contribution is then IDF(q) times this value. This lower-bounding mechanism mitigates the issue in the original BM25 where the normalized TF can fall far below 1 for low f in long documents, leaving a single occurrence worth barely more than no occurrence at all.[15] The rationale for BM25+ stems from axiomatic analysis revealing that the unconstrained TF normalization in BM25 fails a desirable lower-bounding constraint on the reward for a term's first occurrence, leading to suboptimal scores for documents with sparse term occurrences. In the original BM25, a term absent from a document yields zero contribution, but for present terms with low frequency in long documents, the normalization can suppress the score disproportionately; BM25+ counters this by enforcing a floor, which is particularly beneficial for verbose queries containing rare or partially matching terms.[15][16] Evaluations demonstrate BM25+'s effectiveness over standard BM25. On TREC collections (e.g., WT10G, WT2G, Terabyte, and Robust 2004), BM25+ achieved mean average precision (MAP) improvements of approximately 5-10% overall, with gains up to 15-20% on verbose query subsets (queries with more than 5 terms), while maintaining or slightly enhancing performance on short queries. These results highlight its robustness without increasing computational overhead, as the modification integrates seamlessly into existing BM25 implementations.[15][16]
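The lower-bounding effect is easy to demonstrate; a minimal sketch comparing the plain and δ-shifted TF components for a term occurring once in documents of different lengths (the lengths and parameter defaults are illustrative):

```python
def bm25_tf(f, dl, avgdl=100.0, k1=1.2, b=0.75):
    """Standard BM25 saturated TF component."""
    norm = 1 - b + b * dl / avgdl
    return (f * (k1 + 1)) / (f + k1 * norm)

def bm25_plus_tf(f, dl, avgdl=100.0, k1=1.2, b=0.75, delta=1.0):
    """BM25+ TF component: the standard value plus the floor delta."""
    return bm25_tf(f, dl, avgdl, k1, b) + delta

# One occurrence in an average-length vs. a 10x-longer document:
print(bm25_tf(1, 100))        # 1.0   -> normal contribution
print(bm25_tf(1, 1000))       # ~0.21 -> nearly indistinguishable from absence
print(bm25_plus_tf(1, 1000))  # ~1.21 -> BM25+ keeps a floor of delta = 1.0
```

The full BM25+ term weight multiplies this component by IDF(q), applied to terms that actually occur in the document.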