Query expansion
Query expansion is a technique in information retrieval (IR) systems that reformulates a user's original query by selecting and adding semantically related terms or concepts, with the goal of minimizing vocabulary mismatch between the query and documents to enhance retrieval performance.[1] This process addresses common challenges such as synonymy, polysemy, and ambiguous phrasing in short queries, thereby improving both recall (retrieving more of the relevant items) and precision (reducing irrelevant results).[1] Experimental studies have shown that effective query expansion can boost average precision by 10% or more in various IR tasks.[1] The concept traces its origins to the 1960s in library systems and gained prominence through relevance feedback methods, notably J.J. Rocchio's 1971 algorithm, which iteratively refines queries based on user-marked relevant and non-relevant documents in a vector space model.[1] Early approaches were manual or interactive, relying on user input, but automatic techniques soon emerged to scale to large corpora. Query expansion methods are broadly categorized into local analysis, which uses feedback from initial retrieval results (e.g., pseudo-relevance feedback extracting terms from top-ranked documents), and global analysis, which draws on external resources such as thesauri, ontologies (e.g., WordNet), or web corpora for term relationships.[1] Key challenges include query drift, in which irrelevant expansion terms degrade performance, and high computational cost, particularly for real-time applications.[1] In contemporary IR, especially with the advent of pre-trained language models (PLMs) and large language models (LLMs), query expansion has evolved to incorporate contextual embeddings, generative rewriting, and zero-shot capabilities, enabling more nuanced handling of ambiguous or domain-specific queries.[2] Modern techniques, such as implicit expansion via dense vector refinement (e.g., ANCE-PRF) or generative methods like Query2Doc that synthesize pseudo-documents, have demonstrated gains of 3–15% in metrics like nDCG on benchmarks including MS MARCO and TREC Deep Learning.[2] These advancements integrate with retrieval-augmented generation (RAG) pipelines and support multilingual, cross-domain applications, though they introduce new considerations such as model hallucination and deployment efficiency.[2]
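Local analysis can be made concrete with a short sketch. The following Python fragment is a minimal illustration of pseudo-relevance feedback, assuming a toy corpus and an overlap-based ranker as stand-ins for a real index and scoring model; the corpus, the cutoff k, and the number of expansion terms are illustrative choices, not fixed standards.

```python
from collections import Counter

# Toy corpus; in practice this would be an inverted index over a large collection.
CORPUS = {
    "d1": "hybrid car fuel economy and electric vehicle range",
    "d2": "electric vehicle battery charging infrastructure",
    "d3": "jaguar habitat and rainforest conservation",
}

STOPWORDS = {"and", "the", "a", "of"}

def retrieve(query_terms, k=2):
    """Rank documents by simple term overlap with the query (stand-in for a real ranker)."""
    scores = {
        doc_id: len(query_terms & set(text.split()))
        for doc_id, text in CORPUS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

def pseudo_relevance_feedback(query, k=2, n_expansion_terms=3):
    """Expand the query with frequent terms from the top-k initially retrieved documents."""
    query_terms = set(query.split())
    top_docs = retrieve(query_terms, k)
    counts = Counter(
        term
        for doc_id in top_docs
        for term in CORPUS[doc_id].split()
        if term not in query_terms and term not in STOPWORDS
    )
    expansion = [term for term, _ in counts.most_common(n_expansion_terms)]
    return query + " " + " ".join(expansion)

print(pseudo_relevance_feedback("electric car"))
# -> "electric car vehicle hybrid fuel" with this toy corpus
```

A production system would typically score candidate terms with tf-idf or a relevance model rather than raw counts, but the feedback loop itself has this shape.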
Fundamentals of Query Expansion
Definition and Purpose
Query expansion is the process of reformulating an initial user query in information retrieval systems by adding, removing, or replacing terms to better align with the content of relevant documents. This technique enhances the query's ability to capture documents that might otherwise be missed due to limitations in the original formulation.[3] The primary purpose of query expansion is to bridge the vocabulary mismatch between how users express their information needs and the terminology used in document collections. Such mismatches often arise from synonyms (e.g., "automobile" versus "car"), polysemy (words with multiple meanings), or incomplete phrasing that fails to encompass all relevant concepts. By addressing these issues, query expansion improves the overall effectiveness of retrieval, potentially increasing recall while maintaining or enhancing precision.[3][4] In information retrieval pipelines, query expansion typically follows initial preprocessing stages, such as tokenization, stemming, and spelling correction, where the raw query is cleaned and normalized. Subsequent term selection involves identifying expansion terms from resources like thesauri or feedback mechanisms, which are then integrated into the query before ranking and retrieval. For instance, a query for "jaguar" might be expanded to include terms like "car" or "animal" depending on contextual cues, thereby retrieving a broader yet relevant set of results across automotive or wildlife documents.[3][5]
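As a simplified illustration of this pipeline, the Python sketch below expands a preprocessed query using a hand-written synonym table. The THESAURUS dictionary is a hypothetical stand-in for a resource such as WordNet; a real system would disambiguate ambiguous terms like "jaguar" from context before adding both senses.

```python
# A hand-written thesaurus standing in for WordNet or a domain vocabulary.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "jaguar": ["animal", "car"],  # ambiguous: big cat vs. car make
}

def preprocess(raw_query):
    """Normalization step that precedes expansion: lowercase and tokenize."""
    return raw_query.lower().split()

def expand(tokens, thesaurus, max_per_term=2):
    """Append up to max_per_term related terms for each token found in the thesaurus."""
    expanded = list(tokens)
    for token in tokens:
        for related in thesaurus.get(token, [])[:max_per_term]:
            if related not in expanded:
                expanded.append(related)
    return expanded

print(expand(preprocess("Jaguar speed"), THESAURUS))
# -> ['jaguar', 'speed', 'animal', 'car']
```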
Historical Development
The concept of query expansion originated in the early days of information retrieval, with Melvin E. Maron and John L. Kuhns proposing automatic query modification in 1960 to address challenges in relevance judgments within mechanized library systems.[6] Their work on probabilistic indexing laid the groundwork by suggesting the addition of related terms to queries, recognizing the limitations of exact term matching in probabilistic retrieval models.[7] In the 1970s, a foundational advancement came with J.J. Rocchio's relevance feedback algorithm, which formalized query expansion as an iterative process to refine queries using user-provided relevant documents, thereby improving retrieval precision and recall in vector space models.[8] During the 1970s and 1980s, query expansion evolved through the integration of controlled thesauri in domain-specific systems, such as the Medical Subject Headings (MeSH) vocabulary developed for biomedical databases, enabling structured term expansion to enhance search consistency in libraries and online bibliographic services like Dialog.[9][10] The 1990s marked a shift toward statistical methods amid the rise of web search engines, with adaptations to systems like Gerard Salton's SMART retrieval system incorporating automatic query expansion techniques, such as term reweighting and pseudo-relevance feedback, to handle larger corpora.[11] This period also saw the establishment of evaluation benchmarks through the Text REtrieval Conference (TREC) series, initiated in 1992 by NIST and ARPA, which systematically assessed query expansion's impact on ad-hoc retrieval tasks across participating systems. From the 2000s onward, query expansion incorporated web-scale data sources and machine learning approaches, leveraging vast corpora for distributional semantics and external knowledge bases to generate more context-aware expansions, building on earlier statistical foundations.[12] Comprehensive surveys, such as that by Azad and Deepak in 2019, trace these developments from 1960 to 2017, highlighting the progression from manual thesauri to automated, learning-based methods that address vocabulary mismatch in modern search environments.[12]
Theoretical Foundations
Precision and Recall Trade-offs
In information retrieval, precision is defined as the proportion of retrieved documents that are relevant to the query, formally expressed as
\text{Precision} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Retrieved}|},
where Relevant is the set of all relevant documents and Retrieved is the set of documents returned by the system.[13] Recall, conversely, measures the proportion of relevant documents that are successfully retrieved, given by
\text{Recall} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Relevant}|}.
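Both definitions reduce to simple set arithmetic. A minimal Python illustration (the document identifiers and judgments are hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall from sets of document IDs."""
    hits = len(retrieved & relevant)  # |Relevant ∩ Retrieved|
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {"d1", "d2", "d3", "d4"}  # system output
relevant = {"d2", "d4", "d7"}         # ground-truth relevance judgments
print(precision_recall(retrieved, relevant))  # -> (0.5, 0.6666...)
```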
These metrics capture the core tension in retrieval systems: precision emphasizes the relevance of results, while recall prioritizes comprehensiveness.[13] Query expansion typically enhances recall by incorporating additional terms, such as synonyms or related concepts, which broaden the query's scope and increase the likelihood of matching relevant documents that might otherwise be missed due to vocabulary mismatches.[8] However, this expansion risks reducing precision through query drift, where irrelevant terms introduce noise, leading to the retrieval of off-topic documents that dilute the relevance of the top results.[8] For instance, global expansion methods like thesaurus-based term addition can significantly lower precision if ambiguous expansions are included without contextual constraints.[8] To mitigate these trade-offs, strategies such as term weighting via tf-idf are employed, where expanded terms are assigned scores based on their term frequency (tf) in the query or feedback documents and inverse document frequency (idf) across the corpus, calculated as
\text{tf-idf}(t, d) = \text{tf}(t, d) \times \log\left(\frac{N}{\text{df}(t)}\right),
with N as the total number of documents and \text{df}(t) as the document frequency of term t.[13] This prioritizes discriminative terms, helping to preserve precision while boosting recall. Additionally, ranking adjustments, such as re-weighting original query terms higher than expanded ones or applying relevance feedback to refine expansions, further balance the metrics by downplaying noisy additions. Empirical evidence from TREC evaluations demonstrates these dynamics: in the TREC-3 ad-hoc task, massive query expansion yielded approximately 20% improvements in recall-precision averages over baselines, i.e., higher precision across the standard recall levels, though unmitigated expansion without feedback could degrade performance on some queries due to irrelevant term inclusion.[11] Feedback-optimized expansions, such as those produced by Rocchio's method, often achieve a better balance through selective term integration.[8]
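The Rocchio update itself is compact enough to state in code: it moves the query vector toward the centroid of user-marked relevant documents and away from the centroid of non-relevant ones. The sketch below is a minimal version over tf-idf vectors; the weights alpha=1.0, beta=0.75, gamma=0.15 are commonly cited defaults rather than fixed constants, and the four-term vocabulary is purely illustrative.

```python
import numpy as np

def rocchio(query_vec, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """One Rocchio update: move the query toward the relevant centroid,
    away from the non-relevant centroid.

    query_vec: tf-idf vector of the original query
    relevant_docs / nonrelevant_docs: tf-idf document vectors, one row each
    alpha, beta, gamma: commonly cited default weights; tune per collection
    """
    q_new = alpha * query_vec
    if len(relevant_docs):
        q_new = q_new + beta * np.mean(relevant_docs, axis=0)
    if len(nonrelevant_docs):
        q_new = q_new - gamma * np.mean(nonrelevant_docs, axis=0)
    # Negative term weights are usually clipped to zero rather than kept.
    return np.maximum(q_new, 0.0)

# Toy 4-term vocabulary: [car, electric, battery, jaguar]
q  = np.array([1.0, 1.0, 0.0, 0.0])    # original query: "electric car"
dr = np.array([[0.8, 0.9, 0.7, 0.0]])  # user-marked relevant document
dn = np.array([[0.2, 0.0, 0.0, 0.9]])  # user-marked non-relevant document
print(rocchio(q, dr, dn))  # "battery" gains weight; "jaguar" is clipped to zero
```

The clipping step reflects the common practice of discarding negative term weights after the update, so the expanded query stays usable by standard ranking functions.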