
Content-based image retrieval

Content-based image retrieval (CBIR) is a paradigm that enables the automatic indexing and retrieval of images from large databases by analyzing and matching their inherent visual features—such as color distributions, patterns, shape contours, and spatial arrangements—directly from pixel data, independent of accompanying textual descriptions or tags. This approach contrasts with traditional text-based retrieval, which depends on manual annotations prone to subjectivity and incompleteness. CBIR systems generally operate through three core stages: feature extraction to derive low-level descriptors from query and database images, indexing to organize features for efficient storage and querying, and similarity computation using distance metrics like Euclidean or Manhattan to rank and return the most relevant matches. Pioneered in the early 1990s amid the explosion of digital imagery, seminal prototypes such as IBM's Query By Image Content (QBIC) and MIT's Photobook introduced practical implementations focused on color and shape primitives, laying foundational benchmarks for multimedia search applications in domains like medical imaging and e-commerce. Key advancements have integrated machine learning, particularly convolutional neural networks (CNNs), to learn hierarchical representations that partially mitigate the semantic gap—the disconnect between computable visual primitives and human-interpretable concepts—yielding superior retrieval precision on benchmarks like CIFAR-10 and ImageNet subsets. Nonetheless, persistent challenges encompass scalability to petabyte-scale repositories, robustness against viewpoint variations and occlusions, and the computational overhead of deep feature embeddings, underscoring ongoing research toward hybrid systems blending content analysis with semantic embeddings.

Fundamentals

Definition and Core Principles

![Principe_cbir.png][float-right] Content-based image retrieval (CBIR) is a computational method for searching and retrieving images from large databases by analyzing the visual content of the images, including attributes such as color, texture, shape, and spatial relationships, rather than depending on manually annotated textual metadata. This approach enables automated indexing and querying based on intrinsic image properties, addressing limitations of text-based systems where annotations may be incomplete or subjective. CBIR systems typically process a user-provided query image by extracting quantifiable descriptors from its pixels to match against pre-indexed features in the database. At its core, CBIR operates through a pipeline of feature extraction, representation, and similarity matching. Feature extraction involves deriving low-level descriptors—such as color histograms for dominant hues, filters like Gabor wavelets for pattern analysis, or edge detectors for shape outlines—from image regions or the entire image. These descriptors are encoded into multidimensional vectors that capture perceptual content, forming the basis for database indexing via structures like inverted files or metric trees to facilitate efficient searches. Similarity evaluation constitutes a foundational stage, employing metrics to quantify resemblance between query and database feature vectors; common measures include Euclidean distance for vector proximity, Manhattan distance for feature-wise summation, and cosine similarity for angular alignment in high-dimensional spaces. Retrieval rankings are generated by ordering images according to these scores, often refined through user relevance feedback to iteratively adjust feature weights or expand the query model, mitigating the semantic gap between machine-extracted primitives and human semantic understanding. This feedback loop underscores CBIR's adaptive nature, prioritizing empirical relevance over static annotations.
CBIR differs fundamentally from metadata-based and text-based search methods, which rely on human-generated annotations such as keywords, tags, captions, or embedded data like EXIF fields (e.g., timestamps, geolocation). Text-based search indexes images via textual descriptions, enabling keyword queries but requiring extensive manual labeling that is labor-intensive, subjective, and prone to inconsistencies across annotators. Metadata-based approaches supplement this with automated or semi-automated tags from file properties, yet they still depend on prior human input or device-generated data, limiting applicability to unannotated or legacy image collections. In contrast, CBIR extracts and compares low-level visual features—such as color histograms, texture patterns, and edge shapes—directly from pixel data, bypassing the need for textual descriptions or metadata annotations. This enables retrieval through query-by-example, where a sample image serves as input, and similarity is computed via distance metrics such as Euclidean or Manhattan distance on feature vectors. As a result, CBIR scales better to massive, untagged databases, such as those in web-scale archives, where manual annotation would be infeasible due to the sheer volume of digital imagery—estimated at over 3.2 billion photos uploaded daily as of 2023. CBIR addresses key limitations of text and metadata methods, including the "annotation bottleneck," where human tagging fails to capture nuanced visual similarities or exhaustively describe content, leading to recall gaps (e.g., text searches missing visually similar but differently labeled images).
It provides objectivity in feature extraction, reducing annotator bias, and supports applications like medical imaging diagnostics, where visual patterns (e.g., tumor shapes) outweigh descriptive text. However, text-based searches excel in semantic interpretability, allowing natural language queries (e.g., "red sports car"), whereas CBIR's reliance on visual queries can frustrate users without suitable examples. Despite these strengths, CBIR suffers from the semantic gap: low-level features often fail to align with high-level human concepts, yielding irrelevant results for abstract queries (e.g., "crowded urban scene" based on color alone). Computational demands are higher, with feature extraction and indexing requiring significant processing power compared to text matching, though optimizations like indexing with KD-trees mitigate this. Text and metadata methods, while annotation-dependent, integrate easily with relational databases for filtering, as demonstrated in systems re-ranking text results via CBIR to boost precision by up to 20-30% in benchmarks like Corel datasets.
| Aspect | Text/Metadata-Based Search | Content-Based Image Retrieval (CBIR) |
|---|---|---|
| Dependency | Requires manual or semi-automated annotations | Relies solely on image pixels, no annotations needed |
| Query Type | Keywords, natural language | Query-by-example image |
| Scalability | Limited by annotation effort for large datasets | Handles unannotated corpora efficiently |
| Semantic Handling | Strong if annotated well, but subjective | Weak due to low-level features (semantic gap) |
| Precision/Recall | High recall for exact matches, low for visuals | Better visual similarity, but potential irrelevance |
| Computational Cost | Low (string matching) | High (feature extraction, similarity computation) |
Hybrid systems increasingly fuse these paradigms, using text for initial coarse filtering followed by CBIR refinement, as in web search engines that achieve improved F1-scores over purely text-based or purely content-based methods. This reflects the underlying trade-offs: while text and metadata enable quick, intent-driven searches, CBIR's direct content analysis better preserves visual fidelity in annotation-scarce domains.
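The stages described in this section can be made concrete with a minimal sketch in Python. It is illustrative only, not any particular system's implementation: global HSV color histograms serve as the feature, a NumPy matrix as the index, and Euclidean distance as the ranking measure; the file paths and bin counts are hypothetical.

```python
# Minimal query-by-example sketch: color-histogram features + Euclidean ranking.
import cv2
import numpy as np

def extract_color_histogram(path, bins=(8, 8, 8)):
    """Global HSV color histogram, normalized into a feature vector."""
    image = cv2.imread(path)
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins, [0, 180, 0, 256, 0, 256])
    hist = hist.flatten()
    return hist / (hist.sum() + 1e-9)

def build_index(paths):
    """Offline stage: extract and stack features for every database image."""
    return np.vstack([extract_color_histogram(p) for p in paths])

def query(index, paths, query_path, top_k=5):
    """Online stage: rank database images by Euclidean distance to the query."""
    q = extract_color_histogram(query_path)
    dists = np.linalg.norm(index - q, axis=1)
    order = np.argsort(dists)[:top_k]
    return [(paths[i], float(dists[i])) for i in order]

# Example usage with hypothetical file names:
# db_paths = ["db/img_001.jpg", "db/img_002.jpg", "db/img_003.jpg"]
# features = build_index(db_paths)
# print(query(features, db_paths, "query.jpg", top_k=3))
```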

Historical Development

Early Foundations and Initial Systems (1980s-1990s)

The foundations of content-based image retrieval (CBIR) in the 1980s emerged from computer vision research emphasizing low-level feature extraction for image querying, predating dedicated CBIR terminology. Early efforts included the Query by Pictorial Example (QPE) system by Chang and Fu in 1981, which pioneered user queries via sample images to match pictorial content in databases, laying groundwork for visual similarity search without relying on textual metadata. Systems like GRIM_DBMS, developed by Rabitti and Stanchev in 1989, extended this by incorporating SQL-like queries for graphical elements in domain-specific images such as line drawings or floor plans, using primitive shape and structure analysis. These approaches focused on global descriptors—color via histograms, texture through statistical measures like contrast and coarseness (building on Tamura et al.'s 1978 features), and basic shape invariants—processed with Euclidean or histogram intersection distances for matching. The 1990s marked the formalization of CBIR with integrated prototype systems addressing scalable retrieval from growing digital image collections. Hirata and Kato's 1992 Query by Visual Example introduced the explicit term "content-based retrieval," demonstrating sketch-based queries matching color and shape in full-color databases via feature vectors. A pivotal advancement was Swain and Ballard's 1991 color indexing method, which used histogram intersection on quantized RGB color distributions to achieve efficient object-level matching, reducing computational demands for large-scale searches. These techniques highlighted causal challenges in feature representation, where perceptual invariance (e.g., to rotation or scaling) required robust invariants like moments, yet high-dimensional feature spaces often led to the curse of dimensionality in similarity computations. Initial CBIR prototypes in the mid-1990s demonstrated practical viability. IBM's Query By Image Content (QBIC) system, prototyped from 1993 and publicly detailed in 1995 by Flickner et al., was the first commercial effort, enabling queries via color palettes (histograms and moments), texture (Tamura-derived filters for coarseness, contrast, directionality), and shape (keypoint moments and curves), indexed with Karhunen-Loève transform and R*-trees for sublinear search times on databases of thousands of images. Concurrently, the Photobook system from MIT's Media Lab, developed by Pentland, Picard, and Sclaroff around 1994–1996, supported interactive browsing through eigen-deformable templates for shape, wavelet coefficients for texture, and principal components for facial features, emphasizing user-driven refinement over rigid metrics. These systems prioritized query-by-example interfaces but revealed empirical limitations: low-level features captured visual similarity empirically (e.g., 70–80% precision in controlled tests) yet failed on semantic intent due to the "semantic gap," where causal links between pixels and concepts like "landscape" were absent without higher abstraction. An NSF workshop in 1992 underscored this, advocating multidisciplinary progress in feature engineering and indexing to handle exponential image growth from digital cameras.
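A short sketch of Swain and Ballard-style color indexing illustrates the idea; the quantization to 8 bins per RGB channel is an illustrative choice rather than their exact configuration.

```python
# Sketch of color indexing via histogram intersection on quantized RGB colors.
import numpy as np

def quantized_rgb_histogram(image, bins=8):
    """image: H x W x 3 uint8 RGB array; returns a normalized joint histogram."""
    q = (image.astype(np.int32) * bins) // 256          # per-channel bin index
    codes = q[..., 0] * bins * bins + q[..., 1] * bins + q[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()

def histogram_intersection(h_query, h_db):
    """Similarity in [0, 1]; higher means the color distributions overlap more."""
    return np.minimum(h_query, h_db).sum()
```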

Key Milestones in Classical CBIR (1990s-2000s)

The classical era of content-based image retrieval (CBIR) in the 1990s and 2000s focused on hand-crafted low-level features like color distributions, texture patterns, and geometric shapes, extracted via techniques such as histograms, wavelet transforms, and edge detection, to enable similarity-based querying without relying on textual annotations. Early systems grappled with the semantic gap—the disconnect between pixel-level descriptors and human-interpretable concepts—often achieving retrieval accuracies below 50% on diverse datasets due to limitations in feature invariance to scale, rotation, and occlusion. A foundational milestone was the Photobook system, developed at MIT's Media Laboratory and presented in 1994, which pioneered content-based manipulation using principal components for shape matching, wavelets for texture, and moment invariants for 2D/3D queries, supporting interactive browsing of databases with thousands of images. Building on this, IBM's Query By Image Content (QBIC) system, detailed in a 1995 publication, introduced primitive queries for color similarity via averaged RGB histograms, texture via Tamura features, and shape via global invariants, while employing R-trees for efficient multidimensional indexing and achieving sub-second query times on collections of up to 10,000 images; it marked the first commercial deployment as Ultimedia Manager in 1994. In 1996, Virage's Image Search Engine provided a modular commercial framework for integrating custom primitives like color correlograms and texture co-occurrence matrices, emphasizing scalability for web-scale indexing and extensible APIs that influenced subsequent enterprise solutions. By 1999, UC Berkeley's Blobworld advanced region-based paradigms through expectation-maximization clustering on joint color-texture distributions in YUV space, segmenting images into object-like "blobs" for partial matching, which improved retrieval precision by 20-30% over global features on Corel datasets by better capturing spatial coherence. The 2000s saw refinements like relevance feedback integration in QBIC extensions, where user iterations adjusted feature weights via quadratic optimization, boosting mean average precision by up to 15% in iterative queries, and the influence of MPEG-7 standards (finalized 2001) for descriptor interoperability, though adoption remained limited due to computational overheads exceeding 1 GB per 1,000-image index. These milestones underscored persistent challenges in feature dimensionality (often 100+ per image) and robustness, setting the stage for hybrid approaches before deep learning dominance.

Transition to Learning-Based Methods (2010s Onward)

The limitations of classical CBIR systems, which relied on hand-crafted low-level features such as color histograms and texture descriptors, became increasingly evident in the early 2010s due to their inability to bridge the semantic gap—the discrepancy between pixel-level representations and human-interpretable concepts. This gap resulted in poor retrieval performance for semantically similar but visually diverse images, prompting a paradigm shift toward data-driven feature learning. Deep learning methods, particularly convolutional neural networks (CNNs), emerged as a solution by automatically extracting hierarchical features that capture both local patterns and global semantics from vast labeled datasets. A pivotal milestone occurred in 2012 with the introduction of AlexNet, a deep CNN that achieved a top-5 error rate of 15.3% on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), dramatically outperforming prior shallow models and traditional hand-engineered features. AlexNet's success, enabled by innovations like ReLU activations, dropout regularization, and GPU-accelerated training on 1.2 million images, demonstrated CNNs' capacity for learning robust, generalizable representations transferable beyond classification. In CBIR, this led to the adoption of off-the-shelf CNN features—activations from fully connected or convolutional layers—as compact descriptors for similarity matching, replacing bag-of-visual-words models based on sparse local features like SIFT. By 2015, comprehensive studies validated CNN-based approaches, showing improvements in mean average precision (mAP) on benchmarks like Holidays and UKBench, where deep features yielded up to 20-30% gains over classical methods due to their invariance to affine transformations and better semantic alignment. Early adaptations included fine-tuning pre-trained models on retrieval-specific tasks and aggregating convolutional activations into global descriptors, such as vector of locally aggregated descriptors (VLAD) fused with CNN outputs, to handle large-scale databases efficiently. This transition marked CBIR's evolution from rule-based engineering to end-to-end learning, setting the stage for subsequent architectures like VGG and ResNet that further enhanced retrieval accuracy through deeper hierarchies and residual connections.

Feature Extraction Techniques

Low-Level Visual Descriptors: Color, Texture, and Shape

Low-level visual descriptors in content-based image retrieval (CBIR) extract primitive image properties directly from pixel intensities, providing foundational representations for similarity matching without relying on semantic interpretation. These features—color, texture, and shape—enable efficient indexing and querying by quantifying global or local patterns, though they often suffer from sensitivity to variations in lighting, viewpoint, or occlusion. Early CBIR systems, such as IBM's Query By Image Content (QBIC) developed in the 1990s, integrated these descriptors to support database searches based on visual content rather than textual annotations. Color descriptors capture the distribution of hues across an image, typically via histograms that bin pixel values in color spaces like RGB or HSV. A standard color histogram represents the frequency of each color bin, offering rotation- and scale-invariance but ignoring spatial relationships, which can lead to mismatches between perceptually similar images with differing layouts. To mitigate this, color correlograms extend histograms by modeling the probability of spatially proximate pixels sharing colors, improving retrieval precision in textured scenes. Color moments, computing mean, variance, and skewness per channel, provide compact statistical summaries robust to quantization noise, as demonstrated in systems achieving up to 20-30% precision gains over raw histograms in benchmark datasets like COREL. Texture descriptors analyze spatial repetitions and patterns, often using filter banks to decompose images into frequency and orientation components. Gabor wavelet filters, modeling texture as modulated Gaussian waves, extract multi-scale features by convolving images with banks tuned to 4-5 scales and 8 orientations, capturing energy at dominant frequencies for rotation-invariant matching. Wavelet transforms, such as discrete wavelet decomposition into approximation and detail subbands, quantify coarseness and directionality, outperforming Fourier methods in retrieval tasks by preserving locality, with reported accuracy improvements of 10-15% on Brodatz texture databases. Tamura features formalize human-perceived attributes like contrast (standard deviation of intensities) and periodicity (autocorrelation peaks), enabling perceptual alignment in CBIR. Shape descriptors represent geometric structure through boundaries or filled regions, emphasizing invariance to affine transformations. Edge-based methods, such as Canny detection followed by histogram binning of orientations, encode contour directions for global shape similarity, though they falter on fragmented edges in noisy images. Region moments, including Hu invariants (seven algebraically independent combinations of second and third-order central moments), provide translation-, scale-, and rotation-invariance for blob-like objects, with Zernike moments offering orthogonality for compact, noise-resistant encoding up to order 20-30. Fourier descriptors parameterize closed contours via discrete cosine or Fourier coefficients, retaining low-frequency shapes for efficient matching, as validated in retrieval experiments yielding recall rates exceeding 70% on silhouette datasets. These descriptors are often fused with color and texture via weighted Euclidean distances to bridge low-level gaps, though empirical evaluations highlight their limitations in handling partial occlusions or viewpoint changes without higher-level context.
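The following sketch illustrates one way to compute representatives of the three descriptor families with OpenCV and NumPy; the filter-bank parameters, bin counts, and thresholding choices are example values, not a prescribed configuration.

```python
# Illustrative extraction of color, texture, and shape descriptors.
import cv2
import numpy as np

def color_moments(image_bgr):
    """Mean, standard deviation, and skewness per channel (9-dimensional)."""
    pixels = image_bgr.reshape(-1, 3).astype(np.float64)
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    skew = np.cbrt(((pixels - mean) ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])

def gabor_energies(gray, wavelengths=(4, 8, 16), n_orientations=8):
    """Mean response energy of a small Gabor filter bank (texture descriptor)."""
    feats = []
    for lambd in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = cv2.getGaborKernel((21, 21), 4.0, theta, lambd, 0.5)
            response = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel)
            feats.append(np.abs(response).mean())
    return np.array(feats)

def hu_shape_moments(gray):
    """Seven Hu moment invariants of the thresholded silhouette (log-scaled)."""
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    hu = cv2.HuMoments(cv2.moments(mask)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)
```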

Emergence of High-Level and Learned Representations

The pursuit of high-level representations in content-based image retrieval (CBIR) arose from the limitations of low-level descriptors, which often failed to capture semantic meaning and resulted in the "semantic gap" between visual content and human interpretation. Early efforts in the mid-2000s focused on mid- to high-level features, such as object detection, scene categorization, and concept-based indexing, typically derived through supervised classifiers or probabilistic models applied to segmented regions or bags-of-features. These approaches, exemplified in surveys up to 2007, aimed to incorporate textual annotations or rule-based semantics but were constrained by manual feature engineering and scalability issues in large databases. The true emergence of learned representations accelerated with the advent of deep learning in the early 2010s, particularly following the 2012 ImageNet Large Scale Visual Recognition Challenge, where AlexNet—a deep convolutional neural network (CNN) with eight layers—achieved a top-5 error rate of 15.3% on over 1.2 million images, demonstrating the efficacy of hierarchical feature learning from raw pixels. In CBIR, this translated to using CNN activations, especially from deeper layers, as compact, high-level embeddings that encode semantic concepts like objects and contexts more effectively than hand-crafted features; for instance, off-the-shelf features from models trained on ImageNet outperformed traditional methods like SIFT on benchmarks such as Holidays and UKBench by margins of 10-20% in mean average precision. A 2014 framework formalized this by integrating deep learning directly into CBIR pipelines, training CNNs end-to-end for retrieval tasks and showing improved generalization across domains via transfer learning. Subsequent advancements refined learned representations through techniques like fine-tuning on retrieval-specific losses (e.g., triplet or contrastive losses) to optimize embedding spaces for similarity metrics, enabling instance-level retrieval with sub-linear query times on million-scale datasets. By the mid-2010s, pre-trained CNNs such as VGG and ResNet became standard for extracting 4096- to 2048-dimensional vectors, bridging low-level pixel patterns to high-level semantics and reducing the semantic gap, as evidenced by state-of-the-art results on Oxford Buildings (mAP > 0.8) compared to pre-deep learning baselines below 0.5. This shift marked a paradigm from static descriptors to adaptive, data-driven hierarchies, though challenges persisted in domain adaptation and computational demands.
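The retrieval-specific fine-tuning mentioned above can be sketched with a triplet objective in PyTorch; the backbone choice, embedding size, margin, and learning rate are illustrative assumptions, and dataset loading and triplet mining are omitted.

```python
# Sketch of fine-tuning a pretrained backbone with a triplet loss so that
# embeddings of matching images are pulled together and non-matches pushed apart.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, 256)   # 256-d retrieval embedding

criterion = nn.TripletMarginLoss(margin=0.2)
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-5)

def training_step(anchor, positive, negative):
    """anchor/positive/negative: batches of preprocessed image tensors."""
    emb_a = nn.functional.normalize(backbone(anchor), dim=1)
    emb_p = nn.functional.normalize(backbone(positive), dim=1)
    emb_n = nn.functional.normalize(backbone(negative), dim=1)
    loss = criterion(emb_a, emb_p, emb_n)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```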

Retrieval and Querying Methods

Similarity Measures and Distance Metrics

In content-based image retrieval (CBIR), similarity measures quantify the resemblance between a query image's feature vector and those of database images, typically by computing distances in a high-dimensional feature space where lower values indicate greater similarity. These metrics operate on extracted descriptors such as color histograms, texture patterns, or shape contours, enabling ranking of results based on proximity in the embedding space. The selection of a metric influences retrieval accuracy, with Euclidean and Manhattan distances serving as foundational L_p norms, while cosine similarity addresses directional alignment in normalized vectors. Empirical evaluations on benchmark datasets, such as the Wang image database, demonstrate that no single metric universally outperforms others, as efficacy varies with feature dimensionality, distribution, and noise levels. The Euclidean distance, formulated as \sqrt{\sum (x_i - y_i)^2} for feature vectors x and y, assumes isotropic feature scaling and penalizes larger deviations quadratically, making it sensitive to outliers but effective for compact, low-dimensional representations like SIFT keypoints or global color moments. In CBIR applications, it has been a default choice in systems processing continuous-valued features, with studies confirming its utility in achieving high precision for texture-based queries on datasets comprising 1,000 images across 10 categories. However, in high-dimensional spaces exceeding 100 dimensions—common in bag-of-visual-words models—its performance degrades due to the curse of dimensionality, where distances concentrate around the mean. Manhattan distance, or L1 norm, defined as \sum |x_i - y_i|, aggregates absolute differences linearly, offering computational efficiency (O(d) time for d dimensions versus Euclidean's square root) and greater robustness to sparse noise or outliers, as it does not amplify distant points. Comparative analyses in CBIR reveal it often matches or exceeds Euclidean performance on histogram features, particularly in scenarios with non-Gaussian distributions, such as edge orientation vectors, yielding up to 5-10% improvements in recall on standard benchmarks when features include quantized color channels. Its preference in high-dimensional settings stems from empirical observations that L1 norms preserve locality better than L2 in sparse data, as validated in retrieval tasks involving medical or natural images. Cosine similarity, computed as \frac{x \cdot y}{\|x\| \|y\|}, measures the angular separation between vectors, invariant to magnitude and thus ideal for features where relative orientations (e.g., in TF-IDF-like visual bags or directional gradients) convey semantic content over absolute scales. In CBIR, it excels with high-dimensional, normalized descriptors like those from convolutional layers, where vector lengths vary due to illumination or scaling artifacts; evaluations on mass spectrometry image libraries reported average radiologist-rated similarities of 5.18 out of 6 for cosine-based ranking, compared with 5.32 for Euclidean distance, in subjective relevance for certain query types. Its limitation lies in ignoring magnitude differences, which can conflate dissimilar images with parallel but scaled features, prompting hybrid use with L_p metrics in practice.
Advanced variants include the Minkowski distance, generalizing L_p norms as \left( \sum |x_i - y_i|^p \right)^{1/p}, tunable for p=1 (Manhattan) or p=2 (Euclidean), and specialized histogram metrics like Canberra (\sum \frac{|x_i - y_i|}{x_i + y_i}) or chi-squared, which weight relative differences for probability distributions. Retrieval studies on diverse corpora, including 10,000-image collections, indicate Canberra's sensitivity to small bins enhances precision for skewed histograms (e.g., color distributions), though at higher computational cost. Overall, metric selection hinges on feature properties: L1/L2 for metric spaces with uniform scaling, cosine for angular relevance, with cross-validation on domain-specific data recommended to mitigate biases in generic assumptions.
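For reference, the metrics discussed in this section can be written directly in NumPy; these are plain textbook definitions operating on two equal-length feature vectors (for non-negative histogram features the Canberra form reduces to the expression given above).

```python
# Straightforward NumPy implementations of common CBIR distance/similarity measures.
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def minkowski(x, y, p=3):
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def cosine_similarity(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)

def canberra(x, y):
    denom = np.abs(x) + np.abs(y)
    diff = np.abs(x - y)
    return np.sum(np.where(denom > 0, diff / denom, 0.0))

def chi_squared(x, y):
    denom = x + y
    return 0.5 * np.sum(np.where(denom > 0, (x - y) ** 2 / denom, 0.0))
```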

Query by Example and Direct Matching

Query by example (QBE) in content-based image retrieval (CBIR) systems involves a user submitting a sample image as the query input, from which visual features such as color histograms, texture patterns, or shape descriptors are automatically extracted and used to identify and rank database images exhibiting similar content characteristics. This approach contrasts with text-based or metadata-driven search by relying solely on intrinsic image properties rather than external annotations, enabling retrieval based on perceptual similarity. Early implementations, dating to systems like IBM's Query By Image Content (QBIC) in the 1990s, demonstrated QBE's efficacy for tasks such as retrieving images of specific objects or scenes by matching low-level features directly against pre-indexed database representations. Direct matching constitutes the core retrieval mechanism within QBE frameworks, wherein feature vectors derived from the query image are compared pairwise to those of database images using predefined distance metrics to quantify similarity. Common metrics include Euclidean distance for measuring vector magnitude differences, Manhattan distance for absolute feature deviations, and Minkowski distances as generalizations thereof, often applied after dimensionality reduction techniques like principal component analysis to mitigate computational overhead. For instance, in color-based direct matching, L1 or L2 norms on histogram intersections yield retrieval precision rates exceeding 70% on benchmark datasets like Corel-1000 when combined with texture edges, though performance degrades with viewpoint variations or occlusions absent robust invariants. This method assumes that closer feature proximity correlates with semantic relevance, a causal linkage grounded in the Euclidean geometry of feature spaces but empirically limited by the curse of dimensionality in high-dimensional descriptors. In practice, direct matching in QBE pipelines processes the query features against an inverted index or exhaustive scan of the database, returning top-k results ranked by ascending distance scores, with thresholds tunable to balance recall and precision. Systems employing direct matching, such as those using statistical moment invariants for rotation-invariant retrieval, achieve sub-second query times on datasets of 10,000 images via vector quantization, outperforming naive exhaustive searches by orders of magnitude. However, its rigidity—lacking adaptation to subjective user perceptions—necessitates complementary strategies like multi-query fusion, where aggregated distances from multiple example images refine rankings, as evidenced by improved mean average precision (mAP) scores of up to 15% in hybrid evaluations. Empirical validations on standard corpora, including Wang's 1,000-image set, confirm direct matching's foundational role, with retrieval accuracies hovering at 50-60% for primitive features alone, underscoring the need for feature orthogonality to capture diverse visual cues.
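A minimal sketch of direct matching over a precomputed feature matrix, including the multi-query fusion by distance averaging mentioned above; the exhaustive scan shown here is the naive baseline that indexing structures and vector quantization are meant to accelerate.

```python
# Direct matching: exhaustive top-k ranking against a precomputed feature matrix.
import numpy as np

def top_k(db_features, query_vec, k=10):
    """Return indices and distances of the k nearest database images (L2)."""
    dists = np.linalg.norm(db_features - query_vec, axis=1)
    idx = np.argpartition(dists, k)[:k]          # unordered k smallest
    idx = idx[np.argsort(dists[idx])]            # order them by distance
    return idx, dists[idx]

def top_k_multi_query(db_features, query_vecs, k=10):
    """Fuse several example images by averaging their distances to each item."""
    all_d = np.stack([np.linalg.norm(db_features - q, axis=1) for q in query_vecs])
    fused = all_d.mean(axis=0)
    idx = np.argsort(fused)[:k]
    return idx, fused[idx]
```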

Relevance Feedback and User Interaction

Relevance feedback in content-based image retrieval (CBIR) constitutes an interactive mechanism whereby users iteratively refine retrieval results by indicating the relevance of presented images to their query, thereby incorporating human semantic judgments to mitigate discrepancies between low-level visual features and high-level conceptual intent. This process originated from adaptations of text information retrieval techniques, such as the Rocchio algorithm developed in 1971, which adjusts query vectors by shifting them toward positively labeled examples and away from negatively labeled ones in a feature space. In CBIR contexts, this translates to modifying query representations—often composed of color, texture, or shape descriptors—based on user-marked relevant and irrelevant images, with empirical studies demonstrating precision improvements of up to 20-30% after 2-3 feedback iterations in controlled datasets like COREL. Early implementations, such as IBM's Query By Image Content (QBIC) system introduced in 1995, integrated relevance feedback to weight feature contributions dynamically, enabling users to emphasize aspects like color histograms or edge orientations during retrieval refinement. Classical relevance feedback methods primarily employ query point movement or feature reweighting, where aggregated feedback vectors update similarity metrics, such as Euclidean or Mahalanobis distances, to expand or contract decision boundaries in the feature space. For instance, binary feedback (relevant/irrelevant labels) can be processed via probabilistic models that estimate relevance posteriors, while multi-grade schemes allow nuanced ratings to fine-tune weights more granularly. These approaches assume short-term adaptation within a single session, relying on limited user inputs—typically 5-20 images per round—to avoid fatigue, though long-term feedback aggregates data across sessions or users to build profile-based models, enhancing generalization as evidenced by reduced mean average precision variance in benchmarks like Wang's 1,000-image dataset. User interfaces facilitate this through visual tools, such as draggable sliders for feature importance or bounding boxes for region-specific feedback, promoting causal alignment between perceptual relevance and system outputs. Advanced techniques leverage machine learning to transcend simple vector adjustments, employing classifiers like support vector machines or Gaussian mixture models trained on feedback samples to learn nonlinear mappings from visual features to semantic relevance. In contemporary systems, convolutional neural networks pretrained on large corpora (e.g., ImageNet) fine-tune embeddings via feedback loops, achieving superior retrieval accuracy—reportedly 15-25% higher F1-scores—by distilling user-specific semantics into latent representations. However, efficacy depends on feedback volume and quality; sparse inputs risk overfitting, while biased user judgments can propagate errors, underscoring the need for robust initialization from diverse initial retrievals. Despite these, relevance feedback remains pivotal for interactive CBIR, empirically bridging the semantic gap in domains like medical imaging, where iterative refinement yields diagnostically pertinent results with fewer queries.
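The classical query point movement scheme can be illustrated with a Rocchio-style update on feature vectors; the alpha, beta, and gamma weights below are conventional illustrative values rather than settings from any cited system.

```python
# Rocchio-style query point movement adapted to image feature vectors: the
# refined query moves toward the centroid of user-marked relevant images and
# away from the centroid of irrelevant ones.
import numpy as np

def rocchio_update(query_vec, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """relevant/irrelevant: arrays of feature vectors labeled by the user."""
    new_query = alpha * query_vec
    if len(relevant) > 0:
        new_query += beta * np.mean(relevant, axis=0)
    if len(irrelevant) > 0:
        new_query -= gamma * np.mean(irrelevant, axis=0)
    return new_query

# One feedback iteration: re-rank the database with the updated query, show the
# new top results to the user, and repeat until the results are satisfactory.
```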

Modern Advances in CBIR

Deep Learning Foundations: CNNs for Feature Extraction

Convolutional neural networks (CNNs) form the cornerstone of deep learning applications in content-based image retrieval (CBIR), enabling automated extraction of hierarchical visual features that surpass traditional hand-crafted descriptors in capturing semantic content. Unlike earlier methods reliant on manually engineered features such as SIFT or color histograms, CNNs learn representations directly from data through backpropagation, optimizing convolutional filters to detect patterns ranging from edges and textures in initial layers to complex objects in deeper ones. This learning process involves applying learnable kernels to input images, producing feature maps that encode local invariances, followed by non-linear activations and pooling operations to reduce dimensionality while preserving essential spatial hierarchies. The breakthrough in CNN-based feature extraction for CBIR traces to the 2012 introduction of AlexNet, which achieved a top-5 error rate of 15.3% on the ImageNet dataset, demonstrating the efficacy of deep architectures trained on large-scale labeled data. Pre-trained on datasets like ImageNet, these networks provide "off-the-shelf" features transferable to CBIR tasks without domain-specific retraining, where images are passed through the network to obtain activations from intermediate layers—typically the last convolutional or fully connected layers—as compact descriptors. For instance, global average pooling over convolutional activations yields fixed-length vectors suitable for similarity computation, outperforming shallow features in retrieval accuracy on benchmarks like Holidays and UKBench. In practice, CNN feature extraction in CBIR involves fine-tuning or direct use of models like VGG or ResNet, where deeper layers capture high-level semantics essential for bridging the semantic gap between low-level pixels and user intent. Studies from 2015 onward showed that CNN descriptors, when aggregated via methods like VLAD or Fisher vectors, improved mean average precision (mAP) by 20-30% over bag-of-words approaches on instance retrieval datasets. This shift enabled scalable retrieval by embedding images into Euclidean spaces, where cosine similarity or Euclidean distance metrics facilitate efficient nearest-neighbor searches, though computational demands necessitate dimensionality reduction techniques like PCA. Empirical evaluations confirm that even loosely supervised CNNs maintain robust performance, attributing success to the networks' ability to generalize across visual domains.
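A common off-the-shelf setup can be sketched with torchvision: a pretrained ResNet-50 with its classifier removed yields a 2048-dimensional globally pooled descriptor per image, L2-normalized for cosine-similarity search. The model choice and file names are illustrative assumptions.

```python
# Off-the-shelf CNN descriptors with a pretrained ResNet-50.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.fc = torch.nn.Identity()          # keep the globally pooled conv features
model.eval()
preprocess = weights.transforms()       # resize, crop, normalize as in training

@torch.no_grad()
def cnn_descriptor(path):
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    feat = model(image).squeeze(0)                        # 2048-d vector
    return torch.nn.functional.normalize(feat, dim=0)     # unit length for cosine search

# Example: cosine similarity between a query and a database image
# sim = cnn_descriptor("query.jpg") @ cnn_descriptor("db/img_001.jpg")
```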

Transformer and Attention-Based Innovations

The integration of transformer architectures into content-based image retrieval (CBIR) leverages self-attention mechanisms to model global interdependencies across image patches, enabling more robust feature representations than the locality-constrained filters of convolutional neural networks (CNNs). Vision transformers (ViTs), which treat images as sequences of fixed-size patches tokenized into embeddings, apply multi-head self-attention to weigh relationships between distant visual elements, facilitating holistic similarity computations critical for retrieval tasks such as landmark or product matching. This shift, building on foundational transformer successes in natural language processing, addressed CNN shortcomings in capturing non-local spatial correlations, with early adaptations showing improved generalization on diverse datasets. A pivotal advancement occurred in 2021 with the work of El-Nouby et al., who trained ViTs on large-scale datasets like ImageNet-21k using a metric learning objective that combined contrastive loss with differential entropy regularization to produce compact, discriminative descriptors. Feature extraction involved processing images through the ViT encoder, deriving global representations via the classification token or averaged patch embeddings, followed by evaluation on benchmarks including Stanford Online Products (SOP), In-Shop, CUB-200, Revisited Oxford (ROxford), and Revisited Paris (RParis). Their approach outperformed prior state-of-the-art methods on SOP, In-Shop, and CUB-200 in mean average precision (mAP), while achieving competitive results on ROxford and RParis, particularly for short-vector embeddings and low-resolution queries, demonstrating ViTs' efficacy in instance-level retrieval without heavy reliance on localization augmentations. Subsequent innovations focused on enhancing ViTs' locality awareness, a known limitation due to uniform attention across patches. In 2022, Song et al. introduced a dual-branch architecture that fused global features from the ViT's classification token with local patch-level information, incorporating multi-layer skip connections to aggregate representations from deeper encoder stages and bolster spatial detail in later layers. This hybrid method surpassed CNN baselines and prior ViT variants on standard instance retrieval benchmarks, establishing transformers' superiority in global representation learning while mitigating computational overhead through efficient aggregation. Attention mechanisms further refined CBIR by enabling query-sensitive interactions, as in co-attention frameworks that dynamically align query and database image features via cross-attention, reducing extraneous computation in large-scale systems. Variants like the Swin Transformer (2021), with its hierarchical shifted-window self-attention, supported efficient fine-grained retrieval by computing locality within windows and global attention across stages, yielding applications in domain-specific CBIR such as endoscopic image matching with up to 90.5% accuracy. Multi-head self-attention in these models encodes spatial positions and long-distance dependencies, improving hashing-based retrieval by producing semantically richer binary codes for approximate nearest-neighbor search. These developments collectively elevated CBIR performance, with transformers achieving higher mAP on challenging datasets through causal modeling of visual hierarchies rather than inductive biases hardcoded in CNNs.
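As an illustrative sketch (not the exact pipeline of any paper cited above), a global ViT descriptor can be obtained from torchvision's ViT-B/16 by replacing the classification head with an identity, so the forward pass returns the 768-dimensional class-token embedding used here as a retrieval descriptor.

```python
# Global ViT descriptor from torchvision's ViT-B/16.
import torch
from torchvision import models
from PIL import Image

weights = models.ViT_B_16_Weights.DEFAULT
vit = models.vit_b_16(weights=weights)
vit.heads = torch.nn.Identity()          # expose the class-token representation
vit.eval()
preprocess = weights.transforms()

@torch.no_grad()
def vit_descriptor(path):
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return torch.nn.functional.normalize(vit(image).squeeze(0), dim=0)
```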

Hybrid Approaches and Foundation Model Integration

Hybrid approaches in content-based image retrieval (CBIR) integrate traditional hand-crafted features, such as color histograms, texture descriptors like local binary patterns, and shape-based representations, with deep learning-based extractions from convolutional neural networks (CNNs) to bridge the semantic gap and enhance retrieval accuracy. For instance, a 2023 system combines transfer learning from pre-trained CNNs with machine learning classifiers like support vector machines (SVMs) to fuse low-level and high-level features, achieving improved precision on datasets like Corel-1000 by reducing false positives through complementary feature strengths. This fusion often employs weighted similarity metrics, where Euclidean distances on traditional features are combined with cosine similarities on CNN embeddings, as demonstrated in a 2015 method that outperformed pure traditional or deep-only baselines on large-scale image databases by leveraging the invariance of deep features to affine transformations alongside the interpretability of hand-crafted ones. Further advancements in hybrid methods incorporate optimization techniques, such as genetic algorithms alongside neural networks, to dynamically select and weight feature subsets, yielding up to 15% higher recall rates in query-by-example scenarios compared to standalone approaches on benchmark datasets like Wang-1000. These systems address limitations of pure deep learning, like overfitting on small datasets, by incorporating unsupervised clustering of traditional features prior to deep embedding projection, as in a 2021 adaptive hybrid index that supports efficient indexing for million-scale retrievals with sub-linear query times. Empirical evaluations consistently show hybrids excelling in domains with sparse labeled data, where traditional features provide robust priors that stabilize deep learning training. The integration of foundation models—large-scale pre-trained vision transformers or multimodal encoders like CLIP—into CBIR pipelines has enabled versatile, off-the-shelf feature extraction that generalizes across domains without task-specific fine-tuning. In a 2024 study, vision foundation models such as DINOv2 and CLIP were applied as extractors for medical CBIR, outperforming domain-adapted CNNs with mean average precision (mAP) gains of 10-20% on radiology datasets like MIMIC-CXR by capturing semantically rich embeddings invariant to dataset shifts. These models, trained on billions of image-text pairs, facilitate zero-shot retrieval where query images are embedded in a shared space with reference visuals, reducing the need for exhaustive indexing. Hybridization with foundation models extends this by combining their global contextual representations with localized traditional or mid-level features; for example, concatenating CLIP embeddings with SIFT keypoints has been shown to boost retrieval robustness in cluttered scenes, achieving 25% higher top-k accuracy on custom benchmarks versus foundation-only use. In radiology applications, BiomedCLIP—a CLIP variant fine-tuned on biomedical data—integrated with hybrid indexing yielded a P@1 score of 0.594 in zero-shot settings, highlighting the advantage of multimodal priors that align visual content with clinical semantics without requiring proprietary annotations.
Such integrations mitigate foundation model vulnerabilities like hallucinated features in out-of-distribution queries by grounding with empirical low-level descriptors, as validated in 2025 evaluations across diverse corpora. This paradigm shift prioritizes scalable, generalizable retrieval, with ongoing research focusing on efficient distillation to deploy these hybrids in resource-constrained environments.
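A hedged sketch of the late-fusion idea: CLIP image embeddings obtained through the Hugging Face transformers API are combined with a hand-crafted color-histogram similarity through a weighted sum. The 0.8/0.2 fusion weights and the color_histogram helper are assumptions for illustration, not settings from the studies cited above.

```python
# Foundation-model retrieval with late fusion of CLIP embeddings and a
# hand-crafted color-histogram similarity.
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

@torch.no_grad()
def clip_embedding(path):
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    feat = model.get_image_features(**inputs).squeeze(0)
    return torch.nn.functional.normalize(feat, dim=0).numpy()

def fused_similarity(query_path, db_path, color_histogram, w_clip=0.8, w_color=0.2):
    """color_histogram: any function returning an L1-normalized histogram."""
    clip_sim = float(np.dot(clip_embedding(query_path), clip_embedding(db_path)))
    h_q, h_d = color_histogram(query_path), color_histogram(db_path)
    color_sim = float(np.minimum(h_q, h_d).sum())        # histogram intersection
    return w_clip * clip_sim + w_color * color_sim
```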

Challenges and Criticisms

The Semantic Gap and Representation Limitations

The semantic gap in content-based image retrieval (CBIR) denotes the discrepancy between the objective information derivable from an image's visual data—such as pixel-level patterns—and the subjective, high-level interpretation imposed by human observers, which incorporates context, intent, and abstract concepts. This gap originates from the inherent limitations of computational feature extraction, which predominantly relies on low-level descriptors like color histograms, texture gradients, and edge shapes, failing to encode semantic elements such as object relationships, emotional tone, or narrative events. For example, two images sharing similar low-level textures might evoke entirely different human interpretations—one as a serene landscape, the other as turbulent weather—highlighting how pixel analysis diverges from perceptual cognition. The persistence of this gap stems from representational constraints in CBIR systems, where features are derived from static 2D projections that omit three-dimensional structure, temporal dynamics, and external knowledge humans implicitly apply. Low-level representations, while efficient for tasks like exact instance matching, inadequately model scene-level semantics because they prioritize quantifiable primitives over holistic understanding; quantitative evaluations, such as those on datasets like MIRFLICKR-25K, demonstrate that advanced low-level methods yield retrieval accuracies below 20% for category-based queries requiring semantic inference. Human scene analysis, by contrast, integrates top-down priors and cultural context, rendering automated bridging computationally intractable without supplementary data like annotations or textual descriptions. Even representations learned via deep neural networks, such as convolutional neural network (CNN) embeddings trained on large-scale datasets like ImageNet, exhibit limitations in fully closing the gap, as they remain tethered to visual correlations rather than causal semantic structures or novel contexts unseen in training. These hierarchical features improve mid-level abstraction—e.g., detecting object parts—but falter on ambiguities like polysemous scenes (a "bank" as river edge versus financial institution) or subjective judgments, where retrieval precision drops significantly without explicit semantic grounding. Empirical studies confirm that while CNN-based methods boost mean average precision (mAP) by 10-20% over traditional descriptors on benchmark tasks, they underperform in zero-shot semantic retrieval, underscoring unresolved tensions between scalability and interpretive fidelity. Mitigation strategies, including hybrid models fusing visual features with linguistic embeddings, partially alleviate but do not eliminate these constraints, as user-specific interpretations introduce variability that fixed representations cannot accommodate.

Scalability, Efficiency, and Resource Constraints

Scalability in content-based image retrieval (CBIR) systems is hindered by the exponential growth of image databases, often reaching billions of entries, coupled with high-dimensional feature spaces that invoke the curse of dimensionality, rendering exact nearest-neighbor searches computationally infeasible as query times scale linearly or worse with database size and dimensionality. Traditional indexing methods, such as KD-trees or ball trees, fail in high dimensions (typically beyond 20–50) due to increased overlap in partitioning and sparse data distributions, leading to near-linear scan times despite logarithmic ideals. To address these, approximate nearest-neighbor (ANN) search techniques predominate, with hashing methods—such as locality-sensitive hashing (LSH) and supervised variants—projecting features into low-dimensional binary codes (e.g., 32–64 bits) for rapid Hamming distance queries, reducing storage from gigabytes to megabytes per million images and enabling sub-linear retrieval. Semi-supervised hashing frameworks, for instance, optimize codes on labeled subsets while regularizing unlabeled data, achieving superior precision-recall over unsupervised LSH on datasets up to 80 million images, with query times independent of original dimensionality. Deep hashing extends this by integrating convolutional neural network (CNN) features directly into the hashing process, preserving semantic fidelity while compressing representations for large-scale indexing. Efficiency further improves via specialized libraries like FAISS, which supports billion-scale vector search through inverted file (IVF) structures combined with product quantization (PQ), compressing 128-dimensional features to 64 bytes while maintaining near-exact recall, yielding query latencies under 10 milliseconds on GPU-accelerated hardware for 1 billion vectors. These methods trade minimal accuracy loss (e.g., 1–5% recall drop) for orders-of-magnitude speedups, as demonstrated in medical and geospatial CBIR applications handling terabyte-scale corpora. Resource constraints arise prominently in feature extraction and storage: deep models like ResNet-50 require substantial GPU compute (e.g., hours for millions of images) and memory for high-dimensional descriptors (e.g., 2048 floats per image, totaling petabytes uncompressed for billion-scale sets), necessitating offline pre-extraction, quantization, or distributed cloud infrastructure. In constrained environments, such as mobile or edge devices, lightweight CNNs or distilled models reduce extraction time by 5–10x, though at the cost of retrieval accuracy, while server-side systems leverage vector databases for scalable, sharded indexing to manage peak loads without proportional resource escalation. Overall, these optimizations enable real-time CBIR in production, but ongoing trade-offs persist between accuracy, latency, and hardware demands in ultra-large deployments.
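The IVF-PQ indexing described above can be sketched with the FAISS Python API; the dimensionality, list count, sub-quantizer count, and nprobe values are typical examples rather than tuned settings, and random vectors stand in for real descriptors.

```python
# Approximate nearest-neighbor indexing with FAISS: inverted-file index (IVF)
# with product quantization (PQ) compressing each 128-d vector to 16 bytes.
import faiss
import numpy as np

d, n_db, n_query = 128, 100_000, 5
rng = np.random.default_rng(0)
xb = rng.random((n_db, d), dtype=np.float32)     # stand-in database descriptors
xq = rng.random((n_query, d), dtype=np.float32)  # stand-in query descriptors

nlist, m, nbits = 1024, 16, 8                    # 1024 coarse cells, 16 sub-quantizers
quantizer = faiss.IndexFlatL2(d)                 # coarse quantizer for the IVF lists
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)                                  # learn coarse centroids and PQ codebooks
index.add(xb)
index.nprobe = 16                                # number of cells visited per query

distances, ids = index.search(xq, 10)            # top-10 approximate neighbors per query
```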

Vulnerabilities, Adversarial Robustness, and Dataset Biases

Content-based image retrieval (CBIR) systems, especially those employing deep neural networks for feature extraction, are susceptible to adversarial attacks that exploit the sensitivity of learned embeddings to input perturbations. These attacks generate subtle modifications to query images, causing the system to retrieve irrelevant or targeted incorrect results while maintaining visual similarity to the original. For example, adaptive targeted adversarial examples can conceal personal images from retrieval by databases, with perturbations limited to small norms (e.g., L-infinity below 8/255) achieving success rates over 90% on datasets like INSTRE and Holidays. Similarly, attacks on deep product quantization networks for image retrieval craft adversarial queries that mismatch nearest neighbors, reducing retrieval accuracy by up to 80% in targeted scenarios without exceeding perceptual bounds. Such vulnerabilities arise from the non-robust optimization of embedding spaces, where gradient-based perturbations align adversarial features away from true matches. Adversarial robustness in CBIR remains limited, as most systems prioritize accuracy over defense against evasion or poisoning. Universal perturbations, applicable across multiple queries, can degrade retrieval performance globally by shifting database embeddings in feature space, with transferability across models like ResNet and VGG observed in experiments on Oxford5k and Paris6k benchmarks. Defenses such as adversarial training—incorporating perturbed examples during embedding learning—improve resilience but increase computational overhead by 2-5x and may reduce clean accuracy by 5-10%. Privacy vulnerabilities compound these issues; query images can inadvertently leak sensitive attributes through feature correlations, enabling reconstruction or inference attacks with success rates exceeding 70% on facial datasets. Overall, CBIR's reliance on Euclidean or cosine similarity metrics amplifies fragility, as small directional changes in high-dimensional embeddings suffice to evade matching. Dataset biases in CBIR stem from imbalances in training corpora, such as overrepresentation of Western demographics or urban scenes, leading to skewed feature learning that disadvantages underrepresented categories. For instance, geographical biases in image datasets cause retrieval systems to favor images from data-rich regions (e.g., North America comprising 60-80% of labels in common benchmarks), resulting in 20-40% lower recall for queries from Africa or Asia. In fair image retrieval, test-time biases manifest when neutral queries retrieve disproportionate results favoring majority groups, with demographic parity gaps up to 0.3 in embeddings from models trained on biased sources like LAION-5B. These biases propagate causally from collection practices—e.g., crowdsourced labeling favoring accessible content—yielding culturally stereotypical associations, as evidenced by higher precision for light-skinned faces (85%) versus dark-skinned (65%) in face retrieval tasks. Mitigation via reweighting or augmentation reduces gaps by 15-25% but requires domain-specific auditing, as general debiasing overlooks retrieval-specific metrics like mean average precision under subgroup constraints.
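To make the attack surface concrete, the following sketch applies a single FGSM-style gradient-sign step that pushes a query's CNN embedding away from its original position, so previously matching database images no longer rank highly. The epsilon budget and the use of ResNet-50 features are assumptions for illustration; practical attacks typically iterate such steps.

```python
# Illustrative single-step evasion attack against a CNN embedding model.
import torch
import torch.nn.functional as F
from torchvision import models, transforms

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.fc = torch.nn.Identity()
model.eval()

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def fgsm_evasion(image, epsilon=8 / 255):
    """image: (1, 3, 224, 224) tensor with values in [0, 1]."""
    with torch.no_grad():
        original_emb = F.normalize(model(normalize(image)), dim=1)
    adv = image.clone().requires_grad_(True)
    emb = F.normalize(model(normalize(adv)), dim=1)
    similarity = (emb * original_emb).sum()       # cosine similarity to the original
    similarity.backward()
    # One signed-gradient step that decreases similarity, within an L-inf ball.
    perturbed = (adv - epsilon * adv.grad.sign()).clamp(0.0, 1.0)
    return perturbed.detach()
```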

Evaluation Frameworks

Performance Metrics and Quantitative Assessment

Precision and recall are foundational metrics for evaluating CBIR systems, where precision quantifies the fraction of retrieved images deemed relevant to the query out of the total retrieved, while recall measures the fraction of all relevant images in the database that are successfully retrieved. These metrics are computed across multiple queries, often visualized via precision-recall curves to assess trade-offs in retrieval performance, with the area under the curve (AUC) providing a summary scalar value. The F1-score, defined as the harmonic mean of precision and recall (F1 = 2 × (precision × recall) / (precision + recall)), offers a balanced single measure particularly useful when class imbalance affects relevance judgments in image datasets. For ranking-focused evaluation, average precision (AP) computes the mean precision value obtained at each position in the ranked retrieval list where a relevant image appears, emphasizing the quality of top-ranked results. Mean average precision (mAP), the mean of AP scores over all queries, serves as a standard benchmark for comparing CBIR methods, with reported values such as 60% achieved in visual-feature-only systems on medical datasets. Precision at K (P@K), the precision of the top K retrieved images, evaluates early retrieval accuracy relevant for user-facing applications where only initial results are viewed. These metrics rely on ground-truth relevance annotations, which can introduce subjectivity in image similarity judgments, prompting some frameworks to incorporate normalized discounted cumulative gain (NDCG) to weight higher-ranked relevant images more heavily while accounting for graded relevance. Empirical assessments often report mAP improvements in deep learning-based CBIR, such as those exceeding prior methods in large-scale datasets, though direct comparability requires standardized protocols to mitigate variances from feature extraction or database scale.
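Reference implementations of these ranking metrics are straightforward; the sketch below assumes each query provides a ranked list of retrieved image identifiers and a ground-truth set of relevant identifiers.

```python
# Precision@K, average precision, and mean average precision for ranked retrieval.
def precision_at_k(ranked_ids, relevant_ids, k):
    top = ranked_ids[:k]
    return sum(1 for i in top if i in relevant_ids) / k

def average_precision(ranked_ids, relevant_ids):
    hits, total = 0, 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant_ids:
            hits += 1
            total += hits / rank          # precision at each relevant position
    return total / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(all_ranked, all_relevant):
    """all_ranked: list of ranked lists (one per query); all_relevant: list of sets."""
    aps = [average_precision(r, rel) for r, rel in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)

# Example: relevant set {1, 5}, system ranks [5, 3, 1, 2]
# precision_at_k(...) at K=2 -> 0.5; average_precision(...) -> (1/1 + 2/3) / 2 ~= 0.833
```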

Benchmarks, Datasets, and Empirical Comparisons

Standard benchmark datasets for content-based image retrieval (CBIR) systems emphasize diverse challenges, including semantic categorization, near-duplicate detection, and landmark instance retrieval. The COREL dataset, available in subsets such as Corel-1K (1,000 images across 10 semantic categories with 100 images per class) and Corel-10K (10,000 images), serves as a foundational resource for evaluating retrieval accuracy in general-purpose image search, where images are grouped by high-level concepts like landscapes or animals. Similarly, the Wang dataset (1,000 images in 10 classes) is employed for precision-recall assessments in categorical retrieval tasks. For near-duplicate and copy detection, the Holidays dataset consists of 1,491 images forming 500 distinct scenes, with 500 designated query images and ground-truth relevance lists, typically evaluated via mean average precision (mAP). The Oxford Buildings dataset (Oxford 5K), containing 5,062 high-resolution images of landmarks, includes five queries per landmark across 11 landmarks with varying distractor difficulties, also benchmarked on mAP to test robustness against viewpoint and illumination changes. UKBench, with 10,200 images representing 2,550 objects (four images each), uses an N-S score (average recall of top-4 results) to measure instance-level retrieval performance. Empirical comparisons reveal that deep learning-based CBIR methods substantially outperform traditional handcrafted feature approaches, such as SIFT with bag-of-words (BoW) or color histograms, by leveraging hierarchical representations from convolutional neural networks (CNNs). On the Holidays dataset, traditional methods like VLAD or Fisher vectors yield mAP scores around 70-80%, whereas CNN-derived global descriptors achieve 85-95% mAP, with compact bilinear pooling variants reaching up to 95.13% using 64-dimensional vectors. Deep approaches reduce the semantic gap by learning invariant features, improving precision by 15-35% over BoW or histogram-based baselines on Corel-10K, as demonstrated in fusion models combining perceptual color histograms with deep embeddings.
| Dataset | Traditional Method Example (mAP or Equivalent) | Deep Learning Example (mAP or Equivalent) | Improvement Notes |
|---|---|---|---|
| Holidays | ~77% (SIFT + VLAD) | ~90%+ (CNN bilinear) | Deep methods enhance invariance to transformations. |
| Oxford 5K | ~60-70% (handcrafted local features) | ~80-85% (end-to-end CNN) | Better handling of distractors via learned semantics. |
| Corel-10K | ~70-80% precision (HOG/BoW) | ~90-95% precision (deep fusion) | Hierarchical features bridge low-level to semantic. |
Recent advancements, including transformer-based and foundation model integrations (e.g., 2023-2024 studies using pre-trained vision encoders), further elevate benchmarks on these datasets, often surpassing prior deep baselines by 5-10% mAP through multi-scale attention, though domain-specific datasets like TotalSegmentator (104 anatomical structures from CT scans) are emerging for specialized evaluations. Traditional methods persist in resource-constrained settings due to lower computational demands, but empirical evidence consistently favors deep learning for accuracy in large-scale retrieval.

Applications and Real-World Deployment

Commercial and Large-Scale Systems

IBM's Query By Image Content (QBIC) system, prototyped in the early 1990s at the IBM Almaden Research Center, represented one of the first commercial-grade CBIR implementations, enabling queries against large image and video databases via features such as color histograms, texture patterns, shape outlines, and user sketches. QBIC supported relevance feedback to refine results iteratively and was integrated into IBM's Ultimedia Manager commercial product for multimedia asset management. Contemporary large-scale systems leverage deep neural networks for feature extraction and embedding spaces to handle billions of images efficiently. Pinterest's visual search engine, deployed since 2015, uses convolutional neural networks to generate embeddings from user-uploaded pins, powering related pin recommendations and object detection across its repository of over 200 billion images as of 2023, with live A/B tests showing up to 20% increases in user engagement metrics like saves and clicks. Google's Lens, launched in 2017 as an extension of its reverse image search capabilities, applies content-based retrieval via convolutional neural networks to match query images against web-scale indices, supporting real-time identification of objects, landmarks, and text in camera feeds. To address scalability in such systems, Meta's FAISS (Facebook AI Similarity Search) library, open-sourced in 2017, provides approximate nearest neighbor algorithms optimized for GPU-accelerated searches over dense vector representations of images, enabling sub-second retrieval latencies on datasets exceeding one billion entries through techniques like inverted file indexing and product quantization. These tools underpin production CBIR in e-commerce and social platforms, where vector databases like those built on FAISS integrate with embedding models to manage petabyte-scale image corpora while minimizing computational overhead.
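The inverted-file and product-quantization techniques mentioned above can be combined in FAISS as an IVF-PQ index. The sketch below is a minimal, self-contained example over randomly generated embeddings; the dimensionality and parameter values (number of cells, sub-vectors, probes) are illustrative placeholders rather than production settings.

```python
"""Minimal sketch of approximate nearest-neighbor search over image embeddings
with FAISS, combining an inverted file (IVF) with product quantization (PQ)."""

import numpy as np
import faiss

d = 128          # embedding dimensionality (e.g., from a CNN descriptor)
nlist = 1024     # number of inverted-file cells (coarse clusters)
m = 16           # number of PQ sub-vectors (d must be divisible by m)

# Random stand-ins for database and query embeddings.
rng = np.random.default_rng(0)
xb = rng.standard_normal((100_000, d)).astype("float32")
xq = rng.standard_normal((5, d)).astype("float32")

quantizer = faiss.IndexFlatL2(d)                     # coarse quantizer for IVF
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)  # 8 bits per sub-vector
index.train(xb)        # learn cluster centroids and PQ codebooks
index.add(xb)          # encode and store database vectors
index.nprobe = 16      # number of inverted-file cells visited per query

distances, ids = index.search(xq, 10)  # top-10 approximate neighbors per query
print(ids[0])                          # database indices of the best matches
```

Increasing `nprobe` visits more cells per query, trading latency for recall; this is the central tuning knob when such indexes back large-scale retrieval services.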

Domain-Specific Implementations and Emerging Uses

In medical imaging, content-based image retrieval (CBIR) facilitates diagnostics by identifying visually similar historical cases, enhancing radiologist efficiency in managing large datasets from modalities like X-rays, CT scans, and MRIs. Systems exploit low-level features such as texture and shape to retrieve relevant images, with applications in pathology for tumor detection and dermatology for lesion analysis; for instance, a 2025 retrospective study evaluated CBIR's role in improving diagnostic accuracy for radiologists by retrieving similar cases, reporting measurable gains in precision over unaided review. For dermatoscopic images, CBIR has been compared against clinician assessments, yielding diagnostic accuracies competitive with expert evaluations in skin cancer identification.

Remote sensing applications leverage CBIR to query vast earth observation archives for land cover classification, disaster assessment, and environmental monitoring, using spectral and spatial features to match query images against satellite or aerial data. Techniques like composed image retrieval enable hybrid queries combining example images with modifiers, applied to hyperspectral datasets for urban planning and vegetation analysis as of 2024. Recent work emphasizes deep learning for feature extraction in remote sensing CBIR, addressing the data volumes produced by missions like Landsat or Sentinel, with systems achieving high recall in land use retrieval tasks.

In fashion and e-commerce, CBIR supports product discovery by retrieving items based on visual attributes like color, pattern, and style, outperforming text-based search in subjective domains such as textile motifs or apparel matching. Implementations index clothing databases using convolutional neural networks for fabric type and design similarity, and have been deployed in recommendation platforms since the early 2020s. A 2024 system for Indian traditional textile motifs demonstrated CBIR's objectivity over keyword methods, retrieving culturally specific patterns with reduced human bias.

Emerging uses integrate CBIR with foundation models for geospatial and multimodal retrieval, as in 2024 proposals using geospatial models like Prithvi for multi-spectral remote sensing queries, enabling zero-shot adaptation to new observation types. Advances in deep adaptive networks, reported in 2025, enhance CBIR for medical and surveillance tasks via multi-scale features, improving robustness to rotations and scale changes in real-time deployments. These developments extend to cultural heritage preservation, where CBIR aids artifact matching across archives, and to visual surveillance for anomaly detection, driven by the need for scalable, semantics-aware systems amid growing image corpora.
