
CIFAR-10

The CIFAR-10 dataset is a benchmark collection of 60,000 low-resolution color images, each measuring 32×32 pixels, organized into 10 mutually exclusive classes representing common objects, with exactly 6,000 images per class. It comprises 50,000 images designated for training and 10,000 reserved for testing, structured across five training batches of 10,000 images each and one test batch of 10,000 images (1,000 per class). The 10 classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck, selected to facilitate object recognition tasks in computer vision. Developed by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton at the University of Toronto, CIFAR-10 was created as a labeled subset of the larger 80 million tiny images dataset, originally collected from multiple web image search engines including Google to provide diverse, real-world visual data; the source dataset was withdrawn in 2020 due to containing offensive and biased labels. Introduced in 2009 through the technical report Learning Multiple Layers of Features from Tiny Images, the dataset was designed to support research in unsupervised and supervised learning, particularly for training multi-layer generative models and convolutional neural networks (CNNs) on small-scale image data. Early convolutional network baselines on the dataset reached roughly 18% test error without data augmentation, highlighting its utility for evaluating feature extraction and classification algorithms. Since its release, CIFAR-10 has become a foundational resource in machine learning, widely adopted for benchmarking deep learning models due to its manageable size, balanced class distribution, and emphasis on challenging, tiny images that test generalization without requiring massive computational resources. It is available in multiple formats, including binary, Python, and MATLAB versions, and has influenced advancements in areas like transfer learning and generative modeling, with state-of-the-art accuracies now exceeding 99% on the test set using modern architectures. The dataset's enduring impact stems from its role in early demonstrations of deep learning's potential, as evidenced by its integration into frameworks like TensorFlow and PyTorch.

Introduction

Overview

The CIFAR-10 dataset is a standard benchmark for image classification tasks in computer vision, comprising 60,000 32×32 color images divided equally across 10 classes, with 50,000 images allocated for training and 10,000 for testing, resulting in 6,000 images per class overall. It originated as a labeled subset of the larger 80 million tiny images dataset, selected and annotated to support research in object recognition. Created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton in 2009, CIFAR-10 was designed specifically as a benchmark for evaluating machine learning algorithms on natural image data. Its significance lies in its balanced scale and complexity, making it an accessible yet challenging resource that has been widely adopted for developing and testing convolutional neural networks (CNNs) and other deep learning architectures in image recognition.

History

The CIFAR-10 dataset was created in 2009 by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton at the University of Toronto's Department of Computer Science, as part of Krizhevsky's master's thesis work on deep learning techniques for feature extraction from natural images. This effort involved manually labeling a subset of images from the larger 80 million tiny images dataset, which had been collected earlier by researchers at MIT and NYU to sample the visual world through web-scraped thumbnails downscaled to 32×32 pixels. Students were employed to filter and verify approximately 6,000 images per class, ensuring more reliable annotations than the original unlabeled or noisily labeled Tiny Images collection. The primary motivation for developing CIFAR-10 was to establish a benchmark dataset that was both accessible and sufficiently challenging for evaluating generative models, classifiers, and feature learning algorithms in computer vision, particularly amid early struggles with unsupervised learning on vast unlabeled data like Tiny Images. By selecting 10 common object categories and balancing the dataset with 50,000 training images and 10,000 test images, the creators aimed to facilitate reproducible experiments in multilayer feature learning without the complexities of larger-scale datasets. The project received funding from the Canadian Institute for Advanced Research, underscoring its role in advancing neural network research at the time. CIFAR-10 was first released publicly through the University of Toronto's website in 2009, initially available in binary format alongside Python and MATLAB versions for ease of access by researchers. Early adoption was rapid within the machine learning community; notably, it appeared in the 2012 AlexNet paper by Krizhevsky, Sutskever, and Hinton, where a four-layer convolutional neural network achieved a test error rate of around 11% on CIFAR-10 when using local response normalization, demonstrating the potential of deep learning for image classification and contributing to the field's resurgence. Over the ensuing decade, CIFAR-10 underwent no fundamental structural modifications but saw enhancements in supporting documentation, such as updated baselines and citation guidelines on its official page. By the mid-2010s, it became seamlessly integrated into major deep learning frameworks, including TensorFlow's datasets module and PyTorch's torchvision library, enabling straightforward loading and experimentation for practitioners. Concurrently, community standards for preprocessing evolved, with widespread adoption of techniques like random cropping and horizontal flipping to improve model robustness, reflecting the dataset's enduring utility as a foundational benchmark without altering its core composition. In 2019, researchers identified near-duplicates comprising about 3.3% of the test set that closely resemble training images, potentially causing data leakage and inflated benchmark results; the deduplicated variant ciFAIR-10 has since been recommended for more rigorous evaluations.

Dataset Description

Composition and Structure

The CIFAR-10 dataset consists of 60,000 color images in total, organized into a training set of 50,000 images and a test set of 10,000 images, with no separate validation set provided in the original release. The images span 10 distinct classes, ensuring a balanced distribution across the dataset. The dataset maintains class balance, with exactly 6,000 images per class overall: 5,000 in the training set and 1,000 in the test set. This equal allocation per class facilitates fair evaluation of classification models without inherent bias toward any category. Data is stored in binary format across multiple files for efficient handling. The training set comprises five batches (data_batch_1.bin through data_batch_5.bin), each containing 10,000 images, while the test set is a single batch (test_batch.bin) with 10,000 images. Each image is represented by 3,073 bytes: 1 byte for the class label (an integer from 0 to 9) followed by 3,072 bytes for the pixel values in RGB channels (32 × 32 pixels per channel, stored in row-major order). The dataset is freely available for download from the official University of Toronto website in binary, Python pickle, or MATLAB formats, complete with label files mapping integer indices to class names. It is also directly accessible through major machine learning frameworks such as PyTorch's torchvision and TensorFlow Datasets, which handle loading and basic parsing. A common preprocessing step for CIFAR-10 involves standard normalization by subtracting the per-channel mean (approximately [0.4914, 0.4822, 0.4465] for RGB) from each image, often followed by division by the per-channel standard deviation to center and scale the data for improved model training stability.
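The record layout and loading options described above translate directly into code. The sketch below is a minimal illustration, not an official loader: it assumes the extracted binary batches sit in a local directory, that the 3,072 pixel bytes are stored as three consecutive channel planes (red, then green, then blue, each in row-major order), and it pairs the means quoted above with commonly cited per-channel standard deviations of roughly 0.247, 0.244, and 0.262, which are not stated in this section.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

RECORD_BYTES = 3073  # 1 label byte + 3 * 32 * 32 = 3,072 pixel bytes per image

def read_binary_batch(path):
    """Parse one CIFAR-10 binary batch (e.g. data_batch_1.bin) into arrays."""
    raw = np.fromfile(path, dtype=np.uint8).reshape(-1, RECORD_BYTES)
    labels = raw[:, 0].astype(np.int64)          # class indices 0-9
    images = raw[:, 1:].reshape(-1, 3, 32, 32)   # R, G, B planes, row-major
    return images, labels

# Alternatively, load through torchvision and apply standard normalization.
# The std values below are commonly quoted approximations, not official constants.
normalize = transforms.Normalize(mean=(0.4914, 0.4822, 0.4465),
                                 std=(0.2470, 0.2435, 0.2616))
transform = transforms.Compose([transforms.ToTensor(), normalize])
train_set = datasets.CIFAR10(root="./data", train=True,
                             download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
```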

Image Characteristics and Classes

The CIFAR-10 dataset consists of low-resolution color images measuring 32×32 pixels across three RGB channels, which imposes significant constraints on visual detail and contributes to recognition challenges such as aliasing artifacts from downsampling and difficulty in capturing fine textures. These specifications stem from the original collection process, where higher-resolution web images were resized without advanced anti-aliasing, resulting in pixelation that particularly affects subtle features like fur patterns or distant objects. The dataset encompasses 10 mutually exclusive classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck, designed for coarse-grained object recognition without overlapping categories (e.g., automobiles and trucks are distinctly separated). Intra-class variations are substantial, including diverse viewpoints (e.g., airplanes seen from side or front angles), lighting conditions (e.g., birds in sunlight or shade), partial occlusions (e.g., horses with foreground elements), and scales, all derived from real-world photographs to simulate practical diversity. The images lack explicit background annotations, emphasizing foreground objects in cluttered or simple settings to focus on holistic shape and color cues rather than segmentation. The small image size exacerbates the difficulty of fine-grained discrimination, such as distinguishing cats from dogs, where the limited pixel budget obscures distinguishing traits like facial features or body proportions, leading to frequent confusions in early models. Overall, these characteristics promote robust feature learning but test models' ability to generalize from minimal information. Images were sourced from the 80 million Tiny Images dataset, comprising web-crawled real-world photos downscaled to 32×32 pixels, with initial labels derived from search terms but refined through manual annotation by students to ensure quality. The 80 Million Tiny Images dataset was retired in June 2020 by MIT due to containing offensive and derogatory content, including racial slurs as category labels. Subsequent analyses have identified minor label errors and ambiguities in the original set, including approximately 0.54% errors in the test portion (e.g., mislabeled cats as frogs or deer as birds), attributable to subjective interpretations of borderline cases during labeling. Additionally, analyses have revealed data leakage due to near-duplicates: about 3.3% of test images are visually similar to training images, leading to overly optimistic performance evaluations. A cleaned variant, ciFAIR-10, replaces these duplicates with new images. The dataset maintains a balanced distribution across classes, with equal representation to facilitate equitable training.
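As a quick sanity check of the balanced class distribution and the integer-to-name label mapping described above, the short sketch below (assuming torchvision is installed, and relying on its bundled class list rather than values from this article) counts the test images per class.

```python
from collections import Counter
from torchvision import datasets

# Count test images per class; torchvision exposes the index-to-name mapping
# for the 10 categories via the dataset's `classes` attribute.
test_set = datasets.CIFAR10(root="./data", train=False, download=True)
counts = Counter(test_set.targets)
for idx, name in enumerate(test_set.classes):
    print(f"{idx}: {name:10s} {counts[idx]:5d} test images")  # 1,000 each
```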

Applications and Usage

In Computer Vision Research

CIFAR-10 has served as a primary benchmark for supervised image classification tasks in computer vision, particularly for evaluating convolutional neural networks (CNNs). Introduced in 2009, it gained prominence following the 2012 AlexNet breakthrough, which demonstrated the efficacy of deep CNNs on large-scale image classification and verified architectural choices such as local response normalization on CIFAR-10 itself, where a small CNN reached an 11% test error rate. This positioned CIFAR-10 as a go-to resource for rapid prototyping and comparison of classification models during the early revival of deep learning. The dataset has enabled key research trends, including investigations into transfer learning, where pre-trained models from larger corpora like ImageNet are fine-tuned on CIFAR-10 to assess domain adaptation. It has also facilitated studies on data augmentation techniques, such as random cropping and flipping, to mitigate its limited size and low resolution, improving generalization in CNN training. Furthermore, CIFAR-10 underpins robustness research, with benchmarks such as the corrupted test sets introduced in a 2019 ICLR study standardizing evaluations of model resilience to noise, blur, weather, and digital corruptions, alongside extensive work on adversarial attacks using the dataset. In generative modeling, it supports training GANs for image synthesis, as exemplified by the DCGAN architecture, whose unsupervised features (learned on ImageNet and evaluated on CIFAR-10) achieved 82.8% classification accuracy, demonstrating stable training and domain-agnostic representations. CIFAR-10's impact lies in standardizing cross-paper comparisons, allowing researchers to track progress in classification accuracy and efficiency without the computational demands of larger datasets. Its balanced, small-scale structure has influenced subsequent dataset designs in computer vision, emphasizing accessibility for quick experimentation and iteration in algorithm development. However, limitations include high overfitting risks due to its modest size (50,000 training images), prompting critiques on generalization when models memorize near-duplicates or label noise present in the dataset. Additional concerns involve annotation errors from manual labeling and potential biases in object representations, contributing to a research shift toward larger, more diverse datasets like ImageNet for real-world applicability. As of 2025, CIFAR-10 remains relevant in emerging areas such as efficient model training for edge computing and benchmarking multimodal vision-language models.
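To illustrate the transfer-learning workflow mentioned above, the sketch below fine-tunes an ImageNet-pretrained torchvision ResNet-18 on CIFAR-10. The upsampling to 224×224, the ImageNet normalization statistics, and the hyperparameters are common conventions assumed here rather than values taken from this article.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# ImageNet-pretrained backbones expect larger inputs, so the 32x32 images are
# upsampled and normalized with ImageNet statistics for fine-tuning.
tf = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
train_set = datasets.CIFAR10("./data", train=True, download=True, transform=tf)
loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 10)   # replace the 1000-way ImageNet head

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:          # a single pass shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```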

Model Training and Evaluation Protocols

Model training on the CIFAR-10 dataset typically employs cross-entropy loss for the multi-class classification task, as it measures the divergence between predicted probability distributions and true labels. Optimizers such as stochastic gradient descent (SGD) with momentum or Adam are commonly used, with initial learning rates around 0.1 that decay over the course of training, often divided by 10 at specific milestones like 50% and 75% of total epochs. Data handling practices emphasize augmentation to mitigate overfitting given the dataset's limited size. Standard techniques include padding images by 4 pixels on each side followed by random 32×32 crops, horizontal flipping, and sometimes color jittering or more advanced methods like CutMix and Mixup. Batch sizes of 128 to 256 are prevalent to balance computational efficiency and gradient stability, with training durations spanning 100 to 300 epochs depending on the architecture's complexity. Baseline architectures often start with adaptations of LeNet-5, which features convolutional layers followed by subsampling and fully connected layers, modified to handle CIFAR-10's color images and 10 classes instead of grayscale digits. Deeper convolutional networks like VGG, with stacked 3×3 convolutions and max-pooling, serve as intermediate benchmarks, trained via mini-batch gradient descent. ResNet variants, such as ResNet-18 or ResNet-32, incorporate residual connections to enable deeper training without degradation, using batch normalization and SGD. DenseNet connects each layer to every subsequent layer within a dense block via concatenation, promoting feature reuse, and is typically trained for around 300 epochs with SGD. More recent adaptations include vision transformers (ViT), which divide 32×32 images into patches and apply self-attention, often pre-trained on larger datasets before being fine-tuned with Adam and cosine learning rate decay. Evaluation protocols focus on top-1 accuracy computed on the held-out test set of 10,000 images, ensuring no data leakage by keeping the test split strictly separate from the training and validation data (the latter typically carved out of the 50,000 training images). Reproducibility is maintained through fixed random seeds for initialization, data shuffling, and augmentation, allowing consistent results across runs.
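The protocol above can be summarized in a short PyTorch training sketch. This is a hedged illustration rather than a reference implementation: the weight decay, epoch count, and the use of torchvision's ImageNet-style ResNet-18 (CIFAR-specific variants usually replace its 7×7 stem with a 3×3 convolution) are assumptions layered on the augmentation, optimizer, and schedule described in this section.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# Standard CIFAR-10 augmentation: pad 4 px, take a random 32x32 crop, flip.
train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
# torchvision's ImageNet-style ResNet-18 is used purely for illustration.
model = models.resnet18(num_classes=10).to(device)

criterion = nn.CrossEntropyLoss()
epochs = 200  # assumed; typical runs span 100-300 epochs
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# Divide the learning rate by 10 at 50% and 75% of training.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[epochs // 2, 3 * epochs // 4], gamma=0.1)

for epoch in range(epochs):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```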

Benchmarks and Results

Standard Evaluation Metrics

The primary evaluation metric for models on the CIFAR-10 dataset is top-1 classification accuracy, defined as the percentage of test images correctly classified into their true category by the model's highest-confidence prediction. This metric is computed exclusively on the held-out test set of 10,000 images to provide an unbiased estimate of generalization performance. Secondary metrics include top-5 accuracy, which measures the proportion of test images where the true class appears among the model's top five predictions; however, this is less commonly reported for CIFAR-10 due to the dataset's limited 10 classes, making top-1 sufficient for most assessments. Per-class accuracy is also used to detect potential biases or imbalances in performance across the 10 categories, despite the dataset's balanced design with 1,000 test images per class. The error rate, simply 1 minus the top-1 accuracy, serves as a complementary measure to quantify misclassification directly. Additional analyses involve the confusion matrix, which visualizes pairwise classification errors between classes to identify systematic confusions, such as between visually similar categories like cats and dogs. For robustness evaluation, metrics assess accuracy degradation under perturbations, including additive noise (e.g., Gaussian) or adversarial attacks like the Fast Gradient Sign Method (FGSM), where robust accuracy reports classification accuracy on the perturbed test images. Standard reporting protocols emphasize evaluation on the held-out test set, with model selection and hyperparameter tuning performed on validation data rather than the test set to avoid overfitting to the benchmark. For stochastic training methods, results typically include confidence intervals, such as the mean and standard deviation over multiple runs, to account for variability. During training, models are optimized using cross-entropy loss, though evaluation focuses on the aforementioned accuracy-based metrics.
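The accuracy-based metrics above are straightforward to compute. The sketch below (assuming a trained PyTorch model and a test DataLoader such as those in the earlier training example) returns top-1 accuracy, the error rate, per-class accuracy, and a 10×10 confusion matrix.

```python
import torch

@torch.no_grad()
def evaluate(model, test_loader, device="cpu", num_classes=10):
    """Top-1 accuracy, error rate, per-class accuracy, and confusion matrix."""
    model.eval()
    confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)      # highest-confidence class
        for t, p in zip(labels.cpu(), preds.cpu()):
            confusion[t, p] += 1                 # rows: true class, cols: predicted
    correct = confusion.diag().sum().item()
    total = confusion.sum().item()
    top1 = correct / total
    per_class = confusion.diag().float() / confusion.sum(dim=1).clamp(min=1).float()
    return top1, 1.0 - top1, per_class, confusion
```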

Historical Milestones

The CIFAR-10 dataset was introduced in 2009 by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton from the University of Toronto as a standardized benchmark for object recognition in small color images, consisting of 60,000 32×32 pixel images across 10 classes. Initial evaluations relied on traditional machine learning techniques with hand-crafted features, such as Histogram of Oriented Gradients (HOG) combined with Support Vector Machines (SVM), which typically achieved accuracies around 60% on the test set, limited by the manual design of features that struggled to capture complex visual patterns. The shift to deep learning began in the early 2010s with the emergence of convolutional neural networks (CNNs), marking a pivotal trend from hand-engineered representations to end-to-end learned features and establishing CNN dominance on CIFAR-10. Early deep architectures in 2010, such as convolutional deep belief networks, reached 78.9% accuracy with data augmentation, surpassing prior methods. By 2012, AlexNet-inspired models, featuring deeper layers, ReLU activations, and dropout for regularization, improved test accuracies to 89%, as demonstrated in early adaptations of the architecture originally designed for ImageNet but verified on CIFAR-10. Key milestones accelerated in the mid-2010s with innovations addressing training challenges in very deep networks. The introduction of Residual Networks (ResNets) by He et al. in late 2015 enabled effective optimization of networks exceeding 100 layers through residual connections, with ResNet-110 achieving 93.57% test accuracy on CIFAR-10 using standard data augmentation. Building on this, DenseNets in 2017 by Huang et al. leveraged dense connectivity to promote feature reuse and reduce parameters, attaining up to 96.54% accuracy with the DenseNet-BC-190 variant (growth rate k=40, 190 layers). Further advancements in regularization and data augmentation pushed boundaries toward 97% and beyond by the late 2010s. Techniques like Mixup (2018) by Zhang et al., which interpolates pairs of examples and labels during training, enhanced generalization when combined with architectures like PreAct-ResNet-18, yielding over 95% accuracy and contributing to hybrid approaches exceeding 97%. Similarly, AutoAugment (2018) by Cubuk et al. automated the search for optimal augmentation policies, achieving an error rate of about 1.5% (roughly 98.5% accuracy) on CIFAR-10 when combined with strongly regularized architectures, a 0.6% improvement over the previous state of the art. Shake-Shake regularization, introduced by Gastaldi in 2017, applied stochastic scaling of activations in multi-branch residual networks, reaching 97.14% accuracy with a 26-layer three-branch variant and solidifying pre-2020 benchmarks before more recent scaling techniques emerged.

State-of-the-Art Achievements

As of 2025, state-of-the-art (SOTA) performance on CIFAR-10 has surpassed 99% top-1 accuracy, largely through advancements in vision transformers (ViTs) and hybrid architectures that leverage pretraining and sophisticated augmentation. For example, variants of the Vision Transformer, pretrained on large-scale datasets like JFT-300M and fine-tuned on CIFAR-10, have achieved 99.50% accuracy by effectively capturing global image dependencies via self-attention mechanisms. Similarly, adaptations of EfficientNet, such as EfficientNet-B7 combined with AutoAugment and mixup regularization, have reported accuracies exceeding 99.2%, demonstrating the efficacy of compound scaling in convolutional backbones for small-resolution tasks. Self-supervised methods like SimCLR have also pushed fine-tuned accuracies close to 99% through pretext tasks on unlabeled data, enhancing feature learning for downstream classification. In recent years, hybrid models integrating transformers with convolutional layers have further refined these results, emphasizing parameter efficiency and transferability. Tsetlin Machines, a logic-based alternative to neural networks, have been optimized for CIFAR-10 but remain well below neural state of the art at around 82.8% accuracy, highlighting their niche in interpretable computing rather than raw performance. A 2024 CNN-based approach reportedly achieved a near-perfect 99.95% accuracy using a multi-layer architecture with batch normalization, dropout, and extensive data augmentation, underscoring the potential of refined training protocols on standard hardware. Key advances in 2025 have focused on novel paradigms like implicit neural representations, as explored in CVPR proceedings, where end-to-end models improved classification on constrained CIFAR-10 variants without traditional augmentations, reaching up to 64.7% in specialized settings like SIREN-based tasks. Evolutions of DenseNet architectures and GAN-augmented training have also driven error rates below 1%, with self-supervised pretraining on unlabeled data enabling accuracies over 99.5% in few-shot scenarios by enhancing feature invariance. These methods, often detailed in arXiv preprints and NeurIPS submissions, prioritize seamless integration of generative techniques to mitigate label noise and distribution shifts. Emerging trends shift beyond pure accuracy toward efficiency metrics like floating-point operations (FLOPs) per inference, adversarial robustness, and few-shot adaptability, as models like ViT hybrids balance high performance with reduced computational overhead. However, critiques highlight the dataset's saturation: top accuracies now far exceed estimated human-level performance (around 94%) and are plateauing near 100%, rendering marginal gains less informative for advancing general computer vision and prompting calls for more diverse benchmarks.

CIFAR-100

CIFAR-100 is a dataset extension released alongside CIFAR-10 in 2009 by Alex Krizhevsky in his technical report "Learning Multiple Layers of Features from Tiny Images." It comprises 60,000 32×32 color images across 100 classes, with exactly 600 images per class, drawn from the same 80 million tiny images collection and manually labeled and verified in the same manner as CIFAR-10. The dataset maintains the same overall structure as CIFAR-10, featuring a training set of 50,000 images and a held-out test set of 10,000 images, with 500 training and 100 test images per class. Each image is provided with both a "fine" label corresponding to one of the 100 specific classes and a "coarse" label assigning it to one of 20 superclasses, such as "aquatic mammals" (including beaver, dolphin, otter, seal, and whale) or "large carnivores" (including bear, leopard, lion, tiger, and wolf). This hierarchical labeling supports evaluations at multiple levels of granularity, enabling tasks that assess both precise object identification and broader category recognition. Compared to CIFAR-10's 10 broad classes, CIFAR-100 introduces greater complexity through its finer-grained categories, such as distinguishing individual tree species (e.g., maple, oak, or pine) or flower types (e.g., rose, tulip, or sunflower) rather than a single broad category. This increased diversity, combined with far fewer examples per class, results in substantially lower classification performance, with standard convolutional neural networks like ResNet-50 achieving top-1 test accuracies around 72-76%, in contrast to over 90% on CIFAR-10. CIFAR-100 serves primarily as a benchmark for multi-class image classification and hierarchical learning, where models must handle a larger label space and leverage superclass information for improved generalization. It is also commonly used to evaluate transfer learning from CIFAR-10 pre-trained models, testing adaptation to more challenging, fine-grained recognition without altering the underlying image characteristics.
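The dual fine/coarse labeling can be read directly from the Python-pickle release. The sketch below assumes the official archive has been extracted locally and uses the byte-string dictionary keys (b'data', b'fine_labels', b'coarse_labels') conventionally documented for that format; treat the key names and the path as assumptions rather than details stated in this article.

```python
import pickle
import numpy as np

def load_cifar100_split(path="cifar-100-python/train"):
    """Load a CIFAR-100 Python-pickle split with both label granularities."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    images = np.asarray(batch[b"data"], dtype=np.uint8).reshape(-1, 3, 32, 32)
    fine = np.asarray(batch[b"fine_labels"])      # 100 specific classes
    coarse = np.asarray(batch[b"coarse_labels"])  # 20 superclasses
    return images, fine, coarse

images, fine, coarse = load_cifar100_split()
print(images.shape, fine.max() + 1, coarse.max() + 1)  # (50000, 3, 32, 32) 100 20
```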

Other Comparable Datasets

Several datasets share similarities with CIFAR-10 in terms of scale, resolution, and purpose as benchmarks for image classification tasks, particularly for rapid prototyping and algorithm development in computer vision. These alternatives often vary in domain specificity, image characteristics, or supervision requirements, enabling comparisons across supervised, semi-supervised, or few-shot learning scenarios. The STL-10 dataset consists of 13,000 labeled color images at 96×96 resolution across 10 classes (airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck), supplemented by 100,000 unlabeled images for unsupervised learning. It is specifically designed to encourage advances in unsupervised feature learning and semi-supervised methods, with only 500 labeled training examples per class to simulate data-scarce environments. CINIC-10 serves as a drop-in replacement for CIFAR-10, featuring 270,000 32×32 RGB images across the same 10 classes, divided equally into 90,000-image train, validation, and test sets. Compiled by combining CIFAR-10 images with additional images downsampled from ImageNet, it offers a larger and more distributionally diverse benchmark, facilitating a smoother transition from small-scale experiments to larger datasets like ImageNet. Other notable datasets include SVHN, which comprises over 600,000 32×32 RGB images (primarily cropped digits from street view) across 10 digit classes (0-9), emphasizing real-world robustness for digit recognition rather than general objects. Fashion-MNIST offers 70,000 28×28 grayscale images of clothing items across 10 classes (e.g., t-shirt, trouser, sneaker), acting as a more challenging alternative to MNIST while mirroring CIFAR-10's balanced 10-class structure for fashion-related classification. Mini-ImageNet, a subset of ImageNet, includes 100 classes with 600 84×84 color images each (64 train, 16 validation, 20 test classes), tailored for few-shot learning with larger images and fewer examples per class during evaluation. These datasets are all small-scale enough for efficient experimentation, differing from CIFAR-10 primarily in supervision paradigms—such as STL-10's emphasis on unlabeled data—or domain focus, like SVHN's digits versus broader objects, allowing researchers to test generalization across varied visual tasks. While CIFAR-100 extends CIFAR-10 to 100 classes, these alternatives provide diverse benchmarks without direct lineage.
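Several of these comparable benchmarks ship with torchvision's datasets module, which makes side-by-side experiments straightforward. The calls below are a minimal sketch under that assumption; CINIC-10 is not bundled with torchvision and is assumed to be downloaded separately as directories of images.

```python
from torchvision import datasets

root = "./data"
# CIFAR-10-adjacent benchmarks available directly through torchvision:
stl10 = datasets.STL10(root, split="train", download=True)         # 96x96, 10 classes
svhn = datasets.SVHN(root, split="train", download=True)           # 32x32 street-view digits
fashion = datasets.FashionMNIST(root, train=True, download=True)   # 28x28 grayscale clothing

# CINIC-10 is distributed as folders of PNG images; once downloaded it can be
# loaded with a generic folder loader (the path below is a placeholder).
# cinic_train = datasets.ImageFolder(root + "/CINIC-10/train")
```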
