
Neural architecture search

Neural architecture search (NAS) is a subfield of automated machine learning (AutoML) that automates the design of artificial neural network architectures by systematically exploring a defined search space to identify models that optimize performance metrics such as accuracy, efficiency, or latency, thereby reducing the reliance on expert manual engineering. The origins of NAS trace back to early applications of evolutionary algorithms to neural network design in the late 1980s, but the field surged in prominence with the advent of deep learning. The foundational modern approach was introduced by Zoph and Le in 2016, who employed reinforcement learning to generate and evaluate architectures on image classification tasks such as CIFAR-10, demonstrating that automated search could rival hand-crafted designs, but at significant computational expense—often requiring thousands of GPU hours. Subsequent works, such as NASNet by Zoph et al. in 2017, refined this by searching for reusable "cells" on smaller datasets and transferring them to larger ones like ImageNet, achieving state-of-the-art results in image classification and object detection.

At its core, NAS comprises three interconnected components: the search space, which specifies the universe of possible architectures (e.g., cell-based structures with operations like convolutions or skip connections); the search strategy, which navigates this space using algorithms such as reinforcement learning, evolutionary methods, or gradient-based optimization; and the performance estimation strategy, which assesses candidate architectures through full training, weight sharing, or low-fidelity proxies to mitigate costs. Early strategies were computationally intensive, but innovations like parameter sharing in ENAS (Pham et al., 2018) reduced search times from days to hours by reusing weights across architectures. Key advancements have diversified search strategies, including evolutionary algorithms in AmoebaNet (Real et al., 2018), which used regularized evolution to evolve high-performing convolutional cells competitive with NASNet on ImageNet while emphasizing model size constraints. A major breakthrough came with gradient-based methods like DARTS (Liu et al., 2018), which reformulates architecture search as a differentiable optimization problem, enabling end-to-end training via gradient descent and drastically cutting search costs to a few GPU-days rather than thousands. These approaches have extended NAS beyond computer vision to domains such as natural language processing and efficient mobile deployment. In the years since, NAS has further advanced with hardware-aware methods, zero-shot performance estimation techniques, and integrations with transformer-based large language models.

NAS has transformed deep learning by consistently discovering architectures that surpass manually designed ones on benchmarks, such as achieving lower error rates on CIFAR-10 and ImageNet while balancing trade-offs in parameters and inference speed. Its importance lies in democratizing AI model development, accelerating innovation, and enabling tailored solutions for resource-constrained environments, though challenges remain in reproducibility, generalization across tasks, and benchmark reliability.

Introduction

Definition and Motivation

Neural Architecture Search (NAS) is an automated methodology for discovering neural network architectures tailored to specific tasks, such as image classification or segmentation. It operates through three core components: a search space that delineates the universe of possible architectures (e.g., layer types, connections, and hyperparameters); a search strategy that navigates this space to sample candidate architectures; and a performance estimation strategy that assesses the efficacy of these candidates, often via validation performance or proxy metrics. This framework shifts the burden of architecture engineering from human experts to algorithmic exploration, enabling the identification of high-performing models without exhaustive manual iteration.

The motivation for NAS arises from the inherent limitations of traditional manual architecture design, which demands substantial domain expertise, iterative experimentation, and significant time investment, often leading to suboptimal or biased outcomes shaped by human intuition. By automating this process, NAS not only enhances model accuracy and efficiency—frequently surpassing hand-crafted designs like ResNet or VGG—but also democratizes advanced model development, integrating into broader AutoML ecosystems to streamline end-to-end pipelines. This approach addresses the growing complexity of deep learning models, where manual tuning becomes increasingly infeasible as architectures scale in depth and width. NAS can be viewed as a natural extension of hyperparameter optimization, evolving to target structural elements such as layer types, connections, and operations rather than just tuning scalar parameters. Early applications concentrated on image classification benchmarks, including CIFAR-10 for smaller-scale validation and ImageNet for large-scale transferability; for example, reinforcement learning-based searches on CIFAR-10 yielded architectures with error rates around 3.65%, while subsequent adaptations like NASNet achieved state-of-the-art top-1 accuracy of 82.7% on ImageNet. A central trade-off in NAS is the balance between computational expense and performance improvements, as exhaustive searches can demand thousands of GPU hours—early methods, for instance, required up to 1800 GPU-days on CIFAR-10—prompting ongoing research into efficient approximations that make NAS viable for resource-constrained environments.
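The interplay of the three components can be made concrete with a small sketch. The snippet below is a minimal illustration rather than any published system: the search space, the plain random search strategy, and the stubbed performance estimator are all hypothetical stand-ins.

```python
import random

# Hypothetical search space: each architecture is a list of layer choices.
SEARCH_SPACE = {
    "n_layers": [4, 8, 12],
    "operations": ["conv3x3", "conv5x5", "maxpool", "skip"],
}

def sample_architecture():
    """Search strategy (here: plain random search) samples a candidate."""
    depth = random.choice(SEARCH_SPACE["n_layers"])
    return [random.choice(SEARCH_SPACE["operations"]) for _ in range(depth)]

def estimate_performance(arch):
    """Performance estimation stub: in practice this would train the model
    (fully or with a low-fidelity proxy) and return validation accuracy."""
    return random.random()  # placeholder score

best_arch, best_score = None, float("-inf")
for _ in range(20):                      # search budget
    arch = sample_architecture()         # search strategy proposes a candidate
    score = estimate_performance(arch)   # performance estimation evaluates it
    if score > best_score:
        best_arch, best_score = arch, score

print(best_arch, best_score)
```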

Historical Development

Neural architecture search (NAS) in its modern form originated in 2016 with the pioneering work of Barret Zoph and Quoc Le, who introduced an RL-based approach using a recurrent neural network controller to generate architectures, achieving state-of-the-art performance of 3.65% test error on CIFAR-10 and surpassing prior hand-designed convolutional networks. This method marked the first automated discovery of competitive neural architectures at scale, setting the foundation for subsequent NAS research by demonstrating that reinforcement learning could optimize network design directly from data. In 2017, advancements scaled NAS to larger datasets, exemplified by NASNet, which searched for transferable "cells" on CIFAR-10 and applied them to ImageNet, yielding a model with 82.7% top-1 accuracy while reducing computational demand by 9 billion FLOPS (28% fewer than the best prior human-designed architectures like Inception-v4). This transferability innovation highlighted NAS's potential for practical deployment across tasks. Around the same time, evolutionary methods began emerging as compute-efficient alternatives to reinforcement learning, though they gained prominence later.

The high computational costs of early RL-based NAS—often requiring thousands of GPU-days—prompted a shift toward efficiency in 2018, with the introduction of one-shot methods like Efficient Neural Architecture Search (ENAS) and gradient-based approaches such as DARTS, which relaxed the discrete search space into a continuous one for end-to-end optimization, reducing search time to hours on a single GPU. Hardware-aware NAS also debuted that year with FBNet, incorporating latency predictions into the search objective to optimize for edge devices. Post-2020, NAS evolved toward hardware optimization and broader architectures, with expansions in hardware-aware methods for edge devices through 2022–2025, including multi-objective searches balancing accuracy, latency, and energy consumption on resource-constrained platforms. Integration with transformers surged, enabling automated design of efficient variants for vision and language tasks, as surveyed in comprehensive reviews. By 2025, the literature highlighted generative NAS paradigms, leveraging diffusion models and LLMs to sample architectures from learned distributions, further enhancing scalability. Key milestones included the establishment of AutoML conferences and ICML workshops fostering collaboration, alongside benchmarks like NAS-Bench-101 in 2019, which tabulated 423,000 architectures on CIFAR-10 to standardize reproducible evaluations.

Core Concepts

Search Space Definition

In neural architecture search (NAS), the search space encompasses all possible neural network architectures that can be constructed and evaluated during the optimization process. It serves as the foundational domain from which NAS methods sample and select candidate architectures, directly influencing the diversity, expressiveness, and computational feasibility of the search. Broadly, search spaces are categorized into macro and micro types. Macro search spaces define the entire network topology, including the sequence of layers, connections, and global structure, allowing comprehensive exploration of full architectures but often at high computational cost. In contrast, micro search spaces focus on smaller, repeatable building blocks or modules, such as cells, which are then stacked to form the complete network, enabling more manageable optimization while promoting modularity and transferability across models.

Search spaces are parameterized either discretely or continuously to represent architectural choices. Discrete parameterization involves selecting from a predefined set of operations (e.g., convolutions, pooling, skip connections) for each position in the network, resulting in a combinatorial space where each choice is categorical. For instance, in chain-structured spaces, architectures are represented as sequential compositions of layers, where each layer's operation and hyperparameters (e.g., kernel size, number of filters) are chosen independently, leading to exponential growth in possibilities as the number of layers increases. Hierarchical search spaces extend this by organizing architectures across multiple levels, such as optimizing individual operations within cells and then the arrangement of those cells into blocks, as seen in methods like PNAS, which progressively refines structures from low to high levels. Continuous parameterization relaxes the discrete choices into a differentiable form, typically using softmax distributions over candidate operations to create a supernet where architectures are weighted mixtures, facilitating gradient-based optimization. A prominent example is DARTS, where each edge in a directed acyclic graph (DAG) representing a cell is parameterized by architecture variables that blend operations continuously.

Defining effective search spaces faces significant challenges, primarily the curse of dimensionality arising from their vast size. For example, the NASNet micro search space for a single cell with five blocks and five operation choices per block yields approximately 10^{18} possible topologies, rendering exhaustive enumeration impractical and necessitating efficient sampling or approximation strategies, such as pruning irrelevant operations, to reduce complexity. These spaces often exceed 10^{10} configurations even in constrained settings, amplifying the computational burden of evaluating candidates. To mitigate this, techniques such as restricting operation sets or imposing structural priors (e.g., DAG constraints) are employed, though they risk biasing the search toward human-designed priors. Performance estimation on these spaces, which assesses architecture quality without full training, is crucial but deferred to separate methodologies.

Recent developments as of 2025 emphasize domain-specific search spaces tailored to emerging architectures, particularly vision transformers (ViTs) and multimodal models, to better capture task-specific priors and improve efficiency.
For ViTs, search spaces now incorporate transformer-specific primitives such as attention heads, positional encodings, and patch embeddings, as in AutoFormer, which optimizes embedding dimensions and layer configurations within a ViT block to enhance out-of-domain generalization. In multimodal contexts, spaces integrate cross-modal fusion operations (e.g., attention across text and image branches), enabling NAS to discover hybrid architectures for tasks like visual question answering, though challenges persist in balancing modality-specific constraints with overall scalability. These trends shift from generic convolutional spaces toward hybrid designs, prioritizing adaptability to large-scale pretraining paradigms.
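To make the discrete, cell-based view concrete, the sketch below encodes a cell as a small directed acyclic graph in which each block picks one operation and one earlier node as its input, then counts the resulting space. The operation set and block count are illustrative assumptions, not the NASNet or DARTS definitions.

```python
import itertools
import random

OPS = ["conv3x3", "conv5x5", "maxpool3x3", "identity"]  # candidate operations (illustrative)
NUM_BLOCKS = 3                                          # blocks per cell (illustrative)

def random_cell():
    """Sample one cell: block i picks an operation and any earlier node as input."""
    cell = []
    for i in range(NUM_BLOCKS):
        op = random.choice(OPS)
        predecessor = random.randrange(i + 1)  # index 0 stands for the cell input
        cell.append((op, predecessor))
    return cell

def space_size():
    """Count all distinct cells in this toy space by enumerating per-block choices."""
    per_block = [[(op, p) for op in OPS for p in range(i + 1)] for i in range(NUM_BLOCKS)]
    return len(list(itertools.product(*per_block)))

print(random_cell())   # e.g. [('conv3x3', 0), ('identity', 1), ('maxpool3x3', 0)]
print(space_size())    # 4 * 8 * 12 = 384 cells even in this tiny setting
```

Even this toy space grows multiplicatively with each added block, which is why realistic cell spaces reach the 10^18 scale noted above.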

Performance Estimation

Performance estimation in neural architecture search (NAS) involves evaluating the quality of candidate architectures to guide the search process toward high-performing models. The most accurate approach is full training, where each candidate architecture is trained from scratch to convergence on the target dataset and evaluated on a held-out validation set. This method provides precise performance metrics but is computationally prohibitive; for instance, the seminal NASNet search required approximately 2,000 GPU-days on CIFAR-10 using reinforcement learning to explore thousands of architectures. Such costs highlight the need for efficient alternatives, as full training scales poorly with the size of the search space.

To mitigate these expenses, proxy tasks employ low-fidelity approximations that trade some accuracy for speed. These include training candidates for fewer epochs, using smaller proxy datasets (e.g., subsets of CIFAR-10 or MNIST instead of ImageNet), or reducing model width and depth while assuming rank consistency with full training. Surveys indicate that early stopping after a fixed number of iterations often preserves performance rankings, with Spearman's rank correlation coefficients above 0.7 on benchmarks like NAS-Bench-201, enabling searches to complete in hours rather than days. However, the reliability of these proxies varies across search spaces, necessitating validation on specific tasks.

More recent advancements focus on zero-cost proxies, which estimate performance without any training by analyzing architecture properties in a single forward or backward pass on random data. Introduced in 2021, these proxies draw from network pruning techniques and include metrics like synaptic saliency (measuring parameter importance via gradients) and Jacobian covariance (assessing feature sensitivity). Variants in methods like ENAS extend this by incorporating shared computations, but pure zero-cost approaches achieve rank correlations of up to 0.8 with true accuracy on NAS-Bench-101 without any training. By 2025, innovations such as parametric zero-cost proxies (ParZC) enhance adaptability through learnable parameters, improving rank correlation on diverse benchmarks like NDS, while evolved composite proxies combine multiple metrics nonlinearly for better generalization across tasks.

Weight-sharing strategies further amortize costs by training a supernet that encompasses all candidate architectures, allowing subsets to inherit pre-trained weights for rapid evaluation. This forms the basis for one-shot NAS, where architectures are sampled from the supernet and assessed via validation metrics, reducing search times from GPU-days to hours as demonstrated by ENAS on CIFAR-10. Performance is typically measured by primary objectives like top-1 accuracy and latency, alongside secondary factors such as robustness to adversarial attacks, often quantified via expected calibration error or defense success rates. To validate a proxy's effectiveness, Spearman's rank correlation \rho between proxy scores and true performance is commonly used:

\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}

where d_i are rank differences and n is the number of architectures; values exceeding 0.6 indicate reliable proxies for guiding NAS.
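The rank-correlation check described above can be computed directly with SciPy; the proxy scores and true accuracies below are made-up numbers purely to show the calculation.

```python
from scipy.stats import spearmanr

# Hypothetical data: zero-cost proxy scores and corresponding true test accuracies
proxy_scores  = [12.4, 8.1, 15.0, 9.7, 11.2, 14.3]
true_accuracy = [93.1, 90.2, 94.0, 91.5, 92.0, 93.8]

rho, p_value = spearmanr(proxy_scores, true_accuracy)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")

# A rho above roughly 0.6 would suggest the proxy ranks architectures
# reliably enough to guide the search without full training.
```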

Search Strategies

Reinforcement Learning Approaches

Reinforcement learning approaches to neural architecture search frame the problem as a sequential decision-making process, where a controller—typically a recurrent neural network (RNN)—samples architectures from a defined search space by generating sequences of architectural choices, such as layer types, filter sizes, and connections. The controller is trained as a policy in an RL setting, with the reward signal derived from the validation accuracy of the sampled architectures after training them on a dataset like CIFAR-10. To optimize the policy, the REINFORCE algorithm is employed, which updates the controller's parameters \theta using the policy gradient:

\theta \leftarrow \theta + \alpha \nabla_\theta \log \pi(a|s; \theta) (R - b),

where \pi(a|s; \theta) is the policy probability of action a given state s, R is the reward, b is a baseline (e.g., an exponential moving average of past rewards) to reduce variance, and \alpha is the learning rate. This baseline helps stabilize training by subtracting the expected reward from the actual reward, mitigating high variance in gradient estimates.

The seminal work by Zoph and Le introduced this RL-based NAS framework, demonstrating its efficacy by searching for convolutional architectures on CIFAR-10 and achieving a test error rate of 3.65%, which outperformed prior hand-crafted models like DenseNet at the time. Building on this, Zoph et al. developed NASNet, which uses an RNN controller to search for reusable "cells"—motifs of operations that can be stacked to form full networks—optimized initially on CIFAR-10 and transferred to ImageNet, yielding 82.7% top-1 accuracy while requiring fewer floating-point operations than human-designed alternatives. In NASNet, the controller employs Proximal Policy Optimization for more stable updates compared to vanilla REINFORCE, focusing the search on normal and reduction cells with operations like convolutions, pooling, and skip connections. Variants of this approach, such as recurrent NAS methods, emphasize sequential decision-making for generating architectures, often incorporating bidirectional LSTMs or network morphisms to preserve performance during exploration. For instance, Cai et al. integrated network transformations with REINFORCE to efficiently evolve architectures without full retraining from scratch. These methods maintain the core RNN controller for sampling but enhance efficiency through techniques like weight caching, where intermediate model weights are reused across similar architectures to reduce redundant computation.

Despite their pioneering role as the first fully automated NAS methods, RL approaches suffer from significant sample inefficiency, often requiring the training of thousands of child models—e.g., 12,800 for the original CIFAR-10 search—demanding substantial resources such as 800 GPUs over several weeks. This high computational cost, coupled with risks of premature convergence to suboptimal architectures due to exploration biases, has led to their decline in favor of more efficient strategies in subsequent research. Early RL-based NAS established the viability of automated architecture design but highlighted the need for parameter sharing and faster evaluation to scale effectively.
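The REINFORCE update above can be sketched with a toy controller. Here the controller is a simple learnable logit table rather than an RNN, and the reward function is a stub; both are assumptions made to keep the example self-contained and fast to run.

```python
import torch

NUM_POSITIONS, NUM_OPS = 5, 4          # toy search space: 5 decisions, 4 operations each
logits = torch.zeros(NUM_POSITIONS, NUM_OPS, requires_grad=True)  # controller parameters
optimizer = torch.optim.Adam([logits], lr=0.05)
baseline = 0.0                          # exponential moving average of rewards (variance reduction)

def reward_of(arch):
    """Stub for 'train the child model and return validation accuracy'."""
    return float(sum(arch)) / (NUM_POSITIONS * (NUM_OPS - 1))  # placeholder reward in [0, 1]

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    arch = dist.sample()                          # one operation index per position
    reward = reward_of(arch.tolist())
    baseline = 0.9 * baseline + 0.1 * reward      # update the moving-average baseline
    log_prob = dist.log_prob(arch).sum()          # log pi(architecture)
    loss = -(reward - baseline) * log_prob        # REINFORCE objective (maximize reward)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(logits.argmax(dim=1))  # most likely operation at each position after training
```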

Evolutionary Algorithms

Evolutionary algorithms for neural architecture search (NAS) maintain a population of candidate architectures, each represented as a genotype encoding the network's topology and operations. These architectures are evaluated for fitness based on their performance, typically measured by validation accuracy after training on a proxy task. The process iterates over generations, applying genetic operators to evolve the population toward higher-performing designs. Key operations include crossover, which combines topologies from two parent architectures by exchanging substructures such as cells or layers; mutation, which alters operations or hyperparameters, such as replacing a convolution with one of a different kernel size; and selection, often using tournament selection where architectures compete in pairs or groups, or rank-based selection prioritizing top performers. These derivative-free methods explore discrete search spaces effectively by mimicking natural selection, balancing exploration through population diversity and exploitation via fitness-driven choices.

Seminal work includes large-scale evolution (LargeEvo), which applies tournament-based evolution to search for image classifiers, achieving 94.6% accuracy on CIFAR-10, comparable to RL-based methods but with reduced hyperparameter tuning. Building on this, AmoebaNet employs aging (regularized) evolution, introducing an age attribute to genotypes that biases selection toward recently generated architectures and prevents premature convergence, yielding hierarchical cell-based architectures with 83.9% top-1 accuracy on ImageNet while surpassing hand-designed models. Recent advances incorporate population-based training (PBT) principles, such as population-guided mutation, to dynamically steer mutations using distribution statistics from the current population, enabling rediscovery of expert designs like ResNet variants with minimal human bias. This 2025 update enhances efficiency by adapting exploration without fixed hyperparameters, reducing search time by up to 66% on benchmarks like NAS-Bench-101 compared to standard regularized evolution.

Evolutionary NAS offers advantages in parallelizability, as fitness evaluations across the population can run concurrently on distributed systems, and it excels in discrete spaces where gradient-based methods falter. The mutation probability is commonly set as p_m = \frac{1}{L}, where L is the architecture's length in operations or nodes, ensuring on average one change per individual to maintain diversity. For scalability, evolutionary methods have produced hierarchical architectures transferable to large-scale tasks like ImageNet classification, often matching the performance of hand-designed models with greater architectural diversity that supports robustness across datasets.
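The tournament-plus-aging scheme described above can be sketched as a short loop; the fitness function is a stub and the genotype is a flat list of operation choices, both simplifying assumptions rather than the encoding used in AmoebaNet.

```python
import collections
import random

OPS = ["conv3x3", "conv5x5", "maxpool", "skip"]
ARCH_LEN = 8                      # L: number of operation slots in the genotype
P_MUTATE = 1.0 / ARCH_LEN         # p_m = 1/L: roughly one change per child on average

def fitness(arch):
    """Stub for training the architecture and measuring validation accuracy."""
    return sum(op == "conv3x3" for op in arch) + random.random()  # placeholder score

def mutate(parent):
    child = list(parent)
    for i in range(ARCH_LEN):
        if random.random() < P_MUTATE:
            child[i] = random.choice(OPS)
    return child

population = collections.deque(maxlen=20)   # aging: the oldest individual is evicted
for _ in range(20):                          # random initial population
    arch = [random.choice(OPS) for _ in range(ARCH_LEN)]
    population.append((arch, fitness(arch)))

history = []
for _ in range(200):                         # evolution cycles
    sample = random.sample(list(population), k=5)      # tournament of 5
    parent = max(sample, key=lambda x: x[1])[0]        # fittest competitor becomes parent
    child = mutate(parent)
    population.append((child, fitness(child)))         # deque drops the oldest (aging)
    history.append(max(population, key=lambda x: x[1]))

print(max(history, key=lambda x: x[1]))      # best (architecture, fitness) seen
```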

Bayesian Optimization Methods

Bayesian optimization (BO) methods in neural architecture search (NAS) model the performance landscape of architectures as a black-box function, using a probabilistic surrogate to guide efficient exploration of the search space. Typically, a Gaussian process (GP) serves as the surrogate, providing a posterior distribution over the objective based on observed evaluations, which captures both mean predictions and uncertainty. This allows BO to balance exploration and exploitation through acquisition functions, such as expected improvement (EI), defined as
EI(x) = \mathbb{E}[\max(f(x) - f(x^+), 0)]
where f(x) is the predicted performance at architecture x, and f(x^+) is the current best observed value. In NAS, this framework minimizes the number of costly full trainings by prioritizing promising architectures for evaluation.
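A small numerical sketch of this acquisition step follows, assuming the surrogate returns a Gaussian posterior mean and standard deviation for each candidate; the candidate names, predictions, and incumbent value are invented for illustration.

```python
import math

def expected_improvement(mu, sigma, best_so_far):
    """Closed-form EI for maximization under a Gaussian posterior N(mu, sigma^2)."""
    if sigma == 0.0:
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))
    return (mu - best_so_far) * cdf + sigma * pdf

# Surrogate predictions (mean accuracy, std) for three hypothetical candidate architectures
candidates = {"arch_a": (0.92, 0.01), "arch_b": (0.90, 0.05), "arch_c": (0.88, 0.08)}
best = 0.91  # best validation accuracy observed so far

scores = {name: expected_improvement(mu, s, best) for name, (mu, s) in candidates.items()}
print(max(scores, key=scores.get), scores)  # candidate chosen for the next full evaluation
```

Note how a candidate with a lower predicted mean but higher uncertainty can still win the acquisition step, which is exactly the exploration-exploitation trade-off BO exploits.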
Early applications of BO in NAS extended hyperparameter optimization techniques to architecture search. SMAC, a sequential model-based algorithm using random forest surrogates, and BOHB, which combines Bayesian modeling with Hyperband for multi-fidelity optimization, were adapted from hyperparameter tuning to NAS benchmarks like NAS-Bench-101, where they robustly handle invalid architectures and outperform random search, achieving equivalent performance approximately five times faster after around 50 evaluations. A notable advancement is BANANAS (2019), which replaces traditional GP surrogates with neural network predictors—such as ensembles of feedforward networks or graph convolutional networks—paired with a path-based encoding of architectures to improve scalability and accuracy in high-dimensional spaces. BANANAS demonstrates state-of-the-art results on NAS-Bench-101 (test error of 5.923% after 150 evaluations) and NAS-Bench-201, showing high correlation (e.g., Spearman rank correlations exceeding 0.8) between surrogate predictions and true performance. For handling the mixed discrete-continuous nature of NAS spaces, tree-structured Parzen estimators (TPE) model the distribution of top-performing architectures versus the rest using density estimates over tree-structured conditionals, enabling effective sampling in frameworks like NNI for tasks such as chain-structured NAS. The efficiency of BO methods stems from their ability to reduce the number of architecture evaluations by 10-100 times compared to random search, particularly in benchmark spaces where surrogates achieve strong predictive correlations. For instance, on NAS-Bench-101, BO variants like SMAC require fewer than 25 full evaluations to match random search's median performance, leveraging uncertainty estimates to avoid redundant sampling.

Local Search Techniques

Local search techniques in neural architecture search (NAS) rely on iterative optimization: starting from an initial architecture, they explore nearby candidates in the search space and move to a better-performing neighbor if one is found. This approach, often implemented as hill-climbing, begins with a randomly initialized or hand-crafted architecture and repeatedly evaluates modifications to its components, such as convolutional layers or connections, until no further improvement is possible within the defined neighborhood. The simplicity of this method makes it particularly suitable for discrete search spaces where full evaluation of each candidate is computationally feasible, contrasting with more global strategies by focusing on local improvements without requiring probabilistic models or population maintenance.

A seminal example of hill-climbing in NAS is the Neural Architecture Search by Hill-climbing (NASH) method, which applies network morphisms—transformations that preserve the functionality of the parent network while expanding its capacity—to generate child architectures. Starting from a small initial network, NASH iteratively selects the best child based on validation performance after brief training, achieving a test error below 6% on CIFAR-10 in approximately 12 hours on a single GPU. Neighborhoods in such methods are typically defined by small, targeted changes, including single operation swaps (e.g., replacing a convolution with one of a different kernel size), layer addition or removal, insertion of skip connections, or widening/deepening existing layers, with selection favoring the modification that maximizes the performance gain. Regularized evolution can be viewed as a related variant, incorporating age-based regularization into the selection process to encourage exploration beyond strict local optima while maintaining a focus on incremental improvements.

The core decision rule in hill-climbing evaluates neighbors by computing the performance difference: \Delta = \text{perf}(A') - \text{perf}(A), where A is the current architecture, A' is a neighboring candidate, and \text{perf}(\cdot) denotes the measured accuracy after partial training. A move is accepted if \Delta > 0, ensuring monotonic progress toward a local optimum; ties or small gains may incorporate epsilon-greedy perturbations for stability. This process incurs low overhead, as it can operate in a single-threaded manner without surrogate models, enabling rapid iteration on modest hardware.

Despite these advantages, hill-climbing is inherently prone to local optima, especially in rugged NAS search spaces where noise from finite training exacerbates trapping in suboptimal architectures. Recent analyses highlight that reducing evaluation noise through techniques like longer warm-up training enhances its effectiveness, making it competitive with more complex methods on benchmarks like NAS-Bench-101. To mitigate local optima, hybrids incorporating random restarts—reinitializing the search from new starting points upon convergence—have been explored, allowing multiple local optima to be sampled efficiently within a fixed computational budget.
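The accept-if-improved rule can be written as a short hill-climbing loop. The neighborhood operator (single operation swaps), the evaluation stub, and the iteration budget below are illustrative assumptions, not the NASH procedure itself.

```python
import random

OPS = ["conv3x3", "conv5x5", "maxpool", "skip"]

def perf(arch):
    """Stub for briefly training the architecture and returning validation accuracy."""
    return sum(op != "maxpool" for op in arch) + random.uniform(0, 0.1)  # placeholder

def neighbors(arch):
    """All architectures reachable by swapping a single operation."""
    for i in range(len(arch)):
        for op in OPS:
            if op != arch[i]:
                yield arch[:i] + [op] + arch[i + 1:]

arch = [random.choice(OPS) for _ in range(6)]   # random starting point
score = perf(arch)

for _ in range(100):                             # iteration budget
    best_delta, best_cand = 0.0, None
    for cand in neighbors(arch):
        delta = perf(cand) - score               # Δ = perf(A') - perf(A)
        if delta > best_delta:
            best_delta, best_cand = delta, cand
    if best_cand is None:                        # no improving neighbor: local optimum
        break
    arch, score = best_cand, score + best_delta  # accept the best improving move

print(arch, score)
```

Wrapping this loop in a few random restarts from fresh starting architectures is the usual way to sample several local optima within a fixed budget.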

Efficient and Advanced Methods

One-Shot NAS

One-shot neural architecture search (NAS) is a paradigm that trains a single supernet—a large network that encompasses all possible sub-architectures within a defined search space—to enable efficient sampling and evaluation of architectures. By employing weight sharing, the supernet allows multiple architectures to use the same parameters during training, thereby avoiding the redundant training required for separate evaluation of each candidate, which can otherwise demand thousands of GPU hours. This approach builds on performance estimation strategies that leverage shared weights to approximate the quality of sub-architectures quickly.

Pioneering methods in one-shot NAS include the Efficient Neural Architecture Search (ENAS) framework, introduced in 2018, which uses a reinforcement learning-based controller to sample subgraphs from the supernet while sharing weights across operations to guide the search toward high-performing architectures. Another key method is Single Path One-Shot (SPOS) from 2019, which employs uniform sampling from the supernet during training to derive diverse architectures, emphasizing simplicity and broad exploration of the search space. ProxylessNAS, also from 2018, extends this paradigm to target specific hardware constraints, such as mobile devices, by directly optimizing architectures on the deployment platform using shared weights and latency-aware sampling, achieving competitive accuracy with reduced inference latency.

One-shot NAS offers substantial advantages, including up to a 1000-fold reduction in search time compared to training architectures independently, as demonstrated on benchmarks like CIFAR-10, where ENAS completed searches in under one GPU-day versus thousands for prior methods. However, challenges arise from correlations in shared weights, where frequent reuse can lead to biased performance estimates because sub-architectures do not train in isolation, potentially inflating rankings for over-represented paths. To mitigate this, techniques like path dropout have been proposed to randomly mask paths during supernet training, promoting diversity and more reliable evaluations. Methods like SPOS improve architecture ranking correlation with standalone training through uniform single-path sampling, achieving Kendall Tau correlations of 0.42–0.64 on benchmarks.

As of 2025, advancements in one-shot NAS have introduced training-free variants that integrate zero-cost proxies—metrics derived from the architecture's properties at initialization or from initial forward passes without any parameter updates—to enable instantaneous performance prediction and further accelerate searches. For instance, methods like TG-NAS use operator embeddings and graph learning to generalize these proxies across diverse search spaces, achieving strong correlations with final accuracies while requiring zero training epochs for evaluation. These developments extend the efficiency of one-shot paradigms to resource-constrained settings, such as edge devices, without compromising architectural quality.
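A minimal PyTorch-style sketch of uniform single-path supernet training, in the spirit of SPOS but heavily simplified (toy operations, random data, a handful of choice blocks), is shown below; all module choices and sizes are illustrative assumptions.

```python
import random
import torch
import torch.nn as nn

class ChoiceBlock(nn.Module):
    """One supernet layer holding several candidate operations with shared weights."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])

    def forward(self, x, choice):
        return self.ops[choice](x)          # only the sampled path is executed

blocks = nn.ModuleList([ChoiceBlock(8) for _ in range(4)])
head = nn.Linear(8, 10)
optimizer = torch.optim.SGD(list(blocks.parameters()) + list(head.parameters()), lr=0.01)

for step in range(100):                     # supernet training with uniform path sampling
    x = torch.randn(16, 8, 32, 32)          # toy batch (random data stands in for CIFAR-10)
    y = torch.randint(0, 10, (16,))
    choices = [random.randrange(len(b.ops)) for b in blocks]  # one single path per step
    h = x
    for block, c in zip(blocks, choices):
        h = block(h, c)
    logits = head(h.mean(dim=[2, 3]))       # global average pooling then classifier
    loss = nn.functional.cross_entropy(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After supernet training, candidate paths are ranked by validation loss using inherited weights.
```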

Gradient-Based NAS

Gradient-based neural architecture search (NAS) methods enable the optimization of discrete architectural choices through continuous relaxations, allowing the use of gradient descent for efficient end-to-end training. These approaches represent the search space as a continuous relaxation over possible architectures, typically by assigning learnable parameters to operations and relaxing the selection to differentiable forms. This formulation addresses the limitations of discrete search strategies by enabling backpropagation through the architecture parameters, significantly reducing computational costs compared to enumerative or sampling-based methods.

A core technique in gradient-based NAS is the relaxation of categorical operation choices using a softmax over architecture parameters \alpha_i, weighted by a temperature \tau:

\bar{\alpha}_i = \frac{\exp(\alpha_i / \tau)}{\sum_j \exp(\alpha_j / \tau)}

This produces a weighted combination of operations during search, with the final architecture derived by selecting the operation with the highest \alpha_i via argmax after optimization. The search process is often framed as a bilevel optimization problem, where architecture parameters \alpha are optimized in the outer loop using the validation loss, while network weights w are optimized in the inner loop using the training loss:

\min_{\alpha} L_{\text{val}}(w^*(\alpha), \alpha), \quad \text{where} \quad w^*(\alpha) = \arg\min_w L_{\text{train}}(w, \alpha).

Approximations, such as alternating single-step updates, make this tractable without full inner-loop convergence.

The landmark method, Differentiable Architecture Search (DARTS), introduced this bilevel framework for searching repeatable cell structures in convolutional networks. DARTS searches on CIFAR-10 in four GPU-days, yielding a normal cell and a reduction cell that, when stacked into a macro architecture, achieve a 2.76% test error with 3.3 million parameters—outperforming prior manual designs while using fewer resources. Subsequent variants addressed limitations of DARTS, such as sensitivity to the softmax relaxation and high memory demands. GDAS replaces the softmax with Gumbel-softmax sampling to better approximate discrete choices during search, enabling robust architecture discovery in four GPU-hours with a 2.82% CIFAR-10 test error and 2.5 million parameters. Partially Connected DARTS (PC-DARTS) mitigates memory overhead by partially connecting channels in the supernet—sampling channel subsets for computation while normalizing edges—allowing searches with larger batch sizes and achieving a 2.57% CIFAR-10 test error in 0.1 GPU-days.

By 2025, advances have scaled gradient-based NAS to larger models, with methods like Smooth Activation DARTS (SA-DARTS) introducing regularization to counter skip-connection dominance and the discretization gap between the search and evaluation phases, improving stability and accuracy on complex spaces. Similarly, DASViT extends differentiable search to vision transformers, optimizing mixing and projection operations in a continuous space to yield architectures that outperform ViT-B/16 baselines on multiple datasets while addressing scalability challenges in high-dimensional designs.
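The softmax relaxation over candidate operations can be sketched as a "mixed operation" module. This is a simplified PyTorch illustration of the idea, not the full DARTS cell or its bilevel training loop; the operation set, channel count, and temperature are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Continuous relaxation: output is a softmax-weighted sum of candidate operations."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Identity(),
        ])
        # Architecture parameters alpha: one learnable logit per candidate operation
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x, tau=1.0):
        weights = F.softmax(self.alpha / tau, dim=0)       # \bar{alpha}_i
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def discretize(self):
        """After the search, keep only the highest-weighted operation (argmax over alpha)."""
        return self.ops[int(self.alpha.argmax())]

op = MixedOp(channels=8)
x = torch.randn(2, 8, 16, 16)
print(op(x).shape)      # torch.Size([2, 8, 16, 16])
print(op.discretize())  # the single operation selected after optimizing alpha
```

In a full search, the network weights inside `self.ops` would be updated on the training split and `self.alpha` on the validation split, alternating single steps as in the approximate bilevel scheme above.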

Multi-Objective NAS

Multi-objective neural architecture search (NAS) extends traditional NAS by optimizing architectures with respect to multiple conflicting criteria simultaneously, such as prediction accuracy, model size measured in parameters or FLOPs, inference latency, and resource constraints like power consumption. This approach is formalized as a multi-objective optimization problem:

\min_{A} (f_1(A), f_2(A), \dots, f_k(A)),

where A represents a neural architecture and each f_i(A) denotes an objective, such as error rate or latency. Solutions are selected via non-dominated sorting, identifying Pareto-optimal architectures in which no objective can be improved without degrading another.

Pareto optimization in multi-objective NAS often adapts evolutionary algorithms like NSGA-II, which employs non-dominated sorting and crowding distance to approximate the Pareto front—a set of non-dominated architectures balancing trade-offs. In NSGA-Net, for instance, NSGA-II explores a cell-based search space to minimize classification error and computational complexity, using crowding distance to promote diversity and prevent premature convergence to suboptimal solutions. This evolutionary framework maintains a population of architectures, applying crossover and mutation operators informed by the accumulated search history, and yields a diverse Pareto front approximation in a single search run.

Early methods like MONAS (2018) apply reinforcement learning to multi-objective NAS for resource-constrained devices, defining a composite reward function that balances accuracy and hardware constraints such as peak power consumption. MONAS searches for architectures on datasets like CIFAR-10, achieving accuracies comparable to single-objective baselines while satisfying power budgets under 1 watt. Similarly, hardware-aware approaches like FBNet (2018) use differentiable NAS to jointly optimize accuracy and device-specific latency, while quantization-aware search (Q-NAS) evaluates architectures under low-precision constraints like INT8, enabling deployment with latencies as low as 2.9 ms on a Samsung S8.

Recent advancements address scalability in multi-objective NAS, such as generative methods that integrate evolutionary algorithms with generative models to explore architecture distributions. For example, a 2024 framework combines multi-objective evolutionary search with generative architecture sampling to jointly optimize accuracy and efficiency. CE-NAS (2024) further exemplifies this direction by dynamically allocating GPU resources according to carbon intensity, achieving up to a 7.22× reduction in CO₂ emissions compared to standard NAS while maintaining high accuracy (e.g., 80.6% top-1 on ImageNet).
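Selecting the Pareto-optimal set from a batch of evaluated architectures reduces to a non-domination check; the candidate values below (error rate in percent, latency in milliseconds, both minimized) are invented for illustration.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in at least one
    (both objectives are minimized here: error rate and latency)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the names of all non-dominated architectures."""
    front = []
    for name, objs in candidates.items():
        if not any(dominates(other, objs) for other in candidates.values() if other != objs):
            front.append(name)
    return front

# (error %, latency ms) for hypothetical architectures
archs = {
    "arch_a": (5.2, 12.0),
    "arch_b": (4.8, 20.0),
    "arch_c": (6.0, 10.0),
    "arch_d": (5.5, 25.0),   # dominated by arch_a (lower error and lower latency)
}
print(pareto_front(archs))   # ['arch_a', 'arch_b', 'arch_c']
```

Methods such as NSGA-II combine this check with crowding-distance-based selection so that the surviving population stays spread out along the front rather than clustering in one region.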

Evaluation and Applications

Benchmarks and Datasets

Standardized benchmarks have become essential in neural architecture search (NAS) to enable fair, reproducible comparisons across methods by providing pre-evaluated architectures and performance metrics. These benchmarks typically consist of tabular datasets mapping architectures to their accuracies, training curves, and other properties, allowing rapid evaluation without full retraining. Early benchmarks focused on image classification tasks, but recent ones incorporate multi-domain and hardware-aware aspects to better reflect real-world deployment.

NAS-Bench-101, introduced in 2019, is a foundational tabular benchmark containing 423,624 unique architectures from a cell-based search space, each fully trained and evaluated on CIFAR-10 for 108 epochs. It provides precomputed metrics such as validation accuracy, test accuracy, and training time, facilitating fast prototyping of NAS algorithms by querying the table instead of training from scratch. The benchmark's top-performing architecture achieves a test accuracy of 94.07% on CIFAR-10, while methods like DARTS typically recover architectures around 91-92% accuracy when evaluated on this space.

Building on this, NAS-Bench-201, released in 2020, extends reproducibility to multiple datasets with 15,625 architectures in a different search space, evaluated on CIFAR-10, CIFAR-100, and a downsampled ImageNet variant (ImageNet-16-120). This allows assessment of architecture transferability across datasets, revealing that strong performers on CIFAR-10 often generalize well but degrade on more complex tasks like ImageNet-16-120. For instance, the best architecture yields 91.38% accuracy on CIFAR-10, 73.85% on CIFAR-100, and 51.03% on ImageNet-16-120.

For larger search spaces, NAS-Bench-301 (2020) introduces a surrogate modeling approach to handle the DARTS search space of approximately 10^18 architectures, using learned surrogate models to estimate performance without exhaustive evaluation. This supports rapid predictions, with surrogate models achieving correlations up to 0.85 with true performance at a fraction of the computational cost. By 2025, it has become a standard for prototyping in expansive spaces, integrating with performance estimation strategies.

Hardware-aware benchmarks address deployment constraints beyond accuracy, such as latency and energy. HW-NAS-Bench (2021) augments NAS-Bench-201 by providing measured hardware metrics (e.g., latency, energy consumption) for all architectures across six devices, including CPU, GPU, and edge platforms, enabling multi-objective NAS that balances accuracy with efficiency. For example, it shows that top-accuracy architectures often incur 2-5x higher latency on mobile GPUs compared to optimized ones.

Common datasets in NAS benchmarks are CIFAR-10 for initial validation and ImageNet for large-scale testing, due to their established role in CNN evaluation. Emerging benchmarks incorporate VTAB (Visual Task Adaptation Benchmark, 2019), a suite of 19 diverse vision tasks (e.g., object classification, counting) for assessing transferability, with NAS methods evaluated on few-shot adaptation to measure generalization beyond single-dataset accuracy. Key metrics in these benchmarks include search cost (measured in GPU hours) and anytime accuracy (the best validation accuracy at any point during the search), emphasizing efficiency alongside performance. The following table summarizes representative results for top architectures or methods on select benchmarks:
Benchmark | Dataset | Top Accuracy (%) | Example Method (Accuracy %) | Search Cost (GPU Hours) | Source
NAS-Bench-101 | CIFAR-10 | 94.07 | DARTS (~91.8) | N/A (tabular) | Ying et al., 2019
NAS-Bench-201 | CIFAR-10 | 91.38 | Random Search (90.83) | ~0.5 | Dong & Yang, 2020
NAS-Bench-201 | ImageNet-16-120 | 51.03 | ENAS (50.59) | ~1.5 | Dong & Yang, 2020
NAS-Bench-301 | ImageNet | ~75.5 (surrogate est.) | Zero-shot proxy (corr. 0.85) | <0.1 | Zela et al., 2020
HW-NAS-Bench | CIFAR-10 (GPU) | 91.25 | Hardware-optimized (latency 1.2 ms) | N/A | Cai et al., 2021
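In practice, a tabular benchmark is used by looking up precomputed metrics instead of training anything. The sketch below uses a plain dictionary as a stand-in for a real benchmark API such as NAS-Bench-101's, so the architecture keys, metric fields, and numbers are illustrative only.

```python
import random

# Stand-in for a tabular NAS benchmark: architecture id -> precomputed metrics
TABLE = {
    "arch_0f3a": {"val_acc": 0.9312, "test_acc": 0.9287, "train_time_s": 1260},
    "arch_77bc": {"val_acc": 0.9401, "test_acc": 0.9388, "train_time_s": 1543},
    "arch_c921": {"val_acc": 0.9154, "test_acc": 0.9120, "train_time_s": 1108},
}

def query(arch_id):
    """Return precomputed metrics, mimicking a benchmark lookup (no training needed)."""
    return TABLE[arch_id]

# A NAS method under study only pays *simulated* training time, not real GPU time.
budget_s, spent, best = 3000, 0, None
for arch_id in random.sample(list(TABLE), k=len(TABLE)):   # random search over the table
    result = query(arch_id)
    spent += result["train_time_s"]
    if best is None or result["val_acc"] > best[1]:
        best = (arch_id, result["val_acc"])
    if spent > budget_s:
        break

print(best, "simulated cost:", spent, "s")
```

This simulated-cost accounting is what makes tabular benchmarks suitable for comparing search strategies under identical budgets.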

Challenges and Recent Advances

One of the primary challenges in neural architecture search (NAS) is its exorbitant computational cost, often demanding thousands of GPU-days for exhaustive evaluation of candidate architectures during the search process. This expense arises from the need to train and assess numerous models iteratively, limiting accessibility for researchers without substantial resources. Additionally, poor generalization across datasets remains a persistent issue, as architectures optimized for specific benchmarks frequently underperform on unseen data distributions due to overfitting to the search task's peculiarities. Sensitivity to the search space design exacerbates this, with biases in space construction leading to skewed rankings and missed opportunities for discovering robust models. Limited reproducibility further complicates progress, as results are highly sensitive to random seeds in initialization and sampling, compounded by the absence of standardized benchmarks and consistent evaluation protocols.

Recent advances have addressed these hurdles through innovative paradigms like generative NAS, which reformulates the search as approximating the underlying distribution of high-performing architectures, thereby enabling sampling of promising candidates without full enumeration. This approach, introduced in 2025, leverages probabilistic modeling to prioritize likely optimal structures, significantly reducing computational overhead while maintaining competitive performance. Efficient global search strategies incorporating zero-cost evaluators have also gained traction; for instance, 2025 variants of efficient global NAS use lightweight proxies to estimate architecture quality instantaneously, facilitating broader exploration of large spaces without resource-intensive training. These evaluators, such as synaptic saliency or Jacobian traces, provide hardware-aware predictions that scale to complex scenarios.

Emerging applications extend NAS beyond traditional convolutional neural networks (CNNs) to architectures like transformers and graph neural networks (GNNs), where tailored search spaces account for attention mechanisms and graph topologies to yield specialized models for sequence and relational data tasks. In parallel, sustainable NAS initiatives, exemplified by GreenNAS, incorporate energy minimization as an objective, using performance predictors to curb carbon emissions during hyperparameter and architecture tuning, aligning development with environmental goals. Looking ahead, integrating NAS with federated learning promises privacy-preserving searches across decentralized devices, as demonstrated by frameworks like DPFNAS that adapt architectures while preserving data privacy. Hybrid methods blending evolutionary or sampling-based exploration for global coverage with gradient-based refinement for local efficiency are poised to enhance search robustness, particularly in multi-objective settings that balance accuracy and resource constraints.

References

  1. Neural Architecture Search: A Survey (PDF).
  2. Neural Architecture Search — AutoML.org.
  3. Neural Architecture Search with Reinforcement Learning — arXiv, Nov 5, 2016.
  4. Learning Transferable Architectures for Scalable Image Recognition — arXiv, Jul 21, 2017.
  5. Efficient Neural Architecture Search via Parameter Sharing (PDF).
  6. Regularized Evolution for Image Classifier Architecture Search — arXiv, Feb 5, 2018.
  7. DARTS: Differentiable Architecture Search — arXiv:1806.09055, Jun 24, 2018.
  8. Neural Architecture Search (PDF) — AutoML.org.
  9. A Survey on Computationally Efficient Neural Architecture Search (PDF).
  10. FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search — arXiv, Dec 9, 2018.
  11. MARCO: Hardware-Aware Neural Architecture Search — arXiv:2506.13755, Jun 16, 2025.
  12.
  13. Systematic Review on Neural Architecture Search — Jan 6, 2025.
  14. NAS-Bench-101: Towards Reproducible Neural Architecture Search — arXiv, Feb 25, 2019.
  15. Advances in Neural Architecture Search — National Science Review.
  16. Transformer-Based Neural Architecture Search for Effective Visible–Infrared Person Re-Identification — Mar 1, 2025.
  17. Efficient Evaluation Methods for Neural Architecture Search: A Survey (PDF) — Jan 14, 2023.
  18. Zero-Cost Proxies for Lightweight NAS — arXiv:2101.08134, Jan 20, 2021.
  19. ParZC: Parametric Zero-Cost Proxies for Efficient NAS — Apr 11, 2025.
  20. Evolving Comprehensive Proxies for Zero-Shot Neural Architecture Search — Jul 13, 2025.
  21. Neural Architecture Search: A Survey — arXiv:1808.05377, Aug 16, 2018.
  22. Aging Evolution for Image Classifier Architecture Search (PDF) — Oct 4, 2018.
  23. Population-Based Guiding for Evolutionary Neural Architecture Search — Scientific Reports.
  24. Neural Architecture Search with Bayesian Optimisation and Optimal Transport — arXiv, Feb 11, 2018.
  25. Efficient Deep Neural Architecture Search via Bayesian Optimization (PDF) — Mar 31, 2025.
  26. NAS-Bench-101: Towards Reproducible Neural Architecture Search (PDF) — May 14, 2019.
  27. BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search — arXiv, Nov 2, 2020.
  28. TPE, Random Search, Anneal Tuners — NNI documentation.
  29. Simple and Efficient Architecture Search for Convolutional Neural Networks — arXiv, Nov 13, 2017.
  30. Exploring the Loss Landscape in Neural Architecture Search (PDF).
  31. Efficient Neural Architecture Search via Parameter Sharing — arXiv, Feb 9, 2018.
  32. Single Path One-Shot Neural Architecture Search with Uniform Sampling — arXiv:1904.00420, Mar 31, 2019.
  33. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware — arXiv, Dec 2, 2018.
  34. Searching for a Robust Neural Architecture in Four GPU Hours — arXiv, Oct 10, 2019.
  35. PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search — arXiv:1907.05737, Jul 12, 2019.
  36. Regularizing Differentiable Architecture Search with Smooth Activation — arXiv, Apr 22, 2025.
  37. DASViT: Differentiable Architecture Search for Vision Transformer — Jul 17, 2025.
  38. A Survey on Multi-Objective Neural Architecture Search — arXiv:2307.09099, Jul 18, 2023.
  39. NSGA-Net: Neural Architecture Search Using Multi-Objective Genetic Algorithm.
  40. NSGA-Net: Neural Architecture Search Using Multi-Objective Genetic Algorithm (PDF) — IJCAI.
  41. NSGA-Net: Neural Architecture Search Using Multi-Objective Genetic Algorithm (PDF).
  42. MONAS: Multi-Objective Neural Architecture Search Using Reinforcement Learning — arXiv, Jun 27, 2018.
  43. Architecture Generation for Multi-Objective Neural Architecture Search — Jan 1, 2024.
  44. CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework.
  45. NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search — arXiv, Jan 2, 2020.
  46. Surrogate NAS Benchmarks: Going Beyond the Limited Search Spaces of Tabular NAS Benchmarks — arXiv:2008.09777, Aug 22, 2020.
  47. HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark — arXiv:2103.10584, Mar 19, 2021.
  48. A Large-Scale Study of Representation Learning with the Visual Task Adaptation Benchmark — Oct 1, 2019.
  49. A Survey on Computationally Efficient Neural Architecture Search.
  50. Systematic Review on Neural Architecture Search (PDF).
  51. Neural Architecture Search Using Attention Enhanced Precise Path … — Mar 20, 2025.
  52. Zero-Shot Neural Architecture Search: Challenges, Solutions, and … — May 1, 2025.
  53. Random Search and Reproducibility for Neural Architecture Search (PDF).
  54. Best Practices for Scientific Research on Neural Architecture Search.
  55. Generative Neural Architecture Search — ScienceDirect.
  56. Efficient Global Neural Architecture Search — arXiv:2502.03553, Feb 5, 2025.
  57. RZ-NAS: Enhancing LLM-Guided Neural Architecture Search via … (PDF).
  58. automl/awesome-transformer-search — GitHub (curated list of Transformer NAS papers).
  59. arXiv:2503.02448v2 [cs.LG] — Mar 6, 2025.
  60. GreenNAS: A Green Approach to the Hyperparameters Tuning in Deep Learning — Mar 14, 2024.
  61. From Federated Learning to Federated Neural Architecture Search — Jan 4, 2021.