
Cross-entropy method

The cross-entropy (CE) method is a versatile Monte Carlo technique for solving complex optimization problems and estimating rare-event probabilities in stochastic simulation. It operates by iteratively refining a parametric sampling distribution to concentrate sampling effort on high-performing or rare outcomes through minimization of the Kullback-Leibler divergence between the current distribution and an empirical distribution derived from elite samples. Developed by Reuven Y. Rubinstein in the late 1990s, the method originated as an adaptive importance sampling technique for rare-event estimation in discrete-event simulations, such as reliability analysis in stochastic networks, where standard Monte Carlo methods are inefficient due to the low probability of target events. It was subsequently extended to optimization tasks by reformulating them as rare-event problems, enabling efficient solutions to both combinatorial (e.g., traveling salesman problem, knapsack) and continuous (e.g., multi-extremal functions) challenges that are often NP-hard. At its core, the CE algorithm begins with an initial reference distribution, generates a set of random samples, evaluates their performance using an objective function, selects the "elite" subset (typically the top ρ fraction, where 0.01 ≤ ρ ≤ 0.1), and updates the distribution parameters to maximize the likelihood of generating similar elite samples in the next iteration, effectively minimizing the Kullback-Leibler distance. This process repeats until convergence, often requiring smoothing (e.g., via convex combinations of old and new parameters) to avoid premature trapping in local optima, and can handle noisy environments by incorporating averaging or smoothing techniques. For rare-event simulation, the method shifts the sampling distribution toward the rare region, dramatically reducing the number of simulations needed; for instance, it can achieve up to 2000-fold efficiency gains in probability estimates. Key applications span engineering, operations research, and machine learning, including network reliability assessment, portfolio optimization, and training neural networks via policy search in reinforcement learning. The method's adaptability to various parametric families (e.g., Bernoulli for binary problems, Gaussian for continuous ones) and its theoretical grounding in information theory have made it a foundational tool, with extensions such as the minimum cross-entropy variant for multi-objective scenarios and integrations with other heuristics such as genetic algorithms. Despite its strengths in scalability and ease of implementation, performance can depend on parameter tuning, such as elite fraction selection and stopping criteria, to balance exploration and exploitation.

Fundamentals

Importance Sampling Basics

Rare event simulation refers to the task of estimating small probabilities, such as P(A) where A denotes a rare event with probability p \ll 1, using Monte Carlo methods. In direct or crude Monte Carlo estimation, one generates N independent samples under the reference probability measure P and approximates p by the empirical proportion of samples falling into A, yielding an estimator with variance p(1-p)/N \approx p/N. This results in a high relative error of approximately 1/\sqrt{Np}, which grows large as p becomes small, necessitating an impractically large N \gg 1/p to achieve reasonable precision and rendering the approach computationally inefficient for rare events. Importance sampling addresses these challenges by shifting the sampling distribution to a tilted measure Q that places higher probability on the rare event region, thereby increasing the likelihood of observing relevant samples while correcting for the bias through importance weights. Mathematically, to estimate the expectation \gamma = \mathbb{E}_P[f(X)] = \int f(x) \, dP(x) for a non-negative function f (often an indicator for rare events), one draws samples X_i \sim Q for i = 1, \dots, N and computes the unbiased estimator \hat{\gamma} = \frac{1}{N} \sum_{i=1}^N f(X_i) w(X_i), where the weight function is the Radon-Nikodym derivative w(x) = \frac{dP}{dQ}(x). This estimator has expectation \gamma under Q, but its variance depends on the choice of Q; a well-chosen Q reduces variance by concentrating samples where f(x) w(x) contributes most to the integral. The optimal importance sampling distribution Q^*, which for non-negative f drives the variance of \hat{\gamma} to zero, is given by Q^*(dx) = \frac{f(x) P(dx)}{\gamma}, which proportionally weights the reference measure P by the integrand f. However, computing Q^* requires knowledge of \gamma, the very quantity being estimated, so practical methods approximate it using parametric families or iterative procedures. The technique originated in the late 1940s and early 1950s within statistics and physics, particularly for estimating low-probability outcomes in particle-transport simulations, as pioneered in the work of Kahn and Harris (1951) on estimating particle transmission via random sampling. This foundational work emphasized biasing the sampling distribution to improve computational efficiency for low-probability outcomes.
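As a minimal illustration (a Python sketch constructed for this article, with the threshold 4 and the shifted proposal \mathcal{N}(4, 1) chosen arbitrarily), crude Monte Carlo rarely observes the event \{X > 4\} for X \sim \mathcal{N}(0,1), while importance sampling from the shifted normal recovers a stable estimate:

import math
import numpy as np

rng = np.random.default_rng(0)
N, t = 100_000, 4.0                     # sample size and rare-event threshold

# Crude Monte Carlo: the event {X > 4} is hit only ~3 times in 10^5 draws.
x = rng.standard_normal(N)
p_crude = np.mean(x > t)

# Importance sampling from Q = N(4, 1); weights are the density ratio dP/dQ.
y = rng.normal(t, 1.0, N)
w = np.exp(0.5 * (y - t) ** 2 - 0.5 * y ** 2)   # N(0,1) / N(4,1) density ratio
p_is = np.mean(w * (y > t))

print(p_crude, p_is, 0.5 * math.erfc(t / math.sqrt(2)))  # exact p ≈ 3.17e-5

Because both densities share the same variance, the normalizing constants cancel and the weight reduces to a simple exponential, which is why no density library is needed.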

Cross-Entropy in Optimization Context

In the context of the cross-entropy method (CEM), cross-entropy serves as a fundamental information-theoretic measure for aligning probability distributions to facilitate optimization, particularly in rare event simulation and stochastic approximation tasks. The cross-entropy between two probability density functions P and Q, denoted H(P, Q), is defined as H(P, Q) = -\int P(x) \log Q(x) \, dx, which quantifies the expected number of extra bits required to code samples from P using a code optimized for Q. This relates directly to the Kullback-Leibler (KL) divergence D(P \| Q), an asymmetric measure of distributional discrepancy, via the identity D(P \| Q) = \int P(x) \log \frac{P(x)}{Q(x)} \, dx = H(P, Q) - H(P), where H(P) = -\int P(x) \log P(x) \, dx is the entropy of P, a constant with respect to Q. Minimizing D(P \| Q) over Q thus equates to minimizing the cross-entropy H(P, Q), as the entropy term does not affect the optimization. Within CEM, the method leverages this framework to approximate the optimal importance sampling distribution Q^* for estimating rare events or optimizing objective functions under a reference measure P. Specifically, for rare event probabilities defined by a performance function f(X) (e.g., \gamma^* = P(f(X) \geq \eta) for some high threshold \eta), the goal is to minimize the KL divergence D(\tilde{P}_{\gamma^*} \| Q_\theta) over a parameterized family of distributions Q_\theta, where \tilde{P}_{\gamma^*} is the tilted distribution concentrating mass on the rare event set \{x : f(x) \geq \eta\}, given by \tilde{P}_{\gamma^*}(dx) = \frac{I\{f(x) \geq \eta\} P(dx)}{\gamma^*}. This leads to the optimization program \min_\theta D(\tilde{P}_{\gamma^*} \| Q_\theta), which seeks to tilt Q_\theta toward regions of high performance under P, thereby reducing variance in Monte Carlo estimates. The ideal Q^* would place all of its mass on the rare event, but practical parameterization (e.g., via exponential families) approximates this by solving the cross-entropy minimization. The derivation of the CEM objective from this minimization proceeds via sampling approximations. Since \tilde{P}_{\gamma^*} is unknown, samples are drawn from an initial Q_{\theta_0} \approx P, and the elite set S_\epsilon = \{X_i : f(X_i) \geq \hat{\gamma}\} is selected, where \hat{\gamma} estimates the elite threshold (a high sample quantile of the performance values). The KL minimization then reduces to maximizing the empirical cross-entropy term, equivalent to \hat{\theta} = \arg\max_\theta \frac{1}{|S_\epsilon|} \sum_{X_i \in S_\epsilon} \log Q_\theta(X_i), which fits Q_\theta to the elite samples by maximizing their average log-likelihood. This step transforms the rare event estimation or optimization problem into a sequence of distribution-fitting tasks akin to maximum likelihood estimation, iteratively refining Q_\theta toward the optimal tilting.
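As a concrete instance of this reduction (a standard exponential-family calculation, written here for illustration), let Q_\theta be a product of independent Bernoulli(\theta_j) distributions over binary vectors x \in \{0,1\}^n. The empirical program then separates across components and has a closed-form solution:

\hat{\theta} = \arg\max_\theta \frac{1}{|S_\epsilon|} \sum_{X_i \in S_\epsilon} \sum_{j=1}^n \left[ X_{ij} \log \theta_j + (1 - X_{ij}) \log(1 - \theta_j) \right], \qquad \hat{\theta}_j = \frac{1}{|S_\epsilon|} \sum_{X_i \in S_\epsilon} X_{ij},

obtained by setting the derivative with respect to each \theta_j to zero; each parameter is simply the frequency of ones among the elite samples at that position.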

Core Algorithm

Generic CE Procedure

The cross-entropy method (CEM) is an iterative, adaptive algorithm designed for estimating rare event probabilities and solving optimization problems by progressively refining a sampling distribution to focus on regions of interest. As a model-based optimization technique, CEM evolves the distribution parameters over iterations to concentrate sampling efforts on high-reward or rare-event-prone areas, thereby reducing variance in estimates and improving efficiency in simulation-based computations. This approach is particularly effective for problems where direct sampling from the original distribution is inefficient due to the rarity of target events or the complexity of the objective landscape. The generic procedure begins with the initialization of reference parameters, denoted as \theta_0, which define an initial parametric family Q_{\theta} (e.g., Gaussian or Bernoulli distributions) that approximates the target behavior. In each iteration k, N samples are generated from the current distribution Q_{\theta_{k-1}}. These samples are then evaluated using a performance function S, such as an indicator for rare events (where S(X) = 1 if the event occurs and 0 otherwise) or a reward metric for optimization. The elite samples, typically the top-performing fraction, are selected based on a threshold determined by the elite fraction \alpha, which is commonly set between 0.01 and 0.1 to balance bias introduction against variance in the estimates. The parameters are then updated to \theta_k by minimizing the cross-entropy between the empirical distribution induced by the elite samples and Q_{\theta}, a process equivalent to minimizing the Kullback-Leibler divergence from the empirical measure to Q_{\theta} (or maximizing the likelihood of the elite samples under Q_{\theta}). This update shifts the reference distribution toward the elites, enhancing the likelihood of sampling desirable outcomes in subsequent iterations. The procedure continues, generating new samples, selecting elites, and updating parameters until a convergence criterion is met, such as stabilization of the elite threshold or a fixed number of iterations. For rare event estimation, the final probability is computed using the converged distribution Q_{\theta_T} and weights derived from the likelihood ratios with respect to the original distribution. Empirical guidelines recommend N in the range of 10^3 to 10^5 samples per iteration to ensure reliable elite selection and parameter estimates without excessive computational cost. Under mild regularity conditions, such as the existence of finite moments and identifiability of the optimal parameters, the CEM exhibits asymptotic consistency, meaning the estimated rare event probability converges to the true value as N and the number of iterations increase. The elite fraction \alpha plays a crucial role in this convergence by controlling the exploration-exploitation balance: smaller values increase adaptability but risk premature concentration on suboptimal regions, while larger values promote stability at the expense of slower progress toward rare regions.
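A minimal Python sketch of this generic loop for rare-event estimation follows (an illustrative construction, with the threshold and all parameters chosen arbitrarily): it estimates P(X \geq 5) for X \sim \mathcal{N}(0,1) by adaptively tilting the mean of a normal sampling distribution, then forms the final likelihood-ratio-weighted estimate.

import numpy as np

rng = np.random.default_rng(0)
target, N, rho = 5.0, 10_000, 0.01   # rare-event threshold, samples per iteration, elite fraction
mu = 0.0                             # mean of the tilted sampling distribution Q

# Adaptive stage: raise the elite threshold toward the target level.
for _ in range(50):
    X = rng.normal(mu, 1.0, N)
    level = min(np.quantile(X, 1 - rho), target)   # elite threshold for this iteration
    mu = X[X >= level].mean()                      # closed-form CE update for the normal mean
    if level >= target:
        break

# Estimation stage: importance sampling with weights w = dP/dQ.
X = rng.normal(mu, 1.0, N)
w = np.exp(0.5 * (X - mu) ** 2 - 0.5 * X ** 2)     # N(0,1) / N(mu,1) density ratio
print(np.mean(w * (X >= target)))                  # exact value ≈ 2.87e-7

The threshold is capped at the target level so that, once the tilted distribution reaches the rare region, the loop stops adapting and the final unbiased estimate is formed under the converged distribution.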

Parameter Update Mechanism

In the cross-entropy method (CEM), the parameter update mechanism involves adjusting the parameters \theta_k of the distribution Q_{\theta_k} at each iteration k to minimize the cross-entropy distance to an optimal reference distribution, which is approximated using elite samples S_\epsilon selected from the generated samples based on their performance. The update rule is formulated as \theta_k = \arg\max_{\theta} \frac{1}{|S_\epsilon|} \sum_{X_i \in S_\epsilon} \log Q_\theta(X_i), where |S_\epsilon| denotes the number of elite samples, and this optimization maximizes the average log-likelihood of the elite samples under the parameterized distribution Q_\theta. This approach was introduced in Rubinstein's formulation, which highlighted its equivalence to maximum likelihood estimation (MLE) on the elite samples, thereby enabling the method's applicability to a wide range of parametric families beyond simple cases. For common parametric families belonging to the natural exponential family, such as the multivariate normal distribution Q_\theta \sim \mathcal{N}(\mu, \Sigma), the update admits explicit closed-form solutions that leverage sample moments of the elite set. Specifically, the mean parameter is updated as the sample mean of the elites, \mu_k = \frac{1}{|S_\epsilon|} \sum_{X_i \in S_\epsilon} X_i, while the covariance matrix is updated as the biased (maximum likelihood) sample covariance, \Sigma_k = \frac{1}{|S_\epsilon|} \sum_{X_i \in S_\epsilon} (X_i - \mu_k)(X_i - \mu_k)^\top. These updates ensure that the sampling distribution progressively concentrates around high-performing regions of the search space. In cases where the parameterized family lacks closed-form MLE solutions, such as non-parametric densities or complex distributions, the optimization of the objective J(\theta) = \sum_{i=1}^M \log f_\theta(X_i), with M = |S_\epsilon| and f_\theta denoting the density, is performed numerically, for example by stochastic gradient ascent on the log-likelihood. This numerical approach maintains the method's adaptability while approximating the argmax through iterative refinements.
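In code, the closed-form Gaussian update is a direct transcription of these formulas; the helper below is a minimal NumPy sketch (the function name and the optional diagonal ridge are illustrative additions, not part of the method's definition):

import numpy as np

def gaussian_ce_update(elites, ridge=1e-10):
    # MLE fit of N(mu, Sigma) to the elite set (one elite sample per row).
    mu = elites.mean(axis=0)                 # sample mean of the elites
    dev = elites - mu
    sigma = dev.T @ dev / len(elites)        # biased sample covariance
    return mu, sigma + ridge * np.eye(elites.shape[1])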

Applications and Examples

Continuous Optimization Case

In the continuous optimization case, the cross-entropy method (CEM) is adapted by treating the objective function S(\mathbf{x}) directly as the performance measure, without the indicator function used in rare-event simulation. Instead of focusing on rare events, elite samples are selected as the top \alpha N samples ranked by their S(\mathbf{x}) values, where N is the sample size and \alpha is the elite fraction, typically 0.1. This shifts the sampling distribution toward regions of higher performance through iterative updates via maximum likelihood estimation on a parametric family, such as multivariate Gaussians. A representative example involves maximizing multimodal benchmark functions such as the negated Ackley or Rosenbrock functions in \mathbb{R}^d. The negated Ackley function, defined as S(\mathbf{x}) = 20 \exp\left(-0.2 \sqrt{\frac{1}{d} \sum_{i=1}^d x_i^2}\right) + \exp\left(\frac{1}{d} \sum_{i=1}^d \cos(2\pi x_i)\right) - 20 - e, features multiple local maxima surrounding a global maximum at the origin, while the negated Rosenbrock function, S(\mathbf{x}) = -\sum_{i=1}^{d-1} \left[100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \right], forms a narrow parabolic valley leading to the global maximum at (1, \dots, 1). Initialization uses samples from \mathcal{N}(\mathbf{0}, \mathbf{I}), with typical parameters N = 1000, \alpha = 0.1, and 50–100 iterations to concentrate the distribution near promising regions. The procedure follows these steps, sketched here as a minimal Python/NumPy implementation of the textbook loop:
import numpy as np

def cem_maximize(S, d, N=1000, alpha=0.1, max_iter=100, tol=1e-8):
    rng = np.random.default_rng()
    mu, Sigma = np.zeros(d), np.eye(d)               # initialize: mu = 0, Sigma = I (d x d identity)
    n_elite = max(1, int(np.ceil(alpha * N)))
    for _ in range(max_iter):
        X = rng.multivariate_normal(mu, Sigma, N)    # generate X_1, ..., X_N ~ N(mu, Sigma)
        s = np.array([S(x) for x in X])              # evaluate S(X_j) for j = 1..N
        elites = X[np.argsort(s)[::-1][:n_elite]]    # select the top ceil(alpha * N) samples
        mu_new = elites.mean(axis=0)                 # sample mean of the elites
        dev = elites - mu_new
        Sigma = dev.T @ dev / n_elite + 1e-12 * np.eye(d)  # biased sample covariance (+ tiny ridge)
        if np.linalg.norm(mu_new - mu) < tol:        # convergence criterion: small change in mu
            return mu_new
        mu = mu_new
    return mu                                        # approximate maximizer
This update uses the sample mean and biased sample covariance of the elites to fit the Gaussian parameters, tilting the sampling distribution toward high-performance areas. CEM converges to local maxima by progressively narrowing the sampling distribution around high-S(\mathbf{x}) regions, achieving empirical efficiency gains of 10–100 times relative to non-adaptive random sampling in multimodal settings. An early application in a 2006 paper by Kroese, Porotsky, and Rubinstein demonstrated CEM's effectiveness for continuous multi-extremal optimization on functions like Ackley and Rosenbrock, outperforming baselines in certain cases through its adaptive updating of the sampling distribution.
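For example, under the assumptions of the sketch above, running it on the negated Rosenbrock function in d = 2 typically drives \mu close to the global maximizer (1, 1):

rosenbrock = lambda x: -np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)
print(cem_maximize(rosenbrock, d=2))   # ≈ [1.0, 1.0] on successful runs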

Discrete and Combinatorial Problems

In discrete and combinatorial problems, the cross-entropy method (CEM) parameterizes the sampling distribution Q_\theta over the finite search space using tractable families, such as a product of independent Bernoulli distributions for binary decisions or a Markov transition matrix for sequential choices in paths or permutations. For instance, in problems like graph partitioning or node selection, each component is modeled as a Bernoulli random variable with parameter p_k, while for path-based problems, Q_\theta is defined as a Markov chain with transition probabilities \theta_{ij} representing the likelihood of moving from node i to node j. Elites are selected from a sample of N generated solutions X_1, \dots, X_N using the indicator I\{S(X_k) \geq \gamma\} for maximization (or \leq \gamma for minimization), where S(\cdot) is the objective function and \gamma is an elite threshold. A representative application is the buffer allocation problem in parallel queueing networks, where the goal is to distribute a fixed total buffer capacity across stations to maximize throughput. Here, CEM parameterizes allocations as sums of independent Bernoulli trials, generating N candidate allocations, evaluating their simulated throughput, selecting the top \alpha N elites, and updating parameters via maximum likelihood estimation. Another example is approximating solutions to the traveling salesman problem (TSP), where sequences of cities are generated as paths in a Markov chain with transition matrix \theta; in each iteration, N paths are sampled, the shortest \alpha N elite tours are chosen, and transition probabilities are updated as \hat{\theta}_{ij} = \frac{\sum_{k=1}^N I\{S(X_k) \leq \gamma\} I\{X_k \text{ transitions from } i \text{ to } j\}}{\sum_{k=1}^N I\{S(X_k) \leq \gamma\} I\{X_k \text{ visits } i\}}, the ratio of elite transitions from i to j over total elite visits to i. For binary cases like buffer decisions, the Bernoulli parameter updates as p_k = \frac{\sum_{i \in S_\varepsilon} X_{i,k}}{|S_\varepsilon|}, averaging the elite samples at position k. CEM demonstrates strong performance on large-scale combinatorial problems, such as estimating buffer overflows in telecommunication queueing networks, where it efficiently adapts to achieve low-variance estimates of overflow probabilities. In TSP benchmarks, the method yields near-optimal tours; for a 53-node instance (ft53), it achieves an average solution within 2.9% of optimal after approximately 40 iterations using N = 28,090 samples and α = 0.7, converging in about 6 minutes on a 500 MHz processor. To enhance stability in discrete settings with noisy or sparse elites, smoothed CEM replaces the fixed \gamma with the sample \alpha-quantile \hat{\gamma}(\alpha) of the objectives S(X_i), reducing variance in parameter updates while maintaining convergence.
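The Bernoulli update above is a one-line average in code; the sketch below applies it to a small synthetic 0-1 knapsack instance (all data, the penalty for infeasible solutions, and the smoothing constant 0.7 are illustrative assumptions, not from the cited experiments):

import numpy as np

rng = np.random.default_rng(1)
n, N, rho = 20, 500, 0.1              # items, samples per iteration, elite fraction
values = rng.integers(1, 10, n)       # synthetic item values
weights = rng.integers(1, 10, n)      # synthetic item weights
cap = int(weights.sum() // 2)         # knapsack capacity

def score(x):
    # Objective with a simple penalty for infeasible (overweight) selections.
    w = int((weights * x).sum())
    return int((values * x).sum()) if w <= cap else -w

p = np.full(n, 0.5)                   # Bernoulli parameters p_k
for _ in range(50):
    X = (rng.random((N, n)) < p).astype(int)          # sample N binary solutions
    s = np.array([score(x) for x in X])
    elites = X[np.argsort(s)[::-1][: int(rho * N)]]   # top rho*N solutions
    p = 0.7 * elites.mean(axis=0) + 0.3 * p           # smoothed Bernoulli update
print(p.round(2))                     # parameters concentrate near a good selection

The smoothed update (a convex combination of the old parameters and the elite frequencies) keeps components of p away from 0 and 1 too early, which is the stability issue the smoothing remark above addresses.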

Extensions and Variants

Advanced Formulations

To enhance the robustness of the cross-entropy method (CEM) in scenarios with varying convergence behavior, an adaptive elite fraction \alpha_k can be employed, dynamically adjusting the proportion of elite samples based on observed variance or progress; for instance, increasing \alpha_k when variance is high to incorporate more diverse samples and prevent stagnation. This extension, building on the fully automated CE (FACE) algorithm, allows the elite sample size N_{\text{elite}} to scale with problem complexity, such as N_{\text{elite}} = c_0 n for stochastic networks where c_0 \in [0.01, 0.1] and n is the base sample size, while the total sample size N_k adapts between minimum and maximum bounds to ensure iterative improvement. A common refinement for parameter updates in CEM involves incorporating a learning rate for smoothing, given by \theta_k = \beta \theta_{k-1} + (1 - \beta) \hat{\theta}_{\text{MLE}}, where \beta \in [0.8, 0.9] weights the previous parameters to mitigate oscillations and premature convergence, particularly in discrete or noisy settings. This smoothed maximum likelihood estimation (MLE) update stabilizes the evolution of the distribution Q_{\theta}, reducing sensitivity to outliers in elite samples while preserving the method's adaptive nature. For constrained optimization problems, CEM can be adapted by integrating penalties directly into the performance function S(x), such as adding a Lagrangian term \lambda g(x) where g(x) \leq 0 enforces constraints, thereby biasing the sampling toward feasible regions without altering the core update mechanism. Alternatively, rejection sampling within Q_{\theta} discards infeasible samples during generation, though this may increase computational cost; these approaches transform the constrained problem into an effective unconstrained rare-event estimation task (both the smoothing and penalty refinements are sketched in code after this paragraph). In estimating extremely rare events where the target probability \gamma < 10^{-10}, the multilevel CEM variant progressively refines the biasing distribution across levels by coarsening thresholds, such as \gamma_l = 10^{-l} for level l, and updating sequentially from coarse to fine resolutions to maintain variance control. At each level l, the elite threshold is set via \gamma_l = P(S \geq u_l) under the current Q_{\theta_{l-1}}, with u_l increasing to focus on rarer subsets, enabling efficient handling of high-dimensional or low-probability tails through chained importance sampling. Recent developments have integrated CEM with deep neural networks to parameterize high-dimensional Q_{\theta}, facilitating neural policy search for complex tasks in control and simulation; for example, post-2020 works employ differentiable CEM variants with embedded neural policies to optimize behavior in continuous, high-dimensional action spaces.
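Two of these refinements reduce to a few lines in practice; the helpers below are illustrative sketches (the function names and the penalty form are assumptions made here, not constructs from the CE literature):

import numpy as np

def smoothed_update(theta_prev, theta_mle, beta=0.85):
    # theta_k = beta * theta_{k-1} + (1 - beta) * theta_MLE, with beta in [0.8, 0.9]
    return beta * np.asarray(theta_prev) + (1 - beta) * np.asarray(theta_mle)

def penalized_score(S, g, lam=100.0):
    # Fold a constraint g(x) <= 0 into the performance function via a penalty term.
    return lambda x: S(x) - lam * max(0.0, g(x))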
The cross-entropy method shares similarities with evolution strategies (ES) as a population-based optimization approach but differs fundamentally in its update mechanism. While ES, such as the covariance matrix adaptation evolution strategy (CMA-ES), rely on mutation and recombination operators to generate offspring and adapt the search through evolution paths, CEM employs parametric fitting, selecting elite samples from a parameterized distribution and updating the parameters (e.g., the mean and covariance of a Gaussian) via weighted averaging to minimize the cross-entropy with the optimal importance sampling distribution. This parametric approach in CEM often leads to faster convergence in low-dimensional problems compared to the more computationally intensive recombination in ES, though variants like CMA-ES provide superior exploration in higher dimensions due to their adaptive step-size control and covariance shaping.

In contrast to particle filters and sequential Monte Carlo (SMC) methods, CEM optimizes a fixed parametric family of distributions Q_\theta sequentially by fitting to elite samples without adaptive resampling based on effective sample size (ESS). SMC methods, used for state estimation and filtering, maintain particle diversity through invariant Markov transition kernels (e.g., Metropolis-Hastings moves) and resample particles adaptively to control ESS and prevent degeneracy, requiring point-wise evaluation of target distributions at each step. CEM's elite-based updates, while simpler and not reliant on such kernels, can lead to less robust handling of multimodal posteriors compared to SMC's dynamic resampling. CEM serves as a derivative-free precursor to policy gradient methods in reinforcement learning (RL), such as REINFORCE, by optimizing parameterized policies through population sampling rather than direct gradient computation. Unlike policy gradients, which estimate gradients of the expected reward with respect to policy parameters (e.g., via likelihood ratio tricks), CEM fits the full distribution over actions or parameters to elite trajectories, enabling black-box optimization without backpropagation. This approach has been applied in deep RL settings, such as policy optimization for continuous control tasks around 2016, where CEM's sampling-based updates complemented gradient methods in high-dimensional spaces. The CEM has inspired hybrid algorithms such as CE-inspired genetic algorithms (CEGA), which blend CEM's probabilistic fitting with GA operators such as selection and crossover to enhance global search in combinatorial problems. These hybrids, detailed in foundational treatments of CEM, leverage elite sampling from CEM alongside GA's evolutionary mechanisms for improved robustness in multi-extremal optimization. An extension of CEM to multi-armed bandits, known as CEMAB, minimizes cumulative regret by treating arm selection probabilities as parameters to optimize via elite samples from batch evaluations. In this formulation, elite arms, those with the highest estimated rewards, are used to update selection probabilities proportionally or via truncation, achieving sublinear regret in large-scale settings (e.g., thousands of arms) by focusing on promising subsets.
