
Monte Carlo tree search

Monte Carlo tree search (MCTS) is a heuristic search algorithm that combines the precision of tree search with the generality of random sampling via Monte Carlo methods to make decisions in complex, sequential environments, particularly game trees and Markov decision processes. It builds an asymmetric search tree incrementally by simulating possible future states and evaluating them through random playouts, enabling efficient exploration of vast action spaces without requiring domain-specific knowledge.

The algorithm consists of four iterative phases repeated over multiple simulations: selection, where a path from the root to a leaf is chosen using a policy that balances exploration of new actions and exploitation of promising ones, often via the upper confidence bound for trees (UCT) formula; expansion, which adds child nodes representing new actions to the selected leaf; simulation (or rollout), involving random action selection until a terminal state to estimate the node's value; and backpropagation, which updates the visit counts and value estimates for all nodes along the selected path. This structure allows MCTS to focus computational effort on high-potential branches while using statistical estimates to approximate outcomes in unexpanded areas.

MCTS was independently developed in 2006 by Levente Kocsis and Csaba Szepesvári, who introduced the UCT variant as a bandit-based approach to planning, and by Rémi Coulom, who applied it to computer Go programs. Prior to this, Monte Carlo methods had been used in game-playing search since the early 1990s, but MCTS marked a breakthrough by integrating them with tree policies for anytime decision-making. Its efficacy was dramatically demonstrated in 2016 when DeepMind's AlphaGo program, combining MCTS with deep neural networks for policy and value evaluation, defeated the world Go champion Lee Sedol, achieving superhuman performance in a game long considered computationally intractable.

Beyond games like Go, chess, and video game titles, MCTS has found applications in combinatorial optimization, robot path planning, scheduling, and reinforcement learning, where it aids in approximating optimal policies in high-dimensional spaces. Ongoing research continues to enhance MCTS with neural approximations, parallelization, and domain adaptations to address real-world challenges.

Background Concepts

Monte Carlo Methods

Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical approximations for mathematical problems, particularly those involving uncertainty, complex interactions, or high-dimensional integrals where deterministic methods are infeasible. These techniques approximate solutions by simulating random processes that mimic the underlying probability distributions, providing estimates of expected values, integrals, or other quantities through statistical aggregation.

The origins of Monte Carlo methods trace back to the 1940s in the context of nuclear weapons research at Los Alamos National Laboratory, where they were developed by mathematician Stanislaw Ulam and John von Neumann to model neutron diffusion and other particle behaviors in atomic bomb design. Ulam conceived the idea in 1946 while recovering from an illness and pondering the probabilities of solitaire games, leading to its first practical application for estimating neutron diffusion paths, which required handling vast uncertainties without analytical solutions. The name "Monte Carlo" was suggested by Nicholas Metropolis, inspired by the casino's association with randomness.

At their core, Monte Carlo methods operate by generating a large number N of independent random samples X_i from the relevant probability distribution, evaluating a function f on each sample, and then aggregating the outcomes, typically via the sample mean or variance, to approximate the true quantity. This process is formalized in the key approximation equation: E[f(X)] \approx \frac{1}{N} \sum_{i=1}^N f(X_i), where E[f(X)] denotes the expected value of f(X) under the distribution of the random variable X, and the error decreases as N increases, in accordance with the law of large numbers.

A classic illustrative example is estimating the value of \pi by uniformly sampling points in a unit square enclosing a quarter circle of radius 1; the ratio of points falling inside the circle to the total samples approximates \pi/4, yielding \pi upon multiplication by 4. Similarly, these methods solve integral equations, such as Fredholm equations of the second kind, by interpreting the solution as an expectation over random paths and using random walks or importance sampling to estimate solutions efficiently.

In decision problems under uncertainty, Monte Carlo methods play a crucial role by simulating numerous possible scenarios to evaluate the distribution of outcomes, enabling probabilistic assessments of risks and rewards without requiring exhaustive exploration of the state space. This approach allows decision-makers to quantify variability and compute metrics like expected return or value-at-risk through repeated trials, providing robust insights into complex systems where exact calculations are impractical. Such sampling techniques extend naturally to structured decision spaces via tree search frameworks, approximating values in hierarchical environments.
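
As a minimal sketch of the sampling-and-averaging pattern just described, the following Python snippet estimates \pi with the quarter-circle construction; the function name and sample count are arbitrary choices for this illustration.

import random

def estimate_pi(num_samples=1_000_000):
    """Estimate pi by uniform sampling in the unit square.

    The fraction of points landing inside the quarter circle of radius 1
    approximates pi / 4, so multiplying by 4 recovers pi; the error shrinks
    as the number of samples grows.
    """
    inside = 0
    for _ in range(num_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi())   # typically prints a value near 3.141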

Tree Search in Games

In turn-based games with perfect information, such as chess or Go, a game tree represents the hierarchical structure of all possible game states and the transitions between them. Each node in the tree corresponds to a specific board position or game state, while edges represent legal actions or moves that lead to subsequent states. The root node denotes the initial position, and leaf nodes indicate terminal states where the game ends, typically with a win, loss, or draw for one of the players. This structure allows systematic exploration of future possibilities, enabling decision-making based on anticipated outcomes.

Traditional tree search algorithms for such games rely on the minimax procedure, which assumes two adversarial players: a maximizer seeking to maximize their score and a minimizer aiming to minimize it. Starting from the root, minimax recursively evaluates subtrees by selecting the maximum value at the maximizer's turns and the minimum at the minimizer's, propagating the optimal value back to the root to choose the best move. This method guarantees optimal play in deterministic games of perfect information but requires evaluating the entire tree, which is computationally prohibitive for deep searches. To mitigate this, alpha-beta pruning dynamically tracks lower (alpha) and upper (beta) bounds on node values, eliminating branches that cannot influence the final decision without altering the root's value. Alpha-beta can reduce the effective search space from O(b^d) to roughly O(\sqrt{b^d}) nodes in the best case, where b is the branching factor and d is the search depth, depending on move ordering.

The primary challenge in applying tree search to complex games arises from the combinatorial explosion of the state space. With an average branching factor b (the number of legal moves per position) and search depth d, the number of leaf nodes scales as b^d, leading to enormous trees; for chess, b \approx 35 yields an estimated 10^{120} possible games, far exceeding feasible computation even with optimizations. In games like Go, where b \approx 250, full enumeration becomes utterly impractical, as even shallow searches overwhelm modern hardware. Monte Carlo methods offer a way to approximate evaluations in such deep trees through random sampling.

When exhaustive search is impossible, tree search employs evaluation functions to assign heuristic scores to non-terminal leaf nodes, estimating their desirability based on features like material balance or positional control. These static evaluators guide the algorithm by approximating the game's value without simulating to completion, but their accuracy depends on hand-crafted rules tailored to the specific game. Despite these advancements, traditional tree search has notable limitations. Effective heuristics require extensive domain-specific engineering, limiting generalizability across games, while the core framework struggles with stochastic elements, such as chance events in games like backgammon, necessitating extensions like expectiminimax that further increase computational demands.
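
To ground the minimax and alpha-beta discussion, here is a generic depth-limited sketch in Python; the `game` object with is_terminal, evaluate, legal_moves, and apply methods is a hypothetical interface assumed for illustration, not a specific engine's API.

import math

def alphabeta(state, depth, alpha, beta, maximizing, game):
    """Depth-limited minimax with alpha-beta pruning over a generic game interface."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)                   # heuristic score of the leaf
    if maximizing:
        value = -math.inf
        for move in game.legal_moves(state):
            child = game.apply(state, move)
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, game))
            alpha = max(alpha, value)
            if alpha >= beta:                         # remaining siblings cannot matter
                break                                 # beta cutoff
        return value
    value = math.inf
    for move in game.legal_moves(state):
        child = game.apply(state, move)
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, game))
        beta = min(beta, value)
        if alpha >= beta:                             # alpha cutoff
            break
    return value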

Historical Development

Origins in Monte Carlo Simulation

The Monte Carlo method originated during the Manhattan Project in the mid-1940s, when mathematician Stanislaw Ulam, working at Los Alamos National Laboratory, conceived the approach in 1946 while recovering from an illness. Inspired by calculating the probabilities of winning games of solitaire, Ulam proposed using random sampling to model complex probabilistic processes, such as neutron diffusion in atomic bomb design, as an alternative to exhaustive combinatorial calculations. This idea was quickly developed into practical simulations run on the ENIAC computer in 1948, marking one of the earliest applications of digital computing for such statistical experiments.

Following World War II, the method gained broader recognition through a seminal 1949 publication by Nicholas Metropolis and Stanislaw Ulam, which formally coined the term "Monte Carlo method" and described it as a statistical tool for approximating solutions to mathematical problems via random sampling. The paper highlighted its applications in physics, including Enrico Fermi's earlier 1930s work on neutron diffusion using analogous random sampling techniques, and extended to problems like estimating integrals and solving systems of equations. Fermi had even constructed a mechanical analog device, the FERMIAC or "Monte Carlo trolley," in 1947 to simulate neutron paths, underscoring the method's roots in nuclear physics challenges.

In the 1950s, Monte Carlo methods expanded beyond physics into operations research and economics, where they were applied to optimize decisions under uncertainty. For instance, the approach proved effective for solving integral equations by treating them as expectations over random paths, a technique that reduced computational effort in economic modeling of stochastic processes. Key advancements were driven by John von Neumann, who contributed foundational work on pseudorandom number generation, via methods like the middle-square technique, and variance reduction strategies to improve efficiency. The transition to widespread computing in the early 1950s, facilitated by stored-program machines built on von Neumann's architecture, enabled scaling these simulations from manual or analog efforts to automated, high-volume runs, laying essential groundwork for later computational applications in diverse fields.

Emergence and Evolution of MCTS

Monte Carlo tree search (MCTS) was independently developed in 2006 by Rémi Coulom and by Levente Kocsis and Csaba Szepesvári, with the latter introducing the Upper Confidence Bound applied to Trees (UCT) algorithm. Coulom presented his work in the seminal paper "Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search" at the Computers and Games conference. This framework combined tree search with Monte Carlo evaluations specifically to enhance performance in computer Go, addressing limitations of prior random simulation approaches by enabling more selective exploration and backup of values across the search tree. Building on general Monte Carlo simulation techniques from earlier decades, UCT provided a principled balance between exploration and exploitation, rapidly elevating MCTS as a breakthrough for handling high-branching-factor games like Go.

From 2007 to 2010, MCTS experienced a surge in adoption and refinement within the research community, particularly for Go programs, with key enhancements including the integration of pattern matchers to guide simulations and in-tree decisions more effectively. Open-source implementations proliferated, such as Leela, initiated around 2006–2007 by Gian-Carlo Pascutto, which incorporated UCT alongside techniques like Rapid Action Value Estimation (RAVE) for improved move evaluation. Programs like MoGo demonstrated MCTS's competitive potential, winning the 2007 Computer Olympiad and achieving the first victory by a computer Go program against a professional player, Guo Juan (5 dan), in a 9×9 blitz game that year, underscoring the algorithm's practical impact.

The 2010s brought transformative breakthroughs, propelled by DeepMind's integration of MCTS with deep neural networks. In 2016, AlphaGo combined policy and value networks with MCTS to defeat world champion Lee Sedol 4–1 in a historic five-game match, demonstrating superhuman performance on the full 19×19 Go board. This was followed in 2017 by AlphaZero, which mastered Go, chess, and shogi through self-play reinforcement learning, using MCTS guided solely by a single neural network without domain-specific priors. An earlier milestone in this period was the open-source MCTS-based Go engine Fuego, which achieved top rankings among public implementations on servers like KGS and secured third place in a 9×9 tournament in 2011, highlighting ongoing refinements in scalability and knowledge heuristics.

Since 2020, MCTS has evolved beyond games into diverse domains, with MuZero (introduced in 2019 and extended in subsequent works) enabling planning in model-free environments such as Atari games by learning latent models during search. Applications have expanded to quantum circuit design, where MCTS optimizes circuit transformations and architecture searches to minimize gate counts and depths on noisy intermediate-scale quantum devices. In robotics, recent advancements leverage MCTS for real-time path planning, multi-robot exploration, and decision-making under uncertainty, such as spectral expansion variants for efficient long-horizon tasks. Efficiency gains have been realized through GPU acceleration, with implementations achieving up to 20× speedups in evaluation phases for complex searches. From 2023 to 2025, integrations with large language models (LLMs) have further broadened MCTS's scope, enabling enhanced planning in non-game contexts like goal-oriented dialogue and sequential problem-solving by using LLMs for action generation and prioritization within the search tree.

Core Algorithm

Principle of Operation

Monte Carlo Tree Search (MCTS) operates by incrementally building an asymmetric search tree rooted at the current state, guiding its expansion through repeated random simulations (playouts) that estimate values by sampling outcomes from leaf nodes to terminal states. This core idea enables efficient exploration of vast decision spaces by concentrating simulations on promising branches, using accumulated statistics to prioritize high-reward paths over exhaustive enumeration, which is infeasible in domains with high branching factors like Go or chess variants.

The process begins with a tree containing only the root node and proceeds iteratively until a computational budget, such as a fixed number of simulations or a time limit, is exhausted. Each iteration comprises a cycle of selection, expansion, simulation, and backpropagation: a path is selected from the root to a leaf via a policy that favors visited nodes with strong empirical performance; the leaf is expanded by adding child nodes for untried actions; a random playout generates a terminal reward from the expanded node; and this reward is propagated backward, updating node statistics along the path. This loop refines the tree's structure and value estimates over successive iterations, adapting the search to reveal optimal strategies.

In the search tree, each node maintains essential statistics: the visit count N (number of times the node has been traversed in simulations), the total reward Q (sum of outcomes from playouts originating at or below the node), and optionally the average value \bar{V} = Q / N. The root represents the initial state, with children denoting available actions; these metrics enable data-driven guidance, directing future selections toward subtrees showing superior performance while relying on Monte Carlo sampling for rollout evaluations. MCTS balances search breadth and depth by employing a bandit-inspired selection policy that weighs exploitation of high-value branches against exploration of under-visited ones, ensuring broad coverage of the state space without uniform allocation of resources. Upon termination after the allotted iterations, the recommended action is the root's child with the highest visit count, as this robustly approximates the most promising choice based on frequency and outcomes. The algorithm can be outlined in pseudocode as follows:
Initialize tree with root node for current state

While simulations < maximum or time remains:
    Select path from root to a leaf node using tree policy
    If leaf is non-terminal:
        Expand by adding one or more child nodes
        Select a child to simulate from
    Simulate random playout from selected node to terminal reward
    Backpropagate reward, updating N and Q for nodes along the path

Return action corresponding to root child with maximum N
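
To make the loop concrete, the following is a minimal, self-contained Python sketch of UCT-style MCTS applied to a simple take-away game (players alternately remove 1 to 3 counters, and whoever takes the last counter wins); the game, class names, iteration count, and exploration constant are illustrative assumptions rather than a reference implementation.

import math
import random

class Node:
    """One tree node: `state` counters remain and `player` (0 or 1) is to move."""
    def __init__(self, state, player, parent=None, move=None):
        self.state = state
        self.player = player
        self.parent = parent
        self.move = move                       # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= state]
        self.N = 0                             # visit count
        self.Q = 0.0                           # total reward for the player who moved here

def uct_child(node, c=math.sqrt(2)):
    # Selection: choose the child maximizing the UCT score.
    return max(node.children,
               key=lambda ch: ch.Q / ch.N + c * math.sqrt(math.log(node.N) / ch.N))

def rollout_winner(state, player):
    # Simulation: play uniformly random moves; the player taking the last counter wins.
    while True:
        state -= random.choice([m for m in (1, 2, 3) if m <= state])
        if state == 0:
            return player
        player = 1 - player

def mcts(root_state, root_player, iterations=5000):
    root = Node(root_state, root_player)
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and has children.
        while not node.untried and node.children:
            node = uct_child(node)
        # Expansion: add one child for an untried move.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.state - move, 1 - node.player, parent=node, move=move)
            node.children.append(child)
            node = child
        # Simulation: terminal nodes already have a winner, otherwise play out randomly.
        winner = 1 - node.player if node.state == 0 else rollout_winner(node.state, node.player)
        # Backpropagation: credit a win to the player who moved into each node.
        while node is not None:
            node.N += 1
            if winner != node.player:
                node.Q += 1.0
            node = node.parent
    return max(root.children, key=lambda ch: ch.N).move   # most-visited root child

print(mcts(10, 0))   # usually prints 2: leaving a multiple of 4 is a losing position

Running it with 10 counters usually recommends removing 2, which leaves the opponent a losing multiple of 4, though individual runs can vary because the playouts are random.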

Selection Phase

The selection phase constitutes the first step in each iteration of the Monte Carlo tree search (MCTS) algorithm, involving a traversal of the existing search tree from the root to a leaf node in order to identify a promising position for further development. This phase serves to guide the search toward high-potential paths by leveraging the partial knowledge encoded in the tree, thereby avoiding the computational expense of exhaustive enumeration across the full state space. The traversal operates recursively starting from the root: at each internal node, a child is chosen that maximizes a selection score computed from the node's accumulated statistics, including the average reward from prior simulations and the visit count, which together promote paths with demonstrated value while preventing overcommitment to any single branch. This process repeats until reaching a leaf node, defined as a terminal position in the current tree or an unexpanded node where not all possible actions have been explored. When multiple children yield identical selection scores, ties are typically resolved either deterministically, such as by selecting the first child in a predefined order, or randomly to introduce variability and enhance exploration across equivalent options. In a game tree context, such as chess or Go, the selection phase might traverse a sequence of moves to a node exhibiting a high empirical win rate from previous playouts, directing the algorithm to refine strong strategic lines informed by simulation outcomes. This traversal concludes at an unexpanded leaf, setting the stage for subsequent tree growth.
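
As a small illustration of the tie-breaking choice described above, the sketch below picks among children whose selection scores are equal; the node objects and the score function are assumed placeholders, not a fixed interface.

import random

def select_child(children, score):
    """Return the child with the highest selection score, breaking ties at random.

    `children` is any sequence of node objects and `score` is a callable
    returning a selection value for each child; both are illustrative.
    """
    best = max(score(child) for child in children)
    tied = [child for child in children if score(child) == best]
    return random.choice(tied)       # random tie-breaking spreads visits over equal options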

Expansion Phase

The expansion phase in Monte Carlo tree search follows the selection phase, where a suitable leaf node has been identified, and precedes the simulation phase by extending the search tree with new nodes. This phase is triggered upon reaching a non-terminal leaf node that has not been fully expanded, allowing the algorithm to explore additional branches of the tree. During expansion, the process generates legal successor states or actions from the current leaf node's state and creates corresponding child nodes in the tree. One child node is typically added for an untried action, though variants may expand multiple actions at once; in high-branching-factor domains, heuristics or progressive widening may further limit the actions considered to manage overhead and memory usage. Newly created child nodes are initialized with zero visit counts (N = 0) and zero accumulated rewards (Q = 0), establishing a baseline for subsequent statistical updates. Advanced variants incorporate prior knowledge for initialization, such as heuristic biases or neural network-derived probabilities, to accelerate convergence toward promising regions of the search space. For instance, in Go, the AlphaGo system uses outputs from a policy network to set prior probabilities for child nodes, enhancing the efficiency of expansion in complex positions. Expansion is not necessarily performed in every iteration, to prevent uncontrolled tree growth; instead, it may occur selectively, such as every k-th simulation or when specific thresholds on node statistics are met, thereby balancing tree depth and breadth. In chess applications, expansion from a leaf typically adds child nodes for legal piece movements, such as pawn advances or knight jumps, progressively representing deeper tactical variations.
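
A hedged sketch of single-child expansion follows, assuming hypothetical helpers legal_actions(state) and make_child(node, action) that return the available moves and construct a child node with N = 0 and Q = 0.

import random

def expand(node, legal_actions, make_child):
    """Add one child node for a move not yet tried from `node`.

    Real implementations may instead expand several children at once, or
    restrict candidates via heuristics or progressive widening in
    high-branching-factor domains.
    """
    tried = {child.move for child in node.children}
    untried = [a for a in legal_actions(node.state) if a not in tried]
    action = random.choice(untried)        # could also be ordered by a prior or policy network
    child = make_child(node, action)       # new node starts with N = 0 and Q = 0.0
    node.children.append(child)
    return child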

Simulation Phase

In the simulation phase of Monte Carlo tree search (MCTS), also referred to as the rollout or playout phase, a complete game is simulated from the newly expanded leaf to a terminal state using a default policy to generate an estimate of the position's value. This phase provides a quick, albeit crude, value estimate by sampling possible future outcomes without further tree expansion. The simulation follows the game's rules but employs simple heuristics to expedite computation, focusing solely on reaching an end state rather than optimizing play. The default policy typically involves uniform random selection of legal moves at each step until the game terminates, such as through a win, loss, draw, or other concluding condition defined by the game's rules. The length of these simulations varies with the game's complexity and depth; for instance, in long games like Go, playouts may span hundreds of moves to fill the board or trigger passes. Upon termination, the outcome is scored from the root player's perspective, typically 1 for a win, 0 for a loss, and 0.5 for a draw, yielding a single-sample estimate of the expected outcome from that state under random play. This random sampling approach is computationally efficient, allowing many simulations within time constraints, though its accuracy relies on averaging results across numerous iterations to reduce variance. In practice, for games like Go, the simulation proceeds by randomly placing stones on empty intersections for both players until the board is filled or both players pass, after which territory and captures are tallied to determine the score and winner according to standard rules. While effective for unbiased estimation, the uniform random policy can lead to unrealistic play, prompting variants that incorporate improvements, such as learned heuristics or simple pattern matching to bias moves toward plausible ones without excessive computational overhead. These enhancements aim to produce more informative value estimates while preserving the phase's speed. The raw outcome from the playout is then propagated upward through the selected path to update statistics.
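
The default policy can be sketched as below, assuming a hypothetical game object exposing is_terminal, legal_moves, apply, and winner; the reward convention of 1, 0.5, and 0 mirrors the scoring described above.

import random

def simulate(state, game, root_player):
    """Uniform random playout from `state` to a terminal position.

    Returns 1.0 for a root-player win, 0.0 for a loss, and 0.5 for a draw.
    The `game` interface is an illustrative assumption, not a fixed API.
    """
    while not game.is_terminal(state):
        move = random.choice(game.legal_moves(state))
        state = game.apply(state, move)
    winner = game.winner(state)            # assumed to return None on a draw
    if winner is None:
        return 0.5
    return 1.0 if winner == root_player else 0.0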

Backpropagation Phase

In the backpropagation phase of Monte Carlo tree search (MCTS), the result obtained from the simulation phase is propagated backwards along the path from the newly expanded leaf node to the root, updating the statistical estimates for each node encountered. This phase integrates the raw outcome of the simulation, typically a reward value such as 1 for a win, 0 for a loss, or 0.5 for a draw in two-player zero-sum games from the root player's perspective, into the tree's value estimates, enabling the algorithm to refine its understanding of action quality over multiple iterations. For each node on the traversed path, the visit count N is incremented by 1 to reflect the additional simulation that passed through it. Simultaneously, the total reward Q is updated by adding the simulation's reward r, giving the new statistics Q_{\text{new}} = Q_{\text{old}} + r and N_{\text{new}} = N_{\text{old}} + 1. The node's value estimate is then computed as the empirical mean reward V = Q / N, providing an estimate of the expected outcome from that state under the current search behavior. In multi-step decision problems, such as Markov decision processes (MDPs), the reward r may incorporate discounting to prioritize nearer-term outcomes, using a discount factor \gamma with 0 < \gamma < 1, so that future rewards are scaled by \gamma^k for step k. However, in many applications like Go or chess, where rewards are assigned only at terminal states without intermediate values, discounting is often omitted, treating the outcome as undiscounted to simplify computation and focus on end-game results. This iterative updating process gradually increases the confidence in promising paths by accumulating evidence from repeated simulations, allowing the tree to prioritize actions leading to higher average rewards as visit counts grow. Over numerous iterations, nodes with favorable value estimates receive more exploration, enhancing the overall accuracy of the search without requiring an explicit model of the environment.
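
A minimal sketch of this update is given below, assuming node objects with N, Q, and parent fields; gamma = 1.0 reproduces the undiscounted case typical of Go or chess, while gamma < 1 scales the reward by one factor per step away from the leaf. Two-player implementations additionally flip the reward's perspective between levels, which this single-agent sketch omits.

def backpropagate(leaf, reward, gamma=1.0):
    """Propagate one simulation result from `leaf` back up to the root.

    Each node on the path gets N incremented and the (possibly discounted)
    reward added to its total Q; the fields are illustrative.
    """
    node = leaf
    while node is not None:
        node.N += 1
        node.Q += reward
        reward *= gamma        # nodes nearer the root see a more heavily discounted reward
        node = node.parent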

Key Mechanisms

Exploration and Exploitation Dilemma

In Monte Carlo tree search (MCTS), the exploration and exploitation dilemma arises as a core challenge in balancing the need to deepen knowledge of promising actions with the necessity to investigate less-tested alternatives to avoid suboptimal decisions. This tradeoff is directly analogous to the multi-armed bandit problem, where each possible action from a state corresponds to an "arm" of a slot machine, and each rollout serves as a "pull" that yields a reward estimate; the objective is to maximize cumulative reward over a limited number of pulls by deciding whether to repeatedly select high-reward arms (exploitation) or to sample uncertain ones that might yield higher long-term gains (exploration). Within the MCTS framework, this dilemma operates at the level of individual tree nodes, treating child nodes (representing actions) as bandit arms whose values are estimated from simulation outcomes; over-reliance on exploitation risks converging to local optima by ignoring potentially superior strategies, while excessive exploration can waste computational resources on low-value paths, leading to inefficient search trees. To mitigate under-exploration, MCTS policies incorporate optimism in the face of uncertainty, systematically biasing selection toward nodes with fewer visits to promote discovery of hidden high-reward branches. The theoretical foundation for resolving this tradeoff draws from regret minimization in bandit algorithms, which quantify the cost of suboptimal decisions and design policies like ε-greedy (randomly exploring with probability ε) or upper confidence bound (UCB) methods to bound cumulative regret asymptotically. In practice, an imbalanced approach in MCTS can result in shallow trees that fail to evaluate deep strategies or entrapment in unproductive subtrees, significantly degrading performance in complex domains; this balance is typically tuned through parameters such as the exploration constant to optimize empirical outcomes. For instance, in tic-tac-toe, an MCTS agent might initially exploit moves securing central control for reliable draws but must explore peripheral edge placements to uncover winning lines and avoid overcommitting to blocked center strategies, illustrating how proper balancing uncovers global optima.

Upper Confidence Bound Formula

The Upper Confidence Bound (UCB) formula originated in the multi-armed bandit problem, where it was introduced by Auer, Cesa-Bianchi, and Fischer in 2002 as UCB1, a deterministic policy for balancing exploration and exploitation in bandit environments. The approach was adapted for Monte Carlo tree search (MCTS) by Kocsis and Szepesvári in 2006, who proposed Upper Confidence Bound applied to Trees (UCT) to guide the selection phase in tree-structured decision processes.

In the UCB1 formulation for bandits, the algorithm selects the arm i that maximizes the index \bar{X}_i + \sqrt{\frac{2 \ln n}{n_i}}, where \bar{X}_i is the empirical mean reward from arm i, n is the total number of plays, and n_i is the number of times arm i has been played. For MCTS under UCT, this is generalized to tree nodes, where a parent node with N visits selects its child i by maximizing \frac{Q_i}{N_i} + C \sqrt{\frac{\ln N}{N_i}}, with Q_i as the total reward accumulated for child i, N_i as the visit count of child i, and C as the exploration parameter (typically \sqrt{2}). Here, \frac{Q_i}{N_i} represents the empirical mean value, favoring exploitation of promising actions, while the second term promotes exploration of under-visited children through its inverse dependence on N_i. The exploration parameter C controls the trade-off intensity and requires tuning based on the domain; in games like Go, values between 1 and 2 are commonly used, with \sqrt{2} \approx 1.414 serving as a theoretical default derived from bandit optimality guarantees. The logarithmic term \ln N scales with the parent's visit count, ensuring that exploration pressure increases as more evidence accumulates, thereby encouraging visits to low-N_i children to reduce uncertainty.

The UCB index derives from Hoeffding's inequality, which bounds the deviation of the empirical mean from the true mean for bounded random variables, providing a high-probability upper confidence interval that UCB1 uses to select optimistic estimates. Specifically, Hoeffding's inequality states that for independent bounded random variables X_1, \dots, X_n \in [0,1] with mean \mu, P\left( \bar{X} \geq \mu + \epsilon \right) \leq \exp\left( -2 n \epsilon^2 \right), leading to the confidence radius \sqrt{\frac{\ln(1/\delta)}{2n}} at confidence level 1 - \delta. With \delta = 1/n^2 for union bounds over time, this gives a radius of the form \sqrt{\frac{\ln n}{n}}. The UCB1 algorithm uses \sqrt{\frac{2 \ln n}{n_i}}, where the constant 2 is chosen to ensure logarithmic regret bounds with high probability, as derived in the original analysis using Hoeffding's inequality and appropriate union bounds.

In MCTS, UCT applies this rule recursively during the selection phase, evaluating the bound at each node independently to traverse from root to leaf, which accommodates varying depths without global coordination. This per-node application ensures progressive deepening of the search while maintaining bandit-like guarantees locally. A notable variant is UCB1-TUNED, also from Auer et al. (2002), which replaces the exploration term with a variance-aware bound \sqrt{\frac{\ln n}{n_i} \min\left(\frac{1}{4}, V_i(n_i)\right)}, where V_i(n_i) is an upper confidence estimate of arm i's reward variance, yielding better behavior when rewards have low variance. UCB1-TUNED has been adapted to MCTS for improved empirical performance in several game domains.
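
Under the definitions above, the UCT score for a single child can be written as a small function; treating unvisited children as having infinite score is a common implementation convention rather than part of the formula itself.

import math

def uct_score(child_Q, child_N, parent_N, c=math.sqrt(2)):
    """UCT index: empirical mean reward plus an exploration bonus.

    child_Q / child_N is the exploitation term; the bonus grows with the
    parent's visit count and shrinks as the child is revisited.
    """
    if child_N == 0:
        return math.inf        # force at least one visit to every child
    return child_Q / child_N + c * math.sqrt(math.log(parent_N) / child_N)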

Variants and Enhancements

Pure Monte Carlo Search

Pure Monte Carlo search is a method in game-playing algorithms that relies on repeated random simulations, or rollouts, from the current game state to estimate action values without constructing or maintaining a search tree. In this approach, for each possible action from the root state, a fixed number of complete random games are simulated, and the outcomes are averaged to approximate the expected value of that action. This technique serves as a precursor to more sophisticated methods like Monte Carlo tree search (MCTS), emphasizing simplicity over structured exploration. The process begins by identifying all legal actions from the current state. For each action, an independent set of simulations is performed: the chosen action is applied to create a new state, from which the game is played out randomly to completion using a uniform random policy for subsequent moves. No intermediate states are stored or reused across simulations; statistics are maintained only at the root level, tracking the total score and number of simulations for each initial action. After completing M simulations per action, the action value is estimated as the average outcome: Q(a) \approx \frac{1}{M} \sum_{i=1}^{M} r_i, where r_i is the result of the i-th rollout starting with action a. The action with the highest average score Q(a) is then selected for execution. This method's primary advantages lie in its conceptual and implementation simplicity, requiring no evaluation function, heuristic policies, or tree management, which makes it suitable for domains with very large branching factors where enumerating all possibilities is infeasible. It performs independent assessments without biasing toward prior knowledge, allowing it to handle complex state spaces with minimal domain-specific tuning. However, pure Monte Carlo search suffers from significant inefficiencies, as each simulation redundantly explores overlapping paths from the root, leading to repeated computations and poor sample efficiency. It also scales poorly with game depth, as the variance in rollout outcomes remains high even with many simulations, providing unreliable estimates for long-horizon decisions. Historically, pure Monte Carlo search predates MCTS and was applied in early computer game programs, notably in backgammon for estimating position equity through random sampling of game trajectories. In Go, Bernd Brügmann's 1993 program Gobble pioneered its use on 9×9 boards, simulating random games to evaluate moves and achieving reasonable performance with limited hardware, laying the groundwork for subsequent Monte Carlo-based Go engines. These applications demonstrated its viability in stochastic and high-branching-factor games before the advent of tree-based enhancements.
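
A hedged sketch of this flat procedure is shown below, assuming a hypothetical game object exposing to_move, legal_moves, apply, is_terminal, and reward(state, player); statistics live only at the root, never in a tree.

import random

def pure_monte_carlo(state, game, simulations_per_action=100):
    """Flat Monte Carlo search: no tree, statistics kept only at the root.

    Each legal action is evaluated by averaging the outcomes of independent
    uniformly random playouts; the `game` interface is illustrative.
    """
    player = game.to_move(state)
    best_action, best_value = None, float("-inf")
    for action in game.legal_moves(state):
        total = 0.0
        for _ in range(simulations_per_action):
            s = game.apply(state, action)
            while not game.is_terminal(s):            # random playout to the end
                s = game.apply(s, random.choice(game.legal_moves(s)))
            total += game.reward(s, player)
        value = total / simulations_per_action        # Q(a) estimate for this action
        if value > best_value:
            best_action, best_value = action, value
    return best_action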

Advanced Improvements

Since the introduction of Monte Carlo Tree Search (MCTS) in 2006, numerous enhancements have addressed limitations such as unreliable early-game evaluations, inefficient exploration in high-branching domains, and computational scalability. One early improvement is Rapid Action Value Estimation (RAVE), introduced in 2007, which mitigates the slow convergence of action-value estimates in sparse subtrees by aggregating statistics across similar actions in different contexts, particularly reducing estimation error during the opening phases of games like Go. RAVE computes a combined value estimate as a weighted average of the standard MCTS action value and a shared value from all playouts where the action appears, with weights shifting toward the standard estimate as visit counts grow and it becomes more reliable.

A major advancement came from integrating deep neural networks, as exemplified by AlphaGo in 2016, where policy networks guide move selection and expansion by predicting promising actions, while value networks provide accurate leaf evaluations to replace or augment random simulations, drastically reducing the computational burden of rollouts. These networks, trained via supervised learning on expert games and reinforcement learning through self-play, enable MCTS to focus on high-value branches, achieving superhuman performance in Go by combining neural priors with tree search. Building on this, AlphaGo Zero and AlphaZero in 2017 employed the Predictor + UCT (PUCT) selection formula, which incorporates prior probabilities P from a policy network into the upper confidence bound: Q(s,a) + c \cdot P(s,a) \cdot \frac{\sqrt{N(s)}}{1 + n(s,a)}, where Q(s,a) is the mean action value, c is an exploration constant, N(s) is the parent visit count, and n(s,a) is the child visit count; this enhances the exploration-exploitation balance by prioritizing actions with both empirical promise and neural-guided likelihood.

Other techniques address specific challenges in MCTS. The All Moves As First (AMAF) heuristic, adapted for MCTS in 2007 and refined in subsequent works, exploits the observation that in games like Go a move often has similar value regardless of when it is played, accumulating value statistics for an action across all playouts in which it appears rather than only those where it was chosen at the current node, thus improving value propagation in transposition-heavy domains. Progressive widening manages high branching factors by dynamically increasing the number of explored actions per node as visit counts grow, often using a power-law formula like k \cdot N^\alpha (with \alpha < 1) to limit expansion to the most promising children early on, preventing tree explosion in continuous or large action spaces. Parallel MCTS variants, such as leaf-parallelization approaches scaled to thousands of cores, distribute simulations across workers while synchronizing shared tree statistics, enabling efficient search in resource-intensive applications like molecular design.

More recent developments from 2020 to 2025 emphasize offline training paradigms and advanced architectures. Offline methods train MCTS-guided policies on pre-generated datasets from prior interactions, reducing online computation while maintaining policy improvement through iterative refinement, as seen in extensions of AlphaZero-like algorithms for resource-constrained environments. Integration with large language models has enabled MCTS for planning in language-based environments, where transformers generate textual action sequences or embeddings to inform expansion and simulation, enhancing reasoning in interactive text adventures by combining probabilistic search with learned priors. Examples include enhancements like eUCT and dUCT for retrosynthesis (2025) and MCTS for optimizing prompt sequences (2025). These improvements have demonstrably boosted performance, with neural integrations like those in AlphaGo increasing win rates against prior MCTS baselines by over 15% in professional Go benchmarks, underscoring their impact on scaling to complex domains.
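
The PUCT rule above can be expressed directly as a scoring function; the exploration constant value here is an arbitrary illustrative choice, and `prior` stands for the policy network's probability P(s, a).

import math

def puct_score(q, child_N, parent_N, prior, c_puct=1.5):
    """AlphaZero-style PUCT score for one action.

    `q` is the mean action value Q(s, a) and `prior` the network probability
    P(s, a); the bonus favours actions the network rates highly but the
    search has visited rarely.
    """
    return q + c_puct * prior * math.sqrt(parent_N) / (1 + child_N)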

Applications

In Board and Card Games

Monte Carlo tree search (MCTS) has had a profound impact on computer Go, beginning with the 2007 program MoGo, which achieved dan-level performance on 9×9 boards through enhancements like rapid action value estimation (RAVE) that improved win rates against traditional engines from 24% to 69%. This marked the first time a computer program defeated a human professional at 9×9 Go, with the program reaching a rating of approximately 2500 Elo on the Computer Go Server. A decade later, AlphaGo integrated MCTS with deep neural networks, attaining a 99.8% win rate against leading Go programs and securing a 5–0 victory over the European champion Fan Hui. The system navigated Go's estimated 10^{170} possible game states by concentrating simulations on high-value branches, and MCTS-based Go engines gained over 500 Elo points compared to prior methods. Enhanced by neural networks, AlphaGo performed millions of simulations per second on GPU clusters during play.

In the landmark 2016 match against world champion Lee Sedol, AlphaGo won 4–1, with MCTS credited for enabling creative, intuitive moves that mimicked human strategy by statistically sampling outcomes rather than exhaustive enumeration. This victory highlighted MCTS's strength in handling vast state spaces, as the algorithm's tree expansion focused on promising paths informed by policy and value networks.

Chess presents a more constrained domain for MCTS due to the game's relatively lower branching factor and the prevalence of alpha-beta pruning in engines like Stockfish, which excel in solved endgames via tablebases. Nonetheless, MCTS finds niche applications in endgame exploration and opening preparation, where basic implementations achieve ratings of only around 500 Elo without traditional heuristics, while hybrids like Leela Chess Zero leverage it for probabilistic evaluation, reaching superhuman performance at approximately 3700 Elo as of May 2025. Experimental variants explore MCTS for dynamic positions, though traditional search remains dominant for its efficiency in perfect-information scenarios.

For card games with imperfect information, such as poker, search hybrids with counterfactual regret minimization (CFR) address hidden states through action abstractions and real-time subgame solving. Libratus, a CFR-based agent, defeated top professionals in heads-up no-limit Texas Hold'em over 120,000 hands, winning by 147 milli-big-blinds per game with 99.98% statistical significance. These abstractions compress the state space, allowing the simulations to approximate equilibria despite incomplete knowledge. In bridge, variants like Perfect Information Monte Carlo (PIMC) handle hidden cards by sampling possible deals, aiding bidding and play decisions. PIMC-based approaches have demonstrated efficacy in partnership games with hidden-information elements, as seen in wins by programs like Wbridge5 in 2016, 2017, and 2018. Overall, MCTS has elevated program performance across these games, with simulation speeds reaching millions per second on GPUs enabling scalable analysis.

In Other Domains

Monte Carlo tree search (MCTS) has been adapted for robotics applications, particularly in path planning within uncertain and dynamic environments. In the DARPA Subterranean Challenge, which began in 2018, teams employed MCTS-based algorithms for decentralized multi-robot exploration in complex underground settings, enabling efficient mapping and navigation amid sensor noise and partial observability. For instance, decentralized MCTS variants supported real-time motion planning, achieving robust performance in the challenge finals by balancing exploration of unknown areas with exploitation of gathered data, as demonstrated in systems like those of Team CERBERUS. These adaptations handle real-time sensor noise through progressive widening techniques that sample continuous state spaces, supporting safe and agile locomotion in legged robots.

In operations research, MCTS addresses problems such as job-shop scheduling and variants of the traveling salesman problem (TSP), with notable applications in 2020s logistics. For job-shop scheduling, MCTS algorithms generate high-quality solutions for large-scale instances by iteratively simulating production sequences, outperforming traditional heuristics in dynamic environments. In logistics, MCTS facilitates on-demand delivery and autonomous vehicle dispatch by optimizing routes under operational constraints like time windows and traffic variability, as demonstrated in multi-agent systems where it improves delivery efficiency by up to 20% over baseline heuristics. For TSP variants, neural-guided MCTS paradigms have solved instances with thousands of nodes, integrating heatmaps for edge selection to enhance scalability in routing.

In finance, MCTS supports portfolio optimization under uncertainty and derivative hedging through simulated market scenarios. By modeling multi-period planning as a sequential decision process, MCTS combined with neural networks selects asset allocations that maximize returns while minimizing variance, outperforming static models in volatile markets. For derivative contracts, MCTS simulates hedging strategies, accounting for transaction costs and market frictions to compute superhedging prices with reduced computational overhead compared to full enumeration. In trading applications, it enables real-time decision-making by exploring policy spaces in reinforcement learning frameworks, achieving lower drawdowns in backtested portfolios.

Within scientific domains, MCTS aids molecular design for drug discovery and optimization. In drug discovery, MCTS integrates with structure prediction tools such as AlphaFold to generate de novo molecules, using tree search to explore chemical spaces for binding affinity optimization; for example, MCTS-based assembly methods compete with AlphaFold-Multimer in predicting protein complexes, offering advantages in sampling rare conformations for drug target validation. In quantum computing, Monte Carlo search methods, including tree and graph search variants, optimize circuit architectures by searching over gate sequences, with reports of depth reductions in variational quantum eigensolvers through techniques like progressive widening and importance sampling. These approaches leverage MCTS's simulation phase to evaluate quantum metrics such as circuit fidelity under noise.

Recent advancements from 2023 to 2025 highlight MCTS in autonomous driving and climate-related policy decisions. In autonomous driving, MCTS enables behavior planning and trajectory generation in interactive traffic, as seen in scenario simulations where it outperforms rule-based methods by anticipating multi-agent behaviors, supporting safe navigation in urban environments. For climate modeling, MCTS optimizes policy decisions in carbon markets by simulating emission trading strategies, aiding decision-making for environmental regulations and improving forecast accuracy in stochastic pricing models. MCTS also continues to advance reinforcement learning, as in frameworks like MuZero for model-free planning in diverse environments.

To extend MCTS beyond discrete games, adaptations include variants for continuous action spaces and multi-agent extensions. Continuous-action MCTS approximates continuous domains, such as velocities or steering maneuvers, by sampling candidate actions and refining them via progressive widening or gradient-based optimization, with spectral expansion methods preserving optimality guarantees. Multi-agent variants employ decentralized MCTS for coordination, using shared value estimates to resolve conflicts in cooperative exploration or navigation tasks.

Evaluation

Advantages

Monte Carlo tree search (MCTS) exhibits notable sample efficiency by concentrating simulations on promising branches of the search tree, thereby outperforming uniform random sampling in environments with high branching factors, such as complex board games. This focused exploration allows MCTS to allocate computational resources more effectively, leading to better value estimates with fewer simulations compared to flat Monte Carlo methods. As an anytime algorithm, MCTS progressively improves its decision quality with additional computation time, without relying on a predefined search depth, making it particularly suitable for applications where decisions must be made under varying time constraints. This property enables the algorithm to provide a reasonable recommendation at any interruption point, balancing responsiveness and accuracy in dynamic settings. MCTS demonstrates strong domain independence, requiring minimal domain-specific heuristics and instead deriving action values through random playouts and simulation statistics, which facilitates its application across diverse sequential decision problems without extensive prior knowledge. This reliance on general-purpose sampling allows MCTS to adapt to new domains by leveraging only a basic model of state transitions and rewards. In handling imperfect information, MCTS employs techniques such as information set sampling or abstractions to manage hidden states, effectively approximating belief states during expansion and simulation to support decision-making in partially observable environments like card games. The algorithm's scalability is enhanced by its inherent parallelizability, where multiple simulations can be run concurrently across processors, and recent implementations support GPU acceleration to perform millions of simulations per second, enabling handling of large-scale problems. Empirically, MCTS has enabled superhuman performance in the game of Go, where programs like AlphaGo achieved a 99.8% win rate against top computer opponents and defeated world champions without relying on handcrafted features, demonstrating its power in vast state spaces.

Disadvantages

Despite its effectiveness in complex decision-making problems, Monte Carlo Tree Search (MCTS) incurs high computational costs, often requiring millions of simulations to achieve reliable accuracy in domains with large state spaces, such as Go, which renders it prohibitive without access to powerful hardware like GPUs or TPUs. This demand stems from the need for extensive sampling during the simulation phase to approximate value estimates, leading to prolonged runtimes that limit applicability in resource-constrained environments. MCTS is highly sensitive to hyperparameters, particularly the exploration constant C in the Upper Confidence Bound (UCB) formula and the choice of rollout policies, which must be carefully tuned for each specific domain to balance exploration and exploitation effectively. Poor tuning can result in suboptimal search behavior, such as over-exploration in low-variance settings or premature convergence to inferior actions, necessitating domain-specific expertise or additional optimization effort. In games with shallow depth or small branching factors, such as solved problems like tic-tac-toe, MCTS proves inefficient as an overkill approach, where random rollouts during simulations waste computational resources compared to exact methods like alpha-beta pruning that can fully explore the tree. The reliance on sampling exacerbates this inefficiency in such scenarios, as the method's strength in handling deep, uncertain trees does not translate into benefits in simpler structures. The stochastic nature of MCTS introduces significant variance in value estimates derived from finite simulations, causing results to fluctuate across runs and requiring numerous iterations to stabilize performance, which further amplifies computational demands. This variance arises from the random move selection in rollouts, potentially leading to unreliable decisions in time-limited settings unless variance reduction techniques are applied. Standard MCTS struggles with games of imperfect information, where hidden states introduce uncertainty that the basic algorithm cannot handle natively, often resulting in inaccurate evaluations unless extended by methods like POMCP to sample over possible information sets. Without such adaptations, the search tree fails to account for opponents' concealed actions or states, leading to suboptimal policies in domains like poker or bridge.

References

  1. [1]
    [PDF] A Survey of Monte Carlo Tree Search Methods - CORE
    Abstract—Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the.
  2. [2]
    [PDF] Monte Carlo Tree Search and Its Applications
    Sep 4, 2015 · Monte Carlo Tree Search (MCTS) is a probabilistic algorithm using random simulations to grow a game tree, combining random sampling with tree ...
  3. [3]
    [PDF] Bandit based Monte-Carlo Planning - General Game Playing
    In this paper we consider Monte-Carlo planning algorithms that we call rollout- based. As opposed to the algorithm described in the introduction (stage-wise.Missing: original | Show results with:original
  4. [4]
    [PDF] An Analysis of Monte Carlo Tree Search - Brown CS
    Monte Carlo Tree Search (MCTS) is a family of directed search algorithms that has gained widespread attention in re- cent years. Despite the vast amount of ...
  5. [5]
    Monte Carlo Tree Search: a review of recent modifications and ...
    Jul 19, 2022 · MCTS has been originally proposed in the work by Kocsis and Szepesvári (2006) and by Coulom (2006), as an algorithm for making computer players ...<|control11|><|separator|>
  6. [6]
    [PDF] Monte Carlo Tree Search: A Tutorial - Winter Simulation Conference
    2012. “A Survey of Monte Carlo Tree Search Methods”.IEEE Transactions on Computational Intelligence and AI in Games 4(1):1–49.
  7. [7]
    Mastering the game of Go with deep neural networks and tree search
    Jan 27, 2016 · To efficiently combine MCTS with deep neural networks, AlphaGo uses an asynchronous multi-threaded search that executes simulations on CPUs, and ...
  8. [8]
    Monte Carlo Tree Search: A Review of Recent Modifications ... - arXiv
    Mar 8, 2021 · Monte Carlo Tree Search (MCTS) is a powerful approach to designing game-playing bots or solving sequential decision problems.Missing: scholarly | Show results with:scholarly
  9. [9]
    (PDF) Monte Carlo Tree Search: a review of recent modifications ...
    Jul 19, 2022 · Monte Carlo Tree Search (MCTS) is a powerful approach to designing game-playing bots or solving sequential decision problems.
  10. [10]
    What Is Monte Carlo Simulation? - IBM
    Monte Carlo Simulation is a type of computational algorithm that uses repeated random sampling to obtain the likelihood of a range of results of occurring.
  11. [11]
    Monte Carlo Simulation: What It Is, How It Works, History, 4 Key Steps
    A Monte Carlo simulation estimates the likelihood of different outcomes by accounting for the presence of random variables.Using Monte Carlo Analysis to... · Random Variable · Excel
  12. [12]
    [PDF] Stan Ulam, John von Neumann, and the Monte Carlo Method
    T he Monte Carlo method is a sta- tistical sampling technique that over the years has been applied successfully to a vast number of scientific problems.
  13. [13]
    [PDF] THE BEGINNING of the MONTE CARLO METHOD
    John von Neumann saw the relevance of Ulam's suggestion and, on March 11,. 1947, sent a handwritten letter to Robert. Richtmyer, the Theoretical Division lead-.Missing: origin | Show results with:origin
  14. [14]
    Estimating Pi Using the Monte Carlo Method and Particle Tracing
    May 24, 2024 · This involves randomly placing points inside a square and counting how many lie within a circle inscribed in the square. The ratio of points ...
  15. [15]
    [PDF] A monte-carlo method for solving a class of integral equations
    This note describes a random walk equivalent to the Neumann series solution of an integral equation. A diffusion analogy and the problem of importance sampling ...
  16. [16]
    Decision analysis in projects - Monte Carlo simulation - PMI
    This article presents an alternative calculation technique that offers, in some cases, important advantages over decision tree analysis.<|control11|><|separator|>
  17. [17]
    Monte Carlo Process: Decision Making Under Uncertainty
    On the other hand, by means of a Monte Carlo analysis, the method also provides an exploratory scenario approach that enables the capture of the uncertainty, ...
  18. [18]
    [PDF] 6 ADVERSARIAL SEARCH - Artificial Intelligence: A Modern Approach
    The minimax algorithm performs a complete depth-first exploration of the game tree. If the maximum depth of the tree is m, and there are b legal moves at ...<|control11|><|separator|>
  19. [19]
    [PDF] Branching factor - Model AI Assignments
    Jul 13, 2018 · average branching factor for the game Go is 250.[1]. Higher branching factors make algorithms that follow every branch at every node, such as ...
  20. [20]
    Hitting the Jackpot: The Birth of the Monte Carlo Method | LANL
    Nov 1, 2023 · First conceived in 1946 by Stanislaw Ulam at Los Alamos† and subsequently developed by John von Neumann, Robert Richtmyer, and Nick Metropolis.
  21. [21]
    The Monte Carlo Method - Taylor & Francis Online
    Journal of the American Statistical Association Volume 44, 1949 - Issue 247 ... The Monte Carlo Method. Nicholas Metropolis Los Alamos Laboratory. &. S. Ulam ...
  22. [22]
    The Monte Carlo Method as a Natural Mode of Expression in ...
    The Monte Carlo Method as a Natural Mode of Expression in Operations Research. Gilbert W. King.
  23. [23]
    [PDF] JM Hammersley and DC Handscomb - Computer Science
    About 1948 Fermi, Metropolis, and Ülam obtained Monte Carlo estimates for the eigenvalues of the Schrödinger equation.
  24. [24]
    [PDF] Stan Ulam, John von Neumann, and the Monte Carlo Method
    T he Monte Carlo method is a sta- tistical sampling technique that over the years has been applied successfully to a vast number of scientific problems.
  25. [25]
    Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search
    This paper presents a new framework to combine tree search with Monte-Carlo evaluation, that does not separate between a min-max phase and a Monte-Carlo phase.Missing: UCT | Show results with:UCT
  26. [26]
    [PDF] Monte Carlo Tree Search in Go - Department of Computing Science
    Since 2007, all Computer Olympiads in 19 × 19 Go have been won by MCTS programs: in 2007 MOGO; 2008 the MCTS version of THE MANY FACES OF. GO; 2009, 2011 and ...Missing: victories | Show results with:victories
  27. [27]
    gcp/Leela: Leela - a Go program combining Monte Carlo ... - GitHub
    It is stronger than Pachi, which probably was the strongest free (non neural network) MCTS engine around ~2017. It uses a combination of Policy Gradient ...
  28. [28]
    The Grand Challenge of Computer Go - Communications of the ACM
    Mar 1, 2012 · We begin by discussing the performance of MCTS programs in 9 × 9 Go. In 2007, MoGo won the first game against a professional player, Guo Juan (5 ...
  29. [29]
    Fuego Computer Go Competitions - SourceForge
    Third place in the 74th (9x9) in August 2011. Fuego occasionally plays on KGS under the user names Fuego19, Fuego13 and Fuego9. In March 2013, Fuego19 achieved ...Missing: program | Show results with:program
  30. [30]
    Monte Carlo tree search with spectral expansion for planning with ...
    Dec 4, 2024 · Monte Carlo tree search is a powerful planning algorithm that strategically explores simulated future possibilities.
  31. [31]
  32. [32]
    [PDF] A Survey of Monte Carlo Tree Search Methods - UC Irvine
    mentioned “A Survey of Monte Carlo Tree Search Methods” publication. ▷ Thank you to Tsan-sheng Hsu (Academia Sinica, Institute of Information. Science) for ...
  33. [33]
    A Survey of Monte Carlo Tree Search Methods - IEEE Xplore
    Feb 3, 2012 · A Survey of Monte Carlo Tree Search Methods. Abstract: Monte Carlo tree search (MCTS) is a recently proposed search method that combines the ...Missing: PDF | Show results with:PDF
  34. [34]
    Bandit Based Monte-Carlo Planning - SpringerLink
    Download book PDF · Machine Learning: ECML 2006 (ECML 2006). Bandit Based ... Kocsis, L., Szepesvári, C. (2006). Bandit Based Monte-Carlo Planning. In ...
  35. [35]
    [PDF] Mastering the game of Go with deep neural networks and tree search
    Jan 28, 2016 · All games of perfect information have an optimal value function, v*(s), which determines the outcome of the game, from every board position.
  36. [36]
    Combining online and offline knowledge in UCT - ACM Digital Library
    Combining online and offline knowledge in UCT. Article. Share on. Combining online and offline knowledge in UCT. Authors: Sylvain Gelly. Sylvain Gelly. Univ ...
  37. [37]
    [PDF] Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search
    In this paper was presented a new algorithm for Monte-Carlo tree search. It ... 1 Games of the matches are available at http://remi.coulom.free.fr/CG2006/ ...Missing: UCT | Show results with:UCT
  38. [38]
    [PDF] Using Confidence Bounds for Exploitation-Exploration Trade-offs
    Thus the use of upper confidence bounds automatically trades off between exploitation and exploration. 3. The Adversarial Bandit Problem with Shifts. The ...
  39. [39]
    [PDF] 5.4 Monte Carlo Tree Search
    answer, called pure Monte Carlo search, is to do N simulations starting from the current Pure Monte Carlo search state of the game, and track which of the ...
  40. [40]
    Monte-Carlo Tree Search - Chessprogramming wiki
    CG 2006 · Rémi Coulom (2006). Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. CG 2006 · Bruno Bouzy (2006). History and Territory ...
  41. [41]
  42. [42]
  43. [43]
    [PDF] All-Moves-As-First Heuristics in Monte-Carlo Go
    In this paper we consider the following algorithms which are parameterized variants of the AMAF heuristic. The α-AMAF algorithm blends the value estimates from ...Missing: transpositions | Show results with:transpositions
  44. [44]
    [PDF] Parallel Monte-Carlo Tree Search | Semantic Scholar
    Three parallelization methods for MCTS are discussed: leaf parallelization, root Parallelization, and tree parallelization. Monte-Carlo Tree Search (MCTS) ...
  45. [45]
    [PDF] Monte-Carlo Tree Search and Rapid Action Value Estimation in ...
    Mar 22, 2011 · Monte-Carlo tree search [1] is a new paradigm for search, which has revolutionised computer Go [2, 3], and is rapidly replacing traditional ...
  46. [46]
    [PDF] Playing Chess with Monte Carlo Tree Search - Son Tung Do
    The chess AI uses Monte Carlo Tree Search (MCTS), a heuristic search algorithm, to guide the search and play without human expertise. MCTS-100 can beat ...
  47. [47]
    [PDF] Superhuman AI for heads-up no-limit poker: Libratus beats top ...
    Jan 26, 2018 · No prior AI has defeated top human players in this game. In this paper we introduce Libratus (12), an AI that takes a distinct approach to ...
  48. [48]
    (PDF) Optimizing $\alpha\mu$ - ResearchGate
    Jan 29, 2021 · The boosted version of Wbridge5 (Ventos et al. 2017) won the WCBC in 2016, 2017 and 2018. The best Bridge bots use PIMC, with a double-dummy.
  49. [49]
    Decentralised Multi-Robot Exploration Using Monte Carlo Tree Search
    This article presents the core technologies and deployment strategies of Team CERBERUS that enabled our winning run in the DARPA Subterranean Challenge finals.
  50. [50]
    Multi-robot, multi-sensor exploration of multifarious environments ...
    Oct 31, 2023 · Experiments are also detailed from the DARPA Subterranean Challenge, where our proposed autonomy pipeline contributed to us winning the “Most ...
  51. [51]
    Investigating the Monte-Carlo Tree Search approach for the job ...
    Oct 9, 2025 · ... leaf represents a complete sequence of scheduling decisions. This ... Selection phase. We begin at the root node: (i) we first apply the ...
  52. [52]
    Optimizing on-demand food delivery with BDI-based multi-agent ...
    Jul 11, 2025 · These results indicate that advanced intention scheduling methods like MCTS can significantly improve real-time decision-making, thereby ...
  53. [53]
    MCTS Based Dispatch of Autonomous Vehicles under Operational ...
    This article incorporates operational constraint satisfaction into the dispatch planning by utilising the MCTS based dispatch planner Flow-Achieving Scheduling ...
  54. [54]
    Hedging of financial derivative contracts via Monte Carlo tree search
    Oct 20, 2023 · This paper applies the Monte Carlo tree search as a method for replication in the presence of risk and market friction.
  55. [55]
    The Application of Artificial Intelligence (AI) in Early Drug Discovery
    Further augmenting the capabilities of reinforcement learning in drug discovery is MCTS, another critical tool that enhances decision-making by deftly balancing ...
  56. [56]
    Comparison of assembly with MCTS vs. AlphaFold-multimer on ...
    Recent breakthroughs, such as AlphaFold2 [1], have dramatically advanced static structure prediction, achieving near-experimental accuracy and reshaping the ...
  57. [57]
    Monte Carlo graph search for quantum circuit optimization
    Dec 19, 2023 · This work proposes a quantum architecture search algorithm which is based on a Monte Carlo graph search and measures of importance sampling.
  58. [58]
    [2502.03962] Quantum Circuit Design using a Progressive Widening ...
    Feb 6, 2025 · This article proposes a gradient-free Monte Carlo Tree Search (MCTS) technique to automate the process of quantum circuit design.
  59. [59]
    Monte-Carlo Tree Search for Behavior Planning in Autonomous ...
    Oct 18, 2023 · This study presents an innovative approach to address this challenge by utilizing a Monte-Carlo Tree Search (MCTS) based algorithm for autonomous driving ...
  60. [60]
    Monte Carlo Tree Search Based Trajectory Generation for ...
    This paper focuses on the development of a trajectory planning method for connected and automated vehicles (CAVs) that takes into account the interactive ...
  61. [61]
    Leveraging Deep Learning for Carbon Market Price Forecasting and ...
    May 30, 2025 · MCTS is a simulation-based search algorithm widely used for sequential decision-making under uncertainty, including portfolio optimization in ...
  62. [62]
    (PDF) Monte-Carlo Tree Search in Continuous Action Spaces with ...
    Aug 6, 2025 · In this paper, we introduce Value-Gradient UCT (VG-UCT), which combines traditional MCTS with gradient-based optimization of action particles.
  63. [63]
    An Efficient Dynamic Sampling Policy For Monte Carlo Tree Search
    Apr 26, 2022 · This paper proposes a dynamic sampling tree policy for Monte Carlo Tree Search (MCTS) to efficiently allocate computational budget for best ...
  64. [64]
    [PDF] Speculative Monte-Carlo Tree Search - NIPS
    Since MCTS is an Anytime Algorithm [12], it allows us to obtain “partial” simulation results (i.e., values and policies) at any point during its execution, and ...
  65. [65]
    An Efficient Dynamic Sampling Policy for Monte Carlo Tree Search
    Mar 2, 2023 · Takimoto. 2014. "Efficient Sampling Method for Monte Carlo Tree Search Problem". IEICE Transactions on Information and Systems 97(3):392–398.
  66. [66]
  67. [67]
    Information capture and reuse strategies in Monte Carlo Tree ...
    Kocsis and Szepesvári [7] prove that, for games of perfect information, UCT converges on the optimal move in the limit. That is, as the number of iterations ...
  68. [68]
    Scalability and parallelization of Monte-Carlo tree search
    We analyze its scalability, and in particular its limitations and the implications in terms of parallelization. We focus on our Go program MoGo and our ...
  69. [69]
  70. [70]
    Rethinking the "Heatmap + Monte Carlo Tree Search" Paradigm for...
    W3: Complexity and Sensitivity of MCTS Hyperparameters: MCTS requires numerous hyperparameters and appears highly sensitive to these settings, which could ...
  71. [71]
    Combining Monte Carlo Tree Search and Heuristic Search for ...
    Mar 13, 2025 · A sensitivity analysis of this important hyperparameter is shown in Sect. Exploitation vs exploration coefficient analysis. The illegal search ...
  72. [72]
    [PDF] Pitfalls and Solutions When Using Monte Carlo Tree Search for ...
    MCTS is used for complex games, but limiting search can lead to exploitation. Large search spaces and many actions per move are also issues.
  73. [73]
    Two Games with Monte Carlo Tree Search - null program
    Apr 27, 2017 · The AI's blindness is caused by shallow traps, a common problem for MCTS. It's what makes MCTS a poor fit for Chess. A shallow trap is a ...
  74. [74]
    [PDF] Comparison of Monte Carlo Tree Search Methods in the Imperfect ...
    [7] T. Furtak and M. Buro, “Recursive monte carlo search for imperfect information games,” in Computational Intelligence in Games (CIG), 2013 IEEE Conference ...
  75. [75]