
OpenAI Five

OpenAI Five is a machine learning system developed by OpenAI, designed to play the complex multiplayer video game Dota 2 at a professional level. It fields five AI agents coordinating as a team in the game's five-on-five format, handling partial observability, long time horizons of up to 45 minutes, and high-dimensional action spaces, without relying on human demonstration data or explicit communication between agents. The system employs self-play training via Proximal Policy Optimization (PPO), processing approximately 180 years' worth of simulated games per day on a cluster of 256 GPUs and 128,000 CPU cores, while observing every fourth frame, yielding ~20,000 moves (decision points) per game. Announced in June 2018, OpenAI Five had previously defeated amateur and semi-professional human teams in informal scrims. Its first widely streamed public matches on Twitch occurred on August 5, 2018, during the OpenAI Five Benchmark event, where it defeated a team of 99.95th-percentile players, including former professionals, 2–0 in a best-of-three series; in earlier informal scrims, it had won two of its first three games against semi-pro teams at the 99th percentile. A major milestone came on April 13, 2019, when the system competed against OG, the reigning world champion team, defeating them 2–0 in a best-of-three match in a livestreamed exhibition, marking the first time an AI system defeated reigning esports world champions in a live, full-game setting. This achievement highlighted the scalability of deep reinforcement learning for multi-agent environments, requiring 800 petaflop/s-days of compute and over 45,000 years of simulated gameplay. The underlying technology, detailed in a December 2019 paper, leverages distributed training to process batches of about 2 million frames every two seconds over 10 months, enabling continual learning and adaptation without predefined strategies. OpenAI Five's success underscored the potential for reinforcement learning to tackle real-world problems involving continuous decision-making, imperfect information, and strategic depth, influencing subsequent advancements in AI beyond gaming.

Background and Development

Dota 2 as an AI Benchmark

Dota 2 is a multiplayer online battle arena (MOBA) game featuring 5v5 team-based matches in which players control unique heroes in a real-time environment. Each team aims to destroy the opponent's ancient structure by managing resources such as gold and experience gained from defeating enemy creeps, neutral monsters, and opposing heroes, while strategically using hero-specific abilities and purchasing items to enhance capabilities. The game's mechanics include a fog of war that limits visibility to areas within sight range of friendly units, introducing partial observability and requiring players to infer enemy positions from limited information. Released on July 9, 2013, by Valve, Dota 2 built upon the Warcraft III mod Defense of the Ancients and quickly became a prominent title with an open scripting API that facilitated bot development. This API, exposed through Lua, allowed researchers and developers to query game states and issue commands, enabling the creation of AI agents. AI research in gaming has progressed from two-player, perfect-information environments like chess, where IBM's Deep Blue defeated world champion Garry Kasparov in 1997 using brute-force search and evaluation functions, to more complex board games like Go. DeepMind's AlphaGo, in 2016, surpassed human experts in Go through deep neural networks and tree search, handling vast state spaces in a perfect-information game but still in a turn-based, single-opponent setting. Dota 2 represents a significant leap to multi-agent, imperfect-information scenarios, where five agents per team must collaborate in real time against coordinated opponents, mirroring real-world challenges in cooperative systems. The game's challenges make it a rigorous benchmark for reinforcement learning: matches typically last 30–45 minutes but have no fixed upper limit, sometimes exceeding 60 minutes, involving long time horizons that demand sustained planning over thousands of decisions. Partial observability via the fog of war requires agents to manage uncertainty and predict adversary actions, while near-continuous action spaces, such as precise movement and ability targeting, complicate exploration in reinforcement learning. Additionally, effective teamwork among the five agents is essential, as success depends on role specialization, implicit coordination, and adaptive strategies in a highly interactive, non-stationary environment.

Training Process and Milestones

OpenAI's engagement with Dota 2 as an AI research domain began in early 2017 with the development of a separate 1v1 bot, featuring initial prototypes focused on simplified matches that tested core techniques before scaling toward team-based play. By late summer 2017, this 1v1 bot had advanced sufficiently to compete against professional players, marking an early breakthrough in adapting to the game's strategic depth. The OpenAI Five project, specifically targeting the full 5v5 format with its team coordination challenges, commenced training in June 2018 and was first publicly introduced on June 25, 2018. This laid the groundwork for the more complex training regimes that followed. Central to the training methodology was Proximal Policy Optimization (PPO), a policy gradient reinforcement learning algorithm scaled to handle the high-dimensional action and observation spaces of Dota 2. To manage the game's complexity, curriculum learning was employed, beginning with a restricted environment featuring approximately 18 heroes, with item purchases handled via scripted logic, then progressively expanding the hero pool to 25 heroes and incorporating additional mechanics, though the final competitive system used a pool of 17 heroes rather than the full roster exceeding 100 options. Self-play formed the core of the training process, with agents playing millions of games daily against evolving versions of themselves, equivalent to 180 years of gameplay per day, enabling rapid skill acquisition through competitive pressure. Diversity was maintained by playing 80% of games against the latest policy and 20% against past versions, preventing convergence on narrow strategies and contrasting with the league-based training used in systems like AlphaStar. Key milestones unfolded in 2018, with the first full 5v5 self-play games achieved in June, demonstrating viable team-level performance after months of iterative training. In August 2018, OpenAI Five played live demonstration matches against professional players at The International, losing both games but showcasing emergent behaviors like coordinated denies and last-hitting, though it still struggled with long-term planning. The system's capabilities advanced significantly in 2019 through extended training with eight times more compute, culminating on April 13, 2019, when OpenAI Five defeated OG, the reigning world champion Dota 2 team, by winning two back-to-back games (2–0) in a best-of-three series during a livestreamed exhibition. Following this achievement, OpenAI released the final version of OpenAI Five in an online arena, allowing humans to compete against or team up with the AI, marking the project's transition to public accessibility. The final version, trained over 10 months, represented the culmination of these efforts.
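The 80/20 opponent-sampling scheme can be illustrated with a short sketch. The Python below is a hypothetical illustration, not OpenAI's training code: the class and method names are invented, and the real system reportedly weighted past opponents by learned quality scores rather than sampling them uniformly.

```python
import random

class OpponentPool:
    """Minimal sketch of self-play opponent sampling (assumed interface)."""

    def __init__(self):
        self.past_checkpoints = []  # snapshots of earlier policy versions

    def add_checkpoint(self, params):
        # Periodically archive the current policy as a future opponent
        self.past_checkpoints.append(params)

    def sample_opponent(self, latest_params):
        # 80% of games: latest policy vs. itself; 20%: vs. a past version.
        # (OpenAI Five reportedly weighted past versions by quality scores;
        # uniform choice here is a simplification.)
        if not self.past_checkpoints or random.random() < 0.8:
            return latest_params
        return random.choice(self.past_checkpoints)
```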

Hardware and Computational Resources

OpenAI Five's training demanded immense computational scale, relying on a distributed cluster comprising 256 NVIDIA Tesla P100 GPUs (on Google Cloud Platform) and 128,000 CPU cores to run simulations equivalent to 180 years of gameplay per day. This infrastructure allowed the system to generate and process vast amounts of experience data through parallel rollouts, with the training process spanning approximately 10 months. The setup used Google Cloud Platform for the scaled 5v5 training and Microsoft Azure for earlier iterations: the primary configuration ran 256 P100 GPUs and 128,000 preemptible CPU cores on GCP, while earlier versions employed 256 K80 GPUs on Azure. The cluster was integrated with Kubernetes for orchestration and custom internal tools to coordinate thousands of processes. Optimizations focused on efficient gradient synchronization and simulation throughput, ensuring stable training despite the high dimensionality of the 5v5 environment. During live matches against professional teams, the system operated with an average reaction time of about 200 milliseconds, adjusted to approximate human response times and match the real-time demands of Dota 2. Custom software pipelines minimized communication overhead, allowing parallel GPU computations to overlap with network delays. Resource needs evolved progressively: earlier iterations, such as the 1v1 bot, trained on smaller clusters before scaling to the full 5v5 configuration, which roughly doubled the number of CPU cores (from 60,000 to 128,000) and upgraded from 256 K80 GPUs to 256 P100 GPUs to handle team coordination and long-term strategy. The overall effort equated to tens of thousands of years of simulated gameplay, with costs estimated in the millions of dollars, funded through OpenAI's own investments.
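The division of labor in this pipeline, rollout workers simulating games and publishing samples to an experience buffer from which optimizers pull batches, can be sketched schematically. The Python below is an assumption-laden simplification, not OpenAI's code: `env` and `policy` are assumed interfaces, and an in-process queue stands in for the networked experience buffer that connected separate machines in the real system.

```python
import queue

# In-process stand-in for the distributed experience buffer
experience_buffer = queue.Queue(maxsize=100_000)

def rollout_worker(env, policy):
    """Simulate games and publish (obs, action, reward, done) samples."""
    obs = env.reset()
    while True:
        action = policy.act(obs)               # forward pass (GPU in practice)
        next_obs, reward, done = env.step(action)
        experience_buffer.put((obs, action, reward, done))
        obs = env.reset() if done else next_obs

def optimizer_loop(policy, batch_size=4096):
    """Pull batches from the buffer and take PPO gradient steps."""
    while True:
        batch = [experience_buffer.get() for _ in range(batch_size)]
        policy.update(batch)                   # one synchronized update
```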

Technical Architecture

Reinforcement Learning Framework

OpenAI Five employs Proximal Policy Optimization (PPO) as its primary reinforcement learning algorithm, an on-policy method designed to enhance sample efficiency and training stability in complex environments. PPO builds on policy gradient techniques by constraining policy updates to prevent destabilizing shifts, making it particularly suitable for the high-dimensional, partially observable setting of Dota 2. This approach allows the system to improve iteratively through self-play, where agents learn from interactions with versions of themselves. At the core of PPO in OpenAI Five is an actor-critic architecture: the actor is the policy network that outputs a probability distribution over possible actions, and the critic estimates the value function to predict future rewards from given states. The advantage function, computed using generalized advantage estimation (GAE), quantifies how much better an action is than the average, guiding policy updates. To promote exploration and avoid premature convergence to suboptimal policies, the training incorporates entropy regularization, which adds a term to the objective that penalizes overly deterministic action distributions. The multi-agent nature of OpenAI Five, involving five cooperative agents per team, is addressed through centralized training with decentralized execution. During training, joint trajectories from games are collected and used to update all agents' policies simultaneously, enabling coordinated learning without explicit communication. In execution, each agent operates independently based on its local observations, fostering emergent teamwork. Reward shaping plays a crucial role in encouraging cooperative behaviors: the shared reward signal includes penalties for individual deaths (−1 per death), alongside terms for other game events such as hero kills (−0.6 per kill), last hits (−0.16 per last hit), and denies (0.15 per deny), aligning incentives with team success (the kill and last-hit terms are negative to offset the gold and experience rewards those events already confer). PPO's optimization relies on a clipped surrogate objective that balances exploration and exploitation while limiting update magnitudes:

L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t \left[ \min\left( r_t(\theta)\, \hat{A}_t,\; \text{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) \hat{A}_t \right) \right]

Here, r_t(\theta) denotes the probability ratio between the new and old policies for the action taken at timestep t, \hat{A}_t is the estimated advantage, and \epsilon is the clipping parameter, set to 0.2, ensuring updates remain proximate to the previous policy and maintaining approximately monotonic improvement. The full objective combines this with a value function loss and an entropy term, weighted by hyperparameters, to jointly optimize the policy, the value estimate, and exploration.
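The clipped objective translates directly into code. Below is a minimal PyTorch sketch of the combined PPO loss; the coefficient values other than ε = 0.2 (which the text specifies) are illustrative assumptions, and the batching is simplified relative to OpenAI Five's actual training stack.

```python
import torch

def ppo_loss(new_logp, old_logp, advantages, values, returns, entropy,
             clip_eps=0.2, vf_coef=0.5, ent_coef=0.01):
    """Clipped PPO surrogate plus value and entropy terms (sketch)."""
    ratio = torch.exp(new_logp - old_logp)            # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Negated because optimizers minimize; min() implements the clip rule
    policy_loss = -torch.min(unclipped, clipped).mean()
    value_loss = (returns - values).pow(2).mean()     # critic regression
    # Entropy bonus is subtracted so that maximizing entropy lowers the loss
    return policy_loss + vf_coef * value_loss - ent_coef * entropy.mean()
```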

Neural Network Design

The neural network architecture powering each agent in OpenAI Five centers on a recurrent core: a single-layer long short-term memory (LSTM) unit with 4096 hidden units, designed to manage sequential decision-making amid the game's temporal dynamics. This LSTM functions as the primary mechanism for memory, recurrently updating its hidden state based on prior observations to capture recent game history. The architecture incorporates separate output heads attached to the LSTM: a policy head that generates action probabilities and a value head that estimates expected future rewards, enabling the agent to balance immediate actions with long-term assessment. Input processing begins with the game's raw state, which is transformed into a high-dimensional vector comprising approximately 16,000 values. This vector aggregates diverse elements, including positions and statuses of allied and enemy units, hero health and levels, inventory details, and a discretized map representation encoding spatial awareness across the battlefield. To handle the complexity, observations are masked for unavailable information (e.g., enemies hidden by the fog of war) and concatenated into a fixed-size input fed sequentially into the LSTM. The policy head outputs a distribution over approximately 8,000–80,000 discretized actions, derived from the continuous control space of Dota 2 through a hierarchical, grammar-based tokenization system. This system abstracts actions into an 8-token sequence, such as selecting a unit type, target, or movement direction, reducing the effective output complexity while preserving expressiveness for intricate maneuvers like item usage or ability targeting. The value head, in contrast, produces a scalar estimate of the game's outcome probability, aiding reward prediction during training. These components are optimized jointly using proximal policy optimization (PPO) to refine the agent's performance. All five agents share identical neural network parameters, promoting consistent learned behaviors across the team, while each maintains its own independent LSTM hidden state, facilitating decentralized coordination without explicit communication. This design allows emergent teamwork, such as synchronized attacks, while avoiding centralized bottlenecks in real-time play.
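A compact PyTorch sketch of this layout appears below. Only the single-layer 4096-unit LSTM, the ~16,000-value input, and the separate policy/value heads come from the description above; the single linear encoder and the flat action head are simplifications, since the real network uses richer per-unit processing and a hierarchical action grammar.

```python
import torch
import torch.nn as nn

class AgentNet(nn.Module):
    """Simplified sketch: flat observation -> LSTM -> policy + value heads."""

    def __init__(self, obs_dim=16_000, hidden=4096, n_actions=8_000):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden)            # embed observation
        self.lstm = nn.LSTM(hidden, hidden, num_layers=1,
                            batch_first=True)               # memory over time
        self.policy_head = nn.Linear(hidden, n_actions)     # action logits
        self.value_head = nn.Linear(hidden, 1)              # outcome estimate

    def forward(self, obs_seq, state=None):
        # obs_seq: [batch, time, obs_dim]; state carries LSTM memory
        x = torch.relu(self.encode(obs_seq))
        x, state = self.lstm(x, state)
        return self.policy_head(x), self.value_head(x), state
```

All five agents would share one such set of parameters while each keeps its own `state`, matching the shared-weights, independent-memory design described above.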

Action and Observation Spaces

OpenAI Five interfaces with the environment through a structured observation space that captures the game's partial observability, mirroring the fog of war mechanic in which only areas near friendly units are visible. The agent receives approximately 16,000 numerical values per time step, comprising floats and categorical data for visible friendly units (up to 200, including positions, health, and attributes), enemy units when in vision range, neutral creeps, buildings, projectiles, and global state vectors like game time, gold, experience, and score differentials. Teammate positions and states are fully shared among agents, but enemy details are obscured beyond vision limits, requiring inference from incomplete information to anticipate threats or strategies. To handle hidden enemy states, OpenAI Five reuses the last observed data for units entering the fog of war, maintaining a static snapshot until they reappear, which encourages the agent to model potential movements based on prior visibility and game dynamics. This approach simulates real-world uncertainty without direct access to the full map state, emphasizing strategic reasoning over complete knowledge. The observation space excludes raw pixels, instead using semantic features like relative positions and velocities to reduce dimensionality while preserving essential tactical details. The action space employs a discrete, hierarchical structure to navigate Dota 2's enormous set of raw possibilities, which exceeds billions of combinations due to sequences of movements, attacks, abilities, and purchases. Actions are parsed via a grammar that first selects a primary action, such as directional movement (discretized into 360 degrees), an attack on a specific unit, an ability cast (targeting a unit or area), a stop command, or item usage, and then refines parameters like position or unit selection, ensuring only valid combinations are considered. This reduces the effective space from over 1.8 million parameterized actions per timestep to approximately 8,000–80,000 valid options, filtering out impossibilities like targeting non-existent units or using on-cooldown abilities. Temporally, observations and actions operate at 7.5 Hz, every fourth frame of the game's 30 Hz tick rate (a frameskip of 4), for computational efficiency. This frequency yields 150–170 actions per minute in practice, with built-in compensation for execution delays in the multiplayer client, ensuring responsive play without requiring frame-perfect precision.
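Validity filtering of this kind is commonly implemented by masking action logits before sampling. The sketch below shows the general technique under stated assumptions; it is not OpenAI Five's actual code, and the construction of `valid_mask` from the game state is assumed.

```python
import torch

def masked_action_sample(logits, valid_mask):
    """Sample only from currently valid actions (illustrative sketch).

    logits: [n_actions] float tensor of raw policy-head outputs.
    valid_mask: [n_actions] bool tensor; False marks invalid actions
    (e.g., on-cooldown abilities or absent targets), assumed to be
    produced by a game-state parser.
    """
    # Invalid actions get -inf logits, so they receive zero probability
    masked = logits.masked_fill(~valid_mask, float("-inf"))
    dist = torch.distributions.Categorical(logits=masked)
    action = dist.sample()
    return action, dist.log_prob(action)
```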

Performance and Matches

Self-Play and Internal Validation

OpenAI Five's training relied on a self-play mechanism in which the five agents competed against evolving versions of themselves, enabling the system to generate diverse experiences and refine strategies without human intervention. This approach, starting from random weights, provided a natural curriculum for exploration, with the agents playing the equivalent of 180 years of games daily across distributed compute resources. To ensure stable learning and exposure to challenging opponents, approximately 80% of games pitted the current model against itself, while 20% involved matches against prior versions. This self-improvement process, rooted in techniques like those pioneered by TD-Gammon, allowed OpenAI Five to surpass initial baselines rapidly. Internal validation focused on benchmarks that measured progress against controlled opponents, including open-source bots and earlier model iterations. The system achieved win rates exceeding 95% against basic open-source bots early in development, demonstrating foundational competence in mechanics like last-hitting and laning. More advanced evaluations used internal tournaments simulating professional-level scenarios, where OpenAI Five's performance was tracked via the TrueSkill rating system; by mid-training, it consistently defeated legacy versions with win rates approaching 99.9%, indicating superhuman coordination in 5v5 matches. These closed-loop tests helped quantify improvements in team synergy and decision-making under uncertainty. Iterative enhancements addressed key challenges through targeted debugging and ablation studies, particularly in areas like agent coordination and credit assignment. Early iterations exhibited issues with shared resource management, such as inefficient item distribution (e.g., multiple agents purchasing overlapping team items like wards), which was mitigated by introducing hard-coded sub-systems for item buying that assigned responsibilities dynamically. The process generated vast datasets for analysis, with over 10 months of training producing an extensive corpus of replay data. This replay corpus facilitated post-hoc examination of emergent behaviors, such as coordinated ambushes, and informed refinements to the reward function for sparse objectives like Roshan kills. Such scale underscored the computational intensity required for internal validation, ensuring robustness before broader evaluations.
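Version-over-version progress tracking with TrueSkill can be reproduced with the open-source `trueskill` Python package. The sketch below is illustrative only: the version names and match result are invented, and OpenAI's internal evaluation harness was far more elaborate than pairwise updates.

```python
import trueskill  # pip install trueskill

# Each policy checkpoint gets a TrueSkill rating; head-to-head evaluation
# games between checkpoints update the ratings over time.
ratings = {"v100": trueskill.Rating(), "v200": trueskill.Rating()}

def record_match(winner, loser):
    """Update both versions' ratings after one evaluation game."""
    ratings[winner], ratings[loser] = trueskill.rate_1vs1(
        ratings[winner], ratings[loser]
    )

record_match("v200", "v100")  # hypothetical result: newer version wins
print({name: round(r.mu, 1) for name, r in ratings.items()})
```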

Matches Against Human Professionals

OpenAI Five's first high-profile public matches against human professionals occurred at The International 2018 in August, where it competed in exhibition games against top-tier teams. In the first match, against the Brazilian professional team paiN Gaming, OpenAI Five lost after 51 minutes, having regained ground mid-game before succumbing to strategic pushes. The second match pitted OpenAI Five against a team of veteran Chinese professionals, including Xiao8, ROtK, Ferrari_430, and SanSheng, resulting in a loss after 45 minutes of competitive play, highlighting the AI's strengths in early-game macro planning but vulnerabilities to human improvisation and mind games. These streamed events, featuring live commentary, showcased the AI's potential despite the defeats, as it demonstrated superior mechanical execution in controlled scenarios. Building on this experience, OpenAI Five achieved a breakthrough in April 2019 during the OpenAI Five Finals, a live-streamed best-of-three series against OG, the reigning world champions from The International 2018. The matches used five identical accounts per side, with the AI operating without human intervention or communication, restricted to a pool of 17 heroes to manage complexity. OpenAI Five secured a decisive 2–0 victory, winning the first game in approximately 34 minutes through aggressive split-pushing and near-flawless team coordination that pressured OG's defenses, and the second in approximately 21 minutes by adapting to OG's counter-strategies with precise buyback timings and objective control. Notable moments included the AI's execution of macro strategies like split-pushing to divide enemy forces, though it occasionally exhibited micro errors such as suboptimal last-hitting for gold efficiency. The AI's ability to cope with human unpredictability was evident in its adjustments to feints and unconventional plays, and it maintained an estimated 80–90% win probability in internal evaluations leading up to the event.

Key Metrics and Outcomes

OpenAI Five demonstrated exceptional performance metrics in its engagements with human players. In June 2018, it achieved a 100% win rate across multiple games against amateur and semi-professional teams during benchmark matches. By April 2019, the system had defeated the reigning world champions, OG, in a best-of-three exhibition series, securing a 2–0 victory and marking the first instance of an AI system beating an esports world champion team in a full 5v5 match. In a subsequent public arena event, OpenAI Five recorded a 99.4% win rate, winning 7,215 out of 7,257 competitive games against a broad spectrum of opponents over three days. Efficiency metrics highlighted OpenAI Five's operational parity with professional players: the system executed approximately 150–170 actions per minute (APM), comparable to though below the 200–300 APM typical of pros, while reacting to game events in an average of 217 milliseconds. These capabilities enabled tight coordination among agents, as evidenced by their ability to execute strategies that outperformed human teams in objective control and combat engagements during the 2019 finals. The project concluded in 2019 following the release of its final version, with OpenAI publishing a comprehensive paper detailing the system's architecture and training methodology. Supplemental materials, including technical details and replays, were made available to the research community to facilitate further study in reinforcement learning and multi-agent systems.

Comparisons and Analysis

With Other Game AI Systems

OpenAI Five represented a significant advancement in applying deep reinforcement learning (RL) to multi-agent environments with imperfect information, differing markedly from DeepMind's AlphaGo and AlphaZero, which excelled in perfect-information board games like Go. Whereas AlphaGo and AlphaZero relied on self-play combined with Monte Carlo tree search (MCTS) to master board positions under complete observability, OpenAI Five operated in real time, handling partial observations due to the fog of war and coordinating five agents simultaneously without explicit human knowledge or search algorithms. This shift addressed the challenges of imperfect information and multi-agent credit assignment, where agents must infer hidden states and align actions without direct communication, contrasting with AlphaGo's focus on evaluating board positions in a turn-based setting. In comparison with DeepMind's AlphaStar, developed for StarCraft II around the same period, OpenAI Five defeated world champions in April 2019, whereas AlphaStar's initial show matches against professionals occurred in January 2019 and its grandmaster status was reached in October 2019. Both systems leveraged actor-critic architectures with LSTMs to manage partial observability in complex, real-time games, but OpenAI Five employed a simpler proximal policy optimization (PPO) algorithm scaled massively through self-play, whereas AlphaStar utilized more sophisticated population-based training with a league of diverse agents to enhance robustness and exploration. This methodological difference highlighted OpenAI Five's emphasis on efficient scaling of established techniques over hybrid approaches, enabling it to train on comparable computational resources with streamlined multi-agent coordination. Relative to earlier Dota 2 bots, such as rule-based scripted systems like DotaScript or OpenAI's initial 2017 1v1 RL agent, OpenAI Five pioneered end-to-end learning from structured game states, bypassing hand-crafted heuristics in favor of policies and value functions derived purely from RL. Prior bots depended on predefined rules for pathfinding, combat, and resource management, limiting adaptability in dynamic scenarios, whereas OpenAI Five's neural networks processed high-dimensional observations (roughly 16,000 features) to learn emergent strategies like team positioning and item builds through 180 years of simulated gameplay per day. This end-to-end paradigm marked a departure from modular, scripted designs, enabling superhuman coordination without domain-specific engineering. These developments, including OpenAI Five, exemplified the broader surge in deep RL applications to complex games following AlphaGo's 2016 breakthrough, which demonstrated RL's potential for superhuman performance and inspired scaled self-play efforts in multi-agent domains by 2019.

Strengths Relative to Dota 2's Complexity

OpenAI Five's design excels in team coordination by enabling five independent policy instances to cooperate without explicit communication protocols or pre-programmed strategies, allowing emergent behaviors to arise through reinforcement learning. For instance, the agents learned to implicitly share vision by positioning themselves to reveal map areas for teammates and to execute coordinated maneuvers, such as baiting enemies into ambushes by feigning vulnerability in one location while preparing a counterattack elsewhere. This capability addresses Dota 2's demand for real-time teamwork in a partially observable environment, where human players often struggle with miscommunication or delayed responses. The system's scalability to Dota 2's vast complexity is demonstrated by its handling of a game whose full roster exceeds 100 heroes, each with unique abilities, talents, and synergies, and whose combinatorial item builds number in the thousands, all managed through end-to-end learned policies rather than rule-based heuristics, even though the final competitive system played with a restricted pool of 17 heroes. Unlike traditional AIs limited to simpler domains, OpenAI Five processes an observation space of roughly 16,000 values per agent, encompassing unit positions, health, abilities, and game state, while selecting from a discretized action space of tens of thousands of valid options every few frames, enabling it to outperform human teams in long-term planning over games lasting up to 50 minutes or roughly 20,000 timesteps. This learned approach allows the AI to optimize resource allocation, drafting, and strategic depth without human-engineered shortcuts, anticipating multi-step consequences across the game's branching possibilities. Adaptability to Dota 2's evolving metagame is a key strength: OpenAI Five could be rapidly fine-tuned after game patches that altered abilities, item effects, or balance, often recovering performance within days of retraining on the updated environment. For example, after a major patch, the system was fine-tuned from its prior parameters rather than retrained from scratch, quickly regaining and exceeding its previous benchmarks by leveraging transfer from the original training. This contrasts with human players, who may require weeks or months to adapt to meta shifts, highlighting the AI's efficiency in redistributing computational resources to explore new strategies. In quantitative terms, OpenAI Five exhibited higher consistency in late-game scenarios thanks to parallel processing and fatigue-free evaluation of complex states, where human cognitive limits often lead to errors. During matches against professional teams like OG, the AI maintained win-probability estimates above 80% in extended games, executing precise kiting, positioning, and ability usage in team fights involving dozens of interacting elements, thereby capitalizing on Dota 2's high-variance endgame dynamics more reliably than humans.

Limitations and Challenges

Despite achieving superhuman performance in restricted scenarios, OpenAI Five exhibited notable gaps in human-like reasoning, particularly struggling with creative strategies and the psychological elements of gameplay. For instance, during matches against professional players, the system failed to adapt to feints and unconventional tactics, such as baiting the bots into overcommitting resources, leading to exploitable errors. This over-reliance on patterns derived from vast training data limited its ability to handle novel, intuition-driven maneuvers that humans employ in high-stakes matches. Scalability posed significant challenges, primarily due to enormous computational requirements that rendered continued development prohibitively expensive. The system was trained using up to 1,536 GPUs over approximately 10 months, equivalent to playing 180 years of games per day, highlighting the intensive resources needed for such tasks. Furthermore, the AI could not generalize to new game versions or patches without retraining, as Dota 2's frequent updates introduced mechanics outside its learned distribution, necessitating costly recomputation for any adaptations. OpenAI Five also faced ethical and technical hurdles, including the potential to exploit unintended bugs or glitches during training and play. To mitigate this, developers imposed restrictions, such as disabling certain item interactions and vision mechanics, which prevented the AI from developing exploitative behaviors but underscored the risks of RL agents discovering and amplifying flaws in simulated environments. Additionally, the opaque nature of its deep neural network's decisions contributed to a lack of explainability, making it difficult to interpret why specific actions were chosen in complex scenarios, a common challenge in large-scale deep learning systems. The project was ultimately discontinued in April 2019 following its victory over world champions OG, as OpenAI determined it had demonstrated the feasibility of scaled reinforcement learning for complex tasks and shifted resources toward broader goals, such as developing more general-purpose models. This move reflected diminishing returns expected from further Dota 2-specific advancements, with the team noting that continued progress would yield marginal gains relative to emerging opportunities in other domains.

Reception and Impact

Media and Public Response

The announcement and demonstrations of OpenAI Five in 2018 and 2019 garnered extensive media coverage from prominent technology and gaming outlets. The Verge reported on the AI's decisive 2–0 victory over world champion team OG in April 2019, highlighting its ability to outmaneuver professional players in full five-versus-five matches. Wired covered the initial June 2018 reveal, where OpenAI Five defeated semiprofessional human teams, emphasizing the breakthrough in scaling reinforcement learning to complex, team-based gameplay. These reports framed the project as a milestone in AI's application to multiplayer esports. Public reaction was marked by widespread awe at OpenAI Five's coordinated performance, often described as demonstrating "superhuman teamwork" through seamless agent collaboration without predefined scripts. Live streams of key events, such as the August 2018 benchmark and The International exhibition matches, drew over 100,000 concurrent viewers on Twitch, underscoring the events' draw for enthusiasts. Commentary in gaming media like GosuGamers portrayed the AI's displays as "intimidating" and innovative, sparking excitement about AI's potential to enhance competitive gaming. AI researchers lauded OpenAI Five for advancing multi-agent reinforcement learning, with the project's paper widely cited for its demonstration of scaling self-play to superhuman levels in a partially observable environment. Experts noted the system's handling of long-term strategy and inter-agent coordination as a significant step forward, though some critiques in outlets like The Verge questioned whether the hype overshadowed practical limitations, such as the AI's reliance on simplified item usage during early demos. The project also influenced discussions on AI's role in esports, bolstered by Valve's cooperation through its Bot API and integration into events like The International. This collaboration highlighted AI's viability for esports innovation, prompting broader conversations about how such technologies could augment rather than replace human competition.

Influence on Research and Applications

OpenAI Five significantly advanced reinforcement learning (RL) methodologies, particularly by demonstrating the efficacy of proximal policy optimization (PPO) in large-scale, multi-agent environments. The system employed a scaled-up version of PPO to train five coordinated agents in the complex, real-time game Dota 2, enabling them to learn cooperative behaviors through self-play without human intervention. This approach highlighted PPO's robustness in handling partial observability, long-term planning, and inter-agent coordination, setting a precedent for applying the method beyond single-agent tasks. Furthermore, OpenAI's open-sourcing of the Baselines repository, which included PPO implementations refined during OpenAI Five's development, directly influenced subsequent libraries such as Stable Baselines, facilitating easier adoption and experimentation in the research community. The techniques pioneered in OpenAI Five have extended to broader applications, notably in robotics and simulation-based domains requiring multi-agent coordination. In robotics, PPO's sample-efficient optimization has been adapted for multi-robot systems, where agents must collaborate on tasks like exploration and mapping in shared spaces, drawing on OpenAI Five's emphasis on scalable training for emergent cooperation. In real-world simulations, such as urban traffic control, these methods inform multi-agent models that optimize vehicle flows and signal timings, leveraging insights into handling high-dimensional state spaces and non-stationary environments. As of November 2025, the foundational paper, "Dota 2 with Large Scale Deep Reinforcement Learning," had garnered over 2,600 academic citations, reflecting its enduring influence on the multi-agent reinforcement learning literature. This body of work has inspired subsequent projects, including DeepMind's efforts in multi-agent systems like AlphaStar, which also employed self-play paradigms to tackle strategic games with imperfect information. On the industry front, OpenAI Five underscored the viability of deep RL at massive scales, training on billions of frames across thousands of machines, paving the way for increased investment in embodied AI, where agents interact with physical environments. This demonstration of computational feasibility encouraged funding in areas like autonomous systems and humanoid robotics, though no direct commercial products emerged from OpenAI Five itself. Notably, the PPO algorithm refined for OpenAI Five became a cornerstone of later innovations, serving as the optimization backbone in reinforcement learning from human feedback (RLHF) pipelines that aligned large language models such as those powering ChatGPT.

References

1. OpenAI Five. OpenAI, Jun 25, 2018. OpenAI Five plays 180 years' worth of games against itself every day, learning via self-play; it trains using a scaled-up version of Proximal Policy Optimization.
2. Dota 2 with Large Scale Deep Reinforcement Learning. arXiv:1912.06680, Dec 13, 2019. OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
3. OpenAI Five defeats Dota 2 world champions. OpenAI, Apr 15, 2019. OpenAI Five is the first AI to beat the world champions in an esports game, having won two back-to-back games versus the world champion Dota 2 team.
4. Dota 2 on Steam. Valve. Developer and publisher: Valve; released Jul 9, 2013.
5. Dota 2 with Large Scale Deep Reinforcement Learning (PDF). OpenAI, Dec 13, 2019. Dota 2 includes a scripting API designed for building bots; the API is exposed through Lua and has methods for querying the visible game state.
6. Dota 2. Valve Developer Community. Dota 2 is an "Action RTS" where players control heroes with unique abilities, and is among the most-played games on Steam with millions of players.
7. OpenAI Bot Defeated The World's Best Dota 2 Player. The world's best Dota 2 player Dendi was defeated by an artificial intelligence in a 1v1 game during Valve's yearly Dota 2 tournament.
8. Deep Blue. IBM. Deep Blue derived its chess prowess through brute-force computing power, using 32 processors to perform coordinated, high-speed computations.
9. Mastering the game of Go with deep neural networks and tree search. Nature, Jan 27, 2016. AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion.
10. More on Dota 2. OpenAI, Aug 16, 2017. Further training before Sumail's match increased TrueSkill by two points; the bot had learned to cast razes.
11. The International 2018: Results. OpenAI, Aug 23, 2018. OpenAI Five lost two games against top Dota 2 players at The International.
12. Proximal Policy Optimization Algorithms. arXiv:1707.06347, Jul 20, 2017. Proposes a new family of policy gradient methods for reinforcement learning that alternate between sampling data through interaction with the environment and optimizing a surrogate objective.
13. [AN #82]: How OpenAI Five distributed their training computation. Jan 15, 2020. Rollout worker CPUs simulate the game, send observations to the forward-pass GPUs, and publish samples to the experience buffer.
14. Long-Term Planning and Situational Awareness in OpenAI Five (PDF). arXiv, Dec 13, 2019. AlphaStar and OpenAI Five have shown that agents can be trained without explicit hierarchical macro-actions to reach superhuman skill.
15. Open AI bots fall to human Dota 2 players at The International 2018. Aug 25, 2018. The bots put up a good fight but were ultimately beaten 2-0 in a best-of-three series.
16. AI triumphs against the world's top pro team in strategy game Dota 2. Apr 13, 2019. OpenAI wants to show that AI is our ally, not our enemy; competitive games are a great environment to demonstrate what AI can achieve.
17. Humans grab victory in first of three Dota 2 matches against OpenAI. Aug 23, 2018. The bot team, dubbed OpenAI Five, played well but made some key mistakes, including an unusual fixation on a neutral character in the game.
18. Humans are still better than AI at Dota 2—for now. Aug 23, 2018. During the match, OpenAI Five exhibited some odd behavior before suffering a narrow defeat, but overall acquitted itself well.
19. AI bots trained for 180 years a day to beat humans at Dota 2. Jun 25, 2018. Although OpenAI's bots were playing 5v5 matches, they were not yet exposed to the full complexity of Dota 2; a number of limitations were in place.
20. OpenAI retires its Dota-2 playing bots after crushing e-sport pros. Apr 15, 2019. OpenAI Five thrashed team OG, the reigning human champions of Dota 2, in live-streamed matches.
21. OpenAI's Dota 2 AI steamrolls world champion e-sports team. Apr 13, 2019. OG took the No. 1 spot at The International, the premier annual Dota 2 tournament with prizes totaling $25 million.
22. Can Bots Outwit Humans in One of the Biggest Esports Games? Jun 25, 2018. Each day, OpenAI Five played the equivalent of 180 years of Dota 2; no human has 180 years to learn a video game.
23. A team of AI algorithms just crushed humans in a complex computer game. Jun 25, 2018. Five AI agents teamed up to defeat humans in Dota 2, a popular strategy computer game.
24. OpenAI Five Benchmark: Results. OpenAI, Aug 6, 2018. After the game 2 draft, OpenAI Five predicted a 76.2% win probability and won the second game in 24 minutes and 53 seconds; game 3 featured an audience draft.
25. Bots decimate humans in the OpenAI exhibition match. GosuGamers. The OpenAI Five team made a fantastic display of their intimidating strength in the show match against some of the best players in the world.
26. OpenAI's Dota 2 defeat is still a win for artificial intelligence. Aug 28, 2018. AI bots made by research lab OpenAI were defeated by human pro gamers at Dota 2 at The International.
27. The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games (PDF). Nov 4, 2022. A comprehensive empirical study examining the performance of PPO on four popular cooperative multi-agent benchmarks.
28. OpenAI Baselines: high-quality implementations of reinforcement learning algorithms. GitHub. A set of high-quality implementations of reinforcement learning algorithms intended to make research easier for the community.
29. Proximal Policy Optimization. OpenAI, Jul 20, 2017. PPO enables training AI policies in challenging environments, such as the Roboschool task where an agent tries to reach a target.
30. Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications. Real-world experiments with five ground robots show the efficacy of the proposed method for coordination strategies in multi-robot exploration and mapping.
31. arXiv:1911.06294 (PDF). Nov 14, 2019. Notes that to maintain effective traffic movement, signal re-timing is conducted every three to five years for each intersection, motivating learned traffic control.
32. Dota 2 with Large Scale Deep Reinforcement Learning (PDF). PPO and self-play with multiple competing agents, employing only a simple reward structure.
33. A Comprehensive Review of Multi-Agent Reinforcement Learning (PDF). Sep 4, 2025. AI research has traditionally focused on turn-based games like backgammon and Go, where agents have unlimited time to compute optimal strategies.
34. Embodied AI: How the US Can Beat China to the Next Tech Frontier. Jul 15, 2025. US policymakers face a variety of challenges along the embodied AI value chain amid great-power competition.
35. Proximal Policy Optimization (PPO): The Key to LLM Alignment. Oct 23, 2023. PPO works well and is easy to understand and use, making it a desirable algorithm from a practical perspective.
36. Aligning language models to follow instructions. OpenAI, Jan 27, 2022. Language models trained to follow user intentions better than GPT-3 while also being more truthful and less toxic.
37. OpenAI Five Benchmark Results. Official OpenAI blog post detailing the August 5, 2018, benchmark event where OpenAI Five defeated a team of high-percentile players in streamed matches.
38. How Long Are Dota 2 Games – Match Length Guide. Article discussing average Dota 2 match durations, confirming typical lengths of 30–40 minutes for standard matches without a fixed upper limit.
39. New record set for longest Dota 2 matchmaking game. Report on the longest recorded Dota 2 match exceeding 21 hours, illustrating the absence of a time limit.
40. OpenAI Five Defeats Dota 2 World Champions. Official OpenAI blog post from April 2019 stating that the current version of OpenAI Five had been training continuously since June 2018.
41. Dota 2 Paper: OpenAI Five. Official technical paper from OpenAI detailing the architecture, training process, and specifics such as scripted item buying for OpenAI Five.
42. OpenAI Five Defeats World Champions. Official OpenAI blog post announcing the victory over OG, including details on the restricted 17-hero pool used in the matches and brief training with up to 25 heroes.
43. OpenAI Five Benchmark. Official OpenAI page detailing the benchmark matches, including adjustments to reaction time for fairness in live competitions.