References
-
[1]
[1811.12560] An Introduction to Deep Reinforcement Learning - arXiv · Nov 30, 2018 · Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve ...
-
[2]
Human-level control through deep reinforcement learning - Nature · Feb 25, 2015 · To achieve this, we developed a novel agent, a deep Q-network (DQN), which is able to combine reinforcement learning with a class of artificial ...
-
[3]
[1312.5602] Playing Atari with Deep Reinforcement Learning - arXiv · Dec 19, 2013 · We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.
-
[4]
Mastering the game of Go with deep neural networks and tree search · Jan 27, 2016 · Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go ...
-
[5]
Deep Reinforcement Learning - an overview | ScienceDirect Topics · Deep reinforcement learning (DRL) is defined as a combination of deep learning (DL) and reinforcement learning (RL) principles, aimed ...
-
[6]
[1701.07274] Deep Reinforcement Learning: An Overview - arXiv · Jan 25, 2017 · We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve ...
-
[7]
Deep Reinforcement Learning: A Chronological Overview ... - MDPI · We then trace the historical development of deep RL, highlighting key milestones such as the advent of deep Q-networks (DQN).
-
[8]
Learning representations by back-propagating errors - Nature · Oct 9, 1986 · We describe a new learning procedure, back-propagation, for networks of neurone-like units. The procedure repeatedly adjusts the weights of the connections in ...
-
[9]
Reinforcement Learning - MIT Press · In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the field's key ideas and algorithms. This second edition has ...
-
[10]
Q-learning | Machine Learning · This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the ...
-
[11]
On-Line Q-Learning Using Connectionist Systems - ResearchGate · Updates for model-free learning were described using the SARSA TD algorithm (Rummery and Niranjan 1994). The reward prediction error (δ) was computed as the ...
-
[12]
[1812.02648] Deep Reinforcement Learning and the Deadly Triad · Dec 6, 2018 · Sutton and Barto (2018) identify a deadly triad of function approximation, bootstrapping, and off-policy learning. When these three ...
-
[13]
[PDF] Neuro-Dynamic Programming - MIT · © 1996 Dimitri P. Bertsekas and John N. Tsitsiklis. All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical ...
- [14]
-
[15]
[PDF] TD-Gammon, A Self-teaching Backgammon Program, Achieves ... · TD-Gammon is a neural network that self-teaches backgammon by playing against itself, learning from the results, and achieving master-level play.
-
[16]
11.1 TD-Gammon · One of the most impressive applications of reinforcement learning to date is that by Gerry Tesauro to the game of backgammon (Tesauro, 1992, 1994, 1995).
-
[17]
[PDF] Tree-Based Batch Mode Reinforcement Learning · The fitted Q iteration algorithm is a batch mode reinforcement learning algorithm which yields an approximation of the Q-function corresponding to an infinite ...
-
[18]
Neural Fitted Q Iteration – First Experiences with a Data Efficient ... · This paper introduces NFQ, an algorithm for efficient and effective training of a Q-value function represented by a multi-layer perceptron.
-
[19]
Neural Fitted Q Iteration – First Experiences with a Data Efficient ... · Aug 7, 2025 · Early neural value estimation methods Riedmiller (2005) incorporated action conditioning by incorporating both state and action as model inputs.
-
[20]
[PDF] Monte Carlo Tree Search in Go - Department of Computing Science · Abstract: Monte Carlo Tree Search (MCTS) was born in Computer Go, i.e. in the application of artificial intelligence to the game of Go.
-
[21]
The Bitter Lesson - Rich Sutton · The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most ...
-
[22]
Deep Reinforcement Learning with Double Q-learning - arXiv · Sep 22, 2015 · We first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games.
-
[23]
A general reinforcement learning algorithm that masters chess ... · Dec 7, 2018 · In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games.
-
[24]
[1912.06680] Dota 2 with Large Scale Deep Reinforcement Learning · Dec 13, 2019 · OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
-
[25]
Mastering Atari, Go, chess and shogi by planning with a learned model · Dec 23, 2020 · ... To better understand the nature of MuZero's learning algorithm ...
-
[26]
[2010.02193] Mastering Atari with Discrete World Models - arXiv · Oct 5, 2020 · We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model.
-
[27]
Decision Transformer: Reinforcement Learning via Sequence ... - arXiv · Jun 2, 2021 · Abstract page for arXiv paper 2106.01345: Decision Transformer: Reinforcement Learning via Sequence Modeling.
-
[28]
Dueling Network Architectures for Deep Reinforcement Learning · Nov 20, 2015 · In this paper, we present a new neural network architecture for model-free reinforcement learning. Our dueling network represents two separate estimators.
-
[29]
Rainbow: Combining Improvements in Deep Reinforcement Learning · Oct 6, 2017 · This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of- ...
-
[30]
[1511.05952] Prioritized Experience Replay - arXiv · Nov 18, 2015 · In this paper we develop a framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently.
-
[31]
[1707.06887] A Distributional Perspective on Reinforcement Learning · Jul 21, 2017 · In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning ...
-
[32]
Asynchronous Methods for Deep Reinforcement Learning - arXiv · Feb 4, 2016 · We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep ...
-
[33]
[1502.05477] Trust Region Policy Optimization - arXiv · Feb 19, 2015 · This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks.
-
[34]
[1707.06347] Proximal Policy Optimization Algorithms - arXiv · Proximal Policy Optimization (PPO) is a policy gradient method for reinforcement learning that uses multiple epochs of minibatch updates, and is simpler than ...
-
[35]
[1812.05905] Soft Actor-Critic Algorithms and Applications - arXiv · Dec 13, 2018 · In this paper, we describe Soft Actor-Critic (SAC), our recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework.
-
[36]
Addressing Function Approximation Error in Actor-Critic Methods · We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic.
-
[37]
[PDF] Deterministic Policy Gradient Algorithms · Abstract. In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic pol-.
- [38]
- [39]
- [40]
-
[41]
Relative Importance Sampling for off-Policy Actor-Critic in Deep ... · Oct 30, 2018 · However, importance sampling has high variance, which is exacerbated in sequential scenarios. We propose a smooth form of importance sampling ...
-
[42]
IMPALA: Scalable Distributed Deep-RL with Importance Weighted ... · Feb 5, 2018 · We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V- ...
-
[43]
[1707.01495] Hindsight Experience Replay - arXiv · We present a novel technique called Hindsight Experience Replay which allows sample-efficient learning from rewards which are sparse and binary.
-
[44]
Conservative Q-Learning for Offline Reinforcement Learning - arXiv · Jun 8, 2020 · In this paper, we propose conservative Q-learning (CQL), which aims to address these limitations by learning a conservative Q-function.
-
[45]
Adaptive Behavior Cloning Regularization for Stable Offline-to ... · Oct 25, 2022 · We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability.
-
[46]
[PDF] Playing Atari with Deep Reinforcement Learning - cs.Toronto · We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.
-
[47]
[PDF] Sample-Efficient Deep Reinforcement Learning via Episodic ... · We theoretically prove the convergence of the EBU method and experimentally demonstrate its performance in both deterministic and stochastic environments.
-
[48]
Transfer Learning in Deep Reinforcement Learning: A Survey - PMC · In this survey, we systematically investigate the recent progress of transfer learning approaches in the context of deep reinforcement learning.
-
[49]
A Survey of Zero-shot Generalisation in Deep Reinforcement Learning · Nov 18, 2021 · The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to produce RL algorithms whose policies generalise well to ...
-
[50]
Meta Reinforcement Learning - Lil'Log · Jun 23, 2019 · A meta-RL model is trained over a distribution of MDPs, and at test time, it is able to learn to solve a new task quickly.
-
[51]
[PDF] Transfer in Deep Reinforcement Learning Using Successor ... · The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, ...
-
[52]
Successor Features for Transfer in Reinforcement Learning - arXiv · Jun 16, 2016 · We propose a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same.
-
[53]
[PDF] Dynamics Generalisation in Reinforcement Learning via Adaptive ... · This allows the agent to modify its behaviour for each context by having a shared feature-extractor network which is modulated by the context-aware adapter.
-
[54]
FeUdal Networks for Hierarchical Reinforcement Learning - arXiv · Mar 3, 2017 · We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning. Our approach is inspired by the feudal reinforcement ...
-
[55]
[PDF] FeUdal Networks for Hierarchical Reinforcement Learning · The options framework (Sutton et al., 1999; Precup, 2000) is a popular formulation for considering the problem with a two level hierarchy. The bottom level – ...
-
[56]
[PDF] Leveraging Procedural Generation to Benchmark Reinforcement ... · We have created Procgen Benchmark to fulfill this need. This benchmark is ideal for evaluating generalization, as distinct training and test sets can be ...
-
[57]
Compositional Learning of Visually-Grounded Concepts Using ... · Sep 8, 2023 · We investigate the compositional abilities of RL agents, using the task of navigating to specified color-shape targets in synthetic 3D environments.
-
[58]
[PDF] Robust Subtask Learning for Compositional Generalization · Compositional reinforcement learning is a promising approach for training policies to perform complex long-horizon tasks. Typically, a.
-
[59]
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive ... - arXiv · Jun 7, 2017 · This paper explores deep reinforcement learning for multi-agent domains, adapting actor-critic methods to consider other agents' policies and ...
-
[60]
[PDF] Markov games as a framework for multi-agent reinforcement learning · Markov games (see e.g., [Van Der Wal, 1981]) is an extension of game theory to MDP-like environments. This paper considers the consequences of using the Markov.
-
[61]
Multi-agent Reinforcement Learning in Sequential Social Dilemmas · Feb 10, 2017 · We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games.
-
[62]
Multi-Agent Reinforcement Learning: A Review of Challenges and ... · In this review, we present an analysis of the most used multi-agent reinforcement learning algorithms. Starting with the single-agent reinforcement learning ...
-
[63]
[PDF] Apprenticeship Learning via Inverse Reinforcement Learning · The problem of deriving a reward function from observed behavior is referred to as inverse reinforcement learning (Ng & Russell, 2000). In this paper, we.
-
[64]
[PDF] Maximum Entropy Inverse Reinforcement Learning · The maximum entropy approach provides a principled method of dealing with this uncertainty. We discuss several additional advantages in modeling behavior that ...
-
[65]
A Reduction of Imitation Learning and Structured Prediction to No ... · Nov 2, 2010 · In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no regret algorithm in an online ...
-
[66]
[1606.03476] Generative Adversarial Imitation Learning - arXiv · Jun 10, 2016 · Authors: Jonathan Ho, Stefano Ermon.
-
[67]
Universal Value Function Approximators · In this paper we introduce universal value function approximators (UVFAs) V(s,g;theta) that generalise not just over states s but also over goals g.
-
[68]
[PDF] Goal-Induced Inverse Reinforcement Learning - UC Berkeley EECS · May 17, 2019 · This work explores using natural language to communicate goal-conditioned rewards, which can be learned via solving the inverse reinforcement ...
-
[69]
Advances and applications in inverse reinforcement learning · Mar 26, 2025 · This comprehensive review focuses on three key aspects: the diverse methodologies employed in IRL, its wide-ranging applications across fields such as robotics ...
-
[70]
A survey of inverse reinforcement learning: Challenges, methods ... · Inverse reinforcement learning (IRL) is the problem of inferring the reward function of an agent, given its policy or observed behavior.
-
[71]
Sim-to-Real: Learning Agile Locomotion For Quadruped Robots · Apr 27, 2018 · In this paper, we present a system to automate this process by leveraging deep reinforcement learning techniques.
-
[72]
Safety Gym - OpenAI · Nov 21, 2019 · We're releasing Safety Gym, a suite of environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints ...
-
[73]
FIERY: Future Instance Prediction in Bird's-Eye View from Surround ... · Apr 21, 2021 · We present FIERY: a probabilistic future prediction model in bird's-eye view from monocular cameras. Our model predicts future instance segmentation and motion.
- [74]
-
[75]
The Artificial Intelligence Clinician learns optimal treatment ... - Nature · Oct 22, 2018 · We developed the AI Clinician, a computational model using reinforcement learning, which is able to dynamically suggest optimal treatments for ...
- [76]
-
[77]
Deep Reinforcement Learning: Policy Gradients for US Equities ... · Dec 4, 2023 · This paper presents a novel approach to applying Deep Reinforcement Learning (DRL) within the financial trading domain.
-
[78]
Deep reinforcement learning for portfolio selection - ScienceDirect · This study proposes an advanced model-free deep reinforcement learning (DRL) framework to construct optimal portfolio strategies in dynamic, complex, and large ...