Fact-checked by Grok 2 weeks ago
References
-
[1]
PhD Thesis: Learning from Delayed RewardsThe thesis introduces the notion of reinforcement learning as learning to control a Markov Decision Process by incremental dynamic programming.
-
[2]
Q-learning | Machine LearningThis paper presents and proves in detail a convergence theorem forQ-learning based on that outlined in Watkins (1989). We show thatQ-learning converges to ...
-
[3]
Q-Learning Agent - MATLAB & Simulink - MathWorksThe Q-learning algorithm is an off-policy reinforcement learning method for environments with a discrete action space. A Q-learning agent trains a Q-value ...
-
[4]
[PDF] Reinforcement Learning: An Introduction - Stanford UniversityWe focus on the simplest aspects of reinforcement learning and on its main distinguishing features. ... on examples of correct behavior, reinforcement learning is ...
-
[5]
[PDF] Chapter 1 Introduction - Rich SuttonThese two characteristics—trial-and-error search and delayed reward—are the two most important distinguishing features of reinforcement learning. Reinforcement ...
-
[6]
[PDF] Technical Note Q,-LearningThis paper presents and proves in detail a convergence theorem for Q,-learning based on that outlined in Watkins. (1989). We show that Q-learning converges to ...
-
[7]
[PDF] Learning from Delayed Rewards - Computer ScienceMay 1, 1989 · Learning from Delayed Rewards. Christopher John Cornish Hellaby Watkins. King's College. Thesis Submitted for Ph.D. May, 1989. Page 2. A.
-
[8]
(PDF) Technical Note: Q-Learning - ResearchGateOct 24, 2025 · This paper presents and proves in detail a convergence theorem forQ-learning based on that outlined in Watkins (1989). We show thatQ-learning ...
-
[9]
On-Line Q-Learning Using Connectionist Systems - ResearchGateUpdates for model-free learning were described using the SARSA TD algorithm (Rummery and Niranjan 1994) . The reward prediction error (δ) was computed as the ...
-
[10]
[PDF] Asynchronous Stochastic Approximation and Q-Learning - MITThe Q-learning algorithm is a method for computing V* based on a reformulation of the Bellman equation V* = T(V*). We provide a brief description of the ...
-
[11]
[PDF] An Investigation Into the Effect of the Learning Rate on ...In the beginning of training, a reasonably high learning rate is important to learn fast, but once a good approximation has been learned, using a low learning ...
-
[12]
Q-LearningThis paper presents and proves in detail a convergence theorem for ~-learning based on that outlined in Watkins. (1989). We show that 0~-learning converges to ...
-
[13]
Average reward reinforcement learning: Foundations, algorithms ...This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks ...
-
[14]
[PDF] Potential-Based Shaping and Q-Value Initialization are EquivalentWith Q-values initialized below their optimal value, an agent may require learning time exponential in the state and action space in order to find a goal state.
-
[15]
[PDF] Q-Learning - Henrique MaiaThis paper has presented the proof outlined by Watkins (1989) that Q-learning converges with probability one under reasonable conditions on the learning rates ...<|control11|><|separator|>
-
[16]
Solving Frozenlake with Tabular Q-LearningThis tutorial trains an agent for FrozenLake using tabular Q-learning. In this post we'll compare a bunch of different map sizes on the FrozenLake environment.
-
[17]
Function Approximation in Reinforcement Learning - GeeksforGeeksJul 23, 2025 · Function approximation is a critical concept in reinforcement learning (RL), enabling algorithms to generalize from limited experience to a broader set of ...
-
[18]
[PDF] Successful Examples Using Sparse Coarse Coding - Rich SuttonReinforcement learning is a broad class of optimal control methods based on estimating value functions from experience, simulation, or search (Barto, Bradtke & ...
-
[19]
[PDF] Q-FUNCTION APPROXIMATION WITH RADIAL BASIS NETWORK ...Following that, we found an RBF approximation of this off-policy method was best found with J = 20 basis functions.
-
[20]
[PDF] Convergence of Q-learning with linear function approximationIn this paper, we describe Q-learning with linear function approximation. This algorithm can be seen as an exten- sion to control problems of temporal- ...
-
[21]
Playing Atari with Deep Reinforcement Learning### Summary: Deep Neural Networks for Q-Function Approximation in Atari Games
-
[22]
[1812.02648] Deep Reinforcement Learning and the Deadly TriadDec 6, 2018 · Sutton and Barto (2018) identify a deadly triad of function approximation, bootstrapping, and off-policy learning. When these three ...
-
[23]
Breaking the Deadly Triad with a Target NetworkThe deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, ...
-
[24]
[PDF] K-Means Clustering based Reinforcement Learning Algorithm for ...While partitioning the goal of reinforcement learning, we apply a modified K-means clustering algorithm to discrete continuous state and action spaces.
-
[25]
Convergence Analysis of Discretization Procedure in Q-LearningDiscretization of the state and decision spaces is required when Q-Learning is used to solve stochastic optimal control problems with the state and decision ...Missing: techniques | Show results with:techniques
-
[26]
Balancing a CartPole System with Reinforcement Learning - arXivJun 8, 2020 · In this paper, we provide the details of implementing various reinforcement learning (RL) algorithms for controlling a Cart-Pole system.Missing: discretization bins<|control11|><|separator|>
-
[27]
Learning to predict by the methods of temporal differencesFeb 4, 1988 · This article introduces a class of incremental learning procedures specialized for prediction-that is, for using past experience with an incompletely known ...
-
[28]
On-line Q-learning using connectionist systems - Semantic ScholarOn-line Q-learning using connectionist systems · Gavin Adrian Rummery, M. Niranjan · Published 1994 · Computer Science.Missing: key milestones
-
[29]
Q-Learning for Robot Control - ResearchGateQ-Learning is a method for solving reinforcement learning problems. Reinforcement learning problems require improvement of behaviour based on received ...
-
[30]
[PDF] Nash Q-Learning for General-Sum Stochastic GamesIn extending Q-learning to multiagent environments, we adopt the framework of general-sum stochastic games. In a stochastic game, each agent's reward depends ...<|control11|><|separator|>
-
[31]
[PDF] An Analysis Of Temporal-difference Learning With Function ... - MITTSITSIKLIS AND VAN ROY: ANALYSIS OF TEMPORAL-DIFFERENCE LEARNING. 677 ... [4] J. N. Tsitsiklis, “Asynchronous stochastic approximation and Q- learning ...Missing: Q- | Show results with:Q-
-
[32]
Human-level control through deep reinforcement learning - NatureFeb 25, 2015 · Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn ...Main · Methods · Training Algorithm For Deep...
-
[33]
Conservative Q-Learning for Offline Reinforcement Learning - arXivJun 8, 2020 · In this paper, we propose conservative Q-learning (CQL), which aims to address these limitations by learning a conservative Q-function.
-
[34]
(PDF) QT-TDM: Planning With Transformer Dynamics Model and ...Dec 12, 2024 · Our proposed method, QT-TDM, integrates the robust predictive capabilities of Transformers as dynamics models with the efficacy of a model-free ...
-
[35]
DQN — Stable Baselines3 2.7.1a3 documentation - Read the DocsDeep Q Network (DQN) builds on Fitted Q-Iteration (FQI) and make use of different tricks to stabilize the learning with neural networks.
-
[36]
Rainbow: Combining Improvements in Deep Reinforcement LearningOct 6, 2017 · This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of- ...
-
[37]
[1511.05952] Prioritized Experience Replay - arXivNov 18, 2015 · We use prioritized experience replay in Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across ...
-
[38]
Dueling Network Architectures for Deep Reinforcement LearningNov 20, 2015 · Access Paper: View a PDF of the paper titled Dueling Network Architectures for Deep Reinforcement Learning, by Ziyu Wang and 5 other authors.
-
[39]
[1706.10295] Noisy Networks for Exploration - arXivJun 30, 2017 · Access Paper: View a PDF of the paper titled Noisy Networks for Exploration, by Meire Fortunato and 11 other authors. View PDF · TeX Source.
-
[40]
Model-based Offline Reinforcement Learning with Lower Expectile ...Jun 30, 2024 · Abstract page for arXiv paper 2407.00699: Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning.
-
[41]
Multi-agent Reinforcement Learning: A Comprehensive Survey - arXivThis survey examines these challenges, placing an emphasis on studying seminal concepts from game theory (GT) and machine learning (ML)Missing: 2025 | Show results with:2025
-
[42]
Multi-agent reinforcement learning - ACM Digital LibraryMulti-agent reinforcement learning: independent versus cooperative agents. Author: Ming Tan. Ming Tan. View Profile. Authors Info & Claims. ICML'93: Proceedings ...Missing: Q- | Show results with:Q-
-
[43]
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent ...Mar 30, 2018 · Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
-
[44]
Opponent Modeling in Deep Reinforcement LearningOpponent modeling is needed in multi-agent settings. This paper uses neural models to learn opponent behavior, encoding observations into a deep Q-Network.
-
[45]
[PDF] The Effect of Hyperparameters on the Model Convergence Rate of ...This paper studies how hyperparameters like learning rate (alpha) and discount factor (gamma) affect the convergence speed of Q-Learning algorithm.
-
[46]
[PDF] Addressing Environment Non-Stationarity by Repeating Q-learning ...Abstract. Q-learning (QL) is a popular reinforcement learning algorithm that is guaranteed to converge to op- timal policies in Markov decision processes.
-
[47]
[PDF] Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis... sample complexity of. Q-learning to be on the order of |S||A|. (1−γ)4ε2 (up to log factor). Our theory unveils the strict sub-optimality of Q-learning when ...
-
[48]
[PDF] arXiv:2307.10649v1 [q-fin.CP] 20 Jul 2023Jul 20, 2023 · It is important to note that the curse of dimensionality in Q-learning makes it chal- lenging to handle high-dimensional data. While Hen ...
-
[49]
[PDF] Defining and Characterizing Reward Hacking - arXivMar 5, 2025 · Reward hacking occurs when optimizing a proxy reward function leads to poor performance according to the true reward function, in reinforcement ...