Temporal Credit Assignment in Reinforcement Learning [Classic Reinforcement Learning Papers]

Sutton's publications page:

http://incompleteideas.net/publications.html

PhD thesis: Temporal Credit Assignment in Reinforcement Learning

http://incompleteideas.net/publications.html#PhDthesis


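I have recently been working on a reinforcement learning project, and I have come to appreciate just how impressive Sutton, often called the father of reinforcement learning, really is. He proposed the TD algorithm and did foundational work on policy gradient methods. Although the modern framework of reinforcement learning arguably took shape starting with Q-learning, Sutton is one of the earliest researchers in the field and is probably the person who has contributed the most to its classic ideas. His undergraduate degree was in psychology (at Stanford), and he only moved to computer science for his graduate degrees at the University of Massachusetts Amherst, which makes his record all the more impressive.

His PhD thesis is one of the most classic works in reinforcement learning and is well worth reading. I searched for it for a long time. Probably because it was published so long ago, none of the databases I checked have it. The only place that holds it seems to be the university that granted his PhD, the University of Massachusetts, but access there is restricted to its own students, so several days of searching turned up nothing. Today it occurred to me to look on the author's personal homepage instead, and there it was. I am marking it here for future reference.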

-----------------------------------------------------------------------------------------------------------

Appendix: (contents of the Publications section of Sutton's homepage)

Rich Sutton's Publications

First, a quick guide to the highlights, roughly in order of the work's popularity or potential current interest:

The 2nd edition of Reinforcement Learning: An Introduction
Emphatic TD(λ); Yu's convergence proof
Weighted importance sampling version of LSTD(λ), linear-complexity algorithms
True online TD(λ)
The predictive approach to knowledge representation; PEAK; Horde; nexting
Fast gradient-based TD algorithms, nonlinear case, GQ(lambda), control, Maei's thesis
RL book
Temporal-difference learning; TD(lambda) details
The TD model of Pavlovian conditioning; earlier Sutton-Barto model; more biological 1982 & 1986; and instrumental learning
Dyna; as an integrated architecture; with FA 1996, 2008
The options paper; UAV example; precursor not superseded;
Policy gradient methods; Incremental Natural Actor-Critic Algorithms
PhD thesis, introduced actor-critic architectures and "temporal credit assignment"
PSRs; the predictive representations hypothesis; TD networks; with options
RL for RoboCup soccer keepaway
RL with continuous state and action spaces
Step-size adaptation by meta-gradient descent; IDBD; improved; earliest pub; in classical conditioning; in human category learning, in tracking
Random representations; representation search; feature discovery; more
Pole-balancing; tracking nonstationarity
Exponentiated-gradient RL; fuller TR
A study in alpha and lambda
Two problems with backprop