Temporal Credit Assignment in Reinforcement Learning [Classic Reinforcement Learning Papers]

Sutton's publications page:

http://incompleteideas.net/publications.html

PhD thesis: Temporal Credit Assignment in Reinforcement Learning

http://incompleteideas.net/publications.html#PhDthesis


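I have recently been working on a reinforcement learning project, and I have come to appreciate how impressive Sutton, often called the father of reinforcement learning, really is. He introduced temporal-difference (TD) learning and did foundational work on policy gradient methods. Although the modern reinforcement learning framework arguably took shape with Q-learning, Sutton was one of the earliest people working in the field and is probably the person who contributed most to its classic ideas. He studied psychology as an undergraduate at Stanford and only moved to computer science for his graduate degrees at the University of Massachusetts Amherst, which makes his record all the more impressive.

His PhD thesis is one of the classic works of reinforcement learning and well worth reading. I searched for it for a long time. Probably because it was published so long ago, none of the databases I checked had indexed it. The only place that seemed to hold it was the University of Massachusetts, which awarded his doctorate, but access there is open only to that university's own students, so several days of searching turned up nothing. Today it occurred to me to look on the author's personal homepage, and sure enough, there it was. I am noting it here for future reference.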

-----------------------------------------------------------------------------------------------------------

Appendix: (contents of the Publications section of Sutton's homepage)

Rich Sutton's Publications

First, a quick guide to the highlights, roughly in order of the work's popularity or potential current interest:

The 2nd edition of Reinforcement Learning: An Introduction
Emphatic TD(λ); Yu's convergence proof
Weighted importance sampling version of LSTD(λ), linear-complexity algorithms
True online TD(λ)
The predictive approach to knowledge representation; PEAK; Horde; nexting
Fast gradient-based TD algorithms, nonlinear case, GQ(lambda), control, Maei's thesis
RL book
Temporal-difference learning;TD(lambda) details
The TD model of Pavlovian conditioning; earlier Sutton-Barto model; more biological 1982 & 1986; and instrumental learning
Dyna; as an integrated architecture; with FA 1996, 2008
The options paper; UAV example; precursor not superseded
Policy gradient methods; Incremental Natural Actor-Critic Algorithms
PhD thesis, introduced actor-critic architectures and "temporal credit assignment"
PSRs; the predictive representations hypothesis; TD networks; with options
RL for RoboCup soccer keepaway
RL with continuous state and action spaces
Step-size adaptation by meta-gradient descent; IDBD; improved; earliest pub; in classical conditioning; in human category learning, in tracking
Random representations; representation search; feature discovery; more
Pole-balancing; tracking nonstationarity
Exponentiated-gradient RL; fuller TR
A study in alpha and lambda
Two problems with backprop

Also, some RL pubs that aren't mine, available for researchers:

Chris Watkins's thesis
Boyan's LSTD(lambda), 1999
Barto and Bradtke LSTD, 1996
Williams, 1992
Lin, 1992
Ross, 1983, chapter 2
Minsky, 1960, Steps to AI
Good, 1965, Speculations concerning the first ultraintelligent machine
Selfridge, 1958, Pandemonium
Samuel, 1959
Dayan, 1992
Tesauro, 1992, TD-Gammon
Watkins and Dayan, 1992
Hamid Maei's PhD thesis, 2011
Masoud Shahamiri's MSc thesis, 2008
Janey Yu's proof of convergence of Emphatic TD(λ)
Adam White's PhD thesis
David Silver's PhD thesis
Brian Tanner's MSc thesis
Kavosh Asadi's MSc thesis
Travis Dick's MSc thesis
Eddie Rafols's MSc thesis
Anna Koop's MSc thesis
Leah Hackman's MSc thesis
Mike Delp's MSc thesis
MahdiehSadat Mirian HosseinAbadi's MSc thesis
Gurvitz, Lin, and Hanson, 1995
Rupam Mahmood's PhD thesis, 2017
An, Miller, and Parks (1991)
Intro to Andreae (2017) and Andreae (2017)