Sutton's publications page:
http://incompleteideas.net/publications.html
PhD thesis: Temporal Credit Assignment in Reinforcement Learning
http://incompleteideas.net/publications.html#PhDthesis

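I have recently been working on a reinforcement learning project, and I have come to appreciate how remarkable Sutton, often called the father of reinforcement learning, really is. He introduced TD learning and co-authored the foundational work on policy gradient methods. Although the modern framework of reinforcement learning arguably took shape starting with Q-learning, Sutton was one of the earliest people working in the field and has probably contributed more than anyone to its classic ideas. He studied psychology as an undergraduate (at Stanford) and only moved to computer science for his graduate degrees, which makes his record all the more impressive.

His PhD thesis is one of the classic works of reinforcement learning and well worth reading. I searched for it for a long time. Probably because it was published so long ago, none of the databases I checked had indexed it. The only place that seemed to hold it was the university that granted his doctorate, the University of Massachusetts, and since access there is restricted to its own students, several days of searching turned up nothing. Today it suddenly occurred to me to look on the author's personal homepage, and sure enough, there it was. I am marking it here for future reference.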
 -----------------------------------------------------------------------------------------------------------
Appendix: (contents of the Publications section of Sutton's homepage)
Rich Sutton's Publications
First, a quick guide to the highlights, roughly in order of the work's popularity or potential current interest:
- The 2nd edition of Reinforcement Learning: An Introduction
 - Emphatic TD(λ); Yu's convergence proof
 - Weighted importance sampling version of LSTD(λ), linear-complexity algorithms
 - True online TD(λ)
 - The predictive approach to knowledge representation; PEAK; Horde; nexting
 - Fast gradient-based TD algorithms, nonlinear case, GQ(lambda), control, Maei's thesis
 - RL book
 - Temporal-difference learning; TD(lambda) details
 - The TD model of Pavlovian conditioning; earlier Sutton-Barto model; more biological 1982 & 1986; and instrumental learning
 - Dyna; as an integrated architecture; with FA 1996, 2008
 - The options paper; UAV example; precursor not superseded
 - Policy gradient methods; Incremental Natural Actor-Critic Algorithms
 - PhD thesis, introduced actor-critic architectures and "temporal credit assignment"
 - PSRs; the predictive representations hypothesis; TD networks; with options
 - RL for RoboCup soccer keepaway
 - RL with continuous state and action spaces
 - Step-size adaptation by meta-gradient descent; IDBD; improved; earliest pub; in classical conditioning; in human category learning, in tracking
 - Random representations; representation search; feature discovery; more
 - Pole-balancing; tracking nonstationarity
 - Exponentiated-gradient RL; fuller TR
 - A study in alpha and lambda
 - Two problems with backprop
 
Also, some RL pubs that aren't mine, available for researchers:
- Chris Watkins's thesis
 - Boyan's LSTD(lambda), 1999
 - Barto and Bradtke LSTD, 1996
 - Williams, 1992
 - Lin, 1992
 - Ross, 1983, chapter 2
 - Minsky, 1960, Steps to AI
 - Good, 1965, Speculations concerning the first ultraintelligent machine
 - Selfridge, 1958, Pandemonium
 - Samuel, 1959
 - Dayan, 1992
 - Tesauro, 1992, TD-Gammon
 - Watkins and Dayan, 1992
 - Hamid Maei's PhD thesis, 2011
 - Masoud Shahamiri's MSc thesis, 2008
 - Janey Yu's proof of convergence of Emphatic TD(λ)
 - Adam White's PhD thesis
 - David Silver's PhD thesis
 - Brian Tanner's MSc thesis
 - Kavosh Asadi's MSc thesis
 - Travis Dick's MSc thesis
 - Eddie Rafols's MSc thesis
 - Anna Koop's MSc thesis
 - Leah Hackman's MSc thesis
 - Mike Delp's MSc thesis
 - MahdiehSadat Mirian HosseinAbadi's MSc thesis
 - Gurvitz, Lin, and Hanson, 1995
 - Rupam Mahmood's PhD thesis, 2017
 - An, Miller, and Parks (1991)
 - Intro to Andreae (2017) and Andreae (2017)
 