Temporal Credit Assignment in Reinforcement Learning [Classic Reinforcement Learning Paper]

Sutton's publications page:

http://incompleteideas.net/publications.html

PhD thesis: Temporal Credit Assignment in Reinforcement Learning

http://incompleteideas.net/publications.html#PhDthesis



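I've recently been working on a project in reinforcement learning, and I've come to see that Sutton, often called the father of reinforcement learning, really is remarkable: temporal-difference (TD) learning originated with him, and he co-authored foundational work on policy gradient methods. Although one could argue that the modern RL framework took shape with Q-learning, Sutton was among the earliest people to work on reinforcement learning and has probably contributed more of its classic ideas than anyone else. He earned his bachelor's degree in psychology at Stanford and only moved into computer science for graduate school, taking his MS and PhD at the University of Massachusetts Amherst. His PhD thesis is one of the classic works of reinforcement learning and is well worth reading. I searched for it for a long time. Perhaps because it is so old, none of the databases I checked had it; the only holder seemed to be the university that granted his doctorate, the University of Massachusetts, and its copy is open only to that university's students, so after several days I still hadn't found it. Today it occurred to me to look on the author's personal homepage, and sure enough, there it was. I'm noting it here for the record.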


-----------------------------------------------------------------------------------------------------------

Appendix (excerpt from the Publications section of Sutton's homepage):


Rich Sutton's Publications

First, a quick guide to the highlights, roughly in order of the work's popularity or potential current interest:

  • The 2nd edition of Reinforcement Learning: An Introduction
  • Emphatic TD(λ); Yu's convergence proof
  • Weighted importance sampling version of LSTD(λ), linear-complexity algorithms
  • True online TD(λ)
  • The predictive approach to knowledge representation; PEAK; Horde; nexting
  • Fast gradient-based TD algorithms, nonlinear case, GQ(lambda), control, Maei's thesis
  • RL book
  • Temporal-difference learning; TD(lambda) details
  • The TD model of Pavlovian conditioning; earlier Sutton-Barto model; more biological 1982 & 1986; and instrumental learning
  • Dyna; as an integrated architecture; with FA 1996, 2008
  • The options paper; UAV example; precursor not superseded
  • Policy gradient methods; Incremental Natural Actor-Critic Algorithms
  • PhD thesis, introduced actor-critic architectures and "temporal credit assignment"
  • PSRs; the predictive representations hypothesis; TD networks; with options
  • RL for RoboCup soccer keepaway
  • RL with continuous state and action spaces
  • Step-size adaptation by meta-gradient descent; IDBD; improved; earliest pub; in classical conditioning; in human category learning, in tracking
  • Random representations; representation search; feature discovery; more
  • Pole-balancing; tracking nonstationarity
  • Exponentiated-gradient RL; fuller TR
  • A study in alpha and lambda
  • Two problems with backprop

Also, some RL pubs that aren't mine, available for researchers:

  • Chris Watkins's thesis
  • Boyan's LSTD(lambda), 1999
  • Barto and Bradtke LSTD, 1996
  • Williams, 1992
  • Lin, 1992
  • Ross, 1983, chapter 2
  • Minsky, 1960, Steps to AI
  • Good, 1965, Speculations concerning the first ultraintelligent machine
  • Selfridge, 1958, Pandemonium
  • Samuel, 1959
  • Dayan, 1992
  • Tesauro, 1992, TD-Gammon
  • Watkins and Dayan, 1992
  • Hamid Maei's PhD thesis, 2011
  • Masoud Shahamiri's MSc thesis, 2008
  • Janey Yu's proof of convergence of Emphatic TD(λ)
  • Adam White's PhD thesis
  • David Silver's PhD thesis
  • Brian Tanner's MSc thesis
  • Kavosh Asadi's MSc thesis
  • Travis Dick's MSc thesis
  • Eddie Rafols's MSc thesis
  • Anna Koop's MSc thesis
  • Leah Hackman's MSc thesis
  • Mike Delp's MSc thesis
  • MahdiehSadat Mirian HosseinAbadi's MSc thesis
  • Gurvitz, Lin, and Hanson, 1995
  • Rupam Mahmood's PhD thesis, 2017
  • An, Miller, and Parks (1991)
  • Intro to Andreae (2017) and Andreae (2017)