The REINFORCE algorithm in PyTorch (official documentation)

宁静的猫, 2022-07-27


http://pytorch.org/docs/0.3.0/distributions.html

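from torch.distributions import Categorical

# policy_network, state, and env are assumed to be defined elsewhere
probs = policy_network(state)  # action probabilities from the policy
m = Categorical(probs)
action = m.sample()  # sample an action
next_state, reward = env.step(action)  # observe a reward
loss = -m.log_prob(action) * reward  # REINFORCE surrogate loss
loss.backward()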
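
To make it concrete what `loss.backward()` computes here, the same update can be sketched in plain Python with the gradient written out by hand instead of relying on autograd. For a softmax policy over logits, the gradient of `-log_prob(action) * reward` with respect to logit k is `(p_k - 1[k == action]) * reward`. Everything in this sketch is an assumption made for illustration and is not from the original post: the two-armed bandit with fixed rewards, the learning rate, and the helper names `softmax`, `reinforce_grad`, and `train_bandit`.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grad(logits, action, reward):
    """Gradient of -log(softmax(logits)[action]) * reward w.r.t. logits."""
    probs = softmax(logits)
    return [(p - (1.0 if k == action else 0.0)) * reward
            for k, p in enumerate(probs)]


def train_bandit(arm_rewards, steps=2000, lr=0.1, seed=0):
    """Run REINFORCE on a hypothetical bandit with fixed per-arm rewards."""
    rng = random.Random(seed)
    logits = [0.0] * len(arm_rewards)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the categorical distribution (like m.sample()).
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = arm_rewards[action]
        # Hand-written equivalent of loss.backward() for this tiny policy.
        grad = reinforce_grad(logits, action, reward)
        # Plain gradient descent on the surrogate loss.
        logits = [x - lr * g for x, g in zip(logits, grad)]
    return softmax(logits)


if __name__ == "__main__":
    # Arm 1 always pays 1.0, arm 0 pays nothing.
    print(train_bandit([0.0, 1.0]))
```

In this toy setting the policy shifts its probability mass toward the rewarded arm, because each rewarded sample raises the logit of the action that was taken. In the PyTorch version, `Categorical.log_prob` and autograd play the role of `reinforce_grad`, and an optimizer step replaces the manual logit update.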

