
8_Self-Attention (Self-Attention Mechanism)

产品喵dandan米娜 2022-01-31

Tags: deep learning, natural language processing, lstm

Contents

1. Self-Attention
2. SimpleRNN + Self-Attention
3. Summary

1. Self-Attention

Self-attention applies attention within a single RNN.
Attention can be used with any RNN, not only in Seq2Seq models.
Self-attention [2] extends attention [1] beyond Seq2Seq models.
The original self-attention paper [2] applies attention to an LSTM.
To keep the explanation simple, I replace the LSTM with a SimpleRNN.
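For reference, a plain SimpleRNN (no attention yet) updates its state as h_t = tanh(A·[x_t; h_{t-1}] + b), where [x_t; h_{t-1}] stacks the input and the previous state. The pure-Python sketch below shows only that recurrence; the names `A`, `b`, `matvec`, `simple_rnn_step`, the dimensions, and all weight and input values are hypothetical, chosen purely for illustration.

```python
import math

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def simple_rnn_step(A, b, x_t, h_prev):
    """One SimpleRNN step: h_t = tanh(A @ [x_t; h_{t-1}] + b)."""
    z = matvec(A, x_t + h_prev)  # list concatenation stacks [x_t; h_{t-1}]
    return [math.tanh(zi + bi) for zi, bi in zip(z, b)]

def simple_rnn(A, b, xs, d):
    """Run a SimpleRNN over the inputs xs and return the final state."""
    h = [0.0] * d  # h_0 is the all-zero vector
    for x_t in xs:
        h = simple_rnn_step(A, b, x_t, h)
    return h

# Hypothetical parameters: 2-dim inputs, 2-dim state, so A is 2 x (2 + 2).
A = [[0.5, -0.2, 0.1, 0.3],
     [0.0, 0.4, -0.3, 0.2]]
b = [0.1, -0.1]
xs = [[1.0, 2.0], [0.5, -1.0], [-0.3, 0.8]]
h_final = simple_rnn(A, b, xs, d=2)
```

Only the most recent state is carried forward from step to step, so information from early inputs has to survive many tanh updates; this is the forgetting problem that self-attention is meant to ease.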

Original papers:

[1] Bahdanau, Cho, & Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR, 2015.
[2] Cheng, Dong, & Lapata. Long Short-Term Memory-Networks for Machine Reading. In EMNLP, 2016.

2. SimpleRNN + Self-Attention

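Initially, C₀ and the state vector h₀ are both all-zero vectors.
The RNN reads the first input X₁ and updates its state, compressing the information in X₁ into the new state h₁.
Next, it computes C₁, a weighted average of the states computed so far.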
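A minimal pure-Python sketch of this first step. The note does not spell out how C enters the state update, so this is an assumption: the SimpleRNN is taken to feed C_{t-1} where a plain SimpleRNN would feed h_{t-1}, i.e. h₁ = tanh(A·[X₁; C₀] + b). The names `A`, `b`, `rnn_step`, and all numeric values are hypothetical. Following the note, h₀ is left out of the average, so C₁ reduces to h₁.

```python
import math

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rnn_step(A, b, x_t, c_prev):
    """Assumed update (not stated in the note): h_t = tanh(A @ [x_t; c_{t-1}] + b)."""
    z = matvec(A, x_t + c_prev)  # list concatenation stacks [x_t; c_{t-1}]
    return [math.tanh(zi + bi) for zi, bi in zip(z, b)]

# Hypothetical parameters: 2-dim inputs, 2-dim state, so A is 2 x (2 + 2).
A = [[0.5, -0.2, 0.1, 0.3],
     [0.0, 0.4, -0.3, 0.2]]
b = [0.1, -0.1]
x1 = [1.0, 2.0]

c0 = [0.0, 0.0]              # C_0 (like h_0) starts as the all-zero vector
h1 = rnn_step(A, b, x1, c0)  # compress x_1 into the new state h_1
# With h_0 left out, h_1 is the only state to average, so its weight is 1.
c1 = list(h1)
```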

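After reading X₂ and computing h₂, the RNN computes C₂. To do so, it first computes the weights α_i = align(h_i, h₂), which score each existing state h_i against the new state h₂.
C₂ is then the weighted average of the existing states h₁ and h₂. Since h₀ is an all-zero vector, it is ignored from here on.
This process repeats for every subsequent input.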
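The weighting step above can be written out as equations. This assumes, as the phrase "weighted average" suggests but the note does not state explicitly, that the raw align scores (written here as α̃_i) are normalized with a softmax so that the weights sum to 1; align itself is left abstract, as in the note.

```latex
\begin{aligned}
\tilde{\alpha}_i &= \operatorname{align}(h_i, h_2), \qquad i = 1, 2 \\
[\alpha_1, \alpha_2] &= \operatorname{softmax}\big([\tilde{\alpha}_1, \tilde{\alpha}_2]\big) \\
C_2 &= \alpha_1 h_1 + \alpha_2 h_2 \\
\text{in general:}\quad C_j &= \sum_{i=1}^{j} \alpha_i h_i,
  \qquad \alpha_i = \frac{\exp\big(\operatorname{align}(h_i, h_j)\big)}{\sum_{k=1}^{j} \exp\big(\operatorname{align}(h_k, h_j)\big)}
\end{aligned}
```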
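Putting the steps together, here is a minimal pure-Python sketch of a SimpleRNN with self-attention over a whole sequence. Several details are assumptions not stated in the note: the update h_t = tanh(A·[X_t; C_{t-1}] + b) feeds C_{t-1} in place of h_{t-1}; `align` is a plain dot product, a hypothetical stand-in for a learned scoring function; the scores are normalized with softmax; and `A`, `b`, and the inputs are made-up values. As in the note, h₀ is left out of every average.

```python
import math

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def align(h_i, h_j):
    """Hypothetical scoring function: plain dot product, no learned parameters."""
    return sum(p * q for p, q in zip(h_i, h_j))

def self_attention_rnn(A, b, xs, d):
    """Run a SimpleRNN with self-attention over xs; return all states and contexts."""
    c = [0.0] * d  # C_0 = 0; h_0 = 0 is left out of every average
    hs, cs = [], []
    for x_t in xs:
        # Assumed update: the context C_{t-1} replaces h_{t-1}.
        z = matvec(A, x_t + c)
        h = [math.tanh(zi + bi) for zi, bi in zip(z, b)]
        hs.append(h)
        # Score every state so far against the newest state, then normalize.
        alphas = softmax([align(h_i, h) for h_i in hs])
        # C_t = weighted average of h_1 .. h_t
        c = [sum(a * h_i[k] for a, h_i in zip(alphas, hs)) for k in range(d)]
        cs.append(c)
    return hs, cs

# Hypothetical parameters: 2-dim inputs, 2-dim state, so A is 2 x (2 + 2).
A = [[0.5, -0.2, 0.1, 0.3],
     [0.0, 0.4, -0.3, 0.2]]
b = [0.1, -0.1]
xs = [[1.0, 2.0], [0.5, -1.0], [-0.3, 0.8]]
hs, cs = self_attention_rnn(A, b, xs, d=2)
```

Because each C_t is a weighted average of every state so far, a state from early in the sequence can still receive a large weight at a later step instead of being overwritten, which is the sense in which self-attention makes the RNN less likely to forget.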

3. Summary

With self-attention, an RNN is less likely to forget. Self-attention is not limited to Seq2Seq models; it can be used with any RNN.
Besides mitigating forgetting, self-attention helps the RNN pay attention to the context relevant to the new input.
