0. Preface

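Before reading the code, it helps to know what the inputs and outputs of the module are and how they behave.
The official documentation is at:
https://pytorch.org/docs/stable/generated/torch.nn.RNN.html#torch.nn.RNN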
Parameters (arguments that can be passed when instantiating the module):

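input_size - The number of expected features in the input x.
hidden_size - The number of features in the hidden state h.
num_layers - The number of recurrent layers. For example, num_layers=2 stacks two RNNs to form a stacked RNN, where the second RNN takes the outputs of the first RNN and computes the final result. Default: 1
nonlinearity - The non-linearity to use. Can be either tanh or relu. Default: tanh
bias - If False, the layer does not use the bias weights b_ih and b_hh. Default: True
batch_first - If True, the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to hidden or cell states. See the Inputs/Outputs sections below for details. Default: False
dropout - If non-zero, adds a Dropout layer on the outputs of each RNN layer except the last one, with dropout probability equal to dropout. Default: 0
bidirectional - If True, becomes a bidirectional RNN. Default: False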
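As a quick illustration of the parameters above, the following minimal sketch instantiates a stacked RNN with every constructor argument spelled out by keyword. The sizes (4 input features, 3 hidden features, batch of 5, sequence length 7) are arbitrary values chosen for this example, not taken from the original post.

```python
import torch
import torch.nn as nn

# All sizes below are illustrative assumptions.
rnn = nn.RNN(
    input_size=4,        # features per time step of x
    hidden_size=3,       # features in the hidden state h
    num_layers=2,        # two stacked RNN layers
    nonlinearity="tanh",
    bias=True,
    batch_first=True,    # inputs/outputs are (batch, seq, feature)
    dropout=0.0,
    bidirectional=False,
)

x = torch.randn(5, 7, 4)  # (batch, seq, feature) because batch_first=True
output, h_n = rnn(x)      # h_0 defaults to zeros when omitted

output_shape = tuple(output.shape)  # (N, L, D * H_out)
h_n_shape = tuple(h_n.shape)        # (D * num_layers, N, H_out); not affected by batch_first
print(output_shape, h_n_shape)
```

Note that h_n keeps the layer dimension first even though batch_first=True, which is the behavior the batch_first description warns about.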

Inputs (tensors you supply to the forward call):

input: tensor of shape $(L, H_{in})$ for unbatched input, $(L, N, H_{in})$ when batch_first=False, or $(N, L, H_{in})$ when batch_first=True, containing the features of the input sequence. The input can also be a packed variable-length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.
h_0: tensor of shape $(D * \text{num\_layers}, H_{out})$ for unbatched input, or $(D * \text{num\_layers}, N, H_{out})$ for batched input, containing the initial hidden state for the input sequence batch. Defaults to zeros if not provided.
where:
$N = \text{batch size}$
$L = \text{sequence length}$
$D = 2$ if bidirectional=True, otherwise $1$
$H_{in} = \text{input\_size}$
$H_{out} = \text{hidden\_size}$
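The packed variable-length input mentioned above can be sketched as follows. This is a minimal example under assumed sizes (two sequences of true lengths 5 and 2, padded to length 5); the variable names are hypothetical. It also passes an explicit h_0 with the shape $(D * \text{num\_layers}, N, H_{out})$.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=3, batch_first=True)

# Two sequences padded to length 5; their true lengths are 5 and 2 (illustrative).
padded = torch.randn(2, 5, 4)
lengths = torch.tensor([5, 2])  # must be sorted in decreasing order when enforce_sorted=True
padded[1, 2:] = 0.0             # zero out the padding positions of the short sequence

packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=True)
h_0 = torch.zeros(1, 2, 3)      # (D * num_layers, N, H_out)
packed_out, h_n = rnn(packed, h_0)

# Unpack back to a padded tensor; padding positions are filled with zeros.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape, out_lengths)
```

With packed input, h_n holds the state at each sequence's own last valid step, so for the short sequence it matches out[1, 1] rather than out[1, -1].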
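Outputs:
output: tensor of shape $(L, D * H_{out})$ for unbatched input, $(L, N, D * H_{out})$ when batch_first=False, or $(N, L, D * H_{out})$ when batch_first=True, containing the output features (h_t) from the last layer of the RNN for each t. If a torch.nn.utils.rnn.PackedSequence was given as input, the output will also be a packed sequence.
h_n: tensor of shape $(D * \text{num\_layers}, H_{out})$ for unbatched input, or $(D * \text{num\_layers}, N, H_{out})$ for batched input, containing the final hidden state for each element in the batch. (For a unidirectional RNN, the last time step of output is the last layer's h_n. For a bidirectional RNN, the backward direction's final state corresponds to the first time step of output instead.)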
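The relationship between output and h_n can be checked directly. The sketch below uses a single-layer bidirectional RNN with arbitrary sizes (assumptions for illustration) and compares h_n against the corresponding slices of output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
H = 3  # hidden_size (illustrative)
rnn = nn.RNN(input_size=4, hidden_size=H, batch_first=True, bidirectional=True)
x = torch.randn(2, 5, 4)  # (N, L, H_in)
output, h_n = rnn(x)      # output: (2, 5, 2 * H), h_n: (2, 2, H)

# The forward direction finishes at the last time step ...
forward_final = output[:, -1, :H]
# ... while the backward direction finishes at the first time step.
backward_final = output[:, 0, H:]

fwd_match = torch.allclose(h_n[0], forward_final)
bwd_match = torch.allclose(h_n[1], backward_final)
print(fwd_match, bwd_match)
```

So "the last element of output is h_n" only holds for the forward direction; slicing output[:, -1, H:] would give the backward state after a single step, not its final state.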

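Variables:

weight_ih_l[k] - The learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
weight_hh_l[k] - The learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
bias_ih_l[k] - The learnable input-hidden bias of the k-th layer, of shape (hidden_size)
bias_hh_l[k] - The learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)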
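The shapes listed above can be inspected with named_parameters(). The following sketch builds a two-layer bidirectional RNN (sizes are assumptions chosen for this example) so that the k > 0 case, where the input to the layer has num_directions * hidden_size features, is visible.

```python
import torch.nn as nn

# Illustrative sizes: 4 input features, 3 hidden features, 2 layers, 2 directions.
rnn = nn.RNN(input_size=4, hidden_size=3, num_layers=2, bidirectional=True)

# Collect every parameter name with its shape.
shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```

Parameters for the backward direction carry a `_reverse` suffix, so each layer contributes eight tensors when bidirectional=True and bias=True.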

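Notes:

All weights and biases are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$, where $k = \frac{1}{\text{hidden\_size}}$.
For bidirectional RNNs, forward and backward are directions 0 and 1 respectively. Example of splitting the output layers when batch_first=False: output.view(seq_len, batch, num_directions, hidden_size).
The batch_first argument is ignored for unbatched inputs.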
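Both notes can be tried out in a few lines. This is a minimal sketch with assumed sizes: it splits a bidirectional output by direction using the view call quoted above, and checks that freshly initialized parameters lie within $\pm\sqrt{k}$ (a small tolerance is added for float32 rounding).

```python
import math
import torch
import torch.nn as nn

seq_len, batch, hidden_size = 5, 2, 3  # illustrative sizes
rnn = nn.RNN(input_size=4, hidden_size=hidden_size, bidirectional=True)  # batch_first=False
x = torch.randn(seq_len, batch, 4)
output, h_n = rnn(x)  # output: (seq_len, batch, 2 * hidden_size)

# Split the last dimension into (num_directions, hidden_size).
split = output.view(seq_len, batch, 2, hidden_size)
forward_out = split[:, :, 0, :]   # direction 0
backward_out = split[:, :, 1, :]  # direction 1

# Initialization bound sqrt(k) with k = 1 / hidden_size.
bound = math.sqrt(1.0 / hidden_size)
within_bound = all(p.abs().max().item() <= bound + 1e-6 for p in rnn.parameters())
print(forward_out.shape, backward_out.shape, within_bound)
```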

Reading an implementation helps make these shapes and rules concrete.

Code implementation

import torch
import torch.nn as nn
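# Unidirectional, single-layer RNN
single_rnn = nn.RNN(4, 3, 1, batch_first=True)  # input_size, hidden_size, num_layers
input = torch.randn(1, 2, 4)  # (batch, seq_len, input_size); seq_len is the number of tokens in a sentence
output, h_n = single_rnn(input)  # h_0 defaults to zeros
output  # (N, L, D * hidden_size); D is 2 if bidirectional, otherwise 1

h_n  # (D * num_layers, batch, hidden_size); here it equals the last time step of output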

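# Bidirectional, single-layer RNN
bidirection_rnn = nn.RNN(4, 3, 1, batch_first=True, bidirectional=True)
bi_output, bi_h_n = bidirection_rnn(input)  # h_0 defaults to zeros
bi_output  # with D = 2 the feature dimension doubles: forward and backward outputs are concatenated

bi_h_n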

h_n.shape

bi_h_n.shape  # (D * num_layers, batch, hidden_size): a grid of D * num_layers by batch vectors, each of size hidden_size

output.shape

bi_output.shape
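Since the cell outputs are not shown here, the following self-contained sketch rebuilds the two modules above with the same sizes and collects the four shapes in one place. The variable names (`uni`, `bi`, `shapes`) are introduced only for this example.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 2, 4)  # (batch=1, seq_len=2, input_size=4)
uni = nn.RNN(4, 3, 1, batch_first=True)
bi = nn.RNN(4, 3, 1, batch_first=True, bidirectional=True)

out_u, h_u = uni(x)
out_b, h_b = bi(x)

shapes = {
    "output": tuple(out_u.shape),     # (N, L, H_out)
    "h_n": tuple(h_u.shape),          # (num_layers, N, H_out)
    "bi_output": tuple(out_b.shape),  # (N, L, 2 * H_out)
    "bi_h_n": tuple(h_b.shape),       # (2 * num_layers, N, H_out)
}
print(shapes)
```

Going bidirectional doubles the last dimension of the output but the first dimension of the hidden state.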

bs, T = 2, 3  # batch size, sequence length
input_size, hidden_size = 2, 3  # input feature size, hidden feature size
input = torch.randn(bs, T, input_size)  # randomly initialized input feature sequence
h_prev = torch.zeros(bs, hidden_size)  # initial hidden state
rnn = nn.RNN(input_size, hidden_size, batch_first=True)
rnn_output, state_final = rnn(input, h_prev.unsqueeze(0))
print(rnn_output)  # (bs, seq_len, D * h_dim) = [2, 3, 3]
print(state_final)  # (D * num_layers, bs, h_dim) = [1, 2, 3]

# Hand-written unidirectional RNN
def rnn_forward(input, weight_ih, bias_ih, weight_hh, bias_hh, h_prev):
  bs, T, input_size = input.shape
  h_dim = weight_ih.shape[0]  # weight_ih has shape (h_dim, input_size)
  h_out = torch.zeros(bs, T, h_dim)  # output (hidden state) tensor, filled in step by step

  for t in range(T):
    x = input[:,t,:].unsqueeze(2)  # input features at the current step: bs*input_size*1
    w_ih_batch = weight_ih.unsqueeze(0).tile(bs,1,1)  # bs*h_dim*input_size
    w_hh_batch = weight_hh.unsqueeze(0).tile(bs,1,1)  # bs*h_dim*h_dim

    w_times_x = torch.bmm(w_ih_batch, x).squeeze(-1)  # bs*h_dim
    w_times_h = torch.bmm(w_hh_batch, h_prev.unsqueeze(2)).squeeze(-1)  # bs*h_dim

    # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
    h_prev = torch.tanh(w_times_x + bias_ih + w_times_h + bias_hh)

    h_out[:,t,:] = h_prev
  return h_out, h_prev.unsqueeze(0)

# Print the parameters of the reference nn.RNN module
for k,v in rnn.named_parameters():
  print(k, v)

custom_rnn_output, custom_state_final = rnn_forward(input, rnn.weight_ih_l0, rnn.bias_ih_l0, \
                                                    rnn.weight_hh_l0, rnn.bias_hh_l0, h_prev)
print(custom_rnn_output)
print(custom_state_final)
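Comparing printed tensors by eye is error-prone, so a numerical check is a useful complement. The sketch below is a self-contained variant, not the function above: it uses plain matrix multiplication instead of torch.bmm (an equivalent formulation), and the name `rnn_forward_matmul` is hypothetical. It compares the result with nn.RNN via torch.allclose.

```python
import torch
import torch.nn as nn

def rnn_forward_matmul(x, weight_ih, bias_ih, weight_hh, bias_hh, h_prev):
    """Batch-first RNN: h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)."""
    steps = []
    for t in range(x.shape[1]):
        h_prev = torch.tanh(x[:, t] @ weight_ih.T + bias_ih + h_prev @ weight_hh.T + bias_hh)
        steps.append(h_prev)
    # Stack per-step states into (bs, T, h_dim); final state gets a leading layer dim.
    return torch.stack(steps, dim=1), h_prev.unsqueeze(0)

torch.manual_seed(0)
bs, T, input_size, hidden_size = 2, 3, 2, 3
x = torch.randn(bs, T, input_size)
h0 = torch.zeros(bs, hidden_size)
rnn = nn.RNN(input_size, hidden_size, batch_first=True)

with torch.no_grad():
    ref_out, ref_state = rnn(x, h0.unsqueeze(0))
    my_out, my_state = rnn_forward_matmul(
        x, rnn.weight_ih_l0, rnn.bias_ih_l0, rnn.weight_hh_l0, rnn.bias_hh_l0, h0
    )

outputs_match = torch.allclose(ref_out, my_out, atol=1e-6)
states_match = torch.allclose(ref_state, my_state, atol=1e-6)
print(outputs_match, states_match)
```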

# Hand-written bidirectional RNN
def bidirection_rnn_forward(input, weight_ih, bias_ih, weight_hh, bias_hh, h_prev, \
                            weight_ih_reverse, bias_ih_reverse, weight_hh_reverse, bias_hh_reverse, h_prev_reverse):
  bs, T, input_size = input.shape
  h_dim = weight_ih.shape[0]  # weight_ih has shape (h_dim, input_size)
  h_out = torch.zeros(bs, T, h_dim*2)  # output (hidden state) tensor; bidirectional doubles the feature size
  forward_output = rnn_forward(input, weight_ih, bias_ih, weight_hh, bias_hh, h_prev)[0]  # forward layer
  # backward layer: run the same recurrence over the time-reversed input
  backward_output = rnn_forward(torch.flip(input, [1]), weight_ih_reverse, bias_ih_reverse, weight_hh_reverse, bias_hh_reverse, h_prev_reverse)[0]

  h_out[:,:,:h_dim] = forward_output
  h_out[:,:,h_dim:] = torch.flip(backward_output,[1])  # flip back so step t lines up with the forward output

  # Final states: the last step of each direction's own pass
  h_n=torch.zeros(bs, 2, h_dim)
  h_n[:,0,:] = forward_output[:,-1,:]
  h_n[:,1,:] = backward_output[:,-1,:]
  h_n=h_n.transpose(0, 1)  # (2, bs, h_dim)

  return h_out, h_n
  # Not equivalent: h_out[:,-1,:] holds the backward state after only one step, not its final state
  # return h_out, h_out[:,-1,:].reshape((bs, 2, h_dim)).transpose(0, 1)

# Verify the bidirectional implementation
bi_rnn  = nn.RNN(input_size, hidden_size, batch_first=True, bidirectional=True)
h_prev = torch.zeros(2, bs, hidden_size)
bi_rnn_output, bi_state_final = bi_rnn(input, h_prev)
# for k,v in bi_rnn.named_parameters():
#   print(k, v) # 8 parameters: 4 forward and 4 reverse
print(bi_rnn_output)
print(bi_state_final)

custom_bi_rnn_output, custom_bi_state_final = bidirection_rnn_forward(input, bi_rnn.weight_ih_l0, \
                                                                      bi_rnn.bias_ih_l0, bi_rnn.weight_hh_l0, \
                                                                      bi_rnn.bias_hh_l0, h_prev[0], \
                                                                      bi_rnn.weight_ih_l0_reverse, \
                                                                      bi_rnn.bias_ih_l0_reverse, bi_rnn.weight_hh_l0_reverse, \
                                                                      bi_rnn.bias_hh_l0_reverse, h_prev[1])
print(custom_bi_rnn_output)
print(custom_bi_state_final)
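As with the unidirectional case, the match can be confirmed numerically rather than by reading printed tensors. The following self-contained sketch re-derives the bidirectional computation with a small helper (`step_through`, a hypothetical name) and compares it with nn.RNN using torch.allclose. It is an equivalent formulation under the same sizes, not the exact function above.

```python
import torch
import torch.nn as nn

def step_through(x, w_ih, b_ih, w_hh, b_hh, h):
    """Run h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh) over a batch-first input."""
    outs = []
    for t in range(x.shape[1]):
        h = torch.tanh(x[:, t] @ w_ih.T + b_ih + h @ w_hh.T + b_hh)
        outs.append(h)
    return torch.stack(outs, dim=1), h

torch.manual_seed(0)
bs, T, input_size, hidden_size = 2, 3, 2, 3
x = torch.randn(bs, T, input_size)
h0 = torch.zeros(2, bs, hidden_size)  # (D * num_layers, bs, hidden_size)
bi_rnn = nn.RNN(input_size, hidden_size, batch_first=True, bidirectional=True)

with torch.no_grad():
    ref_out, ref_state = bi_rnn(x, h0)
    fwd_out, fwd_h = step_through(
        x, bi_rnn.weight_ih_l0, bi_rnn.bias_ih_l0,
        bi_rnn.weight_hh_l0, bi_rnn.bias_hh_l0, h0[0])
    # The backward direction reads the sequence in reverse time order.
    bwd_out, bwd_h = step_through(
        torch.flip(x, [1]), bi_rnn.weight_ih_l0_reverse, bi_rnn.bias_ih_l0_reverse,
        bi_rnn.weight_hh_l0_reverse, bi_rnn.bias_hh_l0_reverse, h0[1])
    # Flip the backward outputs back so time steps align, then concatenate features.
    my_out = torch.cat([fwd_out, torch.flip(bwd_out, [1])], dim=-1)
    my_state = torch.stack([fwd_h, bwd_h], dim=0)

outputs_match = torch.allclose(ref_out, my_out, atol=1e-6)
states_match = torch.allclose(ref_state, my_state, atol=1e-6)
print(outputs_match, states_match)
```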