Sequence Models: Code Implementation

This post is a follow-up to my earlier article, "Sequence Models".

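Ramblings

I meant to write this part back during the November posting challenge, but it coincided with final exams, so I only covered the theory and never wrote the code. Winter break has left me with some free time, so I'm filling in the code now.

Wishing everyone a great start back at work.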

Code walkthrough

%matplotlib inline
import torch
from torch import nn
from d2l import torch as d2l

I used to write this in PyCharm but have since switched to Jupyter Notebook, so %matplotlib inline is needed to render matplotlib plots inside the notebook.

T = 1000  # generate 1000 points in total

# Build the dataset: a sine wave with some added noise
time = torch.arange(1, T + 1, dtype=d2l.float32)
x = torch.sin(0.01 *  time) + torch.normal(0, 0.2, (T,))

d2l.plot(time, [x], 'time', 'x', xlim=[1, 1000], figsize=(5, 3))

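We generate a dataset by taking a sine wave and adding noise to it.

First set T to 1000, i.e. 1000 time steps.
Then use torch.arange to generate the floating-point numbers 1 through 1000.
- Note the difference between torch.arange and torch.range
  - torch.arange(start=1.0, end=3.0) returns 1. 2. and excludes end
  - torch.range(start=1.0, end=3.0) returns 1. 2. 3. and includes end
  - Prefer torch.arange: torch.range is deprecated, and torch.arange follows the same half-open convention as Python's built-in range
Next, generate the input data x. Each value of x is the sine of 0.01 * t for t = 1, ..., 1000, plus Gaussian noise with mean 0 and standard deviation 0.2.
d2l.plot is the plotting helper that ships with Dive into Deep Learning (d2l); there is no need to dig into it here.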
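
To make the end-point behaviour concrete, here is a minimal sketch using only torch.arange (torch.range is left out on purpose because it is deprecated). The variable names are mine, chosen for illustration.

```python
import torch

# end is excluded, just like Python's built-in range
a = torch.arange(1.0, 3.0)

# to include the end point, extend the range by one step
b = torch.arange(1, 3 + 1, dtype=torch.float32)

print(a)  # tensor([1., 2.])
print(b)  # tensor([1., 2., 3.])
```

This is why the post writes torch.arange(1, T + 1, ...) to get the time steps 1 through T.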

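Visualized, the raw data looks like this:

In other words, the dataset is $x_t = \sin(0.01\,t) + \epsilon_t$, where $\epsilon_t$ is the noise term.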

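Recall that in the approximation used for autoregressive models, we estimate $x_t$ from $x_{t-1}, \ldots, x_{t-\tau}$ rather than from $x_{t-1}, \ldots, x_1$. Whenever this approximation is exact, we say the sequence satisfies the Markov condition. In particular, if $\tau = 1$, we get a first-order Markov model, and $P(x)$ is given by: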

$$
P(x_1, \ldots, x_T) = \prod_{t=1}^T P(x_t \mid x_{t-1}) \text{ where } P(x_1 \mid x_0) = P(x_1)
$$
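
As a quick worked instance of this factorization (my own illustration, not part of the original derivation), take a sequence of length $T = 3$. The first-order product expands to:

$$
P(x_1, x_2, x_3) = P(x_1) \, P(x_2 \mid x_1) \, P(x_3 \mid x_2)
$$

The exact chain rule would instead need $P(x_3 \mid x_2, x_1)$ as the last factor, so the Markov condition is what lets us drop $x_1$ there.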

tau = 4

features = torch.zeros((T - tau, tau))
for i in range(tau):
    features[:, i] = x[i: T - tau + i]
labels = x[tau:].reshape((-1, 1))

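Here we set $\tau = 4$, i.e. we assume the value at time step $t$ depends only on the 4 values before it.
features plays the role of our earlier input x, and labels plays the role of the output y.

Each row of features holds $\tau$ consecutive values of x.
labels holds the outputs y: since we predict the fifth value from the previous four, each entry of labels stores that fifth value.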

That may be hard to picture from words alone, so here is a small example.

T = 10
x = torch.arange(T)
tau = 4
features = torch.zeros((T - tau, tau))
for i in range(tau):
    features[:, i] = x[i: T - tau + i]
labels = x[tau:].reshape((-1, 1))
print(x)
print(features)
print(x[tau:])

>>
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tensor([[0., 1., 2., 3.],
        [1., 2., 3., 4.],
        [2., 3., 4., 5.],
        [3., 4., 5., 6.],
        [4., 5., 6., 7.],
        [5., 6., 7., 8.]])
tensor([4, 5, 6, 7, 8, 9])

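Here x is the sequence 0 through 9 and $\tau = 4$, so labels starts at index 4. Each label is predicted from the four values before it, which are exactly the corresponding row of features.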
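
If the column-by-column loop feels opaque, the same windows can probably be built in one call with Tensor.unfold. This is only a sketch of an alternative, not what the post or d2l uses; features_loop and features_unfold are names I made up for the comparison.

```python
import torch

T, tau = 10, 4
x = torch.arange(T, dtype=torch.float32)

# Loop version from the post: column i holds x shifted by i steps.
features_loop = torch.zeros((T - tau, tau))
for i in range(tau):
    features_loop[:, i] = x[i: T - tau + i]

# Sliding-window version: unfold yields T - tau + 1 windows of length tau.
# The last window has no following value to serve as its label, so drop it.
features_unfold = x.unfold(0, tau, 1)[:-1]
labels = x[tau:].reshape((-1, 1))

print(torch.equal(features_loop, features_unfold))  # True
print(features_unfold.shape, labels.shape)
```

Both versions give T - tau rows, one per label.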

batch_size, n_train = 16, 600
# Only the first `n_train` samples are used for training
train_iter = d2l.load_array((features[:n_train], labels[:n_train]), batch_size, is_train=True)

The dataset is loaded with a helper that ships with Dive into Deep Learning; here we treat the first 600 samples as the training set.
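
For readers without the d2l package, the helper is roughly a thin wrapper around PyTorch's own TensorDataset and DataLoader. The sketch below is my approximation of it, and the random tensors are placeholders that only match the shapes used in this post (T = 1000, tau = 4).

```python
import torch
from torch.utils import data

def load_array(data_arrays, batch_size, is_train=True):
    """Wrap tensors in a DataLoader; shuffle only when training."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

# Placeholder data with the same shapes as features and labels in the post.
features = torch.randn(996, 4)
labels = torch.randn(996, 1)

batch_size, n_train = 16, 600
train_iter = load_array((features[:n_train], labels[:n_train]), batch_size)

X, y = next(iter(train_iter))
print(X.shape, y.shape)  # torch.Size([16, 4]) torch.Size([16, 1])
print(len(train_iter))   # 38 batches: 37 full ones plus a final batch of 8
```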

# Function that initializes the network weights
def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)

# A simple multilayer perceptron
def get_net():
    net = nn.Sequential(nn.Linear(4, 10),
                        nn.ReLU(),
                        nn.Linear(10, 1))
    net.apply(init_weights)
    return net

# Squared loss
loss = nn.MSELoss()

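The weights are initialized with Glorot (Xavier) initialization<sup><a rel="nofollow" href="#ref1">1-2</a></sup>. Note that the code calls the uniform variant, torch.nn.init.xavier_uniform_, while the signature documented here is the normal variant.
- torch.nn.init.xavier_normal_(tensor, gain=1.0)
  - tensor: an n-dimensional torch.Tensor
  - gain: an optional scaling factor
  This initializer draws values from $\mathcal{N}(0, \text{std}^2)$,

  where $\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}$
  The uniform variant xavier_uniform_ takes the same arguments and draws from $\mathcal{U}(-a, a)$ with $a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}$.
A simple multilayer perceptron is enough for training.
The loss is the squared-error loss (nn.MSELoss).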
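
To see what Glorot initialization does to the first layer of this network, here is a small check of the uniform variant. It is a sketch under the assumption that gain keeps its default of 1.0; the seed is arbitrary and only there so the run is repeatable.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# weight has shape (10, 4), so fan_in = 4 and fan_out = 10
layer = nn.Linear(4, 10)
nn.init.xavier_uniform_(layer.weight)

fan_in, fan_out = 4, 10
bound = math.sqrt(6.0 / (fan_in + fan_out))  # gain defaults to 1.0

max_abs = layer.weight.abs().max().item()
print(f"bound = {bound:.4f}, max |w| = {max_abs:.4f}")
```

Every weight should land inside [-bound, bound], which is about ±0.65 for this layer.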

def train(net, train_iter, loss, epochs, lr):
    trainer = torch.optim.Adam(net.parameters(), lr)
    for epoch in range(epochs):
        for X, y in train_iter:
            trainer.zero_grad()
            l = loss(net(X), y)
            l.backward()
            trainer.step()
        print(f'epoch {epoch + 1}, '
              f'loss: {d2l.evaluate_loss(net, train_iter, loss):f}')

net = get_net()
train(net, train_iter, loss, 10, 0.01)

>>
epoch 1, loss: 0.056344
epoch 2, loss: 0.055601
epoch 3, loss: 0.050352
epoch 4, loss: 0.047221
epoch 5, loss: 0.049006
epoch 6, loss: 0.046462
epoch 7, loss: 0.049919
epoch 8, loss: 0.046281
epoch 9, loss: 0.045042
epoch 10, loss: 0.046046
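
Since only the first 600 samples are used for training, a natural follow-up is to compare the error on those samples with the error on the rest. The block below is a self-contained sketch of that check, not the post's own code: it rebuilds the data with plain PyTorch instead of d2l, keeps PyTorch's default weight initialization for brevity, and uses an arbitrary seed, so the numbers will not match the log above.

```python
import torch
from torch import nn
from torch.utils import data

torch.manual_seed(0)  # arbitrary seed so the sketch is repeatable

# Rebuild the data and the sliding-window features as in the post.
T, tau, n_train, batch_size = 1000, 4, 600, 16
time = torch.arange(1, T + 1, dtype=torch.float32)
x = torch.sin(0.01 * time) + torch.normal(0, 0.2, (T,))
features = torch.zeros((T - tau, tau))
for i in range(tau):
    features[:, i] = x[i: T - tau + i]
labels = x[tau:].reshape((-1, 1))

train_set = data.TensorDataset(features[:n_train], labels[:n_train])
train_iter = data.DataLoader(train_set, batch_size, shuffle=True)

# Same architecture as get_net(), but with default initialization.
net = nn.Sequential(nn.Linear(tau, 10), nn.ReLU(), nn.Linear(10, 1))
loss = nn.MSELoss()
trainer = torch.optim.Adam(net.parameters(), lr=0.01)

for epoch in range(10):
    for X, y in train_iter:
        trainer.zero_grad()
        loss(net(X), y).backward()
        trainer.step()

# Error on the training samples versus the samples the model never saw.
with torch.no_grad():
    train_mse = loss(net(features[:n_train]), labels[:n_train]).item()
    heldout_mse = loss(net(features[n_train:]), labels[n_train:]).item()
print(f"train MSE {train_mse:.4f}, held-out MSE {heldout_mse:.4f}")
```

Keep in mind that the noise alone has variance 0.2² = 0.04, so a loss near that value is about as good as any predictor can do on this data.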