Dive Into Deep Learning

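2. Preliminaries

(I have just started learning deep learning and am trying to record the exercises for every section. I am a beginner, so if anything here is wrong or not fully understood, corrections are very welcome.)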

2.1 Data Manipulation

Exercise 1

Run the code in this section. Change the conditional statement X == Y to X < Y or X > Y, and then see what kind of tensor you get.

X = torch.arange(12, dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
X < Y, X > Y

Output

(tensor([[ True, False,  True, False],
         [False, False, False, False],
         [False, False, False, False]]),
 tensor([[False, False, False, False],
         [ True,  True,  True,  True],
         [ True,  True,  True,  True]]))

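Exercise 2

Replace the two tensors that operate by element in the broadcasting mechanism with other shapes, e.g., 3-dimensional tensors. Is the result the same as expected?

With shapes (2x1x3) + (1x3x2), an error is raised, because the trailing dimensions (3 and 2) differ and neither of them is 1: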

a = torch.tensor([[[1,2,3]],[[5,3,5]]])  # 2x1x3
b = torch.tensor([[[1,2],[3,5],[6,7]]])  # 1x3x2
a + b

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [103], in <module>
      1 a = torch.tensor([[[1,2,3]],[[5,3,5]]])
      2 b = torch.tensor([[[1,2],[3,5],[6,7]]])
----> 3 a + b

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 2

With shapes (1x2x3) + (2x1x3):

a = torch.tensor([[[1, 2, 3], [4, 5, 6]]])  # 1x2x3
b = torch.tensor([[[7, 8, 9]], [[4, 5, 6]]])  # 2x1x3
a + b

The result has shape (2x2x3):

tensor([[[ 8, 10, 12],
         [11, 13, 15]],

        [[ 5,  7,  9],
         [ 8, 10, 12]]])

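On the broadcasting mechanism, the official PyTorch documentation gives these rules:
1. Each tensor has at least one dimension.
2. When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.
An article on this site by <狗狗狗大王> explains this in great detail.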
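The two rules can be sketched as a small pure-Python check. The function name `broadcast_shape` is hypothetical and exists only for this illustration; it mirrors the rules quoted above rather than PyTorch's actual implementation.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if the shapes are incompatible."""
    result = []
    # Walk both shapes from the trailing dimension; a missing dimension is treated as size 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f"sizes {dim_a} and {dim_b} do not match at dimension -{i}")
        # A size-1 dimension is stretched to match the other one.
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(reversed(result))


print(broadcast_shape((1, 2, 3), (2, 1, 3)))  # (2, 2, 3)
```

Calling it with (2, 1, 3) and (1, 3, 2) raises a ValueError, which matches the failing example in this exercise.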

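2.2 Data Preprocessing

Exercise 1

Create a raw dataset with more rows and columns. (1) Delete the column with the most missing values. (2) Convert the preprocessed dataset to the tensor format.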

import os
import pandas as pd
import torch

os.makedirs(os.path.join('..', '2.1', 'data2'), exist_ok=True)
data_file = os.path.join('..', '2.1', 'data2', 'house_tiny.csv')
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Floor,Price\n')
    f.write('NA,Pave,2,127500\n')
    f.write('2,NA,1,106000\n')
    f.write('4,NA,NA,178100\n')
    f.write('NA,NA,2,140000\n')
    f.write('NA,NA,2,152000\n')
data = pd.read_csv(data_file)
inputs, outputs = data.iloc[:, 0:3], data.iloc[:, 3]

num = inputs.isnull().sum()  # number of missing values in each column
Max_NaN = inputs.isnull().sum().idxmax()  # label of the column with the most missing values
inputs = inputs.drop(Max_NaN, axis=1)  # drop that column from inputs
inputs = inputs.fillna(inputs.mean())  # replace missing entries with the mean of the same column
inputs = pd.get_dummies(inputs, dummy_na=True)

x, y = torch.tensor(inputs.values), torch.tensor(outputs.values)  # convert to tensors
x, y

Output

(tensor([[3.0000, 2.0000],
         [2.0000, 1.0000],
         [4.0000, 1.7500],
         [3.0000, 2.0000],
         [3.0000, 2.0000]], dtype=torch.float64),
 tensor([127500, 106000, 178100, 140000, 152000]))

2.3 Linear Algebra

Exercise 1

Prove that the transpose of the transpose of a matrix A is A, i.e., (A^T)^T = A.

A = torch.randn(4, 3)
A == A.T.T

Output

tensor([[True, True, True],
        [True, True, True],
        [True, True, True],
        [True, True, True]])

Exercise 2

Given two matrices A and B, show that the sum of their transposes equals the transpose of their sum, i.e., A^T + B^T = (A + B)^T.

A = torch.arange(12).reshape(3,4)
B = torch.randn(3,4)
A.T + B.T == (A + B).T

Output

tensor([[True, True, True],
        [True, True, True],
        [True, True, True],
        [True, True, True]])
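
The numerical checks in Exercises 1 and 2 only test particular matrices. A short entrywise argument, sketched here from the standard definition $(A^\top)_{ij} = A_{ji}$, covers the general case:

```latex
\begin{aligned}
\left((A^\top)^\top\right)_{ij} &= (A^\top)_{ji} = A_{ij}, \\
\left(A^\top + B^\top\right)_{ij} &= A_{ji} + B_{ji} = (A + B)_{ji} = \left((A + B)^\top\right)_{ij}.
\end{aligned}
```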

Exercise 3

Given any square matrix A, is A + A^T always symmetric? Why?

A = torch.randn(4, 4)
(A + A.T).T == (A + A.T)

Output

tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
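
The check on a random matrix suggests the answer is yes. The "why" follows from the two transpose identities of Exercises 1 and 2, which give for any square matrix A:

```latex
(A + A^\top)^\top = A^\top + (A^\top)^\top = A^\top + A = A + A^\top
```

A matrix that equals its own transpose is symmetric by definition, so A + A^T is always symmetric.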

Exercise 4

We defined a tensor X of shape (2, 3, 4) in this section. What is the output of len(X)?

X = torch.arange(24).reshape(2, 3, 4)
len(X)

Output

2

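Exercise 5

For a tensor X of arbitrary shape, does len(X) always correspond to the length of a certain axis of X? What is that axis?

X = torch.arange(24).reshape(2, 3, 4)  # 2x3x4 tensor
Y = torch.arange(24).reshape(4, 6)  # 4x6 tensor
Z = torch.ones(1)   # 1-dimensional tensor of length 1
len(X), len(Y), len(Z)

Output

(2, 4, 1)

In every case len returns the size of the first axis (axis 0).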

Exercise 6

Run A / A.sum(axis=1) and see what happens. Can you analyze the reason?

If A is a square matrix

A = torch.arange(16).reshape(4, 4)  # 4X4
A / A.sum(axis=1)

Output

tensor([[0.0000, 0.0455, 0.0526, 0.0556],
        [0.6667, 0.2273, 0.1579, 0.1296],
        [1.3333, 0.4091, 0.2632, 0.2037],
        [2.0000, 0.5909, 0.3684, 0.2778]])

If A is not a square matrix

A = torch.arange(12).reshape(3, 4)  # 3x4
B = torch.arange(12).reshape(4, 3)  # 4x3
A / A.sum(axis=1)  # or B / B.sum(axis=1)

Output

RuntimeError                              Traceback (most recent call last)
Input In [78], in <module>
----> 1 A / A.sum(axis=1)

RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1
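
A possible reading of both results: A.sum(axis=1) drops the summed axis and returns a 1-D tensor with one entry per row. Broadcasting aligns that vector with the last axis of A, so for a square matrix column j is divided by the sum of row j, and for a non-square matrix the sizes no longer match. A minimal sketch, assuming the intent was to normalize each row, keeps the summed axis with keepdims=True (the names `row_sums` and `normalized` are only for illustration):

```python
import torch

A = torch.arange(12, dtype=torch.float32).reshape(3, 4)

row_sums = A.sum(axis=1, keepdims=True)  # shape (3, 1) instead of (3,)
normalized = A / row_sums                # (3, 4) / (3, 1) broadcasts across each row

print(row_sums.shape)           # torch.Size([3, 1])
print(normalized.sum(axis=1))   # each row sums to 1, up to rounding
```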

Exercise 7

Consider a tensor of shape (2, 3, 4). What are the shapes of the summation outputs along axes 0, 1, and 2?

A = torch.arange(24).reshape(2,3,4)  # 2x3x4
A.sum(axis=0).shape, A.sum(axis=1).shape, A.sum(axis=2).shape

Output

(torch.Size([3, 4]), torch.Size([2, 4]), torch.Size([2, 3]))

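Exercise 8

Feed a tensor with 3 or more axes to the linalg.norm function and observe its output. What does this function compute for tensors of arbitrary shape?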

A, B = torch.randn(2,3,4), torch.randn(3, 4)
outputs1 = torch.linalg.norm(A)  # 2x3x4 tensor
outputs2 = torch.linalg.norm(B)  # 3x4 tensor
A, B, outputs1, outputs2

Output

(tensor([[[ 2.1417, -1.2939, -0.0506,  0.0582],
          [ 0.9437,  0.3785, -0.0736, -0.1000],
          [-0.2323,  1.3399,  0.6603,  0.8154]],
 
         [[-0.1303, -0.4355, -0.2770,  1.8112],
          [ 0.7443, -0.1177,  0.8033,  0.0264],
          [ 0.5158, -0.1448, -0.7694, -0.5072]]]),
 tensor([[-1.1264,  0.0546,  0.4413, -0.1869],
         [-1.7601, -0.4381, -0.2288, -1.7541],
         [-0.1453,  1.0307, -0.8918,  0.7459]]),
 tensor(4.0225),
 tensor(3.2180))
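
With default arguments the function appears to treat the tensor as one long vector and return the square root of the sum of its squared entries, whatever the shape. A quick check of that reading (the name `manual` is only for illustration):

```python
import torch

torch.manual_seed(0)  # fixed seed so the check is repeatable
A = torch.randn(2, 3, 4)

manual = torch.sqrt((A ** 2).sum())  # L2 norm of the flattened tensor
builtin = torch.linalg.norm(A)       # default arguments, no ord or dim

print(torch.allclose(manual, builtin))  # True
```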

2.4 Calculus

Exercise 1

Plot the function $y=f(x)=x^{3}-\frac{1}{x}$ and its tangent line at $x = 1$.

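x = np.arange(0.1, 3, 0.1)  # assumed range (np is NumPy); starts above 0 because 1/x is undefined at x = 0
plot(x, [x ** 3 - 1 / x, 4 * x - 4], 'x', 'f(x)', legend=['f(x)', 'Tangent line (x=1)'])  # plot is the helper defined in this section of the book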

Output

[Figure: f(x) and its tangent line at x = 1]
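
The tangent line 4x - 4 passed to the plot call follows from the derivative of f at x = 1:

```latex
\begin{aligned}
f'(x) &= 3x^{2} + \frac{1}{x^{2}}, \qquad f'(1) = 4, \qquad f(1) = 0, \\
y &= f(1) + f'(1)\,(x - 1) = 4x - 4.
\end{aligned}
```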

Exercise 2

Find the gradient of the function $f(x)=3x_{1}^{2}+5e^{x_{2}}$.
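
Taking the partial derivative with respect to each component of x gives:

```latex
\nabla f(\mathbf{x}) =
\begin{bmatrix} \dfrac{\partial f}{\partial x_{1}} \\[6pt] \dfrac{\partial f}{\partial x_{2}} \end{bmatrix}
=
\begin{bmatrix} 6x_{1} \\ 5e^{x_{2}} \end{bmatrix}
```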

Exercise 3

What is the gradient of the function $f(x)=\left \| x \right \|_{2}$?
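
Writing the norm as the square root of the sum of squares and differentiating one component at a time gives, for any nonzero x (the gradient is not defined at x = 0):

```latex
\frac{\partial}{\partial x_{i}} \|\mathbf{x}\|_{2}
= \frac{\partial}{\partial x_{i}} \Big( \sum_{j} x_{j}^{2} \Big)^{1/2}
= \frac{x_{i}}{\|\mathbf{x}\|_{2}}
\quad\Longrightarrow\quad
\nabla_{\mathbf{x}} \|\mathbf{x}\|_{2} = \frac{\mathbf{x}}{\|\mathbf{x}\|_{2}}, \qquad \mathbf{x} \neq \mathbf{0}.
```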

Exercise 4

Can you write out the chain rule for the case where $u = f(x, y, z)$ and $x = x(a, b)$, $y = y(a, b)$, $z = z(a, b)$?
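
Assuming all the functions involved are differentiable, u depends on a and b only through x, y, and z, so each partial derivative sums the contributions along the three paths:

```latex
\begin{aligned}
\frac{\partial u}{\partial a} &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial a}
 + \frac{\partial u}{\partial y}\frac{\partial y}{\partial a}
 + \frac{\partial u}{\partial z}\frac{\partial z}{\partial a}, \\
\frac{\partial u}{\partial b} &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial b}
 + \frac{\partial u}{\partial y}\frac{\partial y}{\partial b}
 + \frac{\partial u}{\partial z}\frac{\partial z}{\partial b}.
\end{aligned}
```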

2.5 Automatic Differentiation

Exercise 1

Why is the second derivative much more expensive to compute than the first derivative?
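
One way to see the extra cost: the second derivative is obtained by differentiating the computation that produced the first derivative, so autograd has to record a graph for the backward pass itself and then run another backward pass through that larger graph. A minimal sketch, using a hypothetical scalar function y = x^3 chosen only for illustration:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# create_graph=True records the backward computation so it can be differentiated again
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)
# second backward pass, this time through the graph of the first derivative
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)

print(dy_dx.item(), d2y_dx2.item())  # 12.0 12.0
```

Here dy/dx = 3x^2 and d2y/dx2 = 6x, and both happen to equal 12 at x = 2.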

Exercise 2

After running the function for backpropagation, immediately run it again and see what happens.

import torch

x = torch.arange(4.0, requires_grad=True)
y = 2 * torch.dot(x, x)
y.backward()
y.backward()  # run backpropagation again immediately
x.grad

Output

RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling .backward() or autograd.grad() the first time.

The intermediate results saved for the backward pass are freed after the first call, so the second call has nothing left to work with. Passing retain_graph=True to the first .backward() call avoids this:

import torch

x = torch.arange(4.0, requires_grad=True)
y = 2 * torch.dot(x, x)
y.backward(retain_graph=True)  # keep the computation graph from being freed
y.backward()
x.grad

Output

tensor([ 0.,  8., 16., 24.])

The gradient 4x is accumulated over the two backward passes, so x.grad holds 8x.

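Exercise 3

In the control flow example where we calculate the derivative of d with respect to a, what would happen if we changed the variable a to a random vector or a matrix?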

import torch

def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

a = torch.randn(size=(2,2), requires_grad=True)  # a is a 2x2 matrix
d = f(a)
d.backward()

Output

RuntimeError: grad can be implicitly created only for scalar outputs

Calling backward() without arguments only works on a scalar, and here d is a 2x2 matrix.
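
One common workaround, sketched here under the assumption that the quantity of interest is the derivative of the sum of the entries of d, is to reduce d to a scalar before calling backward. The seed is fixed only to make the run repeatable.

```python
import torch

def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

torch.manual_seed(0)
a = torch.randn(size=(2, 2), requires_grad=True)
d = f(a)
d.sum().backward()  # reduce the matrix output to a scalar before backpropagating

# f only rescales a by a constant, so every gradient entry equals that constant
print(a.grad.shape)                                     # torch.Size([2, 2])
print(torch.allclose(a.grad, d.detach() / a.detach()))  # True
```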

Exercise 4

Redesign an example of finding the gradient of the control flow. Run and analyze the result.

import torch

def f(a):
    b = a / 2
    while b > 1:
        b = b / 2  # halving b guarantees that the loop terminates
    if b < 3:
        c = b * 2
    else:
        c = b * 3
    return c

a = torch.randn(size=(), requires_grad=True)
d = f(a)
d.backward()
a.grad == d / a

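Output

tensor(True)

f scales a by a constant k that depends on the branches taken, so d = k * a and the gradient a.grad = k equals d / a.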

Exercise 5

Let $f(x) = \sin(x)$. Plot $f(x)$ and $\frac{df(x)}{dx}$, where the latter is computed without exploiting that $f'(x) = \cos(x)$.

Incorrect code

import torch
import matplotlib.pyplot as plt
import numpy as np
x = torch.arange(-3*np.pi, 3*np.pi, 0.1,requires_grad=True)
y = torch.sin(x)

y.sum().backward()

plt.plot(x, y, label='y=sin(x)') 
plt.plot(x, x.grad, label='dsin(x)=cos(x)') 
plt.legend(loc='upper center')
plt.show()

Error:

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

Incorrect code (calling backward on the non-scalar y instead of y.sum())

y.backward()

Error:

RuntimeError: grad can be implicitly created only for scalar outputs

Correct code

import torch
import matplotlib.pyplot as plt
import numpy as np
x = torch.arange(-3*np.pi, 3*np.pi, 0.1,requires_grad=True)
y = torch.sin(x)

y.sum().backward()

plt.plot(x.detach(), y.detach(), label='y=sin(x)') 
plt.plot(x.detach(), x.grad, label='dsin(x)=cos(x)') 
plt.legend(loc='upper center')
plt.show()
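
As a sanity check on the plot, the gradient obtained from autograd can be compared numerically with cos(x). This comparison is an addition for illustration and is not needed to draw the figure.

```python
import math
import torch

x = torch.arange(-3 * math.pi, 3 * math.pi, 0.1, requires_grad=True)
y = torch.sin(x)

# Each y[i] depends only on x[i], so the gradient of the sum is the elementwise derivative.
y.sum().backward()

print(torch.allclose(x.grad, torch.cos(x.detach())))  # True
```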