
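1. Prepare the classification data

Here we use Scikit-Learn's make_circles() function to generate two circles of points, one per class, which we will later draw in two different colors.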

# Make the data
from sklearn.datasets import make_circles

# Create 1000 samples
n_samples = 1000

# Create the circle samples
X, y = make_circles(n_samples,
                    noise=0.03, # noise added to each point
                    random_state=42) # fix the seed so we get the same values every time

That raised an error: the sklearn module isn't installed.

pip install scikit-learn

OK, once it's installed everything should work!
Now let's look at the first five values of X and y.

print(f"First 5 X features:\n{X[:5]}")
print(f"First 5 y labels:\n{y[:5]}")

Here we can see that every y value is paired with two X values.
Next, let's visualize the data, which will help us understand it better.

# Convert the circles data into a DataFrame
import pandas as pd
circles = pd.DataFrame({"X1": X[:, 0],
                        "X2": X[:, 1],
                        "label": y})
circles.head(10)

Error: the pandas module isn't installed.
No worries, we just need to install it!

pip install pandas

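Running the code again now displays the first ten rows of the DataFrame.
From that output we can see this is a binary classification problem, since y only takes the values 0 or 1. Next, let's check how many samples each class has.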

circles.label.value_counts()
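
The same class-balance check can be done without pandas. The following is a minimal sketch, assuming the same make_circles() arguments as above; the names X_demo, y_demo and class_counts are only for illustration. With 1000 samples it should report 500 points per class, since make_circles() splits the samples evenly between the two circles.

```python
import numpy as np
from sklearn.datasets import make_circles

# Regenerate the data with the same settings used in this post
X_demo, y_demo = make_circles(1000, noise=0.03, random_state=42)

# np.bincount counts how often each non-negative integer label occurs
class_counts = np.bincount(y_demo)
print(f"Samples in class 0: {class_counts[0]}, class 1: {class_counts[1]}")
```

A balanced dataset like this one means plain accuracy is a reasonable first metric later on.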

Plot the circles data

import matplotlib.pyplot as plt
plt.scatter(x=X[:,0],
            y=X[:,1],
            c=y,
            cmap=plt.cm.RdYlBu)

Next, let's see how to build a model that can separate the points into class 0 (red) and class 1 (blue).

1.1 Input and output shapes

# First, check the shapes of our input and output data
X.shape, y.shape

# Look at the first sample
X_sample = X[0]
y_sample = y[0]
print(f"X of the first sample: {X_sample}, y of the first sample: {y_sample}")
print(f"Shapes of the first sample: X_sample shape {X_sample.shape}, y_sample shape {y_sample.shape}")

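This tells us that each X sample is a vector of two values, while each y is a single scalar: two inputs, one output.

1.2 Convert the data to tensors and split it into training and test sets

# Convert the data to tensors with PyTorch's default dtype (float32)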
import torch
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

# Look at the first five samples
X[:5],y[:5]
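
Why the extra .type(torch.float) call? NumPy floating-point arrays default to float64, while PyTorch layers such as nn.Linear default to float32, so leaving the data as float64 tends to cause dtype mismatch errors later. The sketch below illustrates this on a tiny array with made-up values; features_np and labels_np are hypothetical names, not part of the original data.

```python
import numpy as np
import torch

# Made-up stand-ins for X and y (illustrative values only)
features_np = np.array([[0.75, 0.23], [-0.76, 0.15]])
labels_np = np.array([1, 0])

# from_numpy keeps the NumPy dtype, so the float array arrives as float64
features_raw = torch.from_numpy(features_np)
print(f"dtype straight from NumPy: {features_raw.dtype}")

# .type(torch.float) converts to float32, PyTorch's default floating dtype
features = features_raw.type(torch.float)
labels = torch.from_numpy(labels_np).type(torch.float)
print(f"dtype after conversion: {features.dtype}, {labels.dtype}")
```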

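This time we don't split the data by manual slicing as before; instead we use the train_test_split() function from Scikit-Learn.

# Split the data into training and test sets
from sklearn.model_selection import train_test_split

# test_size=0.2 puts 20% of the data in the test set; the split is random, so random_state=42 makes it reproducible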
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y, 
                                                    test_size=0.2,
                                                    random_state=42)
len(X_train), len(y_train), len(X_test), len(y_test)

These numbers check out: we started with 1000 samples, so 80% gives 800 training samples and 20% gives 200 test samples.
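
To make the effect of test_size and random_state concrete, here is a small sketch on synthetic stand-in arrays (X_demo and y_demo are hypothetical, not the circle data). It checks the 800/200 sizes and that two calls with the same random_state return identical splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 samples, 2 features, alternating labels
X_demo = np.arange(2000, dtype=np.float32).reshape(1000, 2)
y_demo = np.arange(1000) % 2

# Two splits with the same seed
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

X_train_a, X_test_a, y_train_a, y_test_a = split_a
print(f"train size: {len(X_train_a)}, test size: {len(X_test_a)}")

# Same seed, same shuffling: every returned array should match
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(f"identical splits with the same random_state: {same_split}")
```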

2. Build the model

With the data ready, we can build our model!

Ready? Let's start right now!

import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
device

Method 1: custom class + forward()

# Create the model class as a subclass of nn.Module
class CircleClassificationV0(nn.Module):
    def __init__(self):
        super().__init__()
        # Create two linear layers
        self.linear1 = nn.Linear(in_features=2, out_features=5)
        self.linear2 = nn.Linear(in_features=5, out_features=1)

    def forward(self, x):
        # Pass the input through linear1, then linear2
        return self.linear2(self.linear1(x))

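One key point here is getting the input and output feature sizes right, i.e. in_features and out_features. As discussed earlier, each input X has 2 values and each output y has 1, so linear1 takes in_features=2 and linear2 produces out_features=1. Note that linear1's out_features must equal linear2's in_features, because linear1's output is linear2's input. That hidden size is ours to choose; we pick 5 here because it keeps the numbers small enough to inspect and makes the mechanics easy to follow.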
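
The shape rule described above can be checked directly. This is a minimal sketch, assuming a hypothetical batch of 8 random samples; bad_layer is a deliberately mismatched layer added only to show what goes wrong when the sizes disagree.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

linear1 = nn.Linear(in_features=2, out_features=5)
linear2 = nn.Linear(in_features=5, out_features=1)

# Hypothetical batch: 8 samples with 2 features each
batch = torch.rand(8, 2)
hidden = linear1(batch)   # shape (8, 5)
output = linear2(hidden)  # shape (8, 1)
print(batch.shape, hidden.shape, output.shape)

# A layer expecting 3 inputs cannot consume the 5 hidden features
bad_layer = nn.Linear(in_features=3, out_features=1)
try:
    bad_layer(hidden)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(f"Shape mismatch: {err}")
```

Only the last dimension has to match in_features; the batch dimension (8 here) passes through unchanged.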

# Instantiate the model and move it to the target device
model_0 = CircleClassificationV0().to(device)
model_0

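There is another way to build the model, nn.Sequential(), which makes the code more concise.

Method 2: nn.Sequential()

# Build the model with nn.Sequential()
model_0 = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),  # in_features must be 2 to match the two X features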
    nn.Linear(in_features=5, out_features=1)
).to(device)
model_0

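This code rebinds model_0, replacing the model defined above. Note that the layers inside nn.Sequential() run in the order they are listed, and we don't need to write a forward() method. There is also a third approach that combines the two, shown next.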
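
To illustrate the "runs in order" behaviour, the sketch below compares calling an nn.Sequential model with chaining its layers by hand. The input x is a hypothetical batch of 4 random samples, and seq_model is an illustrative name.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

seq_model = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1),
)

# Hypothetical batch of 4 samples with 2 features each
x = torch.rand(4, 2)

# Layers in nn.Sequential can be indexed; apply them manually in listed order
manual_output = seq_model[1](seq_model[0](x))
sequential_output = seq_model(x)

outputs_match = torch.allclose(sequential_output, manual_output)
print(f"Sequential output equals manual chaining: {outputs_match}")
```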

Method 3: custom class + forward() + nn.Sequential()

class CircleClassificationV1(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.linear = nn.Sequential(
            nn.Linear(in_features=2, out_features=5),
            nn.Linear(in_features=5, out_features=1)
        )
        
    def forward(self, x):
        return self.linear(x)
    
model_0 = CircleClassificationV1().to(device)
model_0

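Either approach, or the combination, works; choose whichever suits your needs. We'll stick with the first method here: just re-run the Method 1 code from earlier, which we won't paste again.

OK, now that we have a model, let's see what happens when we pass data to it.

# Make predictions with the model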
untrained_preds = model_0(X_test.to(device))
print(f"Length of predictions:{len(untrained_preds)},shape:{untrained_preds.shape}")
print(f"Length of test samples:{len(y_test)}, shape:{y_test.shape}")
print(f"\nFirst 10 predictions:\n{untrained_preds[:10]}")
print(f"\nFirst 10 test labels:\n{y_test[:10]}")
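
One thing worth noticing in output like this is that the predictions have shape (N, 1) while the labels have shape (N,), and that an untrained model returns raw real numbers rather than 0/1 labels. The following is only a sketch of how the shapes could be lined up, using a stand-in model and random stand-in data (model, X_demo and y_demo are hypothetical); it is not necessarily the step the original post takes next.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

# Stand-in for the two-layer model built above
model = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1),
)

# Random stand-ins for X_test and y_test (200 samples, as in the 20% split)
X_demo = torch.rand(200, 2)
y_demo = torch.randint(0, 2, (200,)).float()

# inference_mode disables gradient tracking while we only predict
with torch.inference_mode():
    raw_preds = model(X_demo)

print(f"raw predictions shape: {raw_preds.shape}, labels shape: {y_demo.shape}")

# Drop the trailing size-1 dimension so predictions line up with the labels
aligned_preds = raw_preds.squeeze(dim=1)
print(f"aligned predictions shape: {aligned_preds.shape}")
```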