【李宏毅 Machine Learning 2021】Convolutional Neural Networks HW3 - Image Classification (in progress)

耶也夜 · 2022-07-14


Study Notes

(1) Recall the earlier notes 【李宏毅机器学习CP21】(task6)卷积神经网络: the power of a CNN lies in the strong feature-extraction ability of its convolutional layers. Once the CNN has extracted the features, we can classify them with fully-connected layers, or with other machine learning models such as decision trees or support vector machines.
(2) PyTorch's vision library: https://github.com/pytorch/vision
(3) The basic idea of data loading: wrap the data in a Dataset, then use a DataLoader to load it in parallel batches.

Table of Contents

  • Study Notes
  • I. Homework Objectives and Requirements
  • 1. Objectives and Requirements
  • 2. Data Description
  • II. Original Code
  • 1. Import Required Packages
  • 2. Define Dataset, DataLoader, and Transforms
  • (1) Dataset
  • (2) Transforms
  • 3. Define the Model
  • 4. Training
  • 5. Testing
  • III. Modified Code
  • 1. Residual Network
  • 2. Residual Network + Dropout
  • Appendix: PyTorch Basics
  • Reference

I. Homework Objectives and Requirements

1. Objectives and Requirements

Goals:
(1) Classify images with a CNN
(2) Improve performance with data augmentation
(3) Learn to make use of unlabeled data

Requirements:
(1) No external data (using other image datasets or pretrained models is forbidden)
(2) Do not search the internet for labels

Difficulty levels:
Easy: Build a simple convolutional neural network as the baseline. (2 pts)

Medium: Design a better architecture or adopt different data augmentations to improve the performance. (2 pts)

Hard: Utilize provided unlabeled data to obtain better results. (2 pts)

2. Data Description

The task is to classify food images with a convolutional neural network (CNN).

The food images in the dataset were collected from the web and fall into 11 classes: Bread, Dairy product, Dessert, Egg, Fried food, Meat, Noodles/Pasta, Rice, Seafood, Soup, Vegetable/Fruit. Each class is represented by a number, e.g. 0 stands for Bread.

● Training set: 280 * 11 labeled images + 6786 unlabeled images
● Validation set: 30 * 11 labeled images
● Testing set: 3347 images

Opening the training folder looks like the screenshot below. Images in the unlabeled part of the training set and in the testing set are named [number].jpg, while images in the validation set and in the labeled part of the training set are named [class]_[number].jpg:

(screenshot: contents of the food-11 training folder)

The directory tree is:

  hw3_CNN.ipynb
  └─food-11

II. Original Code

1. Import Required Packages

# Import necessary packages.
import numpy as np
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from PIL import Image
# "ConcatDataset" and "Subset" are possibly useful when doing semi-supervised learning.
from torch.utils.data import ConcatDataset, DataLoader, Subset
from torchvision.datasets import DatasetFolder

# This is for the progress bar.
from tqdm.auto import tqdm

torchvision implements loaders for common image datasets such as ImageNet, CIFAR10, and MNIST, as well as common data transforms, which greatly simplifies data loading.
torch.utils.data.DataLoader handles batching and iteration over a dataset.

2. Define Dataset, DataLoader, and Transforms

(1)Dataset

In PyTorch, we can use Dataset and DataLoader from torch.utils.data to wrap the data, which makes subsequent training and testing more convenient.

A Dataset must override two functions, __len__ and __getitem__:
1) __len__ must return the size of the dataset
2) __getitem__ defines what the dataset returns when it is indexed with []

We usually do not call these two functions directly, but DataLoader uses them when enumerating the Dataset; if they are not implemented, the code will raise an error at runtime.
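As a minimal sketch of these two methods (the class name FoodDataset and the file-list logic are illustrative, not part of the homework code):

from PIL import Image
from torch.utils.data import Dataset

class FoodDataset(Dataset):
    def __init__(self, file_paths, labels, transform=None):
        self.file_paths = file_paths   # list of image paths
        self.labels = labels           # list of integer class labels
        self.transform = transform

    def __len__(self):
        # Must return the size of the dataset.
        return len(self.file_paths)

    def __getitem__(self, idx):
        # Defines what dataset[idx] returns.
        img = Image.open(self.file_paths[idx])
        if self.transform is not None:
            img = self.transform(img)
        return img, self.labels[idx]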

(2)Transforms

Torchvision provides many utilities for image processing, data wrapping, and data augmentation. Because our data is stored in folders according to class labels, we can use it directly and conveniently. torchvision consists of three main parts:

(1) models provides the architectures of classic deep learning networks together with pretrained weights, including AlexNet, the VGG family, the ResNet family, the Inception family, and so on.

(2) datasets provides loaders for common datasets; they are all designed as subclasses of torch.utils.data.Dataset and include MNIST, CIFAR10/100, ImageNet, COCO, and others. A Dataset object is a dataset that can be accessed by index and returns items of the form (data, label). One commonly used dataset class is torchvision.datasets.DatasetFolder, which wraps the data for us. ImageFolder assumes all files are stored in folders, with each folder holding images of a single class and the folder name being the class name; its constructor is:

ImageFolder(root, transform=None, target_transform=None, loader=default_loader)
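For example (illustrative usage; ImageFolder requires exactly this one-subfolder-per-class layout, which the labeled training data below follows):

from torchvision.datasets import ImageFolder
import torchvision.transforms as transforms

dataset = ImageFolder("food-11/training/labeled", transform=transforms.ToTensor())
img, label = dataset[0]   # each item is a (data, label) pair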

(3) transforms provides common data preprocessing operations, mainly operations on Tensor and PIL Image objects.
Using transforms takes two steps. First, build the transform, e.g. trans = transforms.Normalize(mean=x, std=y); second, apply it, e.g. output = trans(input). Multiple operations can also be chained together with Compose into a single processing pipeline.

PS: Compose connects these operations (much like nn.Sequential). The operations are defined as objects, and applying them actually invokes their __call__ method, similar to nn.Module.
For more details, see the Transforms section of the official PyTorch documentation.
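A small sketch of the two-step pattern (the mean/std values and the file path are placeholders, not statistics computed from this dataset):

import torchvision.transforms as transforms
from PIL import Image

# Step 1: build the transform objects.
trans = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# Step 2: apply them by calling the object (this invokes __call__).
img = Image.open("food-11/validation/0_0.jpg")
output = trans(img)   # a normalized tensor of shape [3, 128, 128]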

A DataLoader is an iterable object that collates the individual samples returned by a dataset into batches, and provides multi-process loading and data shuffling. Once the program has traversed all the data in the dataset once, one iteration over the DataLoader is complete.

# Doing data augmentation during training is important.
# However, not every augmentation is useful.
# Think about what kind of augmentation helps food recognition.
train_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128)
    transforms.Resize((128, 128)),
    # You may add some transforms here.
    # ToTensor() should be the last one of the transforms.
    # It converts a PIL Image to a Tensor normalized to [0, 1].
    transforms.ToTensor(),
])

# We don't need augmentation in testing and validation.
# All we need here is to resize the PIL image and transform it into Tensor.
test_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])
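As a hedged starting point for the augmentation TODO above (these particular transforms are common choices for photos, not ones prescribed by the course), train_tfm could be extended like this:

train_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    # Geometric and color augmentations that are usually reasonable for food photos.
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    # ToTensor() should remain the last transform.
    transforms.ToTensor(),
])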

# Batch size for training, validation, and testing.
# A greater batch size usually gives a more stable gradient.
# But the GPU memory is limited, so please adjust it carefully.
batch_size = 128
# We train in mini-batches (which speeds up parameter updates), so set the batch_size here.

# Construct datasets.
# The argument "loader" tells how torchvision reads the data.
train_set = DatasetFolder("food-11/training/labeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
valid_set = DatasetFolder("food-11/validation", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)
unlabeled_set = DatasetFolder("food-11/training/unlabeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
test_set = DatasetFolder("food-11/testing", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)

# Construct data loaders.
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

torchvision's transforms can also wrap a custom transformation strategy via Lambda. For example, to randomly rotate a PIL Image, you can write trans = T.Lambda(lambda img: img.rotate(random() * 360)).
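A self-contained version of that one-liner, as a sketch (T is just a conventional alias for torchvision.transforms):

from random import random

import torchvision.transforms as T

# Wrap an arbitrary function as a transform.
random_rotate = T.Lambda(lambda img: img.rotate(random() * 360))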

3. Define the Model

The most basic model: a convolutional network for feature extraction, followed by a fully-connected feed-forward network for classification.

Each convolutional block consists of a convolution (Conv2d) + batch normalization (BatchNorm) + ReLU activation + max pooling (MaxPool).

When tuning, adjust the most important parameters first: the number of convolution kernels per layer, the type of activation function (which applies a non-linearity to the output), and the preprocessing of the input images; the other parameters have less impact (small adjustments are enough).

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        # The arguments for commonly used modules:
        # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        # torch.nn.MaxPool2d(kernel_size, stride, padding)

        # input image size: [3, 128, 128]
        self.cnn_layers = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),

            nn.Conv2d(64, 128, 3, 1, 1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),

            nn.Conv2d(128, 256, 3, 1, 1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(4, 4, 0),
        )
        self.fc_layers = nn.Sequential(
            nn.Linear(256 * 8 * 8, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, 11)
        )

    def forward(self, x):
        # input (x): [batch_size, 3, 128, 128]
        # output: [batch_size, 11]

        # Extract features by convolutional layers.
        x = self.cnn_layers(x)

        # The extracted feature map must be flattened before going to fully-connected layers.
        x = x.flatten(1)

        # The features are transformed by fully-connected layers to obtain the final logits.
        x = self.fc_layers(x)
        return x

Note: when should you use nn.Module, and when nn.functional?
If the layer has learnable parameters, prefer the former; otherwise (the layer has no learnable parameters) both work, and there is no significant performance difference between them. Since activation functions (ReLU, sigmoid, tanh) and pooling (MaxPool) have no learnable parameters, the corresponding functional versions can be used instead, while layers with learnable parameters, such as convolutions and fully-connected layers, should be built with nn.Module.
PS: although dropout has no learnable parameters either, it is still generally recommended to use nn.Dropout, because it is automatically disabled when the model is switched to eval mode.
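A minimal sketch of this convention (the network itself is illustrative, not the homework model):

import torch.nn.functional as F

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers with learnable parameters: built as nn.Module subclasses.
        self.conv = nn.Conv2d(3, 16, 3, 1, 1)
        self.fc = nn.Linear(16 * 64 * 64, 11)
        # Dropout has no learnable parameters, but nn.Dropout is still
        # preferred because model.eval() disables it automatically.
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # Parameter-free ops (activation, pooling) can use the functional API.
        x = F.max_pool2d(F.relu(self.conv(x)), 2)   # for a [N, 3, 128, 128] input
        x = self.dropout(x.flatten(1))
        return self.fc(x)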

4. Training

Train on the training set, and use the validation set to select the best parameters.

def get_pseudo_labels(dataset, model, threshold=0.65):
    # This function generates pseudo-labels of a dataset using the given model.
    # It returns an instance of DatasetFolder containing images whose prediction confidences exceed a given threshold.
    # You are NOT allowed to use any models trained on external data for pseudo-labeling.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Construct a data loader for the unlabeled dataset.
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

    # Make sure the model is in eval mode.
    model.eval()
    # Define softmax function.
    softmax = nn.Softmax(dim=-1)

    # Iterate over the dataset by batches.
    for batch in tqdm(data_loader):
        img, _ = batch

        # Forward the data.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(img.to(device))

        # Obtain the probability distributions by applying softmax on logits.
        probs = softmax(logits)

        # ---------- TODO ----------
        # Filter the data and construct a new dataset.

    # Turn off the eval mode.
    model.train()
    # TODO: return the filtered, pseudo-labeled dataset.
    return
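One hedged way to complete the TODO (a sketch, not the official solution: it keeps images whose maximum softmax probability exceeds the threshold and pairs them with their argmax predictions; it reuses batch_size, DataLoader, nn, and tqdm from above, and PseudoDataset / get_pseudo_labels_sketch are names introduced here for illustration):

class PseudoDataset(torch.utils.data.Dataset):
    # Wraps confidently-predicted images with integer pseudo-labels,
    # matching the (image, int label) format returned by DatasetFolder.
    def __init__(self, imgs, labels):
        self.imgs = imgs
        self.labels = labels

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, idx):
        return self.imgs[idx], int(self.labels[idx])

def get_pseudo_labels_sketch(dataset, model, threshold=0.65):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    model.eval()
    softmax = nn.Softmax(dim=-1)

    kept_imgs, kept_labels = [], []
    for img, _ in tqdm(data_loader):
        with torch.no_grad():
            probs = softmax(model(img.to(device)))
        max_probs, preds = probs.max(dim=-1)
        mask = (max_probs > threshold).cpu()   # keep only confident predictions
        kept_imgs.append(img[mask])
        kept_labels.append(preds.cpu()[mask])

    model.train()
    return PseudoDataset(torch.cat(kept_imgs), torch.cat(kept_labels))

Note that this materializes all kept images in memory, which is feasible for food-11-sized tensors but not free; a fancier version could store indices into the original dataset instead.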

# "cuda" only when GPUs are available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize a model, and put it on the device specified.
model = Classifier().to(device)
model.device = device

# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()

# Initialize optimizer, you may fine-tune some hyperparameters such as learning rate on your own.
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003, weight_decay=1e-5)

# The number of training epochs.
n_epochs = 80

# Whether to do semi-supervised learning.
do_semi = False

# Training loop
for epoch in range(n_epochs):
    # ---------- TODO ----------
    # In each epoch, relabel the unlabeled dataset for semi-supervised learning.
    # Then you can combine the labeled dataset and pseudo-labeled dataset for the training.
    if do_semi:
        # Obtain pseudo-labels for unlabeled data using the trained model.
        pseudo_set = get_pseudo_labels(unlabeled_set, model)

        # Construct a new dataset and a data loader for training.
        # This is used in semi-supervised learning only.
        concat_dataset = ConcatDataset([train_set, pseudo_set])
        train_loader = DataLoader(concat_dataset, batch_size=batch_size, shuffle=True, num_workers=8, pin_memory=True)

    # ---------- Training ----------
    # Make sure the model is in train mode before training.
    model.train()

    # These are used to record information in training.
    train_loss = []
    train_accs = []

    # Iterate over the training set by batches.
    for batch in tqdm(train_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        # Forward the data. (Make sure data and model are on the same device.)
        logits = model(imgs.to(device))

        # Calculate the cross-entropy loss.
        # We don't need to apply softmax before computing cross-entropy as it is done automatically.
        loss = criterion(logits, labels.to(device))

        # Gradients stored in the parameters in the previous step should be cleared out first.
        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for the current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        train_loss.append(loss.item())
        train_accs.append(acc)

    # The average loss and accuracy of the training set is the average of the recorded values.
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")

    # ---------- Validation ----------
    # Make sure the model is in eval mode so that modules like dropout behave correctly for evaluation.
    model.eval()

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []

    # Iterate over the validation set by batches.
    for batch in tqdm(valid_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        # We don't need gradients in validation.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(imgs.to(device))

        # We can still compute the loss (but not the gradient).
        loss = criterion(logits, labels.to(device))

        # Compute the accuracy for the current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs.append(acc)

    # The average loss and accuracy of the entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")

5. Testing

(1) model.train(): training mode, where Dropout is active and BatchNorm uses per-batch statistics
(2) model.eval(): evaluation mode, where Dropout is disabled and BatchNorm uses its running statistics
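A quick sketch of the difference for Dropout (the first printed value is just one possible outcome, since the zeroing is random):

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 4)

drop.train()     # training mode: elements are zeroed at random, survivors scaled by 1/(1-p)
print(drop(x))   # e.g. tensor([[2., 0., 2., 0.]])

drop.eval()      # eval mode: dropout is a no-op
print(drop(x))   # tensor([[1., 1., 1., 1.]])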

# Make sure the model is in eval mode.
# Some modules like Dropout or BatchNorm behave differently depending on whether the model is in training mode.
model.eval()

# Initialize a list to store the predictions.
predictions = []

# Iterate over the testing set by batches.
for batch in tqdm(test_loader):
    # A batch consists of image data and corresponding labels.
    # But here the variable "labels" is useless since we do not have the ground-truth.
    # If you print out the labels, you will find they are always 0.
    # This is because the wrapper (DatasetFolder) returns images and labels for each batch,
    # so fake labels are created to make it work normally.
    imgs, labels = batch

    # We don't need gradients in testing, and we don't even have labels to compute loss.
    # Using torch.no_grad() accelerates the forward process.
    with torch.no_grad():
        logits = model(imgs.to(device))

    # Take the class with the greatest logit as the prediction and record it.
    predictions.extend(logits.argmax(dim=-1).cpu().numpy().tolist())


# Save predictions into the file.
with open("predict.csv", "w") as f:

    # The first row must be "Id,Category"
    f.write("Id,Category\n")

    # For the rest of the rows, each image id corresponds to a predicted class.
    for i, pred in enumerate(predictions):
        f.write(f"{i},{pred}\n")

III. Modified Code

1. Residual Network
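(Section to be updated; as a placeholder, here is a minimal residual-block sketch in the usual ResNet spirit, where the input is added back onto the block's output. The class is an illustrative assumption, not this post's final code.)

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, 1, 1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, 1, 1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        # The skip connection adds the input back onto the transformed output.
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)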

2. Residual Network + Dropout
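(Also to be updated; one hedged way to combine the two, again only a sketch, is to apply Dropout between the convolutions of the block above:)

class ResidualBlockWithDropout(nn.Module):
    def __init__(self, channels, p=0.3):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, 1, 1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, 1, 1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()
        # nn.Dropout (rather than F.dropout) so that model.eval() disables it.
        self.dropout = nn.Dropout(p)

    def forward(self, x):
        out = self.dropout(self.relu(self.bn1(self.conv1(x))))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)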

Appendix: PyTorch Basics

(1) Autograd implements backpropagation.
(2) torch.nn is built on top of Autograd and is used to define and run networks. nn.Module is the most important class in nn; it can be viewed as a wrapper around a network, containing the definitions of each layer as well as a forward method: calling forward(input) returns the result of the forward pass.
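A tiny sketch of Autograd at work:

import torch

# requires_grad=True tells Autograd to record operations on x.
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2 + 3 * x

# backward() runs backpropagation through the recorded graph.
y.backward()
print(x.grad)   # dy/dx = 2x + 3 = 7 at x = 2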

Reference

(1) 李宏毅, 2021 Machine Learning course slides
(2) 陈云, 《深度学习框架PyTorch入门与实践》
(3) How to use Colab: https://zhuanlan.zhihu.com/p/346358053

