【李宏毅 Machine Learning 2021】Convolutional Neural Networks HW3 - Image Classification (in progress)

耶也夜 · 2022-07-14


Study Notes

(1) Recall the earlier notes 【李宏毅机器学习CP21】(task6)卷积神经网络: the power of a CNN lies in the strong feature-extraction ability of its convolutional layers. Once the CNN has extracted the features, we can classify them with fully-connected layers, or with other machine learning models such as decision trees or support vector machines.
(2) PyTorch's vision library: https://github.com/pytorch/vision
(3) The basic idea of data loading: wrap the data in a Dataset, then use a DataLoader to load it in parallel batches.

Table of Contents

  • Study Notes
  • I. Homework Objectives and Requirements
  • 1. Objectives and Requirements
  • 2. Data Description
  • II. Original Code
  • 1. Import Required Packages
  • 2. Define Dataset, DataLoader, and Transforms
  • (1) Dataset
  • (2) Transforms
  • 3. Define the Model
  • 4. Training
  • 5. Testing
  • III. Modified Code
  • 1. Residual Network
  • 2. Residual Network + Dropout
  • Appendix: PyTorch Basics
  • Reference

I. Homework Objectives and Requirements

1. Objectives and Requirements

Goals:
(1) Classify images with a CNN
(2) Improve performance with data augmentation
(3) Learn to make use of unlabeled data

Requirements:
(1) No external data (using other image datasets or pretrained models is forbidden)
(2) Do not search the internet for labels

Difficulty levels:
Easy: Build a simple convolutional neural network as the baseline. (2 pts)

Medium: Design a better architecture or adopt different data augmentations to improve the performance. (2 pts)

Hard: Utilize provided unlabeled data to obtain better results. (2 pts)

2. Data Description

The task is to classify food images with a convolutional neural network (CNN).

The food images in the dataset were collected from the web and fall into 11 classes: Bread, Dairy product, Dessert, Egg, Fried food, Meat, Noodles/Pasta, Rice, Seafood, Soup, Vegetable/Fruit. Each class is represented by a number, e.g. 0 stands for Bread.

● Training set: 280 * 11 labeled images + 6786 unlabeled images
● Validation set: 30 * 11 labeled images
● Testing set: 3347 images

Opening the training folder looks like the screenshot below. Images in the unlabeled part of the training set and in the testing set are named [number].jpg, while images in the validation set and in the labeled part of the training set are named [class]_[number].jpg:

(screenshot: contents of the food-11 training folder)

The directory tree is:

  hw3_CNN.ipynb
  └─food-11

II. Original Code

1. Import Required Packages

# Import necessary packages.
import numpy as np
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from PIL import Image
# "ConcatDataset" and "Subset" are possibly useful when doing semi-supervised learning.
from torch.utils.data import ConcatDataset, DataLoader, Subset
from torchvision.datasets import DatasetFolder

# This is for the progress bar.
from tqdm.auto import tqdm

torchvision implements loaders for common image datasets such as ImageNet, CIFAR10, and MNIST, as well as common data transforms, which greatly simplifies data loading.
torch.utils.data.DataLoader handles batching and iteration over a dataset.

2. Define Dataset, DataLoader, and Transforms

(1)Dataset

In PyTorch, we can use Dataset and DataLoader from torch.utils.data to wrap the data, which makes subsequent training and testing more convenient.

A Dataset must override two functions, __len__ and __getitem__:
1) __len__ must return the size of the dataset
2) __getitem__ defines what the dataset returns when it is indexed with []

We usually do not call these two functions directly, but DataLoader uses them when enumerating the Dataset; if they are not implemented, the code will raise an error at runtime.
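As a minimal sketch of these two methods (the class name FoodDataset and the file-list logic are illustrative, not part of the homework code):

from PIL import Image
from torch.utils.data import Dataset

class FoodDataset(Dataset):
    def __init__(self, file_paths, labels, transform=None):
        self.file_paths = file_paths   # list of image paths
        self.labels = labels           # list of integer class labels
        self.transform = transform

    def __len__(self):
        # Must return the size of the dataset.
        return len(self.file_paths)

    def __getitem__(self, idx):
        # Defines what dataset[idx] returns.
        img = Image.open(self.file_paths[idx])
        if self.transform is not None:
            img = self.transform(img)
        return img, self.labels[idx]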

(2)Transforms

Torchvision provides many utilities for image processing, data wrapping, and data augmentation. Because our data is stored in folders according to class labels, we can use it directly and conveniently. torchvision consists of three main parts:

(1) models provides the architectures of classic deep learning networks together with pretrained weights, including AlexNet, the VGG family, the ResNet family, the Inception family, and so on.

(2) datasets provides loaders for common datasets; they are all designed as subclasses of torch.utils.data.Dataset and include MNIST, CIFAR10/100, ImageNet, COCO, and others. A Dataset object is a dataset that can be accessed by index and returns items of the form (data, label). One commonly used dataset class is torchvision.datasets.DatasetFolder, which wraps the data for us. ImageFolder assumes all files are stored in folders, with each folder holding images of a single class and the folder name being the class name; its constructor is:

ImageFolder(root, transform=None, target_transform=None, loader=default_loader)
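For example (illustrative usage; ImageFolder requires exactly this one-subfolder-per-class layout, which the labeled training data below follows):

from torchvision.datasets import ImageFolder
import torchvision.transforms as transforms

dataset = ImageFolder("food-11/training/labeled", transform=transforms.ToTensor())
img, label = dataset[0]   # each item is a (data, label) pair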

(3) transforms provides common data preprocessing operations, mainly operations on Tensor and PIL Image objects.
Using transforms takes two steps. First, build the transform, e.g. trans = transforms.Normalize(mean=x, std=y); second, apply it, e.g. output = trans(input). Multiple operations can also be chained together with Compose into a single processing pipeline.

PS: Compose connects these operations (much like nn.Sequential). The operations are defined as objects, and applying them actually invokes their __call__ method, similar to nn.Module.
For more details, see the Transforms section of the official PyTorch documentation.
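A small sketch of the two-step pattern (the mean/std values and the file path are placeholders, not statistics computed from this dataset):

import torchvision.transforms as transforms
from PIL import Image

# Step 1: build the transform objects.
trans = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# Step 2: apply them by calling the object (this invokes __call__).
img = Image.open("food-11/validation/0_0.jpg")
output = trans(img)   # a normalized tensor of shape [3, 128, 128]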

A DataLoader is an iterable object that collates the individual samples returned by a dataset into batches, and provides multi-process loading and data shuffling. Once the program has traversed all the data in the dataset once, one iteration over the DataLoader is complete.

# Doing data augmentation during training is important.
# However, not every augmentation is useful.
# Think about what kind of augmentation helps food recognition.
train_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128)
    transforms.Resize((128, 128)),
    # You may add some transforms here.
    # ToTensor() should be the last one of the transforms.
    # It converts a PIL Image to a Tensor normalized to [0, 1].
    transforms.ToTensor(),
])

# We don't need augmentation in testing and validation.
# All we need here is to resize the PIL image and transform it into Tensor.
test_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])
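As a hedged starting point for the augmentation TODO above (these particular transforms are common choices for photos, not ones prescribed by the course), train_tfm could be extended like this:

train_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    # Geometric and color augmentations that are usually reasonable for food photos.
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    # ToTensor() should remain the last transform.
    transforms.ToTensor(),
])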

# Batch size for training, validation, and testing.
# A greater batch size usually gives a more stable gradient.
# But the GPU memory is limited, so please adjust it carefully.
batch_size = 128
# We train in mini-batches (which speeds up parameter updates), so set the batch_size here.

# Construct datasets.
# The argument "loader" tells how torchvision reads the data.
train_set = DatasetFolder("food-11/training/labeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
valid_set = DatasetFolder("food-11/validation", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)
unlabeled_set = DatasetFolder("food-11/training/unlabeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
test_set = DatasetFolder("food-11/testing", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)

# Construct data loaders.
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

torchvision's transforms can also wrap a custom transformation strategy via Lambda. For example, to randomly rotate a PIL Image, you can write trans = T.Lambda(lambda img: img.rotate(random() * 360)).
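A self-contained version of that one-liner, as a sketch (T is just a conventional alias for torchvision.transforms):

from random import random

import torchvision.transforms as T

# Wrap an arbitrary function as a transform.
random_rotate = T.Lambda(lambda img: img.rotate(random() * 360))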

3. Define the Model

The most basic model: a convolutional network for feature extraction, followed by a fully-connected feed-forward network for classification.

Each convolutional block consists of a convolution (Conv2d) + batch normalization (BatchNorm) + ReLU activation + max pooling (MaxPool).

When tuning, adjust the most important parameters first: the number of convolution kernels per layer, the type of activation function (which applies a non-linearity to the output), and the preprocessing of the input images; the other parameters have less impact (small adjustments are enough).

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        # The arguments for commonly used modules:
        # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        # torch.nn.MaxPool2d(kernel_size, stride, padding)

        # input image size: [3, 128, 128]
        self.cnn_layers = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),

            nn.Conv2d(64, 128, 3, 1, 1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),

            nn.Conv2d(128, 256, 3, 1, 1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(4, 4, 0),
        )
        self.fc_layers = nn.Sequential(
            nn.Linear(256 * 8 * 8, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, 11)
        )

    def forward(self, x):
        # input (x): [batch_size, 3, 128, 128]
        # output: [batch_size, 11]

        # Extract features by convolutional layers.
        x = self.cnn_layers(x)

        # The extracted feature map must be flattened before going to fully-connected layers.
        x = x.flatten(1)

        # The features are transformed by fully-connected layers to obtain the final logits.
        x = self.fc_layers(x)
        return x

Note: when should you use nn.Module, and when nn.functional?
If the layer has learnable parameters, prefer the former; otherwise (the layer has no learnable parameters) both work, and there is no significant performance difference between them. Since activation functions (ReLU, sigmoid, tanh) and pooling (MaxPool) have no learnable parameters, the corresponding functional versions can be used instead, while layers with learnable parameters, such as convolutions and fully-connected layers, should be built with nn.Module.
PS: although dropout has no learnable parameters either, it is still generally recommended to use nn.Dropout, because it is automatically disabled when the model is switched to eval mode.
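A minimal sketch of this convention (the network itself is illustrative, not the homework model):

import torch.nn.functional as F

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers with learnable parameters: built as nn.Module subclasses.
        self.conv = nn.Conv2d(3, 16, 3, 1, 1)
        self.fc = nn.Linear(16 * 64 * 64, 11)
        # Dropout has no learnable parameters, but nn.Dropout is still
        # preferred because model.eval() disables it automatically.
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # Parameter-free ops (activation, pooling) can use the functional API.
        x = F.max_pool2d(F.relu(self.conv(x)), 2)   # for a [N, 3, 128, 128] input
        x = self.dropout(x.flatten(1))
        return self.fc(x)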

4. Training

Train on the training set, and use the validation set to select the best parameters.

def get_pseudo_labels(dataset, model, threshold=0.65):
    # This function generates pseudo-labels of a dataset using the given model.
    # It returns an instance of DatasetFolder containing images whose prediction confidences exceed a given threshold.
    # You are NOT allowed to use any models trained on external data for pseudo-labeling.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Construct a data loader for the unlabeled dataset.
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

    # Make sure the model is in eval mode.
    model.eval()
    # Define softmax function.
    softmax = nn.Softmax(dim=-1)

    # Iterate over the dataset by batches.
    for batch in tqdm(data_loader):
        img, _ = batch

        # Forward the data.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(img.to(device))

        # Obtain the probability distributions by applying softmax on logits.
        probs = softmax(logits)

        # ---------- TODO ----------
        # Filter the data and construct a new dataset.

    # Turn off the eval mode.
    model.train()
    # TODO: return the filtered, pseudo-labeled dataset.
    return
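One hedged way to complete the TODO (a sketch, not the official solution: it keeps images whose maximum softmax probability exceeds the threshold and pairs them with their argmax predictions; it reuses batch_size, DataLoader, nn, and tqdm from above, and PseudoDataset / get_pseudo_labels_sketch are names introduced here for illustration):

class PseudoDataset(torch.utils.data.Dataset):
    # Wraps confidently-predicted images with integer pseudo-labels,
    # matching the (image, int label) format returned by DatasetFolder.
    def __init__(self, imgs, labels):
        self.imgs = imgs
        self.labels = labels

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, idx):
        return self.imgs[idx], int(self.labels[idx])

def get_pseudo_labels_sketch(dataset, model, threshold=0.65):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    model.eval()
    softmax = nn.Softmax(dim=-1)

    kept_imgs, kept_labels = [], []
    for img, _ in tqdm(data_loader):
        with torch.no_grad():
            probs = softmax(model(img.to(device)))
        max_probs, preds = probs.max(dim=-1)
        mask = (max_probs > threshold).cpu()   # keep only confident predictions
        kept_imgs.append(img[mask])
        kept_labels.append(preds.cpu()[mask])

    model.train()
    return PseudoDataset(torch.cat(kept_imgs), torch.cat(kept_labels))

Note that this materializes all kept images in memory, which is feasible for food-11-sized tensors but not free; a fancier version could store indices into the original dataset instead.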

# "cuda" only when GPUs are available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize a model, and put it on the device specified.
model = Classifier().to(device)
model.device = device

# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()

# Initialize optimizer, you may fine-tune some hyperparameters such as learning rate on your own.
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003, weight_decay=1e-5)

# The number of training epochs.
n_epochs = 80

# Whether to do semi-supervised learning.
do_semi = False

# Training loop
for epoch in range(n_epochs):
    # ---------- TODO ----------
    # In each epoch, relabel the unlabeled dataset for semi-supervised learning.
    # Then you can combine the labeled dataset and pseudo-labeled dataset for the training.
    if do_semi:
        # Obtain pseudo-labels for unlabeled data using the trained model.
        pseudo_set = get_pseudo_labels(unlabeled_set, model)

        # Construct a new dataset and a data loader for training.
        # This is used in semi-supervised learning only.
        concat_dataset = ConcatDataset([train_set, pseudo_set])
        train_loader = DataLoader(concat_dataset, batch_size=batch_size, shuffle=True, num_workers=8, pin_memory=True)

    # ---------- Training ----------
    # Make sure the model is in train mode before training.
    model.train()

    # These are used to record information in training.
    train_loss = []
    train_accs = []

    # Iterate over the training set by batches.
    for batch in tqdm(train_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        # Forward the data. (Make sure data and model are on the same device.)
        logits = model(imgs.to(device))

        # Calculate the cross-entropy loss.
        # We don't need to apply softmax before computing cross-entropy as it is done automatically.
        loss = criterion(logits, labels.to(device))

        # Gradients stored in the parameters in the previous step should be cleared out first.
        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for the current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        train_loss.append(loss.item())
        train_accs.append(acc)

    # The average loss and accuracy of the training set is the average of the recorded values.
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")

    # ---------- Validation ----------
    # Make sure the model is in eval mode so that modules like dropout behave correctly for evaluation.
    model.eval()

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []

    # Iterate over the validation set by batches.
    for batch in tqdm(valid_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        # We don't need gradients in validation.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(imgs.to(device))

        # We can still compute the loss (but not the gradient).
        loss = criterion(logits, labels.to(device))

        # Compute the accuracy for the current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs.append(acc)

    # The average loss and accuracy of the entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")

5. Testing

(1) model.train(): training mode, where Dropout is active and BatchNorm uses per-batch statistics
(2) model.eval(): evaluation mode, where Dropout is disabled and BatchNorm uses its running statistics
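A quick sketch of the difference for Dropout (the first printed value is just one possible outcome, since the zeroing is random):

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 4)

drop.train()     # training mode: elements are zeroed at random, survivors scaled by 1/(1-p)
print(drop(x))   # e.g. tensor([[2., 0., 2., 0.]])

drop.eval()      # eval mode: dropout is a no-op
print(drop(x))   # tensor([[1., 1., 1., 1.]])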

# Make sure the model is in eval mode.
# Some modules like Dropout or BatchNorm behave differently depending on whether the model is in training mode.
model.eval()

# Initialize a list to store the predictions.
predictions = []

# Iterate over the testing set by batches.
for batch in tqdm(test_loader):
    # A batch consists of image data and corresponding labels.
    # But here the variable "labels" is useless since we do not have the ground-truth.
    # If you print out the labels, you will find they are always 0.
    # This is because the wrapper (DatasetFolder) returns images and labels for each batch,
    # so fake labels are created to make it work normally.
    imgs, labels = batch

    # We don't need gradients in testing, and we don't even have labels to compute loss.
    # Using torch.no_grad() accelerates the forward process.
    with torch.no_grad():
        logits = model(imgs.to(device))

    # Take the class with the greatest logit as the prediction and record it.
    predictions.extend(logits.argmax(dim=-1).cpu().numpy().tolist())


# Save predictions into the file.
with open("predict.csv", "w") as f:

    # The first row must be "Id,Category"
    f.write("Id,Category\n")

    # For the rest of the rows, each image id corresponds to a predicted class.
    for i, pred in enumerate(predictions):
        f.write(f"{i},{pred}\n")

III. Modified Code

1. Residual Network
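(Section to be updated; as a placeholder, here is a minimal residual-block sketch in the usual ResNet spirit, where the input is added back onto the block's output. The class is an illustrative assumption, not this post's final code.)

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, 1, 1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, 1, 1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        # The skip connection adds the input back onto the transformed output.
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)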

2. Residual Network + Dropout
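(Also to be updated; one hedged way to combine the two, again only a sketch, is to apply Dropout between the convolutions of the block above:)

class ResidualBlockWithDropout(nn.Module):
    def __init__(self, channels, p=0.3):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, 1, 1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, 1, 1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()
        # nn.Dropout (rather than F.dropout) so that model.eval() disables it.
        self.dropout = nn.Dropout(p)

    def forward(self, x):
        out = self.dropout(self.relu(self.bn1(self.conv1(x))))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)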

Appendix: PyTorch Basics

(1) Autograd implements backpropagation.
(2) torch.nn is built on top of Autograd and is used to define and run networks. nn.Module is the most important class in nn; it can be viewed as a wrapper around a network, containing the definitions of each layer as well as a forward method: calling forward(input) returns the result of the forward pass.
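A tiny sketch of Autograd at work:

import torch

# requires_grad=True tells Autograd to record operations on x.
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2 + 3 * x

# backward() runs backpropagation through the recorded graph.
y.backward()
print(x.grad)   # dy/dx = 2x + 3 = 7 at x = 2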

Reference

(1) 李宏毅, 2021 Machine Learning course slides
(2) 陈云, 《深度学习框架PyTorch入门与实践》
(3) How to use Colab: https://zhuanlan.zhihu.com/p/346358053

