Introduction:

Innovations

NIN block

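One convolutional layer followed by two fully connected layers
- The "fully connected" layers are 1×1 convolutions with stride=1 and no padding, so their output has the same height and width as the first convolution's output; only the number of channels can change
- The two 1×1 convolutions therefore play the role of fully connected layers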

A multilayer perceptron is applied separately to the channel vector at each pixel position.
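To make the "per-pixel MLP" idea concrete, the sketch below checks that a 1×1 convolution computes exactly what an `nn.Linear` layer computes when it is applied to the channel vector of every pixel independently. The tensor sizes (3 input channels, 5 output channels, a 4×4 grid) are arbitrary assumptions chosen for illustration; the weights are simply copied from the convolution into the linear layer so the two can be compared.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Arbitrary example feature map: batch=2, 3 channels, 4x4 spatial grid
x = torch.randn(2, 3, 4, 4)

# 1x1 convolution mapping 3 channels to 5 channels
conv1x1 = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=1)

# Linear layer that shares the convolution's parameters:
# conv weight has shape (5, 3, 1, 1); Linear weight has shape (5, 3)
fc = nn.Linear(3, 5)
with torch.no_grad():
    fc.weight.copy_(conv1x1.weight.view(5, 3))
    fc.bias.copy_(conv1x1.bias)

y_conv = conv1x1(x)  # shape (2, 5, 4, 4)

# Apply the Linear layer to each pixel's channel vector:
# channels last -> (2, 4, 4, 3), fc -> (2, 4, 4, 5), channels back -> (2, 5, 4, 4)
y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(y_conv.shape)
print(torch.allclose(y_conv, y_fc, atol=1e-5))
```

Stacking two such 1×1 convolutions with ReLU in between, as `nin_block` does, is then a two-layer MLP that shares its weights across all pixel positions.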

Model architecture

Figure 3. Architectural differences between VGG-style networks and NiN

Figure 4. NIN network architecture

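NIN is built from three stacked multilayer perceptron convolutional layers (MLPConv layers). Each MLPConv layer consists of several local fully connected layers with nonlinear activation functions, and replaces the linear convolution kernel used in a conventional convolutional layer.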

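No fully connected layers
NiN blocks alternate with max-pooling layers of stride 2
- Purpose: progressively reduce the height and width while increasing the number of channels
A global average pooling layer produces the final output
- The number of its input channels equals the number of classes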
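The global-average-pooling head can be checked in isolation. This minimal sketch assumes a last-block output of shape (batch, num_classes, H, W) with 10 classes and a 5×5 map (values chosen for illustration); it shows that `AdaptiveAvgPool2d((1, 1))` followed by `Flatten` just averages each channel over all spatial positions, yielding one score per class, and that the head itself has no trainable parameters.

```python
import torch
from torch import nn

torch.manual_seed(0)

num_classes = 10
# Assumed output of the last NiN block: (batch, num_classes, H, W)
feature_map = torch.randn(4, num_classes, 5, 5)

# Global average pooling head, as at the end of the NIN model
gap_head = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten())
logits = gap_head(feature_map)  # shape (4, 10): one score per class

# Equivalent computation: mean of each channel over height and width
manual = feature_map.mean(dim=(2, 3))

n_params = sum(p.numel() for p in gap_head.parameters())

print(logits.shape)
print(torch.allclose(logits, manual, atol=1e-6))
print(n_params)  # the pooling head has no weights
```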

Code implementation


import torch
from torch import nn


def nin_block(in_channels,out_channels,kernel_size,strides,padding):
    nin_block = nn.Sequential(
        nn.Conv2d(in_channels=in_channels,out_channels=out_channels,kernel_size=kernel_size,stride=strides,padding=padding),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels=out_channels,out_channels=out_channels,kernel_size=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=1),
        nn.ReLU(inplace=True),
    )
    return nin_block

class NIN(nn.Module):
    def __init__(self,class_num,init_weight = False):
        super(NIN,self).__init__()
        self.nin = nn.Sequential(
            nin_block(in_channels=1,out_channels=96,kernel_size=11,strides=4,padding=0),
            nn.MaxPool2d(kernel_size=3,stride=2),
            # Keep H and W unchanged: (26 - 5 + 2p) / 1 + 1 = 26  ->  p = 2
            nin_block(in_channels=96,out_channels=256,kernel_size=5,strides=1,padding=2),
            nn.MaxPool2d(kernel_size=3,stride=2),
            nin_block(in_channels=256,out_channels=384,kernel_size=3,strides=1,padding=1),
            nn.MaxPool2d(kernel_size=3,stride=2),
            nin_block(in_channels=384,out_channels=class_num,kernel_size=3,strides=1,padding=1),
            nn.AdaptiveAvgPool2d((1, 1)),  # global average pooling: any H*W becomes 1*1
            nn.Flatten()  # flatten the 4-D output to 2-D with shape (batch_size, class_num)
        )

        if init_weight:
            self._initialize_weights()

    def forward(self,x):
        x = self.nin(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


net = NIN(class_num=10)
X = torch.rand(size=(1, 1, 224, 224))
for layer in net.nin:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t', X.shape)

Output

Sequential output shape:	 torch.Size([1, 96, 54, 54])
MaxPool2d output shape:	 torch.Size([1, 96, 26, 26])
Sequential output shape:	 torch.Size([1, 256, 26, 26])
MaxPool2d output shape:	 torch.Size([1, 256, 12, 12])
Sequential output shape:	 torch.Size([1, 384, 12, 12])
MaxPool2d output shape:	 torch.Size([1, 384, 5, 5])
Sequential output shape:	 torch.Size([1, 10, 5, 5])
AdaptiveAvgPool2d output shape:	 torch.Size([1, 10, 1, 1])
Flatten output shape:	 torch.Size([1, 10])
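The printed shapes follow from the standard output-size formula for convolution and pooling, $\lfloor (n - k + 2p)/s \rfloor + 1$. The short sketch below recomputes them for a 224×224 input; `conv_out` is a hypothetical helper written for this illustration, and it assumes PyTorch's default floor rounding (`ceil_mode=False`) for max pooling. The 1×1 convolutions inside each block are omitted because, with stride 1 and no padding, they leave H and W unchanged.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Output size along one spatial dimension: floor((n - k + 2p) / s) + 1."""
    return (n - kernel + 2 * padding) // stride + 1

# (description, kernel, stride, padding) for the layers that change H and W
layers = [
    ("nin_block 11x11, s4", 11, 4, 0),
    ("maxpool 3x3, s2", 3, 2, 0),
    ("nin_block 5x5, p2", 5, 1, 2),
    ("maxpool 3x3, s2", 3, 2, 0),
    ("nin_block 3x3, p1", 3, 1, 1),
    ("maxpool 3x3, s2", 3, 2, 0),
    ("nin_block 3x3, p1", 3, 1, 1),
]

size = 224
sizes = []
for name, k, s, p in layers:
    size = conv_out(size, k, s, p)
    sizes.append(size)
    print(f"{name:22s} -> {size}x{size}")
```

The sequence 54, 26, 26, 12, 12, 5, 5 matches the spatial sizes in the output above.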


Summary

NiN uses blocks made of one convolutional layer followed by several 1×1 convolutional layers. This gives the CNN a per-pixel nonlinearity.
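NiN replaces the fully connected layers, which are prone to overfitting, with a global average pooling layer (i.e., averaging over all spatial positions). The number of channels entering this pooling layer is set to the number of outputs (for example, 10 for Fashion-MNIST).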
Removing the fully connected layers significantly reduces the number of parameters in NiN, which also helps prevent overfitting.
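To put a number on the parameter savings, the sketch below counts the parameters of the NIN model from the code above by hand and compares them with a classifier head that NiN does not have. That head is a hypothetical AlexNet-style stack (flatten 384×5×5, then 4096, 4096, 10 units), assumed only for comparison. The helpers `conv_params`, `linear_params`, and `nin_block_params` are illustrative names, not library functions.

```python
def conv_params(c_in, c_out, k):
    # k*k*c_in weights per output channel, plus one bias per output channel
    return k * k * c_in * c_out + c_out

def linear_params(n_in, n_out):
    # weight matrix plus bias vector
    return n_in * n_out + n_out

def nin_block_params(c_in, c_out, k):
    # one k x k conv followed by two 1x1 convs (c_out -> c_out), as in nin_block()
    return conv_params(c_in, c_out, k) + 2 * conv_params(c_out, c_out, 1)

# Whole NIN model (class_num=10); pooling and Flatten layers have no parameters
nin_total = (nin_block_params(1, 96, 11) + nin_block_params(96, 256, 5)
             + nin_block_params(256, 384, 3) + nin_block_params(384, 10, 3))

# Hypothetical fully connected head on the same 384x5x5 feature map
fc_head = (linear_params(384 * 5 * 5, 4096) + linear_params(4096, 4096)
           + linear_params(4096, 10))

print(f"NIN, all layers:            {nin_total:,}")
print(f"Hypothetical FC head alone: {fc_head:,}")
```

Under these assumptions the entire NIN model has about 2.0 million parameters, while the fully connected head alone would add about 56 million.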
NiN's design influenced many later convolutional neural networks (for example, the Inception module in GoogLeNet draws on this idea).
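Using a multilayer perceptron structure in place of the convolution's linear filtering avoids the explosion in parameters that comes from relying on a very large number of convolution kernels. Note that every added convolutional layer is followed by a ReLU layer; these extra nonlinear mappings improve the model's ability to abstract features.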

NIN: Network in Network