Introduction

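It is well known that a BN layer follows different computation rules during training and testing. During training it normalizes each batch with that batch's own mean and variance; at test time it instead uses estimates of the dataset-wide mean and variance that were accumulated from the training batches. For details, see my other blog post: Paper Analysis: Inception_V2 (Batch Normalization). When implementing a model, we therefore need to switch the BN layer according to the current phase (training or inference).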
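To make the difference concrete, here is a minimal sketch that feeds the same batch through one BN layer in both modes. The layer size, batch size, and the scaling of the input are arbitrary choices for illustration; the manual formula assumes the default affine parameters (weight 1, bias 0).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
# A batch whose mean and variance are far from 0 and 1 (illustrative values).
x = torch.randn(16, 4) * 3.0 + 5.0

bn.train()
y_train = bn(x)  # normalized with this batch's own mean and biased variance

# The same normalization written out by hand.
manual = (x - x.mean(dim=0)) / torch.sqrt(x.var(dim=0, unbiased=False) + bn.eps)

bn.eval()
y_eval = bn(x)   # normalized with the running_mean / running_var buffers

print(torch.allclose(y_train, manual, atol=1e-5))
print(y_train.mean(dim=0))  # close to zero
print(y_eval.mean(dim=0))   # generally far from zero after a single training step
```

After only one training step the running statistics are still close to their initial values, so the eval-mode output is visibly not zero-centered.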

Analysis

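Plenty of articles online already explain how to switch the BN layer's computation according to the phase: call model.eval(). Attentive readers may notice, however, that the BN layer takes a track_running_stats parameter at construction, which the documentation explains as follows: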

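In short, when this parameter is True, the BN module keeps tracking the running mean and variance; when it is False, the module records no such statistics and its running_mean and running_var buffers are None. That makes it tempting to assume that switching to inference flips this flag. (A note on how PyTorch handles BN layers: it does not store the mean and variance of every batch and combine them at inference time. It maintains a moving average instead, which this post does not go into in depth.)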
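For readers who want to see that moving average at work, the following is a small sketch that checks the buffers after one training-mode forward pass against the update rule new = (1 - momentum) * old + momentum * batch_statistic. The tensor shapes are arbitrary; the sketch relies on the default momentum of 0.1 and on the running variance being updated with the unbiased batch variance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)  # momentum defaults to 0.1
x = torch.randn(8, 3)

old_mean = bn.running_mean.clone()  # initialized to zeros
old_var = bn.running_var.clone()    # initialized to ones

bn.train()
bn(x)  # one forward pass in training mode updates the buffers

m = bn.momentum
expected_mean = (1 - m) * old_mean + m * x.mean(dim=0)
# The running variance is updated with the unbiased batch variance.
expected_var = (1 - m) * old_var + m * x.var(dim=0, unbiased=True)

mean_matches = torch.allclose(bn.running_mean, expected_mean, atol=1e-6)
var_matches = torch.allclose(bn.running_var, expected_var, atol=1e-6)
print(mean_matches, var_matches, bn.num_batches_tracked.item())
```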
If we actually call model.eval(), however, we find that this parameter of the BN layer does not change. A quick test:

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64,32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
)
model.eval()
Out[7]: 
Sequential(
  (0): Linear(in_features=64, out_features=32, bias=True)
  (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU()
)
model[1]
Out[8]: BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

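As shown, track_running_stats has not changed. If we follow the approach in reference ¹, checking each layer of the model for BN and calling eval() on it individually, the result is the same. So why is calling this method said to switch the BN layer's operating mode?
The answer is that track_running_stats is not the only thing that controls how a BN layer runs. BN layers inherit from the _BatchNorm class, and that class contains the following checks: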

if self.training and self.track_running_stats:
    # TODO: if statement only here to tell the jit to skip emitting this when it is None
    if self.num_batches_tracked is not None:  # type: ignore[has-type]
        self.num_batches_tracked = self.num_batches_tracked + 1  # type: ignore[has-type]
        if self.momentum is None:  # use cumulative moving average
            exponential_average_factor = 1.0 / float(self.num_batches_tracked)
        else:  # use exponential moving average
            exponential_average_factor = self.momentum

r"""
Decide whether the mini-batch stats should be used for normalization rather than the buffers.
Mini-batch stats are used in training mode, and in eval mode when buffers are None.
"""
if self.training:
    bn_training = True
else:
    bn_training = (self.running_mean is None) and (self.running_var is None)

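In other words, the class has (inherits from nn.Module) another attribute, training, that controls the BN layer's mode, and calling eval() changes this attribute. To verify: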

model=nn.Sequential(
    nn.Linear(64,32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
)
model.eval()
Out[10]: 
Sequential(
  (0): Linear(in_features=64, out_features=32, bias=True)
  (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU()
)
model[1].training
Out[11]: False
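Putting the two flags together, here is a hedged sketch of how training and track_running_stats interact, following the bn_training logic quoted above. The variable names (bn_no_track, frozen, and so on) are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(16, 32)

# Case 1: default layer in eval mode. The buffers exist, so they are used
# for normalization and are not updated by the forward pass.
bn = nn.BatchNorm1d(32)
bn.eval()
before = bn.running_mean.clone()
bn(x)
frozen = torch.equal(bn.running_mean, before)

# Case 2: track_running_stats=False. The buffers are None, so batch
# statistics are used even though training is False.
bn_no_track = nn.BatchNorm1d(32, track_running_stats=False)
bn_no_track.eval()
y = bn_no_track(x)

# Case 3: eval() on a container sets training=False on every submodule.
model = nn.Sequential(nn.Linear(64, 32), nn.BatchNorm1d(32), nn.ReLU())
model.eval()
all_eval = all(not module.training for module in model.modules())

print(frozen, bn_no_track.running_mean, all_eval)
print(y.mean(dim=0).abs().max())  # close to zero: batch statistics were used
```

So eval() never touches track_running_stats; it only flips training, and the layer falls back to batch statistics in eval mode only when the running buffers do not exist.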