Optimization: Deep Neural Network Tricks [Notes]


Slide: http://lamda.nju.edu.cn/weixs/slide/CNNTricks_slide.pdf

Article: http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html

1) data augmentation;
2) pre-processing on images;
3) initializations of networks;
4) some tips during training;
5) selections of activation functions;
6) diverse regularizations;
7) some insights found from figures; and finally
8) methods of ensembling multiple deep networks.

Sec. 1: Data Augmentation

During training the training set is limited, so data augmentation can be used to enlarge the dataset:

  • (1) Simple transformations: horizontal flipping, random crops and color jittering.
  • (2) Combinations of the simple transformations in (1), e.g., flipping plus random cropping (a minimal NumPy sketch follows this list).
  • (3) Fancy PCA, proposed by Krizhevsky et al. [1]: alter the intensities of the RGB channels in training images along their principal components.
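A minimal NumPy sketch of (1)/(2) above, combining a random horizontal flip with a random crop; the (H, W, C) image layout and the 224 crop size are assumptions for illustration:


import numpy as np

def augment(img, crop_size=224, rng=np.random):
    # Horizontal flip with probability 0.5.
    if rng.rand() < 0.5:
        img = img[:, ::-1, :]
    # Random crop of size crop_size x crop_size from an (H, W, C) image.
    h, w, _ = img.shape
    top = rng.randint(0, h - crop_size + 1)
    left = rng.randint(0, w - crop_size + 1)
    return img[top:top + crop_size, left:left + crop_size, :]
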

Sec. 2: Pre-Processing

(1) Zero-center + normalize:

Python implementation:


>>> import numpy as np
>>> X -= np.mean(X, axis=0)  # zero-center
>>> X /= np.std(X, axis=0)   # normalize


(2) PCA whitening: zero-center --> compute the covariance matrix (the correlation structure of the data) --> decorrelate the data --> whiten.

Python implementation:


>>> X -= np.mean(X, axis = 0) # zero-center
>>> cov = np.dot(X.T, X) / X.shape[0] # compute the covariance matrix


Decorrelate the data: project the (already zero-centered) data onto the eigenbasis:


>>> U,S,V = np.linalg.svd(cov) # compute the SVD factorization of the data covariance matrix
>>> Xrot = np.dot(X, U) # decorrelate the data


Whitening: divide every dimension in the eigenbasis by the corresponding eigenvalue to normalize the scale:


>>> Xwhite = Xrot / np.sqrt(S + 1e-5) # divide by the eigenvalues (which are square roots of the singular values)



Sec. 3: Initializations

(1) All Zero Initialization

Idea: with proper data normalization it is reasonable to expect that roughly half of the weights will end up positive and half negative, so all-zero weights may look like a sensible guess.

Drawback: there is no source of asymmetry between neurons; every neuron computes the same output and receives the same gradient update.

(2) Initialization with Small Random Numbers

Advantage: symmetry breaking.

Idea: the neurons are all random and unique at the beginning, so they compute distinct updates.

eg1: weights ~ 0.001 × N(0, 1), where N(0, 1) is a zero-mean, unit-standard-deviation Gaussian.

eg2: small numbers drawn from a uniform distribution.
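For example (a minimal sketch in the spirit of the note; the 0.001 scale and the D×H weight shape are illustrative):


>>> W = 0.001 * np.random.randn(D, H)  # small random numbers from a zero-mean, unit-std Gaussian
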

(3) Calibrating the Variances

Idea: normalize the variance of each neuron's output to 1 by scaling its weights by the square root of its fan-in; note that this derivation does not take ReLUs into account.

Python implementation:


>>> w = np.random.randn(n) / np.sqrt(n)  # calibrate the variance with 1/sqrt(n), n = number of inputs


(4) Current Recommendation

He et al. [4] specifically address ReLU nonlinearities and recommend initializing the weights with variance 2.0/n, where n is the number of inputs to the neuron.

Python implementation:


>>> w = np.random.randn(n) * np.sqrt(2.0/n)  # current recommendation (He et al.)


Sec. 4: During Training

  • Filters and pooling size. Input images are preferably of power-of-2 size; use small filters (e.g., 3×3) with small strides (e.g., 1) and zero-padding; a typical pooling size is 2×2.
  • Learning rate. Tune it with a validation set; in addition, as Ilya Sutskever [2] suggests, divide the gradients by the mini-batch size, so the learning rate need not be changed whenever the mini-batch size changes.
  • Fine-tune on pre-trained models. Consider two factors: the size of your new dataset and its similarity to the dataset the model was pre-trained on (a minimal fine-tuning sketch in PyTorch follows this list).
  • (1) If your data are similar to the pre-training data, simply train a linear classifier on features extracted from the top layers of the pre-trained model.
  • (2) If you also have plenty of data, you can fine-tune the top layers of the pre-trained model with a small learning rate.
  • (3) If your dataset differs a lot from the pre-training dataset but you have many training images, most of the layers should be fine-tuned on your own data with a small learning rate.
  • (4) If your dataset is small and very different from the pre-training dataset, just train a linear classifier.
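A hedged PyTorch sketch of cases (1)/(2): freeze an ImageNet pre-trained backbone and train only a new linear classifier on top. torchvision's ResNet-18 and the number of classes are assumptions, not part of the original notes:


import torch
import torch.nn as nn
from torchvision import models

num_classes = 10                                   # assumption: number of classes in your dataset
model = models.resnet18(pretrained=True)           # ImageNet pre-trained backbone
for param in model.parameters():
    param.requires_grad = False                    # freeze all pre-trained layers
model.fc = nn.Linear(model.fc.in_features, num_classes)   # new, trainable linear classifier
# Optimize only the classifier; for cases (2)/(3), unfreeze more layers and use a small learning rate.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
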

Sec. 5: Activation Functions (non-linearities)

                                    



Sigmoid

The sigmoid non-linearity σ(x) = 1 / (1 + e^(−x)) squashes a real-valued number into the range [0, 1]: large negative numbers become 0 and large positive numbers become 1.


  1. (Cons) Sigmoids saturate and kill gradients.
  2. (Cons) Sigmoid outputs are not zero-centered.


tanh(x)

The tanh non-linearity squashes a real-valued number into the range [-1, 1].

  1. (Cons) Like the sigmoid, its activations saturate.
  2. (Pros) Its output is zero-centered.

Rectified Linear Unit

The ReLU computes the function f(x) = max(0, x), i.e., the activation is simply thresholded at zero.

  1. (Pros) It does not involve expensive operations (exponentials, etc.).
  2. (Pros) ReLUs do not suffer from saturation for positive inputs.
  3. (Pros) It greatly accelerates the convergence of stochastic gradient descent (e.g., by a factor of 6 in [1]) thanks to its linear, non-saturating form.
  4. (Cons) ReLU units can be fragile during training and can "die": a large gradient can update the weights such that the unit never activates again.



Leaky ReLU

The Leaky ReLU is one attempt to fix the "dying ReLU" problem:

f(x) = x if x ≥ 0, and f(x) = αx if x < 0, where α is a small constant (e.g., 0.01).

(Cons) The results reported for Leaky ReLUs are not always consistent.

Parametric ReLU

In PReLU, the slope α of the negative part is learned from the data rather than pre-defined [4], whereas in Leaky ReLU it is a fixed constant.


Randomized ReLU

In RReLU, α is a random variable sampled from a given range during training and then fixed at test time [5]. Its randomized nature is reported to help reduce overfitting.
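A minimal NumPy sketch of the activations discussed in this section (the 0.01 leak slope is just an illustrative constant; in PReLU it would be a learned parameter):


import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # squashes to (0, 1); saturates for large |x|

def tanh(x):
    return np.tanh(x)                       # zero-centered, range (-1, 1), but still saturates

def relu(x):
    return np.maximum(0.0, x)               # thresholds at zero; cheap and non-saturating for x > 0

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)   # small slope alpha for x < 0 to avoid "dying" units
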

 

 


Sec. 6: Regularizations

  • L2 regularization: add (1/2)λw² to the objective for every weight w, where λ is the regularization strength. It heavily penalizes peaky weight vectors and prefers diffuse weight vectors.
  • L1 regularization: add λ|w| to the objective. It can be combined with L2 as λ1|w| + λ2w² (Elastic net regularization).
  • Max norm constraints: enforce an absolute upper bound on the magnitude of the weight vector of every neuron (‖w‖₂ < c, with c typically around 3 or 4) and use projected gradient descent to enforce the constraint. Because updates are always bounded, the network cannot "explode".
  • Dropout [6]: for each training input, sample a sub-network and only update the parameters of that sampled network.


During training, dropout [6] keeps a neuron active with some probability p (a hyper-parameter) and sets it to zero otherwise; at test time no dropout is applied. A dropout ratio of p = 0.5 is a reasonable default.
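A minimal NumPy sketch of (inverted) dropout following the train/test behaviour described above; the layer activations H and their shapes are illustrative:


import numpy as np

p = 0.5  # probability of keeping a neuron active (the hyper-parameter above)

def dropout_train(H):
    # Randomly zero out neurons; dividing by p ("inverted dropout") keeps the expected
    # activation unchanged, so nothing needs to be rescaled at test time.
    mask = (np.random.rand(*H.shape) < p) / p
    return H * mask

def dropout_test(H):
    return H  # no dropout at test time
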

Sec. 7: Insights from Figures


  • Learning rate: the shape of the loss curve reveals whether the learning rate is too low (a roughly linear decrease) or too high (a fast initial drop that then gets stuck at a poor value).
  • Loss curve: the "width" (noise) of the curve is related to the batch size; a curve that looks too wide suggests the variance between mini-batches is too large.
  • Accuracy curve: the gap between training and validation accuracy indicates the degree of overfitting.

Sec. 8: Ensemble [8]


  • Same model, different initialization. Use cross-validation to determine the best hyperparameters, then train multiple models with those hyperparameters but different random initializations (a minimal prediction-averaging sketch follows this list).
  • Top models discovered during cross-validation. Use cross-validation to determine the best hyperparameters, then pick the top few models to form the ensemble. (The risk is that the ensemble may include sub-standard models.)
  • Different checkpoints of a single model. When training is very expensive, ensemble different checkpoints of a single network taken at different points in time. (This lacks diversity, but is cheap.)
  • Some practical examples. If your task involves high-level image semantics, you can use multiple deep models trained on different datasets to extract different, complementary deep representations.
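A minimal sketch of the prediction-averaging step shared by these strategies; the `models` list and their `predict_proba` method are hypothetical placeholders, not an API from the original post:


import numpy as np

def ensemble_predict(models, x):
    # Average the per-class probabilities of all member models, then take the argmax.
    probs = np.mean([m.predict_proba(x) for m in models], axis=0)
    return np.argmax(probs, axis=-1)
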

Miscellaneous

Problem: class-imbalanced data. Some classes have a large number of images/training instances, while others have very few.

Method 1: balance the training data by directly up-sampling and down-sampling the imbalanced classes [10] (a minimal up-sampling sketch follows).

Method 2: use crops of the original images, a special kind of data augmentation, as in [7].

Method 3: adjust the fine-tuning strategy.
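A minimal NumPy sketch of method 1's up-sampling side: resample every class (with replacement) until it has as many instances as the largest class. The 1-D array of class ids `labels` is an assumption for illustration:


import numpy as np

def upsample_indices(labels, rng=np.random):
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()                    # match the size of the largest class
    idx = []
    for c in classes:
        members = np.where(labels == c)[0]
        # sample with replacement so every class reaches the target count
        idx.append(rng.choice(members, size=target, replace=True))
    return np.concatenate(idx)
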
