Day11 When the gradient is small……


How do we tell whether a critical point is a local minimum or a saddle point?
Using math: expand the loss around the critical point θ'. Since the gradient there is zero, the local shape is determined by the Hessian matrix H.
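A sketch of the standard second-order (Taylor) test around a critical point θ' (here g is the gradient and H the Hessian at θ'):

$$
L(\theta) \approx L(\theta') + (\theta-\theta')^{\top} g + \frac{1}{2}(\theta-\theta')^{\top} H (\theta-\theta')
$$

At a critical point g = 0, so with v = θ − θ' the local shape depends only on vᵀHv:

- vᵀHv > 0 for every v (all eigenvalues of H positive) → local minimum
- vᵀHv < 0 for every v (all eigenvalues negative) → local maximum
- positive for some v, negative for others (mixed eigenvalue signs) → saddle point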



Example: using the Hessian, compute H at the critical point and check the signs of its eigenvalues.
Don't be afraid of saddle points (鞍点): if H has a negative eigenvalue, its eigenvector points in a direction along which the loss decreases, so a saddle point actually tells us where to escape.
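A minimal sketch of the eigenvalue test on a toy 2-D loss, in NumPy (the function name classify_critical_point is my own, for illustration):

```python
import numpy as np

def classify_critical_point(H):
    """Classify a critical point (gradient = 0) from its Hessian H."""
    eigvals, eigvecs = np.linalg.eigh(H)  # H is symmetric, so eigh applies
    if np.all(eigvals > 0):
        return "local minimum", None
    if np.all(eigvals < 0):
        return "local maximum", None
    # Mixed signs: saddle point. The eigenvector of the most negative
    # eigenvalue is a direction along which the loss decreases.
    return "saddle point", eigvecs[:, np.argmin(eigvals)]

# Toy loss L(x, y) = x^2 - y^2: saddle at the origin, Hessian diag(2, -2).
H = np.array([[2.0, 0.0], [0.0, -2.0]])
kind, escape = classify_critical_point(H)
print(kind, escape)  # saddle point [0. 1.] (escape direction, up to sign)
```

Taking a small step along (or against) the escape direction lowers the loss, which is why a saddle point is not a dead end.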


Local minima vs. saddle points: in the high-dimensional parameter space of a deep network, points where the gradient vanishes empirically turn out to be saddle points far more often than true local minima.



Day12 Tips for training: Batch and Momentum
Why do we use batches?
This was touched on earlier; a quick recap (前情回顾).
Shuffle: after each epoch, the data is typically re-shuffled and split into a new set of batches, as in the sketch below.
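A minimal sketch of that reshuffling behavior, assuming PyTorch: DataLoader with shuffle=True draws a fresh permutation of the dataset at the start of every epoch, so the batch composition changes from epoch to epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.arange(8))            # toy dataset: items 0..7
loader = DataLoader(data, batch_size=4, shuffle=True)

for epoch in range(2):
    # Each epoch iterates over a fresh random partition into batches of 4.
    batches = [batch[0].tolist() for batch in loader]
    print(f"epoch {epoch}: {batches}")
# e.g. epoch 0: [[3, 0, 6, 2], [5, 7, 1, 4]]
#      epoch 1: [[1, 4, 0, 7], [6, 3, 2, 5]]
```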
Small batch vs. large batch

First, consider the case without parallel computation (并行, e.g., a GPU); the table below also covers the parallel case.






| Aspect | Small batch size (e.g., 100 samples) | Large batch size (e.g., 10,000 samples) |
|---|---|---|
| Speed of one update (no parallelism) | Faster | Slower |
| Speed of one update (with parallelism) | Same | Same (if not too large) |
| Time for one epoch | Slower | Faster |
| Gradient | Noisy | Stable |
| Optimization | Better | Worse |
| Generalization | Better | Worse |
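A rough way to check the two speed rows yourself (a sketch, assuming PyTorch; the model and sizes here are arbitrary). On a CUDA GPU, one update with batch 100 vs. 10,000 tends to take roughly similar time thanks to parallelism; on a CPU the small batch is clearly faster per update.

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1000, 10).to(device)

def time_one_update(batch_size, repeats=20):
    x = torch.randn(batch_size, 1000, device=device)
    y = torch.randn(batch_size, 10, device=device)
    start = time.perf_counter()
    for _ in range(repeats):
        loss = nn.functional.mse_loss(model(x), y)
        model.zero_grad()
        loss.backward()
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work to finish
    return (time.perf_counter() - start) / repeats

print("batch   100:", time_one_update(100))
print("batch 10000:", time_one_update(10000))
```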
Batch size is therefore a hyperparameter to tune……
Momentum
Inertia (惯性): like a ball rolling downhill, each update keeps part of the previous movement, so the parameters can roll past small local minima and saddle points. Starting from m⁰ = 0, the movement is mᵗ = λ mᵗ⁻¹ − η gᵗ⁻¹ and the update is θᵗ = θᵗ⁻¹ + mᵗ, so mᵗ is a weighted sum of all past gradients.
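A minimal sketch of the momentum update on a toy quadratic loss, in plain Python (lam and lr are illustrative values for λ and η):

```python
def grad(theta):
    # Gradient of the toy loss L(theta) = theta**2 (minimum at 0).
    return 2 * theta

theta = 5.0
m = 0.0                    # movement starts at zero
lam, lr = 0.9, 0.1         # momentum coefficient (λ) and learning rate (η)

for step in range(100):
    # New movement = inertia from the last step minus a gradient step.
    m = lam * m - lr * grad(theta)
    theta = theta + m

print(theta)  # close to the minimum at 0
```

PyTorch's torch.optim.SGD(params, lr=0.1, momentum=0.9) implements the same idea, with a slightly different but equivalent parameterization.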



