Derivation of the SoftMax Backward Pass and Its Code Implementation
Introduction to the SoftMax Function
Overview
The softmax function is a common output-layer function, typically used for multi-class problems with mutually exclusive labels. Since it is nonlinear, it can also be used as a hidden-layer activation.
Formula
Suppose we have inputs [x1, x2, x3…xn] and corresponding outputs [y1, y2, y3…yn]. The SoftMax function is defined as

y_i = \frac{e^{x_i}}{\sum_{k=1}^{n} e^{x_k}}
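As a quick numerical sanity check of the definition, here is a minimal NumPy sketch; the input vector is an arbitrary choice for illustration:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.exp(x) / np.sum(np.exp(x))   # softmax as defined above
print(y)                            # [0.09003057 0.24472847 0.66524096]
print(y.sum())                      # ~1.0 -- the outputs form a probability distribution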
Plot
(figure omitted)
Derivation of the Backward Pass
SoftMax is somewhat special: it has multiple inputs and multiple outputs, and every output depends on every input. Each output therefore has a partial derivative with respect to every input, so SoftMax yields a whole set of partial derivatives rather than a single one. There are two cases to consider.
When the input index matches the output index (i = j)
Applying the quotient rule to y_i = e^{x_i} / \sum_{k} e^{x_k}, where the numerator term e^{x_i} also appears in the denominator sum:

\frac{\partial y_i}{\partial x_j} = \frac{\partial y_i}{\partial x_i}
= \frac{e^{x_i} \cdot \left(\sum_{k} e^{x_k}\right) - e^{x_i} \cdot e^{x_i}}{\left(\sum_{k} e^{x_k}\right)^2}
= \frac{e^{x_i}}{\sum_{k} e^{x_k}} - \left(\frac{e^{x_i}}{\sum_{k} e^{x_k}}\right)^2
= y_i(1 - y_i)
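This result can be verified numerically with a central difference. The sketch below is illustrative only; the test vector x, the index i, and the step size eps are arbitrary choices:

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))       # shift by the max for numerical stability
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
y = softmax(x)
i, eps = 1, 1e-5

# Central-difference estimate of d y_i / d x_i
x_plus, x_minus = x.copy(), x.copy()
x_plus[i] += eps
x_minus[i] -= eps
numeric = (softmax(x_plus)[i] - softmax(x_minus)[i]) / (2 * eps)

analytic = y[i] * (1 - y[i])        # y_i * (1 - y_i) from the derivation
print(numeric, analytic)            # both are approximately 0.1848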
When the input index differs from the output index (i ≠ j)
Here x_j appears only in the denominator of y_i, so

\frac{\partial y_i}{\partial x_j} = -\frac{e^{x_i} \cdot e^{x_j}}{\left(\sum_{k} e^{x_k}\right)^2}
= -\frac{e^{x_i}}{\sum_{k} e^{x_k}} \cdot \frac{e^{x_j}}{\sum_{k} e^{x_k}}
= -y_i \cdot y_j
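A quick concrete check, with x chosen so the arithmetic is trivial (an illustrative example, not part of the original derivation):

x = (0, 0, 0) \;\Rightarrow\; y_1 = y_2 = y_3 = \tfrac{1}{3}, \qquad \frac{\partial y_1}{\partial x_2} = -y_1 \cdot y_2 = -\tfrac{1}{9}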
Combining the two cases
The two cases can be written as a single expression using the Kronecker delta \delta_{ij} (equal to 1 when i = j and 0 otherwise):

\frac{\partial y_i}{\partial x_j} = \delta_{ij} \, y_i - y_i \cdot y_j = y_i (\delta_{ij} - y_j)

For i = j this reduces to y_i(1 - y_i), and for i ≠ j it reduces to -y_i \cdot y_j, matching the two results above.
In particular, restricting to the diagonal terms (i = j), which are the ones the element-wise implementation below uses, we get

\frac{\partial y}{\partial x} = y \cdot (1 - y)
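In matrix form the combined formula says the full Jacobian is diag(y) − y yᵀ. Below is a small NumPy sketch (the input values are arbitrary) that builds it and checks both kinds of entries against the expressions above:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
e = np.exp(x - np.max(x))
y = e / e.sum()

# Full Jacobian: J[i, j] = y_i * (delta_ij - y_j) = diag(y) - y y^T
J = np.diag(y) - np.outer(y, y)

print(np.allclose(np.diag(J), y * (1 - y)))   # diagonal entries equal y_i * (1 - y_i): True
print(np.isclose(J[0, 1], -y[0] * y[1]))      # an off-diagonal entry equals -y_i * y_j: True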
Code Implementation
import numpy as np

class SoftMax():
    def __init__(self):
        pass

    def _softmax(self, x):
        # Work column-wise on the transposed input and subtract each sample's
        # max for numerical stability before exponentiating.
        x = x.T
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T

    def forward(self, input):
        return self._softmax(input)

    def backward(self, input, grad_output):
        # Element-wise (diagonal) term of the SoftMax Jacobian, y * (1 - y),
        # as derived above.
        out = self.forward(input)
        return grad_output * out * (1 - out)
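A minimal usage sketch, assuming the SoftMax class above is in scope; the batch values are made up for illustration:

import numpy as np

layer = SoftMax()
x = np.array([[1.0, 2.0, 3.0],
              [0.5, 0.5, 0.5]])        # batch of 2 samples, 3 classes each

y = layer.forward(x)
print(y.sum(axis=1))                   # each row sums to 1 -> [1. 1.]

grad_output = np.ones_like(y)          # dummy upstream gradient
grad_input = layer.backward(x, grad_output)
print(grad_input)                      # element-wise y * (1 - y)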