4.Semi-Supervised EM
(a)
参考EM算法的讲义原始的推导过程,很容易得出:
ℓ
semi-sup
(
θ
(
t
+
1
)
)
=
ℓ
unsup
(
θ
(
t
+
1
)
)
+
α
ℓ
sup
(
θ
(
t
+
1
)
)
≥
∑
i
=
1
m
(
∑
z
(
i
)
Q
i
(
t
)
(
z
(
i
)
)
log
p
(
x
(
i
)
,
z
(
i
)
;
θ
(
t
+
1
)
)
Q
i
(
t
)
(
z
(
i
)
)
)
+
α
(
∑
i
=
1
m
~
log
p
(
x
~
(
i
)
,
z
~
(
i
)
;
θ
(
t
+
1
)
)
)
≥
∑
i
=
1
m
(
∑
z
(
i
)
Q
i
(
t
)
(
z
(
i
)
)
log
p
(
x
(
i
)
,
z
(
i
)
;
θ
(
t
)
)
Q
i
(
t
)
(
z
(
i
)
)
)
+
α
(
∑
i
=
1
m
log
p
(
x
~
(
i
)
,
z
~
(
i
)
;
θ
(
t
)
)
)
=
ℓ
unsup
(
θ
(
t
)
)
+
α
ℓ
sup
(
θ
(
t
)
)
=
ℓ
semi-sup
(
θ
(
t
)
)
\begin{aligned} \ell_{\text {semi-sup }}\left(\theta^{(t+1)}\right) &=\ell_{\text {unsup }}\left(\theta^{(t+1)}\right)+\alpha \ell_{\text {sup }}\left(\theta^{(t+1)}\right) \\ & \geq \sum_{i=1}^{m}\left(\sum_{z^{(i)}} Q_{i}^{(t)}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \theta^{(t+1)}\right)}{Q_{i}^{(t)}\left(z^{(i)}\right)}\right)+\alpha\left(\sum_{i=1}^{\tilde{m}} \log p\left(\tilde{x}^{(i)}, \tilde{z}^{(i)} ; \theta^{(t+1)}\right)\right) \\ & \geq \sum_{i=1}^{m}\left(\sum_{z^{(i)}} Q_{i}^{(t)}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \theta^{(t)}\right)}{Q_{i}^{(t)}\left(z^{(i)}\right)}\right)+\alpha\left(\sum_{i=1}^{m} \log p\left(\tilde{x}^{(i)}, \tilde{z}^{(i)} ; \theta^{(t)}\right)\right) \\ &=\ell_{\text {unsup }}\left(\theta^{(t)}\right)+\alpha \ell_{\text {sup }}\left(\theta^{(t)}\right) \\ &=\ell_{\text {semi-sup }}\left(\theta^{(t)}\right) \end{aligned}
ℓsemi-sup (θ(t+1))=ℓunsup (θ(t+1))+αℓsup (θ(t+1))≥i=1∑m(z(i)∑Qi(t)(z(i))logQi(t)(z(i))p(x(i),z(i);θ(t+1)))+α(i=1∑m~logp(x~(i),z~(i);θ(t+1)))≥i=1∑m(z(i)∑Qi(t)(z(i))logQi(t)(z(i))p(x(i),z(i);θ(t)))+α(i=1∑mlogp(x~(i),z~(i);θ(t)))=ℓunsup (θ(t))+αℓsup (θ(t))=ℓsemi-sup (θ(t))
其中第一个不等号使用了Jensens不等式,第二个不等号使用了M-step中
θ
t
+
1
\theta^{t+1}
θt+1是函数的最大值。
Semi-supervised GMM
(b)
参考讲义关于GMM的EM-step推导,也不难得出:
w
j
(
i
)
=
p
(
z
(
i
)
=
j
∣
x
(
i
)
;
ϕ
,
μ
,
Σ
)
=
p
(
x
(
i
)
∣
z
(
i
)
=
j
;
μ
,
Σ
)
p
(
z
(
i
)
=
j
;
ϕ
)
∑
l
=
1
k
p
(
x
(
i
)
∣
z
(
i
)
=
l
;
μ
,
Σ
)
p
(
z
(
i
)
=
l
;
ϕ
)
=
1
(
2
π
)
d
/
2
∣
Σ
j
∣
1
/
2
exp
(
−
1
2
(
x
(
i
)
−
μ
j
)
T
Σ
j
−
1
(
x
(
i
)
−
μ
j
)
)
ϕ
j
∑
l
=
1
k
1
(
2
π
)
d
/
2
∣
Σ
l
∣
1
/
2
exp
(
−
1
2
(
x
(
i
)
−
μ
l
)
T
Σ
l
−
1
(
x
(
i
)
−
μ
l
)
)
ϕ
l
\begin{aligned} w_{j}^{(i)} &=p\left(z^{(i)}=j \mid x^{(i)} ; \phi, \mu, \Sigma\right) \\ &=\frac{p\left(x^{(i)} \mid z^{(i)}=j ; \mu, \Sigma\right) p\left(z^{(i)}=j ; \phi\right)}{\sum_{l=1}^{k} p\left(x^{(i)} \mid z^{(i)}=l ; \mu, \Sigma\right) p\left(z^{(i)}=l ; \phi\right)} \\ &=\frac{\frac{1}{(2 \pi)^{d / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \phi_{j}}{\sum_{l=1}^{k} \frac{1}{(2 \pi)^{d / 2}\left|\Sigma_{l}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{l}\right)^{T} \Sigma_{l}^{-1}\left(x^{(i)}-\mu_{l}\right)\right) \phi_{l}} \end{aligned}
wj(i)=p(z(i)=j∣x(i);ϕ,μ,Σ)=∑l=1kp(x(i)∣z(i)=l;μ,Σ)p(z(i)=l;ϕ)p(x(i)∣z(i)=j;μ,Σ)p(z(i)=j;ϕ)=∑l=1k(2π)d/2∣Σl∣1/21exp(−21(x(i)−μl)TΣl−1(x(i)−μl))ϕl(2π)d/2∣Σj∣1/21exp(−21(x(i)−μj)TΣj−1(x(i)−μj))ϕj
E-step中,是初始化或者更新了参数以后,重新计算隐变量
z
(
i
)
z^{(i)}
z(i)的分布,因此更新的是隐变量。
©
首先写出M-step的目标函数:
∑
i
=
1
m
∑
j
=
1
k
w
j
(
i
)
log
1
(
2
π
)
d
/
2
∣
Σ
j
∣
1
/
2
exp
(
−
1
2
(
x
(
i
)
−
μ
j
)
T
Σ
j
−
1
(
x
(
i
)
−
μ
j
)
)
ϕ
j
w
j
(
i
)
+
∑
i
=
1
m
~
∑
j
=
1
k
1
{
z
~
(
i
)
=
j
}
log
1
(
2
π
)
d
/
2
∣
Σ
j
∣
1
/
2
exp
(
−
1
2
(
x
~
(
i
)
−
μ
j
)
T
Σ
j
−
1
(
x
~
(
i
)
−
μ
j
)
)
ϕ
j
\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \frac{\frac{1}{(2 \pi)^{d / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \phi_{j}}{w_{j}^{(i)}} \\ +\sum_{i=1}^{\tilde{m}} \sum_{j=1}^{k} 1\left\{\tilde{z}^{(i)}=j\right\} \log \frac{1}{(2 \pi)^{d / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(\tilde{x}^{(i)}-\mu_{j}\right)\right) \phi_{j}
i=1∑mj=1∑kwj(i)logwj(i)(2π)d/2∣Σj∣1/21exp(−21(x(i)−μj)TΣj−1(x(i)−μj))ϕj+i=1∑m~j=1∑k1{z~(i)=j}log(2π)d/2∣Σj∣1/21exp(−21(x~(i)−μj)TΣj−1(x~(i)−μj))ϕj
对于
θ
\theta
θ,由于存在概率约束,需要用拉格朗日乘子法求解(只考虑与其有关的项),其拉格朗日函数为:
L
(
ϕ
)
=
∑
i
=
1
m
∑
l
=
1
k
w
l
(
i
)
log
ϕ
l
+
∑
i
=
1
m
~
∑
l
=
1
k
1
{
z
~
(
i
)
=
l
}
log
ϕ
l
+
β
(
∑
l
=
1
k
ϕ
l
−
1
)
\mathcal{L}(\phi)=\sum_{i=1}^{m} \sum_{l=1}^{k} w_{l}^{(i)} \log \phi_{l}+\sum_{i=1}^{\tilde{m}} \sum_{l=1}^{k} 1\left\{\tilde{z}^{(i)}=l\right\} \log \phi_{l}+\beta\left(\sum_{l=1}^{k} \phi_{l}-1\right)
L(ϕ)=i=1∑ml=1∑kwl(i)logϕl+i=1∑m~l=1∑k1{z~(i)=l}logϕl+β(l=1∑kϕl−1)
对参数求导置0得:
∇
ϕ
j
L
(
ϕ
)
=
∑
i
=
1
m
w
j
(
i
)
ϕ
j
+
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
ϕ
j
+
β
=
0
ϕ
j
=
∑
i
=
1
m
w
j
(
i
)
+
α
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
−
β
\begin{array}{c}\nabla_{\phi_{j}} \mathcal{L}(\phi)=\sum_{i=1}^{m} \frac{w_{j}^{(i)}}{\phi_{j}}+\sum_{i=1}^{\tilde{m}} \frac{1\left\{\tilde{z}^{(i)}=j\right\}}{\phi_{j}}+\beta=0 \\ \phi_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}}{-\beta}\end{array}
∇ϕjL(ϕ)=∑i=1mϕjwj(i)+∑i=1m~ϕj1{z~(i)=j}+β=0ϕj=−β∑i=1mwj(i)+α∑i=1m~1{z~(i)=j}
利用约束条件得:
∑
l
=
1
k
ϕ
l
=
∑
i
=
1
m
∑
l
=
1
k
w
l
(
i
)
+
α
∑
i
=
1
m
~
∑
l
=
1
k
1
{
z
~
(
i
)
=
l
}
−
β
=
m
+
α
m
~
−
β
=
1
−
β
=
m
+
α
m
~
\begin{aligned} \sum_{l=1}^{k} \phi_{l} &=\frac{\sum_{i=1}^{m} \sum_{l=1}^{k} w_{l}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} \sum_{l=1}^{k} 1\left\{\tilde{z}^{(i)}=l\right\}}{-\beta} \\ &=\frac{m+\alpha \tilde{m}}{-\beta} \\ &=1 \\-\beta=m+\alpha \tilde{m} \end{aligned}
l=1∑kϕl−β=m+αm~=−β∑i=1m∑l=1kwl(i)+α∑i=1m~∑l=1k1{z~(i)=l}=−βm+αm~=1
从而:
ϕ
j
=
∑
i
=
1
m
w
j
(
i
)
+
α
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
m
+
α
m
~
\phi_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}}{m+\alpha \tilde{m}}
ϕj=m+αm~∑i=1mwj(i)+α∑i=1m~1{z~(i)=j}
对于参数
μ
\mu
μ,对齐求梯度:
∇
μ
j
ℓ
u
n
s
u
p
=
∑
i
=
1
m
w
j
(
i
)
Σ
j
−
1
(
x
(
i
)
−
μ
j
)
=
Σ
j
−
1
(
∑
i
=
1
m
w
j
(
i
)
x
(
i
)
−
μ
j
∑
i
=
1
m
w
j
(
i
)
)
\begin{aligned} \nabla_{\mu_{j}} \ell_{\mathrm{unsup}} &=\sum_{i=1}^{m} w_{j}^{(i)} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right) \\ &=\Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}-\mu_{j} \sum_{i=1}^{m} w_{j}^{(i)}\right) \end{aligned}
∇μjℓunsup=i=1∑mwj(i)Σj−1(x(i)−μj)=Σj−1(i=1∑mwj(i)x(i)−μji=1∑mwj(i))
∇
μ
j
ℓ
sup
=
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
Σ
j
−
1
(
x
~
(
i
)
−
μ
j
)
=
Σ
j
−
1
(
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
x
~
(
i
)
−
μ
j
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
)
\begin{aligned} \nabla_{\mu_{j}} \ell_{\sup } &=\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \Sigma_{j}^{-1}\left(\tilde{x}^{(i)}-\mu_{j}\right) \\ &=\Sigma_{j}^{-1}\left(\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \tilde{x}^{(i)}-\mu_{j} \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\right) \end{aligned}
∇μjℓsup=i=1∑m~1{z~(i)=j}Σj−1(x~(i)−μj)=Σj−1(i=1∑m~1{z~(i)=j}x~(i)−μji=1∑m~1{z~(i)=j})
∇
μ
j
ℓ
semi-sup
=
∇
μ
j
ℓ
unsup
+
α
∇
μ
j
ℓ
sup
=
Σ
j
−
1
[
(
∑
i
=
1
m
w
j
(
i
)
x
(
i
)
−
μ
j
∑
i
=
1
m
w
j
(
i
)
)
+
α
(
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
x
~
(
i
)
−
μ
j
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
)
]
=
Σ
j
−
1
[
(
∑
i
=
1
m
w
j
(
i
)
x
(
i
)
+
α
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
x
~
(
i
)
)
−
μ
j
(
∑
i
=
1
m
w
j
(
i
)
+
α
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
)
]
=
0
\begin{aligned} \nabla_{\mu_{j}} \ell_{\text {semi-sup }} &=\nabla_{\mu_{j}} \ell_{\text {unsup }}+\alpha \nabla_{\mu_{j}} \ell_{\text {sup }} \\ &=\Sigma_{j}^{-1}\left[\left(\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}-\mu_{j} \sum_{i=1}^{m} w_{j}^{(i)}\right)+\alpha\left(\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \tilde{x}^{(i)}-\mu_{j} \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\right)\right] \\ &=\Sigma_{j}^{-1}\left[\left(\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \tilde{x}^{(i)}\right)-\mu_{j}\left(\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\right)\right] \\ &=0 \end{aligned}
∇μjℓsemi-sup =∇μjℓunsup +α∇μjℓsup =Σj−1[(i=1∑mwj(i)x(i)−μji=1∑mwj(i))+α(i=1∑m~1{z~(i)=j}x~(i)−μji=1∑m~1{z~(i)=j})]=Σj−1[(i=1∑mwj(i)x(i)+αi=1∑m~1{z~(i)=j}x~(i))−μj(i=1∑mwj(i)+αi=1∑m~1{z~(i)=j})]=0
μ
j
=
∑
i
=
1
m
w
j
(
i
)
x
(
i
)
+
α
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
x
~
(
i
)
∑
i
=
1
m
w
j
(
i
)
+
α
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
\mu_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \tilde{x}^{(i)}}{\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}}
μj=∑i=1mwj(i)+α∑i=1m~1{z~(i)=j}∑i=1mwj(i)x(i)+α∑i=1m~1{z~(i)=j}x~(i)
对于
Σ
\Sigma
Σ求导数:
∇
Σ
j
ℓ
u
n
s
u
p
=
−
1
2
∑
i
=
1
m
w
j
(
i
)
Σ
j
−
1
+
1
2
Σ
j
−
1
(
∑
i
=
1
m
w
j
(
i
)
(
x
(
i
)
−
μ
j
)
(
x
(
i
)
−
μ
j
)
T
)
Σ
j
−
1
\nabla_{\Sigma_{j}} \ell_{\mathrm{unsup}}=-\frac{1}{2} \sum_{i=1}^{m} w_{j}^{(i)} \Sigma_{j}^{-1}+\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1}
∇Σjℓunsup=−21i=1∑mwj(i)Σj−1+21Σj−1(i=1∑mwj(i)(x(i)−μj)(x(i)−μj)T)Σj−1
∇
Σ
j
ℓ
sup
=
−
1
2
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
Σ
j
−
1
+
1
2
Σ
j
−
1
(
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
(
x
~
(
i
)
−
μ
j
)
(
x
~
(
i
)
−
μ
j
)
T
)
Σ
j
−
1
\nabla_{\Sigma_{j}} \ell_{\text {sup }}=-\frac{1}{2} \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \Sigma_{j}^{-1}+\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\left(\tilde{x}^{(i)}-\mu_{j}\right)\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1}
∇Σjℓsup =−21i=1∑m~1{z~(i)=j}Σj−1+21Σj−1(i=1∑m~1{z~(i)=j}(x~(i)−μj)(x~(i)−μj)T)Σj−1
∇
Σ
j
ℓ
semi-sup
=
∇
Σ
j
ℓ
unsup
+
α
∇
Σ
j
ℓ
sup
=
−
1
2
∑
i
=
1
m
w
j
(
i
)
Σ
j
−
1
+
1
2
Σ
j
−
1
(
∑
i
=
1
m
w
j
(
i
)
(
x
(
i
)
−
μ
j
)
(
x
(
i
)
−
μ
j
)
T
)
Σ
j
−
1
−
1
2
α
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
Σ
j
−
1
+
1
2
α
Σ
j
−
1
(
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
(
x
~
(
i
)
−
μ
j
)
(
x
~
(
i
)
−
μ
j
)
T
)
Σ
j
−
1
=
−
1
2
Σ
j
−
1
(
∑
i
=
1
m
w
j
(
i
)
+
α
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
)
+
1
2
Σ
j
−
1
(
∑
i
=
1
m
w
j
(
i
)
(
x
(
i
)
−
μ
j
)
(
x
(
i
)
−
μ
j
)
T
+
α
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
(
x
~
(
i
)
−
μ
j
)
(
x
~
(
i
)
−
μ
j
)
T
)
Σ
j
−
1
=
0
\begin{aligned} \nabla_{\Sigma_{j}} \ell_{\text {semi-sup }}=& \nabla_{\Sigma_{j}} \ell_{\text {unsup }}+\alpha \nabla_{\Sigma_{j}} \ell_{\text {sup }} \\=&-\frac{1}{2} \sum_{i=1}^{m} w_{j}^{(i)} \Sigma_{j}^{-1}+\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1} \\ &-\frac{1}{2} \alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \Sigma_{j}^{-1}+\frac{1}{2} \alpha \Sigma_{j}^{-1}\left(\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\left(\tilde{x}^{(i)}-\mu_{j}\right)\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1} \\=&-\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\right) \\ &+\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\left(\tilde{x}^{(i)}-\mu_{j}\right)\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1} \\=& 0 \end{aligned}
∇Σjℓsemi-sup ====∇Σjℓunsup +α∇Σjℓsup −21i=1∑mwj(i)Σj−1+21Σj−1(i=1∑mwj(i)(x(i)−μj)(x(i)−μj)T)Σj−1−21αi=1∑m~1{z~(i)=j}Σj−1+21αΣj−1(i=1∑m~1{z~(i)=j}(x~(i)−μj)(x~(i)−μj)T)Σj−1−21Σj−1(i=1∑mwj(i)+αi=1∑m~1{z~(i)=j})+21Σj−1(i=1∑mwj(i)(x(i)−μj)(x(i)−μj)T+αi=1∑m~1{z~(i)=j}(x~(i)−μj)(x~(i)−μj)T)Σj−10
Σ
j
=
∑
i
=
1
m
w
j
(
i
)
(
x
(
i
)
−
μ
j
)
(
x
(
i
)
−
μ
j
)
T
+
α
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
(
x
~
(
i
)
−
μ
j
)
(
x
~
(
i
)
−
μ
j
)
T
∑
i
=
1
m
w
j
(
i
)
+
α
∑
i
=
1
m
~
1
{
z
~
(
i
)
=
j
}
\Sigma_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\left(\tilde{x}^{(i)}-\mu_{j}\right)\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T}}{\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}}
Σj=∑i=1mwj(i)+α∑i=1m~1{z~(i)=j}∑i=1mwj(i)(x(i)−μj)(x(i)−μj)T+α∑i=1m~1{z~(i)=j}(x~(i)−μj)(x~(i)−μj)T
(d)
import matplotlib.pyplot as plt
import numpy as np
import os
PLOT_COLORS = ['red', 'green', 'blue', 'orange'] # Colors for your plots
K = 4 # Number of Gaussians in the mixture model
NUM_TRIALS = 3 # Number of trials to run (can be adjusted for debugging)
UNLABELED = -1 # Cluster label for unlabeled data points (do not change)
def main(is_semi_supervised, trial_num):
"""Problem 3: EM for Gaussian Mixture Models (unsupervised and semi-supervised)"""
print('Running {} EM algorithm...'
.format('semi-supervised' if is_semi_supervised else 'unsupervised'))
# Load dataset
train_path = os.path.join('.', 'data', 'ds4_train.csv')
x, z = load_gmm_dataset(train_path)
x_tilde = None
if is_semi_supervised:
# Split into labeled and unlabeled examples
labeled_idxs = (z != UNLABELED).squeeze()
x_tilde = x[labeled_idxs, :] # Labeled examples
z = z[labeled_idxs, :] # Corresponding labels
x = x[~labeled_idxs, :] # Unlabeled examples
# *** START CODE HERE ***
# (1) Initialize mu and sigma by splitting the m data points uniformly at random
# into K groups, then calculating the sample mean and covariance for each group
# (2) Initialize phi to place equal probability on each Gaussian
# phi should be a numpy array of shape (K,)
# (3) Initialize the w values to place equal probability on each Gaussian
# w should be a numpy array of shape (m, K)
m, n = x.shape
group_data_num = int(m / K)
mu, sigma = [], []
idx = np.random.permutation(m)
# initialize mu and sigma
for i in range(K):
if i != (K-1):
x_group = x[idx[i*group_data_num:(i+1)*group_data_num], :]
else:
x_group = x[idx[i*group_data_num:], :]
mu_group = x_group.mean(axis=0)
mu.append(mu_group)
sigma.append((x_group - mu_group).T @ (x_group - mu_group) / x_group.shape[0])
# initialize phi
phi = np.ones(K) / K
# initialize w
w = np.ones((m, K)) / K
# *** END CODE HERE ***
if is_semi_supervised:
w = run_semi_supervised_em(x, x_tilde, z, w, phi, mu, sigma)
else:
w = run_em(x, w, phi, mu, sigma)
# Plot your predictions
z_pred = np.zeros(m)
if w is not None: # Just a placeholder for the starter code
for i in range(m):
z_pred[i] = np.argmax(w[i])
plot_gmm_preds(x, z_pred, is_semi_supervised, plot_id=trial_num)
def run_em(x, w, phi, mu, sigma):
"""Problem 3(d): EM Algorithm (unsupervised).
See inline comments for instructions.
Args:
x: Design matrix of shape (m, n).
w: Initial weight matrix of shape (m, k).
phi: Initial mixture prior, of shape (k,).
mu: Initial cluster means, list of k arrays of shape (n,).
sigma: Initial cluster covariances, list of k arrays of shape (n, n).
Returns:
Updated weight matrix of shape (m, k) resulting from EM algorithm.
More specifically, w[i, j] should contain the probability of
example x^(i) belonging to the j-th Gaussian in the mixture.
"""
# No need to change any of these parameters
eps = 1e-3 # Convergence threshold
max_iter = 3000
# Stop when the absolute change in log-likelihood is < eps
# See below for explanation of the convergence criterion
it = 0
ll = prev_ll = None
while it < max_iter and (prev_ll is None or np.abs(ll - prev_ll) >= eps):
# Just a placeholder for the starter code
# *** START CODE HERE
# (1) E-step: Update your estimates in w
# (2) M-step: Update the model parameters phi, mu, and sigma
# (3) Compute the log-likelihood of the data to check for convergence.
# By log-likelihood, we mean `ll = sum_x[log(sum_z[p(x|z) * p(z)])]`.
# We define convergence by the first iteration where abs(ll - prev_ll) < eps.
# Hint: For debugging, recall part (a). We showed that ll should be monotonically increasing.
# E-step
for j in range(K):
# w[:, j] = np.exp(-0.5 * ((x - mu[j]) @ np.linalg.inv(sigma[j]) * (x - mu[j])).sum(axis=1)) / (np.linalg.det(sigma[j])**0.5) * phi[j]
w[:, j] = np.exp(-0.5 * ((x-mu[j]).dot(np.linalg.inv(sigma[j])) * (x-mu[j])).sum(axis=1)) / (np.linalg.det(sigma[j])**0.5) * phi[j]
w /= w.sum(axis=1)[:, None] # 维持维度
# M-step
# phi = w.mean(axis=0)
phi = np.mean(w, axis=0)
for j in range(K):
# mu[j] = x.T @ w[:, j] / w[:, j].sum()
mu[j] = x.T.dot(w[:, j]) / sum(w[:, j])
sigma[j] = (w[:, j][:, None] * (x - mu[j])).T @ (x - mu[j]) / w[:, j].sum()
# sigma[j] = (w[:, j][:, None] * (x-mu[j])).T.dot(x-mu[j]) / sum(w[:, j])
it += 1
prev_ll = ll
# 利用Jenses不等式,可以得到取等号时的条件,从而可知`ll = sum_x[log(sum_z[p(x|z) * p(z)])]`
p_xz = np.zeros(w.shape)
for i in range(K):
p_xz[:, i] = np.exp(-0.5 * ((x - mu[i]) @ np.linalg.inv(sigma[i]) * (x - mu[i])).sum(axis=1)) / (np.linalg.det(sigma[i])**0.5) / (2 * np.pi)**(x.shape[1]/2) * phi[i]
ll = np.sum(np.log(p_xz))
if (it) % 100 ==0:
print(f'iteration: {it}; log-likelihood:{ll}')
# *** END CODE HERE ***
print(f'Number of iterations:{it}')
return w
def run_semi_supervised_em(x, x_tilde, z, w, phi, mu, sigma):
"""Problem 3(e): Semi-Supervised EM Algorithm.
See inline comments for instructions.
Args:
x: Design matrix of unlabeled examples of shape (m, n).
x_tilde: Design matrix of labeled examples of shape (m_tilde, n).
z: Array of labels of shape (m_tilde, 1).
w: Initial weight matrix of shape (m, k).
phi: Initial mixture prior, of shape (k,).
mu: Initial cluster means, list of k arrays of shape (n,).
sigma: Initial cluster covariances, list of k arrays of shape (n, n).
Returns:
Updated weight matrix of shape (m, k) resulting from semi-supervised EM algorithm.
More specifically, w[i, j] should contain the probability of
example x^(i) belonging to the j-th Gaussian in the mixture.
"""
# No need to change any of these parameters
alpha = 20. # Weight for the labeled examples
eps = 1e-3 # Convergence threshold
max_iter = 1000
# Stop when the absolute change in log-likelihood is < eps
# See below for explanation of the convergence criterion
it = 0
ll = prev_ll = None
while it < max_iter and (prev_ll is None or np.abs(ll - prev_ll) >= eps):
pass # Just a placeholder for the starter code
# *** START CODE HERE ***
# (1) E-step: Update your estimates in w
# (2) M-step: Update the model parameters phi, mu, and sigma
# (3) Compute the log-likelihood of the data to check for convergence.
# Hint: Make sure to include alpha in your calculation of ll.
# Hint: For debugging, recall part (a). We showed that ll should be monotonically increasing.
# E-step
for j in range(K):
w[:, j] = np.exp(-0.5 * ((x - mu[j]) @ np.linalg.inv(sigma[j]) * (x - mu[j])).sum(axis=1)) / np.linalg.det(sigma[j])**0.5 * phi[j]
w /= w.sum(axis=1)[:, None] # 维持维度
# M-step
for j in range(K):
phi[j] = (w.sum(axis=0)[j] + alpha * np.sum(z==j)) / (x.shape[0] + alpha * x_tilde.shape[0])
mu[j] = ((w[:, j] * x).sum(axis=0) + (alpha * (z==j) * x_tilde).sum(axis=0)) / (w[:, j].sum() + alpha * (z==j).sum())
# *** END CODE HERE ***
print(f'Number of iterations:{it}')
return w
# *** START CODE HERE ***
# Helper functions
# *** END CODE HERE ***
def plot_gmm_preds(x, z, with_supervision, plot_id):
"""Plot GMM predictions on a 2D dataset `x` with labels `z`.
Write to the output directory, including `plot_id`
in the name, and appending 'ss' if the GMM had supervision.
NOTE: You do not need to edit this function.
"""
plt.figure(figsize=(12, 8))
plt.title('{} GMM Predictions'.format('Semi-supervised' if with_supervision else 'Unsupervised'))
plt.xlabel('x_1')
plt.ylabel('x_2')
for x_1, x_2, z_ in zip(x[:, 0], x[:, 1], z):
color = 'gray' if z_ < 0 else PLOT_COLORS[int(z_)]
alpha = 0.25 if z_ < 0 else 0.75
plt.scatter(x_1, x_2, marker='.', c=color, alpha=alpha)
file_name = 'p04_pred{}_{}.png'.format('_ss' if with_supervision else '', plot_id)
save_path = os.path.join('output', file_name)
plt.savefig(save_path)
def load_gmm_dataset(csv_path):
"""Load dataset for Gaussian Mixture Model (problem 3).
Args:
csv_path: Path to CSV file containing dataset.
Returns:
x: NumPy array shape (m, n)
z: NumPy array shape (m, 1)
NOTE: You do not need to edit this function.
"""
# Load headers
with open(csv_path, 'r') as csv_fh:
headers = csv_fh.readline().strip().split(',')
# Load features and labels
x_cols = [i for i in range(len(headers)) if headers[i].startswith('x')]
z_cols = [i for i in range(len(headers)) if headers[i] == 'z']
x = np.loadtxt(csv_path, delimiter=',', skiprows=1, usecols=x_cols, dtype=float)
z = np.loadtxt(csv_path, delimiter=',', skiprows=1, usecols=z_cols, dtype=float)
if z.ndim == 1:
z = np.expand_dims(z, axis=-1)
return x, z
np.random.seed(229)
# Run NUM_TRIALS trials to see how different initializations
# affect the final predictions with and without supervision
for t in range(NUM_TRIALS):
main(is_semi_supervised=False, trial_num=t)
Running unsupervised EM algorithm...
iteration: 100; log-likelihood:-73456.47962832157
iteration: 200; log-likelihood:-76344.42663557787
iteration: 300; log-likelihood:-76358.50656318516
iteration: 400; log-likelihood:-76370.50169188957
iteration: 500; log-likelihood:-76380.35698643283
iteration: 600; log-likelihood:-76388.28158239859
iteration: 700; log-likelihood:-76394.56679933086
iteration: 800; log-likelihood:-76399.5061901347
iteration: 900; log-likelihood:-76403.36330837198
iteration: 1000; log-likelihood:-76406.36170531949
iteration: 1100; log-likelihood:-76408.68494412962
iteration: 1200; log-likelihood:-76410.4807314067
iteration: 1300; log-likelihood:-76411.8663428124
iteration: 1400; log-likelihood:-76412.93404146458
iteration: 1500; log-likelihood:-76413.75594335285
iteration: 1600; log-likelihood:-76414.3881538019
iteration: 1700; log-likelihood:-76414.87417287314
iteration: 1800; log-likelihood:-76415.24764203127
iteration: 1900; log-likelihood:-76415.53452924197
iteration: 2000; log-likelihood:-76415.75485077602
iteration: 2100; log-likelihood:-76415.92401873259
iteration: 2200; log-likelihood:-76416.05389043568
Number of iterations:2249
Running unsupervised EM algorithm...
iteration: 100; log-likelihood:-66590.43501429242
iteration: 200; log-likelihood:-76451.9031340933
iteration: 300; log-likelihood:-76443.37779052822
iteration: 400; log-likelihood:-76436.93811778397
iteration: 500; log-likelihood:-76432.07008934084
iteration: 600; log-likelihood:-76428.37868549602
iteration: 700; log-likelihood:-76425.57243300215
iteration: 800; log-likelihood:-76423.43475451792
iteration: 900; log-likelihood:-76421.80374000929
iteration: 1000; log-likelihood:-76420.55771978038
iteration: 1100; log-likelihood:-76419.60486913892
iteration: 1200; log-likelihood:-76418.87564360708
iteration: 1300; log-likelihood:-76418.31722357296
iteration: 1400; log-likelihood:-76417.88940150806
iteration: 1500; log-likelihood:-76417.56151590426
iteration: 1600; log-likelihood:-76417.31015214347
iteration: 1700; log-likelihood:-76417.11741014768
iteration: 1800; log-likelihood:-76416.96959398432
Number of iterations:1897
Running unsupervised EM algorithm...
iteration: 100; log-likelihood:-75497.54605389759
iteration: 200; log-likelihood:-76563.96638195189
iteration: 300; log-likelihood:-76526.07244007861
iteration: 400; log-likelihood:-76498.3789583912
iteration: 500; log-likelihood:-76477.98078109502
iteration: 600; log-likelihood:-76462.85420946805
iteration: 700; log-likelihood:-76451.56994910692
iteration: 800; log-likelihood:-76443.10818633785
iteration: 900; log-likelihood:-76436.73464038363
iteration: 1000; log-likelihood:-76431.9159971813
iteration: 1100; log-likelihood:-76428.26166787685
iteration: 1200; log-likelihood:-76425.48337038445
iteration: 1300; log-likelihood:-76423.36684720177
iteration: 1400; log-likelihood:-76421.75188958709
iteration: 1500; log-likelihood:-76420.51808550546
iteration: 1600; log-likelihood:-76419.57454650669
iteration: 1700; log-likelihood:-76418.85242925619
iteration: 1800; log-likelihood:-76418.29944184411
iteration: 1900; log-likelihood:-76417.87577553015
iteration: 2000; log-likelihood:-76417.55107116755
iteration: 2100; log-likelihood:-76417.30214399204
iteration: 2200; log-likelihood:-76417.11126902315
iteration: 2300; log-likelihood:-76416.9648839318
Number of iterations:2394
(e)
import matplotlib.pyplot as plt
import numpy as np
import os
PLOT_COLORS = ['red', 'green', 'blue', 'orange'] # Colors for your plots
K = 4 # Number of Gaussians in the mixture model
NUM_TRIALS = 3 # Number of trials to run (can be adjusted for debugging)
UNLABELED = -1 # Cluster label for unlabeled data points (do not change)
def main(is_semi_supervised, trial_num):
"""Problem 3: EM for Gaussian Mixture Models (unsupervised and semi-supervised)"""
print('Running {} EM algorithm...'
.format('semi-supervised' if is_semi_supervised else 'unsupervised'))
# Load dataset
train_path = os.path.join('.', 'data', 'ds4_train.csv')
x, z = load_gmm_dataset(train_path)
x_tilde = None
if is_semi_supervised:
# Split into labeled and unlabeled examples
labeled_idxs = (z != UNLABELED).squeeze()
x_tilde = x[labeled_idxs, :] # Labeled examples
z = z[labeled_idxs, :] # Corresponding labels
x = x[~labeled_idxs, :] # Unlabeled examples
# *** START CODE HERE ***
# (1) Initialize mu and sigma by splitting the m data points uniformly at random
# into K groups, then calculating the sample mean and covariance for each group
# (2) Initialize phi to place equal probability on each Gaussian
# phi should be a numpy array of shape (K,)
# (3) Initialize the w values to place equal probability on each Gaussian
# w should be a numpy array of shape (m, K)
m, n = x.shape
group_data_num = int(m / K)
mu, sigma = [], []
idx = np.random.permutation(m)
# initialize mu and sigma
for i in range(K):
if i != (K-1):
x_group = x[idx[i*group_data_num:(i+1)*group_data_num], :]
else:
x_group = x[idx[i*group_data_num:], :]
mu_group = x_group.mean(axis=0)
mu.append(mu_group)
sigma.append((x_group - mu_group).T @ (x_group - mu_group) / x_group.shape[0])
# initialize phi
phi = np.ones(K) / K
# initialize w
w = np.ones((m, K)) / K
# *** END CODE HERE ***
if is_semi_supervised:
w = run_semi_supervised_em(x, x_tilde, z, w, phi, mu, sigma)
else:
w = run_em(x, w, phi, mu, sigma)
# Plot your predictions
z_pred = np.zeros(m)
if w is not None: # Just a placeholder for the starter code
for i in range(m):
z_pred[i] = np.argmax(w[i])
plot_gmm_preds(x, z_pred, is_semi_supervised, plot_id=trial_num)
def run_em(x, w, phi, mu, sigma):
"""Problem 3(d): EM Algorithm (unsupervised).
See inline comments for instructions.
Args:
x: Design matrix of shape (m, n).
w: Initial weight matrix of shape (m, k).
phi: Initial mixture prior, of shape (k,).
mu: Initial cluster means, list of k arrays of shape (n,).
sigma: Initial cluster covariances, list of k arrays of shape (n, n).
Returns:
Updated weight matrix of shape (m, k) resulting from EM algorithm.
More specifically, w[i, j] should contain the probability of
example x^(i) belonging to the j-th Gaussian in the mixture.
"""
# No need to change any of these parameters
eps = 1e-3 # Convergence threshold
max_iter = 3000
# Stop when the absolute change in log-likelihood is < eps
# See below for explanation of the convergence criterion
it = 0
ll = prev_ll = None
while it < max_iter and (prev_ll is None or np.abs(ll - prev_ll) >= eps):
# Just a placeholder for the starter code
# *** START CODE HERE
# (1) E-step: Update your estimates in w
# (2) M-step: Update the model parameters phi, mu, and sigma
# (3) Compute the log-likelihood of the data to check for convergence.
# By log-likelihood, we mean `ll = sum_x[log(sum_z[p(x|z) * p(z)])]`.
# We define convergence by the first iteration where abs(ll - prev_ll) < eps.
# Hint: For debugging, recall part (a). We showed that ll should be monotonically increasing.
# E-step
for j in range(K):
w[:, j] = np.exp(-0.5 * ((x - mu[j]) @ np.linalg.inv(sigma[j]) * (x - mu[j])).sum(axis=1)) / np.linalg.det(sigma[j])**0.5 * phi[j]
w /= w.sum(axis=1)[:, None] # 维持维度
# M-step
phi = w.mean(axis=0)
for j in range(K):
mu[j] = x.T @ w[:, j] / w[:, j].sum()
sigma[j] = (w[:, j][:, None] * (x - mu[j])).T @ (x - mu[j]) / w[:, j].sum()
it += 1
prev_ll = ll
# 利用Jenses不等式,可以得到取等号时的条件,从而可知`ll = sum_x[log(sum_z[p(x|z) * p(z)])]`
p_xy = np.zeros(w.shape)
for i in range(K):
p_xy[:, i] = np.exp(-0.5 * ((x - mu[i]) @ np.linalg.inv(sigma[i]) * (x - mu[i])).sum(axis=1)) / np.linalg.det(sigma[i])**0.5 / (np.pi)**(x.shape[1]/2) * phi[i]
ll = np.sum(np.log(p_xy))
if (it) % 100 ==0:
print(f'iteration: {it}; log-likelihood:{ll}')
# *** END CODE HERE ***
print(f'Number of iterations:{it}')
return w
def run_semi_supervised_em(x, x_tilde, z, w, phi, mu, sigma):
"""Problem 3(e): Semi-Supervised EM Algorithm.
See inline comments for instructions.
Args:
x: Design matrix of unlabeled examples of shape (m, n).
x_tilde: Design matrix of labeled examples of shape (m_tilde, n).
z: Array of labels of shape (m_tilde, 1).
w: Initial weight matrix of shape (m, k).
phi: Initial mixture prior, of shape (k,).
mu: Initial cluster means, list of k arrays of shape (n,).
sigma: Initial cluster covariances, list of k arrays of shape (n, n).
Returns:
Updated weight matrix of shape (m, k) resulting from semi-supervised EM algorithm.
More specifically, w[i, j] should contain the probability of
example x^(i) belonging to the j-th Gaussian in the mixture.
"""
# No need to change any of these parameters
alpha = 20. # Weight for the labeled examples
eps = 1e-3 # Convergence threshold
max_iter = 1000
# Stop when the absolute change in log-likelihood is < eps
# See below for explanation of the convergence criterion
it = 0
ll = prev_ll = None
while it < max_iter and (prev_ll is None or np.abs(ll - prev_ll) >= eps):
pass # Just a placeholder for the starter code
# *** START CODE HERE ***
# (1) E-step: Update your estimates in w
# (2) M-step: Update the model parameters phi, mu, and sigma
# (3) Compute the log-likelihood of the data to check for convergence.
# Hint: Make sure to include alpha in your calculation of ll.
# Hint: For debugging, recall part (a). We showed that ll should be monotonically increasing.
# E-step
for j in range(K):
w[:, j] = np.exp(-0.5 * ((x - mu[j]) @ np.linalg.inv(sigma[j]) * (x - mu[j])).sum(axis=1)) / np.linalg.det(sigma[j])**0.5 * phi[j]
w /= w.sum(axis=1)[:, None] # 维持维度
# M-step
for j in range(K):
phi[j] = (w[:, j].sum() + alpha * (z==j).sum()) / (x.shape[0] + alpha * x_tilde.shape[0])
mu[j] = ((w[:, j][:, None] * x).sum(axis=0) + alpha * x_tilde[(z==j).flatten()].sum(axis=0)) / (w[:, j].sum() + alpha * (z==j).sum())
sigma[j] = ((w[:, j][:, None] * (x - mu[j])).T @ (x - mu[j]) + alpha * (x_tilde[(z==j).flatten()] - mu[j]).T @ (x_tilde[(z==j).flatten()] - mu[j])) / (w[:, j].sum() + alpha * (z==j).sum())
# log-likelihood
prev_ll = ll
p_xy_semi = np.zeros(w.shape)
for i in range(K):
p_xy_semi[:, i] = np.exp(-0.5 * ((x - mu[i]) @ np.linalg.inv(sigma[i]) * (x - mu[i])).sum(axis=1)) / np.linalg.det(sigma[i])**0.5 / (np.pi)**(x.shape[1]/2) * phi[i]
ll = np.sum(np.log(p_xy_semi))
it += 1
if (it) % 10 == 0:
print(f'iteration: {it}; log-likelihood:{ll}')
# *** END CODE HERE ***
print(f'Number of iterations:{it}')
return w
# *** START CODE HERE ***
# Helper functions
# *** END CODE HERE ***
def plot_gmm_preds(x, z, with_supervision, plot_id):
"""Plot GMM predictions on a 2D dataset `x` with labels `z`.
Write to the output directory, including `plot_id`
in the name, and appending 'ss' if the GMM had supervision.
NOTE: You do not need to edit this function.
"""
plt.figure(figsize=(12, 8))
plt.title('{} GMM Predictions'.format('Semi-supervised' if with_supervision else 'Unsupervised'))
plt.xlabel('x_1')
plt.ylabel('x_2')
for x_1, x_2, z_ in zip(x[:, 0], x[:, 1], z):
color = 'gray' if z_ < 0 else PLOT_COLORS[int(z_)]
alpha = 0.25 if z_ < 0 else 0.75
plt.scatter(x_1, x_2, marker='.', c=color, alpha=alpha)
file_name = 'p04_pred{}_{}.png'.format('_ss' if with_supervision else '', plot_id)
save_path = os.path.join('output', file_name)
plt.savefig(save_path)
def load_gmm_dataset(csv_path):
"""Load dataset for Gaussian Mixture Model (problem 3).
Args:
csv_path: Path to CSV file containing dataset.
Returns:
x: NumPy array shape (m, n)
z: NumPy array shape (m, 1)
NOTE: You do not need to edit this function.
"""
# Load headers
with open(csv_path, 'r') as csv_fh:
headers = csv_fh.readline().strip().split(',')
# Load features and labels
x_cols = [i for i in range(len(headers)) if headers[i].startswith('x')]
z_cols = [i for i in range(len(headers)) if headers[i] == 'z']
x = np.loadtxt(csv_path, delimiter=',', skiprows=1, usecols=x_cols, dtype=float)
z = np.loadtxt(csv_path, delimiter=',', skiprows=1, usecols=z_cols, dtype=float)
if z.ndim == 1:
z = np.expand_dims(z, axis=-1)
return x, z
np.random.seed(229)
# Run NUM_TRIALS trials to see how different initializations
# affect the final predictions with and without supervision
for t in range(NUM_TRIALS):
# main(is_semi_supervised=False, trial_num=t)
# *** START CODE HERE ***
# Once you've implemented the semi-supervised version,
# uncomment the following line.
# You do not need to add any other lines in this code block.
main(is_semi_supervised=True, trial_num=t)
# *** END CODE HERE ***
Running semi-supervised EM algorithm...
iteration: 10; log-likelihood:-121211.71022914571
iteration: 20; log-likelihood:-126688.67202095361
iteration: 30; log-likelihood:-126730.02265599032
iteration: 40; log-likelihood:-126731.9314977497
iteration: 50; log-likelihood:-126732.038350953
Number of iterations:53
Running semi-supervised EM algorithm...
iteration: 10; log-likelihood:-114403.866748803
iteration: 20; log-likelihood:-126462.74083684657
iteration: 30; log-likelihood:-126717.02209948156
iteration: 40; log-likelihood:-126731.19419752332
iteration: 50; log-likelihood:-126731.99648558485
iteration: 60; log-likelihood:-126732.04203570582
Number of iterations:60
Running semi-supervised EM algorithm...
iteration: 10; log-likelihood:-103681.46117895067
iteration: 20; log-likelihood:-126802.45193921725
iteration: 30; log-likelihood:-126737.54249053207
iteration: 40; log-likelihood:-126732.36145376327
iteration: 50; log-likelihood:-126732.06277741205
Number of iterations:57
(f)
其实在上面两个coding题目中,无论是否半监督还是无监督我发现目标的log-likelihood函数不一定单调增加,但是最后可以收敛,不知这是为何?本身的波动吗?
回答这一问的问题:
i.
半监督学习显然收敛更快,需要迭代次数更少。
ii.
半监督显然更稳定,改变初始化,无监督学习的结果会变化很多,半监督的基本没有变化。数据和某个所属高斯分布“匹配”比较稳定。
iii.
整体质量显然是半监督学习更好,在其结果中,可以看到稳定的存在三个方差差不多的高斯分布数据,和一个方差更大的高斯分布数据源,而在无监督学习情况下,给出了四个方差明显不同的高斯数据源。