CS229 (Andrew Ng, Machine Learning) Problem Set 3 (PS03) solutions: all problems answered, feedback from more experienced readers is welcome. One sub-question of Problem 4 remains open; see part (f). Topics: EM algorithm, K-means clustering.

4. Semi-Supervised EM

(a)

Following the original derivation of the EM algorithm in the lecture notes, it is straightforward to show:

$$
\begin{aligned}
\ell_{\text{semi-sup}}\left(\theta^{(t+1)}\right) &= \ell_{\text{unsup}}\left(\theta^{(t+1)}\right)+\alpha \ell_{\text{sup}}\left(\theta^{(t+1)}\right) \\
& \geq \sum_{i=1}^{m}\left(\sum_{z^{(i)}} Q_{i}^{(t)}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \theta^{(t+1)}\right)}{Q_{i}^{(t)}\left(z^{(i)}\right)}\right)+\alpha\left(\sum_{i=1}^{\tilde{m}} \log p\left(\tilde{x}^{(i)}, \tilde{z}^{(i)} ; \theta^{(t+1)}\right)\right) \\
& \geq \sum_{i=1}^{m}\left(\sum_{z^{(i)}} Q_{i}^{(t)}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \theta^{(t)}\right)}{Q_{i}^{(t)}\left(z^{(i)}\right)}\right)+\alpha\left(\sum_{i=1}^{\tilde{m}} \log p\left(\tilde{x}^{(i)}, \tilde{z}^{(i)} ; \theta^{(t)}\right)\right) \\
&=\ell_{\text{unsup}}\left(\theta^{(t)}\right)+\alpha \ell_{\text{sup}}\left(\theta^{(t)}\right) \\
&=\ell_{\text{semi-sup}}\left(\theta^{(t)}\right)
\end{aligned}
$$

The first inequality uses Jensen's inequality, and the second uses the fact that in the M-step $\theta^{(t+1)}$ maximizes the lower bound.
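For the chain of inequalities to close back to $\ell_{\text{semi-sup}}(\theta^{(t)})$, the first inequality must hold with equality at $\theta^{(t)}$; this is guaranteed by the standard E-step choice from the lecture notes:

$$
Q_{i}^{(t)}\left(z^{(i)}\right)=p\left(z^{(i)} \mid x^{(i)} ; \theta^{(t)}\right)
$$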

Semi-supervised GMM

(b)

Following the lecture-note derivation of the EM steps for a GMM, it is also straightforward to obtain:

$$
\begin{aligned}
w_{j}^{(i)} &=p\left(z^{(i)}=j \mid x^{(i)} ; \phi, \mu, \Sigma\right) \\
&=\frac{p\left(x^{(i)} \mid z^{(i)}=j ; \mu, \Sigma\right) p\left(z^{(i)}=j ; \phi\right)}{\sum_{l=1}^{k} p\left(x^{(i)} \mid z^{(i)}=l ; \mu, \Sigma\right) p\left(z^{(i)}=l ; \phi\right)} \\
&=\frac{\frac{1}{(2 \pi)^{d / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \phi_{j}}{\sum_{l=1}^{k} \frac{1}{(2 \pi)^{d / 2}\left|\Sigma_{l}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{l}\right)^{T} \Sigma_{l}^{-1}\left(x^{(i)}-\mu_{l}\right)\right) \phi_{l}}
\end{aligned}
$$

In the E-step, after the parameters have been initialized or updated, we recompute the distribution of the latent variables $z^{(i)}$; what gets updated is therefore the posterior distribution of the latent variables.
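A minimal NumPy sketch of this E-step, written as a standalone helper for illustration (the function name and the assumed shapes, `x` of shape `(m, n)`, `mu`/`sigma` as lists of per-component means and covariances, `phi` of shape `(k,)`, are mine and not part of the starter code):

import numpy as np

def e_step_responsibilities(x, phi, mu, sigma):
    """Return w with w[i, j] = p(z^(i) = j | x^(i); phi, mu, sigma)."""
    m, d = x.shape
    k = len(phi)
    w = np.zeros((m, k))
    for j in range(k):
        diff = x - mu[j]                                      # (m, d)
        maha = (diff @ np.linalg.inv(sigma[j]) * diff).sum(axis=1)
        w[:, j] = phi[j] * np.exp(-0.5 * maha) / (
            (2 * np.pi) ** (d / 2) * np.linalg.det(sigma[j]) ** 0.5)
    return w / w.sum(axis=1, keepdims=True)  # normalize rows over the k components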

(c)

First, write out the M-step objective (the labeled term carries the weight $\alpha$ from the semi-supervised objective):

$$
\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \frac{\frac{1}{(2 \pi)^{d / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \phi_{j}}{w_{j}^{(i)}}
+\alpha \sum_{i=1}^{\tilde{m}} \sum_{j=1}^{k} 1\left\{\tilde{z}^{(i)}=j\right\} \log \left[\frac{1}{(2 \pi)^{d / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(\tilde{x}^{(i)}-\mu_{j}\right)\right) \phi_{j}\right]
$$
For $\phi$, there is a probability (sum-to-one) constraint, so we use a Lagrange multiplier, keeping only the terms that involve $\phi$. The Lagrangian is:

$$
\mathcal{L}(\phi)=\sum_{i=1}^{m} \sum_{l=1}^{k} w_{l}^{(i)} \log \phi_{l}+\alpha \sum_{i=1}^{\tilde{m}} \sum_{l=1}^{k} 1\left\{\tilde{z}^{(i)}=l\right\} \log \phi_{l}+\beta\left(\sum_{l=1}^{k} \phi_{l}-1\right)
$$

Setting the derivative with respect to $\phi_j$ to zero gives:

$$
\begin{aligned}
\nabla_{\phi_{j}} \mathcal{L}(\phi) &= \sum_{i=1}^{m} \frac{w_{j}^{(i)}}{\phi_{j}}+\alpha \sum_{i=1}^{\tilde{m}} \frac{1\left\{\tilde{z}^{(i)}=j\right\}}{\phi_{j}}+\beta=0 \\
\phi_{j} &= \frac{\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}}{-\beta}
\end{aligned}
$$
Using the constraint $\sum_{l=1}^{k} \phi_l = 1$:

$$
\begin{aligned}
\sum_{l=1}^{k} \phi_{l} &=\frac{\sum_{i=1}^{m} \sum_{l=1}^{k} w_{l}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} \sum_{l=1}^{k} 1\left\{\tilde{z}^{(i)}=l\right\}}{-\beta} \\
&=\frac{m+\alpha \tilde{m}}{-\beta}=1 \\
\Rightarrow \quad -\beta &= m+\alpha \tilde{m}
\end{aligned}
$$

Therefore:

$$
\phi_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}}{m+\alpha \tilde{m}}
$$
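As a quick sanity check (not part of the original write-up), these updated mixture weights sum to one:

$$
\sum_{j=1}^{k} \phi_{j}=\frac{\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} \sum_{j=1}^{k} 1\left\{\tilde{z}^{(i)}=j\right\}}{m+\alpha \tilde{m}}=\frac{m+\alpha \tilde{m}}{m+\alpha \tilde{m}}=1
$$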
For the parameter $\mu$, take the gradient with respect to $\mu_j$:

$$
\nabla_{\mu_{j}} \ell_{\mathrm{unsup}}=\sum_{i=1}^{m} w_{j}^{(i)} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)=\Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}-\mu_{j} \sum_{i=1}^{m} w_{j}^{(i)}\right)
$$

$$
\nabla_{\mu_{j}} \ell_{\mathrm{sup}}=\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \Sigma_{j}^{-1}\left(\tilde{x}^{(i)}-\mu_{j}\right)=\Sigma_{j}^{-1}\left(\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \tilde{x}^{(i)}-\mu_{j} \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\right)
$$

$$
\begin{aligned}
\nabla_{\mu_{j}} \ell_{\text{semi-sup}} &=\nabla_{\mu_{j}} \ell_{\text{unsup}}+\alpha \nabla_{\mu_{j}} \ell_{\text{sup}} \\
&=\Sigma_{j}^{-1}\left[\left(\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}-\mu_{j} \sum_{i=1}^{m} w_{j}^{(i)}\right)+\alpha\left(\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \tilde{x}^{(i)}-\mu_{j} \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\right)\right] \\
&=\Sigma_{j}^{-1}\left[\left(\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \tilde{x}^{(i)}\right)-\mu_{j}\left(\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\right)\right]=0
\end{aligned}
$$

Solving for $\mu_j$:

$$
\mu_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \tilde{x}^{(i)}}{\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}}
$$
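As a side observation (not in the original write-up): with $\alpha = 0$, or with no labeled data, this reduces to the familiar unsupervised GMM mean update, so the semi-supervised update is simply a weighted average of the two data sources:

$$
\mu_{j}\Big|_{\alpha=0}=\frac{\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_{j}^{(i)}}
$$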
Taking the derivative with respect to $\Sigma_j$:

$$
\nabla_{\Sigma_{j}} \ell_{\mathrm{unsup}}=-\frac{1}{2} \sum_{i=1}^{m} w_{j}^{(i)} \Sigma_{j}^{-1}+\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1}
$$

$$
\nabla_{\Sigma_{j}} \ell_{\mathrm{sup}}=-\frac{1}{2} \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \Sigma_{j}^{-1}+\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\left(\tilde{x}^{(i)}-\mu_{j}\right)\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1}
$$

$$
\begin{aligned}
\nabla_{\Sigma_{j}} \ell_{\text{semi-sup}} &= \nabla_{\Sigma_{j}} \ell_{\text{unsup}}+\alpha \nabla_{\Sigma_{j}} \ell_{\text{sup}} \\
&=-\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\right) \\
&\quad+\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\left(\tilde{x}^{(i)}-\mu_{j}\right)\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1}=0
\end{aligned}
$$

Solving for $\Sigma_j$:

$$
\Sigma_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\left(\tilde{x}^{(i)}-\mu_{j}\right)\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T}}{\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}}
$$
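A compact NumPy sketch of these three closed-form M-step updates, written as a standalone helper for illustration (the function name, signature, and the assumption that `z_tilde` is a flat integer label array are mine; the actual assignment code appears in parts (d) and (e) below):

import numpy as np

def semi_sup_m_step(x, w, x_tilde, z_tilde, alpha, K):
    """M-step for the semi-supervised GMM.

    x: (m, n) unlabeled data; w: (m, K) responsibilities from the E-step;
    x_tilde: (m_tilde, n) labeled data; z_tilde: (m_tilde,) integer labels.
    Returns the updated (phi, mu, sigma).
    """
    m, n = x.shape
    m_tilde = x_tilde.shape[0]
    phi = np.zeros(K)
    mu, sigma = [], []
    for j in range(K):
        mask = (z_tilde == j)                        # indicator 1{z_tilde = j}
        denom = w[:, j].sum() + alpha * mask.sum()
        phi[j] = denom / (m + alpha * m_tilde)
        mu_j = (w[:, j] @ x + alpha * x_tilde[mask].sum(axis=0)) / denom
        diff_u = x - mu_j
        diff_l = x_tilde[mask] - mu_j
        sigma_j = ((w[:, j][:, None] * diff_u).T @ diff_u
                   + alpha * diff_l.T @ diff_l) / denom
        mu.append(mu_j)
        sigma.append(sigma_j)
    return phi, mu, sigma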

(d)

import matplotlib.pyplot as plt
import numpy as np
import os

PLOT_COLORS = ['red', 'green', 'blue', 'orange']  # Colors for your plots
K = 4           # Number of Gaussians in the mixture model
NUM_TRIALS = 3  # Number of trials to run (can be adjusted for debugging)
UNLABELED = -1  # Cluster label for unlabeled data points (do not change)


def main(is_semi_supervised, trial_num):
    """Problem 3: EM for Gaussian Mixture Models (unsupervised and semi-supervised)"""
    print('Running {} EM algorithm...'
          .format('semi-supervised' if is_semi_supervised else 'unsupervised'))

    # Load dataset
    train_path = os.path.join('.', 'data', 'ds4_train.csv')
    x, z = load_gmm_dataset(train_path)
    x_tilde = None

    if is_semi_supervised:
        # Split into labeled and unlabeled examples
        labeled_idxs = (z != UNLABELED).squeeze()
        x_tilde = x[labeled_idxs, :]   # Labeled examples
        z = z[labeled_idxs, :]         # Corresponding labels
        x = x[~labeled_idxs, :]        # Unlabeled examples

    # *** START CODE HERE ***
    # (1) Initialize mu and sigma by splitting the m data points uniformly at random
    # into K groups, then calculating the sample mean and covariance for each group
    # (2) Initialize phi to place equal probability on each Gaussian
    # phi should be a numpy array of shape (K,)
    # (3) Initialize the w values to place equal probability on each Gaussian
    # w should be a numpy array of shape (m, K)
    m, n = x.shape
    group_data_num = int(m / K)
    mu, sigma = [], []
    idx = np.random.permutation(m)
    # initialize mu and sigma
    for i in range(K):
        if i != (K-1):
            x_group = x[idx[i*group_data_num:(i+1)*group_data_num], :]
        else:
            x_group = x[idx[i*group_data_num:], :]
        mu_group = x_group.mean(axis=0)
        mu.append(mu_group)
        sigma.append((x_group - mu_group).T @ (x_group - mu_group) / x_group.shape[0])
    # initialize phi
    phi = np.ones(K) / K
    # initialize w
    w = np.ones((m, K)) / K
    # *** END CODE HERE ***

    if is_semi_supervised:
        w = run_semi_supervised_em(x, x_tilde, z, w, phi, mu, sigma)
    else:
        w = run_em(x, w, phi, mu, sigma)

    # Plot your predictions
    z_pred = np.zeros(m)
    if w is not None:  # Just a placeholder for the starter code
        for i in range(m):
            z_pred[i] = np.argmax(w[i])

    plot_gmm_preds(x, z_pred, is_semi_supervised, plot_id=trial_num)


def run_em(x, w, phi, mu, sigma):
    """Problem 3(d): EM Algorithm (unsupervised).

    See inline comments for instructions.

    Args:
        x: Design matrix of shape (m, n).
        w: Initial weight matrix of shape (m, k).
        phi: Initial mixture prior, of shape (k,).
        mu: Initial cluster means, list of k arrays of shape (n,).
        sigma: Initial cluster covariances, list of k arrays of shape (n, n).

    Returns:
        Updated weight matrix of shape (m, k) resulting from EM algorithm.
        More specifically, w[i, j] should contain the probability of
        example x^(i) belonging to the j-th Gaussian in the mixture.
    """
    # No need to change any of these parameters
    eps = 1e-3  # Convergence threshold
    max_iter = 3000

    # Stop when the absolute change in log-likelihood is < eps
    # See below for explanation of the convergence criterion
    it = 0
    ll = prev_ll = None
    while it < max_iter and (prev_ll is None or np.abs(ll - prev_ll) >= eps):
        # Just a placeholder for the starter code
        # *** START CODE HERE
        # (1) E-step: Update your estimates in w
        # (2) M-step: Update the model parameters phi, mu, and sigma
        # (3) Compute the log-likelihood of the data to check for convergence.
        # By log-likelihood, we mean `ll = sum_x[log(sum_z[p(x|z) * p(z)])]`.
        # We define convergence by the first iteration where abs(ll - prev_ll) < eps.
        # Hint: For debugging, recall part (a). We showed that ll should be monotonically increasing.
        # E-step: compute unnormalized responsibilities, then normalize each row.
        # The (2*pi)^(d/2) factor is omitted here because it cancels after normalization.
        for j in range(K):
            w[:, j] = np.exp(-0.5 * ((x - mu[j]) @ np.linalg.inv(sigma[j]) * (x - mu[j])).sum(axis=1)) \
                      / np.linalg.det(sigma[j]) ** 0.5 * phi[j]
        w /= w.sum(axis=1)[:, None]  # normalize each row over the K components
        # M-step
        phi = w.mean(axis=0)
        for j in range(K):
            mu[j] = x.T @ w[:, j] / w[:, j].sum()
            sigma[j] = (w[:, j][:, None] * (x - mu[j])).T @ (x - mu[j]) / w[:, j].sum()
        it += 1
        prev_ll = ll
        # Compute the data log-likelihood ll = sum_x[log(sum_z[p(x|z) * p(z)])];
        # per part (a) / the hint above, this quantity should be monotonically increasing.
        p_xz = np.zeros(w.shape)
        for i in range(K):
            p_xz[:, i] = np.exp(-0.5 * ((x - mu[i]) @ np.linalg.inv(sigma[i]) * (x - mu[i])).sum(axis=1)) \
                         / np.linalg.det(sigma[i]) ** 0.5 / (2 * np.pi) ** (x.shape[1] / 2) * phi[i]
        ll = np.sum(np.log(p_xz.sum(axis=1)))  # marginalize over z before taking the log
        if it % 100 == 0:
            print(f'iteration: {it}; log-likelihood: {ll}')
        # *** END CODE HERE ***
    print(f'Number of iterations:{it}')

    return w



def run_semi_supervised_em(x, x_tilde, z, w, phi, mu, sigma):
    """Problem 3(e): Semi-Supervised EM Algorithm.

    See inline comments for instructions.

    Args:
        x: Design matrix of unlabeled examples of shape (m, n).
        x_tilde: Design matrix of labeled examples of shape (m_tilde, n).
        z: Array of labels of shape (m_tilde, 1).
        w: Initial weight matrix of shape (m, k).
        phi: Initial mixture prior, of shape (k,).
        mu: Initial cluster means, list of k arrays of shape (n,).
        sigma: Initial cluster covariances, list of k arrays of shape (n, n).

    Returns:
        Updated weight matrix of shape (m, k) resulting from semi-supervised EM algorithm.
        More specifically, w[i, j] should contain the probability of
        example x^(i) belonging to the j-th Gaussian in the mixture.
    """
    # No need to change any of these parameters
    alpha = 20.  # Weight for the labeled examples
    eps = 1e-3   # Convergence threshold
    max_iter = 1000

    # Stop when the absolute change in log-likelihood is < eps
    # See below for explanation of the convergence criterion
    it = 0
    ll = prev_ll = None
    while it < max_iter and (prev_ll is None or np.abs(ll - prev_ll) >= eps):
        pass # Just a placeholder for the starter code
        # *** START CODE HERE ***
        # (1) E-step: Update your estimates in w
        # (2) M-step: Update the model parameters phi, mu, and sigma
        # (3) Compute the log-likelihood of the data to check for convergence.
        # Hint: Make sure to include alpha in your calculation of ll.
        # Hint: For debugging, recall part (a). We showed that ll should be monotonically increasing.
        
        # E-step
        for j in range(K):
            w[:, j] = np.exp(-0.5 * ((x - mu[j]) @ np.linalg.inv(sigma[j]) * (x - mu[j])).sum(axis=1)) / np.linalg.det(sigma[j])**0.5 * phi[j]
        w /= w.sum(axis=1)[:, None]  # normalize each row over the K components
        # M-step (incomplete draft here; the full update, including sigma, the
        # log-likelihood, and the iteration counter, is implemented in part (e) below)
        for j in range(K):
            phi[j] = (w.sum(axis=0)[j] + alpha * np.sum(z == j)) / (x.shape[0] + alpha * x_tilde.shape[0])
            mu[j] = ((w[:, j][:, None] * x).sum(axis=0) + (alpha * (z == j) * x_tilde).sum(axis=0)) / (w[:, j].sum() + alpha * np.sum(z == j))
            
        # *** END CODE HERE ***
    print(f'Number of iterations:{it}')

    return w


# *** START CODE HERE ***
# Helper functions
# *** END CODE HERE ***


def plot_gmm_preds(x, z, with_supervision, plot_id):
    """Plot GMM predictions on a 2D dataset `x` with labels `z`.

    Write to the output directory, including `plot_id`
    in the name, and appending 'ss' if the GMM had supervision.

    NOTE: You do not need to edit this function.
    """
    plt.figure(figsize=(12, 8))
    plt.title('{} GMM Predictions'.format('Semi-supervised' if with_supervision else 'Unsupervised'))
    plt.xlabel('x_1')
    plt.ylabel('x_2')

    for x_1, x_2, z_ in zip(x[:, 0], x[:, 1], z):
        color = 'gray' if z_ < 0 else PLOT_COLORS[int(z_)]
        alpha = 0.25 if z_ < 0 else 0.75
        plt.scatter(x_1, x_2, marker='.', c=color, alpha=alpha)

    file_name = 'p04_pred{}_{}.png'.format('_ss' if with_supervision else '', plot_id)
    save_path = os.path.join('output', file_name)
    plt.savefig(save_path)


def load_gmm_dataset(csv_path):
    """Load dataset for Gaussian Mixture Model (problem 3).

    Args:
         csv_path: Path to CSV file containing dataset.

    Returns:
        x: NumPy array shape (m, n)
        z: NumPy array shape (m, 1)

    NOTE: You do not need to edit this function.
    """

    # Load headers
    with open(csv_path, 'r') as csv_fh:
        headers = csv_fh.readline().strip().split(',')

    # Load features and labels
    x_cols = [i for i in range(len(headers)) if headers[i].startswith('x')]
    z_cols = [i for i in range(len(headers)) if headers[i] == 'z']

    x = np.loadtxt(csv_path, delimiter=',', skiprows=1, usecols=x_cols, dtype=float)
    z = np.loadtxt(csv_path, delimiter=',', skiprows=1, usecols=z_cols, dtype=float)

    if z.ndim == 1:
        z = np.expand_dims(z, axis=-1)

    return x, z

np.random.seed(229)
# Run NUM_TRIALS trials to see how different initializations
# affect the final predictions with and without supervision
for t in range(NUM_TRIALS):
    main(is_semi_supervised=False, trial_num=t)
Running unsupervised EM algorithm...
iteration: 100; log-likelihood:-73456.47962832157
iteration: 200; log-likelihood:-76344.42663557787
iteration: 300; log-likelihood:-76358.50656318516
iteration: 400; log-likelihood:-76370.50169188957
iteration: 500; log-likelihood:-76380.35698643283
iteration: 600; log-likelihood:-76388.28158239859
iteration: 700; log-likelihood:-76394.56679933086
iteration: 800; log-likelihood:-76399.5061901347
iteration: 900; log-likelihood:-76403.36330837198
iteration: 1000; log-likelihood:-76406.36170531949
iteration: 1100; log-likelihood:-76408.68494412962
iteration: 1200; log-likelihood:-76410.4807314067
iteration: 1300; log-likelihood:-76411.8663428124
iteration: 1400; log-likelihood:-76412.93404146458
iteration: 1500; log-likelihood:-76413.75594335285
iteration: 1600; log-likelihood:-76414.3881538019
iteration: 1700; log-likelihood:-76414.87417287314
iteration: 1800; log-likelihood:-76415.24764203127
iteration: 1900; log-likelihood:-76415.53452924197
iteration: 2000; log-likelihood:-76415.75485077602
iteration: 2100; log-likelihood:-76415.92401873259
iteration: 2200; log-likelihood:-76416.05389043568
Number of iterations:2249
Running unsupervised EM algorithm...
iteration: 100; log-likelihood:-66590.43501429242
iteration: 200; log-likelihood:-76451.9031340933
iteration: 300; log-likelihood:-76443.37779052822
iteration: 400; log-likelihood:-76436.93811778397
iteration: 500; log-likelihood:-76432.07008934084
iteration: 600; log-likelihood:-76428.37868549602
iteration: 700; log-likelihood:-76425.57243300215
iteration: 800; log-likelihood:-76423.43475451792
iteration: 900; log-likelihood:-76421.80374000929
iteration: 1000; log-likelihood:-76420.55771978038
iteration: 1100; log-likelihood:-76419.60486913892
iteration: 1200; log-likelihood:-76418.87564360708
iteration: 1300; log-likelihood:-76418.31722357296
iteration: 1400; log-likelihood:-76417.88940150806
iteration: 1500; log-likelihood:-76417.56151590426
iteration: 1600; log-likelihood:-76417.31015214347
iteration: 1700; log-likelihood:-76417.11741014768
iteration: 1800; log-likelihood:-76416.96959398432
Number of iterations:1897
Running unsupervised EM algorithm...
iteration: 100; log-likelihood:-75497.54605389759
iteration: 200; log-likelihood:-76563.96638195189
iteration: 300; log-likelihood:-76526.07244007861
iteration: 400; log-likelihood:-76498.3789583912
iteration: 500; log-likelihood:-76477.98078109502
iteration: 600; log-likelihood:-76462.85420946805
iteration: 700; log-likelihood:-76451.56994910692
iteration: 800; log-likelihood:-76443.10818633785
iteration: 900; log-likelihood:-76436.73464038363
iteration: 1000; log-likelihood:-76431.9159971813
iteration: 1100; log-likelihood:-76428.26166787685
iteration: 1200; log-likelihood:-76425.48337038445
iteration: 1300; log-likelihood:-76423.36684720177
iteration: 1400; log-likelihood:-76421.75188958709
iteration: 1500; log-likelihood:-76420.51808550546
iteration: 1600; log-likelihood:-76419.57454650669
iteration: 1700; log-likelihood:-76418.85242925619
iteration: 1800; log-likelihood:-76418.29944184411
iteration: 1900; log-likelihood:-76417.87577553015
iteration: 2000; log-likelihood:-76417.55107116755
iteration: 2100; log-likelihood:-76417.30214399204
iteration: 2200; log-likelihood:-76417.11126902315
iteration: 2300; log-likelihood:-76416.9648839318
Number of iterations:2394

[Figures: unsupervised GMM prediction plots for trials 0, 1, and 2]

(e)

import matplotlib.pyplot as plt
import numpy as np
import os

PLOT_COLORS = ['red', 'green', 'blue', 'orange']  # Colors for your plots
K = 4           # Number of Gaussians in the mixture model
NUM_TRIALS = 3  # Number of trials to run (can be adjusted for debugging)
UNLABELED = -1  # Cluster label for unlabeled data points (do not change)


def main(is_semi_supervised, trial_num):
    """Problem 3: EM for Gaussian Mixture Models (unsupervised and semi-supervised)"""
    print('Running {} EM algorithm...'
          .format('semi-supervised' if is_semi_supervised else 'unsupervised'))

    # Load dataset
    train_path = os.path.join('.', 'data', 'ds4_train.csv')
    x, z = load_gmm_dataset(train_path)
    x_tilde = None

    if is_semi_supervised:
        # Split into labeled and unlabeled examples
        labeled_idxs = (z != UNLABELED).squeeze()
        x_tilde = x[labeled_idxs, :]   # Labeled examples
        z = z[labeled_idxs, :]         # Corresponding labels
        x = x[~labeled_idxs, :]        # Unlabeled examples

    # *** START CODE HERE ***
    # (1) Initialize mu and sigma by splitting the m data points uniformly at random
    # into K groups, then calculating the sample mean and covariance for each group
    # (2) Initialize phi to place equal probability on each Gaussian
    # phi should be a numpy array of shape (K,)
    # (3) Initialize the w values to place equal probability on each Gaussian
    # w should be a numpy array of shape (m, K)
    m, n = x.shape
    group_data_num = int(m / K)
    mu, sigma = [], []
    idx = np.random.permutation(m)
    # initialize mu and sigma
    for i in range(K):
        if i != (K-1):
            x_group = x[idx[i*group_data_num:(i+1)*group_data_num], :]
        else:
            x_group = x[idx[i*group_data_num:], :]
        mu_group = x_group.mean(axis=0)
        mu.append(mu_group)
        sigma.append((x_group - mu_group).T @ (x_group - mu_group) / x_group.shape[0])
    # initialize phi
    phi = np.ones(K) / K
    # initialize w
    w = np.ones((m, K)) / K
    # *** END CODE HERE ***

    if is_semi_supervised:
        w = run_semi_supervised_em(x, x_tilde, z, w, phi, mu, sigma)
    else:
        w = run_em(x, w, phi, mu, sigma)

    # Plot your predictions
    z_pred = np.zeros(m)
    if w is not None:  # Just a placeholder for the starter code
        for i in range(m):
            z_pred[i] = np.argmax(w[i])

    plot_gmm_preds(x, z_pred, is_semi_supervised, plot_id=trial_num)


def run_em(x, w, phi, mu, sigma):
    """Problem 3(d): EM Algorithm (unsupervised).

    See inline comments for instructions.

    Args:
        x: Design matrix of shape (m, n).
        w: Initial weight matrix of shape (m, k).
        phi: Initial mixture prior, of shape (k,).
        mu: Initial cluster means, list of k arrays of shape (n,).
        sigma: Initial cluster covariances, list of k arrays of shape (n, n).

    Returns:
        Updated weight matrix of shape (m, k) resulting from EM algorithm.
        More specifically, w[i, j] should contain the probability of
        example x^(i) belonging to the j-th Gaussian in the mixture.
    """
    # No need to change any of these parameters
    eps = 1e-3  # Convergence threshold
    max_iter = 3000

    # Stop when the absolute change in log-likelihood is < eps
    # See below for explanation of the convergence criterion
    it = 0
    ll = prev_ll = None
    while it < max_iter and (prev_ll is None or np.abs(ll - prev_ll) >= eps):
        # Just a placeholder for the starter code
        # *** START CODE HERE
        # (1) E-step: Update your estimates in w
        # (2) M-step: Update the model parameters phi, mu, and sigma
        # (3) Compute the log-likelihood of the data to check for convergence.
        # By log-likelihood, we mean `ll = sum_x[log(sum_z[p(x|z) * p(z)])]`.
        # We define convergence by the first iteration where abs(ll - prev_ll) < eps.
        # Hint: For debugging, recall part (a). We showed that ll should be monotonically increasing.
        # E-step
        for j in range(K):
            w[:, j] = np.exp(-0.5 * ((x - mu[j]) @ np.linalg.inv(sigma[j]) * (x - mu[j])).sum(axis=1)) / np.linalg.det(sigma[j])**0.5 * phi[j]
        w /= w.sum(axis=1)[:, None]  # normalize each row over the K components
        # M-step
        phi = w.mean(axis=0)
        for j in range(K):
            mu[j] = x.T @ w[:, j] / w[:, j].sum()
            sigma[j] = (w[:, j][:, None] * (x - mu[j])).T @ (x - mu[j]) / w[:, j].sum()
        it += 1
        prev_ll = ll
        # Compute the data log-likelihood ll = sum_x[log(sum_z[p(x|z) * p(z)])];
        # per part (a) / the hint above, this quantity should be monotonically increasing.
        p_xz = np.zeros(w.shape)
        for i in range(K):
            p_xz[:, i] = np.exp(-0.5 * ((x - mu[i]) @ np.linalg.inv(sigma[i]) * (x - mu[i])).sum(axis=1)) \
                         / np.linalg.det(sigma[i]) ** 0.5 / (2 * np.pi) ** (x.shape[1] / 2) * phi[i]
        ll = np.sum(np.log(p_xz.sum(axis=1)))  # marginalize over z before taking the log
        if it % 100 == 0:
            print(f'iteration: {it}; log-likelihood: {ll}')
        # *** END CODE HERE ***
    print(f'Number of iterations:{it}')

    return w


def run_semi_supervised_em(x, x_tilde, z, w, phi, mu, sigma):
    """Problem 3(e): Semi-Supervised EM Algorithm.

    See inline comments for instructions.

    Args:
        x: Design matrix of unlabeled examples of shape (m, n).
        x_tilde: Design matrix of labeled examples of shape (m_tilde, n).
        z: Array of labels of shape (m_tilde, 1).
        w: Initial weight matrix of shape (m, k).
        phi: Initial mixture prior, of shape (k,).
        mu: Initial cluster means, list of k arrays of shape (n,).
        sigma: Initial cluster covariances, list of k arrays of shape (n, n).

    Returns:
        Updated weight matrix of shape (m, k) resulting from semi-supervised EM algorithm.
        More specifically, w[i, j] should contain the probability of
        example x^(i) belonging to the j-th Gaussian in the mixture.
    """
    # No need to change any of these parameters
    alpha = 20.  # Weight for the labeled examples
    eps = 1e-3   # Convergence threshold
    max_iter = 1000

    # Stop when the absolute change in log-likelihood is < eps
    # See below for explanation of the convergence criterion
    it = 0
    ll = prev_ll = None
    while it < max_iter and (prev_ll is None or np.abs(ll - prev_ll) >= eps):
        pass # Just a placeholder for the starter code
        # *** START CODE HERE ***
        # (1) E-step: Update your estimates in w
        # (2) M-step: Update the model parameters phi, mu, and sigma
        # (3) Compute the log-likelihood of the data to check for convergence.
        # Hint: Make sure to include alpha in your calculation of ll.
        # Hint: For debugging, recall part (a). We showed that ll should be monotonically increasing.
        
        # E-step
        for j in range(K):
            w[:, j] = np.exp(-0.5 * ((x - mu[j]) @ np.linalg.inv(sigma[j]) * (x - mu[j])).sum(axis=1)) / np.linalg.det(sigma[j])**0.5 * phi[j]
        w /= w.sum(axis=1)[:, None]  # normalize each row over the K components
        # M-step
        for j in range(K):
            phi[j] = (w[:, j].sum() + alpha * (z==j).sum()) / (x.shape[0] + alpha * x_tilde.shape[0])
            mu[j] = ((w[:, j][:, None] * x).sum(axis=0) + alpha * x_tilde[(z==j).flatten()].sum(axis=0)) / (w[:, j].sum() + alpha * (z==j).sum())
            sigma[j] = ((w[:, j][:, None] * (x - mu[j])).T @ (x - mu[j]) + alpha * (x_tilde[(z==j).flatten()] - mu[j]).T @ (x_tilde[(z==j).flatten()] - mu[j])) / (w[:, j].sum() + alpha * (z==j).sum())
        # Log-likelihood for convergence: marginal term for the unlabeled data
        # plus the alpha-weighted joint term for the labeled data (per the hint above).
        prev_ll = ll
        d = x.shape[1]
        p_xz = np.zeros(w.shape)
        for i in range(K):
            p_xz[:, i] = np.exp(-0.5 * ((x - mu[i]) @ np.linalg.inv(sigma[i]) * (x - mu[i])).sum(axis=1)) \
                         / np.linalg.det(sigma[i]) ** 0.5 / (2 * np.pi) ** (d / 2) * phi[i]
        ll = np.sum(np.log(p_xz.sum(axis=1)))
        for i in range(K):
            x_i = x_tilde[(z == i).flatten()]
            diff = x_i - mu[i]
            ll += alpha * (-0.5 * (diff @ np.linalg.inv(sigma[i]) * diff).sum()
                           + x_i.shape[0] * (np.log(phi[i])
                                             - 0.5 * np.log(np.linalg.det(sigma[i]))
                                             - 0.5 * d * np.log(2 * np.pi)))
        it += 1
        if it % 10 == 0:
            print(f'iteration: {it}; log-likelihood: {ll}')
        # *** END CODE HERE ***
    print(f'Number of iterations:{it}')

    return w


# *** START CODE HERE ***
# Helper functions
# *** END CODE HERE ***


def plot_gmm_preds(x, z, with_supervision, plot_id):
    """Plot GMM predictions on a 2D dataset `x` with labels `z`.

    Write to the output directory, including `plot_id`
    in the name, and appending 'ss' if the GMM had supervision.

    NOTE: You do not need to edit this function.
    """
    plt.figure(figsize=(12, 8))
    plt.title('{} GMM Predictions'.format('Semi-supervised' if with_supervision else 'Unsupervised'))
    plt.xlabel('x_1')
    plt.ylabel('x_2')

    for x_1, x_2, z_ in zip(x[:, 0], x[:, 1], z):
        color = 'gray' if z_ < 0 else PLOT_COLORS[int(z_)]
        alpha = 0.25 if z_ < 0 else 0.75
        plt.scatter(x_1, x_2, marker='.', c=color, alpha=alpha)

    file_name = 'p04_pred{}_{}.png'.format('_ss' if with_supervision else '', plot_id)
    save_path = os.path.join('output', file_name)
    plt.savefig(save_path)


def load_gmm_dataset(csv_path):
    """Load dataset for Gaussian Mixture Model (problem 3).

    Args:
         csv_path: Path to CSV file containing dataset.

    Returns:
        x: NumPy array shape (m, n)
        z: NumPy array shape (m, 1)

    NOTE: You do not need to edit this function.
    """

    # Load headers
    with open(csv_path, 'r') as csv_fh:
        headers = csv_fh.readline().strip().split(',')

    # Load features and labels
    x_cols = [i for i in range(len(headers)) if headers[i].startswith('x')]
    z_cols = [i for i in range(len(headers)) if headers[i] == 'z']

    x = np.loadtxt(csv_path, delimiter=',', skiprows=1, usecols=x_cols, dtype=float)
    z = np.loadtxt(csv_path, delimiter=',', skiprows=1, usecols=z_cols, dtype=float)

    if z.ndim == 1:
        z = np.expand_dims(z, axis=-1)

    return x, z

np.random.seed(229)
# Run NUM_TRIALS trials to see how different initializations
# affect the final predictions with and without supervision
for t in range(NUM_TRIALS):
#     main(is_semi_supervised=False, trial_num=t)

    # *** START CODE HERE ***
    # Once you've implemented the semi-supervised version,
    # uncomment the following line.
    # You do not need to add any other lines in this code block.
    main(is_semi_supervised=True, trial_num=t)
    # *** END CODE HERE ***
Running semi-supervised EM algorithm...
iteration: 10; log-likelihood:-121211.71022914571
iteration: 20; log-likelihood:-126688.67202095361
iteration: 30; log-likelihood:-126730.02265599032
iteration: 40; log-likelihood:-126731.9314977497
iteration: 50; log-likelihood:-126732.038350953
Number of iterations:53
Running semi-supervised EM algorithm...
iteration: 10; log-likelihood:-114403.866748803
iteration: 20; log-likelihood:-126462.74083684657
iteration: 30; log-likelihood:-126717.02209948156
iteration: 40; log-likelihood:-126731.19419752332
iteration: 50; log-likelihood:-126731.99648558485
iteration: 60; log-likelihood:-126732.04203570582
Number of iterations:60
Running semi-supervised EM algorithm...
iteration: 10; log-likelihood:-103681.46117895067
iteration: 20; log-likelihood:-126802.45193921725
iteration: 30; log-likelihood:-126737.54249053207
iteration: 40; log-likelihood:-126732.36145376327
iteration: 50; log-likelihood:-126732.06277741205
Number of iterations:57

[Figures: semi-supervised GMM prediction plots for trials 0, 1, and 2]

(f)

In fact, in both coding parts above, whether unsupervised or semi-supervised, I found that the log-likelihood objective does not always increase monotonically, although it does converge in the end. I am not sure why; is it just inherent fluctuation? (One thing to note: the guarantee from part (a) applies to the marginal log-likelihood $\sum_x \log \sum_z p(x, z; \theta)$, so the monitored quantity has to marginalize over $z$ before taking the log.)
Answers to the questions asked in this part:

i.

The semi-supervised version clearly converges faster and needs far fewer iterations.

ii.

The semi-supervised version is clearly more stable: when the initialization changes, the unsupervised results vary a lot, while the semi-supervised results barely change; the assignment of data points to a given Gaussian component stays essentially fixed.

iii.

The overall quality is clearly better with semi-supervision: its results consistently show three Gaussian components with similar variances plus one source with a larger variance, whereas the unsupervised runs produce four components with noticeably different variances.
