4.Semi-Supervised EM

(a)

参考EM算法的讲义原始的推导过程，很容易得出：
$\begin{aligned} \ell_{\text {semi-sup }}\left(\theta^{(t+1)}\right) &=\ell_{\text {unsup }}\left(\theta^{(t+1)}\right)+\alpha \ell_{\text {sup }}\left(\theta^{(t+1)}\right) \\ & \geq \sum_{i=1}^{m}\left(\sum_{z^{(i)}} Q_{i}^{(t)}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \theta^{(t+1)}\right)}{Q_{i}^{(t)}\left(z^{(i)}\right)}\right)+\alpha\left(\sum_{i=1}^{\tilde{m}} \log p\left(\tilde{x}^{(i)}, \tilde{z}^{(i)} ; \theta^{(t+1)}\right)\right) \\ & \geq \sum_{i=1}^{m}\left(\sum_{z^{(i)}} Q_{i}^{(t)}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \theta^{(t)}\right)}{Q_{i}^{(t)}\left(z^{(i)}\right)}\right)+\alpha\left(\sum_{i=1}^{m} \log p\left(\tilde{x}^{(i)}, \tilde{z}^{(i)} ; \theta^{(t)}\right)\right) \\ &=\ell_{\text {unsup }}\left(\theta^{(t)}\right)+\alpha \ell_{\text {sup }}\left(\theta^{(t)}\right) \\ &=\ell_{\text {semi-sup }}\left(\theta^{(t)}\right) \end{aligned}$
其中第一个不等号使用了Jensens不等式，第二个不等号使用了M-step中 $\theta^{t+1}$ 是函数的最大值。

Semi-supervised GMM

(b)

参考讲义关于GMM的EM-step推导，也不难得出：
$\begin{aligned} w_{j}^{(i)} &=p\left(z^{(i)}=j \mid x^{(i)} ; \phi, \mu, \Sigma\right) \\ &=\frac{p\left(x^{(i)} \mid z^{(i)}=j ; \mu, \Sigma\right) p\left(z^{(i)}=j ; \phi\right)}{\sum_{l=1}^{k} p\left(x^{(i)} \mid z^{(i)}=l ; \mu, \Sigma\right) p\left(z^{(i)}=l ; \phi\right)} \\ &=\frac{\frac{1}{(2 \pi)^{d / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \phi_{j}}{\sum_{l=1}^{k} \frac{1}{(2 \pi)^{d / 2}\left|\Sigma_{l}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{l}\right)^{T} \Sigma_{l}^{-1}\left(x^{(i)}-\mu_{l}\right)\right) \phi_{l}} \end{aligned}$
E-step中，是初始化或者更新了参数以后，重新计算隐变量 $z^{(i)}$ 的分布，因此更新的是隐变量。

©

首先写出M-step的目标函数：
$\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \frac{\frac{1}{(2 \pi)^{d / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \phi_{j}}{w_{j}^{(i)}} \\ +\sum_{i=1}^{\tilde{m}} \sum_{j=1}^{k} 1\left\{\tilde{z}^{(i)}=j\right\} \log \frac{1}{(2 \pi)^{d / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(\tilde{x}^{(i)}-\mu_{j}\right)\right) \phi_{j}$
对于 $\theta$ ，由于存在概率约束，需要用拉格朗日乘子法求解（只考虑与其有关的项），其拉格朗日函数为：
$\mathcal{L}(\phi)=\sum_{i=1}^{m} \sum_{l=1}^{k} w_{l}^{(i)} \log \phi_{l}+\sum_{i=1}^{\tilde{m}} \sum_{l=1}^{k} 1\left\{\tilde{z}^{(i)}=l\right\} \log \phi_{l}+\beta\left(\sum_{l=1}^{k} \phi_{l}-1\right)$
对参数求导置0得：
$\begin{array}{c}\nabla_{\phi_{j}} \mathcal{L}(\phi)=\sum_{i=1}^{m} \frac{w_{j}^{(i)}}{\phi_{j}}+\sum_{i=1}^{\tilde{m}} \frac{1\left\{\tilde{z}^{(i)}=j\right\}}{\phi_{j}}+\beta=0 \\ \phi_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}}{-\beta}\end{array}$
利用约束条件得：
$\begin{aligned} \sum_{l=1}^{k} \phi_{l} &=\frac{\sum_{i=1}^{m} \sum_{l=1}^{k} w_{l}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} \sum_{l=1}^{k} 1\left\{\tilde{z}^{(i)}=l\right\}}{-\beta} \\ &=\frac{m+\alpha \tilde{m}}{-\beta} \\ &=1 \\-\beta=m+\alpha \tilde{m} \end{aligned}$
从而：
$\phi_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}}{m+\alpha \tilde{m}}$
对于参数 $\mu$ ，对齐求梯度：
$\begin{aligned} \nabla_{\mu_{j}} \ell_{\mathrm{unsup}} &=\sum_{i=1}^{m} w_{j}^{(i)} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right) \\ &=\Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}-\mu_{j} \sum_{i=1}^{m} w_{j}^{(i)}\right) \end{aligned}$
$\begin{aligned} \nabla_{\mu_{j}} \ell_{\sup } &=\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \Sigma_{j}^{-1}\left(\tilde{x}^{(i)}-\mu_{j}\right) \\ &=\Sigma_{j}^{-1}\left(\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \tilde{x}^{(i)}-\mu_{j} \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\right) \end{aligned}$
$\begin{aligned} \nabla_{\mu_{j}} \ell_{\text {semi-sup }} &=\nabla_{\mu_{j}} \ell_{\text {unsup }}+\alpha \nabla_{\mu_{j}} \ell_{\text {sup }} \\ &=\Sigma_{j}^{-1}\left[\left(\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}-\mu_{j} \sum_{i=1}^{m} w_{j}^{(i)}\right)+\alpha\left(\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \tilde{x}^{(i)}-\mu_{j} \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\right)\right] \\ &=\Sigma_{j}^{-1}\left[\left(\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \tilde{x}^{(i)}\right)-\mu_{j}\left(\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\right)\right] \\ &=0 \end{aligned}$
$\mu_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \tilde{x}^{(i)}}{\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}}$
对于 $\Sigma$ 求导数：
$\nabla_{\Sigma_{j}} \ell_{\mathrm{unsup}}=-\frac{1}{2} \sum_{i=1}^{m} w_{j}^{(i)} \Sigma_{j}^{-1}+\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1}$
$\nabla_{\Sigma_{j}} \ell_{\text {sup }}=-\frac{1}{2} \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \Sigma_{j}^{-1}+\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\left(\tilde{x}^{(i)}-\mu_{j}\right)\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1}$
$\begin{aligned} \nabla_{\Sigma_{j}} \ell_{\text {semi-sup }}=& \nabla_{\Sigma_{j}} \ell_{\text {unsup }}+\alpha \nabla_{\Sigma_{j}} \ell_{\text {sup }} \\=&-\frac{1}{2} \sum_{i=1}^{m} w_{j}^{(i)} \Sigma_{j}^{-1}+\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1} \\ &-\frac{1}{2} \alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\} \Sigma_{j}^{-1}+\frac{1}{2} \alpha \Sigma_{j}^{-1}\left(\sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\left(\tilde{x}^{(i)}-\mu_{j}\right)\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1} \\=&-\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\right) \\ &+\frac{1}{2} \Sigma_{j}^{-1}\left(\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\left(\tilde{x}^{(i)}-\mu_{j}\right)\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T}\right) \Sigma_{j}^{-1} \\=& 0 \end{aligned}$
$\Sigma_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}\left(\tilde{x}^{(i)}-\mu_{j}\right)\left(\tilde{x}^{(i)}-\mu_{j}\right)^{T}}{\sum_{i=1}^{m} w_{j}^{(i)}+\alpha \sum_{i=1}^{\tilde{m}} 1\left\{\tilde{z}^{(i)}=j\right\}}$

(d)

import matplotlib.pyplot as plt
import numpy as np
import os

PLOT_COLORS = ['red', 'green', 'blue', 'orange']  # Colors for your plots
K = 4           # Number of Gaussians in the mixture model
NUM_TRIALS = 3  # Number of trials to run (can be adjusted for debugging)
UNLABELED = -1  # Cluster label for unlabeled data points (do not change)


def main(is_semi_supervised, trial_num):
    """Problem 3: EM for Gaussian Mixture Models (unsupervised and semi-supervised)"""
    print('Running {} EM algorithm...'
          .format('semi-supervised' if is_semi_supervised else 'unsupervised'))

    # Load dataset
    train_path = os.path.join('.', 'data', 'ds4_train.csv')
    x, z = load_gmm_dataset(train_path)
    x_tilde = None

    if is_semi_supervised:
        # Split into labeled and unlabeled examples
        labeled_idxs = (z != UNLABELED).squeeze()
        x_tilde = x[labeled_idxs, :]   # Labeled examples
        z = z[labeled_idxs, :]         # Corresponding labels
        x = x[~labeled_idxs, :]        # Unlabeled examples

    # *** START CODE HERE ***
    # (1) Initialize mu and sigma by splitting the m data points uniformly at random
    # into K groups, then calculating the sample mean and covariance for each group
    # (2) Initialize phi to place equal probability on each Gaussian
    # phi should be a numpy array of shape (K,)
    # (3) Initialize the w values to place equal probability on each Gaussian
    # w should be a numpy array of shape (m, K)
    m, n = x.shape
    group_data_num = int(m / K)
    mu, sigma = [], []
    idx = np.random.permutation(m)
    # initialize mu and sigma
    for i in range(K):
        if i != (K-1):
            x_group = x[idx[i*group_data_num:(i+1)*group_data_num], :]
        else:
            x_group = x[idx[i*group_data_num:], :]
        mu_group = x_group.mean(axis=0)
        mu.append(mu_group)
        sigma.append((x_group - mu_group).T @ (x_group - mu_group) / x_group.shape[0])
    # initialize phi
    phi = np.ones(K) / K
    # initialize w
    w = np.ones((m, K)) / K
    # *** END CODE HERE ***

    if is_semi_supervised:
        w = run_semi_supervised_em(x, x_tilde, z, w, phi, mu, sigma)
    else:
        w = run_em(x, w, phi, mu, sigma)

    # Plot your predictions
    z_pred = np.zeros(m)
    if w is not None:  # Just a placeholder for the starter code
        for i in range(m):
            z_pred[i] = np.argmax(w[i])

    plot_gmm_preds(x, z_pred, is_semi_supervised, plot_id=trial_num)


def run_em(x, w, phi, mu, sigma):
    """Problem 3(d): EM Algorithm (unsupervised).

    See inline comments for instructions.

    Args:
        x: Design matrix of shape (m, n).
        w: Initial weight matrix of shape (m, k).
        phi: Initial mixture prior, of shape (k,).
        mu: Initial cluster means, list of k arrays of shape (n,).
        sigma: Initial cluster covariances, list of k arrays of shape (n, n).

    Returns:
        Updated weight matrix of shape (m, k) resulting from EM algorithm.
        More specifically, w[i, j] should contain the probability of
        example x^(i) belonging to the j-th Gaussian in the mixture.
    """
    # No need to change any of these parameters
    eps = 1e-3  # Convergence threshold
    max_iter = 3000

    # Stop when the absolute change in log-likelihood is < eps
    # See below for explanation of the convergence criterion
    it = 0
    ll = prev_ll = None
    while it < max_iter and (prev_ll is None or np.abs(ll - prev_ll) >= eps):
        # Just a placeholder for the starter code
        # *** START CODE HERE
        # (1) E-step: Update your estimates in w
        # (2) M-step: Update the model parameters phi, mu, and sigma
        # (3) Compute the log-likelihood of the data to check for convergence.
        # By log-likelihood, we mean `ll = sum_x[log(sum_z[p(x|z) * p(z)])]`.
        # We define convergence by the first iteration where abs(ll - prev_ll) < eps.
        # Hint: For debugging, recall part (a). We showed that ll should be monotonically increasing.
        # E-step
        for j in range(K):
#             w[:, j] = np.exp(-0.5 * ((x - mu[j]) @ np.linalg.inv(sigma[j]) * (x - mu[j])).sum(axis=1)) / (np.linalg.det(sigma[j])**0.5) * phi[j]
            w[:, j] = np.exp(-0.5 * ((x-mu[j]).dot(np.linalg.inv(sigma[j])) * (x-mu[j])).sum(axis=1)) / (np.linalg.det(sigma[j])**0.5) * phi[j]
        w /= w.sum(axis=1)[:, None] # 维持维度
        # M-step
#         phi = w.mean(axis=0)
        phi = np.mean(w, axis=0)
        for j in range(K):
#             mu[j] = x.T @ w[:, j] / w[:, j].sum()
            mu[j] = x.T.dot(w[:, j]) / sum(w[:, j])
            sigma[j] = (w[:, j][:, None] * (x - mu[j])).T @ (x - mu[j]) / w[:, j].sum()
#             sigma[j] = (w[:, j][:, None] * (x-mu[j])).T.dot(x-mu[j]) / sum(w[:, j])
        it += 1
        prev_ll = ll
        # 利用Jenses不等式，可以得到取等号时的条件，从而可知`ll = sum_x[log(sum_z[p(x|z) * p(z)])]`
        p_xz = np.zeros(w.shape)
        for i in range(K):
            p_xz[:, i] = np.exp(-0.5 * ((x - mu[i]) @ np.linalg.inv(sigma[i]) * (x - mu[i])).sum(axis=1)) / (np.linalg.det(sigma[i])**0.5) / (2 * np.pi)**(x.shape[1]/2) * phi[i]
        ll = np.sum(np.log(p_xz))
        if (it) % 100 ==0:
            print(f'iteration: {it}; log-likelihood:{ll}')
        # *** END CODE HERE ***
    print(f'Number of iterations:{it}')

    return w



def run_semi_supervised_em(x, x_tilde, z, w, phi, mu, sigma):
    """Problem 3(e): Semi-Supervised EM Algorithm.

    See inline comments for instructions.

    Args:
        x: Design matrix of unlabeled examples of shape (m, n).
        x_tilde: Design matrix of labeled examples of shape (m_tilde, n).
        z: Array of labels of shape (m_tilde, 1).
        w: Initial weight matrix of shape (m, k).
        phi: Initial mixture prior, of shape (k,).
        mu: Initial cluster means, list of k arrays of shape (n,).
        sigma: Initial cluster covariances, list of k arrays of shape (n, n).

    Returns:
        Updated weight matrix of shape (m, k) resulting from semi-supervised EM algorithm.
        More specifically, w[i, j] should contain the probability of
        example x^(i) belonging to the j-th Gaussian in the mixture.
    """
    # No need to change any of these parameters
    alpha = 20.  # Weight for the labeled examples
    eps = 1e-3   # Convergence threshold
    max_iter = 1000

    # Stop when the absolute change in log-likelihood is < eps
    # See below for explanation of the convergence criterion
    it = 0
    ll = prev_ll = None
    while it < max_iter and (prev_ll is None or np.abs(ll - prev_ll) >= eps):
        pass # Just a placeholder for the starter code
        # *** START CODE HERE ***
        # (1) E-step: Update your estimates in w
        # (2) M-step: Update the model parameters phi, mu, and sigma
        # (3) Compute the log-likelihood of the data to check for convergence.
        # Hint: Make sure to include alpha in your calculation of ll.
        # Hint: For debugging, recall part (a). We showed that ll should be monotonically increasing.
        
        # E-step
        for j in range(K):
            w[:, j] = np.exp(-0.5 * ((x - mu[j]) @ np.linalg.inv(sigma[j]) * (x - mu[j])).sum(axis=1)) / np.linalg.det(sigma[j])**0.5 * phi[j]
        w /= w.sum(axis=1)[:, None] # 维持维度
        # M-step
        for j in range(K):
            phi[j] = (w.sum(axis=0)[j] + alpha * np.sum(z==j)) / (x.shape[0] + alpha * x_tilde.shape[0])
            mu[j] = ((w[:, j] * x).sum(axis=0) + (alpha * (z==j) * x_tilde).sum(axis=0)) / (w[:, j].sum() + alpha * (z==j).sum())
            
        # *** END CODE HERE ***
    print(f'Number of iterations:{it}')

    return w


# *** START CODE HERE ***
# Helper functions
# *** END CODE HERE ***


def plot_gmm_preds(x, z, with_supervision, plot_id):
    """Plot GMM predictions on a 2D dataset `x` with labels `z`.

    Write to the output directory, including `plot_id`
    in the name, and appending 'ss' if the GMM had supervision.

    NOTE: You do not need to edit this function.
    """
    plt.figure(figsize=(12, 8))
    plt.title('{} GMM Predictions'.format('Semi-supervised' if with_supervision else 'Unsupervised'))
    plt.xlabel('x_1')
    plt.ylabel('x_2')

    for x_1, x_2, z_ in zip(x[:, 0], x[:, 1], z):
        color = 'gray' if z_ < 0 else PLOT_COLORS[int(z_)]
        alpha = 0.25 if z_ < 0 else 0.75
        plt.scatter(x_1, x_2, marker='.', c=color, alpha=alpha)

    file_name = 'p04_pred{}_{}.png'.format('_ss' if with_supervision else '', plot_id)
    save_path = os.path.join('output', file_name)
    plt.savefig(save_path)


def load_gmm_dataset(csv_path):
    """Load dataset for Gaussian Mixture Model (problem 3).

    Args:
         csv_path: Path to CSV file containing dataset.

    Returns:
        x: NumPy array shape (m, n)
        z: NumPy array shape (m, 1)

    NOTE: You do not need to edit this function.
    """

    # Load headers
    with open(csv_path, 'r') as csv_fh:
        headers = csv_fh.readline().strip().split(',')

    # Load features and labels
    x_cols = [i for i in range(len(headers)) if headers[i].startswith('x')]
    z_cols = [i for i in range(len(headers)) if headers[i] == 'z']

    x = np.loadtxt(csv_path, delimiter=',', skiprows=1, usecols=x_cols, dtype=float)
    z = np.loadtxt(csv_path, delimiter=',', skiprows=1, usecols=z_cols, dtype=float)

    if z.ndim == 1:
        z = np.expand_dims(z, axis=-1)

    return x, z

np.random.seed(229)
# Run NUM_TRIALS trials to see how different initializations
# affect the final predictions with and without supervision
for t in range(NUM_TRIALS):
    main(is_semi_supervised=False, trial_num=t)

Running unsupervised EM algorithm...
iteration: 100; log-likelihood:-73456.47962832157
iteration: 200; log-likelihood:-76344.42663557787
iteration: 300; log-likelihood:-76358.50656318516
iteration: 400; log-likelihood:-76370.50169188957
iteration: 500; log-likelihood:-76380.35698643283
iteration: 600; log-likelihood:-76388.28158239859
iteration: 700; log-likelihood:-76394.56679933086
iteration: 800; log-likelihood:-76399.5061901347
iteration: 900; log-likelihood:-76403.36330837198
iteration: 1000; log-likelihood:-76406.36170531949
iteration: 1100; log-likelihood:-76408.68494412962
iteration: 1200; log-likelihood:-76410.4807314067
iteration: 1300; log-likelihood:-76411.8663428124
iteration: 1400; log-likelihood:-76412.93404146458
iteration: 1500; log-likelihood:-76413.75594335285
iteration: 1600; log-likelihood:-76414.3881538019
iteration: 1700; log-likelihood:-76414.87417287314
iteration: 1800; log-likelihood:-76415.24764203127
iteration: 1900; log-likelihood:-76415.53452924197
iteration: 2000; log-likelihood:-76415.75485077602
iteration: 2100; log-likelihood:-76415.92401873259
iteration: 2200; log-likelihood:-76416.05389043568
Number of iterations:2249
Running unsupervised EM algorithm...
iteration: 100; log-likelihood:-66590.43501429242
iteration: 200; log-likelihood:-76451.9031340933
iteration: 300; log-likelihood:-76443.37779052822
iteration: 400; log-likelihood:-76436.93811778397
iteration: 500; log-likelihood:-76432.07008934084
iteration: 600; log-likelihood:-76428.37868549602
iteration: 700; log-likelihood:-76425.57243300215
iteration: 800; log-likelihood:-76423.43475451792
iteration: 900; log-likelihood:-76421.80374000929
iteration: 1000; log-likelihood:-76420.55771978038
iteration: 1100; log-likelihood:-76419.60486913892
iteration: 1200; log-likelihood:-76418.87564360708
iteration: 1300; log-likelihood:-76418.31722357296
iteration: 1400; log-likelihood:-76417.88940150806
iteration: 1500; log-likelihood:-76417.56151590426
iteration: 1600; log-likelihood:-76417.31015214347
iteration: 1700; log-likelihood:-76417.11741014768
iteration: 1800; log-likelihood:-76416.96959398432
Number of iterations:1897
Running unsupervised EM algorithm...
iteration: 100; log-likelihood:-75497.54605389759
iteration: 200; log-likelihood:-76563.96638195189
iteration: 300; log-likelihood:-76526.07244007861
iteration: 400; log-likelihood:-76498.3789583912
iteration: 500; log-likelihood:-76477.98078109502
iteration: 600; log-likelihood:-76462.85420946805
iteration: 700; log-likelihood:-76451.56994910692
iteration: 800; log-likelihood:-76443.10818633785
iteration: 900; log-likelihood:-76436.73464038363
iteration: 1000; log-likelihood:-76431.9159971813
iteration: 1100; log-likelihood:-76428.26166787685
iteration: 1200; log-likelihood:-76425.48337038445
iteration: 1300; log-likelihood:-76423.36684720177
iteration: 1400; log-likelihood:-76421.75188958709
iteration: 1500; log-likelihood:-76420.51808550546
iteration: 1600; log-likelihood:-76419.57454650669
iteration: 1700; log-likelihood:-76418.85242925619
iteration: 1800; log-likelihood:-76418.29944184411
iteration: 1900; log-likelihood:-76417.87577553015
iteration: 2000; log-likelihood:-76417.55107116755
iteration: 2100; log-likelihood:-76417.30214399204
iteration: 2200; log-likelihood:-76417.11126902315
iteration: 2300; log-likelihood:-76416.9648839318
Number of iterations:2394

请添加图片描述

(e)

import matplotlib.pyplot as plt
import numpy as np
import os

PLOT_COLORS = ['red', 'green', 'blue', 'orange']  # Colors for your plots
K = 4           # Number of Gaussians in the mixture model
NUM_TRIALS = 3  # Number of trials to run (can be adjusted for debugging)
UNLABELED = -1  # Cluster label for unlabeled data points (do not change)


def main(is_semi_supervised, trial_num):
    """Problem 3: EM for Gaussian Mixture Models (unsupervised and semi-supervised)"""
    print('Running {} EM algorithm...'
          .format('semi-supervised' if is_semi_supervised else 'unsupervised'))

    # Load dataset
    train_path = os.path.join('.', 'data', 'ds4_train.csv')
    x, z = load_gmm_dataset(train_path)
    x_tilde = None

    if is_semi_supervised:
        # Split into labeled and unlabeled examples
        labeled_idxs = (z != UNLABELED).squeeze()
        x_tilde = x[labeled_idxs, :]   # Labeled examples
        z = z[labeled_idxs, :]         # Corresponding labels
        x = x[~labeled_idxs, :]        # Unlabeled examples

    # *** START CODE HERE ***
    # (1) Initialize mu and sigma by splitting the m data points uniformly at random
    # into K groups, then calculating the sample mean and covariance for each group
    # (2) Initialize phi to place equal probability on each Gaussian
    # phi should be a numpy array of shape (K,)
    # (3) Initialize the w values to place equal probability on each Gaussian
    # w should be a numpy array of shape (m, K)
    m, n = x.shape
    group_data_num = int(m / K)
    mu, sigma = [], []
    idx = np.random.permutation(m)
    # initialize mu and sigma
    for i in range(K):
        if i != (K-1):
            x_group = x[idx[i*group_data_num:(i+1)*group_data_num], :]
        else:
            x_group = x[idx[i*group_data_num:], :]
        mu_group = x_group.mean(axis=0)
        mu.append(mu_group)
        sigma.append((x_group - mu_group).T @ (x_group - mu_group) / x_group.shape[0])
    # initialize phi
    phi = np.ones(K) / K
    # initialize w
    w = np.ones((m, K)) / K
    # *** END CODE HERE ***

    if is_semi_supervised:
        w = run_semi_supervised_em(x, x_tilde, z, w, phi, mu, sigma)
    else:
        w = run_em(x, w, phi, mu, sigma)

    # Plot your predictions
    z_pred = np.zeros(m)
    if w is not None:  # Just a placeholder for the starter code
        for i in range(m):
            z_pred[i] = np.argmax(w[i])

    plot_gmm_preds(x, z_pred, is_semi_supervised, plot_id=trial_num)


def run_em(x, w, phi, mu, sigma):
    """Problem 3(d): EM Algorithm (unsupervised).

    See inline comments for instructions.

    Args:
        x: Design matrix of shape (m, n).
        w: Initial weight matrix of shape (m, k).
        phi: Initial mixture prior, of shape (k,).
        mu: Initial cluster means, list of k arrays of shape (n,).
        sigma: Initial cluster covariances, list of k arrays of shape (n, n).

    Returns:
        Updated weight matrix of shape (m, k) resulting from EM algorithm.
        More specifically, w[i, j] should contain the probability of
        example x^(i) belonging to the j-th Gaussian in the mixture.
    """
    # No need to change any of these parameters
    eps = 1e-3  # Convergence threshold
    max_iter = 3000

    # Stop when the absolute change in log-likelihood is < eps
    # See below for explanation of the convergence criterion
    it = 0
    ll = prev_ll = None
    while it < max_iter and (prev_ll is None or np.abs(ll - prev_ll) >= eps):
        # Just a placeholder for the starter code
        # *** START CODE HERE
        # (1) E-step: Update your estimates in w
        # (2) M-step: Update the model parameters phi, mu, and sigma
        # (3) Compute the log-likelihood of the data to check for convergence.
        # By log-likelihood, we mean `ll = sum_x[log(sum_z[p(x|z) * p(z)])]`.
        # We define convergence by the first iteration where abs(ll - prev_ll) < eps.
        # Hint: For debugging, recall part (a). We showed that ll should be monotonically increasing.
        # E-step
        for j in range(K):
            w[:, j] = np.exp(-0.5 * ((x - mu[j]) @ np.linalg.inv(sigma[j]) * (x - mu[j])).sum(axis=1)) / np.linalg.det(sigma[j])**0.5 * phi[j]
        w /= w.sum(axis=1)[:, None] # 维持维度
        # M-step
        phi = w.mean(axis=0)
        for j in range(K):
            mu[j] = x.T @ w[:, j] / w[:, j].sum()
            sigma[j] = (w[:, j][:, None] * (x - mu[j])).T @ (x - mu[j]) / w[:, j].sum()
        it += 1
        prev_ll = ll
        # 利用Jenses不等式，可以得到取等号时的条件，从而可知`ll = sum_x[log(sum_z[p(x|z) * p(z)])]`
        p_xy = np.zeros(w.shape)
        for i in range(K):
            p_xy[:, i] = np.exp(-0.5 * ((x - mu[i]) @ np.linalg.inv(sigma[i]) * (x - mu[i])).sum(axis=1)) / np.linalg.det(sigma[i])**0.5 / (np.pi)**(x.shape[1]/2) * phi[i]
        ll = np.sum(np.log(p_xy))
        if (it) % 100 ==0:
            print(f'iteration: {it}; log-likelihood:{ll}')
        # *** END CODE HERE ***
    print(f'Number of iterations:{it}')

    return w


def run_semi_supervised_em(x, x_tilde, z, w, phi, mu, sigma):
    """Problem 3(e): Semi-Supervised EM Algorithm.

    See inline comments for instructions.

    Args:
        x: Design matrix of unlabeled examples of shape (m, n).
        x_tilde: Design matrix of labeled examples of shape (m_tilde, n).
        z: Array of labels of shape (m_tilde, 1).
        w: Initial weight matrix of shape (m, k).
        phi: Initial mixture prior, of shape (k,).
        mu: Initial cluster means, list of k arrays of shape (n,).
        sigma: Initial cluster covariances, list of k arrays of shape (n, n).

    Returns:
        Updated weight matrix of shape (m, k) resulting from semi-supervised EM algorithm.
        More specifically, w[i, j] should contain the probability of
        example x^(i) belonging to the j-th Gaussian in the mixture.
    """
    # No need to change any of these parameters
    alpha = 20.  # Weight for the labeled examples
    eps = 1e-3   # Convergence threshold
    max_iter = 1000

    # Stop when the absolute change in log-likelihood is < eps
    # See below for explanation of the convergence criterion
    it = 0
    ll = prev_ll = None
    while it < max_iter and (prev_ll is None or np.abs(ll - prev_ll) >= eps):
        pass # Just a placeholder for the starter code
        # *** START CODE HERE ***
        # (1) E-step: Update your estimates in w
        # (2) M-step: Update the model parameters phi, mu, and sigma
        # (3) Compute the log-likelihood of the data to check for convergence.
        # Hint: Make sure to include alpha in your calculation of ll.
        # Hint: For debugging, recall part (a). We showed that ll should be monotonically increasing.
        
        # E-step
        for j in range(K):
            w[:, j] = np.exp(-0.5 * ((x - mu[j]) @ np.linalg.inv(sigma[j]) * (x - mu[j])).sum(axis=1)) / np.linalg.det(sigma[j])**0.5 * phi[j]
        w /= w.sum(axis=1)[:, None] # 维持维度
        # M-step
        for j in range(K):
            phi[j] = (w[:, j].sum() + alpha * (z==j).sum()) / (x.shape[0] + alpha * x_tilde.shape[0])
            mu[j] = ((w[:, j][:, None] * x).sum(axis=0) + alpha * x_tilde[(z==j).flatten()].sum(axis=0)) / (w[:, j].sum() + alpha * (z==j).sum())
            sigma[j] = ((w[:, j][:, None] * (x - mu[j])).T @ (x - mu[j]) + alpha * (x_tilde[(z==j).flatten()] - mu[j]).T @ (x_tilde[(z==j).flatten()] - mu[j])) / (w[:, j].sum() + alpha * (z==j).sum())
        # log-likelihood
        prev_ll = ll
        p_xy_semi = np.zeros(w.shape)
        for i in range(K):
            p_xy_semi[:, i] = np.exp(-0.5 * ((x - mu[i]) @ np.linalg.inv(sigma[i]) * (x - mu[i])).sum(axis=1)) / np.linalg.det(sigma[i])**0.5 / (np.pi)**(x.shape[1]/2) * phi[i]
        ll = np.sum(np.log(p_xy_semi))
        
        it += 1
        if (it) % 10 == 0:
            print(f'iteration: {it}; log-likelihood:{ll}')
        # *** END CODE HERE ***
    print(f'Number of iterations:{it}')

    return w


# *** START CODE HERE ***
# Helper functions
# *** END CODE HERE ***


def plot_gmm_preds(x, z, with_supervision, plot_id):
    """Plot GMM predictions on a 2D dataset `x` with labels `z`.

    Write to the output directory, including `plot_id`
    in the name, and appending 'ss' if the GMM had supervision.

    NOTE: You do not need to edit this function.
    """
    plt.figure(figsize=(12, 8))
    plt.title('{} GMM Predictions'.format('Semi-supervised' if with_supervision else 'Unsupervised'))
    plt.xlabel('x_1')
    plt.ylabel('x_2')

    for x_1, x_2, z_ in zip(x[:, 0], x[:, 1], z):
        color = 'gray' if z_ < 0 else PLOT_COLORS[int(z_)]
        alpha = 0.25 if z_ < 0 else 0.75
        plt.scatter(x_1, x_2, marker='.', c=color, alpha=alpha)

    file_name = 'p04_pred{}_{}.png'.format('_ss' if with_supervision else '', plot_id)
    save_path = os.path.join('output', file_name)
    plt.savefig(save_path)


def load_gmm_dataset(csv_path):
    """Load dataset for Gaussian Mixture Model (problem 3).

    Args:
         csv_path: Path to CSV file containing dataset.

    Returns:
        x: NumPy array shape (m, n)
        z: NumPy array shape (m, 1)

    NOTE: You do not need to edit this function.
    """

    # Load headers
    with open(csv_path, 'r') as csv_fh:
        headers = csv_fh.readline().strip().split(',')

    # Load features and labels
    x_cols = [i for i in range(len(headers)) if headers[i].startswith('x')]
    z_cols = [i for i in range(len(headers)) if headers[i] == 'z']

    x = np.loadtxt(csv_path, delimiter=',', skiprows=1, usecols=x_cols, dtype=float)
    z = np.loadtxt(csv_path, delimiter=',', skiprows=1, usecols=z_cols, dtype=float)

    if z.ndim == 1:
        z = np.expand_dims(z, axis=-1)

    return x, z

np.random.seed(229)
# Run NUM_TRIALS trials to see how different initializations
# affect the final predictions with and without supervision
for t in range(NUM_TRIALS):
#     main(is_semi_supervised=False, trial_num=t)

    # *** START CODE HERE ***
    # Once you've implemented the semi-supervised version,
    # uncomment the following line.
    # You do not need to add any other lines in this code block.
    main(is_semi_supervised=True, trial_num=t)
    # *** END CODE HERE ***

Running semi-supervised EM algorithm...
iteration: 10; log-likelihood:-121211.71022914571
iteration: 20; log-likelihood:-126688.67202095361
iteration: 30; log-likelihood:-126730.02265599032
iteration: 40; log-likelihood:-126731.9314977497
iteration: 50; log-likelihood:-126732.038350953
Number of iterations:53
Running semi-supervised EM algorithm...
iteration: 10; log-likelihood:-114403.866748803
iteration: 20; log-likelihood:-126462.74083684657
iteration: 30; log-likelihood:-126717.02209948156
iteration: 40; log-likelihood:-126731.19419752332
iteration: 50; log-likelihood:-126731.99648558485
iteration: 60; log-likelihood:-126732.04203570582
Number of iterations:60
Running semi-supervised EM algorithm...
iteration: 10; log-likelihood:-103681.46117895067
iteration: 20; log-likelihood:-126802.45193921725
iteration: 30; log-likelihood:-126737.54249053207
iteration: 40; log-likelihood:-126732.36145376327
iteration: 50; log-likelihood:-126732.06277741205
Number of iterations:57

请添加图片描述

(f)

其实在上面两个coding题目中，无论是否半监督还是无监督我发现目标的log-likelihood函数不一定单调增加，但是最后可以收敛，不知这是为何？本身的波动吗？
回答这一问的问题：

i.

半监督学习显然收敛更快，需要迭代次数更少。

ii.

半监督显然更稳定，改变初始化，无监督学习的结果会变化很多，半监督的基本没有变化。数据和某个所属高斯分布“匹配”比较稳定。

iii.

整体质量显然是半监督学习更好，在其结果中，可以看到稳定的存在三个方差差不多的高斯分布数据，和一个方差更大的高斯分布数据源，而在无监督学习情况下，给出了四个方差明显不同的高斯数据源。

【中】CS229 吴恩达机器学习 习题作业答案 problem sets 03 PS03（全部问题解答，欢迎各位前辈指教）4题有一问有个问题不知有没有人解答一下 EM算法 K均值聚类

4.Semi-Supervised EM

(a)

Semi-supervised GMM

(b)

©

(d)

(e)

(f)

i.

ii.

iii.

【上】CS229 吴恩达机器学习 习题作业答案 problem sets 03 PS03（全部问题解答，欢迎各位前辈指教）

【中】CS229 吴恩达机器学习 习题大作业答案 problem sets 04 PS04（第一问，欢迎指教）Neural Networks: MNIST image classification

【中】CS229 吴恩达机器学习习题作业答案 problem sets 03 PS03（全部问题解答，欢迎各位前辈指教）4题有一问有个问题不知有没有人解答一下 EM算法 K均值聚类

【上】CS229 吴恩达机器学习习题作业答案 problem sets 03 PS03（全部问题解答，欢迎各位前辈指教）

【中】CS229 吴恩达机器学习习题大作业答案 problem sets 04 PS04（第一问，欢迎指教）Neural Networks: MNIST image classification