1. Review of Linearly Separable Classification

Classification covers binary classification and multi-class classification.

Classification problems are either linearly separable or not linearly separable.

1.1 Hypothesis Function for the Linearly Separable Case

For example, suppose each sample is labeled either Admitted or Not Admitted. We want to classify these two kinds of data: Admitted is 1 and Not Admitted is 0.

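For a linear task we define the hypothesis as $h_{\theta}(x)=g(\pmb{X}\pmb{\theta})$, where $\pmb{X}\pmb{\theta}$ expands to $\theta_0+\theta_1x_1+\theta_2x_2$ and $g(z)$ is the activation function (the sigmoid function) mentioned in the previous article: $g(z)=\frac{1}{1+e^{-z}}$.
The activation function compresses any input into the interval $(0, 1)$.
When the input satisfies $z \geq 2.5$, the output $g(z)$ is already close to 1 (about 0.92 at $z = 2.5$); correspondingly, when $z \leq -2.5$, $g(z)$ is close to 0 (about 0.08 at $z = -2.5$).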

In binary classification, with $h_{\theta}(x)=g(\pmb{X}\pmb{\theta})$: if $h_{\theta}(x) \geq 0.5$, we treat the prediction as true and set $y = 1$; if $h_{\theta}(x) < 0.5$, we treat the prediction as false and set $y = 0$:
$y=\left\{\begin{array}{rcl}1&&h_{\theta}(x) \geq 0.5\\0&&h_{\theta}(x) < 0.5 \end{array}\right.$
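
The thresholding rule can be sketched in a few lines of NumPy. This is only an illustration: `predict`, `theta_demo`, and `X_demo` are made-up names and values, not part of the exercise.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, theta, threshold=0.5):
    """Return 1 where h_theta(x) >= threshold, otherwise 0."""
    return (sigmoid(X @ theta) >= threshold).astype(int)

# Hypothetical parameters and samples; the first column is the bias term x_0 = 1.
theta_demo = np.array([[-3.0], [1.0], [1.0]])
X_demo = np.array([[1.0, 1.0, 1.0],   # z = -1, so h < 0.5 and the class is 0
                   [1.0, 2.0, 2.0]])  # z = +1, so h > 0.5 and the class is 1
labels_demo = predict(X_demo, theta_demo)
print(labels_demo.ravel())
```

Because $g(0) = 0.5$, thresholding $h_{\theta}(x)$ at 0.5 is the same as checking the sign of $\pmb{X}\pmb{\theta}$.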

1.2 Loss Function

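We define the loss function as: $Cost(h_{\theta}(x),y)=\left\{\begin{array}{rcl}-\log(h_{\theta}(x))&\text{if}&y=1\\-\log(1-h_{\theta}(x))&\text{if}&y=0 \end{array}\right.$
To understand the formula, consider how the cost behaves at the extremes.
When $y=1$: $\begin{array}{rcl}h_{\theta}(x) \rightarrow 1&,&Cost\rightarrow0 \\ h_{\theta}(x)\rightarrow0&,&Cost\rightarrow\infty\end{array}$
This matches what we expect: if the true label is 1, the closer the prediction is to 0 the larger the cost, and the closer it is to 1 the smaller the cost. The case $y = 0$ mirrors this: the cost tends to 0 as $h_{\theta}(x) \rightarrow 0$ and to infinity as $h_{\theta}(x) \rightarrow 1$.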
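
To make the limiting behavior concrete, the following small sketch evaluates the per-sample cost at a few prediction values. The helper name `sample_cost` and the chosen values of `h` are assumptions made for illustration only.

```python
import numpy as np

def sample_cost(h, y):
    """Per-sample logistic cost: -log(h) if y == 1, otherwise -log(1 - h)."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

# For a true label of 1, the cost grows as the prediction moves toward 0.
for h in (0.99, 0.5, 0.01):
    print(f"y=1, h={h}: cost={sample_cost(h, 1):.4f}")

# For a true label of 0, the roles are reversed.
for h in (0.01, 0.5, 0.99):
    print(f"y=0, h={h}: cost={sample_cost(h, 0):.4f}")
```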

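Next we write the loss function in vectorized form.
Let $\hat{y}$ denote the prediction $h_{\theta}(x)$, while $y$ remains the true label. Then: $J(\theta)=-\frac{1}{m}\sum\left[\pmb{y}\odot \log(\pmb{\hat{y}})+(1-\pmb{y})\odot \log(1-\pmb{\hat{y}})\right]$
Note that the product $\odot$ here is element-wise (entries in matching positions are multiplied), not matrix multiplication; the $\sum$ then adds up the $m$ resulting entries.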
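
As a quick check that the element-wise form agrees with the piecewise definition, the sketch below computes the cost both ways on a tiny hand-made example. The function names and the three sample values are hypothetical.

```python
import numpy as np

def cost_vectorized(y, y_hat):
    """Average cost using element-wise products, then a sum over all samples."""
    m = len(y)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m

def cost_loop(y, y_hat):
    """Same quantity, computed sample by sample from the piecewise definition."""
    total = 0.0
    for yi, hi in zip(y.ravel(), y_hat.ravel()):
        total += -np.log(hi) if yi == 1 else -np.log(1 - hi)
    return total / len(y)

# Made-up labels and predicted probabilities, both stored as column vectors.
y_demo = np.array([[1], [0], [1]])
y_hat_demo = np.array([[0.9], [0.2], [0.6]])

print(cost_vectorized(y_demo, y_hat_demo), cost_loop(y_demo, y_hat_demo))
```

In NumPy, `*` between two arrays of the same shape is the element-wise product, while `@` is matrix multiplication.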

1.3 Gradient Descent

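The partial derivative of the cost with respect to each parameter $\theta_j$ is:

$\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum\limits_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x_j^{(i)}$

Stacking these partial derivatives gives the vectorized update rule with learning rate $\alpha$:

$\pmb{\theta}:=\pmb{\theta} - \frac{\alpha}{m}\pmb{X}^{\top}(g(\pmb{X \theta})-\pmb{y})$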
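
One way to gain confidence in the gradient formula is to compare it against a finite-difference approximation of the cost. The sketch below does this on small random data; the helper names, the random inputs, and the step size `eps` are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, theta):
    """Average logistic cost J(theta)."""
    A = sigmoid(X @ theta)
    return -np.sum(y * np.log(A) + (1 - y) * np.log(1 - A)) / len(X)

def gradient(X, y, theta):
    """Analytic gradient: (1/m) * X^T (g(X theta) - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(X)

def numerical_gradient(X, y, theta, eps=1e-6):
    """Central finite differences, one parameter at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j, 0] = eps
        grad[j, 0] = (cost(X, y, theta + step) - cost(X, y, theta - step)) / (2 * eps)
    return grad

# Hypothetical data: 5 samples, a bias column plus 2 random features.
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])
y_demo = np.array([[1.0], [0.0], [1.0], [0.0], [1.0]])
theta_demo = np.array([[0.1], [-0.2], [0.3]])

max_diff = np.max(np.abs(gradient(X_demo, y_demo, theta_demo)
                         - numerical_gradient(X_demo, y_demo, theta_demo)))
print(max_diff)
```

If the two gradients agree to within a small tolerance, the analytic expression is consistent with the cost it is supposed to differentiate.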

1.4 Decision Boundary

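Our goal is to partition the data, that is, to find the decision boundary that separates the two classes.

The boundary is where $\pmb{X\theta}=0$ (equivalently $h_{\theta}(x)=0.5$), i.e. $\theta_0+\theta_1x_1+\theta_2x_2=0$. Solving for $x_2$ (assuming $\theta_2 \neq 0$) gives: $x_2=-\frac{\theta_0}{\theta_2}-\frac{\theta_1}{\theta_2}x_1$

This gives us the boundary function.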
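
The boundary formula can be checked numerically: any point computed from it should satisfy $\theta_0+\theta_1x_1+\theta_2x_2=0$. The sketch below uses a made-up parameter vector `theta_demo`; the helper `boundary_x2` is likewise hypothetical.

```python
import numpy as np

def boundary_x2(theta, x1):
    """x2 on the decision boundary for a given x1 (requires theta_2 != 0)."""
    theta0, theta1, theta2 = theta.ravel()
    return -theta0 / theta2 - (theta1 / theta2) * x1

# Hypothetical parameters, chosen only to give round numbers.
theta_demo = np.array([[-25.0], [0.2], [0.2]])
x1_demo = np.array([30.0, 60.0, 90.0])
x2_demo = boundary_x2(theta_demo, x1_demo)

# Points on the boundary make the linear term vanish (up to rounding).
residual = (theta_demo[0, 0]
            + theta_demo[1, 0] * x1_demo
            + theta_demo[2, 0] * x2_demo)
print(x2_demo, residual)
```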

2. Logistic Regression: Linearly Separable Exercise

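Case: predict whether a student will be admitted to a university based on the scores of two exams.
Dataset: the course exercise dataset (extraction code: 5ijq).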

2.1 Loading and Preparing the Data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('ex2data1.txt', names = ['Exam 1', 'Exam 2', 'Accepted'])
data.head()


fig, ax = plt.subplots()

ax.scatter(data[data['Accepted'] == 0]['Exam 1'], data[data['Accepted'] == 0]['Exam 2'], 
           c = 'r', 
           marker = 'x', 
           label = 'y = 0')
ax.scatter(data[data['Accepted'] == 1]['Exam 1'], data[data['Accepted'] == 1]['Exam 2'], 
           c = 'b', 
           marker = 'o',
           label = 'y = 1')
ax.legend()
ax.set(xlabel = 'exam 1',
       ylabel = 'exam 2')
plt.show()


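def get_Xy(data):
    # Work on a copy so the caller's DataFrame is not modified
    # (this also lets the function be called more than once)
    data = data.copy()
    # Prepend a column of ones for the intercept term theta_0
    data.insert(0, 'ones', 1)
    # Every column except the last is a feature
    X = data.iloc[:, 0:-1].values
    # The last column is the label, reshaped to a column vector of shape (m, 1)
    y = data.iloc[:, -1].values.reshape(-1, 1)
    return X, y

X, y = get_Xy(data)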
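
The shapes produced by this step matter for the matrix products that follow. As a rough sketch, the same construction on three made-up rows with plain NumPy looks like this (the array `raw` and its values are hypothetical, not taken from `ex2data1.txt`):

```python
import numpy as np

# Three made-up rows: exam 1 score, exam 2 score, label.
raw = np.array([[35.0, 78.0, 0.0],
                [60.0, 86.0, 1.0],
                [79.0, 75.0, 1.0]])

# Feature matrix: a leading column of ones, then the two exam scores.
X_toy = np.hstack([np.ones((raw.shape[0], 1)), raw[:, :-1]])
# Labels as a column vector, so they line up with X_toy @ theta.
y_toy = raw[:, -1].reshape(-1, 1)

theta_toy = np.zeros((3, 1))
print(X_toy.shape, y_toy.shape, (X_toy @ theta_toy).shape)
```

Keeping `y` as an `(m, 1)` column rather than a flat `(m,)` array avoids accidental broadcasting when it is subtracted from the `(m, 1)` predictions.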

2.2 Defining the Functions

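def sigmoid(z):
    # Logistic function: maps any real z into (0, 1)
    return 1 / (1 + np.exp(-z))

def costFunction(X, y, theta):
    # Predicted probabilities, shape (m, 1)
    A = sigmoid(X @ theta)

    # Element-wise products, matching the vectorized J(theta)
    first = y * np.log(A)
    second = (1 - y) * np.log(1 - A)

    # The leading minus sign makes the cost non-negative
    return -np.sum(first + second) / len(X)

def gradientDescent(X, y, theta, iters, alpha):
    m = len(X)
    costs = []
    for i in range(iters):
        A = sigmoid(X @ theta)
        # Vectorized update: theta := theta - (alpha / m) * X^T (g(X theta) - y)
        theta = theta - (alpha / m) * X.T @ (A - y)
        cost = costFunction(X, y, theta)
        costs.append(cost)
        # Report progress every 1000 iterations
        if i % 1000 == 0:
            print(cost)

    return costs, theta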

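We start from $\pmb{\theta}=\begin{bmatrix}0\\0\\0\end{bmatrix}$, with learning rate $\alpha=0.004$ and number of iterations $iters = 200000$:

theta = np.zeros((3, 1))
alpha = 0.004
iters = 200000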

costs, theta_final = gradientDescent(X, y, theta, iters, alpha)

Because of the large number of iterations, the cost is printed only once every 1000 iterations rather than at every step.

2.3 Visualization

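The costs list is too long to plot in full, so we plot only every 1000th value.

# Keep one cost value per 1000 iterations
cost = costs[::1000]
# Iteration index that each sampled value corresponds to
sampled_iters = np.arange(0, len(costs), 1000)

fig, ax = plt.subplots()
ax.plot(sampled_iters, cost)
ax.set(xlabel='iter',
       ylabel='costs',
       title='costs vs iters')
plt.show()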

The plot shows that the cost trends downward as the iterations proceed, although with fluctuations along the way.

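# Decision boundary: x2 = -theta0/theta2 - (theta1/theta2) * x1
coef1 = -theta_final[0, 0] / theta_final[2, 0]  # intercept
coef2 = -theta_final[1, 0] / theta_final[2, 0]  # slope

# Evaluate the boundary over the range of exam 1 scores
x = np.linspace(20, 100, 100)
f = coef1 + coef2 * x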

fig, ax = plt.subplots()

ax.scatter(data[data['Accepted'] == 0]['Exam 1'], data[data['Accepted'] == 0]['Exam 2'], 
           c = 'r', 
           marker = 'x', 
           label = 'y = 0')
ax.scatter(data[data['Accepted'] == 1]['Exam 1'], data[data['Accepted'] == 1]['Exam 2'], 
           c = 'b', 
           marker = 'o',
           label = 'y = 1')
ax.legend()
ax.set(xlabel = 'exam 1',
       ylabel = 'exam 2')
ax.plot(x, f, c = 'g')
plt.show()
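
Besides inspecting the plot, one simple way to put a number on how well a boundary separates the two classes is the fraction of training samples it classifies correctly. The sketch below is not part of the original exercise code: `accuracy` and the toy arrays are hypothetical and only illustrate the idea.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def accuracy(X, y, theta):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = (sigmoid(X @ theta) >= 0.5).astype(int)
    return float(np.mean(preds == y))

# Made-up samples (bias column first), labels, and parameters.
X_toy = np.array([[1.0, 30.0, 40.0],
                  [1.0, 80.0, 90.0],
                  [1.0, 60.0, 70.0]])
y_toy = np.array([[0], [1], [1]])
theta_toy = np.array([[-25.0], [0.2], [0.2]])

acc_toy = accuracy(X_toy, y_toy, theta_toy)
print(acc_toy)
```

With the variables from this section, the call would be `accuracy(X, y, theta_final)`.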


2.4 Complete Code

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the data file
data = pd.read_csv('ex2data1.txt', names = ['Exam 1', 'Exam 2', 'Accepted'])
data.head()

# Draw the scatter plot
fig, ax = plt.subplots()

ax.scatter(data[data['Accepted'] == 0]['Exam 1'], data[data['Accepted'] == 0]['Exam 2'], 
           c = 'r', 
           marker = 'x', 
           label = 'y = 0')
ax.scatter(data[data['Accepted'] == 1]['Exam 1'], data[data['Accepted'] == 1]['Exam 2'], 
           c = 'b', 
           marker = 'o',
           label = 'y = 1')
ax.legend()
ax.set(xlabel = 'exam 1',
       ylabel = 'exam 2')
plt.show()

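# Build the feature matrix and the label vector
def get_Xy(data):
    # Work on a copy so the caller's DataFrame is not modified
    data = data.copy()
    # Prepend a column of ones for the intercept term theta_0
    data.insert(0, 'ones', 1)
    # Every column except the last is a feature
    X = data.iloc[:, 0:-1].values
    # The last column is the label, reshaped to a column vector of shape (m, 1)
    y = data.iloc[:, -1].values.reshape(-1, 1)
    return X, y

X, y = get_Xy(data)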

# Define the sigmoid, cost, and gradient descent functions and set the initial values
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def costFunction(X, y, theta):
    A = sigmoid(X @ theta)

    first = y * np.log(A)
    second = (1 - y) * np.log(1 - A)

    # The leading minus sign makes the cost non-negative
    return -np.sum(first + second) / len(X)

def gradientDescent(X, y, theta, iters, alpha):
    m = len(X)
    costs = []
    for i in range(iters):
        A = sigmoid(X @ theta)
        # Vectorized update: theta := theta - (alpha / m) * X^T (g(X theta) - y)
        theta = theta - (alpha / m) * X.T @ (A - y)
        cost = costFunction(X, y, theta)
        costs.append(cost)

    return costs, theta

theta = np.zeros((3, 1))
alpha = 0.004
iters = 200000

# Get the cost history and the final theta vector
costs, theta_final = gradientDescent(X, y, theta, iters, alpha)

# Plot: costs vs iters
# Keep one cost value per 1000 iterations
cost = costs[::1000]
# Iteration index that each sampled value corresponds to
sampled_iters = np.arange(0, len(costs), 1000)

fig, ax = plt.subplots()
ax.plot(sampled_iters, cost)
ax.set(xlabel='iter',
       ylabel='costs',
       title='costs vs iters')
plt.show()

# Plot the decision boundary:
coef1 = -theta_final[0, 0] / theta_final[2, 0]
coef2 = -theta_final[1, 0] / theta_final[2, 0]
x = np.linspace(20, 100, 100)
f = coef1 + coef2 * x

fig, ax = plt.subplots()

ax.scatter(data[data['Accepted'] == 0]['Exam 1'], data[data['Accepted'] == 0]['Exam 2'], 
           c = 'r', 
           marker = 'x', 
           label = 'y = 0')
ax.scatter(data[data['Accepted'] == 1]['Exam 1'], data[data['Accepted'] == 1]['Exam 2'], 
           c = 'b', 
           marker = 'o',
           label = 'y = 1')
ax.legend()
ax.set(xlabel = 'exam 1',
       ylabel = 'exam 2')
ax.plot(x, f, c = 'g')
plt.show()