The handwritten-digit recognition task on the MNIST dataset is implemented in three module files: a forward propagation file describing the network structure (mnist_forward.py), a backward propagation file describing how the network parameters are optimized (mnist_backward.py), and a test file that evaluates the accuracy of the trained model (mnist_test.py).
Forward propagation file (mnist_forward.py)
The forward propagation step defines the number of input nodes, hidden-layer nodes, and output nodes, defines the network weights w and biases b, and defines the network architecture that maps the input to the output.
The forward propagation for the MNIST handwritten-digit recognition task is implemented as follows:
import tensorflow as tf

INPUT_NODE = 784     # number of input nodes (28x28 pixels per image)
OUTPUT_NODE = 10     # number of output nodes (digits 0-9)
LAYER1_NODE = 500    # number of hidden-layer nodes

def get_weight(shape, regularizer):
    # Weights follow a truncated normal distribution; each weight's L2
    # regularization loss is collected into the 'losses' collection.
    w = tf.Variable(tf.truncated_normal(shape, stddev=0.1))
    if regularizer is not None:
        tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    # Biases are initialized to all zeros.
    b = tf.Variable(tf.zeros(shape))
    return b

def forward(x, regularizer):
    # Layer 1: input -> hidden, followed by ReLU.
    w1 = get_weight([INPUT_NODE, LAYER1_NODE], regularizer)
    b1 = get_bias([LAYER1_NODE])
    y1 = tf.nn.relu(tf.matmul(x, w1) + b1)
    # Layer 2: hidden -> output; no ReLU, since y is later fed to softmax.
    w2 = get_weight([LAYER1_NODE, OUTPUT_NODE], regularizer)
    b2 = get_bias([OUTPUT_NODE])
    y = tf.matmul(y1, w2) + b2
    return y
As the code above shows, the forward propagation step fixes the network at 784 input nodes (the number of pixels in each input image), 500 hidden-layer nodes, and 10 output nodes (a ten-way classification over the digits 0-9). The weight matrix w1 from the input layer to the hidden layer has shape [784, 500], and the weight matrix w2 from the hidden layer to the output layer has shape [500, 10]; both are initialized from a truncated normal distribution, regularization is applied, and each weight's regularization loss is added to the total loss. The bias b1 between the input and hidden layers is a one-dimensional array of length 500, and the bias b2 between the hidden and output layers is a one-dimensional array of length 10; both are initialized to all zeros. In the first layer of the forward pass, the input x is multiplied by w1, the bias b1 is added, and the result is passed through the relu function to give the hidden-layer output y1. In the second layer, y1 is multiplied by w2 and the bias b2 is added to give the output y. Because y will later be passed through the softmax function so that it forms a probability distribution, no relu is applied to y.
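As a quick sanity check of this architecture, the short sketch below (not part of the three module files; the dummy batch of 8 blank images is an assumption made purely for illustration) runs forward() once and prints the shape of its output, which should be (8, 10):

import numpy as np
import tensorflow as tf
import mnist_forward

x = tf.placeholder(tf.float32, [None, mnist_forward.INPUT_NODE])
y = mnist_forward.forward(x, regularizer=None)   # regularization off for a shape check

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    dummy = np.zeros([8, mnist_forward.INPUT_NODE], dtype=np.float32)   # 8 blank images
    print(sess.run(y, feed_dict={x: dummy}).shape)                      # expected: (8, 10)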
Backward propagation file (mnist_backward.py)
The backward propagation step trains the neural network model on the training set: by driving the loss function down it optimizes the network parameters, yielding a model with high accuracy and strong generalization ability.
The backward propagation for the MNIST handwritten-digit recognition task is implemented as follows:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_forward
import os

BATCH_SIZE = 200                 # number of images fed to the network per step
LEARNING_RATE_BASE = 0.1         # initial learning rate
LEARNING_RATE_DECAY = 0.99       # learning rate decay factor
REGULARIZER = 0.0001             # regularization coefficient
STEPS = 50000                    # number of training steps
MOVING_AVERAGE_DECAY = 0.99      # moving average decay rate
MODEL_SAVE_PATH = "./model/"     # directory for saved checkpoints
MODEL_NAME = "mnist_model"       # checkpoint file name

def backward(mnist):
    x = tf.placeholder(tf.float32, [None, mnist_forward.INPUT_NODE])
    y_ = tf.placeholder(tf.float32, [None, mnist_forward.OUTPUT_NODE])
    y = mnist_forward.forward(x, REGULARIZER)
    global_step = tf.Variable(0, trainable=False)

    # Cross-entropy loss plus the regularization losses collected in forward().
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cem = tf.reduce_mean(ce)
    loss = cem + tf.add_n(tf.get_collection('losses'))

    # Exponentially decayed learning rate.
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        mnist.train.num_examples / BATCH_SIZE,
        LEARNING_RATE_DECAY,
        staircase=True)

    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    # Maintain moving averages of all trainable variables.
    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    ema_op = ema.apply(tf.trainable_variables())
    with tf.control_dependencies([train_step, ema_op]):
        train_op = tf.no_op(name='train')

    saver = tf.train.Saver()

    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        for i in range(STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: xs, y_: ys})
            if i % 1000 == 0:
                print("After %d training step(s), loss on training batch is %g." % (step, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)

def main():
    mnist = input_data.read_data_sets("./data/", one_hot=True)
    backward(mnist)

if __name__ == '__main__':
    main()
As the code above shows, the backward propagation step first imports the tensorflow, input_data, mnist_forward, and os modules, then defines the number of images fed to the network per step, the initial learning rate, the learning rate decay rate, the regularization coefficient, the number of training steps, and the path and file name under which the model is saved. Inside the backward() function, placeholders are created for the training data x and the labels y_, the forward() function from mnist_forward is called with regularization enabled to compute the prediction y on the training data, and the step counter global_step is created and marked as non-trainable. Next, the loss function loss, which includes the regularization losses of all weights, is defined, together with the exponentially decayed learning rate learning_rate. The model is then optimized with gradient descent to reduce the loss, and moving averages of the parameters are maintained. Finally, inside the with block, all variables are initialized; each iteration feeds in a batch of BATCH_SIZE (i.e. 200) training images and their labels, the loop runs for STEPS iterations, the loss value is printed every 1000 steps, and the current session is saved to the specified path. The main() function loads the training data from the specified path and calls backward() to train the model.
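For reference, the staircase-decayed learning rate used above can be written out explicitly. The minimal sketch below is illustrative only; the helper name decayed_lr and the training-set size of 55000 (the standard size of the MNIST training split, which is what mnist.train.num_examples returns) are assumptions, not values spelled out in the text. It reproduces what tf.train.exponential_decay computes when staircase=True:

def decayed_lr(global_step, num_examples=55000):
    # lr = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step // decay_steps)
    decay_steps = num_examples // BATCH_SIZE          # 55000 // 200 = 275
    return LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step // decay_steps)

print(decayed_lr(0))       # 0.1
print(decayed_lr(1000))    # 0.1 * 0.99 ** 3, roughly 0.097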
Output:
Extracting ./data/train-images-idx3-ubyte.gz
Extracting ./data/train-labels-idx1-ubyte.gz
Extracting ./data/t10k-images-idx3-ubyte.gz
Extracting ./data/t10k-labels-idx1-ubyte.gz
After 1 training step(s), loss on training batch is 2.96485.
After 1001 training step(s), loss on training batch is 0.34589.
After 2001 training step(s), loss on training batch is 0.301066.
After 3001 training step(s), loss on training batch is 0.247503.
After 4001 training step(s), loss on training batch is 0.240075.
After 5001 training step(s), loss on training batch is 0.19213.
After 6001 training step(s), loss on training batch is 0.182161.
After 7001 training step(s), loss on training batch is 0.187682.
After 8001 training step(s), loss on training batch is 0.182451.
After 9001 training step(s), loss on training batch is 0.197914.
After 10001 training step(s), loss on training batch is 0.192844.
After 11001 training step(s), loss on training batch is 0.173851.
After 12001 training step(s), loss on training batch is 0.191573.
After 13001 training step(s), loss on training batch is 0.175819.
After 14001 training step(s), loss on training batch is 0.160991.
After 15001 training step(s), loss on training batch is 0.147571.
After 16001 training step(s), loss on training batch is 0.160469.
After 17001 training step(s), loss on training batch is 0.158161.
After 18001 training step(s), loss on training batch is 0.150214.
After 19001 training step(s), loss on training batch is 0.149087.
After 20001 training step(s), loss on training batch is 0.144424.
After 21001 training step(s), loss on training batch is 0.155767.
After 22001 training step(s), loss on training batch is 0.139728.
After 23001 training step(s), loss on training batch is 0.139936.
After 24001 training step(s), loss on training batch is 0.14674.
After 25001 training step(s), loss on training batch is 0.139407.
After 26001 training step(s), loss on training batch is 0.137411.
After 27001 training step(s), loss on training batch is 0.137503.
After 28001 training step(s), loss on training batch is 0.136464.
After 29001 training step(s), loss on training batch is 0.137261.
After 30001 training step(s), loss on training batch is 0.137994.
After 31001 training step(s), loss on training batch is 0.139818.
After 32001 training step(s), loss on training batch is 0.134705.
After 33001 training step(s), loss on training batch is 0.141517.
After 34001 training step(s), loss on training batch is 0.130094.
After 35001 training step(s), loss on training batch is 0.129363.
After 36001 training step(s), loss on training batch is 0.134802.
After 37001 training step(s), loss on training batch is 0.138029.
After 38001 training step(s), loss on training batch is 0.136385.
After 39001 training step(s), loss on training batch is 0.131043.
After 40001 training step(s), loss on training batch is 0.131529.
After 41001 training step(s), loss on training batch is 0.131605.
After 42001 training step(s), loss on training batch is 0.128889.
After 43001 training step(s), loss on training batch is 0.126269.
After 44001 training step(s), loss on training batch is 0.125147.
After 45001 training step(s), loss on training batch is 0.132544.
After 46001 training step(s), loss on training batch is 0.12836.
After 47001 training step(s), loss on training batch is 0.131488.
After 48001 training step(s), loss on training batch is 0.125363.
After 49001 training step(s), loss on training batch is 0.124936.
Test file (mnist_test.py)
Once the model has been trained, the test set is fed into the network to check its accuracy and generalization ability. Note that the test set and the training set are independent of each other.
The test process for the MNIST handwritten-digit recognition task is implemented as follows:
#coding:utf-8
import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_forward
import mnist_backward

TEST_INTERVAL_SECS = 5   # seconds to wait between two evaluations

def test(mnist):
    with tf.Graph().as_default() as g:
        x = tf.placeholder(tf.float32, [None, mnist_forward.INPUT_NODE])
        y_ = tf.placeholder(tf.float32, [None, mnist_forward.OUTPUT_NODE])
        y = mnist_forward.forward(x, None)

        # Restore the moving-average (shadow) values of the trained parameters.
        ema = tf.train.ExponentialMovingAverage(mnist_backward.MOVING_AVERAGE_DECAY)
        ema_restore = ema.variables_to_restore()
        saver = tf.train.Saver(ema_restore)

        # Accuracy: fraction of test images whose predicted digit matches the label.
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        while True:
            with tf.Session() as sess:
                ckpt = tf.train.get_checkpoint_state(mnist_backward.MODEL_SAVE_PATH)
                if ckpt and ckpt.model_checkpoint_path:
                    saver.restore(sess, ckpt.model_checkpoint_path)
                    # Recover the step count from the checkpoint file name.
                    global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                    accuracy_score = sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
                    print("After %s training step(s), test accuracy = %g" % (global_step, accuracy_score))
                else:
                    print('No checkpoint file found')
                    return
            time.sleep(TEST_INTERVAL_SECS)

def main():
    mnist = input_data.read_data_sets("./data/", one_hot=True)
    test(mnist)

if __name__ == '__main__':
    main()
The terminal output shows that as the number of training steps grows, the loss of the network model keeps decreasing, and the accuracy that mnist_test.py reports on the test set keeps rising, indicating that the model generalizes well.
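As a closing usage example, the sketch below is not one of the three module files; the single-image slice and the printed digit are assumptions made for illustration. It restores the latest checkpoint saved by mnist_backward.py and classifies one test image:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_forward
import mnist_backward

mnist = input_data.read_data_sets("./data/", one_hot=True)

with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, [None, mnist_forward.INPUT_NODE])
    y = mnist_forward.forward(x, None)
    pred = tf.argmax(y, 1)                        # index of the largest logit = predicted digit
    # Note: a plain Saver restores the raw trained weights;
    # mnist_test.py instead restores their moving averages.
    saver = tf.train.Saver()
    with tf.Session() as sess:
        ckpt = tf.train.get_checkpoint_state(mnist_backward.MODEL_SAVE_PATH)
        saver.restore(sess, ckpt.model_checkpoint_path)
        img = mnist.test.images[0:1]              # one 784-dimensional test image
        print(sess.run(pred, feed_dict={x: img})) # e.g. [7]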