Visualizing convolutional neural networks (heatmaps of class activation)
Deep Learning with Python
Visualizing heatmaps of class activation
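I will introduce one more visualization technique, one that helps show which parts of a given image led a convnet to its final classification decision. This is useful for debugging the decision process of a convnet, particularly when it makes a classification mistake.
It also allows you to locate specific objects in an image.
This general category of techniques is called class activation map (CAM) visualization, and it consists of producing heatmaps of class activation over input images. A class activation heatmap is a 2D grid of scores associated with a specific output class, computed for every location in any input image, indicating how important each location is with respect to that class. For instance, given an image fed into a cats-versus-dogs convnet, CAM visualization can generate a heatmap for the class "cat", indicating how cat-like different parts of the image are, and also a heatmap for the class "dog", indicating how dog-like different parts of the image are.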
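# The specific implementation we will use is the one described in the paper "Grad-CAM: Visual Explanations
# from Deep Networks via Gradient-based Localization". It is very simple: given an input image, take the output
# feature map of a convolution layer and weight every channel in that feature map by the gradient of the class
# with respect to the channel. Intuitively, one way to understand this trick is that you are weighting a spatial
# map of "how intensely the input image activates different channels" by "how important each channel is with
# regard to the class", resulting in a spatial map of "how intensely the input image activates the class".
# We will demonstrate this technique using the pretrained VGG16 network again.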
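The weighting described above can be written compactly. The notation here is an assumption introduced for illustration: A^k is the k-th channel of the chosen convolution layer's output (H × W positions, K channels), y^c is the score of class c, and L^c is the resulting heatmap before post-processing. For `block5_conv3` with a 224×224 input, H = W = 14 and K = 512.

```latex
\alpha_k^{c} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}
               \frac{\partial y^{c}}{\partial A^{k}_{ij}},
\qquad
L^{c}_{ij} = \frac{1}{K}\sum_{k=1}^{K}\alpha_k^{c}\,A^{k}_{ij}
```

The code in this section averages over channels, as in the second formula; summing instead would only change the scale, which disappears once the heatmap is normalized.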
# Listing 5-40 Loading the VGG16 network with pretrained weights
from keras import backend as K
from keras.applications.vgg16 import VGG16

# Clear any state left over from previously built models
K.clear_session()
# Note that we are including the densely connected classifier on top;
# in all previous cases, we were discarding it.
model = VGG16(weights='imagenet')
Consider the image of two African elephants shown in figure 5-34 (under a Creative Commons license), possibly a mother and her calf, strolling on the savanna.
[Figure 5-34: two African elephants]
Let's convert this image into something the VGG16 model can read: the model was trained on images of size 224×224, preprocessed according to a few rules that are packaged in the utility function keras.applications.vgg16.preprocess_input. So we need to load the image, resize it to 224×224, convert it to a NumPy float32 tensor, and apply these preprocessing rules.
# Listing 5-41 Preprocessing an input image for VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np
# The local path to our target image
img_path = '/Users/fchollet/Downloads/creative_commons_elephant.jpg'
# `img` is a PIL image of size 224x224
img = image.load_img(img_path, target_size=(224, 224))
# `x` is a float32 Numpy array of shape (224, 224, 3)
x = image.img_to_array(img)
# We add a dimension to transform our array into a "batch"
# of shape (1, 224, 224, 3)
x = np.expand_dims(x, axis=0)
# Finally we preprocess the batch
# (this does channel-wise color normalization)
x = preprocess_input(x)
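To make the "channel-wise color normalization" step less opaque, here is a minimal NumPy sketch of what the VGG16 preprocessing does, assuming the "caffe"-style convention (RGB to BGR, then subtracting per-channel ImageNet means, no scaling). The function name `vgg16_preprocess_sketch` and the random stand-in image are hypothetical; in practice you would keep calling `preprocess_input`.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed "caffe"-style preprocessing)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_preprocess_sketch(batch_rgb):
    """Illustrative stand-in for keras.applications.vgg16.preprocess_input."""
    batch = np.asarray(batch_rgb, dtype=np.float32)
    batch_bgr = batch[..., ::-1]           # reorder channels: RGB -> BGR
    return batch_bgr - IMAGENET_BGR_MEAN   # zero-center each channel, no scaling

# Random pixels standing in for the loaded 224x224 elephant image
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(224, 224, 3)).astype(np.float32)
fake_batch = np.expand_dims(fake_img, axis=0)   # shape (1, 224, 224, 3)

out = vgg16_preprocess_sketch(fake_batch)
print(out.shape, out.dtype)
```

Note that the output is no longer in the 0-255 range and its channels are in BGR order, so a preprocessed array is not meant to be displayed directly.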
# Now you can run the pretrained VGG16 network on the image and decode its prediction vector into a human-readable format
preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=3)[0])
Predicted: [('n02504458', 'African_elephant', 0.90942144), ('n01871265', 'tusker', 0.08618243), ('n02504013', 'Indian_elephant', 0.0043545929)]
The top three classes predicted for this image are:
• African elephant (with 90.9% probability)
• Tusker (with 8.6% probability)
• Indian elephant (with 0.4% probability)
Thus the network has recognized the image as containing an undetermined quantity of African elephants. The entry in the prediction vector that was maximally activated is the one corresponding to the "African elephant" class, at index 386:
np.argmax(preds[0])
386
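As a small illustration of how the class index relates to the prediction vector, the sketch below builds a toy 1000-entry probability vector and recovers the top entries with NumPy. Only index 386 and the three probabilities come from the output above; the other two indices (10 and 20) are arbitrary placeholders, not the real ImageNet indices for those classes.

```python
import numpy as np

# Toy stand-in for preds[0]: 1000 ImageNet class probabilities
probs = np.zeros(1000, dtype=np.float32)
probs[386] = 0.90942144    # African_elephant (index taken from the text)
probs[10] = 0.08618243     # placeholder index for the second-best class
probs[20] = 0.0043545929   # placeholder index for the third-best class

# Index of the single most activated entry
best = int(np.argmax(probs))

# Indices of the three largest entries, highest first
top3 = np.argsort(probs)[::-1][:3]

print(best, top3.tolist())
```

`decode_predictions` performs the same kind of ranking and then maps each index to its class ID and label.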
To visualize which parts of the image are the most African elephant-like, let's set up the Grad-CAM process.
# Listing 5-42 Setting up the Grad-CAM algorithm
# This is the "African elephant" entry in the prediction vector
african_elephant_output = model.output[:, 386]
# This is the output feature map of the `block5_conv3` layer,
# the last convolutional layer in VGG16
last_conv_layer = model.get_layer('block5_conv3')
# This is the gradient of the "african elephant" class with regard to
# the output feature map of `block5_conv3`
grads = K.gradients(african_elephant_output, last_conv_layer.output)[0]
# This is a vector of shape (512,), where each entry
# is the mean intensity of the gradient over a specific feature map channel
pooled_grads = K.mean(grads, axis=(0, 1, 2))
# This function allows us to access the values of the quantities we just defined:
# `pooled_grads` and the output feature map of `block5_conv3`,
# given a sample image
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])
# These are the values of these two quantities, as Numpy arrays,
# given our sample image of two elephants
pooled_grads_value, conv_layer_output_value = iterate([x])
# We multiply each channel in the feature map array
# by "how important this channel is" with regard to the elephant class
for i in range(512):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
# The channel-wise mean of the resulting feature map
# is our heatmap of class activation
heatmap = np.mean(conv_layer_output_value, axis=-1)
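The per-channel loop can also be expressed with NumPy broadcasting. The sketch below uses random arrays as stand-ins for `conv_layer_output_value` and `pooled_grads_value` (the names `conv_out` and `pooled` are hypothetical) and checks that the loop, a broadcast multiply, and a matrix-vector contraction all give the same heatmap.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the block5_conv3 output and the pooled gradients
conv_out = rng.standard_normal((14, 14, 512)).astype(np.float32)
pooled = rng.standard_normal(512).astype(np.float32)

# 1) Explicit loop, as in the listing (on a copy, so conv_out stays intact)
looped = conv_out.copy()
for i in range(512):
    looped[:, :, i] *= pooled[i]
heatmap_loop = np.mean(looped, axis=-1)

# 2) Broadcasting: the (512,) vector lines up with the last axis
heatmap_vec = np.mean(conv_out * pooled, axis=-1)

# 3) Contraction over the channel axis, divided by the channel count
heatmap_dot = conv_out @ pooled / pooled.shape[0]

print(heatmap_loop.shape)
print(np.allclose(heatmap_loop, heatmap_vec, atol=1e-5))
print(np.allclose(heatmap_loop, heatmap_dot, atol=1e-5))
```

Unlike the loop in the listing, the broadcast form does not modify the feature map in place, which is convenient if the same array is reused for several classes.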
For visualization purposes, we will also normalize the heatmap between 0 and 1. The result is shown in figure 5-35.
# Listing 5-43 Heatmap post-processing
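import matplotlib.pyplot as plt

# Keep only the positive contributions, then scale so the maximum is 1
heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
plt.matshow(heatmap)
plt.show()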
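One edge case worth noting: if no position has a positive score for the chosen class, the maximum is 0 and the division produces NaNs. A minimal sketch of a guarded version follows; the helper name `normalize_heatmap` and the `eps` threshold are assumptions made for illustration, not part of the original listing.

```python
import numpy as np

def normalize_heatmap(raw, eps=1e-8):
    """Clamp negatives to 0 and scale to [0, 1]; return all zeros if nothing is positive."""
    heatmap = np.maximum(raw, 0)
    peak = heatmap.max()
    if peak < eps:
        return np.zeros_like(heatmap, dtype=np.float32)
    return (heatmap / peak).astype(np.float32)

raw = np.array([[-1.0, 0.5],
                [2.0, 1.0]])
normalized = normalize_heatmap(raw)
all_negative = normalize_heatmap(-np.ones((2, 2)))
print(normalized)
print(all_negative)
```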
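This visualization technique answers two important questions:
• Why did the network think this image contained an African elephant?
• Where is the African elephant located in the picture?
In particular, it is interesting to note that the ears of the elephant calf are strongly activated: this is probably how the network can tell the difference between African and Indian elephants.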
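# Finally, we will use OpenCV to generate an image that superimposes the heatmap we just obtained
# on the original image (see figure 5-36).
# Listing 5-44 Superimposing the heatmap with the original picture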
import cv2
# We use cv2 to load the original image
img = cv2.imread(img_path)
# We resize the heatmap to have the same size as the original image
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
# We convert the heatmap to 8-bit integers in the range 0-255
heatmap = np.uint8(255 * heatmap)
# We colorize the heatmap with the JET colormap
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
# 0.4 here is a heatmap intensity factor
superimposed_img = heatmap * 0.4 + img
# Save the image to disk
cv2.imwrite('/Users/fchollet/Downloads/elephant_cam.jpg', superimposed_img)
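The blended array `heatmap * 0.4 + img` is a float array whose values can exceed 255. A minimal NumPy-only sketch of an explicit blend is shown below; the helper name `superimpose` and the tiny constant test images are hypothetical. It rounds and clips before converting to 8-bit, so the result does not depend on how the image writer treats out-of-range values.

```python
import numpy as np

def superimpose(colored_heatmap, img, intensity=0.4):
    """Blend a colorized heatmap onto an image and return a valid uint8 image."""
    blended = colored_heatmap.astype(np.float32) * intensity + img.astype(np.float32)
    # Round, then clip to the displayable range before casting;
    # casting out-of-range values straight to uint8 would not saturate reliably.
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)

# Tiny stand-ins for the colorized heatmap and the original picture
colored = np.full((2, 2, 3), 255, dtype=np.uint8)
bright_img = np.full((2, 2, 3), 200, dtype=np.uint8)
dark_img = np.full((2, 2, 3), 100, dtype=np.uint8)

bright_out = superimpose(colored, bright_img)   # 255 * 0.4 + 200 exceeds 255
dark_out = superimpose(colored, dark_img)       # 255 * 0.4 + 100 stays in range
print(bright_out[0, 0], dark_out[0, 0])
```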