Speed/accuracy trade-offs for modern convolutional object detectors

Paper: https://arxiv.org/abs/1611.10012

1. Motivation

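Through extensive experiments, this paper weighs the trade-offs among three mainstream "meta-architectures" and shows how to choose a detector that meets given speed and accuracy requirements. It compares the strengths and weaknesses of Faster R-CNN, R-FCN, and SSD thoroughly, and the experimental design is very systematic.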

2. What experiments did the authors run?

<1> First, the authors re-implemented Faster R-CNN, R-FCN, and SSD in TensorFlow so that all three could be compared within a single framework.

<2> The effect of the feature extractor. The backbone networks tested were VGG16, Resnet-101, Inception v2, Inception v3, Inception Resnet (v2), and MobileNet.

<3>-<9>

3. Meta-architectures

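Single Shot Detector (SSD): the paper uses the term SSD for any architecture that uses a single feed-forward convolutional network to predict classes and anchor offsets directly, without requiring a second-stage per-proposal classification operation. Multibox and the RPN both use this approach to predict class-agnostic box proposals.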

Note: SSD outputs the classes and anchor offsets directly.

[Figure: SSD meta-architecture]
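To make "predict classes and anchor offsets directly" more concrete, here is a minimal sketch in plain Python. The offset encoding (centre shift scaled by the anchor size, log-scale height and width) is the common Faster R-CNN style parameterisation and is an assumption here; these notes do not state which encoding the paper uses. The function names `decode_box` and `ssd_predict` are hypothetical.

```python
import math

def decode_box(anchor, offsets):
    """Apply predicted offsets (ty, tx, th, tw) to an anchor
    given as (ymin, xmin, ymax, xmax). Assumed encoding, see lead-in."""
    ymin, xmin, ymax, xmax = anchor
    ty, tx, th, tw = offsets
    # Anchor size and centre.
    ha, wa = ymax - ymin, xmax - xmin
    ya, xa = ymin + ha / 2.0, xmin + wa / 2.0
    # Shift the centre in units of the anchor size; scale the size exponentially.
    yc, xc = ya + ty * ha, xa + tx * wa
    h, w = ha * math.exp(th), wa * math.exp(tw)
    return (yc - h / 2.0, xc - w / 2.0, yc + h / 2.0, xc + w / 2.0)

def ssd_predict(anchors, class_scores, box_offsets):
    """Single pass: every anchor gets a class label, a score, and a refined box.
    There is no second stage that revisits individual proposals."""
    detections = []
    for anchor, scores, offsets in zip(anchors, class_scores, box_offsets):
        best = max(range(len(scores)), key=lambda c: scores[c])
        detections.append((best, scores[best], decode_box(anchor, offsets)))
    return detections
```

In a real detector the scores and offsets would come out of the network's prediction layers, and non-maximum suppression would follow; both are left out of this sketch.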

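Faster R-CNN: detection happens in two stages. In the first stage, which generates region proposals, features from an intermediate layer (e.g. conv5 in VGG16) are used to predict box proposals. In the second stage, these box proposals are used to crop features from that same layer, and the crops are fed to the remaining layers of the feature extractor (e.g. fc6 and fc7) to predict a class and refine each proposal.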

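Note: Faster R-CNN first uses a proposal generator to produce a classification and box proposals. The box proposals are then sent back to the intermediate layer that predicted them, where the features are cropped, and the crops are passed through fc6 and fc7 (the blue rectangles in the figure) to produce the final result (the two small blue rectangles).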

[Figure: Faster R-CNN meta-architecture]
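The two-stage flow described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the feature map is a 2-D list, boxes are in normalised (ymin, xmin, ymax, xmax) coordinates, and the crop snaps to whole cells with no resizing or interpolation. `crop_features`, `two_stage_detect`, `propose`, and `classify` are hypothetical names, not the authors' code.

```python
def crop_features(feature_map, box):
    """Crop a 2-D feature map (list of rows) to a normalised box.
    Nearest-cell crop; always returns at least one cell per axis."""
    rows, cols = len(feature_map), len(feature_map[0])
    ymin, xmin, ymax, xmax = box
    r0, c0 = int(ymin * rows), int(xmin * cols)
    r1 = max(int(ymax * rows), r0 + 1)
    c1 = max(int(xmax * cols), c0 + 1)
    return [row[c0:c1] for row in feature_map[r0:r1]]

def two_stage_detect(feature_map, propose, classify):
    """Stage 1: `propose` predicts boxes from the shared feature map.
    Stage 2: each box crops that same feature map, and `classify`
    (standing in for the remaining layers, e.g. fc6/fc7) runs once per crop."""
    results = []
    for box in propose(feature_map):
        crop = crop_features(feature_map, box)
        results.append((box, classify(crop)))
    return results
```

Note that `classify` runs once per proposal, so the second-stage cost grows with the number of proposals.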

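R-FCN: unlike Faster R-CNN, which crops features from the same layer where region proposals are predicted, R-FCN crops features from the last feature layer before prediction. Pushing the crop to the last layer minimizes the amount of computation that has to be done per region.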

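Note: in the Box Classifier step, the box proposals are sent back to the last feature layer (the layer after the three blue rectangles in the figure), which is immediately followed by the prediction layer (the small blue rectangles).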

[Figure: R-FCN meta-architecture]
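Why per-region computation matters can be seen with a toy cost model: total cost is one shared pass over the image plus some work repeated for every proposal. The model and all numbers below are hypothetical cost units chosen only to show the shape of the trade-off; they are not measurements from the paper.

```python
def detector_cost(shared_cost, per_region_cost, num_proposals):
    """Total cost = one shared pass + per-region work for each proposal."""
    return shared_cost + per_region_cost * num_proposals

def max_proposals_within(budget, shared_cost, per_region_cost):
    """Largest proposal count whose total cost still fits the budget (0 if none)."""
    if budget < shared_cost:
        return 0
    return int((budget - shared_cost) // per_region_cost)

# Hypothetical numbers: a heavy per-region head (cropping early, as in
# Faster R-CNN) versus a light one (cropping at the last layer, as in R-FCN).
heavy_head = detector_cost(shared_cost=100.0, per_region_cost=2.0, num_proposals=300)
light_head = detector_cost(shared_cost=150.0, per_region_cost=0.25, num_proposals=300)
```

With these made-up units the heavy head costs 700 and the light head 225, even though the light variant spends more on the shared pass. The same model shows that a detector with a heavy head gets much cheaper when the number of proposals is cut.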

The paper re-implements the pipelines of all three architectures in TensorFlow. This part covers architectural configuration, loss function configuration, input size configuration, training, hyperparameter tuning, benchmarking procedure, and model details.

4. Experiments and conclusions

<1> Accuracy vs. time

[Figure: accuracy vs. time]

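As the figure above shows, SSD and R-FCN are far faster than Faster R-CNN, while Faster R-CNN leads in accuracy with R-FCN close behind. However, Faster R-CNN can cut its processing time by limiting the number of region proposals, as in Faster R-CNN w/ ResNet with 50 proposals.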

For speed: SSD, R-FCN, or Faster R-CNN w/ Resnet 101

For accuracy: Faster R-CNN
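Choosing "the fastest model that is accurate enough" or "the most accurate model that is fast enough" amounts to reading points off a speed/accuracy frontier. Below is a small sketch of that selection in plain Python. The model names and all numbers are placeholders invented for the example, not results from the paper, and `pareto_frontier` and `best_under_budget` are hypothetical helper names.

```python
def pareto_frontier(models):
    """Keep every (name, time_ms, accuracy) entry that no other entry beats,
    i.e. no other entry is at least as fast and at least as accurate
    while being strictly better on one of the two."""
    frontier = []
    for name, time_ms, acc in models:
        dominated = any(
            t <= time_ms and a >= acc and (t < time_ms or a > acc)
            for _, t, a in models
        )
        if not dominated:
            frontier.append((name, time_ms, acc))
    return sorted(frontier, key=lambda m: m[1])

def best_under_budget(models, budget_ms):
    """Most accurate entry whose time fits the budget, or None."""
    eligible = [m for m in models if m[1] <= budget_ms]
    return max(eligible, key=lambda m: m[2]) if eligible else None

# Placeholder measurements: (name, time per image in ms, accuracy).
models = [
    ("detector_a", 30, 0.20),
    ("detector_b", 80, 0.30),
    ("detector_c", 100, 0.28),  # slower and less accurate than detector_b
    ("detector_d", 400, 0.35),
]
```

Here `pareto_frontier(models)` drops `detector_c`, and `best_under_budget(models, 100)` picks `detector_b`. In practice the entries would be filled in with timings and accuracy measured on your own hardware and data.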

<2> Feature extractor

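As the figure below shows, SSD is not very sensitive to the feature extractor, whereas Faster R-CNN and R-FCN are quite sensitive to the quality of the features. ResNet-101 still appears to perform slightly better than the others.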

[Figure: accuracy by feature extractor]