Topic: YOLO9000: Better, Faster, Stronger
Dosudo deep learning newsletter #4
Editor: George.Wu
Resources: Paper link Github video
Label: Video object detection
CVPR 2017 Best Paper Honorable Mention
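For large-scale image object detection, the goal is to recognize objects quickly and accurately, and to cover as many object categories as possible. Classic earlier approaches include Deformable Parts Models (DPM) [1] and CNN-based object detectors such as R-CNN [2], Fast R-CNN [3], Faster R-CNN [4], and the end-to-end YOLO [5]. YOLO's biggest advantage is speed, and its core idea is to use the whole image directly as input. It first divides the image into an S×S grid of cells; if the center of an object falls inside a cell, that cell is responsible for detecting the object. It then treats object detection as a regression problem, using a CNN to predict bounding-box coordinates and probabilities.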
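The grid assignment rule described above can be illustrated with a minimal sketch. The function name `responsible_cell`, its parameters, and the default S=7 are assumptions chosen for illustration, not code from the YOLO authors.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains an object's center.

    cx, cy       -- object center in pixels (hypothetical input format)
    img_w, img_h -- image width and height in pixels
    S            -- grid size; S=7 is only an illustrative default
    """
    # Scale the center into grid units, then truncate to a cell index.
    col = int(cx / img_w * S)
    row = int(cy / img_h * S)
    # A center lying exactly on the right or bottom edge is clamped
    # into the last cell so the index stays inside the grid.
    return min(row, S - 1), min(col, S - 1)


if __name__ == "__main__":
    # An object centered in a 448x448 image falls in the middle cell.
    print(responsible_cell(224, 224, 448, 448))
```

In this sketch only one cell is picked per object, which matches the description that the cell containing the object's center is the one responsible for detecting it.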
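YOLO2 builds on YOLO with a series of techniques that improve performance. These include batch normalization, which helps avoid overfitting and speeds up convergence; a high-resolution classifier, where the classification network is first fine-tuned on ImageNet at 448×448 to improve results; and multi-scale training, which randomly changes the model's input size every few training iterations so that the model becomes more robust to images at different scales.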
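As a rough sketch of how multi-scale training might pick input sizes: the helper names below are hypothetical, and the concrete values (sizes from 320 to 608 in steps of 32, a new size every 10 batches) are assumptions not stated in this summary, so treat them as illustrative settings.

```python
import random


def sample_input_size(rng, min_size=320, max_size=608, stride=32):
    """Pick one square input size that is a multiple of `stride`.

    The range and stride are assumed values for illustration.
    """
    sizes = list(range(min_size, max_size + 1, stride))
    return rng.choice(sizes)


def size_schedule(num_batches, change_every=10, seed=0):
    """Return the input size to use for each training batch.

    A new size is drawn every `change_every` batches (assumed interval);
    batches in between reuse the current size.
    """
    rng = random.Random(seed)
    size = sample_input_size(rng)
    schedule = []
    for i in range(num_batches):
        if i > 0 and i % change_every == 0:
            size = sample_input_size(rng)
        schedule.append(size)
    return schedule


if __name__ == "__main__":
    print(size_schedule(30))
```

A real training loop would resize each batch of images (and rescale the box labels) to the scheduled size before the forward pass; that part depends on the framework and is left out here.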
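YOLO2 achieves good accuracy and detection speed on many datasets. In addition, the authors propose a joint training method for object detection and classification: by training jointly on the COCO and ImageNet datasets, YOLO2 can detect more than 9000 object categories. This model is called YOLO9000.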
Reference:
[1] DPM: Felzenszwalb, Pedro F., et al. “Object detection with discriminatively trained part-based models.” IEEE transactions on pattern analysis and machine intelligence 32.9 (2010): 1627-1645. Github
[2] RCNN: Girshick, Ross, et al. “Rich feature hierarchies for accurate object detection and semantic segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2014. Github
[3] Fast-RCNN: Girshick, Ross. “Fast r-cnn.” Proceedings of the IEEE international conference on computer vision. 2015. Github
[4] Faster-RCNN: Ren, Shaoqing, et al. “Faster R-CNN: Towards real-time object detection with region proposal networks.” Advances in neural information processing systems. 2015. Github
[5] YOLO: Redmon, Joseph, et al. “You only look once: Unified, real-time object detection.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. Github