Machine Learning Papers Notes (CNN)

Compiled by Patrick Liu

This note covers advances in computer vision and image processing powered by convolutional neural networks (CNNs), across increasingly challenging tasks: from Image Classification to Object Detection to Segmentation.

Image Classification

Goal: Predict a label, with a confidence score, for an entire image.
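In the classifiers covered here (AlexNet through ResNet), the network typically ends in a softmax layer, and the "confidence" is usually the softmax probability of the top-scoring class. The sketch below illustrates that final step only, assuming the network has already produced one raw score (logit) per class; the logits and class names in the usage example are hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtracting the max logit avoids overflow in exp() and does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float], class_names: Sequence[str]) -> tuple[str, float]:
    """Return the most likely label and its softmax probability (the 'confidence')."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


if __name__ == "__main__":
    # Hypothetical logits from a three-class classifier.
    label, conf = classify([2.0, 1.0, 0.1], ["cat", "dog", "car"])
    print(f"{label} ({conf:.3f})")
```

Because softmax is monotonic, the predicted label is the same as the argmax of the raw logits; softmax only matters for turning the winning score into a probability-like confidence.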

Covers the evolution from AlexNet, VGGNet and GoogLeNet (Inception) to ResNet.

AlexNet (NIPS 2012)

VGG16 (ICLR 2015, 09/2014)

Object Detection

Goal: Predict a label with confidence, as well as the coordinates of a box bounding each object in an image.

Covers the evolution from R-CNN (Regions with CNN features) to Fast R-CNN, Faster R-CNN, YOLO (YOLOv2 and YOLO9000) and SSD.
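All of these detectors output many overlapping candidate boxes per object and rely on some form of non-maximum suppression (NMS) to keep one box per object, using intersection over union (IoU) to measure overlap. The sketch below shows a plain greedy, per-class NMS; the `Detection` type, the corner-coordinate box format and the 0.5 threshold are illustrative assumptions, and each paper's actual thresholds and variants differ.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    # Hypothetical record: box corners in pixels, (x1, y1) top-left, (x2, y2) bottom-right.
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    label: str


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(dets: list[Detection], iou_threshold: float = 0.5) -> list[Detection]:
    """Greedy per-class non-maximum suppression.

    Repeatedly keep the highest-scoring remaining box and discard every other
    box of the same label whose IoU with it exceeds iou_threshold.
    """
    remaining = sorted(dets, key=lambda d: d.score, reverse=True)
    kept: list[Detection] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [
            d for d in remaining
            if d.label != best.label or iou(d, best) <= iou_threshold
        ]
    return kept


if __name__ == "__main__":
    # Hypothetical raw detections: two overlapping "dog" boxes and one "cat" box.
    raw = [
        Detection(10, 10, 110, 110, 0.90, "dog"),
        Detection(15, 12, 115, 108, 0.75, "dog"),
        Detection(200, 50, 260, 120, 0.80, "cat"),
    ]
    for d in nms(raw):
        print(d)
```

Here the two "dog" boxes overlap with an IoU of about 0.87, so the lower-scoring one is suppressed, while the "cat" box survives because suppression is applied only within a class.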

Review blogs

A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN



Fast R-CNN

Faster R-CNN


YOLOv2 and YOLO9000


Extended reading

Semantic Segmentation

Goal: Semantic segmentation groups pixels in a semantically meaningful way, so it is a pixel-wise task: it predicts a label with confidence for each pixel in the image.
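Concretely, a segmentation network such as an FCN produces a score for every class at every pixel, and the predicted label map is the per-pixel argmax. Predictions are commonly evaluated with mean intersection over union (mIoU) across classes. The sketch below assumes a score map laid out as `[num_classes][height][width]` and uses plain nested lists instead of a tensor library so it stays self-contained; the tiny 2x3 example is made up.

```python
from typing import Sequence

# Assumed layouts: scores[c][y][x] for class scores, labels[y][x] for class indices.
ScoreMap = Sequence[Sequence[Sequence[float]]]
LabelMap = list[list[int]]


def predict_labels(scores: ScoreMap) -> LabelMap:
    """Assign each pixel the class with the highest score (per-pixel argmax)."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def mean_iou(pred: LabelMap, truth: LabelMap, num_classes: int) -> float:
    """Average per-class IoU, skipping classes absent from both maps."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for prow, trow in zip(pred, truth):
            for p, t in zip(prow, trow):
                inter += int(p == c and t == c)
                union += int(p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Hypothetical scores for 2 classes (0 = background, 1 = object) on a 2x3 image.
    scores = [
        [[0.9, 0.8, 0.2], [0.7, 0.3, 0.1]],
        [[0.1, 0.2, 0.8], [0.3, 0.7, 0.9]],
    ]
    pred = predict_labels(scores)
    truth = [[0, 0, 1], [0, 0, 1]]
    print(pred, round(mean_iou(pred, truth, 2), 4))
```

In this example one background pixel is mislabeled as object, which lowers the IoU of both classes (3/4 for background, 2/3 for object), giving an mIoU of about 0.708.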

Instance segmentation is more challenging because it also includes object detection: it must separate individual objects of the same class. See the illustration below for an example.

Review blogs

FCN (Fully convolutional networks)



FPN (Feature pyramid network)

Instance/Object segmentation

Instance segmentation combines the challenges of object detection (bounding boxes) and semantic segmentation. Facebook AI Research (FAIR) has published a progressive series of work on DeepMask, SharpMask and MultiPath Network. Here is a blog post review by Piotr Dollar, and here is another one.



MultiPath Network

Mask R-CNN

Polygon-RNN (CVPR 2017)

Medical applications


CNN feature extractor for TB (tuberculosis)