Today, we are going to use deep learning to detect faces in images and videos using TensorFlow. You have probably seen many face detection demos online; most of them are built with Haar cascades in OpenCV. Haar cascades have a lot of issues when applied to real-world videos, and they are especially sensitive to lighting variations. Instead, we shall use an MTCNN-based frontal face detector, which uses a cascade of neural networks.
Multi-task Cascaded Convolutional Networks (MTCNN):
MTCNN breaks down the task into three stages and builds a pipeline.
Stage-1: P-Net: This stage produces candidate face windows using a shallow convolutional network (the Proposal Network).
Stage-2: R-Net: The objective of this stage is to reject as many non-face windows as possible; the network used here (the Refine Network) is deeper.
Stage-3: O-Net: This stage uses an even more complex network (the Output Network) to further refine the output of R-Net and produce the final bounding boxes and facial landmarks.
The model in the original paper was trained on the WIDER FACE data-set, which contains 32,203 public images and 393,703 labelled faces. The trained model achieves real-time performance on 640×480 VGA images with a 20×20 minimum face size: the authors report 99 fps on an NVIDIA Titan Black GPU.
MTCNN is very useful because it can run in real time even on small devices. Many algorithms have appeared since MTCNN, but it remains one of my favorites for frontal face detection. In a future blog post, we shall show how to implement this network in TensorFlow.