In India, we celebrated the festival of colors, “Holi”, last week. We celebrate the end of winter with a splash of color because that’s what spring will bring us in a few days. When I was young, the celebrations were sparse. It was the decade of frugal parenting, and we waited for festivals so eagerly because they meant parent-approved outings and fun. We barely slept the night before and buzzed with excitement early in the morning. Tubs and buckets were sacrificed for the good cause of color mixing. Then started the actual riot. We went to people’s places, people came to ours… and by the end, nobody was left in the colors that nature gave us. We were all in reds, yellows, and greens!
The festival celebrates colors and how they add beauty to life. It made me think how dull and boring life would be without them. Looking back at the photos from that time, what strikes me is that they don’t quite carry the colors they were meant to. As much as we all love looking at those old black & white photos and feeling the nostalgia, there can be more to those pictures. Typically, coloring black and white photos is a manual, time-consuming task done by experts using Adobe Photoshop. Can we train neural networks to add colors to black and white photos? Yes, we can! That’s what I’ll demonstrate in today’s post.
I recently came across a Facebook group of my college, IIT Kanpur. In this group, one of our seniors shared black and white photos from his time there (more than 40 years old). These photos take you on a journey to the past, as if you were there. I wondered if I could color these photos.
Pretty exciting, isn’t it? In today’s post, we shall learn to add colors to black and white photos using deep learning in OpenCV’s DNN module. Let’s first look at how color information is encoded in digital images. Most of the popular image formats use the RGB color space.
RGB color space:
In RGB color space, each pixel has three color values (Red, Green, and Blue). So, in an 8-bit image, each channel (R,G,B) can have a value between 0 and 255. The brightness of the image depends on all three channels.
In a grayscale (black & white) image, on the other hand, each pixel has just a single intensity value.
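To make this concrete, here is a small sketch that inspects one pixel in both representations (image.jpg is just a placeholder file name):

import cv2 as cv

# OpenCV loads a color image as an HxWx3 array in BGR channel order
img = cv.imread('image.jpg')
print(img.shape)   # e.g. (480, 640, 3): three values per pixel
print(img[0, 0])   # [B G R] values of the top-left pixel, each 0-255

# A grayscale image is a single HxW array: one intensity per pixel
gray = cv.imread('image.jpg', cv.IMREAD_GRAYSCALE)
print(gray.shape)  # e.g. (480, 640)
print(gray[0, 0])  # a single intensity value, 0-255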
I know what you are thinking: we want to teach a neural network to convert this grayscale image to a colored one, i.e., it needs to learn to map this single value to a three-channel image. But experts do something different. To understand that, we first need to know about the Lab color space.
Lab color space:
Lab is another color space, like RGB. In this space:
L channel: encodes lightness
a channel: encodes green-red
b channel: encodes blue-yellow
Here, the grayscale image is encoded in the L channel alone. Hence, this color space is more convenient for our problem: we only need to learn how to map the L channel to the a and b channels.
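You can verify this with OpenCV itself. A quick sketch (image.jpg is again a placeholder): converting an image to Lab and displaying only the L channel shows what is essentially the grayscale version of the image.

import cv2 as cv

img = cv.imread('image.jpg')

# Convert BGR to Lab and split the three channels
img_lab = cv.cvtColor(img, cv.COLOR_BGR2Lab)
L, a, b = cv.split(img_lab)

# The L channel alone looks essentially like the grayscale image
cv.imshow('L channel only', L)
cv.waitKey(0)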
Problem Formulation:
Given the input L channel (the grayscale image), we need to learn to predict the corresponding a and b channels. In other words, we need a mapping from an H×W×1 input to an H×W×2 output.
There have been many efforts to colorize images automatically, but Colorful Image Colorization by Zhang et al. is one of the most successful approaches. The paper uses convolutional neural networks for this learning task.
Colorful Image Colorization:
It uses a simple convolutional neural network architecture. As explained above, we take the L channel image and learn to predict the a and b channels. Combining the prediction with the input gives us the complete image in the Lab color space, which can then be converted back to the RGB color space. Rather than regressing the a and b values directly, the paper treats colorization as a classification problem: the ab space is quantized into 313 bins, and the network predicts a distribution over these bins for every pixel. To train the network, the authors created a grayscale version of the ImageNet dataset.
The results of the training, which was done in Caffe, are very impressive, and the trained model has been integrated into the DNN module of OpenCV. Let’s now write the code to use this model to colorize our images.
Colorization in OpenCV:
We will write a single script that takes an image, a video, or a webcam feed as input and generates a colored output. First, let’s download the pre-trained model weights and other dependencies by running the get_models.sh file:
sh get_models.sh
This downloads the following files:
colorization_release_v2.caffemodel: The model weights trained in Caffe.
colorization_deploy_v2.prototxt: The Caffe-specific file that defines the network.
pts_in_hull.npy: The cluster center points stored in NumPy format.
Now, let’s write the code. The first step is to handle the imports and define a way to pass inputs to the script.
import numpy as np
import argparse
import cv2 as cv

def parse_args():
    parser = argparse.ArgumentParser(description='iColor: deep interactive colorization')
    parser.add_argument('--input', help='Path to image or video. Skip to capture frames from camera')
    parser.add_argument('--prototxt', help='Path to colorization_deploy_v2.prototxt', required=True)
    parser.add_argument('--caffemodel', help='Path to colorization_release_v2.caffemodel', required=True)
    parser.add_argument('--kernel', help='Path to pts_in_hull.npy', required=True)
    args = parser.parse_args()
    return args
The script needs the following inputs (a sample invocation follows the list):
input: Path to the input grayscale image or video. Skip it to capture frames from the camera.
caffemodel: Path to the model weights trained in Caffe.
prototxt: Path to the Caffe-specific file that defines the network.
kernel: Path to the cluster center points (pts_in_hull.npy) stored in NumPy format.
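For example, assuming the script is saved as colorize.py (the script and image names here are just placeholders), it would be invoked like this:

python colorize.py --input old_photo.jpg --prototxt colorization_deploy_v2.prototxt --caffemodel colorization_release_v2.caffemodel --kernel pts_in_hull.npy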
Let’s create the network graph and load the cluster centers. The 313 points in pts_in_hull.npy are the centers of the quantized ab bins mentioned earlier, and they are loaded into the network as the kernel of a 1×1 convolution layer.
# Network input size expected by the model
W_in = 224
H_in = 224

# Create network graph and load weights
net = cv.dnn.readNetFromCaffe(args.prototxt, args.caffemodel)

# Load cluster centers
pts_in_hull = np.load(args.kernel)

# Populate cluster centers as 1x1 convolution kernel
pts_in_hull = pts_in_hull.transpose().reshape(2, 313, 1, 1)

# Load the cluster centers and the rebalancing constants into the
# corresponding layers (as done in OpenCV's colorization sample)
net.getLayer(net.getLayerId('class8_ab')).blobs = [pts_in_hull.astype(np.float32)]
net.getLayer(net.getLayerId('conv8_313_rh')).blobs = [np.full([1, 313], 2.606, np.float32)]
Take the input image and apply some pre-processing: scale the pixel values to [0, 1], convert to the Lab color space, pull out the L channel, resize it to the network’s input size, and mean-center it by subtracting 50 (L values range from 0 to 100):
# Read the input image in BGR format
frame = cv.imread(args.input)

# Convert it to RGB format
frame = frame[:, :, [2, 1, 0]]

# Scale the image to handle the variations in intensity
img_rgb = (frame * 1.0 / 255).astype(np.float32)

# Convert to Lab color space
img_lab = cv.cvtColor(img_rgb, cv.COLOR_RGB2Lab)

# Pull out the L channel
img_l = img_lab[:, :, 0]

# Original image size
(H_orig, W_orig) = img_rgb.shape[:2]

# Resize image to network input size and extract the L channel again
img_rs = cv.resize(img_rgb, (W_in, H_in))
img_lab_rs = cv.cvtColor(img_rs, cv.COLOR_RGB2Lab)
img_l_rs = img_lab_rs[:, :, 0]

# Subtract 50 for mean-centering
img_l_rs -= 50

# Set the input for forwarding through the OpenCV DNN module
net.setInput(cv.dnn.blobFromImage(img_l_rs))
Now we are ready to run inference by calling the forward method of OpenCV’s DNN module.
# Inference on the network; the output of the 'class8_ab' layer is our result
ab_dec = net.forward('class8_ab')[0, :, :, :].transpose((1, 2, 0))
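One detail worth noting: the prediction comes out at a reduced spatial resolution (56×56 for this model with a 224×224 input), which is why the next step upsamples it back to the original image size. A quick sanity check:

print(ab_dec.shape)  # (56, 56, 2): a and b channels at the network's output resolution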
The output is the a and b channels predicted by the network. We resize it to the original image size and concatenate it with the L channel (the input) to get the complete image in the Lab color space, which we then convert back to the BGR color space.
# Get the a and b channels
(H_out, W_out) = ab_dec.shape[:2]

# Resize to original size
ab_dec_us = cv.resize(ab_dec, (W_orig, H_orig))

# Concatenate with the original L channel
img_lab_out = np.concatenate((img_l[:, :, np.newaxis], ab_dec_us), axis=2)

# Convert from Lab space back to BGR space
img_bgr_out = cv.cvtColor(img_lab_out, cv.COLOR_Lab2BGR)

# Clip and then rescale to 0-255
img_bgr_out = 255 * np.clip(img_bgr_out, 0, 1)

# Convert to uint8
img_bgr_out = np.uint8(img_bgr_out)
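To actually see or keep the result, a couple of lines suffice (colorized.png is just a placeholder output name):

# Show the colorized image and save it to disk
cv.imshow('Colorized', img_bgr_out)
cv.imwrite('colorized.png', img_bgr_out)
cv.waitKey(0)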
Now, let’s look at the output of the network:
We can also use this to convert black and white videos to color. The code is very similar; we just read frames from the video or webcam depending on the input, as in the sketch below. Here is what the results look like.
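A minimal sketch of that loop, assuming the per-image steps above are wrapped in a hypothetical colorize_frame(net, frame) helper:

# Sketch: per-frame colorization. colorize_frame is a hypothetical helper
# wrapping the pre-processing, forward pass, and post-processing shown above.
cap = cv.VideoCapture(args.input if args.input else 0)  # 0 = default webcam
while cv.waitKey(1) < 0:
    has_frame, frame = cap.read()
    if not has_frame:
        break
    cv.imshow('Colorized video', colorize_frame(net, frame))
cap.release()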
As usual, the complete code can be found here.
References:
- Colorful Image Colorization: Richard Zhang, Phillip Isola, and Alexei A. Efros, ECCV 2016.
- Photos: Unsplash.com and the IIT Kanpur Facebook group mentioned above.