1. Why Keras?
Currently, Keras is one of the fastest growing libraries for deep learning. It provides a high-level API that runs on top of backends such as TensorFlow, which makes building and experimenting with deep networks quick and painless.
2. Installation:
Install h5py for saving and restoring models:
pip install h5py
A few other Python dependencies also need to be installed:
pip install numpy scipy
pip install pillow
Now, install Keras itself:

sudo pip install keras
Check the Keras version to confirm the installation:
python -c "import keras; print(keras.__version__)"
Using TensorFlow backend.
2.0.1
Keras stores its configuration in a JSON file at ~/.keras/keras.json. With the TensorFlow backend, it looks like this:

{
    "epsilon": 1e-07,
    "floatx": "float32",
    "image_data_format": "channels_last",
    "backend": "tensorflow"
}
Once everything is set up, importing Keras in Python should report the backend in use:

import keras
Keras has two distinct ways of building models:
- the Sequential model
- the functional API
In the next sections of this blog, you will see the theory and examples of both the Keras Sequential model and the functional API.
We start by importing and building a Sequential model:
from keras.models import Sequential

model = Sequential()
We can add layers such as Dense (fully connected layer), Activation, Conv2D, and MaxPooling2D by calling the add function:
from keras.layers import Dense, Activation, Conv2D, MaxPooling2D, Flatten, Dropout

# This adds a convolutional layer with 64 filters of size 3 x 3 to the graph
model.add(Conv2D(64, (3, 3), activation='relu'))
1. Convolutional layer: Here, we add a layer with 64 filters of size 3 x 3, followed by a ReLU activation:
model.add(Conv2D(64, (3, 3), activation='relu'))
2. MaxPooling layer: Just specify the layer type and the pool size, and you are done. How cool is that!
model.add(MaxPooling2D(pool_size=(2, 2)))
3. Fully connected layer: A Dense layer with 256 neurons and a ReLU activation:

model.add(Dense(256, activation='relu'))
4. Dropout: Randomly drop half the activations during training to reduce overfitting:

model.add(Dropout(0.5))
5. Flattening layer: Flatten the 2D feature maps into a 1D vector before the fully connected layers:

model.add(Flatten())
Shape of the input: For the first layer of the network, you need to specify the shape of the input, e.g. 224 x 224 RGB images:

model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)))
So far, we have only added layers to the graph. To make the model trainable, we compile it, specifying the loss function and the optimizer:
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
If you instead want to use stochastic gradient descent and choose the learning rate and other hyperparameters yourself:
from keras.optimizers import SGD
...
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
Training is done with the fit function:

model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_val, y_val))
Finally, we evaluate the trained model on held-out test data:

score = model.evaluate(x_test, y_test, batch_size=32)
These are the basic building blocks of the Sequential model in Keras. Now, let’s build a simple example that implements linear regression using the Keras Sequential model.
a) Create training data: trX has values between -1 and 1, and trY is 3 times trX with some random noise added:

import keras
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

trX = np.linspace(-1, 1, 101)
trY = 3 * trX + np.random.randn(*trX.shape) * 0.33
b) Create model:
model = Sequential()
# a single Dense unit computes w * x + b (Keras 2 API: units, input_dim, kernel_initializer)
model.add(Dense(1, input_dim=1, kernel_initializer='uniform', activation='linear'))
This takes the input x and applies a weight w and a bias b, followed by a linear activation, to produce the output.
We can print the randomly initialized weights:

weights = model.layers[0].get_weights()
w_init = weights[0][0][0]
b_init = weights[1][0]
print('Linear regression model is initialized with weights w: %.2f, b: %.2f' % (w_init, b_init))
## Linear regression model is initialized with weights w: -0.03, b: 0.00
Now, we shall train this linear model with our training data trX and trY. Since trY is 3 times trX, the weight w should converge to 3 after training. First, compile the model with an SGD optimizer and mean squared error loss:
model.compile(optimizer='sgd', loss='mse')
Then, we feed in the data using the fit function:
model.fit(trX, trY, epochs=200, verbose=1)
Now, we print the weight after training:
weights = model.layers[0].get_weights()
w_final = weights[0][0][0]
b_final = weights[1][0]
print('Linear regression model is trained to have weight w: %.2f, b: %.2f' % (w_final, b_final))
## Linear regression model is trained to have weight w: 2.94, b: 0.08

As expected, w is close to 3 and b is close to 0.
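With training done, predicting on new inputs is just a call to predict. A quick sanity check (a sketch, assuming the model trained above; outputs should be roughly 3 times the inputs):

# hypothetical check: predict y for a few new x values
print(model.predict(np.array([[0.0], [0.5], [1.0]])))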
HDF5 binary format: Once you are done with training, the learned weights can be saved in HDF5 binary format (this is why we installed h5py):
model.save_weights("my_model.h5")
Restoring pre-trained weights:
model.load_weights('my_model.h5')
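Keras can also save the entire model, not just the weights: the architecture, weights, and optimizer state go into one file, restorable with load_model. A minimal sketch (my_model_full.h5 is just an illustrative filename):

from keras.models import load_model

model.save('my_model_full.h5')          # architecture + weights + optimizer state
model = load_model('my_model_full.h5')  # recreate the exact model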
6. Functional API: The Sequential model covers simple stacks of layers. For networks with multiple inputs, multiple outputs, or shared layers, Keras provides the functional API.
In the functional API, a model is an instance of the Model class:

from keras.models import Model
A functional model starts from an Input layer, which specifies the shape of the input:

from keras.layers import Input
Here, we define an input for 28 x 28 grayscale images:

digit_input = Input(shape=(28, 28, 1))
Each layer is then called on the output of the previous layer, chaining the graph together:

x = Conv2D(64, (3, 3))(digit_input)
x = Conv2D(64, (3, 3))(x)
x = MaxPooling2D((2, 2))(x)
out = Flatten()(x)
Finally, we create a model by specifying the input and output.
vision_model = Model(digit_input, out)
Of course, you will also need to specify the loss and optimizer using the compile method and train using the fit method, the same as we did for Sequential models.
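As a minimal sketch of what that looks like for this model (assuming x_train with shape (num_samples, 28, 28, 1) and integer digit labels y_train are already loaded; the Dense softmax head is added here just for illustration):

from keras.layers import Dense

# attach a hypothetical 10-class classification head to the flattened features
predictions = Dense(10, activation='softmax')(out)
classifier = Model(digit_input, predictions)
classifier.compile(optimizer='rmsprop',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])
classifier.fit(x_train, y_train, batch_size=32, epochs=10)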
Let’s use what we have just learned to build a VGG-16 network. It is a rather old and large network, but its simplicity makes it great for learning.
# input_shape is e.g. (224, 224, 3) and classes is 1000 for ImageNet
img_input = Input(shape=input_shape)

# Block 1
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

# Block 2
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

# Block 3
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

# Block 4
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

# Block 5
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)

x = Flatten(name='flatten')(x)
x = Dense(4096, activation='relu', name='fc1')(x)
x = Dense(4096, activation='relu', name='fc2')(x)
x = Dense(classes, activation='softmax', name='predictions')(x)
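These calls only define the computation graph; to get a trainable model, we wrap the input and output in the Model class, as in this sketch:

# x at this point is the softmax output from the block above
model = Model(img_input, x, name='vgg16')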
The complete code is provided in vgg16.py.
In this example, let’s run ImageNet predictions on some images, using the pre-trained VGG-16 from keras.applications. Let’s write the code for the same:
import numpy as np
from keras import applications
from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input, decode_predictions

model = applications.VGG16(weights='imagenet')

img = image.load_img('cat.jpeg', target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = model.predict(x)
for results in decode_predictions(preds):
    for result in results:
        print('Probability %0.2f%% => [%s]' % (100 * result[2], result[1]))
Next, let’s build a more complex network: SqueezeNet. Its basic building block is the fire module, which squeezes the input with 1 x 1 convolutions and then expands it with parallel 1 x 1 and 3 x 3 convolutions whose outputs are concatenated. Here is the code for one fire module:

# Squeeze part of the fire module: 1x1 convolutions, followed by ReLU
x = Convolution2D(squeeze, (1, 1), padding='valid', name='fire2/squeeze1x1')(x)
x = Activation('relu', name='fire2/relu_squeeze1x1')(x)

# The expand part has two portions; the left uses 1x1 convolutions and is called expand1x1
left = Convolution2D(expand, (1, 1), padding='valid', name='fire2/expand1x1')(x)
left = Activation('relu', name='fire2/relu_expand1x1')(left)

# The right portion uses 3x3 convolutions and is called expand3x3; both are followed by a
# ReLU layer, and note that both receive x as input as designed
right = Convolution2D(expand, (3, 3), padding='same', name='fire2/expand3x3')(x)
right = Activation('relu', name='fire2/relu_expand3x3')(right)

# The final output of the fire module is the concatenation of left and right
x = concatenate([left, right], axis=3, name='fire2/concat')
We can easily convert this code into a function for reuse:
First, define the layer-name constants and the URL of the pre-trained weights:

sq1x1 = "squeeze1x1"
exp1x1 = "expand1x1"
exp3x3 = "expand3x3"
relu = "relu_"

WEIGHTS_PATH = "https://github.com/rcmalli/keras-squeezenet/releases/download/v1.0/squeezenet_weights_tf_dim_ordering_tf_kernels.h5"
def fire_module(x, fire_id, squeeze=16, expand=64):
    s_id = 'fire' + str(fire_id) + '/'

    # squeeze part: 1x1 convolutions
    x = Convolution2D(squeeze, (1, 1), padding='valid', name=s_id + sq1x1)(x)
    x = Activation('relu', name=s_id + relu + sq1x1)(x)

    # expand part, left: 1x1 convolutions
    left = Convolution2D(expand, (1, 1), padding='valid', name=s_id + exp1x1)(x)
    left = Activation('relu', name=s_id + relu + exp1x1)(left)

    # expand part, right: 3x3 convolutions
    right = Convolution2D(expand, (3, 3), padding='same', name=s_id + exp3x3)(x)
    right = Activation('relu', name=s_id + relu + exp3x3)(right)

    # concatenate the two expand outputs along the channel axis
    x = concatenate([left, right], axis=3, name=s_id + 'concat')
    return x
The whole network is then just a stack of fire modules, pooling layers, and a final convolutional classifier:

x = Convolution2D(64, (3, 3), strides=(2, 2), padding='valid', name='conv1')(img_input)
x = Activation('relu', name='relu_conv1')(x)
x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool1')(x)

x = fire_module(x, fire_id=2, squeeze=16, expand=64)
x = fire_module(x, fire_id=3, squeeze=16, expand=64)
x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool3')(x)

x = fire_module(x, fire_id=4, squeeze=32, expand=128)
x = fire_module(x, fire_id=5, squeeze=32, expand=128)
x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool5')(x)

x = fire_module(x, fire_id=6, squeeze=48, expand=192)
x = fire_module(x, fire_id=7, squeeze=48, expand=192)
x = fire_module(x, fire_id=8, squeeze=64, expand=256)
x = fire_module(x, fire_id=9, squeeze=64, expand=256)

x = Dropout(0.5, name='drop9')(x)
x = Convolution2D(classes, (1, 1), padding='valid', name='conv10')(x)
x = Activation('relu', name='relu_conv10')(x)
x = GlobalAveragePooling2D()(x)
out = Activation('softmax', name='loss')(x)

model = Model(img_input, out, name='squeezenet')
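The WEIGHTS_PATH constant defined earlier is how the pre-trained ImageNet weights get into this model; a sketch of the usual pattern (get_file downloads the file and caches it locally):

from keras.utils.data_utils import get_file

# download (or reuse the cached copy of) the pre-trained weights and load them
weights_path = get_file('squeezenet_weights_tf_dim_ordering_tf_kernels.h5',
                        WEIGHTS_PATH, cache_subdir='models')
model.load_weights(weights_path)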
The complete network architecture is defined in the squeezenet.py file. We shall download the ImageNet pre-trained model and run predictions on our own image. Let’s quickly write some code to run this network:
import numpy as np
from keras_squeezenet import SqueezeNet
from keras.applications.imagenet_utils import preprocess_input, decode_predictions
from keras.preprocessing import image

model = SqueezeNet()

img = image.load_img('pexels-photo-280207.jpeg', target_size=(227, 227))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = model.predict(x)
all_results = decode_predictions(preds)
for results in all_results:
    for result in results:
        print('Probability %0.2f%% => [%s]' % (100 * result[2], result[1]))
Hopefully, this Keras-TensorFlow tutorial gave you a good introduction to Keras. We have learned:
1. Setting up and installing Keras with Tensorflow Backend.
2. Keras Sequential Models
3. Keras Functional API
4. Saving and loading saved weights in Keras
5. How to solve linear regression using Keras with example code.
As usual, the complete code can be downloaded from our GitHub repo. You can run all three examples by running these three files:
# Linear regression
python 1_keras_linear_regression.py

# VGG prediction; this downloads 500 MB of weights,
# so it will take a while to run and predict
python 2_run_vgg.py

# SqueezeNet prediction; the pre-trained model is only 5 MB. Wow!
python 3_run_squeezenet.py
Hopefully, this tutorial helps you learn Keras with TensorFlow. Do me a favour: if you find it useful, please share it with your friends and colleagues who are looking to learn deep learning and computer vision. Happy learning!