In 2014, Google acquired a company called DeepMind for a reported 400 million dollars. At the time, DeepMind had no product and no revenue; what it had was a team of some of the best researchers in the world working on machine learning and deep learning. In fact, between 2013 and 2015, the technology giants (Facebook, Google, Microsoft, Apple) acquired many companies for huge sums that had no revenue or products but were working on deep learning and machine learning. So, let’s understand why deep learning and machine learning are such crucial technologies.

Artificial Intelligence (AI)?

AI is a broad term that has been around for a while. It refers to the ability of a machine or a robot to perform tasks that normally require human (or some other form of) intelligence. The dream of creating a world where robots are as smart as humans and do all the boring work for us is not new. However, this dream has been elusive so far.

Programs such as chess engines, video games, and ticket booking systems all have some level of artificial intelligence. However, each of these systems has limited intelligence: it solves one specific problem and handles only particular, hard-coded (predefined) situations. Another important point is that, because these systems have a limited understanding of their surroundings and environment, they can only deal with structured data. For example, you need to fill in a form to interact with an online ticket booking system. You can’t simply type “I need flight tickets to go to New York on Friday evening”. So, such software has narrow, task-specific intelligence and can’t handle unstructured data.

So, a more practical way to describe AI is the way Jeff Bezos, CEO of Amazon, put it in one of his letters:

“Over the past decades computers have broadly automated tasks that programmers could describe with clear rules and algorithms. Modern machine learning techniques now allow us to do the same for tasks where describing the precise rules is much harder.”

History of AI:

The idea that a machine can have intelligence is quite old. John McCarthy coined the term “Artificial Intelligence” for the famous 1956 Dartmouth workshop. Since then, we have made huge progress towards solving the puzzle of “creating intelligence”. The AI community has had a bumpy ride since its inception. Typically, some invention would set off huge expectations, resulting in a boom period for AI research with large funding and investments. Soon, however, it would be followed by a period of disappointment and scarce funding. These periods are known as AI winters. Look at the timeline.

History of AI with timeline

AI researchers have tried various techniques and methodologies to build systems that can mimic human intelligence. Often these systems worked in demos with specific settings and constrained environments but were useless for general real-world use cases. Much of this work involved hand-coding representations of the data, a practice called feature engineering, to solve specific problems like identifying objects. For example, in order to identify a cat, we needed to hand-craft features that captured four legs, a tail, whiskers, a round face, and so on. Most of these approaches failed because of the enormous number of variations possible in the real world.

Also, because of feature engineering, it took a great deal of expertise and time to build even simple systems. These systems were simple, and the programmer often had to identify the explicit rules and then code them as program logic.

Machine Learning?

Machine learning, in a broad sense, refers to techniques by which machines learn from the data shown to them. Learning from the environment requires some sort of intelligence. Hence, the field of artificial intelligence is a superset of machine learning.

Most of the time, the objective of a machine learning algorithm is to find the underlying pattern in some data or observations and then create a system that can be used to predict a future event. For example, a machine learning system trained on the historical data of people who took loans from a bank can learn patterns from that data. Once this system is trained, we can use it to predict the probability of default for a new loan application. Before machine learning, humans had to find such patterns manually (for example, that females have a higher probability of paying back loans, or that people with jobs in large companies have a higher probability of default) and hard-code them as logic in computer programs. Finding these rules needed a human expert who had worked with the system for years, so building these programs was slow and expensive. Machine learning systems can sometimes identify and exploit highly complex patterns that human-dependent systems missed.
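
To make the contrast between a hard-coded rule and a learned one concrete, here is a minimal sketch. The data, the single feature (a debt-to-income ratio), and the 0.60 cutoff are all made up for illustration, and a real system would predict a probability from many features rather than a yes/no answer from one threshold.

```python
# Hypothetical toy history: (debt_to_income_ratio, defaulted) pairs.
history = [
    (0.10, 0), (0.15, 0), (0.22, 0), (0.30, 0),
    (0.35, 1), (0.42, 0), (0.48, 1), (0.55, 1), (0.70, 1),
]

def hand_coded_rule(ratio):
    """A rule an expert might hard-code: flag ratios above a guessed cutoff."""
    return 1 if ratio > 0.60 else 0

def learn_threshold(examples):
    """Learn the cutoff from data: keep the one with the fewest mistakes."""
    best_cutoff, best_errors = None, len(examples) + 1
    for cutoff in sorted(r for r, _ in examples):
        errors = sum((1 if r >= cutoff else 0) != label for r, label in examples)
        if errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff, best_errors

def count_errors(predict, examples):
    """Number of historical examples a prediction function gets wrong."""
    return sum(predict(r) != label for r, label in examples)

cutoff, learned_errors = learn_threshold(history)
print("learned cutoff:", cutoff, "mistakes:", learned_errors)
print("hand-coded mistakes:", count_errors(hand_coded_rule, history))
```

On this toy history, the learned cutoff makes fewer mistakes than the guessed one, which is the whole point: the rule comes from the data instead of from an expert's intuition.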

Algorithms based on Learning style:

Machine learning algorithms can be divided into three types based on the way they learn from the data:

  1. Unsupervised learning: 

Unsupervised learning is used when the training data is not labeled; in other words, we don’t know the outcome. For example, using an unsupervised learning algorithm, you can organize 20 images of 5 people into 5 piles, each containing photos of only one person, without knowing who is who. Similarly, in a list of bank customers, we can cluster customers with similar behaviors. In fact, most unsupervised learning algorithms are used to cluster data.

  2. Semi-supervised learning:

In Semi-supervised learning, some examples are labeled and some are not. 

  3. Supervised learning:

Supervised learning is used when we have labeled training data, i.e. we have both the observations and the outcomes. The system that predicts the probability of loan default from past observations of defaulting accounts is an example of a supervised learning system.
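
The difference between the learning styles can be sketched on one tiny dataset. Everything below is an illustrative assumption: the six numbers stand for a hypothetical customer feature, the supervised part is a nearest-centroid classifier, and the unsupervised part is a two-cluster k-means with a deliberately simple initialisation.

```python
# Hypothetical 1-D feature (say, average monthly spend) for six customers.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["low", "low", "low", "high", "high", "high"]  # used only when supervised

def mean(values):
    return sum(values) / len(values)

def fit_supervised(xs, ys):
    """Supervised: use the labels to compute one centroid per known class."""
    return {c: mean([x for x, y in zip(xs, ys) if y == c]) for c in set(ys)}

def predict_supervised(centroids, x):
    """Predict the class whose centroid is nearest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def cluster_unsupervised(xs, steps=10):
    """Unsupervised: two-cluster k-means; groups are found without labels."""
    centers = [min(xs), max(xs)]  # simple initialisation for this sketch
    assignment = []
    for _ in range(steps):
        # Assign each point to its nearest centre ...
        assignment = [min((0, 1), key=lambda j: abs(x - centers[j])) for x in xs]
        # ... then move each centre to the mean of its assigned points.
        for j in (0, 1):
            members = [x for x, a in zip(xs, assignment) if a == j]
            if members:
                centers[j] = mean(members)
    return assignment, centers

centroids = fit_supervised(points, labels)
print(predict_supervised(centroids, 4.0))   # a named class, because we had labels

assignment, centers = cluster_unsupervised(points)
print(assignment)                            # anonymous groups 0 and 1, no names
```

The supervised model can answer with a class name because it saw the names during training. The unsupervised one can only say which points belong together, just like the piles of photos in the example above.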


Other Machine Learning algorithms:

  1. Regression Algorithms
  2. Instance-based Algorithms: KNN
  3. Bayesian algorithms: Naive Bayes, Bayesian Networks
  4. Clustering algorithms: K-means clustering
  5. Artificial Neural Networks: Deep Learning, CNN, Auto-encoders
  6. Dimensionality Reduction Algorithms: PCA, PCR
  7. Reinforcement Learning
  8. Recommendation systems


Regression algorithms are a set of statistical processes that learn the relationship between a dependent variable and one or more independent variables, and use it for prediction. For example, regression can be used to learn how drinking habits affect the number of accidents caused by drivers. There are many kinds of regression algorithms, such as Linear Regression and Logistic Regression.
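
As a rough illustration of the simplest case, the sketch below fits a straight line with ordinary least squares using one independent variable. The numbers are invented purely for the example and do not come from any real study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: drinks per week (independent) vs. accidents per 1000 drivers (dependent).
drinks = [0, 2, 4, 6, 8]
accidents = [1.0, 2.0, 3.0, 4.0, 5.0]

slope, intercept = fit_line(drinks, accidents)
predicted = slope * 10 + intercept  # prediction for an unseen value of 10 drinks
print(slope, intercept, predicted)
```

Once the slope and intercept are learned, predicting for a new value is a single multiplication and addition.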

In this course, we shall focus on supervised learning and on deep learning, a branch of artificial neural network-based algorithms.

Building a supervised learning system:

Let’s say we want to build a supervised learning system that tells cats from dogs. This is a classification problem. To build a supervised learning system, we need to collect a lot of labeled data to train it. We also need to choose the kind of algorithm to use; let’s say we choose artificial neural networks. Building such a system has two phases:

1. Training the model:

During the training phase, we feed training images to our model, and for each image the model predicts the probability of each of the possible labels. For example, in the image below, the model predicts a probability of 0.80 that the image contains a hot dog, i.e. it is 80% confident that the image contains a hot dog. The probabilities over all labels sum to 1.

An image classification system
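
One common way to obtain label probabilities that sum to 1 is the softmax function. The text above does not say which method this classifier uses, so treat the following as a sketch under that assumption; the label names and raw scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score avoids overflow and does not change the result.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["hot dog", "pizza", "burger"]   # hypothetical label set
raw_scores = [3.0, 1.0, 0.5]              # hypothetical raw model outputs

probs = softmax(raw_scores)
for label, p in zip(labels, probs):
    print(f"{label}: {p:.2f}")
print("sum:", sum(probs))
```

The label with the highest raw score also gets the highest probability, and the remaining probability mass is shared among the other labels.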

When the training process starts, the model will probably produce random outputs for input images. All models have trainable variables (weights and biases) that can be learned for various problems. During training, we need to adjust these variables so that, for every training example of a hot dog, our model outputs the maximum probability (1) for the label Hot Dog and, conversely, for every image that is not a hot dog, our model is extremely confident that the image does not contain a hot dog. So, if at a given time the output of the network for an input image is $y_i$ and the expected output is $y$, we want to minimize the difference $y_i - y$ for all the images. To do this, we actually minimize the sum of squares of these differences, with a factor of 0.5 added to simplify the calculation (it cancels the 2 that appears when the square is differentiated), i.e. we want to minimize $0.5 \sum_{i=1}^{n} (y_i - y)^2$. This term is known as the cost, and if we are able to reduce the cost by choosing appropriate network weights and biases, we have succeeded in building our classifier.
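
The cost described above can be computed in a few lines. This is a minimal sketch with hypothetical outputs for four training images, where each image has its own expected value (1 for a hot dog, 0 otherwise).

```python
def cost(outputs, expected):
    """Half the sum of squared differences between outputs and expected values."""
    return 0.5 * sum((y_i - y) ** 2 for y_i, y in zip(outputs, expected))

# Hypothetical model outputs and the true labels for four training images.
outputs = [0.8, 0.3, 0.9, 0.1]
expected = [1, 0, 1, 0]

print(cost(outputs, expected))    # small but non-zero: the model is not perfect
print(cost(expected, expected))   # a perfect model would have zero cost
```

The closer the outputs get to the expected values, the smaller this number becomes, which is why training tries to drive it down.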

In the case of artificial neural networks, these variables are called weights, and most neural networks contain millions of them. This is why we need a huge number of training examples to train neural networks. During the training process, we show an image to the neural network, adjust the weights using a process called back-propagation, and check the output again. We change the weights slowly so that a few wrong examples in the training data don’t take us too far from our equilibrium. The size of each change to the weights is controlled by a parameter called the learning rate.
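
To show how the learning rate scales each weight update, here is a sketch of gradient descent on a single sigmoid neuron with one weight and one bias. This is the simplest possible case of back-propagation, not a full neural network, and the data, the learning rate of 0.5, and the number of passes are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, learning_rate=0.5, epochs=2000):
    """Gradient descent on one sigmoid neuron using the 0.5 * squared-error cost."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for x, target in examples:
            output = sigmoid(weight * x + bias)
            # Chain rule: d(cost)/d(z) = (output - target) * output * (1 - output)
            delta = (output - target) * output * (1.0 - output)
            # Take a small step against the gradient, scaled by the learning rate.
            weight -= learning_rate * delta * x
            bias -= learning_rate * delta
    return weight, bias

# Hypothetical one-feature data: the label is 1 when the feature is large.
examples = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]

weight, bias = train(examples)
predictions = [round(sigmoid(weight * x + bias)) for x, _ in examples]
print(predictions)
```

A larger learning rate makes each step bigger and training faster but less stable; a smaller one makes training slower but less sensitive to a few misleading examples.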

After the training is complete, we save the state of the model and use this for prediction on an unseen image. 

2. Deploying the model to production:

After training is complete, our model has been optimized so that it can look at any image and identify whether it shows a hot dog or not.

An image classification system

Examples of Machine Learning Systems in real life:

There are two fundamental ways that machine learning and AI systems are changing the world. 

      1. Machine Intelligence unearths hidden patterns:

AI and machine learning are particularly good at analyzing huge amounts of data without getting tired or bored, and at identifying previously unknown, hidden patterns.

When a massive object (a galaxy or a black hole) comes between a distant light source and an observer on Earth, it bends the space and light around it. This creates a lens that gives scientists a closer look at very old, remote parts of our Universe that are normally blocked from view. Such a lens is called a gravitational lens, and these lenses are very important for understanding the Universe. The first gravitational lens was discovered in 1979, and since then around 100 of them have been discovered. Discovering them has been a slow and serendipitous process, as it involves going through huge amounts of image data to identify them. Now, astronomers have created an AI that can look for them tirelessly and pass its findings on to a human for verification. This will hugely speed up the process of discovering them.

      2. Changing the Internet to more human-friendly interfaces:

Humans interact with each other using speech and vision, but most popular internet systems require us to select from a few drop-downs and click a few buttons. As a result, people need training to use them. For example, my parents still go to the bank to deposit and withdraw money. They still call me when they need to book a flight. With AI, it’s now possible for computers to understand and process information in natural language, speech, and visual data. In the coming years, most existing systems will either evolve into more human-friendly versions or be replaced by AI-first companies. This will create trillions of dollars of opportunities.

Here are a few very popular examples of AI systems currently used:

  1. Spam Detection, automatic email categorization, smart reply feature in Gmail 
  2. Face Detection and verification
  3. Product recommendations at Amazon, song recommendations at Spotify
  4. Speech Understanding in Siri, Cortana and Amazon Alexa
  5. Credit card fraud detection, underwriting of loans 
  6. Medical Diagnosis for eye diseases, AI-assisted radiology for reading x-rays
  7. Self-driving cars 
  8. Amazon Go stores for auto check-out
  9. Auto reply in Google Allo
  10. Mobile Check deposits by
  11. Turnitin, a plagiarism checking tool
  12. Pinterest Image search and visually similar feature


The Two Towers (Computation and Data):

There are two important factors that have given rise to today’s new era of AI and Deep Learning.

1. Rise of Internet and smartphones:

With the rise of the internet, social networks, and smartphones, it has become extremely cheap to create data (text, images, audio, and video). As a result, billions of images, videos, and audio clips are uploaded and shared on the internet every day, and they can be used for training machine learning models. Before the internet and smartphones, training datasets of this size were impossible to imagine.

2. Advances in computing (Graphics Processing Units):

In the last few decades, the computing capacity of our computers has grown by leaps and bounds. The falling cost and growing capability of computing are ideal for the rise of machine learning, which is typically very compute-heavy and earlier required supercomputers for training. Modern machine learning models are almost exclusively trained on GPUs, which can be 5-10 times faster than a comparable CPU for this kind of work.


ImageNet Project:

ImageNet is a project started by Stanford professor Fei-Fei Li, in which she created a large dataset of labeled images of commonly seen real-world objects like dogs, cars, airplanes, etc. (Her TED talk is a recommended watch.) The ImageNet project is an ongoing effort and currently has 14,197,122 images from 21,841 different categories. Since 2010, ImageNet has run an annual competition in visual recognition in which participants are provided with 1.2 million images belonging to 1000 different classes from the ImageNet dataset. Each participating team then builds computer vision algorithms to solve the classification problem for these 1000 classes. This competition is called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and is considered the annual Olympics of computer vision, with participants from across the globe including the finest of academia and industry.


AlexNet: A Landmark Moment:

Alex Krizhevsky changed the world when he won the ImageNet challenge in 2012 using a convolutional neural network (CNN) for the image classification task. AlexNet achieved a top-5 accuracy of 84.7% in the classification task, while the team that came second had a top-5 accuracy of 73.8%, a record-breaking and unprecedented difference. Before this, CNNs (and the people who were working on them) were not very popular in the computer vision community. After this, however, the tables turned. Soon, most computer vision researchers started working on CNNs, and accuracy has improved significantly over the last 4-5 years.


Deep Learning Gold-Rush:

After AlexNet, deep learning had everyone’s attention. In the next few years, most machine learning research (both in academia and in industry) was focused on deep learning. Currently, for most machine learning problems, the state-of-the-art results use deep learning. For example, let’s look at the state-of-the-art results for image classification on the ImageNet challenge.


Year  Model    Approach             Top-5 accuracy* (%)
2012  AlexNet  Deep learning based  84.7
2013  ZFNet    Deep learning based  88.9
2014  VGGNet   Deep learning based  93.6
2015  ResNet   Deep learning based  96.4

* Top-5 accuracy: to calculate top-5 accuracy, we count a prediction as correct if any one of the model’s top 5 predictions is the correct label.
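
The definition of top-5 accuracy can be sketched as follows. The scores, the six labels, and the three images are hypothetical; a real ImageNet evaluation would rank 1000 classes per image.

```python
def top_k_accuracy(score_rows, true_labels, k=5):
    """Fraction of examples whose true label is among the k highest-scored labels."""
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        ranked = sorted(scores, key=scores.get, reverse=True)  # labels, best first
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)

# Hypothetical model scores over six labels for three images.
score_rows = [
    {"cat": 0.40, "dog": 0.30, "fox": 0.10, "car": 0.10, "cup": 0.06, "bus": 0.04},
    {"cat": 0.05, "dog": 0.10, "fox": 0.15, "car": 0.20, "cup": 0.20, "bus": 0.30},
    {"cat": 0.02, "dog": 0.50, "fox": 0.20, "car": 0.15, "cup": 0.08, "bus": 0.05},
]
true_labels = ["cat", "cat", "fox"]

top1 = top_k_accuracy(score_rows, true_labels, k=1)
top5 = top_k_accuracy(score_rows, true_labels, k=5)
print(top1, top5)
```

Top-5 accuracy is never lower than top-1 accuracy, because a prediction that is right at rank 1 is also right within the first five ranks.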

On this task, human-level top-5 accuracy is estimated at close to 95%. Since then, deep learning systems have reached near human-level accuracy on many tasks, be it speech recognition, machine translation, or visual recognition. For the same reason, jobs, investments, and acquisitions involving companies and people who work on AI and deep learning have been growing rapidly. And we are just at the beginning of a revolution. So, it’s the best time to invest in AI and deep learning.