Hello World in Neural Networks


Iris Classification using Tensorflow

The MNIST database (a database of handwritten digits) is typically the “Hello World” application when introducing Neural Networks for the first time. However, we are going to make it even simpler by taking the iris dataset and creating a Keras-based Tensorflow Neural Network to classify species. Please refer to Iris Data to understand more about the dataset we will be working on. You can also refer to Classification in Python for a non-neural-network approach to classifying the species in the iris dataset.

Once you understand how to solve the iris classification problem in Neural Networks, we will move to image recognition. As you will see, structurally there is not a lot of difference in the way we build the neural net for both of these problems.

This is just a “Hello World” tutorial. It is not intended to teach you the internals of Neural Networks. With that background, we are now ready to say hello to Neural Networks using Tensorflow.

import tensorflow as tf
from   tensorflow import keras

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Tensorflow is probably the most popular open-source library from Google for implementing Deep Learning. You can build neural networks of any complexity using Tensorflow. However, building a neural net from scratch typically involves defining

  • Layers
  • Links between the layers
  • A loss function
  • Weight adjustments, etc.

Defining these manually is very time-consuming and daunting for newbies. What is needed is an abstraction layer above Tensorflow that makes building neural nets much quicker and easier.

Keras is the answer. Keras is a high-level, Python-based API that builds neural nets by leveraging Tensorflow. By the way, Tensorflow is not the only deep learning package out there. Here is a quick visual that shows you where Keras and Tensorflow stand in the hierarchy.

# load iris dataset
from sklearn import datasets
iris = datasets.load_iris()

# preview the iris data
print(iris.data[0:5, :])    # data
print(iris.target[0:5])     # target species

# train/test split @ 20% test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris.data , iris.target, test_size=0.2)  
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
[0 0 0 0 0]
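Note that train_test_split shuffles the rows randomly, so your exact split (and therefore the accuracy numbers later in this post) will vary from run to run. If you want a reproducible split, pass a fixed random_state; the value 42 below is an arbitrary choice:

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)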

The following 8 lines of code are all you need to solve the problem. Quickly execute it to see the output for yourself. However, there is quite a lot of explanation to be done here. Let’s take it step by step.

model = keras.Sequential()
model.add(keras.layers.Dense(4,input_shape=(4,)))
model.add(keras.layers.Dense(8,activation="relu"))
model.add(keras.layers.Dense(8,activation="relu"))
model.add(keras.layers.Dense(3,activation="softmax"))

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=100)

y_pred = model.predict(X_test)
y_pred
Epoch 1/100
120/120 [==============================] - 0s 521us/sample - loss: 1.6050 - acc: 0.2917
Epoch 2/100
120/120 [==============================] - 0s 37us/sample - loss: 1.5040 - acc: 0.2917
Epoch 3/100
120/120 [==============================] - 0s 33us/sample - loss: 1.4096 - acc: 0.2917
Epoch 4/100
120/120 [==============================] - 0s 29us/sample - loss: 1.3262 - acc: 0.4333
Epoch 5/100
120/120 [==============================] - 0s 29us/sample - loss: 1.2607 - acc: 0.5667
Epoch 6/100
120/120 [==============================] - 0s 29us/sample - loss: 1.2080 - acc: 0.4667
Epoch 7/100
120/120 [==============================] - 0s 33us/sample - loss: 1.1707 - acc: 0.4917
Epoch 8/100
120/120 [==============================] - 0s 29us/sample - loss: 1.1451 - acc: 0.5000
Epoch 9/100
120/120 [==============================] - 0s 33us/sample - loss: 1.1258 - acc: 0.5167
Epoch 10/100
120/120 [==============================] - 0s 62us/sample - loss: 1.1068 - acc: 0.5417
Epoch 11/100
120/120 [==============================] - 0s 54us/sample - loss: 1.0904 - acc: 0.5833
Epoch 12/100
120/120 [==============================] - 0s 37us/sample - loss: 1.0770 - acc: 0.5833
Epoch 13/100
120/120 [==============================] - 0s 33us/sample - loss: 1.0674 - acc: 0.5250
Epoch 14/100
120/120 [==============================] - 0s 33us/sample - loss: 1.0610 - acc: 0.3917
Epoch 15/100
120/120 [==============================] - 0s 33us/sample - loss: 1.0548 - acc: 0.3583
Epoch 16/100
120/120 [==============================] - 0s 33us/sample - loss: 1.0498 - acc: 0.3417
Epoch 17/100
120/120 [==============================] - 0s 33us/sample - loss: 1.0453 - acc: 0.3417
Epoch 18/100
120/120 [==============================] - 0s 54us/sample - loss: 1.0397 - acc: 0.3417
Epoch 19/100
120/120 [==============================] - 0s 33us/sample - loss: 1.0344 - acc: 0.3417
Epoch 20/100
120/120 [==============================] - 0s 41us/sample - loss: 1.0294 - acc: 0.3417
Epoch 21/100
120/120 [==============================] - 0s 29us/sample - loss: 1.0241 - acc: 0.3417
Epoch 22/100
120/120 [==============================] - 0s 29us/sample - loss: 1.0185 - acc: 0.3417
Epoch 23/100
120/120 [==============================] - 0s 37us/sample - loss: 1.0128 - acc: 0.3417
Epoch 24/100
120/120 [==============================] - 0s 33us/sample - loss: 1.0065 - acc: 0.3417
Epoch 25/100
120/120 [==============================] - 0s 37us/sample - loss: 0.9998 - acc: 0.3417
Epoch 26/100
120/120 [==============================] - 0s 54us/sample - loss: 0.9933 - acc: 0.3417
Epoch 27/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9869 - acc: 0.3583
Epoch 28/100
120/120 [==============================] - 0s 37us/sample - loss: 0.9786 - acc: 0.3667
Epoch 29/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9715 - acc: 0.4250
Epoch 30/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9627 - acc: 0.5500
Epoch 31/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9554 - acc: 0.6083
Epoch 32/100
120/120 [==============================] - 0s 50us/sample - loss: 0.9456 - acc: 0.6250
Epoch 33/100
120/120 [==============================] - 0s 42us/sample - loss: 0.9367 - acc: 0.6250
Epoch 34/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9272 - acc: 0.6333
Epoch 35/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9173 - acc: 0.6333
Epoch 36/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9076 - acc: 0.6500
Epoch 37/100
120/120 [==============================] - 0s 33us/sample - loss: 0.8972 - acc: 0.6500
Epoch 38/100
120/120 [==============================] - 0s 33us/sample - loss: 0.8869 - acc: 0.6500
Epoch 39/100
120/120 [==============================] - 0s 50us/sample - loss: 0.8768 - acc: 0.6583
Epoch 40/100
120/120 [==============================] - 0s 37us/sample - loss: 0.8661 - acc: 0.6583
Epoch 41/100
120/120 [==============================] - 0s 33us/sample - loss: 0.8548 - acc: 0.6583
Epoch 42/100
120/120 [==============================] - 0s 37us/sample - loss: 0.8442 - acc: 0.6583
Epoch 43/100
120/120 [==============================] - 0s 29us/sample - loss: 0.8339 - acc: 0.6667
Epoch 44/100
120/120 [==============================] - 0s 29us/sample - loss: 0.8220 - acc: 0.6667
Epoch 45/100
120/120 [==============================] - 0s 33us/sample - loss: 0.8111 - acc: 0.6667
Epoch 46/100
120/120 [==============================] - 0s 58us/sample - loss: 0.7997 - acc: 0.6750
Epoch 47/100
120/120 [==============================] - 0s 41us/sample - loss: 0.7883 - acc: 0.6833
Epoch 48/100
120/120 [==============================] - 0s 33us/sample - loss: 0.7770 - acc: 0.6833
Epoch 49/100
120/120 [==============================] - 0s 50us/sample - loss: 0.7658 - acc: 0.6750
Epoch 50/100
120/120 [==============================] - 0s 37us/sample - loss: 0.7541 - acc: 0.6750
Epoch 51/100
120/120 [==============================] - 0s 29us/sample - loss: 0.7431 - acc: 0.6917
Epoch 52/100
120/120 [==============================] - 0s 33us/sample - loss: 0.7317 - acc: 0.6917
Epoch 53/100
120/120 [==============================] - 0s 33us/sample - loss: 0.7211 - acc: 0.7167
Epoch 54/100
120/120 [==============================] - 0s 33us/sample - loss: 0.7099 - acc: 0.7250
Epoch 55/100
120/120 [==============================] - 0s 33us/sample - loss: 0.6991 - acc: 0.7167
Epoch 56/100
120/120 [==============================] - 0s 37us/sample - loss: 0.6885 - acc: 0.7167
Epoch 57/100
120/120 [==============================] - 0s 45us/sample - loss: 0.6782 - acc: 0.7083
Epoch 58/100
120/120 [==============================] - 0s 33us/sample - loss: 0.6684 - acc: 0.7083
Epoch 59/100
120/120 [==============================] - 0s 29us/sample - loss: 0.6599 - acc: 0.7167
Epoch 60/100
120/120 [==============================] - 0s 29us/sample - loss: 0.6481 - acc: 0.7667
Epoch 61/100
120/120 [==============================] - 0s 29us/sample - loss: 0.6382 - acc: 0.7583
Epoch 62/100
120/120 [==============================] - 0s 29us/sample - loss: 0.6286 - acc: 0.7750
Epoch 63/100
120/120 [==============================] - 0s 33us/sample - loss: 0.6196 - acc: 0.7667
Epoch 64/100
120/120 [==============================] - 0s 58us/sample - loss: 0.6111 - acc: 0.7667
Epoch 65/100
120/120 [==============================] - 0s 29us/sample - loss: 0.6018 - acc: 0.7833
Epoch 66/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5936 - acc: 0.7917
Epoch 67/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5860 - acc: 0.8000
Epoch 68/100
120/120 [==============================] - 0s 25us/sample - loss: 0.5769 - acc: 0.8250
Epoch 69/100
120/120 [==============================] - 0s 37us/sample - loss: 0.5688 - acc: 0.8167
Epoch 70/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5610 - acc: 0.8250
Epoch 71/100
120/120 [==============================] - 0s 37us/sample - loss: 0.5537 - acc: 0.8417
Epoch 72/100
120/120 [==============================] - 0s 54us/sample - loss: 0.5461 - acc: 0.8500
Epoch 73/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5397 - acc: 0.8417
Epoch 74/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5323 - acc: 0.8417
Epoch 75/100
120/120 [==============================] - 0s 37us/sample - loss: 0.5266 - acc: 0.8500
Epoch 76/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5184 - acc: 0.8583
Epoch 77/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5126 - acc: 0.8583
Epoch 78/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5079 - acc: 0.8500
Epoch 79/100
120/120 [==============================] - 0s 50us/sample - loss: 0.5031 - acc: 0.8583
Epoch 80/100
120/120 [==============================] - 0s 41us/sample - loss: 0.4950 - acc: 0.8500
Epoch 81/100
120/120 [==============================] - 0s 37us/sample - loss: 0.4883 - acc: 0.8500
Epoch 82/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4828 - acc: 0.8583
Epoch 83/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4782 - acc: 0.8583
Epoch 84/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4727 - acc: 0.8583
Epoch 85/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4672 - acc: 0.8750
Epoch 86/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4621 - acc: 0.8750
Epoch 87/100
120/120 [==============================] - 0s 45us/sample - loss: 0.4567 - acc: 0.8750
Epoch 88/100
120/120 [==============================] - 0s 45us/sample - loss: 0.4513 - acc: 0.8750
Epoch 89/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4470 - acc: 0.8833
Epoch 90/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4426 - acc: 0.8917
Epoch 91/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4374 - acc: 0.8833
Epoch 92/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4326 - acc: 0.8750
Epoch 93/100
120/120 [==============================] - 0s 37us/sample - loss: 0.4277 - acc: 0.8833
Epoch 94/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4234 - acc: 0.8833
Epoch 95/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4188 - acc: 0.8917
Epoch 96/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4139 - acc: 0.8917
Epoch 97/100
120/120 [==============================] - 0s 41us/sample - loss: 0.4095 - acc: 0.8917
Epoch 98/100
120/120 [==============================] - 0s 45us/sample - loss: 0.4054 - acc: 0.8917
Epoch 99/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4006 - acc: 0.8917
Epoch 100/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3962 - acc: 0.8917
array([[1.20811760e-02, 3.91645938e-01, 5.96272826e-01],
       [1.69933531e-02, 3.97484392e-01, 5.85522234e-01],
       [9.59960818e-01, 3.92647162e-02, 7.74465443e-04],
       [1.90705255e-01, 6.64598465e-01, 1.44696265e-01],
       [3.11258342e-03, 2.26298794e-01, 7.70588636e-01],
       [8.99805903e-01, 9.60557535e-02, 4.13831137e-03],
       [6.15397003e-03, 2.78556108e-01, 7.15289891e-01],
       [9.58240926e-01, 4.09574024e-02, 8.01660179e-04],
       [8.35558847e-02, 6.37535155e-01, 2.78908908e-01],
       [9.11576152e-01, 8.53549615e-02, 3.06887995e-03],
       [2.80173123e-02, 5.20794570e-01, 4.51188117e-01],
       [9.81949151e-01, 1.78453047e-02, 2.05568867e-04],
       [9.13475394e-01, 8.33630040e-02, 3.16164969e-03],
       [4.98204343e-02, 5.69957256e-01, 3.80222321e-01],
       [2.83193532e-02, 5.36988616e-01, 4.34692025e-01],
       [6.19469536e-03, 2.78104872e-01, 7.15700507e-01],
       [5.04648834e-02, 5.63345432e-01, 3.86189699e-01],
       [9.01798606e-01, 9.46312845e-02, 3.57011799e-03],
       [3.41202389e-03, 2.44403824e-01, 7.52184212e-01],
       [9.06935573e-01, 9.03311223e-02, 2.73334724e-03],
       [9.46662784e-01, 5.19549623e-02, 1.38220214e-03],
       [9.40084696e-01, 5.81936389e-02, 1.72167330e-03],
       [9.40235198e-01, 5.79454526e-02, 1.81935111e-03],
       [9.38879550e-01, 5.96898273e-02, 1.43059052e-03],
       [9.09764946e-01, 8.71445313e-02, 3.09041399e-03],
       [9.40479219e-01, 5.79398200e-02, 1.58103404e-03],
       [2.62060165e-01, 6.20771348e-01, 1.17168434e-01],
       [4.93753655e-03, 2.82564163e-01, 7.12498307e-01],
       [9.47779417e-01, 5.09882867e-02, 1.23227667e-03],
       [1.00235706e-02, 3.16601396e-01, 6.73375070e-01]], dtype=float32)

Step 1 – What type of neural network are we building?

model = keras.Sequential()

There are two types of Neural Networks that can be built in Keras:

  • Sequential
  • Functional

This classification relates to the structure of the Neural Network. Most of the time, however, you will be using the Sequential model, which can solve most problems. In a sequential neural net, neurons are arranged in layers and in sequence. The firing and wiring happen in sequence, hence the name. The difference will become clear later in the course when we see an example of a functional neural net. Here is a quick visual of what we are building.

Finally, when the network is trained, the output nodes corresponding to the species (for the corresponding data points) will light up. When we look at the last step, we will see what is meant by “light up”.
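Just as a quick preview of the functional alternative mentioned above (we will cover it properly later in the course), here is a minimal sketch of the same network written in functional style, where the layers are wired together explicitly; this is for illustration only and is not code we will use in this tutorial:

inputs  = keras.Input(shape=(4,))                          # replaces the explicit input layer
x       = keras.layers.Dense(8, activation="relu")(inputs)
x       = keras.layers.Dense(8, activation="relu")(x)
outputs = keras.layers.Dense(3, activation="softmax")(x)
model   = keras.Model(inputs=inputs, outputs=outputs)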

Step 2 – How are the neurons connected?

We are building a Dense neural network.

model.add(keras.layers.Dense(4,input_shape=(4,)))

A Dense neural network is one in which each neuron is connected to every neuron in the previous and next layers.

You can see from the visual below that the arrows coming in to each neuron are connected to all the neurons in the previous layer. This is the most typical type of neural network.

Also, with this statement, we are just building the input layer: an input layer with 4 nodes, one node for each of the inputs. Naturally, the assumption at this point would be that there are as many nodes in the input layer as there are inputs. So why specify the input_shape parameter at all? In later examples we will see that the shape of the input data need not always match the number of input nodes. We specify the input_shape parameter as a tuple. In this case the input is a 1-d vector; later in the course we will see examples of 2-d data.

The parameter input_shape is only used when creating the first layer. The next set of steps (hidden layer and output layer) do not need this parameter.
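For instance, in the MNIST section later in this post the input is a 2-d 28×28 image, and the first layer becomes

keras.layers.Flatten(input_shape=(28, 28))

with the tuple now carrying two dimensions instead of one.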

Step 3 – Hidden Layers

This is where the magic happens. Let’s try it with just one hidden layer.

model.add(keras.layers.Dense(8,activation="relu"))

Irrespective of the layer (input, hidden or output), layers are added using the same add function. That should make things easy for us. The new parameter you see in the hidden layer is the activation parameter.

In this case, we are using a relu activation function. ReLU stands for Rectified Linear Unit. The mathematical definition of relu is

relu(x) = max(0, x)

in other words, negative inputs are clamped to zero and positive inputs pass through unchanged. The output of the activation function looks like this.
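If you want to see the shape of the curve yourself, a short numpy sketch (not part of the tutorial’s model code) reproduces it:

# relu(x) = max(0, x), applied element-wise over a range of inputs
x = np.linspace(-5, 5, 100)
plt.plot(x, np.maximum(0, x))
plt.xlabel("x")
plt.ylabel("relu(x)")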

Step 4 – Output Layer

After the hidden layer is added, we add the output layer. Since we are doing a multi-class classification, the preferred activation function is softmax – more on this later. A softmax activation function outputs one probability per class, and the class with the highest probability is the predicted output.

model.add(keras.layers.Dense(3,activation="softmax"))
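To demystify softmax a little, here is a minimal numpy sketch of what the activation computes; the scores below are made-up numbers, not actual model output:

def softmax(z):
    # exponentiate (shifting by the max for numerical stability), then normalize
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # ~[0.659, 0.242, 0.099] - sums to 1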

Step 5 – Compile the model

So far, we have created the structure of the neural net – layer by layer. At each step, we have defined the number of nodes and the activation function to be used. Once we have completed it, we now have to compile the model.

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

So far, we have just defined what the neural net should look like. With the compile() method, Keras translates the parameters you have specified into an optimized series of steps that can then be executed on the computer. Without the compile step, you cannot fit (train) the model. We will see how we can use metrics in a bit, but the optimizer and loss parameters require quite a bit of explanation.

Typically, a Machine Learning algorithm requires some kind of loss function to be minimized, and an optimizer to do the minimizing; Gradient Descent is the classic optimization algorithm, and adam (used above) is a popular variant of it. For classification problems, a common loss function is Cross Entropy, also called Log Loss. Mathematically, for 2 classes, the cross entropy of a predicted probability p against an actual label y is

loss = -( y * log(p) + (1 - y) * log(1 - p) )

Let’s look at an example. Say we are just looking at 2 species of iris flowers:

0 – setosa
1 – virginica

If the actual species is virginica (y = 1), we can calculate the loss for a whole range of predicted probabilities and watch how it behaves.

import numpy as np

p = np.array([0.0001,0.001,0.01,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.99])

# when y = 1, what is the loss function ?
y = 1
l = - (y * np.log10(p) + (1-y) * np.log10(1-p)  )

# now plot it to see how the loss function decreases as the predicted value approaches the actual value (of y = 1)
import matplotlib.pyplot as plt
%matplotlib inline

plt.scatter(p,l)
plt.xlabel("Different predicted values when the actual value is 1")
plt.ylabel("Loss function")

What this plot shows is that the further the predicted value deviates from the actual value, the higher the loss. Conversely, as the predicted value gets close to the actual value (1 in this case), the loss gets closer and closer to 0.

At this point, you can see a quick summary of the model you have created so far.

model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_8 (Dense)              (None, 4)                 20        
_________________________________________________________________
dense_9 (Dense)              (None, 8)                 40        
_________________________________________________________________
dense_10 (Dense)             (None, 3)                 27        
=================================================================
Total params: 87
Trainable params: 87
Non-trainable params: 0
_________________________________________________________________
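The Param # column is easy to verify by hand: a Dense layer with n_in incoming values and n_out nodes has n_in × n_out weights plus n_out biases. A quick check (the helper function here is just for illustration):

def dense_params(n_in, n_out):
    return n_in * n_out + n_out     # weights + biases

print(dense_params(4, 4))   # 20 - input layer
print(dense_params(4, 8))   # 40 - hidden layer
print(dense_params(8, 3))   # 27 - output layer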

Step 6 – Fit the model with training data

This is where we train the model. The word epoch represents one complete pass over the training dataset. With each epoch, the weights are adjusted and the accuracy slowly increases. Since you asked for accuracy as a metric in step 5, it is shown at each training epoch; that way you can see how the accuracy improves with each epoch.

model.fit(X_train, y_train, epochs=100)

Epoch 1/100
120/120 [==============================] - 0s 446us/sample - loss: 1.6451 - acc: 0.2167
Epoch 2/100
120/120 [==============================] - 0s 33us/sample - loss: 1.5703 - acc: 0.2750
Epoch 3/100
120/120 [==============================] - 0s 29us/sample - loss: 1.5049 - acc: 0.3083
Epoch 4/100
120/120 [==============================] - 0s 29us/sample - loss: 1.4404 - acc: 0.3083
Epoch 5/100
120/120 [==============================] - 0s 29us/sample - loss: 1.3812 - acc: 0.3167
Epoch 6/100
120/120 [==============================] - 0s 41us/sample - loss: 1.3288 - acc: 0.3333
Epoch 7/100
120/120 [==============================] - 0s 33us/sample - loss: 1.2814 - acc: 0.3333
Epoch 8/100
120/120 [==============================] - 0s 37us/sample - loss: 1.2384 - acc: 0.3417
Epoch 9/100
120/120 [==============================] - 0s 33us/sample - loss: 1.2007 - acc: 0.3417
Epoch 10/100
120/120 [==============================] - 0s 29us/sample - loss: 1.1680 - acc: 0.3500
Epoch 11/100
120/120 [==============================] - 0s 33us/sample - loss: 1.1370 - acc: 0.3917
Epoch 12/100
120/120 [==============================] - 0s 37us/sample - loss: 1.1077 - acc: 0.4583
Epoch 13/100
120/120 [==============================] - 0s 45us/sample - loss: 1.0831 - acc: 0.5167
Epoch 14/100
120/120 [==============================] - 0s 45us/sample - loss: 1.0567 - acc: 0.5417
Epoch 15/100
120/120 [==============================] - 0s 37us/sample - loss: 1.0347 - acc: 0.5833
Epoch 16/100
120/120 [==============================] - 0s 33us/sample - loss: 1.0128 - acc: 0.6000
Epoch 17/100
120/120 [==============================] - 0s 29us/sample - loss: 0.9897 - acc: 0.6000
Epoch 18/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9706 - acc: 0.6167
Epoch 19/100
120/120 [==============================] - 0s 54us/sample - loss: 0.9528 - acc: 0.6167
Epoch 20/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9355 - acc: 0.6167
Epoch 21/100
120/120 [==============================] - 0s 37us/sample - loss: 0.9172 - acc: 0.6083
Epoch 22/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9005 - acc: 0.5917
Epoch 23/100
120/120 [==============================] - 0s 25us/sample - loss: 0.8838 - acc: 0.5833
Epoch 24/100
120/120 [==============================] - 0s 29us/sample - loss: 0.8681 - acc: 0.5917
Epoch 25/100
120/120 [==============================] - 0s 29us/sample - loss: 0.8518 - acc: 0.6417
Epoch 26/100
120/120 [==============================] - 0s 41us/sample - loss: 0.8370 - acc: 0.8000
Epoch 27/100
120/120 [==============================] - 0s 29us/sample - loss: 0.8212 - acc: 0.8500
Epoch 28/100
120/120 [==============================] - 0s 37us/sample - loss: 0.8063 - acc: 0.8750
Epoch 29/100
120/120 [==============================] - 0s 33us/sample - loss: 0.7920 - acc: 0.9000
Epoch 30/100
120/120 [==============================] - 0s 33us/sample - loss: 0.7775 - acc: 0.9000
Epoch 31/100
120/120 [==============================] - 0s 33us/sample - loss: 0.7639 - acc: 0.9083
Epoch 32/100
120/120 [==============================] - 0s 50us/sample - loss: 0.7499 - acc: 0.9000
Epoch 33/100
120/120 [==============================] - 0s 29us/sample - loss: 0.7373 - acc: 0.9000
Epoch 34/100
120/120 [==============================] - 0s 37us/sample - loss: 0.7235 - acc: 0.9000
Epoch 35/100
120/120 [==============================] - 0s 29us/sample - loss: 0.7112 - acc: 0.9000
Epoch 36/100
120/120 [==============================] - 0s 29us/sample - loss: 0.6985 - acc: 0.9000
Epoch 37/100
120/120 [==============================] - 0s 33us/sample - loss: 0.6864 - acc: 0.9083
Epoch 38/100
120/120 [==============================] - 0s 50us/sample - loss: 0.6744 - acc: 0.9250
Epoch 39/100
120/120 [==============================] - 0s 29us/sample - loss: 0.6631 - acc: 0.9333
Epoch 40/100
120/120 [==============================] - 0s 37us/sample - loss: 0.6517 - acc: 0.9333
Epoch 41/100
120/120 [==============================] - 0s 25us/sample - loss: 0.6411 - acc: 0.9417
Epoch 42/100
120/120 [==============================] - 0s 29us/sample - loss: 0.6303 - acc: 0.9333
Epoch 43/100
120/120 [==============================] - 0s 33us/sample - loss: 0.6197 - acc: 0.9333
Epoch 44/100
120/120 [==============================] - 0s 46us/sample - loss: 0.6098 - acc: 0.9333
Epoch 45/100
120/120 [==============================] - 0s 33us/sample - loss: 0.6000 - acc: 0.9333
Epoch 46/100
120/120 [==============================] - 0s 41us/sample - loss: 0.5902 - acc: 0.9333
Epoch 47/100
120/120 [==============================] - 0s 50us/sample - loss: 0.5814 - acc: 0.9333
Epoch 48/100
120/120 [==============================] - 0s 37us/sample - loss: 0.5721 - acc: 0.9333
Epoch 49/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5637 - acc: 0.9417
Epoch 50/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5550 - acc: 0.9417
Epoch 51/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5469 - acc: 0.9417
Epoch 52/100
120/120 [==============================] - 0s 37us/sample - loss: 0.5393 - acc: 0.9667
Epoch 53/100
120/120 [==============================] - 0s 41us/sample - loss: 0.5317 - acc: 0.9667
Epoch 54/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5239 - acc: 0.9667
Epoch 55/100
120/120 [==============================] - 0s 37us/sample - loss: 0.5167 - acc: 0.9667
Epoch 56/100
120/120 [==============================] - 0s 37us/sample - loss: 0.5096 - acc: 0.9667
Epoch 57/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5027 - acc: 0.9667
Epoch 58/100
120/120 [==============================] - 0s 50us/sample - loss: 0.4964 - acc: 0.9667
Epoch 59/100
120/120 [==============================] - 0s 41us/sample - loss: 0.4897 - acc: 0.9667
Epoch 60/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4837 - acc: 0.9667
Epoch 61/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4774 - acc: 0.9667
Epoch 62/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4716 - acc: 0.9667
Epoch 63/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4660 - acc: 0.9667
Epoch 64/100
120/120 [==============================] - 0s 54us/sample - loss: 0.4602 - acc: 0.9667
Epoch 65/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4547 - acc: 0.9667
Epoch 66/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4495 - acc: 0.9667
Epoch 67/100
120/120 [==============================] - 0s 41us/sample - loss: 0.4440 - acc: 0.9667
Epoch 68/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4390 - acc: 0.9667
Epoch 69/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4339 - acc: 0.9667
Epoch 70/100
120/120 [==============================] - 0s 45us/sample - loss: 0.4292 - acc: 0.9667
Epoch 71/100
120/120 [==============================] - 0s 37us/sample - loss: 0.4243 - acc: 0.9667
Epoch 72/100
120/120 [==============================] - 0s 41us/sample - loss: 0.4196 - acc: 0.9667
Epoch 73/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4149 - acc: 0.9750
Epoch 74/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4105 - acc: 0.9750
Epoch 75/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4062 - acc: 0.9750
Epoch 76/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4018 - acc: 0.9750
Epoch 77/100
120/120 [==============================] - 0s 50us/sample - loss: 0.3976 - acc: 0.9750
Epoch 78/100
120/120 [==============================] - 0s 41us/sample - loss: 0.3936 - acc: 0.9750
Epoch 79/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3903 - acc: 0.9750
Epoch 80/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3852 - acc: 0.9750
Epoch 81/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3817 - acc: 0.9750
Epoch 82/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3781 - acc: 0.9750
Epoch 83/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3740 - acc: 0.9750
Epoch 84/100
120/120 [==============================] - 0s 41us/sample - loss: 0.3703 - acc: 0.9750
Epoch 85/100
120/120 [==============================] - 0s 37us/sample - loss: 0.3664 - acc: 0.9750
Epoch 86/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3630 - acc: 0.9750
Epoch 87/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3591 - acc: 0.9750
Epoch 88/100
120/120 [==============================] - 0s 37us/sample - loss: 0.3557 - acc: 0.9750
Epoch 89/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3523 - acc: 0.9750
Epoch 90/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3488 - acc: 0.9750
Epoch 91/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3454 - acc: 0.9750
Epoch 92/100
120/120 [==============================] - 0s 45us/sample - loss: 0.3422 - acc: 0.9750
Epoch 93/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3389 - acc: 0.9750
Epoch 94/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3357 - acc: 0.9750
Epoch 95/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3329 - acc: 0.9750
Epoch 96/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3296 - acc: 0.9750
Epoch 97/100
120/120 [==============================] - 0s 25us/sample - loss: 0.3267 - acc: 0.9750
Epoch 98/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3232 - acc: 0.9750
Epoch 99/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3203 - acc: 0.9750
Epoch 100/100
120/120 [==============================] - 0s 58us/sample - loss: 0.3171 - acc: 0.9750

After finishing 100 epochs, the training accuracy is around 97% – not bad for our first attempt. (Because the train/test split is random, your exact numbers will differ.) In Step 9 we will experiment with wider and deeper variants of the network.
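Instead of scrolling through the epoch log, you can also plot the learning curve. fit() returns a History object whose history dictionary holds the per-epoch metrics, so you could capture it like this; note that the accuracy key is 'acc' on the older Tensorflow release used here and 'accuracy' on newer releases:

history = model.fit(X_train, y_train, epochs=100)

plt.plot(history.history['acc'])    # use 'accuracy' on newer Tensorflow versions
plt.xlabel("epoch")
plt.ylabel("training accuracy")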

Step 7 – Predict data

Now that the model is trained, you can start predicting on your test data. This step is pretty straightforward if you have already used sklearn to generate predictions from any machine learning model.

y_pred = model.predict(X_test)
y_pred
array([[4.80523752e-03, 1.61355630e-01, 8.33839059e-01],
       [5.65165561e-03, 1.38419360e-01, 8.55928957e-01],
       [9.37631130e-01, 5.78489937e-02, 4.51981043e-03],
       [9.99230742e-02, 7.17553735e-01, 1.82523206e-01],
       [9.50973015e-04, 1.86273277e-01, 8.12775731e-01],
       [8.69464993e-01, 1.15551665e-01, 1.49833458e-02],
       [6.78034406e-03, 2.44014740e-01, 7.49204934e-01],
       [9.12608325e-01, 8.10388103e-02, 6.35283068e-03],
       [6.19196370e-02, 7.35043049e-01, 2.03037351e-01],
       [8.91312957e-01, 9.75011438e-02, 1.11858230e-02],
       [1.72961298e-02, 3.84219527e-01, 5.98484337e-01],
       [9.59164917e-01, 3.88094820e-02, 2.02561123e-03],
       [9.01787043e-01, 8.78127143e-02, 1.04002040e-02],
       [4.25860547e-02, 4.14889425e-01, 5.42524457e-01],
       [2.64684074e-02, 6.92561448e-01, 2.80970186e-01],
       [3.44479713e-03, 2.71285832e-01, 7.25269318e-01],
       [4.74239029e-02, 6.01970136e-01, 3.50605994e-01],
       [8.60095024e-01, 1.25155374e-01, 1.47496713e-02],
       [1.49409531e-03, 2.42252737e-01, 7.56253123e-01],
       [8.37987006e-01, 1.46805704e-01, 1.52072664e-02],
       [8.97249877e-01, 9.39026028e-02, 8.84753559e-03],
       [9.13183808e-01, 7.87739381e-02, 8.04229267e-03],
       [8.98115098e-01, 9.22952741e-02, 9.58956406e-03],
       [8.81850123e-01, 1.08397752e-01, 9.75214690e-03],
       [8.75269353e-01, 1.11916631e-01, 1.28139891e-02],
       [9.16219890e-01, 7.63572678e-02, 7.42286025e-03],
       [1.50582254e-01, 6.88981295e-01, 1.60436377e-01],
       [8.01152084e-03, 4.72976834e-01, 5.19011676e-01],
       [9.26478028e-01, 6.74546957e-02, 6.06729649e-03],
       [8.82636756e-03, 2.28457645e-01, 7.62715995e-01]], dtype=float32)

Since this is a multi-class problem, what the neural net outputs are probabilities, one per class. The highest probability among the three elements marks the predicted class. However, we need to convert these probabilities back to class indices.

y_pred_class = np.argmax(y_pred,axis=1)
y_pred_class
array([2, 2, 0, 1, 2, 0, 2, 0, 1, 0, 2, 0, 0, 2, 1, 2, 1, 0, 2, 0, 0, 0,
       0, 0, 0, 0, 1, 2, 0, 2], dtype=int64)

Step 8 – Evaluate Model

Since the output is categorical data, a quick confusion matrix will show us how far off the model is. scikit-learn’s confusion_matrix will do: each row is an actual class and each column a predicted class, so anything off the diagonal is a misclassification.

from sklearn.metrics import confusion_matrix
 
cm = confusion_matrix(y_test, y_pred_class)
print ( cm )

[[15  0  0]
 [ 0  5  1]
 [ 0  0  9]]

And of course the final number – accuracy.

from sklearn.metrics import accuracy_score
 
accuracy_score(y_test,y_pred_class)

0.9666666666666667

At about 97%, that is already competitive by Machine Learning standards for this dataset. Still, let’s see whether we can optimize it further.

Step 9 – Optimize Model

There are a couple of ways to optimize for higher accuracy. One way is to increase the number of nodes in the hidden layer. Let’s increase it from 8 to 20 and see how the network performs.

model = keras.Sequential()
model.add(keras.layers.Dense(4,input_shape=(4,)))
# BEGIN change - increase the number of nodes from 8 to 20
model.add(keras.layers.Dense(20,activation="relu"))
# END change
model.add(keras.layers.Dense(3,activation="softmax"))

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=100)

y_pred = model.predict(X_test)
y_pred_class = np.argmax(y_pred,axis=1)

from sklearn.metrics import accuracy_score
 
accuracy_score(y_test,y_pred_class)

Epoch 1/100
120/120 [==============================] - 0s 484us/sample - loss: 1.9111 - acc: 0.2917
Epoch 2/100
120/120 [==============================] - 0s 29us/sample - loss: 1.7657 - acc: 0.2917
Epoch 3/100
120/120 [==============================] - 0s 33us/sample - loss: 1.6348 - acc: 0.2917
Epoch 4/100
120/120 [==============================] - 0s 33us/sample - loss: 1.5117 - acc: 0.2917
Epoch 5/100
120/120 [==============================] - 0s 33us/sample - loss: 1.4100 - acc: 0.2917
Epoch 6/100
120/120 [==============================] - 0s 33us/sample - loss: 1.3202 - acc: 0.2917
Epoch 7/100
120/120 [==============================] - 0s 29us/sample - loss: 1.2427 - acc: 0.2917
Epoch 8/100
120/120 [==============================] - 0s 33us/sample - loss: 1.1696 - acc: 0.2917
Epoch 9/100
120/120 [==============================] - 0s 45us/sample - loss: 1.1183 - acc: 0.2917
Epoch 10/100
120/120 [==============================] - 0s 37us/sample - loss: 1.0729 - acc: 0.4083
Epoch 11/100
120/120 [==============================] - 0s 29us/sample - loss: 1.0347 - acc: 0.4667
Epoch 12/100
120/120 [==============================] - 0s 29us/sample - loss: 1.0049 - acc: 0.5000
Epoch 13/100
120/120 [==============================] - 0s 29us/sample - loss: 0.9793 - acc: 0.5500
Epoch 14/100
120/120 [==============================] - 0s 29us/sample - loss: 0.9566 - acc: 0.5917
Epoch 15/100
120/120 [==============================] - 0s 29us/sample - loss: 0.9369 - acc: 0.5833
Epoch 16/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9184 - acc: 0.6000
Epoch 17/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9012 - acc: 0.6000
Epoch 18/100
120/120 [==============================] - 0s 33us/sample - loss: 0.8860 - acc: 0.6083
Epoch 19/100
120/120 [==============================] - 0s 33us/sample - loss: 0.8717 - acc: 0.6083
Epoch 20/100
120/120 [==============================] - 0s 29us/sample - loss: 0.8571 - acc: 0.6083
Epoch 21/100
120/120 [==============================] - 0s 33us/sample - loss: 0.8440 - acc: 0.6083
Epoch 22/100
120/120 [==============================] - 0s 50us/sample - loss: 0.8301 - acc: 0.6250
Epoch 23/100
120/120 [==============================] - 0s 29us/sample - loss: 0.8142 - acc: 0.6333
Epoch 24/100
120/120 [==============================] - 0s 33us/sample - loss: 0.7965 - acc: 0.6417
Epoch 25/100
120/120 [==============================] - 0s 37us/sample - loss: 0.7782 - acc: 0.6417
Epoch 26/100
120/120 [==============================] - 0s 29us/sample - loss: 0.7605 - acc: 0.6417
Epoch 27/100
120/120 [==============================] - 0s 37us/sample - loss: 0.7451 - acc: 0.6583
Epoch 28/100
120/120 [==============================] - 0s 45us/sample - loss: 0.7296 - acc: 0.6667
Epoch 29/100
120/120 [==============================] - 0s 33us/sample - loss: 0.7159 - acc: 0.6917
Epoch 30/100
120/120 [==============================] - 0s 29us/sample - loss: 0.7041 - acc: 0.7417
Epoch 31/100
120/120 [==============================] - 0s 37us/sample - loss: 0.6930 - acc: 0.7833
Epoch 32/100
120/120 [==============================] - 0s 37us/sample - loss: 0.6817 - acc: 0.7833
Epoch 33/100
120/120 [==============================] - 0s 50us/sample - loss: 0.6706 - acc: 0.7750
Epoch 34/100
120/120 [==============================] - 0s 33us/sample - loss: 0.6605 - acc: 0.7833
Epoch 35/100
120/120 [==============================] - 0s 37us/sample - loss: 0.6506 - acc: 0.8250
Epoch 36/100
120/120 [==============================] - 0s 33us/sample - loss: 0.6410 - acc: 0.8083
Epoch 37/100
120/120 [==============================] - 0s 37us/sample - loss: 0.6306 - acc: 0.8000
Epoch 38/100
120/120 [==============================] - 0s 41us/sample - loss: 0.6215 - acc: 0.8500
Epoch 39/100
120/120 [==============================] - 0s 37us/sample - loss: 0.6125 - acc: 0.8583
Epoch 40/100
120/120 [==============================] - 0s 37us/sample - loss: 0.6036 - acc: 0.8833
Epoch 41/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5949 - acc: 0.8833
Epoch 42/100
120/120 [==============================] - 0s 46us/sample - loss: 0.5869 - acc: 0.8750
Epoch 43/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5784 - acc: 0.8833
Epoch 44/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5705 - acc: 0.8917
Epoch 45/100
120/120 [==============================] - 0s 37us/sample - loss: 0.5629 - acc: 0.8917
Epoch 46/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5553 - acc: 0.8917
Epoch 47/100
120/120 [==============================] - 0s 50us/sample - loss: 0.5480 - acc: 0.8917
Epoch 48/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5409 - acc: 0.8833
Epoch 49/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5341 - acc: 0.8917
Epoch 50/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5273 - acc: 0.8833
Epoch 51/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5213 - acc: 0.9167
Epoch 52/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5150 - acc: 0.8917
Epoch 53/100
120/120 [==============================] - 0s 41us/sample - loss: 0.5080 - acc: 0.8917
Epoch 54/100
120/120 [==============================] - 0s 37us/sample - loss: 0.5011 - acc: 0.9167
Epoch 55/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4956 - acc: 0.9250
Epoch 56/100
120/120 [==============================] - 0s 37us/sample - loss: 0.4905 - acc: 0.9250
Epoch 57/100
120/120 [==============================] - 0s 37us/sample - loss: 0.4836 - acc: 0.9250
Epoch 58/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4779 - acc: 0.9250
Epoch 59/100
120/120 [==============================] - 0s 50us/sample - loss: 0.4727 - acc: 0.9250
Epoch 60/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4669 - acc: 0.9250
Epoch 61/100
120/120 [==============================] - 0s 25us/sample - loss: 0.4612 - acc: 0.9250
Epoch 62/100
120/120 [==============================] - 0s 41us/sample - loss: 0.4557 - acc: 0.9250
Epoch 63/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4508 - acc: 0.9250
Epoch 64/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4456 - acc: 0.9250
Epoch 65/100
120/120 [==============================] - 0s 46us/sample - loss: 0.4406 - acc: 0.9250
Epoch 66/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4357 - acc: 0.9250
Epoch 67/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4306 - acc: 0.9333
Epoch 68/100
120/120 [==============================] - 0s 46us/sample - loss: 0.4253 - acc: 0.9417
Epoch 69/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4216 - acc: 0.9333
Epoch 70/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4159 - acc: 0.9250
Epoch 71/100
120/120 [==============================] - 0s 41us/sample - loss: 0.4108 - acc: 0.9417
Epoch 72/100
120/120 [==============================] - 0s 41us/sample - loss: 0.4066 - acc: 0.9417
Epoch 73/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4021 - acc: 0.9417
Epoch 74/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3976 - acc: 0.9417
Epoch 75/100
120/120 [==============================] - 0s 37us/sample - loss: 0.3932 - acc: 0.9333
Epoch 76/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3888 - acc: 0.9417
Epoch 77/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3840 - acc: 0.9500
Epoch 78/100
120/120 [==============================] - 0s 49us/sample - loss: 0.3796 - acc: 0.9417
Epoch 79/100
120/120 [==============================] - 0s 37us/sample - loss: 0.3755 - acc: 0.9417
Epoch 80/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3714 - acc: 0.9417
Epoch 81/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3671 - acc: 0.9417
Epoch 82/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3628 - acc: 0.9500
Epoch 83/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3595 - acc: 0.9417
Epoch 84/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3549 - acc: 0.9500
Epoch 85/100
120/120 [==============================] - 0s 41us/sample - loss: 0.3507 - acc: 0.9417
Epoch 86/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3470 - acc: 0.9417
Epoch 87/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3444 - acc: 0.9500
Epoch 88/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3406 - acc: 0.9417
Epoch 89/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3351 - acc: 0.9583
Epoch 90/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3318 - acc: 0.9583
Epoch 91/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3281 - acc: 0.9583
Epoch 92/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3252 - acc: 0.9583
Epoch 93/100
120/120 [==============================] - 0s 41us/sample - loss: 0.3211 - acc: 0.9417
Epoch 94/100
120/120 [==============================] - 0s 45us/sample - loss: 0.3174 - acc: 0.9500
Epoch 95/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3142 - acc: 0.9500
Epoch 96/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3102 - acc: 0.9583
Epoch 97/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3081 - acc: 0.9500
Epoch 98/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3040 - acc: 0.9583
Epoch 99/100
120/120 [==============================] - 0s 33us/sample - loss: 0.2998 - acc: 0.9667
Epoch 100/100
120/120 [==============================] - 0s 29us/sample - loss: 0.2974 - acc: 0.9500
0.9666666666666667

We are still at about 97% accuracy, so the extra nodes did not change the test accuracy here; that is pretty much what most classic ML models achieve on this dataset. Let’s try keeping the number of nodes the same, but adding one more hidden layer.

model = keras.Sequential()
model.add(keras.layers.Dense(4,input_shape=(4,)))
model.add(keras.layers.Dense(8,activation="relu"))
# BEGIN Change - add one more hidden layer
model.add(keras.layers.Dense(8,activation="relu"))
# END Change

model.add(keras.layers.Dense(3,activation="softmax"))

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=100)

y_pred = model.predict(X_test)
y_pred_class = np.argmax(y_pred,axis=1)

from sklearn.metrics import accuracy_score
 
accuracy_score(y_test,y_pred_class)
Epoch 1/100
120/120 [==============================] - 0s 574us/sample - loss: 1.9741 - acc: 0.2333
Epoch 2/100
120/120 [==============================] - 0s 33us/sample - loss: 1.8711 - acc: 0.3417
Epoch 3/100
120/120 [==============================] - 0s 33us/sample - loss: 1.7824 - acc: 0.3417
Epoch 4/100
120/120 [==============================] - 0s 33us/sample - loss: 1.7114 - acc: 0.3417
Epoch 5/100
120/120 [==============================] - 0s 46us/sample - loss: 1.6428 - acc: 0.3417
Epoch 6/100
120/120 [==============================] - 0s 37us/sample - loss: 1.5780 - acc: 0.3417
Epoch 7/100
120/120 [==============================] - 0s 33us/sample - loss: 1.5234 - acc: 0.3417
Epoch 8/100
120/120 [==============================] - 0s 37us/sample - loss: 1.4688 - acc: 0.3417
Epoch 9/100
120/120 [==============================] - 0s 54us/sample - loss: 1.4162 - acc: 0.3417
Epoch 10/100
120/120 [==============================] - 0s 29us/sample - loss: 1.3703 - acc: 0.3417
Epoch 11/100
120/120 [==============================] - 0s 37us/sample - loss: 1.3316 - acc: 0.3417
Epoch 12/100
120/120 [==============================] - 0s 45us/sample - loss: 1.2942 - acc: 0.3417
Epoch 13/100
120/120 [==============================] - 0s 33us/sample - loss: 1.2585 - acc: 0.3417
Epoch 14/100
120/120 [==============================] - 0s 37us/sample - loss: 1.2258 - acc: 0.3417
Epoch 15/100
120/120 [==============================] - 0s 33us/sample - loss: 1.2001 - acc: 0.3417
Epoch 16/100
120/120 [==============================] - 0s 54us/sample - loss: 1.1726 - acc: 0.3417
Epoch 17/100
120/120 [==============================] - 0s 37us/sample - loss: 1.1488 - acc: 0.3417
Epoch 18/100
120/120 [==============================] - 0s 33us/sample - loss: 1.1179 - acc: 0.3583
Epoch 19/100
120/120 [==============================] - 0s 37us/sample - loss: 1.0965 - acc: 0.3833
Epoch 20/100
120/120 [==============================] - 0s 45us/sample - loss: 1.0837 - acc: 0.4167
Epoch 21/100
120/120 [==============================] - 0s 37us/sample - loss: 1.0740 - acc: 0.4583
Epoch 22/100
120/120 [==============================] - 0s 37us/sample - loss: 1.0643 - acc: 0.4583
Epoch 23/100
120/120 [==============================] - 0s 33us/sample - loss: 1.0557 - acc: 0.4667
Epoch 24/100
120/120 [==============================] - 0s 37us/sample - loss: 1.0479 - acc: 0.4583
Epoch 25/100
120/120 [==============================] - 0s 38us/sample - loss: 1.0408 - acc: 0.4500
Epoch 26/100
120/120 [==============================] - 0s 33us/sample - loss: 1.0339 - acc: 0.4333
Epoch 27/100
120/120 [==============================] - 0s 50us/sample - loss: 1.0268 - acc: 0.4583
Epoch 28/100
120/120 [==============================] - 0s 33us/sample - loss: 1.0209 - acc: 0.4417
Epoch 29/100
120/120 [==============================] - 0s 33us/sample - loss: 1.0147 - acc: 0.4583
Epoch 30/100
120/120 [==============================] - 0s 41us/sample - loss: 1.0085 - acc: 0.4667
Epoch 31/100
120/120 [==============================] - 0s 41us/sample - loss: 1.0032 - acc: 0.4917
Epoch 32/100
120/120 [==============================] - 0s 45us/sample - loss: 0.9976 - acc: 0.5167
Epoch 33/100
120/120 [==============================] - 0s 37us/sample - loss: 0.9920 - acc: 0.5250
Epoch 34/100
120/120 [==============================] - 0s 37us/sample - loss: 0.9871 - acc: 0.5167
Epoch 35/100
120/120 [==============================] - 0s 37us/sample - loss: 0.9814 - acc: 0.5333
Epoch 36/100
120/120 [==============================] - 0s 45us/sample - loss: 0.9770 - acc: 0.5250
Epoch 37/100
120/120 [==============================] - 0s 37us/sample - loss: 0.9714 - acc: 0.5333
Epoch 38/100
120/120 [==============================] - 0s 29us/sample - loss: 0.9655 - acc: 0.5333
Epoch 39/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9607 - acc: 0.5250
Epoch 40/100
120/120 [==============================] - 0s 46us/sample - loss: 0.9554 - acc: 0.5333
Epoch 41/100
120/120 [==============================] - 0s 41us/sample - loss: 0.9513 - acc: 0.5333
Epoch 42/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9445 - acc: 0.5417
Epoch 43/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9397 - acc: 0.5333
Epoch 44/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9336 - acc: 0.5417
Epoch 45/100
120/120 [==============================] - 0s 54us/sample - loss: 0.9280 - acc: 0.5500
Epoch 46/100
120/120 [==============================] - 0s 29us/sample - loss: 0.9226 - acc: 0.5417
Epoch 47/100
120/120 [==============================] - 0s 37us/sample - loss: 0.9163 - acc: 0.5500
Epoch 48/100
120/120 [==============================] - 0s 45us/sample - loss: 0.9104 - acc: 0.5500
Epoch 49/100
120/120 [==============================] - 0s 33us/sample - loss: 0.9049 - acc: 0.5500
Epoch 50/100
120/120 [==============================] - 0s 50us/sample - loss: 0.8987 - acc: 0.5500
Epoch 51/100
120/120 [==============================] - 0s 29us/sample - loss: 0.8940 - acc: 0.5500
Epoch 52/100
120/120 [==============================] - 0s 33us/sample - loss: 0.8874 - acc: 0.5500
Epoch 53/100
120/120 [==============================] - 0s 33us/sample - loss: 0.8819 - acc: 0.5500
Epoch 54/100
120/120 [==============================] - 0s 37us/sample - loss: 0.8735 - acc: 0.5500
Epoch 55/100
120/120 [==============================] - 0s 33us/sample - loss: 0.8543 - acc: 0.5500
Epoch 56/100
120/120 [==============================] - 0s 54us/sample - loss: 0.8298 - acc: 0.6083
Epoch 57/100
120/120 [==============================] - 0s 29us/sample - loss: 0.8125 - acc: 0.8333
Epoch 58/100
120/120 [==============================] - 0s 33us/sample - loss: 0.7931 - acc: 0.8667
Epoch 59/100
120/120 [==============================] - 0s 41us/sample - loss: 0.7758 - acc: 0.9000
Epoch 60/100
120/120 [==============================] - 0s 29us/sample - loss: 0.7576 - acc: 0.8917
Epoch 61/100
120/120 [==============================] - 0s 50us/sample - loss: 0.7374 - acc: 0.9083
Epoch 62/100
120/120 [==============================] - 0s 33us/sample - loss: 0.7191 - acc: 0.9000
Epoch 63/100
120/120 [==============================] - 0s 33us/sample - loss: 0.7007 - acc: 0.9000
Epoch 64/100
120/120 [==============================] - 0s 45us/sample - loss: 0.6813 - acc: 0.9000
Epoch 65/100
120/120 [==============================] - 0s 33us/sample - loss: 0.6627 - acc: 0.9167
Epoch 66/100
120/120 [==============================] - 0s 41us/sample - loss: 0.6429 - acc: 0.9250
Epoch 67/100
120/120 [==============================] - 0s 45us/sample - loss: 0.6240 - acc: 0.9250
Epoch 68/100
120/120 [==============================] - 0s 33us/sample - loss: 0.6069 - acc: 0.9333
Epoch 69/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5888 - acc: 0.9333
Epoch 70/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5711 - acc: 0.9333
Epoch 71/100
120/120 [==============================] - 0s 29us/sample - loss: 0.5546 - acc: 0.9333
Epoch 72/100
120/120 [==============================] - 0s 33us/sample - loss: 0.5388 - acc: 0.9333
Epoch 73/100
120/120 [==============================] - 0s 37us/sample - loss: 0.5228 - acc: 0.9417
Epoch 74/100
120/120 [==============================] - 0s 50us/sample - loss: 0.5071 - acc: 0.9417
Epoch 75/100
120/120 [==============================] - 0s 37us/sample - loss: 0.4926 - acc: 0.9417
Epoch 76/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4793 - acc: 0.9333
Epoch 77/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4666 - acc: 0.9417
Epoch 78/100
120/120 [==============================] - 0s 37us/sample - loss: 0.4537 - acc: 0.9417
Epoch 79/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4419 - acc: 0.9417
Epoch 80/100
120/120 [==============================] - 0s 33us/sample - loss: 0.4310 - acc: 0.9417
Epoch 81/100
120/120 [==============================] - 0s 41us/sample - loss: 0.4205 - acc: 0.9417
Epoch 82/100
120/120 [==============================] - 0s 37us/sample - loss: 0.4103 - acc: 0.9500
Epoch 83/100
120/120 [==============================] - 0s 29us/sample - loss: 0.4006 - acc: 0.9500
Epoch 84/100
120/120 [==============================] - 0s 37us/sample - loss: 0.3916 - acc: 0.9500
Epoch 85/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3833 - acc: 0.9500
Epoch 86/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3753 - acc: 0.9500
Epoch 87/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3673 - acc: 0.9500
Epoch 88/100
120/120 [==============================] - 0s 54us/sample - loss: 0.3601 - acc: 0.9667
Epoch 89/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3529 - acc: 0.9667
Epoch 90/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3459 - acc: 0.9500
Epoch 91/100
120/120 [==============================] - 0s 29us/sample - loss: 0.3391 - acc: 0.9500
Epoch 92/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3339 - acc: 0.9583
Epoch 93/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3268 - acc: 0.9583
Epoch 94/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3209 - acc: 0.9500
Epoch 95/100
120/120 [==============================] - 0s 37us/sample - loss: 0.3152 - acc: 0.9500
Epoch 96/100
120/120 [==============================] - 0s 41us/sample - loss: 0.3095 - acc: 0.9500
Epoch 97/100
120/120 [==============================] - 0s 33us/sample - loss: 0.3042 - acc: 0.9583
Epoch 98/100
120/120 [==============================] - 0s 37us/sample - loss: 0.2990 - acc: 0.9583
Epoch 99/100
120/120 [==============================] - 0s 29us/sample - loss: 0.2942 - acc: 0.9667
Epoch 100/100
120/120 [==============================] - 0s 29us/sample - loss: 0.2887 - acc: 0.9583
0.9333333333333333

That’s about 93% accuracy, actually a touch lower this time. (With a dataset this small and a random train/test split, run-to-run variation of a few percent is normal.) The immediate question you might have is: how should you choose the number of nodes or the number of hidden layers? Unfortunately, the learned weights are essentially a black box to humans; we cannot read an architecture recipe out of them.

Choosing the size and complexity of a neural network (like the number of nodes and the number of hidden layers) is more art than science.
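If you prefer something more systematic than trial and error, the dataset is small enough to brute-force a few candidate sizes. Here is a minimal sketch; the candidate node counts are arbitrary choices:

from sklearn.metrics import accuracy_score

for nodes in [8, 20, 64]:
    m = keras.Sequential()
    m.add(keras.layers.Dense(4, input_shape=(4,)))
    m.add(keras.layers.Dense(nodes, activation="relu"))
    m.add(keras.layers.Dense(3, activation="softmax"))
    m.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    m.fit(X_train, y_train, epochs=100, verbose=0)   # verbose=0 suppresses the epoch log
    acc = accuracy_score(y_test, np.argmax(m.predict(X_test), axis=1))
    print(nodes, "hidden nodes ->", round(acc, 4))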

MNIST handwritten digits classification

If all we had to do in Neural Networks was classify iris data, we wouldn’t need Neural Networks to begin with. We need a more involved dataset to qualify as a “Hello World” program in Neural Networks. Welcome the MNIST digits dataset, a dataset of handwritten digit images that have been scanned, standardized and optimized for machine learning. Tensorflow comes built in with this dataset. Let’s quickly load it to see what these images look like.

import tensorflow as tf
from tensorflow import keras

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# load the data
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Show the first picture in the training dataset.
plt.figure()
plt.imshow(train_images[0])
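matplotlib renders this single-channel image with its default colormap, which is why it looks colorful; to display it in actual grayscale you can pass the standard cmap option:

plt.imshow(train_images[0], cmap='gray')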

It is a handwritten digit – 5. As noted above, these are gray-scale images even though the default rendering looks colorful. All of the data is standardized into 28×28 pixels, and each pixel has an intensity value between 0 and 255 (2^8 = 256 gray levels). Since this is a small image (just 28×28 pixels), we can write it to a csv file and look at the numbers directly.

# write the first image to a csv file.
np.savetxt("image_1.csv",train_images[0],delimiter=",")

(Side note: if np.savetxt raises a PermissionError here, the csv file is most likely still open in another program such as Excel, or the working directory is not writable; close the file or point savetxt at a writable path and rerun.)

If you open the csv file in Excel, adjust the column widths and change the number format to zero decimals, you should see a picture like this. Can you identify the digit 5 in there?

Confirm that the image is in fact a 5.

# Print the first label
train_labels[0]

Prepare the model. The first layer in this case is slightly different from the first layer in the iris example above. As you can see from the input image data, each image is a 28×28 numpy array. However, we are going to be working with flat layers (a flat set of neurons in each layer). So the first layer will essentially be a 784-node layer (28 × 28 = 784) that is created automatically by flattening the input array.

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=6)
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)
predictions = model.predict(test_images)
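As an aside, one common preprocessing step, which the code above skips, is scaling the pixel intensities from the 0–255 range down to 0–1 before calling fit(); this usually helps the optimizer converge and often buys a little extra accuracy:

# scale intensities to [0, 1] before training
train_images = train_images / 255.0
test_images  = test_images  / 255.0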

The prediction for each image is a 10 × 1 vector of probabilities. The position with the highest probability in the output array is the predicted digit. For example, let’s check the label of the first image in the test set.

test_labels[0]

It is a 7. Let’s see how the probabilities are predicted.

np.savetxt("predictions.csv",predictions[0:10],delimiter=",")
predicted_final[0:10]

You can see from the array above that the highest probability value is at index 7. Let’s apply numpy’s argmax function to pull out the index of the highest probability for each prediction; argmax() returns the index of the largest element in a numpy array, so the first output row has an argmax of 7.

Let’s now apply argmax to the entire prediction array.

predicted_final = np.argmax(predictions,axis=1)
from sklearn.metrics import accuracy_score
 
accuracy_score(test_labels,predicted_final)

That’s about 95% accuracy on the test dataset.

Great! Our “Hello World” of Neural Networks is complete. Over the next couple of days we will focus on the moving parts of a neural network and on how Gradient Descent is used to optimize the weights; this is how the neural network essentially learns.
