Subplots

Elements of a plot

Figure and Axes

Before getting into subplots, let us first discuss the two main elements of a plot – the Figure and the Axes.

Figure

A Figure object can be thought of as a window on which plots are rendered and it contains all the plotting elements.

Axes

The Axes is the actual plotting area contained within the figure object. There can be multiple axes objects in one figure object. An Axes contains the x-axis, y-axis, data points, lines, ticks, text, etc.

Interfaces

The Matplotlib library provides two important interfaces for rendering plots:

1. State-machine interface
2. Object Oriented interface


What is a state-machine?

A finite state machine is an abstract concept wherein a system can exist in a finite number of states, but at any given point in time it can be in exactly one of these states. Depending on the operations performed on the state machine, it can change from one state to another.

State-machine interface

The figure on which plots are rendered behaves like a state-machine system. The state-machine interface keeps track of the current figure, the current axes and the current state of the figure, and hence is called the stateful interface. It makes calls to various methods via the pyplot module and is therefore also known as the pyplot interface. When a method is called through the pyplot module, figure and axes objects are created implicitly and changes are made to the figure, thereby changing its state. We do not define any objects or variables when using this interface; we simply call methods defined in the pyplot module and the changes appear in the figure.

The pyplot interface is useful for rendering simple plots with minimal code, or for quickly testing ideas, because we can issue a command and immediately see the result. But when working on multiple plots it becomes difficult to keep track of the active figure and the plots.
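For example, here is a minimal stateful sketch – every call goes through pyplot, which keeps track of the current figure behind the scenes:

from matplotlib import pyplot as plt

# pyplot implicitly creates a figure and axes and tracks them as 'current'
plt.plot([1,2,3,4],[1,4,9,16])
plt.title('Stateful plotting')
plt.xlabel('x')
plt.ylabel('y')
plt.show()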

Object-oriented interface

Matplotlib also supports the object oriented interface or the stateless interface. This interface is called stateless because instead of creating a global figure instance, a figure object is created and referenced with a variable. We can then directly call methods on this variable to change the different elements in a plot. Because we are storing a reference to the plotting objects in variables, it is easy to track the objects that we are working on. The object-oriented interface is a more pythonic and recommended way to use Matplotlib.

The pyplot interface is in fact built on top of the object-oriented interface. The top layer is the front-end interface, convenient for quickly testing simple plots without much programming: calling methods such as plot, scatter, bar, hist, etc., from the pyplot module creates a figure instance implicitly and generates the respective plots. The bottom layer can be used to develop complicated visualizations, as it provides more control and allows for customization of the plots.

A figure object can be created using pyplot. In the object oriented interface as well, pyplot is used to create the figure and the axes objects. In order to render the plots, we then call the various methods directly on these objects.
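A minimal sketch of this style – the same kind of plot, but with explicit references to the figure and axes objects:

from matplotlib import pyplot as plt

fig = plt.figure()          # keep a reference to the figure
ax = fig.add_subplot(111)   # keep a reference to the axes
ax.plot([1,2,3,4],[1,4,9,16])
ax.set_title('Object-oriented plotting')
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()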

Subplots

Subplot method

Up until now, we have seen various plots such as line plots, bar charts, pie charts, scatter plots, etc. In many examples we generated one plot per figure, and we have also seen examples where we generated multiple plots on one figure. What if you are required to place plots side by side, or one plot on top of the other, in a figure? This can be achieved with subplots. A subplot is a plot that takes up only a part of the figure object. In Matplotlib, a subplot is also referred to as an axes. We will now see how to generate subplots using the object-oriented approach.

The subplot method is used to divide a figure into multiple subplots:

subplot(m, n, index, **kwargs)

The subplot method returns an axes object and divides the figure into a matrix of m rows and n columns, creating m*n subplots in one figure. The index number gives the location of the current axes for plotting. The index starts from 1, and the subplots are numbered from left to right, row by row, from the first row to the m-th row.

from matplotlib import pyplot as plt
import numpy as np
fig1 = plt.figure(num=1,figsize=(10,6))
ax1  = plt.subplot(111)
x = np.arange(1,10)
y = np.arange(1,10)
ax1.plot(x,y,color='c',marker='p')
plt.title('Subplot_1')
plt.show()
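To see how the index runs left to right, row by row, here is a small sketch that labels each position in a 2×3 grid:

from matplotlib import pyplot as plt

fig = plt.figure(figsize=(8,5))
for index in range(1,7):            # 2 rows x 3 columns = 6 subplots
    ax = plt.subplot(2,3,index)     # index 1 is top-left, 3 is top-right, 6 is bottom-right
    ax.set_title('index = %d' % index)
plt.tight_layout()
plt.show()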


Subplots method

In order to plot using the subplot function, the given subplot first has to be set as the current/active axes by specifying its index number, which becomes tedious if there are many subplots. Matplotlib therefore also provides the subplots method, which returns a figure object and a NumPy array of axes objects.

subplots(m=1, n=1, sharex=False, sharey=False, squeeze=True, subplot_kw=None, gridspec_kw=None, **fig_kw)

fig2, ax = plt.subplots(2,1,sharex='col')
x = np.arange(1,11)
y = np.arange(1,11)
z = np.random.randn(10)

# Subplot1
ax[0].plot(x,y,color='g',marker='p',label='axes1')
ax[0].set_title('Axes1')

# Subplot2
ax[1].plot(x,z,marker='o',label='axes2')
ax[1].set_title('Axes2')

plt.tight_layout()
plt.show()
fig2, ax = plt.subplots(2,2,sharex='col',sharey='row')
x = np.arange(1,11)
y = np.arange(1,11)
z = np.random.randn(10)

#Subplot1
ax[0][0].plot(x,y,color='g',marker='p',label='axes1')
ax[0][0].set_title('Axes1')
ax[0][0].legend()

#Subplot2
ax[0][1].scatter(x,y,color='k',marker='^',label='axes2')
ax[0][1].set_title('Axes2')
ax[0][1].legend()

#Subplot3
ax[1][0].plot(x,z,marker='v',label='axes3')
ax[1][0].set_title('Axes3')
ax[1][0].legend()

#Subplot4
ax[1][1].scatter(x,z,marker='o',label='axes4')
ax[1][1].set_title('Axes4')
ax[1][1].legend()

plt.tight_layout()
plt.show()

Add axes method

You can also specify the position and the dimensions of an axes using the add_axes method.

add_axes([left, bottom, width, height])

A list is passed to this method containing the elements:

  • left, bottom (coordinates of lower left corner of the axes).
  • width, height of the axes object.

The values for all the 4 parameters range in between 0 and 1, representing the fraction of the figure that the axes occupies.

fig3 = plt.figure()
ax3 = fig3.add_axes([0.1,0.1,0.8,0.8])
inset_ax = fig3.add_axes([0.15,.6,0.2,0.2])
ax3.plot(x,y)
inset_ax.plot(x,y)
plt.show()

In the above example, we have a smaller inset axes enclosed within a larger axes.

Scatter Plots

Scatter plots are used to represent the relation between two variables, one plotted along the x-axis and the other along the y-axis. They are used to spot trends and the correlation between two variables, i.e., how much one variable is affected by another. In a scatter plot, a dot or small circle represents a single data point. If the data points are clustered together, the relation between the two variables is strong; if the data points are widely spread across the plot, the relation is weak.

2D Scatter Plot

Correlation

In the example below, a simple scatter plot is plotted with two variables x,y using the ‘scatter’ function defined in the pyplot module. The data points are represented with dots. It can be seen that the two variables have a positive correlation, which means if the value of one variable increases, the value of the other variable increases as well and vice versa.

%matplotlib inline
from matplotlib import pyplot as plt 
import numpy as np
x = np.random.randint(0,50,200)
y = x**2+x+2
plt.scatter(x,y,c='magenta',linewidth=1,edgecolor='green')
plt.title('2D Scatter Plot')
plt.xlabel('x-->')
plt.ylabel('y-->')
plt.show()

Analysing the Iris dataset using Scatter Plots

Visualizing Patterns

In the next example, we will analyse the iris dataset using a scatter plot. The iris data set contains measurements in centimeters for the characteristics – sepal length and width, and petal length and width, for 150 flowers from 3 species of iris flowers – Iris setosa, versicolor and virginica. The dataset contains 50 instances for each species. The data was collected over several years by Edgar Anderson, a biologist, who used the data to show that the measurements could be used to differentiate between different species of irises. The iris flower dataset is now widely used for testing purposes in computer science. In order to plot a scatter plot, we will use the characteristics – petal length and width as variables.

# read the data file
import csv
with open (r'C:\Users\Ajay Tech\Documents\Matplotlib\iris.csv') as csv_file:
    input_file = csv.reader(csv_file,delimiter = ',')
    Header = next(input_file)
    sepal_length = []
    sepal_width = []
    petal_length = []
    petal_width = []
    for row in input_file:
        sepal_length.append(float(row[0]))
        sepal_width.append(float(row[1]))
        petal_length.append(float(row[2]))
        petal_width.append(float(row[3]))

# plot the data        
# 50 flowers per species: rows 0-49 setosa, 50-99 versicolor, 100-149 virginica
plt.scatter(petal_length[:50],petal_width[:50],c='red',label='Iris-setosa')
plt.scatter(petal_length[50:100],petal_width[50:100],c='green',label='Iris-versicolor')
plt.scatter(petal_length[100:],petal_width[100:],c='blue',label='Iris-virginica')
plt.title('Iris Data')
plt.xlabel('petal_length')
plt.ylabel('petal_width')
plt.legend()
plt.show()

We have used different colours to differentiate between the three species in the plot. We can clearly see from the plot that the data points are separated into three different groups despite the slight overlap between Versicolor and Virginica. From the distribution of data points on the plot we can infer that the three species have different petal sizes. Flowers belonging to the Setosa species can be easily distinguished from the other two species, as the petal size of the Setosa species do not overlap with the other two. By examining the above scatter plot we see an overall positive correlation between petal length and petal width for the three species.

Multidimensional Scatter Plots

Scatter plot is a two dimensional visualization tool, but we can easily add another dimension to the 2D plot using the visual variables such as the color, size and shape. Say for example, you want to see the correlation between three variables then you can map the third variable to the marker size of each data point in the plot. So the marker size represents an additional third dimension.

We can plot all the data points in the same color by specifying a color name, or we can plot them in varying colors – for example, by changing the color intensity of the data points from bright to dark, in which case the color for each data point is retrieved from a color map. A color map, also called a color lookup table, is a three-column matrix whose length equals the number of colors it defines. Each row of the matrix defines a particular color by specifying three values in the range 0 to 1, giving the intensities of the red, green and blue components respectively. Color maps are usually used to highlight the data or apply effects to your plots. The default colormap in Matplotlib is 'viridis'.

Sample color map:

Red    Green   Blue    Color
0      0       0       black
1      1       1       white
1      0       0       red
0      1       0       green
0      0       1       blue
1      1       0       yellow
1      0       1       magenta
0      1       1       cyan
1      0.25    0.25    coral red
1      1       0.19    daffodil
0.5    0.5     0.5     grey

To plot the data points in varying colors we need to map values in our data to colors in a plot. In order to see the range of colors in the colormap and the assignment of numerical values in the third dimension we can use a colorbar. A colorbar displays the current colormap along with numerical rulings so that the color scale can be interpreted.

4D Scatter Plot

In the example below, we have a scatter plot with the first two dimensions as the variables x and y. The third variable (z) is represented in the third dimension by mapping it to the marker size. So the marker size corresponds to the different values in the variable ‘z’. We have also added a fourth dimension to the plot which is the color of the data points, the colors correspond to the values in the numpy array ‘color’.

import numpy as np
x = np.random.randint(1,100,100)
y = np.random.randint(1,100,100)
z = 5 * np.random.randint(1,100,100)
color = np.random.randint(1,100,100)

plt.scatter(x,y,s=z,c=color,alpha=0.5,cmap='ocean')
plt.title('4D Scatter Plot')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.colorbar()
plt.show()

The above scatter plot does not show any evident relationship between the variables as the dots are scattered around the entire plot area. So the variables have zero correlation.

Histogram

A frequency distribution is a table that shows the number of times distinct values in a dataset occur. Histograms are used to evaluate the frequency distribution of a given variable by visually displaying the number of data points that fall within a certain range of values; they are especially useful when there are large datasets to analyse. As in a bar graph, the data in a histogram are represented with vertical bars or rectangles, so a histogram looks similar to a bar graph; however, the bars in a bar graph are usually separated, whereas in a histogram the bars are adjacent to each other.

Say for example, you are conducting an experiment and you want to visually represent the outcome of the experiment. In this experiment, you are rolling two dice 1000 times, the outcome of each event is recorded by appending the outcome to a list.

outcomes = [1,5,6,3,3,2,4,1,6,11,12,10,7,8,9,12,11,…]

If you want to see the pattern of the outcomes, it is difficult to analyse the list. We can visualize the pattern by generating a histogram showing the frequency of occurrence of the sum of two dice rolls. Histograms are useful for displaying the pattern in your data and getting an idea of the frequency distribution of a variable. To plot a histogram, the entire range of the input dataset is split into equal-sized groups, or bins, and a bar is drawn for each bin with height proportional to the number of values in the input data that fall into it.
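Here is a quick sketch of the dice experiment (the simulation itself is ours, using NumPy's randint to roll the dice):

import numpy as np
from matplotlib import pyplot as plt

# roll two dice 1000 times and record the sum of each roll
rolls = np.random.randint(1,7,size=(1000,2))
outcomes = rolls.sum(axis=1)

# one bin per possible sum (2 through 12)
plt.hist(outcomes,bins=np.arange(2,14)-0.5,edgecolor='k')
plt.title('Sum of two dice over 1000 rolls')
plt.xlabel('sum of two dice')
plt.ylabel('frequency')
plt.show()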

Plot a Histogram with random numbers

The histogram below is plotted with random numbers using the ‘hist’ function defined in the pyplot module. The rand function defined in the numpy library creates an array of specified shape and fills it with random numbers from 0 (included) to 1 (excluded).

%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt
input_data = (np.random.rand(10**3))
plt.hist(input_data,bins=50,color='r',alpha=0.4)
plt.title('Histogram')
plt.xlabel('bins')
plt.ylabel('frequency')
plt.show()

In the above example, the rand function generated 1000 random numbers, and the ‘hist’ function distributed them across 50 bins. It can be observed from the histogram that some bins received more of the random numbers than others. You can generate random numbers of the order of 10^4, 10^5 or 10^6 and see how the values are distributed.

Plot a Histogram to analyze Airline On-time performance

The U.S. Department of Transportation’s (DOT) – Bureau of Transportation Statistics (BTS) releases a summary of statistics and basic analysis on airline performance each month. This dataset is a summary of different air carriers showing their departure delays, arrival delays, scheduled departure, etc. Let us analyse the flight data released by BTS. For this example, I have downloaded data from the following website – (https://transtats.bts.gov/ONTIME/Departures.aspx) into a csv file. This data is collected at JFK International Airport for American Airlines carrier during Jan’19.

Let us plot a histogram which shows the distribution of departure delays(in minutes) of all flights. The delay in departure is calculated as a difference in minutes between scheduled and actual departure time. In the input dataset, early departures are represented as negative numbers and on-time departures are represented with a zero.

import csv
import numpy as np
from matplotlib import pyplot as plt
with open (r'C:\Users\Ajay Tech\Documents\training\visualization\Data\flight_delay_american.csv') as input_file1:
    csv_file = csv.reader(input_file1)
    header = next (csv_file)
    delay_min = []
    for row in csv_file:
        delay_min.append(int(row[5]))

bins = [-50,0,50,100,150,200,250,300,350,400,450,500,550,600,650,700,750]
plt.hist(delay_min,bins=bins,log=True,color='c')    
plt.axvline(np.mean(delay_min), color='r', linestyle='dashed', linewidth=1)
plt.title('Histogram of Departure Delays(AA)')
plt.xlabel('Delay(min)')
plt.ylabel('No of flights')
plt.xticks(bins,rotation=30)
plt.show()

In the above script the y-scale is set to a log scale instead of a linear scale, because the log scale lets us see variations that would otherwise be barely visible. We have marked the average departure delay on the histogram with a vertical reference line drawn using the axvline function, which plots a vertical line across the axes and can be used to highlight specific points on the histogram. The dashed vertical line on the histogram indicates that, on average, the American Airlines flights departing from JFK airport took off 7 minutes late in Jan’19.

Let us also see the performance of another carrier at JFK airport for the same period.

with open (r'C:\Users\Ajay Tech\Documents\training\visualization\Data\flight_delay_jetblue.csv') as input_file2:
    csv_file = csv.reader(input_file2)
    header = next (csv_file)
    delay_min = []
    for row in csv_file:
        delay_min.append(int(row[5]))

bins = [-50,0,50,100,150,200,250,300,350,400,450,500,550,600,650,700,750]
plt.hist(delay_min,bins=bins,log=True,color='b',alpha=0.3)    
plt.axvline(np.mean(delay_min), color='r', linestyle='dashed', linewidth=1)
plt.title('Histogram of Departure Delays(JB)')
plt.xlabel('Delay(min)')
plt.ylabel('No of flights')
plt.xticks(bins,rotation=30)
plt.show()

The vertical line drawn using the axvline function indicates that the average departure delay time for JetBlue Airways flights flying out of JFK is 14 minutes. In fact, JetBlue Airways was named as the most-delayed airline at JFK airport.

Bar Charts

Bar Chart

A bar chart represents data values in the form of vertical bars. Each bar in the graph represents an individual category, and the bars are used to compare values across categories. In a bar chart, the length of a bar is proportional to the value it represents, while the width remains the same for all bars. One axis of the chart represents the categories and the other axis represents the value scale.

Below we see a bar chart plotted using the bar function defined in pyplot module. The bar chart displays cars sales during a ten year period for an automobile company. The first argument to the bar function indicates the position of the bar on the x-axis with the center at the x-tick position. The second argument indicates the height. The width of each bar is 0.8 which is the default setting, and this can be changed using the ‘width’ parameter.

from matplotlib import pyplot as plt
import numpy as np
year = [2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018]
toyota_sales = [1843669,1496211,1488588,1396837,1764802,1893874,2004373,2098545,2106332,2129177,2128201]
x_pos = np.arange(len(year))
plt.bar(x_pos,toyota_sales,color='#623aa2',alpha=0.25,edgecolor='k',label='Toyota')
plt.xticks(x_pos,year,rotation=30)
plt.title('Toyota Car sales')
plt.xlabel('Year')
plt.ylabel('No of units sold')
plt.show()

Clustered Bar Chart

A clustered or a grouped bar chart is used to compare multiple data sets side by side. Say, you want to compare values of multiple datasets that come under the same category, then a clustered bar chart comes in handy. The previous example can be extended to display car sales of different automobile companies.

In the previous example, we used a bar chart to display the sales of an automobile company for a ten year period. Now we would like to compare the sales of three different companies for the same period. So we are going to have three vertical bars under each category, each bar representing a company. In order to differentiate the three datasets we use different colors for the bars.

year = [2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018]
hyundai_sales = [401742,435064,538228,645691,703007,720783,725718,761710,768057,664943,667634]
honda_sales = [1284261,1045061,1096874,1023986,1266569,1359876,1373029,1409386,1476582,1486827,1445894]
toyota_sales = [1843669,1496211,1488588,1396837,1764802,1893874,2004373,2098545,2106332,2129177,2128201]
x_pos = np.arange(len(year))
width = 0.25
plt.bar(x_pos-width, toyota_sales, edgecolor='k',color='#E59998',width=width,label='Toyota')
plt.bar(x_pos, honda_sales, edgecolor='k',color='#FAD8AA',width=width,label='Honda',alpha=0.75)
plt.bar(x_pos+width, hyundai_sales, edgecolor='k',color='#538790',width=width,label='Hyundai')
plt.xticks(x_pos,year,rotation=30)
plt.title('Car sales')
plt.xlabel('Year')
plt.ylabel('No of units sold')
plt.legend()
plt.show()

Horizontal Bar Chart

Horizontal Bar Charts represent data in the form of horizontal bars, each bar representing an individual category. The data categories are shown on the y-axis and the data values are shown on the x-axis. The length of a bar is proportional to the value it represents.

The example below demonstrates how to plot a bar chart, the input datasets required for plotting are available in a csv file. We will import the built-in csv module to work with csv files.

import csv
with open (r'C:\Users\Ajay Tech\Desktop\air_pollution_index.csv') as input_file:
    csv_file = csv.reader(input_file,delimiter = ',')
    Header = next(csv_file)
    country = []
    index = []
    for row in csv_file:
        country.append(row[0])
        index.append(float(row[1]))  # convert from string so the bar heights scale numerically

plt.bar(country,index,color='#ff753e') 
plt.title('Air Pollution Index')
plt.ylabel('Index')
plt.show()

The above graph is plotted using the bar function. As can be observed from the figure the x-axis labels are overlapping with each other because the labels are too long. This problem can be solved using a horizontal bar chart, which makes optimal use of the space available. If the data labels are long or if you have too many data sets to plot, then horizontal bar charts can be used for plotting.

plt.barh(country,index,color='#ff753e') 
plt.title('Air Pollution Index')
plt.xlabel('Index')
plt.show()

Pie Chart

A pie chart is a circular chart divided into segments, called wedges. Each wedge represents an individual category. Pie charts display the contribution of each wedge to the total value, and the sum of the values of all wedges equals the whole circle. Pie charts are useful when there are few categories to compare (5 or fewer); otherwise it becomes difficult to interpret the data.

Below we see a pie chart plotted using the pie function defined in the pyplot module. The pie chart displays the percentage of marks obtained by a student in four subjects. The circle is divided into four wedges and the area of each wedge is proportional to the value it represents.

from matplotlib import pyplot as plt
plt.style.use('default')
x = [22,18,13,10]
labels = ['maths','physics','chemistry','english']
colors = ['m','r','y','c']
plt.pie(x,labels=labels,colors=colors,autopct='%.1f%%')
plt.title('Marks obtained in an exam')
plt.show()

The pie chart below displays the market share of mobile phone vendors worldwide.

plt.style.use('default')
x = [33.71,19.37,4.85,3.82,7.42,30.83]
labels = ['Samsung','Apple','Huawei','LG','Unknown','Others']
colors = ['m','r','y','b','g','c']    
plt.pie(x,labels=labels,colors=colors,explode=[0,0,0,.2,0,0],autopct='%1.2f%%',startangle=45,counterclock=True,shadow=True,
        wedgeprops={"edgecolor":"0",'linewidth': 1,'linestyle': '-'})
plt.title ('Smartphone market share')
plt.axis('equal')
plt.legend(loc=2)
plt.show()

x — The first argument passed to the pie function is an array or a list denoting the values for the categories to be compared.

Labels — The labels argument is a list of strings used for labelling each wedge.

Colors — You can define an array/list of colors and then pass it to the pie function that will be applied to each wedge in pie chart in the order specified in the array.

Explode — If you want to highlight or emphasize key data in a pie chart use the explode parameter. The explode parameter explodes/expands a wedge, so the wedge is moved slightly outward from its center. This parameter accepts an array and each element in the array specifies by what fraction of the radius the wedge needs to be exploded. The value has to be defined for all wedges in the pie chart, so the length of the array should be equal to the number of wedges in the pie chart.

Autopct — If you want to label the wedges with their numeric value in a pie chart, use the autopct parameter, which displays the percent value using string formatting. Say, for example, the percent value calculated for a wedge is 34.678666 and you want to display it rounded to 1 decimal place: assign autopct the format string ‘%1.1f’ and the wedge will be labelled 34.7. If you want to add a percent sign (%) to the label, use two percent signs (%%) in the format string to escape the ‘%’ sign.
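You can verify the formatting with plain Python string formatting, which is what autopct applies to each computed percentage:

# '%1.1f' keeps one decimal place; '%%' escapes the percent sign
print('%1.1f' % 34.678666)    # 34.7
print('%1.1f%%' % 34.678666)  # 34.7%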

Startangle — By default the startangle is zero, which means the wedges are arranged counter-clockwise starting from the positive x-axis. If you specify a different startangle, the start of the pie chart is rotated by this angle in degrees, and the wedges are then arranged counter-clockwise from that position.

Counterclock — Specifies the direction in which the wedges are arranged, clockwise or counter clockwise. The default value is True.

Shadow — A shadow effect can be added to the pie chart using the shadow parameter of the pie() function, passing boolean value – True will make a shadow appear below the pie chart. By default shadow is turned off.

Wedgeprops — The wedges of the pie chart can be customized using the wedgeprops parameter, which takes a dictionary of property name/value pairs. Wedge properties such as edgecolor, linestyle and linewidth can be specified.

Line Plots

What is a Line Plot?

The pyplot module in Matplotlib supports a variety of plots – line plots, pie charts, bar charts, histograms, scatter plots, etc. – and defines the methods used to render them. In this tutorial, we will discuss line plots.

A line plot is created by connecting the values in the input data with straight lines. Line plots are used to determine the relation between two datasets. A dataset is a collection of values, and each dataset is plotted along one axis, i.e., the x- and y-axis. In order to draw a line plot, we call the plot function defined in the pyplot module, passing it two arguments (arrays or lists): the first gives the x-coordinates, the second the y-coordinates. The plot function plots the data points (x1,y1), (x2,y2) and so on, and by default draws a line between them. Before drawing a plot, let us see the components that make up a basic plot.

Components of a basic plot

A basic plot is made up of the following components:

  1. Title – Title describes the information that we want to convey using the graph.
  2. Label – Label is a short description of the datasets being plotted.
  3. Scales – Scales determine the reference points for data displayed on the graph.
  4. Points – Points in a graph represent the input data in the form of x-coordinate and y-coordinate (x1,y1).
  5. Lines – Lines are used to connect points to highlight the change in values.

Plotting a graph

Matplotlib makes extensive use of the Numpy library which contains a number of mathematical functions which can be used to perform various mathematical operations. We need to import Matplotlib and Numpy libraries before making any calls to the routines defined in them. The below example demonstrates creation of a line plot by passing two numpy arrays x and y as arguments to the plot function.

from matplotlib import pyplot as plt
import numpy as np
x = np.linspace(0,10,10)
y = np.linspace(0,10,10)
plt.title('First Plot')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.plot(x,y,marker='o')
plt.show()

In order to render the above plot, we simply passed two arrays x,y to the plot function. We can see that the plot function has:

  1. Drawn the x and y axis.
  2. Marked evenly spaced scales(tick marks) on both the axes.
  3. Plotted the data points.
  4. Connected the data points with lines.
  5. Added the title, xlabel, ylabel.

Before executing the above-mentioned steps, the plot function first creates a figure object. Figure objects are the individual windows on the screen in which Matplotlib displays the graphical output – a container for that output. In the Jupyter Notebook, figures rendered by the Matplotlib library are included inline. The plot function implicitly creates a figure object and then plots the graph, so we do not have to call any other function to instantiate a figure object when using the plot function. The standard size of a figure object here is 8 inches wide by 6 inches high.

Say, for example, we have a requirement to create a figure with a specified size (4 inches wide, 4 inches high). For this, we need to call the figure method defined in the pyplot module explicitly. The ‘figsize’ parameter of this method allows us to specify the width and height of a figure in inches, and a new figure will be created. In order to render a plot, call the plot function.

The savefig() method saves the figure to a data file with a name specified by the string argument. The filename can be a full path and can also include a file extension if needed.

plt.figure(num=1,figsize=(4,4),dpi=100)
plt.plot(x,y,marker='o')
plt.savefig('second_plot.png')
plt.close()

Creating a line plot by passing a single array

We can pass a single dataset or an array to the plot function as shown in cell below. The plot function uses the values 0, 1, …, N-1 as the x coordinates where ‘N’ is the size of the y array.

plt.title('Second Plot')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.plot(y)
plt.show()

Multiple plots on the same graph

We can plot multiple plots on the same graph by calling the plot function for each dataset pair, this is useful to compare the plots. Each plot is rendered on top of another plot. Notice how Matplotlib applies a different color to each plot. The plots share the figure, x and y axis.

plt.plot(x,y)
plt.plot(x,x**2)
plt.plot(x,x**1/2)
plt.title('Multiple Plots')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.show()

The above code can be re-written as follows, that is by passing each dataset pair in an order as arguments to the plot function.

plt.plot(x,y,x,x**2,x,x**1/2)
plt.title('Multiple Plots2')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.show()

Line Properties

A line drawn on a graph has several properties such as color, width, style, transparency etc., these properties can be customized as per our requirement when we call the plot function.

plt.plot(x,y,color='c',marker='o',linewidth=2,linestyle='--',alpha=0.5,label='line1')
plt.plot(x,x**2,color='#90BC38',marker='D',linewidth=2,alpha=0.5,label='line2')
plt.legend()
plt.show()

Color codes

Colors in data visualization are used to enhance the look of graphs, communicate information clearly and distinguish one set of data from another. Matplotlib defines the following basic single-letter color codes: 'b' (blue), 'g' (green), 'r' (red), 'c' (cyan), 'm' (magenta), 'y' (yellow), 'k' (black) and 'w' (white).

Matplotlib also supports HEX colors, which web designers and developers use in web design. A HEX color is represented as a six-digit combination of numbers and letters defined by the amounts of red, green and blue (RGB) that make up the color (e.g. '#90BC38').

Linestyle

Linestyle specifies how the line is drawn: '-' (solid), '--' (dashed), '-.' (dash-dot) or ':' (dotted).

Markers

Markers on a line plot are used to highlight particular data points. Common marker styles include 'o' (circle), 's' (square), '^' (triangle up), 'v' (triangle down), 'D' (diamond), 'p' (pentagon), '*' (star), '+' (plus) and 'x' (x).

Legend

A legend is a box associating labels (text) with the lines on a graph. The legend() method is used to add a legend to the plot. This method can be called in multiple ways (see the sketch after this list):

  1. plt.legend() – When no arguments are passed to the legend() method, the plots to be added to the legend are detected automatically, and the corresponding labels are used in the legend.
  2. plt.legend(['label1', 'label2', 'label3']) – The legend method can also be called with a list of string labels, where each string is used as a label for the plots in the order they were created. This can be used to create a legend for plots already existing on the axes. Note: this usage is often discouraged, because you have to remember the order in which the plots were created, which can be confusing.
  3. plt.legend([plot1, plot2, plot3], ['label1', 'label2', 'label3']) – We can explicitly specify the plots and labels by passing the list of plots followed by the list of string labels, arranged in order.
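Here is a short sketch of the three calling styles (the data and labels are ours):

from matplotlib import pyplot as plt
import numpy as np

x = np.arange(1,6)

# style 1: labels attached to each plot, legend() picks them up automatically
line1, = plt.plot(x,x,label='linear')
line2, = plt.plot(x,x**2,label='square')
plt.legend()

# style 2 (equivalent): labels supplied in plotting order
# plt.legend(['linear','square'])

# style 3 (equivalent): plots and labels given explicitly
# plt.legend([line1,line2],['linear','square'])

plt.show()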
Introduction to Matplotlib

Data visualization

Data visualization is the representation of data in a graphical format; it helps put data in a visual form. Information, relationships and patterns that might go unnoticed in a text-based format can be easily recognized with data visualization software, because the human brain understands and processes visuals such as images, graphs or charts more easily than data in spreadsheets or reports. Data visualizations can turn both large and small datasets into visuals.

The table below displays an independent variable x and three functions a, b and c, each dependent on x. For example, for x = 1 to 5 the values are:

x        1    2    3    4    5
a = x    1    2    3    4    5
b = x²   1    4    9    16   25
c = x³   1    8    27   64   125

Let us use the data in the table to plot these functions.

From the plot, we can observe the relation between the variable and each of the functions, and can infer that the cubic function (c) grows much faster than the identity function (a) and the square function (b).

In this example, we have a small dataset, so analysing data from the table is easy. But what if we have a dataset with millions of entries or a complex function to be analysed. In that case, having a graphical representation of data would be useful. There are various types of graphs or charts to represent data in a visual form. The type of data given and what we want to convey to the user determine the appropriate graph to be used. Line plots, Pie charts, Bar charts, Histogram, Scatter Plots etc., are few examples of graphs.

Matplotlib

This course will take an in-depth look at the Matplotlib tool for visualization in Python.

Matplotlib is a Python package that is widely used throughout the scientific Python community to create high-quality and publication-ready graphics.

Matplotlib is Python’s alternative to MATLAB and it has the advantage of being free and open-source, whereas MATLAB is expensive and closed source.

The Matplotlib library provides the pyplot module, which contains functions which closely resemble the MATLAB plotting syntax and functionality.

Matplotlib is built on Numpy arrays. It supports a wide range of export formats suitable for both web and print publishing. It supports high-quality output formats such as PNG, PDF, SVG, EPS and PGF.

Installing Matplotlib

Install Matplotlib with pip

Matplotlib can be installed using the Python package manager, pip. To install Matplotlib with pip, open a terminal window and type:

$ pip install matplotlib

This command installs Matplotlib in the current working Python environment.

Install Matplotlib with the Anaconda distribution of Python

The easiest way to install Matplotlib is to download and install the Anaconda distribution of Python. The Anaconda distribution of Python comes with Matplotlib included and no further installation steps are required. You can download the latest version of Anaconda by following this link – https://www.anaconda.com/download/.

Backend

Matplotlib uses a backend to render the plots. Backend is a utility used to create graphs. There are two types of backends, interactive and non-interactive. Interactive backends display the figure in a graphical user interface, which allows us to pan and zoom the figure. Non-interactive backends are used to produce image files. Matplotlib supports the following backends:

GTKAgg, GTK3Agg, GTK, GTKCairo, GTK3Cairo, WXAgg, among others.

The Jupyter Notebook supports the ‘inline’ backend. With this backend, the output of plotting commands is displayed inline, directly below the code cell that produced it. The inline backend renders a static, standalone plot, and the resulting plots are stored in the notebook document.

The ‘inline’ backend can be invoked using the following command: %matplotlib inline

The Jupyter notebook also supports the ‘notebook’ backend which renders an interactive plot. Just below the plot, we can find a toolbar to switch views, pan, zoom and download options.

The ‘notebook’ backend can be invoked using the following command: %matplotlib notebook

Basic Plotting with Matplotlib

%matplotlib inline
from matplotlib import pyplot as plt
plt.plot([1,2,3,4,5],[1,2,3,4,5])
plt.show()

%matplotlib notebook
from matplotlib import pyplot as plt
plt.plot([1,2,3,4,5],[5,4,3,2,1])
plt.show()
Perceptron from scratch

This is the most critical component of neural networks. In this section, we will learn about both forward propagation and backward propagation, and the math behind back propagation.

Perceptron

This is the most fundamental type of element in a neural network. We have already seen what a perceptron is in the basics of neural networks section. However, we just scratched the surface. In this section, we will explore a perceptron in detail and explore a couple of simple problems it can solve.

Linearly Separable data

By definition, a perceptron can only solve linearly separable problems. What is a linearly separable problem ? Here are a couple of examples that show you linearly separable data. For example, two of the iris species are linearly separable by a hyperplane (in this case a single line). Similarly, an OR gate is also an example of a linearly separable dataset.

# Visualize a OR gate
import numpy as np

# OR gate data
x = np.array([[1,0],
              [0,1],
              [0,0],
              [1,1]])
y   = np.array([1,1,0,1])

import matplotlib.pyplot as plt
%matplotlib inline

plt.scatter(x[:,0],x[:,1],c=y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("OR gate")
# Visualize just 2 species (setosa, versicolor) that are linearly separable 
# using the predictors (Sepel Length, Sepal, Width)
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

# iris data is readily available as a sklearn dataset.
from sklearn import datasets
iris = datasets.load_iris()
data = iris.data

# visualize just the first 100 rows (so that it contains only the species setosa and versicolor)
# We are specifically not plotting the third species (virginica), because it is not 
# linearly separable.
plt.scatter(data[0:100,0],data[0:100,1],c=iris.target[0:100])
plt.xlabel("sepal length")
plt.ylabel("sepal width")
plt.title("iris species - Setosa, Versicolor")
plt.savefig("iris.png")

Now that we have an understanding of the data, let’s use gradient descent to solve for the weights.

Activation function

What we are essentially trying to do is to find values for the weights and bias in such a way that the weighted sum, passed through a binary step activation, reproduces the target:

ŷ = 1 if w1·x1 + w2·x2 + b > 0, else ŷ = 0

Here is how this activation function looks.

# Show how a binary step function looks like.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

x = np.linspace (-5,5,100)
y = np.zeros(len(x))
y[x>=0] = 1
y[x<0] = 0

plt.scatter(x,y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Activation function - Binary step function ")


Steps

Let’s solve the OR gate problem (or any other linearly separable problem) using a simple, single-layer perceptron. Here is a quick high-level overview of the steps involved: initialize the weights and bias randomly; for each row of the input data, run one forward propagation pass to compute ŷ; run one backward propagation pass to update the weights and bias; and repeat for a number of epochs. Let’s start with the data first.

# OR gate data
x = np.array([[1,0],
              [0,1],
              [0,0],
              [1,1]])
y   = np.array([1,1,0,1])

Cost function

What about back propagation ? This is where gradient descent comes in (along with its paraphernalia of partial derivatives, learning rate, cost function etc). There are a couple of options for the cost function (residual sum of squares & cross entropy), but for now, let’s just use the residual sum of squares (RSS) cost function. We have already seen this in the gradient descent section.
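Written out, for a single row the RSS cost is J = (1/2)·(y − ŷ)², and over a dataset of n rows J = (1/(2n))·Σ(yᵢ − ŷᵢ)² – the factor of 1/2 is there purely to make the derivative cleaner, matching the convention used in the gradient descent chapter.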

Partial Derivatives

Now, we want to see what the derivative of the cost function is with respect to each of the variables (weights and bias). Treating a single row with ŷ = w1·x1 + w2·x2 + b and J = (1/2)(y − ŷ)², the partial derivative with respect to weight 1 (w1) is

∂J/∂w1 = −(y − ŷ) · x1

similarly, the partial derivatives with respect to weight 2 (w2) and the bias (b) are

∂J/∂w2 = −(y − ŷ) · x2
∂J/∂b  = −(y − ŷ)

Update Rules

Once we have the partial derivatives, we can update the weights and bias by moving against the gradient, scaled by the learning rate α:

w1 ← w1 + α · (y − ŷ) · x1
w2 ← w2 + α · (y − ŷ) · x2
b  ← b + α · (y − ŷ)

These equations can rattle any ML engineer, but remember, all of this is left to the library (tensorflow or any of the underlying deep learning library) to compute. The only reason why we are learning all of the math and hand-coding this in Python is to ensure that we get an in-depth understanding of back propagation. This is absolutely essential to be a good ML engineer.

Forward Propagation

Forward propagation is a relatively easy step. Let’s write a quick function for it .

def forward_prop(row) :
    # weighted sum for one row, using the global x, w and b
    y_hat = np.dot(x[row],w) + b
    # binary step activation
    if y_hat > 0 :
        return 1
    else :
        return 0

Backward Propagation

Let’s now write a function for back propagation using all the geeky stuff above in “update rules” section.

def backward_prop(y_hat, row) :
    global b,w
    w[0]  = w[0] + alpha * (y[row] - y_hat) * x[row][0]
    w[1]  = w[1] + alpha * (y[row] - y_hat) * x[row][1]
    b     = b + alpha * (y[row] - y_hat) 

Initialize weights and biases

Initialize the weights and bias.

w = np.random.normal(size=2)
b = np.random.normal()

# learning rate. This is exactly the same term that we have already learnt in gradient descent.
alpha = 0.01

Predict Function

As of now, we are working with global variables to keep things simple. Later, we will make a class out of all this to make things easier going forward. Just one more function to go before we set this in motion. Assuming the model is in place (which we are going to write in a minute), we also need a function to predict a y value, right ? Just like any Machine Learning algorithm, we need a predict() method. Once the model fits the data to the right set of weights, this one is very easy: all we have to do is run the data through one forward propagation cycle.

# return the predicted y_hat, for the test data set.
def predict(x) :
    y = []
    
    # the user could be sending multiple rows. compute y_hat for each of the rows in the test dataset.
    for row in x :
        
        # weighted sum
        y_pred = np.dot(row,w) + b
        
        # run the weighted sum throught he activation function.
        if y_pred > 0 :
            y_pred = 1
        else :
            y_pred = 0
            
        # append the predicted y (y_hat)to an array
        y.append(y_pred)
        
    # return the predicted array of y_hat values for the corresponding test data (x)
    return y

Training

The individual pieces of the simple perceptron have been coded. Now, we need to write the logic to

  • take the input data. For each row:
    • do one cycle of forward propagation
    • do one cycle of backward propagation and update the weights and bias.

This exhausts one cycle of the input data. In deep learning, this is called an epoch. We need to repeat the entire process for a whole bunch of epochs.

Let’s write the logic for this.

# number of epochs
for epoch in range(1000) :
    
    # for each row in x (cycle through the dataset)
    for row in range(x.shape[0]) :
        
        # for each row in x, predict y_hat
        y_hat = forward_prop(row)

        # for each row calculate weights
        backward_prop(y_hat,row)

print ( w, b)
[0.01363271 0.25196752] -0.009751486705392132

Predict

It is time to test our network. Let’s quickly print out x and y.

x
array([[1, 0],
       [0, 1],
       [0, 0],
       [1, 1]])
y
array([1, 1, 0, 1])

Since this is a small dataset, we don’t need a confusion matrix to calculate the accuracy. Let’s just use the predict function on the x array to predict y.

predict(x)
[1, 1, 0, 1]

That’s a perfect match. This is a small dataset, though. Let’s look at a slightly larger dataset and see if the perceptron is good enough to do linear separation. Let’s pick up the iris dataset from scikit-learn’s datasets module.

from sklearn import datasets

iris = datasets.load_iris()

data = iris.data

All of this data is not linearly separable. For example, if you plot the species against the sepal length and width, the species – versicolor and virginica are muddled. Only the first species (setosa) is clearly separated.

import matplotlib.pyplot as plt
%matplotlib inline

plt.scatter(data[:,0],data[:,1],c=iris.target)

So, let’s just use the first two species. Let’s plot it again.

# visualize just the first 100 rows (so that it contains only the species setosa and versicolor)
# We are specifically not plotting the third species (virginica), because it is not 
# linearly separable.
plt.scatter(data[0:100,0],data[0:100,1],c=iris.target[0:100])
plt.xlabel("sepal length")
plt.ylabel("sepal width")
plt.title("iris species - Setosa, Versicolor")
plt.savefig("iris.png")

Now, we have a clear, linear separation. Let’s train our perceptron on this data and see if it works.

x = data[0:100,0:2]  # iris sepal data ( sepal length and width )
y = iris.target[0:100] # iris species data (only setosa and versicolor)

w = np.random.normal(size=2)
b = np.random.normal()

# learning rate
alpha = 0.01

# number of epochs
for epoch in range(1000) :
    
    # for each row in x
    for row in range(x.shape[0]) :
        
        # for each row in x, predict y_hat
        y_hat = forward_prop(row)
        # for each row calculate weights
        backward_prop(y_hat,row)

print ( w, b)
[ 0.80437979 -1.08684544] -1.0479456545593953

We can very well do a confusion matrix to check for accuracy.

y_pred = predict(x)

from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

print ( confusion_matrix(y,y_pred) )
print ( accuracy_score(y,y_pred))

[[49  1]
 [ 0 50]]
0.99

That’s a pretty good accuracy – 99%. It is a bit more interesting to see this visually – using matplotlib’s meshgrid.

import numpy as np
 
x_all = np.linspace(0,10,100).reshape(-1,1)
y_all = np.linspace(0,10,100).reshape(-1,1)
 
xx,yy = np.meshgrid(x_all,y_all)

x_grid = np.concatenate((xx.ravel().reshape(-1,1),yy.ravel().reshape(-1,1)),axis=1)

x_grid
array([[ 0.       ,  0.       ],
       [ 0.1010101,  0.       ],
       [ 0.2020202,  0.       ],
       ...,
       [ 9.7979798, 10.       ],
       [ 9.8989899, 10.       ],
       [10.       , 10.       ]])
y_grid = predict(x_grid)

import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

mpl.rcParams['figure.dpi'] = 200

plt.scatter(x_grid[:,0],x_grid[:,1],c=y_grid,alpha=0.1)
plt.scatter(data[0:100,0],data[0:100,1],c=iris.target[0:100])

Gradient Descent

What is Gradient Descent

In simple terms, Gradient Descent is an algorithm to compute the minimum of a function. OK – So, what is the big deal ? Well, most of the time in most machine learning algorithms, there is always a cost function that needs to be minimized. The best Machine Learning Algorithm is usually the one with the most inclusive and simple cost function. Once a cost function is defined, it is just a matter of solving for a minimum to arrive at the solution. That is why Gradient Descent is extremely useful in the context of Machine learning. Let’s see an example.

Gradient Descent for Linear Regression

Let’s start with the simplest ML problem – Linear Regression. In the Machine Learning in Python tutorial, we have covered Regression in Python in great detail.

Since the problem is simple enough to be solved mathematically, we have used the OLS (Ordinary Least Squares) technique to fit a straight line to the Linear Regression problem. You can view the equation for Ordinary Least Square to solve linear regression here. What is the cost function in this case?


Cost function = Sum of Squares of Residuals

The mathematical solution that minimizes this cost function, as derived by OLS, is

m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² ,  b = ȳ − m·x̄

where x̄ represents the average of x and ȳ represents the average of y.

However, when the number of independent variables increases, OLS is not a good solution. That is where Gradient Descent shines. While OLS is an analytical solution, Gradient Descent is a numerical solution. However, to understand Gradient Descent, we have to be conversant with the following concepts in math.

  • Derivatives
  • Partial Derivatives

Math

Derivatives

A derivative is the slope of a function. Let’s take a simple straight line – say, y = 2x.

A simple dataset for this could be

  • x = Number of DNA Samples
  • y = Number of DNA pairs.

Let’s plot a sample dataset and try to compute the slope.

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

x = np.array([1,2,3,4,5,6,7,8,9,10])
y = x * 2

print ( "x = ",x)
print ( "y = ",y)

plt.plot(x,y)
plt.plot(x[1], y[1], marker='o', markersize=10, color="red")
plt.plot(x[3], y[3], marker='o', markersize=10, color="red")

plt.hlines(y=y[1], xmin=x[1], xmax=x[3], color='b')
plt.vlines(x=x[3], ymin=y[1], ymax=y[3], color='b')

plt.text(4.2,5,(y[3] - y[1]))
plt.text(3,3,(x[3] - x[1]))
x =  [ 1  2  3  4  5  6  7  8  9 10]
y =  [ 2  4  6  8 10 12 14 16 18 20]




Now let’s take a function that is not a straight line – say, y = x². A simple dataset for this could be

  • x = Reach of a product
  • y = Sales of the product.

Let’s plot a sample dataset and try to compute the slope.

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

x = np.array([1,2,3,4,5,6,7,8,9,10])
y = x ** 2

print ( "x = ",x)
print ( "y = ",y)

plt.plot(x,y)
plt.plot(x[1], y[1], marker='o', markersize=10, color="red")
plt.plot(x[3], y[3], marker='o', markersize=10, color="red")

plt.hlines(y=y[1], xmin=x[1], xmax=x[3], color='b')
plt.vlines(x=x[3], ymin=y[1], ymax=y[3], color='b')

plt.text(4.2,8,(y[3] - y[1]))
plt.text(3,0.0,(x[3] - x[1]))

plt.text(5,10,"slope = 12/2 = 6")

plt.plot(x[5], y[5], marker='o', markersize=10, color="red")
plt.plot(x[7], y[7], marker='o', markersize=10, color="red")

plt.hlines(y=y[5], xmin=x[5], xmax=x[7], color='b')
plt.vlines(x=x[7], ymin=y[5], ymax=y[7], color='b')

plt.text(8.3,50,(y[7] - y[5]))
plt.text(7,30,(x[7] - x[5]))

plt.text(8.58,45,"slope = 14")
x =  [ 1  2  3  4  5  6  7  8  9 10]
y =  [  1   4   9  16  25  36  49  64  81 100]

In this case, the slope is not constant when measured the same way as before – it seems to change with x.

A correct way to define the slope (or derivative) is to take an infinitesimally small increase in x and the corresponding change in y, and divide them as before. Mathematically, if f(x) is a function of x, the derivative is defined as

f'(x) = lim (Δx → 0) [ f(x + Δx) − f(x) ] / Δx

For example, take x = 4 and increase x by a very small amount, say Δx = 0.0001. Now, let’s compute the corresponding values of y and plug them into the equation above:

  • x = 4
  • Δx = 0.0001
  • [ f(4.0001) − f(4) ] / 0.0001 = (16.00080001 − 16) / 0.0001 ≈ 8 = 2·4

so, the derivative of f(x) = x² is 2x. We have not derived this mathematically – instead, we are trying to understand, with numbers, how a derivative works.
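We can verify this numerically in a couple of lines (the function f and the step size Δx are our choices):

# numerically estimate the derivative of f(x) = x**2 at x = 4
def f(x):
    return x ** 2

dx = 0.0001
x0 = 4
print((f(x0 + dx) - f(x0)) / dx)   # ~8.0001, i.e. 2*x0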

Derivative represents the change in the value of a function with respect to the variable (with which the derivative is being applied)

Partial Derivatives

Partial derivatives are almost similar to regular derivatives – except that a partial derivative measures the change with respect to one particular variable, holding the others fixed. For example, say the speed of a car (z) is dependent on

  • engine RPM (x)
  • slope of the road (y)

you can also write it as z = f(x, y). Suppose, for illustration, the relation is z = 8x + 3y (the coefficients here are placeholders). Now, how does the speed (z) of the car vary with a unit increase in the engine RPM ? The answer is 8 – pretty straightforward. That is represented mathematically using the partial derivative

∂z/∂x = 8

Let’s take another example – the equation of a 2-d plane can be generalized as

z = ax + by + c

Visualizing such a plane in 3-d space, it intersects the z-axis at c (where the values of x & y are 0) – in our example, at 2. Now, how far does the function vary with a unit variation in x ? By a, the coefficient of x:

∂z/∂x = a

Once again, I want you to take the intuitive meaning out of this –

For a unit change in x, the function changes by so much in the direction of x – that is a partial derivative.

A plane is simple to understand. However, the interpretation would be the same even if it were a complicated curve in a 3-d space – Like a hill.
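Here is a quick numerical check of the same idea (the plane z = 8x + 3y + 2 is illustrative – the 8 matches the RPM example and the 2 matches the z-intercept above; the 3 is a placeholder):

# a plane in 3-d space: z = a*x + b*y + c
def z(x, y):
    return 8 * x + 3 * y + 2

# partial derivative w.r.t. x: hold y fixed, nudge only x
dx = 0.0001
print((z(4 + dx, 7) - z(4, 7)) / dx)   # ~8, the coefficient of x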

Gradient Descent for Linear Regression

Now that we understand derivatives (both regular and partial), we are ready to graduate to Gradient Descent. Imagine a set of data points like so.

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

x = np.array([1,3,4,5,6,7,8,10])
y = np.array([4,3,7,7,8,10,8,11])

plt.scatter(x,y)

Say, we want to fit a straight line to this data set using Linear Regression. How would you do it ? Very simple

from sklearn.linear_model import LinearRegression 
model = LinearRegression()

model.fit(x.reshape(-1,1),y) 

slope = model.coef_  
intercept   = model.intercept_

print ( "slope = ", slope)
print ( "intercept = ", intercept)

slope =  [0.84482759]
intercept =  2.6034482758620694
point_1 = slope*0 + intercept 
point_2 = slope*15 + intercept 
print ( point_1, point_2) 
plt.scatter( x,y,alpha=0.5)
plt.plot([0,15], [point_1,point_2],color="red")

[2.60344828] [15.27586207]
plt.scatter( x,y,alpha=0.5)
plt.plot([0,15], [point_1,point_2],color="red")

y_actual    = y
y_predicted = model.predict(x.reshape(-1,1))

for index,x_count in enumerate(x) :
    if y_actual[index] > y_predicted[index] :
        plt.vlines(x=x_count, ymin=y_predicted[index], ymax=y_actual[index], color='b')
    if y_actual[index] <= y_predicted[index] :
        plt.vlines(x=x_count, ymin=y_actual[index], ymax=y_predicted[index], color='b')

The blue lines represent the residuals (or errors). We can calculate the slope (and intercept) of the fit using OLS (Ordinary Least Squares) or using Gradient Descent. We already know how OLS works in Linear Regression; we will now see how Gradient Descent works. The equation for a straight line that would fit all the data points is some variation of

y = m·x + b

where

  • m = slope
  • b = intercept

Either way, we are minimizing the Sum of Squares of Errors. We started out with the definition of this at the beginning of the chapter.

Cost Function

Just to make things simple, assume a value of intercept (b) to be fixed at 2.6 ( b = 2.6 as we have previously solved for it). Imagine that we chart the cost function with different values of slope(m).

n = len(y)

cost_function = []
m = [0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0]
for slope in m : 
    cost = 0
    for i in range(n):
        cost = cost + (1/(2*n)) * ( (y_actual[i] - slope * x[i] - 2.6) ** 2 )
    cost_function.append(cost)

plt.scatter(m,cost_function)

Visually, we can eyeball the minimum of the cost function to occur at a slope of around 0.8. This is in line with scikit-learn’s LinearRegression model that we solved above.

But how to mathematically solve for this (without using Ordinary Least Squares) ? That is where Gradient Descent comes in.

Gradient Descent is a technique to find out the minimum of a function numerically.

Imagine a ball put at a random location on the cost curve shown above.

If you were to let the ball go, it would roll down to the bottom. Why does this happen ? Gravity moves the ball from a higher energy state to a lower energy state. You already know this. What is more interesting is the path it takes – always from a position of higher slope to a position of lower slope.

If you look at the slope at each of the 4 positions highlighted above, it is pretty clear that the slope (dashed line) decreases with every move down the curve. The slope represents the derivative of the curve – in this case, the derivative of the cost function with respect to the slope m (on the x-axis).

How much do you move by ?

The amount and the direction you move are controlled by how much the cost function changes.

How much we move is based on how fast the cost function changes with the slope (or intercept). The way to learn that is by finding the derivative of the cost function with respect to the slope (and intercept). For now, just to keep things simple and to be able to view things in 2D, we keep only the slope as a variable (and the intercept constant). We will see in the next section how to minimize the cost function over both slope and intercept.

x = x.astype(float)
y = y.astype(float)

print ( x )
print ( y )
steps = 5
m = 0
n = len(x)

# Start Gradient Descent (note: no learning rate yet - we take the full derivative as the step)
for step in range(steps) :
    y_pred = m * x + 2.6

    # Derivative of the cost function w.r.t m
    m_der  = (-1/n) * sum( (y - y_pred) * x)
    
    # move m by the full derivative - watch what happens
    m = m -  m_der
    print ( m )

[ 1.  3.  4.  5.  6.  7.  8. 10.]
[ 4.  3.  7.  7.  8. 10.  8. 11.]
31.700000000000003
-1125.3500000000001
41106.975000000006
-1500372.8875000002
54763642.09375

The value of m oscillates hugely. That is because the derivative ∂J/∂m gives a general sense of direction, but doesn't tell us how far we need to go. You can't travel an arbitrary distance in that direction. We need to take baby steps.

Take a small step, re-evaluate the slope, and take another small step downhill. This is the essence of Gradient Descent.
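Written out for our single parameter m, each baby step is:

m \leftarrow m - \alpha\,\frac{\partial J}{\partial m}, \qquad \frac{\partial J}{\partial m} = -\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)x_i

which is exactly what the code below implements.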

It is like a baby learning to walk. There is a concept called the learning rate that controls how far we move in the direction of steepest descent. Let's rewrite the program with a learning rate.

l_rate = 0.001
steps = 1000
m = 0

m_array = []

# Start Gradient Descent
for step in range(steps) :
    y_pred = m * x + 2.6

    # Derivative of the cost function w.r.t m
    m_der  = (-1/n) * sum( (y - y_pred) * x)
    
    # move m
    m = m -  l_rate * m_der
    m_array.append(m)

print ( "optimum slope (m) = ", m)

optimum slope (m) =  0.8453333333333318

Let’s plot the journey of the ball down the cost curve.

# Cost Function 
n = len(y)
y_actual    = y

cost_function = []
m = [0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0]
for slope in m : 
    cost = 0
    for i in range(n):
        cost = cost + (1/(2*n)) * ( (y_actual[i] - slope * x[i] - 2.6) ** 2 )
    cost_function.append(cost)


# Steps taken    
n = len(y)

cost_function_m = []
m_steps = m_array
for slope in m_steps : 
    cost = 0
    for i in range(n):
        cost = cost + (1/(2*n)) * ( (y_actual[i] - slope * x[i] - 2.6) ** 2 )
    cost_function_m.append(cost)

plt.scatter(m_steps,cost_function_m) # steps taken.
plt.scatter(m,cost_function) # cost function
plt.xlabel("m - slope")
plt.ylabel("cost function")

As you can see, the ball takes a very long time to roll down to its minimum – with a learning rate as small as 0.001, most of the 1,000 steps are tiny.

Learning Rate

How fast do you go down the path ? It depends on how fast you want to converge (without overshooting). An arbitrary parameter called the learning rate (α) determines how fast you go down the path. If you want to converge fast, can you just crank up the learning rate ? Probably not. Here is why.

If you set a learning rate of 0.1, that is roughly how fast you move along the x-axis. However, if you set the learning rate to 0.7 (thinking you could move down the curve faster), here is what would happen – you keep overshooting, and essentially miss the minimum.

Here is a quick plot of how the ball moves with a learning rate of 0.05 within just 100 iterations. The ball is going back and forth because it is overshooting. However, it finally settles down at the minimum.
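The behaviour is easy to recreate – here is a sketch (using the same x, y and n as above) that reruns the loop with a learning rate of 0.05 for 100 steps and plots the path:

# Sketch - rerun gradient descent with a larger learning rate and trace the path
l_rate = 0.05
steps = 100
m = 0
m_path = []

for step in range(steps) :
    y_pred = m * x + 2.6
    m_der  = (-1/n) * sum( (y - y_pred) * x)
    m = m - l_rate * m_der
    m_path.append(m)

# cost at each visited value of m
cost_path = [ sum( (1/(2*n)) * (y[i] - s * x[i] - 2.6) ** 2 for i in range(n) ) for s in m_path ]

plt.scatter(m_path, cost_path)
plt.xlabel("m - slope")
plt.ylabel("cost function")

Because each step overshoots the minimum, consecutive values of m land on alternating sides of it, with the oscillation shrinking until the ball settles.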

Optimize Gradient Descent for both Slope & Intercept

So far, we have optimized Gradient Descent for just the slope. What about the intercept ? The moment we introduce the second parameter – the intercept – the cost function becomes 3-d.

  • x-axis = slope
  • y-axis = intercept
  • z-axis = cost function

Matplotlib provides a rudimentary 3-d scatter plot. We will use it to plot the cost function in 3-d.

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

# This import registers the 3D projection, but is otherwise unused.
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 unused import

x = np.array([1,3,4,5,6,7,8,10])
y = np.array([4,3,7,7,8,10,8,11])

slope_values     = np.arange(start=0,stop=5,step=0.05)
intercept_values = np.arange(start=0,stop=5,step=0.05)
# y_pred    = slope * x + intercept

n = len(y)

cost_function = []

for index, slope in enumerate(slope_values) : 
    cost = 0
    for i in range(n):
        cost = cost + (1/(2*n)) * ( (y[i] - slope_values[index] * x[i] - intercept_values[index]) ** 2 )
    cost_function.append(cost)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(slope_values, intercept_values, cost_function,marker='o')

plt.show()

Jupyter notebook doesn't allow 3-d rotation, but if you try this in your standard IDE (say VS Code), you would be able to get a 3-d look at the plot.
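One thing to note: the loop above walks slope_values and intercept_values in lockstep, so it only samples the diagonal slope = intercept of the surface. To see the full bowl, evaluate the cost on a grid of (slope, intercept) pairs – here is a sketch using np.meshgrid and plot_surface:

# Sketch - evaluate the cost on a full (slope, intercept) grid and draw a surface
M, B = np.meshgrid(np.arange(0, 5, 0.05), np.arange(0, 5, 0.05))

Z = np.zeros_like(M)
for i in range(n):
    # accumulate the squared error of each data point over the whole grid
    Z += (1/(2*n)) * (y[i] - M * x[i] - B) ** 2

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(M, B, Z)
ax.set_xlabel("slope (m)")
ax.set_ylabel("intercept (b)")
ax.set_zlabel("cost")
plt.show()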

Now, let's optimize the cost function for both slope and intercept. Here are the partial derivatives of the cost function w.r.t. the slope (m) and the intercept (b), with ŷᵢ = m xᵢ + b :

∂J/∂m = -(1/n) Σᵢ (yᵢ - ŷᵢ) xᵢ
∂J/∂b = -(1/n) Σᵢ (yᵢ - ŷᵢ)

x = np.array([1,3,4,5,6,7,8,10])
y = np.array([4,3,7,7,8,10,8,11])

l_rate = 0.01 # Learning rate
steps = 4000    # number of iterations ( steps )

m = 0 # initial slope
b = 0 # initial intercept

n = float(len(x))

m_array = []
b_array = []

# Start Gradient Descent
for step in range(steps) :
    y_pred = m * x + b

    # Derivative of the cost function w.r.t slope (m)
    m_der  = (-1/n) * sum( (y - y_pred) * x)
    # Derivative of the cost function w.r.t intercept (b)    
    b_der  = (-1/n) * sum( y-y_pred )
    
    # move m
    m = m -  l_rate * m_der
    b = b -  l_rate * b_der
    
    # gather the slope and intercept in an array to plot later 
    m_array.append(m)
    b_array.append(b)
    
print (" optimim slope(m) = ", m)
print ( "optimum intercept (m) = ", b)

optimim slope(m) =  0.8450107631510549
optimum intercept (m) =  2.6022056448336817

Now that we have the optimized slope and intercept, let's plot the steps the ball took against the cost curve.

slope_values     = np.arange(start=0,stop=3,step=0.05)
intercept_values = np.arange(start=0,stop=3,step=0.05)

n = len(y)

cost_function = []

for index, slope in enumerate(slope_values) : 
    cost = 0
    for i in range(n):
        cost = cost + (1/(2*n)) * ( (y[i] - slope_values[index] * x[i] - intercept_values[index]) ** 2 )
    cost_function.append(cost)

slope_values_new     = m_array
intercept_values_new = b_array

cost_function_new = []
for index, slope in enumerate(slope_values_new) : 
    cost = 0
    for i in range(n):
        cost = cost + (1/(2*n)) * ( (y[i] - slope_values_new[index] * x[i] - intercept_values_new[index]) ** 2 )
    cost_function_new.append(cost)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(slope_values, intercept_values, cost_function,marker='o')
ax.scatter(slope_values_new, intercept_values_new, cost_function_new,marker='o')

plt.show()

Stochastic Gradient Descent

For every iteration of the Gradient Descent algorithm, the entire dataset is used to calculate the derivative of the cost function. This gets very expensive as the dataset grows larger.

Imagine real-world problems like image processing, where a single image has millions of pixels. Gradient Descent becomes almost impossible to compute if we don't optimize.

One possible solution is to use Stochastic Gradient Descent. The word stochastic stands for random. Instead of using every observation (every row in the dataset) to compute the derivative, you pick a random row each time and calculate the derivative for that particular row only. In standard Gradient Descent, the derivative of the cost function w.r.t. the slope is

∂J/∂m = -(1/n) Σᵢ (yᵢ - ŷᵢ) xᵢ

summed over all n rows. Stochastic Gradient Descent approximates it from a single random row i:

∂J/∂m ≈ -(yᵢ - ŷᵢ) xᵢ

where i is the index of a random data row.

  • Surprisingly, this gives pretty good results (given the compromise)
  • This is computationally much more efficient than using the full dataset.

Let’s do this in python.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

l_rate = 0.001
steps = 1000
m = 0

x = np.array([1,3,4,5,6,7,8,10])
y = np.array([4,3,7,7,8,10,8,11])

n = len(x)

m_array = []
cost_function_m = []

# Start Gradient Descent
for step in range(steps) :
    
    # CHANGE - At every step, pick a new random row
    random_index = np.random.randint(0,len(x))
    
    # CHANGE - only calculate the predicted "y" value for that particular data row
    y_pred = m * x[random_index] + 2.6

    # Derivative of the cost function w.r.t m
    # CHANGE - calculate the derivative using only that one row
    m_der  = - (y[random_index] - y_pred) * x[random_index]
    
    # move m
    m = m -  l_rate * m_der
    m_array.append(m)

m_steps = m_array
for slope in m_steps : 
    cost = 0
    for i in range(n):
        cost = cost + (1/(2*n)) * ( (y[i] - slope * x[i] - 2.6) ** 2 )
    cost_function_m.append(cost)

plt.scatter(m_steps,cost_function_m) # steps taken.
plt.xlabel("m - slope")
plt.ylabel("cost function")

print ( "optimum slope (m) = ", m)
optimum slope (m) =  0.8793576691371693

That’s pretty close to the real value (as calculated by OLS), right ?

For just 8 observations, this is not a big deal. Imagine the performance gain if the number of rows were extremely large, as happens in real datasets. However, there is a cost to this trade-off. The solution (the optimum slope in this case) varies with each run. For example, try running the code above 4 or 5 times – each time you get a slightly different solution. Although the difference is not large, stochastic gradient descent does land on a slightly different answer every run. How do we counter this ? A compromise between standard gradient descent and stochastic gradient descent is possible – it is called Mini-batch gradient descent.

Mini-batch Gradient Descent

In practice though, a technique called mini-batch Gradient Descent is what is mostly used. It is a hybrid between standard gradient descent and stochastic gradient descent – instead of one random row (or all rows), each step uses a small random batch of rows. The following picture highlights the difference between the standard vs stochastic vs mini-batch gradient descent methods.

Let's program Mini-batch Gradient Descent in Python.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

l_rate = 0.001
steps = 1000
m = 0

n = len(x)

x = np.array([1,3,4,5,6,7,8,10])
y = np.array([4,3,7,7,8,10,8,11])

m_array = []
cost_function_m = []

x_range = np.arange(len(x))

# Start Gradient Descent
for step in range(steps) :
    
    # CHANGE - At every step, get a set of new random numbers
    random_index = np.random.choice(x_range, size=3, replace=False)
    
    # CHANGE - only calculate the predicted "y" value for that particular data row
    y_pred = m * x[random_index] + 2.6

    # Derivative of the cost function w.r.t m
    # CHANGE - calculate the derivative only for a particular row in the data
    m_der  = (-1/n) * sum( (y[random_index] - y_pred) * x[random_index])
    
    # move m
    m = m -  l_rate * m_der
    m_array.append(m)

m_steps = m_array
for slope in m_steps : 
    cost = 0
    for i in range(n):
        cost = cost + (1/(2*n)) * ( (y[i] - slope * x[i] - 2.6) ** 2 )
    cost_function_m.append(cost)

plt.scatter(m_steps,cost_function_m) # steps taken.
plt.xlabel("m - slope")
plt.ylabel("cost function")

print ( "optimum slope (m) = ", m)
optimum slope (m) =  0.8472922870088795

This time the slope value is pretty steady.

Mini-batch gradient descent achieves a compromise between the time-consuming but accurate Gradient Descent and the quick but slightly noisy Stochastic Gradient Descent.

Gradient Descent is a generic cost minimization algorithm – it is guaranteed to find the global minimum only as long as the cost function is convex. If there are multiple minima, Gradient Descent may only arrive at a local minimum.
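For intuition, here is a quick sketch of such a non-convex curve (an arbitrary function chosen for illustration, not our cost function). A ball released on the right-hand slope would settle into the shallower local minimum on the right and never see the deeper one on the left:

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# An arbitrary non-convex "cost" curve with two minima
x_vals = np.linspace(-2.2, 2.2, 200)
y_vals = x_vals**4 - 3 * x_vals**2 + x_vals

plt.plot(x_vals, y_vals)
plt.xlabel("parameter")
plt.ylabel("cost")
plt.grid()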

Gradient Descent for Logistic Regression

If you have been through the machine learning tutorial, you must have already seen how Logistic Regression works and the math behind it. In the tutorial, we used Scikit Learn's LogisticRegression class to fit the data. Now, we want to understand how Gradient Descent works for Logistic Regression.

In order to do that, we have to first understand 3 things

  • Logistic Regression equation
  • Cost function
  • Partial Derivative of the Cost function w.r.t. the parameters (w, b)

The equation for logistic regression is

ŷ = σ(w·x + b),  where σ(z) = 1 / (1 + e^(-z))

where

  • w is a vector of numbers
  • b is a number
  • x is a vector of predictors

We haven't seen w and b in the Linear Regression above, right ? Where have they sprung from ? Well, we are just generalizing to any number of predictors. For a single predictor, we can write Logistic Regression as follows.

ŷ = σ(m x + b)

where

  • m = slope
  • b = intercept

The output of the Logistic Regression is actually a sigmoid curve.

You can get a visual like so.

from scipy.special import expit
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

x = np.linspace(-10,10)
y = expit (x)
 
plt.plot(x,y)
plt.grid()

The cost function for Logistic Regression could be formulated like the one for Linear Regression (Sum of Squares of Errors). However, that results in a non-convex cost curve for logistic regression. So, instead of using the Sum of Squares of Errors, Logistic Regression uses Cross Entropy as its cost function:

J(w,b) = -(1/n) Σᵢ [ yᵢ log(ŷᵢ) + (1-yᵢ) log(1-ŷᵢ) ]

The partial derivatives of this cost function w.r.t. (w, b) work out to be pleasantly simple:

∂J/∂w = (1/n) Σᵢ (ŷᵢ - yᵢ) xᵢ
∂J/∂b = (1/n) Σᵢ (ŷᵢ - yᵢ)

Now that we have the 3 required things, let's write our gradient descent learning rule:

w ← w - α ∂J/∂w
b ← b - α ∂J/∂b

where α is the learning rate.

For a single predictor, thinking in terms of slope/intercept (like the Linear Regression above), this becomes

m ← m - α (1/n) Σᵢ (ŷᵢ - yᵢ) xᵢ
b ← b - α (1/n) Σᵢ (ŷᵢ - yᵢ)

where n is the number of rows in the dataset and i indexes them.

Now, let’s implement Logistic Regression in Python. Let’s take a simple dataset.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

x = np.array([1,3,4,5,6,7,8,10])
y = np.array([0,0,0,1,1,1,1,1])

plt.scatter(x,y,c=y)

Seems simple enough, right ? Let’s first get a baseline using Scikit Learn’s LogisticRegression model.

from sklearn import linear_model
from scipy.special import expit
 
model_lr = linear_model.LogisticRegression(C=1e5, solver='lbfgs')
model_lr.fit(x.reshape(-1,1), y)

LogisticRegression(C=100000.0, class_weight=None, dual=False,
                   fit_intercept=True, intercept_scaling=1, l1_ratio=None,
                   max_iter=100, multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

Let's look at the fit visually.

x_test = np.linspace(1.0,10.0,100)
# predict dummy y_test data based on the logistic model
y_test = x_test * model_lr.coef_ + model_lr.intercept_
 
sigmoid = expit(y_test)

plt.scatter(x,y, c=y)
 
# ravel to convert the 2-d array to a flat array
plt.plot(x_test,sigmoid.ravel(),c="green", label = "logistic fit")
plt.yticks([0, 0.2, 0.4, 0.5, 0.6, 0.7, 1])
plt.axhline(.5, color="red", label="cutoff")
plt.legend(loc="lower right")

Let's try and fit this data using Logistic Regression based on Gradient Descent.

model_lr.intercept_
array([-67.57978497])
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

x = np.array([1,3,4,5,6,7,8,10])
y = np.array([0,0,0,1,1,1,1,1])

n = len(x)

l_rate = 1
steps = 1000
m = 0

m_array = []

# Start Gradient Descent (intercept pinned at -67.57, taken from the sklearn fit above)
for step in range(steps) :
    y_pred_log = m * x + (-67.57)
    y_pred     = 1/(1 + np.exp(-y_pred_log))

    # Derivative of the cost function w.r.t m
    m_der  = (-1/n) * sum( (y - y_pred) * x)
    
    # move m
    m = m -  l_rate * m_der
    m_array.append(m)

print ( "optimum slope (m) = ", m)
optimum slope (m) =  15.017184591647284

Once again, we got pretty close to the slope as predicted by Scikit Learn’s LogisticRegression model.

model_lr.coef_
array([[15.03246322]])
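For completeness, here is a sketch that lets both parameters move instead of pinning the intercept (same data and derivatives as above; the learning rate and step count are arbitrary choices). One caveat worth flagging: this toy dataset is perfectly separable, so the unregularized optimum is unbounded and the parameters keep slowly growing the longer you run it – which is one reason the sklearn model above carries (mild) regularization.

m = 0.0   # initial slope
b = 0.0   # initial intercept
l_rate = 1
steps = 10000

# Gradient Descent on both slope (m) and intercept (b)
for step in range(steps) :
    y_pred = expit(m * x + b)   # sigmoid of the linear part

    m_der  = (-1/n) * sum( (y - y_pred) * x)
    b_der  = (-1/n) * sum( y - y_pred )

    m = m - l_rate * m_der
    b = b - l_rate * b_der

print ( "slope (m) = ", m, ", intercept (b) = ", b)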

Now that we understand how Gradient Descent works, let's move on to the next important topic in Neural Networks – Back Propagation.

Neural Network Basics


  Deep Learning

Contents

Basic Structure

A Neural network is an interconnected set of neurons, arranged in layers. Input goes in at one end and output comes out at the other.

For example, the picture above shows a neural network with 4 nodes in the input layer and 3 nodes in the output layer. This is the exact structure we used for the iris classification we solved in our Hello World example on Day 1. The layer in between is called the hidden layer. This is what gives Deep Learning its name – the network is deep, with not just an input and an output layer, but one or more hidden layers in between.

The number of nodes or layers could change, but this is the basic shape of a typical neural network. To understand a neural network better, we have to start from the basics.

Biological Neuron

Neural networks were inspired by the brain. A human brain consists of billions of interconnected neurons. Here is a quick picture from Wikipedia.

x1, x2, …, xn represent the inputs and y1, y2, …, yn the outputs. So, essentially, a neuron transforms a set of inputs into a set of outputs. When many such neurons are connected, they form an intelligent system.

Perceptron

The simplest way to represent a neuron mathematically is with a perceptron.

A perceptron receives inputs, adds them up and produces an output. What is the big deal about it ? It is just basic addition, right ? True – that's where the concept of weights comes in.

Each of the inputs is multiplied by a weight. So, instead of just summing up the inputs, you multiply each of them by its weight and sum them up (a weighted sum of the inputs). The weighted sum could be a number within a very large range, depending on the input range and the weights. What is the use of having a number that could be anywhere from −∞ to +∞ ?

To normalize this, a bias or threshold is introduced: the perceptron outputs 1 if the weighted sum crosses the threshold, and 0 otherwise.

What does a perceptron achieve

The calculation above seems simple enough, but what exactly does it achieve ? Think of it as a decision-making machine. It weighs the input parameters and produces a Yes or No decision. For example, say you want to decide whether or not to learn Deep Learning – how do you go about it in your mind ?

Inputs               Weight
Job Prospect         30%
Interesting enough   20%
Future Growth        30%
Salary               20%

You weigh your inputs (multiply each input by its corresponding weight) and arrive at a figure. In fact, each of these inputs is also given a number internally in your mind. However, the way a human brain functions is far more complicated. Like I said before, neural networks & deep learning are just “based on” how the human brain works. It is not an exact replica.
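Here is a minimal sketch of that decision in code – the weights follow the table above, while the input scores and the 0.5 threshold are assumptions made up for illustration:

import numpy as np

weights = np.array([0.30, 0.20, 0.30, 0.20])  # job prospect, interest, growth, salary
inputs  = np.array([0.8, 0.6, 0.9, 0.4])      # hypothetical scores between 0 and 1
threshold = 0.5                               # assumed cutoff

weighted_sum = np.dot(weights, inputs)        # 0.71 for these numbers
decision = 1 if weighted_sum > threshold else 0
print ( weighted_sum, decision )              # 1 -> go learn Deep Learning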

Sigmoid Neuron

While a perceptron is good enough for simple tasks, it has its limitations when building complex neural networks. That is where sigmoid neurons come in. If you have seen logistic regression in Machine Learning before, you will already have an idea of what a sigmoid function does: it essentially maps numbers in the range −∞ to +∞ to values between 0 and 1.

A perceptron outputs either a 0 or a 1 depending on the weighted inputs & threshold. A sigmoid neuron outputs a value between 0 and 1. This makes the sigmoid neuron much more useful in large scale neural networks.

The weighted sum of the inputs + bias is calculated just as above.

Now, instead of outputting this directly, a sigmoid neuron calculates the sigmoid of the weighted sum + bias and outputs a value between 0 and 1:

σ(z) = 1 / (1 + e^(-z)),  where z = Σᵢ wᵢ xᵢ + b
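Continuing the little sketch from the perceptron section (same assumed weights and inputs; the bias of -0.5 is also an assumption), the sigmoid neuron simply wraps the weighted sum in expit:

from scipy.special import expit
import numpy as np

weights = np.array([0.30, 0.20, 0.30, 0.20])
inputs  = np.array([0.8, 0.6, 0.9, 0.4])
bias    = -0.5   # the threshold, moved to the left-hand side as a bias

z = np.dot(weights, inputs) + bias
print ( expit(z) )   # ~0.55 - a soft "lean towards yes" instead of a hard 1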

You can have a visual of the sigmoid function as seen below.

from scipy.special import expit
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

x = np.linspace(-1000,1000)
y = expit (x)
 
plt.plot(x,y)
plt.grid()

This looks like a binary curve (one that can only take a value of 0 or 1), but if you observe it closely between, say, -10 and 10, you can clearly see the gradual progression.

x = np.linspace(-10,10)
y = expit (x)
 
plt.plot(x,y)
plt.grid()

This is the logistic regression curve. Only when the value of the (weighted sum + bias) stays close to 0 do you observe the gradual, S-shaped transition. For any extreme value, the output is pretty much either a 0 or a 1 (very much like a perceptron).

Advantages of Sigmoid Neuron over a Perceptron

Since the output of a sigmoid neuron varies smoothly, small changes in the inputs result in small changes in the output. So, instead of just flipping a switch (0 or 1), the sigmoid function acts more like a slider. This property of the sigmoid's output makes it very useful for neural network learning.

Changes to your output are essentially a function of changes in the weights and biases. This is the basis of Neural Network learning.
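A quick numeric illustration of that smoothness (the weight and input values are arbitrary):

from scipy.special import expit

w, x_in = 0.4, 1.0                 # assumed weight and input
print ( expit(w * x_in) )          # ~0.599
print ( expit((w + 0.01) * x_in) ) # ~0.601 - a tiny change in the weight nudges the output only slightly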

However, to understand this mathematically, we have to understand a little bit of derivatives, partial derivatives, and then the actual back-propagation algorithm itself – Gradient Descent. These will be the topics of our next chapter.