NumPy



NumPy is a Python package designed for working efficiently with homogeneous n-dimensional arrays. Since array-level operations are highly mathematical in nature, most of NumPy is written in C and wrapped with Python. This is the key to NumPy’s success.

Just enough NumPy

Additional Reading


Just enough NumPy

Install numpy

Before you do anything with NumPy, you have to install it (unless you already have a data science distribution like Anaconda or Canopy installed, in which case NumPy comes with it). Installing NumPy is as simple as

# pip install numpy

Why NumPy

Let’s do a simple numeric operation: summing up the first ten million numbers. Let’s first do it in plain Python and then in NumPy to understand what NumPy brings to the table.

# without numpy
import time 

sum = 0

start_time = time.time()

for num in range(10000000) :
    sum = sum + num
    
print ( "sum = ", sum)

end_time = time.time()

python_time = end_time - start_time

print ( "time taken = ", python_time)
sum =  49999995000000
time taken =  3.329150438308716
# with numpy
import numpy as np
import time

sum = 0

start_time = time.time()

numbers = np.arange(10000000)

sum = np.sum(numbers, dtype = np.uint64)
print ( "sum = ", sum)

end_time = time.time()

numpy_time = end_time - start_time
factor = python_time / numpy_time

print ( "time taken = ", (end_time - start_time))

print ( "numpy is ", factor , " times faster than standard python")
sum =  49999995000000
time taken =  0.042661190032958984
numpy is  78.03698011557334  times faster than standard python

As you can see, NumPy is about 78 times faster than standard Python in this run. Of course, the number will vary with the power of your computer and from run to run. Right off the bat, you can see that NumPy brings a lot of value to the table. That level of performance improvement, all within the comfort of Python. That is the power of NumPy.

The power of NumPy lies in running numeric operations in compiled C code instead of in the Python interpreter, one element at a time.
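Timing with time.time() around a single run can be noisy. The following is a minimal sketch of the same comparison using the standard timeit module. The helper names python_sum and numpy_sum and the smaller size N are made up for this illustration and are not part of NumPy.

```python
import timeit

import numpy as np

N = 1_000_000  # hypothetical problem size, kept smaller so the run stays short


def python_sum(n):
    """Add up 0..n-1 with an ordinary Python loop."""
    total = 0
    for num in range(n):
        total += num
    return total


def numpy_sum(n):
    """Add up 0..n-1 with a single vectorized NumPy call."""
    return int(np.arange(n, dtype=np.int64).sum())


# Both approaches must agree before the timings mean anything.
assert python_sum(N) == numpy_sum(N)

# Take the best of three runs to reduce noise from other processes.
python_time = min(timeit.repeat(lambda: python_sum(N), number=1, repeat=3))
numpy_time = min(timeit.repeat(lambda: numpy_sum(N), number=1, repeat=3))

print("python loop :", python_time, "seconds")
print("numpy       :", numpy_time, "seconds")
print("speed-up    :", python_time / numpy_time)
```

The exact speed-up depends on your machine, so no expected numbers are shown here.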

n-dimensional array

This is the core data structure in numpy. We will explore how useful it is and what you can do with it pretty soon. Let’s create a simple 1 dimensional array with just 10 numbers

import numpy as np

a = np.array([1,2,3,4,5,6,7,8,9,10])
a
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

Let’s add a second dimension to it

b = np.array( [[1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10],
               [11,12,13,14,15,16,17,18,19,20]])
b
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])

Create an array from list

An array can be created from a standard Python list. All you have to do is pass the list to the np.array ( ) function.

numbers = [1,2,3,4,5,6,7,8,9,10,11,12]
a = np.array(numbers)
a
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

You can create a 2-d array as well from a list.

a1 = [1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10]
a2 = [11,12,13,14,15,16,17,18,19,20]
b = np.array( [a1,a2])
b
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]]) 
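Since NumPy arrays are homogeneous, every element of an array shares one data type. The short sketch below shows what that can look like when a list mixes integers and floats; the variable names are only illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])          # the integers are converted to floats

# The integer width (int32 or int64) depends on the platform,
# so only the kind of type is checked here.
print(np.issubdtype(ints.dtype, np.integer))   # True
print(mixed.dtype)                             # float64
print(mixed.tolist())                          # [1.0, 2.5, 3.0]

# The element type can also be requested explicitly.
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float.dtype)                          # float64
```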

shape

How do you know it has a second dimension ? Use the shape attribute (it is an attribute, not a function, so there are no parentheses) to tell you the shape of the array.

b.shape
(2, 10)

meaning, there are 2 rows and 10 columns.
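Alongside shape, an array carries a few related attributes. Here is a small sketch using the same 2 x 10 array; unpacking the shape into rows and cols is just a convenience chosen for this example.

```python
import numpy as np

b = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])

print(b.shape)  # (2, 10): 2 rows, 10 columns
print(b.ndim)   # 2: number of dimensions
print(b.size)   # 20: total number of elements

# shape is a plain tuple, so it can be unpacked.
rows, cols = b.shape
```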

arange ( )

Like the standard Python function range ( ), NumPy has a similar function called arange ( ). It returns a NumPy array, and as with range ( ) the stop value is not included.

numbers = np.arange(1,51)
numbers
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50])
numbers.shape
(50,)
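arange ( ) also accepts a step, much like range ( ). A brief sketch, with array names chosen only for illustration:

```python
import numpy as np

evens = np.arange(0, 11, 2)      # start, stop (exclusive), step
print(evens.tolist())            # [0, 2, 4, 6, 8, 10]

halves = np.arange(0, 2, 0.5)    # unlike range(), fractional steps are allowed
print(halves.tolist())           # [0.0, 0.5, 1.0, 1.5]
```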

reshape ( )

You can now use the reshape function to reshape the data into any number of dimensions you like, as long as the total number of elements stays the same. For example, you can reshape this into any of the following combinations in 2d, e.g., 5 x 10, 2 x 25, etc.

numbers.reshape(5,10)
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])
numbers.reshape(2 , 25)
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25],
       [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
        42, 43, 44, 45, 46, 47, 48, 49, 50]])

What happens when you try to reshape it to a 2 x 50 array ? That is not possible, because a 2 x 50 array needs 100 elements and we only have 50, so NumPy raises an error.

numbers.reshape(2,50)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-27db37f04a26> in <module>
----> 1 numbers.reshape(2,50)

ValueError: cannot reshape array of size 50 into shape (2,50)

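Sometimes you need to reshape an array knowing just the number of columns and not the number of rows ( or vice-versa ). In cases like that NumPy provides a shortcut: pass -1 for the unknown dimension and NumPy works it out from the size of the array.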

numbers = np.arange(1,13)

numbers.reshape(-1,2)
array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12]])

You can let NumPy work out the number of columns in the same way.

numbers.reshape(2,-1)
array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])
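The -1 shortcut can be sketched a little further, including what happens when the known dimension does not divide the array size evenly. The shapes used here are arbitrary examples.

```python
import numpy as np

numbers = np.arange(1, 13)            # 12 elements

# -1 asks NumPy to work out that dimension from the array size.
print(numbers.reshape(-1, 2).shape)   # (6, 2)
print(numbers.reshape(3, -1).shape)   # (3, 4)
print(numbers.reshape(-1).shape)      # (12,)

# The known dimension must divide the size evenly, otherwise a ValueError is raised.
try:
    numbers.reshape(-1, 5)
except ValueError as err:
    print("reshape failed:", err)
```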

Array Operations

This is where we get the sweet surprise. Array operations are element-wise. Let’s compare an array to a list and you will see the difference: adding two lists joins them end to end, while adding two arrays adds the matching elements.

Element-wise Operations

a = list(range(11))
b = list(range(11,21))
a + b
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
a1 = np.arange(1,11)
b1 = np.arange(11,21)
a1 + b1
array([12, 14, 16, 18, 20, 22, 24, 26, 28, 30])

Element-wise operations are not limited to two arrays. You can also apply an operation such as a power or a multiplication between an array and a single number. Essentially, we are eliminating the for loop.

a = list(range(11))
a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a12 = pow(a1,2)
a12
array([  1,   4,   9,  16,  25,  36,  49,  64,  81, 100], dtype=int32) 

Array Multiplication

a13 = a1 * 3
a13
array([ 3,  6,  9, 12, 15, 18, 21, 24, 27, 30])
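To make the contrast with lists concrete, here is a small sketch that applies the same operators to a list and to an array. The names values and arr are only for this illustration.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# On a list, * repeats the list.
print(values * 3)                 # [1, 2, 3, 1, 2, 3, 1, 2, 3]

# On an array, the operators work element by element.
print((arr * 3).tolist())         # [3, 6, 9]
print((arr + arr).tolist())       # [2, 4, 6]
print((arr ** 2).tolist())        # [1, 4, 9]

# The list equivalent of arr * 3 needs an explicit loop or comprehension.
print([v * 3 for v in values])    # [3, 6, 9]
```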

Aggregate Operations

sum ( )
a1 = np.arange(1,11)
print ( a1 )
a1.sum()
[ 1  2  3  4  5  6  7  8  9 10]
55
min ( ) & max ( )
a1.min()
1
a1.max()
10
len ( )
len(a1)
10

Aggregate Operations along an axis

a = np.arange(1,101).reshape(10,10)
a
array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15,  16,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25,  26,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35,  36,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45,  46,  47,  48,  49,  50],
       [ 51,  52,  53,  54,  55,  56,  57,  58,  59,  60],
       [ 61,  62,  63,  64,  65,  66,  67,  68,  69,  70],
       [ 71,  72,  73,  74,  75,  76,  77,  78,  79,  80],
       [ 81,  82,  83,  84,  85,  86,  87,  88,  89,  90],
       [ 91,  92,  93,  94,  95,  96,  97,  98,  99, 100]])

Sum along each axis. With axis=1 you get one sum per row, and with axis=0 you get one sum per column.

a.sum(axis=1)
array([ 55, 155, 255, 355, 455, 555, 655, 755, 855, 955]) 
a.sum(axis=0)
array([460, 470, 480, 490, 500, 510, 520, 530, 540, 550])

Similarly, you can do a min ( ) or max ( ) across any axis

a.min( axis = 1 )
array([ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91])
a.min ( axis = 0 )
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
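The axis argument is easier to follow on a tiny array. The sketch below uses a 2 x 3 array, chosen only to keep the numbers small.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)

# axis=0 collapses the rows: one result per column.
print(m.sum(axis=0).tolist())      # [5, 7, 9]
# axis=1 collapses the columns: one result per row.
print(m.sum(axis=1).tolist())      # [6, 15]
# With no axis, the whole array is reduced to a single number.
print(int(m.sum()))                # 21

print(m.max(axis=0).tolist())      # [4, 5, 6]
print(m.max(axis=1).tolist())      # [3, 6]
```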

Array indexing & Slicing

Array Indexing

Indexing a 1-d array works exactly like indexing a list.

To get the element at a particular index, just use the square brackets notation ( like a list )

b = np.arange(1,11)
b[5]
6

You can use negative indexing as well.

b[-5]
6

Indexing a 2d array is just as simple. Since the array is 2 dimensional now, you have to use 2 indices. One along each axis.

a[4,7]
48

Array Slicing

Slicing a 1-d array is also similar to slicing a list. Use a slice in place of a number for indexing

b
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
b[3:7]
array([4, 5, 6, 7])

Slicing a 2-d array extends the same functionality across all the axes

a
array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15,  16,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25,  26,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35,  36,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45,  46,  47,  48,  49,  50],
       [ 51,  52,  53,  54,  55,  56,  57,  58,  59,  60],
       [ 61,  62,  63,  64,  65,  66,  67,  68,  69,  70],
       [ 71,  72,  73,  74,  75,  76,  77,  78,  79,  80],
       [ 81,  82,  83,  84,  85,  86,  87,  88,  89,  90],
       [ 91,  92,  93,  94,  95,  96,  97,  98,  99, 100]])
a[2:5, 3:8]
array([[24, 25, 26, 27, 28],
       [34, 35, 36, 37, 38],
       [44, 45, 46, 47, 48]])

You can also combine slicing and indexing

a[4,3:8]
array([44, 45, 46, 47, 48])

If you wanted to specify all the elements across a particular axis, just use a colon (:) without anything before or after.

So, both of these are equivalent.

# Expression 1
a[4,0:10]
array([41, 42, 43, 44, 45, 46, 47, 48, 49, 50])
# Expression 2
a[4, : ]
array([41, 42, 43, 44, 45, 46, 47, 48, 49, 50])
What if you wanted several specific rows at once ? Pass a list of row indices in place of the slice.

a[[1,4], :]
array([[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])

The list can hold any number of indices.

a[ [1,4,8], : ]
array([[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
       [81, 82, 83, 84, 85, 86, 87, 88, 89, 90]])
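One difference between the two styles is worth knowing: a basic slice gives a view onto the original array, while indexing with a list gives a copy. A minimal sketch, with row_slice and picked as illustrative names:

```python
import numpy as np

a = np.arange(1, 101).reshape(10, 10)

row_slice = a[4, 3:8]        # basic slicing returns a view of a
picked = a[[1, 4, 8], :]     # indexing with a list returns a copy

row_slice[0] = -1            # writes through to a
picked[0, 0] = -99           # does not touch a

print(a[4, 3])               # -1
print(a[1, 0])               # 11
```

If you need a slice that is safe to modify, calling .copy() on it gives an independent array.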

Array Manipulation

So far, we have seen how to slice data from a NumPy array or use aggregate operations along an axis. In this section, we will learn about array manipulations.

Append rows or columns

Say we have a 2-d array of shape 4 x 5.

import numpy as np

numbers = np.arange(1,21)
numbers = numbers.reshape(4,5)
numbers

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])

What if we wanted to append another row at the end ? Say this row.

extras = np.array([21,22,23,24,25])

numbers = np.append(numbers,[extras],axis=0)

print ( numbers )

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]]

To append it as a column instead, first reshape it into a single column. Note that numbers now has 5 rows, so the new column needs 5 values.

j = extras.reshape(5,-1)
j
array([[21],
       [22],
       [23],
       [24],
       [25]])
j.shape

(5, 1)
numbers = np.append(numbers,extras.reshape(5,-1),axis=1)
print ( numbers )
[[ 1  2  3  4  5 21]
 [ 6  7  8  9 10 22]
 [11 12 13 14 15 23]
 [16 17 18 19 20 24]
 [21 22 23 24 25 25]]
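Note that np.append does not change the array in place; it returns a new one, which is why the result is assigned back to numbers above. The sketch below shows this, along with np.vstack and np.hstack as an alternative way to express the same thing. The values in new_col are made up for the example.

```python
import numpy as np

numbers = np.arange(1, 21).reshape(4, 5)
new_row = np.array([21, 22, 23, 24, 25])
new_col = np.array([101, 102, 103, 104])   # hypothetical values, one per row

# np.append returns a new array; the original is left untouched.
with_row = np.append(numbers, [new_row], axis=0)
print(numbers.shape, with_row.shape)          # (4, 5) (5, 5)

# vstack stacks rows, hstack stacks columns.
also_with_row = np.vstack([numbers, new_row])
with_col = np.hstack([numbers, new_col.reshape(-1, 1)])
print(also_with_row.shape, with_col.shape)    # (5, 5) (4, 6)
```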

Insert rows or columns

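What if you wanted to insert a row or a column in the middle, rather than at the end ?

In this case, you should use the insert ( ) function. Its second argument is the index before which the new values are inserted. Let’s start with a row.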

import numpy as np

numbers = np.arange(1,21)
numbers = numbers.reshape(4,5)
numbers

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])
extras = np.array([21,22,23,24,25])
print ( extras)
[21 22 23 24 25]
numbers_new = np.insert(numbers,2,extras,axis=0)
numbers_new
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [21, 22, 23, 24, 25],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])

Similarly, you can insert a column as well.

numbers = np.arange(1,21)
numbers = numbers.reshape(4,5)
numbers
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])
extras = np.array([21,22,23,24])
print ( extras)
[21 22 23 24]
numbers_new = np.insert(numbers,3,extras,axis=1)
print ( numbers_new)

[[ 1  2  3 21  4  5]
 [ 6  7  8 22  9 10]
 [11 12 13 23 14 15]
 [16 17 18 24 19 20]]

Delete rows or columns

To delete a row or column, use the delete ( ) function. For example, to delete the 3rd column below ( index 2, since indices start at 0 ),

numbers = np.arange(1,21)
numbers = numbers.reshape(4,5)
numbers
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])
numbers_new = np.delete(numbers,2,axis=1)
print ( numbers_new )
[[ 1  2  4  5]
 [ 6  7  9 10]
 [11 12 14 15]
 [16 17 19 20]]

To delete the second row below,

numbers = np.arange(1,21)
numbers = numbers.reshape(4,5)
numbers

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])
numbers_new = np.delete(numbers,1,axis=0)
print ( numbers_new )
[[ 1  2  3  4  5]
 [11 12 13 14 15]
 [16 17 18 19 20]]
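delete ( ) can also take a list of indices, and like append ( ) and insert ( ) it returns a new array. A short sketch, with the indices picked arbitrarily:

```python
import numpy as np

numbers = np.arange(1, 21).reshape(4, 5)

# A list of indices removes several rows or columns in one call.
without_cols = np.delete(numbers, [0, 4], axis=1)
print(without_cols.shape)            # (4, 3)

# The original array is unchanged.
print(numbers.shape)                 # (4, 5)

# Without an axis, the array is flattened first.
print(np.delete(numbers, 0).shape)   # (19,)
```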

Additional Reading

Meshgrid

Meshgrid is a useful feature of NumPy for creating a grid of co-ordinates. What meshgrid does is really simple. Say you have an array of x co-ordinates and an array of y co-ordinates

import numpy as np

x = np.arange(1,10)
y = np.arange(1,10)

Let’s plot it to see what it looks like.

import matplotlib.pyplot as plt

plt.scatter(x,y)
plt.savefig("scatter-plot.png")

What if you want every point on the grid formed by these co-ordinates, and not just the points along the diagonal ?

meshgrid ( ) is a convenience function in numpy that can generate all the points in the grid.

xx,yy = np.meshgrid(x,y)
print(xx)
print(yy)
[[1 2 3 4 5 6 7 8 9]
 [1 2 3 4 5 6 7 8 9]
 [1 2 3 4 5 6 7 8 9]
 [1 2 3 4 5 6 7 8 9]
 [1 2 3 4 5 6 7 8 9]
 [1 2 3 4 5 6 7 8 9]
 [1 2 3 4 5 6 7 8 9]
 [1 2 3 4 5 6 7 8 9]
 [1 2 3 4 5 6 7 8 9]]
[[1 1 1 1 1 1 1 1 1]
 [2 2 2 2 2 2 2 2 2]
 [3 3 3 3 3 3 3 3 3]
 [4 4 4 4 4 4 4 4 4]
 [5 5 5 5 5 5 5 5 5]
 [6 6 6 6 6 6 6 6 6]
 [7 7 7 7 7 7 7 7 7]
 [8 8 8 8 8 8 8 8 8]
 [9 9 9 9 9 9 9 9 9]]
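To see how xx and yy pair up into points, here is a smaller sketch on a 3 x 2 grid. Flattening both arrays and stacking them with np.column_stack is one way to list the co-ordinates; the name points is only illustrative.

```python
import numpy as np

x = np.arange(1, 4)      # [1 2 3]
y = np.arange(1, 3)      # [1 2]

xx, yy = np.meshgrid(x, y)
print(xx.shape, yy.shape)            # (2, 3) (2, 3)

# Pair up the flattened grids to list every (x, y) point.
points = np.column_stack((xx.ravel(), yy.ravel()))
print(points.tolist())
# [[1, 1], [2, 1], [3, 1], [1, 2], [2, 2], [3, 2]]
```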


Now, if you plot all of the elements on a scatter plot, you get this.

import matplotlib.pyplot as plt
%matplotlib inline

plt.scatter(xx,yy)

This can be used in conjunction with matplotlib’s contour or contourf functions to evaluate the behaviour of functions over a grid. For example, if you want to visualize circles, just create another variable z that is a function of x and y. A circle centred at the origin satisfies x**2 + y**2 = r**2, so every contour of the following z is part of a circle.

z = xx**2 + yy**2

print(z)
[[  2   5  10  17  26  37  50  65  82]
 [  5   8  13  20  29  40  53  68  85]
 [ 10  13  18  25  34  45  58  73  90]
 [ 17  20  25  32  41  52  65  80  97]
 [ 26  29  34  41  50  61  74  89 106]
 [ 37  40  45  52  61  72  85 100 117]
 [ 50  53  58  65  74  85  98 113 130]
 [ 65  68  73  80  89 100 113 128 145]
 [ 82  85  90  97 106 117 130 145 162]]
plt.contour(xx,yy,z,levels=[10,20,30,40,50,60,70,80,90,100])
<matplotlib.contour.QuadContourSet at 0x1295f150>

Each of these lines represents a single z value. For example, the innermost line (in purple) shows all the points where the level is 10. In other words, it traces the ( x, y ) points that result in a z value of 10.
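You can check this without plotting by asking which grid points give exactly z = 10. This is only a sketch of the idea: the contour line itself is interpolated between grid points, so it passes through many more positions than the two found here.

```python
import numpy as np

x = np.arange(1, 10)
y = np.arange(1, 10)
xx, yy = np.meshgrid(x, y)
z = xx**2 + yy**2

# Boolean mask of the grid points that lie exactly on the level z = 10.
on_level = (z == 10)
print(xx[on_level].tolist())   # [3, 1]
print(yy[on_level].tolist())   # [1, 3]
```

So the points (3, 1) and (1, 3) sit on the innermost contour.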

If you want to fill the contours, use the contourf function.

plt.contourf(xx,yy,z,levels=[10,20,30,40,50,60,70,80,90,100])
<matplotlib.contour.QuadContourSet at 0x129a3e90>
