NumPy
NumPy is a Python package designed for working efficiently with homogeneous n-dimensional arrays. Because array-level operations are numerically intensive, most of NumPy is written in C and wrapped with Python. This is the key to NumPy’s success.
Just enough Numpy
- Install numpy
- n-dimensional array
- Array Creation
- Array Operations
- Array Indexing & Slicing
- Array Manipulation
- Jupyter Notebook
Additional Reading
Just enough NumPy
Install numpy
Before you do anything with NumPy, you have to install it (unless you already have a data science distribution like Anaconda or Canopy installed). Installing NumPy is as simple as
pip install numpy
Why NumPy
Let’s do a simple numeric operation: summing up the first ten million numbers. Let’s first do it in plain Python and then in NumPy to understand what NumPy brings to the table.
# without numpy
import time

sum = 0
start_time = time.time()
for num in range(10000000):
    sum = sum + num
print("sum = ", sum)
end_time = time.time()
python_time = end_time - start_time
print("time taken = ", python_time)

sum = 49999995000000
time taken = 3.329150438308716

# with numpy
import numpy as np
import time

sum = 0
start_time = time.time()
numbers = np.arange(10000000)
sum = np.sum(numbers, dtype = np.uint64)
print("sum = ", sum)
end_time = time.time()
numpy_time = end_time - start_time
factor = python_time / numpy_time
print("time taken = ", (end_time - start_time))
print("numpy is ", factor, " times faster than standard python")

sum = 49999995000000
time taken = 0.042661190032958984
numpy is 78.03698011557334 times faster than standard python
As you can see, NumPy is about 78 times faster than standard Python in this run. The exact number will vary with the power of your computer and from run to run. Right off the bat, you can see that NumPy brings a lot of value to the table: that level of performance improvement, all within the comfort of Python.
The power of NumPy lies in leveraging a low-level C implementation to speed up numeric operations in Python.
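A single time.time() measurement can be noisy. The sketch below repeats the same comparison with the standard library's timeit module and keeps the best of three runs. The function names python_sum and numpy_sum and the problem size N are choices made for this illustration only, and the timings you see will differ from machine to machine.

```python
import timeit

import numpy as np

N = 1_000_000  # assumed problem size for this sketch


def python_sum():
    """Sum 0..N-1 with a plain Python loop."""
    total = 0
    for num in range(N):
        total += num
    return total


def numpy_sum():
    """Sum 0..N-1 with NumPy, using a 64-bit integer to avoid overflow."""
    return int(np.arange(N, dtype=np.int64).sum())


# Best of three runs for each version; each run calls the function once.
python_time = min(timeit.repeat(python_sum, number=1, repeat=3))
numpy_time = min(timeit.repeat(numpy_sum, number=1, repeat=3))

print("results match:", python_sum() == numpy_sum())
print("python time:", python_time, "seconds")
print("numpy time: ", numpy_time, "seconds")
```

Taking the minimum of several runs is a common way to reduce the influence of other processes on the measurement.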
n-dimensional array
This is the core data structure in numpy. We will explore how useful it is and what you can do with it pretty soon. Let’s create a simple 1 dimensional array with just 10 numbers

import numpy as np

a = np.array([1,2,3,4,5,6,7,8,9,10])
a
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Let’s put a second dimension to it

b = np.array([[1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10],
              [11,12,13,14,15,16,17,18,19,20]])
b
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
Create an array from list
An array can be created from a standard python list. All you have to do is use the array ( ) function and pass the list to it.

numbers = [1,2,3,4,5,6,7,8,9,10,11,12]
a = np.array(numbers)
a
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
You can create a 2-d array as well from a list.

a1 = [1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10]
a2 = [11,12,13,14,15,16,17,18,19,20]
b = np.array([a1, a2])
b
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
shape
How do you know it has a second dimension? Use the shape attribute (it is an attribute, not a function, so there are no parentheses) to get the shape of the array.

b.shape
(2, 10)
meaning, there are 2 rows and 10 columns.
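Alongside shape, an array carries a few related attributes that are handy when checking what you are working with. The following is a small sketch; the array name grid is made up for this example.

```python
import numpy as np

# A small 2 x 3 array used only for illustration.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.shape)  # (2, 3): 2 rows, 3 columns
print(grid.ndim)   # 2: number of dimensions
print(grid.size)   # 6: total number of elements
print(grid.dtype)  # element type; the exact integer type depends on the platform
```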
arange ( )
Like the standard python function range ( ) , numpy has a similar function called arange ( )

numbers = np.arange(1,51)
numbers
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50])
numbers.shape
(50,)
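Like range ( ), arange ( ) also accepts a step as a third argument, and unlike range ( ) the step does not have to be an integer. A brief sketch, with variable names chosen only for this example:

```python
import numpy as np

evens = np.arange(0, 10, 2)      # start, stop (exclusive), step
countdown = np.arange(5, 0, -1)  # a negative step counts down
halves = np.arange(0, 2, 0.5)    # non-integer steps are allowed

print(evens)      # [0 2 4 6 8]
print(countdown)  # [5 4 3 2 1]
print(halves)     # [0.  0.5 1.  1.5]
```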
reshape ( )
You can now use the reshape function to reshape the data into any number of dimensions you like, as long as the total number of elements stays the same. For example, you can reshape these 50 numbers into 2-d combinations such as 5 x 10, 2 x 25, and so on.

numbers.reshape(5,10)
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36, 37, 38, 39, 40], [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])

numbers.reshape(2 , 25)
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])
What happens when you try to reshape it into a 2 x 50 array? That is not possible, because a 2 x 50 array needs 100 elements and this one has only 50, so NumPy raises an error.
numbers.reshape(2,50)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-27db37f04a26> in <module>
----> 1 numbers.reshape(2,50)

ValueError: cannot reshape array of size 50 into shape (2,50)
Sometimes you need to reshape an array knowing just its number of columns and not its rows (or vice versa). For cases like that, NumPy provides a shortcut: pass -1 for the unknown dimension and NumPy infers it from the size of the array.

numbers = np.arange(1,13)
numbers.reshape(-1,2)
array([[ 1, 2], [ 3, 4], [ 5, 6], [ 7, 8], [ 9, 10], [11, 12]])
The same works when you know the number of rows and want NumPy to work out the columns.

numbers.reshape(2,-1)
array([[ 1, 2, 3, 4, 5, 6], [ 7, 8, 9, 10, 11, 12]])
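The -1 shortcut only works when the known dimension divides the number of elements evenly. The sketch below shows both the inferred shapes and the failure case; the names tall and wide are illustrative.

```python
import numpy as np

numbers = np.arange(1, 13)     # 12 elements

tall = numbers.reshape(-1, 3)  # rows inferred: 12 / 3 = 4
wide = numbers.reshape(3, -1)  # columns inferred: 12 / 3 = 4

print(tall.shape)  # (4, 3)
print(wide.shape)  # (3, 4)

try:
    numbers.reshape(-1, 5)     # 12 is not divisible by 5
except ValueError as err:
    print("reshape failed:", err)
```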
Array Operations
This is where we get the sweet surprise. Array operations are element-wise. Compare this with a list, where + joins the two lists instead of adding their elements, and you will see the difference.
Element-wise Operations
a = list(range(11))
b = list(range(11,21))
a + b
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
a1 = np.arange(1,11)
b1 = np.arange(11,21)
a1 + b1
array([12, 14, 16, 18, 20, 22, 24, 26, 28, 30])

Element-wise operations are not limited to pairs of arrays. You can also combine an array with a single number, for example raising every element to a power or multiplying every element by a constant. Essentially, we are eliminating the for loop.
a = list(range(11))
a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a12 = pow(a1,2)
a12
array([ 1, 4, 9, 16, 25, 36, 49, 64, 81, 100], dtype=int32)
Array Multiplication

a13 = a1 * 3
a13
array([ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30])
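To make the contrast with lists concrete, here is a short side-by-side sketch. The variable names are invented for this example. Note in particular that multiplying a list by 3 repeats it, while multiplying an array by 3 scales each element.

```python
import numpy as np

values = [1, 2, 3, 4, 5]
array = np.array(values)

# Squaring: a list needs an explicit loop (here a comprehension).
squared_list = [v ** 2 for v in values]
squared_array = array ** 2

# Multiplying by 3 means different things for the two types.
tripled_list = values * 3   # the list repeated three times (15 items)
tripled_array = array * 3   # every element multiplied by 3

print(squared_list)       # [1, 4, 9, 16, 25]
print(squared_array)      # [ 1  4  9 16 25]
print(len(tripled_list))  # 15
print(tripled_array)      # [ 3  6  9 12 15]
```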
Aggregate Operations
sum ( )
a1 = np.arange(1,11)
print(a1)
a1.sum()

[ 1 2 3 4 5 6 7 8 9 10]
55

min ( ) & max ( )

a1.min()
1
a1.max()
10
len ( )

len(a1)
10
Aggregate Operations along an axis

a = np.arange(1,101).reshape(10,10) a
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [ 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [ 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], [ 31, 32, 33, 34, 35, 36, 37, 38, 39, 40], [ 41, 42, 43, 44, 45, 46, 47, 48, 49, 50], [ 51, 52, 53, 54, 55, 56, 57, 58, 59, 60], [ 61, 62, 63, 64, 65, 66, 67, 68, 69, 70], [ 71, 72, 73, 74, 75, 76, 77, 78, 79, 80], [ 81, 82, 83, 84, 85, 86, 87, 88, 89, 90], [ 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]])
Sum along each axis: axis=1 adds up the elements of each row, and axis=0 adds up the elements of each column.

a.sum(axis=1)
array([ 55, 155, 255, 355, 455, 555, 655, 755, 855, 955])
a.sum(axis=0)
array([460, 470, 480, 490, 500, 510, 520, 530, 540, 550])
Similarly, you can do a min ( ) or max ( ) across any axis

a.min( axis = 1 )
array([ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91])
a.min ( axis = 0 )
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
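The axis argument is easier to follow on a tiny array where the sums can be checked by hand. A minimal sketch, using a made-up 2 x 3 array m:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = m.sum(axis=0)  # collapse the rows: one result per column
row_sums = m.sum(axis=1)  # collapse the columns: one result per row
col_max = m.max(axis=0)
row_max = m.max(axis=1)

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
print(col_max)   # [4 5 6]
print(row_max)   # [3 6]
```

One way to remember it: the axis you name is the one that disappears from the result.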
Array indexing & Slicing
Array Indexing
Indexing a 1-d array works exactly like indexing a list.

To get the element at a particular index, use the square brackets notation (like a list). Here, b is a 1-d array holding the numbers 1 to 10.

b[5]
6
You can use negative indexing as well.

b[-5]
6
Indexing a 2-d array is just as simple. Since the array is 2-dimensional now, you have to use 2 indices, one along each axis.

a[4,7]
48
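A few variations on 2-d indexing are sketched below, assuming the same 10 x 10 array of the numbers 1 to 100 used in this section.

```python
import numpy as np

a = np.arange(1, 101).reshape(10, 10)

print(a[4, 7])    # 48: row 4, column 7
print(a[4][7])    # 48: same element, fetched as row 4 first, then item 7
print(a[-1, -1])  # 100: negative indices count from the end of each axis
print(a[4])       # a single index returns the whole row
```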
Array Slicing
Slicing a 1-d array is also similar to slicing a list. Use a slice in place of a number for indexing.

b
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
b[3:7]
array([4, 5, 6, 7])
Slicing a 2-d array extends the same functionality across all the axes.

a
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [ 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [ 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], [ 31, 32, 33, 34, 35, 36, 37, 38, 39, 40], [ 41, 42, 43, 44, 45, 46, 47, 48, 49, 50], [ 51, 52, 53, 54, 55, 56, 57, 58, 59, 60], [ 61, 62, 63, 64, 65, 66, 67, 68, 69, 70], [ 71, 72, 73, 74, 75, 76, 77, 78, 79, 80], [ 81, 82, 83, 84, 85, 86, 87, 88, 89, 90], [ 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]])
a[2:5, 3:8]
array([[24, 25, 26, 27, 28], [34, 35, 36, 37, 38], [44, 45, 46, 47, 48]])
You can very well use a combination of slicing and indexing

a[4,3:8]
array([44, 45, 46, 47, 48])
If you wanted to specify all the elements across a particular axis, just use a colon (:) without anything before or after.

So, both of these are equivalent.
# Expression 1
a[4,0:10]
array([41, 42, 43, 44, 45, 46, 47, 48, 49, 50])
# Expression 2
a[4, : ]
array([41, 42, 43, 44, 45, 46, 47, 48, 49, 50])
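To pick specific rows that are not next to each other, pass a list of row indices instead of a slice.
a[[1,4], :]
array([[11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])
The list can hold any number of row indices.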

a[ [1,4,8], : ]
array([[11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [41, 42, 43, 44, 45, 46, 47, 48, 49, 50], [81, 82, 83, 84, 85, 86, 87, 88, 89, 90]])
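One practical difference between the two styles is worth knowing: a basic slice returns a view that shares memory with the original array, while indexing with a list of indices returns a copy. The sketch below checks this with np.shares_memory; the names sliced and picked are illustrative.

```python
import numpy as np

a = np.arange(1, 101).reshape(10, 10)

sliced = a[1:3, :]     # basic slice: a view on the same data
picked = a[[1, 4], :]  # list of row indices: an independent copy

print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, picked))  # False

sliced[0, 0] = -1      # writes through to the original array
print(a[1, 0])         # -1

picked[0, 1] = -2      # changes only the copy
print(a[1, 1])         # 12
```

If you need an independent array from a slice, calling .copy() on it is the usual approach.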
Array Manipulation
So far, we have seen how to slice data from a NumPy array and how to use aggregate operations along an axis. In this section, we will learn about array manipulations.
Append rows or columns
Say we have a 2-d array of shape 4 x 5.
import numpy as np

numbers = np.arange(1,21)
numbers = numbers.reshape(4,5)
numbers
array([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]])
What if we wanted to insert another row at the end ? Say this row.

extras = np.array([21,22,23,24,25])
numbers = np.append(numbers,[extras],axis=0)
print(numbers)
[[ 1 2 3 4 5] [ 6 7 8 9 10] [11 12 13 14 15] [16 17 18 19 20] [21 22 23 24 25]]
To append it as a column instead, first reshape it into a 5 x 1 array so that it lines up with the 5 rows,

j = extras.reshape(5,-1)
j
array([[21], [22], [23], [24], [25]])
j.shape
(5, 1)
numbers = np.append(numbers,extras.reshape(5,-1),axis=1)
print(numbers)
[[ 1 2 3 4 5 21] [ 6 7 8 9 10 22] [11 12 13 14 15 23] [16 17 18 19 20 24] [21 22 23 24 25 25]]
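Two behaviours of np.append are easy to miss: it returns a new array rather than changing the original in place (which is why the examples above assign the result back to numbers), and without an axis argument it flattens everything into 1-d. A small sketch with made-up names:

```python
import numpy as np

numbers = np.arange(1, 7).reshape(2, 3)
extra_row = np.array([7, 8, 9])

with_row = np.append(numbers, [extra_row], axis=0)
print(numbers.shape)   # (2, 3): the original is unchanged
print(with_row.shape)  # (3, 3)

flat = np.append(numbers, extra_row)  # no axis: both inputs are flattened
print(flat)            # [1 2 3 4 5 6 7 8 9]
```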
Insert rows or columns
What if you wanted to insert a row in the middle?

In this case, you should use the insert ( ) function.
import numpy as np

numbers = np.arange(1,21)
numbers = numbers.reshape(4,5)
numbers
array([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]])
extras = np.array([21,22,23,24,25])
print(extras)
[21 22 23 24 25]
numbers_new = np.insert(numbers,2,extras,axis=0)
numbers_new
array([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10], [21, 22, 23, 24, 25], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]])
Similarly, you can insert a column as well.

numbers = np.arange(1,21)
numbers = numbers.reshape(4,5)
numbers
array([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]])
extras = np.array([21,22,23,24])
print(extras)
[21 22 23 24]
numbers_new = np.insert(numbers,3,extras,axis=1)
print(numbers_new)
[[ 1 2 3 21 4 5] [ 6 7 8 22 9 10] [11 12 13 23 14 15] [16 17 18 24 19 20]]
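The value passed to np.insert can also be a single number, in which case it is repeated to fill the whole new row or column. The following sketch uses a small 2 x 3 array so the result is easy to verify; the names are illustrative.

```python
import numpy as np

numbers = np.arange(1, 7).reshape(2, 3)

# A scalar value is broadcast to fill the new column.
with_zeros = np.insert(numbers, 1, 0, axis=1)
print(with_zeros)
# [[1 0 2 3]
#  [4 0 5 6]]

# One value per row gives a proper column.
with_column = np.insert(numbers, 1, [10, 20], axis=1)
print(with_column)
# [[ 1 10  2  3]
#  [ 4 20  5  6]]
```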
Delete rows or columns
To delete a row or column use the delete ( ) function. For example, to delete the 3rd column below,

numbers = np.arange(1,21)
numbers = numbers.reshape(4,5)
numbers
array([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]])
numbers_new = np.delete(numbers,2,axis=1)
print(numbers_new)
[[ 1 2 4 5] [ 6 7 9 10] [11 12 14 15] [16 17 19 20]]
To delete the second row below,

numbers = np.arange(1,21)
numbers = numbers.reshape(4,5)
numbers
array([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]])
numbers_new = np.delete(numbers,1,axis=0)
print(numbers_new)
[[ 1 2 3 4 5] [11 12 13 14 15] [16 17 18 19 20]]
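np.delete also accepts a list of indices, so several rows or columns can be removed in one call. As with append and insert, the original array is left untouched. A brief sketch, with names chosen for this example:

```python
import numpy as np

numbers = np.arange(1, 21).reshape(4, 5)

# Remove the first and third rows (indices 0 and 2).
fewer_rows = np.delete(numbers, [0, 2], axis=0)
print(fewer_rows)
# [[ 6  7  8  9 10]
#  [16 17 18 19 20]]

# Remove the first and last columns (indices 0 and 4).
fewer_cols = np.delete(numbers, [0, 4], axis=1)
print(fewer_cols.shape)  # (4, 3)
print(numbers.shape)     # (4, 5): unchanged
```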
Additional Reading
Meshgrid
Meshgrid is a useful feature of NumPy when creating a grid of co-ordinates. The function of meshgrid is really simple. Say you have a list of x and y co-ordinates
import numpy as np

x = np.arange(1,10)
y = np.arange(1,10)
Let’s plot it to see what it looks like.
import matplotlib.pyplot as plt

plt.scatter(x,y)
plt.savefig("scatter-plot.png")

What if you want every point of the grid spanned by these co-ordinates, not just the ones along the diagonal?

meshgrid ( ) is a convenience function in numpy that can generate all the points in the grid.
xx,yy = np.meshgrid(x,y)
print(xx)
print(yy)
[[1 2 3 4 5 6 7 8 9] [1 2 3 4 5 6 7 8 9] [1 2 3 4 5 6 7 8 9] [1 2 3 4 5 6 7 8 9] [1 2 3 4 5 6 7 8 9] [1 2 3 4 5 6 7 8 9] [1 2 3 4 5 6 7 8 9] [1 2 3 4 5 6 7 8 9] [1 2 3 4 5 6 7 8 9]] [[1 1 1 1 1 1 1 1 1] [2 2 2 2 2 2 2 2 2] [3 3 3 3 3 3 3 3 3] [4 4 4 4 4 4 4 4 4] [5 5 5 5 5 5 5 5 5] [6 6 6 6 6 6 6 6 6] [7 7 7 7 7 7 7 7 7] [8 8 8 8 8 8 8 8 8] [9 9 9 9 9 9 9 9 9]]
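With x and y of different lengths it becomes clearer what meshgrid returns: xx repeats x along the rows, yy repeats y down the columns, and pairing them element by element gives every (x, y) point of the grid. A small sketch with made-up inputs:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20])

xx, yy = np.meshgrid(x, y)
print(xx.shape)  # (2, 3): one row per y value, one column per x value
print(xx)
# [[1 2 3]
#  [1 2 3]]
print(yy)
# [[10 10 10]
#  [20 20 20]]

# Pair the two grids up to list every (x, y) point.
points = np.column_stack([xx.ravel(), yy.ravel()])
print(points.tolist())
# [[1, 10], [2, 10], [3, 10], [1, 20], [2, 20], [3, 20]]
```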
Now, if you plot all of the elements on a scatter plot, you get this.
import matplotlib.pyplot as plt
%matplotlib inline

plt.scatter(xx,yy)

This can be used in conjunction with matplotlib’s contour or contourf functions to evaluate the behaviour of functions over a grid. For example, if you want to visualize circles, just create another variable z that is a function of x and y. The equation of a circle centred at the origin is x^2 + y^2 = r^2, so every constant value of z corresponds to a circle of radius sqrt(z).

z = xx**2 + yy**2
print(z)
[[ 2 5 10 17 26 37 50 65 82] [ 5 8 13 20 29 40 53 68 85] [ 10 13 18 25 34 45 58 73 90] [ 17 20 25 32 41 52 65 80 97] [ 26 29 34 41 50 61 74 89 106] [ 37 40 45 52 61 72 85 100 117] [ 50 53 58 65 74 85 98 113 130] [ 65 68 73 80 89 100 113 128 145] [ 82 85 90 97 106 117 130 145 162]]
plt.contour(xx,yy,z,levels=[10,20,30,40,50,60,70,80,90,100])
<matplotlib.contour.QuadContourSet at 0x1295f150>

Each of these lines represents a single z value. For example, the innermost line (in purple) shows all the points where the level is 10. In other words, it maps all the points (x and y) that result in a z value of 10.
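You can check this interpretation without plotting by asking which grid points have a given z value. The sketch below looks for z = 50 on the same 1 to 9 grid; the names on_level and points are illustrative. Keep in mind that the contour lines are interpolated between grid points, so only a few grid points fall exactly on any given level.

```python
import numpy as np

x = np.arange(1, 10)
y = np.arange(1, 10)
xx, yy = np.meshgrid(x, y)
z = xx**2 + yy**2

# Boolean mask of the grid points whose z value is exactly 50.
on_level = z == 50
points = list(zip(xx[on_level].tolist(), yy[on_level].tolist()))

print(points)       # [(7, 1), (5, 5), (1, 7)]
print(np.sqrt(50))  # radius of the circle these points lie on
```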
If you want to fill the contours, use the contourf function.
plt.contourf(xx,yy,z,levels=[10,20,30,40,50,60,70,80,90,100])
<matplotlib.contour.QuadContourSet at 0x129a3e90>
