# NumPy

NumPy is a Python package designed for working efficiently with homogeneous n-dimensional arrays. Since array-level operations are highly mathematical in nature, most of NumPy is written in C and wrapped with Python. This is the key to NumPy’s success.

## Just enough Numpy

- Install numpy
- n-dimensional array
- Array Creation
- Array Operations
- Array Indexing & Slicing
- Array Manipulation
- Jupyter Notebook

#### Additional Reading

### Just enough NumPy

### Install numpy

Before you can use NumPy, you have to install it (unless you already have a data science distribution such as Anaconda or Canopy installed). Installing NumPy is as simple as

    pip install numpy

### Why NumPy

Let’s do a simple numeric operation: summing up the first ten million numbers. Let’s do it first in plain Python and then in NumPy to understand what NumPy brings to the table.

    # without numpy
    import time

    total = 0  # named total so the built-in sum() is not shadowed
    start_time = time.time()
    for num in range(10000000):
        total = total + num
    print("sum = ", total)
    end_time = time.time()
    python_time = end_time - start_time
    print("time taken = ", python_time)

    sum = 49999995000000
    time taken = 3.329150438308716

    # with numpy
    import numpy as np
    import time

    start_time = time.time()
    numbers = np.arange(10000000)
    total = np.sum(numbers, dtype=np.uint64)
    print("sum = ", total)
    end_time = time.time()
    numpy_time = end_time - start_time
    factor = python_time / numpy_time  # python_time comes from the previous snippet
    print("time taken = ", numpy_time)
    print("numpy is ", factor, " times faster than standard python")

    sum = 49999995000000
    time taken = 0.042661190032958984
    numpy is 78.03698011557334 times faster than standard python

As you can see, NumPy is about 78 times faster than standard Python in this run. Of course, the exact number will vary with the power of your computer and from one run to the next. Right off the bat, you can see that NumPy brings a lot of value to the table: that level of performance improvement, all within the comfort of Python. That is the power of NumPy.

The power of NumPy lies in leveraging a low-level C API to increase the performance of numeric operations in Python.
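A single `time.time()` measurement can be noisy, so here is a minimal sketch of the same comparison using the standard library's `timeit` module. The helper names `python_sum` and `numpy_sum` and the smaller size `N` are assumptions made for this illustration; the timings you get will differ from the figures quoted above.

```python
import timeit

import numpy as np

N = 1_000_000  # smaller than the 10,000,000 used above so that it runs quickly


def python_sum(n):
    """Sum 0..n-1 with a plain Python loop."""
    total = 0
    for num in range(n):
        total += num
    return total


def numpy_sum(n):
    """Sum 0..n-1 with NumPy; creating the array is included in the timing."""
    return int(np.arange(n).sum(dtype=np.uint64))


# Both approaches must agree before the timings mean anything.
assert python_sum(N) == numpy_sum(N)

# Take the best of several runs to reduce noise from other processes.
python_time = min(timeit.repeat(lambda: python_sum(N), number=1, repeat=3))
numpy_time = min(timeit.repeat(lambda: numpy_sum(N), number=1, repeat=3))

print("python loop :", python_time)
print("numpy       :", numpy_time)
print("speed-up    :", python_time / numpy_time)
```

Taking the minimum of a few repeats is a common way to get a steadier number than a single start/stop measurement.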

### n-dimensional array

This is the core data structure in numpy. We will explore how useful it is and what you can do with it pretty soon. Let’s create a simple 1 dimensional array with just 10 numbers

    import numpy as np

    a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    a

    array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

Let’s add a second dimension to it.

    b = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                  [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
    b

    array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
           [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])

### Create an array from list

An array can be created from a standard python list. All you have to do is use the **array ( )** function and pass the list to it.

    numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    a = np.array(numbers)
    a

    array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

You can create a 2-d array as well from a list.

    a1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    a2 = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    b = np.array([a1, a2])
    b

    array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
           [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])

### shape

How do you know it has a second dimension? Use the **shape** attribute to get the shape of the array. It is an attribute rather than a function, so it is written without parentheses.

b.shape

(2, 10)

meaning, there are 2 rows and 10 columns.
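Alongside the shape, an array carries a few related attributes. The short sketch below rebuilds the same 2 x 10 array and prints them; the exact integer type reported by `dtype` depends on the platform, so no specific width is assumed.

```python
import numpy as np

b = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])

print(b.ndim)   # number of dimensions (axes): 2
print(b.shape)  # length along each axis: (2, 10)
print(b.size)   # total number of elements: 20
print(b.dtype)  # element type; an integer type whose width is platform dependent
```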

### arange ( )

Like the standard Python function **range ( )**, NumPy has a similar function called **arange ( )**. As with range, the stop value itself is not included.

    numbers = np.arange(1, 51)
    numbers

    array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
           18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
           35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50])

numbers.shape

(50,)
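As a quick illustration of how the arguments of `np.arange` behave, the sketch below shows the default start, a custom step and a negative step. The variable names are chosen for this example only.

```python
import numpy as np

from_zero = np.arange(5)           # start defaults to 0: 0, 1, 2, 3, 4
odd_numbers = np.arange(1, 11, 2)  # step of 2: 1, 3, 5, 7, 9
countdown = np.arange(10, 0, -3)   # negative step: 10, 7, 4, 1

print(from_zero)
print(odd_numbers)
print(countdown)
```

In every case the stop value is excluded, just as with Python's `range`.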

### reshape ( )

You can now use the reshape function to *reshape* the data into any number of dimensions you like, as long as the total number of elements stays the same. For example, you can reshape these 50 numbers into 2-d combinations such as 5 x 10, 2 x 25, etc.

    numbers.reshape(5, 10)

    array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
           [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
           [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
           [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
           [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])

    numbers.reshape(2, 25)

    array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25],
           [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
            42, 43, 44, 45, 46, 47, 48, 49, 50]])

What happens when you try to reshape it to a 2 x 50 array? That is not possible, because a 2 x 50 array needs 100 elements and we only have 50, so NumPy raises an error.

    numbers.reshape(2, 50)

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-21-27db37f04a26> in <module>
    ----> 1 numbers.reshape(2,50)

    ValueError: cannot reshape array of size 50 into shape (2,50)

Sometimes you need to reshape an array knowing just its number of columns and not its number of rows (or vice versa). In cases like that NumPy provides a shortcut: pass -1 for the unknown dimension and NumPy works it out from the size of the array.

    numbers = np.arange(1, 13)
    numbers.reshape(-1, 2)

    array([[ 1,  2],
           [ 3,  4],
           [ 5,  6],
           [ 7,  8],
           [ 9, 10],
           [11, 12]])

You can do the same for columns as well.

    numbers.reshape(2, -1)

    array([[ 1,  2,  3,  4,  5,  6],
           [ 7,  8,  9, 10, 11, 12]])
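To make the -1 shortcut concrete, here is a small sketch showing the inferred shapes and what happens when the size does not divide evenly. The names `two_cols`, `three_rows` and `flat` are made up for this example.

```python
import numpy as np

numbers = np.arange(1, 13)           # 12 elements

two_cols = numbers.reshape(-1, 2)    # 12 / 2 = 6 rows are inferred
three_rows = numbers.reshape(3, -1)  # 12 / 3 = 4 columns are inferred
flat = two_cols.reshape(-1)          # a single -1 flattens back to 1-d

print(two_cols.shape)    # (6, 2)
print(three_rows.shape)  # (3, 4)
print(flat.shape)        # (12,)

# The inferred size must be a whole number; 12 is not divisible by 5.
try:
    numbers.reshape(-1, 5)
except ValueError as err:
    print("reshape failed:", err)
```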

### Array Operations

This is where we get a sweet surprise: array operations are element-wise. Let’s compare an array with a list to see the difference. With lists, + joins the two lists end to end; with arrays, + adds the matching elements.

### Element-wise Operations

    a = list(range(11))
    b = list(range(11, 21))
    a + b

    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

    a1 = np.arange(1, 11)
    b1 = np.arange(11, 21)
    a1 + b1

    array([12, 14, 16, 18, 20, 22, 24, 26, 28, 30])

Element-wise operations are not limited to pairs of arrays. You can also apply simple operations such as power or multiplication between an array and a single number. Essentially, we are eliminating the for loop.

    a = list(range(11))
    a

    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

    a12 = pow(a1, 2)
    a12

    array([  1,   4,   9,  16,  25,  36,  49,  64,  81, 100], dtype=int32)

Array Multiplication

    a13 = a1 * 3
    a13

    array([ 3,  6,  9, 12, 15, 18, 21, 24, 27, 30])
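To see which loop is being eliminated, the sketch below writes the same two operations once with plain Python lists and once with NumPy arrays. The variable names are illustrative only.

```python
import numpy as np

values = list(range(1, 11))
a1 = np.arange(1, 11)

# Plain Python: an explicit loop (here a list comprehension) per operation.
squares_list = [v ** 2 for v in values]
sums_list = [x + y for x, y in zip(values, values)]

# NumPy: the same operations applied to whole arrays at once.
squares_arr = a1 ** 2
sums_arr = a1 + a1

print(squares_arr)
print(sums_arr)

# On a list, * repeats the list instead of multiplying its elements.
repeated = values * 2
print(len(repeated))  # 20
```

The NumPy versions read like the arithmetic itself, and the looping happens inside NumPy's compiled code.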

### Aggregate Operations

##### sum ( )

    a1 = np.arange(1, 11)
    print(a1)
    a1.sum()

    [ 1  2  3  4  5  6  7  8  9 10]
    55

##### min ( ) & max ( )

a1.min()

1

a1.max()

10

##### len ( )

len(a1)

10
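One thing to watch for: on a 1-d array `len` gives the number of elements, but on a 2-d array it only gives the length of the first axis. The sketch below, using a made-up 3 x 4 array called `grid`, contrasts `len` with the `size` attribute.

```python
import numpy as np

grid = np.arange(1, 13).reshape(3, 4)

print(len(grid))   # 3: length of the first axis (number of rows)
print(grid.size)   # 12: total number of elements
print(grid.shape)  # (3, 4)
```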

### Aggregate Operations along an axis

    a = np.arange(1, 101).reshape(10, 10)
    a

    array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
           [ 11,  12,  13,  14,  15,  16,  17,  18,  19,  20],
           [ 21,  22,  23,  24,  25,  26,  27,  28,  29,  30],
           [ 31,  32,  33,  34,  35,  36,  37,  38,  39,  40],
           [ 41,  42,  43,  44,  45,  46,  47,  48,  49,  50],
           [ 51,  52,  53,  54,  55,  56,  57,  58,  59,  60],
           [ 61,  62,  63,  64,  65,  66,  67,  68,  69,  70],
           [ 71,  72,  73,  74,  75,  76,  77,  78,  79,  80],
           [ 81,  82,  83,  84,  85,  86,  87,  88,  89,  90],
           [ 91,  92,  93,  94,  95,  96,  97,  98,  99, 100]])

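Sum along each of the axes. With axis=1 the sum runs across the columns, giving one total per row; with axis=0 it runs down the rows, giving one total per column.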

a.sum(axis=1)

array([ 55, 155, 255, 355, 455, 555, 655, 755, 855, 955])

a.sum(axis=0)

array([460, 470, 480, 490, 500, 510, 520, 530, 540, 550])

Similarly, you can do a min ( ) or max ( ) across any axis

a.min( axis = 1 )

array([ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91])

a.min ( axis = 0 )

array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
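The axis argument is easier to follow on a tiny array where the totals can be checked by hand. This is a minimal sketch with a made-up 2 x 3 array `m`.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3): 2 rows, 3 columns

total = m.sum()             # no axis: every element, 21
col_sums = m.sum(axis=0)    # collapse the rows: one value per column, 5, 7, 9
row_sums = m.sum(axis=1)    # collapse the columns: one value per row, 6, 15
col_max = m.max(axis=0)     # column-wise maximum, 4, 5, 6
row_means = m.mean(axis=1)  # row-wise mean, 2.0, 5.0

print(total, col_sums, row_sums, col_max, row_means)
```

A handy way to remember it: the axis you name is the one that disappears from the result's shape.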

### Array indexing & Slicing

### Array Indexing

Indexing a 1-d array works exactly like indexing a list. To get the element at a particular index, just use the square-bracket notation. In the examples below, b is the 1-d array holding the numbers 1 to 10.

b[5]

6

You can use negative indexing as well.

b[-5]

6

Indexing a 2d array is just as simple. Since the array is 2 dimensional now, you have to use 2 indices. One along each axis.

a[4,7]

48

### Array Slicing

Slicing a 1-d array is also similar to slicing a list. Use a slice in place of a number for indexing.

b

array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

b[3:7]

array([4, 5, 6, 7])

Slicing a 2-d array extends the same functionality across all the axes.

    a

    array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
           [ 11,  12,  13,  14,  15,  16,  17,  18,  19,  20],
           [ 21,  22,  23,  24,  25,  26,  27,  28,  29,  30],
           [ 31,  32,  33,  34,  35,  36,  37,  38,  39,  40],
           [ 41,  42,  43,  44,  45,  46,  47,  48,  49,  50],
           [ 51,  52,  53,  54,  55,  56,  57,  58,  59,  60],
           [ 61,  62,  63,  64,  65,  66,  67,  68,  69,  70],
           [ 71,  72,  73,  74,  75,  76,  77,  78,  79,  80],
           [ 81,  82,  83,  84,  85,  86,  87,  88,  89,  90],
           [ 91,  92,  93,  94,  95,  96,  97,  98,  99, 100]])

    a[2:5, 3:8]

    array([[24, 25, 26, 27, 28],
           [34, 35, 36, 37, 38],
           [44, 45, 46, 47, 48]])

You can very well use a combination of slicing and indexing

a[4,3:8]

array([44, 45, 46, 47, 48])

If you wanted to specify all the elements across a particular axis, just use a colon (:) without anything before or after.

So, both of these are equivalent.

    # Expression 1
    a[4, 0:10]

    array([41, 42, 43, 44, 45, 46, 47, 48, 49, 50])

    # Expression 2
    a[4, :]

    array([41, 42, 43, 44, 45, 46, 47, 48, 49, 50])

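You can also pass a list of indices in place of a slice to pick out specific rows. For example, to pick rows 1 and 4:

    a[[1, 4], :]

    array([[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
           [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])

What if you wanted several rows that are not next to each other? Just extend the list, like so:

    a[[1, 4, 8], :]

    array([[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
           [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
           [81, 82, 83, 84, 85, 86, 87, 88, 89, 90]])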
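There is one practical difference between the two styles of selection that is worth knowing: a basic slice is a view onto the original data, while selecting with a list of indices produces a copy. The sketch below demonstrates this with `np.shares_memory`; the names `sliced` and `picked` are made up for the example.

```python
import numpy as np

a = np.arange(1, 101).reshape(10, 10)

sliced = a[1:3, :]     # basic slicing: a view onto the same memory
picked = a[[1, 4], :]  # list of indices: a new array with copied data

print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, picked))  # False

sliced[0, 0] = -1      # writes through to a
picked[0, 1] = -2      # does not touch a
print(a[1, :3])        # first three entries of row 1: -1, 12, 13
```

If you need an independent array from a slice, calling `.copy()` on it is the usual approach.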

### Array Manipulation

So far, we have seen how to slice data from a NumPy array and how to use aggregate operations along an axis. In this section, we will learn about array manipulation.

### Append rows or columns

Say we have a 2-d array of shape 4 x 5.

    import numpy as np

    numbers = np.arange(1, 21)
    numbers = numbers.reshape(4, 5)
    numbers

    array([[ 1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10],
           [11, 12, 13, 14, 15],
           [16, 17, 18, 19, 20]])

What if we wanted to append another row at the end? Say this row.

    extras = np.array([21, 22, 23, 24, 25])

    # [extras] makes it a 1 x 5 row so that its dimensions match numbers
    numbers = np.append(numbers, [extras], axis=0)
    print(numbers)

    [[ 1  2  3  4  5]
     [ 6  7  8  9 10]
     [11 12 13 14 15]
     [16 17 18 19 20]
     [21 22 23 24 25]]

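Say you wanted to append it as a column instead. First reshape it into a 5 x 1 column,

    j = extras.reshape(5, -1)
    j

    array([[21],
           [22],
           [23],
           [24],
           [25]])

    j.shape

    (5, 1)

and then append it along axis 1. Note that numbers now has 5 rows, because of the row appended earlier.

    numbers = np.append(numbers, extras.reshape(5, -1), axis=1)
    print(numbers)

    [[ 1  2  3  4  5 21]
     [ 6  7  8  9 10 22]
     [11 12 13 14 15 23]
     [16 17 18 19 20 24]
     [21 22 23 24 25 25]]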
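The same result can also be reached with `np.concatenate`, which makes the shape requirement explicit: the pieces must match along every axis except the one being joined. This is a sketch of an alternative, not the approach used above; `new_row` and `new_col` are made-up values.

```python
import numpy as np

numbers = np.arange(1, 21).reshape(4, 5)
new_row = np.array([21, 22, 23, 24, 25])
new_col = np.array([100, 200, 300, 400])

# Row: promote the 1-d array to shape (1, 5) so it lines up with 5 columns.
with_row = np.concatenate([numbers, new_row.reshape(1, -1)], axis=0)

# Column: promote the 1-d array to shape (4, 1) so it lines up with 4 rows.
with_col = np.concatenate([numbers, new_col.reshape(-1, 1)], axis=1)

print(with_row.shape)  # (5, 5)
print(with_col.shape)  # (4, 6)
print(numbers.shape)   # (4, 5): the original array is left unchanged
```

Both `np.append` and `np.concatenate` build a new array each time, so appending inside a long loop can be slow.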

### Insert rows or columns

What if you wanted to insert a row or a column in the middle instead of at the end?

In this case, you should use the **insert ( )** function. Its second argument is the index before which the new values are inserted.

    import numpy as np

    numbers = np.arange(1, 21)
    numbers = numbers.reshape(4, 5)
    numbers

    array([[ 1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10],
           [11, 12, 13, 14, 15],
           [16, 17, 18, 19, 20]])

    extras = np.array([21, 22, 23, 24, 25])
    print(extras)

    [21 22 23 24 25]

    numbers_new = np.insert(numbers, 2, extras, axis=0)
    numbers_new

    array([[ 1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10],
           [21, 22, 23, 24, 25],
           [11, 12, 13, 14, 15],
           [16, 17, 18, 19, 20]])

Similarly, you can insert a column as well.

    numbers = np.arange(1, 21)
    numbers = numbers.reshape(4, 5)
    numbers

    array([[ 1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10],
           [11, 12, 13, 14, 15],
           [16, 17, 18, 19, 20]])

    extras = np.array([21, 22, 23, 24])
    print(extras)

    [21 22 23 24]

    numbers_new = np.insert(numbers, 3, extras, axis=1)
    print(numbers_new)

    [[ 1  2  3 21  4  5]
     [ 6  7  8 22  9 10]
     [11 12 13 23 14 15]
     [16 17 18 24 19 20]]

### Delete rows or columns

To delete a row or column use the **delete ( )** function. For example, to delete the 3rd column below,

    numbers = np.arange(1, 21)
    numbers = numbers.reshape(4, 5)
    numbers

    array([[ 1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10],
           [11, 12, 13, 14, 15],
           [16, 17, 18, 19, 20]])

    numbers_new = np.delete(numbers, 2, axis=1)
    print(numbers_new)

    [[ 1  2  4  5]
     [ 6  7  9 10]
     [11 12 14 15]
     [16 17 19 20]]

To delete the second row below,

    numbers = np.arange(1, 21)
    numbers = numbers.reshape(4, 5)
    numbers

    array([[ 1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10],
           [11, 12, 13, 14, 15],
           [16, 17, 18, 19, 20]])

    numbers_new = np.delete(numbers, 1, axis=0)
    print(numbers_new)

    [[ 1  2  3  4  5]
     [11 12 13 14 15]
     [16 17 18 19 20]]
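Two details of `np.insert` and `np.delete` are easy to miss: both accept a list of indices, and both return a new array rather than changing the original. The sketch below illustrates this; the names `trimmed` and `padded` are made up for the example.

```python
import numpy as np

numbers = np.arange(1, 21).reshape(4, 5)

# Delete the first and last columns in one call by passing a list of indices.
trimmed = np.delete(numbers, [0, 4], axis=1)

# Insert a row of zeros before row index 1; the scalar 0 fills the whole row.
padded = np.insert(numbers, 1, 0, axis=0)

print(trimmed.shape)  # (4, 3)
print(padded.shape)   # (5, 5)
print(numbers.shape)  # (4, 5): neither call modified the original
```

This is why the examples above assign the result to a new name such as numbers_new.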

### Additional Reading

### Meshgrid

Meshgrid is a useful feature of NumPy when creating a grid of co-ordinates. The function of **meshgrid** is really simple. Say you have a list of x and y co-ordinates

    import numpy as np

    x = np.arange(1, 10)
    y = np.arange(1, 10)

Let’s plot it to see what it looks like.

    import matplotlib.pyplot as plt

    plt.scatter(x, y)
    plt.savefig("scatter-plot.png")

What if you want every combination of these x and y co-ordinates, that is, all the points of the grid in between?

**meshgrid ( )** is a convenience function in numpy that can generate all the points in the grid.

    xx, yy = np.meshgrid(x, y)
    print(xx)
    print(yy)

    [[1 2 3 4 5 6 7 8 9]
     [1 2 3 4 5 6 7 8 9]
     [1 2 3 4 5 6 7 8 9]
     [1 2 3 4 5 6 7 8 9]
     [1 2 3 4 5 6 7 8 9]
     [1 2 3 4 5 6 7 8 9]
     [1 2 3 4 5 6 7 8 9]
     [1 2 3 4 5 6 7 8 9]
     [1 2 3 4 5 6 7 8 9]]
    [[1 1 1 1 1 1 1 1 1]
     [2 2 2 2 2 2 2 2 2]
     [3 3 3 3 3 3 3 3 3]
     [4 4 4 4 4 4 4 4 4]
     [5 5 5 5 5 5 5 5 5]
     [6 6 6 6 6 6 6 6 6]
     [7 7 7 7 7 7 7 7 7]
     [8 8 8 8 8 8 8 8 8]
     [9 9 9 9 9 9 9 9 9]]
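To see how the two output arrays fit together, here is a small sketch on a deliberately tiny grid (3 x-values and 2 y-values, chosen for this example). Pairing up the matching entries of `xx` and `yy` gives one (x, y) pair per grid point.

```python
import numpy as np

x = np.arange(1, 4)  # 1, 2, 3
y = np.arange(1, 3)  # 1, 2

xx, yy = np.meshgrid(x, y)

print(xx.shape, yy.shape)  # both (2, 3): len(y) rows, len(x) columns

# Pair up matching entries to get one (x, y) row per grid point.
points = np.column_stack([xx.ravel(), yy.ravel()])
print(points)
```

With the default settings, x varies along the columns and y along the rows, which matches how plotting functions expect grid data.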

Now, if you plot all of the elements on a scatter plot, you get this.

    import matplotlib.pyplot as plt
    %matplotlib inline

    plt.scatter(xx, yy)

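This can be used in conjunction with matplotlib’s **contour** or **contourf** functions to evaluate the behaviour of functions over a grid. For example, if you want to visualize circles, just create another variable **z** that is a function of x and y. A circle of radius r centred at the origin satisfies x^2 + y^2 = r^2, so every constant value of the **z** below traces out a circle (or rather the part of it that falls inside the grid).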

    z = xx**2 + yy**2
    print(z)

    [[  2   5  10  17  26  37  50  65  82]
     [  5   8  13  20  29  40  53  68  85]
     [ 10  13  18  25  34  45  58  73  90]
     [ 17  20  25  32  41  52  65  80  97]
     [ 26  29  34  41  50  61  74  89 106]
     [ 37  40  45  52  61  72  85 100 117]
     [ 50  53  58  65  74  85  98 113 130]
     [ 65  68  73  80  89 100 113 128 145]
     [ 82  85  90  97 106 117 130 145 162]]

    plt.contour(xx, yy, z, levels=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

    <matplotlib.contour.QuadContourSet at 0x1295f150>

Each of these lines represents a single **z** value. For example, the innermost line (in purple) shows all the points where the level is 10. In other words, it maps all the points (x and y) that result in a **z** value of 10.

If you want to fill the contours, use the **contourf** function.

    plt.contourf(xx, yy, z, levels=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

    <matplotlib.contour.QuadContourSet at 0x129a3e90>