Random Numbers


  Machine Learning in Python

Summary: Random number generation is the basis for sampling data in machine learning. It is a fundamental concept in probability and statistics.

In this chapter, we will explore the basic ways of generating random numbers in plain Python: generating a single random number, picking a number within a range, sampling from a list, and shuffling a list.

Introduction

Random numbers are the basis for sampling data in machine learning. Although we will typically use higher-level libraries (like scikit-learn) to do the sampling, the underlying mechanism is the same. Python ships a random number generator in its standard library, in the random module. You can use it for sampling, shuffling, and so on. In this section, we will learn the most commonly used functions in Python's random module.

Basic Random number generation

random()

The most fundamental function in the module is random(). It generates a random floating-point number from 0.0 (inclusive) up to, but not including, 1.0.

from random import random

random()

0.18226497796545615

Let’s try generating a bunch of them in a for loop.

for i in range(10):
    print(random())

0.9120636021650675
0.5162386683119591
0.5702145169943453
0.925384117526444
0.48514511402663685
0.37683001524891946
0.11792954513490983
0.41292273179063366
0.11131059345234051
0.04762179898672436

As you can see, each call returns a new random number.

seed()

Sometimes you want the same set of random numbers every time, for example when running test cases. In such cases, use the seed() function. You can set the seed to any value. Python's generator is pseudo-random, so a given seed value always produces the same sequence of random numbers. In the example below, the seed is reset to 1 before every call, so random() returns the first number of that sequence each time.

from random import seed

for i in range(10):
    seed(1)  # reset the generator before every call
    print(random())

0.13436424411240122
0.13436424411240122
0.13436424411240122
0.13436424411240122
0.13436424411240122
0.13436424411240122
0.13436424411240122
0.13436424411240122
0.13436424411240122
0.13436424411240122
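
The loop above reseeds before every call, which is why it prints one repeated value. In practice the seed is usually set once, at the start of a run. The sketch below illustrates that pattern: the helper name draw is hypothetical (it is not part of the random module), and it only shows that the same seed reproduces the same sequence while a different seed does not.

```python
from random import random, seed

def draw(n, seed_value):
    """Hypothetical helper: seed the generator once, then draw n numbers."""
    seed(seed_value)
    return [random() for _ in range(n)]

first_run = draw(3, 1)
second_run = draw(3, 1)   # same seed, so the same three numbers
other_seed = draw(3, 2)   # different seed, so a different sequence

print(first_run == second_run)  # True
print(first_run == other_seed)  # False
```

Within a single seeded run the numbers still differ from one another; only the sequence as a whole is repeatable.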

Random number within a range

Sometimes you want to pick a random number within a range of numbers, for example a random number between 10 and 20. In such cases, use the uniform() function.

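uniform()

Generates a random floating-point number within a user-specified range. uniform(a, b) returns a value between a and b; whether the upper endpoint b itself can be returned depends on floating-point rounding.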

from random import uniform

for i in range(10):
    print(uniform(10, 20))

14.327670679050534
17.62280082457942
10.021060533511108
14.453871940548014
17.215400323407827
12.287622212704527
19.452706955539224
19.014274576114836
10.305899830335536
10.254458609934607
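
One way to think about uniform() is as a scaled and shifted random(): the Python documentation describes the result in terms of the expression a + (b - a) * random(). The sketch below checks that relationship by seeding the generator with the same value before each draw. The names low, high, from_uniform and from_random are illustrative, and the comparison uses a tolerance rather than assuming the two results are bit-for-bit identical on every Python implementation.

```python
from math import isclose
from random import random, seed, uniform

low, high = 10, 20

seed(7)
from_uniform = uniform(low, high)

seed(7)  # rewind the generator so random() repeats the same draw
from_random = low + (high - low) * random()

print(isclose(from_uniform, from_random))  # True
print(low <= from_uniform <= high)         # True
```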

randrange()

This function works like uniform(), except that the generated number is an integer rather than a floating-point number. Note that the upper bound is excluded: randrange(10, 20) returns an integer from 10 to 19.

from random import randrange

for i in range(10):
    print(randrange(10, 20))

18
10
16
13
16
10
18
13
17
17
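
Because randrange() follows the same conventions as range(), it never returns its upper bound, and it accepts an optional step. The standard library also provides randint(), which includes both endpoints. The sketch below draws many values from each and inspects the results. The sample count of 1000 and the seed value are arbitrary choices made for illustration.

```python
from random import randint, randrange, seed

seed(0)  # fixed seed only so that reruns draw the same numbers

half_open = [randrange(10, 20) for _ in range(1000)]     # integers 10..19
inclusive = [randint(10, 20) for _ in range(1000)]       # integers 10..20
even_only = [randrange(10, 20, 2) for _ in range(1000)]  # 10, 12, ..., 18

print(min(half_open), max(half_open))
print(min(inclusive), max(inclusive))
print(sorted(set(even_only)))
```

If you need the upper bound to be a possible result, randint(a, b) is usually clearer than writing randrange(a, b + 1).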

Challenge – Pick a random element from a list. Example of a list given below.

cities = ["Hyderabad","San Francisco","London","Sydney","Tokyo"]

Solution 

cities = ["Hyderabad","San Francisco","London","Sydney","Tokyo"]

index = randrange(0,len(cities))

cities[index]

'Tokyo'

Challenge – Pick a sample of 3 elements from a list. Example of a list given below.

cities = ["Hyderabad","San Francisco","London","Sydney","Tokyo"]

Solution

sample_size = 3
indices = []

cities = ["Hyderabad","San Francisco","London","Sydney","Tokyo"]

# pick sample_size distinct random indices
while len(indices) < sample_size:
    i = randrange(0, len(cities))

    # if the index has already been added, ignore it
    if i in indices:
        continue

    # else add it to the indices list
    indices.append(i)

for i in indices:
    print(cities[i])

San Francisco
Tokyo
Sydney

Challenge – Shuffle the list of cities given below.

cities = ["Hyderabad","San Francisco","London","Sydney","Tokyo"]

Solution 

shuffled_cities = []

# note: this approach assumes every city name in the list is unique
while len(shuffled_cities) != len(cities):
    # pick a random element from the list
    index = randrange(len(cities))

    # if the city is already added, don't add it
    if cities[index] in shuffled_cities:
        continue

    # if not already added, add it to the shuffled cities list
    shuffled_cities.append(cities[index])

print(shuffled_cities)

['Tokyo', 'San Francisco', 'Sydney', 'London', 'Hyderabad']
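
The retry loop above works for this list, but it wastes draws as the result fills up, and it would never finish if the list contained the same value twice. As an alternative, here is a minimal sketch of the Fisher-Yates shuffle, a different technique from the solution above. It swaps elements by position, so duplicates are not a problem. The function name fisher_yates_shuffle is chosen for illustration only.

```python
from random import randrange

def fisher_yates_shuffle(items):
    """Return a shuffled copy of items using the Fisher-Yates algorithm."""
    shuffled = list(items)
    # Walk from the last position down to the second one.
    for i in range(len(shuffled) - 1, 0, -1):
        # Pick a position from 0..i (inclusive) and swap it into slot i.
        j = randrange(i + 1)
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled

cities = ["Hyderabad", "San Francisco", "London", "Sydney", "Tokyo"]
shuffled_cities = fisher_yates_shuffle(cities)
print(shuffled_cities)
```

Each element is placed with exactly one random draw, so the number of draws is always one less than the length of the list.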

Sampling

As discussed at the beginning of this section, the key use of random numbers in machine learning is to sample data (and then use statistics to draw conclusions). Python's random module provides built-in functions for this, so you do not have to write wrappers around the basic random number functions like we did in the challenges above.

shuffle()

The shuffle() function shuffles a list in place: it reorders the list you pass in and returns None.

from random import shuffle

names = ["Hyderabad","San Francisco","London","Sydney","Tokyo"]
shuffle(names)
names

['London', 'San Francisco', 'Tokyo', 'Hyderabad', 'Sydney']
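
Since shuffle() modifies its argument, the original order is lost unless you copy the list first. One option, sketched below, is to call sample() (covered later in this section) with k equal to the length of the list, which returns a new shuffled list and leaves the input alone. The variable names original_order, shuffled_copy, untouched and result are illustrative.

```python
from random import sample, shuffle

names = ["Hyderabad", "San Francisco", "London", "Sydney", "Tokyo"]
original_order = list(names)

# sample() with k equal to the list length returns a shuffled copy.
shuffled_copy = sample(names, k=len(names))
untouched = names == original_order
print(untouched)  # True: the input list was not modified

# shuffle() reorders the list it is given and returns None.
result = shuffle(names)
print(result)  # None
```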

choice()

The choice() function picks a single random element from a non-empty sequence. Once again, it is essentially a convenience wrapper around the basic random number functions.

from random import choice

names = ["Hyderabad","San Francisco","London","Sydney","Tokyo"]
choice(names)

'Tokyo'

sample()

The sample() function returns a list of randomly chosen elements from a list, without picking the same position twice. The second argument specifies how many elements you want.

from random import sample

names = ["Hyderabad","San Francisco","London","Sydney","Tokyo"]
sample(names, 2)  # sample without replacement

['San Francisco', 'Sydney']
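
For contrast, sampling with replacement allows the same element to be picked more than once. In Python 3.6 and later the random module offers choices() for this. The sketch below draws both kinds of sample from the same list. The seed value and the sample sizes are arbitrary, and k=8 is deliberately larger than the list so that repeats are guaranteed.

```python
from random import choices, sample, seed

seed(3)  # fixed seed only so that reruns draw the same elements

names = ["Hyderabad", "San Francisco", "London", "Sydney", "Tokyo"]

without_replacement = sample(names, 3)   # three distinct cities
with_replacement = choices(names, k=8)   # eight picks, repeats allowed

print(without_replacement)
print(with_replacement)
print(len(set(without_replacement)))  # 3
print(len(with_replacement))          # 8
```

Note that sample() raises a ValueError if you ask for more elements than the list contains, while choices() has no such limit.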