Python Lists


Summary : Lists are probably the most used data structure in Python. A list is a data structure that can hold a sequence of data. This is our first introduction to data structures in Python.

What are Lists

Lists are probably the most used data structure in Python. A list is a data structure that can hold a sequence of data. To understand this better, let’s rewrite the average grade program with lists. Just to recap, the professor enters all the grades in the class, one by one. After that, we need to calculate the average of all grades in the class.

Previously, we just summed up the grades as the user entered them and finally divided the sum by the number of grades entered to get the average. This was OK for simpler cases. However, we need to store all the grades that the user has entered in a data structure so that we can manipulate them further, for example to

  • change grades
  • add more grades later
  • delete wrong grades etc

Once the user is finally satisfied with all the grades, they can ask the program to calculate the average.

Basic Python data types like int, float and str hold a single value. What we want is a structure that holds multiple elements together. That is where lists come in. Before we rewrite the program, let’s understand a couple of things about lists.

Let’s create an empty list

odd_numbers = [] 

We just created an empty list and assigned it to a variable odd_numbers. Now, let’s add some odd numbers to the list.

odd_numbers.append(1) 
odd_numbers.append(3) 
print(odd_numbers) 
[1, 3] 

Lists can also be initialized – like so.
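Here is a minimal sketch of initializing a list with values at creation time, instead of appending them one by one. The variable names are only illustrative.

```python
# Create a list that already holds some values
odd_numbers = [1, 3, 5, 7, 9]
print(odd_numbers)            # [1, 3, 5, 7, 9]

# list() builds a list from any iterable, such as a range
even_numbers = list(range(2, 11, 2))
print(even_numbers)           # [2, 4, 6, 8, 10]
```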

That’s a good start on lists. Now, let’s rewrite the average grades program this time with lists.  

Program – Average Grade – v2

print ( "Enter grades to calculate mean/average grade. type e to exit")

# Create an empty list
grades = []
sum    = 0   # note: this name shadows Python's built-in sum( ) function

while True :
    grade = input ( " - ")
    if grade == "e" :
        break
    else :
        grade   = float(grade)
        grades.append(grade)

print ( grades)

Iterate Over Lists

Once you have a list, you would want to iterate through it. A simple way to iterate over a list is to run it through a for loop.

for grade in grades :     
    sum = sum + grade 

print( "sum = ", sum) 

Let’s write another simple program to illustrate iterating over list elements.


Challenge: Find out how many vowels are in a given sentence

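Take a sentence from the user and count the total number of vowels in the sentence.

sentence = input ( "Enter a sentence -")

vowels = ['a','e','i','o','u']
vowel_count = 0

# lower() makes the check case-insensitive. A string can be iterated
# letter by letter, so there is no need to convert it to a list first.
for letter in sentence.lower() :
    if letter in vowels :
        vowel_count += 1

print (" The number of vowels in the sentence is - ", vowel_count)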

Length of List

To compute the average, we also need to know how long the list is ( how many grades it holds ). You can find that using Python’s built-in function len( ).

average = sum / len(grades) 
print ( "Average class grade is ",average )

There you go – we have just calculated the average grade using a list. Now that we have a powerful data structure, we can do many more things with it, making our lives much easier. For example, you need not iterate over a list to calculate the sum. You can use the math module’s function fsum to do so.

Aggregate functions

import math  

average = math.fsum(grades) / len(grades) 
print ( "Average class grade is ",average) 

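fsum stands for floating point sum; it adds floats while keeping rounding error to a minimum. If you just want a plain sum ( for example, of integers ), you can use Python’s built-in sum( ) function; note that there is no math.sum( ). Before we write the next version of the program, let’s learn a couple more aspects about lists.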
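To make the two sum functions concrete, here is a small sketch with made-up values. Whether the built-in sum( ) shows a visible rounding error on floats depends on the Python version, so no output is claimed for that line. Also keep in mind that if a program has already used sum as a variable name, as the grade program above does, the built-in is hidden until that variable is deleted or renamed.

```python
import math

values = [0.1] * 10            # ten copies of 0.1 (illustrative data)

print(sum(values))             # built-in sum; may or may not show rounding error
print(math.fsum(values))       # 1.0

print(sum([1, 2, 3]))          # 6, the built-in sum works fine for integers
```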

Merge Lists

Add a bunch of elements to a list – extend( ) function
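A short sketch of extend( ), with append( ) shown next to it for contrast. The list contents are just sample values.

```python
odd_numbers = [1, 3]
odd_numbers.extend([5, 7, 9])  # adds each element of the given list
print(odd_numbers)             # [1, 3, 5, 7, 9]

# append, by contrast, adds its argument as one single element
nested = [1, 3]
nested.append([5, 7, 9])
print(nested)                  # [1, 3, [5, 7, 9]]
```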

Indexing Lists

Indexing is how you access elements in a list. We will learn more about negative indexing in a bit. However, for now, we will use positive indexing to understand how to manipulate lists.

Indices start at 0. So, to get the 5th element of a list named numbers, use index 4.

numbers[4]

or, for the 2nd element, use index 1.

numbers[1]
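Here is a runnable sketch of positive indexing on a sample list of grades ( the values are made up ). It also shows what happens when the index is past the end of the list.

```python
grades = [78.0, 91.5, 64.0, 88.0, 70.5]

fifth = grades[4]              # 5th element, index 4
second = grades[1]             # 2nd element, index 1
print(fifth, second)           # 70.5 91.5

# An index past the end of the list raises IndexError
try:
    grades[5]
except IndexError as error:
    message = str(error)
    print("IndexError:", message)
```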

Manipulate Lists

So far, we have just read elements in lists, or merged lists. What if we want to manipulate lists, for example insert elements in between or delete elements? There are built-in methods for that as well, like insert( ), pop( ), remove( ) and clear( ). Let’s look at them now.

Insert elements into list

odd_numbers = [1,3,7,9] 
odd_numbers.insert(2,5) 
print( odd_numbers ) 
[1, 3, 5, 7, 9] 

Change List elements

odd_numbers = [1,3,5,8,9] 
odd_numbers[3] = 7 
print ( odd_numbers) 
[1, 3, 5, 7, 9] 

Pop List elements

odd_numbers = [1,3,5,7,9] 
print ( odd_numbers) 
[1, 3, 5, 7, 9] 
odd_numbers.pop(2) 
print ( odd_numbers ) 
[1, 3, 7, 9] 

Called without an argument, pop( ) removes the last element.

odd_numbers = [1,3,5,7,9] 
odd_numbers.pop()  
print ( odd_numbers ) 
[1, 3, 5, 7]
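One more thing worth knowing: pop( ) also hands back the element it removed, so you can store it in a variable. A small sketch:

```python
odd_numbers = [1, 3, 5, 7, 9]

last = odd_numbers.pop()       # removes and returns the last element
print(last)                    # 9

second = odd_numbers.pop(1)    # removes and returns the element at index 1
print(second)                  # 3

print(odd_numbers)             # [1, 5, 7]
```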

Remove List element

remove( ) does not remove elements by index. Instead, it removes by value: it removes the first occurrence of the given value.

odd_numbers = [1,3,5,7,9] 
print ( odd_numbers) 
[1, 3, 5, 7, 9] 
odd_numbers.remove (9) # removes the first occurrence of a value 
print ( odd_numbers ) 
[1, 3, 5, 7] 
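The sketch below shows two details of remove( ) on sample data: only the first matching value is removed, and asking to remove a value that is not in the list raises a ValueError. Checking with the in operator first is one way to avoid that.

```python
numbers = [1, 3, 5, 3, 7]
numbers.remove(3)              # only the first 3 is removed
print(numbers)                 # [1, 5, 3, 7]

value = 4
if value in numbers:           # check before removing
    numbers.remove(value)
else:
    print(value, "is not in the list")

# Without the check, remove( ) raises ValueError
try:
    numbers.remove(4)
except ValueError:
    print("remove( ) raised ValueError")
```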

Sometimes, you want to remove all occurrences of a value. In cases like this, use a list comprehension to recreate the list without that value.

numbers = [1,2,3,4,5,3,7,8,3]
numbers = [number for number in numbers if number !=3]
print(numbers)
[1, 2, 4, 5, 7, 8]

Clear List element

odd_numbers = [1,3,5,7,9] 
print ( odd_numbers ) 
[1, 3, 5, 7, 9] 
odd_numbers.clear() 
print ( odd_numbers ) 
[] 

Program Average Grade – v3

Challenge – Program average Grade – v3

Now, let’s put these new functions to use in the next version of the program. We will modify the program so that the user can

  • enter new grades
  • modify any existing grade
  • remove any grades
  • clear all grades

In order to do this, we need to modify the interface a little bit.

The program uses the list operations we have learnt so far: append( ), pop( ), clear( ) and assignment by index.

# This is a rewrite of the average grade program we have done previously. 
# In this program, we will learn about 
# 1. modify lists
# 2. delete elements in list
# 3. clear lists
# 4. remove specific elements in list

import math

# create an empty grades list
grades = []

print ( " enter your choices - ")
while ( True ) :
    print ( "Enter your choice - ")
    print ( "- 'enter' grades")
    print ( "- 'delete' grades")
    print ( "- 'update' grades")
    print ( "- 'clear' grades")
    print ( "- calculate 'average'")
    print ("- 'exit'")
    
    choice = input("- ")
    if choice == "exit" :
        break
    
    if choice == "enter" :
        print ( "Enter grades. type e to exit")
        while True :
            grade = input ( " --> ")
            if grade == "e" :
                break
            else :
                grade   = float(grade)
                grades.append(grade)
        print ( "You have entered - ")
        print ( grades )

    elif choice == "delete" :
        if len(grades) == 0 : 
            # there are no grades yet. 
            print ( " No grades input yet. Please try to input grades")
        else :
            print ( "Enter index to delete. type e to exit")
            while len(grades) != 0 :   # stop once every grade has been deleted
                index = 0
                print ( "index - grade")
                for grade in grades :
                    print ( index,"\t",grade)
                    index = index + 1
                
                grade = input ( "-->")
                if grade == "e" :
                    break
                else :
                    if 0 <= int(grade) < len(grades) :
                        grades.pop(int(grade))
    
    elif choice == "update":
        if len(grades) == 0 : 
            # there are no grades yet. 
            print ( " No grades input yet. Please try to input grades")
        else :
            print ( "Enter index to update. type e to exit")   
            while True :
                index = 0
                print ( "index - grade")
                for grade in grades :
                    print ( index,"\t",grade)
                    index = index + 1             

                grade = input ( "-->")
                if grade == "e" :
                    break
                else :
                    if 0 <= int(grade) < len(grades) :
                        print ( "Changing grade")
                        print ( "--------------")
                        print ( "index - grade")
                        print ( int(grade), "\t",grades[int(grade)])
                        print ( "enter new grade")
                        new_grade = input( "-->")
                        grades[int(grade)] = float(new_grade)          

    elif choice == "clear" :
        if len(grades) == 0 : 
            # there are no grades yet. 
            print ( " No grades input yet. Please try to input grades")   
        else :
            grades.clear()
            print ( "cleared all grades") 
            print ( "=================")        

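    elif choice == "average" :
        if len(grades) == 0 :
            # avoid dividing by zero when there are no grades yet
            print ( " No grades input yet. Please try to input grades")
        else :
            average = math.fsum(grades) / len(grades)
            print ("Average --> ",average)
            print ("=================")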
    
    else :
        print ( "Enter valid choice - Try again")
        


                

Mixed data types

Lists can have mixed data types. This is a very useful feature of lists. Your data can be as mixed as possible.

# A person's name, age and weight 
person = ["Adam", 25, 160.65] 
type(person[0]) 
str 
type(person[1]) 
int 
type(person[2]) 
float 

Nested Lists

There can even be a list inside a list. These are called “Nested” Lists.

person = [ ["Adam","Smith"],25,160.65] 
type(person[0]) 
list
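To reach an element of the inner list, chain two indices: the first picks the inner list, the second picks the element inside it. A short sketch using the same person list:

```python
person = [["Adam", "Smith"], 25, 160.65]

first_name = person[0][0]      # first element of the inner list
last_name = person[0][1]       # second element of the inner list
print(first_name, last_name)   # Adam Smith

print(len(person))             # 3, the inner list counts as one element
```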

Enumerate

So far, while looping through lists using the for x in list syntax, we have not had access to the index, unlike an index-based for loop in languages like C. This is where enumerate( ) comes in. The enumerate function returns an enumerate object that pairs each element with its index. For example,

grades = [1.0,2.3,3.0] 
for grade in grades : 
    print ( grade ) 
1.0 
2.3 
3.0 

To get the index, we would have to maintain a separate indicator for the same.

i = 0 
for grade in grades : 
    print ( i,grade ) 
    i = i + 1  
0 1.0 
1 2.3 
2 3.0 

Let’s see how enumerate can help us.

 e_grades = enumerate(grades) 
 e_grades 
<enumerate at 0x3cb7da0> 

Now e_grades is an enumerate object and not a list. Let’s iterate over it to see the elements inside.

 for grade in e_grades : 
     print ( grade ) 
(0, 1.0) 
(1, 2.3) 
(2, 3.0) 

Each of these elements is a tuple ( which we are going to learn about later ). Think of it as a limited version of a list for now. Since each tuple holds an index and a value, we can unpack both in the loop. Note that an enumerate object can be iterated over only once, so we call enumerate( ) again here.

for index, grade in enumerate(grades) : 
    print ( index , grade) 
0 1.0 
1 2.3 
2 3.0 

Why make such a big deal about indices while iterating lists ? Well, for starters, it gives you more control. For example, iterate over a list, but skip every 3rd element.

numbers = [1,2,3,4,5,6,7,8,9] 
for index,number in enumerate(numbers) : 
    if (index + 1)%3 != 0 : 
        print ( index, number) 
0 1 
1 2 
3 4 
4 5 
6 7 
7 8 
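If you would rather count from 1 than from 0, enumerate( ) accepts an optional start argument. A minimal sketch with the same sample grades:

```python
grades = [1.0, 2.3, 3.0]

# start=1 makes the counter begin at 1 instead of 0
for position, grade in enumerate(grades, start=1):
    print(position, grade)

# Wrapping the enumerate object in list() shows all the pairs at once
numbered = list(enumerate(grades, start=1))
print(numbered)                # [(1, 1.0), (2, 2.3), (3, 3.0)]
```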

Let’s upgrade our Average Grades program to show off our understanding of Nested Lists and enumerate.

In the previous versions, we were able to calculate grades for a single class – say _English_ subject. What if the teacher wants to input grades for the _English_ subject across all the classes that she teaches ? Say 1st grade through 5th grade ?

# Instead of calculating grades for just one class, let's calculate
# the average of grades across classes 1 to 

# In this program, we will learn about 
# 1. compound lists ( list of lists )
# 2. enumerator

import math

grades_all = []

for i in range(1,6) : 
    print ( "Enter grades of each student for grade ",i," - 'e' to exit")
    grades = []
    while True :
        grade = input ( " - ")
        if grade == "e" :
            break
        else :
            grades.append(float(grade))
    
    grades_all.append(grades)

print ( "here are the grades you entered")

for index,grades in enumerate(grades_all) :
    print ( "Grades for class - ",index+1)
    for grade in grades :
        print ( grade)

for index, grades in enumerate(grades_all) :
    average = math.fsum(grades) / len(grades)
    print ( "Average grade of class ", (index+1), " is - ", average)

Combine Lists

+ operator is overloaded to combine lists.

odd_numbers  = [1,3,5,7,9] 
even_numbers = [2,4,6,8,10] 
all_numbers = odd_numbers + even_numbers 
print ( all_numbers ) 
[1, 3, 5, 7, 9, 2, 4, 6, 8, 10] 

They are combined in the same order as the underlying lists. No sorting happens by default.

How about subtraction ?

 odd_numbers - even_numbers 
TypeError                             
----> 1 odd_numbers - even_numbers 
TypeError: unsupported operand type(s) for -: 'list' and 'list'

Oops, that doesn’t work. Arithmetic operators like - ( subtraction ) and / ( division ) are not defined for lists. The * operator does work, but only between a list and an integer, where it repeats the list. Let’s see these in action in the next version of the Average Grades program.
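Here is a small sketch of what * does with a list and an integer, along with one possible way to get a "subtraction"-like result using a list comprehension. The names to_drop and remaining are made up for this example.

```python
zeros = [0] * 5                # repeat the list 5 times
print(zeros)                   # [0, 0, 0, 0, 0]

pattern = [1, 2] * 3
print(pattern)                 # [1, 2, 1, 2, 1, 2]

# One way to "subtract": keep only the elements not found in to_drop
numbers = [1, 2, 3, 4, 5]
to_drop = [2, 4]
remaining = [n for n in numbers if n not in to_drop]
print(remaining)               # [1, 3, 5]
```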


Sort Lists

all_numbers.sort() 
print ( all_numbers) 
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
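There are two ways to sort, and they behave differently: the sort( ) method changes the list in place and returns None, while the built-in sorted( ) function leaves the list alone and returns a new sorted list. A short sketch with sample numbers:

```python
numbers = [5, 2, 9, 1]

ascending = sorted(numbers)    # new list, numbers is unchanged
print(ascending)               # [1, 2, 5, 9]
print(numbers)                 # [5, 2, 9, 1]

numbers.sort(reverse=True)     # sorts in place, largest first
print(numbers)                 # [9, 5, 2, 1]

result = numbers.sort()        # sort() itself returns None
print(result)                  # None
print(numbers)                 # [1, 2, 5, 9]
```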

Here is a small challenge to understand sorting of lists.

Challenge : Find if two words are anagrams or not.

Two words are anagrams if they contain exactly the same letters, each the same number of times, in a different order. For example, the following words are anagrams of each other.
– how
– who

Here are some more examples: icon and coin, item and time, keen and knee. Our program should take in two words from the user and find out whether they are anagrams or not.

first_word = input ( "First word - ")
second_word = input ( "Second word - ")

if sorted(first_word) == sorted(second_word) :
    print ( first_word, " and ", second_word, " are anagrams")
else :
    print ( first_word, " and ", second_word, " are not anagrams")

# sorted() breaks each word down into a list of its individual letters
# and returns that list in sorted order, so the two can be compared.

# Note : list(first_word).sort() would not work here. sort() sorts
# the list in place and returns None, so the comparison would be
# None == None, which is True for any two words.

Program Average Grade – v4

Challenge – Program Average Grades – v4

Say you are calculating the average grade across the entire district. Each school has a separate list of grades, and we have to calculate the average for each list ( school ). Just for simplicity, let’s assume we are calculating grades for just Class 1 across all the schools. Finally, we sort the schools by their average grade to find out which school ranks better.

import math

grades_each_schools = []

# say there are 5 schools and we are calculating
# the average grade for just one class

for i in range(1,6) :
    school_name = input (" Enter the school name - ") 
    school_grades = []
    school_grades.append(school_name)
    print ( "Enter the grades for this school (e to exit ) -")
    grades = []
    while True :
        grade = input ( " - ")
        if grade == "e" :
            break
        else :
            grades.append(float(grade))
    school_grades.append(grades)
    grades_each_schools.append(school_grades)

print ( grades_each_schools)

for school in grades_each_schools :
    average = math.fsum(school[1]) / len(school[1])
    school.insert(0,average)

print ( grades_each_schools)

# sort the school by average grade in reverse order
grades_each_schools.sort( reverse = True)

# pretty up the printing
# for school in grades_each_schools :
#     print (school[0],"\t\t",school[1],"\t\t\t\t",school[2])

# or better
print ('{:<10} {:<50} {:<30} '.format("Avg. Grade","School Name","Grades") )
for school in grades_each_schools :
    print ('{:<10} {:<50} {:<30} '.format(school[0],school[1],str(school[2])) )
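The program above relies on the average being the first element of each inner list, because lists are compared element by element when sorted. As an alternative sketch, sort( ) takes a key argument that states explicitly what to sort by. The school names and grades below are invented sample data.

```python
# Each entry is [average, school name, grades], as in the program above
schools = [
    [72.5, "Lincoln", [70.0, 75.0]],
    [88.0, "Washington", [90.0, 86.0]],
    [80.0, "Jefferson", [80.0]],
]

# key tells sort() which value to compare: here, the average at index 0
schools.sort(key=lambda school: school[0], reverse=True)

ranked_names = [school[1] for school in schools]
print(ranked_names)            # ['Washington', 'Jefferson', 'Lincoln']
```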


Slicing Lists

Lists can be “sliced” to get a subset of the data.

The general syntax is list [ start : stop : increment ]

numbers = [1 , 3 , 5 , 7  ,9 ,11 ,13, 15, 17] 

Get the first 5 numbers.

numbers[0:5] 
[1, 3, 5, 7, 9]

Get all the elements from index 5 ( the 6th element ) to the end.

numbers[5:] 
[11, 13, 15, 17]

Negative Indexing

numbers = [1,2,3,4,5,6,7,8,9] 
numbers[-1]
9 
 numbers[-9]
1 
 numbers[-9:-5]
[1, 2, 3, 4]
 numbers[-4:]
[6, 7, 8, 9] 

The number after the second colon ( : ) is the increment.

numbers[::2]
[1, 3, 5, 7, 9]

Leaving out both start and stop selects the whole list.

numbers[:]
[1, 2, 3, 4, 5, 6, 7, 8, 9] 

This is equivalent to

numbers[0:len(numbers)]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
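A few more slices that combine start, stop and increment, sketched on the same numbers list. A negative increment walks the list backwards, and a full slice gives you a copy rather than the original list.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

every_third = numbers[::3]     # indices 0, 3, 6
print(every_third)             # [1, 4, 7]

middle = numbers[1:8:2]        # indices 1, 3, 5, 7
print(middle)                  # [2, 4, 6, 8]

backwards = numbers[::-1]      # negative increment reverses the order
print(backwards)               # [9, 8, 7, 6, 5, 4, 3, 2, 1]

copy_of_numbers = numbers[:]   # a new list with the same elements
print(copy_of_numbers is numbers)   # False
```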

An arbitrary set of numbers

# An arbitrary set of numbers - say 3rd, 5th and 8th ( something that you can't specify with a slice )
# the indices are 2, 4 and 7 because indexing starts at 0
s = [2,4,7]

numbers_subset = []
for i in s : 
    numbers_subset.append( numbers[i] )
    
print ( numbers_subset )
[3, 5, 8] 

An alternate syntax for the same would be

[numbers[i] for i in s]
[3, 5, 8]
