Python Data Structures Challenges


Challenge : Remove duplicate elements from a list

Let the user enter a bunch of words, one at a time, and store them in a list. The user types "e" or "exit" to finish. Some words may have been entered more than once. Eliminate the duplicates and print only the unique words.

solution

l = []
# Let the user enter some string
while True :
    i = input("-")
    if i == "e" or i == "exit" :
        break
    else :
        l.append(i)

# Print the user entered list of values
print (l)

# Convert the list to a set. This will eliminate all duplicates
# just due to the nature of a set
s = set(l)
print (s)
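
A set does not remember the order in which the words were entered. If that order matters, one alternative (a sketch, not part of the solution above) is dict.fromkeys, which keeps only the first occurrence of each word. The function name unique_in_order is made up for this illustration.

```python
def unique_in_order(words):
    """Return the words with duplicates removed, keeping first-seen order."""
    # Dictionary keys are unique and, from Python 3.7 on, keep insertion order
    return list(dict.fromkeys(words))


print(unique_in_order(["apple", "pear", "apple", "fig", "pear"]))
```

Use the set version when order does not matter, and this one when the output should follow the order the user typed.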




Challenge : Count the length of each of the unique words in a sentence and list them in decreasing order

Take a string from the user as input and split it up into words. Sort all the words and discard all duplicate words. Once you have the unique list of words, count the length of each of the words and list them in decreasing order of length.

solution

input_string = input ( "Enter your string - ")

# Split the string into words
words = input_string.split(" ")

# Get only the unique words
words = set(words)

# Initialize a word length counter ( as a dictionary )
word_count = {  }


# Iterate through the words and record the length of each one
for word in words :

    # If a word is not already recorded, add it along with its length
    if word_count.get(word) is None :
        word_count[word] = len(word)

# A dictionary has no sort() method of its own. So, create a list of [word, length] pairs
t_word_count = []
for word in word_count.keys() :
    t_word_count.append( [ word, word_count.get(word) ] )

# The sort criterion is the word length
def sort_criteria(ls) :
    return ls[1]

# Sort the list based on the inner list's word length.
# The default sorting order is ascending. The challenge asks for
# decreasing order, so set the reverse flag to True
t_word_count.sort(key = sort_criteria, reverse = True)


# Print them out
for i in t_word_count :
    print (i[0],"----",i[1])
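
The same result can be reached more compactly with the built-in sorted function. This is only a sketch of an alternative. The function name words_by_length is an assumption made for this example, and ties between words of equal length are broken alphabetically so that the output is predictable.

```python
def words_by_length(sentence):
    """Return (word, length) pairs for the unique words, longest first."""
    unique_words = set(sentence.split())
    # Sort by descending length, then alphabetically to break ties
    ordered = sorted(unique_words, key=lambda w: (-len(w), w))
    return [(w, len(w)) for w in ordered]


for word, length in words_by_length("the quick brown fox jumps over the lazy dog"):
    print(word, "----", length)
```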


Challenge : Count the number of occurrences of each of the words in a sentence.

Take a string from the user as input and split it up into words. For each unique word, count how many times it occurs in the sentence.

solution

input_string = input ( "Enter your string - ")

# Split the string into words
words = input_string.split(" ")

# Initialize a word counter ( as a dictionary)
word_count = {  }


# Iterate through the words and start updating counts
for word in words :

    # If a word is not already counted, add it to the counter
    if word_count.get(word) is None :
        word_count[word] = 1
    # If already present, increment the word count
    else :
        word_count[word] = word_count[word] + 1

# Finally, print the counter
print ( word_count)
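
Python's standard library already has a dictionary made for this kind of counting, collections.Counter. The sketch below shows the same idea with it. The function name count_words is an assumption for this example.

```python
from collections import Counter


def count_words(sentence):
    """Count how many times each word occurs in the sentence."""
    # split() with no argument also collapses repeated whitespace
    return Counter(sentence.split())


counts = count_words("the cat and the hat and the bat")
# most_common(n) lists the n most frequent words with their counts
print(counts.most_common(2))
```

A Counter returns 0 for words it has never seen, so there is no need to check whether a key exists before reading it.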


Challenge : Spam detector – v1

Spam is a big problem these days. Machine learning can be used very effectively to identify spam. However, we are just learning to use basic Python here. So, let’s set the stage for spam detection, by creating a simple spam detector.

Take a sentence from the user. For each word in the sentence, look up its spam score. Add up the scores of all the words and divide the total by the length of the sentence (the number of words) to normalize it. If the result is > 1, the sentence is spam; otherwise it is not spam.

For simplicity’s sake, let’s use just the following spam scores.

spam_words = { "lottery" : 9, "won" : 8, "viagra" : 10, "free" : 8, "trial" : 8, "scam" : 10, "pharmacy" : 7, "unlimited" : 7, "nigerian" : 6}

solution

s = input("Input a sentence - ")

spam_words = {  "lottery"   : 9,
                "won"       : 8,
                "viagra"    : 10,
                "free"      : 8,
                "trial"     : 8,
                "scam"      : 10,
                "pharmacy"  : 7,
                "unlimited" : 7,
                "nigerian"  : 6}

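words = s.split(" ")
spam_counter = 0

# Add up the spam score of every word that appears in spam_words
for word in words :
    count = spam_words.get(word)
    if count is not None :
        spam_counter += count

# Normalize by the number of words in the sentence and classify
if spam_counter / len(words) > 1 :
    print ("spam")
else :
    print ("not spam")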
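
The scoring rule described in the challenge can also be sketched as a pair of small functions. This is an illustration under a few assumptions: the "length of the sentence" is taken to be the number of words, words are lower-cased before the lookup, and punctuation is not stripped. The names spam_score and is_spam are made up for this example.

```python
SPAM_WORDS = {"lottery": 9, "won": 8, "viagra": 10, "free": 8, "trial": 8,
              "scam": 10, "pharmacy": 7, "unlimited": 7, "nigerian": 6}


def spam_score(sentence, scores=SPAM_WORDS):
    """Return the total spam score divided by the number of words."""
    words = sentence.lower().split()
    if not words:
        # An empty sentence has no words to score
        return 0.0
    # Words missing from the table contribute a score of 0
    total = sum(scores.get(word, 0) for word in words)
    return total / len(words)


def is_spam(sentence):
    """Apply the rule from the challenge: a normalized score > 1 is spam."""
    return spam_score(sentence) > 1


print(is_spam("you won a free lottery"))
print(is_spam("see you at lunch tomorrow"))
```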


Challenge : Spam detector – v2

Let’s enhance our spam program a bit. We used a very simplistic spam word filter in version 1. In this version, let’s read a real list of spam words from a file. This is real data published by an email marketing website. You can get it from their website or download it from Ajay Tech’s GitHub site.

Read a sentence from the user and for each word in the sentence, check if it exists in the file we just read. If yes, increase a counter by 1 ( assuming all spam words have equal weight ).

If at least one spam word is found in the sentence, mark it as spam; otherwise mark it as ham ( not spam ).

P.S. ham is a domain-specific word that is used to indicate something as “not spam”. It is the opposite of spam.

solution

s = input("Input a sentence - ")

file = open ("./data/spam_words.txt")

# Read the spam words file
spam_words = []
try :
    line = file.readline()
    while line :
        spam_words.append(line.strip())
        line = file.readline()
finally : 
    file.close()

# Remove the credits
del spam_words[0:4]

# Remove the contact message at the bottom
spam_words.pop()

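# Count the spam words that occur anywhere in the sentence. Skip empty
# entries ( blank lines in the file ), since they would match every sentence
sentence = s.upper()
counter = 0
for word in spam_words :
    if word and word.upper() in sentence :
        counter += 1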

if counter == 0 :
    print ("not spam")
else :
    print ( "spam")
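
The solution above looks for each spam word anywhere inside the sentence, so a spam word like "free" would also match inside "freedom". A word-by-word check, as the challenge text describes, can be sketched as follows. The short sample list stands in for the contents of the file and is made up for this example, as are the function names. Note that this version will not match multi-word spam phrases, which the substring check does handle.

```python
def count_spam_words(sentence, spam_words):
    """Count the words in the sentence that are in the spam word list."""
    # A set makes each membership test fast; empty entries are skipped
    spam_set = {word.upper() for word in spam_words if word}
    return sum(1 for word in sentence.upper().split() if word in spam_set)


def classify(sentence, spam_words):
    """Return 'spam' if at least one spam word is found, else 'not spam'."""
    return "spam" if count_spam_words(sentence, spam_words) > 0 else "not spam"


# Hypothetical sample data, not the real contents of spam_words.txt
sample_spam_words = ["free", "winner", "guarantee"]

print(classify("You are a WINNER", sample_spam_words))
print(classify("freedom of speech", sample_spam_words))
```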
        

Challenge : List all the Asian countries with population > 100 M

Say we have two lists

All Asian countries
All countries with their respective populations

Combine the data from both lists and find all the Asian countries with a population > 100 M.

This essentially requires you to open files, iterate over dictionaries, delete dictionary entries, work with the dictionary keys, etc.

solution

import csv

file_1 = open( "./data/asia_countries.csv" )
asia_countries_f = csv.reader ( file_1 )
asia_countries = []

for row in asia_countries_f : 
    asia_countries.append ( row [0])

file_2 = open( "./data/countries_population.csv" )
countries_population_f = csv.reader ( file_2 )
countries_population = {}

for row in countries_population_f : 
    countries_population [ row[0] ] = row[1]

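# Remove the header row, then remove all countries with a population of 100 million or less
countries_population.pop('country')
for key in list(countries_population.keys()) :
    if int(countries_population[key]) <= 100000000 : 
        del countries_population[key]

# Remove all the countries that are not in Asia
for key in list(countries_population.keys())  :
    if asia_countries.count(key) == 0:
        del countries_population[key]

# What remains are the Asian countries with a population > 100 M
asia_countries_pop_100m = list(countries_population.keys())
print ( asia_countries_pop_100m )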
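
Instead of deleting entries from a dictionary, the two conditions can be combined in a single dictionary comprehension. The sketch below uses made-up placeholder data (Alphaland, Betaland and so on) in place of the two CSV files, and it assumes that the population file has a header row with the column names country and population. The function name large_asian_countries is also an assumption.

```python
import csv
import io

# Made-up placeholder data standing in for the two CSV files
ASIA_CSV = "Alphaland\nBetaland\nGammaland\n"
POPULATION_CSV = (
    "country,population\n"
    "Alphaland,150000000\n"
    "Betaland,5000000\n"
    "Deltaland,300000000\n"
)


def large_asian_countries(asia_file, population_file, threshold=100_000_000):
    """Return {country: population} for Asian countries above the threshold."""
    # A set gives fast membership tests for the Asian country names
    asia = {row[0] for row in csv.reader(asia_file) if row}
    # DictReader uses the header row as keys, so there is no header to delete
    reader = csv.DictReader(population_file)
    return {
        row["country"]: int(row["population"])
        for row in reader
        if row["country"] in asia and int(row["population"]) > threshold
    }


result = large_asian_countries(io.StringIO(ASIA_CSV), io.StringIO(POPULATION_CSV))
print(result)
```

With real files, pass the objects returned by open() in place of the io.StringIO objects.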

Challenge : Create a quiz program to guess the capital cities of countries.

Say we have a list of all countries along with their capitals in a CSV file. Download it from Ajay Tech’s GitHub repository. In an infinite loop, show a country and ask the user to guess its capital. Make sure you pick a random country from the list every time the loop runs.

solution

import csv
import random

file = open( "./data/capital_cities.csv" )
capital_cities = csv.reader ( file )

# Put the entries into a dictionary ( Country : Capital City) format
capital_cities_d = {}
for row in capital_cities : 
    capital_cities_d[row[0]] = row[1]

# Delete the first row
del capital_cities_d['Country']

# Compare in upper case so that differences in capitalization are not treated as wrong answers
def str_comparision (user_answer, correct_answer) :
    return user_answer.upper() == correct_answer.upper()

# Stay in this loop until the user enters "exit"
while ( True ) : 
    country = random.choice( list(capital_cities_d.keys()))
    answer = input ( 'what is the capital of '+ country + " - ")
    if answer.upper() == "EXIT" :
        break
    answer = str_comparision ( answer , capital_cities_d.get(country))
    if answer == True :
        print ( " correct ")
    else : 
        print ( " incorrect ")
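
If you want to try out the quiz logic without typing answers by hand, one option is to pass the input and output functions in as parameters. The sketch below does that and also keeps a score. The names is_correct and quiz, the score keeping, and the ask and show parameters are all assumptions added for this illustration.

```python
import random


def is_correct(user_answer, correct_answer):
    """Compare two answers, ignoring case and surrounding spaces."""
    return user_answer.strip().upper() == correct_answer.strip().upper()


def quiz(capitals, ask=input, show=print):
    """Ask random questions until the user types 'exit'.

    Returns a (correct, asked) tuple.
    """
    correct = 0
    asked = 0
    while True:
        country = random.choice(list(capitals))
        answer = ask("What is the capital of " + country + "? ")
        if answer.strip().upper() == "EXIT":
            return correct, asked
        asked += 1
        if is_correct(answer, capitals[country]):
            correct += 1
            show("correct")
        else:
            show("incorrect")
```

Calling quiz(capital_cities_d) runs the quiz interactively. Passing a different ask function, such as one that returns prepared answers, lets you check the logic automatically.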

