Python Data Structures Challenges
Challenge : Remove duplicate elements from a list
Let the user enter a bunch of words. Store them in a list. The user could have entered duplicate words as well. Eliminate all the duplicate words in the list and print only the unique words.
l = []

# Let the user enter words until they type "e" or "exit"
while True:
    i = input("-")
    if i == "e" or i == "exit":
        break
    else:
        l.append(i)

# Print the user entered list of values
print(l)

# Convert the list to a set. This will eliminate all duplicates
# just due to the nature of a set
s = set(l)
print(s)
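A set does not remember the order in which the words were entered. If the original order matters, one possible alternative is `dict.fromkeys`, which keeps only the first occurrence of each key. This is a minimal sketch rather than part of the solution above, and the helper name `unique_in_order` is made up for illustration.

```python
def unique_in_order(words):
    """Return the words with duplicates removed, keeping first-seen order."""
    # Dictionary keys are unique and (since Python 3.7) keep insertion order
    return list(dict.fromkeys(words))


sample = ["apple", "pear", "apple", "fig", "pear"]
print(unique_in_order(sample))  # ['apple', 'pear', 'fig']
```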
Challenge : Count the length of each of the unique words in a sentence
Take a string from the user as input and split it up into words. Sort all the words and discard all duplicate words. Once you have the unique list of words, count the length of each of the words and list them in decreasing order of length.
input_string = input("Enter your string - ")

# Split the string into words
words = input_string.split(" ")

# Get only the unique words
words = set(words)

# Initialize a word length (as a dictionary)
word_count = {}

# Iterate through the words and record the length of each
for word in words:
    # If a word is not already counted, add it to the counter
    if word_count.get(word) is None:
        word_count[word] = len(word)

# Dictionaries have no sort() method. So, create a list of [word, length] lists
t_word_count = []
for word in word_count.keys():
    t_word_count.append([word, word_count.get(word)])

# The sort criteria is the word length
def sort_criteria(ls):
    return ls[1]

# Default sorting order is ascending. The challenge asks for decreasing
# order of length, so set the reverse flag to True
t_word_count.sort(key=sort_criteria, reverse=True)

# Print them out
for i in t_word_count:
    print(i[0], "----", i[1])
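The same steps (unique words, then longest first) can be written more compactly with `sorted` and a key function. The sketch below is one possible variant, not the original solution. The function name `words_by_length` is hypothetical, and it makes two assumptions the listing above does not: `split()` without an argument is used so that repeated spaces do not produce empty words, and ties in length are broken alphabetically so the output is stable.

```python
def words_by_length(sentence):
    """Return (word, length) pairs for the unique words, longest first."""
    unique_words = set(sentence.split())
    # Negative length sorts longest first; the word itself breaks ties
    ordered = sorted(unique_words, key=lambda w: (-len(w), w))
    return [(w, len(w)) for w in ordered]


for word, length in words_by_length("the quick brown the fox"):
    print(word, "----", length)
```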
Challenge : Count the number of occurrences of each of the words in a sentence.
Take a string from the user as input and split it up into words. For each unique word, count how many times it occurs in the sentence.
input_string = input("Enter your string - ")

# Split the string into words
words = input_string.split(" ")

# Initialize a word counter (as a dictionary)
word_count = {}

# Iterate through the words and start updating counts
for word in words:
    # If a word is not already counted, add it to the counter
    if word_count.get(word) is None:
        word_count[word] = 1
    # If already present, increment the word count
    else:
        word_count[word] = word_count[word] + 1

# Finally, print the counter
print(word_count)
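The standard library offers `collections.Counter`, a dictionary subclass built for exactly this kind of tally. The sketch below shows how the counting loop could be replaced with it. The function name `count_words` is made up for illustration, and `split()` without an argument is an assumption chosen to ignore repeated spaces.

```python
from collections import Counter


def count_words(sentence):
    """Return a Counter mapping each word to its number of occurrences."""
    return Counter(sentence.split())


counts = count_words("the cat and the hat")
print(counts["the"])           # 2
print(counts.most_common(1))   # [('the', 2)]
```

A `Counter` returns 0 for a missing key instead of raising `KeyError`, so no explicit "is the word already counted" check is needed.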
Challenge : Spam detector – v1
Spam is a big problem these days. Machine learning can be used very effectively to identify spam. However, we are just learning to use basic Python here. So, let’s set the stage for spam detection, by creating a simple spam detector.
Take a sentence from the user. For each word in the user’s sentence, look up its spam score. Add up the scores of all the words and divide the total by the length of the sentence to normalize it. If the result is > 1, the sentence is spam; otherwise it is not spam.
For simplicity’s sake, let’s use just the following spam scores.
spam_words = { "lottery" : 9, "won" : 8, "viagra" : 10, "free" : 8, "trial" : 8, "scam" : 10, "pharmacy" : 7, "unlimited" : 7, "nigerian" : 6}
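s = input("Input a sentence - ")

spam_words = {"lottery": 9, "won": 8, "viagra": 10, "free": 8, "trial": 8,
              "scam": 10, "pharmacy": 7, "unlimited": 7, "nigerian": 6}

words = s.split(" ")

# Add up the spam score of every word that appears in the dictionary
spam_counter = 0
for word in words:
    count = spam_words.get(word)
    if count is not None:
        spam_counter += count

# Normalize the total. Assumption: "length of the sentence" is taken here
# as the number of words; the original listing stops before this step
score = spam_counter / len(words)

if score > 1:
    print("spam")
else:
    print("not spam")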
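As a worked example of the normalization, the sketch below wraps the scoring in a function so it can be tried on fixed sentences. It rests on two assumptions that the challenge text does not state: the "length of the sentence" is read as the number of words, and words are lower-cased before lookup. The names `SPAM_WORDS`, `spam_score` and `is_spam` are chosen for illustration.

```python
SPAM_WORDS = {"lottery": 9, "won": 8, "viagra": 10, "free": 8, "trial": 8,
              "scam": 10, "pharmacy": 7, "unlimited": 7, "nigerian": 6}


def spam_score(sentence, scores=SPAM_WORDS):
    """Return the total spam score divided by the number of words."""
    words = sentence.lower().split()
    if not words:
        return 0.0  # an empty sentence has nothing to score
    total = sum(scores.get(word, 0) for word in words)
    return total / len(words)


def is_spam(sentence):
    """Apply the challenge's rule: a normalized score above 1 means spam."""
    return spam_score(sentence) > 1


# "won" (8) + "free" (8) + "lottery" (9) = 25, spread over 5 words
print(spam_score("you won a free lottery"))  # 5.0
print(is_spam("see you at lunch tomorrow"))  # False
```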
Challenge : Spam detector – v2
Let’s enhance our spam program a bit. We used a very simplistic spam word filter in version 1. In this version, let’s read a real list of spam words. This is real data published by an email marketing website. You can get it from their website or download it from Ajay Tech’s GitHub site.
Read a sentence from the user and for each word in the sentence, check if it exists in the file we just read. If yes, increase a counter by 1 ( assuming all spam words have equal weight ).
If at least one spam word is found in the sentence, mark it as spam; otherwise, mark it as ham ( not spam ).
P.S. “ham” is a domain-specific word that is used to indicate something as “not spam”. It is the opposite of spam.
s = input("Input a sentence - ")

# Read the spam words file
spam_words = []
file = open("./data/spam_words.txt")
try:
    line = file.readline()
    while line:
        spam_words.append(line.strip())
        line = file.readline()
finally:
    file.close()

# Remove the credits
del spam_words[0:4]

# Remove the contact message at the bottom
spam_words.pop()

# Count how many spam words appear in the sentence (case-insensitive)
counter = 0
for word in spam_words:
    if s.upper().count(word.upper()) > 0:
        counter += 1

if counter == 0:
    print("not spam")
else:
    print("spam")
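The check above matches substrings, so a spam entry such as "free" would also fire on "freedom", and a blank line in the file would match every sentence. One possible refinement is to match whole words or phrases only. The sketch below is a hedged variant, not the original solution: the function name `count_spam_hits` and the sample phrases are hypothetical, and it takes the spam list as an argument instead of reading the file.

```python
import re


def count_spam_hits(sentence, spam_phrases):
    """Count the spam phrases that occur in the sentence as whole words."""
    hits = 0
    for phrase in spam_phrases:
        if not phrase:
            continue  # skip blank entries, which would match everywhere
        # Require that the phrase is not glued to a letter or digit on either side
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            hits += 1
    return hits


print(count_spam_hits("Freedom is priceless", ["free"]))                             # 0
print(count_spam_hits("Get a FREE trial today", ["free", "free trial", "winner"]))   # 2
```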
Challenge : List all the Asian countries with population > 100 M
Say we have two lists:
– All Asian countries
– All countries with their respective population
Combine the data from both the lists and find out all the Asian countries with a population > 100 M.
This essentially requires you to open files, iterate over dictionaries, delete dictionary entries, work with dictionary keys, etc.
import csv

# Read the names of all Asian countries (first column of each row)
asia_countries = []
with open("./data/asia_countries.csv") as file_1:
    asia_countries_f = csv.reader(file_1)
    for row in asia_countries_f:
        asia_countries.append(row[0])

# Read every country and its population into a dictionary
countries_population = {}
with open("./data/countries_population.csv") as file_2:
    countries_population_f = csv.reader(file_2)
    for row in countries_population_f:
        countries_population[row[0]] = row[1]

# Drop the header row
countries_population.pop('country')

# Remove all countries in countries_population < 100 Million
for key in list(countries_population.keys()):
    if int(countries_population[key]) < 100000000:
        del countries_population[key]

# Remove all countries that are not in Asia
for key in list(countries_population.keys()):
    if asia_countries.count(key) == 0:
        del countries_population[key]

# What remains are the Asian countries that passed the population filter
asia_countries_pop_100m = list(countries_population.keys())
print(asia_countries_pop_100m)
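Deleting entries in two passes works, but the same filter can also be expressed as a single dictionary comprehension that builds a new dictionary and leaves the input untouched. The sketch below is one way to do that. The function name `asian_countries_over` is hypothetical, and the country names and population figures are made-up placeholders rather than real data. It uses a strict `>` comparison to follow the wording of the challenge.

```python
def asian_countries_over(populations, asian_countries, threshold=100_000_000):
    """Return {country: population} for Asian countries above the threshold."""
    asian = set(asian_countries)  # set membership tests are fast
    return {
        country: int(pop)  # csv.reader yields strings, so convert to int
        for country, pop in populations.items()
        if country in asian and int(pop) > threshold
    }


# Placeholder data for illustration only (not real figures)
populations = {"Country A": "150000000", "Country B": "50000000", "Country C": "200000000"}
asian = ["Country A", "Country B"]
print(asian_countries_over(populations, asian))  # {'Country A': 150000000}
```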
Challenge : Create a quiz program to guess the capital cities of countries.
Say we have a list of all countries along with their capitals in a csv file. Download it from Ajay Tech’s GitHub repository. In an infinite loop, show the country and ask the user to guess the capital. Make sure you pick a random country from the list every time the loop runs.
import csv
import random

# Put the entries into a dictionary (Country : Capital City) format
capital_cities_d = {}
with open("./data/capital_cities.csv") as file:
    capital_cities = csv.reader(file)
    for row in capital_cities:
        capital_cities_d[row[0]] = row[1]

# Delete the first row (the header)
del capital_cities_d['Country']

# Compare in upper case so that differences in capitalization are accommodated
def str_comparision(user_answer, correct_answer):
    return user_answer.upper() == correct_answer.upper()

# Stay in this loop forever until the user enters "exit"
while True:
    country = random.choice(list(capital_cities_d.keys()))
    answer = input('what is the capital of ' + country + " - ")
    if answer.upper() == "EXIT":
        break
    if str_comparision(answer, capital_cities_d.get(country)):
        print(" correct ")
    else:
        print(" incorrect ")
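Because the quiz loop waits on `input`, it is hard to try out piece by piece. A possible refactoring, sketched below, separates the two testable steps: picking a random question and checking an answer. The function names `pick_question` and `is_correct` are made up for illustration, the two-entry dictionary stands in for the csv file, and stripping surrounding spaces from the answer is an added assumption.

```python
import random


def pick_question(capitals, rng=random):
    """Pick a random country and return it together with its capital."""
    country = rng.choice(list(capitals))
    return country, capitals[country]


def is_correct(user_answer, correct_answer):
    """Compare answers ignoring case and surrounding spaces."""
    return user_answer.strip().casefold() == correct_answer.strip().casefold()


capitals = {"France": "Paris", "Japan": "Tokyo"}
country, capital = pick_question(capitals)
print("what is the capital of", country)
print(is_correct("  paris ", "Paris"))  # True
```

Passing a seeded `random.Random` instance as `rng` makes the choice repeatable, which is handy when checking the function.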