Read & Write files


  Machine Learning in Python

Why read/write files

Files are probably the most commonly used source of data in Data Science and Machine Learning. Although most of the time we use higher-level libraries like NumPy or Pandas to read data from files, we also need to be aware of the basic Python read and write operations.

File Types

Typically, there are 2 types of files.

  • text
  • binary

We are not talking about special files like database dumps or Microsoft Word and Excel files, which require a special API to read. We are talking about general files like text files and images. Let’s see how to read from and write to these files.

Read files

There is a file in our data folder, steve_jobs.txt. It is a small snippet of a screenplay from the latest Steve Jobs movie. Let’s read the file and display the contents on the screen.

f = open("./data/steve_jobs.txt", "r")
print(f.read())
JOANNA : So what's the upshot?
ANDY : It's not gonna say "Hello."
STEVE : It absolutely is gonna say "Hello."
ANDY : It's nobody's fault, (it's a system error).
STEVE : (over) You built the voice demo.

Simple enough, right? We have read the entire file. What if you want to read the file line by line? Just use the readline() method.

f.readline()
''

Why don’t you see anything? Because printing the file in the previous block already exhausted the file stream, so there is nothing more to read. Let’s open the file again and print the first line.

f = open("./data/steve_jobs.txt", "r")
f.readline()

"JOANNA : So what's the upshot?\n"

There you go: you got the first line from the file. You can also use a for loop to read the file line by line and print it out.

f = open("./data/steve_jobs.txt", "r")
for line in f:
    print(line)

JOANNA : So what's the upshot?

ANDY : It's not gonna say "Hello."

STEVE : It absolutely is gonna say "Hello."

ANDY : It's nobody's fault, (it's a system error).

STEVE : (over) You built the voice demo.

Quiz: Can you figure out why there is an extra blank line between the lines?

  • The original file already has a newline at the end of each line
  • print always adds a newline of its own after whatever it prints
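
Both observations in the quiz play a part: each line read from a file keeps its trailing newline, and print then adds one more. The sketch below is one way to see this and to avoid the doubled spacing. It writes a small temporary file (the file name and sample lines are placeholders, not the tutorial's data), and it also shows that seek(0) rewinds an exhausted file object without reopening it. It uses the with statement, covered at the end of this page, so that files are closed automatically.

```python
import os
import tempfile

# Hypothetical sample data standing in for ./data/steve_jobs.txt
sample = "first line\nsecond line\nthird line\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w") as fw:
        fw.write(sample)

    with open(path, "r") as f:
        whole = f.read()                  # consumes the entire stream
        leftover = f.read()               # nothing left to read, so this is ''
        f.seek(0)                         # rewind to the start instead of reopening
        raw_lines = [line for line in f]  # each line still ends with "\n"

# Option 1: tell print not to add its own newline
for line in raw_lines:
    print(line, end="")

# Option 2: strip the newline that came from the file
stripped = [line.rstrip("\n") for line in raw_lines]
```

Either option removes the blank lines; which one fits better depends on whether the rest of the program needs the newline characters.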

OK, that was how you read text files. How about images? We have the “Ajay Tech” logo in the pics folder. Let’s try to read it the way we did before.

f = open ( "./pics/ajay-tech-logo.png","r")
f.readline()

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-26-dad3bab2b57d> in <module>
      1 f = open ( "./pics/ajay-tech-logo.png","r")
----> 2 f.readline()

c:\program files (x86)\python37-32\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24 
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 65: character maps to <undefined>

Oops. Python is complaining that it can’t decode some bytes in the image file. Files like images, PDFs and Microsoft Word documents are binary in nature and cannot be read in the regular “r” (read) mode, which tries to decode the bytes as text. To read binary files, use the “rb” mode and you should be OK.

f = open("./pics/ajay-tech-logo.png", "rb")
f.readline()

b'\x89PNG\r\n'
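
In binary mode, reads return bytes objects rather than strings, so no decoding is attempted. As a small illustration, the sketch below checks whether a file starts with the 8-byte PNG signature. The helper name looks_like_png and the temporary files are made up for this example, and the "fake" PNG holds only the signature plus filler bytes, not a real image.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8 bytes every PNG file starts with

def looks_like_png(path):
    """Return True if the file at `path` starts with the PNG signature."""
    with open(path, "rb") as f:            # "rb": bytes are returned as-is
        header = f.read(len(PNG_SIGNATURE))
    return header == PNG_SIGNATURE

with tempfile.TemporaryDirectory() as tmp_dir:
    fake_png = os.path.join(tmp_dir, "fake.png")
    with open(fake_png, "wb") as fw:
        fw.write(PNG_SIGNATURE + b"\x00" * 16)  # placeholder bytes, not a real image

    text_file = os.path.join(tmp_dir, "notes.txt")
    with open(text_file, "w") as fw:
        fw.write("just text")

    png_result = looks_like_png(fake_png)
    text_result = looks_like_png(text_file)

    with open(fake_png, "rb") as f:
        first_line = f.readline()          # bytes up to and including the first b"\n"
```

Note that readline() on a binary file stops at the first newline byte, which is why the output above shows only the first six bytes of the signature.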

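Here is a list of all the file opening modes.

'r' - read (the default)
'w' - write (creates the file, or truncates it if it already exists)
'x' - exclusive create (fails if the file already exists)
'a' - append (to the end of the file if it exists; otherwise it is like a write)
't' - text mode (the default)
'b' - binary mode
'+' - read and write (combined with another mode, as in 'r+')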
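
To make the differences between these modes concrete, here is a minimal sketch that runs a few of them against a throwaway file in a temporary directory. The file name modes.txt and the strings written are placeholders chosen for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes.txt")

    with open(path, "x") as f:        # 'x' creates the file; it must not exist yet
        f.write("one\n")

    try:
        open(path, "x")               # a second exclusive create fails
        second_create_failed = False
    except FileExistsError:
        second_create_failed = True

    with open(path, "a") as f:        # 'a' keeps existing data and adds at the end
        f.write("two\n")
    with open(path, "r") as f:
        after_append = f.read()

    with open(path, "w") as f:        # 'w' empties the file before writing
        f.write("three\n")
    with open(path, "r+") as f:       # 'r+' reads and writes without truncating
        after_write = f.read()
```

After the append the file holds both lines, while the later 'w' open discards them and leaves only the new one.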

Write files

Say we do some image manipulation and want to write the result back to another file. We are not interested in image manipulation at this point; we just want to understand how to write data to a file. To write files, use the “w” mode in the open() function (“wb” for binary data such as an image).

f = open("./pics/ajay-tech-logo.png", "rb")
data = f.read()
f.close()

fw = open("./pics/ajay-tech-logo-1.png", "wb")
fw.write(data)
fw.close()

You should now see 2 files, each exactly the same as the other.
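
If you would rather check this in code than by eye, one option is the standard library's filecmp module. The sketch below repeats the read-then-write copy on a temporary file filled with placeholder bytes (not the actual logo) and then compares the two files byte by byte.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "original.bin")
    dst = os.path.join(tmp_dir, "copy.bin")

    payload = bytes(range(256)) * 4        # 1024 placeholder bytes
    with open(src, "wb") as f:
        f.write(payload)

    # Same pattern as above: read everything, then write it to a new file
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as fw:
        written = fw.write(data)           # write() returns the number of bytes written

    # shallow=False compares the file contents, not just size and timestamps
    same = filecmp.cmp(src, dst, shallow=False)
```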

What about the new statement, fw.close()? Let’s see what happens if we don’t close the file object.

f = open("./pics/ajay-tech-logo.png", "rb")
data = f.read()

fw = open("./pics/ajay-tech-logo-1.png", "wb")
fw.write(data)
20281

Try opening the file in Explorer to see if the new file is ready; it should be. However, try deleting the file. You can’t.

That is because, on Windows (where these examples were run), Python’s open file handle keeps the file in use: until the handle is closed, Windows will not let the file be deleted (reading is OK, which is why you were able to view the file). On Linux and macOS the delete would usually succeed even while the file is open. Either way, closing the file handle explicitly marks the write as complete and makes sure any data still held in Python’s buffer is handed to the operating system.

fw.close()

Now, you should be able to delete the file if required.
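
The open, write, close, delete sequence can be sketched in code as follows. This is only an illustration on a temporary file with placeholder content; flush() is included to show that buffered data can be pushed out without closing the file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "buffered.bin")

    fw = open(path, "wb")
    fw.write(b"hello")
    fw.flush()                                # push Python's buffer to the OS
    size_after_flush = os.path.getsize(path)
    closed_before = fw.closed                 # False: the handle is still open

    fw.close()                                # release the handle
    closed_after = fw.closed                  # True
    os.remove(path)                           # deleting works once the file is closed
    still_exists = os.path.exists(path)
```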

Append data to files

Sometimes a file needs to be updated in batches. For example, you might write the first half of the file in part 1 of the program and the second half in part 2. One option is to keep the file open all the while in ‘w’ (write) mode. If that is not possible (some other program needs the file in the meantime), use the ‘a’ (append) mode.

Append mode (‘a’) is almost like write mode (‘w’), except that if the file already exists, data is added at the end of the file instead of overwriting the existing data.

f = open("./data/steve_jobs.txt", "a")
f.write("\na new dummy dialog")
f.close()

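Just to make sure the append operation was successful, let’s read the file and print it out. (In the output below the new dialog shows up more than once, apparently because the append code was run several times, the first runs without the leading \n.)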

f = open("./data/steve_jobs.txt", "r")
print(f.read())
JOANNA : So what's the upshot?
ANDY : It's not gonna say "Hello."
STEVE : It absolutely is gonna say "Hello."
ANDY : It's nobody's fault, (it's a system error).
STEVE : (over) You built the voice demo.a new dummy dialoga new dummy dialog
a new dummy dialog
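
The batch scenario described at the start of this section might look like the sketch below. The helper append_lines, the file name and the row strings are all hypothetical; the point is only that each call opens the file in ‘a’ mode, adds its lines and closes the file again, so nothing has to stay open between batches.

```python
import os
import tempfile

def append_lines(path, lines):
    """Append each string in `lines` to the file at `path` as its own line."""
    with open(path, "a") as f:      # 'a' creates the file on first use
        for line in lines:
            f.write(line + "\n")

with tempfile.TemporaryDirectory() as tmp_dir:
    batch_file = os.path.join(tmp_dir, "batches.txt")

    append_lines(batch_file, ["batch 1, row 1", "batch 1, row 2"])  # part 1
    append_lines(batch_file, ["batch 2, row 1"])                    # part 2

    with open(batch_file, "r") as f:
        contents = f.read().splitlines()
```

Writing the newline after each line, rather than before it, avoids the run-together text seen in the output above.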

Exceptions

Like any other Python program, errors can occur while reading or writing a file. For example, what happens if a file doesn’t exist?

f = open("./data/file_does_not_exist.txt","r" )
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-11-7657f0dcfed5> in <module>
----> 1 f = open("./data/file_does_not_exist.txt","r" )

FileNotFoundError: [Errno 2] No such file or directory: './data/file_does_not_exist.txt'
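
An error like this does not have to stop the program; it can be caught with try/except. The sketch below is one possible approach, assuming that returning None for a missing file is acceptable for the caller. The helper name read_text_or_none is made up for this example.

```python
import os
import tempfile

def read_text_or_none(path):
    """Return the file's text, or None if the file does not exist."""
    try:
        with open(path, "r") as f:
            return f.read()
    except FileNotFoundError as err:
        # err.filename holds the path that could not be opened
        print("could not open file:", err.filename)
        return None

with tempfile.TemporaryDirectory() as tmp_dir:
    # A freshly created temporary directory is empty, so this file cannot exist
    missing_path = os.path.join(tmp_dir, "file_does_not_exist.txt")
    missing = read_text_or_none(missing_path)

    existing_path = os.path.join(tmp_dir, "exists.txt")
    with open(existing_path, "w") as fw:
        fw.write("some text")
    found = read_text_or_none(existing_path)
```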

Python raises a FileNotFoundError. Worse still, imagine a situation where you have opened a file and something goes wrong during processing. Who will close the file handle?

f = open("./data/new_file.txt","w")
result = 1/0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-15-3d3193dd09e2> in <module>
      1 f = open("./data/new_file.txt","w")
----> 2 result = 1/0

ZeroDivisionError: division by zero

The file handle is still open. You can check this with the closed attribute of the file object.

f.closed

False

False, meaning the file is not closed. You can get around this with a try/except/finally block, as usual.

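f = open("./data/new_file.txt", "w")  # open before try so that f always exists in finally
try:
    result = 1/0
except ZeroDivisionError:
    print("exception has occurred")
finally:
    f.close()  # runs whether or not an exception was raised

exception has occurred

print("file is closed - ", f.closed)

file is closed -  True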

with statement

An alternative to the try/finally syntax, especially when dealing with external resources like files, is the with statement. Think of it as syntactic sugar that makes your code compact and sweet.

with open("./data/new_file.txt","w") as f :
    result = 1/0
    f.write("a new line to the file")

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-28-ed08ac4603a9> in <module>
      1 with open("./data/new_file.txt","w") as f :
----> 2     result = 1/0
      3     f.write("a new line to the file")

ZeroDivisionError: division by zero

Well, an exception has occurred, but is the file handle closed or open?

print("file is closed - ", f.closed)

file is closed -  True

The with statement ensures that the resource is closed even if an exception has occurred. Of course, you can also make your own classes work with the with statement. But in the interest of keeping the course simple, we will not deal with that here.
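
As a closing sketch, a single with statement can also manage more than one file at a time, which suits the read-from-one, write-to-another pattern used earlier on this page. The file names and contents below are placeholders in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "source.txt")
    dst = os.path.join(tmp_dir, "upper.txt")

    with open(src, "w") as f:
        f.write("hello\nworld\n")

    # Two files managed by one with statement; both are closed on exit,
    # even if an exception is raised inside the block
    with open(src, "r") as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(line.upper())

    both_closed = fin.closed and fout.closed

    with open(dst, "r") as f:
        result = f.read()
```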