Chapter 11 File IO

11.2 Files

The content of files can be accessed and modified by Python programs. Python provides a convenient interface for reading and writing text files. In particular, a file type object in Python represents a particular text file on the hard disk. The content of a file can be obtained by calling reading methods of a corresponding file object, and changed by calling writing methods of a corresponding file object.

As file objects correspond to physical files, the operating system imposes restrictions on their usage, to ensure stable operation of the file system and the computer. For example, a file object must be linked to exactly one physical file on the hard disk, and, in the modes discussed in this section, must be in either a reading mode or a writing mode, but not both. Some files that are critical to the operating system are set as read-only, and cannot be opened in a writing mode.

To open a physical file and link it to a file object, the built-in function open can be used. It takes a string argument that specifies the path of the file to be opened, and an optional string argument that specifies a reading/writing mode, returning a file object that is associated with the file. If the second argument is not specified, it is assigned the value ‘r’ by default, and a file in the reading mode is returned by the open call.

For example, consider the following text file abc.txt in your current Python working folder, c:\myPython:

[c:\myPython\abc.txt]
abc
def
ghi

A function call to open('abc.txt') returns a corresponding file object in the reading mode. If the specified file does not exist, a call to open in the reading mode results in an IO error (in Python 3, a FileNotFoundError).
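The failure case can be handled explicitly. The sketch below assumes Python 3, where opening a missing file for reading raises FileNotFoundError. The helper name try_open and the file name no_such_file.txt are hypothetical, chosen only for illustration.

```python
import os
import tempfile

def try_open(path):
    """Return the text of the file at path, or None if it does not exist."""
    try:
        file = open(path)          # the default mode is 'r'
    except FileNotFoundError:      # a subclass of OSError, the general IO error
        return None
    content = file.read()
    file.close()
    return content

# A hypothetical path inside a fresh temporary folder, so the file cannot exist.
missing = os.path.join(tempfile.mkdtemp(), 'no_such_file.txt')
result = try_open(missing)
print(result)
```

Returning None is only one possible design; a program may equally report the error to the user or create the file.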

Two commonly used file reading methods are read, which takes no arguments and returns the content of the corresponding file as a string, and readlines, which takes no arguments and returns the content of the corresponding file as a list of strings, each containing a line in the file. Line separators are OS-specific. In Linux and Mac OS, the line separator is the character ‘\n’; in Windows, the line separator is the string ‘\r\n’. When a file is read in the default text mode, Python translates either form to ‘\n’.

>>> file=open('abc.txt')
>>> type(file)
<class '_io.TextIOWrapper'>
>>> s=file.read()
>>> s
'abc\ndef\nghi\n'
>>> file.close()
>>> file=open('abc.txt')
>>> l=file.readlines()
>>> l
['abc\n', 'def\n', 'ghi\n']
>>> file.close()

The example above shows how the content of ‘abc.txt’ is retrieved by the read and readlines methods on a corresponding file object, which is bound to the identifier file. Note that after the content of the file is read using the read method, the close method is called to close the physical file, and the open function is called a second time to reopen the file before the readlines method is called. This is because a file object maintains an index on the physical file, which indicates the location at which the next reading operation should start. The read and readlines operations move the index to the end of the file, and hence a further call to either of them returns no content. It is advisable to close a file after use, so that the operating system can safely make the file available to other programs and users.
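The effect of the index can be observed directly. The sketch below first creates its own copy of abc.txt in a temporary folder (an assumption made so that the code is self-contained), then reads the file twice without reopening it. It also uses the seek method, which is not covered in the text above, to move the index back to the start as an alternative to closing and reopening.

```python
import os
import tempfile

# Create a small sample file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), 'abc.txt')
file = open(path, 'w')
file.write('abc\ndef\nghi\n')
file.close()

file = open(path)
first = file.read()        # reads everything; the index is now at the end
second = file.read()       # nothing is left to read, so this is ''
file.seek(0)               # move the index back to the start of the file
lines = file.readlines()   # reading works again from the beginning
file.close()

print(repr(first))
print(repr(second))
print(lines)
```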

For a very large file, reading the whole content into an object in memory can fill up the physical memory. Python offers a third reading method, readline, which takes no arguments and reads a line from the current location. If the location is the end of the file, an empty string is returned. The following code shows an example use of the method.

>>> file=open( 'abc.txt')
>>> s=file.readline()
>>> n=1 # line number
>>> while s: # there should be at least a '\n' in the line.
        print ('The', n, 'th line is:', s[:-1])
        n+=1
        s=file.readline()

The 1 th line is: abc
The 2 th line is: def
The 3 th line is: ghi
>>> file.close()

The program above prints out the lines of a file by using a while loop, incrementally reading each line. A counter n is used to record the line number, which is increased by one every time a line is read. Note that the current line s is used as the Boolean condition of the while loop. When it is empty, the loop terminates. Here s is empty only when the end of the file is reached. This is because all the lines contain the line separator character ‘\n’, and an empty line in the file yields s=‘\n’. The while loop with calls to readline is more memory-efficient than the readlines call. The reading is performed line by line, and hence the previous line can be reclaimed from memory by the garbage collector when the next line is processed, thus saving memory.

File objects are iterable, and a for loop over a file object enumerates the lines in the corresponding file. It offers a more succinct way of writing the program above.

>>> file=open('abc.txt')
>>> n=1
>>> for line in file:
        print ('The', n, 'th line is:', line[:-1])
        n+=1

The 1 th line is: abc
The 2 th line is: def
The 3 th line is: ghi
>>> file.close()
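The same loop can be sketched with two further conveniences that the text above does not introduce: the with statement, which closes the file automatically when the block ends, and the built-in function enumerate, which supplies the counter. The sample file is created in a temporary folder so that the code is self-contained, and its last line deliberately lacks a trailing ‘\n’. Using rstrip('\n') instead of line[:-1] avoids cutting off the final character in that case.

```python
import os
import tempfile

# Create a sample file whose last line has no trailing newline.
path = os.path.join(tempfile.mkdtemp(), 'abc.txt')
with open(path, 'w') as file:
    file.write('abc\ndef\nghi')

numbered = []
with open(path) as file:                       # closed automatically on exit
    for n, line in enumerate(file, start=1):   # n counts lines from 1
        numbered.append((n, line.rstrip('\n')))

for n, text in numbered:
    print('Line', n, 'is:', text)
```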

There are two writing modes in which a file can be opened. The first is ‘a’, which opens a file for appending new content to its end, and the second is ‘w’, which opens a file for overwriting its content. The write method can be used to write a string to a file in a writing mode. For example,

>>> file=open('abc.txt','a')
>>> file.write('jkl') # write returns the number of characters written
3
>>> file.write('\n')
1
>>> file.write('mno\npqr')
7
>>> file.close()

The program above modifies abc.txt by adding the characters ‘jkl\nmno\npqr’ to its end, which results in the content below.

[abc.txt]
abc
def
ghi
jkl
mno
pqr

Note that a call to the method write does not start a new line implicitly; a new line must be specified by writing a line separator character to the file.
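Because the separator must be written explicitly, it can be convenient to wrap the pattern in a small function. The sketch below is one possible arrangement; the helper name append_lines is hypothetical, and the sample file is created in a temporary folder so that the code is self-contained.

```python
import os
import tempfile

def append_lines(path, lines):
    """Append each string in lines to the file, one per line."""
    with open(path, 'a') as file:
        for line in lines:
            file.write(line + '\n')   # the separator is added explicitly

# Start from a file that already holds one line.
path = os.path.join(tempfile.mkdtemp(), 'abc.txt')
with open(path, 'w') as file:
    file.write('abc\n')

append_lines(path, ['jkl', 'mno'])

with open(path) as file:
    content = file.read()
print(repr(content))
```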

The code below overwrites the content of abc.txt entirely.

>>> file=open('abc.txt','w')
>>> file.write('123456') # write returns the number of characters written
6
>>> file.close()

Execution of the code above results in abc.txt containing the six characters ‘123456’ and no other content. The original content of the file ‘abc.txt’ is erased immediately when open is called. Care should be taken when performing the write operation, so that important content is not lost by careless overwriting of files.
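One possible safeguard, offered here as a sketch rather than a required practice, is the mode ‘x’ (available since Python 3.3). It opens a file for writing only if the file does not yet exist, and raises FileExistsError otherwise. The helper name write_new and the file name important.txt are hypothetical.

```python
import os
import tempfile

def write_new(path, text):
    """Write text to a new file; return False if the file already exists."""
    try:
        with open(path, 'x') as file:   # 'x' refuses to open an existing file
            file.write(text)
    except FileExistsError:
        return False                    # the existing content is left untouched
    return True

path = os.path.join(tempfile.mkdtemp(), 'important.txt')
created = write_new(path, 'original content\n')
overwritten = write_new(path, '123456')

with open(path) as file:
    content = file.read()
print(created, overwritten, repr(content))
```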

A text file can be used to store structured data for scientific computation. For example, consider the maintenance of examination scores for a class of students. Each student has a unique student ID, and a list of examination scores. In Python, a dict can be used to maintain the records, with the keys being student IDs and the values being lists of scores. On the other hand, the records can be stored in a text file, so that they can be kept permanently. Such a file can keep each student on one line, starting with the student ID, followed by a space, and then a space-separated list of scores. The file scores.txt below shows an example.

[scores.txt]
1000101 95 90 70 90 85 65
1000103 99 90 77 85 97 55
1000208 70 35 52 56 60 51

Suppose that the file is put into c:\myPython. It can be loaded into a dict object in memory by using the function load_scores below.

>>> def load_scores(filename):
        d={}
        file=open(filename)
        for line in file: # one student item
            line=line.split( ) # split into list
            d[line[0]]=line[1:] # key=id and value=scores
        file.close()
        return d

>>> scores=load_scores('scores.txt')
>>> scores
{'1000101': ['95', '90', '70', '90', '85', '65'], '1000103': ['99', '90', '77', '85', '97', '55'], '1000208': ['70', '35', '52', '56', '60', '51']}

The function load_scores takes a string argument that specifies the path of a score file, and returns a dict object that contains the content of the record. It works by initializing the return dict as an empty dict, and then enumerating each line in the file. Each line is split on whitespace using the split method discussed earlier, and the resulting head and tail are used as the key and value of a new entry in the return dict, respectively. Note that the scores are stored as strings, not as numbers.
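For numerical work the scores are usually more useful as integers. The variant below is a sketch under that assumption; the name load_scores_as_int is hypothetical. It also skips blank lines, which would otherwise cause an IndexError when the first field is accessed. The sample file is written to a temporary folder so that the code is self-contained.

```python
import os
import tempfile

def load_scores_as_int(filename):
    """Load a score file into a dict mapping student ID to a list of ints."""
    d = {}
    with open(filename) as file:
        for line in file:
            fields = line.split()       # split on whitespace
            if not fields:              # skip blank lines
                continue
            d[fields[0]] = [int(x) for x in fields[1:]]
    return d

path = os.path.join(tempfile.mkdtemp(), 'scores.txt')
with open(path, 'w') as file:
    file.write('1000101 95 90 70 90 85 65\n')
    file.write('\n')                    # a blank line, to exercise the check
    file.write('1000208 70 35 52 56 60 51\n')

scores = load_scores_as_int(path)
total = sum(scores['1000208'])          # arithmetic now works directly
print(scores)
print(total)
```

The student IDs are kept as strings here, since they serve as identifiers rather than as quantities.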

A Python program can be used to obtain a score record from a file, and then edit the record, adding the results of new examinations. For example, the function save_scores below can be used to save a score record back into a file.

>>> def save_scores(d, filename):
        file=open(filename , 'w')
        for k in d:
            file.write(k)
            file.write(' ')
            file.write(' '.join(d[k]))
            file.write('\n')
        file.close()
...
>>> d={'1000101': ['95', '90', '70', '90', '85', '45', '70'],'1000103': ['99', '90', '77', '85', '97', '55', '71'],'1000208': ['70', '35', '52', '56', '60', '51', '40']}
>>> save_scores(d, 'scores.txt')

Execution of the program overwrites the content of c:\myPython\scores.txt.
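The complete cycle of loading, editing and saving can be sketched as follows. The two functions repeat the logic of load_scores and save_scores, rewritten with the with statement (a stylistic choice, not something the original functions require). The new score ‘70’ and the temporary folder are assumptions made for illustration.

```python
import os
import tempfile

def load_scores(filename):
    """Load a score file into a dict mapping student ID to a list of strings."""
    d = {}
    with open(filename) as file:
        for line in file:
            fields = line.split()
            if fields:                          # ignore blank lines
                d[fields[0]] = fields[1:]
    return d

def save_scores(d, filename):
    """Write one line per student: the ID, then space-separated scores."""
    with open(filename, 'w') as file:           # 'w' erases the old content
        for k in d:
            file.write(k + ' ' + ' '.join(d[k]) + '\n')

path = os.path.join(tempfile.mkdtemp(), 'scores.txt')
with open(path, 'w') as file:
    file.write('1000101 95 90 70 90 85 65\n')
    file.write('1000103 99 90 77 85 97 55\n')

scores = load_scores(path)
scores['1000101'].append('70')                  # add a new examination result
save_scores(scores, path)

reloaded = load_scores(path)                    # read the file back to check
print(reloaded)
```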

© Copyright 2024 GS Ng.
