Say that someone who is used to British spelling has decided to complete his degree in the US. He is asked to write a paper about Python for a class. He is well versed in Python and has no trouble writing the paper. In a part of the paper that discusses images, however, he wrote grey (British spelling) more than once instead of gray (US spelling), as well as neighbourhood (British spelling) instead of neighborhood (US spelling). Since he is now in the US, he has to go through all the words spelled the British way and replace them with the US spellings.
This is one of many scenarios in which we need to fix a spelling or some other mistake in multiple locations.
In this quick tip, I will show you an example where we have five text files that have misspelled my name. That is, instead of writing Abder, Adber is written. The example will show you how we can use Python to correct the spelling of my name in all the text files included within a directory.
Let's get started!
Before we move forward with the example, let's prepare the data (text files) we want to work with. Go ahead and download the directory with its files. Unzip the directory and you are now all set.
As you can see, we have a directory named Abder, which contains five different files named 1, 2, 3, 4, and 5.
Let's get to the fun part. The first thing we need to do is read the content of the directory Abder. For this, we can use the listdir() function from the os module, as follows:
import os
directory = os.listdir('/Users/DrAbder/Desktop/Abder')
If we want to see what's inside the directory, we can print the list with print(directory), in which case we will get:
['.DS_Store', '1.rtf', '2.rtf', '3.rtf', '4.rtf', '5.rtf']
This shows that we have five rtf files inside the directory.
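Note that the listing also picks up .DS_Store, a hidden file that macOS adds to folders, which is not one of our text files. As a minimal sketch, the listing could be narrowed down to the rtf files before going any further. The helper name list_rtf_files and the temporary demo folder are assumptions made for illustration only; they are not part of the original example.

```python
import os
import tempfile


def list_rtf_files(path):
    """Return the sorted names of the .rtf files found directly inside path."""
    return sorted(name for name in os.listdir(path) if name.endswith('.rtf'))


# Demo on a throwaway folder that mimics the listing shown above.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ['.DS_Store', '1.rtf', '2.rtf', '3.rtf', '4.rtf', '5.rtf']:
        with open(os.path.join(demo_dir, name), 'w') as handle:
            handle.write('')
    rtf_files = list_rtf_files(demo_dir)

print(rtf_files)
```

Filtering by extension keeps the later steps from trying to read files that are not text.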
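The names returned by listdir() do not include the directory path, so to open the files by name we need to make the directory of interest the current working directory. We can use chdir as follows:
os.chdir('/Users/DrAbder/Desktop/Abder')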
The next thing we need to do is loop through all the files in the directory Abder. We can use a for-loop as follows:
for file in directory:
Since we want to look in each of the five files in the directory for Adber, the natural thing to do at this stage is to open and read the content of each file:
with open(file, 'r') as open_file:
    read_file = open_file.read()
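Opening a file by its bare name only works when the working directory is the one that contains it. As an alternative sketch, each name returned by listdir() can be joined to the directory path with os.path.join(), so the files can be opened without changing the working directory at all. The temporary folder below merely stands in for the Abder directory so that the snippet runs anywhere; it is an assumption, not part of the original example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create one sample file so there is something to list and read.
    with open(os.path.join(folder, '1.rtf'), 'w') as handle:
        handle.write('Adber')

    # listdir() returns bare names; joining them with the folder gives usable paths.
    full_paths = [os.path.join(folder, name) for name in os.listdir(folder)]

    contents = []
    for full_path in full_paths:
        with open(full_path, 'r') as handle:
            contents.append(handle.read())

print(contents)
```

The trade-off is a slightly longer path expression in exchange for not touching program-wide state.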
Now comes a vital step, especially when talking about pattern matching: searching for Adber. This step relies on regular expressions, which in Python are provided by the re module.
We will be using two main functions from this module. The first is compile(), which compiles a regular expression pattern into a regular expression object. That object can then be used for matching through methods such as match(), search(), and sub().
The second is sub(), which substitutes the wrong spelling with the correct one. We will thus do the following:
regex = re.compile('Adber')
read_file = regex.sub('Abder', read_file)
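To see the substitution in isolation, here is a small sketch that applies the same compiled pattern to a plain string instead of a file. The sample sentences are made up for illustration. It also shows subn(), a sibling of sub() that additionally reports how many replacements were made, and a word-boundary variant of the pattern that leaves longer words untouched.

```python
import re

regex = re.compile('Adber')
sample = 'Adber wrote this tip. Thanks, Adber!'

# sub() returns a new string with every match replaced.
fixed = regex.sub('Abder', sample)
print(fixed)

# subn() performs the same substitution and also returns the match count.
fixed_again, count = regex.subn('Abder', sample)
print(count)

# \b restricts the match to whole words, so 'Adberson' is left alone.
whole_word = re.compile(r'\bAdber\b')
untouched = whole_word.sub('Abder', 'Adberson met Adber')
print(untouched)
```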
Finally, we want to write the new text after substitution to our files, as follows:
with open(file, 'w') as write_file:
    write_file.write(read_file)
Putting It All Together
In this section, let's see how the whole Python script, which will look for Adber in each file and replace it with Abder, will look:
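import os, re

directory = os.listdir('/Users/DrAbder/Desktop/Abder')
os.chdir('/Users/DrAbder/Desktop/Abder')

# Compile the pattern once, before the loop
regex = re.compile('Adber')

for file in directory:
    # Skip entries that are not rtf files, such as .DS_Store
    if not file.endswith('.rtf'):
        continue
    # Read the current content of the file
    with open(file, 'r') as open_file:
        read_file = open_file.read()
    # Replace the wrong spelling and write the result back
    read_file = regex.sub('Abder', read_file)
    with open(file, 'w') as write_file:
        write_file.write(read_file)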
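The script above hard-codes the path and the two spellings. As a hedged sketch of how it might be generalized, the function below takes them as parameters, skips files with other extensions, and rewrites a file only when something actually changed. The function name fix_spelling, its parameters, and the temporary demo folder are all hypothetical names introduced here for illustration.

```python
import os
import re
import tempfile


def fix_spelling(folder, wrong, right, extension='.rtf'):
    """Replace wrong with right in every file in folder ending with extension.

    Returns a dict mapping each changed file name to its replacement count.
    """
    regex = re.compile(re.escape(wrong))  # escape so wrong is matched literally
    changes = {}
    for name in sorted(os.listdir(folder)):
        if not name.endswith(extension):
            continue  # skip entries such as .DS_Store
        path = os.path.join(folder, name)
        with open(path, 'r') as handle:
            text = handle.read()
        # A function replacement inserts right verbatim, with no escape handling.
        new_text, count = regex.subn(lambda match: right, text)
        if count:  # rewrite the file only when something changed
            with open(path, 'w') as handle:
                handle.write(new_text)
            changes[name] = count
    return changes


# Demo on a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    samples = {'1.rtf': 'Hello Adber', '2.rtf': 'Adber and Adber', '3.rtf': 'Hello Abder'}
    for name, text in samples.items():
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write(text)
    result = fix_spelling(folder, 'Adber', 'Abder')
    with open(os.path.join(folder, '2.rtf')) as handle:
        fixed_text = handle.read()

print(result)
```

Returning the per-file counts makes it easy to check which files were touched before trusting the result.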
As we can see, Python makes it very easy to carry out modifications across multiple files using the for-loop. Another important part to remember here is the use of regular expressions for pattern matching.
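Coming back to the scenario from the beginning of this tip, the same idea can be stretched to several words at once. The following is only a sketch: the UK_TO_US mapping and the to_us_spelling helper are hypothetical, the mapping holds just the two word pairs mentioned earlier, and the match is case-sensitive, so a capitalized Grey would be left as it is.

```python
import re

# Hypothetical mapping; extend it with whatever word pairs the paper needs.
UK_TO_US = {'grey': 'gray', 'neighbourhood': 'neighborhood'}

# One pattern that matches any of the British spellings as a whole word.
pattern = re.compile(r'\b(' + '|'.join(map(re.escape, UK_TO_US)) + r')\b')


def to_us_spelling(text):
    """Replace every British spelling listed in UK_TO_US with its US form."""
    return pattern.sub(lambda match: UK_TO_US[match.group(1)], text)


converted = to_us_spelling('Each grey pixel has a neighbourhood of grey pixels.')
print(converted)
```

Passing a function to sub() lets each match pick its own replacement from the mapping, which a single replacement string could not do.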