Sometimes you need to make changes to multiple text files—for example, if you want to update some files to use US spelling instead of UK spelling.
In this quick tip, I will show you an example in which five text files contain a misspelled word: "Wolrd" is written instead of "World". The example will show you how we can use Python to correct the spelling of this word in all the text files within a directory.
Let's get started!
Before we move forward with the example, let's prepare the data (text files) we want to work with. For this tutorial, we will create a directory called hello which will have different files and sub-directories including text files named 1.txt, 2.txt, 3.txt, 4.txt, and 5.txt.
While the function we will use to iterate over the file list will include all the files in the directory, we can add our own conditionals in the code to limit ourselves to the files that we want to modify.
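If you want to follow along, the sample data described above can be created with a short script. This is only a sketch: the file contents ("Hello Wolrd"), the helper name make_sample_dir, and the use of a temporary directory are assumptions made for illustration, while the file and directory names match the listing used later in this tip.

```python
import tempfile
from pathlib import Path

def make_sample_dir(root):
    """Create a 'hello' directory like the one described above under root."""
    hello = Path(root) / "hello"
    hello.mkdir()
    for n in range(1, 6):
        # Assumed content: the article only says the files misspell "World".
        (hello / f"{n}.txt").write_text("Hello Wolrd\n", encoding="utf-8")
    (hello / ".nomedia").touch()   # a hidden file we do not want to edit
    (hello / "others").mkdir()     # a sub-directory we want to skip
    return hello

# Build the sample data in a throwaway temporary directory.
tmp_root = tempfile.mkdtemp()
hello = make_sample_dir(tmp_root)
names = sorted(p.name for p in hello.iterdir())
print(names)
```

To work with a relative hello directory as in the rest of this tip, you could call make_sample_dir('.') from the folder where your script lives.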
Let's get to the fun part. The first thing we need to do is read the content of the directory hello. For this, we can use the os.scandir() function, as follows:

    import os

    directory = os.scandir('hello')
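This function returns an iterator of directory entries. We can loop over it, here with a list comprehension, to see the names of everything in the directory:

    entries = [it.name for it in directory]
    print(entries)

This prints something like the following (the order of the entries is not guaranteed):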
['.nomedia', '1.txt', '2.txt', '3.txt', '4.txt', '5.txt', 'others']
This shows that we have five .txt files inside the hello directory. However, it contains some other files and sub-directories as well.
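To tell files and sub-directories apart, each entry returned by os.scandir() has is_file() and is_dir() methods. The sketch below tries them on a small throwaway directory; the directory and its contents are assumptions made for the demonstration, not the article's actual data.

```python
import os
import tempfile

# Throwaway directory with one text file, one hidden file and one sub-directory.
demo = tempfile.mkdtemp()
open(os.path.join(demo, '1.txt'), 'w').close()
open(os.path.join(demo, '.nomedia'), 'w').close()
os.mkdir(os.path.join(demo, 'others'))

# Map each entry name to the kind of entry it is.
with os.scandir(demo) as directory:
    kinds = {
        entry.name: ('directory' if entry.is_dir() else 'file')
        for entry in directory
    }

for name in sorted(kinds):
    print(name, '->', kinds[name])
```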
Now we will loop through all the files in the directory hello. We can do so with the help of a for-in loop inside a with statement. This will automatically free up resources once the code in the block has finished executing.

    with os.scandir('hello') as directory:
        for item in directory:
Since we want to look for Wolrd in each of the five files in the directory, the next step is to open and read the contents of each file. We skip directories by using the is_file() method, and we skip files whose names start with the . character by calling the startswith() method on the file name. This way, we only read and write the files that we actually intend to modify.
We also open the file with the open() function in r+ mode. This allows us to read the file's contents and then write to it after making the necessary changes.

    if not item.name.startswith('.') and item.is_file():
        with open(item, mode="r+") as file:
            file_text = file.read()
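If the directory might also contain visible files that are not text files, the same condition can be tightened. The following is a minimal sketch, assuming we only care about names ending in .txt; the helper is_target and the demo file names are hypothetical and not part of the original script.

```python
import os
import tempfile

def is_target(entry):
    """Hypothetical helper: True for visible regular files ending in .txt."""
    return (
        entry.is_file()
        and not entry.name.startswith('.')
        and entry.name.endswith('.txt')
    )

# Throwaway directory with a mix of entries.
demo = tempfile.mkdtemp()
for name in ('1.txt', '2.txt', '.hidden.txt', 'notes.md'):
    open(os.path.join(demo, name), 'w').close()
os.mkdir(os.path.join(demo, 'others.txt'))  # a directory with a misleading name

with os.scandir(demo) as directory:
    targets = sorted(entry.name for entry in directory if is_target(entry))
print(targets)  # ['1.txt', '2.txt']
```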
Now comes a vital step, especially when it comes to pattern matching: in our case, searching for Wolrd. This step uses regular expressions. In Python, in order to use regular expressions, we will be using the re module.
We will be using two main functions from this module. The first is compile(), which the documentation describes as follows:
Compile a regular expression pattern into a regular expression object, which can be used for matching using its match(), search() and other methods.
And the second is sub(), for substituting the wrong spelling with the correct one. We will thus do the following:

    regex = re.compile('Wolrd')
    file_text = regex.sub('World', file_text)
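The substitution can be tried on a plain string before touching any files. The sample strings below are made up for illustration, and the word-boundary variant is an optional refinement that the script in this tip does not use.

```python
import re

# The same substitution as above, tried on an in-memory string.
regex = re.compile('Wolrd')
fixed = regex.sub('World', 'Hello Wolrd! Wolrd peace.')
print(fixed)  # Hello World! World peace.

# Assumption: if only whole words should match, \b word boundaries
# leave longer words such as "Wolrdwide" untouched.
whole_word = re.compile(r'\bWolrd\b')
untouched = whole_word.sub('World', 'Wolrdwide Wolrd')
print(untouched)  # Wolrdwide World
```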
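Finally, we want to write the new text after substitution back to our files. Since reading the file left the file position at its end, we first move back to the beginning with seek(0), as follows:

    file.seek(0)
    file.write(file_text)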
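To see the read, rewind, and write sequence in isolation, here is a small sketch that runs it on a single throwaway file. The file name demo.txt and its contents are assumptions for the demonstration only.

```python
import os
import re
import tempfile

# Create a throwaway file to experiment with (the name is arbitrary).
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('Hello Wolrd')

with open(path, mode='r+') as file:
    file_text = file.read()    # reading leaves the position at the end of the file
    file_text = re.sub('Wolrd', 'World', file_text)
    file.seek(0)               # go back to the start so we overwrite, not append
    file.write(file_text)

with open(path) as f:
    final_text = f.read()
print(final_text)  # Hello World
```

Without the call to seek(0), the corrected text would be written after the existing content instead of replacing it.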
Putting It All Together
In this section, let's see how the whole Python script, which will look for Wolrd in each file and replace that with World, will look:

    import os, re

    with os.scandir('hello') as directory:
        for item in directory:
            # Skip hidden files and anything that is not a regular file
            if not item.name.startswith('.') and item.is_file():
                with open(item, mode="r+") as file:
                    file_text = file.read()
                    regex = re.compile('Wolrd')
                    file_text = regex.sub('World', file_text)
                    # Rewind to the start and overwrite the old contents
                    file.seek(0)
                    file.write(file_text)

As we can see, Python makes it very easy to carry out modifications across multiple files using a for loop. Another important part to remember here is the use of regular expressions for pattern matching.
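The script works as written because "Wolrd" and "World" have the same length. For replacements that make the text shorter, such as the UK to US spelling change mentioned at the start, leftover characters would remain at the end of the file unless it is truncated. Below is a hedged sketch of a more general version; the function name replace_in_files, its parameters, and the demo files are assumptions for illustration, not part of the original script.

```python
import os
import re
import tempfile

def replace_in_files(path, pattern, replacement):
    """Replace pattern in every visible regular file in path.

    Returns the sorted names of the files that were changed.
    """
    regex = re.compile(pattern)   # compile once, reuse for every file
    changed = []
    with os.scandir(path) as directory:
        for item in directory:
            if item.name.startswith('.') or not item.is_file():
                continue
            with open(item, mode='r+') as file:
                file_text = file.read()
                new_text, count = regex.subn(replacement, file_text)
                if count:
                    file.seek(0)
                    file.write(new_text)
                    file.truncate()   # drop leftover text if the new text is shorter
                    changed.append(item.name)
    return sorted(changed)

# Demo on a throwaway directory
demo = tempfile.mkdtemp()
with open(os.path.join(demo, 'a.txt'), 'w') as f:
    f.write('My favourite colour')
with open(os.path.join(demo, 'b.txt'), 'w') as f:
    f.write('Nothing to fix')

changed = replace_in_files(demo, 'colour', 'color')
with open(os.path.join(demo, 'a.txt')) as f:
    result = f.read()
print(changed, result)  # ['a.txt'] My favourite color
```

Using subn() instead of sub() also returns the number of replacements, so files without a match are left untouched.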