Python provides several ways to download files from the internet, most commonly over HTTP using the built-in urllib package or the third-party requests library. This tutorial discusses how to use both to download files from URLs.
Requests Library
The requests library is one of the most popular libraries in Python. It lets you send HTTP/1.1 requests without having to manually add query strings to your URLs or form-encode your POST data.
With the requests library, you can perform a lot of functions including:
- adding form data
- adding multipart files
- accessing response data (content, headers, encoding, and so on)
Making Requests
The first thing you need to do is install the library, and it's as simple as:
pip install requests
To test if the installation has been successful, you can do a very easy test in your Python interpreter by simply typing:
import requests
If the installation has been successful, there will be no errors.
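You can also confirm which version was installed; requests exposes it as requests.__version__:

import requests

# print the installed version to confirm the library is importable
print(requests.__version__)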
HTTP request methods include the following (a quick sketch after this list shows the matching requests helper for each):
- GET
- POST
- PUT
- DELETE
- OPTIONS
- HEAD
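Each of these methods has a matching helper function in requests. The sketch below uses httpbin.org, a public request-echo service, purely as a stand-in endpoint (an assumption for illustration, not part of the original examples):

import requests

requests.get("https://httpbin.org/get")        # retrieve a resource
requests.post("https://httpbin.org/post")      # create or submit data
requests.put("https://httpbin.org/put")        # update a resource
requests.delete("https://httpbin.org/delete")  # delete a resource
requests.options("https://httpbin.org/get")    # ask which methods are allowed
requests.head("https://httpbin.org/get")       # fetch headers only, no body

Each call returns a Response object, which the following sections explore.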
Making a GET Request
Making requests is very easy, as illustrated below.
import requests
req = requests.get("https://www.google.com")
The above command will fetch the Google web page and store the response in the req variable. We can then inspect other attributes as well.
For instance, to know if fetching the Google web page was successful, we will query the status_code.
import requests

req = requests.get("https://www.google.com")
req.status_code  # 200 means a successful request
What if we want to find out the encoding type of the Google web page?
req.encoding  # 'ISO-8859-1'
You might also want to know the contents of the response.
req.text
Here is the truncated content of the response:
'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world\'s information, including webpages, images, videos and more. Google has many special features to help you find exactly what you\'re looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script>(function(){window.google={kEI:\'_Oq7WZT-LIf28QWv
Making a POST Request
In simple terms, a POST request is used to create or update data. It's especially common when submitting forms.
Let's assume you have a registration form that takes an email address and password as input data. When you click on the submit button for registration, the post request will be as shown below.
data = {"email": "info@tutsplus.com",
        "password": "12345"}
req = requests.post("http://www.google.com", data=data)
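Since google.com doesn't actually accept this registration form, a convenient way to see exactly what a POST sends is httpbin.org, which echoes the request back (an assumed test endpoint, not part of the original example):

import requests

data = {"email": "info@tutsplus.com", "password": "12345"}
req = requests.post("https://httpbin.org/post", data=data)

# httpbin echoes back the form fields it received
print(req.json()["form"])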
Making a PUT Request
A PUT request is similar to a POST request. It's used to update data. For instance, the example below shows how to make a PUT request.
data = {"name": "tutsplus",
        "telephone": "12345"}
req = requests.put("http://www.contact.com", data=data)
Making a DELETE Request
A DELETE request, as the name suggests, is used to delete data. Below is an example of a DELETE request.
data = {'name': 'Tutsplus'}
url = "https://www.contact.com/api/"
response = requests.delete(url, params=data)
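Whichever method you use, note that requests does not raise an error on a 4xx or 5xx status by default; calling raise_for_status() on the response turns such statuses into exceptions. A minimal sketch (httpbin.org is an assumed test endpoint):

import requests

response = requests.delete("https://httpbin.org/delete", params={'name': 'Tutsplus'})
response.raise_for_status()   # raises requests.HTTPError on 4xx/5xx
print(response.status_code)   # 200 if the request succeeded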
urllib Package
urllib is a package that collects several modules for working with URLs:
- urllib.request for opening and reading URLs
- urllib.error containing the exceptions raised by urllib.request
- urllib.parse for parsing URLs
- urllib.robotparser for parsing robots.txt files
urllib.request offers a very simple interface, in the form of the urlopen function, which is capable of fetching URLs using a variety of different protocols. It also offers a slightly more complex interface for handling basic authentication, cookies, proxies, etc.
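As a taste of that more complex interface, you can wrap the URL in a Request object to attach custom headers before opening it. A minimal sketch (the User-Agent string here is a made-up example):

import urllib.request

# build a Request object so custom headers can be attached
req = urllib.request.Request(
    'http://python.org/',
    headers={'User-Agent': 'my-downloader/0.1'},  # hypothetical UA string
)

with urllib.request.urlopen(req) as response:
    print(response.status)   # HTTP status code of the reply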
How to Fetch URLs With urllib
The simplest way to use urllib.request is as follows:
import urllib.request

with urllib.request.urlopen('http://python.org/') as response:
    html = response.read()
If you wish to retrieve an internet resource and store it, you can do so via the urlretrieve() function.
import urllib.request

filename, headers = urllib.request.urlretrieve('http://python.org/')
with open(filename) as f:
    html = f.read()
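urlretrieve() also accepts a reporthook callback that is invoked as data arrives, which you can use to show progress. A sketch with a simple, hypothetical progress printer (the output filename is an arbitrary choice):

import urllib.request

def progress(block_num, block_size, total_size):
    # hypothetical progress printer: report bytes fetched so far
    fetched = block_num * block_size
    if total_size > 0:
        print("\rfetched %d of %d bytes" % (min(fetched, total_size), total_size), end="")

filename, headers = urllib.request.urlretrieve(
    'http://python.org/', 'python_home.html', reporthook=progress
)
print()  # finish the progress line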
Downloading Images With Python
In this example, we want to download this sample image using both the requests library and the urllib module.
url = 'https://www.python.org/static/opengraph-icon-200x200.png'

# downloading with urllib
# import the urllib library
import urllib.request

# copy a network object to a local file
urllib.request.urlretrieve(url, "python.png")

# downloading with requests
# import the requests library
import requests

# download the url contents in binary format
r = requests.get(url)

# open a file on your system and write the contents
with open("python1.png", "wb") as code:
    code.write(r.content)
Download PDF Files With Python
In this example, we will download a PDF about Google Trends.
url = 'https://static.googleusercontent.com/media/www.google.com/en//googleblogs/pdfs/google_predicting_the_present.pdf'

# downloading with urllib
# import the urllib package
import urllib.request

# copy a network object to a local file
urllib.request.urlretrieve(url, "tutorial.pdf")

# downloading with requests
# import the requests library
import requests

# download the file contents in binary format
r = requests.get(url)

# open a file on your system and write the contents
with open("tutorial1.pdf", "wb") as code:
    code.write(r.content)
Download Zip Files With Python
In this example, we are going to download the contents of a GitHub repository and store the file locally.
url = 'https://codeload.github.com/fogleman/Minecraft/zip/master'

# downloading with requests
# import the requests library
import requests

# download the file contents in binary format
r = requests.get(url)

# open a file on your system and write the contents
with open("minemaster1.zip", "wb") as code:
    code.write(r.content)

# downloading with urllib
# import the urllib library
import urllib.request

# copy a network object to a local file
urllib.request.urlretrieve(url, "minemaster.zip")
Download Videos With Python
In this example, we want to download a video lecture. Note that the URL below points to a YouTube watch page, so fetching it returns that page's HTML rather than the video stream itself; saving the result with an .mp4 extension will not produce a playable file. This pattern only yields an actual video when the URL is a direct link to a video file.
url = 'https://www.youtube.com/watch?v=aDwCCUfNFug'
video_name = url.split('/')[-1]

# downloading with requests
# import the requests library
import requests

print("Downloading file:%s" % video_name)

# download the url contents in binary format
r = requests.get(url)

# open a file on your system and write the contents
with open("tutorial.mp4", "wb") as code:
    code.write(r.content)

# downloading with urllib
# import the urllib library
import urllib.request

print("Downloading file:%s" % video_name)

# copy a network object to a local file
urllib.request.urlretrieve(url, "tutorial2.mp4")
Download CSV Files with Python
You can also use the requests and urllib libraries to download CSV files and process the response using the csv module. Let's use some sample CSV address data.
import requests

url = "https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv"

# get the file response
req = requests.get(url)
print(type(req))

# get the contents of the response
url_content = req.content
csv_file = open('sample2.csv', 'wb')

# write the contents to a csv file
csv_file.write(url_content)
# close the file
csv_file.close()

# using urllib

# import the necessary modules
import urllib.request
import csv
import codecs

url = "https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv"

# download the file from the url
res = urllib.request.urlopen(url)

# decode the response and parse it as csv
data = csv.reader(codecs.iterdecode(res, "utf-8"))

for row in data:
    print(row)
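You can also parse the response from requests directly, without writing it to disk first, by wrapping the decoded text in a file-like object for the csv module. A brief sketch:

import csv
import io
import requests

url = "https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv"

req = requests.get(url)
req.raise_for_status()

# wrap the decoded text in an in-memory file for csv.reader
reader = csv.reader(io.StringIO(req.text))
for row in reader:
    print(row)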
Conclusion
This tutorial has covered the most commonly used methods for downloading files, along with the most common file formats. Even though you will write less code when using the urllib module, the requests module is generally preferred due to its simplicity, popularity, and many additional features, including the following (a short Session sketch after the list shows a few of them in action):
- keep-alive and connection pooling
- sessions with cookie persistence
- browser-style SSL verification
- automatic content decoding
- authentication
- automatic decompression
- Unicode response bodies
- HTTP(S) proxy support
- multipart file uploads
- streaming downloads
- connection timeouts
- chunked requests
- .netrc support
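For example, several of these features come together in requests.Session, which keeps the underlying connection alive between calls and persists cookies across them. A brief sketch (httpbin.org is an assumed test endpoint; the cookie name and value are made up):

import requests

with requests.Session() as s:
    # the first call sets a cookie on the session
    s.get("https://httpbin.org/cookies/set/token/abc123")
    # the second call reuses the connection and sends the cookie back
    resp = s.get("https://httpbin.org/cookies")
    print(resp.json())   # {'cookies': {'token': 'abc123'}}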