HTTP Headers for Dummies

Whether you're a programmer or not, you have seen HTTP everywhere on the web. At this moment your browser's address bar shows something that starts with "http://". Even your first Hello World script sent HTTP headers without you realizing it. In this article we are going to learn about the basics of HTTP headers and how we can use them in our web applications.

What are HTTP Headers?

HTTP stands for "Hypertext Transfer Protocol". The entire World Wide Web uses this protocol, which was established in the early 1990s. Almost everything you see in your browser is transmitted to your computer over HTTP. For example, when you opened this article page, your browser probably sent over 40 HTTP requests and received an HTTP response for each.

HTTP headers are the core part of these HTTP requests and responses, and they carry information about the client browser, the requested page, the server and more.

Example

When you type a url in your address bar, your browser sends an HTTP request and it may look like this:

GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1
Host: net.tutsplus.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120
Pragma: no-cache
Cache-Control: no-cache

The first line is the "Request Line", which contains some basic information about the request. The rest of the lines are the HTTP headers.

After that request, your browser receives an HTTP response that may look like this:

HTTP/1.x 200 OK
Transfer-Encoding: chunked
Date: Sat, 28 Nov 2009 04:36:25 GMT
Server: LiteSpeed
Connection: close
X-Powered-By: W3 Total Cache/0.8
Pragma: public
Expires: Sat, 28 Nov 2009 05:36:25 GMT
Etag: "pub1259380237;gz"
Cache-Control: max-age=3600, public
Content-Type: text/html; charset=UTF-8
Last-Modified: Sat, 28 Nov 2009 03:50:37 GMT
X-Pingback: http://net.tutsplus.com/xmlrpc.php
Content-Encoding: gzip
Vary: Accept-Encoding, Cookie, User-Agent

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Top 20+ MySQL Best Practices - Nettuts+</title>
<!-- ... rest of the html ... -->

The first line is the "Status Line", followed by "HTTP headers", until the blank line. After that, the "content" starts (in this case, an HTML output).
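To make that structure concrete, here is a minimal PHP sketch that splits a raw response into the status line, the headers and the content. It assumes the lines end in CRLF ("\r\n"), which is what HTTP uses, and the function name split_http_response() is made up for this example; it is not something built into PHP.

```php
<?php
// Hypothetical helper: split a raw HTTP response into its three parts.
// Assumes CRLF line endings and a single blank line ending the headers.
function split_http_response(string $raw): array
{
    // The headers end at the first empty line (CRLF CRLF); the rest is content.
    $parts = explode("\r\n\r\n", $raw, 2);
    $head = $parts[0];
    $body = $parts[1] ?? '';

    $lines = explode("\r\n", $head);
    $status_line = array_shift($lines);   // e.g. "HTTP/1.1 200 OK"

    $headers = [];
    foreach ($lines as $line) {
        // Each header is a "Name: Value" pair; split on the first colon only.
        [$name, $value] = explode(':', $line, 2);
        $headers[$name] = trim($value);
    }
    return [$status_line, $headers, $body];
}

$raw = "HTTP/1.1 200 OK\r\n"
     . "Content-Type: text/html; charset=UTF-8\r\n"
     . "Cache-Control: max-age=3600, public\r\n"
     . "\r\n"
     . "<html>...</html>";

[$status, $headers, $body] = split_http_response($raw);
echo $status, "\n";                    // HTTP/1.1 200 OK
echo $headers['Content-Type'], "\n";   // text/html; charset=UTF-8
echo $body, "\n";                      // <html>...</html>
```

Real responses can repeat a header name (Set-Cookie, for example), which this simplified version would overwrite, and it does not guard against malformed header lines.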

When you look at the source code of a web page in your browser, you will only see the HTML portion and not the HTTP headers, even though they actually have been transmitted together as you see above.

These HTTP requests are also sent and received for other things, such as images, CSS files and JavaScript files. That is why I said earlier that your browser has sent 40 or more HTTP requests as you loaded just this article page.

Now, let's start reviewing the structure in more detail.

How to See HTTP Headers

I use the following Firefox extensions to analyze HTTP headers:

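In PHP, the getallheaders() function returns the request headers, and headers_list() returns the response headers that are sent, or will be sent.

Later in the article, we will see more code examples in PHP.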

HTTP Request Structure

The first line of the HTTP request is called the request line and consists of 3 parts:

  • The "method" indicates what kind of request this is. Most common methods are GET, POST and HEAD.
  • The "path" is generally the part of the url that comes after the host (domain). For example, when requesting "http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/" , the path portion is "/tutorials/other/top-20-mysql-best-practices/".
  • The "protocol" part contains "HTTP" and the version, which is usually 1.1 in modern browsers.

The remainder of the request contains HTTP headers as "Name: Value" pairs, one per line. These carry various pieces of information about the HTTP request and your browser. For example, the "User-Agent" line provides information on the browser version and the operating system you are using. "Accept-Encoding" tells the server whether your browser can accept compressed output like gzip.
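If you ever need to read a raw request yourself, the request line and the "Name: Value" pairs are simple to pick apart. This is only a sketch: parse_http_request_head() is a hypothetical helper, and it assumes a well-formed request with CRLF line endings. Header names are lowercased because HTTP treats them as case-insensitive.

```php
<?php
// Hypothetical helper: parse the request line and headers of a raw HTTP request.
function parse_http_request_head(string $raw): array
{
    // Only look at the part before the blank line that ends the headers.
    $head = explode("\r\n\r\n", $raw, 2)[0];
    $lines = explode("\r\n", $head);

    // Request line: METHOD, PATH and PROTOCOL separated by single spaces.
    [$method, $path, $protocol] = explode(' ', array_shift($lines), 3);

    $headers = [];
    foreach ($lines as $line) {
        [$name, $value] = explode(':', $line, 2);
        // Header names are case-insensitive, so normalize them to lowercase.
        $headers[strtolower(trim($name))] = trim($value);
    }
    return ['method' => $method, 'path' => $path, 'protocol' => $protocol, 'headers' => $headers];
}

$request = parse_http_request_head(
    "GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1\r\n" .
    "Host: net.tutsplus.com\r\n" .
    "Accept-Encoding: gzip,deflate\r\n" .
    "\r\n"
);
echo $request['method'], ' ', $request['path'], "\n";
echo $request['headers']['host'], "\n";   // net.tutsplus.com
```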

You may have noticed that the cookie data is also transmitted inside an HTTP header. And if there had been a referring url, it would have been sent in a header too.

Most of these headers are optional. This HTTP request could have been as small as this:

GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1
Host: net.tutsplus.com

And you would still get a valid response from the web server.

Request Methods

The three most commonly used request methods are: GET, POST and HEAD. You're probably already familiar with the first two, from writing html forms.

GET: Retrieve a Document

This is the main method used for retrieving html, images, JavaScript, CSS, etc. Most data that loads in your browser was requested using this method.

For example, when loading a Nettuts+ article, the very first line of the HTTP request looks like so:

GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1
...

Once the html loads, the browser will start sending GET requests for images, which may look like this:

GET /wp-content/themes/tuts_theme/images/header_bg_tall.png HTTP/1.1
...

Web forms can be set to use the method GET. Here is an example.

<form method="GET" action="foo.php">

First Name: <input type="text" name="first_name" /> <br />
Last Name: <input type="text" name="last_name" /> <br />

<input type="submit" name="action" value="Submit" />

</form>

When that form is submitted, the HTTP request begins like this:

GET /foo.php?first_name=John&last_name=Doe&action=Submit HTTP/1.1
...

You can see that each form input was added into the query string.
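PHP can build the same query string from an array with its built-in http_build_query() function, and parse_str() turns it back into an array. A small sketch using the form values above:

```php
<?php
// Form field names and values from the example above.
$fields = [
    'first_name' => 'John',
    'last_name'  => 'Doe',
    'action'     => 'Submit',
];

// Encode the fields into a query string, as the browser does for a GET form.
$query = http_build_query($fields);
echo "GET /foo.php?$query HTTP/1.1\n";
// GET /foo.php?first_name=John&last_name=Doe&action=Submit HTTP/1.1

// Decode a query string back into an array.
parse_str($query, $decoded);
echo $decoded['last_name'], "\n";   // Doe
```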

POST: Send Data to the Server

Even though you can send data to the server using GET and the query string, in many cases POST will be preferable. Sending large amounts of data using GET is not practical, because browsers and servers limit the length of a url.

POST requests are most commonly sent by web forms. Let's change the previous form example to a POST method.

<form method="POST" action="foo.php">

First Name: <input type="text" name="first_name" /> <br />
Last Name: <input type="text" name="last_name" /> <br />

<input type="submit" name="action" value="Submit" />

</form>

Submitting that form creates an HTTP request like this:

POST /foo.php HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost/test.php
Content-Type: application/x-www-form-urlencoded
Content-Length: 43

first_name=John&last_name=Doe&action=Submit

There are three important things to note here:

  • The path in the first line is simply /foo.php and there is no query string anymore.
  • Content-Type and Content-Length headers have been added, which provide information about the data being sent.
  • All the data is now sent after the headers, in the same format as the query string.
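The Content-Length value is simply the size of the body in bytes. Here is a sketch, assuming the same form data and a hypothetical helper named build_post_request(), that assembles the raw request and computes that header with strlen(), which counts bytes rather than characters:

```php
<?php
// Hypothetical helper: assemble a raw form POST request.
function build_post_request(string $host, string $path, array $fields): string
{
    $body = http_build_query($fields);

    return "POST $path HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         // strlen() counts bytes, which is what Content-Length requires.
         . "Content-Length: " . strlen($body) . "\r\n"
         . "\r\n"      // blank line: end of the headers
         . $body;      // the form data comes after the headers
}

$request = build_post_request('localhost', '/foo.php', [
    'first_name' => 'John',
    'last_name'  => 'Doe',
    'action'     => 'Submit',
]);
echo $request, "\n";
```

For this form data the body is 43 bytes long, which matches the Content-Length header in the request shown above.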

POST method requests can also be made via AJAX, applications, cURL, etc. And all file upload forms are required to use the POST method.

HEAD: Retrieve Header Information

HEAD is identical to GET, except the server does not return the content in the HTTP response. When you send a HEAD request, it means that you are only interested in the response code and the HTTP headers, not the document itself.

With this method the browser can check if a document has been modified, for caching purposes. It can also check if the document exists at all.

For example, if you have a lot of links on your website, you can periodically send HEAD requests to all of them to check for broken links. This will work much faster than using GET.
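A link checker along those lines might look like the following sketch. It relies on PHP's get_headers() with a stream context whose method is set to HEAD (passing a context to get_headers() requires PHP 7.1 or later) and assumes allow_url_fopen is enabled. Both function names are hypothetical, and treating every 4xx or 5xx code as a broken link is a simplifying assumption; redirects are reported rather than followed.

```php
<?php
// Hypothetical helper: does a status line such as "HTTP/1.1 404 Not Found"
// point to a broken link? Here, any 4xx or 5xx code counts as broken.
function is_broken_status_line(string $status_line): bool
{
    $parts = explode(' ', $status_line, 3);
    $code = (int) ($parts[1] ?? 0);
    // A missing or unreadable code is treated as broken too.
    return $code === 0 || $code >= 400;
}

// Hypothetical helper: send a HEAD request and check the status line.
// Needs network access and allow_url_fopen, so it is not called below.
function link_is_broken(string $url): bool
{
    $context = stream_context_create([
        'http' => [
            'method'          => 'HEAD',  // headers only, no document body
            'follow_location' => 0,       // report redirects instead of following them
        ],
    ]);
    $headers = @get_headers($url, false, $context);
    // get_headers() returns false when the server cannot be reached at all.
    return $headers === false || is_broken_status_line($headers[0]);
}

var_dump(is_broken_status_line('HTTP/1.1 200 OK'));        // bool(false)
var_dump(is_broken_status_line('HTTP/1.1 404 Not Found')); // bool(true)
```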

HTTP Response Structure

After the browser sends the HTTP request, the server responds with an HTTP response. Excluding the content, it consists of a status line followed by the headers, like the example response shown earlier.

The first piece of data is the protocol. As in the request, this is usually HTTP/1.1 on modern servers (the example response above shows it as "HTTP/1.x").

The next part is the status code followed by a short message. Code 200 means that our GET request was successful and the server will return the contents of the requested document, right after the headers.

We all have seen "404" pages. This number actually comes from the status code part of the HTTP response. If the GET request were made for a path that the server cannot find, it would respond with a 404 instead of 200.

The rest of the response contains headers, just like the HTTP request. These values can contain information about the server software, when the page/file was last modified, the MIME type, etc.

Again, most of those headers are actually optional.

HTTP Status Codes

  • 200s are used for successful requests.
  • 300s are for redirections.
  • 400s are used if there was a problem with the request.
  • 500s are used if there was a problem with the server.
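Since the class is given by the first digit, a script can sort any status code into one of these groups with an integer division. A trivial sketch with a hypothetical function name; 1xx codes also exist but are rarely seen, so it lumps them into "other".

```php
<?php
// Hypothetical helper: map a status code to the groups listed above.
function status_class(int $code): string
{
    switch (intdiv($code, 100)) {
        case 2:  return 'success';
        case 3:  return 'redirection';
        case 4:  return 'problem with the request';
        case 5:  return 'problem with the server';
        default: return 'other';
    }
}

foreach ([200, 206, 301, 404, 500] as $code) {
    echo $code, ': ', status_class($code), "\n";
}
```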

200 OK

As mentioned before, this status code is sent in response to a successful request.

206 Partial Content

If an application requests only a range of the requested file, the 206 code is returned.

It's most commonly used with download managers that can stop and resume a download, or split the download into pieces.
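Download managers ask for a piece of a file by sending a Range request header such as "Range: bytes=500-999", and the server answers with 206 and a Content-Range header describing which bytes it is sending. The sketch below, with a hypothetical function name, only handles a single "start-end" or "start-" range; a real server also has to deal with suffix ranges and multiple ranges.

```php
<?php
// Hypothetical helper: turn a single-range "Range" header value into the
// Content-Range header for a 206 response. Returns null if unsupported.
function content_range_for(string $range_header, int $file_size): ?string
{
    // Only "bytes=START-END" and "bytes=START-" are handled here.
    if (!preg_match('/^bytes=(\d+)-(\d*)$/', $range_header, $m)) {
        return null;
    }
    $start = (int) $m[1];
    // An empty END means "to the end of the file"; never go past the last byte.
    $end = ($m[2] === '') ? $file_size - 1 : min((int) $m[2], $file_size - 1);

    if ($start > $end) {
        return null;   // a real server would answer 416 Range Not Satisfiable
    }
    return "Content-Range: bytes $start-$end/$file_size";
}

echo content_range_for('bytes=500-999', 1000000), "\n";
// Content-Range: bytes 500-999/1000000
```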

404 Not Found

When the requested page or file was not found, a 404 response code is sent by the server.

401 Unauthorized

Password-protected web pages send this code when you don't enter a valid login.

Note that this only applies to pages protected with HTTP authentication, where the browser itself pops up a login prompt.

403 Forbidden

If you are not allowed to access a page, this code may be sent to your browser. This often happens when you try to open a url for a folder that contains no index page. If the server settings do not allow the display of the folder contents, you will get a 403 error.

For example, on my local server I created an images folder. Inside this folder I put an .htaccess file with this line: "Options -Indexes". Now when I try to open http://localhost/images/, I get a 403 Forbidden error instead of a folder listing.

There are other ways in which access can be blocked, and 403 can be sent. For example, you can block by IP address, with the help of some htaccess directives.

order allow,deny
deny from 192.168.44.201
deny from 224.39.163.12
deny from 172.16.7.92
allow from all
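The same kind of blocking can also be done inside a PHP script, which may help when you cannot change the server configuration. A rough sketch, assuming the blocked addresses are kept in a plain array and using a hypothetical helper name; the client's address comes from $_SERVER['REMOTE_ADDR'], and http_response_code() sets the 403 status.

```php
<?php
// The same addresses as in the .htaccess example above.
$blocked = ['192.168.44.201', '224.39.163.12', '172.16.7.92'];

// Hypothetical helper: which status code should this client get?
function status_for_client(string $ip, array $blocked): int
{
    return in_array($ip, $blocked, true) ? 403 : 200;
}

// REMOTE_ADDR is only set under a web server, hence the fallback.
$client = $_SERVER['REMOTE_ADDR'] ?? '127.0.0.1';

if (status_for_client($client, $blocked) === 403) {
    http_response_code(403);
    exit('Access denied.');
}
```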

302 (or 307) Moved Temporarily & 301 Moved Permanently

These two codes are used for redirecting a browser. For example, when you use a url shortening service, such as bit.ly, that's exactly how they forward the people who click on their links.

Both 302 and 301 are handled very similarly by the browser, but they can have different meanings to search engine spiders. For instance, if your website is down for maintenance, you may redirect to another location using 302. The search engine spider will continue checking your original page in the future. But if you redirect using 301, it will tell the spider that your website has moved to that location permanently. To give you a better idea: http://www.nettuts.com redirects to http://net.tutsplus.com/ using a 301 code instead of 302.

500 Internal Server Error

This code is usually seen when a web script crashes. Unlike PHP, most CGI scripts do not output errors directly to the browser. If there are any fatal errors, they will just send a 500 status code, and the programmer then needs to search the server error logs to find the error messages.

Complete List

You can find the complete list of HTTP status codes with their explanations here.

HTTP Headers in HTTP Requests

Now, we'll review some of the most common HTTP headers found in HTTP requests.

Almost all of these headers can be found in the $_SERVER array in PHP. You can also use the getallheaders() function to retrieve all headers at once.

Host

An HTTP request is sent to a specific IP address. But since most servers are capable of hosting multiple websites under the same IP, they must know which domain name the browser is looking for.

Host: net.tutsplus.com

This is basically the host name, including the domain and the subdomain.

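In PHP, it can be found as $_SERVER['HTTP_HOST']. $_SERVER['SERVER_NAME'] often holds the same value, but depending on the server configuration it may contain the configured server name instead of this header.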
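To illustrate why servers need this header, here is a sketch of a script choosing a site based on it, similar in spirit to name-based virtual hosting. The host names, directory paths and the helper name are all made up for the example. The header may carry a port number (for example "localhost:8080"), so that is stripped first, and the name is lowercased because host names are case-insensitive.

```php
<?php
// Hypothetical mapping of host names to document roots.
$sites = [
    'net.tutsplus.com' => '/var/www/nettuts',
    'www.example.com'  => '/var/www/example',
];

// Hypothetical helper: find the document root for a Host header value.
// (Does not handle IPv6 literals such as "[::1]:8080".)
function document_root_for(string $host_header, array $sites): ?string
{
    // Drop an optional ":port" suffix and normalize the case.
    $host = strtolower(explode(':', $host_header, 2)[0]);
    return $sites[$host] ?? null;
}

$host = $_SERVER['HTTP_HOST'] ?? 'net.tutsplus.com';
echo document_root_for($host, $sites) ?? 'unknown site', "\n";
```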

User-Agent

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)

This header can carry several pieces of information such as:

  • Browser name and version.
  • Operating System name and version.
  • Default language.

This is how websites can collect certain general information about their surfers' systems. For example, they can detect if the surfer is using a cell phone browser and redirect them to a mobile version of their website which works better with low resolutions.

In PHP, it can be found with: $_SERVER['HTTP_USER_AGENT'].

if ( strstr($_SERVER['HTTP_USER_AGENT'],'MSIE 6') ) {
	echo "Please stop using IE6!";
}

Accept-Language

Accept-Language: en-us,en;q=0.5

This header indicates the user's preferred language settings. If a website has different language versions, it can redirect a new surfer based on this data.

It can carry multiple languages, separated by commas. The first one is the preferred language, and each other listed language can carry a "q" value, which is an estimate of the user's preference for the language (min. 0 max. 1).

In PHP, it can be found as: $_SERVER["HTTP_ACCEPT_LANGUAGE"].

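if (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])
	&& substr($_SERVER['HTTP_ACCEPT_LANGUAGE'], 0, 2) == 'fr') {
	header('Location: http://french.mydomain.com');
	// stop the script so nothing else is sent after the redirect
	exit;
}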
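Checking only the first two characters ignores the rest of the list. A slightly more thorough sketch, with a hypothetical function name, parses every language along with its q value (which defaults to 1 when it is left out) and sorts them from most to least preferred:

```php
<?php
// Hypothetical helper: parse an Accept-Language header into
// language => q value pairs, sorted from most to least preferred.
function parse_accept_language(string $header): array
{
    $languages = [];
    foreach (explode(',', $header) as $item) {
        $parts = explode(';', trim($item));
        $lang = strtolower(trim($parts[0]));
        if ($lang === '') {
            continue;
        }
        $q = 1.0;   // a missing q value means full preference
        foreach (array_slice($parts, 1) as $param) {
            $param = trim($param);
            if (strncmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $languages[$lang] = $q;
    }
    arsort($languages);   // highest q value first, keys preserved
    return $languages;
}

// en-us is preferred (q=1), then en (q=0.5)
print_r(parse_accept_language('en-us,en;q=0.5'));
```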

Accept-Encoding

Accept-Encoding: gzip,deflate

Most modern browsers support gzip, and will send this in the header. The web server then can send the HTML output in a compressed format. This can reduce the size by up to 80% to save bandwidth and time.

In PHP, it can be found as: $_SERVER["HTTP_ACCEPT_ENCODING"]. However, when you use the ob_gzhandler() callback function, it will check this value automatically, so you don't need to.

// enables output buffering
// and all output is compressed if the browser supports it
ob_start('ob_gzhandler');
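To see what ob_gzhandler() does on your behalf, the sketch below performs the two steps by hand: it checks whether the Accept-Encoding header mentions gzip and, if so, compresses the output with gzencode() from PHP's zlib extension. The helper name is hypothetical, and its simple substring check ignores q values such as "gzip;q=0", which a complete implementation would need to respect.

```php
<?php
// Hypothetical helper: does the Accept-Encoding header mention gzip?
// Simplified: it ignores q values such as "gzip;q=0".
function accepts_gzip(string $accept_encoding): bool
{
    return stripos($accept_encoding, 'gzip') !== false;
}

// Repetitive HTML like this compresses very well.
$html = str_repeat('<p>Hello, HTTP headers!</p>', 500);

// Fall back to a sample header value when not running under a web server.
$accept = $_SERVER['HTTP_ACCEPT_ENCODING'] ?? 'gzip,deflate';

if (accepts_gzip($accept)) {
    $compressed = gzencode($html);   // requires the zlib extension
    // A real response would also need: header('Content-Encoding: gzip');
    printf("%d bytes -> %d bytes\n", strlen($html), strlen($compressed));
} else {
    echo $html;
}
```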

If-Modified-Since

If a web document is already cached in your browser, and you visit it again, your browser can check if the document has been updated by sending this:

If-Modified-Since: Sat, 28 Nov 2009 06:38:19 GMT

If it was not modified since that date, the server will send a "304 Not Modified" response code, and no content - and the browser will load the content from the cache.

In PHP, it can be found as: $_SERVER['HTTP_IF_MODIFIED_SINCE'].

// assume $last_modify_time holds the last time the output was updated

// did the browser send If-Modified-Since header?
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {

	// if the output has not changed since the browser's cached copy
	if ($last_modify_time <= strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {

		// send a 304 header, and no content
		header("HTTP/1.1 304 Not Modified");
		exit;
	}

}

There is also an HTTP header named Etag, which can be used to make sure the cache is current. We'll talk about this shortly.

Cookie

As the name suggests, this sends the cookies stored in your browser for that domain.

Cookie: PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120; foo=bar

These are name=value pairs separated by semicolons. Cookies can also contain the session id.

In PHP, individual cookies can be accessed with the $_COOKIE array. You can directly access the session variables using the $_SESSION array, and if you need the session id, you can use the session_id() function instead of the cookie.

// start the session before any output, since it may need to send headers
session_start();

echo $_COOKIE['foo'];
// output: bar
echo $_COOKIE['PHPSESSID'];
// output: r2t5uvjq435r4q7ib3vtdjq120
echo session_id();
// output: r2t5uvjq435r4q7ib3vtdjq120
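PHP does this parsing for you when it fills $_COOKIE. To show what the format involves, here is a rough parser for a raw Cookie header (the function name is hypothetical). It splits on semicolons, then on the first "=", and URL-decodes each value, because setcookie() URL-encodes values when it sends them.

```php
<?php
// Hypothetical helper: parse a raw Cookie header into name => value pairs.
function parse_cookie_header(string $header): array
{
    $cookies = [];
    foreach (explode(';', $header) as $pair) {
        $pair = trim($pair);
        if ($pair === '' || strpos($pair, '=') === false) {
            continue;   // skip empty or malformed entries
        }
        [$name, $value] = explode('=', $pair, 2);
        $cookies[$name] = urldecode($value);
    }
    return $cookies;
}

$cookies = parse_cookie_header('PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120; foo=bar');
echo $cookies['foo'], "\n";   // bar
```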

Referer

As the name suggests, this HTTP header contains the referring url.

For example, if I visit the Nettuts+ homepage and click on an article link, my browser sends this header along with the request for the article:

Referer: http://net.tutsplus.com/

In PHP, it can be found as $_SERVER['HTTP_REFERER'].

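if (isset($_SERVER['HTTP_REFERER'])) {

	$url_info = parse_url($_SERVER['HTTP_REFERER']);

	// is the surfer coming from Google (with a query string in the url)?
	if (isset($url_info['host'], $url_info['query']) && $url_info['host'] == 'www.google.com') {

		parse_str($url_info['query'], $vars);

		// the referring url comes from the client, so escape it before output
		if (isset($vars['q'])) {
			echo "You searched on Google for this keyword: ". htmlspecialchars($vars['q']);
		}
	}

}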
// if the referring url was:
// http://www.google.com/search?source=ig&hl=en&rlz=&=&q=http+headers&aq=f&oq=&aqi=g-p1g9
// the output will be:
// You searched on Google for this keyword: http headers

You may have noticed the word "referrer" is misspelled as "referer". Unfortunately, it made it into the official HTTP specifications like that and got stuck.

Authorization

When a web page asks for authorization, the browser opens a login window. When you enter a username and password in this window, the browser sends another HTTP request, but this time it contains this header.

Authorization: Basic bXl1c2VyOm15cGFzcw==

The data inside the header is base64 encoded. For example, base64_decode('bXl1c2VyOm15cGFzcw==') would return 'myuser:mypass'

In PHP, these values can be found as $_SERVER['PHP_AUTH_USER'] and $_SERVER['PHP_AUTH_PW'].
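Keep in mind that base64 is an encoding, not encryption: anyone who can see the request can decode the password. The sketch below, with a hypothetical function name, decodes the header by hand, roughly what PHP does when it fills PHP_AUTH_USER and PHP_AUTH_PW. It splits on the first colon only, since the password itself may contain colons.

```php
<?php
// Hypothetical helper: decode a Basic Authorization header into [user, password].
// Returns null if the header does not hold valid Basic credentials.
function parse_basic_auth(string $header): ?array
{
    if (stripos($header, 'Basic ') !== 0) {
        return null;
    }
    // Strict mode makes base64_decode() return false on invalid characters.
    $decoded = base64_decode(substr($header, 6), true);
    if ($decoded === false || strpos($decoded, ':') === false) {
        return null;
    }
    // Split on the first colon only: the password may contain colons.
    return explode(':', $decoded, 2);
}

[$user, $pass] = parse_basic_auth('Basic bXl1c2VyOm15cGFzcw==');
echo "$user / $pass\n";   // myuser / mypass
```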

More on this when we talk about the WWW-Authenticate header.

HTTP Headers in HTTP Responses

Now we are going to look at some of the most common HTTP headers found in HTTP responses.

In PHP, you can set response headers using the header() function. PHP already sends certain headers automatically, for loading the content and setting cookies etc... You can see the headers that are sent, or will be sent, with the headers_list() function. You can check if the headers have been sent already, with the headers_sent() function.

Cache-Control

Definition from w3.org: "The Cache-Control general-header field is used to specify directives which MUST be obeyed by all caching mechanisms along the request/response chain." These "caching mechanisms" include gateways and proxies that your ISP may be using.

Example:

Cache-Control: max-age=3600, public

"public" means that the response may be cached by anyone. "max-age" indicates how many seconds the cache is valid for. Allowing your website to be cached can reduce server load and bandwidth, and also improve load times at the browser.

Caching can also be prevented by using the "no-cache" directive.

Cache-Control: no-cache

For more detailed info, see w3.org.
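In PHP, both caching headers from the example response at the top of this article can be derived from one max-age value. A minimal sketch with a hypothetical helper that returns the header lines so you can pass each one to header(). Expires is included for older HTTP/1.0 caches; when both are present, HTTP/1.1 caches use max-age and ignore Expires.

```php
<?php
// Hypothetical helper: caching headers for a response that anyone may
// cache for $max_age seconds, counted from the Unix timestamp $now.
function cache_headers(int $max_age, int $now): array
{
    return [
        "Cache-Control: max-age=$max_age, public",
        // Expires needs an absolute date, formatted in GMT.
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $max_age) . ' GMT',
    ];
}

foreach (cache_headers(3600, time()) as $line) {
    header($line);   // ignored when PHP runs from the command line
}
```

With $now set to the Date of that example response (Sat, 28 Nov 2009 04:36:25 GMT), the Expires line comes out as Sat, 28 Nov 2009 05:36:25 GMT, the same value the server sent.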

Content-Type

This header indicates the "mime-type" of the document. The browser then decides how to interpret the contents based on this. For example, an html page (or a PHP script with html output) may return this:

Content-Type: text/html; charset=UTF-8

"text" is the type and "html" is the subtype of the document. The header can also contain more info such as charset.

For a gif image, this may be sent.

Content-Type: image/gif

The browser can decide to use an external application or browser extension based on the mime-type. For example this will cause the Adobe Reader to be loaded:

Content-Type: application/pdf

When serving files directly, Apache can usually detect the mime-type of a document and send the appropriate header. Also, most browsers have some amount of fault tolerance and auto-detection of mime-types, in case the headers are wrong or not present.

You can find a list of common mime types here.

In PHP, you can use the finfo_file() function to detect the mime type of a file.
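A short sketch of that function in use: it writes a tiny temporary file that starts with the GIF signature and asks finfo_file() for its type. It assumes the fileinfo extension is available (it is enabled by default in most PHP builds), and the file contents are dummy bytes. The file has no extension, so the type is detected from its contents.

```php
<?php
// Create a temporary file whose first bytes are the GIF signature (dummy data).
$path = tempnam(sys_get_temp_dir(), 'mime');
file_put_contents($path, "GIF89a\x01\x00\x01\x00\x00\x00\x00");

// Ask the fileinfo extension for the MIME type, detected from the contents.
$finfo = finfo_open(FILEINFO_MIME_TYPE);
$mime = finfo_file($finfo, $path);
echo "Content-Type: $mime\n";

unlink($path);
```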

Content-Disposition

This header instructs the browser to open a file download box, instead of trying to parse the content. Example:

Content-Disposition: attachment; filename="download.zip"

That will cause the browser to open a download dialog for "download.zip" instead of displaying the content.

Note that the appropriate Content-Type header should also be sent along with this:

Content-Type: application/zip
Content-Disposition: attachment; filename="download.zip"

Content-Length

When content is going to be transmitted to the browser, the server can indicate the size of it (in bytes) using this header.

Content-Length: 89123

This is especially useful for file downloads. That's how the browser can determine the progress of the download.

For example, here is a dummy script I wrote, which simulates a slow download.

// it's a zip file
header('Content-Type: application/zip');
// 1 million bytes (about 1 megabyte)
header('Content-Length: 1000000');
// load a download dialogue, and save it as download.zip
header('Content-Disposition: attachment; filename="download.zip"');

// 1000 times 1000 bytes of data
for ($i = 0; $i < 1000; $i++) {
	echo str_repeat(".",1000);

	// sleep to slow down the download
	usleep(50000);
}

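The result is a download dialog in which the browser knows the total size and can show the progress of the download.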

Now I am going to comment out the Content-Length header:

// it's a zip file
header('Content-Type: application/zip');
// the browser won't know the size
// header('Content-Length: 1000000');
// load a download dialogue, and save it as download.zip
header('Content-Disposition: attachment; filename="download.zip"');

// 1000 times 1000 bytes of data
for ($i = 0; $i < 1000; $i++) {
	echo str_repeat(".",1000);

	// sleep to slow down the download
	usleep(50000);
}

Now the browser can only tell you how many bytes have been downloaded. It does not know the total amount, so the progress bar cannot show the progress.

Etag

This is another header that is used for caching purposes. It looks like this:

Etag: "pub1259380237;gz"

The web server may send this header with every document it serves. The value can be based on the last modify date, file size or even the checksum value of a file. The browser then saves this value as it caches the document. Next time the browser requests the same file, it sends this in the HTTP request:

If-None-Match: "pub1259380237;gz"

If the Etag value of the document matches that, the server will send a 304 code instead of 200, and no content. The browser will load the contents from its cache.
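Putting the two validators together, a script can decide between a 304 and a full 200 response. In this sketch the helper name is hypothetical and the request headers are passed in as plain arguments; in a real script they would come from $_SERVER['HTTP_IF_NONE_MATCH'] and $_SERVER['HTTP_IF_MODIFIED_SINCE']. When If-None-Match is present, the ETag comparison decides and If-Modified-Since is ignored, which is the precedence HTTP/1.1 specifies. The ETag and timestamp are taken from the example response at the top of this article.

```php
<?php
// Hypothetical helper: should the server answer 304 Not Modified?
function is_not_modified(?string $if_none_match, ?string $if_modified_since,
                         string $etag, int $last_modified): bool
{
    if ($if_none_match !== null) {
        // The header may list several ETags separated by commas, or be "*".
        $tags = array_map('trim', explode(',', $if_none_match));
        return in_array($etag, $tags, true) || in_array('*', $tags, true);
    }
    if ($if_modified_since !== null) {
        $since = strtotime($if_modified_since);
        // Not modified if the document is no newer than the browser's copy.
        return $since !== false && $last_modified <= $since;
    }
    return false;
}

$etag = '"pub1259380237;gz"';
$last_modified = 1259380237;   // Sat, 28 Nov 2009 03:50:37 GMT

if (is_not_modified($_SERVER['HTTP_IF_NONE_MATCH'] ?? null,
                    $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null,
                    $etag, $last_modified)) {
    http_response_code(304);   // no content follows
    exit;
}
header('Etag: ' . $etag);
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $last_modified) . ' GMT');
```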

Last-Modified

As the name suggests, this header indicates the last modification date of the document, in GMT format:

Last-Modified: Sat, 28 Nov 2009 03:50:37 GMT

In PHP, you can send it based on the modification time of a file:

// $file is the path of the document being served
$modify_time = filemtime($file);

header("Last-Modified: " . gmdate("D, d M Y H:i:s", $modify_time) . " GMT");

It offers another way for the browser to cache a document. The browser may send this in the HTTP request:

If-Modified-Since: Sat, 28 Nov 2009 06:38:19 GMT

We already talked about this earlier in the "If-Modified-Since" section.

Location

This header is used for redirections. If the response code is 301 or 302, the server must also send this header. For example, when you go to http://www.nettuts.com your browser will receive this:

HTTP/1.x 301 Moved Permanently
...
Location: http://net.tutsplus.com/
...

In PHP, you can redirect a surfer like so:

header('Location: http://net.tutsplus.com/');

By default, that will send a 302 response code. If you want to send 301 instead:

header('Location: http://net.tutsplus.com/', true, 301);

Set-Cookie

When a website wants to set or update a cookie in your browser, it will use this header.

Set-Cookie: skin=noskin; path=/; domain=.amazon.com; expires=Sun, 29-Nov-2009 21:42:28 GMT
Set-Cookie: session-id=120-7333518-8165026; path=/; domain=.amazon.com; expires=Sat Feb 27 08:00:00 2010 GMT

Each cookie is sent as a separate header. Note that the cookies set via JavaScript do not go through HTTP headers.

In PHP, you can set cookies using the setcookie() function, and PHP sends the appropriate HTTP headers.

setcookie("TestCookie", "foobar");

Which causes this header to be sent:

Set-Cookie: TestCookie=foobar

If the expiration date is not specified, the cookie is deleted when the browser is closed.
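To see exactly what goes into the header, here is a sketch of a function that builds a Set-Cookie line by hand. It is hypothetical and covers only the attributes shown above (expires, path, domain); in practice, setcookie() builds and sends the header for you. Like setcookie(), it URL-encodes the value, and it formats the date the same way as the Last-Modified header.

```php
<?php
// Hypothetical helper: build a Set-Cookie header line.
// Leaving out $expires produces a cookie that lasts until the browser closes.
function build_set_cookie(string $name, string $value, ?int $expires = null,
                          string $path = '', string $domain = ''): string
{
    $line = 'Set-Cookie: ' . $name . '=' . urlencode($value);
    if ($expires !== null) {
        $line .= '; expires=' . gmdate('D, d M Y H:i:s', $expires) . ' GMT';
    }
    if ($path !== '') {
        $line .= '; path=' . $path;
    }
    if ($domain !== '') {
        $line .= '; domain=' . $domain;
    }
    return $line;
}

echo build_set_cookie('TestCookie', 'foobar'), "\n";
// Set-Cookie: TestCookie=foobar
echo build_set_cookie('skin', 'noskin', time() + 86400, '/', '.example.com'), "\n";
```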

WWW-Authenticate

A website may send this header to authenticate a user through HTTP. When the browser sees this header, it will open up a login dialogue window.

WWW-Authenticate: Basic realm="Restricted Area"

There is a section in the PHP manual that has code samples on how to do this in PHP:

if (!isset($_SERVER['PHP_AUTH_USER'])) {
    header('WWW-Authenticate: Basic realm="My Realm"');
    header('HTTP/1.0 401 Unauthorized');
    echo 'Text to send if user hits Cancel button';
    exit;
} else {
    // escape user input before printing it into HTML
    echo "<p>Hello " . htmlspecialchars($_SERVER['PHP_AUTH_USER']) . ".</p>";
    echo "<p>You entered " . htmlspecialchars($_SERVER['PHP_AUTH_PW']) . " as your password.</p>";
}

Content-Encoding

This header is usually set when the returned content is compressed.

Content-Encoding: gzip

In PHP, if you use the ob_gzhandler() callback function, it will be set automatically for you.

Conclusion

Thanks for reading. I hope this article was a good starting point to learn about HTTP Headers. Please leave your comments and questions below, and I will try to respond as much as I can.
