
Using htaccess Files for Pretty URLs


Continuing our review of htaccess files, today we'll examine how to use mod_rewrite to create pretty URLs.

Benefits of Formatted URLs

Some claim pretty URLs help search engine rankings, and the debate there is fierce, but we can all agree that pretty URLs make things easier for our users and add a level of professionalism and polish to any web application. I could go over all the theoretical reasons for this, but I like real-world examples better. Like it or hate it, we all must admit that Twitter is a wildly popular web application, and part of the reason for that is most certainly how it formats URLs. I can tell anyone in the know that my Twitter username is noahhendrix, and they know my profile can easily be found at twitter.com/noahhendrix. This seemingly simple concept can have a big effect on the popularity of your application.

Just to put things in perspective, we can look at another popular social networking website, Facebook. Since the site launched in 2004 the profile system has grown and evolved to better suit its users, but one glaring hole was the URL to a profile. From the time I registered with Facebook, my profile was at the URL http://www.facebook.com/profile.php?id=1304880680. That is quite a mouthful, and just recently it appears Facebook has realized that and launched Facebook vanity URLs. Now I can share my Facebook profile by telling people my Facebook username is "noahhendrix", which they know can be found by going to facebook.com/noahhendrix. While the odds are that we won't have an application as popular as Facebook, we can still borrow a few pages from their book.

Quick Overview

A quick overview before we dive into code: in today's tutorial we will go over two slightly different methods of creating pretty URLs using HTACCESS. The difference between the methods is whether Apache or PHP does the heavy lifting of breaking the URL apart for parsing. I want to point out that mod_rewrite tutorials are almost as old as the internet itself, and this is not the first. At the end I will use one of the methods to create a simple application to show how these solutions would look in a real, live website (well, not 100% production quality). The service we will create is a URL shortener that mirrors the functionality of sites like bit.ly, TinyURL, or su.pr. So without any more fluff, let us look at the code.

Using Apache

First, we can place all of our code in Apache .htaccess files. This could look something like this:

  Options +FollowSymLinks
  RewriteEngine On

  RewriteCond %{SCRIPT_FILENAME} !-d
  RewriteCond %{SCRIPT_FILENAME} !-f

  RewriteRule ^users/(\d+)/?$ ./profile.php?id=$1
  RewriteRule ^threads/(\d+)/?$ ./thread.php?id=$1

  RewriteRule ^search/(.*)$ ./search.php?query=$1

Let's start at the top and work our way down to better understand what is going on here. The first line sets the environment up to follow symbolic links (similar to aliases in Mac OS X or shortcuts in Windows) using the Options directive. This may already be set on your host, but mod_rewrite needs FollowSymLinks (or SymLinksIfOwnerMatch) to be enabled before it will process rules in an .htaccess file, so it is safest to set it explicitly. Next we tell Apache we are going to use the Rewrite Engine. The next two lines are very, very important: they restrict rewriting to paths that do not actually exist. This prevents the rule below them from matching example.com/images/logo.png, for example. The first, with the !-d flag, skips existing directories, and the second, with !-f, skips existing files. Keep in mind that RewriteCond lines apply only to the RewriteRule that immediately follows them, so as written they guard the users rule; repeat them before the other rules if those need the same protection.

The next three lines are the actual URL rewriting commands. Each line creates a rule that tries to match a regular expression pattern against the incoming URL. Regular expressions, at least for me, are a hard set of rules to remember, but I always find it helpful to use this tutorial by Nettuts' own Jeffrey Way and the tool he recommends. I found it easy to type in sample URLs we want to match and then try to hack together the pattern.

The first argument is the pattern, between the caret and dollar sign. We tell Apache we want URLs asking for the users directory (an artificial directory, it doesn't have to actually exist) followed by a / and one or more digits. The parentheses create a capture group; you can use as many of these as you want, and they serve as variables that we can then transplant into our rewrite. The /? at the end makes a trailing slash optional, so example.com/users/123 is the same as example.com/users/123/, as users would expect.

The second argument is the path we actually want to call; unlike the first, this must be a real file. We tell Apache to look in the current directory for a file called profile.php and send the parameter id=$1 along with it. Remember the capture group earlier? That is where we get the variable $1; capture groups start at one. This creates a URL on the server like example.com/profile.php?id=123.

This method is great for legacy web applications whose existing URL structure prevents us from easily rewriting the backend to understand a new URL schema. To the server the URL looks the same, but to the user it looks much nicer.
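
To see how a pattern and its capture group turn into the rewritten URL, here is a minimal sketch that mimics the three rules in plain PHP. The function name rewrite_path is made up for this illustration, and the patterns assume the optional trailing slash (/?) form; Apache does this work itself, so this is only a way to experiment with the regular expressions.

```php
<?php
# Hypothetical helper: imitates the RewriteRule lines with PHP's regex functions.
# Returns the rewritten target, or null when no rule matches.
function rewrite_path($path) {
  $rules = array(
    '#^users/(\d+)/?$#'   => 'profile.php?id=$1',
    '#^threads/(\d+)/?$#' => 'thread.php?id=$1',
    '#^search/(.*)$#'     => 'search.php?query=$1',
  );

  foreach ($rules as $pattern => $target) {
    if (preg_match($pattern, $path)) {
      # $1 in the replacement is filled with the first capture group
      return preg_replace($pattern, $target, $path);
    }
  }

  return null;
}

var_dump(rewrite_path('users/123'));       # string "profile.php?id=123"
var_dump(rewrite_path('users/123/'));      # string "profile.php?id=123"
var_dump(rewrite_path('images/logo.png')); # NULL, no rule matches
```

Like Apache, the sketch stops at the first rule that matches, so the order of the rules matters once patterns start to overlap.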

Using PHP

This next method is great for those who don't want to distribute too much logic to Apache and feel more comfortable in PHP (or similar scripting languages). The concept here is to capture any URL the server receives and push it to a PHP controller page. This comes with the added benefit of control, but greater complexity at the same time. Your HTACCESS file might look something like this:

  Options +FollowSymLinks
  RewriteEngine On

  RewriteCond %{SCRIPT_FILENAME} !-d
  RewriteCond %{SCRIPT_FILENAME} !-f

  RewriteRule ^.*$ ./index.php

Everything is the same as above except the last line, so we will skip to it. Instead of creating a capture group, we just tell Apache to grab every URL and send it to index.php. This means we can do all of our URL handling in PHP without relying too much on stringent URL paths in HTACCESS. Here is what we might do at the top of our index.php file to parse out the URL:

  <?php
    #remove the directory path we don't want
    $request  = str_replace("/envato/pretty/php/", "", $_SERVER['REQUEST_URI']);

    #split the path by '/' (explode replaces split(), which was removed in PHP 7)
    $params     = explode("/", $request);
  ?>

The first line is not necessary unless your application doesn't live at the root directory, like my demos. I am removing the nonsense part of the URL that I don't want PHP to worry about. $_SERVER['REQUEST_URI'] is a global server variable that PHP provides to store the request URL. It generally looks like this:

  /envato/pretty/php/users/query

As you can see, it is basically everything after the domain name. Next we split the remaining part of the virtual path by the / character, which allows us to grab individual variables. In my example I just printed the $params array out in the body; of course you will want to do something a little more useful.
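
REQUEST_URI also carries the query string, and a trailing slash leaves an empty element at the end of the array. A slightly more defensive version of the parsing step could look like the sketch below. The function name path_segments and its $base_path parameter are assumptions made for this example; the demo above does the same job with str_replace.

```php
<?php
# Hypothetical helper: turns a request URI into an array of path segments.
function path_segments($request_uri, $base_path = "/") {
  # keep only the path, dropping any ?query=string
  $path = parse_url($request_uri, PHP_URL_PATH);
  if (!is_string($path)) {
    return array();
  }

  # strip the base directory, but only when it is at the very start
  if (strpos($path, $base_path) === 0) {
    $path = (string) substr($path, strlen($base_path));
  }

  # ignore leading and trailing slashes so "users/42/" equals "users/42"
  $path = trim($path, "/");

  return $path === "" ? array() : explode("/", $path);
}

$params = path_segments("/envato/pretty/php/users/query?page=2", "/envato/pretty/php/");
print_r($params); # elements: "users", "query"
```

Stripping the base path only at the start avoids surprises when the same directory name happens to appear later in the URL.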

One thing you might do is take the first element of the $params array and include a file by that same name; within that file you can use the second element of the array to execute some code. This might look something like this:

	 <?php
	   #keeps users from requesting any file they want
	   $safe_pages = array("users", "search", "thread");
	   
	   if(in_array($params[0], $safe_pages)) {
	     include($params[0].".php");
	   } else {
	     include("404.php");
	   }
	 ?>

WARNING: The first part of this code is unbelievably important! You absolutely must restrict what pages a user can get so they don't have the opportunity to print out any page they wish by guessing at file names, like a database configuration file.

Now that we have the soapbox out of the way, let's move on. Next we check if the requested file is in the $safe_pages array. If it is, we include it; otherwise we include a 404 not found page. In the included page you will have access to the $params array, and you can grab whatever data your application needs from it.

This is great for those who want a little more control and flexibility. It obviously requires quite a bit of extra code, so it is probably better suited to new projects that won't require a lot of code to be updated to fit the new URL formats.
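
The whitelist check is easy to try out on its own. Here is a small sketch that wraps it in a function; the name resolve_page is made up for this example, and it returns the file name instead of including it so the result can be inspected.

```php
<?php
# Hypothetical helper: decides which file the controller would include.
function resolve_page(array $params, array $safe_pages) {
  $page = isset($params[0]) ? $params[0] : "";

  # strict comparison: only an exact string match from the list is allowed
  if (in_array($page, $safe_pages, true)) {
    return $page.".php";
  }

  return "404.php";
}

$safe_pages = array("users", "search", "thread");

var_dump(resolve_page(array("users", "42"), $safe_pages)); # string "users.php"
var_dump(resolve_page(array("db_config"), $safe_pages));   # string "404.php"
var_dump(resolve_page(array(), $safe_pages));              # string "404.php"
```

Because the file name comes from our own list rather than from the URL, a guess such as db_config or ../secret never reaches include().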

A Simple URL Shortener

This last part of the tutorial lets us put the code we went over above to use, and is more or less a "real-life" example. We are going to create a service called shrtr. I made up this name, so any other products with this name are not associated with the code I am posting below. Note: I know this is by far not an original concept, and it is only meant as a demonstration of mod_rewrite. First let's take a look at the database table.

It is very straightforward, with only 4 columns:

  • id: unique identifier used to reference specific rows
  • short: unique string of characters appended to the end of our URL to determine where to redirect
  • url: the URL that the short url redirects to
  • created_at: a simple timestamp so we know when this URL was created

The Basics

Next, let's go over the six files we need to create for this application:

  • .htaccess: redirects all short urls to serve.php
  • create.php: validates URL, creates shortcode, saves to DB
  • css/style.css: holds some basic styling information
  • db_config.php: store variables for database connections
  • index.php: The face of our application with form for entering URL
  • serve.php: looks up short URL and redirects to actual URL

That is all we need for our basic example. I will not cover index.php or css/style.css in great detail because they have no PHP and are static files.

# index.php
----
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Makes URLs Shrtr</title>
    <link type="text/css" rel="stylesheet" href="./css/style.css" />
	</head>
	<body>
	 <div id="pagewrap">
  	 <h1>shrt<span class="r">r</span>.me</h1>
  	 
  	 <div class="body">
  	   <form action="./create.php" method="post">
  	   
  	     <span class="instructions">Type your URL here</span>
  	     <input name="url" type="text" />
  	     <input type="submit" value="shrtr" />
  	   
  	   </form>
  	 </div>
	   
	 </div>
	</body>
</html>

The only really interesting thing to note here is that we submit the form, with a field called url, to create.php.

# css/style.css
----
/* reset */
* {
  font-family: Helvetica, sans-serif;
  margin: 0;
  padding: 0;
}

/* site */
html, body { background-color: #008AB8; }
a { color: darkblue; text-decoration: none;}

  #pagewrap {
    margin: 0 auto;
    width: 405px;
  }
  
    h1 {
      color: white;
      margin: 0;
      text-align: center;
      font-size: 100px;
    }
      h1 .r { color: darkblue; }
    
    .body {
      -moz-border-radius: 10px;
      -webkit-border-radius: 10px;
      background-color: white;
      text-align: center;
      padding: 50px;
      height: 80px;
      position: relative;
    }
    
      .body .instructions {
        display: block;
        margin-bottom: 10px;
      }
      .body .back {
        right: 15px;
        top: 10px;
        position: absolute;
      }
      
      .body input[type=text] {
        display: block;
        font-size: 20px;
        margin-bottom: 5px;
        text-align: center;
        padding: 5px;
        height: 20px;
        width: 300px;
      }

That is all very generic, but makes our application a little more presentable.

The last basic file we need to look at is db_config.php. I created this to abstract some of the database connection information.

# db_config.php
----
<?php

  $database = "DATABASE_NAME";
  $username = "USERNAME";
  $password = "PASSWORD";
  $host     = "localhost";

?>

You need to replace the values with what works for your database. The host is probably localhost, but you should double check with your hosting provider to make sure. Here is the SQL dump of the table, url_redirects, that holds all the information we showed above:

--
-- Table structure for table `url_redirects`
--

CREATE TABLE IF NOT EXISTS `url_redirects` (
  `id` int(11) NOT NULL auto_increment,
  `short` varchar(10) NOT NULL,
  `url` varchar(255) NOT NULL,
  `created_at` timestamp NOT NULL default CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`),
  KEY `short` (`short`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;

Creating the Short URL

Next let's look at the code necessary to create our short URL.

# create.php
----
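<?php
  require("./db_config.php");
  
  # the URL submitted by the form (empty string if it is missing)
  $url = isset($_REQUEST['url']) ? trim($_REQUEST['url']) : "";
  
  if(!preg_match("/^[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i", $url)) {
    $html = "Error: invalid URL";
  } else {
    
    # the old mysql_* functions were removed in PHP 7, so we use mysqli;
    # reporting is turned off so failures return false instead of throwing
    mysqli_report(MYSQLI_REPORT_OFF);
    $db = mysqli_connect($host, $username, $password, $database);
    
      # first 5 characters of the md5 hash of the current time and the URL
      $short = substr(md5(time().$url), 0, 5);
    
      # a prepared statement keeps user input out of the SQL string
      $stmt = $db ? mysqli_prepare($db, "INSERT INTO `url_redirects` (`short`, `url`) VALUES (?, ?)") : false;
    
      if($stmt && mysqli_stmt_bind_param($stmt, "ss", $short, $url) && mysqli_stmt_execute($stmt)) {
        $html = "Your short URL is<br />shrtr.me/".$short;
      } else {
        $html = "Error: cannot find database";
      }
    
    if($db) { mysqli_close($db); }
  }
?>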

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Makes URLs Shrtr</title>
    <link type="text/css" rel="stylesheet" href="./css/style.css" />
	</head>
	<body>
	 <div id="pagewrap">
  	 <h1>shrt<span class="r">r</span>.me</h1>
  	 
  	 <div class="body">
  	   <?= $html ?>
  	   <br /><br />
  	   <span class="back"><a href="./">X</a></span>
  	 </div>
	   
	 </div>
	</body>
</html>

Now we are getting a bit more complex! First we include the database connection variables we created earlier, then we store the URL parameter sent to us by the create form in a variable called $url. Next we do some regular expression magic to check that they actually sent a URL; if not, we store an error. If the user entered a valid URL, we create a connection to the database using the connection variables we included at the top of the page. Next we generate a 5 character string to save to the database using the substr function. The string we take it from is the md5 hash of the current time() and $url concatenated together, so it is not truly random, but it is different for nearly every submission. Then we insert that value into the url_redirects table along with the actual URL, and store a string to present to the user. If it fails to insert the data we store an error. If you move down into the HTML part of the page, all we do is print out the value of $html, be it error or success. This obviously isn't the most elegant solution, but it works!
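
The validation and the short code generation can be tried without a database. The sketch below pulls both steps into small functions; the names is_valid_url and make_short_code are made up for this example, and the time is passed in as a parameter (an assumption for the demo) so the result is repeatable.

```php
<?php
# Hypothetical helper: the same pattern create.php uses to check the URL.
function is_valid_url($url) {
  return preg_match("/^[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i", $url) === 1;
}

# Hypothetical helper: first 5 characters of md5(time . url).
function make_short_code($url, $time) {
  return substr(md5($time.$url), 0, 5);
}

var_dump(is_valid_url("http://example.com/page")); # bool(true)
var_dump(is_valid_url("not a url"));               # bool(false)

# the code is always 5 hexadecimal characters
$code = make_short_code("http://example.com/page", time());
var_dump(strlen($code)); # int(5)
```

With only 5 hexadecimal characters there are about a million possible codes, so two URLs can end up with the same one. A real service would probably check for an existing row, or put a unique index on the short column, and try again on a collision.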

Serving the Short URL

So we have the URL in the database. Now let's work on serve.php so we can actually translate the short code into a redirect.

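# serve.php
----
<?php
  require("./db_config.php");

  $short = isset($_REQUEST['short']) ? $_REQUEST['short'] : "";
  $html  = "";

  # mysqli replaces the mysql_* functions removed in PHP 7;
  # reporting is turned off so failures return false instead of throwing
  mysqli_report(MYSQLI_REPORT_OFF);
  $db = mysqli_connect($host, $username, $password, $database);

    # look up the long URL for this short code with a prepared statement
    $url  = null;
    $stmt = $db ? mysqli_prepare($db, "SELECT `url` FROM `url_redirects` WHERE `short` = ? LIMIT 1") : false;
    if($stmt && mysqli_stmt_bind_param($stmt, "s", $short) && mysqli_stmt_execute($stmt)) {
      mysqli_stmt_bind_result($stmt, $url);
      mysqli_stmt_fetch($stmt);
    }

  if($db) { mysqli_close($db); }

  if(!empty($url)) {
    # send the 301 redirect and stop so the error page below is not rendered
    header("Location: ".$url, true, 301);
    exit;
  } else {
    $html = "Error: cannot find short URL";
  }
?>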

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Makes URLs Shrtr</title>
    <link type="text/css" rel="stylesheet" href="./css/style.css" />
	</head>
	<body>
	 <div id="pagewrap">
  	 <h1>shrt<span class="r">r</span>.me</h1>

  	 <div class="body">
  	   <?= $html ?>
  	   <br /><br />
  	   <span class="back"><a href="./">X</a></span>
  	 </div>

	 </div>
	</body>
</html>

This one is very similar to create.php: we include the database information and store the short code sent to us in a variable called $short. Next we query the database for the URL of that short code. If we get a result we redirect to the URL; if not, we print out an error like before.

As far as PHP goes, that is all we need to do, but at the moment users must enter this to share a short URL: http://shrtr.me/serve.php?short=SHORT_CODE. Not very pretty, is it? Let's see if we can't incorporate some mod_rewrite code to make this nicer.
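
The lookup-then-redirect decision can be sketched without a database by standing in a plain array for the url_redirects table. The function name serve_short and the array are assumptions for this illustration, and returning a 404 status for an unknown code is my own choice; the page above simply prints the error.

```php
<?php
# Hypothetical helper: $redirects maps short codes to long URLs,
# standing in for the url_redirects table.
function serve_short($short, array $redirects) {
  if (isset($redirects[$short])) {
    # found: the caller would send a 301 with this Location header
    return array("status" => 301, "location" => $redirects[$short]);
  }

  # not found: the caller would show the error message
  return array("status" => 404, "error" => "Error: cannot find short URL");
}

$redirects = array("a1b2c" => "http://example.com/a/very/long/path");

print_r(serve_short("a1b2c", $redirects)); # status 301 plus the long URL
print_r(serve_short("zzzzz", $redirects)); # status 404 plus the error text
```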

Pretty-ify With HTACCESS

Of the two methods I wrote about at the beginning of the tutorial, we will use the Apache one, because this application was created without any URL parsing in mind. The code will look something like this:

  Options +FollowSymLinks
  RewriteEngine On

  RewriteCond %{SCRIPT_FILENAME} !-d
  RewriteCond %{SCRIPT_FILENAME} !-f


  RewriteRule ^(\w+)$ ./serve.php?short=$1

Skipping to the RewriteRule, we are directing any traffic that doesn't already have a real file or directory to serve.php and putting the requested path in the GET variable short. Not too bad! Now go try it out for yourself.
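
If you are curious which requests this pattern picks up, here is a minimal sketch that tests it in PHP. The function name rewrite_short is made up for this example; Apache performs the real rewrite.

```php
<?php
# Hypothetical helper: applies the ^(\w+)$ pattern from the rule above.
function rewrite_short($path) {
  # \w matches letters, digits and underscore, so slashes and dots never match
  if (preg_match('/^(\w+)$/', $path, $matches)) {
    return "serve.php?short=".$matches[1];
  }

  return null;
}

var_dump(rewrite_short("a1b2c"));         # string "serve.php?short=a1b2c"
var_dump(rewrite_short("css/style.css")); # NULL
var_dump(rewrite_short("create.php"));    # NULL
var_dump(rewrite_short(""));              # NULL, the home page is untouched
```

Since \w excludes the dot and the slash, the stylesheet, the PHP files, and the empty path of the home page never reach serve.php, even before the two RewriteCond lines are considered.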

Conclusion

Today we learned a few different ways to use mod_rewrite in our applications to make our URLs pretty. As always, I will be watching the comments if anybody has trouble, or you can contact me on Twitter. Thanks for reading!

