# You Don't Know Anything About Regular Expressions: A Complete Guide

Regular expressions can be scary...really scary. Fortunately, once you memorize what each symbol represents, the fear quickly subsides. If you fit the title of this article, there's much to learn! Let's get started.

### Section 1: Learning the Basics

The key to learning how to effectively use regular expressions is to just take a day and memorize all of the symbols. This is the best advice I can possibly offer. Sit down, create some flash cards, and just memorize them! Here are the most common:

• . - Matches any character, except for line breaks if dotall is false.
• * - Matches 0 or more of the preceding character (or group).
• + - Matches 1 or more of the preceding character (or group).
• ? - Preceding character (or group) is optional. Matches 0 or 1 occurrence.
• \d - Matches any single digit.
• \w - Matches any word character (alphanumeric & underscore).
• [XYZ] - Matches any single character from the character class.
• [XYZ]+ - Matches one or more of any of the characters in the set.
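
To make the list a little less abstract, here is a small sketch that tries each symbol against a sample string in JavaScript. The patterns, the sample strings, and the `samples` object are made up purely for illustration.

```javascript
// Each entry pairs one symbol from the list with a tiny, made-up sample string.
const samples = {
  dot: /a.c/.test("abc"),            // true: . stands in for the "b"
  dotNewline: /a.c/.test("a\nc"),    // false: . skips line breaks by default
  dotAll: /a.c/s.test("a\nc"),       // true: the s (dotall) flag lifts that limit
  star: /ab*c/.test("ac"),           // true: * accepts zero "b" characters
  plus: /ab+c/.test("ac"),           // false: + needs at least one "b"
  optional: /colou?r/.test("color"), // true: ? makes the "u" optional
  digit: /^\d$/.test("7"),           // true: \d matches a single digit
  word: /^\w+$/.test("snake_case1"), // true: letters, digits and underscore
  classOne: /^[XYZ]$/.test("Y"),     // true: exactly one character from the class
  classMany: /^[XYZ]+$/.test("ZYX"), // true: one or more characters from the class
};

console.log(samples);
```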

### Creating our Own Location Object

For our final project, we'll replicate the location object. For those unfamiliar, the location object provides you with information about the current page: the href, host, port, protocol, etc. Please note that this is purely for practice's sake. In a real-world site, just use the preexisting location object!

We begin by creating our location function, which accepts a single parameter representing the url that we wish to "decode"; we'll call it "loc."
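
As a minimal sketch, the empty shell could look like the following. It assumes that "loc" is the name of the function and that the parameter is named `url`; both choices are guesses made for illustration.

```javascript
// Assumed naming: `loc` is the function, `url` is the string we want to decode.
function loc(url) {
  // The methods that inspect `url` will be returned from here.
}
```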

Now, we can call it like so, and pass in a gibberish url:
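
A call might look something like this. The url below is invented for the example, and the stub body is only there so that the snippet runs on its own.

```javascript
// Stub body so the snippet is runnable; the methods are filled in later.
function loc(url) {
  return {};
}

// A made-up url that has both a querystring and a hash to pick apart.
const l = loc("http://www.somesite.com?somekey=somevalue&anotherkey=anothervalue#theHashGoesHere");
console.log(typeof l); // prints: object
```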

Next, we need to return an object which contains a handful of methods.

#### Search

Though we won't create all of them, we'll mimic a handful or so. The first one will be "search." Using regular expressions, we'll need to search the url and return everything within the querystring.
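
One way such a method might look is sketched below, assuming the function is named `loc`, its parameter is named `url`, and we test it with a made-up url.

```javascript
function loc(url) {
  return {
    // Find the "?" and capture every character that follows it.
    search: function () {
      return url.match(/\?(.+)/)[1];
    },
  };
}

const l = loc("http://www.somesite.com?somekey=somevalue&anotherkey=anothervalue#theHashGoesHere");
console.log(l.search());
// prints: somekey=somevalue&anotherkey=anothervalue#theHashGoesHere
```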

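Above, we take the passed-in url, and try to match our regular expression against it. This expression searches through the string for the question mark, representing the beginning of our querystring. At this point, we need to trap the remaining characters, which is why the (.+) is wrapped within parentheses. Because (.+) runs to the end of the string, the captured text also includes any hash that follows the querystring. Finally, we need to return only that block of characters, so we use [1] to target it. Keep in mind that match returns null when the url contains no question mark, in which case [1] would throw an error.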

#### Hash

Now we'll create another method which returns the hash of the url, or anything after the pound sign.
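
A sketch of this method, under the same assumptions as before (a function named `loc`, a parameter named `url`, and an invented url to test with):

```javascript
function loc(url) {
  return {
    // Find the "#" and capture every character that follows it.
    hash: function () {
      return url.match(/#(.+)/)[1];
    },
  };
}

const l = loc("http://www.somesite.com?somekey=somevalue&anotherkey=anothervalue#theHashGoesHere");
console.log(l.hash()); // prints: theHashGoesHere
```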

This time, we search for the pound sign, and, once again, trap the following characters within parentheses so that we can refer to only that specific subset - with [1].

#### Protocol

The protocol method should return, as you would guess, the protocol used by the page - which is generally "http" or "https."
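
Here is one possible shape for it. The names `loc` and `url`, the case-insensitive flag, and the sample urls are all assumptions made for the sketch; it returns the whole match with [0], since there is nothing smaller we need to single out.

```javascript
function loc(url) {
  return {
    // "ht" or "f", then "tp", then an optional "s"; [0] is the whole match.
    protocol: function () {
      return url.match(/(ht|f)tps?/i)[0];
    },
  };
}

console.log(loc("http://www.somesite.com?somekey=somevalue").protocol()); // prints: http
console.log(loc("https://www.somesite.com").protocol());                  // prints: https
console.log(loc("ftp://files.somesite.com").protocol());                  // prints: ftp
```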

This one is slightly trickier, only because there are a few choices to compensate for: http, https, and ftp. Though we could do something like - (http|https|ftp) - it would be cleaner to do: (ht|f)tps?
This designates that we should first find either an "ht" or the "f" character. Next, we match the "tp" characters. The final "s" should be optional, so we append a question mark, which signifies that there may be zero or one instance of the preceding character. Much nicer.
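
One detail worth checking when comparing the two forms: alternatives are tried from left to right, so the longer list stops at "http" even when the url begins with "https". The short comparison below uses made-up variable names and example urls.

```javascript
const alternation = /(http|https|ftp)/; // the longer form from the text
const compact = /(ht|f)tps?/;           // the shorter form from the text

const urls = ["http://example.com", "https://example.com", "ftp://example.com"];

// Collect what each pattern matches for every sample url.
const results = urls.map(function (url) {
  return [alternation.exec(url)[0], compact.exec(url)[0]];
});

console.log(results);
// [ [ 'http', 'http' ], [ 'http', 'https' ], [ 'ftp', 'ftp' ] ]
```

Listing "https" before "http" in the longer form would bring the two back in line.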

#### Href

For the sake of brevity, this will be our last one. It will simply return the url of the page.
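
A possible version is sketched below, once more assuming a function named `loc`, a parameter named `url`, a case-insensitive flag, and an invented url.

```javascript
function loc(url) {
  return {
    // Any run of characters, then a period, then two to four letters.
    href: function () {
      return url.match(/.+\.[a-z]{2,4}/i)[0];
    },
  };
}

const l = loc("http://www.somesite.com?somekey=somevalue&anotherkey=anothervalue#theHashGoesHere");
console.log(l.href()); // prints: http://www.somesite.com
```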

Here we're matching all characters up to the point where we find a period followed by two to four letters (representing com, au, edu, name, etc.). It's important to realize that we can make these expressions as complicated or as simple as we'd like. It all depends on how strict we must be.

#### Our Final Simple Function:

With that function created, we can easily alert each subsection by doing:
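
Pulling the four methods together, the whole function and the calls that display each piece might look like the sketch below. The names `loc`, `url`, `l`, and `show` are assumptions, as is the sample url; `show` falls back to console.log when alert is not available (outside a browser, for instance).

```javascript
function loc(url) {
  return {
    // Everything after the "?" (this runs on through the hash as well).
    search: function () {
      return url.match(/\?(.+)/)[1];
    },
    // Everything after the "#".
    hash: function () {
      return url.match(/#(.+)/)[1];
    },
    // "http", "https" or "ftp".
    protocol: function () {
      return url.match(/(ht|f)tps?/i)[0];
    },
    // Everything up to a period followed by two to four letters.
    href: function () {
      return url.match(/.+\.[a-z]{2,4}/i)[0];
    },
  };
}

const l = loc("http://www.somesite.com?somekey=somevalue&anotherkey=anothervalue#theHashGoesHere");

// Use alert in a browser, console.log anywhere else.
const show = typeof alert === "function"
  ? function (message) { alert(message); }
  : console.log;

show(l.search());   // somekey=somevalue&anotherkey=anothervalue#theHashGoesHere
show(l.hash());     // theHashGoesHere
show(l.protocol()); // http
show(l.href());     // http://www.somesite.com
```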

### Conclusion

