Ruby for Newbies: Regular Expressions

Ruby is one of the most popular languages used on the web. We’ve started a new Session here on Nettuts+ that will introduce you to Ruby, as well as the great frameworks and tools that go along with Ruby development. In this lesson, we’ll look at using regular expressions in Ruby.



Preface: Regular Expression Syntax

If you’re familiar with regular expressions, you’ll be glad to know that most of the syntax for writing the actual regular expressions is very similar to what you know from PHP, JavaScript, or [your language here].

If you’re not familiar with regular expressions, you’ll want to check out our Regex tutorials here on Nettuts+ to get up to speed.


Regular Expression Matching

Just like everything else in Ruby, regular expressions are regular objects: they’re instances of the Regexp class. However, you’ll usually create a regular expression with the standard, literal syntax:

/myregex/

/\(\d{3}\) \d{3}-\d{4}/
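To see that the literal syntax really is shorthand for an ordinary object, here is a minimal sketch that builds the same pattern both ways. The variable names literal and built are made up for illustration.

```ruby
# Two ways to build the same pattern (variable names are illustrative).
literal = /\(\d{3}\) \d{3}-\d{4}/
built   = Regexp.new('\(\d{3}\) \d{3}-\d{4}') # single quotes keep the backslashes intact

literal.class                   # => Regexp
built.source == literal.source  # => true
```

The literal form is usually easier to read; Regexp.new is mostly handy when the pattern has to be assembled from a string at runtime.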

To start, the simplest way to use a regexp is to apply it to a string and see if there’s a match. Both strings and regexp objects have a match method that does this:

"(123) 456-7890".match /\(\d{3}\) \d{3}-\d{4}/

/\(\d{3}\) \d{3}-\d{4}/.match "(123) 456-7890"

Both of these examples match, so we get a MatchData instance back (we’ll look at MatchData objects soon). If there’s no match, match returns nil. Because a MatchData object is truthy and nil is falsy, you can use the match method in conditional statements (like an if-statement) and just ignore the value it returns.
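As a small sketch of using match in a conditional, the hypothetical helper below (phone_status is not part of Ruby, just a name chosen for this example) branches on whether the pattern matched:

```ruby
PHONE = /\(\d{3}\) \d{3}-\d{4}/

# Hypothetical helper: match returns a MatchData (truthy) or nil (falsy),
# so its result can drive an if-statement directly.
def phone_status(input)
  if input.match(PHONE)
    "looks like a phone number"
  else
    "no match"
  end
end

phone_status("(123) 456-7890") # => "looks like a phone number"
phone_status("123-456-7890")   # => "no match"
```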

There’s another way to match a regexp against a string: the =~ (equals-tilde) operator. Remember that operators are methods in Ruby. Like match, this method returns nil when there is no match. However, if there is a match, it returns the index in the string where the match starts. Also like match, both strings and regexps have =~.

"Ruby For Newbies: Regular Expressions" =~ /New/ # => 9
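A short sketch of the same operator used from both sides, plus the no-match case (the variable name title is only for illustration):

```ruby
title = "Ruby For Newbies: Regular Expressions"

title =~ /New/    # => 9
/New/ =~ title    # => 9, the regexp can sit on either side
title =~ /Python/ # => nil, so it is falsy in a conditional
```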

Regular expressions get more useful when we’re gleaning out some data. This is usually done with groupings: wrapping certain parts of the regular expression in parentheses. Let’s say we want to match a first name, last name, and occupation in a string, where the string is formatted like this:

str1 = "Joe Schmo, Plumber"
str2 = "Stephen Harper, Prime Minister"

To get the three fields, we’ll create this regexp:

re = /(\w*)\s(\w*),\s?([\w\s]*)/

This matches any number of word characters, a single whitespace character, any number of word characters, a comma, an optional whitespace character, and any number of word characters or whitespace. As you might guess, the parts made of word characters are the names and occupation we’re looking for, so they are wrapped in parentheses.

So, let’s execute this:

match1 = str1.match re
match2 = str2.match re

MatchData Objects

Now, our match1 and match2 variables hold MatchData objects (because both our matches were successful). So, let’s see how we can use one of these MatchData objects.

As we go through this, you’ll notice that there are a few different ways to get the same data out of our MatchData object. We’ll start with the matched string. If you want to see the original string that was matched against the regexp, use the string method. To get just the portion of the string that matched, use the [] (square brackets) method and pass 0; here the pattern matches the entire string, so both return the same value:

match1.string # => "Joe Schmo, Plumber"
match1[0] # (this is the same as match1.[] 0 ) => "Joe Schmo, Plumber"

What about the regular expression itself? You can find that with the regexp method.

match1.regexp # => /(\w*)\s(\w*),\s?([\w\s]*)/

Now, how about getting those matched groups that were the point of this exercise? Firstly, we can get them with numbered indices on the MatchData object itself; of course, they are in the order we matched them in:

match1[1] # => "Joe"
match1[2] # => "Schmo"
match1[3] # => "Plumber"

match2[1] # => "Stephen"
match2[2] # => "Harper"
match2[3] # => "Prime Minister"

There’s actually another way to get these captures: the captures method, which returns them as an array. Since this is an array, it’s zero-based, so the first capture is at index 0 rather than 1.

match1.captures[0] # => "Joe"

match2.captures[2] # => "Prime Minister"
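Because captures returns a plain array, it can be destructured in one assignment. A minimal sketch, with variable names chosen just for this example:

```ruby
re = /(\w*)\s(\w*),\s?([\w\s]*)/

# Multiple assignment spreads the captures array across three variables.
first, last, job = "Stephen Harper, Prime Minister".match(re).captures

first # => "Stephen"
last  # => "Harper"
job   # => "Prime Minister"
```

Note that this raises a NoMethodError if the string does not match, because match returns nil in that case; check the result first when the input is not guaranteed to match.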

Believe it or not, there’s actually a third way to get your captures. When you execute match or =~, Ruby fills in a series of global variables, one for each of the captured groups in your regexp:

"Andrew Burgess".match /(\w*)\s(\w*)/  # returns a MatchData object, but we're ignoring that

$1 # => "Andrew"
$2 # => "Burgess"
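The same variables are filled in by =~, and they are cleared again when a later match fails. A brief sketch (the local variable names are illustrative); copying the values into locals right away avoids surprises when another match runs later:

```ruby
if "Andrew Burgess" =~ /(\w*)\s(\w*)/
  first = $1 # copy the captures before another match overwrites them
  last  = $2
end

first # => "Andrew"
last  # => "Burgess"

"no digits here" =~ /(\d+)/ # => nil
after_failure = $1          # => nil, a failed match resets the variables
```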

Back to MatchData objects. If you want to find out the string index at which a given capture starts, pass the capture’s number to the begin method (here, you want the capture’s number as you’d use it with the [] method, not via captures). Similarly, you can use end to see where that capture ends; end returns the index just past the capture’s last character.

m = "Nettuts+ is the best".match /(is) (the)/

m[1] # => "is"
m.begin 1 # => 9
m[2] # => "the"
m.end 2   # => 15
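Since end reports the position just past the capture, begin and end work naturally as the bounds of an exclusive range. A small sketch on a made-up string (not the one from the example above):

```ruby
str = "Ruby is the best"
m = str.match(/(is) (the)/)

m.begin(1)  # => 5
m.end(2)    # => 11
m.offset(2) # => [8, 11], begin and end of capture 2 as a pair

# Slice from the start of capture 1 up to (not including) the end of capture 2.
str[m.begin(1)...m.end(2)] # => "is the"
```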

There are also the pre_match and post_match methods, which are pretty neat: they show you what part of the string came before and after the match, respectively.

# m from above
m.pre_match  # => "Nettuts+ "
m.post_match # => " best"
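Together with m[0], these two methods split the original string into three pieces that add back up to the whole. A quick sketch that uses them to bracket the matched text (the variable name highlighted is illustrative):

```ruby
str = "Nettuts+ is the best"
m = str.match(/(is) (the)/)

# Text before the match, the match itself, and the text after it.
highlighted = "#{m.pre_match}[#{m[0]}]#{m.post_match}"
highlighted # => "Nettuts+ [is the] best"

m.pre_match + m[0] + m.post_match == str # => true
```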

That pretty much covers the basics of working with regular expressions in Ruby.


Regular Expression Use

Since regular expressions are so useful when manipulating strings, you’ll find several string methods that take advantage of them. The most useful ones are probably the substitution methods. These include

  • sub
  • sub!
  • gsub
  • gsub!

These are for substitution and global substitution, respectively. The difference is that gsub replaces all the instances of our pattern, while sub replaces only the first instance in the string.

Here’s how we use them:

"some string".sub /string/, "message" # => "some message"
# the i flag makes the pattern case-insensitive, so the capitalized "The" is replaced too
"The man in the park".gsub /the/i, "a" # => "a man in a park"

As you might know, the bang methods (the ones ending with an exclamation mark) are destructive methods: they change the actual string object instead of returning a new one. For example:

original = "My name is Andrew."
new = original.sub /My name is/, "Hi, I'm"
original # => "My name is Andrew."
new # => "Hi, I'm Andrew."

original = "Who are you?"
original.sub! /Who are/, "And"
original # => "And you?"
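One detail worth knowing about the bang versions: they return nil when nothing was replaced, so their return value should not be chained or assigned blindly. A minimal sketch (dup is used here only so the example also works where string literals are frozen):

```ruby
greeting = "Who are you?".dup

first_try  = greeting.sub!(/Who are/, "And") # => "And you?" (greeting itself changed)
second_try = greeting.sub!(/Who are/, "And") # => nil, there was nothing left to replace

greeting # => "And you?"
```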

Besides these simple examples, you can do more complex things, like this:

"1234567890".sub /(\d{3})(\d{3})(\d{4})/, '(\1) \2-\3' # => "(123) 456-7890"

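The substitution methods return a string rather than a MatchData object, and the replacement string is built before the match happens, so you can’t interpolate $1 into it. Instead, use the “backslash-number” pattern in the replacement string; single quotes are the easiest way to write it, because a double-quoted string needs the backslashes doubled. If you want to further manipulate the matched text, you can pass a block instead of the second parameter; the block receives each match, and its return value becomes the replacement: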

"WHAT'S GOING ON?".gsub(/\S*/) {|s| s.downcase } # => "what's going on?"
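To make the quoting rule and the block form concrete, here is a short sketch. The variable names are illustrative, and the second example is simply another use of the block form, not something from the original lesson.

```ruby
digits  = "1234567890"
pattern = /(\d{3})(\d{3})(\d{4})/

# Single quotes: \1 reaches sub untouched.
single = digits.sub(pattern, '(\1) \2-\3')    # => "(123) 456-7890"
# Double quotes: each backslash has to be escaped.
double = digits.sub(pattern, "(\\1) \\2-\\3") # => "(123) 456-7890"

# Block form: each match is passed in, and the block's result replaces it.
titled = "john smith".gsub(/\w+/) { |word| word.capitalize } # => "John Smith"
```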

There are many other methods that use regular expressions; if you’re interested, you should check out String#scan and String#split, for starters.
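As a starting point, here is a brief sketch of both methods on made-up strings: scan collects every match (or every set of captures), and split cuts the string wherever the pattern matches.

```ruby
# scan without groups returns an array of matched strings.
numbers = "Call 555-1234 or 555-9876".scan(/\d{3}-\d{4}/)
numbers # => ["555-1234", "555-9876"]

# scan with groups returns an array of capture arrays.
pairs = "a1b22c333".scan(/([a-z])(\d+)/)
pairs # => [["a", "1"], ["b", "22"], ["c", "333"]]

# split accepts a regexp as the separator.
fruits = "apples, pears;plums".split(/[,;]\s*/)
fruits # => ["apples", "pears", "plums"]
```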


Conclusion

Well, that’s regular expressions in Ruby for you. If you have any questions, let’s hear them in the comments.
