While the title of this article may sound like a cliche, hatched in the bowels of PR hell, I'm serious when I say that NewRelic is your secret weapon.
In this article, I'll talk about the common aspects of web application performance, and then demonstrate how NewRelic makes it blissfully easy to manage.
In the nearly six years that I've worked at Envato - previously developing and managing the series of marketplaces and currently managing development of the Tuts+ blog network - I've used NewRelic to keep costs down, debug and fix performance problems from the small to the large, and avert potential catastrophes.
If you're new to this topic, or don't currently manage a website for someone in any capacity, don't stress; this article will still be useful. You never know when this knowledge could save your bacon, and I'd wager it's inevitable that it will - unless, of course, you decide to throw in the hypothetical developer towel and become an astronaut or rancher.
NewRelic in 20 Seconds
Before I launch into a reasonably long tirade on web application performance (and before you trundle off to Reddit or something similar), it makes sense to quickly sum up what NewRelic is.
NewRelic is a managed service (SaaS) that you "plug in" to your web app; it collects and aggregates performance metrics from your live web application.
The information it provides can help you find answers to questions like: Is my website slow? Who is it slow for? Where is it slow exactly? Do we need more, or bigger servers? What can we do to improve things?
These questions and their answers are often crucial to your web application's success or failure. If you've never collected performance metrics from your live web app, you're effectively running while blindfolded; at some point you're going to hit a wall!
Before I take you on a tour of NewRelic's features, I have to define what web application performance is. Let's get to it.
What Exactly is Web Application Performance?
I like to split web app performance into two conceptual parts: front-end performance and back-end performance. While the two areas do indeed have crossover and affect one another, it's helpful to draw a distinction.
Primarily, you can think of front-end performance as areas concerning perceived performance, such as how long a page takes to fully render to an end user. Variables that affect this type of performance include:
- How many HTTP requests are sent to servers to fetch all of the page's assets.
- How those assets are organized in the page to affect perceived performance.
- Whether or not a user's browser has to re-download assets, even when they haven't changed.
Back-end performance involves some kind of programming language that runs your code (e.g. PHP, Ruby), and some kind of database server (e.g. MySQL). Generally, most web applications are assembling HTML documents to be sent to your user's browser, and these are made up of data fetched from one or more databases - or even one or more external services (such as the Twitter API). I also typically lump server resources (such as CPU usage, memory usage, disk IO) into this category, as it's the code running on your server (not in your user's browser) that affects these resources.
Why is this distinction so important? Because, in my experience, confusing the two leads to wasted effort when trying to fix performance issues. I have witnessed work on the front-end performance of ailing websites when the actual issue has been the back-end. On the other hand, I have watched people focus on back-end optimization when the problem has been on the front-end. It's essential that you understand and appreciate the difference.
On their own, these two subjects can be rather deep and complicated, and each deserves an entirely different series of posts. While I'm decidedly a back-end specialist, in my entire professional career I have only seen web applications and websites "fall over" as a result of mismanagement of the back-end.
Three and a Half Approaches to Managing Performance
There are three ways in which people tend to manage the performance of their web applications:
- Write code, deploy it, and hope for the best.
- Write code, guess which areas will become bottlenecks, measure and optimize them up front, deploy and hope for the best.
- Write code, measure the live application with something like NewRelic, then fix and tune as appropriate.
The first approach is 100% reactive. If you follow this method, you will only know if your web app is failing or performing poorly when your customers tell you (if they ever tell you).
The second approach is considerably more mature; the developers are preempting problems and attempting to resolve them up front. While this is admirable, you risk spending vital resources optimizing the wrong area, and without ongoing feedback you'll have few facts about what is truly going on in the live environment.
The third approach is almost the ideal situation. By monitoring a live web application, you're able to review how various things are performing, based on what your customers are actually doing. You can write code and receive immediate feedback on how well (or not) it's performing.
The Ideal Approach
The ideal approach is to follow the third and apply a healthy measure of the second.
It doesn't hurt to consider performance up front, but it is infinitely more useful to have true metrics. The old programming adage, "premature optimization is the root of all evil," may apply here, though in programming, as in life, axioms are rarely anything more than half-truths.
Measurement & Management: A Balancing Act
No matter what anyone says (including me!), there is no such thing as one true method to managing your web application's performance. Depending on your app and customers, there will be different approaches and techniques. Yet one thing remains constant: you have to measure.
So, what do you measure? Again, there isn't one true list, yet there'll always be a common set of metrics worth measuring. For example:
- The number of application requests over time.
- The wall clock time requests take to complete.
- The CPU usage of your servers over time.
- The hard drive reads, writes and utilization over time (known as Disk IO).
- The number of database queries, and the time they take to run.
- Queries run on your database that take over two seconds to complete (slow queries).
- Incoming and outgoing bandwidth over time.
These metrics, while certainly not an exhaustive list, will offer significant insight into your web app's behavior, especially if you've never monitored them previously.
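To make the "wall clock time" and "slow queries" items a little more concrete, here's a minimal sketch of rolling your own timing in Python. To be clear, this is not how NewRelic works internally; the `timed` helper, the `samples` list and the `slow_samples` function are names I've made up for illustration, and only the two-second cut-off comes from the list above.

```python
import time
from contextlib import contextmanager

# Two seconds is the "slow query" cut-off mentioned in the list above.
SLOW_THRESHOLD_SECONDS = 2.0

# Each sample is a (label, duration_seconds) pair. Hypothetical storage:
# a real app would ship these somewhere instead of keeping them in memory.
samples = []


@contextmanager
def timed(label, clock=time.perf_counter):
    """Record how many seconds of wall-clock time the wrapped block takes."""
    started = clock()
    try:
        yield
    finally:
        # Runs even if the wrapped block raises, so failures get timed too.
        samples.append((label, clock() - started))


def slow_samples(recorded, threshold=SLOW_THRESHOLD_SECONDS):
    """Return the samples whose duration exceeds the threshold."""
    return [sample for sample in recorded if sample[1] > threshold]


with timed("GET /"):
    sum(range(1000))  # stand-in for real request handling or a database query
```

Wrap a request handler or a database call in `timed(...)` and you'll have raw numbers to look at; the hard part, which is where a hosted tool earns its keep, is collecting and aggregating them across every request and every server.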
Once you have this kind of data, the management of your web application is where all the fun begins. You may find that, after removing a bottleneck in your database (perhaps a few slow queries), you'll expose another as more server resources are freed up. It truly is a balancing act.
Ultimately, successful management looks something like this: you may double the efficiency of that single server of yours, allowing you to delay purchasing a second. On a larger scale, you may cut your server farm in half, and on a large enough scale, that can equate to serious money. On the lighter side, you may simply provide good quality of service to your customers with no sudden surprises.
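The arithmetic behind "cutting your server farm in half" is simple enough to sketch. The traffic and throughput figures below are entirely hypothetical, and the calculation assumes load spreads evenly across identical servers, which real farms rarely manage perfectly.

```python
import math


def servers_needed(peak_rpm, rpm_per_server):
    """Smallest number of identical servers that can absorb the peak load."""
    return math.ceil(peak_rpm / rpm_per_server)


# Hypothetical numbers: 12,000 requests per minute at peak.
before = servers_needed(peak_rpm=12_000, rpm_per_server=600)
# Doubling each server's throughput halves the number of servers required.
after = servers_needed(peak_rpm=12_000, rpm_per_server=1_200)
```

With these made-up figures, `before` is 20 servers and `after` is 10.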
NewRelic: Your Secret Weapon
Now that we've covered the "what" and "how" bits, let's take a look at NewRelic. Once upon a time in software-land, you had to roll your own measurement into an app (which can be as much work as writing the app itself), if you measured at all. NewRelic allows you to simply plug its agent into your Ruby, PHP, .NET or Python application, and begin collecting real data right away.
Thoughtfully, their product is split up into three core regions:
- End user monitoring (front-end, the browser)
- Application monitoring (back-end, your code)
- Server monitoring (back-end, the servers)
Let's have a look at each, in the order they were historically released.
The very first feature NewRelic launched was application monitoring. It tracks and reports on 'Requests per Minute' (aka RPM) and the average response times of these requests, and keeps this data for you to analyze. This is particularly useful for discovering trends in traffic over time (e.g. does my site get slower as our page views increase?).
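If you're wondering what "requests per minute" and "average response time" boil down to, here's a rough sketch. It assumes you already have a log of `(unix_timestamp, duration_seconds)` pairs; the `per_minute_stats` function and the sample `log` are hypothetical, and I'm not claiming this is how NewRelic computes its figures.

```python
from collections import defaultdict


def per_minute_stats(requests):
    """Group (unix_timestamp, duration_seconds) pairs into one-minute buckets.

    Returns {minute_start: (request_count, average_duration_seconds)}.
    """
    buckets = defaultdict(list)
    for timestamp, duration in requests:
        minute_start = int(timestamp // 60) * 60
        buckets[minute_start].append(duration)
    return {
        minute: (len(durations), sum(durations) / len(durations))
        for minute, durations in sorted(buckets.items())
    }


# Hypothetical log: two requests in the first minute, one in the second.
log = [(0, 0.25), (30, 0.75), (61, 1.0)]
stats = per_minute_stats(log)
```

Plot the request count against the average duration over a few weeks and you can start answering the "does my site get slower as page views increase?" question.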
Additionally, the "slow transaction traces" will provide you with a list of recent requests from real users that were disproportionately slow. Inspecting these allows you to drill down and determine why a request took such a long time, giving you the information you need to improve it.
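I don't know the exact rule NewRelic uses to decide that a request was "disproportionately slow", so treat the following as one plausible way to do it yourself: flag anything that took several times longer than the median. The `factor` of 4 and the sample paths are assumptions of mine.

```python
from statistics import median


def disproportionately_slow(transactions, factor=4.0):
    """Return (path, seconds) pairs slower than `factor` times the median,
    slowest first."""
    if not transactions:
        return []
    typical = median(seconds for _, seconds in transactions)
    slow = [t for t in transactions if t[1] > factor * typical]
    return sorted(slow, key=lambda t: t[1], reverse=True)


# Hypothetical recent requests: (path, seconds taken).
recent = [("/", 0.1), ("/items", 0.2), ("/search", 0.2), ("/report", 3.0)]
worst = disproportionately_slow(recent)
```

Here only `/report` is flagged. A list like this tells you where to look; it's the drill-down into each trace that tells you why.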
End user monitoring will provide you with insight into how your site is rendering in your users' browsers. It breaks the total time into chunks, based on things like network time (how long assets took to download), DOM rendering (how long the browser spends figuring out your HTML), and image loading (as served by your web server or a CDN).
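Breaking a page load into chunks is really just a matter of working out each phase's share of the total. A tiny sketch, with phase names and timings that are purely illustrative:

```python
def breakdown(timings):
    """Convert per-phase seconds into each phase's share of the total, in percent."""
    total = sum(timings.values())
    if total == 0:
        return {phase: 0.0 for phase in timings}
    return {phase: 100.0 * seconds / total for phase, seconds in timings.items()}


# Hypothetical timings, in seconds, for a single page view.
page_load = {"network": 1.0, "dom_rendering": 0.5, "image_loading": 0.5}
shares = breakdown(page_load)
```

In this made-up example the network accounts for half of the page load, which is exactly the sort of fact that tells you whether to reach for a CDN or for your templates.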
A neat feature of end user monitoring is that it shows you how well or poorly your application is performing for users in different countries. For example, perhaps 50% of your customers are based in the UK, while the other 50% are in the US. You might discover that front-end performance isn't too great in the UK, due to the physical distance from your servers. Introducing a CDN or a server in the UK should improve their experience.
The best part of using NewRelic and taking action based on its data is that, once you've made any number of changes, you can immediately review if the changes have been effective or not!
The last piece of the puzzle, and the most recent monitoring NewRelic has introduced, is their server monitoring tools. I've always remarked that you must correlate your server's resources with your web application response times to get a fuller picture of efficiency. You may have excellent response times, but you also may be needlessly sacrificing significant server resources to provide them.
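One simple way to correlate server resources with response times is a plain Pearson correlation coefficient over samples taken at the same moments. The sketch below uses made-up CPU and response time readings; it's a way to eyeball the relationship yourself, not a description of what NewRelic's server monitoring does.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return covariance / (spread_x * spread_y)


# Hypothetical samples taken at the same five moments.
cpu_percent = [20, 35, 50, 65, 80]
response_ms = [110, 120, 150, 210, 340]
r = pearson(cpu_percent, response_ms)
```

A value of `r` close to 1 suggests response times climb as CPU usage does; a value near 0 alongside high CPU usage would hint that you're burning resources on something other than serving requests.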
I have seen apps with excellent YSlow scores, for instance, but absolutely no headroom for more traffic - even on significant amounts of hardware!
I hope by now you're starting to see how valuable this kind of information is!
One of my only criticisms of NewRelic is that it's not easy to install for some types of users. If you are a Ruby on Rails programmer, you'll find it fairly easy, as it's a simple Rails plugin.
If you're a PHP developer and aren't comfortable goofing around on the command line, you're going to find it difficult to install, as it's a PHP extension and requires a daemon running alongside your web server. However, some PHP cloud platforms, like PHPFog, offer NewRelic integration out of the box.
This is unfortunate in my mind, as it's a hurdle for most people. I hope NewRelic are currently looking to partner with more commodity web hosting providers, so that their product is more accessible to a wider audience. There's literally no tool like it on the market at present, and they should be making it easy for all PHP developers to use.
If you're using existing hosting, you'll need to at least be on a VPS and have root access for the PHP agent. Being completely fair, spinning up a VPS from a provider like Linode and installing Apache, PHP, MySQL and NewRelic is a short process, but it does require some comfort and know-how in a shell.
The best way to get started using PHP and NewRelic is to use a tool like Oracle VirtualBox, install Linux, set up Apache and PHP and then install the agent. Then you'll be able to use NewRelic in your local development environment, at the very least.
I personally haven't had any experience with the Python agent, but I've heard third-hand that the .NET component is easy as pie to get up and running.
How Envato Uses NewRelic
Envato has been using NewRelic since 2008. We've used it in the following products with good (and sometimes interesting) results:
The Marketplaces
- Initially, we discovered roughly three major slow spots in unexpected places in the marketplaces.
- We discovered what our most heavily trafficked requests were, and focused on optimizing them specifically. If 80% of our time was spent in one spot, making it twice as fast increased capacity and saved us from allocating more funds to hardware.
- We've spotted unusual traffic (such as spammers and hackers), allowing us to take precautionary measures sooner rather than later, thus improving the experience for our real customers.
- We use it daily to monitor the performance of all our new and existing code.
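The "80% of our time in one spot" remark is worth working through, because the payoff follows a well-known formula (Amdahl's law). This sketch only does the arithmetic; it assumes capacity is limited by the time spent handling requests and nothing else.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-request speedup when `fraction` of the time
    becomes `local_speedup` times faster and the rest is unchanged."""
    remaining = (1.0 - fraction) + fraction / local_speedup
    return 1.0 / remaining


# 80% of the time in one spot, and that spot made twice as fast.
speedup = overall_speedup(fraction=0.8, local_speedup=2.0)
```

The request now takes 0.2 + 0.8 / 2 = 0.6 of its original time, so `speedup` comes out at roughly 1.67, or about two-thirds more capacity from the same hardware. Run the same sum for a spot that only accounts for 10% of the time and you get about 1.05, which is why measuring before optimizing matters so much.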
The Tuts+ Blogs
In 2009-2010, Envato's blog network had serious stability problems caused by a number of architectural flaws. It was my job to step in and solve them. After performing an architectural analysis and a redesign, we plugged in the (then beta!) PHP monitor. We discovered many, many undesirable things!
- 20% of requests were hits to feeds (which should have been cached or sent to FeedBurner)
- 3 SQL queries were routinely taking more than 5 seconds to return results
- Long-running WP-Cron tasks were tying up our web worker pool
- 404 pages were taking more than 1 second to generate!
Over the course of 2010-2011, we progressively sorted out the issues until they were, more or less, all solved. To this day, we still monitor the PHP blogs using NewRelic. And now, thankfully, the blog network is nice and stable.
The Tuts+ Premium Redesign
When we launched the Tuts+ Premium redesign, we used NewRelic to debug performance problems before the actual launch, on the actual servers they were to run on. This allayed any fears of disaster upon launch. We continue to monitor the site's performance, using NewRelic.
Today, any important application at Envato has a NewRelic agent plugged in. It honestly has saved us heaps of time and money, and allowed us to provide quality of service to our users and customers.
Other Tools Envato Uses to Augment NewRelic
It wouldn't be fair not to mention the other tools we use to look after our applications. We currently use ScoutApp for finer-grained server monitoring (it supports user-contributed 'plugins' so we can monitor specific services like HAProxy, Nginx, etc). We also use AirBrake, which logs and aggregates the errors in our Ruby on Rails applications.
Lastly, we have rolled some of our own specialized, custom tools that check things like cache hits, backend requests, revenue and sign ups, and send notifications when a significant deviation from the trend occurs. For example, revenue halting or dropping might mean our payment integration is broken; a change in sign ups means we might have been targeted by spammers creating ghost accounts for later use.
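I can't share our internal tools, but the core idea of "notify on a significant deviation from the trend" can be sketched in a few lines. This version flags a reading that sits more than a few standard deviations from the recent average; the function name, the tolerance of 3 and the sign-up figures are all hypothetical, and a real tool would need to account for daily and weekly cycles.

```python
from statistics import mean, stdev


def deviates_from_trend(history, latest, tolerance=3.0):
    """True when `latest` is more than `tolerance` standard deviations
    away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to define a trend
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) > tolerance * spread


# Hypothetical sign ups per hour over the last six hours.
hourly_signups = [40, 42, 38, 41, 39, 40]
spam_wave = deviates_from_trend(hourly_signups, 120)  # sudden spike
outage = deviates_from_trend(hourly_signups, 0)       # sign ups halted
normal = deviates_from_trend(hourly_signups, 41)      # business as usual
```

Both the spike and the drop to zero are flagged, while an ordinary hour is not. Hook the result up to an email or chat notification and you have a crude early-warning system.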
If you work on any kind of web application that is business critical, or are tasked with fixing a not-quite-working app, NewRelic is going to be invaluable to you.
If you have any questions, ask away in the comments and I'll do my best to answer them. Particularly, if there's interest in a screencast on how to set up a VPS or VM with NewRelic, I'm sure we could arrange one for you.
Become a programmer superhero; use NewRelic!