
The Best Ways to Fight Spam



Spam is one of the major pitfalls of the social web. According to sites such as Postini, 10 out of 12 email messages are spam. As if that weren't enough to make you cringe, 1 in 39 emails contains a virus. Spam is spreading into other regions of the Internet as well. The creators of the blogging software WordPress report that nearly 87% of all blog comments are also spam. As messaging and communication applications proliferate throughout the web, developers and site owners have to get creative in the fight against the thousands of unwanted messages streaming in every day. Deciding on the best method of spam prevention for your blogs, forums, or even contact forms can be difficult. In this article we will take a look at a service called Akismet and how it can help. We will also look at why some other methods of fighting spam fall short.

Methods of Fighting Spam

Disallowing multiple consecutive submissions. Spammers almost always post more than one spam comment or message at a time. A common method for fighting spam is to log each incoming message with the user's IP address and a timestamp. Then, when a user attempts to post again, you can check whether that user has already posted within a specified window of time, for example 30 seconds, or whether the current poster was also the last poster. This is not a bulletproof method, because spammers can use proxies when they want to post multiple times, and robots have all the time in the world to wait out the window and keep spamming your site.
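A minimal sketch of this check, written in Python for brevity (the same logic carries over to PHP). The class name `SubmissionThrottle`, the in-memory dictionary, and the example IP addresses are assumptions made for illustration; a real site would more likely store the timestamps in its database.

```python
import time
from typing import Dict, Optional


class SubmissionThrottle:
    """Rejects a second post from the same IP inside a fixed time window."""

    def __init__(self, window_seconds: float = 30.0) -> None:
        self.window_seconds = window_seconds
        # IP address -> time of the last accepted post (hypothetical in-memory store)
        self._last_post: Dict[str, float] = {}

    def allow(self, ip_address: str, now: Optional[float] = None) -> bool:
        """Return True and record the post if this IP is outside the window."""
        if now is None:
            now = time.time()
        last = self._last_post.get(ip_address)
        if last is not None and now - last < self.window_seconds:
            return False  # too soon after the previous accepted post
        self._last_post[ip_address] = now
        return True


throttle = SubmissionThrottle(window_seconds=30)
first = throttle.allow("203.0.113.7", now=1000.0)   # first post from this IP
second = throttle.allow("203.0.113.7", now=1010.0)  # 10 seconds later
third = throttle.allow("203.0.113.7", now=1031.0)   # 31 seconds after the first
other = throttle.allow("198.51.100.2", now=1010.0)  # same moment, different IP
```

Here `second` is rejected while `other` is accepted, which illustrates the weakness described above: a spammer who switches to a proxy looks like a brand-new visitor.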

Keyword Blacklist. Another method of fighting spam is to build a blacklist of common spam keywords yourself and to disallow posts that contain those words. In its simplest form, you can create an array of keywords and check whether an incoming string contains any of them. Spammers have evolved defenses against this method by posting variations of the words: they replace letters with numbers, symbols, and other such characters to create a broad selection of keyword variations.
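The following Python sketch shows both the simple check and why it is easy to evade. The keyword list and the `LOOKALIKES` substitution table are illustrative assumptions, not a recommended rule set.

```python
import re

BLACKLIST = ["viagra", "casino"]  # example keywords only

# Hypothetical table mapping common look-alike characters back to letters.
LOOKALIKES = str.maketrans(
    {"1": "i", "!": "i", "0": "o", "3": "e", "4": "a", "@": "a", "$": "s", "5": "s"}
)


def contains_blacklisted(text: str, normalize: bool = False) -> bool:
    """Return True if any blacklisted keyword appears in the text."""
    candidate = text.lower()
    if normalize:
        # Undo look-alike substitutions, then drop everything that is not a letter
        # so that spacing and punctuation inside a word no longer hide it.
        candidate = candidate.translate(LOOKALIKES)
        candidate = re.sub(r"[^a-z]", "", candidate)
    return any(word in candidate for word in BLACKLIST)


plain_hit = contains_blacklisted("Cheap VIAGRA here")
evaded = contains_blacklisted("Cheap v1agr@ here")
caught_after_normalizing = contains_blacklisted("Cheap v1agr@ here", normalize=True)
clean = contains_blacklisted("Nice article, thanks!", normalize=True)
```

The naive check misses the disguised spelling (`evaded` is `False`), while the normalizing pass catches it. Normalizing has a cost of its own: removing spaces can join neighbouring words into an accidental match, so every new rule risks blocking legitimate posts.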

CAPTCHA. CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is one of the most common spam prevention techniques on the web today. Almost any site that allows you to register for an account or post information publicly uses a CAPTCHA in one way or another. CAPTCHA tests can be audio files, but are more commonly images presenting a series of letters and numbers that you have to enter into a form. The technique is a useful tool for blocking robots that attempt to visit your site to post spam messages or create fake accounts with fake information.

CAPTCHA works well for its intended use, but there are minor drawbacks. A CAPTCHA requires (yet another) field for users to fill in after entering usernames, passwords, and security questions. There is understandably an annoyance factor accompanying their use. In addition, disabled users may not be able to use the CAPTCHA field. Finally, human spammers can also still spam your site because a CAPTCHA only blocks out robot spammers.

So What's Left?

Having reviewed some of the current methods and their weak points, you may be wondering what else we can do to protect our blogging applications. I would like to introduce a new spam fighting tool from the creators of WordPress. The service is called Akismet and is described by its creators as a "... collaborative effort to make comment and trackback spam a non-issue and restore innocence to blogging, so you never have to worry about spam again."

The tool can be implemented in any project as long as you have an API key, which is free for non-commercial use and can be purchased for commercial use for as little as $5 a month. There are several Akismet plugins for existing software, and these are identified later in this article. Alternatively, you can include the service in your own projects as we will demonstrate.

Implementing Akismet in your Own Projects

As of now the only way to receive an API key is to sign up for a free WordPress.com user account. Point your browser to http://wordpress.com/signup/ and fill out the normal required fields (username, password, and email), then read and agree to the terms of service agreement. Make sure that you register for a blog, as you cannot receive an API key without the registration. Don't worry about this detail, because the API key won't be tied to a specific blog. Once you have finished the registration process you should receive an email with your new API key.

You will now need to download and unzip PHP5Akismet.0.4.zip (24K) from Achingbrain. Upload the single PHP file to an area accessible by your scripts. The other files and documentation are just for reference.

We will assume that you are working with an existing project. This could be anything that allows user contributions such as a forum or blog. We will also assume that the logic for creating and displaying content already exists. With that in mind, our first step is to load the file into our own project.

Next we will need to create a new instance of the Akismet class. Using the class's constructor, we can pass our API key and the URL of the site using it. Make sure to replace the following data with your own.

Now the service needs the actual comment data that we want to check. In the following instance I am using some example data, but in production the comment information would come from POST data. The Akismet service will then compare the comment information to a database of more than 7,486,928,953 spam comments and report whether the submitted post has been identified as spam.

The functions presented here are quite straightforward. The only function that requires some further explanation is the setCommentType function. This is used by Akismet to help the service identify the origin of the comment (was it posted on a public newsgroup, forum, or blog?), and you can pass any argument you want. For example, if you are using the function to spam-proof a wiki, then use wiki as the type. If you are protecting a blog, then use a blog type.

Now we will use a function called isCommentSpam. This is the function that actually contacts the service. The boolean function will return true if the comment is identified as spam and false if the comment is verified as legitimate.
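For readers curious about what happens behind a call like this, here is a language-neutral sketch in Python of the HTTP request such a check boils down to. It assumes Akismet's public comment-check endpoint and field names as I understand them (`blog`, `user_ip`, `user_agent`, `comment_type`, `comment_author`, `comment_content`, and so on), so verify them against the current Akismet API documentation before relying on them. `YOUR_API_KEY`, the example URLs, and all function names here are placeholders of my own, not part of the PHP class.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_comment_check(api_key: str, blog_url: str, comment: Dict[str, str]) -> Tuple[str, bytes]:
    """Build the URL and form-encoded body for a comment-check request (assumed endpoint)."""
    url = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
    fields = {"blog": blog_url, **comment}
    return url, urlencode(fields).encode("utf-8")


def is_spam(response_body: str) -> bool:
    """Interpret the plain-text reply: 'true' means spam, 'false' means legitimate."""
    answer = response_body.strip()
    if answer == "true":
        return True
    if answer == "false":
        return False
    raise ValueError(f"Unexpected reply from the service: {answer!r}")


def check_comment(api_key: str, blog_url: str, comment: Dict[str, str], timeout: float = 10.0) -> bool:
    """Send the request over the network; needs a valid key, so it is not called below."""
    url, body = build_comment_check(api_key, blog_url, comment)
    request = Request(url, data=body, headers={"User-Agent": "example-app/1.0"})
    with urlopen(request, timeout=timeout) as response:
        return is_spam(response.read().decode("utf-8"))


# Example data; in production these values would come from the submitted form.
example_comment = {
    "user_ip": "203.0.113.7",
    "user_agent": "Mozilla/5.0",
    "comment_type": "blog",
    "comment_author": "Example Author",
    "comment_author_email": "author@example.com",
    "comment_content": "Great post, thanks for sharing!",
    "permalink": "http://www.example.com/blog/some-post/",
}
url, body = build_comment_check("YOUR_API_KEY", "http://www.example.com/blog/", example_comment)
```

Separating request building from sending keeps the first two functions testable without network access or a real key.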

Using Akismet is as simple as these few lines of code! You have now integrated a spam-fighting service into your site. The service can be used in conjunction with the other forms of spam defense mentioned earlier. Keep in mind that Akismet is a service that grows each time you use it because the functions contribute your spam content to the database. Valid messages may occasionally be identified as spam, and vice versa. As a result, we may want to integrate a little more functionality to deal with potential misidentification.

If a message is wrongly identified as spam, then you can notify Akismet, and they will deal with it accordingly. Alternatively, you can mark a comment as spam if it happened to fall through the Akismet filter. When implementing the following functionality, make sure that the comment data in the variables is set in the same format as above.

The function submitHam() can be used to notify the service that a comment it reported as spam is actually OK, while the function submitSpam() can be used to notify the service that a comment that was approved is actually a piece of spam.
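One way to wire these two corrections into a moderation screen is sketched below in Python. Everything here is hypothetical: `RecordingService` is a stand-in that only records reports instead of contacting Akismet, and `apply_moderator_decision`, `submit_ham`, and `submit_spam` are illustrative names for the two notifications described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    author: str
    content: str
    flagged_as_spam: bool  # the filter's original verdict


@dataclass
class RecordingService:
    """Hypothetical stand-in for a spam-service client; it records reports locally."""

    ham_reports: List[str] = field(default_factory=list)
    spam_reports: List[str] = field(default_factory=list)

    def submit_ham(self, comment: Comment) -> None:
        self.ham_reports.append(comment.content)

    def submit_spam(self, comment: Comment) -> None:
        self.spam_reports.append(comment.content)


def apply_moderator_decision(service: RecordingService, comment: Comment, moderator_says_spam: bool) -> None:
    """Report to the service only when the moderator disagrees with the filter."""
    if comment.flagged_as_spam and not moderator_says_spam:
        service.submit_ham(comment)   # false positive: a legitimate comment was flagged
    elif not comment.flagged_as_spam and moderator_says_spam:
        service.submit_spam(comment)  # false negative: spam slipped through
    comment.flagged_as_spam = moderator_says_spam


service = RecordingService()
wrongly_flagged = Comment("Ann", "Thanks, this fixed my problem.", flagged_as_spam=True)
missed_spam = Comment("Bot", "Buy cheap watches now", flagged_as_spam=False)
correct = Comment("Raj", "Nice write-up.", flagged_as_spam=False)

apply_moderator_decision(service, wrongly_flagged, moderator_says_spam=False)
apply_moderator_decision(service, missed_spam, moderator_says_spam=True)
apply_moderator_decision(service, correct, moderator_says_spam=False)
```

Reporting only on disagreement avoids sending the service feedback for comments it already classified correctly.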

Other Libraries

PHP5 isn't for everyone. Akismet libraries have also been created in a slew of other languages. Below are a few of the most popular:

All of these can be easily integrated into your projects in much the same way as described above.

Popular implementations

Don't want to roll your own software but still want to make use of Akismet? Many solutions already exist for blog, CMS, or forum software:


I hope that this guide will serve as an introduction to some alternative forms of spam combat. A site without spam not only appears more professional to users, but is also much easier to manage for administrators and moderators.