

On Handling Spam

I've noticed a steady increase in traffic to this site, mainly due to link backs from popular aggregators.

What was just a few dozen hits per day is now averaging several hundred. The internet truly is the world's biggest soapbox.

Of course, with the extra attention comes a downside: comment spam. At one point last week I was receiving an average of ten spam comments per day.

The typical counter-response for spam is the ubiquitous captcha:


MegaUpload Captcha


They range from the extremely simple (the one above can be cracked via a JavaScript neural network) to the damn near impossible:


WTF!


My personal belief is that captchas are annoying for real users, and do a poor job of preventing spam, so I refuse to implement one.

Instead, I took a three-step approach to eliminating comment spam on this blog, and it has worked fairly well so far.


1. Implement a honey pot to pick off poorly written bots

This trick involves placing a hidden text field in your form that should never have any content. Most bots fill out every input field on the form before submitting, so this picks off a large majority of them instantly.

I hid this by using absolute positioning and a negative margin instead of using display: none, as this makes it harder for a bot writer to analyze.
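The server-side half of this trick can be sketched roughly as follows. This is only an illustration, not the code running on this blog: the field name `website_url` and the shape of the `form` dictionary are assumptions, and any web framework's parsed form data would do.

```python
# Hypothetical name for the hidden field; real users never see it,
# so any content in it suggests the form was filled in by a bot.
HONEYPOT_FIELD = "website_url"


def is_honeypot_tripped(form: dict) -> bool:
    """Return True if the hidden honeypot field came back with content."""
    value = form.get(HONEYPOT_FIELD) or ""
    return bool(value.strip())


# A human leaves the hidden field empty; a naive bot fills in everything.
human = {"name": "Luke", "comment": "Nice post.", "website_url": ""}
bot = {"name": "x", "comment": "buy now", "website_url": "http://spam.example"}
print(is_honeypot_tripped(human), is_honeypot_tripped(bot))
```

A comment that trips the honeypot can simply be dropped before any of the later checks run.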


2. Compare the URL-to-content ratio in each comment

Once a comment passes the honeypot, the next step is to analyze the ratio of URLs to content. If the comment is too full of URLs, I reject it. This could cause some false positives, but on the upside, it forces users to write longer comments.
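One way such a ratio check might look is sketched below. The post does not say how the ratio is measured or where the cutoff sits, so both are assumptions here: the ratio is taken as URL characters over all non-whitespace characters, and `MAX_URL_RATIO` of 0.5 is an arbitrary illustrative threshold.

```python
import re

# Matches plain http/https links; a deliberately simple pattern.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)

# Assumed cutoff: reject when more than half the comment is URLs.
MAX_URL_RATIO = 0.5


def url_ratio(comment: str) -> float:
    """Fraction of the comment's non-whitespace characters that are URLs."""
    total = len("".join(comment.split()))
    if total == 0:
        return 0.0
    url_chars = sum(len(match) for match in URL_PATTERN.findall(comment))
    return url_chars / total


def has_too_many_urls(comment: str) -> bool:
    """Return True when the comment should be rejected as link-heavy."""
    return url_ratio(comment) > MAX_URL_RATIO
```

With this threshold, a comment that is little more than a link gets rejected, while the same link inside a sentence or two of real text passes.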


3. Force a 60-second delay between comments

The final step is to simply prevent multiple comment submissions in quick succession. This also prevents accidental double posts, so it is a good feature to have in general.
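A rough sketch of the delay follows, under some assumptions the post does not spell out: commenters are identified by a string such as an IP address, and the last-seen timestamps are held in memory. The `CommentThrottle` class and its `allow` method are hypothetical names used only for illustration.

```python
import time
from typing import Dict, Optional

MIN_SECONDS_BETWEEN_COMMENTS = 60


class CommentThrottle:
    """Tracks the last accepted comment time per client (in memory only)."""

    def __init__(self, min_interval: float = MIN_SECONDS_BETWEEN_COMMENTS):
        self.min_interval = min_interval
        self._last_accepted: Dict[str, float] = {}

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Return True and record the time if the client may post now."""
        if now is None:
            now = time.monotonic()
        last = self._last_accepted.get(client_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the previous accepted comment
        self._last_accepted[client_id] = now
        return True


throttle = CommentThrottle()
print(throttle.allow("203.0.113.7", now=0.0))   # first comment
print(throttle.allow("203.0.113.7", now=30.0))  # 30 seconds later
```

Passing `now` explicitly is only there to make the behavior easy to try out; in real use the default monotonic clock is enough. A site running more than one server process would need shared storage instead of a dictionary.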

I'm pleased to say that I've had this system running for several days now, and the comment spam has gone down to zero...without the pain that is a captcha.


We will see how it holds up.


Posted by Jonathan Holland on 2/15/2009.

Tags: Tips-and-Tricks   Spam   Captcha   Honeypot

Comments:

Hope it works. Captchas are a pain in the butt for users, but they usually work - it'd be nice to not have to worry about comment spam, AND not have a captcha.

Posted by Luke on 2/15/2009.

Another very effective strategy is discarding posts if they contain specific types of markup. Bots often put the 3 most popular ways to write URLs into a single comment (plain (autolinked), HTML, and UBB).

Since you're using Markdown you could for example discard posts which contain HTML and UBB links. Don't forget to tell the user why his/her comment was blocked though.

Posted by Jos Hirth on 2/15/2009.

Someone pointed out that the tab order was messed up because of the honeypot...I just corrected that.

Posted by Jonathan Holland on 2/15/2009.

I wonder if it would make any sense to try and write a comment analyzer that answers "yes" or "no" to the question "is this comment spam?" by following the links given in the comment and determining if the linked page contains a product to be sold.

Posted by Brian on 2/15/2009.

Another tip is to also validate on the server. E.g., JavaScript validates on the client (which bots will skip). Do the same check on the server side and reject the ones that don't pass (only the spam will be rejected, since useful comments will already have passed the JavaScript check).

See http://www.webdigi.co.uk/blog/2009/does-your-website-really-need-a-captcha/

Posted by php on 2/16/2009.

Comments are closed on this post.