A recent comment requested some additional anti-spam options in our Issue Tracker component. That triggered much thought on a topic that obviously impacts all websites and their components, be they blogs, commenting systems, etc.
There are a number of different parts to preventing spam on a website, and this article expands upon our own particular take on the subject.
Spam is one of the many problems that face websites today. It is basically the proverbial ‘pain in the neck’ and, if not handled correctly, can be very time consuming. How often have you viewed websites with totally unrelated comments, registrations or forum posts, which has to make one think about the site’s reputation and credibility?
Our site is not immune to this problem, and the source is not restricted to any specific country, although there does seem to be a preponderance from locations such as Turkey, China, the Russian Federation and, more recently, Ukraine and Brazil.
Spam can basically be broken down into two types:
Registration Spam
This is commonly where users try to register, and it is of most importance to, but not restricted to, social sites, especially if the registrations go undetected. We ourselves have had attempts to register accounts where, on checking, the account details (user name) turn out to be registered on other sites as well (where they succeeded in being registered), and are then used to post blog entries or comments on various topics, mostly of a nature unrelated to the site’s interest. Such checking, however, can be very time consuming, which presumably is why many sites permit automatic registration, relying on the captcha check to stop undesirables!
Comment and Forum Spam
A number of sites have a discussion forum and/or permit users to submit comments on articles. Undesired content submitted this way includes ‘spam’ text and unwanted links to other sites, often advertising a specific product.
The most common technique for active protection is known as Captcha. There are a number of ‘flavours’, such as ReCaptcha, Image Captcha (typically words), Picture Identification Captcha, Mathematical Captcha, Question Captcha, etc.
All these methods rely on automated scripts being unable to read the Captcha and therefore unable to complete the process. These methods can be effective in fighting ‘comment’ and ‘registration’ spam.
Where they fail is where a human being is manually adding the comments or registrations. Where labour is cheap, it may be worth the ‘promoter’ paying people to generate the spam.
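As a rough illustration of the ‘Mathematical Captcha’ flavour mentioned above, the following is a minimal sketch in Python. It is not how ReCaptcha or any Joomla plugin works internally; the function names `make_challenge` and `check_answer` are hypothetical, and the sketch assumes the expected answer is kept on the server (for example in the visitor’s session) and never sent to the browser.

```python
import random


def make_challenge(rng=random):
    """Build a simple sum question and the answer the server should remember.

    Hypothetical helper: `rng` is any object with a randint() method.
    """
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    question = f"What is {a} + {b}?"
    return question, a + b


def check_answer(submitted, expected):
    """Return True only when the submitted text is the expected number."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        # Non-numeric input (empty field, random text) fails the check.
        return False


if __name__ == "__main__":
    question, expected = make_challenge()
    print(question)
    print(check_answer(str(expected), expected))
```

A script that simply fills in every form field blindly will fail this check, but, as noted above, a human being paid to post spam will not.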
Passive protection typically includes techniques such as IP blocks, whitelists, blacklists and content scanners. The spam is intended to insert links into your site, so passive protection focuses on content scanners that validate the content, its source and its method of delivery against extensive databases of bad links, emails, content, and blacklisted IPs and domains, to stop the spammer from getting his input in.
These databases are fed by a huge number of sites and users that subscribe to them and, in turn, report malicious content. The important thing about these databases is that they are updated continuously and, true to the spirit of open source, are contributed to on an everyday basis, making them a comprehensive source of information.
This ‘Captcha-less’ spam protection can be very effective, mainly because in most cases each reported attempt feeds back into the databases and makes the system even stronger. One downside is that on a busy site it can add considerable overhead to the processing load, and possibly slow the user’s experience of the site.
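The passive checks described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `PassiveFilter` and its fields are hypothetical, and small in-memory sets stand in for the large, continuously updated databases a real service would query.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Rough pattern for spotting links in submitted text.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


@dataclass
class PassiveFilter:
    """Hypothetical stand-in for the shared spam databases (all lower case)."""
    ip_whitelist: set = field(default_factory=set)
    ip_blacklist: set = field(default_factory=set)
    bad_domains: set = field(default_factory=set)
    bad_words: set = field(default_factory=set)

    def check(self, ip, text):
        """Return (allowed, reason) for one submission."""
        # Source checks come first: a whitelisted IP skips content scanning.
        if ip in self.ip_whitelist:
            return True, "whitelisted IP"
        if ip in self.ip_blacklist:
            return False, "blacklisted IP"
        # Content check 1: links pointing at known bad domains.
        for url in URL_PATTERN.findall(text):
            try:
                host = (urlparse(url).hostname or "").lower()
            except ValueError:
                continue  # malformed URL, nothing to compare
            if host in self.bad_domains:
                return False, f"blacklisted domain: {host}"
        # Content check 2: filtered words anywhere in the text.
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        hits = words & self.bad_words
        if hits:
            return False, f"filtered word: {sorted(hits)[0]}"
        return True, "clean"
```

Every submission runs through all of these checks, which is where the processing overhead mentioned above comes from; against a remote database each lookup would also add network time.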
Our Take on Tools
It should come as no surprise that our site is based on Joomla as we develop tools for the Joomla Extension Directory.
Captcha and ReCaptcha
Joomla 2.5 includes ReCaptcha as standard, and all one has to do is enable it for it to be present in the basic registration process. There are other plugins, and many components also offer different built-in captcha options. However, do you really want more than one type of captcha solution on a site? We decided upon a single solution and, following the Joomla core, implement ReCaptcha in our components. Firstly, it is easy to implement; secondly, it makes it easy for our component users and site users to have consistency throughout.
We have already said that passive protection can be very effective, and there are a few options available to Joomla sites and component developers.
It has always struck us as strange that a lot of components all seem to implement the same tools, so that a site with many components from different suppliers ends up configuring the same technique over and over again. Surely it is better to have the protection ‘up front’ on a site, ‘checking’ the input for desirability before it gets to any specific component. After that, it would be up to the specific component to perform its own checking. Do you really want to add the ‘words to be filtered’ to your blog component, to your forum component, and to your article commenting component? You may do, but we certainly have better things to do with our time.
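The ‘up front’ idea can be pictured with a small Python sketch. It is purely illustrative and not how Joomla or any particular extension is built: the word list, the `screened` wrapper and the two handler functions are all hypothetical. The point is that the filtered words are maintained in one place and every component sits behind the same check.

```python
from functools import wraps

# Hypothetical site-wide list, maintained once for every component.
FILTERED_WORDS = {"casino", "pills"}


def is_desirable(text):
    """Return False when any filtered word appears in the text."""
    words = (word.strip(".,!?;:") for word in text.lower().split())
    return not any(word in FILTERED_WORDS for word in words)


def screened(handler):
    """Run the shared check before the input reaches a component."""
    @wraps(handler)
    def wrapper(text):
        if not is_desirable(text):
            return "rejected"
        return handler(text)
    return wrapper


@screened
def blog_comment(text):
    # Stand-in for a blog component; it can still do its own checks here.
    return f"blog comment stored: {text}"


@screened
def forum_post(text):
    # Stand-in for a forum component behind the same front filter.
    return f"forum post stored: {text}"
```

Adding a new word to `FILTERED_WORDS` protects the blog and the forum at once, with nothing to reconfigure in either component.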
A tool such as Akeeba Admin Tools Pro provides such a solution, and its Web Filtering option (which includes word filtering, IP blocking and Bad Behavior) is, in our opinion, the way to go. Most sites implement some form of SEO; SH404SEF is the most common choice, and it comes with Project Honey Pot integration.
Some of these tools cost money (an amount we consider reasonable), but one has to ask oneself ‘How much is our site reputation worth?’ That is not something we can answer for you.
We are therefore reluctant to ‘re-invent the wheel’ in our components beyond what is considered reasonable. As an example, in our Issue Tracker component (which best fits into the comment/forum data entry category), we currently implement IP blocking, whitelists, blacklists, word filtering and ReCaptcha. If you use a good ‘passive solution’ up front then these (apart from the ReCaptcha) can be left unused, but of course not everyone agrees, so these options will remain ‘just in case’.