The mass distribution of web spam is a disease of the modern information society. Today’s spammers are far ahead of many existing spam protection tools.
Existing tools – anti-spam gateways, the anti-spam algorithms used by search engines and built-in mail services, blog and forum filters – are not effective enough against this enemy. Real spam reduction requires a proper understanding of current spam technologies. Such knowledge points the way to appropriate and effective countermeasures.
Web Spam Origins
The term “spam” (“spiced ham”) was originally applied to the mass delivery of unsolicited messages via chat rooms, forums and online bulletin boards. In the mid-1990s, spammers began directing almost all of their spam at email, because email had no protection and was used by almost all Internet users. Today, various research companies estimate that 80–95% of all email traffic is spam. The leading countries for spam distribution are the USA, Brazil, India, Vietnam and Russia. The five million email messages analyzed by Antivirus PandaLabs were sent from almost a million different IP addresses.
Modern Spam Techniques
Web spam aims to mislead search engines into ranking some or all pages of a site higher in search results than they deserve, and in this way to attract as many visitors as possible. Just a few years ago, spam messages were used primarily to distribute security threats and sell illegal products. Now blog and forum bots are actively replacing human spammers. Despite the many tools that check comments for spam and block them, spammers continue to stay one step ahead of existing anti-spam algorithms. Operators of anti-spam filters such as Akismet (an anti-spam plugin for the WordPress blogging CMS) report that the “best” human-posted spam comes from Southeast Asia. This “high-quality” spam is often difficult to distinguish from genuine comments. It became widespread in mid-2009; it complicates the work of bloggers and site administrators and significantly reduces the effectiveness of anti-spam filters.
The number of fake blog chains built on services such as Blogspot, Weebly, Tumblr, Ning and WordPress is also growing, as spammers create their own networks of spam blogs. Whereas early spammers concentrated on niches such as pornography, unlicensed media files, malicious software and medicines, today’s spammers are more interested in applying their SEO knowledge than in distributing illegal products. As a result, current blog and forum spam often contains links to sites selling clothing, building materials, real estate, pet products, etc., placed there to increase the popularity of these sites.
One of the negative effects of web spam is that search results fill up with useless garbage pages that make searching harder and slower. This problem is aggravated by the practice of swapping: optimizing a web page to reach a top position for a specific search query, then completely replacing its content once the page has been indexed and has achieved the desired ranking.
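To make the swapping pattern concrete, the sketch below compares the text a crawler saw at indexing time with the text the page serves now, using word shingles and Jaccard similarity. This is an illustrative heuristic, not a method from the article: the function names (`shingles`, `jaccard`, `looks_swapped`), the shingle size of 3 and the 0.3 similarity threshold are all hypothetical choices made for this example.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts yield a single (possibly shorter) shingle, or none at all.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_swapped(indexed_text: str, current_text: str, threshold: float = 0.3) -> bool:
    """Hypothetical check: flag a page whose current content shares little with
    the snapshot taken when it was indexed. The 0.3 threshold is an assumption,
    not a tuned value."""
    return jaccard(shingles(indexed_text), shingles(current_text)) < threshold
```

For example, a page indexed as a gardening guide that now serves an unrelated list of discount products shares no shingles with its snapshot and is flagged, while a page that merely appended a few words is not. A real system would also have to tolerate legitimate redesigns and content updates, which is why the threshold is left as a parameter.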
Other commonly used techniques include web spam damping, stuffing content with keywords, duplicating web pages, abusing special tags, using hidden text, redirects and links, cloaking, doorway pages, etc.
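Of the techniques listed above, keyword stuffing is the easiest to illustrate. The sketch below measures how large a share of a text's words is taken up by a single keyword and flags the text when that share exceeds a cutoff. It is a deliberately simple, hypothetical heuristic: the function names and the 10% cutoff are assumptions for illustration, and real search engines use far more elaborate signals.

```python
import re
from collections import Counter


def keyword_density(text: str, keyword: str) -> float:
    """Fraction of the words in text that equal keyword (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


def is_keyword_stuffed(text: str, keyword: str, max_density: float = 0.1) -> bool:
    """Hypothetical heuristic: flag text in which one keyword exceeds max_density.
    The 10% default cutoff is an illustrative assumption only."""
    return keyword_density(text, keyword) > max_density
```

A stuffed snippet such as "cheap shoes cheap shoes buy cheap shoes here cheap" gives "cheap" a density of 4/9 and is flagged, while an ordinary product sentence that mentions "shoes" once in eleven words stays under the cutoff.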
The first step in fighting spam is to understand its origins. We must analyze the technologies that allow spammers to mislead search engines and evade spam filters. To defend against spam, Internet users have compiled entire databases of spam sites and sent complaints to hosting providers and search engine operators, but none of this has been effective enough.
Careful study of web spam technology is the way to develop effective countermeasures. As members of the information society, we must move from passively observing information to actively managing it, in order to create new and effective means of protecting ourselves against spam.