Internet.com ISP-Planet
 
Image Spam

An anti-spam company's founder explains this increasingly troublesome scourge of e-mail.

by David Skoll
Roaring Penguin Software, Inc. President and Founder
[May 4, 2007]

What is it?
"Image spam" is spam e-mail that delivers its sales pitch as an image, such as a JPEG or GIF. The message may contain nothing else, or it may also include nonsensical text, unrelated text such as jokes or news reports, or simply gibberish.

Why image spam?
As content-filtering anti-spam software became more sophisticated and accurate, spammers found it more difficult to pitch their wares using normal text or HTML messages. As a result, they turned to encoding their sales pitch as an image. This completely bypasses most anti-spam content filters, because they cannot analyze the words in the images.

How can we combat image spam?
It turns out that image spam can be detected quite accurately using the same techniques that fight other spam:

The gibberish or nonsense text included with image spam very quickly becomes "red-flag" text for a Bayesian filter. A distributed Bayesian database such as Roaring Penguin's Training Network adapts extremely quickly to most image spam.

An image with little or no accompanying text is also a red flag, because almost all legitimate mail that contains images also includes a reasonable amount of body text.

Normal connection-level techniques such as greylisting and DNS-based RBLs continue to be effective against image spam.
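Two of the checks above are simple enough to sketch in a few lines of Python. The following is only an illustration, not how any particular product works: the 200-character threshold, the function names, and the zone dnsbl.example.org are all hypothetical, and a real filter would also weigh HTML parts and many other signals.

```python
import ipaddress
from email import message_from_string
from email.message import Message

# Hypothetical threshold: the article gives no figure, so this is an assumption.
MIN_TEXT_CHARS = 200


def looks_image_heavy(msg: Message, min_text_chars: int = MIN_TEXT_CHARS) -> bool:
    """Return True if the message carries an image but little plain body text."""
    has_image = False
    text_chars = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        if part.get_content_maintype() == "image":
            has_image = True
        elif part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            text_chars += len(payload.decode("utf-8", errors="replace").strip())
    return has_image and text_chars < min_text_chars


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNS blocklist.

    By convention the octets are reversed and the list's zone is appended.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# A tiny made-up message: a few words of gibberish plus a GIF attachment.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/plain\n"
    "\n"
    "xqz wlk\n"
    "--b\n"
    "Content-Type: image/gif\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "R0lGODlh\n"
    "--b--\n"
)

if __name__ == "__main__":
    print(looks_image_heavy(message_from_string(SAMPLE)))
    print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

The first function flags a message only when both conditions hold, so ordinary mail with a photo and a normal amount of text is left alone. The second only constructs the name to query; whether the address is listed depends on the answer the blocklist returns, which is not shown here.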

What about OCR?
Some anti-spam vendors have resorted to Optical Character Recognition (OCR) tools to extract the text from image spam for analysis. Unfortunately, OCR has met with limited success. The state of the art in OCR is not very advanced, and OCR tools are not designed to extract text from an image that an adversary is actively manipulating. Spammers have reacted to OCR tools by obfuscating the text in the images they send. The obfuscated text is still relatively easy for humans to recognize, but very difficult for OCR tools to extract.

In addition to the accuracy problem, OCR is very compute-intensive and can greatly slow down a content filter that has to scan every incoming message.

—End

 

Related articles:
  [March 6, 2007] Barracuda Networks Updates Image Scanning in Anti-Spam Engine
  [July 10, 2006] IronPort Reports Surge in Image Spam
  [Sept. 1, 2005] Skoll: The Heart of the Penguin

 

 
