Spam

Why is image spam so hard to detect?

Image spam, sometimes called picture spam, is the next step in the evolution of spam. A typical image spam message consists of an image (usually a GIF file), a large block of random words called word salad, and sometimes a URL. Spammers combine and vary these components to bypass traditional anti-spam technologies, which include URL blocklists, Optical Character Recognition (OCR), and fingerprinting.

Traditional spam uses images with links (URLs) embedded within the message or directly in the image. The goal is to get end users to “click” the link, which directs them to a website that tries to sell something, phish for personal information, or install spyware on their system.

These links are easily detected, and blocklists have been created to classify email based on these “bad links”. Images can also be loaded from a remote web server using HTML IMG tags when an email message is viewed. Other variations of image spam embed images that instruct users to type a URL into their browser, as shown in Figure 2. Because such a message contains no link and loads nothing from an external source, this type of spam evades URL blocklists. If the recipient visits the website, the spammer has succeeded.
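To make the blocklist evasion concrete, here is a minimal sketch of a URL blocklist check. The domain names, the `BLOCKLIST` set, and the helper functions are hypothetical and chosen only for illustration; real filters use far larger, frequently updated lists.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist of known bad domains (illustrative only).
BLOCKLIST = {"bad-pills.example", "phish.example"}

# Simple pattern for http/https URLs appearing in the message source.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_hosts(message_text):
    """Return the lowercase hostname of every URL found in the text."""
    hosts = set()
    for url in URL_PATTERN.findall(message_text):
        host = urlparse(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def is_blocklisted(message_text):
    """True if any URL in the message points at a blocklisted host."""
    return bool(extract_hosts(message_text) & BLOCKLIST)


# A traditional spam message carries a clickable link.
linked_spam = 'Buy now: <a href="http://bad-pills.example/order">click</a>'

# An image-only message carries the URL as pixels inside an attached image,
# so the message source contains nothing for the blocklist to match.
image_only_spam = '<img src="cid:image001"> lorem salad random words'

print(is_blocklisted(linked_spam))
print(is_blocklisted(image_only_spam))
```

The first message is caught because its link appears in the message source. The second passes the check because the only URL it contains is drawn inside the embedded image.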

While image spam may look the same to an end user, spammers use programs to automatically create and vary each image. This causes messages to appear unique when received and processed by spam filters. The randomness of the images effectively evades fingerprinting techniques, which are designed to keep track of common message characteristics.

To further confuse email filters, spammers insert random characters and speckles into the same basic image to create a large number of unique images. Speckling adds what looks like random bits of lint to a reused base image; to an email filter each result is a unique image, which effectively evades fingerprinting.
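The effect of speckling on fingerprinting can be sketched in a few lines. This example assumes the fingerprint is an exact hash (SHA-256) of the raw pixel data, which is only one possible fingerprinting scheme, and it models the image as a flat array of grayscale pixels. The `fingerprint` and `speckle` functions are illustrative names, not part of any real filter.

```python
import hashlib
import random


def fingerprint(pixels):
    """Exact-match fingerprint: a SHA-256 digest of the raw pixel bytes."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def speckle(pixels, count, rng):
    """Return a copy of the image with `count` randomly chosen pixels inverted."""
    noisy = bytearray(pixels)
    for index in rng.sample(range(len(noisy)), count):
        noisy[index] ^= 0xFF  # invert the pixel so its value always changes
    return noisy


# Stand-in for the spammer's base image: a 64x64 all-white grayscale bitmap.
base_image = bytearray([255] * (64 * 64))

rng = random.Random(0)
variants = [speckle(base_image, 5, rng) for _ in range(100)]
fingerprints = {fingerprint(v) for v in variants}

print(len(fingerprints), "distinct fingerprints from 100 speckled copies")
print(fingerprint(base_image) in fingerprints)
```

Each copy differs from the base image in only 5 of 4,096 pixels, which a person would barely notice, yet an exact hash treats every copy as a different image.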

Another image spam technique uses several colors, making the text more difficult to recognize with OCR, which relies on characters having a consistent color and a recognizable shape. Varying the font colors hides spam-type words within an image. The next step in image spammers' battle against OCR is the use of animated images and strip mining.
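As a rough illustration of why varied colors cause trouble, consider a naive preprocessing step that separates "ink" from "background" with a single brightness threshold. This is an assumption made for the sketch; real OCR engines may use more adaptive methods, and the threshold of 128 and the function names are hypothetical.

```python
def luminance(rgb):
    """Approximate perceived brightness of an (R, G, B) pixel, 0 to 255."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def binarize(row, threshold=128):
    """Mark each pixel as ink (1) if it is darker than the threshold, else 0."""
    return [1 if luminance(pixel) < threshold else 0 for pixel in row]


WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
YELLOW = (255, 255, 0)

# One row of pixels: white background, a black stroke, then a yellow stroke.
row = [WHITE, BLACK, YELLOW, WHITE]

print(binarize(row))
```

The black stroke is kept as ink, but the yellow stroke is bright enough to be classified as background, so any letters drawn in that color vanish before character recognition even begins.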

3 Comments

  1. Now I understand all this stuff that has filled my inbox lately.

  2. No need for OCR. Use a Bayesian filter.
    None of my friends send me a message with just a picture, so those would go in the trash.

  3. I totally agree: the best anti-image spam solutions use Bayesian filtering to identify characteristics of the entire message, rather than wasting resources trying to read the image itself.
