
E-Mail obfuscation - a disputed question

August 11th, 2009

Filed under: General Programming, Internet, Security — Kai @ 5:15 pm

In an attempt to make automatic e-mail address harvesting harder, many users and forum programs conceal addresses via obfuscation: @ is replaced with “at” and . is replaced with “dot”, so

bill.gates@microsoft.com

now becomes

bill dot gates at microsoft dot com

I’m not an expert in regular expressions and I’m really curious - does such obfuscation really make automatic harvesting harder? Is it really much harder to automatically identify such obfuscated addresses?
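To get a feeling for the answer, here is a minimal sketch in Python of what a harvester might do with the “at”/“dot” scheme. The pattern and the function name `deobfuscate` are my own assumptions for illustration, not taken from any real harvesting tool, and the pattern is deliberately naive: it will also produce false positives on ordinary sentences that happen to contain “at” and “dot”.

```python
import re

# Hypothetical pattern: words joined by " dot ", then " at ",
# then a domain that contains at least one " dot ".
OBFUSCATED = re.compile(
    r"\b([\w-]+(?:\s+dot\s+[\w-]+)*)\s+at\s+([\w-]+(?:\s+dot\s+[\w-]+)+)\b",
    re.IGNORECASE,
)

# Matches the " dot " separator so it can be turned back into ".".
DOT = re.compile(r"\s+dot\s+", re.IGNORECASE)


def deobfuscate(text):
    """Return candidate addresses rebuilt from 'at'/'dot' obfuscation."""
    found = []
    for match in OBFUSCATED.finditer(text):
        local_part = DOT.sub(".", match.group(1))
        domain = DOT.sub(".", match.group(2))
        found.append(f"{local_part}@{domain}")
    return found


print(deobfuscate("bill dot gates at microsoft dot com"))
```

A handful of lines is enough for the simple case, which suggests that this particular scheme is only a small hurdle.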

For example, if every email address on a large community site is reversed in the markup and rendered properly with CSS, or token-replaced (@ becomes ‘at’), or any other predictable method, the harvesters will just write a thin adapter for your site.

Think of it this way: if it only takes you one line of code to “scramble” them sitewide, it will only take the harvester one line of code to “unscramble” them for your site. Roughly speaking.
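The reversed-markup case can be sketched the same way. This assumes a hypothetical site that wraps each reversed address in `<span class="rev">` and flips it back for display with CSS; the class name and both function names are made up for this example.

```python
import re

# Assumption: the site emits <span class="rev">...</span> around reversed addresses.
REVERSED_SPAN = re.compile(r'<span class="rev">([^<]+)</span>')


def scramble(address):
    """What the site does: reverse the address before writing the markup."""
    return address[::-1]


def unscramble(html):
    """What the harvester's adapter does: find the spans and reverse them back."""
    return [reversed_text[::-1] for reversed_text in REVERSED_SPAN.findall(html)]


page = f'<p>Mail me: <span class="rev">{scramble("bill.gates@microsoft.com")}</span></p>'
print(unscramble(page))
```

Both sides really are about one line each; the rest is only there to locate the span.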

Which approach is the right one? Should we use more complex obfuscation, or look for entirely new ways?

Obfuscation techniques fall into the same category as CAPTCHAs. They are not reliable and tend to hurt regular users more than bots.

JavaScript obfuscation is often praised, but it is no silver bullet: it is not that hard today to automate a browser for email sniffing. If it can be displayed in a browser, it can be harvested. You could even imagine a bot that takes screenshots of a browser window and uses OCR to extract addresses to beat your million-dollar-obfuscation-technique.

Depending on where and why you want to obfuscate emails, the following techniques could be useful:

  • Restrict email visibility: you may hide emails on your website/forum to anonymous users, to new users (with little to no activity or posts to date) or even hide them completely and replace email contact between members with a built-in private messaging feature.
  • Use a dedicated spam-filtered email: you will get spammed, but it will be limited to this particular address. This is a good trade-off when you need to expose the email address to any user.
  • Use a contact form: while bots are pretty good at filling forms, it turns out that they are too good at filling forms. Hidden field techniques can filter most of the spam coming through your contact form.
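One common reading of the hidden field technique is a “honeypot”: the form contains an extra input that is hidden from human visitors with CSS, so only a bot that fills in everything will put a value in it. Here is a minimal server-side sketch in Python; the field name `website` and the function name are assumptions for illustration, and the submitted form is assumed to arrive as a plain dictionary of strings.

```python
def looks_like_bot(form, honeypot_field="website"):
    """Return True if the hidden honeypot field was filled in.

    Humans never see the field, so they leave it empty;
    a bot that fills every input gives itself away.
    """
    value = form.get(honeypot_field) or ""
    return bool(value.strip())


human = {"name": "Ann", "email": "ann@example.org", "message": "Hello", "website": ""}
bot = {"name": "x", "email": "x@example.org", "message": "Buy now", "website": "http://spam.example"}

print(looks_like_bot(human))
print(looks_like_bot(bot))
```

Submissions flagged this way can simply be dropped. A bot that skips hidden inputs will still get through, so this filters most of the spam rather than all of it.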

One common way of hiding email from bots and spammers is to create an image containing the email address. Facebook does this, for instance. Now, using images for email is inherently bad for accessibility, because text readers will not be able to read it. But even setting that aside, there are several free character recognition programs that do a pretty good job of decoding such email images.

You should always keep in mind that whatever makes the email address hard for spammers to identify makes it hard for your users as well. The Wikipedia article on email obfuscation, or address munging, is worth a look.

The real question is whether the extra effort will be put in by harvesters and if the (major? minor?) barrier to the harvesters is worth the possible problems for your users.

Finally, this article is, like so many others, about fighting spam. In my opinion, spam has become such a problem, and so many databases have been turned over, that we’re beyond hiding our addresses. Instead, consider more efficient ways of classifying and blocking spam.

Powered by WordPress, Content and Design by Kai Bellmann