Difference between revisions of "fighting email spam"

Revision as of 14:55, 12 October 2006


Overview

This page is about fighting spam received via email.

Types of Techniques

Email-spam-fighting techniques can be broken down into the following categories:

Pre-send: prevent spammers from getting your "real" address
- Harvesting control: one major source of spammable email addresses is web pages; you can prevent or control such harvesting in a number of different ways; see section below
- Domain control:
  - "per-org unique addresses": see section below
  - "time-stamped addresses": a powerful harvesting control technique; see section below
Post-send: filters and such for catching the mail after it has been sent but before you have to read it
- Most decent web hosts now include spam-fighting tools with their email service. These tend to include sender verification against lists of known spammers, and thus do a much better job than training-based filters.
- Most decent email clients (including Thunderbird and Evolution) also include spam-fighting tools, but these tend to be error-prone.
- 3rd-party services are available which prevent email from reaching you unless the sender has passed a "humanity test" (most commonly a "CAPTCHA"); this is a problematic solution at best as it may prevent you from receiving legitimate automated messages and is a nuisance for legitimate senders.

Per-Organization Unique Addresses

A significant source of spammable email addresses is disreputable organizations that sell their mailing lists. If you have your own domain name for email and the ability to set up a "catch-all", you can invent a new email address whenever you need to give one out. If that address starts receiving spam, you can have mail sent to it discarded automatically (for example, with a redirect rule for that one address). Furthermore, if the address is named in such a way that you can easily identify the organization to which it was given (e.g. "nameoforganization@yourdomain.com"), you can also keep track of which organizations have been less than discreet with the information you have given them.
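
The routing decision described above can be sketched in a few lines of Python. This is only an illustration, not how any particular mail host implements a catch-all: the function name `classify_recipient`, the `burned` set, and the organization tags are hypothetical, and a real setup would normally express the same rule in the mail server's own filter or alias configuration.

```python
def classify_recipient(address, own_domain, burned):
    """Decide what to do with mail arriving at a catch-all account.

    Returns (organization_tag, action), where action is one of
    "deliver", "discard" or "reject". All names here are illustrative.
    """
    local, _, domain = address.lower().rpartition("@")
    if domain != own_domain.lower():
        return None, "reject"      # not addressed to our domain at all
    if local in burned:
        return local, "discard"    # this organization's address has leaked
    return local, "deliver"        # the tag tells us who was given the address


# Hypothetical tag for an organization whose address started receiving spam.
burned_tags = {"shadyshop"}

print(classify_recipient("shadyshop@yourdomain.com", "yourdomain.com", burned_tags))
print(classify_recipient("bookstore@yourdomain.com", "yourdomain.com", burned_tags))
```

Because the local part doubles as a label, the same lookup that discards the mail also records which organization passed the address on.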

Web Page Harvesting Control

Email addresses which are published on a web page are a particularly likely target for spam, as any "findable" page on the web will inevitably be patrolled by spambots looking for email addresses for their spam databases. The one advantage we have here is that this process is highly automated, which allows for a number of prevention techniques.

Some things to remember:

While some spambots may be fairly sophisticated, many of them may be written by novice script-kiddies; such bots should be relatively easy to fool.
While most workarounds may be very easy to defeat, it probably won't be worth the spambot programmer's time to implement a fix; the small increase in the number of harvested addresses simply wouldn't justify the extra coding (and possibly support, if the spambot is being sold or traded in the "underground" malware market).

simple HTML obfuscation

A few simple techniques which are easily worked around by spambot programmers, but which should at least cut down the volume of spam:

Instead of the "@" and "." characters, use the corresponding HTML character entities (for example &#64; and &#46; respectively). These will copy-and-paste properly into an email program, and are visually indistinguishable from plain "@" and "." in a web browser. (Note: on this wiki, you can use the email template to disguise email addresses this way.)
Insert HTML markup around the "@" and ".", such as <i> (italics), <b> (bold), <span class="whatever"> (style sheet markup; no visible effect unless you define the "whatever" class in CSS). This requires the spambot to filter out all HTML tags before searching for email addresses on the page.

An example of the above two techniques combined (using the HTYP email template): spam1spam@spamhtypspam.spamorg
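
As a rough sketch of the two techniques combined, the following Python generates the obfuscated markup for an address. The function name `obfuscate` and the sample address are made up for illustration, and this is not the code behind the HTYP email template. The last two lines show how little effort a tag-stripping, entity-decoding bot needs to recover the address, which is why these techniques only reduce spam rather than stop it.

```python
import html
import re


def obfuscate(address, tag="span"):
    """Replace "@" and "." with numeric HTML entities wrapped in a harmless tag."""
    entities = {"@": "&#64;", ".": "&#46;"}
    parts = []
    for ch in address:
        if ch in entities:
            parts.append(f"<{tag}>{entities[ch]}</{tag}>")
        else:
            parts.append(html.escape(ch))  # keep any other special characters valid HTML
    return "".join(parts)


markup = obfuscate("someone@example.org")
print(markup)

# What a more careful bot could do: strip tags, then decode entities.
recovered = html.unescape(re.sub(r"<[^>]+>", "", markup))
print(recovered)
```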

Insert "invisible" characters within the email address using <span style="display: none">. Example: spam2SPAM@SPAMhtypSPAM.SPAMorg (the "SPAM" segments are the inserted hidden text, which a browser does not display).
- This technique has the disadvantage that it does not copy-and-paste properly, but if the inserted text is something obvious like " REMOVE ME ", you can instruct people to remove the extra text before emailing. Unfortunately, such instructions tend to get overlooked. If you include illegal characters (such as spaces) in the inserted text, you may actually be able to prevent the email from being sent until it is "fixed" by the user. Some experimentation is probably called for here.
- The above example also includes a "mailto:" link; some users can click on such links to open their email programs, but many computers are not configured to allow this to work. This opens the issue of whether or not to "obfuscate" the address as shown in the email link. A poorly-written spambot might be HTML-unaware and find the unobfuscated address inside the <a href="mailto:..."> tag – or it might be set to strip out all HTML before looking, in which case the unobfuscated address would be missed. As with the above, some experimentation is probably called for here.
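
The hidden-text technique can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `add_hidden_decoys` and `naive_tag_strip` are hypothetical, the decoy is placed only around the "@", and the second function models a bot that strips tags without evaluating CSS, which is the kind of bot this technique is meant to fool.

```python
import html
import re


def add_hidden_decoys(address, decoy=" REMOVE ME "):
    """Surround the "@" with text that a browser hides but a tag-stripper keeps."""
    hidden = f'<span style="display: none">{html.escape(decoy)}</span>'
    local, _, domain = address.partition("@")
    return f"{local}{hidden}@{hidden}{domain}"


def naive_tag_strip(markup):
    """Model a bot that removes HTML tags but ignores style rules."""
    return re.sub(r"<[^>]+>", "", markup)


markup = add_hidden_decoys("someone@example.org")
print(markup)
print(naive_tag_strip(markup))
```

Since the default decoy contains spaces, the string a tag-stripping bot ends up with is not a valid address, which is the effect the note about illegal characters is aiming for.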

domain-dependent techniques

If you have your own domain name and can control the redirection of email, then some much more powerful techniques become available.

Insert a date in the address, like this: spam320240425@htyp.org (this example uses HTYP's emaildated template; a normal-looking address is displayed by the browser). This requires that you are able to either (a) set up a "catch-all" email account on your domain, so that all email not otherwise redirected goes to a valid email account, or (b) dynamically configure your email handler to accept email for an address that changes every day.
- In this example, the displayed address appears normal, but the mailto: address has a bunch of extra numbers after it. These numbers change every day – so if your email address happens to get picked up by a spambot on, say, April 25, 2024, all the email addresses will have that particular date on them; all you have to do is put in a redirect for the address for that date – in this case, spam320240425@yourdomain.com – and send those emails straight to purgatory (or your worst enemy, or the spam-reporting address of your choice).
- For the displayed address, you can use any of the simple obfuscation techniques described above or include the date; it's kind of a trade-off between friendly-looking addresses and spam-prevention.
Assign a new address for each display of the web page: This is much the same as the above technique but provides a higher degree of filtering. It may also help pollute spam databases, thus rendering them less effective. (It is not, however, easy to implement as a MediaWiki template. It would be pretty easy to include the time after the date, which would accomplish much the same thing, but MediaWiki's "CURRENTTIME" variable prints the time with a colon, and I don't know if that might cause problems with email routing; further investigation is needed. --Woozle 10:31, 12 October 2006 (EDT))
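
A minimal Python sketch of the dated-address idea follows. It is not the emaildated template itself: the function names and the `burned_days` set are assumptions made for illustration. The `stamped_address` variant appends the time as plain digits, which is one possible way around the colon concern raised above, though whether a given mail setup copes with it would still need testing.

```python
import re
from datetime import date, datetime


def dated_address(local, domain, day):
    """Build an address whose local part ends in the date as YYYYMMDD."""
    return f"{local}{day:%Y%m%d}@{domain}"


def stamped_address(local, domain, moment):
    """Finer-grained variant: date plus hour and minute, digits only (no colon)."""
    return f"{local}{moment:%Y%m%d%H%M}@{domain}"


def is_burned(address, local, burned_days):
    """True if the address carries a date stamp known to have been harvested."""
    match = re.fullmatch(re.escape(local) + r"(\d{8})\d*@.+", address)
    return bool(match) and match.group(1) in burned_days


print(dated_address("spam3", "htyp.org", date(2024, 4, 25)))
print(stamped_address("spam3", "htyp.org", datetime(2024, 4, 25, 10, 31)))

# Hypothetical: spam started arriving for addresses displayed on 25 April 2024.
burned_days = {"20240425"}
print(is_burned("spam320240425@yourdomain.com", "spam3", burned_days))
print(is_burned("spam320240426@yourdomain.com", "spam3", burned_days))
```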
