Internet-description.com



Spam filter


Page modified: Saturday, June 24, 2006 10:36:55

A spam filter is a computer program for filtering unwanted electronic advertising (so-called spam). Originally the aim was only to filter e-mail spam; in the meantime, filtering has also become very important for weblogs (blog spam) and wikis. There are several different methods for filtering content:

  • Sorting out on the basis of regular expressions, so-called blacklists
  • Filtering by means of a Bayes filter
  • Filtering by means of a database-based solution (railway filters)

In addition, it is possible to filter an e-mail on the basis of its header.

Blacklist method

This method checks the content of the e-mail for certain expressions or keywords from a blacklist. If the expression or keyword is contained in the e-mail, the e-mail is sorted out. These blacklists generally have to be created manually and are correspondingly laborious to maintain. In addition, the hit rate is not very high, since now and then spam is sorted as good e-mail and good e-mail as spam. Such filters can also be circumvented easily: if, for example, "Viagra" is on the blacklist, the filter will not recognize "Vla*gr-a". If the filter permits regular expressions as input, one can use sophisticated filter patterns that take all conceivable spellings into account, e.g. a pattern that starts with "v.{0,1}[!\|l].{0,1}" and continues in the same way, allowing one optional arbitrary character after each further letter.
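
The difference between a plain keyword blacklist and a regular-expression blacklist can be sketched as follows. This is only a minimal illustration: the names KEYWORD_BLACKLIST, OBFUSCATED, keyword_hit and regex_hit are hypothetical, and the pattern is an illustrative one written for this sketch, not the pattern from any real filter.

```python
import re

# Hypothetical blacklist with a single plain keyword.
KEYWORD_BLACKLIST = ["viagra"]

# Illustrative pattern (an assumption, not taken from a real filter):
# each letter may be followed by one arbitrary character, and the "i"
# may be replaced by a few look-alike characters.
OBFUSCATED = re.compile(
    r"v.{0,1}[i!|l1].{0,1}a.{0,1}g.{0,1}r.{0,1}a", re.IGNORECASE
)


def keyword_hit(text):
    """Return True if any blacklisted keyword occurs literally in the text."""
    lowered = text.lower()
    return any(word in lowered for word in KEYWORD_BLACKLIST)


def regex_hit(text):
    """Return True if the obfuscation-tolerant pattern matches the text."""
    return OBFUSCATED.search(text) is not None


if __name__ == "__main__":
    for sample in ("Buy Viagra now", "Buy Vla*gr-a now", "Meeting at noon"):
        print(sample, keyword_hit(sample), regex_hit(sample))
```

The plain keyword check misses the disguised spelling, while the pattern catches it. The price is that such tolerant patterns become hard to read and can also match harmless text.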

One of the best-known programs under Linux and other Unix derivatives is SpamAssassin, which awards points to each mail according to various criteria (obviously invalid senders, known spam text passages, HTML content, dispatch dates set in the future, etc.) and classifies it as spam once a certain score is reached. SPAVI likewise works with a blacklist; besides the e-mail itself, it also examines the pages linked in the e-mail for suspicious terms.
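
The idea of awarding points per criterion and comparing the sum with a threshold might look roughly like this. The rules, weights and the threshold below are invented for illustration and are not SpamAssassin's actual rules or scores; score_mail and is_spam are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold; real filters let the administrator configure it.
THRESHOLD = 5.0


def score_mail(sender, body, sent_at, now):
    """Add up points for a few simple criteria (all weights are assumptions)."""
    score = 0.0
    lowered = body.lower()
    if "@" not in sender:                      # obviously invalid sender
        score += 2.5
    if "<html" in lowered:                     # HTML content
        score += 1.0
    if "click here" in lowered:                # known spam text passage
        score += 2.0
    if sent_at > now + timedelta(hours=1):     # dispatch date in the future
        score += 2.0
    return score


def is_spam(sender, body, sent_at, now):
    """Classify as spam once the score reaches the threshold."""
    return score_mail(sender, body, sent_at, now) >= THRESHOLD


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_spam("nobody", "<html>Click here</html>", now + timedelta(days=2), now))
    print(is_spam("alice@example.org", "Lunch on Monday?", now, now))
```

Because no single criterion reaches the threshold on its own in this sketch, one suspicious feature alone does not condemn a mail.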

"Razor" and "Pyzor", in turn, generate a hash value for each mail and check in central databases whether other people who received the same mail have classified it as spam or not.

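The hash-and-lookup approach described for Razor and Pyzor can be sketched with an in-memory stand-in for the central database. This is an assumption-laden illustration: mail_digest and ReportDatabase are hypothetical names, SHA-256 over a whitespace-normalized body is chosen here only for simplicity, and the real tools use their own digest schemes and network protocols.

```python
import hashlib
from collections import Counter


def mail_digest(body):
    """Hash the body after lower-casing and collapsing whitespace."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ReportDatabase:
    """In-memory stand-in for a central database of user reports."""

    def __init__(self):
        self._spam = Counter()
        self._ham = Counter()

    def report(self, body, is_spam):
        """Record one recipient's verdict for this mail."""
        target = self._spam if is_spam else self._ham
        target[mail_digest(body)] += 1

    def looks_like_spam(self, body, min_reports=2):
        """True if enough recipients reported the same mail as spam."""
        digest = mail_digest(body)
        return (self._spam[digest] >= min_reports
                and self._spam[digest] > self._ham[digest])


if __name__ == "__main__":
    db = ReportDatabase()
    db.report("Buy  cheap pills\nNOW", True)
    db.report("buy cheap pills now", True)
    print(db.looks_like_spam("Buy cheap pills now"))
    print(db.looks_like_spam("Lunch on Monday?"))
```

An exact hash like the one in this sketch changes as soon as a single word of the mail differs, so it only recognizes mails that are identical after normalization.
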
Bayes filter method

Alternatively, spam can also be filtered on the basis of Bayesian probability. These are so-called learning filters. The user must first sort, for instance, the first 1000 e-mails manually into spam and non-spam. Afterwards the system recognizes spam e-mail almost independently, with a hit rate of usually over 95%. The user must manually re-sort any e-mails that the system sorted incorrectly. In this way the hit rate is steadily increased. This method is usually clearly superior to the blacklist method.
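
A learning filter of this kind can be sketched as a small naive Bayes classifier. The class name BayesFilter, the tokenizer and the add-one smoothing are assumptions made for this illustration; it is not the implementation used by any of the programs named in this article.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Minimal naive Bayes sketch with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.mail_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        """Split a mail into lower-case word tokens."""
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Learn from one manually sorted mail; label is 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokens(text))
        self.mail_counts[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | words) under the naive independence assumption."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_mails = sum(self.mail_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior, then one smoothed likelihood term per token.
            log_p = math.log((self.mail_counts[label] + 1) / (total_mails + 2))
            total_words = sum(self.word_counts[label].values())
            for tok in self.tokens(text):
                count = self.word_counts[label][tok]
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = log_p
        # Turn the two log scores into a probability without underflow.
        top = max(log_score.values())
        spam = math.exp(log_score["spam"] - top)
        ham = math.exp(log_score["ham"] - top)
        return spam / (spam + ham)


if __name__ == "__main__":
    f = BayesFilter()
    f.train("cheap pills buy now", "spam")
    f.train("buy cheap watches now", "spam")
    f.train("meeting agenda for monday", "ham")
    f.train("lunch on monday", "ham")
    print(f.spam_probability("buy cheap pills"))
    print(f.spam_probability("monday meeting"))
```

Re-sorting a wrongly classified mail corresponds to calling train again with the correct label, which is how the hit rate improves over time in this sketch.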

Bogofilter and Mozilla Thunderbird, as well as the current versions of Spamihilator, which is particularly popular in the German-speaking area, make use of this mechanism. In each case the program must be trained by the user before it recognizes spam reliably.

A method related to the Bayes filter is the Markov filter. It additionally uses a Markov chain and is more effective than a Bayes filter, as Bill Yerazunis was able to show with his spam filter CRM114.
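
What a Markov chain adds over single-word statistics can be illustrated with a first-order chain over words: the model looks at which word follows which, so word order matters. This sketch is a generic illustration under that assumption and is not the algorithm used by CRM114; MarkovChainModel and more_like_spam are hypothetical names.

```python
import math
from collections import Counter


class MarkovChainModel:
    """First-order Markov chain over words, trained on one class of mail."""

    START = "<s>"

    def __init__(self):
        self.transitions = {}   # previous word -> Counter of next words
        self.vocab = set()

    def _words(self, text):
        return [self.START] + text.lower().split()

    def train(self, text):
        """Count every word-to-next-word transition in the mail."""
        words = self._words(text)
        self.vocab.update(words)
        for prev, cur in zip(words, words[1:]):
            self.transitions.setdefault(prev, Counter())[cur] += 1

    def log_likelihood(self, text, vocab_size):
        """Smoothed log probability of the word sequence under this chain."""
        words = self._words(text)
        total = 0.0
        for prev, cur in zip(words, words[1:]):
            counts = self.transitions.get(prev, Counter())
            total += math.log((counts[cur] + 1) / (sum(counts.values()) + vocab_size))
        return total


def more_like_spam(text, spam_model, ham_model):
    """Compare how well each chain explains the word sequence."""
    vocab_size = len(spam_model.vocab | ham_model.vocab) + 1
    return (spam_model.log_likelihood(text, vocab_size)
            > ham_model.log_likelihood(text, vocab_size))


if __name__ == "__main__":
    spam, ham = MarkovChainModel(), MarkovChainModel()
    spam.train("buy cheap pills now")
    ham.train("pills are not cheap to buy")
    print(more_like_spam("buy cheap pills", spam, ham))
    print(more_like_spam("pills are not cheap", spam, ham))
```

Both training mails in the usage example contain the words "buy", "cheap" and "pills"; only the order in which they appear lets the two chains tell the test messages apart.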

Database-based solutions

As early as the 1990s, there was discussion on Usenet about recognizing spam by the URLs (and, where applicable, telephone numbers) advertised in the mail. The spammer can modify and personalize the message at will, but since the ultimate aim (with UCE) is always to induce the user to make contact, and the possible address space is not variable without limit, this approach theoretically makes very good recognition possible. What is particularly interesting is that no heuristics are used, which always carry the risk of false detections. Because of the technical requirements, reaction times, etc., however, this was not considered practicable. The spam filter "SpamStopsHere" (as a centrally hosted solution) is nevertheless based at its core on exactly this idea and shows that it can indeed work in practice.
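
The core of this idea, extracting the advertised addresses and looking them up in a database, can be sketched as follows. The host names, the set KNOWN_SPAM_HOSTS and the functions advertised_hosts and is_spam are hypothetical and only stand in for a centrally maintained database; this is not how SpamStopsHere is implemented.

```python
import re
from urllib.parse import urlsplit

# Hypothetical database of hosts that have been advertised in spam.
KNOWN_SPAM_HOSTS = {"pills.example", "casino.example"}

# Simple pattern for http(s) URLs in plain text (an approximation).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def advertised_hosts(body):
    """Return the set of host names of all URLs found in the mail body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # skip malformed URLs
        if host:
            hosts.add(host.lower())
    return hosts


def is_spam(body, known_hosts=KNOWN_SPAM_HOSTS):
    """Spam if at least one advertised host is in the database."""
    return not advertised_hosts(body).isdisjoint(known_hosts)


if __name__ == "__main__":
    print(is_spam("Hi Bob, visit http://pills.example/offer?id=42 today"))
    print(is_spam("See https://www.python.org/ for the docs"))
```

The wording around the link can change freely without affecting the lookup in this sketch; only a new address escapes it until that address is added to the database.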

See also

  • SpamAssassin

Related Websites

  • Paul Graham : A Plan For Spam
    In fact, I've found that you can filter present-day spam acceptably well using nothing more than a Bayesian combination of the spam probabilities of ...

  • SpamAssassin
    A mail filter, written in Perl, to identify spam using a wide range of heuristic tests on mail headers...

  • Spamihilator - Anti-Spam-Filter
    Filter that works with any POP3 client. Uses Bayesian filtering.

Page cached: Wednesday, July 5, 2006 23:56:11