The Bayesian filter (also called a Bayes filter) relies on conditional probabilities: from the occurrence of characteristic words in an e-mail (the event), it infers the property of being spam (the cause). The name derives from the English mathematician Thomas Bayes (c. 1702-1761).
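The inference from event to cause can be written with Bayes' theorem. As a sketch for a single word, let S stand for "the mail is spam", H for "the mail is wanted" and W for "the mail contains a given word" (these symbols are chosen here for illustration and are not taken from the cited sources):

```latex
\[
P(S \mid W) = \frac{P(W \mid S)\, P(S)}{P(W \mid S)\, P(S) + P(W \mid H)\, P(H)}
\]
```

The quantities on the right-hand side can be estimated from mails that have already been sorted, which is what makes the approach trainable.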
This statistical filter, first proposed in 1998 by Sahami et al. (M. Sahami, S. Dumais, D. Heckerman, E. Horvitz: A Bayesian Approach to Filtering Junk E-Mail, AAAI'98 Workshop on Learning for Text Categorization, 1998) and popularized from 2002 onward by an influential article by Paul Graham (P. Graham: A Plan for Spam, August 2002), is meant to predict whether an e-mail is spam or not. The filter is used by many anti-spam programs and is implemented, for example, in the e-mail client Mozilla Thunderbird.
Statistical countermeasures are based on probabilistic methods derived from Bayes' theorem. Bayesian filters are often organized as "learning" (also "self-learning") systems and rely on word frequencies in e-mails. A Bayesian filter is trained by its user, who sorts the e-mails already received into wanted (ham) and unwanted (spam). The filter then builds a list of words that occur in unwanted e-mails. If the user has marked e-mails containing the terms "Sex" and "Viagra" as spam, all e-mails with these terms receive a high spam probability. Terms from wanted e-mails such as "appointment" or "report" lead to a low spam probability. Individual keywords are not sufficient, however; what matters is the combined evaluation of all the individual words.
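The training and scoring procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular program: the class name `NaiveBayesFilter`, the tokenizer, the add-one smoothing and the assumption that words occur independently of each other are all choices made here for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical minimal filter: counts in how many mails each word occurs."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.mail_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one mail that the user has marked as 'spam' or 'ham'."""
        self.mail_counts[label] += 1
        # Each word is counted at most once per mail.
        self.word_counts[label].update(set(tokenize(text)))

    def word_probability(self, word, label):
        """Estimate P(word | label) with add-one smoothing (an assumption)."""
        return (self.word_counts[label][word] + 1) / (self.mail_counts[label] + 2)

    def spam_probability(self, text):
        """Combine the evidence of all words by summing their log-odds."""
        log_odds = math.log(self.mail_counts["spam"] + 1) - math.log(
            self.mail_counts["ham"] + 1
        )
        for word in set(tokenize(text)):
            log_odds += math.log(self.word_probability(word, "spam"))
            log_odds -= math.log(self.word_probability(word, "ham"))
        # Numerically stable conversion from log-odds to a probability.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1 + e)


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap viagra sex offer", "spam")
spam_filter.train("viagra sex now", "spam")
spam_filter.train("appointment for the report on monday", "ham")
spam_filter.train("please send the report before the appointment", "ham")

print(spam_filter.spam_probability("viagra offer"))        # above 0.5
print(spam_filter.spam_probability("report appointment"))  # below 0.5
```

Summing log-odds rather than multiplying raw probabilities keeps the arithmetic stable for long mails, and it makes visible that the verdict comes from the total over all words rather than from any single keyword.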
Even after brief training with about 30 e-mails, the filter achieves surprisingly high hit rates, although training with at least several hundred e-mails of each category is recommended for production use. Many providers use it to intercept spam.
The crucial risk for the user lies in false positives, that is, in legitimate mail escaping the user's notice. For a private individual who additionally works with whitelists this may still be acceptable; companies, by contrast, risk losing important inquiries from new customers. With a properly trained filter, however, this danger is substantially smaller than the danger that a mail is overlooked, deleted or simply forgotten during manual filtering or for other reasons. It is important, particularly in the initial phase of training, to mark not only the unwanted mails but also the legitimate ones.
The senders of spam, however, do not stand idly by. Advertising messages are placed in images, for example, so that the filter cannot find them, and suspicious terms are deliberately misspelled (e.g. "V|agra" or "Va1ium") or written with interspersed spaces. However, the filter also scores HTML tags such as "img" and "src" negatively, so that images in e-mails are a fairly good indicator of spam, as are the misspelled words, which the filter likewise learns and naturally rates with an extremely high spam probability.
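Why such tricks tend to backfire becomes clearer when one looks at what a filter actually sees. The following sketch uses a hypothetical tokenizer, `tokenize_raw`, that splits the raw mail source on whitespace, angle brackets, equals signs and quotes; real filters tokenize differently, so this only illustrates the principle that tag names and misspellings become tokens of their own.

```python
import re


def tokenize_raw(source):
    """Split raw mail source into tokens, keeping HTML names and odd spellings."""
    return re.findall(r"[^\s<>=\"']+", source.lower())


tokens = tokenize_raw('<img src="http://x/a.gif"> V|agra Va1ium')
print(tokens)
```

Because "img", "src", "v|agra" and "va1ium" each survive as a distinct token, a filter trained on marked mails can learn a spam probability for every one of them, just as it does for ordinary words.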
Recently a method has frequently been observed in which random quotations or whole chapters of world literature are inserted (possibly in white text or illegibly as a meta tag) in order to outwit the statistical measures. This is likewise not a very successful strategy, because randomly selected "harmless" terms or sentences have neither a particularly high nor a particularly low spam probability, so they play no role in the overall evaluation of all terms occurring in the mail.
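The effect of such padding can be checked with a small calculation. The sketch below assumes that each word already has an estimated spam probability, that spam and wanted mail are equally likely beforehand, and that words are combined by summing their log-odds; the function name and the example numbers are made up for illustration.

```python
import math


def combined_spam_probability(word_probs):
    """Combine per-word spam probabilities (each strictly between 0 and 1)."""
    log_odds = sum(math.log(p / (1 - p)) for p in word_probs)
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)


spammy_words = [0.99, 0.95]    # two strongly suspicious terms
padding_words = [0.5] * 200    # 200 neutral words taken from a novel

base = combined_spam_probability(spammy_words)
padded = combined_spam_probability(spammy_words + padding_words)
print(base, padded)
```

A word with probability 0.5 has log-odds of zero, so any number of neutral words leaves the total unchanged. Only padding made of words that the filter has learned as typical of wanted mail would pull the score down.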
A peculiarity arises in non-English-speaking countries from the fact that spam is written predominantly in English. The hit rate of a Bayesian filter is therefore likely to be higher in these countries, but so is the danger that a wanted English-language mail is wrongly classified as spam.
Filtering on a statistical basis is a form of text classification. A number of researchers in applied linguistics who work on machine learning have already devoted themselves to this problem.
See also: Bayesian probability