spamassassin-users mailing list archives

From Axb <>
Subject Re:
Date Tue, 14 Oct 2014 21:54:56 GMT
On 10/14/2014 05:07 PM, RW wrote:
> On Tue, 14 Oct 2014 13:58:27 +0200
> Axb wrote:
>> On 10/14/2014 01:51 PM, RW wrote:
>>> On Tue, 14 Oct 2014 10:44:51 +0200
>>> Axb wrote:
>>>> have you verified that some of these are not included?
>>>> X-Originating-IP will not be included as it can be used to help
>>>> detect ham or spam
>>> It's really no different to other headers you are ignoring.
>> for example, if you get a flood of 419s from the same source, you may
>> want it to be tokenized...
> As I do with, for example:
>    X-AntiAbuse: Originator/Caller UID/GID - [514 32007] / [47 12]
> in this spam Bayes found
>    0.999-4--HX-AntiAbuse:32007
> These numbers seem to be very good indicators for me.
> Most of the headers in the file have never appeared in my ham, so
> they'll be pure spam indicators if they are ever faked. In general
> it's difficult for a spammer to gain an overall advantage against
> an average per user database using faked headers.
> Whatever the merits of this on system-wide Bayes (if any beyond
> reducing token count), I think it would have a negative effect on
> per user Bayes.
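
For readers who have not looked at Bayes tokens before, the `HX-AntiAbuse:32007` token quoted above can be pictured with a small Python sketch. This is not SpamAssassin's tokenizer (that is Perl and does considerably more); the function name `header_tokens` and the word-splitting rule are assumptions made only for illustration. The one thing taken from the thread is the token shape: `H`, the header name, a colon, then a word from the header value.

```python
import re

def header_tokens(name, value):
    """Return 'H<name>:<word>' tokens for each word in a header value.

    Assumption: a 'word' is any run of letters, digits or dots. The real
    tokenizer has its own, more elaborate splitting rules.
    """
    return [f"H{name}:{word}" for word in re.findall(r"[A-Za-z0-9.]+", value)]

tokens = header_tokens(
    "X-AntiAbuse",
    "Originator/Caller UID/GID - [514 32007] / [47 12]",
)
print(tokens)
```

Each of those tokens gets its own spam probability in the database, which is why a UID/GID pair that only ever shows up in spam can become a strong indicator.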

now here's a surprise (it's all in the code :)

the plugin already includes:

# Which headers should we scan for tokens?  Don't use all of them, as it's easy
# to pick up spurious clues from some.  What we now do is use all of them
# *less* these well-known headers; that way we can pick up spammers' 
# headers (which are obviously not well-known in advance!).

# Received is handled specially
$IGNORED_HDRS = qr{(?: (?:X-)?Sender    # misc noise
   |Delivered-To |Delivery-Date
   |X-MIME-Auto[Cc]onverted |X-Converted-To-Plain-Text

   |Subject      # not worth a tiny gain vs. to db size increase

   # Date: can provide invalid cues if your spam corpus is
   # older/newer than ham

   # List headers: ignore. a spamfiltering mailing list will
   # become a nonspam sign.
   |X-Mailman-Version |X-Been[Tt]here |X-Loop

   # gatewayed through mailing list (thanks to Allen Smith)

   # Spamfilter/virus-scanner headers: too easy to chain from
   # these
   |X-Antispam |X-RBL-Warning |X-Mailscanner
   |X-MDaemon-Deliver-To |X-Virus-Scanned
   |X-Pyzor |X-DCC-\S{2,25}-Metrics
   |X-Filtered-B[Yy] |X-Scanned-By |X-Scanner
   |X-AP-Spam-(?:Score|Status) |X-RIPE-Spam-Status
   |X-SMTPD |(?:X-)?Spam-Apparently-To
   |SPAM |X-Perlmx-Spam

   # some noisy Outlook headers that add no good clues:
   |Content-Class |Thread-(?:Index|Topic)

   # Annotations from IMAP, POP, and MH:
   |(?:X-)?Status |X-Flags |X-Keywords |Replied |Forwarded
   |Lines |Content-Length
   |X-UIDL? |X-IMAPbase

   # Annotations from Bugzilla

   # Annotations from VM: (thanks to Allen Smith)

   # Annotations from Gnus:
   | X-Gnus-Mail-Source
   | Xref


# Note only the presence of these headers, in order to reduce the
# hapaxen they generate.
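
To make the effect of the two lists concrete, here is a rough Python sketch of the idea, not the plugin's actual code. The ignore pattern is abbreviated to a handful of names copied from the excerpt above. The presence-only pattern (`X-Face`), the `:present` token suffix, the case-insensitive whole-name match and the function name `tokenize_headers` are all assumptions for illustration, since that part of the source is not shown here.

```python
import re

# Abbreviated ignore list: a few header names taken from the excerpt above.
IGNORED_HDRS = re.compile(
    r"""(?: (?:X-)?Sender
        | Delivered-To | Delivery-Date
        | Subject
        | X-Mailman-Version | X-Loop
        | X-Virus-Scanned | X-Scanned-By
        | (?:X-)?Status | Lines | Content-Length
    )""",
    re.IGNORECASE | re.VERBOSE,
)

# Hypothetical presence-only list; the real one is not quoted in this message.
PRESENCE_ONLY_HDRS = re.compile(r"X-Face", re.IGNORECASE)

def tokenize_headers(headers):
    """Turn (name, value) header pairs into Bayes-style tokens."""
    tokens = []
    for name, value in headers:
        if IGNORED_HDRS.fullmatch(name):
            continue  # well-known header: contributes nothing
        if PRESENCE_ONLY_HDRS.fullmatch(name):
            # One token per header, so its value cannot flood the db
            # with tokens that are only ever seen once (hapaxes).
            tokens.append(f"H{name}:present")
            continue
        # Anything else, including headers nobody anticipated, is tokenized.
        tokens.extend(
            f"H{name}:{word}" for word in re.findall(r"[A-Za-z0-9.]+", value)
        )
    return tokens

print(tokenize_headers([
    ("Subject", "cheap pills"),
    ("X-Virus-Scanned", "clamav"),
    ("X-Originating-IP", "[192.0.2.1]"),
    ("X-Face", "abc def"),
]))
```

The point of the blacklist approach is visible in the last branch: a header such as X-Originating-IP is not on either list, so it still gets tokenized.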

