spamassassin-users mailing list archives

From "Sander Holthaus - Orange XL" <i...@orangexl.com>
Subject RE: Manually training SpamAssassin by forwarding mail
Date Fri, 04 Feb 2005 18:47:40 GMT
 

> -----Original Message-----
> From: Stuart Johnston [mailto:stuart@ebby.com] 
> Sent: Friday, February 04, 2005 7:35 PM
> To: Peter Marshall; SpamAssassin Users
> Subject: Re: Manually training SpamAssassin by forwarding mail
> 
> Peter Marshall wrote:
> > Stuart Johnston wrote:
> > 
> >> Peter Marshall wrote:
> >>
> >>> Kevin Sullivan wrote:
> >>>
> >>>> --On 02/03/05 01:59:21 +0100 Sander Holthaus - Orange XL wrote:
> >>>>
> >>>>> I've been interested in offering customers a way to manually
> >>>>> train the SpamAssassin Bayes filter for ham and spam (to reduce
> >>>>> false positives and negatives). However, I can only find
> >>>>> documentation on this for local mailboxes and IMAP. Most users,
> >>>>> however, retrieve their mail through POP and use Outlook
> >>>>> (Express) as their mail client. Is there a way to train
> >>>>> SpamAssassin with such a setup (e.g. forwarding mail with
> >>>>> Outlook (Express) using SMTP)?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> If you want to do a lot of programming, you could save all incoming
> >>>> messages for a few days in a database somewhere.  When a user
> >>>> forwards a message to a special "ham" or "spam" mailbox, you pull
> >>>> the message-id from the message and use it to recover the original
> >>>> message from your database.
> >>>>
> >>>>     -Kevin
> >>>
> >>>
> >>>
> >>>
> >>> My question is the same as Henrik's. I have a bunch of email that is
> >>> spam (either tagged by SpamAssassin or not tagged at all).  I
> >>> forwarded it as an attachment to a "spam" mailbox.  What do I have
> >>> to do now before I can get Bayes to learn the messages? I read that
> >>> you have to remove the headers. Could anyone give me a little more
> >>> detail?
> >>
> >>
> >>
> >> I use a modified version of the DMZS-sa-learn.pl from: 
> >> http://www.dmzs.com/tools/files/spam.phtml
> >>
> >> When someone forwards a spam to me, I move the message to a special
> >> IMAP folder that gets processed by the script.  My additions look
> >> something like:
> >>
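> >> use Email::MIME;
> >> ...
> >> # Parse the forwarded mail (the wrapper message, not the spam itself).
> >> my $msg = Email::MIME->new($raw_message_body);
> >>
> >> my @parts = $msg->parts;
> >>
> >> # Each message/rfc822 part is one attached original; learn from it.
> >> foreach (@parts) {
> >>   if ($_->content_type =~ m|message/rfc822|) {
> >>     sa_learn($_->body_raw);
> >>   }
> >> }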
> >>
> >>
> >> I've tested this with messages forwarded as attachment from Outlook
> >> and Thunderbird.  I'm not sure how effective it is though.  I'm sure
> >> that it still loses something in the translation.  All IMAP is
> >> really the way to go if you can.
> >>
> >>
> >> Stuart Johnston
> >>
> >>
> > But I have no IMAP, only POP. They would forward (as attachment)
> > to a mailbox, and then I have to run sa-learn ... I assume as root?
> > 
> > Will the stuff you posted work for this setup as well?
> > 
> > Would there be big problems just running it on the mail after it has
> > been forwarded as an attachment?
> 
> The code I posted only shows how you can extract the attached 
> spam from the email.  You'll need to write your own code to 
> integrate it into your particular setup.
> 
> BTW, in Outlook, you can easily attach multiple spams to one 
> message and this code should handle it.

CTRL-a, right click, "Forward Items" will indeed do the trick.
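
For a POP-only setup, the extraction step Stuart describes could look
roughly like this in Python instead of Perl. It is only a sketch: the
function names (extract_attached_messages, learn) are made up for
illustration, it assumes sa-learn is on the PATH, and it assumes the
script runs as a user with write access to the Bayes database you want to
train. Re-serialising the attachment may not be byte-identical to what the
user originally received.

```python
import subprocess
import tempfile
from email import message_from_bytes


def extract_attached_messages(raw):
    """Return every message/rfc822 attachment of a forwarded mail as bytes.

    Handles several attachments in one forward. Attached messages are not
    searched for further attachments of their own.
    """
    found = []
    stack = [message_from_bytes(raw)]
    while stack:
        part = stack.pop()
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the attached message.
            found.append(part.get_payload(0).as_bytes())
        elif part.is_multipart():
            # Push children in reverse so they are visited in order.
            stack.extend(reversed(part.get_payload()))
    return found


def learn(messages, kind="spam"):
    """Feed each extracted message to sa-learn (kind is "spam" or "ham")."""
    for raw in messages:
        with tempfile.NamedTemporaryFile(suffix=".eml") as tmp:
            tmp.write(raw)
            tmp.flush()
            subprocess.run(["sa-learn", "--" + kind, tmp.name], check=True)
```

You would read each mail from the "spam" mailbox however suits your setup,
then call learn(extract_attached_messages(raw), "spam").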

> > 
> > Can users also forward (as attachment) mail that was already
> > marked as spam ... or is there any advantage to this?
> 
> If you use Bayes auto learn, I suspect that this wouldn't do much. 
> Otherwise, it might help.

I would check the headers of the forwarded messages to see if their
spam score is above your auto-learning threshold. If it is, relearning is
probably pointless. You might also wonder why they received the message at
all (I would think that something that is good enough to autolearn is good
enough to refuse or discard).
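
That header check could be sketched like this in Python. The assumptions
are mine: the message carries an X-Spam-Status header in the usual
"score=... autolearn=..." form (older versions wrote "hits=" instead of
"score="), and the threshold of 12.0 is meant to mirror
bayes_auto_learn_threshold_spam, so substitute whatever your configuration
uses. A high score alone does not prove the message was auto-learned, so
the sketch trusts the autolearn= field when it is present and only falls
back to the score when it is not. The function names are illustrative.

```python
import re
from email import message_from_bytes

_SCORE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")
_AUTOLEARN = re.compile(r"\bautolearn=(\w+)")


def spam_status(raw):
    """Return (score, autolearn) from X-Spam-Status; None where absent."""
    header = message_from_bytes(raw).get("X-Spam-Status")
    if header is None:
        return None, None
    text = " ".join(str(header).split())  # unfold continuation lines
    score = _SCORE.search(text)
    auto = _AUTOLEARN.search(text)
    return (
        float(score.group(1)) if score else None,
        auto.group(1) if auto else None,
    )


def worth_relearning(raw, spam_threshold=12.0):
    """Guess whether training on this message as spam would add anything."""
    score, autolearn = spam_status(raw)
    if autolearn is not None:
        # The header says what auto-learn actually did; trust it.
        return autolearn != "spam"
    # No autolearn field: fall back to comparing the score.
    return score is None or score < spam_threshold
```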

Kind Regards,
Sander Holthaus

