spamassassin-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Spamassassin Wiki] Update of "CorpusCleaning" by Darxus
Date Thu, 01 May 2014 21:15:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The "CorpusCleaning" page has been changed by Darxus:
https://wiki.apache.org/spamassassin/CorpusCleaning?action=diff&rev1=18&rev2=19

Comment:
Saving a list of previously verified non-spams

  
  See also 'Corrupt Messages' below for other stuff to clear out.
  
+ === Saving a list of verified non-spams ===
+ 
+ To make corpus cleaning easier next time, you can save a list of high-scoring emails that
you have verified are not spam, so that they are skipped automatically.  When viewing emails
as above, each one has an "X-Mass-Check-Id:" header naming the file it came from; use it to
remove any email that was actually spam from the id.hi file.  Then copy the id.hi file to
something like ~/sa/id.hi.good and next time run:
+ 
+ {{{
+ sort -rn -k 2 ham.log | fgrep -vf ~/sa/id.hi.good | head -n 200 > id.hi
+ ./mboxget < id.hi > mbox
+ mutt -f mbox
+ }}}
+ 
  == Corrupt Messages ==
  
  Occasionally, these will crop up -- some MUAs have a tendency to mess up mail messages or
folders, making them unsuitable for use with MassCheck. SpamAssassin includes a few rules
that can help identify corrupt messages.
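
The manual step in "Saving a list of verified non-spams" (dropping entries that turned out to be spam from id.hi, then keeping the rest as a skip list) could be scripted roughly as below. This is only a sketch under assumptions that are not on the wiki page: the function name update_good_list and the file spam_ids.txt (one X-Mass-Check-Id value per line for each message found to be spam) are hypothetical, and the sketch merges into the skip list rather than copying over it, so that entries verified in earlier runs are kept.

```shell
#!/bin/sh
# Sketch only: update_good_list and spam_ids.txt are hypothetical names,
# not part of SpamAssassin or the wiki page.
#
# $1: id.hi file (high-scoring ham.log lines that were reviewed)
# $2: file listing X-Mass-Check-Id values of messages that were actually spam
# $3: skip list to update, e.g. ~/sa/id.hi.good
update_good_list() {
    id_hi=$1
    spam_ids=$2
    good=$3

    # Make sure the skip list exists so sort can read it on the first run.
    touch "$good"

    # grep -vFf drops every id.hi line containing one of the spam ids as a
    # fixed string. Caveat: an id that is a prefix of another path would
    # also remove that longer path.
    # sort -u merges the surviving lines with the existing skip list and
    # removes exact duplicates.
    grep -vFf "$spam_ids" "$id_hi" | sort -u - "$good" > "$good.tmp" &&
        mv "$good.tmp" "$good"
}
```

It would be called as `update_good_list id.hi spam_ids.txt ~/sa/id.hi.good` after reviewing the mbox, before the next cleaning run.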
