spamassassin-users mailing list archives

From John Hardin
Subject Re: Spam messages autolearned as ham
Date Fri, 26 Sep 2014 16:11:21 GMT
On Fri, 26 Sep 2014, Matus UHLAR - fantomas wrote:

> On 25.09.14 07:51, John Hardin wrote:
>> You are probably going to have to wipe and retrain your bayes database from 
>> scratch using known-good (i.e. hand classified) corpora. I also suggest 
>> turning off autolearn.
> I'm not sure wiping BAYES is needed, unless training does not

He has autolearn running. Unless he has copies of the spams that were 
learned as ham, there's no way to totally undo that short of wiping the 
database and starting over from scratch.
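
A minimal sketch of that wipe-and-retrain sequence, written as a Python 
wrapper around sa-learn. The corpus paths are hypothetical placeholders, 
and it assumes sa-learn is on the PATH and is run as the user that owns 
the Bayes database. Autolearn itself is turned off separately, with 
"bayes_auto_learn 0" in local.cf.

```python
import subprocess


def retrain_commands(ham_dir, spam_dir):
    """Return the sa-learn invocations for a wipe and retrain, in order."""
    return [
        ["sa-learn", "--clear"],           # wipe the existing Bayes database
        ["sa-learn", "--ham", ham_dir],    # learn hand-classified ham
        ["sa-learn", "--spam", spam_dir],  # learn hand-classified spam
        ["sa-learn", "--dump", "magic"],   # show token/message counts afterwards
    ]


def retrain(ham_dir, spam_dir, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on first failure."""
    for argv in retrain_commands(ham_dir, spam_dir):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical corpus locations; substitute your own reviewed corpora.
    retrain("/var/corpus/ham", "/var/corpus/spam")
```

The dry run is the default so that nothing is wiped until the commands 
have been checked.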

>> You *did* keep your initial Bayes training corpora, right?
> This is a very good idea. Maybe at least keep all autolearned spam
> and ham for some time, just for the possibility of retraining.

The critical part is to have base corpora of *correctly classified* (i.e. 
manually reviewed) messages. If you're keeping copies of autolearned 
messages (which will probably be quite a few) then you *need* to *manually 
review* them before using them for retraining; otherwise you'll probably 
end up simply rebuilding a mistrained database.
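
As a rough illustration of building that review queue: SpamAssassin 
normally records its autolearn decision in the X-Spam-Status header 
(e.g. "autolearn=ham"), so saved messages can be sorted by that verdict 
before a human looks at them. This sketch assumes the header is present 
and in the default format; the function names are made up for the 
example.

```python
import email
import re
from typing import Optional

# Matches "autolearn=ham" but not "autolearn_force=no".
AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")


def autolearn_verdict(raw_message: bytes) -> Optional[str]:
    """Return the autolearn value from X-Spam-Status, or None if absent."""
    msg = email.message_from_bytes(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    # Collapse folded header lines into a single line before matching.
    unfolded = " ".join(str(status).split())
    match = AUTOLEARN_RE.search(unfolded)
    return match.group(1) if match else None


def needs_review(raw_message: bytes) -> bool:
    """True if the message was autolearned and so needs manual review."""
    return autolearn_verdict(raw_message) in ("ham", "spam")
```

This only identifies what was autolearned; deciding whether each message 
really is ham or spam still has to be done by a person.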

If you have users submitting FP/FN messages for training, and the admin 
verifies them before training with them (which should be done unless the 
judgement and responsibility of the user in question are trusted), that's a 
good source for part of your base retraining corpora.

  John Hardin KA7OHZ              FALaholic #11174     pgpk -a
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
   The difference between ignorance and stupidity is that the stupid
   desire to remain ignorant.                             -- Jim Bacon
  848 days since the first successful private support mission to ISS (SpaceX)
