spamassassin-users mailing list archives

From Jesse Norell <je...@kci.net>
Subject Re: txrep training performance
Date Wed, 12 Jul 2017 23:40:13 GMT
One thing suggesting the training logic may need reworking: I have
txrep_track_messages at the default (1), and almost every message in my
corpus has already been trained; each run brings in only a handful of new
messages (usually 10-20, often 0, and always < 100). It seems a single
query could check whether a message has already been learned as the same
type (ham/spam), and sa-learn could then move straight on to the next
message for those already seen. Instead, I see sa-learn doing many
INSERTs (usually failing with 'Duplicate entry') and UPDATEs on the
txrep table.
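
A minimal sketch of the check-before-write idea, for illustration only. It
is not the TxRep plugin's code or schema: the seen_messages table, its
columns, and the train() helper are all hypothetical, and an in-memory
SQLite database stands in for MySQL so the example runs on its own. The
point is that one SELECT per message decides whether any expensive
learning or write has to happen at all.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical tracking table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seen_messages ("
        " msg_id TEXT PRIMARY KEY,"   # hypothetical: one row per trained message
        " label  TEXT NOT NULL)"      # 'ham' or 'spam'
    )
    return conn


def train(conn, msg_id, label, learn):
    """Learn a message only if it is new or its label has changed.

    Returns True if the (expensive) learn callback ran, False if the
    message was skipped after a single lookup.
    """
    row = conn.execute(
        "SELECT label FROM seen_messages WHERE msg_id = ?", (msg_id,)
    ).fetchone()
    if row is not None and row[0] == label:
        # Already learned as the same type: nothing to insert or update.
        return False

    # New message, or relearning with the opposite label.
    learn(msg_id, label)
    conn.execute(
        "INSERT OR REPLACE INTO seen_messages (msg_id, label) VALUES (?, ?)",
        (msg_id, label),
    )
    conn.commit()
    return True


if __name__ == "__main__":
    conn = make_db()
    learned = []

    def record(msg_id, label):
        # Stand-in for the real reputation updates.
        learned.append((msg_id, label))

    for msg_id, label in [("m1", "spam"), ("m1", "spam"), ("m2", "ham")]:
        ran = train(conn, msg_id, label, record)
        print(msg_id, label, "learned" if ran else "skipped")
```

With a corpus where nearly every message has been seen before, a check
like this would reduce most of a run to one indexed lookup per message,
rather than an INSERT that fails on a duplicate key followed by UPDATEs.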


On Wed, 2017-07-12 at 09:59 -0600, Jesse Norell wrote:
> Hello,
> 
>   I have txrep data in a mysql database, and am working on a training
> script to run sa-learn; with bayes also in MySQL and a corpus size of
> 5279 nspam and 849 nham, sa-learn takes a full 2 hours to run with txrep
> enabled (use_txrep 1), but only 13 minutes with txrep disabled
> (use_txrep 0).  One of my main gripes with the old AWL was that it
> didn't learn/correct when training messages, so I love that txrep does
> that, but does anyone have any tips to improve txrep training
> performance?  Either tweaks/improvements on my end, or even a little
> thought on logic redesign in that area?
> 
> Thanks,
> 


-- 
Jesse Norell
Kentec Communications, Inc.
970-522-8107  -  www.kci.net

