spamassassin-users mailing list archives

From "emailitis.com" <i...@emailitis.com>
Subject RE: Single images with random wording & general rules
Date Mon, 17 Jun 2013 10:52:30 GMT
Autolearn is turned on. I don't think we allow users to train without
review; is there a way I can confirm? We have Plesk 10 and are using SA
through qmail-scanner. Even a message with a very high Bayes score seems to
have been misclassified:

Jun 17 11:44:04 plesk3 spamd[18601]: spamd: result: . 3 - BAYES_99,FORGED_RELAY_MUA_TO_MX,HTML_MESSAGE scantime=5.9,size=6016,user=qscand,uid=10124,required_score=5.0,rhost=localhost.localdomain,raddr=127.0.0.1,rport=49363,mid=<1371465833.fdyxtlmiklb@reducetummyfatsite.com>,bayes=0.999999,autolearn=no

Jun 17 11:44:04 plesk3 qmail-scanner[32763]: Clear:RC:0(174.139.0.51):SA:0(3.5/5.0): 6.00291 5948 bloomberg.businessweek.2013@spam-domain.com user@hosted-domain.com Exclusive_Discount_Rate_-_Save_92%% <1371465833.fdyxtlmiklb@spam-domain.com> 1371465838.32765-1.plesk3.emailitis.co.uk:3867 orig-plesk3.emailitis.co.uk137146583879732763:5948 1371465838.32765-0.plesk3.emailitis.co.uk:700
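
To confirm what autolearn has actually been doing, the spamd "result:" lines already carry most of the answer: each one ends with autolearn=ham, autolearn=spam, autolearn=no and so on, and messages that were let through are flagged "." rather than "Y". Below is a minimal sketch for tallying those values and for listing messages that passed despite a high Bayes probability, like the one above. The function names are hypothetical, and it assumes the real maillog has each spamd result on a single line. On Plesk 10 for Linux the file is often /usr/local/psa/var/log/maillog, but check yours.

```shell
# Hypothetical helpers for reading spamd's "result:" lines.
# Source this file, then pass the path of your maillog to each function.

# Tally autolearn outcomes (autolearn=ham, autolearn=spam, autolearn=no, ...).
autolearn_summary() {
    grep 'spamd: result:' "$1" |
        grep -o 'autolearn=[a-z_]*' |
        sort | uniq -c | sort -rn
}

# Print "bayes-probability message-id" for each message spamd did not
# flag as spam (result flag ".") although Bayes was at or above the
# threshold given as $2 (default 0.99).
passed_with_high_bayes() {
    awk -v threshold="${2:-0.99}" '
        /spamd: result: \. / {
            bayes = ""; mid = ""
            if (match($0, /bayes=[0-9.]+/)) bayes = substr($0, RSTART + 6, RLENGTH - 6)
            if (match($0, /mid=<[^>]*>/)) mid = substr($0, RSTART + 4, RLENGTH - 4)
            if (bayes != "" && bayes + 0 >= threshold + 0) print bayes, mid
        }' "$1"
}
```

For example, autolearn_summary /usr/local/psa/var/log/maillog. A large share of autolearn=ham lines is one possible sign that spam has been learned as ham.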

I think the Bayes training has become corrupted over time, as you say,
because most indications suggest that we should be getting a lot less Spam
than we are.

/root/.spamassassin/bayes_seen is 21 MB in size, so I guess retraining
would take ages. We have about 100 domains on the mail server. If we start
again, I know that we need to run:

rm /root/.spamassassin/bayes_*
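
Before deleting anything, it may be worth confirming which database spamd actually reads. The spamd log line above shows it scanning as user=qscand. Unless bayes_path is set in the site configuration, the database in use is therefore probably the one under qscand's home directory rather than /root/.spamassassin. Below is a sketch, assuming the qscand account exists as the log suggests and that /etc/mail/spamassassin is the site configuration directory. The helper names are hypothetical.

```shell
# Hypothetical helpers for locating the Bayes database spamd is using.

# Home directory of an account (spamd logged user=qscand above).
user_home() {
    getent passwd "$1" | cut -d: -f6
}

# Show any bayes_path override in a config directory
# (default assumed to be /etc/mail/spamassassin).
bayes_path_overrides() {
    grep -rh '^[[:space:]]*bayes_path' "${1:-/etc/mail/spamassassin}" 2>/dev/null
}

# List the Bayes files under an account's ~/.spamassassin and show the
# message/token counts sa-learn reports when run as that account.
# qscand usually has no login shell, hence -s /bin/sh.
bayes_status() {
    home=$(user_home "$1")
    ls -l "$home/.spamassassin/" 2>/dev/null | grep bayes_
    su -s /bin/sh -c 'sa-learn --dump magic' "$1"
}
```

For example, bayes_status qscand. If spamd does use qscand's database, deleting /root/.spamassassin/bayes_* would probably leave the database that scores mail untouched.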

From http://spamassassin.apache.org/full/3.3.x/doc/sa-learn.txt, it seems
that sa-learn --clear, like the rm above, is destructive, but in your
opinion is that better than the false reporting we are suffering at present?
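
sa-learn's option for wiping the database is --clear. It acts on the database of the account it runs as, so it is no less destructive than rm. The difference is that it goes through SpamAssassin rather than relying on file names. Because --backup writes its dump to standard output, a cautious wipe is to redirect a backup to a file first and clear only if that file is non-empty. Below is a minimal sketch. The function name is hypothetical, and it assumes it runs as the account that owns the database (qscand, per the log).

```shell
# Hypothetical helper: back up the current Bayes database, then wipe it.
# Refuses to wipe if the backup could not be written or is empty.
# Run it as the account that owns the database (e.g. qscand).
wipe_bayes_with_backup() {
    backup="$1"
    if ! sa-learn --backup > "$backup"; then
        echo "backup failed; database left untouched" >&2
        return 1
    fi
    if [ ! -s "$backup" ]; then
        echo "backup is empty; database left untouched" >&2
        return 1
    fi
    sa-learn --clear
}
```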

If I run sa-learn --backup, where does the backup file go? Can we review it
easily and then restore it, or would that be a lengthy process?
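
sa-learn --backup does not create a file of its own. It prints the dump to standard output, so the backup ends up wherever you redirect it, and --restore reads such a file back in. Reviewing it mostly means checking its size and header, because the bulk is one line per token. Below is a sketch. It assumes the DBM backup layout, in which the header lines (db_version, num_spam, num_nonspam) start with "v" and a tab. Treat that layout as an assumption and compare it against your own file. The function names and the backup path are hypothetical.

```shell
# Hypothetical helpers for reviewing and restoring a Bayes backup.
# Create the backup first, e.g.: sa-learn --backup > /root/bayes-backup.txt

# Print the line count and the "v" header lines
# (assumed: db_version, num_spam, num_nonspam).
backup_summary() {
    printf 'lines: %s\n' "$(wc -l < "$1" | tr -d ' ')"
    awk -F '\t' '$1 == "v"' "$1"
}

# Load a backup written by "sa-learn --backup" into the database
# of the account this runs as.
restore_bayes() {
    sa-learn --restore "$1"
}
```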

If we cleared it, we would want to use unsupervised learning (autolearn)
driven by the SA rules, supplemented with supervised training. When we
identify Spam in the maillog from the rule hits, as in the log above (it
could be addressed to any email address in any domain), is there an easy way
to tell SA whether it is Spam or Ham?
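
Autolearn decides from the rule-based score, which roughly means the score without the Bayes rules. By default it learns a message as ham below bayes_auto_learn_threshold_nonspam (0.1) and as spam above bayes_auto_learn_threshold_spam (12.0). If either value has been overridden locally, that could explain spam being learned as ham. Below is a sketch that prints any autolearn settings found in the given configuration files or directories. The helper name is hypothetical, and /etc/mail/spamassassin is assumed as the default location. No output means no override there, although a user_prefs file elsewhere could still change things.

```shell
# Hypothetical helper: show autolearn settings overridden in the given
# config files or directories (default: /etc/mail/spamassassin).
# No output means the built-in defaults apply there:
#   bayes_auto_learn 1
#   bayes_auto_learn_threshold_nonspam 0.1
#   bayes_auto_learn_threshold_spam 12.0
autolearn_settings() {
    [ "$#" -gt 0 ] || set -- /etc/mail/spamassassin
    grep -rhE '^[[:space:]]*bayes_auto_learn' "$@" 2>/dev/null
}
```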

The server is sending and receiving about 3000 emails per day. I do not know
about the spamc -l switch, so could you guide me on using it if that would
be better?
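
In spamc, lowercase -l only sends spamc's own log messages to stderr. The switch that trains is probably capital -L. It passes a message to spamd to be learned as spam, ham or forget, which suits one message at a time, and spamd only accepts it when started with --allow-tell. Below is a sketch. The helper name is hypothetical. Passing -u qscand so that learning lands in the same database the scans use is an assumption based on the user=qscand in the log.

```shell
# Hypothetical helper: learn one message through spamd with spamc -L.
# Requires spamd to be running with --allow-tell.
# usage: spamc_learn spam|ham|forget message-file [user]
spamc_learn() {
    case "$1" in
        spam|ham|forget) ;;
        *) echo "usage: spamc_learn spam|ham|forget file [user]" >&2; return 2 ;;
    esac
    spamc -u "${3:-qscand}" -L "$1" < "$2"
}
```

Where --allow-tell has to be added depends on how Plesk starts spamd on your system.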

I have some Spam emails that I personally received in an offline folder in
Outlook. If we create a mailbox at /var/qmail/mailnames/domain.com/spam and
resend those emails to it, would I then run:

sa-learn --spam /var/qmail/mailnames/domain.com/spam
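
Plesk keeps each mailbox as a Maildir. A mailbox called spam would therefore most likely live at /var/qmail/mailnames/domain.com/spam/Maildir, with the messages themselves in its cur and new subdirectories. sa-learn accepts a directory and learns each file in it. There are two caveats. First, run the command as the account whose database spamd uses (qscand, per the log), or it will train that account's database instead. Second, messages re-sent from Outlook will likely carry your own server's headers rather than the originals, which weakens what Bayes learns from them. The helper name is hypothetical.

```shell
# Hypothetical helper: train Bayes on every message in a Maildir.
# usage: learn_maildir spam|ham /var/qmail/mailnames/domain.com/spam/Maildir
learn_maildir() {
    case "$1" in
        spam|ham) ;;
        *) echo "usage: learn_maildir spam|ham MAILDIR" >&2; return 2 ;;
    esac
    for sub in cur new; do
        if [ -d "$2/$sub" ]; then
            sa-learn --"$1" --showdots "$2/$sub" || return 1
        fi
    done
}
```

The account running it also needs read access to the mailbox files, which Plesk may own as a different user.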

It would be easier if we could do this from the command line for individual
messages identified in the maillog.
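
The maillog only records each message's ID (the mid= field in spamd's line), not its content. Learning a message from the command line therefore means finding its file on disk first. If the message has been delivered into a Plesk Maildir, a fixed-string search for that ID will locate it, and sa-learn can then be pointed at the file. According to the sa-learn documentation, learning as spam a message that was previously learned as ham makes it forget the ham learning first, which covers the unlearning mentioned below. Below is a sketch. The helper names are hypothetical, and /var/qmail/mailnames is assumed as the search root.

```shell
# Hypothetical helpers: locate delivered copies of a message by its
# Message-ID (the mid=<...> value from spamd's log line) and learn them.

# usage: find_by_msgid '<id@host>' [search-root]
find_by_msgid() {
    grep -rlF -- "$1" "${2:-/var/qmail/mailnames}"
}

# usage: learn_by_msgid spam|ham '<id@host>' [search-root]
# Review the find_by_msgid output first: a bounce quoting the ID would match too.
learn_by_msgid() {
    case "$1" in
        spam|ham) ;;
        *) echo "usage: learn_by_msgid spam|ham MSGID [ROOT]" >&2; return 2 ;;
    esac
    find_by_msgid "$2" "$3" | while IFS= read -r file; do
        sa-learn --"$1" "$file"
    done
}
```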

I hope you can help.  

Many thanks in advance, Christoph 

-----Original Message-----
From: John Hardin [mailto:jhardin@impsec.org] 
Sent: 10 June 2013 14:24
To: users@spamassassin.apache.org
Subject: Re: Single images with random wording & general rules

On Mon, 10 Jun 2013, emailitis.com wrote:

> I tried to send the source from one such email but it was rejected
> with a Spam score of 13:
>
> Remote host said: 552 spam score (13.6) exceeded threshold
>
> HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,URIBL_BLACK,URIBL_DBL_SPAM,URIBL_JP_SURBL,URIBL_RHS_DOB,URIBL_WS_SURBL
>
> On our server, it passed with:
>
> Jun  8 15:12:40 plesk3 spamd[2692]: spamd: result: . 1 - BAYES_00,HTML_EXTRA_CLOSE,HTML_IMAGE_RATIO_06,HTML_MESSAGE,REMOVE_BEFORE_LINK,T_REMOTE_IMAGE,URIBL_BLACK

BAYES_00 is probably your largest problem.

Do you have autolearn turned on? If you are manually training, have you
retained your training corpora so that they can be reviewed for
misclassifications? Do you allow your users to train without review?

Depending on the answers to the above, you are probably looking at wiping
and retraining your Bayes database from scratch. It is possible that
training these messages as spam will correct things, but for best results
you'll need to unlearn the messages that led them to be scored as ham
initially, and determine why they were learned as ham in the first place so
you can prevent that happening in the future.

-- 

  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The Tea Party wants to remove the Crony from Crony Capitalism.
   OWS wants to remove Capitalism from Crony Capitalism.
                                                     -- Astaghfirullah
-----------------------------------------------------------------------
  375 days since the first successful private support mission to ISS (SpaceX)

