Mailing-List: contact users-help@spamassassin.apache.org; run by ezmlm
Precedence: bulk
Received-SPF: pass (nike.apache.org: local policy)
Subject: Re: I'm doing it wrong.
From: Karsten Bräckelmann <guenther@rudersport.de>
To: users@spamassassin.apache.org
In-Reply-To: <ad3e9eb0d622296f76b13d0e89d8b3a4@gnukai.com>
References: <ad3e9eb0d622296f76b13d0e89d8b3a4@gnukai.com>
Content-Type: text/plain
Date: Fri, 23 May 2014 05:33:31 +0200
Message-Id: <1400816011.4835.140.camel@monkey>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit

On Thu, 2014-05-22 at 20:14 -0600, Kai Meyer wrote:
> I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin 
> (user prefs via mysql) server that I've been running for a few years 

The configuration you pasted below does not show any user_scores_* options.
Unless there are more cf files you omitted, you are not using user_prefs
via SQL.
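
For reference, SQL-based user prefs would normally show up in a site-wide
cf file as something like the following. This is only a sketch: the
database name, host and credentials are placeholders. spamd typically also
has to be started with SQL config enabled (its -q / --sql-config switch).

  # Placeholder DSN and credentials, not taken from your setup
  user_scores_dsn               DBI:mysql:spamassassin:localhost
  user_scores_sql_username      sa_user
  user_scores_sql_password      secret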

> now. It's just a few of my private domains, not a lot of traffic. In the 
> last 6 months, the amount of spam getting through has gone from one or 
> two a week to 30 a day. I had sa-learn setup on imap folders called SPAM 
> and HAM running as root, so I just started tossing emails in there. It 

Training as root rather than as the system user receiving the mail (and
calling SA) only works with a site-wide Bayes setup. The pasted
configuration doesn't show that either, so you need to train as the
mail-receiving / scanning user.
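
A minimal sketch of a file-based site-wide Bayes setup in local.cf,
assuming the default DBM storage. The path is a placeholder; its directory
has to exist and be writable by every user that scans or trains.

  # Placeholder path: one shared database for root, spamd and the mail user
  bayes_path       /var/lib/spamassassin/bayes/bayes
  # Let all of those users read and write the shared files (default 0700)
  bayes_file_mode  0777

With that in place, whoever runs sa-learn trains the same database the
scanner reads.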

> seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold 
> to dump to my JUNK folder is 3, and I have spamchk sideline things above 
> 7). I still get legitimate email in the 2-3 range, but I haven't had 
> legitimate email above 3 in a long time. After a bit, the 2s became 3s 
> and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did 
> this habitually for more than a month, and the progress seemed to stop. 
> I googled around a bit and realized that I didn't do a very good job 
> setting up rules, so I added pyzor and razor2, and they seem functional. 
> Spam got better, and it's down to maybe 10 a day, but they still range 
> all the way up to 5.

Mixing in Razor or Pyzor certainly can help. But considering "setting up
rules" your job is a bit odd. Local rules can of course help, too, but
they are (a) an advanced topic and (b) not something a regular SA
installation requires, since the stock rule set (kept current with
sa-update) is meant to cover that. You didn't mention any local rules in
your configuration either, so it's unclear what you mean here.


> What really gets me is that if I take an email that scores -2, strip 
> the X-Spam* headers, and run it through spamc by hand (even as the spamd 
> user) just like the spamchk script does, it scores around a 4. I have 

It is not necessary to strip the X-Spam headers. SA ignores them if
present.

You just mixed in a third user, spamd, in addition to root and the real
mail-receiving user. Without site-wide Bayes, each of these users may end
up with its own Bayes database, so you are comparing apples to oranges,
and now peaches. All yummy, though not the same.

What is that "spamchk script" you just mentioned, and how does it fit
into your setup? You should review your entire mail-processing chain.
Describing it in detail might help here, too.

> one here that scores a 4.1 if it comes through the mail, and a 6.6 if I 
> run it manually. What can I do to reconcile these scores? I would like 
> the scores I'm getting from the commandline over the ones I'm getting 
> through postfix, but I don't know the system well enough to know what is 
> causing the difference.

Highlighting the differences, with the rule hits common to both removed:

> ================== Via postfix

>   0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area

> ================ Via commandline (cat test.mail | sudo -u spamd /usr/bin/spamc -u <myemail> > postsa.mail)

>   2.5 URIBL_DBL_SPAM         Contains an URL listed in the DBL blocklist

The Bayesian probability is nearly identical, differing only by a few
thousandths.

Hitting URIBL_DBL_SPAM in the later manual check but not at receiving
time may be due to timing: the URI may simply have been listed only after
the message arrived.

What's odd is that the subsequent manual check is *missing* the HTML
image ratio rule hit. Something altered the message in between.


> ================ /etc/mail/spamassassin.cf (I added the last 4 lines in 
> a desperate attempt to see something change, but to no effect)
> /etc/mail/spamassassin/local.cf

Which one? The latter, spamassassin/local.cf, is the default (though
packager-dependent); the former (a typo?) would be custom, if it exists
at all.

Snip, skipping to the last four lines:

> auto_learn 0
> use_razor2
> use_dcc
> use_pyzor

auto_learn is not a valid option. That would be bayes_auto_learn.

The other use_* options require arguments (0 or 1). The lines as pasted
do not enable them, and instead produce lint warnings. See

  spamassassin --lint

That lint check is a good starting point anyway...
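
Corrected, those four lines would read something like this. Whether you
really want auto-learning off is your call; this only fixes the syntax.
The use_* lines also assume the Razor2, DCC and Pyzor plugins are loaded
via loadplugin in the *.pre files (DCC is typically commented out in
v310.pre by default) and that the client software is installed.

  # Correct option name for turning Bayes auto-learning off
  bayes_auto_learn  0
  # Boolean options need an explicit value; plugins must be loaded too
  use_razor2        1
  use_dcc           1
  use_pyzor         1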


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}