spamassassin-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Johnson <...@indietorrent.org>
Subject Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new
Date Wed, 17 Apr 2013 22:47:42 GMT


On 4/17/2013 5:05 PM, Kris Deugau wrote:
> Ben Johnson wrote:
>> Is there anything else that would cause Bayes tests not be performed? I
>> ask because other types of tests are disabled automatically under
>> certain circumstances (e.g., network tests), and I'm wondering if there
>> is some obscure combination of factors that causes Bayes tests not to be
>> performed.
> 
> Do you have bayes_sql_override_username set?  (This forces use of a
> single Bayes DB for all SA calls that reference this configuration file
> set.)
> 
> If not, you may be getting a Bayes DB for each user on your system;
> IIRC this is supported (sort of) and default with Amavis.
> 
> -kgd
> 

Thanks for jumping-in here, Kris.

Yes, I do have the following in my SA local.cf:

bayes_sql_override_username amavis

So, all users are sharing the same Bayes DB. I train Bayes daily and the
token count, etc., etc. all look good and correct.

Just a quick update to my previous post.

The Pyzor and Razor2 score information is indeed coming through for the
handful of messages that have landed since I made those configuration
changes. So, all seems to be well on the Pyzor / Razor2 front.

However, I still don't see any evidence that Bayes testing was performed
on the messages that are "slipping through".

It bears mention that *most* messages do indeed show evidence of Bayes
scoring.

--- OH, SNAP! I found the root cause. ---

Well, when I went to confirm the above statement, regarding most
messages showing evidence of Bayes scoring, I realized that *none* show
evidence of it since 3/23! No wonder all of this garbage is slipping
through!

I recognized the date 3/23 immediately; it was the date on which we
upgraded ISPConfig from 3.0.4.6 to 3.0.5. (For those who have no
knowledge of ISPConfig, it is basically a FOSS solution to managing vast
numbers of websites, domains, mailboxes, etc., as the name implies.)

We also updated OS packages (security only) on that day.

After diff-ing all of the relevant service configuration files
(amavis-new, spamassassin, postfix, dovecot, etc.) I couldn't find any
discrepancies.

Then, I tried:

-----------------------------------------------------
# spamassassin -D -t < /tmp/msg.txt 2>&1 | egrep '(bayes:|whitelist:|AWL)'

Apr 17 15:36:08.723 [23302] dbg: bayes: learner_new
self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x2fbc508),
bayes_store_module=Mail::SpamAssassin::BayesStore::MySQL
Apr 17 15:36:08.746 [23302] dbg: bayes: using username: amavis
Apr 17 15:36:08.746 [23302] dbg: bayes: learner_new: got
store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x3305358)
Apr 17 15:36:08.758 [23302] dbg: bayes: database connection established
Apr 17 15:36:08.758 [23302] dbg: bayes: found bayes db version 3
Apr 17 15:36:08.759 [23302] dbg: bayes: Using userid: 1
Apr 17 15:36:08.914 [23302] dbg: bayes: corpus size: nspam = 6083, nham
= 2334
Apr 17 15:36:08.920 [23302] dbg: bayes: tok_get_all: token count: 163
Apr 17 15:36:08.921 [23302] dbg: bayes: tok_get_all: SQL error: Illegal
mix of collations for operation ' IN '
Apr 17 15:36:08.921 [23302] dbg: bayes: cannot use bayes on this
message; none of the tokens were found in the database
Apr 17 15:36:08.921 [23302] dbg: bayes: not scoring message, returning undef
Apr 17 15:36:13.116 [23302] dbg: timing: total 5159 ms - init: 804
(15.6%), parse: 10 (0.2%), extract_message_metadata: 99 (1.9%),
poll_dns_idle: 3426 (66.4%), get_uri_detail_list: 0.24 (0.0%),
tests_pri_-1000: 11 (0.2%), compile_gen: 133 (2.6%), compile_eval: 18
(0.3%), tests_pri_-950: 5 (0.1%), tests_pri_-900: 5 (0.1%),
tests_pri_-400: 12 (0.2%), check_bayes: 8 (0.1%), tests_pri_0: 804
(15.6%), dkim_load_modules: 23 (0.4%), check_dkim_signature: 5 (0.1%),
check_dkim_adsp: 99 (1.9%), check_spf: 61 (1.2%), check_razor2: 211
(4.1%), check_pyzor: 138 (2.7%), tests_pri_500: 3387 (65.7%)
-----------------------------------------------------

Check-out the message buried half-way down:

bayes: tok_get_all: SQL error: Illegal mix of collations for operation '
IN '

I have run into this unsightly message before, but in that case, I could
see the entire query, which enabled me to change the collations accordingly.

In this case, I have no idea what the original query might have been.

Further, I have no idea what changed that introduced this problems on 3/23.

Was it a MySQL upgrade? Was it an ISPConfig change?

Has anybody else run into this?

Thanks again,

-Ben

Mime
View raw message