spamassassin-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Woick <a...@wombaz.de>
Subject Re: Bayes innodb problems
Date Fri, 28 Sep 2007 15:48:53 GMT
>>> processing has ground down to really slow. I'm seeing some incredibly
>>> long queries now in my slow-query log, such as:

>> Try an "optimize table <tabname>" for each of the sa tables. You just 
>> filled the database from scratch, so perhaps the counters/statistics do not 
>> reflect the actual value distribution yet.

> Actually this bayes DB has been around for a few months, and has been
> built up over time.

I experienced a problem with a custom innodb-stored database, where at 
one point a certain query suddenly took minutes instead of microseconds. 
Using EXPLAIN, I saw that indexes were used not as intended any more. I 
changed the query to include a FORCE INDEX clause, but later detected 
that an OPTIMIZE TABLE remedied the situation also. I saw this on two 
different databases that were built up over time from totally empty to 
many million records.
OPTIMIZE TABLE on innodb tables only defragments the index and updates 
the index statistics.

> This does make me wonder what regular DB maintenance tasks should be
> performed on the bayes DB.  It sounds like some people let the code
> auto-expire, while some run cron jobs to expire data?

I added a TIMESTAMP column to the bayes_seen table, so I can expire this 
table by date. The other tables are maintained internally by SA, so 
don't do anything with them.

> Should I be running an optimize table every so often?

No, I don't think so. In my above mentioned databases, the problem never 
came back, and I didn't run optimize table since then (more than 1 year 
ago).

> mysql> explain SELECT count(*) FROM bayes_token WHERE id = '4' AND
> ('1190846660' - atime) > '345600';
> +----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
> | id | select_type | table       | type | possible_keys            | key
> | key_len | ref   | rows   | Extra                    |
> +----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
> |  1 | SIMPLE      | bayes_token | ref  | PRIMARY,bayes_token_idx2 |
> bayes_token_idx2 | 2       | const | 229946 | Using where; Using index | 
> +----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+

The "2" in the key_len column makes me wonder. That means only 2 bytes 
of the index are used. The index at my system includes id (INTEGER=4 
bytes) and atime (INTEGER=4 bytes). 2 Bytes would be half of the id 
field, which is impossible. The key_len field always shows 4 at my 
system, so it uses the id part (the first half) of the index for lookup. 
Have you changed id to smallint? Then you have altered the table which 
has the same (side-)effect as optimize table. I don't think truncating 
the field to 2 bytes did the speedup, I think the side-effect of 
recreating the index did it.

Mime
View raw message