spamassassin-users mailing list archives

From Lars Jørgensen <l...@kb.dk>
Subject RE: Conversion Spamassassin(bayes) database to SDBM
Date Fri, 05 Aug 2011 13:09:25 GMT
> Hello, thanks for the post. Firstly, you are wrong about performance of my
> computer - I dont have supercomputer. I didnt run 10 000 000 messages
> through spamc/spamd. In fact the number is 100 000 000 and it means the max.
> size of message I run through spamc/spamd(notice that the number is behind
> -s parametr, s as SIZE). The result about 85 minutes is for about 17000
> messages (354MB). The average is 3,33 sec per message.

That number seems pretty high. I'm not experienced enough in the general deployment of SA
to say anything definite, so I can only contribute numbers and hints from our own system. We
use amavisd-new, which doesn't spawn SA for each message but keeps it running all the time,
and that saves a lot of time.

Amavisd/postfix/SA can be configured to offer a lot of parallelism and can thus take full
advantage of available system resources. Currently we have 32 parallel processes running on
a rather small machine (2 cores, 3 GB RAM), and our average per message is around 1.5 seconds.

If you need to improve performance, I suggest you start by looking at the machine. Do you have
a lot of iowait? Get faster disks or spread access across multiple drives. Is the machine
swapping? Add more memory. Is CPU usage constantly high? Add more CPUs.
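
As a rough illustration of the iowait and CPU questions above, here is a minimal Python
sketch that compares two samples of the aggregate "cpu" line from /proc/stat. It assumes a
Linux system; the function names are made up for this example, and tools like vmstat or
iostat report the same figures (plus swap activity, which this sketch does not cover).

```python
# Sketch only: estimate CPU busy and iowait share from two /proc/stat samples.
# Assumes Linux; the helper names below are hypothetical, not part of any tool.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def usage_between(before, after):
    """Return (busy_percent, iowait_percent) for the interval between two samples."""
    delta = {key: after[key] - before[key] for key in before}
    total = sum(delta.values())
    if total <= 0:
        return 0.0, 0.0
    iowait = 100.0 * delta["iowait"] / total
    busy = 100.0 * (total - delta["idle"] - delta["iowait"]) / total
    return busy, iowait


def read_cpu_sample(path="/proc/stat"):
    """Read one sample; the first line of /proc/stat is the aggregate 'cpu' line."""
    with open(path) as fh:
        return parse_cpu_line(fh.readline())
```

To use it, call read_cpu_sample() twice a few seconds apart while mail is being scanned and
pass both results to usage_between(). A large iowait share points at the disks, a large busy
share at the CPUs.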

Then start looking at the timing reports (I don't know if these are provided by SA or amavisd,
so you might not have them in your setup). Each and every mail through the system has a timing
report logged so you can see exactly how much time each step of the process took. It looks
like this:

Aug  5 00:01:53 post amavis[30559]: (30559-07) TIMING-SA total 1438 ms - parse: 1.60 (0.1%),
extract_message_metadata: 35 (2.5%), get_uri_detail_list: 4 (0.3%), tests_pri_-1000: 13 (0.9%),
tests_pri_-950: 1.54 (0.1%), tests_pri_-900: 1.55 (0.1%), tests_pri_-400: 33 (2.3%), check_bayes:
31 (2.2%), tests_pri_0: 1280 (89.0%), check_dkim_adsp: 109 (7.6%), check_spf: 40 (2.8%), poll_dns_idle:
35 (2.4%), check_dcc: 525 (36.5%), check_razor2: 492 (34.2%), check_pyzor: 0.25 (0.0%), tests_pri_500:
28 (1.9%), learn: 23 (1.6%), get_report: 1.45 (0.1%)

Here you can see that check_dcc and check_razor2 are pretty expensive, because they have to
query external servers. We are a low-traffic site (fewer than 50k messages a day), so that's
not a problem for us. But if you have a high volume of traffic and the tests that depend on DNS
lookups take a long time, you might consider adding a local DNS server to your setup. See
http://www.spamtips.org/2011/07/spamassassin-why-run-your-own-dns.html
for further information.
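
If you want to rank the steps of such a timing line instead of reading it by eye, a small
Python sketch like the following would do. It assumes the "TIMING-SA total N ms - name: ms
(pct%), ..." layout shown in the log excerpt; the function names are made up for this
example, and other amavisd or SA versions may log a different format.

```python
import re

# Sketch only: parse a TIMING-SA log line of the form shown in the excerpt.
# One entry looks like "check_dcc: 525 (36.5%)".
ENTRY = re.compile(r"([\w-]+):\s*([\d.]+)\s*\(([\d.]+)%\)")
HEADER = re.compile(r"TIMING-SA total (\d+) ms - (.*)", re.S)


def parse_timing(line):
    """Return (total_ms, [(step, ms, percent), ...]) or None if no report is found."""
    match = HEADER.search(line)
    if match is None:
        return None
    total_ms = int(match.group(1))
    steps = [(name, float(ms), float(pct))
             for name, ms, pct in ENTRY.findall(match.group(2))]
    return total_ms, steps


def slowest_steps(line, count=3):
    """Return the `count` most expensive steps, most expensive first."""
    parsed = parse_timing(line)
    if parsed is None:
        return []
    _, steps = parsed
    return sorted(steps, key=lambda step: step[1], reverse=True)[:count]
```

For the line quoted above this puts tests_pri_0, check_dcc and check_razor2 at the top. Note
that the percentages in that line add up to more than 100%, so some entries evidently overlap
and should not simply be summed.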


-- 
Lars
