Return-Path:
X-Original-To: apmail-spamassassin-dev-archive@www.apache.org
Delivered-To: apmail-spamassassin-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id B342F9D84
    for ; Tue, 24 Apr 2012 07:48:44 +0000 (UTC)
Received: (qmail 43396 invoked by uid 500); 24 Apr 2012 07:48:44 -0000
Delivered-To: apmail-spamassassin-dev-archive@spamassassin.apache.org
Received: (qmail 41935 invoked by uid 500); 24 Apr 2012 07:48:06 -0000
Mailing-List: contact dev-help@spamassassin.apache.org; run by ezmlm
Precedence: bulk
list-help:
list-unsubscribe:
List-Post:
List-Id:
Delivered-To: mailing list dev@spamassassin.apache.org
Received: (qmail 41614 invoked by uid 99); 24 Apr 2012 07:47:53 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 24 Apr 2012 07:47:53 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.115] (HELO eir.zones.apache.org) (140.211.11.115)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 24 Apr 2012 07:47:50 +0000
Received: by eir.zones.apache.org (Postfix, from userid 80)
    id 02FB33542; Tue, 24 Apr 2012 07:47:28 +0000 (UTC)
From: bugzilla-daemon@bugzilla.spamassassin.org
To: dev@spamassassin.apache.org
Subject: [Bug 6793] PATCH reduce sa-awl memory usage
Date: Tue, 24 Apr 2012 07:47:11 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Spamassassin
X-Bugzilla-Component: Tools
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: vitalyb@telenet.dn.ua
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: dev@spamassassin.apache.org
X-Bugzilla-Target-Milestone: Undefined
X-Bugzilla-Changed-Fields: CC
Message-ID:
In-Reply-To:
References:
X-Bugzilla-URL: https://issues.apache.org/SpamAssassin/
Auto-Submitted: auto-generated
Content-Type: text/plain;
    charset="UTF-8"
MIME-Version: 1.0

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6793

Vitaly V. Bursov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vitalyb@telenet.dn.ua

--- Comment #4 from Vitaly V. Bursov 2012-04-24 07:47:11 UTC ---
The new patch has the same root issue as the original sa-awl: it potentially
stores a lot of keys in memory.

I had to reduce a DB from 8M keys to 800K, and the proposed algorithm would
keep around 3M keys in memory. If 800K keys eat 1G of RAM on x86-64... well,
that is not much better than the original.

Looks like there are a few options then.

1. Create a new DB file and replace the old one with it. It is probably hard
   to do this correctly while SA is running.

2. Leave missed or duplicate keys as they are. Duplicates are probably
   harmless, and missed keys will be handled on subsequent runs. Ugly.

3. Modify the algorithm. There are a few options here as well:

   a) Check the size of @delete_keys; if it is over the limit, stop the
      iteration, delete the keys, and start all over again (very slow).

   b) If we get a 'totscore' key, also fetch and check the matching 'count'
      key (strip the /totscore$/ part off the key name); if it is a 'count'
      key, proceed as usual. If the keys should be deleted, delete the
      current one and remember the other in @delete_keys. The trick is that
      the keys should not be deleted all at once afterwards. Instead, on
      every iteration the key name should be checked against @delete_keys,
      and if it is there, the key must be deleted and removed from
      @delete_keys. The size of @delete_keys should also be checked to keep
      it from consuming too much memory.

   c) Store the keys that should be deleted in another on-disk DB.

A few more thoughts.

It is hard to predict how efficient method 3b is going to be compared to 3a,
as the keys are randomly sorted (I think). It is probably best to have two
implementations and make it an option: a 'fast' one that is not entirely
correct, for huge DBs (like 2), and, as the default, a correct but slower one
(like 3b) for small to medium-sized DBs.

Hope this helps,
Thanks.
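To make option 3b a bit more concrete, here is a rough sketch in Python rather than the Perl that sa-awl is written in. It is not the patch itself and rests on several assumptions: a plain dict stands in for the on-disk DB, each entry is assumed to be a 'count' key plus a partner key ending in '|totscore', and the names prune_awl, min_count and max_pending are made up for illustration. What to do when the pending set is full is not specified above; this sketch simply leaves the pair alone for a later run.

```python
TOTSCORE_SUFFIX = "|totscore"  # assumed key layout: "<entry>" and "<entry>|totscore"


def prune_awl(db, min_count, max_pending=1000):
    """Delete count/totscore pairs whose count is below min_count.

    Returns the number of keys deleted. `db` is any dict-like mapping.
    """
    pending = set()  # partner keys that still have to be reached and deleted
    deleted = 0
    # list(db) is needed only because a Python dict cannot be modified while
    # it is being iterated; a real on-disk DB would be walked key by key.
    for key in list(db):
        if key in pending:
            # Partner of an entry deleted earlier in this pass.
            del db[key]
            pending.discard(key)
            deleted += 1
            continue
        if key.endswith(TOTSCORE_SUFFIX):
            count_key = key[:-len(TOTSCORE_SUFFIX)]
            partner = count_key
        else:
            count_key = key
            partner = key + TOTSCORE_SUFFIX
        count = db.get(count_key)
        if count is None or count >= min_count:
            continue  # orphaned totscore key, or an entry worth keeping
        if partner in db:
            if len(pending) >= max_pending:
                continue  # assumption: leave this pair for a later run
            pending.add(partner)
        del db[key]
        deleted += 1
    return deleted
```

Because a key that is not in the pending set cannot have had its partner deleted yet, the set only ever holds keys that the iteration has still to reach, so its size depends on how far apart the two keys of a pair happen to be in iteration order.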