Return-Path: Delivered-To: apmail-db-derby-user-archive@www.apache.org Received: (qmail 11810 invoked from network); 10 Feb 2008 23:13:55 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 10 Feb 2008 23:13:55 -0000 Received: (qmail 73433 invoked by uid 500); 10 Feb 2008 23:13:47 -0000 Delivered-To: apmail-db-derby-user-archive@db.apache.org Received: (qmail 73411 invoked by uid 500); 10 Feb 2008 23:13:47 -0000 Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Discussion" Delivered-To: mailing list derby-user@db.apache.org Received: (qmail 73400 invoked by uid 99); 10 Feb 2008 23:13:47 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 10 Feb 2008 15:13:47 -0800 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: local policy) Received: from [203.217.22.128] (HELO file1.syd.nuix.com.au) (203.217.22.128) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 10 Feb 2008 23:13:03 +0000 Received: from host68.syd.nuix.com.au (host68.syd.nuix.com.au [192.168.222.68]) by file1.syd.nuix.com.au (Postfix) with ESMTP id 195AB4A81C0 for ; Mon, 11 Feb 2008 10:12:30 +1100 (EST) From: Daniel Noll Organization: Nuix Pty Ltd To: "Derby Discussion" Subject: Re: COUNT() optimisation Date: Mon, 11 Feb 2008 10:10:35 +1100 User-Agent: KMail/1.9.6 (enterprise 0.20070907.709405) References: <200802080925.35160.daniel@nuix.com> <47ACADBD.3030603@gmail.com> In-Reply-To: <47ACADBD.3030603@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200802111010.35312.daniel@nuix.com> X-Virus-Checked: Checked by ClamAV on apache.org On Saturday 09 February 2008 06:30:05 Army wrote: > Can you can provide a simple example to demonstrate the behavior that > you are seeing? I tried playing around with some simple (nonsense) > tables but I wasn't able to come up with anything that sounds like what > you are describing. > > Are you simply referring to something like: > > select count(*) from jobitems; > > vs > > select count(*) from jobitems where jobid = 4288; > select count(*) from jobitems where jobid > 1000; Schema: CREATE TABLE jobitems (jobid INTEGER NOT NULL, itemid INTEGER NOT NULL, state INTEGER NOT NULL) CREATE INDEX jobitem_state_idx ON jobitems(state) CREATE INDEX jobitem_jobid_state_idx ON jobitems(jobid, state) I created bogus test data where jobid was always 0 and state was cycled between 0,1,2. Query: SELECT COUNT(*) FROM jobitems WHERE jobid = 0 AND state = 0 Initial results were around 600ms for 100000 rows, when I realised it would probably speed up if I put in the second index listed above. The performance is still linear but it does speed up as now it doesn't have to iterate through every row in the table, only the ones which match the WHERE clause. 1-100 rows: <1ms 1000 rows: 2ms 10000 rows: 7ms 100000 rows: 66ms 1000000 rows: 1176ms I naively thought an index would also result in being able to determine the count quickly for any given query on that index. If I log the query plan it confirms what's happening. It estimates the number of rows which will come back and then says it's iterating through each row to determine the count. I don't suppose there is some advanced way to get an approximate count. When the number of items is so large it seems a little pointless to get an accurate count anyway; within 1% would be acceptable if there were a way to get it quickly. My initial thought is to have total columns on the table I'm joining to, and update both whenever anything is changed. Problem is that two users adding or removing items at the same time would be updating the same table and I'm not sure how clever Derby is if two users do an UPDATE at the same time which adds or subtracts from the existing value. If it merged them together as expected then I'm guessing would be a viable solution. Daniel