db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Noll <dan...@nuix.com>
Subject Re: COUNT() optimisation
Date Sun, 10 Feb 2008 23:10:35 GMT
On Saturday 09 February 2008 06:30:05 Army wrote:
> Can you can provide a simple example to demonstrate the behavior that
> you are seeing?  I tried playing around with some simple (nonsense)
> tables but I wasn't able to come up with anything that sounds like what
> you are describing.
>
> Are you simply referring to something like:
>
>    select count(*) from jobitems;
>
> vs
>
>   select count(*) from jobitems where jobid = 4288;
>   select count(*) from jobitems where jobid > 1000;

Schema:
CREATE TABLE jobitems (jobid INTEGER NOT NULL, itemid INTEGER NOT NULL, state 
INTEGER NOT NULL)
CREATE INDEX jobitem_state_idx ON jobitems(state)
CREATE INDEX jobitem_jobid_state_idx ON jobitems(jobid, state)

I created bogus test data where jobid was always 0 and state was cycled 
between 0,1,2.

Query:
SELECT COUNT(*) FROM jobitems WHERE jobid = 0 AND state = 0

Initial results were around 600ms for 100000 rows, when I realised it would 
probably speed up if I put in the second index listed above.  The performance 
is still linear but it does speed up as now it doesn't have to iterate 
through every row in the table, only the ones which match the WHERE clause.

1-100 rows: <1ms
1000 rows: 2ms
10000 rows: 7ms
100000 rows: 66ms
1000000 rows: 1176ms

I naively thought an index would also result in being able to determine the 
count quickly for any given query on that index.

If I log the query plan it confirms what's happening.  It estimates the number 
of rows which will come back and then says it's iterating through each row to 
determine the count.

I don't suppose there is some advanced way to get an approximate count.  When 
the number of items is so large it seems a little pointless to get an 
accurate count anyway; within 1% would be acceptable if there were a way to 
get it quickly.

My initial thought is to have total columns on the table I'm joining to, and 
update both whenever anything is changed.  Problem is that two users adding 
or removing items at the same time would be updating the same table and I'm 
not sure how clever Derby is if two users do an UPDATE at the same time which 
adds or subtracts from the existing value.  If it merged them together as 
expected then I'm guessing would be a viable solution.

Daniel

Mime
View raw message