db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Martin Hajduch (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3937) Select count(*) scans all the rows (and is therefore slow with big tables), is the amount of rows not available/known for example in index ?
Date Wed, 05 Nov 2008 21:38:44 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645333#action_12645333
] 

Martin Hajduch commented on DERBY-3937:
---------------------------------------

Ok, in the meantime I read more about the topic and Derby.

So back to my original question - is there any alternative to COUNT(*) and implementing of
paging over ordered set of (milion) rows with constant performance towards 'last' pages ?
If not at the moment - is there any development planned in this area (where I could possibly
contribute as well) ?
If not, is there any development planned in the area of speeding up the index scan so that
milion rows do not take 10, but let's say 1 second ? Even such constant improvement would
help in my particular case.

I tried several open source databases, found Derby to be the closest to my needs and I am
able to devote some resources in this direction - but would like to get some pointers towards
what to concentrate on to.

> Select count(*) scans all the rows (and is therefore slow with big tables), is the amount
of rows not available/known for example in index ?
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3937
>                 URL: https://issues.apache.org/jira/browse/DERBY-3937
>             Project: Derby
>          Issue Type: Improvement
>          Components: Performance
>         Environment: Any
>            Reporter: Martin Hajduch
>
> Create table with 5000000 rows. Create index on unique ID. Select count(*) on such table
is going to take quite some time.
> Shouldn't the index contain amount of indexed rows and the value taken from there ?
> Additionally, queries of the form select count(*) from table where col1=value; take lots
of time (depending on amount of rows satisfying WHERE clause) even if index on col1 exists.
Isn't it possible to find first and last occurence in the index, and then calculate amount
of rows more effectively then scanning through all of them ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message