db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-3937) Select count(*) scans all the rows (and is therefore slow with big tables), is the amount of rows not available/known for example in index ?
Date Wed, 05 Nov 2008 17:19:45 GMT

     [ https://issues.apache.org/jira/browse/DERBY-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mike Matrigali updated DERBY-3937:
----------------------------------


The current implementations of indexes and base tables do not maintain an exact count of rows.
 While on the surface for a single user using the table this seems simple, there are some
complicated things to take into account when implementing such a thing for a multi-user, transactional,
row locked system.  Coordinating a single count across multiple users could 
create additional blocking locks which many applications would not want.  

For an index there is no way to tell how many rows given the first and last instance of a
range in an index.  The number of rows per page is variable.  And often physically in both
the index and base table there are entries that are actually marked deleted (which may be
committed deleted or not), which should not be included in such a count.  

> Select count(*) scans all the rows (and is therefore slow with big tables), is the amount
of rows not available/known for example in index ?
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3937
>                 URL: https://issues.apache.org/jira/browse/DERBY-3937
>             Project: Derby
>          Issue Type: Improvement
>          Components: Performance
>         Environment: Any
>            Reporter: Martin Hajduch
>
> Create table with 5000000 rows. Create index on unique ID. Select count(*) on such table
is going to take quite some time.
> Shouldn't the index contain amount of indexed rows and the value taken from there ?
> Additionally, queries of the form select count(*) from table where col1=value; take lots
of time (depending on amount of rows satisfying WHERE clause) even if index on col1 exists.
Isn't it possible to find first and last occurence in the index, and then calculate amount
of rows more effectively then scanning through all of them ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message