db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3937) Select count(*) scans all the rows (and is therefore slow with big tables), is the amount of rows not available/known for example in index ?
Date Fri, 07 Nov 2008 12:52:44 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645753#action_12645753
] 

Knut Anders Hatlen commented on DERBY-3937:
-------------------------------------------

> So back to my original question - is there any alternative to
> COUNT(*) and implementing of paging over ordered set of (milion)
> rows with constant performance towards 'last' pages ?

One alternative is to create insert and delete triggers on the table
and maintain the row count in a separate table. For example:

ij> create table t(x int);
0 rows inserted/updated/deleted
ij> create table t_row_count(row_count int);
0 rows inserted/updated/deleted
ij> insert into t_row_count values 0;
1 row inserted/updated/deleted
ij> create trigger t_insert after insert on t for each row update t_row_count set row_count
= row_count + 1;
0 rows inserted/updated/deleted
ij> create trigger t_delete after delete on t for each row update t_row_count set row_count
= row_count - 1;
0 rows inserted/updated/deleted
ij> insert into t values 1,2,3;
3 rows inserted/updated/deleted
ij> select count(*) from t;
1          
-----------
3          

1 row selected
ij> select row_count from t_row_count;
ROW_COUNT  
-----------
3          

1 row selected
ij> insert into t values 4,5;
2 rows inserted/updated/deleted
ij> select row_count from t_row_count;
ROW_COUNT  
-----------
5          

1 row selected
ij> delete from t where x > 2;
3 rows inserted/updated/deleted
ij> select row_count from t_row_count;
ROW_COUNT  
-----------
2          

1 row selected
ij> select count(*) from t;
1          
-----------
2          

1 row selected

> Select count(*) scans all the rows (and is therefore slow with big tables), is the amount
of rows not available/known for example in index ?
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3937
>                 URL: https://issues.apache.org/jira/browse/DERBY-3937
>             Project: Derby
>          Issue Type: Improvement
>          Components: Performance
>         Environment: Any
>            Reporter: Martin Hajduch
>
> Create table with 5000000 rows. Create index on unique ID. Select count(*) on such table
is going to take quite some time.
> Shouldn't the index contain amount of indexed rows and the value taken from there ?
> Additionally, queries of the form select count(*) from table where col1=value; take lots
of time (depending on amount of rows satisfying WHERE clause) even if index on col1 exists.
Isn't it possible to find first and last occurence in the index, and then calculate amount
of rows more effectively then scanning through all of them ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message