db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3937) Select count(*) scans all the rows (and is therefore slow with big tables), is the amount of rows not available/known for example in index ?
Date Thu, 25 Jun 2009 08:57:07 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723959#action_12723959
] 

Knut Anders Hatlen commented on DERBY-3937:
-------------------------------------------

I don't know how to access the table's estimated row count, but
SYS.SYSSTATISTICS contains the cardinality statistics for non-unique
indexes. These are not as accurate as the table's estimated row count,
but if you call SYSCS_UTIL.SYSCS_UPDATE_STATISTICS every now and then,
they should give a pretty good estimate. So provided that you have at
least one non-unique index on the table, and the statistics have been
created, a query such as this should give you the information you
need:

select st.statistics
from
        sys.sysstatistics st, sys.sysschemas sc, sys.systables t
where
        st.tableid = t.tableid and t.schemaid = sc.schemaid and
        sc.schemaname='APP' and t.tablename='T'

The query should return something like "numunique= 3 numrows= 9",
where numunique tells how many unique values the index contains, and
numrows tells the total number of rows in the index, which should be
the same as the number of rows in the table.

> Select count(*) scans all the rows (and is therefore slow with big tables), is the amount
of rows not available/known for example in index ?
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3937
>                 URL: https://issues.apache.org/jira/browse/DERBY-3937
>             Project: Derby
>          Issue Type: Improvement
>          Components: Performance
>         Environment: Any
>            Reporter: Martin Hajduch
>
> Create table with 5000000 rows. Create index on unique ID. Select count(*) on such table
is going to take quite some time.
> Shouldn't the index contain amount of indexed rows and the value taken from there ?
> Additionally, queries of the form select count(*) from table where col1=value; take lots
of time (depending on amount of rows satisfying WHERE clause) even if index on col1 exists.
Isn't it possible to find first and last occurence in the index, and then calculate amount
of rows more effectively then scanning through all of them ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message