db-derby-dev mailing list archives

From "Kristian Waagan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-3790) Investigate if request for update statistics can be skipped for certain kind of indexes, one instance may be unique indexes based on one column.
Date Tue, 22 May 2012 21:56:41 GMT

     [ https://issues.apache.org/jira/browse/DERBY-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Kristian Waagan updated DERBY-3790:

    Attachment: derby-3790-2a-mintor_test_improvements.diff

Dag, I've addressed most of your suggestions.
Some comments below:

- DisposableIndexStatistics#insertData
   * Added comments and dropped the reduced row count. It was a leftover from when I used far more rows in the tables.
   * Commented the constants; they're used to get different numbers of unique values in various
- The sleeps ...
   I have tried to address the one most likely to fail intermittently by adding a new method to IndexStatsUtil: getNewStatsTable. It first waits for the current stats to go away, then expects the same number of updated statistics entries to appear (with a timeout).
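
The wait-then-expect behaviour described above can be sketched roughly as follows. This is not the code from the patch: the class StatsWaiter, the method awaitNewStats, and the idea of identifying statistics entries by string ids are all assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

public class StatsWaiter {

    /**
     * Polls until none of the old entry ids are visible any more and as many
     * entries as there were old ones are present. Hypothetical helper, not
     * the actual IndexStatsUtil implementation.
     *
     * @throws IllegalStateException if the timeout expires first
     */
    public static Set<String> awaitNewStats(Supplier<Set<String>> currentIds,
                                            Set<String> oldIds,
                                            long timeoutMillis,
                                            long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            Set<String> now = new HashSet<>(currentIds.get());
            // Step 1: every old entry must be gone.
            boolean oldGone = Collections.disjoint(now, oldIds);
            // Step 2: the same number of (new) entries must have appeared.
            if (oldGone && now.size() == oldIds.size()) {
                return now;
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new IllegalStateException(
                        "timed out waiting for new stats, currently: " + now);
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }
}
```

Compared to a fixed sleep, a polling loop with a deadline returns as soon as the new entries are visible and fails with a clear message when they never show up.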

- Call "JDBC.assertDrainResultsHasData...
   I removed this, since the checkpoint invocation is missing and the operation is not required
for the test to work. Adding a checkpoint, you can observe the following change:

Tue May 22 23:22:01 CEST 2012 Thread[TestRunner-Thread,5,main] {istat} "APP"."STAT_SCUI": update scheduled, reason=[t-est=1039, i-est=20 => cmp=3.9502817175452365] (queueSize=1)

becomes (note t-est)

Tue May 22 23:24:52 CEST 2012 Thread[TestRunner-Thread,5,main] {istat} "APP"."STAT_SCUI": update scheduled, reason=[t-est=2001, i-est=20 => cmp=4.605670061029743] (queueSize=1)
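
For readers wondering how cmp relates to the two estimates: the logged numbers are consistent with cmp being the absolute difference of the natural logarithms of t-est and i-est. That relationship is inferred from the two log lines above, not stated in this message, and the class and method names below are made up for the sketch.

```java
public class IstatCmp {

    /**
     * Absolute difference of the natural logs of the table row estimate
     * and the index row estimate. Assumed formula, inferred from the
     * logged values rather than taken from the Derby source.
     */
    public static double cmp(long tableEstimate, long indexEstimate) {
        return Math.abs(Math.log(tableEstimate) - Math.log(indexEstimate));
    }

    public static void main(String[] args) {
        // Values taken from the two log lines: before and after the checkpoint.
        System.out.println(cmp(1039, 20));
        System.out.println(cmp(2001, 20));
    }
}
```

With these inputs the results come out at roughly 3.9503 and 4.6057, matching the logged cmp values to the digits checked.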

I intend to commit this patch, which I hope is the last one, tomorrow.
Patch ready for review.
> Investigate if request for update statistics can be skipped for certain kind of indexes, one instance may be unique indexes based on one column.
> ------------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: DERBY-3790
>                 URL: https://issues.apache.org/jira/browse/DERBY-3790
>             Project: Derby
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions:
>            Reporter: Mamta A. Satoor
>            Assignee: Kristian Waagan
>         Attachments: derby-3790-1a-skip_stats_scui.diff, derby-3790-1b-skip_stats_scui.diff,
derby-3790-1c-skip_stats_scui.diff, derby-3790-2a-mintor_test_improvements.diff
> DERBY-269 provided a manual way to update the statistics. There was some discussion in that JIRA entry about possibly optimizing the cases where there is no need to update the statistics. I will enter the related comments from that JIRA entry here for reference.
> **************************
> Knut Anders Hatlen - 18/Jul/08 12:39 AM 
> If I have understood correctly, unique indexes always have up to date cardinality statistics because cardinality == row count. If that's the case, one possible optimization is to skip the unique indexes when SYSCS_UPDATE_STATISTICS is called.
> **************************
> **************************
> Mike Matrigali - 18/Jul/08 09:48 AM 
> Is the cardinality of a unique index 1, or is it the row count?
> It is also more complicated than just skipping unique indexes: it depends on the number of columns in the index, because in a multi-column index multiple cardinalities are calculated. For instance, for an index on columns A,B,C there are actually 3 cardinalities calculated:
> A
> A,B
> A,B,C
> I agree that the calculation of the cardinality of A,B,C could/should be short circuited for a unique index.
> **************************
> **************************
> Knut Anders Hatlen - 18/Jul/08 03:25 PM 
> Mike, 
> It looks to me as if the cardinality is the number of unique values, so I think the cardinality of a unique index is equal to its row count (for the full key, that is). You're right that we can't short circuit it if we have a multi-column index. I don't know if it's worth the extra complexity to short circuit the A,B,C case, since we'd have to scan the entire index anyway. For a single-column unique index it sounds like a good idea, though.
> **************************
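
To make the quoted discussion about per-prefix cardinalities concrete, here is a small in-memory sketch. It counts the distinct values of each leading column prefix (A; A,B; A,B,C) over a list of rows. It does not reflect how Derby scans an index; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrefixCardinality {

    /**
     * Returns an array where element k-1 holds the number of distinct
     * values of the first k columns, for k = 1..columns.
     */
    public static long[] prefixCardinalities(List<Object[]> rows, int columns) {
        long[] result = new long[columns];
        for (int k = 1; k <= columns; k++) {
            Set<List<Object>> seen = new HashSet<>();
            for (Object[] row : rows) {
                // Lists compare element by element, so they work as set keys.
                seen.add(Arrays.asList(Arrays.copyOf(row, k)));
            }
            result[k - 1] = seen.size();
        }
        return result;
    }
}
```

For a unique index the last element always equals the row count, which is why the full-key cardinality needs no scan. For a single-column unique index that is the only cardinality there is, so the whole update could be skipped. The shorter prefixes of a multi-column unique index can still contain duplicates and do need the scan.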
