db-derby-dev mailing list archives

From "Kristian Waagan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-4938) Implement istat scheduling/triggering
Date Wed, 09 Feb 2011 22:57:01 GMT

     [ https://issues.apache.org/jira/browse/DERBY-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-4938:

    Attachment: derby-4938-1a-istat_scheduling.diff

Attaching patch 1a, which adds the initial scheduling logic.

Index cardinality statistics will only be created or updated for prepared statements,
and only when the query involves an access path using an index. In addition, there are
thresholds that have to be reached/exceeded before an update is scheduled. These thresholds
may have to be tweaked after a period of testing.

Note that DERBY-4939 has to be committed before the autostats are enabled, but here are some
comments from DERBY-4771 about the available debug knobs for this feature:

 a) derby.storage.indexStats.debug.createThreshold (100)
 b) derby.storage.indexStats.debug.absdiffThreshold (1000)
 c) derby.storage.indexStats.debug.lndiffThreshold (1.0)
 d) derby.storage.indexStats.debug.queueSize (5)

(a) determines how big a table must be before statistics are automatically
created. (b) determines how big the discrepancy between the row estimates for
the table and the index must be before the statistics are updated. (c)
determines how big the logarithmic (natural logarithm) difference between
those estimates must be before the statistics are updated. The values of these
properties are printed if tracing is turned on. Now:

  Q: I don't understand these properties!
  A: Read the code ;)
     These properties are made available for experimentation and debugging
     only. a-c affect when statistics are created or updated, and are used in
     TableDescriptor. (d) is only used in IndexStatisticsDaemonImpl.

  Q: Why have both (a) and (b)?
  A: Purely for debugging and experimentation. If these properties are included
     in production code, I expect they can be folded into one.

  Q: Why have both (b) and (c)?
  A: In general (c) will decide if the statistics are updated. However, for
     small tables (c) will cause frequent updates of the statistics. For small
     tables accurate statistics are not needed for good performance [1], so
     there is no reason to frequently update the stats. This is where (b) comes
     into play.

[1] One exception might be if the rows are huge.
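
To make the interplay between (a), (b) and (c) concrete, here is a minimal sketch of how
such a check could look. It is not the code in TableDescriptor: the class and method names
are hypothetical, the constants simply mirror the default values listed above, and the exact
comparison operators (>= versus >) and the use of -1 for "no statistics yet" are assumptions.

```java
// Hypothetical sketch only; not taken from the Derby source.
public final class IstatTriggerSketch {
    // Values mirror the defaults of the debug properties (a), (b) and (c).
    static final long CREATE_THRESHOLD = 100;     // (a) createThreshold
    static final long ABSDIFF_THRESHOLD = 1000;   // (b) absdiffThreshold
    static final double LNDIFF_THRESHOLD = 1.0;   // (c) lndiffThreshold

    /**
     * @param tableRows row estimate for the table
     * @param statsRows row estimate recorded with the index statistics,
     *                  or -1 if no statistics exist (assumed convention)
     * @return true if a unit of work should be scheduled
     */
    static boolean shouldSchedule(long tableRows, long statsRows) {
        if (statsRows < 0) {
            // (a): create statistics only once the table is big enough.
            return tableRows >= CREATE_THRESHOLD;
        }
        // (b): ignore small absolute discrepancies, which keeps small
        // tables from triggering frequent updates.
        long absDiff = Math.abs(tableRows - statsRows);
        if (absDiff < ABSDIFF_THRESHOLD) {
            return false;
        }
        // (c): compare the natural logarithms of the two estimates.
        // Math.max(..., 1) avoids taking the logarithm of zero.
        double lnDiff = Math.abs(
                Math.log(Math.max(tableRows, 1)) - Math.log(Math.max(statsRows, 1)));
        return lnDiff > LNDIFF_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(shouldSchedule(50, -1));     // table too small to create stats
        System.out.println(shouldSchedule(100, 10));    // ln diff is large, abs diff is not
        System.out.println(shouldSchedule(3000, 1000)); // both (b) and (c) exceeded
    }
}
```

In this sketch a table that grows from 10 to 100 rows passes the logarithmic test but is
stopped by (b), which is the small-table case described in the last answer above.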

Committed to trunk with revision 1069160.

> Implement istat scheduling/triggering
> -------------------------------------
>                 Key: DERBY-4938
>                 URL: https://issues.apache.org/jira/browse/DERBY-4938
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions:
>            Reporter: Kristian Waagan
>            Assignee: Kristian Waagan
>         Attachments: derby-4938-1a-istat_scheduling.diff, derby-4938-1a-istat_scheduling.stat
> The istat daemon has to get its orders from somewhere (it is not operating purely on
> its own), and this issue tracks the addition of code that will schedule units of work
> with the daemon.
> The current approach is based on statement compilation, i.e. prepared statements, triggering
> the addition of units of work.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

