db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: [jira] Updated: (DERBY-937) Instability in wisconsin test
Date Thu, 29 Jun 2006 17:19:53 GMT

Andreas Korneliussen (JIRA) wrote:
>      [ http://issues.apache.org/jira/browse/DERBY-937?page=all ]
> Andreas Korneliussen updated DERBY-937:
> ---------------------------------------
>     Attachment: DERBY-937.diff
>                 DERBY-937.stat
> I was able to reproduce this problem on every run on a very fast laptop (it was
> not reproducible on any other Windows lab machine I have tried).  This laptop
> happens to be the same kind as Ole uses for the nightly tests.
> Adding a 10-second sleep after populating the tables did not have any effect.
> I therefore tried running a compress on the tables (based on the assumption
> that statistics are updated on compress), and now the test no longer fails for me.
> Attached is the patch which makes this test stop failing.  The patch does not
> seem to have any side effects on other platforms (Solaris) tested; however,
> the test takes more time.

It is great that this fixes the problem in your reproducible case, so I say
we should commit.  But I don't know why it does.  To me it looks like
the loader program always inserts all the data into the base tables
first and then creates the indexes.  In that case Derby automatically
creates the "statistics" that you mention here, and they should be
no different from what is recreated when you do the compress.  Also,
Derby will create "packed" indexes in this case, which again should look
no different from what compress produces (since all that has happened
are inserts into the table - if deletes or updates were involved it
would be a different story).

What the change does do is alter the cache a lot, and alter the I/O 
timing a lot.

Also, I wanted to again make sure people understand what statistics
mean in Derby.
What is usually called "statistics" in other databases comes in three
forms in Derby:

1) key distribution information
    o This is automatically maintained by Derby using existing
indexes; no user interaction is ever needed and it is always up
to date.  Basically we use the index itself to estimate the
number of rows for a range of keys.  Most other databases I know require
some sort of "update statistics" step to get this info.
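As a rough sketch of the idea in (1) - using the index itself to estimate the
row count for a key range - here is a toy illustration in Python (a bisect on a
sorted key list stands in for probing btree leaf positions; this is not Derby's
actual code):

```python
import bisect

def estimate_rows_in_range(sorted_keys, low, high):
    """Estimate how many rows have keys in [low, high] by probing a
    sorted index: the distance between the two insertion points is the
    count.  Illustrative only -- Derby walks btree pages, not a list."""
    left = bisect.bisect_left(sorted_keys, low)
    right = bisect.bisect_right(sorted_keys, high)
    return right - left

index = sorted([3, 7, 7, 12, 15, 20, 20, 20, 31])
print(estimate_rows_in_range(index, 7, 20))  # 7 rows with keys in [7, 20]
```

Because the index is already sorted, the estimate needs no separate
"update statistics" step - it is always as current as the index itself.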

2) base table row counts
    o Also automatically maintained by Derby, but can be slightly out
      of date since it is only an estimate, for performance reasons.  Row
      counts are updated at insert/delete time on a per-page basis
      automatically, but we delay the rollup.  The rollup is always done
      when the page goes to disk.  The rollup may also be done earlier
      if the change is significant relative to the total - I think if
      the delta is 10% or more of the table.  The row count estimate is
      also updated if the system ever happens to do a full scan of the
      table.
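A toy model of the delayed rollup described in (2) might look like the
following (the class, its names, and the exact threshold handling are
illustrative assumptions, not Derby's actual bookkeeping):

```python
class TableRowCount:
    """Sketch of a table row-count estimate maintained per page with a
    delayed rollup (hypothetical structure; Derby's real bookkeeping
    lives in the store layer).  Deltas accumulate per page and fold into
    the table-level estimate when a page is written to disk, or earlier
    if the pending change is large (here, >= 10% of the estimate)."""

    def __init__(self, initial_estimate):
        self.estimate = initial_estimate
        self.pending = {}          # page id -> not-yet-rolled-up delta

    def record(self, page, delta):
        self.pending[page] = self.pending.get(page, 0) + delta
        total_pending = sum(self.pending.values())
        # Roll up early if the pending change is significant.
        if self.estimate and abs(total_pending) >= 0.10 * self.estimate:
            self._rollup()

    def page_written(self, page):
        # Rollup always happens when a page goes to disk.
        self.estimate += self.pending.pop(page, 0)

    def _rollup(self):
        self.estimate += sum(self.pending.values())
        self.pending.clear()

counts = TableRowCount(1000)
counts.record(1, 30)          # small delta: estimate still stale
print(counts.estimate)        # 1000
counts.record(2, 80)          # pending total 110 >= 10% of 1000: rollup
print(counts.estimate)        # 1110
```

The design trades accuracy for speed: most inserts touch only a
per-page counter, and the shared table-level number is written rarely.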

3) cardinality statistics
    This is basically one number giving the average number of duplicate
    keys for a given key, where a key is a leading set of columns in an
    index.  So a one-column index has one number, and an index with 3
    columns (a, b, c) has 3 numbers indicating the cardinality of (a),
    (a, b), and (a, b, c).
    o This statistic is automatically created when an index is created,
      and also when a number of "bait/switch" DDL operations are done,
      like compress table.  There have been a number of discussions on
      the list on how to automate the update of this.  My opinion is
      that we should attack it in the following 3 ways:

o Provide a way for users to schedule it to be calculated, other than by
   running compress.  Not great for zero-admin - just a workaround
until we can better automate it.  The code exists; it is just not
exposed to users right now.

o Develop a background "zero-admin" component which would
programmatically figure out how and when to schedule this kind of activity.

o See if there is any smart, quick way to estimate the values
automatically by doing some statistical analysis of the btree
leaf pages.
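The cardinality numbers described in (3) can be computed naively from the
index rows; here is a small illustrative sketch (not Derby's estimator,
which works from index structures rather than materialized rows):

```python
def cardinality_stats(rows):
    """For each leading prefix of the index columns, compute the average
    number of rows per distinct key -- one number per prefix width, as
    described for an index on (a, b, c).  Illustrative sketch only."""
    n_cols = len(rows[0])
    stats = []
    for width in range(1, n_cols + 1):
        distinct = len({row[:width] for row in rows})
        stats.append(len(rows) / distinct)
    return stats

# Hypothetical index on (a, b, c) with 6 rows:
rows = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 1, 1), (2, 2, 2)]
print(cardinality_stats(rows))  # [3.0, 1.5, 1.2]
```

Note that computing these exactly requires a pass over the whole index,
which is why it is tied to index creation and compress today rather than
maintained incrementally like the other two kinds of statistics.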
