db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kristian Waagan <kristian.waa...@oracle.com>
Subject Re: question about istats thresholds
Date Tue, 15 Feb 2011 19:25:08 GMT
On 11.02.2011 23:29, Mike Matrigali wrote:
> Thanks, could you take a look at DERBY-4211.  It looks like
> the stat updater is running, but I don't think it should be.
> basically what would you expect to happen on a newly created
> table, that then has 7 rows added to it.

I've only looked briefly at the test, and here are my thoughts about 
what's going on:
  o some of the tables in the test are created, populated and then 
having an index created. Since the table is not empty, the index 
creation will cause statistics to be generated.
  o queries in the test will then cause the istat scheduling logic to fire.
  o due to inaccurate row estimates for the table the istat incorrectly 
schedules an update.

My opinion (after having looked very quickly at this) is that the istat 
code is doing as it should with the current parameters. The bad behavior 
is caused by a combination of poor information quality (the row 
estimate), the low number of rows in the table ("defeats" the 
logarithmic threshold), and the istat configuration (absdiff=0).
Since the row estimate is exactly that - an estimate - it may be wise to 
reintroduce the absdiff parameter to avoid problems like these for small 
tables. At least it should be simple to change its value and re-run the 
test to see if the istat work is still happening or not (note that the 
value quoted below is wrong - 
derby.storage.indexStats.debug.absdiffThreshold is currently set to zero).

There are at least two issues with the row estimate handling:
  o not logged
  o there are two ways to update the estimate: using an absolute value, 
or using deltas. In some cases these two ways interfere, i.e. changes 
already reflected by a set absolute value are also applied afterwards as 
delta operations.

> case one: then queries are run from ij

If the stat updater is running for case one, where there are no indexes, 
that's certainly a bug!


> case two: an index is created on the table, and then queries are run 
> from ij
> Kristian Waagan wrote:
>> On 11.02.11 20:11, Mike Matrigali wrote:
>>> From DERBY-4934 i see there are the following thresholds:
>>>  a) derby.storage.indexStats.debug.createThreshold (100)
>>>  b) derby.storage.indexStats.debug.absdiffThreshold (1000)
>>>  c) derby.storage.indexStats.debug.lndiffThreshold (1.0)
>>>  d) derby.storage.indexStats.debug.queueSize (5)
>>> My question is that I don't understand how they are expected to 
>>> interact.  If a table has less than 100 rows does that mean
>>> stat will not be created even if b or c is exceeded.
>> Hi Mike,
>> To start with, you can probably ignore threshold (d) for now.
>> It applies to the scheduling phase - that is when the unit of work is
>> scheduled with the daemon - and to get that far (at least) one of the
>> other thresholds has to be exceeded. If the queue is full the unit of
>> work won't be scheduled, and another attempt may be made at a later time
>> during another statement compilation. This requires that someone
>> actually compiles a relevant query, or potentially that the existing
>> statement is recompiled (stale plan check).
>> The purpose of (d) is to avoid excessive queue growth. Since the queue
>> is implemented as a list, searching it for duplicates may also be
>> expensive if it grows too large.
>> Threshold (a) concerns indexes without existing statistics. If there are
>> less than 100 rows in the base table, statistics won't be created.
>> Thresholds (b) and (c) concern indexes with existing statistics.
>> Threshold (b) was introduced to avoid too frequent updates of existing
>> statistics for small tables. I don't remember off the top of my head
>> where it was discussed, but I ended up effectively removing it by
>> setting it to zero for now. I kept the property (and the relevant code)
>> to allow people to experiment somewhat without having to recompile the
>> code if they have an application running into trouble with this 
>> scenario.
>> Finally, the main threshold for existing statistics is (c). Here the
>> natural logarithms of the row estimate of the index statistics and the
>> row estimate of the base table are compared. If the difference is
>> greater than or equal to lndiffThreshold (defaults to 1.0) the
>> statistics for the index are scheduled for update. If the daemon queue
>> is full the request is discarded, assuming another compilation will
>> manage to schedule the update eventually.
>> Hope this helped a bit, feel free to ask additional questions. As I have
>> said before, these threshold may have to be changed significantly as we
>> test the feature (remove existing, add new ones, or modify existing 
>> ones).
>> Cheers,

View raw message