db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Why is the optimizer choosing such a bad path
Date Fri, 29 Mar 2013 16:57:34 GMT
On 3/29/2013 9:30 AM, Mike Matrigali wrote:
> On 3/29/2013 5:28 AM, Bergquist, Brett wrote:
>> Mike I have a reproducible case but the database is 132GB so it is not
>> practical.   I am willing to dig in and try to help debug this.  On
Some bugs can cause problems in the past in a table that was built using 
a version of the old software when has later been fixed, so we
  like to have a case where the problem can be reproduced starting from 
an empty database.  But it is valid to try and figure out why an 
existing db is having problem.  And in this case I would say that if
a problem exists in the table it is likely a bug that running
update statistics on the table does not fix it.  Especially when
running update statistics with the keepDisposableStats flag to avoid
at least the one known issue and possibly others.
>> the issue that was pointed to, Mamta seemed to have used a debug
>> procedure that would print out more information on why the optimizer
>> is choosing what it does.   This might be useful in this case as well.
>>
>> So if someone can point me in the right area of the code to start
>> looking and adding some debug that would be helpful.  I have done some
>> fixes for Derby in the past so I am able to build and debug Derby.
>>
I assume your real application is a java program and not ij.  be careful
with using ij to simulate your actual application.  The biggest thing
to watch out for is parameterized queries, the following 2 queries
most likely get very different code paths in the optmizer, but often 
people are suprised that the optimizer treats them different:

jdbc set field1 to 100
jdbc execute select * from t where field1 = ?

jdbc execute select * from t where field1 = 100

We recommend the 1st if you are going to execute the query more than
once as we only have to compile it once.  But the downside is that
at compile time we have to guess at the selectivity of the possible
values of ?.  If a unique index exists on the field then guessing
is easy, otherwise stats come into play.  And the more complicated
the query the more one guess builds on another, ...

For the second version at compile time if an index exists such that
field1 is the first column then we use the index itself to estimate
number of rows.

Mime
View raw message