db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Army <qoz...@sbcglobal.net>
Subject Re: Regression Test Observation: increased testsuite execution time for nist and storetests
Date Wed, 15 Mar 2006 16:45:04 GMT
Army wrote:

> Ole Solberg wrote:
>> I had a look at the testsuite execution times for the last week and 
>> see that two testsuites have had significant increases:
>> nist:       with svn 383172 (2006-03-04) duration increased
>>              4 to 7 times depending on platform.
>> http://www.multinet.no/~solberg/public/Apache/Derby/UpdateInfo/383172.txt
> Looks like this is caused by my changes for DERBY-805.

Actually, the nist slow-down is the result of my changes for DERBY-1007 and/or 
DERBY-1073 (depending on how you look at it) rather than for DERBY-805.

Prior to DERBY-1073, if the optimizer for a subquery timed out in one round and 
then we returned for a subsequent round after that, the subquery's optimizer 
would immediately return without further optimization.  My changes to DERBY-1073 
added logic to reset the timeout state in cases where the subquery receives 
predicates that have been pushed to it from outer queries, so that the optimizer 
can use the predicates to consider better plans.  At the same time, in an 
attempt to avoid increasing the compilation time for subqueries which do not 
receive pushed predicates, I added logic to keep the "immediately return without 
further optimization" functionality as the 'default' behavior (i.e. behavior 
when there are no pushed predicates).  But as the nist slowdown shows, this 
"default" approach was incomplete.

The reason the DERBY-1073 patch is incomplete is because of DERBY-1007, which 
resets the bestCost field of a subquery's optimizer at the start of every round 
of optimization.  This behavior (which is correct) combined with the 'default' 
behavior of DERBY-1073 means that when we "immediately return without further 
optimization" for a subquery, the bestCost value will be uninitialized, i.e. 
Double.MAX_VALUE.  That value will then be passed back up to the outer query, 
which will add it to its running cost and end up with "infinity".  If the outer 
query has already found a best plan at this point, then the infinite cost for 
the current join order will cause the outer query's optimizer to reject that 
join order in favor of the previous best.  What this ultimately means is that 
whatever outer join order was deemed "best" at the time the subquery first timed 
out will remain as the "best" from that point on, even if there are other join 
orders which, in the absence of a subquery timeout, would actually have been 
cheaper.  Since the best cost so far is what determines the optimizer's timeout 
value, the optimizer for the outer query could end up spending more time trying 
to find an optimal join order than we really want to spend.

The above scenario is what is causing the slowdown in nist/dml132.sql after the 
DERBY-1007 and DERBY-1073 checkins.  The outer query has 10 optimizables, one of 
which is subquery. While the outer optimizer is costing different join orders, 
the subquery's optimizer eventually times out.  At that point the best join 
order for the outer query has a "best cost" of 6,616,444.  That cost in turn 
becomes the timeout value (in milliseconds) for the outer query, translating to 
about 110 minutes. Thus the optimizer will continue to optimize for up to 110 
minutes or until it has exhausted all possible permutations of its 10 
optimizables.  In this case it exhausts the permutations first--but only after 
about 10 minutes of optimization.

To verify this behavior I ran the nist/dml132.sql test against 10.1 with 
derby.optimizer.noTimeout set to true, which tells the optimizer to exhaust all 
possible join orders before picking the best one.  When I did this the 
dml132.sql test took roughly the same amount of time to compile and execute as 
it does with the current 10.2 trunk (i.e. just over 10 minutes).  That's what I 
would expect given the above explanation.

Prior to the fixes for DERBY-1007 and DERBY-1073, the subquery would still 
timeout--but since bestCost was never reset, the subquery would return whatever 
value was in bestCost from the previous round of optimization.  That cost, while 
still technically incorrect (see below), was nonetheless more reasonable than 
"infinity", and thus the outer optimizer was still able to do meaningful cost 
comparisons and to find cheaper join orders.  With each new "best" join order 
the timeout value for the outer optimizer would decrease accordingly, until it 
eventually became something reasonable like 15-ish seconds, at which point the 
optimizer would stop and just pick the best plan so far.

Note, though, that while things worked before DERBY-1007 and DERBY-1073, the 
cost that the subquery was returning after it timed out was actually 
incorrect--it was the "best cost" for the subquery as it was stored at the point 
it timed out, instead of the best cost for the subquery w.r.t its current 
position in the outer join order.

So all of that said, the fix for the nist slowdown issue is to take the changes 
for DERBY-1073 one step further by recognizing when a subquery's optimizer has 
timed out and, instead of returning immediately, letting the optimizer continue 
until it finds it's first "best plan" for the current round.  Or put more 
simply, we need to delay the timeout until the optimizer has a legitimate cost 
that it can return.

I have a patch to address this issue and I plan to attach it to DERBY-1073 as 
follow-up patch to the original.  I ran derbyall with the changes last night and 
saw no failures, but I did make some slight changes this morning and so would 
like to re-run derbyall again before posting.

In the meantime, if anyone who reads this has any feedback/comments, that would 
be great.

Thanks much,

View raw message