db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: [ optimizer ] Some questions on optimization cost estimates?
Date Mon, 03 Apr 2006 18:09:30 GMT

Army wrote:

> ----------------------------------
> 3) Comparing plan costs.
> ----------------------------------
> In OptimizerImpl.java there is the following code for comparing 
> "current" cost with "best cost" for this round of optimization:
>     /*
>     ** Is the cost of this join order lower than the best one we've
>     ** found so far?
>     **
>     ** NOTE: If the user has specified a join order, it will be the
>     ** only join order the optimizer considers, so it is OK to use
>     ** costing to decide that it is the "best" join order.
>     */
>     if ((! foundABestPlan) || currentCost.compare(bestCost) < 0)
>     {
>         rememberBestCost(currentCost, Optimizer.NORMAL_PLAN);
>         // Since we just remembered all of the best plans,
>         // no need to reload them when pulling Optimizables
>         // from this join order.
>         reloadBestPlan = false;
>     }
> The call to "currentCost.compare()" only compares the estimatedCost 
> fields; it does not look at the other two fields of a 
> CostEstimate--namely, rowCount and singleScanRowCount.
> In running some tests against 10.1, though, I noticed that it is 
> possible for currentCost and bestCost to have the same estimatedCost 
> values but different row counts.  Since the above check does not look at 
> row counts, we can end up in situations where the following two queries 
> can result in different query plans:
> select * from V1, V2 where V1.j = V2.b;
> select * from V2, V1 where V1.j = V2.b;
> I.e. the queries are identical with the exception of the order of V1 and 
> V2.  In my particular example V1 and V2 are both unions of two tables, 
> where V2 has 100,000 rows and V1 has less than ten.  When I ran these 
> queries against 10.1 with NO optimizer timeout--meaning that the 
> optimizer had time to examine all possible query plans--the result was 
> surprising: the first query chose join order [ V2, V1 ] and took 4.5 to 
> 5 seconds to complete; the second query chose join order [ V1, V2 ] and 
> took 30-33 seconds to complete.  Wow.
> I made a small change to the CostEstimateImpl.compare() method to 
> consider rowCount as a secondary field of comparison (if the 
> estimatedCosts are the same)  and singleScanRowCount as a third field of 
> comparison (if the rowCounts are the same).  With that change the 
> behavior was then what I expected: namely, the optimizer chose the same 
> join order (the faster one) regardless of which query I ran.
> So my question is this: if two cost estimates have the same 
> estimatedCost but different rowCounts (or even singleScanRowCounts), it 
> is reasonable to suppose that the estimate with the lower rowCount is 
> actually better?  In the above case the query with the lower row count 
> was the one that ran more quickly--is it reasonable to believe that this 
> will generally be the case?
I would think it would be better to figure out why the costs are the 
same for these vastly different plans.  Seems like a problem with 
costing.  It seems cleaner to me compare based on one field.  On the
face of this there is some bad assumption in building the cost - maybe
it is the 0 cost per hash look up you also mentioned?  What were the 
actual 2 plans?

> ---
> As always, thanks to anyone with the time and inclination to read this 
> email and provide feedback...
> Army

View raw message