db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "A B (JIRA)" <derby-...@db.apache.org>
Subject [jira] Commented: (DERBY-1905) Optimizer cost estimates for subqueries are way (way) too high.
Date Mon, 02 Oct 2006 21:31:19 GMT
    [ http://issues.apache.org/jira/browse/DERBY-1905?page=comments#action_12439323 ] 
A B commented on DERBY-1905:

>  Can the code detect, at execution time, how accurate its estimate was?

Good question.  I ran the query attached to this issue using the ij "cursor" command and based
strictly on that, it looks like the query plan is written to log *after* the query executes
and the cursor has been closed.

This seems to indicate that we still have the query plan information--including the estimated
row count and the estimated cost--somewhere accessible after the query has executed.  So if
there's an easy way to figure out how many rows were actually returned by the cursor and also
the time it took to retrieve all of the rows, then it seems like it would indeed be possible
to dump some information about the accuracy of the optimizer's cost estimates.

I don't know enough about when/how the query plan is logged nor about the state of things
at that time in processing to say for sure how hard it would be for the code to determine
the accuracy of the optimizer's estimates, but I agree that this is something which at least
seems feasible and which could be very useful as a first step in tracking down problems with
cost estimates...

> Optimizer cost estimates for subqueries are way (way) too high.
> ---------------------------------------------------------------
>                 Key: DERBY-1905
>                 URL: http://issues.apache.org/jira/browse/DERBY-1905
>             Project: Derby
>          Issue Type: Bug
>          Components: Performance, SQL
>    Affects Versions:,,
>            Reporter: A B
>         Attachments: d1905.sql
> If I run the attached repro (pulled from DERBY-1866) with derby.optimizer.noTimeout=true
(meaning the optimizer should take all the time it needs to try out all possible plans within
the restrictions of the optimizer overrides), the estimated cost shown in derby.log (if I
log the query plan) is over 600k and the estimated row count is over 520k.
> If, as the code in OptimizerImpl seems to expect, the unit for a cost estimate is milliseconds,
then the optimize here is guessing that the query will take over 10 minutes to execute and
will return a half-million rows.  But in truth the *combined* time for compilation AND execution
is just a couple of seconds, and only 15 rows are returned.
> That suggests to me a rather serious problem with the optimizer cost estimates for subqueries.
> Among other things this can have a major impact on the optimizer's timeout mechanism
for very deeply-nested queries.  The optimizer will continue to try out different combinations
of indexes/joinOrders/joinStrategies until it either exhausts them all or until the number
of milliseconds spent optimizing is greater than the "best cost" estimate so far. In the case
of the repro for this issue, the optimizer quickly exhausts all of the options and so it finishes
in a fair amount of time.
> But in larger queries where there are far more combinations to try (see esp. queries
attached to DERBY-1205, DERBY-1777), these severly inflated cost estimates get very large
very quickly (sometimes even reaching infinity--see DERBY-1259, DERBY-1260) and so the optimizer
just keeps on optimizing and never times out.  The result is that for queries like those in
DERBY-1777, the optimizer can spend literally hours optimizing a query which then executes
in a matter of seconds.
> I'm still investigating this, but preliminary examination suggests that part of the problem
is in some way related to the treatment of "outer costs" by the optimizer--and in particular,
it looks like the optimizer is perhaps too aggressive in multiplying cost estimates by row
counts pulled from "outer costs".  That's just my first guess at the problem, though; there
could of course be far more to it than that...

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message