db-derby-dev mailing list archives

From "Mike Matrigali (JIRA)" <derby-...@db.apache.org>
Subject [jira] Updated: (DERBY-1908) Investigate: What's the "unit" for optimizer cost estimates?
Date Tue, 03 Oct 2006 23:12:21 GMT
     [ http://issues.apache.org/jira/browse/DERBY-1908?page=all ]

Mike Matrigali updated DERBY-1908:

Here is the "units" view from the storage layer, which I believe should be the basis for all the optimizer costs.

The actual interface does not specify a unit. This was originally a decision to allow for a number of different implementations; the only guarantee was that, across all calls, one cost could be compared to another with reasonable results. Having said that, the actual implementation of the costs returned by store has always been based on elapsed milliseconds for a set of basic operations. Those basic operations were run and a set of constants was then defined from the measurements. The last time this was done was quite a while ago, probably on a 400 MHz machine.
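
To make that concrete, here is a minimal sketch (hypothetical code, not the original calibration test) of how such a constant could be derived: time a trivial per-row operation and divide the elapsed milliseconds by the row count.

/**
 * A minimal sketch (hypothetical, not Derby's original calibration test):
 * time a simple per-row operation and turn the measurement into a
 * "milliseconds per row" constant of the kind the store costs are built on.
 */
public class CostCalibrationSketch {
    public static void main(String[] args) {
        final int rows = 1_000_000;
        long[] table = new long[rows];        // stand-in for a heap of rows

        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < rows; i++) {
            sum += table[i];                  // stand-in for "fetch one row"
        }
        double elapsedMillis = (System.nanoTime() - start) / 1_000_000.0;

        // The constant is only valid for *this* machine; hard-coding it as a
        // default ties every cost estimate to the hardware it was measured on.
        System.out.println("ms per row: " + (elapsedMillis / rows)
                + " (checksum " + sum + ")");
    }
}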

The "hidden" unit of ms. was broken when the optimizer added timeout - which is basically
a decision to stop
optimizing once the estimated cost is less than the elapsed time of the compile.  At this
point something outside
the interface assumed the unit was ms.
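
As a rough illustration (hypothetical names, not the actual OptimizerImpl code), a timeout check of this shape only makes sense if the estimate is already in milliseconds:

/**
 * A minimal sketch (hypothetical names, not the real OptimizerImpl code):
 * a timeout check of this shape is only meaningful if the cost estimate
 * is itself expressed in milliseconds.
 */
public class OptimizerTimeoutSketch {
    private final long startMillis = System.currentTimeMillis();

    /** Stop searching join orders once compilation has already taken longer
     *  than the cheapest plan found so far is estimated to take to run. */
    public boolean shouldStopOptimizing(double bestEstimatedCost) {
        long elapsedMillis = System.currentTimeMillis() - startMillis;
        // Comparing wall-clock ms against the estimate silently assumes the
        // estimate's unit is also ms -- the assumption this issue is about.
        return elapsedMillis > bestEstimatedCost;
    }
}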

I think a good direction would be to change the interfaces to somehow support costs as truly elapsed time, fix at least the defaults so they are based on a modern machine, fix any optimizer code that may not currently be treating the cost unit correctly (like multiplying a cost by a cost), and maybe look at dynamically sizing the costs based on measurements of operations on the current machine.
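
For the dynamic sizing idea, something like the following sketch (hypothetical API, made-up constants) could scale a baseline constant by a factor measured on the current machine at startup, so estimates stay roughly in real elapsed milliseconds:

/**
 * A minimal sketch (hypothetical API, made-up constants): scale a baseline
 * cost constant by a factor measured on the current machine at startup.
 */
public class DynamicCostScalingSketch {
    /** Assumed baseline: ms per row fetched on the old reference machine. */
    private static final double BASELINE_FETCH_COST_MS = 0.001;
    /** Assumed time the calibration loop took on that reference machine. */
    private static final double REFERENCE_LOOP_NANOS = 50_000_000.0;

    /** Ratio of this machine's speed to the assumed reference machine. */
    static double machineSpeedFactor() {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 10_000_000; i++) {
            sum += i;                          // cheap, repeatable work
        }
        long elapsed = System.nanoTime() - start;
        if (sum < 0) throw new AssertionError(); // keep the loop from being optimized away
        return elapsed / REFERENCE_LOOP_NANOS;
    }

    static double estimatedFetchCostMs(long rowCount) {
        return BASELINE_FETCH_COST_MS * machineSpeedFactor() * rowCount;
    }

    public static void main(String[] args) {
        System.out.println("Estimated cost to fetch 10,000 rows: "
                + estimatedFetchCostMs(10_000) + " ms");
    }
}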

I will look around for the old unit tests that produced the original costs. You can see the constants used in

> Investigate: What's the "unit" for optimizer cost estimates?
> ------------------------------------------------------------
>                 Key: DERBY-1908
>                 URL: http://issues.apache.org/jira/browse/DERBY-1908
>             Project: Derby
>          Issue Type: Task
>          Components: SQL, Performance
>            Reporter: A B
> Derby optimizer decisions are necessarily based on cost estimates.  But what are "units"
for these cost estimates?  There is logic in OptimizerImpl.getNextPermutation() that treats
cost estimates as if their unit is milliseconds--but is that really the case?
> The answer to that question may in fact be "Yes, the units are milliseconds"--and maybe
the unexpected cost estimates that are sometimes seen are really caused by something else
(ex. DERBY-1905).  But if that's the case, it would be great to look at the optimizer costing
code (see esp. FromBaseTable.estimateCost()) to verify that all of the "magic" of costing
really makes sense given that the underlying unit is supposed to be milliseconds.
> Also, if the stats/cost estimate calculations are truly meant to be in terms of milliseconds,
I can't help but wonder on what machine/criteria the determination of milliseconds is based.
 Is it time to update the stats for "modern" machines, or perhaps (shooting for the sky) to
dynamically adjust the millisecond stats based on the machine that's running Derby and use
the adjusted values somehow?  I have no answers to these questions, but I think it would be
great if someone out there was inclined to discuss/investigate these kinds of questions a
bit more...


