db-derby-dev mailing list archives

From Army <qoz...@sbcglobal.net>
Subject Re: [OPTIMIZER] Optimizer "timeout" for subqueries?
Date Mon, 06 Mar 2006 19:32:53 GMT
Satheesh Bandaram wrote:
> Just remembered, during discussions with Jeffl, he mentioned another one... 
> Improve unnesting of subqueries with more than one table. More work. :-)

<snip list of Optimizer To-Do's>

One other thing to add to this list was mentioned a while ago on derby-dev here:


In that email Jack talked about his BackingStoreHashtable changes (which were committed 
as part of 10.1) and the need to update Optimizer costing to account for them.  He wrote:

 > I would like to work on this, changing BackingStoreHashtable to spill to
 > disk when the hash table gets large, and changing the optimizer to
 > understand that large hash tables are allowed, but more costly.
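To make the spill behavior concrete, here's a toy sketch of the idea -- not Derby's actual BackingStoreHashtable, and the class name, row budget, and overflow list are all made up for illustration.  Once an in-memory row budget is exceeded, extra rows go to a slower overflow store instead of being disallowed:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only -- not Derby's actual BackingStoreHashtable.
// Rows beyond an assumed in-memory budget go to an overflow list
// standing in for a disk-backed store, so a large build side degrades
// gracefully instead of failing outright.
public class SpillingHashTable {
    private final int maxInMemoryRows;                 // hypothetical budget
    private final Map<String, String> inMemory = new HashMap<>();
    private final List<String[]> overflow = new ArrayList<>(); // stand-in for disk
    private boolean spilled = false;

    public SpillingHashTable(int maxInMemoryRows) {
        this.maxInMemoryRows = maxInMemoryRows;
    }

    public void put(String key, String value) {
        if (inMemory.size() < maxInMemoryRows) {
            inMemory.put(key, value);
        } else {
            spilled = true;            // a real version would write to disk here
            overflow.add(new String[] { key, value });
        }
    }

    public String get(String key) {
        String v = inMemory.get(key);
        if (v != null) {
            return v;
        }
        for (String[] kv : overflow) { // the slow path: scanning spilled rows
            if (kv[0].equals(key)) {
                return kv[1];
            }
        }
        return null;
    }

    public boolean hasSpilled() {
        return spilled;
    }

    public static void main(String[] args) {
        SpillingHashTable table = new SpillingHashTable(2);
        table.put("a", "1");
        table.put("b", "2");
        table.put("c", "3");           // third row exceeds the budget and spills
        System.out.println(table.get("c") + " spilled=" + table.hasSpilled());
    }
}
```

The point of the sketch is exactly the costing problem Jack describes: lookups against the spilled portion are slower, but they work -- so a plan that spills should be costed higher, not rejected.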

The changes to spill to disk were implemented for 10.1, but I don't think the 
optimizer was ever updated accordingly.  In particular, it still checks the 
memory requirements for the hash join and, if they are too high, skips that 
access path entirely.  I think the code that does this is in 
OptimizerImpl.considerCost():

/*
** Skip this access path if it takes too much memory.
** NOTE: The default assumption here is that the number of rows in
** a single scan is the total number of rows divided by the number
** of outer rows.  The optimizable may over-ride this assumption.
** NOTE: This is probably not necessary here, because we should
** get here only for nested loop joins, which don't use memory.
*/
if (!optimizable.memoryUsageOK(estimatedCost.rowCount() / outerCost.rowCount(),
                               maxMemoryPerTable))
{
	if (optimizerTrace)
		trace(SKIPPING_DUE_TO_EXCESS_MEMORY, 0, 0, 0.0, null);
	return;
}

Note that the comment regarding "this is probably not necessary" appears to be 
out of date--there are situations where we can and do get to this code when 
considering the cost of hash joins.

So as another "To-Do", I think we need to somehow update the optimizer to 
account for the fact that if the hash join requires a lot of memory and thus 
will (probably?) spill over, the cost should be adjusted accordingly--instead of 
just skipping the plan altogether.
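One rough sketch of how the adjustment could work: keep the hash join in the running, but inflate its estimated cost by the fraction of the table expected to spill.  Everything below (maxMemoryPerTable as a plain parameter, the SPILL_PENALTY factor, the linear penalty model) is a hypothetical illustration, not Derby code:

```java
// Hypothetical sketch of the proposed costing change: instead of
// skipping a hash join whose estimated memory use exceeds the limit,
// charge it extra cost proportional to the expected spill fraction.
public class HashJoinCosting {
    // Assumed relative cost of probing spilled rows vs. in-memory rows.
    static final double SPILL_PENALTY = 10.0;

    static double adjustedCost(double baseCost,
                               double estimatedMemory,
                               double maxMemoryPerTable) {
        if (estimatedMemory <= maxMemoryPerTable) {
            return baseCost;           // fits in memory: cost unchanged
        }
        // Fraction of the hash table expected to live on disk.
        double spillFraction =
            (estimatedMemory - maxMemoryPerTable) / estimatedMemory;
        return baseCost * (1.0 + spillFraction * SPILL_PENALTY);
    }

    public static void main(String[] args) {
        System.out.println(adjustedCost(100.0, 50.0, 100.0));  // fits: 100.0
        System.out.println(adjustedCost(100.0, 200.0, 100.0)); // half spills: 600.0
    }
}
```

With a model like this, a spilling hash join can still lose to a nested loop join when the penalty outweighs the savings, but it is no longer disqualified outright -- which is exactly the behavior change proposed above.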

