db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeffrey Lichtman <swa...@rcn.com>
Subject Re: Join order and access path
Date Tue, 22 Feb 2005 03:53:59 GMT

>Does someone have plan to contribute documentation of optimization of join 
>order and access path selection in derby?

No one has answered this yet, so I'll try to help. I wrote the original 
implementation of the optimizer, but it's been a few years since I've 
looked at it.

The optimizer currently only considers left-deep trees. That is, when 
looking at joins, it doesn't consider join nodes to the right of other join 
nodes - the join trees it considers are skewed to the left. I thought this 
would be a good first implementation of the optimizer. Bushy trees (where 
the join tree can take any shape) are harder to implement but are often 
useful for big multi-way decision-support joins.

During optimization the join order is represented as an array of 
Optimizables. The first element in the array is the leftmost table in the 
join tree, and the successive elements in the array are the joining tables 
in the left-deep tree.

The optimizer does an exhaustive search of query plans, going through all 
the join orders and, for each join order, all of the access paths for that 
order. It remembers the cheapest complete plan it has found so far, and 
does cost-based search space pruning. That is, it doesn't consider plans 
that are more expensive that the best plan found so far.

The optimizer goes through the search space depth-first. In other words, it 
adds a table to the join order, then goes through all the access paths for 
that table (choosing the cheapest one) before going on to the next position 
in the join order. When it gets all the way to the end of the join order, 
with all of the tables accounted for, it considers whether the plan it is 
looking at is the best one found so far. If so, it remembers it. 
Eventually, it exhausts all the tables to consider at a given depth in the 
join order, and it has to back up to the previous position in the join 
order and consider the next table there.

Every time the optimizer adds a table to the prospective join order, it 
figures out which parts of the WHERE clause can be evaluated at that 
position in the order and pushes these expressions down to that position so 
they can be used there. For example, if you have a five-way join of tables 
t1, t2, t3, t4 and t5, and the current permutation being considered is t3 - 
t1 - t2 (with t3 as the outermost table and t2 as the innermost), it can 
evaluate the join "t3.c1 = t2.c2" at the third position in the join order, 
so when it adds table t2 it pushes the expression down to that level in the 
order. Later, when it removes t2 from the join order to consider some other 
table there, it pulls the expression back out of the join order.

getNextPermutation() does the adding (and deletion) of tables to the join 
order under consideration by the optimizer. getNextDecoratedPermutation() 
goes through all the access paths for the current permutation (in the 
current implementation, it only has to consider the access paths for the 
innermost table in the join order, as the search is done depth-first). You 
can think of a decorated permutation as a join order with things like 
indexes and join strategies "decorating" the nodes.

The Optimizable interface represents anything in the query that can have an 
access path. In practice an Optimizable is usually a base table, but it can 
be anything that appears in a FROM list (for example, standard SQL allows a 
UNION in the FROM list). Different Optimizables have different choices for 
access paths. The optimizer uses the nextAccessPath() method on Optimizable 
to step through the different access paths. These include different join 
strategies (such as nested loop and hash join) and different conglomerates 
(the base table and the different indexes). Sometimes the Optimizable has 
to decide whether a given access path is feasible (for example, hash join 
is only feasible for equijoins).

I'm leaving out a whole lot of details, like how the optimizer does costing 
for sort avoidance and how it handles join order dependencies (e.g. when an 
"exists" or "in" subquery is flattened to an existence join, the join order 
musn't be inverted under the current implementation).

                        -        Jeff Lichtman
                                 Check out Swazoo Koolak's Web Jukebox at

View raw message