db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Optimizer issue, was: interested in working on DERBY-713
Date Tue, 20 Jun 2006 18:22:02 GMT
Hi Brent,

Thanks for the extra information. And thanks to Army for the pointers to 
extra documentation. Hopefully our optimizer guru, Jeff Lichtman, will 
join the discussion, but he may be on vacation. Please bear with me as I 
try to understand your approach. I am going to use some shorthand to 
describe the nodes in the query plan and not use the actual class names 
in trunk/java/engine/org/apache/derby/impl/sql/execute, which can be a 
bit obscure. I hope this doesn't confuse the issue.

It sounds to me from the bug description that the Optimizer is choosing 
the following plan

Plan 1:

  TableScan

and that you would like to teach the Optimizer to choose this plan, instead:

Plan 2:

                Join
                /     \
  IndexScan   TableScan

I am not sure that Plan 2 will perform much better on the hard case in 
the bug report. I'm afraid that 10,000 rows may still be scanned out of 
the index before being filtered and passed up to the Join node. It's 
quite possible that the Optimizer did, in fact, consider this plan and 
rejected it because it actually was more expensive. It looks to me as 
though the IndexScan only has start and end qualifiers and doesn't have 
the concept of nextKey. What you want to end up with, I think, is a plan 
that probes the index for a small number of key values rather than 
scanning it end to end.

It would be interesting to build nextKey support into the IndexScan so 
that the scan could be repositioned for IN lists. Somebody please 
correct me if the support is already there and I'm just not seeing it. 
Lacking that support, it might be simpler to build a plan out of the 
existing tinker-toys. Something like the following might give you the 
probing you need:

Plan 3:

                     Join
                      /   \
               Union  TableScan
               /     \
  IndexScan   IndexScan

Each IndexScan would probe for a single key value. If you could just get 
the Opimizer to consider this plan, then I'm cautiously optimistic that 
the existing cost-based decisions would favor Plan 3 over Plans 1 and 2.

I hope this is helpful.

Regards,
-Rick


Brent Verner wrote:

>Hiya Rick,
>
>   Thanks for you helpful reply!  I spent some time over the weekend
>snooping around (mostly) InListOperatorNode and OptimizerImpl, but 
>never found out _where_ the table scan was chosen over an index scan
>for finding the IN matches...I really wanted to find it before begging
>for someone to point out the obvious :-)  Specifically, I got all 
>tangled up around the dynamically generated classes which implement
>the chosen query plan.  Also, I didn't ever find any QueryPlan-like 
>class, but I guess this what the dynamic class(es) implement.
>
>
>  What did I want to do once I got there?  Pseudo-code says it best:
>
>    // ISTR that the bound IN values are ordered and unique.  If not,
>    // this should be done...
>
>    if( column is indexed ){
>        if( index is unique ){
>            // this should be a guaranteed win.  We know we'll only get
>            // one match per IN value
>            do index scan
>        }
>        else if( table size > SOME_MAGIC_SIZE ){
>            // SOME_MAGIC_SIZE would represent the size of (the) table
>            // at which using the index would be reasonably known to
>            // be more expensive than scanning the table.
>            do index scan
>        }
>        else {
>            do table scan
>        }
>    }
>    else {
>        do table scan
>    }
>
>
>  Is that a workable approach (excepting the SOME_MAGIC_SIZE bit)?
>
>  I'll probably have some time later this week to immerse myself in locating 
>the magic location to apply this logic.  A _little_ pointer would be 
>helpful: maybe a brief overview of how a query plan is determined -- I
>suspect his is somewhere around where trulyTheBestTableAccess (or whatever
>it's really called :-)) is defined.  I'd rather have to bang my head 
>against this wall a bit, to get a better understanding of things, instead
>of being pointed immediately to the solution -- my own version of smelling
>the roses, I guess ;-)
>
>Cheers!
>  Brent
>
>
>[2006-06-19 11:30] Rick Hillegas said:
>| Hi Brent,
>| 
>| Sounds like you're off to a good start. From the initial bug report, it 
>| looks like there's a good idea about which heuristic is being 
>| mis-applied here. Once you've studied the optimizer papers, I recommend 
>| that you post some high-level candidate solutions. Try to avoid 
>| optimizer jargon and concentrate on simple descriptions:
>| 
>| o What query plan would you rather see?
>| o What heuristic would the optimizer apply that would lead it to your 
>| preferred plan?
>| o How would the optimizer decide to apply the new heuristic rather than 
>| the old one?
>| 
>| I think you'll get some good feedback if your post contains the phrase 
>| "Attention, optimizer experts." I think that the optimizer enthusiasts 
>| on the list will give you good feedback:
>| 
>| o Maybe they can think of a better query plan or heuristic.
>| o Maybe they can see some awkward corner cases in your  heuristic.
>| o They can advise you on whether your heuristic will short-circuit other 
>| optimizer choices.
>| o They can advise you on whether your heuristic will cause an explosion 
>| in the time that the optimizer takes.
>| 
>| Thanks for wanting to scratch this itch!
>| 
>| Regards,
>| -Rick
>| 
>| Brent Verner wrote:
>| 
>| >Hi,
>| >
>| >    I've recently found need for an embedded java db and only Derby seems
>| >even close to handling the task, however the broken query planning for IN
>| >clauses makes it unusable :-(.  I've decided to eschew an embedded db
>| >in favor of PG for now, but I'd really like to be able to use Derby in the
>| >near(ish) future for deployment.
>| >
>| >    I'd like to try to fix the query planning around the IN clause.  I'm
>| >reviewing the internals papers right now, but I'd appreciate any input
>| >that might point me in the right direction :-).
>| >
>| >cheers!
>| > Brent
>| >
>| > 
>| >
>  
>


Mime
View raw message