db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Optimizer hints?
Date Tue, 21 Dec 2004 23:01:26 GMT
All the database systems I have worked on have had optimizer hints,
while I agree it would be best if we didn't have to ever have them I
think they are a good way to be able to provide a quick fix while we
improve the optimizer to not need them.  I agree with all the posts that
encourage people to provide as much detail as possible to their problems
with query plans.

I think it is a good idea to somehow introduce the hints such that it
does not affect the standard SQL.  Seems like it might be nice to make
it some sort of XML that might fit with an eventual XML output of query
plans so that tools could read and write them.

As a zero admin db derby should have the goal of making the hints not
necessary, but in the real world we may not get there - so I see them as
a necessary evil.  I also think it makes supporting the product easier
as one can force a test case when maybe the customers data that caused
that particular query plan shape is not available.

The area of the optimizer may provide some interesting work for those
interested out there.  I am not an expert in the optimizer but here are
some areas I have thought may be interesting:

o add a sort merge option to the optimizer choices
o do a better job choosing when to invalidate a plan and cause another
optimization try.  Currently we just use a row count metric and a number
of iterations metric, and always when a ddl operation affects one of the
tables.  Some options might be:
    1) is it really necessary to recompile a single table query,
non-join query other than at ddl?
    2) can we use actual stats gathered during execution and compare
them with estimates to see if we should change estimates and recompile?
    3) Currently we recompile after a set number of iterations, should
we bump the default number - or just get rid of it all together?
    4) xml query plans
o Update the costing information at least to current reality of the
system (raw numbers have not been run for a few releases).  Maybe change
units of costing so that other software products could provide costing.
o add support in the optimizer to choose an index based on a function
rather than a key (and add the function index support in language/store).


Satheesh Bandaram wrote:

> Right... I also remember seeing a large query (with 40+ table
> references) where Derby optimizer was spending a lot of time figuring
> out the right join-order. All the tables were really small in this case
> and forcing a join-order saved optimization time and  improved total
> query execution time significantly.
> 
> The fact that most (if not all) database vendors have optimizer hints
> should show its need ...
> 
> Satheesh
> 
> Mike Matrigali wrote:
> 
> 
>>I don't have specific queries, but I can give examples where the
>>optimizer may not have the right information to pick a good plan:
>>
>>1) queries involving tables with no indexes have no data distribution
>>~   maintained by the system, so any guess the optimizer makes may be
>>~   wrong for a dataset.
>>2) queries involving indexes basically use the shape of the tree to
>>~   estimate data distribution.  The cost information for an estimate
>>~   of number of rows between key1 and key2, assumes uniform distribution
>>~   throughout the tree.
>>3) because of #2, multi-column indexes (key1, key2) don't provide data
>>~   distribution information on key2 - so again the optimizer just uses
>>~   a default distribution guess.
>>
>>RPost wrote:
>>| Do you have any specific examples of queries that could benefit from an
>>| optimizer hint?
>>|
>>| Does anyone know of any documented instances where Derby has selected an
>>| execution plan that is ineffective or suboptimal? Were these instances
>>able
>>| to be corrected by tuning the system using existing
>>| properties?
>>|
>>| Oracle allows hints to be provided as SQL comments.
>>|
>>| Re providing an offline tool. Does this method imply that the statements
>>| exist independent of any given running database instance. If so are you
>>| proposing that a prepared statement be saved and made available for
>>future
>>| execution without preparing it again?
>>|
>>| ----- Original Message -----
>>| From: "Satheesh Bandaram" <satheesh@Sourcery.Org>
>>| To: "Derby Development" <derby-dev@db.apache.org>
>>| Sent: Friday, December 17, 2004 2:41 PM
>>| Subject: Optimizer hints?
>>|
>>|
>>|
>>| I have been thinking about adding optimizer hints to Derby. Though Derby
>>| optimizer does perform a reasonable job, it may be useful to have
>>| optimizer hints for cases 1) when updated index statistics is not
>>| available, or even incorrect 2) Rare cases, like this one?, when the
>>| optimizer doesn't do the job right 3) when applications issue queries
>>| not designed for Derby (like very large number of tables).
>>|
>>| Derby optimizer is primarily tasked to handle 1) access method (whether
>>| to use table scan or index) 2) join order and 3) join strategy (nested
>>| loop or HashJoin) A complete optimizer hints (should they actually be
>>| called optimizer overrides?) should be able to provide means of
>>| specifying all these. There are several ways to do this, including
>>|
>>|    1. Enhance the SQL to recognize additional keywords, like properties.
>>| These properties could specify optimizer hints. That is what Cloudscape
>>| had before, but this causes non-portable SQL to be written. Attempts to
>>| run these statements against any other SQL server could cause syntax
>>| errors. Not an ideal solution, according to me.
>>|    2. Provide optimizer hints as SQL comments. These comments are
>>| recognized only by Derby parser and other SQL engines would simply
>>| ignore these. There are some limitations in Derby parser currently to
>>| implement this option.
>>|    3. Provide an offline tool which registers hints for SQL statements.
>>| When those SQL statements are executed, Derby could look up previously
>>| registered hints for those statements and automatically apply them.
>>| Advantage of this scheme is that applications don't need to be modified
>>| to add hints and they can easily be removed when not needed. How exactly
>>| to register the hints could be worked out. I personally prefer this
>>| approach.
>>|
>>| Any comments?
>>|
>>| Satheesh
>>|
>>| Gerald Khin (JIRA) wrote:
>>|
>>|
>>|>HashJoinStrategy leads to java.lang.OutOfMemoryError
>>|>----------------------------------------------------
>>|
>>|>        Key: DERBY-106
>>|>        URL: http://nagoya.apache.org/jira/browse/DERBY-106
>>|>    Project: Derby
>>|>       Type: Bug
>>|>   Reporter: Gerald Khin
>>|
>>|
>>|>My application is running out of memory: I encounterd a
>>|
>>| java.lang.OutOfMemoryError. I used -Xmx256M. Unfortunatley, I cannot
>>| spend an arbitrary amount of JVM memory.
>>|
>>|>Then, I commented out the line in class OptimizerFactoryImpl which was
>>|
>>| adding the HashJoinStrategy to the set of Join strategies:
>>|
>>|>       if (joinStrategySet == null)
>>|>       {
>>|>//            JoinStrategy[] jss = new JoinStrategy[2];
>>|>           JoinStrategy[] jss = new JoinStrategy[1];
>>|>           jss[0] = new NestedLoopJoinStrategy();
>>|>//            jss[1] = new HashJoinStrategy();
>>|>           joinStrategySet = jss;
>>|>       }
>>|
>>|>And with these changes the OutOfMemoryError has gone away! And it works
>>|
>>| even with -Xmx128M!!!
>>|
>>|>So I guess that there is a major memory issue with this HashJoin
>>|
>>| strategy implementation.
>>|
>>|>If it turns out to be too complicated to make the memory consumption
>>|
>>| more predicatble or even bounded to some configurable limit, then I need
>>| at least as a workaround a way to turn off the HashJoin strategy
>>| completely: I did it by patching and building my own derby.jar, but if
>>| there would be an official solution with some kind of switch like a
>>| system property, it would be great!
>>|
>>|
>>|
> 
> 
> 

Mime
View raw message