Amit Patel wrote:
> I ran into a problem with Derby. When I issue a large SQL statement
> with lots of WHERE conditions, the Derby optimizer looks like it's in an
> infinite loop.
Do you know how long the query was running before you took the thread dump? If
you repeat the process of running the query and taking the thread dump, do the
threads look the same every time?
> Does anyone know if this is a bug or can I tune Derby to get around this
> issue?
It's hard to say with just the info given. From a quick glance it looks like
your joining 14 tables in this query. If that's true, the Derby optimizer is
going to try out all possible orderings for the 14 tables until it either
finishes or "times out". There are known issues in this area that could cause
the optimizer to spend a *lot* of time trying to figure out what the best plan
is--see for example DERBY-1906, DERBY-1907, DERBY-1905. In such a case it might
look like the optimizer has hung when in fact it's just busy trying out all of
the various combinations.
Since you're using 10.2, one thing you can do is use optimizer overrides to tell
the optimizer to skip optimization of join order--i.e. to just join the tables
in the order in which they appear in the query. To do that you can specify the
"joinOrder" optimizer override as follows:
SELECT DISTINCT t0.OID, t0.JDOVERSION, t0.BINARYCOLLABORATIONHASHCODE,
...
FROM --DERBY-PROPERTIES joinOrder=FIXED
DBA.MESSAGES t0 INNER JOIN DBA.CUSTOMDATA t1 ON t0.OID = t1.MESSAGEOID
...
Note that you must have an end-of-line after specifying the joinOrder=FIXED
property.
If the query completes in a reasonable time with this property set, then it's
probably the case that, in your original query, the optimizer is spending time
trying out all of the various join orders for the 14 tables.
If the query still "hangs" with this property set, then it might be good if you can:
1. Reduce the number of tables and/or WHERE predicates to find the minimum
number of tables/predicates required to reproduce the hang.
2. Take note of how long the query actually runs. Is it minutes, hours, days?
3. See the following wiki page for additional suggestions:
http://wiki.apache.org/db-derby/PerformanceDiagnosisTips
As a general rule, more info is better. The above wiki page describes the kind
of information that is most helpful in debugging situations like these. If you
can start there, someone (perhaps even you!) may be able to provide further
insight as to what's happening in this query...
Army
|