db-derby-user mailing list archives

From Army <qoz...@gmail.com>
Subject Re: Pathologically poor query performance - advice required
Date Wed, 25 Jun 2008 16:14:39 GMT
Matt Doran wrote:
> Both of these queries only return fields from 2 of the joined tables.  
> The slow one selects 5 fields from each of these tables.  The fast one 
> selects 5 fields from one table and 4 fields from the other.   Strange 
> indeed!!

Thanks for the great description of the problem, and for taking the time 
to narrow down the "good" vs "bad" scenarios.

> So does anybody understand what might be happening here?

I took a (very) quick look at the good query plan vs the bad query plan 
and the thing that first jumped out at me is that, for the fast query, 
the optimizer is choosing to do two Hash Joins, but for the slow query 
the optimizer is choosing Nested Loop joins everywhere.  I don't know if 
that alone is the difference, but *if* it is, one possible explanation 
of this is that the optimizer will deliberately reject Hash Joins if it 
(the optimizer) thinks that the resultant in-memory hash table will take 
"too much" of the available JVM memory.  In that case the optimizer will 
skip the hash join and use a nested loop join instead.
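
If you want to confirm which join strategies the optimizer actually
chose, one way is to capture the runtime plan with Derby's runtime
statistics procedures and search the output for "Hash Join" vs.
"Nested Loop Join".  Below is a minimal JDBC sketch; the database
name, table names, and query are placeholders, so substitute your own:

   import java.sql.Connection;
   import java.sql.DriverManager;
   import java.sql.ResultSet;
   import java.sql.Statement;

   public class CheckJoinStrategy {
       public static void main(String[] args) throws Exception {
           // Placeholder embedded-driver URL; point this at your db.
           Connection conn = DriverManager.getConnection("jdbc:derby:myDB");
           Statement s = conn.createStatement();

           // Ask Derby to record a runtime plan for later statements.
           s.execute("CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)");

           // Run the query in question (hypothetical tables here).
           ResultSet rs = s.executeQuery(
               "SELECT a.c1, b.c2 FROM t1 a JOIN t2 b ON a.id = b.id");
           while (rs.next()) { /* drain the result set */ }
           rs.close();

           // Retrieve the plan for the statement just executed and
           // print it; look for the join strategy in the text.
           ResultSet plan = s.executeQuery(
               "VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()");
           if (plan.next()) {
               System.out.println(plan.getString(1));
           }
           plan.close();
           s.close();
           conn.close();
       }
   }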

Having said that, the calculation of what counts as "too much" memory 
depends on several things, one of which is the estimated memory usage 
for a single row in the result set.  So it's *possible* that the 
addition of the extra column to the SELECT result column list is 
increasing the estimated per-row memory usage enough to change the 
optimizer's "too much memory" calculation, ultimately leading to 
rejection of the hash join.
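
To make that concrete with some purely invented numbers (this is NOT
Derby's internal formula, just an illustration of the threshold
effect):

   public class MemoryEstimateSketch {
       public static void main(String[] args) {
           // All values below are invented for illustration only.
           long estimatedRows       = 10_000; // optimizer row estimate
           long bytesPerRow4Cols    = 100;    // hypothetical, 4 columns
           long bytesPerRow5Cols    = 110;    // with one extra column
           long maxMemoryPerTableKB = 1024;   // the property's default

           long kb4 = estimatedRows * bytesPerRow4Cols / 1024; // ~976 KB
           long kb5 = estimatedRows * bytesPerRow5Cols / 1024; // ~1074 KB

           // ~976 KB fits under the 1024 KB ceiling, so a hash join is
           // allowed; ~1074 KB exceeds it, so the optimizer falls back
           // to a nested loop join.  One extra column can flip it.
           System.out.println("4 cols: " + kb4 + " KB -> hash join "
               + (kb4 <= maxMemoryPerTableKB ? "allowed" : "rejected"));
           System.out.println("5 cols: " + kb5 + " KB -> hash join "
               + (kb5 <= maxMemoryPerTableKB ? "allowed" : "rejected"));
       }
   }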

> How I could go about improving the performance?

Another factor in the calculation of "too much memory" is a Derby 
property called "derby.language.maxMemoryPerTable".  For details see 
DERBY-1397 (https://issues.apache.org/jira/browse/DERBY-1397); an 
extract from that issue is as follows:

<begin extract>

When considering whether or not to do a hash join with a FROM table, the 
Derby optimizer will use this value to determine if the resultant hash 
table would consume too much memory, where "too much" means that the 
hash table would take up more than maxMemoryPerTable kilobytes of 
memory.  If the optimizer decides that the hash table will require 
greater than maxMemoryPerTable kilobytes, it will reject the hash join 
and instead do a nested loop join.

If memory use is not a problem for your environment, setting this 
property to a high number gives the optimizer the maximum flexibility in 
considering what join strategy to use, which can lead to better performance.

<end extract>

So perhaps you could try increasing the value of maxMemoryPerTable to 
see if that improves things for the slow query?
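
For reference, the value is in kilobytes and the property is read when
the engine boots, so it needs to be set before the first connection is
made, either as a JVM system property or in derby.properties.  For
example (10240 KB is just an arbitrary trial value):

   // Set before the Derby embedded engine boots, i.e. before the
   // first connection; the value is in kilobytes (10240 KB = 10 MB).
   System.setProperty("derby.language.maxMemoryPerTable", "10240");

   // Or, equivalently, a line in derby.properties:
   //   derby.language.maxMemoryPerTable=10240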

And as a final note, a recent discussion on derby-dev noted some 
glitches in another calculation that can also affect the optimizer's 
decision to skip a hash join due to "excess memory"; see:

   http://markmail.org/message/pxuuhvr3kamxix64

That's probably not related to what you are seeing, but I thought I'd 
mention it just in case...

Army

