db-derby-user mailing list archives

From Matt Doran <matt.do...@papercut.com>
Subject Re: Pathologically poor query performance - advice required
Date Thu, 26 Jun 2008 02:02:20 GMT
Army wrote:
> Matt Doran wrote:
>> Both of these queries only return fields from 2 of the joined 
>> tables.  The slow one selects 5 fields from each of these tables.  
>> The fast one selects 5 fields from one table and 4 fields from the 
>> other.   Strange indeed!!
> Thanks for the great description of the problem, and for taking the 
> time to narrow down the "good" vs "bad" scenarios.
No worries ... I hoped that if I put a lot of effort into 
describing/investigating the problem, someone would help.   And you 
did. Thanks for looking at this!  I thought that, given the complexity of 
what's going on, I might get no answers.   It's very much appreciated!!!!
>> So does anybody understand what might be happening here?
> I took a (very) quick look at the good query plan vs the bad query 
> plan and the thing that first jumped out at me is that, for the fast 
> query, the optimizer is choosing to do two Hash Joins, but for the 
> slow query the optimizer is choosing Nested Loop joins everywhere.  I 
> don't know if that alone is the difference, but *if* it is, one 
> possible explanation of this is that the optimizer will deliberately 
> reject Hash Joins if it (the optimizer) thinks that the resultant 
> in-memory hash table will take "too much" of the available JVM 
> memory.  In that case the optimizer will skip the hash join and use a 
> nested loop join instead.
> Having said that, the calculation for what "too much" memory is 
> depends on several things, one of which is the estimated memory usage 
> for a single row in the result set.  So it's *possible* that addition 
> of the extra column to the SELECT result column list is causing the 
> estimated per-row memory usage to increase such that the optimizer's 
> calculation of "too much memory" changes, ultimately leading to 
> rejection of the hash join.
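To make that concrete, here is a simplified sketch of that kind of threshold check. The numbers and the formula are illustrative assumptions, not Derby's actual cost model:

```java
// Simplified sketch of how a per-row size estimate can flip a hash-join
// decision. The formula and numbers are illustrative assumptions, NOT
// Derby's actual cost model.
public class HashJoinEstimate {
    // Hypothetical limit, mirroring derby.language.maxMemoryPerTable (in KB).
    static final long MAX_MEMORY_PER_TABLE_KB = 1024;

    static boolean hashJoinAllowed(long estimatedRows, long bytesPerRow) {
        long estimatedKb = (estimatedRows * bytesPerRow) / 1024;
        return estimatedKb <= MAX_MEMORY_PER_TABLE_KB;
    }

    public static void main(String[] args) {
        // 10,000 estimated rows at 100 bytes/row: ~977 KB, under the limit.
        System.out.println(hashJoinAllowed(10_000, 100));  // true
        // One extra SELECT column pushes the estimate to 110 bytes/row:
        // ~1074 KB, over the limit, so the optimizer falls back to a
        // nested loop join.
        System.out.println(hashJoinAllowed(10_000, 110));  // false
    }
}
```

This is only meant to show how a small change in the per-row estimate can cross a fixed memory threshold and change the join strategy.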
>> How could I go about improving the performance?
> Another factor in the calculation of "too much memory" is a Derby 
> property called "derby.language.maxMemoryPerTable".  For details see 
> DERBY-1397 (https://issues.apache.org/jira/browse/DERBY-1397); an 
> extract from that issue is as follows:
Wow!  This worked.  Thank you very very much.  Upping the max memory to 
10MB made my modified query run fast (but not the original).  Upping it 
to 20MB made the original query run fast too.  Thank you.
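For anyone following along, the property has to be set before the embedded engine boots, and (per DERBY-1397) its value is in kilobytes. A minimal sketch, with the driver boot shown only as a comment:

```java
// Sketch: raise derby.language.maxMemoryPerTable before booting embedded
// Derby. The value is in kilobytes; 20480 KB is roughly the 20MB that
// made the original query fast here. It must be set before the engine
// starts, since the optimizer reads it at boot time.
public class DerbyMemoryConfig {
    public static void main(String[] args) {
        System.setProperty("derby.language.maxMemoryPerTable", "20480");
        // Then boot the embedded engine, e.g.:
        // Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        // Connection c = DriverManager.getConnection("jdbc:derby:myDB");
        System.out.println(System.getProperty("derby.language.maxMemoryPerTable"));
    }
}
```

The same setting can go in derby.properties instead of being set programmatically.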

We embed derby in our commercial product, where we have minimal control 
of memory.  We'll have to make a compromise here.  We can dynamically 
change this property at startup based on the amount of memory available.
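One way to do that dynamic sizing (a hypothetical heuristic, not anything Derby prescribes): scale the per-table limit with the JVM's max heap and clamp it to a sane range.

```java
// Hypothetical startup heuristic: give each table's hash-join workspace
// a small fraction of the max heap, clamped between 1MB and 64MB. The
// fraction and the clamp values are illustrative assumptions, not Derby
// guidance.
public class MaxMemoryPerTableSizer {
    static long maxMemoryPerTableKb(long maxHeapBytes) {
        long kb = (maxHeapBytes / 64) / 1024;  // ~1.5% of heap, in KB
        return Math.max(1024, Math.min(kb, 65536));
    }

    public static void main(String[] args) {
        long kb = maxMemoryPerTableKb(Runtime.getRuntime().maxMemory());
        System.setProperty("derby.language.maxMemoryPerTable",
                           Long.toString(kb));
        System.out.println("derby.language.maxMemoryPerTable=" + kb);
    }
}
```

For example, a 1GB heap would yield 16384 KB (16MB) under this heuristic, while a 64MB heap would stay at the 1024 KB floor.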

I think the optimizer is being too conservative in switching back to 
nested loop joins, and I'd be interested in understanding why, with a 
view to improving the optimizer's behaviour.  Unfortunately it seems I 
can't enable optimizer tracing without building my own Derby (I might 
try that later).  The tracing would let me see which memory check 
(there are several throughout the optimizer) is triggering the 
behaviour.

In this particular case there is an index on the large main table which 
limits the results dramatically: only 4 of the 130,000 rows were 
returned.   The index was being used in the fast version, but not in 
the slow version.   So with the slow version it looks like the join 
runs over all those rows ... even though the index could have 
dramatically reduced its work.  That's a pretty serious issue when the 
index is *so* useful here.  I'm no DB expert (by any stretch of the 
imagination) .... but if I were a Derby developer I'd be asking:

   1. In estimating the memory used, why aren't we making effective use
      of the index cardinality/statistics to get an accurate estimate
      of the rows returned?  If the optimizer knew that only a small
      percentage of the rows would be returned (and that minimal memory
      would be used), it could have used the hash join approach.   Do
      Derby's index statistics give you enough info to do this kind of
      thing?

   2. When we fell back to the nested loop join, why weren't we making
      any use of the obvious index, and ordering the join accordingly?
      If it had used that index in the first step, the result should
      have come back much faster.

But as I said, I'm no expert ... and achieving those could be extremely 
difficult.
Thanks again .... at least this "derby.language.maxMemoryPerTable" 
setting gives us a way forward.   The setting doesn't seem to be 
documented in the Tuning Guide; DERBY-1397 talks about documenting it, 
but I couldn't find it there.

