db-derby-user mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Pathologically poor query performance - advice required
Date Thu, 26 Jun 2008 15:22:03 GMT
Matt Doran wrote:
> Army wrote:
>> Matt Doran wrote:
>>> Both of these queries only return fields from 2 of the joined 
>>> tables.  The slow one selects 5 fields from each of these tables.  
>>> The fast one selects 5 fields from one table and 4 fields from the 
>>> other.   Strange indeed!!
>> Thanks for the great description of the problem, and for taking the 
>> time to narrow down the "good" vs "bad" scenarios.
> No worries ... I hoped that if I put a lot of effort into 
> describing/investigating the problem ... someone could help.  And you 
> did. Thanks for looking at this!  I thought that, due to the complexity of 
> what's going on, I might get no answers.  It's very much appreciated!
>>> So does anybody understand what might be happening here?
>> I took a (very) quick look at the good query plan vs the bad query 
>> plan and the thing that first jumped out at me is that, for the fast 
>> query, the optimizer is choosing to do two Hash Joins, but for the 
>> slow query the optimizer is choosing Nested Loop joins everywhere.  I 
>> don't know if that alone is the difference, but *if* it is, one 
>> possible explanation of this is that the optimizer will deliberately 
>> reject Hash Joins if it (the optimizer) thinks that the resultant 
>> in-memory hash table will take "too much" of the available JVM 
>> memory.  In that case the optimizer will skip the hash join and use a 
>> nested loop join instead.
>> Having said that, the calculation for what "too much" memory is 
>> depends on several things, one of which is the estimated memory usage 
>> for a single row in the result set.  So it's *possible* that addition 
>> of the extra column to the SELECT result column list is causing the 
>> estimated per-row memory usage to increase such that the optimizer's 
>> calculation of "too much memory" changes, ultimately leading to 
>> rejection of the hash join.
>>> How I could go about improving the performance?
>> Another factor in the calculation of "too much memory" is a Derby 
>> property called "derby.language.maxMemoryPerTable".  For details see 
>> DERBY-1397 (https://issues.apache.org/jira/browse/DERBY-1397); an 
>> extract from that issue is as follows:
> Wow!  This worked.  Thank you very very much.  Upping the max memory to 
> 10MB made my modified query run fast (but not the original).  Upping it 
> to 20MB made the original query run fast too.  Thank you.
> We embed derby in our commercial product, where we have minimal control 
> of memory.  We'll have to make a compromise here.  We can dynamically 
> change this property at startup based on the amount of memory available.
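
The startup logic described above could be sketched like this (a minimal sketch; the 1/64-of-heap policy and the class/method names are my own illustrative assumptions, and only the property name and its kilobyte unit come from DERBY-1397):

```java
// A minimal sketch of setting derby.language.maxMemoryPerTable dynamically
// at startup, based on available memory.
// ASSUMPTIONS (not from the thread): the 1/64-of-heap policy and the
// class/method names are illustrative; per DERBY-1397 the property value
// is in kilobytes and defaults to 1024 KB (1 MB).
public class DerbyMemoryTuning {

    /** Chooses and sets the value (in KB) for derby.language.maxMemoryPerTable. */
    static long configureMaxMemoryPerTable() {
        long heapKb = Runtime.getRuntime().maxMemory() / 1024;
        // Hypothetical policy: allow each table's hash joins 1/64 of the
        // maximum heap, but never less than the 1024 KB default.
        long perTableKb = Math.max(1024, heapKb / 64);
        // Must be set before the embedded Derby engine boots, since the
        // property is read at engine startup.
        System.setProperty("derby.language.maxMemoryPerTable",
                           Long.toString(perTableKb));
        return perTableKb;
    }

    public static void main(String[] args) {
        long kb = configureMaxMemoryPerTable();
        System.out.println("derby.language.maxMemoryPerTable=" + kb + " KB");
    }
}
```
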
> I think the optimizer is being too conservative in switching back to 
> nested loop joins, and I'd be interested in understanding why, with a 
> view to improving the optimizer's behaviour.  Unfortunately it seems I 
> can't enable optimizer tracing without building my own Derby (I might 
> try that later).  The tracing would allow me to see which memory check 
> (there are a number of them throughout the optimizer) is triggering 
> the behaviour.
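
As a side note: while full optimizer tracing does need a custom build, the plan Derby actually chose can be written to derby.log without rebuilding, via the documented statement-logging properties, for example in a derby.properties file (a sketch; the file location assumes the default derby.system.home lookup):

```properties
# derby.properties in derby.system.home
# Logs the chosen query plan (join strategy, index use) to derby.log.
derby.language.logQueryPlan=true
# Logs each statement's text alongside its plan.
derby.language.logStatementText=true
```

This shows which join strategy and access path were actually picked, though not *why* the optimizer rejected the alternatives.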
> In this particular case there is an index on the large main table, which 
> limits the results returned dramatically.  In this particular case, I 
> was only returning 4 of the 130,000 rows.   The index was being used in 
> the fast version, but was not being used in the slow version.   So with 
> the slow version it looks like it's doing the join on all these rows ... 
> even though the index could have dramatically reduced its work.  That's 
> a pretty serious issue when this index is *so* useful in this case.  I'm 
> no DB expert (by any stretch of the imagination) .... but if I was a 
> derby developer I'd be asking:
>    1. When estimating the memory used, why aren't we making effective use
>       of the index cardinality/stats to get an accurate estimate of the
>       rows returned?  If the optimizer knew that only a small percentage
>       of the rows would be returned (and that minimal memory would be
>       used), it could have used the hash join approach.   Do Derby index
>       statistics give you enough info to do this type of thing?
>    2. When we fell back to the nested loop join, why weren't we making
>       any use of the obvious index, and ordering the joins appropriately?
>       If it had used that index in the first step, the result should have
>       been much faster.
> But as I said I'm no expert ... and achieving those could be extremely 
> difficult.
> Thanks again .... at least this "derby.language.maxMemoryPerTable" 
> setting gives us a way forward.   This setting doesn't seem to be 
> documented in the tuning guide; DERBY-1397 talks about documenting 
> it, but I couldn't find it there.
> Regards,
> Matt
To get feedback from people it may also help to post the DDL of the 
tables in the query, including the indexes/constraints.  I know you
provided the db, so with some work it is available - but the more up-front
info you provide, the more help is possible.

One other suggestion might be to try the query once with no optimizer
timeout to see if it would have found a better plan:
