db-derby-dev mailing list archives

From: Army <qoz...@gmail.com>
Subject: Re: [jira] Commented: (DERBY-1397) Tuning Guide: Puzzling optimizer documentation
Date: Fri, 16 Jun 2006 16:49:18 GMT
Bryan Pendleton wrote:
> Is it really realistic, though, to allow a setting of 2 million,
> which gives the optimizer permission to try to create an
> in-memory hash table of 2 gigabytes? I mean, I'd need a 64-bit JVM...
> Related, is it worth putting a line or two in this doc telling the
> user that it would be foolish to set this value to something which
> is higher than the actual physical memory that they have on their
> machine -- if I have a 256 MB machine, telling the optimizer to do
> a 512 MB hash join in "memory" is probably performance suicide, no?

I think these are both excellent points, and yes, it would be a good idea to 
add the appropriate notes to the documentation.  When I answered the question 
of "max value", I was only looking at it from a programmatic standpoint--but 
as you've pointed out, there are some practical considerations here as well.  
Note that "performance suicide" is the right term here--as described below, 
the query *should* theoretically still execute, but it might suffer a 
spill-to-disk performance penalty.

> Also, perhaps we should have a line suggesting that if the
> user decides to increase this value, they should check their JVM
> memory settings and make sure they've authorized the JVM to use
> that much memory (e.g., on Sun JVMs, -XmxNNNm should be set).

Sounds good to me--thanks for pointing that out.  Note, though, that if the 
user does not give the JVM more memory, the query should (in theory) still 
work--see below--but it might take a performance hit at execution time.
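
As a concrete sketch for the docs (the property name and its kilobyte units 
are from the Derby documentation; the heap size and property value here are 
just illustrative), an embedded application might do something like:

    // Start the JVM with enough heap first, e.g.: java -Xmx512m MyApp
    // derby.language.maxMemoryPerTable is in kilobytes, so 102400 asks
    // the optimizer to consider roughly 100 MB per table.
    System.setProperty("derby.language.maxMemoryPerTable", "102400");
    Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
    java.sql.Connection conn =
        java.sql.DriverManager.getConnection("jdbc:derby:myDB");

The property has to be set before Derby boots (or put in derby.properties) 
for it to take effect.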

> Lastly, what is the symptom if the user sets this too high? If
> they tell the optimizer that it's allowed to make, for example,
> a 200 MB hash table by setting this value to 200,000, then what
> happens if the runtime evaluation of the query plan finds that
> it can't allocate that much memory? Does the query fail? Does
> Derby crash?

Yet another very good question.  I haven't done thorough testing, but my 
*guess* is that this issue is indirectly addressed by some "follow-up" changes 
that I posted for DERBY-1007: in particular, see the BackingStoreHashtable 
changes in d1007_followup_v1.patch, which is attached to that issue.  The 
changes there do a very rough (and only loosely meaningful, unit-wise) 
comparison of the estimated row count to the max in-memory size, and if it 
looks like there is not enough memory in the JVM, they create a smaller hash 
table and spill the excess to disk.  Looking at the relevant code in 
BackingStoreHashtable, the value of max_inmemory_size is set as follows:

        // A positive max_inmemory_rowcnt means the caller imposed an
        // explicit row limit; otherwise (the optimizer passes -1 for
        // hash joins) the limit is based on available JVM memory.
        if( max_inmemory_rowcnt > 0)
             max_inmemory_size = Long.MAX_VALUE;
        else
             max_inmemory_size = Runtime.getRuntime().totalMemory()/100;

where max_inmemory_rowcnt is received from the optimizer.  So far as I can 
tell from the quick tests I did, when we're creating a hash table for an 
optimizer-chosen hash join, max_inmemory_rowcnt is always -1, and thus 
max_inmemory_size is set based on the JVM memory available at the time the 
backing store hash table is created.  In that case the BackingStoreHashtable 
receives the estimated row count from the optimizer.  If maxMemoryPerTable 
was set "too high", then the estimated row count that we get could be 
insanely high as well (if maxMemoryPerTable were lower, the optimizer would 
have rejected the hash join and we never would have made it here).  That 
said, the DERBY-1007 follow-up changes added logic to change the way a hash 
table is built if the estimated row count is "too high": namely, if it's 
equal to or larger than the max in-memory bytes, we create a hash table that 
is only as large as { max in-memory bytes / estimated size of first row }.
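
In rough pseudo-Java, the capping described above amounts to something like 
this (a sketch of the idea only, not the actual patch code; the variable 
names are mine):

    long maxInMemoryBytes  = Runtime.getRuntime().totalMemory() / 100;
    long estimatedRowCount = 2000000L; // hypothetical optimizer estimate
    long firstRowSize      = 64L;      // hypothetical first-row size

    long initialCapacity;
    if (estimatedRowCount >= maxInMemoryBytes)
        // Estimate looks unreasonably large; only build a table that
        // can actually fit in memory, and let the rest spill to disk.
        initialCapacity = maxInMemoryBytes / firstRowSize;
    else
        initialCapacity = estimatedRowCount;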

And that, I think, means that we should only create a hash table that will 
actually fit in the available JVM memory; if we need a larger hash table, then 
we'll spill to disk and take the performance hit, but the query should still 
execute.  In theory.

Of course, all of that assumes that the "red flag" for a large hash table is 
a high row count.  If it turns out that we have a low row count but each row 
is extremely large, the changes for DERBY-1007 will not be triggered.  I 
think we should still be okay, though, for a different reason.  In this 
scenario the hash table itself will not have very many entries, so we should 
have enough memory to create it (empty).  Then, the BackingStoreHashtable 
code keeps track of how much JVM memory is available after each insertion 
into the hash table.  So if we have a couple of very large rows, the 
available memory will decrease with each row and BackingStoreHashtable will 
start spilling to disk before the table is full--which should keep us from 
running out of memory...

I think.  If I've got any of this wrong, someone please feel free to correct me!
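
To make that scenario concrete, the kind of per-insert check I have in mind 
looks roughly like this (illustrative only--spillToDisk() and 
insertInMemory() are placeholder names, not Derby methods):

    // Rough sketch of a per-insert memory check.
    void insert(Object[] row, long estimatedRowSize) {
        long free = Runtime.getRuntime().freeMemory();
        if (free < estimatedRowSize)
            spillToDisk(row);     // overflow rows go to disk
        else
            insertInMemory(row);  // row stays in the in-memory table
    }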

So, to get back to your final question of "what is the symptom if the user 
sets this too high?": the theoretical answer is that query performance might 
end up being lousy because the hash table will spill to disk.  A more 
practical (and admittedly uglier) answer is that the query might perform 
poorly OR the user might hit memory problems (including OutOfMemory errors) 
at execution time while Derby is building the hash table.  Theoretically the 
OutOfMemory conditions shouldn't arise (as explained above), and I haven't 
been able to produce any since DERBY-1007 was closed, but I guess the 
possibility is still there.  If we document it as a "potential" problem with 
too large a maxMemoryPerTable value, then a user who raises maxMemoryPerTable 
and suddenly hits an OutOfMemory error will at least know where to look.

Well, that took a lot longer to write than I thought it would.  Hopefully the 
answers to your questions are somewhere in there...?

