db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Satheesh Bandaram <banda...@gmail.com>
Subject Re: [jira] Commented: (DERBY-1397) Tuning Guide: Puzzling optimizer documentation
Date Thu, 15 Jun 2006 22:28:25 GMT
*Good *write up! I am not a big fan of silently ignoring user specified
value, even if specified incorrectly. I would vote for filing a big to
correct two issues 1) Checking for less than Integer.MAX_VALUE value
after multiplying specified value by 1024 and 2) Raising an error for
negative values.


A B (JIRA) wrote:

>    [ http://issues.apache.org/jira/browse/DERBY-1397?page=comments#action_12416418 ]

>A B commented on DERBY-1397:
>Well I spent a little time investigating this property and here's what I found out, based
on your questions.
>- Are we sure that the name of the property is derby.language.maxMemoryPerTable ?
>Yes -- I ran some simple tests with this property name and it is indeed picked up by the
Derby optimizer and used accordingly.
>- What is the maximum value allowed?
>The property is stored as an integer value (as found in the OptimizerFactorImpl.boot()
method), so the *theoretical* maximum value is Integer.MAX_VALUE (i.e. 2^31 - 1).  Attempts
to specify a larger value will result in the following error at connect time:
>ij> connect 'simpDB';
>ERROR XJ040: Failed to start database 'simpDB', see the next exception for details.
>ERROR XJ001: Java exception: 'For input string: "2147483650": java.lang.NumberFormatException'.
>But that said, one of the "gotchas" here is that the property is actually specifying *kilobytes*,
not bytes--i.e. the value that the user specifies is multiplied by 1024.  What this means
is that if the user specifies Integer.MAX_VALUE, Derby will grab the value and think that
it's fine, then it will multiply the value by 1024, which causes integer overflow and thus
leads to a maxMemoryPerTable that is negative--which is wrong.  I think this warrants a new
Jira issue--Derby should check to make sure that the value, when multiplied by 1024, is still
less than Integer.MAX_VALUE.
>To answer your question, then, the documentation should explicitly say that this number
is specified *in kilobytes* (i.e. "1" means 1K which means 1024 bytes) and that, therefore,
the maximum value allowed is Integer.MAX_VALUE / 1024 (which is 2097151).  Then we need to
file a Jira issue (as mentioned above) to check for this maximum and throw the appropriate
error if it's exceeded.
>- Is there a value that we want to recommend for users to specify?
>The default value is 1024, which translates into 1024K, which means the default max memory
per table is 1M.  I think we should update the documentation to indicate what the default
is; if a user wants to change it, then I don't think we need to (or want to) recommend a value,
so long as we indicate what the tradeoff is for lower/higher values (see below).
>- The text says that it can be set to smaller values, what is the minimum value that we
>Again, I don't know about "recommending" a value.  If the default isn't good enough, then
I think all we can do is say what happens for smaller values and what happens for larger values,
and then let the user make the decision him/herself.
>That said, the theoretical minimum is 0--I checked and Derby will accept that value and
use it (but the performance hit is huge--see below).  If the user specifies a negative value,
Derby will silently ignore the value and just use the default--which is something that either
warrants a fix (Derby should complain about the negative value like it does for other invalid
numbers) or else should be documented (in my opinion).
>In terms of the documentation, then, I suggest that we add a link for the property to
the paragraph preceding the one cited in this issue description, perhaps as follows:
>The hash table for a hash join is held in memory and if it gets big enough, it will spill
to the disk. The optimizer makes a very rough estimate of the amount of memory required to
make the hash table. If it estimates that the amount of memory required would exceed the system-wide
limit of memory use for a table (see derby.language.maxMemoryPerTable), the optimizer chooses
a nested loop join instead.
>Then the words "derby.language.maxMemoryPerTable" would link to the "Properties" section
(which already exists) and there is where we would put the cited paragraph, with the following
changes (feel free to modify as you see fit):
>  Function:
>   When considering whether or not to do a hash join with a FROM table, the
>   Derby optimizer will use this value to determine if the resultant hash
>   table would consume too much memory, where "too much" means that the
>   hash table would take up more than maxMemoryPerTable kilobytes of memory.
>   If the optimizer decides that the hash table will require greater than
>   maxMemoryPerTable kilobytes, it will reject the hash join and instead
>   do a nested loop join.
>   If memory use is not a problem for your environment, setting this property
>   to a high number gives the optimizer the maximum flexibility in considering
>   what join strategy to use, which can lead to better performance.  For more
>   limited environments, this property can be set to lower values to conserve
>   memory use--but note that if the value is set too low the optimizer could
>   reject most/all hash joins, which could have a significant negative impact
>   on query performance.
>   This value must be specified in kilobytes.  The minimum value is 0
>   (which will completely disable all hash joins and thus is not recommended);
>   negative values will be ignored.  The maximum value is 2097151, which comes
>   from java.lang.Integer.MAX_VALUE / 1024.
>  Default:
>   1024  (Note that since the value is kilobytes, this translates into 1M).
>  Scope:
>   System-wide.
>  Dynamic or static:
>   This property is static; if you change it while Derby is running, the
>   change does not take effect until you reboot. 
>The part about what happens for a negative value might need to be removed
>if we decide to throw an error instead of ignoring it--that's a question
>worth posting to derby-dev, I think...
>>Tuning Guide: Puzzling optimizer documentation
>>         Key: DERBY-1397
>>         URL: http://issues.apache.org/jira/browse/DERBY-1397
>>     Project: Derby
>>        Type: Bug
>>  Components: Documentation
>>    Versions:
>>    Reporter: Rick Hillegas
>>    Assignee: Laura Stewart
>>     Fix For:
>> Selectivity and cardinality statistics
>>   Working with cardinality statistics
>>     When cardinality statistics are automatically updated
>>       "For other operations, Derby automatically updates statistics for the table
and all indexes on the table if they are already exist. Those operations are:
>>   * (all indexes) When you execute SYSCS_UTIL.SYSCS_COMPRESS_TABLE.
>>   * (index only) When you drop a column that is part of a table's index; the statistics
for the affected index are dropped, and statistics for the other indexes on the table are
>>What does the second bullet mean? Derby doesn't let you drop a column from a table
right now. 
>>Here's another puzzling piece of optimizer documentation:
>>I'm puzzled by the following paragraph in Tuning Guide->DML statements and performance->Performance
and optimization->Joins and performance->Join strategies:
>>"If memory use is not a problem for your environment, set this property to a high
number; allowing the optimizer the maximum flexibility in considering a join strategy queries
involving large queries leads to better performance. It can also be set to smaller values
for more limited environments."
>>I can't find the name of this property on that page of the Tuning Guide. I'm also
confused about what we consider to be a "high number" versus what we consider to be "smaller
values". Would appreciate advice here. 
>>Satheesh adds this:
>>The property it may be referring to is
>>*derby.language.maxMemoryPerTable*. The default value is 1024 KB.
>>Current default value is too small, so it would be a good tip for
>>developers to know and tune this property. It would be great if Derby
>>can configure this property value based on factors like max heap size,
>>size of data cache and/or other parameters.

View raw message