db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "A B (JIRA)" <derby-...@db.apache.org>
Subject [jira] Commented: (DERBY-1397) Tuning Guide: Puzzling optimizer documentation
Date Thu, 15 Jun 2006 22:08:31 GMT
    [ http://issues.apache.org/jira/browse/DERBY-1397?page=comments#action_12416418 ] 

A B commented on DERBY-1397:
----------------------------

Well I spent a little time investigating this property and here's what I found out, based
on your questions.

- Are we sure that the name of the property is derby.language.maxMemoryPerTable ?

Yes -- I ran some simple tests with this property name and it is indeed picked up by the Derby
optimizer and used accordingly.

- What is the maximum value allowed?

The property is stored as an integer value (as found in the OptimizerFactorImpl.boot() method),
so the *theoretical* maximum value is Integer.MAX_VALUE (i.e. 2^31 - 1).  Attempts to specify
a larger value will result in the following error at connect time:

ij> connect 'simpDB';
ERROR XJ040: Failed to start database 'simpDB', see the next exception for details.
ERROR XJ001: Java exception: 'For input string: "2147483650": java.lang.NumberFormatException'.

But that said, one of the "gotchas" here is that the property is actually specifying *kilobytes*,
not bytes--i.e. the value that the user specifies is multiplied by 1024.  What this means
is that if the user specifies Integer.MAX_VALUE, Derby will grab the value and think that
it's fine, then it will multiply the value by 1024, which causes integer overflow and thus
leads to a maxMemoryPerTable that is negative--which is wrong.  I think this warrants a new
Jira issue--Derby should check to make sure that the value, when multiplied by 1024, is still
less than Integer.MAX_VALUE.

To answer your question, then, the documentation should explicitly say that this number is
specified *in kilobytes* (i.e. "1" means 1K which means 1024 bytes) and that, therefore, the
maximum value allowed is Integer.MAX_VALUE / 1024 (which is 2097151).  Then we need to file
a Jira issue (as mentioned above) to check for this maximum and throw the appropriate error
if it's exceeded.

- Is there a value that we want to recommend for users to specify?

The default value is 1024, which translates into 1024K, which means the default max memory
per table is 1M.  I think we should update the documentation to indicate what the default
is; if a user wants to change it, then I don't think we need to (or want to) recommend a value,
so long as we indicate what the tradeoff is for lower/higher values (see below).

- The text says that it can be set to smaller values, what is the minimum value that we recommend?

Again, I don't know about "recommending" a value.  If the default isn't good enough, then
I think all we can do is say what happens for smaller values and what happens for larger values,
and then let the user make the decision him/herself.

That said, the theoretical minimum is 0--I checked and Derby will accept that value and use
it (but the performance hit is huge--see below).  If the user specifies a negative value,
Derby will silently ignore the value and just use the default--which is something that either
warrants a fix (Derby should complain about the negative value like it does for other invalid
numbers) or else should be documented (in my opinion).

In terms of the documentation, then, I suggest that we add a link for the property to the
paragraph preceding the one cited in this issue description, perhaps as follows:

<begin>

The hash table for a hash join is held in memory and if it gets big enough, it will spill
to the disk. The optimizer makes a very rough estimate of the amount of memory required to
make the hash table. If it estimates that the amount of memory required would exceed the system-wide
limit of memory use for a table (see derby.language.maxMemoryPerTable), the optimizer chooses
a nested loop join instead.

<end>

Then the words "derby.language.maxMemoryPerTable" would link to the "Properties" section (which
already exists) and there is where we would put the cited paragraph, with the following changes
(feel free to modify as you see fit):

<begin>

derby.language.maxMemoryPerTable

  Function:

   When considering whether or not to do a hash join with a FROM table, the
   Derby optimizer will use this value to determine if the resultant hash
   table would consume too much memory, where "too much" means that the
   hash table would take up more than maxMemoryPerTable kilobytes of memory.
   If the optimizer decides that the hash table will require greater than
   maxMemoryPerTable kilobytes, it will reject the hash join and instead
   do a nested loop join.

   If memory use is not a problem for your environment, setting this property
   to a high number gives the optimizer the maximum flexibility in considering
   what join strategy to use, which can lead to better performance.  For more
   limited environments, this property can be set to lower values to conserve
   memory use--but note that if the value is set too low the optimizer could
   reject most/all hash joins, which could have a significant negative impact
   on query performance.

   This value must be specified in kilobytes.  The minimum value is 0
   (which will completely disable all hash joins and thus is not recommended);
   negative values will be ignored.  The maximum value is 2097151, which comes
   from java.lang.Integer.MAX_VALUE / 1024.

  Default:

   1024  (Note that since the value is kilobytes, this translates into 1M).

  Scope:

   System-wide.

  Dynamic or static:

   This property is static; if you change it while Derby is running, the
   change does not take effect until you reboot. 

<end>

The part about what happens for a negative value might need to be removed
if we decide to throw an error instead of ignoring it--that's a question
worth posting to derby-dev, I think...

> Tuning Guide: Puzzling optimizer documentation
> ----------------------------------------------
>
>          Key: DERBY-1397
>          URL: http://issues.apache.org/jira/browse/DERBY-1397
>      Project: Derby
>         Type: Bug

>   Components: Documentation
>     Versions: 10.2.0.0
>     Reporter: Rick Hillegas
>     Assignee: Laura Stewart
>      Fix For: 10.2.0.0

>
>  Selectivity and cardinality statistics
>    Working with cardinality statistics
>      When cardinality statistics are automatically updated
>        "For other operations, Derby automatically updates statistics for the table and
all indexes on the table if they are already exist. Those operations are:
>    * (all indexes) When you execute SYSCS_UTIL.SYSCS_COMPRESS_TABLE.
>    * (index only) When you drop a column that is part of a table's index; the statistics
for the affected index are dropped, and statistics for the other indexes on the table are
updated.
> "
> What does the second bullet mean? Derby doesn't let you drop a column from a table right
now. 
> ----------------------------------------------------------
> Here's another puzzling piece of optimizer documentation:
> I'm puzzled by the following paragraph in Tuning Guide->DML statements and performance->Performance
and optimization->Joins and performance->Join strategies:
> "If memory use is not a problem for your environment, set this property to a high number;
allowing the optimizer the maximum flexibility in considering a join strategy queries involving
large queries leads to better performance. It can also be set to smaller values for more limited
environments."
> I can't find the name of this property on that page of the Tuning Guide. I'm also confused
about what we consider to be a "high number" versus what we consider to be "smaller values".
Would appreciate advice here. 
> Satheesh adds this:
> The property it may be referring to is
> *derby.language.maxMemoryPerTable*. The default value is 1024 KB.
> Current default value is too small, so it would be a good tip for
> developers to know and tune this property. It would be great if Derby
> can configure this property value based on factors like max heap size,
> size of data cache and/or other parameters.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Mime
View raw message