db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3882) Expensive cursor name lookup in network server
Date Wed, 23 Sep 2009 13:51:16 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758705#action_12758705 ]

Knut Anders Hatlen commented on DERBY-3882:

I think the reason why it's not used as a general optimization technique for String.equals()
is that there are some conditions that must be satisfied before it actually is an optimization:

1) The same String objects must be compared multiple times, otherwise the cost of calculating
the hash codes will be too high compared to the benefit.

2) There's nothing to gain by comparing the hash codes if the Strings are equal, so it only
speeds up the comparisons where we expect a high number of mismatches.

3) String.equals() is very fast if the strings have different lengths or if the strings differ
in one of the first characters, so the optimization has the best effect when comparing strings
of the same length with a common prefix.

The cursor names generated by the network client satisfy all of these conditions. They are
of the form SQL_CURLH000C + serial#, which means they are not equal, have a common prefix,
and are likely to have the same length. Also, the names are stored in the activation on the
server, so they'll be reused and benefit from the caching of the hash code.
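
To illustrate the three conditions, here is a minimal sketch of the hash-code pre-check. It is not the code from check_hash.diff: the class CursorNameMatcher and its methods sameName and indexOf are hypothetical names, and a plain list of names stands in for the activation list. It relies on java.lang.String caching its hash code after the first computation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not Derby's actual code.
final class CursorNameMatcher {
    private CursorNameMatcher() {}

    // Returns true if the two names are equal. The hash codes are compared
    // first, so names of equal length with a long common prefix are usually
    // rejected without a character-by-character comparison.
    static boolean sameName(String wanted, String candidate) {
        if (candidate == null) {
            return false; // statement without a cursor name
        }
        // String caches its hash code, so repeated lookups against the
        // same String objects only pay for the computation once.
        if (wanted.hashCode() != candidate.hashCode()) {
            return false;
        }
        // Equal hash codes do not prove equality; equals() must still decide.
        return wanted.equals(candidate);
    }

    // Linear scan over the names, standing in for the activation list.
    // Returns the index of the first match, or -1 if there is none.
    static int indexOf(List<String> openCursorNames, String wanted) {
        for (int i = 0; i < openCursorNames.size(); i++) {
            if (sameName(wanted, openCursorNames.get(i))) {
                return i;
            }
        }
        return -1;
    }
}
```

The pre-check only changes the cost of mismatches. A match still runs the full equals(), which is why condition 2 matters.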

Another reason is, as you mentioned, that a hash table is normally used for such lookups.
A hash table could be used in this case as well, but there are some complicating issues that
may make it too complex to justify it:

a) The cursor names are not unique within a connection (open cursors cannot have the same
name, but open statements can share the same name as long as they don't have open cursors
at the same time). This means that one key (cursor name) can map to many values, so some sort
of multi-map must be implemented. In the normal embedded case with no cursor name, all activations
will be located in the same bucket (key=null).

b) A statement can change its cursor name any time. Currently, this is done by simply changing
the cursorName field in the activation. If we store the activation list in a hash table, changing
the cursor name means that we also need to move the activation from one bucket to another.

c) There's some code to reclaim memory if the activation list has been big and later shrinks.
I'd imagine that this code would be somewhat more complex too if the list is transformed into
a multi-map.
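
As a rough sketch of what points a) to c) would involve, the multi-map could look something like the following. All names here (ActivationStub, ActivationIndex, rename, lookupOpenCursor) are hypothetical and only stand in for Derby's real activation handling. The sketch assumes a java.util.HashMap, which accepts null as a key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an activation; not Derby's Activation interface.
final class ActivationStub {
    String cursorName;   // may be null (embedded case with no cursor name)
    boolean cursorOpen;

    ActivationStub(String cursorName, boolean cursorOpen) {
        this.cursorName = cursorName;
        this.cursorOpen = cursorOpen;
    }
}

// Hypothetical multi-map from cursor name to the activations using it.
final class ActivationIndex {
    private final Map<String, List<ActivationStub>> byName = new HashMap<>();

    // (a) One name can map to many activations, so each bucket is a list.
    void add(ActivationStub a) {
        byName.computeIfAbsent(a.cursorName, k -> new ArrayList<>()).add(a);
    }

    // (c) Empty buckets are dropped so the map can shrink again.
    void remove(ActivationStub a) {
        List<ActivationStub> bucket = byName.get(a.cursorName);
        if (bucket == null) {
            return;
        }
        bucket.remove(a);
        if (bucket.isEmpty()) {
            byName.remove(a.cursorName);
        }
    }

    // (b) Renaming is no longer a plain field update: the activation
    // has to move from its old bucket to the new one.
    void rename(ActivationStub a, String newName) {
        remove(a);
        a.cursorName = newName;
        add(a);
    }

    // Returns the activation with an open cursor of this name, or null.
    ActivationStub lookupOpenCursor(String name) {
        List<ActivationStub> bucket = byName.get(name);
        if (bucket == null) {
            return null;
        }
        for (ActivationStub a : bucket) {
            if (a.cursorOpen) {
                return a;
            }
        }
        return null;
    }

    int bucketCount() {
        return byName.size();
    }
}
```

Even in this small form, every place that currently assigns the cursorName field would have to go through rename(), and the key=null bucket would still be scanned linearly in the embedded case.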

That said, I'm all for replacing the list with a data structure that's more suited for efficient
lookups. I'd suggest that we go for the current patch proposal for now, since it looks simple
and rather harmless, and then revisit the issue and try to come up with a more efficient data
structure if this optimization turns out to be insufficient.

> Expensive cursor name lookup in network server
> ----------------------------------------------
>                 Key: DERBY-3882
>                 URL: https://issues.apache.org/jira/browse/DERBY-3882
>             Project: Derby
>          Issue Type: Improvement
>          Components: Network Server, SQL
>    Affects Versions:
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>            Priority: Minor
>         Attachments: check_hash.diff, Cursors.java
> I have sometimes seen in a profiler that an unreasonably high amount of the CPU time
> is spent in GenericLanguageConnectionContext.lookupCursorActivation() when the network server
> is running. That method is used to check that there is no active statement in the current
> transaction with the same cursor name as the statement currently being executed, and it is
> normally only used if the executing statement has a cursor name. None of the client-side statements
> had a cursor name when I saw this.
>
> The method is always called when the network server executes a statement because the
> network server assigns a cursor name to each statement even if no cursor name has been set
> on the client side. If the list of open statements is short, the method is relatively cheap.
> If one uses ClientConnectionPoolDataSource with the JDBC statement cache, the list of open
> statements can however be quite long, and lookupCursorActivation() needs to spend a fair amount
> of time iterating over the list and comparing strings.
>
> The time spent looking for duplicate names in lookupCursorActivation() is actually wasted
> time when it is called from the network server, since the network server assigns unique names
> to the statements it executes, even when there are duplicate names on the client. It would
> be good if we could reduce the cost of this operation, or perhaps eliminate it completely
> when the client doesn't use cursor names.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
