db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-3882) Expensive cursor name lookup in network server
Date Tue, 22 Sep 2009 08:50:15 GMT

     [ https://issues.apache.org/jira/browse/DERBY-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-3882:
--------------------------------------

    Attachment: Cursors.java
                check_hash.diff

Here's a patch which implements the simple approach using String.hashCode(). I
suggest that we go for that approach for now, since changing the protocol would
be a bigger and riskier task. It is sufficient to get the method off the
profiler's list of big CPU consumers in my environment, and we can always
revisit the issue and change the network protocol later if someone has a load
where the simple fix is not sufficient.

The patch makes GenericLanguageConnectionContext.lookupCursorActivation() check
the hash codes of the two strings before calling String.equals(), and it skips
equals() if the hash codes are different, as strings with different hash codes
are never equal. This exploits the fact that the most common implementations of
java.lang.String cache the hash code, so that computing and comparing the hash
codes will be reduced to a simple comparison of two integer fields after
warm-up.
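
To illustrate the idea (this is not the code in check_hash.diff), here is a
minimal, self-contained sketch of a name lookup that compares hash codes
before calling equals(). The class CursorNameLookup, the nested Activation
class, the method names and the "CURSOR_n" names are all hypothetical
stand-ins and do not correspond to Derby's actual classes or to the names
the network server generates.

```java
import java.util.ArrayList;
import java.util.List;

public class CursorNameLookup {

    // Hypothetical stand-in for an open statement; NOT Derby's Activation.
    static final class Activation {
        final String cursorName;

        Activation(String cursorName) {
            this.cursorName = cursorName;
        }
    }

    // Strings with different hash codes can never be equal, so equals()
    // is only needed when the hash codes match. Common String
    // implementations cache the hash code, so after the first call the
    // check is typically just an int comparison.
    static boolean sameName(String a, String b) {
        return a.hashCode() == b.hashCode() && a.equals(b);
    }

    // Linear scan over the open statements, as described for
    // lookupCursorActivation(), using the hash check as a cheap filter.
    static Activation lookup(List<Activation> open, String cursorName) {
        for (Activation a : open) {
            String name = a.cursorName;
            if (name != null && sameName(name, cursorName)) {
                return a;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Activation> open = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            open.add(new Activation("CURSOR_" + i)); // made-up names
        }
        System.out.println(lookup(open, "CURSOR_7") != null);       // true
        System.out.println(lookup(open, "NO_SUCH_CURSOR") != null); // false
    }
}
```

Equal hash codes do not imply equal strings ("Aa" and "BB" share a hash
code, for example), which is why equals() must still be called when the
hash codes match.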

I've also attached a small test class (Cursors.java) to show the effect of the
patch. It repeatedly executes "VALUES 1" in an embedded connection with 50 open
statements, each of which has a cursor name. "VALUES 1" is executed 2 million
times for warm-up and then another 2 million times while the time is recorded.
Running the test 10 times with trunk and 10 times with the patch (on
OpenSolaris, Java version 1.6.0_15), the patched version took on average ~30%
less time to complete (about 9.0 s versus 13.0 s). Average/min/max time in
seconds for the runs is shown below.

ij> select name, avg(tps) "AVG", min(tps) "MIN", max(tps) "MAX" from results group by name;
NAME    |AVG          |MIN          |MAX          
--------------------------------------------------
d3882   |9.0245       |8.515        |9.867        
trunk   |12.968401    |11.732       |14.372       

2 rows selected

All the regression tests ran cleanly with the patch.

> Expensive cursor name lookup in network server
> ----------------------------------------------
>
>                 Key: DERBY-3882
>                 URL: https://issues.apache.org/jira/browse/DERBY-3882
>             Project: Derby
>          Issue Type: Improvement
>          Components: Network Server, SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>            Priority: Minor
>         Attachments: check_hash.diff, Cursors.java
>
>
> I have sometimes seen in a profiler that an unreasonably high amount of the
> CPU time is spent in GenericLanguageConnectionContext.lookupCursorActivation()
> when the network server is running. That method is used to check that there
> is no active statement in the current transaction with the same cursor name
> as the statement currently being executed, and it is normally only used if
> the executing statement has a cursor name. None of the client-side statements
> had a cursor name when I saw this.
>
> The method is always called when the network server executes a statement
> because the network server assigns a cursor name to each statement even if no
> cursor name has been set on the client side. If the list of open statements
> is short, the method is relatively cheap. If one uses
> ClientConnectionPoolDataSource with the JDBC statement cache, the list of
> open statements can however be quite long, and lookupCursorActivation() needs
> to spend a fair amount of time iterating over the list and comparing strings.
>
> The time spent looking for duplicate names in lookupCursorActivation() is
> actually wasted time when it is called from the network server, since the
> network server assigns unique names to the statements it executes, even when
> there are duplicate names on the client. It would be good if we could reduce
> the cost of this operation, or perhaps eliminate it completely when the
> client doesn't use cursor names.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

