db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-5068) Investigate increased CPU usage on client after introduction of UTF-8 CcsidManager
Date Tue, 10 May 2011 13:15:48 GMT

     [ https://issues.apache.org/jira/browse/DERBY-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-5068:

    Attachment: d5068-2a.stat

Attaching an alternative patch (d5068-2a.diff) that must be applied on
top of the patch attached to DERBY-5210.

The patch makes the following changes:

1) Adds two new methods to CcsidManager: startEncoding() and encode().
These are roughly equivalent to the reset() and encode() methods in
java.nio.charset.CharsetEncoder (and Utf8CcsidManager indeed
implements them as wrappers around the CharsetEncoder methods). The
methods allow encoding a string directly into a ByteBuffer without
going via an intermediate throw-away array.

2) Removes these methods from CcsidManager:
    - convertFromJavaString(String, byte[], int, Agent)
    - convertToJavaString(byte[])
    - maxBytesPerChar()
    - getByteLength(String)

3) Changes Request, NetPackageRequest and NetConnection to use the new
methods instead of the removed ones.
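
To illustrate change 1), here is a minimal sketch of how a startEncoding()/encode() pair could wrap java.nio.charset.CharsetEncoder. The class name Utf8EncoderSketch, the parameter lists and the boolean return convention are assumptions made for this example; the real signatures are in the patch and may differ (for instance in how errors are reported).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Hypothetical stand-in for a UTF-8 CcsidManager; not the class from the patch.
final class Utf8EncoderSketch {
    private final CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();

    // Prepare for a new string, like CharsetEncoder.reset().
    void startEncoding() {
        encoder.reset();
    }

    // Encode as much of src as fits directly into dest, with no
    // intermediate byte array. Returns true when all of src has been
    // encoded, false when dest ran out of space. After false, the caller
    // can provide more space and call encode() again with the same src.
    boolean encode(CharBuffer src, ByteBuffer dest)
            throws CharacterCodingException {
        CoderResult result = encoder.encode(src, dest, true);
        if (result.isUnderflow()) {
            // All input consumed; let the encoder emit any trailing bytes.
            result = encoder.flush(dest);
        }
        if (result.isError()) {
            // Malformed or unmappable input, e.g. an unpaired surrogate.
            result.throwException();
        }
        return result.isUnderflow();
    }
}
```

A caller would invoke startEncoding() once per string and then encode() until it returns true.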

In addition to performing the string encoding without creating an
intermediate byte array, the patch eliminates the use of
getByteLength() completely (that method also created an intermediate
byte array). The original code needed to know the exact byte length of
the string up front so that it could make sure the destination buffer
was large enough. The new interface for encoding the strings lets the
caller know if it runs out of buffer space, so that the caller can
allocate a larger buffer and continue the operation. This way, we
don't need to encode each string twice.
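
The grow-and-continue pattern described above could look roughly like the following sketch. It uses CharsetEncoder directly so that it is self-contained; the class name GrowOnOverflowSketch, the method appendUtf8 and the doubling growth policy are assumptions for illustration and are not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Hypothetical helper; the names and the growth policy are illustrative only.
final class GrowOnOverflowSketch {

    // Appends the UTF-8 encoding of s at dest's current position and
    // returns the buffer that holds the result, which is a new, larger
    // buffer if dest was too small. The string is encoded only once.
    static ByteBuffer appendUtf8(String s, ByteBuffer dest)
            throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
        CharBuffer src = CharBuffer.wrap(s);
        while (true) {
            CoderResult result = encoder.encode(src, dest, true);
            if (result.isUnderflow()) {
                result = encoder.flush(dest);
            }
            if (result.isUnderflow()) {
                return dest; // everything has been encoded
            }
            if (result.isError()) {
                result.throwException();
            }
            // Overflow: keep the bytes written so far, move them to a
            // larger buffer, and continue from where the encoder stopped.
            ByteBuffer bigger =
                    ByteBuffer.allocate(Math.max(dest.capacity() * 2, 16));
            dest.flip();
            bigger.put(dest);
            dest = bigger;
        }
    }
}
```

Because the encoder advances the position of src as it goes, nothing that has already been written needs to be encoded again after the buffer grows.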

The one place where we still need to know the byte length up front is
in NetPackageRequest.buildCommonPKGNAMinfo(). That's because the
format of the message depends on whether or not the string length
exceeds a certain threshold. The method now creates a byte array
representation of the string once, and uses that array both to find
the byte length and to copy the encoded version of the string into the

I've rerun the sr_select load client, with 10 threads, to see how this
new patch performs. I used JDK 6u24 on Solaris 10, and collected the
CPU usage in the client driver by using the /bin/time command. I ran
each configuration twice, 10 minutes each. Here's the CPU time per
transaction seen with various versions/patches:

(plain):                                62.4 µs/tx
(plain):                                67.0 µs/tx
trunk + d5068-1a.diff:                  63.4 µs/tx
trunk + d5210-1a.diff:                  67.9 µs/tx
trunk + d5210-1a.diff + d5068-2a.diff:  65.2 µs/tx

So, in short: None of the patches bring the CPU usage all the way down
to the level. The 1a patch attached to this issue (the one
that does the UTF-8 encoding manually) is close, though.

The 2a patch doesn't perform quite as well as the 1a patch, but still
better than The advantage is that it hides the details on
how the encoding is done. Also, by using the standard class library
interface, we may benefit from improvements that are made to the class
library implementation in the future.

I guess I'm leaning towards the approach in the 2a patch. The
performance difference isn't that big anyway (I've only been able to
see impact on CPU usage, never on the transaction rate), so it doesn't
seem worthwhile to duplicate functionality provided by the standard

> Investigate increased CPU usage on client after introduction of UTF-8 CcsidManager
> ----------------------------------------------------------------------------------
>                 Key: DERBY-5068
>                 URL: https://issues.apache.org/jira/browse/DERBY-5068
>             Project: Derby
>          Issue Type: Task
>    Affects Versions:
>            Reporter: Knut Anders Hatlen
>         Attachments: d5068-1a.diff, d5068-2a.diff, d5068-2a.stat
> While looking at the performance graphs for the single-record select test during the
> last year - http://home.online.no/~olmsan/derby/perf/select_1y.html - I noticed that there
> was a significant increase (10-20%) in CPU usage per transaction on the client early in October
> 2010. To be precise, the increase seems to have happened between revision 1004381 and revision
> 1004794. In that period, there were three commits: two related to DERBY-4757, and one related
> to DERBY-4825 (tests only).
> We should try to find out what's causing the increased CPU usage and see if there's some
> way to reduce it.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
