db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5068) Investigate increased CPU usage on client after introduction of UTF-8 CcsidManager
Date Thu, 12 May 2011 17:37:47 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032532#comment-13032532 ]

Knut Anders Hatlen commented on DERBY-5068:

Thanks for looking at the patch, Dag. I'm still learning the API myself. :)

You're probably right that we should handle those conditions. Since UTF-8 can represent every
Unicode code point, I'm not sure how unmappable-character errors could happen, but malformed-input
errors are raised for unpaired surrogate characters (a lone char in the range \uD800 to \uDFFF).
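A minimal standalone sketch (plain java.nio.charset, not Derby code; the class name SurrogateCheck is hypothetical) showing which inputs trigger the malformed-input result: a fresh UTF-8 CharsetEncoder, which uses the REPORT action by default, accepts a valid surrogate pair but returns a malformed-input CoderResult for a lone high or low surrogate.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class SurrogateCheck {

    // Runs a fresh UTF-8 encoder (default action: REPORT) over s and returns its CoderResult.
    static CoderResult encodeResult(String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        // UTF-8 needs at most 3 bytes per Java char, so the output buffer cannot overflow.
        ByteBuffer out = ByteBuffer.allocate(s.length() * 3);
        return enc.encode(CharBuffer.wrap(s), out, true);
    }

    public static void main(String[] args) {
        String[] samples = {
            "\uD83D\uDE00", // valid surrogate pair (U+1F600): encodes cleanly
            "a\uD800b",     // unpaired high surrogate
            "\uDC00"        // unpaired low surrogate
        };
        for (String s : samples) {
            CoderResult cr = encodeResult(s);
            // length() is only defined for error results, so query it only when malformed.
            System.out.println(cr.isMalformed() ? "malformed, length " + cr.length() : cr.toString());
        }
    }
}
```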

We have two alternatives:

1) Make the CharsetEncoder replace problematic characters with '?' instead of reporting an
error. (By calling onMalformedInput() and onUnmappableCharacter() with CodingErrorAction.REPLACE.)

2) Detect and report the conditions. (By checking the CoderResult and raising an exception.)
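The two alternatives can be sketched against the plain java.nio.charset API as below. This is an illustration only, not the code in the d5068 patches; the class and method names (EncodingOptions, encodeReplacing, encodeStrict) are hypothetical, and the strict variant sizes its output buffer assuming at most 3 UTF-8 bytes per Java char.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingOptions {

    // Option 1: substitute the encoder's replacement bytes (a single '?' for UTF-8)
    // for malformed or unmappable input instead of reporting an error.
    static byte[] encodeReplacing(String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
            byte[] result = new byte[bb.remaining()];
            bb.get(result);
            return result;
        } catch (CharacterCodingException e) {
            // Not expected: both error actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    // Option 2: keep the default REPORT actions, check each CoderResult
    // and raise an exception when the input cannot be encoded.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        // UTF-8 needs at most 3 bytes per Java char, so this buffer is large enough.
        ByteBuffer out = ByteBuffer.allocate(s.length() * 3);
        CoderResult cr = enc.encode(CharBuffer.wrap(s), out, true);
        if (!cr.isUnderflow()) {
            // Throws MalformedInputException, UnmappableCharacterException
            // or BufferOverflowException, depending on the result.
            cr.throwException();
        }
        cr = enc.flush(out);
        if (!cr.isUnderflow()) {
            cr.throwException();
        }
        out.flip();
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) {
        String sample = "a\uD800b"; // contains an unpaired high surrogate
        System.out.println(new String(encodeReplacing(sample), StandardCharsets.UTF_8));
        try {
            encodeStrict(sample);
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

With the sample input, encodeReplacing returns the bytes for "a?b", while encodeStrict throws a MalformedInputException instead of silently changing the data.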

Option 2 sounds like the right thing to do. However, the original code used String.getBytes(String)
to do the encoding, which in effect implements option 1 (the javadoc says the behaviour is
unspecified when the string cannot be encoded, but the actual behaviour matches option 1).
Also, we still have the convertFromJavaString(String,Agent) method, which matches option 1.
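To check the claim about String.getBytes(String), this sketch (hypothetical class GetBytesBehaviour; requires Java 7+ for StandardCharsets) encodes a string containing an unpaired surrogate both with getBytes(String) and with the getBytes(Charset) overload. The javadoc for getBytes(Charset) explicitly says it always replaces malformed and unmappable input with the charset's default replacement bytes, i.e. option 1.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesBehaviour {

    // 'a', an unpaired high surrogate, 'b'
    static final String SAMPLE = "a\uD800b";

    // The call the original code used; the javadoc leaves its behaviour
    // for unencodable input unspecified.
    static byte[] viaCharsetName(String s) throws UnsupportedEncodingException {
        return s.getBytes("UTF-8");
    }

    // The Charset overload is documented to always substitute the
    // charset's default replacement bytes.
    static byte[] viaCharset(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(Arrays.toString(viaCharsetName(SAMPLE)));
        System.out.println(Arrays.toString(viaCharset(SAMPLE)));
    }
}
```

On OpenJDK both lines print [97, 63, 98], i.e. "a?b"; because getBytes(String) leaves this unspecified, other JVMs are not guaranteed to agree.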

On the other hand, all the encoding methods in EbcdicCcsidManager do raise an exception if
the string contains characters outside the EBCDIC range, so there's no clear precedent.
Whatever we choose, we should make all these methods consistent. I think my preference
would be option 2.

> Investigate increased CPU usage on client after introduction of UTF-8 CcsidManager
> ----------------------------------------------------------------------------------
>                 Key: DERBY-5068
>                 URL: https://issues.apache.org/jira/browse/DERBY-5068
>             Project: Derby
>          Issue Type: Task
>    Affects Versions:
>            Reporter: Knut Anders Hatlen
>         Attachments: d5068-1a.diff, d5068-2a.diff, d5068-2a.stat
> While looking at the performance graphs for the single-record select test during the
> last year - http://home.online.no/~olmsan/derby/perf/select_1y.html - I noticed that there
> was a significant increase (10-20%) in CPU usage per transaction on the client early in October
> 2010. To be precise, the increase seems to have happened between revision 1004381 and revision
> 1004794. In that period, there were three commits: two related to DERBY-4757, and one related
> to DERBY-4825 (tests only).
> We should try to find out what's causing the increased CPU usage and see if there's some
> way to reduce it.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
