avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1411) org.apache.avro.util.Utf8 performance improvement by remove private Charset in class
Date Thu, 12 Dec 2013 18:08:08 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846520#comment-13846520 ]

Doug Cutting commented on AVRO-1411:

Please contribute a patch with this change.  Also please provide benchmark results.  Ideally
these would use the existing performance suite (lang/java/ipc/src/test/java/org/apache/avro/io/Perf.java).
 Once we can validate the performance improvement then we can probably get the change committed.

> org.apache.avro.util.Utf8 performance improvement by remove private Charset in class
> ------------------------------------------------------------------------------------
>                 Key: AVRO-1411
>                 URL: https://issues.apache.org/jira/browse/AVRO-1411
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.7.5
>            Reporter: Tie Liu
>            Priority: Minor
> The org.apache.avro.util.Utf8 class has a private member field defined as:
>   private static final Charset UTF8 = Charset.forName("UTF-8");
> and it is used as:
>   public static final byte[] getBytesFor(String str) {
>     return str.getBytes(UTF8);
>   }
> I guess the intention of creating this object is to avoid repeated object creation, but when
> we dive into the String.getBytes code, we see that when it is called with a Charset it actually
> creates a new StringEncoder in java.lang.StringCoding:
>     static byte[] encode(Charset cs, char[] ca, int off, int len) {
>       StringEncoder se = new StringEncoder(cs, cs.name());
>       char[] c = Arrays.copyOf(ca, ca.length);
>       return se.encode(c, off, len);
>     }
> If instead we just call it with the string literal "UTF-8", it will just reuse the thread-local
> StringEncoder.
> We tried overwriting this class with a version that passes the string literal and confirmed
> that those short-lived StringEncoder objects are no longer created. We would like Apache to fix
> this so we no longer need to overwrite it.
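
For readers who want to compare the two call styles described in the issue, here is a minimal sketch. It is not the Avro patch and not a substitute for the Perf.java suite requested above. The class name Utf8EncodeComparison, the method names viaCharset and viaName, and the iteration count are hypothetical, chosen only for illustration. The timing loop is a rough indicator at best, and which variant is faster depends on the JDK version in use.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

public class Utf8EncodeComparison {
    // Mirrors the private field quoted from org.apache.avro.util.Utf8.
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Style currently used by Utf8.getBytesFor: pass the Charset object.
    public static byte[] viaCharset(String str) {
        return str.getBytes(UTF8);
    }

    // Style proposed in the issue: pass the charset name as a string.
    public static byte[] viaName(String str) {
        try {
            return str.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "h\u00e9llo, avro";

        // Both variants must yield the same bytes before timing means anything.
        System.out.println("same bytes: "
            + Arrays.equals(viaCharset(sample), viaName(sample)));

        int iterations = 1000000; // arbitrary count, assumption for illustration
        long sink = 0;            // consume results so the loops are not dead code

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += viaCharset(sample).length;
        }
        long charsetNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += viaName(sample).length;
        }
        long nameNanos = System.nanoTime() - start;

        System.out.println("getBytes(Charset): " + charsetNanos / 1000000 + " ms");
        System.out.println("getBytes(String):  " + nameNanos / 1000000 + " ms");
        System.out.println("(sink = " + sink + ")");
    }
}
```

A simple loop like this ignores JIT warm-up and garbage-collection effects, so treat any numbers it prints as a hint rather than evidence. The existing performance suite remains the better basis for a decision.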

This message was sent by Atlassian JIRA
