avro-dev mailing list archives

From "Tie Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1348) Improve Utf8 to String conversion
Date Mon, 16 Dec 2013 21:38:08 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849746#comment-13849746 ]

Tie Liu commented on AVRO-1348:
-------------------------------

I just ran the Perf test in our environment, a 64-bit Linux box. Our production environment
uses a commercial JVM called Azul, so I ran the test on both Java 1.6.0_25, which is our dev
version, and the Azul JVM. The comparison is below.
With Java 1.6.0_25:
$ java -version
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11, mixed mode)

Using Charset:
test name         time   M entries/sec   M bytes/sec   bytes/cycle
StringRead:    6097 ms           6.560       233.663       1780910
StringWrite:   7410 ms           5.398       192.269       1780910

Using "UTF-8" string literal:
test name         time   M entries/sec   M bytes/sec   bytes/cycle
StringRead:    5504 ms           7.267       258.839       1780910
StringWrite:   7307 ms           5.474       194.980       1780910

Running with Azul:
$ /efs/dist/java/azuljdk/5.5.3.0/common/bin/java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b5)
Java HotSpot(TM) 64-Bit Tiered VM (build 1.6.0_33-ZVM_5.5.3.0-b5-product-azlinuxM-X86_64, mixed mode)

With Charset:
test name         time   M entries/sec   M bytes/sec   bytes/cycle
StringRead:    8878 ms           4.505       160.469       1780910
StringWrite:  13078 ms           3.058       108.936       1780910

With "UTF-8" string literal:
test name         time   M entries/sec   M bytes/sec   bytes/cycle
StringRead:    6976 ms           5.733       204.213       1780910
StringWrite:  12829 ms           3.118       111.053       1780910
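Expressed as relative change, the tables show most of the gain in StringRead. The small Java sketch below only recomputes that from the posted timings; the class name and row labels are shorthand added here, and no data beyond the numbers in the tables is assumed.

```java
// Recomputes the share of run time saved by the "UTF-8" literal versus Charset,
// using only the wall-clock timings posted in the tables above.
public final class LiteralVsCharset {

    // Shorthand labels (added for illustration) for the four benchmark rows.
    static final String[] LABELS = {
        "HotSpot 1.6.0_25 StringRead",
        "HotSpot 1.6.0_25 StringWrite",
        "Azul 5.5.3.0 StringRead",
        "Azul 5.5.3.0 StringWrite"
    };

    // Timings in ms, copied from the tables: {Charset, "UTF-8" literal}.
    static final long[][] TIMES_MS = {
        {6097, 5504},
        {7410, 7307},
        {8878, 6976},
        {13078, 12829}
    };

    /** Percentage of the Charset run time saved by switching to the literal. */
    static double percentTimeSaved(long charsetMs, long literalMs) {
        return 100.0 * (charsetMs - literalMs) / charsetMs;
    }

    public static void main(String[] args) {
        for (int i = 0; i < LABELS.length; i++) {
            System.out.printf("%-30s %5.1f%% less time%n",
                    LABELS[i], percentTimeSaved(TIMES_MS[i][0], TIMES_MS[i][1]));
        }
    }
}
```

By this arithmetic the literal saves about 9.7% (HotSpot) and 21.4% (Azul) of StringRead time, but only about 1.4% and 1.9% of StringWrite time.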

Our application is a trading application that handles 30k-40k messages/sec at peak, so we
are very careful about garbage collection. We call Utf8.toString multiple times on each
incoming and outgoing message, so getting rid of the extra garbage created by toString is
very important for us. That is the biggest reason we want to use the string literal instead
of Charset here.
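For comparison, here is a minimal sketch of the two String constructors being benchmarked. It is not Avro's Utf8.toString() code: the class and method names are hypothetical, and a plain byte array plus a length stands in for a Utf8's contents. A likely reason the name-based call allocates less on Java 6 (an assumption here, not something this test measured) is that the JDK 6 string-decoding code caches a decoder per thread for a charset name, while the Charset overload builds a fresh decoder on each call.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical helper contrasting the two decode paths; the byte[]/length pair
// stands in for a Utf8's backing array and its byte length.
public final class Utf8Decode {

    private static final Charset UTF8 = Charset.forName("UTF-8");

    /** Decode via a Charset object (the "Charset" rows in the tables). */
    static String viaCharset(byte[] bytes, int length) {
        return new String(bytes, 0, length, UTF8);
    }

    /** Decode via the charset name (the "UTF-8" string literal rows). */
    static String viaLiteral(byte[] bytes, int length) {
        try {
            return new String(bytes, 0, length, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        byte[] buf = "h\u00e9llo w\u00f6rld".getBytes(UTF8);
        // Both paths should produce the same String.
        System.out.println(viaCharset(buf, buf.length).equals(viaLiteral(buf, buf.length)));
    }
}
```

Both variants return equal Strings; the benchmark difference is in speed and allocation, not in the decoded result.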


> Improve Utf8 to String conversion
> ---------------------------------
>
>                 Key: AVRO-1348
>                 URL: https://issues.apache.org/jira/browse/AVRO-1348
>             Project: Avro
>          Issue Type: Bug
>            Reporter: Mark Wagner
>            Assignee: Mohammad Kamrul Islam
>         Attachments: AVRO-1348v2.patch, AVRO1348v1.patch
>
>
> AVRO-1241 found that the existing method of creating Strings from Utf8 byte arrays could
> be made faster. The same method is being used in the Utf8.toString(), and could likely be
> sped up by doing the same thing.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
