avro-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-753) Java: Improve BinaryEncoder Performance
Date Mon, 14 Feb 2011 00:48:57 GMT

    [ https://issues.apache.org/jira/browse/AVRO-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994177#comment-12994177 ]

Scott Carey commented on AVRO-753:
----------------------------------

Performance results from the above patch.

I tested with Sun JRE 6u22 (64-bit) on Mac OS X 10.6.6 on a 2.4 GHz Intel Core i5 (2 cores,
4 threads, can 'turbo' up to 2.93 GHz).

I used the following JVM arguments:
-server -Xmx256m -Xms256m -XX:+UseParallelGC  -XX:+UseCompressedOops -XX:+DoEscapeAnalysis
-XX:+UseLoopPredicate

ParallelGC is fast and the most common collector on servers.  CompressedOops is _highly_ recommended
when running 64-bit; it improves performance and reduces memory footprint.
The last two flags are on by default in JRE 6u23 and above, but not in 6u22, and they have a
measurable impact on these tests.  UseLoopPredicate speeds up a couple of cases by 10%.

A 32-bit JVM is somewhat slower.  In particular, writeLong is about 35% slower, and a few
other cases degrade by 15% or so.  Some others (writeDouble, writeFloat) don't change.  The
extra registers and native 64-bit integer registers help some of the inner loops significantly.
I expect non-Intel hardware to behave more like the 64-bit case.

I ran with the '-noread' command-line option of Perf.java.

This is the performance of the legacy encoder:
{noformat}
old legacy encoder:
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
                     IntWrite:   3784 ms      52.849       133.036        629325
               SmallLongWrite:   3715 ms      53.828       135.500        629325
                    LongWrite:   6153 ms      32.502       142.013       1092353
                   FloatWrite:   7289 ms      27.437       109.748       1000000
                  DoubleWrite:  13988 ms      14.298       114.383       2000000
                 BooleanWrite:   2150 ms      93.001        93.001        250000
                   BytesWrite:   2588 ms      15.451       549.113       1776937
                  StringWrite:   9656 ms       4.142       147.535       1780910
                   ArrayWrite:   7315 ms      27.340       109.359       1000006
                     MapWrite:   8727 ms      22.916       114.581       1250004
                  RecordWrite:  10204 ms       3.266       126.771       1617069
        ValidatingRecordWrite:  11584 ms       2.877       111.673       1617069
                 GenericWrite:   7522 ms       2.216        85.986        808498
          GenericNested_Write:   9713 ms       1.716        66.588        808498
      GenericNestedFake_Write:   5893 ms       2.828       109.743        808498
{noformat}

And the new BinaryEncoder:
{noformat}
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
                     IntWrite:   1558 ms     128.342       323.076        629325
               SmallLongWrite:   1495 ms     133.760       336.714        629325
                    LongWrite:   2736 ms      73.083       319.329       1092353
                   FloatWrite:   1286 ms     155.517       622.066       1000000
                  DoubleWrite:   2005 ms      99.742       797.935       2000000
                 BooleanWrite:    597 ms     334.696       334.696        250000
                   BytesWrite:   2491 ms      16.054       570.550       1776937
                  StringWrite:   9050 ms       4.420       157.417       1780910
                   ArrayWrite:   1352 ms     147.852       591.412       1000006
                     MapWrite:   2245 ms      89.054       445.269       1250004
                  RecordWrite:   2418 ms      13.780       534.813       1617069
        ValidatingRecordWrite:   4191 ms       7.952       308.631       1617069
                 GenericWrite:   3477 ms       4.792       185.978        808498
          GenericNested_Write:   5661 ms       2.944       114.249        808498
      GenericNestedFake_Write:   2068 ms       8.057       312.696        808498
{noformat}

Performance ranges from 2x to 7x faster, except for writing byte arrays and strings, which
are only slightly faster.  The test above writes strings and byte arrays that average 35 bytes
in size -- smaller ones will benefit more from the buffering, especially with high overhead
OutputStreams.
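For illustration only (this is not code from the patch, and the class name and buffer size are hypothetical), the buffering approach boils down to zig-zag varint-encoding values into a local byte[] and touching the OutputStream only when the buffer fills, so each value costs a few array stores rather than OutputStream calls:

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of a buffering varint writer.  The zig-zag varint
// format matches the Avro spec; everything else here is simplified.
class VarintBufferSketch {
    private final byte[] buf;
    private int pos;
    private final OutputStream out;

    VarintBufferSketch(OutputStream out, int size) {
        this.out = out;
        this.buf = new byte[size];
    }

    void writeLong(long n) throws IOException {
        if (pos + 10 > buf.length) flushBuffer();  // a varint long is at most 10 bytes
        n = (n << 1) ^ (n >> 63);                  // zig-zag encode
        while ((n & ~0x7FL) != 0) {
            buf[pos++] = (byte) ((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        buf[pos++] = (byte) n;
    }

    void flushBuffer() throws IOException {
        out.write(buf, 0, pos);  // one bulk write instead of many small ones
        pos = 0;
    }
}
```

The bounds check happens once per value rather than once per byte, which is where much of the win in the tight write loops comes from.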

This is the performance of the new non-buffering variation, DirectBinaryEncoder:
{noformat}
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
                     IntWrite:   3446 ms      58.023       146.062        629325
               SmallLongWrite:   3491 ms      57.274       144.176        629325
                    LongWrite:   5931 ms      33.716       147.320       1092353
                   FloatWrite:   4337 ms      46.105       184.419       1000000
                  DoubleWrite:   5525 ms      36.194       289.556       2000000
                 BooleanWrite:   1949 ms     102.603       102.603        250000
                   BytesWrite:   2814 ms      14.212       505.091       1776937
                  StringWrite:   9480 ms       4.219       150.285       1780910
                   ArrayWrite:   4437 ms      45.068       180.273       1000006
                     MapWrite:   5803 ms      34.464       172.321       1250004
                  RecordWrite:   5005 ms       6.659       258.446       1617069
        ValidatingRecordWrite:   6519 ms       5.113       198.419       1617069
                 GenericWrite:   4978 ms       3.348       129.920        808498
          GenericNested_Write:   6966 ms       2.392        92.838        808498
      GenericNestedFake_Write:   3507 ms       4.752       184.430        808498
{noformat}

This ranges from no improvement to about 2.5x faster than the 'legacy' BinaryEncoder, with Float
and Double encoding significantly faster and most other cases only slightly faster.  It is still
substantially slower than the buffering variation.
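The gap between the direct and buffered variants is essentially the cost of many small OutputStream calls versus one bulk write.  A minimal, hypothetical illustration (not Avro code):

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration of the direct-vs-buffered tradeoff: writing an
// int as four separate OutputStream calls vs. staging it locally and doing
// a single bulk write.  Both emit the same little-endian bytes.
class DirectVsBufferedSketch {
    // "Direct" style: one virtual OutputStream call per byte.
    static void writeIntDirect(OutputStream out, int v) throws IOException {
        out.write(v & 0xFF);
        out.write((v >>> 8) & 0xFF);
        out.write((v >>> 16) & 0xFF);
        out.write((v >>> 24) & 0xFF);
    }

    // "Buffered" style: stage the bytes, then one bulk write.
    static void writeIntBuffered(OutputStream out, int v) throws IOException {
        byte[] b = { (byte) v, (byte) (v >>> 8), (byte) (v >>> 16), (byte) (v >>> 24) };
        out.write(b, 0, 4);
    }
}
```

Per-byte calls also defeat some JIT optimizations across the call boundary, which compounds the overhead on hot paths like writeDouble.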

Next up: BlockingBinaryEncoder.  Its performance is essentially the same as the new BinaryEncoder's,
but it defaults to a larger buffer size (64K instead of 2K) and is therefore slightly faster,
except for MapWrite and ArrayWrite, where blocking is in effect.

{noformat}
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
                     IntWrite:   1512 ms     132.260       332.937        629325
               SmallLongWrite:   1459 ms     137.012       344.902        629325
                    LongWrite:   2640 ms      75.739       330.937       1092353
                   FloatWrite:   1265 ms     158.088       632.352       1000000
                  DoubleWrite:   1999 ms     100.004       800.032       2000000
                 BooleanWrite:    638 ms     313.294       313.294        250000
                   BytesWrite:   2458 ms      16.273       578.305       1776937
                  StringWrite:   9259 ms       4.320       153.862       1780910
                   ArrayWrite:   1443 ms     138.580       554.373       1000098
                     MapWrite:   2589 ms      77.233       386.200       1250119
                  RecordWrite:   3001 ms      11.104       430.964       1617069
        ValidatingRecordWrite:   5829 ms       5.718       221.933       1617069
                 GenericWrite:   3545 ms       4.701       182.450        808498
          GenericNested_Write:   5831 ms       2.858       110.906        808498
      GenericNestedFake_Write:   2052 ms       8.119       315.091        808498
{noformat}
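The small ArrayWrite/MapWrite penalty is the extra block framing.  Per the Avro spec (this sketch is illustrative, not code from the patch), a sized block is written as a negative item count, whose absolute value is the count, followed by the block's byte size:

```java
import java.io.ByteArrayOutputStream;

// Sketch of the Avro blocked-array framing per the spec.  The negative
// count signals that a byte size follows, letting readers skip whole
// blocks without decoding their items.
class BlockHeaderSketch {
    static void writeZigZagLong(ByteArrayOutputStream out, long n) {
        n = (n << 1) ^ (n >> 63);  // zig-zag encode
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
    }

    static void writeBlockHeader(ByteArrayOutputStream out, long itemCount, long byteSize) {
        writeZigZagLong(out, -itemCount);  // negative count marks a sized block
        writeZigZagLong(out, byteSize);
    }
}
```

That framing is also why bytes/cycle for ArrayWrite and MapWrite is slightly higher in the blocking results above (1000098 vs 1000006, 1250119 vs 1250004).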

And for those curious, this is what JSON looks like:

{noformat}
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
                     IntWrite:  10238 ms      19.534       115.334       1476104
               SmallLongWrite:  10383 ms      19.261       113.722       1476104
                    LongWrite:  18078 ms      11.063       109.950       2484706
                   FloatWrite:  50300 ms       3.976        42.252       2656635
                  DoubleWrite:  96585 ms       2.071        39.894       4816469
                 BooleanWrite:   8940 ms      22.369       123.022       1374900
                   BytesWrite:  40859 ms       0.979        72.197       3687468
                  StringWrite:   9021 ms       4.434       166.411       1876635
                   ArrayWrite:  59728 ms       3.349        54.000       4031647
                     MapWrite:  63564 ms       3.146        55.460       4406637
                  RecordWrite:  63687 ms       0.523        64.246       5114596
        ValidatingRecordWrite:  65488 ms       0.509        62.480       5114596
                 GenericWrite:  34985 ms       0.476        58.478       2557400
          GenericNested_Write:  42137 ms       0.396        58.047       3057392
      GenericNestedFake_Write:  37551 ms       0.444        65.134       3057392
{noformat}

Note that all of these results (including the legacy result) include improved String <-> Utf8
conversion in Utf8.java.  This brings String encoding up from ~120MB/sec to ~160MB/sec.  I had
noticed that Jackson was faster than our binary encoder for the string test case; now it is a
tie.  There is more to do there, but it is dominated by JVM code that isn't as optimal as it
should be.
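As a hypothetical sketch of the general idea (not the actual Utf8.java change, and using the modern StandardCharsets API for brevity), a wrapper can convert once, cache both forms, and let repeated writes of the same value skip re-encoding:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical Utf8-style wrapper: lazily converts between String and
// UTF-8 bytes and caches both representations, so repeated serialization
// of the same value pays the conversion cost only once.
class CachedUtf8Sketch {
    private String string;
    private byte[] bytes;

    CachedUtf8Sketch(String s) { this.string = s; }

    byte[] getBytes() {
        if (bytes == null) {
            // Single JDK-optimized bulk conversion, no per-char loop.
            bytes = string.getBytes(StandardCharsets.UTF_8);
        }
        return bytes;
    }

    @Override
    public String toString() {
        if (string == null) string = new String(bytes, StandardCharsets.UTF_8);
        return string;
    }
}
```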


> Java:  Improve BinaryEncoder Performance
> ----------------------------------------
>
>                 Key: AVRO-753
>                 URL: https://issues.apache.org/jira/browse/AVRO-753
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.5.0
>
>         Attachments: AVRO-753.v1.patch, AVRO-753.v2.patch
>
>
> BinaryEncoder has not had a performance improvement pass like BinaryDecoder did.  It
> still mostly writes directly to the underlying OutputStream, which is not optimal for performance.
> I like to use a rule that if you are writing to an OutputStream or reading from an InputStream
> in chunks smaller than 128 bytes, you have a performance problem.
> Measurements indicate that optimizing BinaryEncoder yields a 2.5x to 6x performance improvement.
> The process is significantly simpler than for BinaryDecoder because 'pushing' is easier than
> 'pulling' -- and also because we do not need a 'direct' variant, since BinaryEncoder already
> buffers sometimes.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
