hadoop-common-issues mailing list archives

From "Eric Caspole (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7333) Performance improvement in PureJavaCrc32
Date Thu, 26 May 2011 20:05:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039890#comment-13039890 ]

Eric Caspole commented on HADOOP-7333:
--------------------------------------

Here is an example of what I got:

baseline:

java.version = 1.6.0_25
java.runtime.name = Java(TM) SE Runtime Environment
java.runtime.version = 1.6.0_25-b06
java.vm.version = 20.0-b11
java.vm.vendor = Sun Microsystems Inc.
java.vm.name = Java HotSpot(TM) 64-Bit Server VM
java.vm.specification.version = 1.0
java.specification.version = 1.6
os.arch = amd64
os.name = Linux
os.version = 2.6.35-22-generic

Performance Table (The unit is MB/sec)
|| Num Bytes ||    CRC32 || PureJavaCrc32 ||
|          1 |    11.312 |         83.741 |
|          2 |    21.210 |        160.665 |
|          4 |    42.377 |        195.647 |
|          8 |    75.407 |        264.901 |
|         16 |   124.451 |        291.424 |
|         32 |   183.565 |        376.901 |
|         64 |   245.988 |        424.204 |
|        128 |   281.473 |        436.240 |
|        256 |   317.394 |        458.765 |
|        512 |   335.792 |        479.419 |
|       1024 |   347.709 |        473.562 |
|       2048 |   351.318 |        476.762 |
|       4096 |   351.745 |        485.425 |
|       8192 |   357.670 |        479.614 |
|      16384 |   355.637 |        481.792 |
|      32768 |   357.449 |        483.500 |
|      65536 |   358.486 |        472.664 |
|     131072 |   355.985 |        483.490 |
|     262144 |   361.117 |        483.599 |
|     524288 |   357.867 |        468.159 |
|    1048576 |   355.371 |        476.476 |
|    2097152 |   355.238 |        476.458 |
|    4194304 |   354.549 |        471.960 |
|    8388608 |   349.739 |        465.464 |
|   16777216 |   347.075 |        461.695 |


with patch:

java.version = 1.6.0_25
java.runtime.name = Java(TM) SE Runtime Environment
java.runtime.version = 1.6.0_25-b06
java.vm.version = 20.0-b11
java.vm.vendor = Sun Microsystems Inc.
java.vm.name = Java HotSpot(TM) 64-Bit Server VM
java.vm.specification.version = 1.0
java.specification.version = 1.6
os.arch = amd64
os.name = Linux
os.version = 2.6.35-22-generic

Performance Table (The unit is MB/sec)
|| Num Bytes ||    CRC32 || PureJavaCrc32 ||
|          1 |    11.388 |         70.238 |
|          2 |    21.377 |        140.269 |
|          4 |    42.950 |        195.840 |
|          8 |    76.818 |        316.527 |
|         16 |   126.218 |        336.181 |
|         32 |   187.139 |        407.078 |
|         64 |   246.038 |        450.022 |
|        128 |   283.940 |        463.666 |
|        256 |   315.997 |        484.709 |
|        512 |   337.407 |        492.866 |
|       1024 |   346.651 |        497.357 |
|       2048 |   349.376 |        507.337 |
|       4096 |   356.193 |        497.322 |
|       8192 |   356.303 |        506.458 |
|      16384 |   353.121 |        503.858 |
|      32768 |   351.595 |        503.809 |
|      65536 |   358.412 |        500.009 |
|     131072 |   356.675 |        503.672 |
|     262144 |   356.723 |        501.896 |
|     524288 |   357.432 |        497.297 |
|    1048576 |   349.544 |        500.216 |
|    2097152 |   350.197 |        500.098 |
|    4194304 |   350.040 |        497.357 |
|    8388608 |   348.890 |        477.051 |
|   16777216 |   344.792 |        484.409 |
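
To compare a row across the two tables, the relative change is (patched - baseline) / baseline * 100. The following is a minimal sketch of that arithmetic for a few PureJavaCrc32 rows copied from the tables above; the class and method names are hypothetical and are not part of Hadoop or of the patch.

```java
import java.util.Locale;

/** Hypothetical helper for comparing rows of the two tables; not part of Hadoop. */
final class CrcSpeedup {

    /** Relative change in percent; a positive value means the patched run was faster. */
    static double percentChange(double baselineMbPerSec, double patchedMbPerSec) {
        return (patchedMbPerSec - baselineMbPerSec) / baselineMbPerSec * 100.0;
    }

    public static void main(String[] args) {
        // {numBytes, baseline PureJavaCrc32 MB/sec, patched PureJavaCrc32 MB/sec},
        // copied from the two tables above.
        double[][] rows = {
            {1, 83.741, 70.238},
            {8, 264.901, 316.527},
            {2048, 476.762, 507.337},
            {16777216, 461.695, 484.409},
        };
        for (double[] r : rows) {
            System.out.printf(Locale.ROOT, "%10d bytes: %+.1f%%%n",
                    (long) r[0], percentChange(r[1], r[2]));
        }
    }
}
```

For these four rows the change works out to roughly -16%, +19%, +6% and +5% respectively. Each table is a single run, so differences at the smallest sizes may partly reflect run-to-run noise.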


> Performance improvement in PureJavaCrc32
> ----------------------------------------
>
>                 Key: HADOOP-7333
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7333
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.21.0
>         Environment: Linux x64
>            Reporter: Eric Caspole
>            Assignee: Eric Caspole
>            Priority: Minor
>         Attachments: HADOOP-7333.patch
>
>
> I would like to propose a small patch to
>   org.apache.hadoop.util.PureJavaCrc32.update(byte[] b, int off, int len)
> Currently the method stores each intermediate result back into the data member "crc".
> I noticed that this method gets inlined into DataChecksum.update(), and that method appears
> as one of the hotter methods in a simple hprof profile collected while running terasort
> and gridmix.
> If the code is modified to keep the temporary result in a local variable and store only
> the final result back into the data member once, HotSpot generates slightly more
> efficient code.
> I tested this change using "org.apache.hadoop.util.TestPureJavaCrc32$PerformanceTest",
> which is embedded in the existing unit test for this class, TestPureJavaCrc32, on a
> variety of Linux x64 AMD and Intel multi-socket and multi-core systems that I have
> available.
> The patch removes several stores of the intermediate result to memory, yielding a 0%-10%
> speedup in that test.
>  
> If you use a debug HotSpot JVM with -XX:+PrintOptoAssembly, you can see the intermediate
> stores, such as:
> 414     movq    R9, [rsp + #24] # spill
> 419     movl    [R9 + #12 (8-bit)], RDX # int ! Field PureJavaCrc32.crc
> 41d     xorl    R10, RDX        # int
> The patch results in just one final store of the fully computed value.
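
The source-level shape of the change described above can be sketched as follows. This is not the actual PureJavaCrc32 code or the attached patch: it assumes a simple byte-at-a-time table lookup, and the class and method names are hypothetical. It only contrasts writing the field on every step with accumulating in a local and storing once, and makes no claim about the code any particular JVM generates for it.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch; not the actual PureJavaCrc32 source. */
final class LocalAccumulatorCrc32 {

    // Standard reflected CRC-32 lookup table (polynomial 0xEDB88320).
    private static final int[] TABLE = new int[256];
    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[n] = c;
        }
    }

    // Running CRC, kept in complemented form (initial value 0xFFFFFFFF).
    private int crc = 0xFFFFFFFF;

    /** "Before" shape: the field is read and written on every iteration. */
    void updateStoringEachStep(byte[] b, int off, int len) {
        for (int i = off; i < off + len; i++) {
            crc = (crc >>> 8) ^ TABLE[(crc ^ b[i]) & 0xFF];
        }
    }

    /** "After" shape: accumulate in a local, store to the field once at the end. */
    void updateWithLocal(byte[] b, int off, int len) {
        int localCrc = crc;
        for (int i = off; i < off + len; i++) {
            localCrc = (localCrc >>> 8) ^ TABLE[(localCrc ^ b[i]) & 0xFF];
        }
        crc = localCrc;
    }

    /** Final CRC-32 as an unsigned 32-bit value held in a long. */
    long getValue() {
        return (~crc) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);

        LocalAccumulatorCrc32 perStep = new LocalAccumulatorCrc32();
        perStep.updateStoringEachStep(data, 0, data.length);

        LocalAccumulatorCrc32 local = new LocalAccumulatorCrc32();
        local.updateWithLocal(data, 0, data.length);

        // java.util.zip.CRC32 serves as the reference implementation.
        CRC32 reference = new CRC32();
        reference.update(data, 0, data.length);

        System.out.println(perStep.getValue() == reference.getValue());
        System.out.println(local.getValue() == reference.getValue());
    }
}
```

Both variants compute the same checksum; the second one only differs in where the running value lives during the loop.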

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
