hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10778) Use NativeCrc32 only if it is faster
Date Tue, 15 Jul 2014 20:22:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062591#comment-14062591 ]

Todd Lipcon commented on HADOOP-10778:
--------------------------------------

I just ran the benchmark locally and couldn't reproduce the results on my
Linux Core i7 box. For bpc = 512:

                 java.version = 1.7.0_55
            java.runtime.name = OpenJDK Runtime Environment
         java.runtime.version = 1.7.0_55-b14
              java.vm.version = 24.51-b03
               java.vm.vendor = Oracle Corporation
                 java.vm.name = OpenJDK 64-Bit Server VM
java.vm.specification.version = 1.7
   java.specification.version = 1.7
                      os.arch = amd64
                      os.name = Linux
                   os.version = 3.11.0-20-generic
Data Length = 64 MB
Trials      = 5

Direct Buffer Performance Table (throughput in MB/sec; bpc = bytes per CRC; #T = number of threads)
|  bpc  | #T ||      Zip || PureJava | % diff ||   Native | % diff | % diff |
|   512 |  1 |     973.4 |    1288.5 |  32.4% |    1660.5 |  70.6% |  28.9% |
|   512 |  2 |     946.1 |    1248.4 |  32.0% |    1619.1 |  71.1% |  29.7% |
|   512 |  4 |     931.2 |    1199.1 |  28.8% |    1576.6 |  69.3% |  31.5% |
|   512 |  8 |     762.1 |     683.9 | -10.3% |    1352.5 |  77.5% |  97.8% |
|   512 | 16 |     396.3 |     368.6 |  -7.0% |     828.3 | 109.0% | 124.7% |
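
For reference, the three "% diff" columns appear to be relative throughput differences: PureJava vs. Zip, Native vs. Zip, and Native vs. PureJava. That reading is inferred from the numbers (it reproduces all five rows), not taken from the benchmark source. A minimal sketch of the arithmetic, with hypothetical helper names:

```python
def pct_diff(new, base):
    """Relative throughput difference of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


def diff_columns(zip_mbps, pure_mbps, native_mbps):
    """Return the three % diff values in table order (assumed meaning):
    PureJava vs Zip, Native vs Zip, Native vs PureJava."""
    return (
        round(pct_diff(pure_mbps, zip_mbps), 1),
        round(pct_diff(native_mbps, zip_mbps), 1),
        round(pct_diff(native_mbps, pure_mbps), 1),
    )


# Rows copied from the table above: (#T, Zip, PureJava, Native) in MB/sec.
ROWS = [
    (1, 973.4, 1288.5, 1660.5),
    (2, 946.1, 1248.4, 1619.1),
    (4, 931.2, 1199.1, 1576.6),
    (8, 762.1, 683.9, 1352.5),
    (16, 396.3, 368.6, 828.3),
]

if __name__ == "__main__":
    for threads, z, p, n in ROWS:
        print(threads, diff_columns(z, p, n))
```
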

Also, I remembered that a long time ago I wrote a pipelined (instruction-level-parallel) implementation
of the bulk_verify_crc32 method: https://github.com/toddlipcon/crc-workbench/blob/master/bulk_crc32.c#L308
which goes about twice as fast (using that benchmark, I get about 3.1 GB/sec on the same machine).
So, if we actually care about the performance of the zlib polynomial, we should probably pull
in that code rather than switch implementations dynamically.
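
To illustrate the idea only: the sketch below assumes "pipelined" means advancing several independent checksum chunks in the same loop, so that the table lookups for one chunk do not have to wait on the dependency chain of another. It is not the linked C code, the function names are hypothetical, and Python shows only the loop structure, not the speedup. It uses the zlib CRC-32 polynomial in its reflected form (0xEDB88320).

```python
import zlib


def make_crc32_table():
    """Build the 256-entry lookup table for the zlib CRC-32 polynomial."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = make_crc32_table()


def crc32_interleaved3(a, b, c):
    """Compute CRC-32 of three equal-length chunks in one interleaved loop.

    Each chunk has its own running CRC; the three updates in the loop body
    are independent of each other, which is what lets a CPU overlap them
    in a compiled implementation.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("chunks must have equal length")
    c1 = c2 = c3 = 0xFFFFFFFF
    for x, y, z in zip(a, b, c):
        c1 = CRC_TABLE[(c1 ^ x) & 0xFF] ^ (c1 >> 8)
        c2 = CRC_TABLE[(c2 ^ y) & 0xFF] ^ (c2 >> 8)
        c3 = CRC_TABLE[(c3 ^ z) & 0xFF] ^ (c3 >> 8)
    return c1 ^ 0xFFFFFFFF, c2 ^ 0xFFFFFFFF, c3 ^ 0xFFFFFFFF


if __name__ == "__main__":
    bpc = 512  # bytes per checksum, as in the benchmark above
    data = bytes(range(256)) * 6  # 1536 bytes = three 512-byte chunks
    chunks = [data[i:i + bpc] for i in range(0, len(data), bpc)]
    # Cross-check against zlib's own CRC-32.
    assert crc32_interleaved3(*chunks) == tuple(zlib.crc32(ch) for ch in chunks)
```
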

> Use NativeCrc32 only if it is faster
> ------------------------------------
>
>                 Key: HADOOP-10778
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10778
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: c10778_20140702.patch
>
>
> From the benchmark posted in this comment
> (https://issues.apache.org/jira/browse/HDFS-6560?focusedCommentId=14044060&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14044060),
> NativeCrc32 is slower than java.util.zip.CRC32 for Java 7 and above when
> bytesPerChecksum > 512.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
