hadoop-common-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-5598) Implement a pure Java CRC32 calculator
Date Sun, 14 Jun 2009 06:34:07 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-5598:
--------------------------------

    Attachment: crc32-results.txt
                TestCrc32Performance.java
                hadoop-5598.txt

This patch implements CRC32 in pure Java, along with a performance test that demonstrates
the improvement. I am also attaching the benchmark output from both Sun 1.6.0_12 and
OpenJDK 1.6.0_0-b12; the results differ considerably between the two.

The summary is that, on Sun's JDK (which most people use), the pure Java implementation is
faster for all chunk sizes smaller than 32 bytes (by a large factor at the small end of the
spectrum) and about 33% slower for chunk sizes larger than that. On OpenJDK, the CRC32 implementation
is 3-4x faster than the Sun JDK.
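
For readers who want to reproduce the chunk-size comparison, the following is a minimal sketch of how such a measurement might be set up. It is not the attached TestCrc32Performance.java; the class name ChunkedChecksumTimer and the method timeChunked are hypothetical, and the buffer size and chunk range are arbitrary assumptions. It times java.util.zip.CRC32 here, but any java.util.zip.Checksum implementation could be passed in instead.

```java
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical timing harness; not the benchmark attached to this issue. */
class ChunkedChecksumTimer {

    /**
     * Resets the checksum, feeds it the whole buffer in pieces of at most
     * chunkSize bytes, and returns the elapsed wall-clock time in nanoseconds.
     */
    static long timeChunked(Checksum sum, byte[] data, int chunkSize) {
        sum.reset();
        long start = System.nanoTime();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sum.update(data, off, len);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];      // 1 MB of pseudo-random input (assumed size)
        new Random(42).nextBytes(data);
        Checksum sum = new CRC32();           // swap in another Checksum to compare
        for (int size = 1; size <= 4096; size *= 2) {
            timeChunked(sum, data, size);     // one warm-up pass so the JIT can compile the loop
            long nanos = timeChunked(sum, data, size);
            System.out.printf("chunk=%5d bytes: %8.2f ms%n", size, nanos / 1e6);
        }
    }
}
```

A single warm-up pass is a crude guard against JIT effects; a serious comparison would repeat each measurement many times and report a distribution.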

Running the concurrency benchmark from HADOOP-5318 also shows huge improvements (the same
as was seen with Ben's buffering patch) by using the pure Java CRC32. This patch contains
the change to FSDataOutputStream to make use of it.

Review from someone who understands Java's bit extension semantics better than me would be
appreciated - I bet more performance can be squeezed out of this by a Java bitwise op master.
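
To make the sign-extension concern concrete, here is a minimal sketch of a table-driven CRC32 in pure Java. It is the textbook byte-at-a-time algorithm, not necessarily what hadoop-5598.txt does; the class name PureJavaCrc32Sketch is hypothetical. The comments mark the places where Java's signed bytes and the choice of shift operator matter.

```java
import java.util.zip.Checksum;

/** Hypothetical sketch of a table-driven CRC32; not the code in the attached patch. */
class PureJavaCrc32Sketch implements Checksum {

    // 256-entry lookup table for the reflected CRC-32 polynomial 0xEDB88320.
    private static final int[] TABLE = new int[256];
    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                // >>> is the unsigned shift; >> would drag the sign bit into the high bits.
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[n] = c;
        }
    }

    // Running value, kept in its inverted (pre-final-XOR) form.
    private int crc = 0xFFFFFFFF;

    @Override
    public void update(int b) {
        crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
    }

    @Override
    public void update(byte[] buf, int off, int len) {
        int c = crc;
        for (int i = off; i < off + len; i++) {
            // buf[i] is sign-extended to int by the XOR; the & 0xFF mask
            // discards the extended bits before the table lookup.
            c = (c >>> 8) ^ TABLE[(c ^ buf[i]) & 0xFF];
        }
        crc = c;
    }

    @Override
    public long getValue() {
        // ~crc is widened to long with sign extension, so mask to 32 bits.
        return (~crc) & 0xFFFFFFFFL;
    }

    @Override
    public void reset() {
        crc = 0xFFFFFFFF;
    }
}
```

Because the class implements java.util.zip.Checksum, it can be dropped in wherever a java.util.zip.CRC32 instance is used through that interface, and its output can be checked against the JDK class on the same input.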

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-5598
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5598
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: crc32-results.txt, hadoop-5598.txt, TestCrc32Performance.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a long time
> in crc calculation. In particular, it was spending 5 seconds in crc calculation out of a total
> of 6 for the write. I suspect that it is the java-jni border that is causing us grief.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

