hadoop-common-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-5598) Implement a pure Java CRC32 calculator
Date Tue, 16 Jun 2009 01:02:07 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-5598:
--------------------------------

    Attachment: hadoop-5598-evil.txt

Here's another approach, which is exceedingly evil, but also "best of both worlds" in terms
of speed. It switches between the pure-Java CRC32 and the built-in CRC32, but avoids the slow
crc_combine function. It does so by cheating: it uses reflection to set the internal (private)
crc member of java.util.zip.CRC32.

This is probably not a good idea to actually use, but it does show an "upper bound" in terms
of performance. In benchmarks, this "fast" version is always as fast as the faster of the
pure and built-in CRC32 routines. In case the reflection fails, it falls back to always using
the pure-Java implementation.
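
To make the idea concrete, here is a minimal sketch of the switching trick. It is not the code in hadoop-5598-evil.txt. The class name HybridCrc32, the 64-byte cutoff, and the byte-at-a-time table loop are all assumptions made for illustration, and the private field name "crc" is an OpenJDK implementation detail that other JVMs (or newer JDKs with strong encapsulation) may not expose, in which case the sketch stays on the pure-Java path.

```java
import java.lang.reflect.Field;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Illustrative sketch only; not the attached patch. */
final class HybridCrc32 implements Checksum {
    // Hypothetical cutoff: updates shorter than this stay in pure Java.
    private static final int SMALL_UPDATE_THRESHOLD = 64;

    // Standard reflected CRC-32 lookup table (polynomial 0xEDB88320).
    private static final int[] TABLE = buildTable();

    // Private "crc" field of java.util.zip.CRC32 (assumed OpenJDK detail),
    // or null when reflection is not permitted.
    private static final Field CRC_FIELD = lookupCrcField();

    private final CRC32 builtin = new CRC32();
    private int crc; // running checksum in the same form getValue() reports

    private static int[] buildTable() {
        int[] table = new int[256];
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = ((c & 1) != 0) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
            }
            table[n] = c;
        }
        return table;
    }

    private static Field lookupCrcField() {
        try {
            Field f = CRC32.class.getDeclaredField("crc");
            f.setAccessible(true); // may throw on JDKs with strong encapsulation
            return f;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return null; // fall back to always using the pure-Java path
        }
    }

    /** True if the built-in CRC32 can be used for large updates. */
    static boolean reflectionAvailable() {
        return CRC_FIELD != null;
    }

    private void updatePure(byte[] b, int off, int len) {
        int c = ~crc;
        for (int i = off; i < off + len; i++) {
            c = TABLE[(c ^ b[i]) & 0xff] ^ (c >>> 8);
        }
        crc = ~c;
    }

    @Override
    public void update(int b) {
        int c = ~crc;
        c = TABLE[(c ^ b) & 0xff] ^ (c >>> 8);
        crc = ~c;
    }

    @Override
    public void update(byte[] b, int off, int len) {
        if (CRC_FIELD == null || len < SMALL_UPDATE_THRESHOLD) {
            updatePure(b, off, len);
            return;
        }
        try {
            // Push our running value into the built-in object, let it
            // process the large chunk, then read the result back.
            CRC_FIELD.setInt(builtin, crc);
            builtin.update(b, off, len);
            crc = (int) builtin.getValue();
        } catch (IllegalAccessException e) {
            updatePure(b, off, len); // state is untouched if setInt failed
        }
    }

    @Override
    public long getValue() {
        return crc & 0xffffffffL;
    }

    @Override
    public void reset() {
        crc = 0;
    }
}
```

A caller would use it like any other java.util.zip.Checksum: create one instance per stream, call update() with whatever chunk sizes arrive, and read getValue() at the end. Because the running value is handed over directly, no crc_combine step is needed when switching implementations.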

Owen and Arun: could you guys comment on the real-life workload where you saw this as an issue?
I would imagine most MR workloads don't have this problem since they're spilling out of an
in-memory buffer and therefore can write large chunks.

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-5598
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5598
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>            Assignee: Todd Lipcon
>         Attachments: crc32-results.txt, hadoop-5598-evil.txt, hadoop-5598-hybrid.txt, hadoop-5598.txt, TestCrc32Performance.java, TestCrc32Performance.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a long time
> in CRC calculation. In particular, it was spending 5 seconds in CRC calculation out of a total
> of 6 for the write. I suspect that it is the Java-JNI border that is causing us grief.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

