hadoop-common-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5598) Implement a pure Java CRC32 calculator
Date Sun, 14 Jun 2009 08:08:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719234#action_12719234 ]

Todd Lipcon commented on HADOOP-5598:
-------------------------------------

The other solution that would solve this whole issue is to simply not checksum the data in
FSOutputSummer until an entire chunk has been buffered up. This would bring the "write size"
up to io.bytes.per.checksum (512 bytes by default), a size at which java.util.zip.CRC32
outperforms the pure Java implementation. Doug expressed some concern over that idea in
HADOOP-5318, I guess because data can sit in memory for an arbitrarily long time before
being checksummed, but I would imagine most people are using ECC RAM anyway :)
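
As a rough sketch of that idea (hypothetical code, not the actual FSOutputSummer): an output
stream that accumulates io.bytes.per.checksum bytes and only then calls CRC32.update(), so
the JNI crossing is amortized over a full 512-byte chunk.

import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.CRC32;

// Hypothetical sketch, not the real FSOutputSummer: buffer a full chunk
// (io.bytes.per.checksum, 512 by default) before touching the CRC, so
// java.util.zip.CRC32 is always fed large updates.
public class ChunkedCrcOutputStream extends OutputStream {
  private final OutputStream out;          // underlying data sink
  private final CRC32 crc = new CRC32();
  private final byte[] buf;
  private int count = 0;

  public ChunkedCrcOutputStream(OutputStream out, int bytesPerChecksum) {
    this.out = out;
    this.buf = new byte[bytesPerChecksum]; // e.g. 512
  }

  @Override
  public void write(int b) throws IOException {
    buf[count++] = (byte) b;
    if (count == buf.length) {
      flushChunk();
    }
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    while (len > 0) {
      int n = Math.min(len, buf.length - count);
      System.arraycopy(b, off, buf, count, n);
      count += n;
      off += n;
      len -= n;
      if (count == buf.length) {
        flushChunk();
      }
    }
  }

  // Checksum and forward one full (or final partial) chunk in a single call.
  private void flushChunk() throws IOException {
    crc.reset();
    crc.update(buf, 0, count);
    long checksum = crc.getValue(); // the real code would hand this to writeChunk()
    out.write(buf, 0, count);
    count = 0;
  }

  @Override
  public void close() throws IOException {
    if (count > 0) {
      flushChunk();                  // checksum the trailing partial chunk
    }
    out.close();
  }
}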

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-5598
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5598
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>            Assignee: Todd Lipcon
>         Attachments: crc32-results.txt, hadoop-5598-hybrid.txt, hadoop-5598.txt, TestCrc32Performance.java, TestCrc32Performance.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a long time in
> CRC calculation. In particular, it was spending 5 seconds in CRC calculation out of a total
> of 6 seconds for the write. I suspect that it is the Java/JNI boundary that is causing us grief.
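
To make the suspicion about the Java/JNI boundary concrete, here is a hypothetical
micro-benchmark (not the attached TestCrc32Performance.java) that checksums the same buffer
with java.util.zip.CRC32 using tiny update() calls versus one call per 512-byte chunk:

import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical micro-benchmark, not the attached TestCrc32Performance.java:
// checksum ~200 MB with java.util.zip.CRC32 using 4-byte update() calls vs.
// one update() per 512-byte chunk, to expose the per-call JNI overhead.
public class CrcWriteSizeBench {
  public static void main(String[] args) {
    byte[] data = new byte[200 * 1024 * 1024];   // roughly the 200MB reducer output
    new Random(0).nextBytes(data);

    System.out.println("4-byte updates:   " + time(data, 4) + " ms");
    System.out.println("512-byte updates: " + time(data, 512) + " ms");
  }

  private static long time(byte[] data, int writeSize) {
    CRC32 crc = new CRC32();
    long start = System.currentTimeMillis();
    for (int off = 0; off < data.length; off += writeSize) {
      crc.update(data, off, Math.min(writeSize, data.length - off));
    }
    return System.currentTimeMillis() - start;
  }
}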

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

