hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5598) Implement a pure Java CRC32 calculator
Date Fri, 19 Jun 2009 14:52:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721803#action_12721803 ]

Owen O'Malley commented on HADOOP-5598:
---------------------------------------

Our problem with JNI mostly shows up when the input is a large byte[]. How bad it gets
depends heavily on how fragmented the heap is, so it is not easy to benchmark. We ran into
it while doing the terabyte sort. The problem with JNI is that, to give the C code access
to a byte[], the runtime may need to copy the array in and out. If the array is 100 MB,
that copy takes a lot of time.
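
For illustration only, here is a minimal sketch of what a table-driven CRC32 in pure Java
can look like. It is not the PureJavaCrc32 attached to this issue; the class name
SimpleCrc32 is hypothetical, and it uses the plain one-byte-at-a-time algorithm. The point
is that the loop reads the byte[] directly on the Java heap, so no JNI call and no array
copy are involved. It is checked against java.util.zip.CRC32, which at the time delegated
to native zlib code.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical sketch of a pure Java CRC32; not the attachment from HADOOP-5598. */
final class SimpleCrc32 implements Checksum {
    // Lookup table for the reflected CRC-32 polynomial 0xEDB88320.
    private static final int[] TABLE = new int[256];
    static {
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[i] = c;
        }
    }

    // Running CRC register, kept in its inverted form between calls.
    private int crc = 0xFFFFFFFF;

    @Override
    public void update(int b) {
        crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
    }

    @Override
    public void update(byte[] buf, int off, int len) {
        int c = crc;
        // Reads the array in place; nothing is copied out of the Java heap.
        for (int i = off; i < off + len; i++) {
            c = (c >>> 8) ^ TABLE[(c ^ buf[i]) & 0xFF];
        }
        crc = c;
    }

    @Override
    public long getValue() {
        // Apply the final inversion and return the value as an unsigned 32-bit number.
        return (~crc) & 0xFFFFFFFFL;
    }

    @Override
    public void reset() {
        crc = 0xFFFFFFFF;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);

        SimpleCrc32 pure = new SimpleCrc32();
        pure.update(data, 0, data.length);

        CRC32 builtin = new CRC32();
        builtin.update(data, 0, data.length);

        // Both lines should show the same checksum.
        System.out.printf("pure java: %08x%n", pure.getValue());
        System.out.printf("built-in:  %08x%n", builtin.getValue());
    }
}
```

Because the sketch implements java.util.zip.Checksum, it could in principle be swapped in
wherever a Checksum is accepted. Whether it is actually faster than the native version
depends on the buffer size and the JVM, which is what the attached benchmarks try to measure.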

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-5598
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5598
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>            Assignee: Todd Lipcon
>         Attachments: crc32-results.txt, hadoop-5598-evil.txt, hadoop-5598-hybrid.txt,
> hadoop-5598.txt, hadoop-5598.txt, PureJavaCrc32.java, PureJavaCrc32.java, PureJavaCrc32.java,
> TestCrc32Performance.java, TestCrc32Performance.java, TestCrc32Performance.java, TestPureJavaCrc32.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a long time
> in crc calculation. In particular, it was spending 5 seconds in crc calculation out of a total
> of 6 for the write. I suspect that it is the java-jni border that is causing us grief.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
