hbase-issues mailing list archives

From "Apekshit Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11927) Use Native Hadoop Library for HFile checksum
Date Tue, 12 May 2015 22:29:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540905#comment-14540905
] 

Apekshit Sharma commented on HBASE-11927:
-----------------------------------------

There were a couple of options: NHL (Native Hadoop Library) and [Circe|https://github.com/trevorr/circe].
We decided to go with NHL, even though it introduces a dependency on Hadoop, because
HFile checksumming requires an interface that takes two streams, data and checksums, and verifies or
calculates checksums for fixed-size chunks of data. NHL already supports this, while Circe doesn't. (More
differences are covered in this [doc|https://docs.google.com/document/d/1NCB3h8YU86mGFjK_uWA7KMDmu288nrCZvwRTr30zX-s/edit].)
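
To make the "two streams" requirement concrete, here is a minimal sketch in Python. It is not the NHL or HBase code: the function names, the 16 KB chunk size, and the 4-byte big-endian checksum layout are assumptions chosen for illustration, and it uses the standard library's CRC32 rather than a native implementation.

```python
import struct
import zlib

BYTES_PER_CHECKSUM = 16 * 1024  # assumed chunk size, for illustration only


def compute_checksums(data: bytes, chunk_size: int = BYTES_PER_CHECKSUM) -> bytes:
    """Return one big-endian 4-byte CRC32 per chunk_size-byte chunk of data."""
    out = bytearray()
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        out += struct.pack(">I", zlib.crc32(chunk) & 0xFFFFFFFF)
    return bytes(out)


def verify_checksums(data: bytes, checksums: bytes,
                     chunk_size: int = BYTES_PER_CHECKSUM) -> int:
    """Return the index of the first corrupt chunk, or -1 if every chunk matches."""
    expected = compute_checksums(data, chunk_size)
    if len(expected) != len(checksums):
        raise ValueError("checksum stream length does not match data length")
    for i in range(0, len(expected), 4):
        if expected[i:i + 4] != checksums[i:i + 4]:
            return i // 4
    return -1
```

The data stream and the checksum stream stay separate, so a reader can verify any chunk on its own and report which one is corrupt instead of rejecting the whole block.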

We switched the default from CRC32 to CRC32C because:
- CRC32C has better error detection properties
- CRC32C can take advantage of a dedicated instruction on newer Intel processors
(I couldn't profile this case because the machines I used for testing weren't new enough, i.e. they
didn't support [sse4.2|http://en.wikipedia.org/wiki/SSE4#SSE4.2] instructions)
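
For reference, CRC32 and CRC32C run the same algorithm and differ only in the generator polynomial. The sketch below is a plain table-driven CRC32C in Python, written for illustration: it is not hardware accelerated and is not the code in the patch, and the names `crc32c` and `_make_table` are hypothetical.

```python
import zlib

# Castagnoli polynomial 0x1EDC6F41 in reflected (LSB-first) form.
CASTAGNOLI_POLY = 0x82F63B78


def _make_table(poly: int) -> list:
    """Build the 256-entry lookup table for a reflected 32-bit CRC."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_table(CASTAGNOLI_POLY)


def crc32c(data: bytes, crc: int = 0) -> int:
    """Compute CRC32C; pass a previous result as crc to continue a running checksum."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    msg = b"123456789"  # the conventional CRC check string
    print(hex(crc32c(msg)))      # 0xe3069283
    print(hex(zlib.crc32(msg)))  # 0xcbf43926
```

The two checksums of the same input differ, so the checksum type has to be recorded alongside the data for a reader to know which one to verify against.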

Profiling was done using lightweight-java-profiler.




> Use Native Hadoop Library for HFile checksum
> --------------------------------------------
>
>                 Key: HBASE-11927
>                 URL: https://issues.apache.org/jira/browse/HBASE-11927
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Apekshit Sharma
>         Attachments: HBASE-11927-v1.patch, HBASE-11927.patch, c2021.crc2.svg, c2021.write.2.svg,
> c2021.zip.svg, compact-with-native.svg, compact-without-native.svg, crc32ct.svg
>
>
> Up in hadoop they have this change. Let me publish some graphs to show that it makes
> a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because
> of compacting, flushing, etc.).  We should also make use of native CRCings -- especially the
> 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
