hbase-issues mailing list archives

From "Apekshit Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
Date Fri, 15 May 2015 17:01:00 GMT

     [ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apekshit Sharma updated HBASE-11927:
------------------------------------
    Release Note: 
Checksumming is CPU-intensive. HBase computes additional checksums for HFiles (HDFS does
checksumming too) and stores them inline with the file data. During reads, these checksums are
verified to ensure the data is not corrupted. This patch uses the native Hadoop library for
checksum computation when it is available, and falls back to the standard Java libraries
otherwise. Instructions for loading the native Hadoop library in HBase can be found here
(http://hbase.apache.org/book.html#hadoop.native.lib).

The default checksum algorithm has been changed from CRC32 to CRC32C for two main reasons:
1) CRC32C has better error-detection properties, and 2) newer Intel processors have a dedicated
instruction for CRC32C computation (SSE4.2 instruction set)*. This change is fully backward
compatible, and users should see no difference other than a decrease in CPU usage. To keep the
old setting, set the configuration property 'hbase.hstore.checksum.algorithm' to 'CRC32'.
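For reference, pinning the old algorithm would look roughly like this in hbase-site.xml (a sketch based on the property name given above):

<!-- hbase-site.xml: keep the pre-patch default checksum algorithm -->
<property>
  <name>hbase.hstore.checksum.algorithm</name>
  <value>CRC32</value>
</property>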

* On Linux, run 'cat /proc/cpuinfo' and look for sse4_2 in the list of flags to see whether
your processor supports SSE4.2.
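The difference between the two algorithms can be illustrated with the JDK's own java.util.zip classes, which provide pure-Java implementations of both (CRC32C since Java 9). This is only a sketch showing that the two polynomials produce different checksums for the same bytes; it is not HBase's actual HFile checksum code path, which goes through the Hadoop native library when available.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;

public class ChecksumDemo {
    public static void main(String[] args) {
        // "123456789" is the standard check-value input for CRC algorithms.
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);

        CRC32 crc32 = new CRC32();      // old HBase default (IEEE polynomial)
        crc32.update(data, 0, data.length);

        CRC32C crc32c = new CRC32C();   // new default (Castagnoli polynomial)
        crc32c.update(data, 0, data.length);

        System.out.printf("CRC32  = %08X%n", crc32.getValue());   // CBF43926
        System.out.printf("CRC32C = %08X%n", crc32c.getValue());  // E3069283
    }
}
```

Both classes implement the same java.util.zip.Checksum interface, which is why the fallback from the native implementation to the Java one can be transparent to callers.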


> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-11927
>                 URL: https://issues.apache.org/jira/browse/HBASE-11927
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Apekshit Sharma
>         Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch,
HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch,
HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg,
before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg
>
>
> Up in hadoop they have this change. Let me publish some graphs to show that it makes
a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because
of compacting, flushing, etc.).  We should also make use of native CRCings -- especially the
2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
