hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
Date Wed, 13 May 2015 04:23:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541293#comment-14541293 ]

stack commented on HBASE-11927:

So, on a machine with hardware support, we spend 20% less CPU. Nice one [~appy].

Minor. I don't think we want to do this in HConstants.

  public static ChecksumType DEFAULT_CHECKSUM_TYPE = ChecksumType.CRC32C;

HConstants is a bit of an anti-pattern. It should only hold defines that are truly global. It is
better to keep constants with the code they relate to. Maybe in ChecksumType? (I suppose we still
need ChecksumType? We can't use Hadoop's DataChecksum.Type? Would that break too much? Could maybe
be done in a follow-up patch.)
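Here is a minimal sketch of the suggestion. It keeps the default next to the enum it describes instead of in a global constants class. `ChecksumTypeSketch`, its byte codes, `fromCode`, and `compute` are hypothetical names for illustration only. They are not HBase's actual `ChecksumType` API. The sketch also uses the JDK's `java.util.zip.CRC32`/`CRC32C` (Java 9+) rather than Hadoop's `DataChecksum`.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;
import java.util.zip.Checksum;

/** Hypothetical enum: the default lives with the type, not in HConstants. */
public enum ChecksumTypeSketch {
  // Hypothetical codes; not necessarily the byte values HBase writes to disk.
  NULL((byte) 0, null),
  CRC32((byte) 1, java.util.zip.CRC32::new),
  CRC32C((byte) 2, java.util.zip.CRC32C::new);

  /** Default for newly written data, kept beside the type it describes. */
  public static final ChecksumTypeSketch DEFAULT = CRC32C;

  private final byte code;
  private final Supplier<Checksum> factory;

  ChecksumTypeSketch(byte code, Supplier<Checksum> factory) {
    this.code = code;
    this.factory = factory;
  }

  public byte getCode() {
    return code;
  }

  /** Resolves a type from a code that was stored alongside the data. */
  public static ChecksumTypeSketch fromCode(byte code) {
    for (ChecksumTypeSketch t : values()) {
      if (t.code == code) {
        return t;
      }
    }
    throw new IllegalArgumentException("Unknown checksum code: " + code);
  }

  /** Computes the checksum of data; NULL means "no checksum" and yields 0. */
  public long compute(byte[] data) {
    if (factory == null) {
      return 0L;
    }
    Checksum c = factory.get();
    c.update(data, 0, data.length);
    return c.getValue();
  }

  public static void main(String[] args) {
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    // prints "CRC32C -> e3069283" (the standard CRC-32C check value)
    System.out.printf("%s -> %08x%n", DEFAULT, DEFAULT.compute(data));
  }
}
```

Callers then refer to `ChecksumTypeSketch.DEFAULT`. Anyone reading the enum sees the default in the same place as the supported types.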

Nice test.

And to be clear: if an hfile was written with CRC32, we'll just read the checksum type out of the
hfile and use that for verifying, so the change to the new checksum type should only apply to newly
written files? At least that is how I read it.
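The reading above holds whenever the checksum type is recorded with each written block. The writer picks the type from the current default, and the reader verifies with whatever type the block says it used. The following is a simplified sketch of that idea. The block layout (`[1-byte type code][4-byte checksum][payload]`), the class `StoredChecksumSketch`, and its methods are all hypothetical. This is not the real HFile block format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

/** Hypothetical layout (NOT HFile): [1-byte type code][4-byte checksum][payload]. */
public final class StoredChecksumSketch {
  static final byte CRC32_CODE = 1;
  static final byte CRC32C_CODE = 2;

  // Writer-side default: changing it only affects blocks written afterwards.
  static byte defaultWriteCode = CRC32C_CODE;

  private static Checksum newChecksum(byte code) {
    switch (code) {
      case CRC32_CODE: return new CRC32();
      case CRC32C_CODE: return new CRC32C();
      default: throw new IllegalArgumentException("Unknown checksum code: " + code);
    }
  }

  private static int checksumOf(byte code, byte[] payload) {
    Checksum c = newChecksum(code);
    c.update(payload, 0, payload.length);
    return (int) c.getValue(); // CRC values fit in 32 bits
  }

  /** Writes a block using whatever type is the default at write time. */
  static byte[] writeBlock(byte[] payload) {
    byte code = defaultWriteCode;
    return ByteBuffer.allocate(1 + 4 + payload.length)
        .put(code)
        .putInt(checksumOf(code, payload))
        .put(payload)
        .array();
  }

  /** Verifies using the type recorded in the block, not the current default. */
  static byte[] readBlock(byte[] block) {
    ByteBuffer buf = ByteBuffer.wrap(block);
    byte code = buf.get();
    int stored = buf.getInt();
    byte[] payload = new byte[buf.remaining()];
    buf.get(payload);
    if (checksumOf(code, payload) != stored) {
      throw new IllegalStateException("Checksum mismatch");
    }
    return payload;
  }

  public static void main(String[] args) {
    byte[] data = "row1/cf:q/value".getBytes(StandardCharsets.UTF_8);

    defaultWriteCode = CRC32_CODE;   // an "old" file written with CRC32
    byte[] oldBlock = writeBlock(data);

    defaultWriteCode = CRC32C_CODE;  // default flipped to CRC32C
    byte[] newBlock = writeBlock(data);

    // Both verify, because the reader honors the stored type code.
    System.out.println(Arrays.equals(readBlock(oldBlock), data)); // true
    System.out.println(Arrays.equals(readBlock(newBlock), data)); // true
    System.out.println(oldBlock[0] + " " + newBlock[0]);          // 1 2
  }
}
```

Because the type travels with the data, flipping the default needs no migration of existing files. Old blocks keep verifying with CRC32 until they are rewritten, for example by compaction.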

If good, let's get this in. On commit I'll add a note to the perf section of the refguide (unless
you want to) about making sure the native libs are available and that they are actually working
for you. We have http://hbase.apache.org/book.html#hadoop.native.lib but I'd say we could do
better if the win is 20% or more.

> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
> ------------------------------------------------------------------------------------
>                 Key: HBASE-11927
>                 URL: https://issues.apache.org/jira/browse/HBASE-11927
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Apekshit Sharma
>         Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch,
>                      HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg,
>                      before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg,
>                      c2021.write.2.svg, c2021.zip.svg, crc32ct.svg
> Up in Hadoop they have made this change. Let me publish some graphs to show that it makes
> a difference (CRC is a massive share of our CPU usage in my profiling of an upload because
> of compacting, flushing, etc.). We should also make use of native CRC computation in HBase,
> especially HDFS-6865 in 2.6 and its ilk, but that is a separate issue for now.

This message was sent by Atlassian JIRA
