hbase-issues mailing list archives

From "Phabricator (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5074) support checksums in HBase block cache
Date Thu, 02 Feb 2012 18:42:56 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199073#comment-13199073

Phabricator commented on HBASE-5074:

dhruba has commented on the revision "[jira] [HBASE-5074] Support checksums in HBase block cache".

  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java:206-207 will fix
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:1072 sure
  src/main/java/org/apache/hadoop/hbase/HConstants.java:598 I will make this part of the code
cleaner. I am still hoping to keep only one knob: whether or not to verify hbase checksums.
If hbase checksums are switched on, then hdfs checksums will automatically be switched off.
If hbase checksums are configured 'off', then hdfs checksums will automatically be switched
on. I feel that the other knobs (e.g. no checksums at all, or both checksums) are not very
interesting in *any* production environment, and I would like to keep the code complexity a
little lower by avoiding those two combinations. Hope that is ok with you.
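A minimal sketch of that single-knob behaviour (the property name and default here are
illustrative, not necessarily what the patch uses):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ChecksumKnobSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The one knob: should hbase verify checksums itself? (illustrative key name)
        boolean hbaseChecksum = conf.getBoolean("hbase.regionserver.checksum.verify", true);

        // Keep the two modes mutually exclusive: if hbase verifies, hdfs does not,
        // and vice versa.
        FileSystem fs = FileSystem.get(conf);
        fs.setVerifyChecksum(!hbaseChecksum);
      }
    }
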
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3597 Good idea, will do
  src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java:31 I tried
this, but it needs a few changes, so I ended up needing my own object wrapper over
DataOutputBuffer.
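For illustration only, one possible shape of such a wrapper (the real class in the patch
may look quite different):

    import org.apache.hadoop.io.DataOutputBuffer;

    // Thin wrapper over DataOutputBuffer that exposes the backing array, so per-chunk
    // checksums can be computed over the written bytes without an extra copy.
    public class ChecksumByteArrayOutputStream {
      private final DataOutputBuffer out = new DataOutputBuffer();

      public DataOutputBuffer getOutputStream() {
        return out;
      }

      // Valid bytes are buffer[0 .. size()).
      public byte[] getBuffer() {
        return out.getData();
      }

      public int size() {
        return out.getLength();
      }
    }
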
  src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:39 I too feel that we should
add the checksum type to the hfileblock header. That will make us future-proof, letting us
try new checksum algorithms later. Will make this change.
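Roughly what I have in mind for recording the algorithm in the header, as an illustration
(the names and byte codes are not final):

    // One byte in the hfileblock header identifies the checksum algorithm used for
    // that block, so new algorithms can be added later without a format change.
    public enum ChecksumType {
      NULL((byte) 0),    // no checksums for this block
      CRC32((byte) 1),   // java.util.zip.CRC32
      CRC32C((byte) 2);  // CRC32C, hardware-accelerated where available

      private final byte code;

      ChecksumType(byte code) {
        this.code = code;
      }

      public byte getCode() {
        return code;
      }

      // Map the byte read from a block header back to the algorithm.
      public static ChecksumType codeToType(byte b) {
        for (ChecksumType t : values()) {
          if (t.code == b) {
            return t;
          }
        }
        throw new IllegalArgumentException("Unknown checksum type code " + b);
      }
    }
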
  src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:132-133 This is equivalent to
the existing FileSystem.get(), and many places in hbase use it.
  src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:80 I will make this public so
that users can create an HFileSystem object on a non-default path.
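As a sketch of the idea (method and class names here are illustrative): HFileSystem keeps
two handles, the normal one and a second, uncached one with hdfs checksum verification
turned off for hfile reads.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class HFileSystemSketch {
      private final FileSystem fs;            // normal handle, hdfs checksums on
      private final FileSystem noChecksumFs;  // hfile reads, hdfs checksums off

      public HFileSystemSketch(Configuration conf, boolean useHBaseChecksum) throws IOException {
        this.fs = FileSystem.get(conf);
        if (useHBaseChecksum) {
          // A separate, uncached instance so the shared handle is not affected.
          this.noChecksumFs = FileSystem.newInstance(FileSystem.getDefaultUri(conf), conf);
          this.noChecksumFs.setVerifyChecksum(false);
        } else {
          this.noChecksumFs = fs;
        }
      }

      public FileSystem getNoChecksumFs() {
        return noChecksumFs;
      }

      public FileSystem getFs() {
        return fs;
      }
    }
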
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:102 I am making changes here
based on mikhail's suggestion too.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:229 as you will see, the
existing code path that creates an HFileBlock using this constructor uses it only for
in-memory caching, so it never fills in or uses the onDiskDataSizeWithHeader field. But I
will set it to what you propose.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:252 onDiskSizeWithHeader
= onDiskDataSizeWithHeader + checksum bytes
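A small worked example of that relation, with illustrative numbers (16KB per checksum chunk,
4-byte CRC32 checksums; the actual defaults may differ):

    public class BlockSizeExample {
      public static void main(String[] args) {
        int bytesPerChecksum = 16 * 1024;       // data bytes covered by each checksum
        int checksumSize = 4;                   // a CRC32 checksum is 4 bytes
        int onDiskDataSizeWithHeader = 66560;   // header + (compressed) data, 65KB here

        // One checksum per chunk, rounding up to cover the partial last chunk.
        int numChunks = (onDiskDataSizeWithHeader + bytesPerChecksum - 1) / bytesPerChecksum;
        int onDiskSizeWithHeader = onDiskDataSizeWithHeader + numChunks * checksumSize;

        System.out.println(numChunks);            // 5
        System.out.println(onDiskSizeWithHeader); // 66560 + 5 * 4 = 66580
      }
    }
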
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:751 I am in complete agreement
with you. I wish I could have used the hadoop trunk code here. Unfortunately, hbase pulls
in hadoop 1.0, which does not have this implementation. Another option is to copy this code
from hadoop into the hbase code base, but that has its own set of maintainability problems.
I am hoping that hbase will move to hadoop 2.0 very soon, and then we can switch to the more
optimal checksum implementation. Hope that is ok with you.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1401-1402 This needs to be
thread safe.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1634 This is an internal
method, and this error is handled by upper layers (by switching off hbase checksums). So I
am following the paradigm of using exceptions only when true errors happen; I would like to
avoid writing code that generates exceptions in one layer, catches them in another layer,
and handles them there. The discussion with Doug Cutting on the hdfs-symlink patch is etched
in my mind :-)
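A minimal sketch of that flow (all names are made up for the example, and the checksum here
covers the whole block just to keep it short):

    import java.util.zip.CRC32;

    public class ChecksumFallbackSketch {

      // Internal check: report a mismatch as an ordinary return value, not an exception.
      static boolean validateChecksum(byte[] block, long storedCrc) {
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);
        return crc.getValue() == storedCrc;
      }

      // Upper layer: on a mismatch, fall back to re-reading the block through the
      // hdfs-checksummed stream instead of letting an exception propagate.
      static byte[] readBlock(byte[] fastPathBytes, long storedCrc, byte[] hdfsVerifiedBytes) {
        if (validateChecksum(fastPathBytes, storedCrc)) {
          return fastPathBytes;
        }
        return hdfsVerifiedBytes; // the real code would also switch off hbase checksums
      }
    }
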
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1888 I will work in a later
patch on using bulk checksum verification, native code, etc. (from hadoop). I would like to
keep this patch no bigger than it already is by focusing on the disk format change,
compatibility with older versions, etc. The main reason is that most of the hadoop checksum
optimizations are only in hadoop 2.0. I am hoping that is ok with you.
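Until then, the pure-Java path would be something like a per-chunk loop with
java.util.zip.CRC32 (the layout and chunking here are assumptions for the example, not the
patch's exact format):

    import java.util.zip.CRC32;

    public class PerChunkCrcCheck {
      // data:       the block bytes covered by checksums, laid out in fixed-size chunks
      // storedCrcs: one stored CRC32 value per chunk, in the same order
      static boolean verify(byte[] data, long[] storedCrcs, int bytesPerChecksum) {
        CRC32 crc = new CRC32();
        int chunk = 0;
        for (int off = 0; off < data.length; off += bytesPerChecksum, chunk++) {
          int len = Math.min(bytesPerChecksum, data.length - off);
          crc.reset();
          crc.update(data, off, len);
          if (crc.getValue() != storedCrcs[chunk]) {
            return false; // mismatch: the caller falls back to hdfs checksums
          }
        }
        return true;
      }
    }
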


> support checksums in HBase block cache
> --------------------------------------
>                 Key: HBASE-5074
>                 URL: https://issues.apache.org/jira/browse/HBASE-5074
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: D1521.1.patch, D1521.1.patch
> The current implementation of HDFS stores the data in one block file and the metadata
> (checksum) in another block file. This means that every read into the HBase block cache
> actually consumes two disk iops, one to the datafile and one to the checksum file. This is
> a major problem for scaling HBase, because HBase is usually bottlenecked on the number of
> random disk iops that the storage-hardware offers.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

