hbase-issues mailing list archives

From "Phabricator (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5074) support checksums in HBase block cache
Date Wed, 22 Feb 2012 08:14:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213448#comment-13213448 ]

Phabricator commented on HBASE-5074:

dhruba has commented on the revision "[jira] [HBASE-5074] Support checksums in HBase block

  src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:115 Ideally, we need two different
fs. The first fs is for writing and for reading with hdfs checksums. The other fs is for reading
without hdfs checksums.

  src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:129 done
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:49 The HFile layer
is the one that is responsible for opening a file for reading. Then the multi-threaded HFileBlockLayer
uses those FSDataInputStreams to pread data from HDFS. So, I need to make the HFile layer open
two file descriptors for the same file, both for reading purposes: one with checksums and
the other without checksums.
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:44 This is a protected
member, so users of this class are not concerned with what this is. If you have a better way
to organize this one, please do let me know.
  src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:84 The Checksum API returns
a long. But actual implementations like CRC32, CRC32C, etc. all return a value that fits in an int.

  Also, the Hadoop checksum implementation uses a 4 byte value. If you think that we
should store 8 byte checksums, I can do that. But for the common case, we will be wasting
4 bytes in the header for every checksum chunk.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:205 done
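
  To illustrate the ChecksumUtil.java:84 point about checksum width, here is a minimal sketch
using only the JDK's java.util.zip classes. It is not the code from the patch: the class name
ChecksumWidthSketch and its two helper methods are hypothetical, and the big-endian 4 byte
layout is an assumption made for the example. It shows that Checksum.getValue() is typed as
a long, yet a CRC32 result round-trips through 4 bytes without loss.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical helper for illustration only; not part of HBase or the patch.
class ChecksumWidthSketch {

    /** Computes a CRC32 over data and stores it in 4 bytes (big-endian, by assumption). */
    static byte[] encodeChecksum(byte[] data) {
        Checksum crc = new CRC32();
        crc.update(data, 0, data.length);
        long value = crc.getValue();   // typed as long, but only the low 32 bits are set
        return ByteBuffer.allocate(4).putInt((int) value).array();
    }

    /** Reads the 4 stored bytes back as the unsigned 32-bit value that getValue() reported. */
    static long decodeChecksum(byte[] stored) {
        return ByteBuffer.wrap(stored).getInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        byte[] stored = encodeChecksum(data);
        System.out.printf("stored bytes=%d value=0x%08X%n",
                stored.length, decodeChecksum(stored));
    }
}
```

  Masking with 0xFFFFFFFFL on the way back matters because Java ints are signed; without it,
a checksum with the top bit set would compare unequal to the long from getValue().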


> support checksums in HBase block cache
> --------------------------------------
>                 Key: HBASE-5074
>                 URL: https://issues.apache.org/jira/browse/HBASE-5074
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch,
D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch,
> The current implementation of HDFS stores the data in one block file and the metadata (checksum)
in another block file. This means that every read into the HBase block cache actually consumes
two disk iops, one to the datafile and one to the checksum file. This is a major problem for
scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that
the storage-hardware offers.


