hbase-issues mailing list archives

From "Phabricator (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5074) support checksums in HBase block cache
Date Thu, 02 Feb 2012 02:15:56 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198472#comment-13198472 ]

Phabricator commented on HBASE-5074:
------------------------------------

mbautin has commented on the revision "[jira] [HBASE-5074] Support checksums in HBase block
cache".

  @dhruba: thanks for the update! See my replies inline.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357 I don't see
any overrides of this method in HFileReaderV{1,2} in the patch, and this particular method
looks really confusing, since it takes a parameter, ignores it, and returns this.hfs instead.
Did you mean to override it in a way that does use the parameter? If not, could you
please add a javadoc here explaining why the argument is being ignored?
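  To make the question concrete, the shape I am reading is roughly the following sketch
(reconstructed from the description above, not copied from the patch):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hbase.fs.HFileSystem;

    abstract class AbstractHFileReaderSketch {
      protected HFileSystem hfs;

      /**
       * Returns this reader's own filesystem. The argument is currently
       * ignored; if that is intentional, this javadoc should say why.
       */
      public HFileSystem getFileSystem(FileSystem ignored) {
        return this.hfs;
      }
    }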
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:402 Agreed. Perhaps we should
avoid replacing all occurrences of FileSystem with HFileSystem. One class cast is much simpler.
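  Roughly what I have in mind, as a sketch (this assumes HFileSystem can wrap an existing
FileSystem; the helper name is made up):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.fs.HFileSystem;

    final class HFileSystemAdapter {
      /** One cast at the entry point; everything else keeps using FileSystem. */
      static HFileSystem toHFileSystem(Path path, Configuration conf) throws IOException {
        FileSystem fs = path.getFileSystem(conf);
        return (fs instanceof HFileSystem) ? (HFileSystem) fs : new HFileSystem(fs);
      }
    }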
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:417 OK.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:123 How much more work is
it to make it configurable? Otherwise we would be storing the bytes-per-checksum field but
not actually using it, which would be really confusing.
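  A sketch of what configurable could look like -- the key name and default below are
assumptions for illustration, not values from the patch:

    import org.apache.hadoop.conf.Configuration;

    final class ChecksumChunkConfig {
      // Hypothetical key and default, for illustration only.
      static final String BYTES_PER_CHECKSUM_KEY = "hbase.hstore.bytes.per.checksum";
      static final int DEFAULT_BYTES_PER_CHECKSUM = 16 * 1024;

      /** Read once when the writer is created; the value is then stored in
       *  the block so readers honor the same chunk size. */
      static int bytesPerChecksum(Configuration conf) {
        return conf.getInt(BYTES_PER_CHECKSUM_KEY, DEFAULT_BYTES_PER_CHECKSUM);
      }
    }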
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:149 Sounds good. Could you
replace comments with javadocs? That seems to be the convention in HBase code even for private
fields.
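  For example, in the javadoc form (field name borrowed from HFileBlock just for
illustration):

    class Example {
      /** The on-disk size of this block, excluding the header. */
      private int onDiskSizeWithoutHeader;   // javadoc instead of a plain // comment
    }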
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:441-442 OK, sounds good.
I probably just misread the code.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:788 It is fine to keep the
DataInputStream and ByteBuffer implementations as duplicate code paths for performance reasons.
However, I still think it is better to move them into a separate utility class, e.g. ByteBufferUtils.
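  Something like the following sketch is what I mean -- the two flavors stay duplicated for
speed but live side by side in one class (readVLong is only an example method, not necessarily
the code at line 788):

    import java.io.DataInput;
    import java.io.IOException;
    import java.nio.ByteBuffer;

    /** Sketch of the suggested utility class. */
    final class ByteBufferUtils {
      private ByteBufferUtils() {}

      /** Decodes a 7-bit variable-length long from a stream. */
      static long readVLong(DataInput in) throws IOException {
        long result = 0;
        int shift = 0;
        byte b;
        do {
          b = in.readByte();
          result |= (long) (b & 0x7f) << shift;
          shift += 7;
        } while ((b & 0x80) != 0);
        return result;
      }

      /** Same decoding, duplicated on purpose for the ByteBuffer flavor:
       *  wrapping the buffer in a stream would cost on the hot path. */
      static long readVLong(ByteBuffer buf) {
        long result = 0;
        int shift = 0;
        byte b;
        do {
          b = buf.get();
          result |= (long) (b & 0x7f) << shift;
          shift += 7;
        } while ((b & 0x80) != 0);
        return result;
      }
    }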
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:866-870 Sounds good.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1525 This is probably an
error, not a warning, as we are about to shut down the regionserver.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1520-1527 This constructor
will be called in the same thread that tries to read the block (see ThreadLocal.get() implementation).
I am not sure whether throwing a RuntimeException will shut down the regionserver. But this
type of error is definitely too serious to recover from gracefully, so this is probably fine.

  Just to make sure: are we planning to swap checksum implementations in production? If so,
most RPC threads will still be holding a PrefetchedHeader instance created with the wrong
checksum class.
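  For reference, the thread-affinity behavior in a minimal, runnable form (PrefetchedHeader
reduced to a stub; this is not the patch's code):

    /** Demo of ThreadLocal initialization semantics. */
    public class PrefetchedHeaderDemo {
      static class PrefetchedHeader {
        PrefetchedHeader() {
          // Runs in the reading thread itself, on its first get() below,
          // so an exception here surfaces in that thread's read call.
          System.out.println("constructed in " + Thread.currentThread().getName());
        }
      }

      static final ThreadLocal<PrefetchedHeader> PREFETCHED =
          new ThreadLocal<PrefetchedHeader>() {
            @Override protected PrefetchedHeader initialValue() {
              return new PrefetchedHeader();
            }
          };

      public static void main(String[] args) {
        PrefetchedHeader first = PREFETCHED.get();   // triggers initialValue() here
        // Subsequent get()s in this thread return the cached instance; this is
        // why a live checksum swap would leave RPC threads holding stale
        // per-thread state unless each one calls PREFETCHED.remove().
        System.out.println(first == PREFETCHED.get());   // true
      }
    }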
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1871 Sounds good. In that
case it is probably better to call out to an external utility method here instead of putting
the checksum calculation inline.
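  As a sketch of that (class and method names assumed; CRC32 stands in for whichever checksum
algorithm the block actually uses):

    import java.util.zip.CRC32;

    /** Standalone checksum helper so the write path makes one call. */
    final class ChecksumUtil {
      private ChecksumUtil() {}

      /** CRC32 over one bytes-per-checksum chunk; the caller stores the
       *  returned four bytes alongside the block data. */
      static int crc32(byte[] data, int offset, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, offset, length);
        return (int) crc.getValue();
      }
    }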

REVISION DETAIL
  https://reviews.facebook.net/D1521

                
> support checksums in HBase block cache
> --------------------------------------
>
>                 Key: HBASE-5074
>                 URL: https://issues.apache.org/jira/browse/HBASE-5074
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: D1521.1.patch
>
>
> The current implementation of HDFS stores the data in one block file and the metadata
> (checksums) in another block file. This means that every read into the HBase block cache
> actually consumes two disk iops: one to the data file and one to the checksum file. This is
> a major problem for scaling HBase, because HBase is usually bottlenecked on the number of
> random disk iops that the storage hardware offers.
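For background, inline checksums address this by storing the checksum bytes inside the HFile
block itself, so a single read fetches data that can verify itself. A minimal illustration of
the write/verify round trip (all names hypothetical, not the patch's code):

    import java.nio.ByteBuffer;
    import java.util.zip.CRC32;

    /** Illustration of inline (in-block) checksums. */
    public class InlineChecksumDemo {
      /** Appends a CRC32 of the payload so data and checksum share one block. */
      static ByteBuffer writeBlock(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer block = ByteBuffer.allocate(payload.length + 4);
        block.put(payload);
        block.putInt((int) crc.getValue());
        block.flip();
        return block;
      }

      /** Verifies in place: one read of the block is self-checking. */
      static boolean verifyBlock(ByteBuffer block) {
        byte[] payload = new byte[block.limit() - 4];
        block.get(payload);
        int stored = block.getInt();
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return stored == (int) crc.getValue();
      }

      public static void main(String[] args) {
        ByteBuffer block = writeBlock("hello hbase".getBytes());
        System.out.println(verifyBlock(block));   // prints: true
      }
    }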

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
