hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Phabricator (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5074) support checksums in HBase block cache
Date Tue, 31 Jan 2012 01:05:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196606#comment-13196606

Phabricator commented on HBASE-5074:

mbautin has commented on the revision "[jira] [HBASE-5074] Support checksums in HBase block
Added CCs: dhruba

  @Dhruba: The "checksum at the end of block" approach seems reasonable and the implementation
looks good! Specific comments inline.

  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357 What is the
purpose of the hfs parameter here?
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:49 s/preceeding/preceding/
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:50 s/deermines/determines/
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:51 s/does not need/do
not need/
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:119 s/major/minor/
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:260 Rename the existing
expectVersion to expectMajorVersion for clarity.
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:343 Rename to expectMajorVersion
for clarity.

  Also, does the version field of this class now only contain the major version? If so, rename
it to majorVersion.
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:345 Add the word "major"
to the error message.
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:462 Rename to getMajorVersion
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:402 Can we modify the parameter
type and get rid of this cast?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:415 This is not a constructor,
but a factory method.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:417 Add "ForTest" to method name
for clarity.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:91 s/has/have/
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:95 Consider replacing the
"_V0" suffix with something more meaningful like "_NO_CHECKSUM".
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:102 Consider using a suffix
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:123 When the number of bytes
per checksum becomes configurable, will that require a persistent data format change? What
will the upgrade procedure be in that case?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:138 It is not clear from
this call that 0 is minor version. Create a constant with a meaningful name (e.g. MINOR_VERSION_NO_CHECKSUM).
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:149 Consider adding "WithChecksum"
to variable name.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:395-398 This is becoming
bulky. Factor out the common term (uncompressedSizeWithoutHeader + headerSize() + cksumBytes)
into a local variable. Also avoid evaluating headerSize() twice.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:400-402 Reuse the new local
variables from the above comments here.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:441-442 Update this comment,
since the meaning of "extraBytes" has changed from just being the room for the next block's
header to a much more complex role.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:757-758 Should we throw an
IOException instead since this method already throws it?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:771-772 tmp is a particularly
bad variable name. Combine these two lines and get rid of tmp.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:808-809 Get rid of tmp and
combine these two lines.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:788 This method and the above
one seem to share a lot of code. Is it possible to get rid of code duplication?

  Also, these two methods seem isolated enough to be moved to another class, maybe even as
static methods.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:866-870 Do we need this in
case of minorVersion = 0? Or do we always write new files with checksums?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:973-975 Somehow the fact
that checksum format is different for compressed and uncompressed blocks has escaped me halfway
through the review. Maybe it is worth explicitly mentioning that in javadoc.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:999-1011 Use System.arraycopy
instead of loops.

  Add "ForTest" to method name to discourage its use in production.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1202 Nice! Thanks for locking
down these internal base classes and methods.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1389-1390 Delete one of these
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1520-1527 Does it make sense
to move this checksum instantiation code to a function and reuse it everywhere we call ChecksumFactory.newInstance()?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1786 Remove this and other
debug output statements.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1871 As I mentioned, it is
probably better to move checksum computation and validation code to a separate utility class.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java:220 Use a constant to
indicate that this is a minor version without checksum support instead of just 0.
  src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:56 Is this necessary? Does not
Java call default constructors automatically?
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:70 This is for testing
only, right?
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:225 Long line.
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:1 A lot
in this file appears to be copy-paste from TestHFileBlock, so it very difficult to see the
real changes. Please reuse the appropriate code instead.


> support checksums in HBase block cache
> --------------------------------------
>                 Key: HBASE-5074
>                 URL: https://issues.apache.org/jira/browse/HBASE-5074
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: D1521.1.patch, D1521.1.patch
> The current implementation of HDFS stores the data in one block file and the metadata(checksum)
in another block file. This means that every read into the HBase block cache actually consumes
two disk iops, one to the datafile and one to the checksum file. This is a major problem for
scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that
the storage-hardware offers.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message