hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Phabricator (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-5074) support checksums in HBase block cache
Date Fri, 02 Mar 2012 07:43:41 GMT

     [ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Phabricator updated HBASE-5074:
-------------------------------

    Attachment: D1521.11.patch

dhruba updated the revision "[jira] [HBASE-5074] Support checksums in HBase block cache".
Reviewers: mbautin

  1. I modified the ChecksumType code to not dum an exception stack trace to the output if
CRC32C is not
     available. Ted's suggestion of pulling CRC32C into hbase code sounds reasonable, but
I would like
     to do it as part of another jira. Also, if hbase moves to hadoop 2.0, then it will automatically
     get CRC32C.
  2. I added a "minorVersion=" to the output of HFilePrettyPrinter.
     Stack, will you be able to run "bin/hbase hfile -m -f filename on your cluster to verify
that this
     checksum feature is switched on. If it prints minorVersion=1, then you are using this
feature.
     Do you still need a print somewhere saying that this feature in on? The older files that
were
     pre-created before that patch was deployed will still use hdfs-checksum verification,
so you
     could possible see hdfs-checksum-verification on stack traces on a live regionserver.
  3. I did some thinking (again) on the semantics of major version and minor version. The
major version
     represents a new file format, e.g. suppose we add a new thing to the file's triailer,
then we
     might need to bump up the major version. The minor version indicates the format of data
inside a
     HFileBlock.
     In the current code, major versions 1 and 2 share the same HFileFormat (indicated by
minor version
     of 0). In this patch, we have a new minorVersion 1 because the data contents inside a
HFileBlock
     has changed. Tecnically, both major version 1 and 2 could have either minorVerion 0 or
1.
     Now, suppose we want to add a new field to the trailer of the HFile. We can bump the
majorVersion
     to 3 but do not change the minorVersion because we did not change the internal format
of an
     HFileBlock.
     Given the above, does it make sense to say that HFileBlock is independent of the majorVersion?

REVISION DETAIL
  https://reviews.facebook.net/D1521

AFFECTED FILES
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
  src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
  src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
  src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/fs
  src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
  src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
  src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
  src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

                
> support checksums in HBase block cache
> --------------------------------------
>
>                 Key: HBASE-5074
>                 URL: https://issues.apache.org/jira/browse/HBASE-5074
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch,
D1521.10.patch, D1521.10.patch, D1521.11.patch, D1521.11.patch, D1521.2.patch, D1521.2.patch,
D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch,
D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch,
D1521.9.patch, D1521.9.patch
>
>
> The current implementation of HDFS stores the data in one block file and the metadata(checksum)
in another block file. This means that every read into the HBase block cache actually consumes
two disk iops, one to the datafile and one to the checksum file. This is a major problem for
scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that
the storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message