hbase-issues mailing list archives

From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.
Date Fri, 30 Sep 2011 21:05:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118444#comment-13118444 ]

jiraposter@reviews.apache.org commented on HBASE-4496:

This is an automatically generated e-mail. To reply, visit:

Ship it!

I love it. This is what I should have done with HBASE-4496 if I had known the reader code better.
I'll do some more manual testing with your patch applied.
This will create extra merging work for HBASE-4422 and HBASE-4344.


    This is good.


    I like this. HFileReaderV2 implementing HFileBlock.BasicReader was strange.


    No more casting, awesome.
    *Very* minor nit, but why not do reader.getDataBlockIndexReader().seekToDataBlock(...) as you do below?

- Lars

On 2011-09-30 20:41:01, Mikhail Bautin wrote:
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2136/
bq.  -----------------------------------------------------------
bq.  (Updated 2011-09-30 20:41:01)
bq.  Review request for hbase, Jonathan Gray and Lars Hofhansl.
bq.  Summary
bq.  -------
bq.  This fixes a couple of long-standing code issues in HFile v2:
bq.  - Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2).
bq.  - Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block.
bq.  - Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion).
bq.  - Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex.
bq.  This addresses bug HBASE-4496.
bq.      https://issues.apache.org/jira/browse/HBASE-4496
bq.  Diffs
bq.  -----
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java 4dc1367 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java 5e98375 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java 953896e 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 13d5e70 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 1cf7767 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java eec566e 
bq.  Diff: https://reviews.apache.org/r/2136/diff
bq.  Testing
bq.  -------
bq.  This is in production in Facebook's hbase-89 branch. 
bq.  Still testing this open-source patch -- please don't commit yet.
bq.  Thanks,
bq.  Mikhail
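
To make the summary above more concrete, here is a minimal Java sketch of the idea behind a caching block reader that is passed to the block index. It is not the code from the patch: the names Block, CountingReader and SingleLevelIndex, the integer keys, and the simplified readBlock signature are all hypothetical and chosen only for illustration. The sketch shows two points from the summary: the caller's cacheBlocks flag travels all the way to the reader, and a seek that lands in the block the scanner already holds does not count as a block read.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a data block; only the offset matters here. */
final class Block {
    final long offset;
    Block(long offset) { this.offset = offset; }
}

/** Simplified analogue of a caching block reader: the caller decides whether to cache. */
interface CachingBlockReader {
    Block readBlock(long offset, boolean cacheBlock);
}

/** Toy reader that counts simulated disk reads and caches only when asked to. */
final class CountingReader implements CachingBlockReader {
    final Map<Long, Block> cache = new HashMap<>();
    int diskReads = 0;

    @Override
    public Block readBlock(long offset, boolean cacheBlock) {
        Block cached = cache.get(offset);
        if (cached != null) {
            return cached;              // cache hit: no disk read
        }
        diskReads++;                    // simulated disk read
        Block block = new Block(offset);
        if (cacheBlock) {
            cache.put(offset, block);   // cache only on request
        }
        return block;
    }
}

/** Toy one-level index: block i covers keys [i * keysPerBlock, (i + 1) * keysPerBlock). */
final class SingleLevelIndex {
    private final CachingBlockReader reader;
    private final int keysPerBlock;

    SingleLevelIndex(CachingBlockReader reader, int keysPerBlock) {
        this.reader = reader;
        this.keysPerBlock = keysPerBlock;
    }

    Block seekToDataBlock(int key, Block currentBlock, boolean cacheBlocks) {
        long offset = key / keysPerBlock;
        if (currentBlock != null && currentBlock.offset == offset) {
            return currentBlock;        // same block as before: nothing is read or counted
        }
        return reader.readBlock(offset, cacheBlocks);
    }
}

final class CachingBlockReaderDemo {
    public static void main(String[] args) {
        CountingReader reader = new CountingReader();
        SingleLevelIndex index = new SingleLevelIndex(reader, 10);
        Block current = null;
        for (int key = 0; key < 30; key++) {
            current = index.seekToDataBlock(key, current, false);
        }
        // 30 keys at 10 keys per block touch 3 blocks; none are cached.
        System.out.println("disk reads: " + reader.diskReads
                + ", cached blocks: " + reader.cache.size());
    }
}
```

Running CachingBlockReaderDemo prints "disk reads: 3, cached blocks: 0". Passing true instead of false would leave the three blocks in the toy cache.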

> HFile V2 does not honor setCacheBlocks when scanning.
> -----------------------------------------------------
>                 Key: HBASE-4496
>                 URL: https://issues.apache.org/jira/browse/HBASE-4496
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.92.0, 0.94.0
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.92.0, 0.94.0
>         Attachments: 4496.txt
> While testing the LRU cache during scanning, I noticed quite a bit of churn in the cache
even when Scan.cacheBlocks is set to false. After debugging this, I found that HFile V2 always
caches blocks in the LRU cache regardless of the cacheBlocks setting.
> Here's a trace (from Eclipse) showing the problem:
> HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279	
> HFileReaderV2.readBlockData(long, long, int, boolean) line: 219	
> HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line:
> HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502	
> HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539	
> StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151	
> StoreFileScanner.reseek(KeyValue) line: 110	
> KeyValueHeap.reseek(KeyValue) line: 255	
> StoreScanner.reseek(KeyValue) line: 409	
> StoreScanner.next(List<KeyValue>, int) line: 304	
> KeyValueHeap.next(List<KeyValue>, int) line: 114	
> KeyValueHeap.next(List<KeyValue>) line: 143	
> HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774	
> HRegion$RegionScannerImpl.nextInternal(int) line: 2722	
> HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682	
> HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699	
> HRegionServer.next(long, int) line: 2092	
> Every scanner.next causes a reseek, which eventually causes a call to HFileBlockIndex$BlockIndexReader.seekToDataBlock(...)
at which point the cacheBlocks information is lost. HFileReaderV2.readBlockData calls HFileReaderV2.readBlock
with cacheBlocks set unconditionally to true.
> The fix is not immediately clear, unless we want to pass cacheBlocks to HFileBlockIndex$BlockIndexReader.seekToDataBlock
and then on to HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly
as readBlockData should not care about caching.
> Avoiding caching during scans is somewhat important for us.
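
The cache churn described in the issue can be reproduced in miniature. The following Java sketch is an illustration only and does not use any HBase classes: TinyLruBlockCache, ToyBlockReader and ScanChurnDemo are hypothetical names, and the LRU cache is simply a java.util.LinkedHashMap in access-order mode. With honorFlag set to false the toy reader always caches, which models the reported behavior; with honorFlag set to true it respects the caller's cacheBlocks value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Tiny LRU cache keyed by block offset, built on LinkedHashMap's access-order mode. */
final class TinyLruBlockCache extends LinkedHashMap<Long, byte[]> {
    private final int capacity;
    int evictions = 0;

    TinyLruBlockCache(int capacity) {
        super(16, 0.75f, true);         // true = order entries by access (LRU)
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        boolean evict = size() > capacity;
        if (evict) {
            evictions++;
        }
        return evict;
    }
}

/** Toy block reader; honorFlag = false models "always cache", as in the bug report. */
final class ToyBlockReader {
    private final TinyLruBlockCache cache;
    private final boolean honorFlag;

    ToyBlockReader(TinyLruBlockCache cache, boolean honorFlag) {
        this.cache = cache;
        this.honorFlag = honorFlag;
    }

    byte[] readBlock(long offset, boolean cacheBlocks) {
        byte[] hit = cache.get(offset);
        if (hit != null) {
            return hit;
        }
        byte[] block = new byte[0];     // pretend this came from disk
        boolean shouldCache = honorFlag ? cacheBlocks : true;
        if (shouldCache) {
            cache.put(offset, block);
        }
        return block;
    }
}

final class ScanChurnDemo {
    /** Preloads 4 "hot" blocks, scans 100 other blocks with cacheBlocks = false, returns evictions. */
    static int evictionsAfterScan(boolean honorFlag) {
        TinyLruBlockCache cache = new TinyLruBlockCache(4);
        for (long hot = 0; hot < 4; hot++) {
            cache.put(hot, new byte[0]);
        }
        ToyBlockReader reader = new ToyBlockReader(cache, honorFlag);
        for (long offset = 100; offset < 200; offset++) {
            reader.readBlock(offset, false);
        }
        return cache.evictions;
    }

    public static void main(String[] args) {
        System.out.println("always caching:  " + evictionsAfterScan(false) + " evictions");
        System.out.println("flag honored:    " + evictionsAfterScan(true) + " evictions");
    }
}
```

In this toy setup the scan of 100 blocks causes 100 evictions when the flag is ignored, pushing out all four preloaded blocks, and no evictions when the flag is honored.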


