hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
Date Wed, 10 Aug 2016 08:12:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414885#comment-15414885 ]

ramkrishna.s.vasudevan commented on HBASE-16213:
------------------------------------------------

The perf improvement is great. With smaller blocks and bigger value sizes the impact is
smaller, since a block then holds only a few rows and the seek takes little time. The
metadata overhead is at most about 4k more, I think.
Having multiple columns for the same row should also incur only the same metadata overhead
(if the total size comes to approximately 1K).
Went through the patch.
Some of the tag-related decode and encode logic can be moved to a subclass to avoid
duplicating the existing code, I think.
Also, see whether the SeekState's Cell impl really needs to be altogether new in the new
EncodedSeeker state implementation. Maybe they can be reused. I have not checked whether
something is different that prevents the reuse.
I think all the existing tests for DBE would work with this, because they iterate through
all the DBE enum values, including the new ones. Do you need any specific test case for
these new types?

> A new HFileBlock structure for fast random get
> ----------------------------------------------
>
>                 Key: HBASE-16213
>                 URL: https://issues.apache.org/jira/browse/HBASE-16213
>             Project: HBase
>          Issue Type: New Feature
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, hfile_block_performance.pptx, new-hfile-block.xlsx
>
>
> HFileBlock stores cells sequentially. Currently, to get a row from the block, it scans
> from the first cell until it reaches the row's cell.
> The new structure stores every row's start offset with the data, so it can find the exact
> row with a binary search.
> I used EncodedSeekPerformanceTest to test the performance.
> First, use YCSB to write 100w (1 million) rows of data; every row has only one qualifier,
> and valueLength=16B/64B/256B/1k.
> Then use EncodedSeekPerformanceTest to test random reads of 1w (10,000) or 100w (1 million)
> rows, and also record HFileBlock's dataSize/dataWithMetaSize in the encoding.
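
The idea in the description above (keep each row's start offset next to the data so a
lookup can binary-search instead of scanning from the first cell) can be sketched roughly
as follows. This is a minimal illustration only, not the code from the attached patches:
the class RowOffsetBlockSketch, its methods, and the [rowLen][row][valueLen][value] cell
layout are all hypothetical and assume one cell per row.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; this is NOT HBase's HFileBlock or the encoder in the patch.
final class RowOffsetBlockSketch {
    // Cells laid out back to back as [rowLen:int][row][valueLen:int][value] (assumed layout).
    private final ByteBuffer data;
    // Start offset of each row inside 'data', in sorted row order (the extra metadata).
    private final int[] rowOffsets;

    private RowOffsetBlockSketch(ByteBuffer data, int[] rowOffsets) {
        this.data = data;
        this.rowOffsets = rowOffsets;
    }

    // Rows must already be sorted in unsigned lexicographic byte order.
    static RowOffsetBlockSketch build(String[] sortedRows, String[] values) {
        int n = sortedRows.length;
        byte[][] rows = new byte[n][];
        byte[][] vals = new byte[n][];
        int size = 0;
        for (int i = 0; i < n; i++) {
            rows[i] = sortedRows[i].getBytes(StandardCharsets.UTF_8);
            vals[i] = values[i].getBytes(StandardCharsets.UTF_8);
            size += 4 + rows[i].length + 4 + vals[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        int[] offsets = new int[n];
        for (int i = 0; i < n; i++) {
            offsets[i] = buf.position(); // remember where this row starts
            buf.putInt(rows[i].length).put(rows[i]).putInt(vals[i].length).put(vals[i]);
        }
        return new RowOffsetBlockSketch(buf, offsets);
    }

    // Baseline behaviour: walk the cells from the first one until the row is reached.
    String getBySequentialScan(String row) {
        byte[] target = row.getBytes(StandardCharsets.UTF_8);
        int pos = 0;
        while (pos < data.capacity()) {
            int cmp = compareRowAt(pos, target);
            if (cmp == 0) return valueAt(pos);
            if (cmp > 0) return null; // passed the place where the row would be
            int rowLen = data.getInt(pos);
            int valueLen = data.getInt(pos + 4 + rowLen);
            pos += 4 + rowLen + 4 + valueLen; // jump to the next cell
        }
        return null;
    }

    // New behaviour: binary search over the stored row offsets.
    String getByBinarySearch(String row) {
        byte[] target = row.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = rowOffsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compareRowAt(rowOffsets[mid], target);
            if (cmp == 0) return valueAt(rowOffsets[mid]);
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return null;
    }

    // Negative if the stored row sorts before 'target', zero if equal, positive otherwise.
    private int compareRowAt(int offset, byte[] target) {
        int rowLen = data.getInt(offset);
        int n = Math.min(rowLen, target.length);
        for (int i = 0; i < n; i++) {
            int a = data.get(offset + 4 + i) & 0xff;
            int b = target[i] & 0xff;
            if (a != b) return a - b;
        }
        return rowLen - target.length;
    }

    private String valueAt(int offset) {
        int valuePos = offset + 4 + data.getInt(offset);
        byte[] value = new byte[data.getInt(valuePos)];
        for (int i = 0; i < value.length; i++) {
            value[i] = data.get(valuePos + 4 + i);
        }
        return new String(value, StandardCharsets.UTF_8);
    }
}
```

In this sketch the offsets cost one int per row on top of the data. The scan touches every
cell before the target, while the binary search needs only about log2(rows) row
comparisons, which fits the observation that the gain shrinks when a block holds few rows.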



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
