hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
Date Thu, 11 Aug 2016 06:06:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416612#comment-15416612 ]

Anoop Sam John commented on HBASE-16213:
----------------------------------------

bq. Make it default
I am not sure, because it is implemented as a type of DBE. That means we would be making one DBE
the default. This will help random gets but will not help range scans much.
One more thing to note: when this is used, users cannot use the other DBE optimizations
(space saving). That is true as well, because all the other DBE impls rely on the fact that
reads are linear over an HFile block: the key and/or value of one KV can only be obtained by
reading all the previous cells in the block. So implementing this as a kind of DBE is also
correct, IMO.
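
To illustrate why such encodings need linear reads, here is a minimal sketch of a prefix-style encoding in which each key stores only what differs from the previous key. The class and method names (PrefixDeltaSketch, encode, keyAt) are hypothetical and are not taken from HBase; real DBE implementations work on byte buffers and differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not an HBase data block encoder. */
final class PrefixDeltaSketch {

    /** One encoded key: how many leading chars to reuse from the previous key, plus the rest. */
    static final class Entry {
        final int shared;
        final String suffix;

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Encodes sorted keys so that each entry depends on the key before it. */
    static List<Entry> encode(List<String> keys) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String key : keys) {
            int max = Math.min(prev.length(), key.length());
            int shared = 0;
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, key.substring(shared)));
            prev = key;
        }
        return out;
    }

    /** Rebuilding key i requires replaying entries 0..i; there is no way to jump straight to it. */
    static String keyAt(List<Entry> entries, int index) {
        String prev = "";
        for (int i = 0; i <= index; i++) {
            Entry e = entries.get(i);
            prev = prev.substring(0, e.shared) + e.suffix;
        }
        return prev;
    }

    public static void main(String[] args) {
        List<Entry> encoded = encode(Arrays.asList("row-001", "row-002", "row-010"));
        System.out.println(keyAt(encoded, 2));
    }
}
```

Because an entry is meaningless without its predecessors, a random get on such a block cannot skip ahead, which is why a row-offset layout and a space-saving encoding are hard to combine.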

We should get this into 2.0.  Good one.

> A new HFileBlock structure for fast random get
> ----------------------------------------------
>
>                 Key: HBASE-16213
>                 URL: https://issues.apache.org/jira/browse/HBASE-16213
>             Project: HBase
>          Issue Type: New Feature
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, HBASE-16213_branch1_v3.patch,
>                      HBASE-16213_v2.patch, hfile-cpu.png, hfile_block_performance.pptx,
>                      hfile_block_performance2.pptx, new-hfile-block.xlsx
>
>
> HFileBlock stores cells sequentially. Currently, to get a row from the block, it scans
> from the first cell until it reaches the row's cell.
> The new structure stores every row's start offset along with the data, so it can find the
> exact row with a binary search.
> I used EncodedSeekPerformanceTest to test the performance.
> First, use YCSB to write 100w (1 million) rows, where every row has only one qualifier and
> valueLength = 16B/64B/256B/1KB.
> Then use EncodedSeekPerformanceTest to test random reads of 1w (10,000) or 100w (1 million)
> rows, and also record HFileBlock's dataSize/dataWithMetaSize in the encoding.
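
The idea in the description above can be sketched roughly as follows. This is a minimal illustration, not the attached patch: the class name RowIndexedBlockSketch, the cell layout ([short rowLen][row][int valueLen][value]), and the one-cell-per-row simplification are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a block that keeps each row's start offset; not the HBASE-16213 code. */
final class RowIndexedBlockSketch {
    private final byte[] data;       // cells stored back to back, sorted by row
    private final int[] rowOffsets;  // start offset of each row inside data

    private RowIndexedBlockSketch(byte[] data, int[] rowOffsets) {
        this.data = data;
        this.rowOffsets = rowOffsets;
    }

    /** Encodes sorted {row, value} pairs and records where each row starts. */
    static RowIndexedBlockSketch build(String[][] sortedCells) {
        int size = 0;
        for (String[] c : sortedCells) {
            size += 2 + utf8(c[0]).length + 4 + utf8(c[1]).length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        int[] offsets = new int[sortedCells.length];
        for (int i = 0; i < sortedCells.length; i++) {
            offsets[i] = buf.position();
            byte[] row = utf8(sortedCells[i][0]);
            byte[] value = utf8(sortedCells[i][1]);
            buf.putShort((short) row.length).put(row).putInt(value.length).put(value);
        }
        return new RowIndexedBlockSketch(buf.array(), offsets);
    }

    /** Binary search over the row offsets; returns the value, or null if the row is absent. */
    String get(String row) {
        byte[] target = utf8(row);
        ByteBuffer buf = ByteBuffer.wrap(data);
        int lo = 0;
        int hi = rowOffsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int off = rowOffsets[mid];
            int rowLen = buf.getShort(off);
            int cmp = compare(data, off + 2, rowLen, target);
            if (cmp == 0) {
                int valueOff = off + 2 + rowLen;
                int valueLen = buf.getInt(valueOff);
                return new String(data, valueOff + 4, valueLen, StandardCharsets.UTF_8);
            } else if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return null;
    }

    /** Linear scan from the first cell, the way a block without row offsets must be read. */
    String getByScan(String row) {
        byte[] target = utf8(row);
        ByteBuffer buf = ByteBuffer.wrap(data);
        int off = 0;
        while (off < data.length) {
            int rowLen = buf.getShort(off);
            int valueOff = off + 2 + rowLen;
            int valueLen = buf.getInt(valueOff);
            if (compare(data, off + 2, rowLen, target) == 0) {
                return new String(data, valueOff + 4, valueLen, StandardCharsets.UTF_8);
            }
            off = valueOff + 4 + valueLen;
        }
        return null;
    }

    /** Unsigned lexicographic comparison of a stored row against the target row. */
    private static int compare(byte[] a, int aOff, int aLen, byte[] b) {
        int n = Math.min(aLen, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[aOff + i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return aLen - b.length;
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RowIndexedBlockSketch block = build(new String[][] {
            {"row1", "a"}, {"row3", "b"}, {"row5", "c"}});
        System.out.println(block.get("row3") + " " + block.getByScan("row3"));
    }
}
```

Both lookups return the same value; the difference is that get touches O(log n) rows through the offset array while getByScan walks every cell before the target. The offset array is the extra metadata that a dataSize versus dataWithMetaSize comparison would account for.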



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
