hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
Date Wed, 24 Aug 2016 12:17:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434804#comment-15434804 ]

Anoop Sam John commented on HBASE-16213:

One thought after one more look:
bq.private List<Integer> rowsOffset = new ArrayList<Integer>(64);
So we add all row offsets into this List and then finally write all the ints to the HFile block's
stream.  Every addition to the List needs an object creation (int to Integer autoboxing), which
creates a lot of garbage.  We can avoid this.
Instead of a List we can create a ByteArrayOutputStream (see org.apache.hadoop.hbase.io.ByteArrayOutputStream),
write the offsets in their final serialized form, and at the end write getBuffer() all at once.  The
capacity of the BAOS can be initialized to 64 * 4; it will resize automatically as
needed.  Also the number of rows can be calculated as BAOS#size()/4.
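A minimal sketch of the suggested change, using only java.io streams (the HBase class referenced above, org.apache.hadoop.hbase.io.ByteArrayOutputStream, additionally exposes getBuffer() to avoid the final array copy; the class and method names here are illustrative, not the patch's actual code):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Collect row offsets as raw 4-byte ints instead of boxing each one
// into an Integer inside a List<Integer>.
public class RowOffsetBuffer {
    // Initial capacity of 64 * 4 bytes, mirroring the List's initial size of 64.
    private final ByteArrayOutputStream baos = new ByteArrayOutputStream(64 * 4);
    private final DataOutputStream out = new DataOutputStream(baos);

    // Append one row offset; no Integer object is created.
    public void addOffset(int offset) throws IOException {
        out.writeInt(offset);
    }

    // Number of rows recorded so far: size() / 4, as suggested above.
    public int rowCount() {
        return baos.size() / 4;
    }

    // Serialized offsets, ready to be written to the HFile block's stream.
    public byte[] toByteArray() {
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        RowOffsetBuffer buf = new RowOffsetBuffer();
        buf.addOffset(0);
        buf.addOffset(120);
        buf.addOffset(310);
        System.out.println(buf.rowCount());             // prints 3
        System.out.println(buf.toByteArray().length);   // prints 12
    }
}
```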

> A new HFileBlock structure for fast random get
> ----------------------------------------------
>                 Key: HBASE-16213
>                 URL: https://issues.apache.org/jira/browse/HBASE-16213
>             Project: HBase
>          Issue Type: New Feature
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16213-master_v1.patch, HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch,
HBASE-16213-master_v5.patch, HBASE-16213.patch, HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch,
cpu_blocksize_64K_valuelength_16B.png, cpu_blocksize_64K_valuelength_256B.png, cpu_blocksize_64K_valuelength_64B.png,
hfile-cpu.png, hfile_block_performance.pptx, hfile_block_performance2.pptx, new-hfile-block.xlsx,
qps_blocksize_64K_valuelength_16B.png, qps_blocksize_64K_valuelength_256B.png, qps_blocksize_64K_valuelength_64B.png
> HFileBlock stores cells sequentially; currently, to get a row from the block, it scans
from the first cell until it reaches the row's cells.
> The new structure stores every row's start offset along with the data, so it can find the
exact row with a binary search.
> I used EncodedSeekPerformanceTest to test the performance.
> First I used YCSB to write 1 million rows, each row with only one qualifier, and valueLength=16B/64B/256B/1KB.
> Then I used EncodedSeekPerformanceTest to do random reads of 10,000 or 1 million rows, and also
recorded the HFileBlock's dataSize/dataWithMetaSize for the encoding.
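The lookup idea in the description above can be sketched as follows. Here the rows are simplified to sorted int keys with a parallel array of offsets; the real patch compares row key bytes inside the block, and the names below are illustrative only:

```java
import java.util.Arrays;

// With every row's start offset recorded in the block, the target row is
// located with a binary search over the (sorted) row keys instead of a
// linear scan from the first cell.
public class RowOffsetSearch {
    // Returns the block offset of the row with the given key, or -1 if absent.
    static int findRowOffset(int[] rowKeys, int[] rowOffsets, int key) {
        int idx = Arrays.binarySearch(rowKeys, key);
        return idx >= 0 ? rowOffsets[idx] : -1;
    }

    public static void main(String[] args) {
        int[] keys    = {10, 20, 30, 40};
        int[] offsets = {0, 120, 250, 390};
        System.out.println(findRowOffset(keys, offsets, 30)); // prints 250
        System.out.println(findRowOffset(keys, offsets, 25)); // prints -1
    }
}
```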

This message was sent by Atlassian JIRA
