hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
Date Sat, 25 Oct 2014 12:25:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184075#comment-14184075 ]

Anoop Sam John commented on HBASE-11425:

Tested with 2 million Cells, a single cell per row.
All cells are written to a BB/DBB and then we seek to the last KV (so that the compare runs
across all cells in the BB/DBB).
The seek code is like what we have in ScannerV3#blockSeek.
With RK length 17 bytes (first 13 bytes are the same), the results are almost the same.
With RK length 117 bytes (first 113 bytes are the same), the DBB based read is ~3% slower.
Note that in this test the read and compare were done from the HBB and DBB, and those are almost the same.
In our CellComparator we have an Unsafe based optimization. In my old test this was not
in use. With Unsafe based reads from HBB#array() [this is what happens in HFileReaderV2/V3]
there is a significant perf difference from DBB. With RK length of 117 bytes and 2 million cells,
seeking to the last cell, the DBB test is 50% slower. :(
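
For readers who want to picture the kind of test described above, here is a minimal sketch. It is not the actual benchmark and not the ScannerV3#blockSeek code; the class name, the [int length][key bytes] layout and all method names are assumptions made for illustration. It uses only absolute ByteBuffer gets, so the same seek runs unchanged over an HBB or a DBB.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: linear seek over length-prefixed keys in a heap or direct buffer.
public class BufferSeekSketch {

    // Writes each key as [int length][key bytes]; 'direct' selects DBB vs HBB.
    static ByteBuffer writeKeys(byte[][] keys, boolean direct) {
        int size = 0;
        for (byte[] k : keys) {
            size += Integer.BYTES + k.length;
        }
        ByteBuffer buf = direct ? ByteBuffer.allocateDirect(size) : ByteBuffer.allocate(size);
        for (byte[] k : keys) {
            buf.putInt(k.length);
            buf.put(k);
        }
        buf.flip();
        return buf;
    }

    // Unsigned lexicographic compare of buf[offset, offset + len) against key.
    // Uses absolute get(), so it never calls array() and works for direct buffers.
    static int compare(ByteBuffer buf, int offset, int len, byte[] key) {
        int n = Math.min(len, key.length);
        for (int i = 0; i < n; i++) {
            int a = buf.get(offset + i) & 0xff;
            int b = key[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return len - key.length;
    }

    // Returns the offset of the first entry whose key is >= target, or -1 if none.
    static int seek(ByteBuffer buf, byte[] target) {
        int pos = 0;
        while (pos < buf.limit()) {
            int len = buf.getInt(pos);
            if (compare(buf, pos + Integer.BYTES, len, target) >= 0) {
                return pos;
            }
            pos += Integer.BYTES + len;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Keys share a long common prefix, as in the row keys of the test above.
        int count = 1000;
        byte[][] keys = new byte[count][];
        for (int i = 0; i < count; i++) {
            keys[i] = String.format("common-prefix-%04d", i).getBytes(StandardCharsets.UTF_8);
        }
        byte[] last = keys[count - 1];
        System.out.println("heap offset:   " + seek(writeKeys(keys, false), last));
        System.out.println("direct offset: " + seek(writeKeys(keys, true), last));
    }
}
```

Seeking to the last key forces a compare against every entry, which is why a long shared prefix makes the per-byte read cost dominate.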

I am thinking of doing Unsafe based compares for data in DBB as well.

Just did Unsafe based access for DBB/HBB and we are now in better shape. In the above test,
the DBB based run is ~12% slower than the old HBB.array() based compares. Will raise a subtask
and attach the approach there.
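
To illustrate the general idea of an Unsafe based compare that works for both DBB and HBB, here is a rough sketch. It is not the approach that will be attached to the subtask; the class and method names are hypothetical. It assumes sun.misc.Unsafe is reachable via reflection, that the JDK still has the private `address` field on java.nio.Buffer, and that heap buffers are writable (so array() is allowed). It compares 8 bytes at a time and falls back to single bytes for the tail.

```java
import java.lang.reflect.Field;
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import sun.misc.Unsafe;

// Hypothetical sketch: unsigned lexicographic compare over heap or direct buffers via Unsafe.
public class UnsafeBufferCompareSketch {
    private static final Unsafe UNSAFE;
    private static final long ADDRESS_OFFSET;
    private static final boolean LITTLE_ENDIAN =
        ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;

    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            // Assumption: java.nio.Buffer keeps the native address in a field named "address".
            ADDRESS_OFFSET = UNSAFE.objectFieldOffset(Buffer.class.getDeclaredField("address"));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Base object for Unsafe reads: null for direct memory, the backing array for heap.
    private static Object base(ByteBuffer buf) {
        return buf.isDirect() ? null : buf.array();
    }

    // Absolute address (direct) or offset within the backing array (heap).
    private static long addr(ByteBuffer buf, int off) {
        return buf.isDirect()
            ? UNSAFE.getLong(buf, ADDRESS_OFFSET) + off
            : (long) Unsafe.ARRAY_BYTE_BASE_OFFSET + buf.arrayOffset() + off;
    }

    private static void check(ByteBuffer buf, int off, int len) {
        if (off < 0 || len < 0 || off + len > buf.capacity()) {
            throw new IndexOutOfBoundsException("off=" + off + " len=" + len);
        }
    }

    // Returns <0, 0 or >0; only the sign is meaningful.
    static int compare(ByteBuffer a, int aOff, int aLen, ByteBuffer b, int bOff, int bLen) {
        check(a, aOff, aLen);
        check(b, bOff, bLen);
        Object aBase = base(a);
        Object bBase = base(b);
        long aAddr = addr(a, aOff);
        long bAddr = addr(b, bOff);
        int min = Math.min(aLen, bLen);
        int i = 0;
        // Compare 8 bytes per step; Unsafe reads longs in native byte order.
        for (; i + Long.BYTES <= min; i += Long.BYTES) {
            long x = UNSAFE.getLong(aBase, aAddr + i);
            long y = UNSAFE.getLong(bBase, bAddr + i);
            if (x != y) {
                if (LITTLE_ENDIAN) {
                    x = Long.reverseBytes(x);
                    y = Long.reverseBytes(y);
                }
                return Long.compareUnsigned(x, y);
            }
        }
        // Remaining tail, one byte at a time.
        for (; i < min; i++) {
            int x = UNSAFE.getByte(aBase, aAddr + i) & 0xff;
            int y = UNSAFE.getByte(bBase, bAddr + i) & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.wrap("row-000000000001".getBytes());
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        direct.put("row-000000000002".getBytes());
        System.out.println(Integer.signum(compare(heap, 0, 16, direct, 0, 16)));
    }
}
```

The explicit bounds check matters here because Unsafe does none of its own; an out-of-range read can crash the JVM rather than throw.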

> Cell/DBB end-to-end on the read-path
> ------------------------------------
>                 Key: HBASE-11425
>                 URL: https://issues.apache.org/jira/browse/HBASE-11425
>             Project: HBase
>          Issue Type: Umbrella
>          Components: regionserver, Scanners
>    Affects Versions: 0.99.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
> Umbrella jira to make sure we can have blocks cached in offheap backed cache. In the
entire read path, we can refer to this offheap buffer and avoid onheap copying.
> The high level items I can identify as of now are
> 1. Avoid the array() call on BB in the read path. (This is there in many classes. We can
handle it class by class.)
> 2. Support Buffer based getter APIs in Cell. In the read path we will create a new Cell
backed by BB. This will be needed in CellComparator, Filter (like SCVF), CPs etc.
> 3. Avoid KeyValue.ensureKeyValue() calls in the read path, since this makes a byte copy.
> 4. Remove all CP hooks (which are already deprecated) which deal with KVs.  (In read
> Will add subtasks under this.

This message was sent by Atlassian JIRA
