hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree
Date Fri, 15 Nov 2013 05:55:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823319#comment-13823319 ]

Lars Hofhansl commented on HBASE-9969:

Tested again without performing a major compaction. In my scenario I have 10m rows with 5
columns each. That ends up being only three regions, and only the last one has more than one
store file (4 in this case).
With that I still did not see any improvement.

Re: HBASE-9778, it's still not immediately clear how to optimize the cases mentioned there
(many small KVs) while keeping the other optimizations (for example for large KVs, where a
call to next() is likely to land us unnecessarily in the next block).

Anything that brings HBase's CPU consumption down is a win. Unless all data is in the cache
I would expect us to be IO bound, but that is not always true (with SSDs, for example).

> Improve KeyValueHeap using loser tree
> -------------------------------------
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.98.0, 0.96.1
>         Attachments: hbase-9969-v2.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png,
> A loser tree is a better data structure than a binary heap for this. It saves half of the
> comparisons on each next(), although the time complexity is still O(log N).
> Currently a scan or get goes through two KeyValueHeaps: one merges KVs read from multiple
> HFiles in a single store, the other merges results from multiple stores. This patch should
> improve both cases whenever CPU is the bottleneck (e.g. a scan with a filter over cached
> blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its public interfaces.
> The new code is also cleaner and simpler to understand.
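
For readers unfamiliar with the technique, here is a minimal, generic sketch of a loser-tree k-way merge in Java. It is not the code from hbase-9969.patch. The following are all assumptions made for illustration:

* the class name `LoserTreeMerge`;
* merging plain `Iterator<T>` sources with a `Comparator`;
* treating exhausted sources as larger than everything;
* breaking ties by source index.

The "half of the comparisons" claim comes from the replay step. After a binary heap replaces its top element, sifting down costs about two comparisons per level: one to pick the smaller child, and one to compare that child with the sifted element. A loser tree instead compares the new candidate only with the loser stored at each node on the leaf-to-root path, which is one comparison per level. Both are O(log N).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch of a k-way merge driven by a loser tree (not the HBASE-9969 patch itself). */
class LoserTreeMerge<T> implements Iterator<T> {
    private final List<? extends Iterator<T>> sources;
    private final Comparator<? super T> cmp;
    private final int k;              // number of sources
    private final Object[] heads;     // current head element of each source
    private final boolean[] live;     // false once a source is exhausted
    private final int[] tree;         // tree[0] = overall winner, tree[1..k-1] = losers
    private long comparisons;         // comparator calls, to observe the per-level cost

    LoserTreeMerge(List<? extends Iterator<T>> sources, Comparator<? super T> cmp) {
        this.sources = sources;
        this.cmp = cmp;
        this.k = sources.size();
        this.heads = new Object[k];
        this.live = new boolean[k];
        this.tree = new int[Math.max(k, 1)];
        for (int i = 0; i < k; i++) advance(i);
        if (k > 0) tree[0] = build(1);
    }

    long comparisons() { return comparisons; }

    @SuppressWarnings("unchecked")
    private T head(int i) { return (T) heads[i]; }

    /** Pulls the next element of source i into heads[i], or marks the source exhausted. */
    private void advance(int i) {
        Iterator<T> it = sources.get(i);
        live[i] = it.hasNext();
        heads[i] = live[i] ? it.next() : null;
    }

    /** True if source a's head must be emitted before source b's; exhausted sources always lose. */
    private boolean beats(int a, int b) {
        if (!live[a]) return false;
        if (!live[b]) return true;
        comparisons++;
        int c = cmp.compare(head(a), head(b));
        return c < 0 || (c == 0 && a < b); // ties go to the lower source index
    }

    /** Plays the initial tournament below `node`, stores each match's loser, returns the winner. */
    private int build(int node) {
        if (node >= k) return node - k; // leaves occupy indices k..2k-1
        int left = build(2 * node);
        int right = build(2 * node + 1);
        if (beats(left, right)) { tree[node] = right; return left; }
        tree[node] = left;
        return right;
    }

    @Override public boolean hasNext() { return k > 0 && live[tree[0]]; }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        int winner = tree[0];
        T result = head(winner);
        advance(winner);
        // Replay from the winner's leaf to the root: one comparison per level against the
        // stored loser, which is the best element of the sibling subtree.
        for (int node = (winner + k) / 2; node >= 1; node /= 2) {
            if (beats(tree[node], winner)) {
                int previous = winner;
                winner = tree[node];
                tree[node] = previous;
            }
        }
        tree[0] = winner;
        return result;
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> runs = List.of(
                List.of(1, 4, 7).iterator(),
                List.of(2, 5, 8).iterator(),
                List.of(3, 6, 9).iterator());
        LoserTreeMerge<Integer> merge = new LoserTreeMerge<>(runs, Comparator.naturalOrder());
        List<Integer> out = new ArrayList<>();
        while (merge.hasNext()) out.add(merge.next());
        System.out.println(out); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Because the merge only exposes `hasNext()`/`next()`, swapping it in for a heap-based merge does not change the caller's view. This matches the point above that the change stays inside KeyValueHeap. The `comparisons()` counter is only there to compare comparator calls against a heap-based merge on the same input.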

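The two-level merging described in the issue can be pictured with the sketch below: one heap per store over its HFiles, and one heap per region over its stores. It uses `java.util.PriorityQueue` as the binary heap. Several parts are hypothetical simplifications:

* the class `HeapMerge`;
* the string-encoded keys;
* plain iterators standing in for HFile and store scanners.

Real HBase KVs are compared by row, family, qualifier and timestamp rather than as strings. Because each merger is itself an `Iterator`, a store-level merge feeds directly into the region-level one. That nesting is why a cheaper merge structure would help at both levels.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

class TwoLevelMergeDemo {
    public static void main(String[] args) {
        Comparator<String> cmp = Comparator.naturalOrder();
        // Store "cf1": two hypothetical store files, each already sorted.
        Iterator<String> store1 = new HeapMerge<>(List.of(
                List.of("row1/cf1:a", "row3/cf1:a").iterator(),
                List.of("row2/cf1:a").iterator()), cmp);
        // Store "cf2": a single hypothetical store file.
        Iterator<String> store2 = new HeapMerge<>(List.of(
                List.of("row1/cf2:b", "row2/cf2:b").iterator()), cmp);
        // Region level: merge the outputs of the per-store heaps.
        Iterator<String> region = new HeapMerge<>(List.of(store1, store2), cmp);
        List<String> out = new ArrayList<>();
        region.forEachRemaining(out::add);
        System.out.println(out);
        // [row1/cf1:a, row1/cf2:b, row2/cf1:a, row2/cf2:b, row3/cf1:a]
    }
}

/** Binary-heap k-way merge over sorted iterators (a simplified stand-in for a scanner heap). */
class HeapMerge<T> implements Iterator<T> {
    /** A source together with the element it has already produced but not yet emitted. */
    private static final class Cursor<E> {
        final Iterator<E> it;
        E head;
        Cursor(Iterator<E> it) { this.it = it; this.head = it.next(); }
    }

    private final PriorityQueue<Cursor<T>> heap;

    HeapMerge(List<? extends Iterator<T>> sources, Comparator<? super T> cmp) {
        heap = new PriorityQueue<Cursor<T>>(Math.max(1, sources.size()),
                (a, b) -> cmp.compare(a.head, b.head));
        for (Iterator<T> it : sources) {
            if (it.hasNext()) heap.add(new Cursor<>(it));
        }
    }

    @Override public boolean hasNext() { return !heap.isEmpty(); }

    @Override public T next() {
        Cursor<T> top = heap.poll();
        if (top == null) throw new NoSuchElementException();
        T result = top.head;
        if (top.it.hasNext()) {   // refill the cursor and push it back onto the heap
            top.head = top.it.next();
            heap.add(top);
        }
        return result;
    }
}
```

Each `next()` here pays for a `poll()` plus an `add()` on the binary heap, at both the store level and the region level. That is the per-call cost a loser tree aims to reduce.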
This message was sent by Atlassian JIRA
