hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree
Date Mon, 25 Nov 2013 16:36:37 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831610#comment-13831610
] 

Chao Shi commented on HBASE-9969:
---------------------------------

bq. It looks like the LoserTree always maintains the top element at the top after a call to
updateTopValue(current), and you can easily peek() the top element (using topValue()). Could
the LoserTreeKeyValueHeap simply compare the current scanner to topValue(), and only updateTopValue(current)
if it's greater than topValue()? I think that's what current KeyValueHeap does.

I don't think we can do this with LoserTree, because it can only used to determine minimal
among a *fixed* number of streams at construction time. topValue(). It does not support "extractMin"
operation as defined by a heap. So the winner must remain in the tree.

I tried to optimize LoserTreeKeyValueHeap and got ~10% improvement, but it is still worse
than the priority queue based implementations. The LoserTree knows whether the added value
is on the same row with others. I'm trying to think about other optimization techniques.

bq. Loser tree will always need at least 2 comparison for consecutive optimization. One need
to compare new element with winner of a first half tree and with 2nd runner of a second half
tree (the previous top element was a winner in this half - tree).

Is it true? The global 2nd runner of the tree may sit very near the leave. To find it out,
we need logN comparisons.

> Improve KeyValueHeap using loser tree
> -------------------------------------
>
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, KeyValueHeapBenchmark_v2.ods,
hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, hbase-9969-v3.patch,
hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt
>
>
> LoserTree is the better data structure than binary heap. It saves half of the comparisons
on each next(), though the time complexity is on O(logN).
> Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from
multiple HFiles in a single store, the other is merging results from multiple stores. This
patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter
over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its public interfaces.
The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message