hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree
Date Thu, 21 Nov 2013 05:33:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828507#comment-13828507

Chao Shi commented on HBASE-9969:

bq.  Doing a quick comparison against the second minimal element is not possible, as the second
minimal element in the loser tree is not always "tree\[1\]" (consider the case that the two
top players meet in the quarterfinal in a tournament game).

To elaborate the problem more clearly, let's consider updateTop() method of LoserTree, which
takes logN comparisons to decide the new minimal after the original one has gone. If we want
to get the second minimal as well, we have to compare among the losers to the newly updated
winner, which takes another logN comparisons (there are logN losers in the path from a leave
up to the root). Thus it will cost 2logN in total. This is equivalent to a binary heap.

The benefit of keeping the second minimal, is where there are consecutive KVs from one HFile,
we can compare the it against the second minimal first. If it is still less than the second
minimal, then it is therefore the winner. This optimization naturally fit in the binary heap.

I wonder if we can find some data structure in the middle, or perhaps don't always keep the
second minimal and determine it when every n calls of next.

> Improve KeyValueHeap using loser tree
> -------------------------------------
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.98.0, 0.96.1
>         Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, KeyValueHeapBenchmark_v2.ods,
hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, hbase-9969-v3.patch,
hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt
> LoserTree is the better data structure than binary heap. It saves half of the comparisons
on each next(), though the time complexity is on O(logN).
> Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from
multiple HFiles in a single store, the other is merging results from multiple stores. This
patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter
over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its public interfaces.
The new code looks more cleaner and simpler to understand.

This message was sent by Atlassian JIRA

View raw message