hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matt Corgan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree
Date Fri, 22 Nov 2013 05:36:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829666#comment-13829666

Matt Corgan commented on HBASE-9969:

It looks like the LoserTree always maintains the top element at the top after a call to updateTopValue(current),
and you can easily peek() the top element (using topValue()).  Could the LoserTreeKeyValueHeap
simply compare the current scanner to topValue(), and only updateTopValue(current) if it's
greater than topValue()?  I think that's what current KeyValueHeap does.

{quote}Another optimization: as KVs are usually sharing common prefixes, we can skip such
common part during comparisons. I haven't thought on this carefully. Just mention it here
in case someone may have better thoughts based on it.{quote}yeah, I think there's potential
here.  You could maintain a heap for each level: row, family, qualifier, timestamp.  This
would probably require adding methods to the KeyValueScanner: nextRow(), nextQualifier(),
etc.  PrefixTree, in particular, would benefit from this because it can execute the nextRow()
method with a trivial amount of work while the other encodings have to scan through every
row key in the block and do a comparison on it.  If someone tried it, you could start with
only the nextRow() optimization which would have the biggest payoff.

> Improve KeyValueHeap using loser tree
> -------------------------------------
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.98.0, 0.96.1
>         Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, KeyValueHeapBenchmark_v2.ods,
hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, hbase-9969-v3.patch,
hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt
> LoserTree is the better data structure than binary heap. It saves half of the comparisons
on each next(), though the time complexity is on O(logN).
> Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from
multiple HFiles in a single store, the other is merging results from multiple stores. This
patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter
over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its public interfaces.
The new code looks more cleaner and simpler to understand.

This message was sent by Atlassian JIRA

View raw message