hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matt Corgan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree
Date Mon, 18 Nov 2013 21:55:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825816#comment-13825816

Matt Corgan commented on HBASE-9969:

Regarding adding 10 cols/row:
{quote}I tried this on my laptop but seems your case above is even faster than before. Maybe
there is something wrong with my environment. I will try it on my devbox tomorrow.{quote}yes,
you are right.  i was profiling this weekend and confirmed the current heap is handling that
situation favorably.  still good to test to make sure we don't lose this aspect!

I made a stripped down version of the PriorityQueue based heap to compare with the LoserTree.
 It adds some counters to track the number of KV comparisons which is interesting to see.
 I was seeing that PQ is faster for next(), especially with just 1 file, and LT is faster
for reseek().  I'll try to post a patch tonight.

I was paying particular attention to this code at KeyValueHeap:103
      KeyValueScanner topScanner = this.heap.peek();
      if (topScanner == null ||
          this.comparator.compare(kvNext, topScanner.peek()) >= 0) {
        this.current = pollRealKV();
I can't figure out why we need to do a heap.add() and pollRealKV when topScanner==null.  I
actually removed the topScanner==null check from the above and the single file scanner was
50% faster.  The whole test suite passed, so either it's not necessary, or we could use another
unit test.  Maybe it has something to do with LazySeek?

> Improve KeyValueHeap using loser tree
> -------------------------------------
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.98.0, 0.96.1, 0.94.15
>         Attachments: 9969-0.94.txt, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch,
hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt
> LoserTree is the better data structure than binary heap. It saves half of the comparisons
on each next(), though the time complexity is on O(logN).
> Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from
multiple HFiles in a single store, the other is merging results from multiple stores. This
patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter
over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its public interfaces.
The new code looks more cleaner and simpler to understand.

This message was sent by Atlassian JIRA

View raw message