hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree
Date Thu, 20 Mar 2014 23:05:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942476#comment-13942476

Lars Hofhansl commented on HBASE-9969:

I take back what I said about this above:
      KeyValueScanner topScanner = this.heap.peek();
      if (topScanner == null ||
          this.comparator.compare(kvNext, topScanner.peek()) >= 0) {
        this.current = pollRealKV();
Looking again, we already have the invariant that the <current> scanner always had a
real seek done it already. So if current is the last/only scanner we can avoid those checks.
Did some more tests (20m rows, 5 cols each, everything in the cache, fully compacted - i.e.
one HFile per store). With the check a scan that filters everything at the server through
all 100m KVs takes 15.1s without it takes 13.3s, so a 12% improvement.
Definitely a change we should make after all, I'll roll this into my experiments on HBASE-10625.

> Improve KeyValueHeap using loser tree
> -------------------------------------
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.99.0
>         Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, KeyValueHeapBenchmark_v2.ods,
hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, hbase-9969-v3.patch,
hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt
> LoserTree is the better data structure than binary heap. It saves half of the comparisons
on each next(), though the time complexity is on O(logN).
> Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from
multiple HFiles in a single store, the other is merging results from multiple stores. This
patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter
over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its public interfaces.
The new code looks more cleaner and simpler to understand.

This message was sent by Atlassian JIRA

View raw message