hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree
Date Fri, 15 Nov 2013 03:55:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823256#comment-13823256

Chao Shi commented on HBASE-9969:

To [~tedyu@apache.org]:
bq. In what condition would numOpenStreams become negative ?
There must be some bug if this happens, such as multi-threading access.

bq. Should the return value from loserTree.isEmpty() be negated ?
Yes, you are correct. It seems this return value is ignored by RegionScannerImpl, so the tests
do not catch this.

bq. For benchmark B, can you try ColumnPaginationFilter with smaller offset ?
This optimization is only effective when it is skipping a large number of KVs. So I think
such filter is required.

bq. Giving percentage improvement in the tables is desirable.
Yes, I will update the table once these issues you guys mentioned are fixed.

To [~vrodionov]:
bq. I just ran StoreScanner with 8 store files and the same test after compaction. All data
is in block cache in both runs. The results I can not explain. Scanner after compaction is
slower: 3.7 sec vs 3.5 sec. The effect of KeyValueHeap sub-par implementation is probably

I guess other things are the dominant overhead in your case, such as RPC? This patch optimize
the case scanning with filter that rarely accepts a KV. Perhaps [~lhofhansl]'s case on phoenix
is a good one.

To [~lhofhansl]:
bq. Chao Shi, are these numbers on top of HBASE-9915? Or are they on top a release version
of HBase?
These numbers are from a one-month-ago trunk. It does not have HBASE-9915. The benchmark I
ran does not use encoded blocks, so HBASE-9915 should have no effects on this.

bq. Did a quick port to 0.94, tested with Phoenix and a fully flushed and compacted table
(i.e. the worst case for this patch). Still found a few percent performance improvement -
no slowdown. Will test with a some more HFiles next.
Glad to hear that. I'm looking forward to your numbers.

To [~mcorgan]:
bq. With 8 storefiles, the normal case would go from ~3 comparisons down to 1. I think the
behavior could be reversed by negating the comparator.

I don't quite understand this. The one I understood to improve is to compare against the previous
second top value (i.e. tree\[1\]) before going through leaf to root comparisions. But this
will add one more comparison on the worst case. 

> Improve KeyValueHeap using loser tree
> -------------------------------------
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.98.0, 0.96.1
>         Attachments: hbase-9969-v2.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png,
> LoserTree is the better data structure than binary heap. It saves half of the comparisons
on each next(), though the time complexity is on O(logN).
> Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from
multiple HFiles in a single store, the other is merging results from multiple stores. This
patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter
over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its public interfaces.
The new code looks more cleaner and simpler to understand.

This message was sent by Atlassian JIRA

View raw message