hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree
Date Fri, 15 Nov 2013 03:55:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823256#comment-13823256
] 

Chao Shi commented on HBASE-9969:
---------------------------------

To [~tedyu@apache.org]:
bq. In what condition would numOpenStreams become negative ?
There must be some bug if this happens, such as multi-threading access.

bq. Should the return value from loserTree.isEmpty() be negated ?
Yes, you are correct. It seems this return value is ignored by RegionScannerImpl, so the tests
do not catch this.

bq. For benchmark B, can you try ColumnPaginationFilter with smaller offset ?
This optimization is only effective when it is skipping a large number of KVs. So I think
such filter is required.

bq. Giving percentage improvement in the tables is desirable.
Yes, I will update the table once these issues you guys mentioned are fixed.

To [~vrodionov]:
bq. I just ran StoreScanner with 8 store files and the same test after compaction. All data
is in block cache in both runs. The results I can not explain. Scanner after compaction is
slower: 3.7 sec vs 3.5 sec. The effect of KeyValueHeap sub-par implementation is probably
negligible.

I guess other things are the dominant overhead in your case, such as RPC? This patch optimize
the case scanning with filter that rarely accepts a KV. Perhaps [~lhofhansl]'s case on phoenix
is a good one.

To [~lhofhansl]:
bq. Chao Shi, are these numbers on top of HBASE-9915? Or are they on top a release version
of HBase?
These numbers are from a one-month-ago trunk. It does not have HBASE-9915. The benchmark I
ran does not use encoded blocks, so HBASE-9915 should have no effects on this.

bq. Did a quick port to 0.94, tested with Phoenix and a fully flushed and compacted table
(i.e. the worst case for this patch). Still found a few percent performance improvement -
no slowdown. Will test with a some more HFiles next.
Glad to hear that. I'm looking forward to your numbers.

To [~mcorgan]:
bq. With 8 storefiles, the normal case would go from ~3 comparisons down to 1. I think the
behavior could be reversed by negating the comparator.

I don't quite understand this. The one I understood to improve is to compare against the previous
second top value (i.e. tree\[1\]) before going through leaf to root comparisions. But this
will add one more comparison on the worst case. 

> Improve KeyValueHeap using loser tree
> -------------------------------------
>
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: hbase-9969-v2.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png,
kvheap-benchmark.txt
>
>
> LoserTree is the better data structure than binary heap. It saves half of the comparisons
on each next(), though the time complexity is on O(logN).
> Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from
multiple HFiles in a single store, the other is merging results from multiple stores. This
patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter
over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its public interfaces.
The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message