hbase-issues mailing list archives

From "Vladimir Rodionov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree
Date Fri, 15 Nov 2013 04:39:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823272#comment-13823272 ]

Vladimir Rodionov commented on HBASE-9969:

I just ran StoreScanner with 8 store files and repeated the same test after compaction. All data
was in the block cache in both runs. I cannot explain the results: the scanner is slower after
compaction, 3.7 sec vs 3.5 sec. The effect of the sub-par KeyValueHeap implementation is probably negligible.


Sorry, that was the wrong test. Actual results: 1.9 sec before compaction and 1.5 sec after,
a ~20% improvement ((1.9 - 1.5) / 1.9 is about 21%). This is full scan time over 2M rows
(400M cache size), with all data cached in the block cache.
The potential win for the loser tree is ~20% currently, but if HBASE-9778 is resolved and
ScanQueryMatcher is optimized ...

> Improve KeyValueHeap using loser tree
> -------------------------------------
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.98.0, 0.96.1
>         Attachments: hbase-9969-v2.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png,
> A loser tree is a better data structure than a binary heap. It saves half of the comparisons
> on each next(), though the time complexity is still O(logN).
> Currently a scan or get goes through two KeyValueHeaps: one merges KVs read from
> multiple HFiles in a single store, the other merges results from multiple stores. This
> patch should improve both cases whenever CPU is the bottleneck (e.g. scan with filter
> over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its public interfaces.
> The new code looks cleaner and is simpler to understand.
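
For readers unfamiliar with the structure, here is a minimal sketch of a loser-tree k-way merge over sorted integer lists. It is not the code from the attached patch and does not use any HBase classes; the class name LoserTreeSketch and the methods mergeAll, next, replay and before are hypothetical names chosen for illustration. The point it tries to show is the claim in the description: after the winner advances, replaying its path to the root costs one comparison per level, whereas a binary heap sift-down typically compares against two children per level.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative only: merges sorted Integer lists, not HBase KeyValues.
class LoserTreeSketch {
    private static final int SENTINEL = -1; // placeholder that wins every match during build

    private final int k;                          // number of sorted inputs
    private final int[] tree;                     // tree[0] = overall winner, tree[1..k-1] = losers
    private final List<Iterator<Integer>> sources = new ArrayList<>();
    private final Integer[] heads;                // current element of each input, null = exhausted

    LoserTreeSketch(List<List<Integer>> sortedRuns) {
        k = sortedRuns.size();
        tree = new int[Math.max(k, 1)];
        Arrays.fill(tree, SENTINEL);
        heads = new Integer[k];
        for (int i = 0; i < k; i++) {
            Iterator<Integer> it = sortedRuns.get(i).iterator();
            sources.add(it);
            heads[i] = it.hasNext() ? it.next() : null;
        }
        // Build the tree by replaying every leaf once against the sentinels.
        for (int i = k - 1; i >= 0; i--) {
            replay(i);
        }
    }

    // True if input a should be emitted before input b.
    private boolean before(int a, int b) {
        if (a == SENTINEL) return true;
        if (b == SENTINEL) return false;
        if (heads[a] == null) return false;   // exhausted inputs always lose
        if (heads[b] == null) return true;
        int cmp = heads[a].compareTo(heads[b]);
        return cmp != 0 ? cmp < 0 : a < b;    // break ties by input index
    }

    // Walk from the leaf to the root: one comparison per level.
    private void replay(int leaf) {
        int s = leaf;
        for (int t = (leaf + k) / 2; t > 0; t /= 2) {
            if (before(tree[t], s)) {         // stored loser beats the climber: swap roles
                int tmp = s;
                s = tree[t];
                tree[t] = tmp;
            }
        }
        tree[0] = s;
    }

    // Returns the next smallest element, or null when all inputs are exhausted.
    Integer next() {
        int w = tree[0];
        if (w == SENTINEL || heads[w] == null) return null;
        Integer out = heads[w];
        Iterator<Integer> it = sources.get(w);
        heads[w] = it.hasNext() ? it.next() : null;
        replay(w);                            // only the winner's path changes
        return out;
    }

    static List<Integer> mergeAll(List<List<Integer>> sortedRuns) {
        LoserTreeSketch tree = new LoserTreeSketch(sortedRuns);
        List<Integer> out = new ArrayList<>();
        for (Integer v = tree.next(); v != null; v = tree.next()) {
            out.add(v);
        }
        return out;
    }
}
```

Calling LoserTreeSketch.mergeAll with several sorted lists returns a single sorted list. In the real KeyValueHeap the inputs would be scanners and the comparison would use the KeyValue comparator, so the saving per next() depends on how expensive that comparison is.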

