hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9440) Pass blocks of KVs from HFile scanner to the StoreFileScanner and up
Date Fri, 13 Sep 2013 22:00:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767030#comment-13767030 ]

Lars Hofhansl commented on HBASE-9440:
--------------------------------------

Some more numbers: 720k rows, 50 cols, 100 bytes each, 36M KVs. 4.9 GB HFile.
HFile directly... Disk: 28s, block cache: 1.2s
RS frontdoor (RowFilter, skips to next row after first column)... Disk: ~30s, block cache: 2.6s
RS frontdoor (ValueFilter)... Disk: ~30s, block cache: 6s

So HBase is doing something incredibly expensive for row assembly. Not sure I trust the numbers.
Will double check.
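
To put the block-cache timings above in per-KV terms, here is a rough back-of-the-envelope sketch. It uses only the figures quoted in this comment (36M KVs; 1.2s, 2.6s, and 6s); the class and method names are made up for illustration and are not part of HBase.

```java
// Hypothetical helper, not HBase code: converts the timings quoted above
// into an approximate cost per KeyValue.
class ScanCost {
    // 720k rows * 50 columns = 36,000,000 KVs, as stated in the comment.
    static final long KVS = 720_000L * 50;

    // Average nanoseconds spent per KV for a scan that took `seconds`.
    static double nanosPerKv(double seconds) {
        return seconds * 1e9 / KVS;
    }

    // How many times slower a scan is than a baseline scan.
    static double slowdown(double seconds, double baselineSeconds) {
        return seconds / baselineSeconds;
    }
}
```

Under these figures, the direct HFile scan from the block cache works out to roughly 33 ns per KV, the RowFilter scan to roughly 72 ns, and the ValueFilter scan to roughly 167 ns, i.e. about 5x the direct HFile scan.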
                
> Pass blocks of KVs from HFile scanner to the StoreFileScanner and up
> --------------------------------------------------------------------
>
>                 Key: HBASE-9440
>                 URL: https://issues.apache.org/jira/browse/HBASE-9440
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>
> Currently we read KVs from an HFileScanner one by one and pass them up the scanner/heap tree. Many times the ranges of KVs retrieved from StoreFileScanner (by StoreScanners) and HFileScanner (by StoreFileScanner) will be non-overlapping. If chunks of KVs do not overlap, we can sort entire chunks just by comparing the start/end keys of the chunks. Only if chunks overlap do we need to sort KV by KV as we do now.
> I have no patch, but I wanted to float this idea. 
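
The idea in the description can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not a patch and not HBase code: plain `String` keys stand in for KeyValues, natural string ordering stands in for the KV comparator, and `ChunkMerge`, `Chunk`, and `merge` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: merge two sorted chunks, comparing only boundary keys
// when the chunks do not overlap.
class ChunkMerge {
    // A non-empty, already sorted run of keys (String stands in for a KV).
    static final class Chunk {
        final List<String> keys;
        Chunk(List<String> keys) { this.keys = keys; }
        String first() { return keys.get(0); }
        String last() { return keys.get(keys.size() - 1); }
    }

    static List<String> merge(Chunk a, Chunk b) {
        List<String> out = new ArrayList<>(a.keys.size() + b.keys.size());
        // Fast path: all of a sorts before b, decided by one comparison.
        if (a.last().compareTo(b.first()) <= 0) {
            out.addAll(a.keys);
            out.addAll(b.keys);
            return out;
        }
        // Fast path: all of b sorts before a.
        if (b.last().compareTo(a.first()) < 0) {
            out.addAll(b.keys);
            out.addAll(a.keys);
            return out;
        }
        // Slow path: the ranges overlap, so merge key by key.
        int i = 0, j = 0;
        while (i < a.keys.size() && j < b.keys.size()) {
            if (a.keys.get(i).compareTo(b.keys.get(j)) <= 0) {
                out.add(a.keys.get(i++));
            } else {
                out.add(b.keys.get(j++));
            }
        }
        while (i < a.keys.size()) out.add(a.keys.get(i++));
        while (j < b.keys.size()) out.add(b.keys.get(j++));
        return out;
    }
}
```

With non-overlapping chunks, the number of key comparisons is one or two no matter how large the chunks are; the per-key comparisons happen only on the overlapping path.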

