hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Date Mon, 28 Oct 2013 03:35:31 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806534#comment-13806534 ]

Lars Hofhansl commented on HBASE-9778:
--------------------------------------

This needs a bigger discussion. The optimizations put in with HBASE-4433 are counterproductive
for many use cases.
The patch there avoids an additional call to next, but does so at the expense of an extra
seek (if there aren't many versions). That pays off in the scenario described in HBASE-4433
(large KVs, where an extra next will likely lead to loading another block), but with small KVs
and few versions, the extra seek is far more expensive than the risk of loading another block.
(In fact, that is exactly the part of the change that Ted requested an extra comment on.)
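
A toy cost model (not HBase code) makes the trade-off concrete: reaching the next requested column costs one next() per remaining version of the current column, versus a single seek. The class SkipVsSeek and its parameters nextCost and seekCost are hypothetical, unit-less illustrations, not measurements.

```java
/**
 * Toy cost model (hypothetical, not HBase code): step over the remaining
 * versions of a column with repeated next() calls, or jump past them with
 * one seek. Costs are unit-less parameters chosen by the caller.
 */
public final class SkipVsSeek {

    /** Cost of stepping over the remaining versions one cell at a time. */
    static double skipCost(int remainingVersions, double nextCost) {
        return remainingVersions * nextCost;
    }

    /** True if repeated next() calls are cheaper than a single seek. */
    static boolean preferSkip(int remainingVersions, double nextCost, double seekCost) {
        return skipCost(remainingVersions, nextCost) < seekCost;
    }

    public static void main(String[] args) {
        // Hypothetical relative costs: assume one seek costs as much as 10 next() calls.
        double nextCost = 1.0;
        double seekCost = 10.0;
        for (int versions : new int[] {0, 1, 3, 50}) {
            System.out.printf("remaining versions=%d -> %s%n", versions,
                preferSkip(versions, nextCost, seekCost) ? "SKIP (next)" : "SEEK");
        }
    }
}
```

Raising nextCost models the large-KV case from HBASE-4433, where a single extra next may load another block; the break-even point then moves toward seeking.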

And, BTW, ScanWildcardQueryMatcher does not have the optimization from HBASE-4433, so this
is quite a mess.


> Avoid seeking to next column in ExplicitColumnTracker when possible
> -------------------------------------------------------------------
>
>                 Key: HBASE-9778
>                 URL: https://issues.apache.org/jira/browse/HBASE-9778
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.98.0, 0.96.1, 0.94.14
>
>         Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt
>
>
> The issue of slow seeking in ExplicitColumnTracker was brought up by [~vrodionov] on the dev list.
> My idea here is to avoid the seeking if we know that there aren't many versions to skip.
> How do we know? We'll use the column family's VERSIONS setting as a hint. If VERSIONS is set to 1 (or maybe some value < 10), we'll avoid the seek and call SKIP repeatedly.
> HBASE-9769 has some initial numbers for this approach.
> Interestingly, the result depends on which column(s) are selected.
> Some numbers: 4m rows, 5 cols each, 1 cf, 10-byte values, VERSIONS=1, everything filtered at the server with a ValueFilter. All times are in seconds.
> Without patch:
> ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
> |6.4|8.5|14.3|14.6|11.1|20.3|
> With patch:
> ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
> |6.4|8.4|8.9|9.9|6.4|10.0|
> Variation here was ±0.2s.
> So with this patch, scanning is up to 2x faster in some cases, and never slower. No special hint is needed beyond declaring VERSIONS correctly.
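
A minimal sketch of the proposed heuristic, assuming a simplified tracker: when a cell belongs to a column that is not wanted (or whose versions are already exhausted), return SKIP if the family's VERSIONS setting is small, otherwise SEEK_NEXT_COL. The enum only mirrors a subset of the names in HBase's ScanQueryMatcher.MatchCode; the class SimpleColumnTracker, its methods, and the threshold of 10 (taken from the "some value < 10" floated above) are hypothetical. The real ExplicitColumnTracker walks a sorted list of requested columns, and the attached patches may implement this differently.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified tracker (not HBase's ExplicitColumnTracker):
 * uses the family's VERSIONS setting as a hint for SKIP vs. SEEK_NEXT_COL.
 */
public final class SimpleColumnTracker {

    /** Mirrors a subset of the names in HBase's ScanQueryMatcher.MatchCode. */
    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_COL }

    /** Hypothetical cutoff, based on the "some value < 10" in the issue text. */
    static final int SKIP_VERSIONS_THRESHOLD = 10;

    private final Set<String> requestedColumns;
    private final int maxVersions; // the column family's VERSIONS setting

    SimpleColumnTracker(Set<String> requestedColumns, int maxVersions) {
        this.requestedColumns = requestedColumns;
        this.maxVersions = maxVersions;
    }

    /** Decision for a cell we will not return. */
    MatchCode unwantedCell() {
        // Few stored versions means few cells before the next column,
        // so stepping over them is assumed cheaper than a seek.
        return maxVersions < SKIP_VERSIONS_THRESHOLD ? MatchCode.SKIP : MatchCode.SEEK_NEXT_COL;
    }

    /** Include the cell if its column is requested and versions remain. */
    MatchCode check(String column, int versionsSeenForColumn) {
        if (requestedColumns.contains(column) && versionsSeenForColumn < maxVersions) {
            return MatchCode.INCLUDE;
        }
        return unwantedCell();
    }

    public static void main(String[] args) {
        SimpleColumnTracker v1 = new SimpleColumnTracker(Set.of("c2", "c4"), 1);
        SimpleColumnTracker v100 = new SimpleColumnTracker(Set.of("c2", "c4"), 100);
        for (String col : List.of("c1", "c2", "c3")) {
            System.out.println(col + ": VERSIONS=1 -> " + v1.check(col, 0)
                + ", VERSIONS=100 -> " + v100.check(col, 0));
        }
    }
}
```

With VERSIONS=1, unwanted columns are stepped over with SKIP; with VERSIONS=100 the sketch falls back to seeking, since many versions may sit between the current cell and the next column.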
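
The "2x faster, never slower" summary can be checked against the two tables: dividing each time without the patch by the matching time with it gives the per-query speedup. The arrays below copy the numbers quoted above; the class name is only illustrative.

```java
import java.util.Locale;

/** Per-query speedups computed from the two benchmark tables (seconds). */
public final class SpeedupFromTables {
    static final String[] QUERIES = {"Wildcard", "Col 1", "Col 2", "Col 4", "Col 5", "Col 2+4"};
    static final double[] WITHOUT_PATCH = {6.4, 8.5, 14.3, 14.6, 11.1, 20.3};
    static final double[] WITH_PATCH = {6.4, 8.4, 8.9, 9.9, 6.4, 10.0};

    /** Speedup = time without patch / time with patch. */
    static double speedup(int i) {
        return WITHOUT_PATCH[i] / WITH_PATCH[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < QUERIES.length; i++) {
            System.out.printf(Locale.ROOT, "%-8s %.2fx%n", QUERIES[i], speedup(i));
        }
    }
}
```

This yields about 2.03x for Col 2+4, 1.73x for Col 5, 1.61x for Col 2 and 1.47x for Col 4; Wildcard is unchanged and Col 1 differs by 0.1s, inside the stated ±0.2s noise, so no query got slower.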



--
This message was sent by Atlassian JIRA
(v6.1#6144)
