hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Date Fri, 07 Mar 2014 06:59:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923637#comment-13923637 ]

ramkrishna.s.vasudevan commented on HBASE-9778:
-----------------------------------------------

Went through the patch; it looks good.  If we have 100 columns and the scan adds only
col98, I think there is no benefit in using this eager next.  The same holds when there are
2 columns, col1 has 99 versions and col2 has one version, and only col2 was added with
scan.addCol, right?
This is mostly useful when the number of versions is 1 or very small for the column added
in the scan.  So should we add a comment before this eager_next saying that it can be used
when the column before the one added with scan.addCol has very few versions?
I don't have a better name for 'eager next'.
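
To make the two cases in the comment above concrete, here is a minimal sketch that counts how
many cells a scanner would step over with repeated SKIPs before reaching the requested column.
It is a stand-alone cost model, not code from the patch: the class and method names are
hypothetical, and it assumes every listed version is physically stored in the row.

```java
import java.util.Arrays;

/** Hypothetical cost model; not code from the HBASE-9778 patch. */
class SkipCostSketch {

    /**
     * Counts the cells that must be stepped over, one SKIP at a time, before the
     * first cell of the requested column is reached within a single row.
     *
     * @param versionsPerColumn stored versions of each column, in sort order (assumption)
     * @param targetIndex zero-based index of the column added to the scan
     */
    static int cellsBeforeTarget(int[] versionsPerColumn, int targetIndex) {
        if (targetIndex < 0 || targetIndex >= versionsPerColumn.length) {
            throw new IllegalArgumentException("targetIndex out of range: " + targetIndex);
        }
        int cells = 0;
        for (int i = 0; i < targetIndex; i++) {
            cells += versionsPerColumn[i];
        }
        return cells;
    }

    public static void main(String[] args) {
        // Case 1: 100 columns with one version each, scan asks for col98 (index 97).
        int[] hundredColumns = new int[100];
        Arrays.fill(hundredColumns, 1);
        System.out.println(cellsBeforeTarget(hundredColumns, 97)); // prints 97

        // Case 2: col1 has 99 versions, col2 has one, scan asks for col2 (index 1).
        System.out.println(cellsBeforeTarget(new int[] {99, 1}, 1)); // prints 99
    }
}
```

In both cases the count is close to 100 cells per row, which is the situation where a single
seek is likely to be cheaper than skipping.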


> Avoid seeking to next column in ExplicitColumnTracker when possible
> -------------------------------------------------------------------
>
>                 Key: HBASE-9778
>                 URL: https://issues.apache.org/jira/browse/HBASE-9778
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94-v5.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk.txt
>
>
> The issue of slow seeking in ExplicitColumnTracker was brought up by [~vrodionov] on the dev list.
> My idea here is to avoid the seeking if we know that there aren't many versions to skip.
> How do we know? We'll use the column family's VERSIONS setting as a hint. If VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call SKIP repeatedly.
> HBASE-9769 has some initial numbers for this approach.
> Interestingly, the result depends on which column(s) is (are) selected.
> Some numbers: 4m rows, 5 cols each, 1 cf, 10-byte values, VERSIONS=1, everything filtered at the server with a ValueFilter. Everything measured in seconds.
> Without patch:
> ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
> |6.4|8.5|14.3|14.6|11.1|20.3|
> With patch:
> ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
> |6.4|8.4|8.9|9.9|6.4|10.0|
> Variation here was +- 0.2s.
> So with this patch scanning is up to 2x faster than without in some cases (Col 2+4: 20.3s vs. 10.0s), and never slower. No special hint is needed beyond declaring VERSIONS correctly.
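
The decision described in the quoted issue can be sketched as below. This is a stand-alone
model under stated assumptions, not the patch itself: the class, enum, and method names are
hypothetical, and the cutoff of 10 is taken only from the "maybe some value < 10" remark.

```java
/** Hypothetical model of the heuristic; names are illustrative, not HBase API. */
class EagerNextSketch {

    /** What the scanner does when the current cell is not in a requested column. */
    enum Action { SKIP, SEEK_NEXT_COL }

    /** Assumed cutoff, based on the "maybe some value < 10" remark in the issue. */
    static final int SKIP_THRESHOLD = 10;

    /**
     * Uses the column family's VERSIONS setting as a hint: with few versions per
     * column, stepping cell by cell is assumed cheaper than a seek.
     *
     * @param maxVersions the VERSIONS setting of the column family
     */
    static Action chooseAction(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1: " + maxVersions);
        }
        return maxVersions < SKIP_THRESHOLD ? Action.SKIP : Action.SEEK_NEXT_COL;
    }

    public static void main(String[] args) {
        System.out.println(chooseAction(1));   // prints SKIP
        System.out.println(chooseAction(100)); // prints SEEK_NEXT_COL
    }
}
```

The hint only reflects the declared VERSIONS value, so it says nothing about how many
unselected columns sit in front of the requested one.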



