hbase-dev mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Date Wed, 16 Oct 2013 04:50:46 GMT
Lars Hofhansl created HBASE-9778:
------------------------------------

             Summary: Avoid seeking to next column in ExplicitColumnTracker when possible
                 Key: HBASE-9778
                 URL: https://issues.apache.org/jira/browse/HBASE-9778
             Project: HBase
          Issue Type: Bug
            Reporter: Lars Hofhansl
            Assignee: Lars Hofhansl
             Fix For: 0.98.0, 0.94.13, 0.96.1


The issue of slow seeking in ExplicitColumnTracker was brought up by [~vrodionov] on the dev
list.
My idea here is to avoid the seek if we know that there aren't many KeyValues (versions of
the current column) to skip.
How do we know? We'll use the column family's VERSIONS setting as a hint. If VERSIONS is set
to 1 (or maybe some value < 10), we'll avoid the seek and call SKIP repeatedly instead.
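
The decision described above could be sketched roughly as follows. This is not the patch
itself: the class SkipOrSeekHint, the method afterColumnDone and the constant
MAX_VERSIONS_FOR_SKIP are hypothetical names, and the enum is a local stand-in for the
SKIP and seek-to-next-column outcomes. The cut-off of 1 is an assumption; the text only
says "1 (or maybe some value < 10)".

```java
// Hypothetical sketch of the heuristic; not the HBASE-9778 patch.
public class SkipOrSeekHint {
    /** Local stand-in for the two outcomes discussed above. */
    public enum Code { SKIP, SEEK_NEXT_COL }

    // Assumed cut-off. The description leaves the exact value open.
    static final int MAX_VERSIONS_FOR_SKIP = 1;

    /**
     * Decide how to move past the remaining versions of a column
     * once the tracker is done with it.
     *
     * @param familyMaxVersions the column family's VERSIONS setting
     */
    public static Code afterColumnDone(int familyMaxVersions) {
        // Few versions declared: stepping over them one at a time is
        // assumed to be cheaper than a full seek.
        if (familyMaxVersions <= MAX_VERSIONS_FOR_SKIP) {
            return Code.SKIP;
        }
        // Many versions possible: seek straight to the next column.
        return Code.SEEK_NEXT_COL;
    }

    public static void main(String[] args) {
        System.out.println("VERSIONS=1  -> " + afterColumnDone(1));
        System.out.println("VERSIONS=50 -> " + afterColumnDone(50));
    }
}
```

Keeping the cut-off in a single constant makes it easy to experiment with other values
below 10.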

HBASE-9769 has some initial numbers for this approach.
Interestingly, the result depends on which column(s) are selected.

Some numbers: 4M rows, 5 cols each, 1 cf, 10-byte values, VERSIONS=1, everything filtered
at the server with a ValueFilter. All times are in seconds.

Without patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.5|14.3|14.6|11.1|20.3|

With patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.4|8.9|9.9|6.4|10.0|

Variation here was +/- 0.2s.

So with this patch, scanning is up to 2x faster than without it in some cases, and never
slower. No special hint is needed beyond declaring VERSIONS correctly.
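
To make the per-case ratios explicit, the two tables above can be divided column by
column. The sketch below only restates the measured numbers. The class name ScanSpeedup
and the method speedup are made up for illustration.

```java
// Illustrative only: computes without/with ratios from the tables above.
public class ScanSpeedup {
    static final String[] CASES =
        {"Wildcard", "Col 1", "Col 2", "Col 4", "Col 5", "Col 2+4"};
    // Scan times in seconds, copied from the two tables.
    static final double[] WITHOUT_PATCH = {6.4, 8.5, 14.3, 14.6, 11.1, 20.3};
    static final double[] WITH_PATCH    = {6.4, 8.4, 8.9, 9.9, 6.4, 10.0};

    /** Ratio of unpatched to patched scan time for case i. */
    static double speedup(int i) {
        return WITHOUT_PATCH[i] / WITH_PATCH[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < CASES.length; i++) {
            System.out.printf("%-8s %.2fx%n", CASES[i], speedup(i));
        }
    }
}
```

The largest ratio comes from the Col 2+4 case (20.3s vs 10.0s), and the Wildcard case is
unchanged. Differences near the 0.2s measurement variation, such as Col 1, should not be
read as real gains.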




