hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10102) CF.VERSIONS is not enforced with timerange scans
Date Sat, 07 Dec 2013 02:29:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842019#comment-13842019 ]

Lars Hofhansl commented on HBASE-10102:
---------------------------------------

Currently the workflow in ScanQueryMatcher is something like this:

# <versions> = min(<CF versions>, <scan versions>)
# filter by timerange
# filter out columns (i.e. columns not specified in the scan)
# apply custom filters
# filter by <versions>

Every KV is passed through this filtering process.
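
To make the effect of this ordering concrete, here is a minimal Python sketch of the five steps above. It is a toy model, not ScanQueryMatcher itself: the names KV and scan_current are hypothetical, the timerange is treated as [min, max), and it assumes all three versions are still physically stored (e.g. not yet removed by a compaction). Delete markers and TTL are not modeled.

```python
from collections import namedtuple

# Hypothetical stand-in for an HBase KeyValue; not the real class.
KV = namedtuple("KV", "row column ts value")

def scan_current(kvs, cf_versions, scan_versions, timerange,
                 columns=None, custom_filter=None):
    """Toy model of the current order: a single version limit, applied last."""
    versions = min(cf_versions, scan_versions)            # step 1
    lo, hi = timerange                                     # treated as [lo, hi)
    emitted = {}                                           # count per (row, column)
    out = []
    # Within a column, KVs are visited newest first.
    for kv in sorted(kvs, key=lambda k: (k.row, k.column, -k.ts)):
        if not (lo <= kv.ts < hi):                         # step 2: timerange
            continue
        if columns is not None and kv.column not in columns:      # step 3: columns
            continue
        if custom_filter is not None and not custom_filter(kv):   # step 4: custom filters
            continue
        key = (kv.row, kv.column)
        if emitted.get(key, 0) >= versions:                # step 5: version limit
            continue
        emitted[key] = emitted.get(key, 0) + 1
        out.append(kv)
    return out

# Data from the issue description: VERSIONS => 1, three puts to the same cell.
kvs = [KV("r1", "c1:", 1000, "One"),
       KV("r1", "c1:", 2000, "Two"),
       KV("r1", "c1:", 3000, "Three")]

print(scan_current(kvs, 1, 1, (0, float("inf"))))  # only the newest version
print(scan_current(kvs, 1, 1, (0, 1500)))          # an older version leaks through
```

Because the version count only starts after the timerange filter, the KV at timestamp 1000 is the first one counted for TIMERANGE [0,1500], so it is returned even though the column family keeps a single version.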

What we should do is this:

# filter by <CF versions>
# filter by timerange
# filter out columns (i.e. columns not specified in the scan)
# apply custom filters
# filter by <scan versions>

I have a POC patch that does this. It does not slow scanning in a measurable way.
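
The reordered steps can be sketched the same way. This is a toy Python model under stated assumptions, not the POC patch: KV and scan_proposed are hypothetical names, the timerange is treated as [min, max), and delete markers and TTL are ignored. The point is only that the CF limit is counted over every stored KV before any other filter runs, while the scan limit is counted over what survives.

```python
from collections import namedtuple

# Hypothetical stand-in for an HBase KeyValue; not the real class.
KV = namedtuple("KV", "row column ts value")

def scan_proposed(kvs, cf_versions, scan_versions, timerange,
                  columns=None, custom_filter=None):
    """Toy model of the proposed order: CF limit first, scan limit last."""
    lo, hi = timerange                                     # treated as [lo, hi)
    cf_seen = {}                                           # stored versions seen per column
    emitted = {}                                           # versions returned per column
    out = []
    # Within a column, KVs are visited newest first.
    for kv in sorted(kvs, key=lambda k: (k.row, k.column, -k.ts)):
        key = (kv.row, kv.column)
        cf_seen[key] = cf_seen.get(key, 0) + 1
        if cf_seen[key] > cf_versions:                     # step 1: CF versions
            continue
        if not (lo <= kv.ts < hi):                         # step 2: timerange
            continue
        if columns is not None and kv.column not in columns:      # step 3: columns
            continue
        if custom_filter is not None and not custom_filter(kv):   # step 4: custom filters
            continue
        if emitted.get(key, 0) >= scan_versions:           # step 5: scan versions
            continue
        emitted[key] = emitted.get(key, 0) + 1
        out.append(kv)
    return out

# Data from the issue description: VERSIONS => 1, three puts to the same cell.
kvs = [KV("r1", "c1:", 1000, "One"),
       KV("r1", "c1:", 2000, "Two"),
       KV("r1", "c1:", 3000, "Three")]

print(scan_proposed(kvs, 1, 1, (0, float("inf"))))  # newest version only
print(scan_proposed(kvs, 1, 1, (0, 1500)))          # nothing: older versions are hidden
```

In this model the extra cost per KV is one counter update, which is at least consistent with the observation that the change does not slow scanning measurably.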

> CF.VERSIONS is not enforced with timerange scans
> ------------------------------------------------
>
>                 Key: HBASE-10102
>                 URL: https://issues.apache.org/jira/browse/HBASE-10102
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>
> Example brought up by Niels Basjes on the user list:
> If I do the following commands into the hbase shell
> {code}
>     create 't1', {NAME => 'c1', VERSIONS => 1}
>     put 't1', 'r1', 'c1', 'One', 1000
>     put 't1', 'r1', 'c1', 'Two', 2000
>     put 't1', 'r1', 'c1', 'Three', 3000
>     get 't1', 'r1'
>     get 't1', 'r1' , {TIMERANGE => [0,1500]}
> {code}
> the result is this:
> {code}
>     get 't1', 'r1'
>     COLUMN                     CELL
>      c1:                       timestamp=3000, value=Three
>     1 row(s) in 0.0780 seconds
>     get 't1', 'r1' , {TIMERANGE => [0,1500]}
>     COLUMN                     CELL
>      c1:                       timestamp=1000, value=One
>     1 row(s) in 0.1390 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
