hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8555) FilterList correctness was dominated by sub-filter(list) ordering randomly
Date Wed, 22 May 2013 05:39:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663797#comment-13663797 ]

Lars Hofhansl commented on HBASE-8555:
--------------------------------------

Sorry for chiming in late here. This is a problem with RowFilter, right?

There are only three filters that implement both filterRowKey and filterKeyValue:
# RowFilter: Does not reimplement the check in filterKeyValue
# RandomRowFilter: Has the same problem
# WhileMatchFilter: Implements proper checks in both filterRowKey and filterKeyValue

So only RowFilter and RandomRowFilter have this problem. It might be better to just fix
these two.
The fix would be to turn filterOutRow into a Boolean (with capital B, so that it can be
null) and, in filterKeyValue, redo the test on the row key of the KV passed in only if
filterOutRow is null, then set it accordingly.
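
A minimal sketch of that idea follows. It is an illustration only, not the actual
RowFilter code: LazyRowFilterSketch, its String row keys, the two-value ReturnCode enum
and the "keep rows greater than a threshold" test are all assumptions made up for this
example, chosen to mirror the "row filter > row4" case from the issue description.

```java
/**
 * Hypothetical stand-in for a row filter; NOT the HBase RowFilter class.
 * Shows a nullable Boolean used as "row decision not made yet".
 */
final class LazyRowFilterSketch {
    /** Hypothetical two-value stand-in for a filter return code. */
    enum ReturnCode { INCLUDE, NEXT_ROW }

    private final String threshold;
    // null means: filterRowKey has not been called for the current row.
    private Boolean filterOutRow = null;

    LazyRowFilterSketch(String threshold) {
        this.threshold = threshold;
    }

    /** Called between rows: forget the previous row's decision. */
    void reset() {
        filterOutRow = null;
    }

    /** Assumed row test: keep only rows strictly greater than the threshold. */
    private boolean failsRowTest(String rowKey) {
        return rowKey.compareTo(threshold) <= 0;
    }

    /** Returns true if the row should be filtered out. */
    boolean filterRowKey(String rowKey) {
        filterOutRow = failsRowTest(rowKey);
        return filterOutRow;
    }

    /** Takes the row key of the KV; redoes the row test only if undecided. */
    ReturnCode filterKeyValue(String rowKeyOfKv) {
        if (filterOutRow == null) {
            filterOutRow = failsRowTest(rowKeyOfKv);
        }
        return filterOutRow ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
    }

    public static void main(String[] args) {
        LazyRowFilterSketch f = new LazyRowFilterSketch("row4");
        // filterRowKey is deliberately skipped, as a short-circuiting caller might do.
        System.out.println(f.filterKeyValue("row3")); // NEXT_ROW
        f.reset();
        System.out.println(f.filterKeyValue("row5")); // INCLUDE
    }
}
```

With a primitive boolean, a skipped filterRowKey call would leave the default or a stale
value in place; the null state lets filterKeyValue notice that and redo the check itself.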

That said, I'm fine with the current fix too if you guys think this is a better fix. A gain
in performance does not trump correctness.

                
> FilterList correctness was dominated by sub-filter(list) ordering randomly
> --------------------------------------------------------------------------
>
>                 Key: HBASE-8555
>                 URL: https://issues.apache.org/jira/browse/HBASE-8555
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 0.94.3
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>            Priority: Critical
>         Attachments: 8555-trunk-v1.txt, HBASE-8555-0.94.txt, HBASE-8555-0.94-v2.txt, HBASE-8555-0.94-v3.txt
>
>
> Say there are 10 rows, where the column value of row i is i%2:
> row0 0
> row1 1
> row2 0
> row3 1
> row4 0
> row5 1
> row6 0
> row7 1
> row8 0
> row9 1
> 1: filter : row filter > row4   ===> row5 row6 row7 row8 row9
> 2: subFilterList:  row filter <= row4 && column==0    ===> row0 row2 row4
> 3.1 filterlist[expected]   filter || subFilterList  ===> row0 row2 row4 row5 row6 row7 row8 row9
> 3.2 filterlist[BUGON!]  subFilterList || filter ===> row0 row1 row2 row3 row4 row5 row6 row7 row8 row9
> (Please refer to the new testNestedFilterListWithSCVF case)
> I found this when I tried to translate the following SQL into an HBase scan:
> select xxx from xxx where (pk <= xxx and column1 = xxx) or pk > xxx
> My finding is that we assume a call order for the filter methods,
> e.g. filterRowKey() should be called before filterKeyValue(),
> and the original FilterList.filterRowKey implementation broke that assumption by short-circuiting and returning early.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
