hbase-issues mailing list archives

From "Viral Bajaria (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
Date Tue, 30 Jul 2013 02:31:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723305#comment-13723305 ]

Viral Bajaria commented on HBASE-9079:
--------------------------------------

I will upload a new patch with the fixes that Ted pointed out.

[~tedyu@apache.org] When you say trunk patch, do you mean against the 0.95/0.96 tree?

Regarding Lars's comment on turning it around to ==, I could move the check to the following,
prior to even running the for loop:
{code}
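// seekHintFilter: the filter that last returned SEEK_NEXT_USING_HINT, or null if none did
if (seekHintFilter != null) {
  return seekHintFilter.getNextKeyHint(currentKV);
}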
{code}
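
The early return above relies on FilterList recording which filter returned SEEK_NEXT_USING_HINT. As a rough sketch of that idea (not the actual patch, and not real HBase classes), the model below uses plain "row/column" strings as keys. Every type and method name in it is a hypothetical stand-in. It compares the "keep the max hint" behaviour with the "ask only the filter that requested the seek" behaviour under a MUST_PASS_ALL-style list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model, NOT HBase code: keys are "row/column" strings compared lexicographically. */
final class SeekHintSketch {

    enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT }

    /** Hypothetical stand-in for the two Filter methods discussed in this thread. */
    interface SimpleFilter {
        ReturnCode filterKeyValue(String key);
        String getNextKeyHint(String key);
    }

    /** Row-level stand-in: accepts every key and always hints at the next matching row. */
    static final class RowHintFilter implements SimpleFilter {
        @Override public ReturnCode filterKeyValue(String key) { return ReturnCode.INCLUDE; }
        @Override public String getNextKeyHint(String key) { return "r5/"; }
    }

    /** Column-level stand-in: asks to seek forward to column c3 within the current row. */
    static final class ColumnHintFilter implements SimpleFilter {
        @Override public ReturnCode filterKeyValue(String key) {
            String column = key.substring(key.indexOf('/') + 1);
            return column.compareTo("c3") < 0
                ? ReturnCode.SEEK_NEXT_USING_HINT : ReturnCode.INCLUDE;
        }
        @Override public String getNextKeyHint(String key) {
            return key.substring(0, key.indexOf('/') + 1) + "c3";
        }
    }

    /** MUST_PASS_ALL-style list that remembers which filter asked for the seek. */
    static final class HintTrackingList {
        private final List<SimpleFilter> filters = new ArrayList<>();
        private SimpleFilter seekHintFilter; // null when no filter has asked for a seek

        HintTrackingList(SimpleFilter... filters) {
            this.filters.addAll(Arrays.asList(filters));
        }

        ReturnCode filterKeyValue(String key) {
            seekHintFilter = null;
            for (SimpleFilter f : filters) {
                if (f.filterKeyValue(key) == ReturnCode.SEEK_NEXT_USING_HINT) {
                    seekHintFilter = f; // remember who requested the seek
                    return ReturnCode.SEEK_NEXT_USING_HINT;
                }
            }
            return ReturnCode.INCLUDE;
        }

        /** Behaviour described in the issue: ask every filter and keep the largest hint. */
        String maxHint(String key) {
            String max = null;
            for (SimpleFilter f : filters) {
                String hint = f.getNextKeyHint(key);
                if (hint != null && (max == null || hint.compareTo(max) > 0)) {
                    max = hint;
                }
            }
            return max;
        }

        /** Proposed behaviour: ask only the remembered filter, then reset the state. */
        String trackedHint(String key) {
            if (seekHintFilter == null) {
                return null;
            }
            SimpleFilter f = seekHintFilter;
            seekHintFilter = null;
            return f.getNextKeyHint(key);
        }
    }

    public static void main(String[] args) {
        HintTrackingList list =
            new HintTrackingList(new RowHintFilter(), new ColumnHintFilter());
        list.filterKeyValue("r1/c1");
        System.out.println(list.maxHint("r1/c1"));     // r5/   (jumps past r1/c3)
        System.out.println(list.trackedHint("r1/c1")); // r1/c3 (stays in row r1)
    }
}
```

In this toy setup the max-hint approach seeks straight to row r5 and skips r1/c3, which is the kind of lost data the issue describes, while the tracked filter keeps the scan inside row r1.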

Regarding the ordering, I think the issue arises when the operator is MUST_PASS_ONE and both
filters want to return SEEK_NEXT_USING_HINT, but one of them operates at the row level while
the other operates at the column level. For example, if ColumnRangeFilter comes before
FuzzyRowFilter and the operator is MUST_PASS_ONE, we will iterate through both filters'
filterKeyValue methods and keep the state returned from FuzzyRowFilter, not the one from
ColumnRangeFilter. I think this issue exists in the current code too, since we go through
each filter and keep the max row.
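
To make the ordering concern concrete, here is a small sketch under heavy simplification: hints are plain "row/column" strings and the method names are made up for illustration. It shows that remembering the last filter that asked for a seek gives a different hint depending on filter order. Taking the smallest hint is shown only as one order-independent possibility. That choice is an assumption for discussion, not something the patch does.

```java
/** Simplified model, NOT HBase code: each argument is the seek hint one filter would give. */
final class MustPassOneHintSketch {

    /** Keeps the hint of the last filter that asked for a seek, so filter order matters. */
    static String lastHintWins(String... hintsInFilterOrder) {
        String chosen = null;
        for (String hint : hintsInFilterOrder) {
            chosen = hint; // each later filter overwrites the earlier state
        }
        return chosen;
    }

    /** Assumed alternative: seek to the smallest hint so no filter's next match is skipped. */
    static String smallestHint(String... hintsInFilterOrder) {
        String min = null;
        for (String hint : hintsInFilterOrder) {
            if (min == null || hint.compareTo(min) < 0) {
                min = hint;
            }
        }
        return min;
    }

    public static void main(String[] args) {
        String columnHint = "r1/c3"; // column-level hint (ColumnRangeFilter-like)
        String rowHint = "r5/";      // row-level hint (FuzzyRowFilter-like)

        System.out.println(lastHintWins(columnHint, rowHint)); // r5/
        System.out.println(lastHintWins(rowHint, columnHint)); // r1/c3
        System.out.println(smallestHint(columnHint, rowHint)); // r1/c3
        System.out.println(smallestHint(rowHint, columnHint)); // r1/c3
    }
}
```

Whether the smallest hint is the right semantics for MUST_PASS_ONE in FilterList is exactly the open question here.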

Personally, I feel it's not a good use case to build a FilterList with one filter operating
at the row level and another at the column level and then set the operator to MUST_PASS_ONE.
That's almost like saying: keep a column even if the row does not match. Any suggestions on
what should be done here?
                
> FilterList getNextKeyHint skips rows that should be included in the results
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-9079
>                 URL: https://issues.apache.org/jira/browse/HBASE-9079
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 0.94.10
>            Reporter: Viral Bajaria
>         Attachments: TestFail.patch, TestSuccess.patch
>
>
> I hit a weird issue/bug and am able to reproduce the error consistently. The problem
> arises when a FilterList has two filters that each implement the getNextKeyHint method.
> The way the current implementation works is that StoreScanner calls matcher.getNextKeyHint()
> whenever it gets a SEEK_NEXT_USING_HINT. This in turn calls filter.getNextKeyHint(), where
> the filter at this stage is of type FilterList. The implementation in FilterList iterates
> through all the filters and keeps the max KeyValue that it sees. All is fine if you wrap
> filters in a FilterList in which only one of them implements getNextKeyHint, but if several
> of them implement it, that's where things get weird.
> For example:
> - Create two filters: one is a FuzzyRowFilter and the second is a ColumnRangeFilter. Both of
>   them implement getNextKeyHint.
> - Wrap them in a FilterList with MUST_PASS_ALL.
> - FuzzyRowFilter will seek to the correct first row and then pass it to ColumnRangeFilter,
>   which will return the SEEK_NEXT_USING_HINT code.
> - Now, when getNextKeyHint is called on the FilterList, it calls the one on FuzzyRowFilter
>   first, which basically says what the next row should be, while in reality we want the
>   ColumnRangeFilter to give the seek hint.
> - The above behavior skips data that should be returned, which I have verified by using
>   a RowFilter with RegexStringComparator.
> I updated the FilterList to maintain state on which filter returns the SEEK_NEXT_USING_HINT,
> and in getNextKeyHint I invoke the method on the saved filter and reset that state. I tested
> it with my current queries and it works fine, but I need to run the entire test suite to make
> sure I have not introduced any regression. In addition to that, I need to figure out what the
> behavior should be when the operation is MUST_PASS_ONE, but I doubt it should be any different.
> Is my understanding of it being a bug correct? Or am I trivializing it and ignoring
> something very important? If it's tough to wrap your head around the explanation, then I
> can open a JIRA and upload a patch against 0.94 head.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
