hbase-issues mailing list archives

From "Kannan Muthukkaruppan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5104) FilterList doesn't work right with ColumnPaginationFilter
Date Thu, 29 Dec 2011 23:07:31 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177451#comment-13177451 ]

Kannan Muthukkaruppan commented on HBASE-5104:
----------------------------------------------

Lars: Yes.

Jiakai wrote in with: <<< the filters in FilterList are applied in order. The ColumnPaginationFilter's
filterKeyValue() is called only when ColumnPrefixFilter's filterKeyValue() returns true. i.e.
the current implementation should be equivalent to:
select * from (select * from Tab where filter1) where filter2

So it should return the desired result after the bug is fixed.

If you meant to suggest that filters in FilterList should be interchangeable, then it becomes
a design question. I'm fine with the alternative approaches you suggested, too. >>>

Response: In terms of the existing code structure, Jiakai is correct. The filters are evaluated in order,
so once SEEK_NEXT_USING_HINT is handled correctly, you'll get the behavior you want. But I
am concerned overall about ColumnPaginationFilter being a stateful filter whose state gets
updated depending on which other filters were ahead of it. Perhaps, for backward compatibility,
we cannot change its existing behavior.
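
To make the order-dependence concern concrete, here is a minimal toy model in plain Java. It does not use the HBase API: the class OrderedFilterSketch, its Pagination predicate, and the sample column names are all hypothetical, and the AND list is assumed to stop at the first filter that rejects a column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

/** Toy model (not HBase code): a stateful pagination filter inside an ordered AND list. */
final class OrderedFilterSketch {

    /** Stateful filter: rejects the first `offset` columns it sees, then passes `limit` columns. */
    static final class Pagination implements Predicate<String> {
        private final int limit;
        private final int offset;
        private int seen = 0; // counts only the columns that actually reach this filter

        Pagination(int limit, int offset) {
            this.limit = limit;
            this.offset = offset;
        }

        @Override
        public boolean test(String column) {
            int index = seen++;
            return index >= offset && index < offset + limit;
        }
    }

    /** Applies the filters in order and stops at the first one that rejects the column. */
    static List<String> mustPassAll(List<String> columns, List<Predicate<String>> filters) {
        List<String> result = new ArrayList<>();
        for (String column : columns) {
            boolean pass = true;
            for (Predicate<String> f : filters) {
                if (!f.test(column)) {
                    pass = false;
                    break;
                }
            }
            if (pass) {
                result.add(column);
            }
        }
        return result;
    }

    /** Hypothetical sorted columns: tag0:001..tag0:010, tag1:001..tag1:010, tag2:001..tag2:010. */
    static List<String> sampleColumns() {
        List<String> cols = new ArrayList<>();
        for (String tag : new String[] {"tag0", "tag1", "tag2"}) {
            for (int i = 1; i <= 10; i++) {
                cols.add(String.format(Locale.ROOT, "%s:%03d", tag, i));
            }
        }
        return cols;
    }

    /** Prefix filter first: pagination counts only the tag1 columns. */
    static List<String> prefixThenPage() {
        Predicate<String> prefix = c -> c.startsWith("tag1");
        return mustPassAll(sampleColumns(), Arrays.asList(prefix, new Pagination(5, 5)));
    }

    /** Pagination first: it counts every column, so its window falls inside tag0. */
    static List<String> pageThenPrefix() {
        Predicate<String> prefix = c -> c.startsWith("tag1");
        return mustPassAll(sampleColumns(), Arrays.asList(new Pagination(5, 5), prefix));
    }
}
```

In this model, prefixThenPage() yields tag1:006 through tag1:010, while pageThenPrefix() yields nothing, because the pagination window is used up on tag0 columns that the prefix filter then rejects.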

So we'll probably need to do both: fix SEEK_NEXT_USING_HINT handling so that it works correctly with FilterList
(at which point your case will start working fine), and also support limit/offset at the Scan/Get
or ColumnPrefixFilter level as a cleaner way to do pagination.

One disadvantage of sticking with the FilterList approach would be that it might be trickier
to get the "seek_next_using_hint" optimization. The ColumnPrefixFilter can only seek next
using hint in limited circumstances. For example, if you have an OR filter of two prefix filters:

(ColumnPrefix("B") OR ColumnPrefix("A")) AND PaginationFilter(5, 5)

we cannot have the first filter suggest a SEEK_NEXT_USING_HINT to go to prefix "B", as that would
skip the columns starting with "A".
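
The problem with forwarding one branch's hint can be seen in a small plain-Java sketch. This is a toy model, not HBase code: OrSeekHintSketch and its methods are hypothetical names. The smallestHint helper shows one conservative possibility (seek only to the smallest hint among the OR branches); that idea is an assumption for illustration and is not something proposed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not HBase code): seeking on behalf of an OR of two prefix filters. */
final class OrSeekHintSketch {

    /**
     * Mimics a seek followed by a scan: columns sorting before `hint` are never
     * examined, and the rest are kept if they match any of the prefixes.
     */
    static List<String> seekThenScan(List<String> sortedColumns, String hint, List<String> prefixes) {
        List<String> out = new ArrayList<>();
        for (String c : sortedColumns) {
            if (c.compareTo(hint) < 0) {
                continue; // skipped by the seek
            }
            for (String p : prefixes) {
                if (c.startsWith(p)) {
                    out.add(c);
                    break;
                }
            }
        }
        return out;
    }

    /** Assumed conservative hint for an OR: the smallest prefix, so no branch is skipped. */
    static String smallestHint(List<String> prefixes) {
        String min = prefixes.get(0);
        for (String p : prefixes) {
            if (p.compareTo(min) < 0) {
                min = p;
            }
        }
        return min;
    }
}
```

With sorted columns A1, A2, B1, B2, C1 and prefixes "B" and "A", seeking to "B" (the first filter's hint) returns only B1 and B2, whereas seeking to smallestHint(...) returns A1, A2, B1, B2.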

We'll need to restrict the SEEK_NEXT_USING_HINT to be used in much more limited circumstances...
and if there are other filters in the mix, we probably need to scan one cell at a time. This
might be another reason to deal with LIMIT/OFFSET as either an option to the ColumnPrefixFilter
itself or at the Scan/Get API level.
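
As a rough illustration of the "LIMIT/OFFSET as an option to the prefix filter" idea, the following plain-Java sketch keeps the counting inside the prefix match itself, so no second stateful filter is involved. PrefixWithLimitSketch and select are hypothetical names, the input is assumed to be sorted, and nothing here reflects an existing HBase API.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not HBase code): prefix matching with limit/offset built in. */
final class PrefixWithLimitSketch {

    /** Returns up to `limit` columns that start with `prefix`, after skipping the first `offset` matches. */
    static List<String> select(List<String> sortedColumns, String prefix, int limit, int offset) {
        List<String> out = new ArrayList<>();
        int matched = 0; // counts prefix matches only, so other columns never shift the window
        for (String c : sortedColumns) {
            if (out.size() >= limit) {
                break; // page is full
            }
            if (!c.startsWith(prefix)) {
                if (c.compareTo(prefix) > 0) {
                    break; // sorted input: we are past the prefix range
                }
                continue; // still before the prefix range
            }
            if (matched++ < offset) {
                continue; // inside the offset, not yet part of the page
            }
            out.add(c);
        }
        return out;
    }
}
```

Because the offset is counted against prefix matches only, select(columns, "tag1", 5, 5) returns the sixth through tenth "tag1" columns regardless of what else is in the row.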

                
> FilterList doesn't work right with ColumnPaginationFilter
> ---------------------------------------------------------
>
>                 Key: HBASE-5104
>                 URL: https://issues.apache.org/jira/browse/HBASE-5104
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Madhuwanti Vaidya
>         Attachments: testFilterList.rb
>
>
> Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
>     ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes("tag1"));
>     ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
>     FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
>     filters.addFilter(filter1);
>     filters.addFilter(filter2);
>     Get get = new Get(USER);
>     get.addFamily(COLUMN_FAMILY);
>     get.setMaxVersions(1);
>     get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if filter1 were not set.
> Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList filter does not handle this return code properly (it treats it as INCLUDE).
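
The reported symptom can be reproduced in a small plain-Java model. This is a sketch under stated assumptions, not HBase code: SeekHintBugSketch, its Code enum, and the ColumnFilter interface are hypothetical stand-ins named after the return codes mentioned above, and the "fixed" path simply rejects the column rather than performing a real seek.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model (not HBase code) of an AND list that mishandles a seek hint. */
final class SeekHintBugSketch {

    /** Hypothetical return codes, named after the ones mentioned in the report. */
    enum Code { INCLUDE, SKIP, SEEK_NEXT_USING_HINT }

    interface ColumnFilter {
        Code filter(String column);
    }

    static ColumnFilter prefix(String p) {
        return column -> {
            if (column.startsWith(p)) {
                return Code.INCLUDE;
            }
            // Before the prefix range: ask to jump ahead. After it: nothing left to match.
            return column.compareTo(p) < 0 ? Code.SEEK_NEXT_USING_HINT : Code.SKIP;
        };
    }

    static ColumnFilter pagination(int limit, int offset) {
        int[] seen = {0}; // mutable counter captured by the lambda
        return column -> {
            int index = seen[0]++;
            return (index >= offset && index < offset + limit) ? Code.INCLUDE : Code.SKIP;
        };
    }

    /** Runs prefix("tag1") AND pagination(5, 5); the flag switches the buggy handling on. */
    static List<String> run(List<String> columns, boolean hintTreatedAsInclude) {
        ColumnFilter f1 = prefix("tag1");
        ColumnFilter f2 = pagination(5, 5);
        List<String> out = new ArrayList<>();
        for (String c : columns) {
            Code first = f1.filter(c);
            boolean passedFirst = first == Code.INCLUDE
                    || (hintTreatedAsInclude && first == Code.SEEK_NEXT_USING_HINT);
            if (passedFirst && f2.filter(c) == Code.INCLUDE) {
                out.add(c);
            }
        }
        return out;
    }

    /** Hypothetical sorted columns: tag0:001..tag0:010, tag1:001..tag1:010, tag2:001..tag2:010. */
    static List<String> sampleColumns() {
        List<String> cols = new ArrayList<>();
        for (String tag : new String[] {"tag0", "tag1", "tag2"}) {
            for (int i = 1; i <= 10; i++) {
                cols.add(String.format(Locale.ROOT, "%s:%03d", tag, i));
            }
        }
        return cols;
    }
}
```

In this model, run(sampleColumns(), true) returns tag0:006 through tag0:010, which matches the report that the result looks as if filter1 were not set, while run(sampleColumns(), false) returns tag1:006 through tag1:010.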

