hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kannan Muthukkaruppan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
Date Fri, 06 Jan 2012 01:15:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181020#comment-13181020
] 

Kannan Muthukkaruppan commented on HBASE-5104:
----------------------------------------------

<<<I just had a discussion today about how it would be nice if one could start a
scanner at a certain column prefix within a certain row and also set a stop column prefix
with in a row. (i.e. not using a filter).>>>

Why would you not want to use a filter for this case? ColumnRangeFilter() handles this case
nicely correct?


<<< Alternatively, there was some discussion about startRow/stopRow also allowing
to specify a CF/columm. Would that work here? It would allow precise placement of a scan and
might be a relatively simple change with more general applicability.>>>

This may not work for many cases. How do I, for instance say, get me the next 5 KVs in a particular
row whose value is X (note: here the filter is on column value rather than column name; assume
you are using the SingleColumnValueFilter()).

I think limit/offset is a simple/well understood concept that we should support in a clean
way. Scan/Get API seems like a good place to do it. What is the concern with adding the capability
there?

                
> Provide a reliable intra-row pagination mechanism
> -------------------------------------------------
>
>                 Key: HBASE-5104
>                 URL: https://issues.apache.org/jira/browse/HBASE-5104
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Madhuwanti Vaidya
>         Attachments: testFilterList.rb
>
>
> Addendum:
> Doing pagination (retrieving at most "limit" number of KVs at a particular "offset")
is currently supported via the ColumnPaginationFilter. However, it is not a very clean way
of supporting pagination.  Some of the problems with it are:
> * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have same results
as (query with Filter(A)) INTERSECT (query with Filter(B)). This is not the case for ColumnPaginationFilter
as its internal state gets updated depending on whether or not Filter(A) returns TRUE/FALSE
for a particular cell.
> * When this Filter is used in combination with other filters (e.g., doing AND with another
filter using FilterList), the behavior of the query depends on the order of filters in the
FilterList. This is not ideal.
> * ColumnPaginationFilter is a stateful filter which ends up counting multiple versions
of the cell as separate values even if another filter upstream or the ScanQueryMatcher is
going to reject the value for other reasons.
> Seems like we need a reliable way to do pagination. The particular use case that prompted
this JIRA is pagination within the same rowKey. For example, for a given row key R, get columns
with prefix P, starting at offset X (among columns which have prefix P) and limit Y. Some
possible fixes might be:
> 1) enhance ColumnPrefixFilter to support another constructor which supports limit/offset.
> 2) Support pagination (limit/offset) at the Scan/Get API level (rather than as a filter)
[Like SQL].
> Original Post:
> Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email
from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
>     ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes("tag1"));
>     ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset
*/);
>     FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
>     filters.addFilter(filter1);
>     filters.addFilter(filter2);
>     Get get = new Get(USER);
>     get.addFamily(COLUMN_FAMILY);
>     get.setMaxVersions(1);
>     get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if the filter1 were not
set.
> Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList
filter does not handle this return code properly (treat it as INCLUDE).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message