hbase-issues mailing list archives

From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2466) Improving filter API to allow for modification of keyvalue list by filter
Date Thu, 22 Apr 2010 03:22:53 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859636#action_12859636 ]

ryan rawson commented on HBASE-2466:

It seems like the right thing to do would be to disallow filters that
use filterRow() and filterRow(List) if you are using the batch
functionality. This could be implemented by providing a new method:
public boolean hasFilterRow();

If it returns true, then the filter expects to do something within
filterRow() (either variety) and thus a batched scan cannot be

Most filters would either hardcode this, or defer to the union of
their underlying filter responses.
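
A rough sketch of how that could look, not taken from either attached
patch. SketchFilter, SketchFilterList and ScanValidator are hypothetical
stand-ins for the real Filter, FilterList and scan setup code, and the
"batch > 0 means batching is on" convention is an assumption of this
sketch:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for the HBase Filter interface.
// Only the methods relevant to this comment are shown.
interface SketchFilter {
    boolean filterRow();     // true = drop the whole row
    boolean hasFilterRow();  // proposed: true if filterRow() does real work
}

// A filter that only looks at individual key/values hardcodes false.
class ColumnOnlyFilter implements SketchFilter {
    public boolean filterRow() { return false; }
    public boolean hasFilterRow() { return false; }
}

// A filter that makes a row-level decision hardcodes true.
class RowLevelFilter implements SketchFilter {
    public boolean filterRow() { return true; }
    public boolean hasFilterRow() { return true; }
}

// A composite defers to its members: if any member needs filterRow(),
// the composite reports that it does too.
class SketchFilterList implements SketchFilter {
    private final List<SketchFilter> filters;

    SketchFilterList(SketchFilter... filters) {
        this.filters = Arrays.asList(filters);
    }

    // One possible combination rule: drop the row if any member drops it.
    public boolean filterRow() {
        for (SketchFilter f : filters) {
            if (f.filterRow()) return true;
        }
        return false;
    }

    public boolean hasFilterRow() {
        for (SketchFilter f : filters) {
            if (f.hasFilterRow()) return true;
        }
        return false;
    }
}

class ScanValidator {
    // Assumption of this sketch: batch > 0 means the scan is batched.
    static void validate(int batch, SketchFilter filter) {
        if (batch > 0 && filter != null && filter.hasFilterRow()) {
            throw new IllegalArgumentException(
                "Cannot combine a batched scan with a filter that uses filterRow()");
        }
    }
}
```

With this shape, a batched scan is rejected up front instead of
silently handing partial rows to a filter that expects whole ones.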

As for hasResults(): it is used to filter the results between
rows. If filterRow() returns true inside hasResults, then nextInternal
will continue on to find the next row of returnable results.  This is
so that RegionScanner#next() always returns an actual row and we don't
need to wrap it with another method.  So not calling filterRow(List)
inside hasResults would surely end up causing a problem?  Does your
unit test cover this case?
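
To illustrate the concern, here is a simplified sketch of the
skip-filtered-rows loop. It is not the actual RegionScanner code:
RowFilter, PruningFilter and SketchRegionScanner are hypothetical
names, rows are modelled as lists of strings, and next() here simply
returns false once the rows run out:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified filter with both row-level hooks.
interface RowFilter {
    void reset();                      // called before each new row
    void filterRow(List<String> kvs);  // may prune the list in place
    boolean filterRow();               // true = drop the whole row
}

// Example filter: strips "tmp:" entries and drops rows left empty.
class PruningFilter implements RowFilter {
    private boolean dropRow;

    public void reset() { dropRow = false; }

    public void filterRow(List<String> kvs) {
        kvs.removeIf(kv -> kv.startsWith("tmp:"));
        dropRow = kvs.isEmpty();
    }

    public boolean filterRow() { return dropRow; }
}

class SketchRegionScanner {
    private final Iterator<List<String>> rows;
    private final RowFilter filter;

    SketchRegionScanner(List<List<String>> rows, RowFilter filter) {
        this.rows = rows.iterator();
        this.filter = filter;
    }

    // Fills 'results' with the next row that survives the filter.
    // Returns false only when no such row is left.
    boolean next(List<String> results) {
        while (rows.hasNext()) {
            results.clear();
            results.addAll(rows.next());
            filter.reset();
            // If this call were skipped, filterRow() below would never
            // see the pruned list and rows could be returned unpruned.
            filter.filterRow(results);
            if (filter.filterRow()) {
                continue;  // row filtered out; move on to the next row
            }
            return true;
        }
        results.clear();
        return false;
    }
}
```

The caller never sees a filtered-out row, which is why the row-level
hooks have to run inside the loop rather than in a wrapper around it.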

> Improving filter API to allow for modification of keyvalue list by filter
> -------------------------------------------------------------------------
>                 Key: HBASE-2466
>                 URL: https://issues.apache.org/jira/browse/HBASE-2466
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: filters, regionserver
>            Reporter: Juhani Connolly
>            Priority: Minor
>         Attachments: HBASE-2466-2.patch, HBASE-2466.patch
> As it stands, the Filter interface allows filtering by
> Filter#filterAllRemaining() -> true indicates scan is over, false, keep going on.
> Filter#filterRowKey(byte[],int,int) -> true to drop this row, if false, we will also
> Filter#filterKeyValue(KeyValue) -> true to drop this key/value
> Filter#filterRow() -> last chance to drop entire row based on the sequence of filterKeyValue() calls. E.g., filter a row if it doesn't contain a specified column.
> It would be useful to allow for an additional API in the form of a step to prune the list of KeyValues to be sent by implementing an additional
> Filter#filterRow(List<KeyValue>)
> This would allow for a user to write a custom filter against the API that drops unnecessary KeyValues according to user-defined rules.
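
As an illustration of the kind of user-defined rule the description has
in mind, here is a minimal sketch of a filter that keeps only the
newest entry per qualifier. SketchKeyValue and NewestPerQualifierFilter
are hypothetical names; the real KeyValue class and Filter interface
carry much more than is modelled here:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for an HBase KeyValue.
final class SketchKeyValue {
    final String qualifier;
    final long timestamp;

    SketchKeyValue(String qualifier, long timestamp) {
        this.qualifier = qualifier;
        this.timestamp = timestamp;
    }
}

// Hypothetical custom filter: keep only the newest entry per qualifier.
class NewestPerQualifierFilter {
    // Mirrors the proposed Filter#filterRow(List<KeyValue>):
    // the list is pruned in place before the row is sent.
    void filterRow(List<SketchKeyValue> kvs) {
        // First pass: find the newest timestamp for each qualifier.
        Map<String, Long> newest = new HashMap<>();
        for (SketchKeyValue kv : kvs) {
            Long seen = newest.get(kv.qualifier);
            if (seen == null || kv.timestamp > seen) {
                newest.put(kv.qualifier, kv.timestamp);
            }
        }
        // Second pass: remove everything older than that timestamp.
        for (Iterator<SketchKeyValue> it = kvs.iterator(); it.hasNext();) {
            SketchKeyValue kv = it.next();
            if (kv.timestamp != newest.get(kv.qualifier)) {
                it.remove();
            }
        }
    }
}
```

A rule like this needs the whole row at once, which is what separates
it from the per-KeyValue decision made in filterKeyValue(KeyValue).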

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
