hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Filter with State
Date Thu, 02 Aug 2012 04:13:07 GMT
The Filter is initialized per Region as part of a RegionScannerImpl.

So as long as all the rows you are interested in are co-located in the same region, you can keep
that state in the Filter instance.
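
As a rough illustration of the point above, here is a minimal plain-Java sketch. It does not use the HBase Filter API; Row, IncreasingValueFilter and scanRegion are hypothetical names invented for this example. It assumes one filter instance per region scan, which keeps row-to-row state inside a region and loses it at a region boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of per-region scanning; none of these types are HBase classes.
final class StatefulFilterSketch {

    // Hypothetical row: a key plus a single numeric column value.
    static final class Row {
        final String key;
        final long value;

        Row(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical filter that remembers the previous row's value.
    static final class IncreasingValueFilter {
        private Long previous; // null until the first row has been seen

        // Keep a row only if its value is larger than the previous row's value.
        boolean include(Row row) {
            boolean keep = previous == null || row.value > previous;
            previous = row.value;
            return keep;
        }
    }

    // Scan one region with a fresh filter instance (the assumption of this sketch).
    static List<String> scanRegion(List<Row> region) {
        IncreasingValueFilter filter = new IncreasingValueFilter();
        List<String> kept = new ArrayList<>();
        for (Row row : region) {
            if (filter.include(row)) {
                kept.add(row.key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("a", 5), new Row("b", 3), new Row("c", 2), new Row("d", 7));

        // All four rows in one region: the state carries across every row.
        System.out.println(scanRegion(rows));

        // Same rows split after "b": the second region starts with empty state.
        List<String> split = new ArrayList<>(scanRegion(rows.subList(0, 2)));
        split.addAll(scanRegion(rows.subList(2, 4)));
        System.out.println(split);
    }
}
```

With the rows in one region the scan keeps a and d; with a split after b it keeps a, c and d, because the second filter instance never saw row b.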

You can use a custom RegionSplitPolicy to control (to some extent at least) how the rows are
co-located (KeyPrefixRegionSplitPolicy is an example).
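
To make the idea behind a prefix-based split policy concrete, here is a small plain-Java sketch. It is not the HBase class itself; alignToPrefix, compareKeys and regionOf are hypothetical helpers, and the fixed prefix length of 5 bytes is an assumption for the example. The idea is to truncate a candidate split key to the prefix so that rows sharing a prefix never end up on opposite sides of the split.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Plain-Java model of prefix-aligned split points; not an HBase class.
final class PrefixSplitPointSketch {

    // Truncate a candidate split key to at most prefixLength bytes.
    static byte[] alignToPrefix(byte[] candidate, int prefixLength) {
        return Arrays.copyOf(candidate, Math.min(prefixLength, candidate.length));
    }

    // Lexicographic comparison on unsigned bytes; a shorter key that is a
    // prefix of a longer key sorts first.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Region a row lands in after a split: 0 = below the split point, 1 = at or above it.
    static int regionOf(String rowKey, byte[] splitPoint) {
        return compareKeys(rowKey.getBytes(StandardCharsets.UTF_8), splitPoint) < 0 ? 0 : 1;
    }

    public static void main(String[] args) {
        byte[] candidate = "acct1-007".getBytes(StandardCharsets.UTF_8);
        byte[] aligned = alignToPrefix(candidate, 5); // assumed prefix length

        // Splitting at the raw candidate separates two rows with the same prefix.
        System.out.println(regionOf("acct1-001", candidate) + " " + regionOf("acct1-007", candidate));

        // Splitting at the aligned key keeps them in the same region.
        System.out.println(regionOf("acct1-001", aligned) + " " + regionOf("acct1-007", aligned));
    }
}
```

Rows with a different prefix (for example acct0-009) can still fall on the other side of the aligned split point; only rows that share the prefix are kept together.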

I also blogged about this here (in the context of cross row transactions): http://hadoop-hbase.blogspot.com/2012/02/limited-cross-row-transactions-in-hbase.html

Maybe what you really are looking for are coprocessors?

-- Lars

----- Original Message -----
From: Jerry Lam <chilinglam@gmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Sent: Wednesday, August 1, 2012 7:06 PM
Subject: Re: Filter with State

Hi Lars,

I understand that it is more difficult to carry state across regions/servers, but how about within
a single region? Knowing that the rows in a single region have dependencies, can we have a filter
with state? If filters don't provide this ability, is there another mechanism in HBase that offers
this kind of functionality?

I think this is a good feature because it allows efficient scanning of dependent rows. Instead
of fetching each row to the client side and checking whether we should fetch the next row, the filter
on the server side handles this logic.

Best Regards,


Sent from my iPad (sorry for spelling mistakes)

On 2012-08-01, at 21:52, lars hofhansl <lhofhansl@yahoo.com> wrote:

> The issue here is that different rows can be located in different regions or even different
> region servers, so no local state will carry over all rows.
> ----- Original Message -----
> From: Jerry Lam <chilinglam@gmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Cc: "user@hbase.apache.org" <user@hbase.apache.org>
> Sent: Wednesday, August 1, 2012 5:48 PM
> Subject: Re: Filter with State
> Hi St.Ack:
> Schema cannot be changed to a single row.
> The API documentation says "Do not rely on filters carrying state across rows; its not reliable
> in current hbase as we have no handlers in place for when regions split, close or server crashes."
> If we manage region splitting ourselves, the split issue doesn't apply. Other failures
> can be handled at the application level. Does each invocation of scanner.next instantiate
> a new filter on the server side, even for the same region? (I.e., does scanning the same region
> use the same filter or a different filter depending on the scanner.next calls?)
> Best Regards,
> Jerry 
> Sent from my iPad (sorry for spelling mistakes)
> On 2012-08-01, at 18:44, Stack <stack@duboce.net> wrote:
>> On Wed, Aug 1, 2012 at 10:44 PM, Jerry Lam <chilinglam@gmail.com> wrote:
>>> Hi HBase guru:
>>> In his talk, Lars George mentions that filters have no state. What if I need
>>> to scan rows in which the decision to filter one row or not is based on the
>>> previous row's column values? Any idea how one can implement this type of
>>> logic?
>> You could try carrying state in the client (but if client dies state dies).
>> You can't have scanners carry state across rows.  It says so in API
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html#package_description
>> (Whatever about the API, if LarsG says it, it must be so!).
>> Here is the issue: If row X is in region A on server 1 there is
>> nothing to prevent row X+1 from being on region B on server 2.  How do
>> you carry the state between such rows reliably?
>> Can you redo your schema such that the state you need to carry remains
>> within a row?
>> St.Ack
