hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Range Based Filtering with FuzzyRowFilter
Date Tue, 21 Aug 2012 01:11:48 GMT
You might want to rethink your key schema or denormalize your data at write time.
If the key leads with userid, then searching for a range of action ids necessarily requires a
full scan of your table, which is not what you want (unless you run these rarely, as Map/Reduce
type jobs).

I assume you have different scans, which scan by userid; so I'd suggest just storing the same
data again but with actionid_userid as key.
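
As a rough sketch of why the actionid_userid layout helps: once the key leads with the
zero-padded actionid, a range of action ids is one contiguous slice of the sorted key space.
The example below uses a plain java.util.TreeMap as a stand-in for a sorted table, not the
HBase client API. The class and method names are made up for illustration, and the 5-digit
and 6-digit widths are taken from the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

public class ActionFirstKeys {
    // Zero-padding keeps lexicographic order identical to numeric order.
    static String actionFirstKey(int actionId, int userId) {
        return String.format(Locale.ROOT, "%05d_%06d", actionId, userId);
    }

    // Inclusive start row: the smallest possible key for the first action id.
    static String startRow(int fromAction) {
        return String.format(Locale.ROOT, "%05d_", fromAction);
    }

    // Exclusive stop row: the smallest possible key for the action id after the range.
    // Assumes toActionInclusive < 99999 so the padded width stays at 5 digits.
    static String stopRow(int toActionInclusive) {
        return String.format(Locale.ROOT, "%05d_", toActionInclusive + 1);
    }

    // TreeMap stands in for the sorted table; subMap plays the role of a start/stop row scan.
    static List<String> scanRange(TreeMap<String, String> table, int from, int toInclusive) {
        return new ArrayList<>(
            table.subMap(startRow(from), true, stopRow(toInclusive), false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(actionFirstKey(199, 123456), "a");
        table.put(actionFirstKey(200, 123456), "b");
        table.put(actionFirstKey(350, 999999), "c");
        table.put(actionFirstKey(351, 1), "d");
        // Only the keys for action ids 00200 through 00350 are visited.
        System.out.println(scanRange(table, 200, 350));
    }
}
```

With userid_actionid keys the same query has no such contiguous slice, because every userid
prefix can contain matching action ids.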

If the values of your cells are large, store a mapping of actionid_userid -> userid_actionid
in the 2nd table (i.e. a secondary index). In that case mind the previous discussions we had
about consistency here, though.
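
A minimal sketch of that secondary-index variant, again with TreeMaps standing in for the two
tables rather than real HBase calls (all names here are hypothetical). The index holds only the
small actionid_userid -> userid_actionid mapping, and the large values live once in the main
table. Note that the two puts are separate writes, which is where the consistency caveat
comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

public class SecondaryIndexSketch {
    // Main table: userid_actionid -> (large) value.
    final TreeMap<String, String> mainTable = new TreeMap<>();
    // Index table: actionid_userid -> userid_actionid.
    final TreeMap<String, String> indexTable = new TreeMap<>();

    void put(int userId, int actionId, String value) {
        String mainKey = String.format(Locale.ROOT, "%06d_%05d", userId, actionId);
        String indexKey = String.format(Locale.ROOT, "%05d_%06d", actionId, userId);
        // Two independent writes: in a real cluster one can succeed while the other fails.
        mainTable.put(mainKey, value);
        indexTable.put(indexKey, mainKey);
    }

    // Range scan over the index, then one point lookup per hit in the main table.
    // Assumes toInclusive < 99999 so the stop key keeps its 5-digit width.
    List<String> valuesForActionRange(int from, int toInclusive) {
        String start = String.format(Locale.ROOT, "%05d_", from);
        String stop = String.format(Locale.ROOT, "%05d_", toInclusive + 1);
        List<String> out = new ArrayList<>();
        for (String mainKey : indexTable.subMap(start, true, stop, false).values()) {
            String value = mainTable.get(mainKey);
            if (value != null) { // tolerate an index entry whose main row is missing
                out.add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch s = new SecondaryIndexSketch();
        s.put(123456, 250, "payload-1");
        s.put(654321, 400, "payload-2");
        System.out.println(s.valuesForActionRange(200, 350));
    }
}
```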

-- Lars

From: anil gupta <anilgupta84@gmail.com>
To: user@hbase.apache.org 
Sent: Friday, August 17, 2012 1:03 PM
Subject: Range Based Filtering with FuzzyRowFilter

Hi All,

I have a question related to FuzzyRowFilter. I have a similar
filtering requirement which might be an extension to FuzzyRowFilter.
Suppose I have the following structure of rowkeys: userid_actionid, where
userid is 6 digits and actionid is 5 digits. I would like to get all
the rows with actionid between 00200 and 00350. With the current FuzzyRowFilter
I can search for all the rows with a particular actionid. Instead of searching
for a particular actionid, I would like to search for a range of actionids.

Does this use case sound like an extension to the current FuzzyRowFilter? Can
I run this kind of filter on HBase 0.92 without making any significant update
to the cluster? I am willing to put in the effort to make the necessary
changes in FuzzyRowFilter for my requirement.
If you know of any other easier and equally optimized way to do the same, then
please share it.

Thanks & Regards,
Anil Gupta
