hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Fast scan with PrefixFilter?
Date Wed, 15 Jan 2014 09:27:35 GMT
Take a look at this blog:
http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/

From your earlier description, the components of your rowkey have fixed length. 
Thus you can consider using fuzzy row filter. 

Cheers
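
A minimal sketch of what the fuzzy key and mask could look like for the row key layout in this thread, assuming the convention in the FuzzyRowFilter Javadoc linked below (mask byte 0 = this position must match, 1 = any byte is accepted). It uses only the JDK: `FuzzyKeySketch`, `maskFor`, and `matches` are hypothetical names, and `matches` is a simplified local check, not HBase's implementation. In a real client the key/mask pair would be handed to `FuzzyRowFilter` and set on the `Scan`.

```java
import java.nio.charset.StandardCharsets;

public class FuzzyKeySketch {
    // Builds a mask from a template in which '?' marks a "don't care" position.
    // Convention assumed from the FuzzyRowFilter Javadoc: 0 = fixed byte, 1 = any byte.
    // Limitation of this sketch: a literal '?' in the row key cannot be expressed.
    public static byte[] maskFor(String template) {
        byte[] mask = new byte[template.length()];
        for (int i = 0; i < template.length(); i++) {
            mask[i] = (byte) (template.charAt(i) == '?' ? 1 : 0);
        }
        return mask;
    }

    // Simplified local check (an assumption of this sketch, not HBase code):
    // every fixed position of the key must equal the same position in the row.
    public static boolean matches(byte[] row, byte[] key, byte[] mask) {
        if (row.length < key.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Any 8-byte date, country fixed to EN; the trailing number is not constrained.
        String template = "????????_EN_";
        byte[] key = template.getBytes(StandardCharsets.US_ASCII);
        byte[] mask = maskFor(template);

        String[] rows = {"20140101_EN_1", "20140102_EN_7", "20140101_US_1"};
        for (String r : rows) {
            byte[] row = r.getBytes(StandardCharsets.US_ASCII);
            System.out.println(r + " -> " + matches(row, key, mask));
        }
    }
}
```

The mask only works because the date and country components sit at fixed byte offsets, which is why the fixed-length rowkey layout matters here.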

On Jan 14, 2014, at 11:08 PM, Ramon Wang <ramon@appannie.com> wrote:

> Hi Ted
> 
> Thanks for the quick reply.
> 
> With this FuzzyRowFilter, do I still need to pass in startRow and stopRow
> like below when constructing a Scan object?
> 
>> Scan(byte [] startRow, byte [] stopRow)
> 
> 
> Will the FuzzyRowFilter give us performance comparable to a direct Get by
> row when we pass something like "20140101_EN_?"?
> 
> Cheers
> Ramon
> 
> 
> On Wed, Jan 15, 2014 at 2:22 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> 
>> Please take a look at
>> http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/filter/FuzzyRowFilter.html
>> 
>> Cheers
>> 
>> On Jan 14, 2014, at 10:16 PM, Ramon Wang <ramon@appannie.com> wrote:
>> 
>>> Hi Folks
>>> 
>>> We have a table with a fixed-pattern row key design. The format of the row
>>> key is YEAR_COUNTRY_randomNumber, for example:
>>> 
>>> 20140101_EN_1
>>> 20140101_EN_2
>>> 20140101_EN_3
>>> 20140101_US_1
>>> 20140101_US_2
>>> 20140101_US_3
>>> ...
>>> 
>>> Is there a way I can quickly get the data for "20140101_EN_*" by using Scan
>>> without scanning the full table? I think we are probably going to use
>>> the PrefixFilter with the Scan object, but the problem is that we
>>> don't know the "startRow" for each scan. Any ideas?
>>> 
>>> Thanks
>>> Ramon
>> 

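On the original question about not knowing the startRow: for a pure prefix lookup such as "20140101_EN_*", the prefix itself can serve as the start row, and an exclusive stop row can be derived by incrementing the prefix's last byte. The sketch below shows that derivation with the JDK only; `PrefixRangeSketch` and `stopRowFor` are hypothetical names, and the resulting arrays are what one would pass to `Scan(byte [] startRow, byte [] stopRow)`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixRangeSketch {
    // Returns the smallest key that sorts after every key starting with `prefix`
    // under unsigned lexicographic byte order. Trailing 0xFF bytes cannot be
    // incremented, so they are dropped and the preceding byte is bumped instead.
    // An empty array is returned when no such key exists; HBase treats an empty
    // stop row as "scan to the end of the table".
    public static byte[] stopRowFor(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return new byte[0];
    }

    public static void main(String[] args) {
        byte[] startRow = "20140101_EN_".getBytes(StandardCharsets.US_ASCII);
        byte[] stopRow = stopRowFor(startRow);
        System.out.println("startRow = " + new String(startRow, StandardCharsets.US_ASCII));
        System.out.println("stopRow  = " + new String(stopRow, StandardCharsets.US_ASCII));
    }
}
```

Because the stop row is exclusive, the scan covers exactly the keys that begin with the prefix and does not touch the rest of the table.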