hbase-issues mailing list archives

From "Anil Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
Date Wed, 22 Aug 2012 06:38:37 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439311#comment-13439311 ]

Anil Gupta commented on HBASE-6618:

Hi Alex,

I agree with your idea of a range-based fuzzy filter. However, I would like to take a phased
approach to developing it.

In your proposal, the user can provide multiple fuzzy ranges in a single scan, e.g. <any
4 bytes><any 6 bytes value between "_0001" and "0099"><any 3 bytes><any
4 bytes value between "_001" and "_099">.

Instead, IMO let's first build a filter for a single fuzzy range, i.e. "<any 4 bytes><any 6 bytes
value between "_0001" and "0099"><any 3 bytes>" or "<any 4 bytes><any 6
bytes value between "_0001" and "0099">". Once that works, we can enhance it to
support multiple fuzzy ranges. This is just my thought on how to approach this. Let me know
your opinion.
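
To make the single-range case concrete, here is a minimal sketch in Python rather than HBase's Java API. Everything in it is an assumption for illustration: the name `matches`, the pattern encoding (an int for "any N bytes", a `(low, high)` pair for a range field), the 5-byte bounds "_0001".."_0099", and the requirement that the row key has exactly the pattern's length.

```python
from typing import List, Tuple, Union

# Hypothetical pattern encoding (not an HBase type):
#   int          -> that many arbitrary bytes
#   (low, high)  -> a field of len(low) bytes with low <= field <= high
Segment = Union[int, Tuple[bytes, bytes]]


def matches(row_key: bytes, pattern: List[Segment]) -> bool:
    """Return True if row_key fits the pattern exactly (no extra bytes)."""
    pos = 0
    for seg in pattern:
        if isinstance(seg, int):
            width = seg
            if pos + width > len(row_key):
                return False  # key too short for this wildcard segment
        else:
            low, high = seg
            if len(low) != len(high):
                raise ValueError("range bounds must have equal length")
            width = len(low)
            field = row_key[pos:pos + width]
            # bytes compare lexicographically, which is also HBase's row order
            if len(field) != width or not (low <= field <= high):
                return False
        pos += width
    return pos == len(row_key)


# Single fuzzy range followed by a wildcard tail (assumed layout).
single_range = [4, (b"_0001", b"_0099"), 3]
print(matches(b"abcd_0042xyz", single_range))
print(matches(b"abcd_0100xyz", single_range))
```

Supporting multiple fuzzy ranges later would only mean allowing more than one `(low, high)` segment in the list, which is why starting with one range seems like a reasonable first phase.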

Starting this week, I have had to shift focus at work from HBase to Hive and HCatalog for another
POC, so I'll be squeezing in time for this JIRA around my work schedule. I'll start by looking into
the current implementation of FuzzyRowFilter to get an idea of how it works.

Anil Gupta
Software Engineer II, Intuit, Inc 
> Implement FuzzyRowFilter with ranges support
> --------------------------------------------
>                 Key: HBASE-6618
>                 URL: https://issues.apache.org/jira/browse/HBASE-6618
>             Project: HBase
>          Issue Type: New Feature
>          Components: filters
>            Reporter: Alex Baranau
>            Priority: Minor
> Apart from current ability to specify fuzzy row filter e.g. for <userId_actionId>
> format as ????_0004 (where 0004 - actionId) it would be great to also have ability to specify
> the "fuzzy range", e.g. ????_0004, ..., ????_0099.
> See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65
> Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter,
> but in case when the range is big (contains thousands of values) it is not efficient.
> Filter should perform efficient fast-forwarding during the scan (this is what distinguishes
> it from regex row filter).
> While such functionality may seem like a proper fit for custom filter (i.e. not including
> into standard filter set) it looks like the filter may be very re-useable. We may judge based
> on the implementation that will hopefully be added.
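
The fast-forwarding mentioned in the description can be sketched for the simplest layout, ????_0004 .. ????_0099. This is an illustrative Python sketch, not the HBase implementation. It assumes fixed-length row keys of the form <prefix_len arbitrary bytes><range field>, and the function name `next_hint` and its parameters are hypothetical. Given the row the scan is currently on, it returns the smallest row key at or after it that can fall inside the fuzzy range, so the scan can seek there instead of reading every row in between.

```python
from typing import Optional


def next_hint(row_key: bytes, prefix_len: int,
              low: bytes, high: bytes) -> Optional[bytes]:
    """Smallest key >= row_key whose trailing field lies in [low, high].

    Returns None when no such key exists (the scan can stop).
    Assumes every key is exactly prefix_len + len(low) bytes long.
    """
    if len(low) != len(high):
        raise ValueError("range bounds must have equal length")
    if len(row_key) != prefix_len + len(low):
        raise ValueError("sketch only handles fixed-length row keys")

    prefix, field = row_key[:prefix_len], row_key[prefix_len:]
    if field < low:
        # Same prefix, jump forward to the start of the range.
        return prefix + low
    if field <= high:
        # Already inside the range: the current row matches.
        return row_key
    # Past the end of the range: move to the next prefix value,
    # treating the prefix as a big-endian unsigned integer.
    bumped = int.from_bytes(prefix, "big") + 1
    if bumped >= 1 << (8 * prefix_len):
        return None  # prefix was all 0xFF, nothing left to match
    return bumped.to_bytes(prefix_len, "big") + low


print(next_hint(b"abcd_0000", 4, b"_0004", b"_0099"))
print(next_hint(b"abcd_0100", 4, b"_0004", b"_0099"))
```

A single rule of this kind needs one comparison per row regardless of how wide the range is, whereas expressing ????_0004 .. ????_0099 with the existing filter takes one fuzzy rule per value in the range.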
