hbase-issues mailing list archives

From "Alex Baranau (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
Date Mon, 20 Aug 2012 19:55:38 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438134#comment-13438134 ]

Alex Baranau commented on HBASE-6618:
-------------------------------------

Just an idea. Maybe we should try to improve the existing FuzzyRowFilter by allowing each
fuzzy rule to be specified with:
* fuzzy key start
* fuzzy key end << this is currently missing in FuzzyRowFilter
* mask

This looks flexible enough to me. E.g. one could specify the rule ????(_0001_-_0099_)???(_001-_099),
i.e. <any 4 bytes><any 6-byte value between "_0001_" and "_0099_"><any 3 bytes><any
4-byte value between "_001" and "_099">, with this definition:
* fuzzy key start: ????_0001_???_001
* fuzzy key end: ????_0099_???_099 << currently missing
* mask: 11110000001110000

In this case, any contiguous run of "fixed" positions is treated as a single n-byte value.
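A minimal Java sketch of how such a (start, end, mask) rule could be evaluated against a row key, assuming the existing FuzzyRowFilter mask convention (1 = any byte, 0 = fixed byte). `RangedFuzzyRule` and `matches` are hypothetical names, not HBase API. Each contiguous run of fixed positions is compared as one unsigned byte string against the same positions of the start and end keys; the `?` placeholders in start/end are simply ignored.

```java
/**
 * Hypothetical sketch (not HBase API) of a fuzzy rule with ranges:
 * (fuzzy key start, fuzzy key end, mask). Mask byte 1 = any byte,
 * 0 = fixed byte, as in the existing FuzzyRowFilter mask.
 */
final class RangedFuzzyRule {
  private final byte[] start;
  private final byte[] end;
  private final byte[] mask;

  RangedFuzzyRule(byte[] start, byte[] end, byte[] mask) {
    if (start.length != mask.length || end.length != mask.length) {
      throw new IllegalArgumentException("start, end and mask must have the same length");
    }
    this.start = start;
    this.end = end;
    this.mask = mask;
  }

  /** True if the first mask.length bytes of row satisfy every fixed-run range. */
  boolean matches(byte[] row) {
    if (row.length < mask.length) {
      return false;
    }
    int i = 0;
    while (i < mask.length) {
      if (mask[i] == 1) { // fuzzy position: any byte value is accepted
        i++;
        continue;
      }
      int runEnd = i;
      while (runEnd < mask.length && mask[runEnd] == 0) {
        runEnd++;
      }
      // Fixed run [i, runEnd) is one n-byte value that must lie in [start, end].
      if (compare(row, start, i, runEnd) < 0 || compare(row, end, i, runEnd) > 0) {
        return false;
      }
      i = runEnd;
    }
    return true;
  }

  /** Unsigned lexicographic comparison of a[from, to) against b[from, to). */
  private static int compare(byte[] a, byte[] b, int from, int to) {
    for (int k = from; k < to; k++) {
      int diff = (a[k] & 0xFF) - (b[k] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return 0;
  }
}
```

Because each run is compared byte by byte, zero-padded fixed-width values such as `_0050_` sort the same way lexicographically as numerically. For the example rule, `abcd_0050_xyz_010` matches, while `abcd_0100_xyz_010` does not.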

--
Alternatively, such a fuzzy rule could be specified as a list of parts, each part being one of:
* n "fuzzy" bytes
* start/stop key part range (of the same length)

This might be closer to a "human-readable" definition, though the former one could be easier
to deal with.
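The "list of parts" alternative could be modelled roughly as follows. This is again a hypothetical Java sketch: `FuzzyRuleParts`, `AnyBytes` and `Range` are illustrative names only, not HBase classes. It flattens the parts into the start/end/mask form, writing 0 bytes into the fuzzy positions of start and end since those bytes are ignored during matching.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

/** Hypothetical "list of parts" form of a fuzzy rule (not HBase API). */
final class FuzzyRuleParts {
  interface Part {}

  /** n bytes that may take any value. */
  static final class AnyBytes implements Part {
    final int length;

    AnyBytes(int length) {
      this.length = length;
    }
  }

  /** A fixed-width part whose value must fall in [start, stop]. */
  static final class Range implements Part {
    final byte[] start;
    final byte[] stop;

    Range(byte[] start, byte[] stop) {
      if (start.length != stop.length) {
        throw new IllegalArgumentException("start and stop must have the same length");
      }
      this.start = start;
      this.stop = stop;
    }
  }

  final byte[] start;
  final byte[] end;
  final byte[] mask;

  /** Flattens the parts into (start, end, mask); fuzzy positions get 0 in start/end. */
  FuzzyRuleParts(List<Part> parts) {
    ByteArrayOutputStream s = new ByteArrayOutputStream();
    ByteArrayOutputStream e = new ByteArrayOutputStream();
    ByteArrayOutputStream m = new ByteArrayOutputStream();
    for (Part part : parts) {
      if (part instanceof AnyBytes) {
        for (int i = 0; i < ((AnyBytes) part).length; i++) {
          s.write(0);
          e.write(0);
          m.write(1); // mask 1: any byte accepted
        }
      } else if (part instanceof Range) {
        Range r = (Range) part;
        s.write(r.start, 0, r.start.length);
        e.write(r.stop, 0, r.stop.length);
        for (int i = 0; i < r.start.length; i++) {
          m.write(0); // mask 0: fixed byte
        }
      } else {
        throw new IllegalArgumentException("unknown part type");
      }
    }
    this.start = s.toByteArray();
    this.end = e.toByteArray();
    this.mask = m.toByteArray();
  }
}
```

One caveat of flattening: two adjacent range parts would merge into a single fixed run and be compared as one combined value, which is looser than checking each range separately. A converter would need to keep such parts apart, for example by recording run boundaries.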

Anil, since you expressed a willingness to work on this, what are your thoughts? Maybe you have something
different in mind?
                
> Implement FuzzyRowFilter with ranges support
> --------------------------------------------
>
>                 Key: HBASE-6618
>                 URL: https://issues.apache.org/jira/browse/HBASE-6618
>             Project: HBase
>          Issue Type: New Feature
>          Components: filters
>            Reporter: Alex Baranau
>            Priority: Minor
>
> Apart from the current ability to specify a fuzzy row filter, e.g. for the <userId_actionId>
> format as ????_0004 (where 0004 is the actionId), it would be great to also be able to specify
> a "fuzzy range", e.g. ????_0004, ..., ????_0099.
> See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65
> Note: it is currently possible to provide multiple fuzzy row rules to the existing FuzzyRowFilter,
> but when the range is big (contains thousands of values) this is not efficient.
> The filter should perform efficient fast-forwarding during the scan (this is what distinguishes
> it from a regex row filter).
> While such functionality may seem like a proper fit for a custom filter (i.e. one not included
> in the standard filter set), the filter looks like it could be very reusable. We can judge based
> on the implementation that will hopefully be added.
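The fast-forwarding requirement can be illustrated for the simplest case from the description: <4 fuzzy bytes><fixed part in [_0004, _0099]>. The hypothetical `RangeSeekHint` below is not HBase code and assumes every row key has exactly the rule's length. Given the current row, it returns the row itself if it matches; otherwise it returns the smallest matching key that sorts after it, which a filter could use as a seek hint instead of enumerating one fuzzy rule per value.

```java
/**
 * Hypothetical sketch (not HBase code) of fast-forwarding for the rule
 * <fuzzyLen any bytes><fixed part in [start, end]>, e.g. ????_0004 ... ????_0099.
 * Assumes every row key is exactly fuzzyLen + start.length bytes long.
 */
final class RangeSeekHint {
  private final int fuzzyLen;
  private final byte[] start;
  private final byte[] end;

  RangeSeekHint(int fuzzyLen, byte[] start, byte[] end) {
    if (start.length != end.length) {
      throw new IllegalArgumentException("start and end must have the same length");
    }
    this.fuzzyLen = fuzzyLen;
    this.start = start;
    this.end = end;
  }

  /**
   * Returns row if it matches, otherwise the smallest matching key greater than row,
   * or null if no matching key can follow row (the scan could stop early).
   */
  byte[] nextMatch(byte[] row) {
    if (row.length != fuzzyLen + start.length) {
      throw new IllegalArgumentException("unexpected row length " + row.length);
    }
    if (compareFixed(row, start) < 0) {
      // Same fuzzy prefix; jump to the lowest allowed fixed value.
      return withFixed(row, start);
    }
    if (compareFixed(row, end) <= 0) {
      return row; // fixed part is already inside [start, end]
    }
    // Fixed part is past the range: increment the fuzzy prefix as an
    // unsigned big-endian number and restart the fixed part at start.
    byte[] next = withFixed(row, start);
    for (int i = fuzzyLen - 1; i >= 0; i--) {
      if (next[i] != (byte) 0xFF) {
        next[i]++;
        return next;
      }
      next[i] = 0; // carry into the more significant byte
    }
    return null; // prefix was all 0xFF: no larger key can match
  }

  /** Unsigned comparison of the row's fixed part against bound. */
  private int compareFixed(byte[] row, byte[] bound) {
    for (int k = 0; k < bound.length; k++) {
      int diff = (row[fuzzyLen + k] & 0xFF) - (bound[k] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return 0;
  }

  /** Copies the fuzzy prefix of row and appends the given fixed part. */
  private byte[] withFixed(byte[] row, byte[] fixed) {
    byte[] out = new byte[fuzzyLen + fixed.length];
    System.arraycopy(row, 0, out, 0, fuzzyLen);
    System.arraycopy(fixed, 0, out, fuzzyLen, fixed.length);
    return out;
  }
}
```

With start `_0004` and end `_0099`, a row `abcd_0001` is fast-forwarded to `abcd_0004`, and a row `abcd_0100` skips straight to `abce_0004`. The scan therefore never visits the non-matching values in between, regardless of how many values the range contains.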

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
