hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6509) Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by "???alex?b"
Date Mon, 06 Aug 2012 17:09:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429269#comment-13429269

Jonathan Hsieh commented on HBASE-6509:

Ted, interesting.  

I wonder if this kind of optimization is more generally applicable or it is worth pushing
down into the regex comparator.  

My concern is that this seems a bit special purpose.  I wonder if there is a jira already
filed for custom filter plugins/coprocessors. 
> Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by "???alex?b"
> ---------------------------------------------------------------------------------
>                 Key: HBASE-6509
>                 URL: https://issues.apache.org/jira/browse/HBASE-6509
>             Project: HBase
>          Issue Type: New Feature
>          Components: filters
>            Reporter: Alex Baranau
>            Assignee: Alex Baranau
>            Priority: Minor
>         Attachments: HBASE-6509.patch, HBASE-6509_1.patch
> Implement fuzzy row key filter to allow fetching records e.g. by this criteria: "???alex?b".
> This seems to be very useful as an alternative to select records by row keys by specifying
their part which is not prefix part. Due to fast-forwarding nature of the filter in many situations
this helps to avoid heavy full-table scans.
> This is especially effective when you have composite row key and (some of) its parts
has fixed length. E.g. with the key of format userId_actionId_time, given that userId and
actionId length is fixed, one can select user actions of specific type using fuzzy row key
by specifying mask "????_myaction". Given fast-forwarding nature of filter, this will usually
work much faster than doing whole table scan with any of the existing server-side filters.
> In many cases this can work as secondary-indexing alternative.
> Many times users implement it as a custom filter and many times they just don' know this
is possible. Let's add it to the common codebase.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message