hbase-issues mailing list archives

From "Anil Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6509) Implement fast-forwarding FuzzyRowFilter to allow filtering rows e.g. by "???alex?b"
Date Fri, 17 Aug 2012 17:37:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436901#comment-13436901 ]

Anil Gupta commented on HBASE-6509:

Hi Alex,

I have a question about this filter. I have a similar filtering requirement that would be
an extension of FuzzyRowFilter.
Suppose I have rowkeys with the structure userid_actionid, where userid is 6 digits
and actionid is 5 digits. I would like to get all the rows with an actionid between
00200 and 00350. With the current FuzzyRowFilter I can search for all the rows with a particular actionid.
Instead of searching for a particular actionid, I would like to search for a range of actionids.

Does this use case sound like an extension of the current FuzzyRowFilter? Can I run this kind
of filter on HBase 0.92 without making any significant updates to the cluster? If I develop this
kind of filter, what is needed to run it on all the RSs?
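
To make the requested check concrete, here is a minimal sketch in plain Java of a range test on the actionid segment. It is not an HBase filter and does no fast-forwarding. The class name, method name, and the inclusive bounds are assumptions for illustration; it relies on the fixed-width, zero-padded layout described above, where string order and numeric order agree.

```java
// Hypothetical helper for illustration only; not part of HBase.
final class ActionIdRangeCheck {
    static final int USER_ID_LEN = 6;   // userid is 6 digits (from the message above)
    static final int ACTION_ID_LEN = 5; // actionid is 5 digits (from the message above)

    /**
     * Returns true if the actionid segment of a userid_actionid rowkey lies in
     * [low, high]. Inclusive bounds are an assumption. Because the segment is
     * fixed-width and zero-padded, lexicographic comparison matches numeric order.
     */
    static boolean actionIdInRange(String rowKey, String low, String high) {
        int start = USER_ID_LEN + 1; // skip the userid and the '_' separator
        if (rowKey.length() < start + ACTION_ID_LEN) {
            return false; // key is too short to contain an actionid
        }
        String actionId = rowKey.substring(start, start + ACTION_ID_LEN);
        return actionId.compareTo(low) >= 0 && actionId.compareTo(high) <= 0;
    }
}
```

For example, `ActionIdRangeCheck.actionIdInRange("123456_00275", "00200", "00350")` evaluates the segment `00275` against the two bounds.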

> Implement fast-forwarding FuzzyRowFilter to allow filtering rows e.g. by "???alex?b"
> ------------------------------------------------------------------------------------
>                 Key: HBASE-6509
>                 URL: https://issues.apache.org/jira/browse/HBASE-6509
>             Project: HBase
>          Issue Type: New Feature
>          Components: filters
>            Reporter: Alex Baranau
>            Assignee: Alex Baranau
>            Priority: Minor
>             Fix For: 0.96.0
>         Attachments: HBASE-6509_1.patch, HBASE-6509_2.patch, HBASE-6509_3.patch, HBASE-6509.patch
> Implement fuzzy row key filter to allow fetching records e.g. by this criterion: "???alex?b".
> This seems to be very useful as an alternative way to select records by row key by specifying
> a part of the key that is not the prefix. Due to the fast-forwarding nature of the filter, in many
> situations this helps to avoid heavy full-table scans.
> This is especially effective when you have a composite row key and (some of) its parts
> have fixed length. E.g. with a key of the format userId_actionId_time, given that the userId and
> actionId lengths are fixed, one can select user actions of a specific type using a fuzzy row key
> by specifying the mask "????_myaction". Given the fast-forwarding nature of the filter, this will
> usually work much faster than doing a whole-table scan with any of the existing server-side filters.
> In many cases this can work as a secondary-indexing alternative.
> Many times users implement it as a custom filter, and many times they just don't know this
> is possible. Let's add it to the common codebase.
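
The matching rule in the description can be sketched in plain Java as follows. This is only an illustration of how a pattern such as "????_myaction" is compared against a row key. It does not use the HBase API and does not implement the fast-forwarding (seeking to the next possible match) that the issue proposes. The class name, method names, and the use of '?' as the wildcard byte are assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; this is not the HBase FuzzyRowFilter class.
final class FuzzyKeyMatcher {
    private static final byte WILDCARD = (byte) '?';

    /**
     * Returns true if every non-wildcard byte of the pattern equals the byte at
     * the same position in the row key. Bytes of the row key beyond the end of
     * the pattern are not constrained.
     */
    static boolean matches(byte[] rowKey, byte[] pattern) {
        if (rowKey.length < pattern.length) {
            return false; // key is too short to cover all fixed positions
        }
        for (int i = 0; i < pattern.length; i++) {
            if (pattern[i] != WILDCARD && pattern[i] != rowKey[i]) {
                return false; // a fixed position differs
            }
        }
        return true;
    }

    /** Convenience overload for string keys; encodes both arguments as UTF-8. */
    static boolean matches(String rowKey, String pattern) {
        return matches(rowKey.getBytes(StandardCharsets.UTF_8),
                       pattern.getBytes(StandardCharsets.UTF_8));
    }
}
```

With this sketch, a key such as "u001_myaction_1345" matches the mask "????_myaction" because only positions 4 through 12 are compared. A check like this visits every row, so the speedup described in the issue comes from the filter's ability to skip ahead, which is not shown here.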
