hbase-issues mailing list archives

From "Alex Baranau (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-6618) Implement FuzzyRowFilter with ranges support
Date Thu, 23 Aug 2012 04:04:42 GMT

     [ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Baranau updated HBASE-6618:
--------------------------------

    Attachment: HBASE-6618-algo-desc-bits.png
                HBASE-6618-algo.patch

Anil,

Thinking about it further, I realized that finding the row key to fast-forward to, given
any number of "range groups" in the fuzzy rules, is quite easy. It can also be done
in just *one pass* over the bytes of the given row.

I haven't had time to add this functionality to the filter itself, but I implemented the
algorithm that seems to find the row key to fast-forward to (in case you are interested in
looking at it). I added a static method for it with a small (not exhaustive) unit test, and
attached a brief description of the algorithm. I hope I'm not missing anything.

I will implement the new filter feature as the next step.
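
For readers without the patch at hand, here is a rough sketch of what a one-pass fast-forward computation can look like. It is not the algorithm from HBASE-6618-algo.patch. It assumes a simplified model in which every byte position of a fixed-length row key has its own allowed interval [lo[i], hi[i]] (a fuzzy position is 0x00..0xFF, a fixed position has lo[i] == hi[i]), which is coarser than a true multi-byte "range group". The name next_matching_key and its parameters are made up for illustration.

```python
from typing import Optional


def next_matching_key(row: bytes, lo: bytes, hi: bytes) -> Optional[bytes]:
    """Return the smallest key >= row whose byte i lies in [lo[i], hi[i]].

    Returns None when no such key exists. Keys are compared as unsigned
    bytes in lexicographic order, the same order HBase uses for row keys.
    """
    if not (len(row) == len(lo) == len(hi)):
        raise ValueError("row, lo and hi must have the same length")
    if any(l > h for l, h in zip(lo, hi)):
        raise ValueError("every lo[i] must be <= hi[i]")

    bump = None  # rightmost in-range position seen so far that can still grow
    for i, b in enumerate(row):
        if b < lo[i]:
            # Raising this byte to lo[i] already makes the key larger than
            # row, so every later position takes its minimum.
            return row[:i] + lo[i:]
        if b > hi[i]:
            # This byte cannot be lowered without going below row, so an
            # earlier byte has to be incremented instead.
            if bump is None:
                return None
            return row[:bump] + bytes([row[bump] + 1]) + lo[bump + 1:]
        if b < hi[i]:
            bump = i
    return row  # row itself satisfies every position


# Hypothetical rule: 4 fuzzy bytes, a fixed "_", then digits 0000..0099.
lo = b"\x00\x00\x00\x00_0000"
hi = b"\xff\xff\xff\xff_0099"
print(next_matching_key(b"abcd_0150", lo, hi))  # b'abce_0000'
```

The single loop only has to remember one index (the last position that could still be incremented), which is why one pass over the row bytes is enough in this simplified model.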
                
> Implement FuzzyRowFilter with ranges support
> --------------------------------------------
>
>                 Key: HBASE-6618
>                 URL: https://issues.apache.org/jira/browse/HBASE-6618
>             Project: HBase
>          Issue Type: New Feature
>          Components: filters
>            Reporter: Alex Baranau
>            Priority: Minor
>         Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch
>
>
> Apart from the current ability to specify a fuzzy row filter, e.g. ????_0004 for the
> <userId_actionId> format (where 0004 is the actionId), it would be great to also be able
> to specify a "fuzzy range", e.g. ????_0004, ..., ????_0099.
> See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65
> Note: it is currently possible to provide multiple fuzzy row rules to the existing
> FuzzyRowFilter, but when the range is big (contains thousands of values) this is not efficient.
> The filter should perform efficient fast-forwarding during the scan (this is what
> distinguishes it from a regex row filter).
> While such functionality may seem like a proper fit for a custom filter (i.e. one not
> included in the standard filter set), it looks like the filter may be very reusable. We can
> judge based on the implementation that will hopefully be added.
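
To make the "fuzzy range" request above concrete, here is a minimal sketch of the matching rule only, with no fast-forwarding. It is not HBase code: the segment tuples ("any", "fixed", "range") and the function matches_fuzzy_range are invented for illustration, and row keys are assumed to have a fixed layout.

```python
def matches_fuzzy_range(row: bytes, segments) -> bool:
    """Check a row key against a rule made of consecutive segments.

    Hypothetical segment forms:
      ("any", n)        n bytes that may hold any value (the '?' positions)
      ("fixed", value)  bytes that must equal value exactly
      ("range", lo, hi) bytes that must satisfy lo <= chunk <= hi, compared
                        lexicographically as unsigned bytes
    """
    pos = 0
    for seg in segments:
        kind = seg[0]
        width = seg[1] if kind == "any" else len(seg[1])
        chunk = row[pos:pos + width]
        if len(chunk) != width:
            return False  # row is too short for this rule
        if kind == "fixed" and chunk != seg[1]:
            return False
        if kind == "range" and not (seg[1] <= chunk <= seg[2]):
            return False
        pos += width
    return pos == len(row)  # reject rows with trailing bytes


# ????_0004 ... ????_0099 expressed as a single rule
rule = [("any", 4), ("fixed", b"_"), ("range", b"0004", b"0099")]
print(matches_fuzzy_range(b"ab12_0042", rule))  # True
print(matches_fuzzy_range(b"ab12_0100", rule))  # False
```

One rule with a range segment stands in for the 96 separate fuzzy rules (????_0004 through ????_0099) that would otherwise have to be listed one by one.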

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
