From "chunhui shen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-6618) Implement FuzzyRowFilter with ranges support
Date Thu, 10 Apr 2014 02:03:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964897#comment-13964897 ]

chunhui shen edited comment on HBASE-6618 at 4/10/14 2:02 AM:
--------------------------------------------------------------

bq.you have to somehow define how to put "?" if I want it as "normal byte".
We could put '\' before a '?' to mark it as a normal (literal) '?' byte.

My suggestion is that the user could construct a FuzzyRowFilter directly from a readable String,
e.g. {noformat}???11??AA\x00??\?{noformat}
We would use Bytes.toBytesBinary to convert the string to bytes and then parse the bytes:
if a byte is '?', mark it as a non-fixed byte; if a byte is '\', skip it and take the next
byte as a normal byte; and so on.

Of course, if the user wants '\x00' to be treated as 4 separate bytes, the above gives the wrong result.
For this case, we should also support constructing FuzzyRowFilter from a readable byte array.
For example, {noformat}???11??AA\x00??\?{noformat} =>
byte[0]='?'
byte[1]='?'
byte[2]='?'
byte[3]='1'
byte[4]='1'
byte[5]='?'
byte[6]='?'
byte[7]='A'
byte[8]='A'
byte[9]=0
byte[10]='?'
byte[11]='?'
byte[12]='\'
byte[13]='?'
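
The parsing step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: FuzzyPatternSketch, Parsed and parse are hypothetical names, not HBase API; the input is assumed to be the byte array already produced by Bytes.toBytesBinary (as listed above); and the mask is assumed to follow the existing FuzzyRowFilter convention of 0 for a fixed byte and 1 for a non-fixed byte.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical helper (not part of HBase): splits a pattern into row key and mask. */
final class FuzzyPatternSketch {

    /** Row key bytes plus a mask; assumed convention: 0 = fixed, 1 = non-fixed. */
    static final class Parsed {
        final byte[] key;
        final byte[] mask;

        Parsed(byte[] key, byte[] mask) {
            this.key = key;
            this.mask = mask;
        }
    }

    static Parsed parse(byte[] pattern) {
        ByteArrayOutputStream key = new ByteArrayOutputStream();
        ByteArrayOutputStream mask = new ByteArrayOutputStream();
        for (int i = 0; i < pattern.length; i++) {
            byte b = pattern[i];
            if (b == '\\' && i + 1 < pattern.length) {
                // Escape: skip the backslash and keep the next byte as a fixed byte.
                key.write(pattern[++i]);
                mask.write(0);
            } else if (b == '?') {
                // Unescaped '?': non-fixed position; the key byte is only a placeholder.
                key.write(0);
                mask.write(1);
            } else {
                // Any other byte (including a trailing lone backslash) is fixed.
                key.write(b);
                mask.write(0);
            }
        }
        return new Parsed(key.toByteArray(), mask.toByteArray());
    }

    public static void main(String[] args) {
        // The 14 bytes listed above, i.e. the pattern after hex-escape decoding.
        byte[] pattern = {'?', '?', '?', '1', '1', '?', '?', 'A', 'A', 0, '?', '?', '\\', '?'};
        Parsed p = parse(pattern);
        System.out.println("key  = " + Arrays.toString(p.key));
        System.out.println("mask = " + Arrays.toString(p.mask));
    }
}
```

With this sketch the 14 input bytes become a 13-byte key and mask, because the escaped '?' at the end turns into a single fixed byte. Note that it does not address the '\x00'-as-4-bytes case, since the hex escape is already decoded before parsing.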

Correct me if something is wrong :)





> Implement FuzzyRowFilter with ranges support
> --------------------------------------------
>
>                 Key: HBASE-6618
>                 URL: https://issues.apache.org/jira/browse/HBASE-6618
>             Project: HBase
>          Issue Type: New Feature
>          Components: Filters
>            Reporter: Alex Baranau
>            Assignee: Alex Baranau
>            Priority: Minor
>             Fix For: 0.99.0
>
>         Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch,
HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch
>
>
> Apart from the current ability to specify a fuzzy row filter, e.g. ????_0004 for the <userId_actionId>
format (where 0004 is the actionId), it would be great to also be able to specify
a "fuzzy range", e.g. ????_0004, ..., ????_0099.
> See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65
> Note: currently it is possible to provide multiple fuzzy row rules to the existing FuzzyRowFilter,
but when the range is big (contains thousands of values) this is not efficient.
> The filter should perform efficient fast-forwarding during the scan (this is what distinguishes
it from a regex row filter).
> While such functionality may seem like a proper fit for a custom filter (i.e. one not included
in the standard filter set), it looks like the filter may be very reusable. We may judge based
on the implementation that will hopefully be added.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
