Date: Thu, 10 Apr 2014 02:36:20 +0000 (UTC)
From: "Alex Baranau (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

    [ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964919#comment-13964919 ]

Alex Baranau commented on HBASE-6618:
-------------------------------------

Got it, thanks.

bq. We could use '\' before '?' to define the normal byte '?' and then \ before \ if we need \. And so on.

I mean, we can do that. But handling Strings _with special chars_ when the implementation expects _any_ bytes as normal input is at times not trivial. If we really want that, we should make it part of the API as some form of standard, and then we could use it everywhere, e.g. in Puts, etc. I am not sure we want to create a specific format for just one filter.

What are your thoughts on a builder?
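To make the builder idea concrete, here is a minimal sketch. `FuzzyKeyBuilder` and its `any`/`fixed` methods are hypothetical names, not an existing HBase API. The sketch assumes the mask convention documented for the existing FuzzyRowFilter (mask byte 0 = this position must match, 1 = any byte allowed). Because each kind of position gets its own method call, nothing needs escaping: a '?' passed to `fixed` is just a literal byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical builder for a fuzzy row key. Produces the (key, mask) pair using the
 * FuzzyRowFilter mask convention: 0 = fixed position, 1 = any byte.
 */
final class FuzzyKeyBuilder {
    private final ByteArrayOutputStream key = new ByteArrayOutputStream();
    private final ByteArrayOutputStream mask = new ByteArrayOutputStream();

    /** Appends `width` positions that may hold any byte (the '?' of "????_0004"). */
    FuzzyKeyBuilder any(int width) {
        for (int i = 0; i < width; i++) {
            key.write(0);   // placeholder value; ignored because the mask marks it non-fixed
            mask.write(1);
        }
        return this;
    }

    /** Appends bytes that must match exactly; no escaping needed, any byte value is allowed. */
    FuzzyKeyBuilder fixed(byte... bytes) {
        for (byte b : bytes) {
            key.write(b);
            mask.write(0);
        }
        return this;
    }

    /** Convenience overload: the String's UTF-8 bytes, all treated as fixed. */
    FuzzyKeyBuilder fixed(String s) {
        return fixed(s.getBytes(StandardCharsets.UTF_8));
    }

    byte[] key() { return key.toByteArray(); }

    byte[] mask() { return mask.toByteArray(); }
}
```

For example, `new FuzzyKeyBuilder().any(4).fixed("_0004")` describes the `????_0004` rule from the issue without any special characters. The two arrays could then be wrapped in a `Pair<byte[], byte[]>` and passed in the list taken by the existing `FuzzyRowFilter(List<Pair<byte[], byte[]>>)` constructor. This sketch only covers the existing exact/any semantics, not the ranges this issue proposes.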
A builder seems like it could help avoid all those special chars while still keeping things very human-friendly. We could also allow Strings with the \x notation, as I mentioned. The difference is that there is no pain with special chars, and the API is easy for users to follow...

> Implement FuzzyRowFilter with ranges support
> --------------------------------------------
>
>                 Key: HBASE-6618
>                 URL: https://issues.apache.org/jira/browse/HBASE-6618
>             Project: HBase
>          Issue Type: New Feature
>          Components: Filters
>            Reporter: Alex Baranau
>            Assignee: Alex Baranau
>            Priority: Minor
>             Fix For: 0.99.0
>
>         Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch
>
>
> Apart from the current ability to specify a fuzzy row filter, e.g. for a format like ????_0004 (where 0004 is the actionId), it would be great to also be able to specify a "fuzzy range", e.g. ????_0004, ..., ????_0099.
> See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65
> Note: it is currently possible to provide multiple fuzzy row rules to the existing FuzzyRowFilter, but when the range is big (contains thousands of values) this is not efficient.
> The filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from a regex row filter).
> While such functionality may seem like a proper fit for a custom filter (i.e. not included in the standard filter set), it looks like the filter may be very reusable. We may judge based on the implementation that will hopefully be added.
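The fast-forwarding requirement can be illustrated with a small, self-contained sketch. It is not taken from the attached patches; it assumes that a pattern is a sequence of fixed-width segments, each constrained to an inclusive, unsigned, lexicographic byte range: `????` becomes a 4-byte segment allowing 0x00..0xFF per byte, `_` a one-byte fixed segment, and `0004..0099` a four-byte range. `nextMatchingPrefix` returns the smallest matching key prefix that is not less than the current row, i.e. the kind of row a filter could supply as a seek hint (cf. `Filter.ReturnCode.SEEK_NEXT_USING_HINT`) instead of evaluating thousands of separate fuzzy rules. `FuzzyRangeSketch`, `Segment`, and all method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: row-key pattern of fixed-width segments, each limited to [lo, hi]. */
final class FuzzyRangeSketch {

    /** One fixed-width segment; its slice of the row key must lie in [lo, hi] (unsigned). */
    static final class Segment {
        final byte[] lo;
        final byte[] hi;

        Segment(byte[] lo, byte[] hi) {
            if (lo.length != hi.length || compare(lo, hi) > 0) {
                throw new IllegalArgumentException("lo and hi must have equal length and lo <= hi");
            }
            this.lo = lo;
            this.hi = hi;
        }

        /** A '?'-style wildcard of the given width: every byte may be 0x00..0xFF. */
        static Segment any(int width) {
            byte[] hi = new byte[width];
            Arrays.fill(hi, (byte) 0xFF);
            return new Segment(new byte[width], hi);
        }

        static Segment range(String lo, String hi) {
            return new Segment(lo.getBytes(StandardCharsets.US_ASCII), hi.getBytes(StandardCharsets.US_ASCII));
        }

        static Segment fixed(String value) { return range(value, value); }
    }

    private final List<Segment> segments;
    private final int length;

    FuzzyRangeSketch(Segment... segments) {
        this.segments = Arrays.asList(segments);
        int total = 0;
        for (Segment s : segments) total += s.lo.length;
        this.length = total;
    }

    /** Unsigned lexicographic comparison of two equal-length arrays. */
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return 0;
    }

    /** Treats v as an unsigned big-endian number and adds one; null on overflow. */
    static byte[] increment(byte[] v) {
        byte[] r = v.clone();
        for (int i = r.length - 1; i >= 0; i--) {
            if (r[i] != (byte) 0xFF) { r[i]++; return r; }
            r[i] = 0;
        }
        return null;
    }

    /** Smallest matching prefix >= the first `length` bytes of row (zero-padded), or null if none. */
    byte[] nextMatchingPrefix(byte[] row) {
        byte[][] parts = new byte[segments.size()][];
        int off = 0;
        for (int i = 0; i < segments.size(); i++) {
            Segment s = segments.get(i);
            byte[] cur = new byte[s.lo.length];
            int n = Math.min(cur.length, row.length - off);
            if (n > 0) System.arraycopy(row, off, cur, 0, n);
            off += cur.length;
            if (compare(cur, s.lo) < 0) {       // below the range: jump up to lo
                parts[i] = s.lo;
                return assemble(parts, i + 1);
            }
            if (compare(cur, s.hi) > 0) {       // above the range: carry into an earlier segment
                for (int j = i - 1; j >= 0; j--) {
                    byte[] inc = increment(parts[j]);
                    if (inc != null && compare(inc, segments.get(j).hi) <= 0) {
                        parts[j] = inc;
                        return assemble(parts, j + 1);
                    }
                }
                return null;                    // no later row can match: the scan may stop
            }
            parts[i] = cur;                     // inside the range: keep these bytes
        }
        return assemble(parts, segments.size());
    }

    /** Keeps parts[0..keep) and fills every later segment with its lower bound. */
    private byte[] assemble(byte[][] parts, int keep) {
        byte[] out = new byte[length];
        int off = 0;
        for (int i = 0; i < segments.size(); i++) {
            byte[] p = i < keep ? parts[i] : segments.get(i).lo;
            System.arraycopy(p, 0, out, off, p.length);
            off += p.length;
        }
        return out;
    }

    /** A row matches if it is long enough and its own prefix is the next match. */
    boolean matches(byte[] row) {
        if (row.length < length) return false;
        byte[] next = nextMatchingPrefix(row);
        return next != null && Arrays.equals(next, Arrays.copyOf(row, length));
    }
}
```

With the pattern `any(4), fixed("_"), range("0004", "0099")`, the row `abcd_0100` is rejected and the returned hint is `abce_0004`, so every row in between is skipped in one seek. Modelling '?' as a full-range segment lets one code path handle both wildcards and ranges; the carry step works like incrementing a multi-digit number, and a null result signals that no later row can match.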