Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: hbase-dev@hadoop.apache.org
Message-ID: <4A8DA731.9030408@streamy.com>
Date: Thu, 20 Aug 2009 12:42:41 -0700
From: Jonathan Gray
To: hbase-dev@hadoop.apache.org
Subject: Re: real prefix filter
References: <4A8D43DD.6060103@b-ideas.eu> <4A8D6CF4.7080703@streamy.com> <78568af10908201150o7773ef5bl971d6c385e449587@mail.gmail.com>
In-Reply-To: <78568af10908201150o7773ef5bl971d6c385e449587@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1;
 format=flowed
Content-Transfer-Encoding: 7bit

Thanks Ryan.

Do note, this will not give you proper behavior on a Get, only on a Scan. I don't just mean the prefix+while combination; I also mean using the WhileMatchFilter at all on a Get.

Filters are allowed in Gets because you can filter on columns and values as well; they just don't make sense on row keys.

If you need an "early-out" filter with a Get, you most likely need to use a Scan instead.

This kind of confusion/inconsistency is more reason to re-implement Gets as optimized Scans...

JG

Ryan Rawson wrote:
> The expected idiom is like so:
>     scanSpec.setFilter(new WhileMatchFilter(
>         new PrefixFilter(prefix)));
>
> This is common for most filters: rather than encoding the 'stop once
> past' type of logic in each filter, it is embedded in the
> WhileMatchFilter, and all others are wrapped with it where necessary.
>
> -ryan
>
> 2009/8/20 Jonathan Gray:
>> It should, perhaps, stop once you pass the prefix. I actually thought it
>> did, but you and the code say otherwise. Doing the early-out with a Get is
>> actually not possible, so this may be why it is not implemented as such.
>>
>> However, a Scan can take both a startRow and a stopRow, so you can use
>> those to early-out instead.
>>
>> Given that filters now work with Gets, you cannot actually implement the
>> early-out within the filter. You'll have to use start/stop rows. One could
>> argue a prefix filter may not make much sense on a Get (since you must
>> explicitly specify the row), so if you'd like to raise that issue and see
>> if we could integrate an early-out in the filter, please file a JIRA.
>>
>> JG
>>
>> Matus Zamborsky wrote:
>>> Hello,
>>> I am scanning an HBase table with a Scan and I am using PrefixFilter. As I
>>> understand it, it scans the whole table and runs the filter on every row.
>>> But why does it not stop after finding a row without the desired prefix?
>>> Once the prefix no longer matches, it should return true from the
>>> filterAllRemaining call. Combined with the option of specifying the start
>>> row in the Scan object, one could very quickly filter only the rows with
>>> the desired prefix.
>>>
>>> I am using hbase 0.20 from trunk.
>>>
>>> Regards
>>>
>>> Matus Zamborsky
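
For reference, the start/stop-row approach suggested in this thread needs a stop row that sorts just past every key carrying the prefix. Below is a minimal sketch of how that bound could be computed. PrefixRange and stopRowForPrefix are hypothetical names, not part of HBase, and the sketch assumes row keys compare as unsigned bytes, that the stop row is exclusive, and that an empty stop row means "scan to the end of the table".

```java
import java.util.Arrays;

// Hypothetical helper (not an HBase class): derives an exclusive stop row
// from a row-key prefix so a scan can end as soon as the prefix is passed.
final class PrefixRange {
    private PrefixRange() {}

    /**
     * Returns the smallest key that sorts after every key starting with
     * the given prefix, assuming unsigned byte-wise ordering. Returns an
     * empty array when no such key exists (empty prefix, or all 0xFF bytes),
     * which this sketch assumes is read as "no stop row".
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        // Copy the remaining bytes and bump the last one by one.
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }
}
```

Assuming the Scan(byte[] startRow, byte[] stopRow) constructor, the result would be used roughly as new Scan(prefix, PrefixRange.stopRowForPrefix(prefix)), which bounds the scan without relying on any filter for the early-out.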