From: Graeme Wallace
To: "user@hbase.apache.org"
Date: Mon, 8 Apr 2013 14:10:23 -0500
Subject: Re: Best way to query multiple sets of rows

Everyone, thanks for the replies.

I have a follow-up question on Filters:

    boolean filterRowKey(byte [] buffer, int offset, int length)

If I implement this to include or exclude a row based on my sets of row key pairs, how much disk I/O is involved on each region server? Will it just read row keys (hopefully from cache) until I say I need a row, then read the KeyValues for the columns I want and pass them into filterKeyValue()? Is that the most efficient way of doing it?

I don't see a way of hinting at the next row I'm interested in (I'm assuming row keys are ordered?), so does that mean that for each region all the row keys are passed into the filter?
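A minimal sketch of the per-row decision such a filter would make, written in plain Java with no HBase dependency. RowKeyRanges, includes and nextStartAfter are hypothetical names for illustration only. The sketch assumes the row key pairs are half-open [start, stop) ranges, like a Scan's start and stop rows, and compares keys as unsigned bytes in lexicographic order, which is how HBase sorts row keys. In an HBase Filter, filterRowKey returns true to skip a row, so it would return the negation of includes(...). The nextStartAfter(...) result is the row a filter could use to build the hint it returns from getNextKeyHint when filterKeyValue answers ReturnCode.SEEK_NEXT_USING_HINT, so the scan can seek past a gap rather than offering every row key to the filter.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT an HBase class: the per-row decision a custom
// multi-range filter (an "enhanced RowFilter" holding several start/stop
// pairs) might make. Each range is half-open, [start, stop).
final class RowKeyRanges {
  private final List<byte[]> starts = new ArrayList<>(); // sorted, non-overlapping
  private final List<byte[]> stops = new ArrayList<>();  // exclusive stop per start

  // pairs = {start0, stop0, start1, stop1, ...}, in any order.
  RowKeyRanges(byte[]... pairs) {
    if (pairs.length % 2 != 0) {
      throw new IllegalArgumentException("expected start/stop pairs");
    }
    List<byte[][]> ranges = new ArrayList<>();
    for (int i = 0; i < pairs.length; i += 2) {
      ranges.add(new byte[][] {pairs[i], pairs[i + 1]});
    }
    // Sort by start key, comparing bytes as unsigned (HBase's row key order).
    ranges.sort((x, y) -> Arrays.compareUnsigned(x[0], y[0]));
    // Merge overlapping or touching ranges so every key falls in at most one.
    for (byte[][] r : ranges) {
      int last = stops.size() - 1;
      if (last >= 0 && Arrays.compareUnsigned(r[0], stops.get(last)) <= 0) {
        if (Arrays.compareUnsigned(r[1], stops.get(last)) > 0) {
          stops.set(last, r[1]);
        }
      } else {
        starts.add(r[0]);
        stops.add(r[1]);
      }
    }
  }

  // Binary search: number of ranges whose start key is <= row.
  private int startsAtOrBefore(byte[] row) {
    int lo = 0;
    int hi = starts.size();
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (Arrays.compareUnsigned(starts.get(mid), row) <= 0) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // True if row lies inside one of the ranges (a filter would keep it).
  boolean includes(byte[] row) {
    int i = startsAtOrBefore(row);
    return i > 0 && Arrays.compareUnsigned(row, stops.get(i - 1)) < 0;
  }

  // Start of the first range beginning after row, or null if none remain.
  // This is the "next row I'm interested in" that a seek hint would target.
  byte[] nextStartAfter(byte[] row) {
    int i = startsAtOrBefore(row);
    return i < starts.size() ? starts.get(i) : null;
  }

  static byte[] b(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // rowid in [A, B) or [C, D) or [E, F), given out of order on purpose.
    RowKeyRanges ranges = new RowKeyRanges(b("A"), b("B"), b("E"), b("F"), b("C"), b("D"));
    for (String row : new String[] {"A", "Aa", "B", "Bz", "C", "G"}) {
      byte[] next = ranges.nextStartAfter(b(row));
      System.out.println(row + " included=" + ranges.includes(b(row))
          + " nextStart=" + (next == null ? "none" : new String(next, StandardCharsets.UTF_8)));
    }
  }
}
```

Because the ranges are half-open, row "B" is reported as excluded, with "C" as the next start to seek to. Sorting and merging once up front keeps each per-row check to a binary search, which matters when the filter sees many rows.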
On Mon, Apr 8, 2013 at 1:39 PM, Ted Yu wrote:

> For Scan:
>
> * To add a filter, execute {@link
> #setFilter(org.apache.hadoop.hbase.filter.Filter) setFilter}.
>
> Take a look at RowFilter:
>
> * This filter is used to filter based on the key. It takes an operator
> * (equal, greater, not equal, etc) and a byte [] comparator for the row,
>
> You can enhance RowFilter so that you can specify one or more pairs of
> start and end rows.
>
> Cheers
>
> On Mon, Apr 8, 2013 at 11:30 AM, Graeme Wallace <graeme.wallace@farecompare.com> wrote:
>
> > I thought a Scan could only cope with one start row and one end row?
> >
> > On Mon, Apr 8, 2013 at 1:27 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> >
> > > Hi Graeme,
> > >
> > > Scans are the right way to do that.
> > >
> > > They will give you back all the data you need, chunk by chunk. Then
> > > you have to iterate over the data to do what you want with it.
> > >
> > > What was your expectation? I'm not sure I understand your "so that I
> > > don't have to issue sequential Scans".
> > >
> > > jM
> > >
> > > 2013/4/8 Graeme Wallace:
> > > > Hi,
> > > >
> > > > Maybe there is an obvious way, but I'm not seeing it.
> > > >
> > > > I need to query HBase for multiple chunks of data, that is, something
> > > > equivalent to
> > > >
> > > >     select columns
> > > >     from table
> > > >     where rowid between A and B
> > > >        or rowid between C and D
> > > >        or rowid between E and F
> > > >        etc.
> > > >
> > > > in SQL.
> > > >
> > > > What's the best way to go about doing this so that I don't have to
> > > > issue sequential Scans?
> > > >
> > > > --
> > > > Graeme Wallace
> > > > CTO
> > > > FareCompare.com
> > > > O: 972 588 1414
> > > > M: 214 681 9018
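Jean-Marc's suggestion (scans, read back chunk by chunk) works because row keys are kept sorted, so each "rowid between X and Y" clause maps to one contiguous slice of the table. A minimal plain-Java sketch of that idea, assuming a TreeMap as a stand-in for a table; it makes no HBase client calls, and RangeUnionQuery and query are hypothetical names. Both ends are inclusive to match SQL BETWEEN, and the ranges are assumed not to overlap. String comparison matches HBase's unsigned byte order only for plain ASCII keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model, NOT HBase client code: a TreeMap stands in for a table
// whose rows are kept sorted by row key, and each BETWEEN clause is read as
// one contiguous slice, like one Scan per range issued one after another.
final class RangeUnionQuery {
  // ranges = {from0, to0, from1, to1, ...}; both ends inclusive, like SQL BETWEEN.
  // Assumes the ranges do not overlap (overlaps would return rows twice).
  static List<String> query(NavigableMap<String, String> table, String... ranges) {
    if (ranges.length % 2 != 0) {
      throw new IllegalArgumentException("expected from/to pairs");
    }
    List<String> rows = new ArrayList<>();
    for (int i = 0; i < ranges.length; i += 2) {
      // subMap is a view over just the keys from..to; iterating it does not
      // walk the rest of the map.
      for (Map.Entry<String, String> e : table.subMap(ranges[i], true, ranges[i + 1], true).entrySet()) {
        rows.add(e.getKey() + "=" + e.getValue());
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    NavigableMap<String, String> table = new TreeMap<>();
    for (String key : new String[] {"a1", "b2", "c3", "d4", "e5", "f6"}) {
      table.put(key, "v" + key);
    }
    // where rowid between 'a' and 'b9' or rowid between 'e' and 'e9'
    System.out.println(query(table, "a", "b9", "e", "e9"));
  }
}
```

Running main prints [a1=va1, b2=vb2, e5=ve5]: rows c3, d4 and f6 fall outside both ranges and are never visited.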