Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (athena.apache.org: domain of
 graeme.wallace@farecompare.com designates 74.125.149.75 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CALte62wL4Xreps=i2M49wPMxtUFKSF513jbApQjj7yABHdmFBA@mail.gmail.com>
References: 
 <CAP0_YE_uNfWBK0WCtMvy-h4WOnvzwj-7irNwfGgCamgBevV8BQ@mail.gmail.com>
	<CAPQV63U+q9nSthNT4TMP5TEU=t3DYwrSDQgkqzaDbjys1+kWNQ@mail.gmail.com>
	<CAP0_YE9q26KCOth3-AByN=vqQ9Foiq_sRGnTtWy18BGRqJsfGw@mail.gmail.com>
	<CALte62wL4Xreps=i2M49wPMxtUFKSF513jbApQjj7yABHdmFBA@mail.gmail.com>
Date: Mon, 8 Apr 2013 14:10:23 -0500
Message-ID: 
 <CAP0_YE_dQFBriEu0Afsm2z=NQpYJSVCsnwB2nk=wEtb14VpM9Q@mail.gmail.com>
Subject: Re: Best way to query multiple sets of rows
From: Graeme Wallace <graeme.wallace@farecompare.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Content-Type: multipart/alternative; boundary=20cf302666884d985904d9de301c

--20cf302666884d985904d9de301c
Content-Type: text/plain; charset=ISO-8859-1

Everyone - thanks for the replies.

I have a follow-up question on filters.

boolean filterRowKey(byte [] buffer, int offset, int length)

Suppose I implement this to include or exclude each row based on my set of
row-key (start, end) pairs.
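A minimal sketch of the per-row decision such a filter would make, in plain Java with no HBase dependency. It assumes the pairs are half-open [start, stop) ranges that do not overlap, and mirrors the `filterRowKey(byte[] buffer, int offset, int length)` arguments and the convention that returning true excludes the row. The class `RowRangeCheck` and its helpers are hypothetical, not part of HBase. Row keys are compared as unsigned bytes, which is how HBase orders them.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: decides whether a row key falls inside one of several
 * [start, stop) ranges. Assumes the ranges do not overlap.
 */
class RowRangeCheck {

    /** One half-open range of row keys: start inclusive, stop exclusive. */
    static final class Range {
        final byte[] start;
        final byte[] stop;
        Range(byte[] start, byte[] stop) { this.start = start; this.stop = stop; }
    }

    private final List<Range> ranges;

    RowRangeCheck(List<Range> input) {
        ranges = new ArrayList<>(input);
        // Sort by start key so the lookup below can binary-search.
        ranges.sort((x, y) -> compare(x.start, 0, x.start.length, y.start, 0, y.start.length));
    }

    /** Unsigned lexicographic comparison of two byte slices. */
    static int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff;
            int y = b[bOff + i] & 0xff;
            if (x != y) return x - y;
        }
        return aLen - bLen;
    }

    /** Returns true to EXCLUDE the row, matching filterRowKey's convention. */
    boolean filterRowKey(byte[] buffer, int offset, int length) {
        // Find the last range whose start is <= the row key.
        int lo = 0, hi = ranges.size() - 1, candidate = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            byte[] s = ranges.get(mid).start;
            if (compare(s, 0, s.length, buffer, offset, length) <= 0) {
                candidate = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (candidate < 0) return true;              // sorts before every range
        byte[] stop = ranges.get(candidate).stop;
        return compare(buffer, offset, length, stop, 0, stop.length) >= 0; // at or past stop
    }

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        RowRangeCheck check = new RowRangeCheck(List.of(
            new Range(b("C"), b("D")),
            new Range(b("A"), b("B")),
            new Range(b("E"), b("F"))));
        for (String row : new String[] {"A", "A5", "B", "C1", "G"}) {
            byte[] key = b(row);
            System.out.println(row + " excluded=" + check.filterRowKey(key, 0, key.length));
        }
    }
}
```

Because the ranges are sorted and disjoint, only the range with the greatest start at or below the row can contain it, so each check costs one binary search rather than a scan over all pairs.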

How much disk I/O does that involve on each region server? Will it read only
the row keys (hopefully from cache) until I say I need a row, and then read
the KeyValues for the columns I want and pass them into filterKeyValue()?

Is that the most efficient way of doing it? I don't see a way of hinting at
the next row I'm interested in (I'm assuming row keys are kept sorted?), so
does that mean every row key in each region is passed into the filter?
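HBase does store row keys in sorted (unsigned lexicographic byte) order, so a filter can in principle skip gaps instead of testing every key. The filter API has a seek-hint mechanism for this (filterKeyValue returning ReturnCode.SEEK_NEXT_USING_HINT together with getNextKeyHint; later releases renamed some of these methods), so treat those names as pointers to check against your HBase version. The plain-Java sketch below only computes the row such a hint would point at; the class `NextRowHint` and method `nextHint` are hypothetical, and the ranges are assumed to be half-open and non-overlapping.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: given sorted, non-overlapping [start, stop) ranges,
 * compute where a scanner could seek next instead of reading every row key.
 */
class NextRowHint {

    static final class Range {
        final byte[] start;
        final byte[] stop; // exclusive
        Range(byte[] start, byte[] stop) { this.start = start; this.stop = stop; }
    }

    private final List<Range> ranges;

    NextRowHint(List<Range> input) {
        ranges = new ArrayList<>(input);
        ranges.sort((x, y) -> compare(x.start, y.start));
    }

    /** Unsigned lexicographic comparison, the order HBase uses for row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return a.length - b.length;
    }

    /**
     * Returns the row to seek to:
     *  - the row itself if it is inside some range (no skip needed),
     *  - the start of the next range if the row falls in a gap,
     *  - null if the row is past the last range (the scan could stop).
     */
    byte[] nextHint(byte[] row) {
        for (Range r : ranges) {                       // linear for clarity
            if (compare(row, r.stop) >= 0) continue;   // row is at or past this range
            return compare(row, r.start) >= 0 ? row : r.start;
        }
        return null;
    }

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        NextRowHint hints = new NextRowHint(List.of(
            new Range(b("A"), b("B")),
            new Range(b("C"), b("D")),
            new Range(b("E"), b("F"))));
        for (String row : new String[] {"0", "A5", "B2", "D", "Z"}) {
            byte[] h = hints.nextHint(b(row));
            System.out.println(row + " -> "
                + (h == null ? "done" : new String(h, StandardCharsets.UTF_8)));
        }
    }
}
```

With the sample ranges, `B2` maps to `C`, `D` maps to `E`, and `Z` maps to `done`: rows in a gap jump straight to the next wanted range rather than being read one by one.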


On Mon, Apr 8, 2013 at 1:39 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> For Scan:
>
>  * To add a filter, execute {@link #setFilter(org.apache.hadoop.hbase.filter.Filter) setFilter}.
>
> Take a look at RowFilter:
>
>  * This filter is used to filter based on the key. It takes an operator
>  * (equal, greater, not equal, etc) and a byte [] comparator for the row,
>
> You can enhance RowFilter so that you may specify the pair(s) of start and
> end rows.
>
> Cheers
>
> On Mon, Apr 8, 2013 at 11:30 AM, Graeme Wallace <graeme.wallace@farecompare.com> wrote:
>
> > I thought a Scan could only handle one start row and one end row?
> >
> >
> > On Mon, Apr 8, 2013 at 1:27 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> >
> > > Hi Graeme,
> > >
> > > The scans are the right way to do that.
> > >
> > > They will give you back all the data you need, chunk by chunk. Then
> > > you have to iterate over the data to do what you want with it.
> > >
> > > What was your expectation? I'm not sure I understand your "so that I
> > > don't have to issue sequential Scans".
> > >
> > > jM
> > >
> > > 2013/4/8 Graeme Wallace <graeme.wallace@farecompare.com>:
> > > > Hi,
> > > >
> > > > Maybe there is an obvious way, but I'm not seeing it.
> > > >
> > > > I need to query HBase for multiple chunks of data, i.e. something
> > > > equivalent to
> > > >
> > > > select columns
> > > > from table
> > > > where rowid between A and B
> > > > or rowid between C and D
> > > > or rowid between E and F
> > > > etc.
> > > >
> > > > in SQL.
> > > >
> > > > What's the best way to go about doing this so that I don't have to
> > > > issue sequential Scans?
> > > >
> > > > --
> > > > Graeme Wallace
> > > > CTO
> > > > FareCompare.com
> > > > O: 972 588 1414
> > > > M: 214 681 9018
> > >
> >
>
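Whichever route is taken for the original "rowid between A and B or between C and D ..." query (Ted's suggestion of a RowRangeFilter-style filter holding start/end pairs, or one Scan per range), it helps to normalize the pairs first: sort them, merge any that overlap, and use the lowest start and highest stop as the outer Scan bounds so rows outside every range are never visited. Note that SQL BETWEEN is inclusive on both ends while an HBase Scan's stop row is exclusive; the smallest key greater than a row X is X followed by a 0x00 byte, which the hypothetical `inclusiveStop` helper uses. This is a plain-Java sketch; `RangeNormalizer` and its methods are illustrative names, not HBase API. Later HBase releases include a MultiRowRangeFilter aimed at this case, but whether it is available depends on the version in use.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: normalize several [start, stop) row-key ranges before scanning. */
class RangeNormalizer {

    static final class Range {
        final byte[] start;
        final byte[] stop; // exclusive, like a Scan stop row
        Range(byte[] start, byte[] stop) { this.start = start; this.stop = stop; }
        @Override public String toString() {
            return "[" + new String(start, StandardCharsets.UTF_8) + ", "
                 + new String(stop, StandardCharsets.UTF_8) + ")";
        }
    }

    /** Unsigned lexicographic comparison, the order HBase uses for row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return a.length - b.length;
    }

    /** Turns an inclusive upper bound (SQL BETWEEN) into an exclusive stop row. */
    static byte[] inclusiveStop(byte[] lastRow) {
        return Arrays.copyOf(lastRow, lastRow.length + 1); // appends a 0x00 byte
    }

    /** Sorts ranges by start and merges any that overlap or touch. */
    static List<Range> normalize(List<Range> input) {
        List<Range> sorted = new ArrayList<>(input);
        sorted.sort((x, y) -> compare(x.start, y.start));
        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                if (compare(r.start, last.stop) <= 0) {   // overlaps or touches previous range
                    byte[] stop = compare(r.stop, last.stop) > 0 ? r.stop : last.stop;
                    merged.set(merged.size() - 1, new Range(last.start, stop));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        List<Range> merged = normalize(List.of(
            new Range(b("E"), b("F")),
            new Range(b("A"), b("C")),
            new Range(b("B"), b("D"))));
        System.out.println(merged);                                    // [[A, D), [E, F)]
        Range first = merged.get(0);
        Range last = merged.get(merged.size() - 1);
        System.out.println("outer bounds: " + new Range(first.start, last.stop)); // [A, F)
    }
}
```

The merged list can then drive either a filter like the ones sketched above or a short sequence of per-range Scans; merging first avoids reading the overlap between [A, C) and [B, D) twice.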

--20cf302666884d985904d9de301c--