hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Best way to query multiple sets of rows
Date Mon, 08 Apr 2013 20:55:27 GMT
I forgot to mention that, assuming A is the smallest row key, you can use
the following method (of Scan) to narrow the rows scanned:

  public Scan setStartRow(byte [] startRow)

Another related feature is HBASE-6509: Implement fast-forwarding
FuzzyRowFilter to allow filtering rows e.g. by "???alex?b"
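A minimal sketch of how the "smallest row key" for setStartRow (and, symmetrically, the largest key for a stop row) could be derived when there are several (start, stop) pairs. It assumes each range has an inclusive start and an exclusive stop, and that keys are ordered as unsigned bytes, which is how HBase orders row keys. `RowRangeBounds`, `Range`, `boundingStart`, `boundingStop` and `utf8` are hypothetical names for illustration, not HBase API; the sketch uses only the JDK (Java 9+ for `Arrays.compareUnsigned` and `List.of`).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the HBase API: finds the overall
// [smallest start, largest stop) span of several row ranges so that a
// single scan could be narrowed to it.
public class RowRangeBounds {

    /** One row range: start is inclusive, stop is exclusive. */
    public static final class Range {
        final byte[] start;
        final byte[] stop;

        public Range(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /** Smallest start row, comparing keys as unsigned bytes. */
    public static byte[] boundingStart(List<Range> ranges) {
        byte[] min = ranges.get(0).start;
        for (Range r : ranges) {
            if (Arrays.compareUnsigned(r.start, min) < 0) {
                min = r.start;
            }
        }
        return min;
    }

    /** Largest stop row. Does not handle the "empty stop row = end of table" convention. */
    public static byte[] boundingStop(List<Range> ranges) {
        byte[] max = ranges.get(0).stop;
        for (Range r : ranges) {
            if (Arrays.compareUnsigned(r.stop, max) > 0) {
                max = r.stop;
            }
        }
        return max;
    }

    public static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Range> ranges = List.of(
                new Range(utf8("C"), utf8("D")),
                new Range(utf8("A"), utf8("B")),
                new Range(utf8("E"), utf8("F")));
        System.out.println(new String(boundingStart(ranges), StandardCharsets.UTF_8)); // A
        System.out.println(new String(boundingStop(ranges), StandardCharsets.UTF_8));  // F
    }
}
```

Note that a scan bounded this way still reads the rows between B and C and between D and E, so it only narrows the work; a filter or separate scans are still needed to drop the rows in the gaps.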

Cheers

On Mon, Apr 8, 2013 at 1:31 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Hi Graeme,
>
> Each time filterRowKey returns true, the entire row is
> skipped, so the data related to this row is not read. However,
> there might still be some disk access if not everything is in memory,
> but no more than for a "regular" scan without any
> filter.
>
> I still think that running the 3 scans in a row without any filter will
> be faster than using the filter, since there will be fewer operations.
> But both options might work.
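The "3 scans in a row" option can be pictured with a toy model. This is not HBase client code: a JDK `TreeMap` stands in for a table whose rows are kept sorted by key, and each (start, stop) pair is read with its own range lookup, one after the other, the way separate Scans would be issued. `SequentialRangeScans` and `scanAll` are hypothetical names; the sketch assumes ASCII row keys, for which Java `String` order matches HBase's byte order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model, not HBase code: a sorted map plays the role of the table and
// each [start, stop) pair gets its own range read, issued sequentially.
public class SequentialRangeScans {

    /** Returns the row keys of every range, in the order the ranges are given. */
    public static List<String> scanAll(NavigableMap<String, String> table, String[][] ranges) {
        List<String> rows = new ArrayList<>();
        for (String[] range : ranges) {
            // subMap(from, true, to, false): start inclusive, stop exclusive,
            // mirroring a scan from startRow up to (not including) stopRow
            rows.addAll(table.subMap(range[0], true, range[1], false).keySet());
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"a1", "a2", "b1", "c1", "c2", "d1", "e1", "f1"}) {
            table.put(key, "value-" + key);
        }
        String[][] ranges = {{"a", "b"}, {"c", "d"}, {"e", "f"}};
        System.out.println(scanAll(table, ranges)); // [a1, a2, c1, c2, e1]
    }
}
```

Rows outside the requested ranges (b1, d1, f1) are never visited, which is the intuition behind "fewer operations" compared with one wide scan plus a filter.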
>
> JMS
>
> 2013/4/8 Graeme Wallace <graeme.wallace@farecompare.com>:
> > Everyone - thanks for the replies.
> >
> > I have a followup question on Filters.
> >
> > boolean filterRowKey(byte [] buffer, int offset, int length)
> >
> > Suppose I implement this to include or exclude a row based on my
> > sets of row key pairs.
> >
> > How much disk I/O is involved on each region server? Will it just read
> > row keys (hopefully from cache) until I say I need a row, then read the
> > KeyValues for the columns I want and pass them into filterKeyValue()?
> >
> > Is that the most efficient way of doing it? I don't see a way of hinting
> > at the next row I'm interested in (I'm assuming row keys are ordered?),
> > so does that mean that, for each region, all the row keys are passed into
> > the filter?
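Because row keys are delivered in sorted order, a filter holding sorted, non-overlapping (start, stop) ranges could in principle answer not only "skip this row" but also "the next row worth reading is X", which is the fast-forwarding idea mentioned for HBASE-6509. The sketch below shows only that decision logic in plain Java; it is not an HBase Filter subclass, and `RangeSeekHint`, `includes` and `nextHint` are hypothetical names. It assumes ASCII string keys, starts sorted ascending, and no overlapping ranges.

```java
import java.util.Collections;
import java.util.List;

// Plain-Java sketch (hypothetical names, not an HBase Filter): given sorted,
// non-overlapping [start, stop) ranges, decide whether a row key is wanted
// and, if not, which row key a scan could jump to next.
public class RangeSeekHint {
    private final List<String> starts; // sorted ascending, inclusive
    private final List<String> stops;  // stops.get(i) is the exclusive end of range i

    public RangeSeekHint(List<String> starts, List<String> stops) {
        if (starts.size() != stops.size()) {
            throw new IllegalArgumentException("every start row needs a stop row");
        }
        this.starts = starts;
        this.stops = stops;
    }

    /** True if key lies inside one of the ranges. */
    public boolean includes(String key) {
        int i = lastStartAtOrBefore(key);
        return i >= 0 && key.compareTo(stops.get(i)) < 0;
    }

    /**
     * Meant for keys outside every range: the start of the next range (the
     * next row worth reading), or null if no range remains and the scan can stop.
     */
    public String nextHint(String key) {
        int next = lastStartAtOrBefore(key) + 1;
        return next < starts.size() ? starts.get(next) : null;
    }

    /** Index of the last range whose start is <= key, or -1 if there is none. */
    private int lastStartAtOrBefore(String key) {
        int pos = Collections.binarySearch(starts, key);
        // When key is absent, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    public static void main(String[] args) {
        RangeSeekHint ranges = new RangeSeekHint(List.of("A", "C", "E"), List.of("B", "D", "F"));
        System.out.println(ranges.includes("A1")); // true
        System.out.println(ranges.includes("B5")); // false
        System.out.println(ranges.nextHint("B5")); // C
        System.out.println(ranges.nextHint("F2")); // null
    }
}
```

The binary search keeps each decision at O(log n) in the number of ranges, so the per-row cost stays small even with many ranges.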
> >
> >
> >
> > On Mon, Apr 8, 2013 at 1:39 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> For Scan:
> >>
> >>  * To add a filter, execute {@link
> >> #setFilter(org.apache.hadoop.hbase.filter.Filter) setFilter}.
> >>
> >> Take a look at RowFilter:
> >>
> >>  * This filter is used to filter based on the key. It takes an operator
> >>
> >>  * (equal, greater, not equal, etc) and a byte [] comparator for the row,
> >>
> >> You can enhance RowFilter so that you may specify the pair(s) of start
> >> and end rows.
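One way to picture such an enhanced RowFilter is a row-key check that holds several (start, stop) pairs. The sketch keeps the `filterRowKey(byte[] buffer, int offset, int length)` shape quoted in this thread and the convention that returning true means the row is skipped, but it is a plain JDK class: a real filter would extend one of HBase's filter classes and be deployed on the region servers, which this sketch does not attempt. `MultiRangeRowKeyCheck` is a hypothetical name; ranges are assumed to have inclusive starts and exclusive stops, compared as unsigned bytes (Java 9+ `Arrays.compareUnsigned`).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a multi-range RowFilter; not HBase code.
// Returning true from filterRowKey means "skip this row".
public class MultiRangeRowKeyCheck {
    private final byte[][] starts; // inclusive
    private final byte[][] stops;  // exclusive

    public MultiRangeRowKeyCheck(byte[][] starts, byte[][] stops) {
        if (starts.length != stops.length) {
            throw new IllegalArgumentException("unpaired range bounds");
        }
        this.starts = starts;
        this.stops = stops;
    }

    /** True (filter out) unless buffer[offset, offset+length) lies in some [start, stop). */
    public boolean filterRowKey(byte[] buffer, int offset, int length) {
        int end = offset + length;
        for (int i = 0; i < starts.length; i++) {
            boolean atOrAfterStart =
                    Arrays.compareUnsigned(buffer, offset, end, starts[i], 0, starts[i].length) >= 0;
            boolean beforeStop =
                    Arrays.compareUnsigned(buffer, offset, end, stops[i], 0, stops[i].length) < 0;
            if (atOrAfterStart && beforeStop) {
                return false; // inside a requested range: keep the row
            }
        }
        return true; // outside every range: skip the whole row
    }

    public static void main(String[] args) {
        byte[][] starts = {"A".getBytes(StandardCharsets.UTF_8), "C".getBytes(StandardCharsets.UTF_8)};
        byte[][] stops = {"B".getBytes(StandardCharsets.UTF_8), "D".getBytes(StandardCharsets.UTF_8)};
        MultiRangeRowKeyCheck check = new MultiRangeRowKeyCheck(starts, stops);
        // The row key "C5" sits at offset 2 inside a larger buffer.
        byte[] buffer = "xxC5yy".getBytes(StandardCharsets.UTF_8);
        System.out.println(check.filterRowKey(buffer, 2, 2)); // false (kept)
        System.out.println(check.filterRowKey("B5".getBytes(StandardCharsets.UTF_8), 0, 2)); // true (skipped)
    }
}
```

The offset/length overload of `compareUnsigned` lets the check read the key in place, without copying it out of the shared buffer.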
> >>
> >> Cheers
> >>
> >> On Mon, Apr 8, 2013 at 11:30 AM, Graeme Wallace <graeme.wallace@farecompare.com> wrote:
> >>
> >> > I thought a Scan could only cope with one start row and one end row?
> >> >
> >> >
> >> > On Mon, Apr 8, 2013 at 1:27 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> >> >
> >> > > Hi Graeme,
> >> > >
> >> > > The scans are the right way to do that.
> >> > >
> >> > > They will give you back all the data you need, chunk by chunk. Then
> >> > > you have to iterate over the data to do what you want with it.
> >> > >
> >> > > What was your expectation? I'm not sure I understand your "so that I
> >> > > don't have to issue sequential Scans".
> >> > >
> >> > > jM
> >> > >
> >> > > 2013/4/8 Graeme Wallace <graeme.wallace@farecompare.com>:
> >> > > > Hi,
> >> > > >
> >> > > > Maybe there is an obvious way, but I'm not seeing it.
> >> > > >
> >> > > > I need to query HBase for multiple chunks of data, i.e. something
> >> > > > equivalent to
> >> > > >
> >> > > > select columns
> >> > > > from table
> >> > > > where rowid between A and B
> >> > > > or rowid between C and D
> >> > > > or rowid between E and F
> >> > > > etc.
> >> > > >
> >> > > > in SQL.
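One detail when translating the SQL above: BETWEEN includes both ends, while a scan's stop row is exclusive. A common way to bridge that, sketched here, is to use the smallest key that sorts after the inclusive end key, which under unsigned byte ordering is the end key with a single 0x00 byte appended. `InclusiveStop` and `exclusiveStopFor` are hypothetical names; only the JDK is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: turns an inclusive end key (SQL BETWEEN semantics)
// into an exclusive stop key by appending one 0x00 byte, which yields the
// smallest key that sorts after the end key in unsigned byte order.
public class InclusiveStop {

    public static byte[] exclusiveStopFor(byte[] inclusiveEnd) {
        // Arrays.copyOf pads the extra slot with 0, giving inclusiveEnd + 0x00.
        return Arrays.copyOf(inclusiveEnd, inclusiveEnd.length + 1);
    }

    public static void main(String[] args) {
        byte[] end = "B".getBytes(StandardCharsets.UTF_8);
        byte[] stop = exclusiveStopFor(end);
        // "B" itself sorts before the stop key, so it falls inside the range.
        System.out.println(Arrays.compareUnsigned(end, stop) < 0); // true
        // "B0" sorts after "B", so it is correctly left outside the range.
        System.out.println(Arrays.compareUnsigned("B0".getBytes(StandardCharsets.UTF_8), stop) < 0); // false
    }
}
```

With this conversion, each "rowid between X and Y" clause maps onto a start row of X and a stop row of Y followed by 0x00.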
> >> > > >
> >> > > > What's the best way to go about doing this so that I don't have to
> >> > > > issue sequential Scans?
> >> > > >
> >> > > > --
> >> > > > Graeme Wallace
> >> > > > CTO
> >> > > > FareCompare.com
> >> > > > O: 972 588 1414
> >> > > > M: 214 681 9018
> >> > >
> >> >
> >> >
> >> >
> >> >
> >>
> >
> >
> >
>
