hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: Would ROWCOL Bloom filter help in Scan
Date Thu, 03 Dec 2015 22:05:09 GMT
On Thu, Dec 3, 2015 at 12:54 PM, Jerry He <jerryjch@gmail.com> wrote:

> Thanks, Stack.
> I will look into the code more as well.
> Do you think a column-only Bloom filter would help more with this scan +
> explicit columns case, and also save space?
>
>
Come again, Jerry? Column-only? (It has to have a row on it, right?) And
how do we get space savings?

There is a bloom at the start of every row already, to speed deletes. IIRC,
we always read this first before we do anything. Perhaps we could beef it
up with more than just delete?

St.Ack



> Jerry
>
> On Thu, Dec 3, 2015 at 9:01 AM, Stack <stack@duboce.net> wrote:
>
> > On Wed, Dec 2, 2015 at 10:01 PM, Jerry He <jerryjch@gmail.com> wrote:
> >
> > > Thanks for the response.  You understood my question correctly.
> > > If we are scanning the rows one by one and we have the requested column
> > > in the column tracker, we have the row+column to look up in the bloom
> > > filter, don't we? We may not be able to filter out the file scanners
> > > upfront, but maybe at a later time and at a lower level we could skip
> > > something?
> > >
> > >
> > <I've not looked at the code>You are right. If more than one explicit
> > column is specified, we could do a bloom check for the second and so on,
> > since we'd have the current row to hand. It could make for a nice speedup
> > for scans of many explicit columns traversing a dataset that is sparsely
> > populated.</I've not looked at the code>
> >
> > St.Ack
> >
> >
> >
> > > Jerry
> > >
> > > On Mon, Nov 30, 2015 at 10:55 PM, Stack <stack@duboce.net> wrote:
> > >
> > > > On Mon, Nov 30, 2015 at 9:56 AM, Jerry He <jerryjch@gmail.com> wrote:
> > > >
> > > > > Hi, experts
> > > > >
> > > > > HBase supports the ROWCOL bloom filter, where ROW+COL is the bloom key.
> > > > > Most of the documentation says that only a GET would benefit,
> > > > > multi-column GETs included.
> > > > >
> > > > > If I do a scan with StartRow and EndRow and also specify columns,
> > > > > would the ROWCOL bloom filter provide any benefit in any way?
> > > > >
> > > > >
> > > > If I understand your question properly, the answer is no. While we
> > > > might have a set of columns to check in the bloom, we'd not know the
> > > > set of rows between start and end row and so would not be able to
> > > > formulate a query against the ROW+COL bloom filter.
> > > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > > > Thank you.
> > > > >
> > > > > Jerry
> > > > >
> > > >
> > >
> >
>
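
The point made in the thread can be sketched with a toy model. This is only an illustration under stated assumptions, not HBase code: `ToyBloom`, `rowcol_key`, and the sample `cells` are hypothetical names invented here, and the hashing and key layout do not match what HBase actually does. It shows that a Get can build the row+column key up front, while a scan can only build it once the current row is known, at which point each explicit column can be checked.

```python
import hashlib


class ToyBloom:
    """Tiny Bloom filter for illustration only (not HBase's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def rowcol_key(row, qualifier):
    # Hypothetical key layout: length-prefix the row so that
    # (b"a", b"bc") and (b"ab", b"c") produce different keys.
    return len(row).to_bytes(2, "big") + row + qualifier


# A made-up, sparsely populated store file: (row, column qualifier) pairs.
cells = [(b"row1", b"c1"), (b"row2", b"c7"), (b"row3", b"c1")]
bloom = ToyBloom()
for row, qual in cells:
    bloom.add(rowcol_key(row, qual))

# Get: row and column are both known before touching the file,
# so the bloom key can be formed immediately.
get_hit = bloom.might_contain(rowcol_key(b"row1", b"c1"))

# Scan with explicit columns: the rows between start and stop are not
# known in advance, so no key can be formed up front. Once the scanner
# is positioned on a row, each requested column can be checked.
wanted = [b"c1", b"c7"]
rows_seen = sorted({row for row, _ in cells})  # rows discovered while scanning
to_seek = [
    (row, qual)
    for row in rows_seen
    for qual in wanted
    if bloom.might_contain(rowcol_key(row, qual))
]
# to_seek never misses a real cell, but may contain false positives.
```

In this toy model the saving comes from skipping the seek for any (row, column) pair the filter rules out, which matters most when many explicit columns are requested over sparse data.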
