hbase-dev mailing list archives

From Bruno Dumon <br...@outerthought.org>
Subject Re: what's the roadmap of secondary index of hbase?
Date Tue, 01 Mar 2011 21:11:47 GMT
Have you thought of how the update of a secondary index would go?

For example, suppose currently for a row with key 1 the value A is indexed,
so in the index there's a row with key "A-1". Then row 1 gets updated to
value B. This means in the index you have to remove the entry "A-1" and
insert a new entry "B-1".

The problem I see here is that you have to know the previous value that was
indexed for the row, thus A in this case.
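
To make the problem concrete, here is a minimal in-memory sketch in Python. It is not HBase code: the `IndexedTable` class, its two dicts/sets, and both method names are hypothetical stand-ins for a primary table and an index table with keys of the form "value-rowkey". It only illustrates that a write which does not know the previous value leaves a stale index entry behind.

```python
class IndexedTable:
    """Hypothetical stand-in for a primary table plus a secondary index."""

    def __init__(self):
        self.primary = {}   # row key -> currently stored value
        self.index = set()  # index row keys of the form "<value>-<row key>"

    @staticmethod
    def index_key(value, row_key):
        return f"{value}-{row_key}"

    def put_blind(self, row_key, new_value):
        # Write without knowing the old value: the old index entry
        # cannot be removed, so it stays behind as a stale entry.
        self.primary[row_key] = new_value
        self.index.add(self.index_key(new_value, row_key))

    def put_read_first(self, row_key, new_value):
        # Read-before-write: look up the previous value so the
        # matching index entry can be deleted before adding the new one.
        old_value = self.primary.get(row_key)
        if old_value is not None and old_value != new_value:
            self.index.discard(self.index_key(old_value, row_key))
        self.primary[row_key] = new_value
        self.index.add(self.index_key(new_value, row_key))
```

With `put_blind`, writing A and then B to row 1 leaves both "A-1" and "B-1" in the index; with `put_read_first`, only "B-1" remains, at the cost of an extra read on every write.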

This information could maybe be put in the WALEdit, but that would assume
that a read is done before the write so that the previous value is known. I
think this will also require some consideration of how this will work in
case of recovery.

The old values could also be retrieved from the older versions of the cell,
but since the update of secondary indexes would be done asynchronously,
there's no guarantee that those will still be there.

Another alternative, which we use for the link-index in Lily (but is rather
expensive), is to keep the index in both directions, thus the real index
containing "A-1" and a 'forward index' containing "1-A". An index update
first reads the entries from the forward index, then removes them from the
real index, then removes them from the forward index, then inserts the new
entries in the forward index, and finally in the index.
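
The update order described above can be sketched as follows. This is an in-memory Python illustration only, not Lily's actual implementation: the `TwoWayIndex` class and its `update` method are hypothetical, both indexes are modelled as plain sets of strings, and it assumes row keys contain no "-" so that a "rowkey-" prefix scan is unambiguous.

```python
class TwoWayIndex:
    """Hypothetical sketch of a real index plus a forward index."""

    def __init__(self):
        self.index = set()    # real index: "<value>-<row key>", e.g. "A-1"
        self.forward = set()  # forward index: "<row key>-<value>", e.g. "1-A"

    def update(self, row_key, new_values):
        prefix = f"{row_key}-"

        # 1. Read the current entries for this row from the forward index.
        old_forward = {e for e in self.forward if e.startswith(prefix)}
        old_values = {e[len(prefix):] for e in old_forward}

        # 2. Remove the corresponding entries from the real index.
        for value in old_values:
            self.index.discard(f"{value}-{row_key}")

        # 3. Remove the old entries from the forward index.
        self.forward -= old_forward

        # 4. Insert the new entries into the forward index.
        for value in new_values:
            self.forward.add(f"{row_key}-{value}")

        # 5. Finally, insert the new entries into the real index.
        for value in new_values:
            self.index.add(f"{value}-{row_key}")
```

For example, `update("1", ["A"])` followed by `update("1", ["B"])` leaves "B-1" in the real index and "1-B" in the forward index, without the caller ever having to supply the old value A. The expense mentioned above shows up as one read plus up to four writes per indexed value.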

On Mon, Feb 28, 2011 at 9:05 PM, Jonathan Gray <jgray@fb.com> wrote:

> I've started a wiki page:
> http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing
>
> I gave a basic description of the idea I had and the open questions.
>
> Let's get all our thoughts in there.
>
> > -----Original Message-----
> > From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> > Sent: Friday, February 25, 2011 4:07 PM
> > To: dev@hbase.apache.org
> > Subject: Re: what's the roadmap of secondary index of hbase?
> >
> > The MegaStore paper,
> > http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf, in section
> > 3.2.2, lists secondary indexing options MegaStore provides on top of
> > BigTable.  For example, MS allows specifying secondary index on protobuf
> > cell content or duplicating data into secondary index so you have the data to
> > hand to satisfy first query and only if the client wants more do you go dig in
> > the primary table.  It also talks about how secondary indices can be described
> > using their schema which might be of use.  Might be worth a gander.
> >
> > St.Ack
> >
> > On Fri, Feb 25, 2011 at 3:32 PM, Stack <stack@duboce.net> wrote:
> > > On Fri, Feb 25, 2011 at 1:47 PM, Eugene Koontz <ekoontz@hiro-tan.org> wrote:
> > >> I'm thinking that we could use a coprocessor that watches the
> > >> Write-Ahead Log (using the WAL-edit operations
> > >>  https://issues.apache.org/jira/browse/HBASE-3257 "Coprocessors:
> > >> Extend server side integration API to include HLog operations"). This
> > >> coprocessor would write these edits, perhaps filtering or
> > >> transforming them, and enqueuing the results in a global queue. A
> > >> separate process would be responsible for pulling operations off the
> > >> queue and using HBase client operations to do the insert into a
> > >> secondary index table appropriate for that operation.
> > >>    Perhaps we could use some of the work that the Lily people have
> > >> done with HBase indexing (see
> > >> http://www.lilyproject.org/lily/about/playground/hbaseindexes.html)
> > >> in order to do the edit->hbase operation transformations and the
> > >> secondary index table creation.
> > >
> > > This sounds good as first approach (including lily part).
> > >
> > > St.Ack
> > >
>



-- 
Bruno Dumon
Outerthought
http://outerthought.org/
