hbase-dev mailing list archives

From Matt Corgan <mcor...@hotpads.com>
Subject Re: what's the roadmap of secondary index of hbase?
Date Tue, 01 Mar 2011 21:44:39 GMT
I was wondering if it wouldn't be simpler to start with the synchronous
version of secondary indexes.  It's more complex at the time of the Put, but
at least you don't have to worry about all the edge cases where things are
getting out of sync (or are there still some?).

Also, it seems possible for people to build their own async indexes, while
it's very difficult to do the sync version yourself.  I have a feeling that
most people on the mailing list who bring up indexes are assuming that
they'd be synchronous because that's how they are in relational databases.

The problems I see with the sync version are the slowdown you would
encounter from two-phase commit and the read-before-write required to delete
the previous index row.  But many people may be perfectly happy to
sacrifice performance for such valuable consistency.

As for the read-before-write issue, maybe the API could optionally let the
client specify the previous value if it knows it, which is often the case
for applications that read a row, modify it, then write the whole thing
back.
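A minimal sketch of the synchronous variant, assuming in-memory TreeMaps in place of the data and index tables and leaving out two-phase commit and failure handling entirely. It uses the "A-1" key layout from Bruno's quoted message and shows where the read-before-write happens, and how an optional, client-supplied previous value would skip it. `SyncIndexSketch` and its `put` signature are hypothetical, not an HBase API.

```java
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch, not an HBase API: TreeMaps stand in for the data table
// and the index table, and there is no two-phase commit or failure handling.
class SyncIndexSketch {
    final TreeMap<String, String> data = new TreeMap<>();   // row key -> indexed value
    final TreeMap<String, String> index = new TreeMap<>();  // index key "value-row" -> row key
    int reads = 0;                                          // read-before-write lookups performed

    // Index keys follow the thread's layout: value "A" on row "1" -> "A-1".
    static String indexKey(String value, String row) {
        return value + "-" + row;
    }

    // Updates the data row and its index entry in one call. When the caller
    // already knows the previous value it passes it in, and the read is skipped.
    void put(String row, String newValue, Optional<String> knownPrevious) {
        String previous;
        if (knownPrevious.isPresent()) {
            previous = knownPrevious.get();
        } else {
            reads++;                  // the extra read the sync version pays for
            previous = data.get(row); // null when the row is new
        }
        if (previous != null) {
            index.remove(indexKey(previous, row)); // delete the stale entry, e.g. "A-1"
        }
        index.put(indexKey(newValue, row), row);   // insert the new entry, e.g. "B-1"
        data.put(row, newValue);
    }
}
```

A client that has just read row 1 and seen A can call `put("1", "B", Optional.of("A"))` and avoid the lookup. The hint is trusted as-is: a wrong previous value leaves a stale index row behind, so it is only safe when the client's read is known to be current.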

Matt

On Tue, Mar 1, 2011 at 4:11 PM, Bruno Dumon <bruno@outerthought.org> wrote:

> Have you thought of how the update of a secondary index would go?
>
> For example, suppose currently for a row with key 1 the value A is indexed,
> so in the index there's a row with key "A-1". Then row 1 gets updated to
> value B. This means in the index you have to remove the entry "A-1" and
> insert a new entry "B-1".
>
> The problem I see here is that you have to know the previous value that was
> indexed for the row, thus A in this case.
>
> This information could perhaps be put in the WALEdit, but that would assume
> that a read is done before the write so that the previous value is known. I
> think this will also require some consideration of how it would work during
> recovery.
>
> The old values could also be retrieved from the older versions of the cell,
> but since the update of secondary indexes would be done asynchronously,
> there's no guarantee that those will still be there.
>
> Another alternative, which we use for the link index in Lily (but which is
> rather expensive), is to keep the index in both directions: the real index
> containing "A-1" and a 'forward index' containing "1-A". An index update
> first reads the entries from the forward index, then removes them from the
> real index, then removes them from the forward index, then inserts the new
> entries in the forward index, and finally inserts them in the real index.
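Bruno's five-step update can be sketched as follows, assuming sorted in-memory sets in place of the two HBase tables and a '-' separator that never occurs inside row keys or values. `ForwardIndexSketch` and `update` are hypothetical names; this illustrates the described scheme and is not Lily's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: TreeSets stand in for HBase tables (sorted row keys).
// Assumes row keys and values never contain '-'.
class ForwardIndexSketch {
    final TreeSet<String> index = new TreeSet<>();    // real index: "value-row", e.g. "A-1"
    final TreeSet<String> forward = new TreeSet<>();  // forward index: "row-value", e.g. "1-A"

    // Prefix scan over sorted keys, similar in spirit to a scan from a start row.
    private static List<String> scanPrefix(TreeSet<String> keys, String prefix) {
        List<String> out = new ArrayList<>();
        for (String k : keys.tailSet(prefix, true)) {
            if (!k.startsWith(prefix)) break;
            out.add(k);
        }
        return out;
    }

    void update(String row, Set<String> newValues) {
        // 1. Read this row's current entries from the forward index.
        List<String> oldForward = scanPrefix(forward, row + "-");
        // 2. Remove the matching entries from the real index.
        for (String f : oldForward) {
            String value = f.substring(row.length() + 1);
            index.remove(value + "-" + row);
        }
        // 3. Remove them from the forward index.
        forward.removeAll(oldForward);
        // 4. Insert the new entries in the forward index.
        for (String v : newValues) forward.add(row + "-" + v);
        // 5. Finally, insert them in the real index.
        for (String v : newValues) index.add(v + "-" + row);
    }
}
```

One plausible reason for this ordering: at every step, each real-index entry for a row still has a matching forward-index entry, so a crash part-way through never leaves an index entry that a later update cannot find and remove, and rerunning the update converges. The expense Bruno mentions is visible as well: a scan, two deletes per old value and two puts per new value.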
>
> On Mon, Feb 28, 2011 at 9:05 PM, Jonathan Gray <jgray@fb.com> wrote:
>
> > I've started a wiki page:
> > http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing
> >
> > I gave a basic description of the idea I had and the open questions.
> >
> > Let's get all our thoughts in there.
> >
> > > -----Original Message-----
> > > From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> > > Sent: Friday, February 25, 2011 4:07 PM
> > > To: dev@hbase.apache.org
> > > Subject: Re: what's the roadmap of secondary index of hbase?
> > >
> > > The MegaStore paper,
> > > http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf, in section
> > > 3.2.2, lists secondary indexing options MegaStore provides on top of
> > > BigTable.  For example, MS allows specifying a secondary index on
> > > protobuf cell content, or duplicating data into the secondary index so
> > > you have the data to hand to satisfy the first query, and only if the
> > > client wants more do you go dig in the primary table.  It also talks
> > > about how secondary indices can be described using their schema, which
> > > might be of use.  Might be worth a gander.
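A minimal sketch of the "duplicate data into the index" option, assuming hash and sorted maps in place of the primary and index tables, insert-only rows (so no stale index entries need removing), every row carrying the indexed column, and a '-' separator absent from keys and values. `CoveringIndexSketch` and its methods are hypothetical; MegaStore's actual schema syntax is not modeled.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical covering-index sketch: selected columns are copied into each
// index entry so a lookup can be answered without reading the primary table.
// Rows are assumed to be inserted once and never updated.
class CoveringIndexSketch {
    final Map<String, Map<String, String>> primary = new HashMap<>();    // row -> columns
    final TreeMap<String, Map<String, String>> index = new TreeMap<>();  // "value-row" -> copied columns
    final String indexedColumn;
    final List<String> coveredColumns;
    int primaryReads = 0;

    CoveringIndexSketch(String indexedColumn, List<String> coveredColumns) {
        this.indexedColumn = indexedColumn;
        this.coveredColumns = coveredColumns;
    }

    void insert(String row, Map<String, String> columns) {
        primary.put(row, new HashMap<>(columns));
        Map<String, String> copy = new HashMap<>();
        for (String c : coveredColumns) {
            if (columns.containsKey(c)) copy.put(c, columns.get(c)); // duplicated data
        }
        index.put(columns.get(indexedColumn) + "-" + row, copy);
    }

    // First query: row -> covered columns for rows whose indexed column equals value.
    Map<String, Map<String, String>> lookup(String value) {
        Map<String, Map<String, String>> out = new LinkedHashMap<>();
        String prefix = value + "-";
        for (Map.Entry<String, Map<String, String>> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            out.put(e.getKey().substring(prefix.length()), e.getValue());
        }
        return out;
    }

    // Only when the client wants more than the covered columns.
    Map<String, String> fetchRow(String row) {
        primaryReads++;
        return primary.get(row);
    }
}
```

`lookup` answers the first query from index entries alone, and `fetchRow` is the optional second trip to the primary table. The trade-off is that every covered column is stored twice, and supporting updates would mean rewriting the copy as well.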
> > >
> > > St.Ack
> > >
> > > On Fri, Feb 25, 2011 at 3:32 PM, Stack <stack@duboce.net> wrote:
> > > > On Fri, Feb 25, 2011 at 1:47 PM, Eugene Koontz <ekoontz@hiro-tan.org> wrote:
> > > >> I'm thinking that we could use a coprocessor that watches the
> > > >> Write-Ahead Log (using the WAL-edit operations
> > > >> https://issues.apache.org/jira/browse/HBASE-3257 "Coprocessors:
> > > >> Extend server side integration API to include HLog operations").
> > > >> This coprocessor would write these edits, perhaps filtering or
> > > >> transforming them, and enqueue the results in a global queue. A
> > > >> separate process would be responsible for pulling operations off the
> > > >> queue and using HBase client operations to do the insert into a
> > > >> secondary index table appropriate for that operation.
> > > >>    Perhaps we could use some of the work that the Lily people have
> > > >> done with HBase indexing (see
> > > >> http://www.lilyproject.org/lily/about/playground/hbaseindexes.html)
> > > >> in order to do the edit->hbase operation transformations and the
> > > >> secondary index table creation.
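A minimal sketch of the pipeline's shape, assuming a single in-memory `LinkedBlockingQueue` as the "global queue", a hypothetical `onWalEdit` hook standing in for the coprocessor callback, and a sorted set standing in for the index table. It does not use the HBASE-3257 API or the HBase client, and it only generates index inserts: removing an old entry would need the previous value, which is the question Bruno raises.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the proposed pipeline: in-memory stand-ins only, not
// the HBASE-3257 coprocessor API or the HBase client.
class AsyncIndexPipeline {
    // One index insert derived from a WAL edit.
    static final class IndexOp {
        final String indexKey;
        IndexOp(String indexKey) { this.indexKey = indexKey; }
    }

    private static final IndexOp STOP = new IndexOp(null);            // sentinel that ends the worker
    final BlockingQueue<IndexOp> queue = new LinkedBlockingQueue<>(); // the "global queue"
    final ConcurrentSkipListSet<String> indexTable = new ConcurrentSkipListSet<>();
    private final Thread worker = new Thread(this::drain, "index-worker");

    // Hypothetical WAL hook: transforms a put of `value` on `row` into an
    // index insert "value-row" and enqueues it; the write path does not wait.
    void onWalEdit(String row, String value) {
        queue.add(new IndexOp(value + "-" + row));
    }

    void start() {
        worker.start();
    }

    // Enqueues the sentinel and waits until everything queued before it is applied.
    void stopAndWait() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The separate process: pulls operations off the queue and applies them.
    private void drain() {
        try {
            while (true) {
                IndexOp op = queue.take();
                if (op == STOP) {
                    return;
                }
                indexTable.add(op.indexKey);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the worker runs independently of the write path, a reader can see a new data row before its index entry exists; that lag is the asynchrony Matt contrasts with the synchronous approach.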
> > > >
> > > > This sounds good as a first approach (including the Lily part).
> > > >
> > > > St.Ack
> > > >
> >
>
>
>
> --
> Bruno Dumon
> Outerthought
> http://outerthought.org/
>
