hbase-user mailing list archives

From Weishung Chung <weish...@gmail.com>
Subject Re: Stargate+hbase
Date Fri, 25 Mar 2011 16:27:22 GMT
Thank you so much for the detailed information. It really helps me out.

For a secondary index, even without transactions, I would think one could
still build an index on another key, especially if we have row-level
locking. Correct me if I am wrong.
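
A minimal sketch of the application-managed index that Dave describes in the
quoted message below: every write to the primary table is followed by a write
to a second table whose key is a compound of the indexed value and the primary
row key, and a query is a range scan over that second table. This is an
in-memory Python model, not HBase client code. SortedTable, put_with_index,
rows_with_keyword, rows_with_all and the "\x00" separator are all hypothetical
names and choices made for illustration. The two writes are not atomic across
tables, so the index can briefly lag the primary table, and deletes or updates
that would leave stale index entries are not handled here.

```python
import bisect


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []  # sorted list of row keys
        self._rows = {}  # row key -> {column: value}

    def put(self, key, columns):
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def get(self, key):
        return self._rows.get(key)

    def scan(self, start, stop):
        """Yield (key, columns) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


# Assumed separator: must never occur inside a keyword or a row key.
SEP = "\x00"

primary = SortedTable()
index = SortedTable()


def put_with_index(row_key, keywords):
    """Write the primary row, then one index row per keyword.

    These are separate writes, so a reader may briefly see one without
    the other (the eventually consistent case from the thread).
    """
    primary.put(row_key, {"kw:" + k: "1" for k in keywords})
    for k in keywords:
        # Compound key: indexed value + separator + primary row key.
        index.put(k + SEP + row_key, {})


def rows_with_keyword(keyword):
    """Range-scan the index for one keyword; return primary row keys."""
    prefix = keyword + SEP
    stop = keyword + "\x01"  # sorts just after every key with this prefix
    return {key[len(prefix):] for key, _ in index.scan(prefix, stop)}


def rows_with_all(*keywords):
    """Primary row keys that carry every one of the given keywords."""
    sets = [rows_with_keyword(k) for k in keywords]
    return set.intersection(*sets) if sets else set()


# The example rows from the original question further down the thread.
put_with_index("row1", ["kwd1", "kwd2", "kwd3", "kwd2"])
put_with_index("row2", ["kwd1", "kwd2", "kwd4", "kwd5"])

print(sorted(rows_with_all("kwd1", "kwd2")))  # ['row1', 'row2']
```

Putting the primary row key inside the index key keeps index rows unique even
when many primary rows share a keyword, which is the compound-key point made
in the quoted message.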

Also, I have read about the clustered B-tree that InnoDB uses to implement
secondary indexes, but I know that the B-tree is the primary limitation when
it comes to scalability and the main reason why NoSQL systems have discarded
it. Still, it would be super nice to be able to build a secondary index
without using a second table in HBase.

I am not complaining, but I would love to see HBase continue to be the top
NoSQL solution out there :D
Way to go, HBase!

On Fri, Mar 25, 2011 at 10:39 AM, Buttler, David <buttler1@llnl.gov> wrote:

> Do you know what it means to make secondary indexing a feature?  There are
> two reasonable outcomes:
> 1) adding ACID semantics (and thus killing scalability)
> 2) allowing the secondary index to be out of date (leading to every naïve
> user claiming that there is a serious bug that must be fixed).
>
> Secondary indexes are basically another way of storing (part of) the data.
>  E.g. another table, sorted on the field(s) that you want to search on.  In
> order to ensure consistency between the primary table and the secondary
> table (index), you have to guarantee that when you mutate the primary table
> that the secondary table is mutated in the same atomic transaction.  Since
> HBase only has row-level locks, this can't be guaranteed across tables.
>
> The situation is not hopeless, because in many cases you don't need to have
> perfectly consistent data and can afford to wait for cleanup tasks.  For
> some applications, you can ensure that the index is updated close enough to
> the table update (using external transactions, or something similar) that
> users would never notice.  One way to implement an eventually consistent
> secondary index would be to mimic the way cluster replication is done.
>
> However, what I have described is difficult to do generically -- and there
> are engineering tradeoffs that need to be made.  If you absolutely need a
> transactional and consistent secondary index, I would suggest using Oracle,
> MySQL, or another relational database, where this was designed in as a
> primary feature.  Just don't complain that they are too slow or don't scale
> as well as HBase.
>
> </rant>
>
> Sorry for the rant.  If you want to have a secondary index here is what you
> need to do:
> Modify your application so that every time you write to the primary table,
> you also write to a secondary table, keyed off of the values you want to
> search on.  If you can't guarantee that the values form a secondary key
> (i.e. are unique across your entire table), you can make your key a compound
> key (see, for example, how "tsuna" designed OpenTSDB) with your primary key
> as a component.
>
> Then, when you need to query, you can do range queries over the secondary
> table to retrieve the keys in the primary table to return the full data row.
>
> Dave
>
> -----Original Message-----
> From: Wei Shung Chung [mailto:weishung@gmail.com]
> Sent: Friday, March 25, 2011 12:04 AM
> To: user@hbase.apache.org
> Subject: Re: Stargate+hbase
>
> I need to use secondary indexing too, hopefully this important feature
> will be made available soon :)
>
> Sent from my iPhone
>
> On Mar 25, 2011, at 12:48 AM, Stack <stack@duboce.net> wrote:
>
> > There is no native support for secondary indices in HBase (currently).
> > You will have to manage it yourself.
> > St.Ack
> >
> > On Thu, Mar 24, 2011 at 10:47 PM, sreejith P. K. <sreejithpk@nesote.com> wrote:
> >> I have tried secondary indexing, but it seems I am missing something.
> >> Could you please explain how this is possible using secondary indexing?
> >>
> >>
> >> I have tried something like this:
> >>
> >>
> >> row1         ColumnFamily1:kwd1
> >>              ColumnFamily1:kwd2
> >>              ColumnFamily1:kwd3
> >>              ColumnFamily1:kwd2
> >>
> >> row2         ColumnFamily1:kwd1
> >>              ColumnFamily1:kwd2
> >>              ColumnFamily1:kwd4
> >>              ColumnFamily1:kwd5
> >>
> >>
> >> I need to get all rows that contain both kwd1 and kwd2.
> >>
> >> Please help.
> >> Thanks
> >>
> >>
> >> On Thu, Mar 24, 2011 at 9:57 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >>
> >>> What you are asking for is a secondary index, and it doesn't exist at
> >>> the moment in HBase (let alone REST). Googling a bit for "hbase
> >>> secondary indexing" will show you how people usually do it.
> >>>
> >>> J-D
> >>>
> >>> On Thu, Mar 24, 2011 at 6:18 AM, sreejith P. K. <sreejithpk@nesote.com> wrote:
> >>>> Is it possible, using the Stargate interface to HBase, to fetch all
> >>>> rows where more than one column family:<qualifier> must be present?
> >>>>
> >>>> For example: select rows which contain keyword:a and keyword:b?
> >>>>
> >>>> Thanks
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Sreejith PK
> >> Nesote Technologies (P) Ltd
> >>
>
