hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Stargate+hbase
Date Fri, 25 Mar 2011 17:10:20 GMT
Ugh. Redo.  I added a pointer to David Buttler's response above as an
intro to secondary indexing issues in hbase.
St.Ack

On Fri, Mar 25, 2011 at 10:09 AM, Stack <stack@duboce.net> wrote:
> I added a pointer to the message below into our book as 'intro to
> secondary indexing in hbase'.
> St.Ack
>
> On Fri, Mar 25, 2011 at 8:39 AM, Buttler, David <buttler1@llnl.gov> wrote:
>> Do you know what it means to make secondary indexing a feature?  There
>> are two reasonable outcomes:
>> 1) adding ACID semantics (and thus killing scalability)
>> 2) allowing the secondary index to be out of date (leading to every naïve
>> user claiming that there is a serious bug that must be fixed).
>>
>> Secondary indexes are basically another way of storing (part of) the
>> data: for example, another table sorted on the field(s) that you want to
>> search on.  To ensure consistency between the primary table and the
>> secondary table (the index), you have to guarantee that whenever you
>> mutate the primary table, the secondary table is mutated in the same
>> atomic transaction.  Since HBase only has row-level locks, this can't be
>> guaranteed across tables.
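A toy illustration of why the two writes can diverge. Plain Python dicts stand in for the two tables (this is not the HBase client API), and `dual_write` and `SimulatedCrash` are hypothetical names. A client failure between the primary write and the index write leaves a row that no index lookup can find.

```python
# Toy model: each "table" is a plain dict keyed by row key. This is NOT the
# HBase client API; it only illustrates that two separate writes are not atomic.


class SimulatedCrash(Exception):
    """Hypothetical client failure injected between the two writes."""


def dual_write(primary, index, row_key, value, crash_between=False):
    # Write 1: the primary table row.
    primary[row_key] = {"cf:val": value}
    if crash_between:
        # The client dies here, so write 2 never happens.
        raise SimulatedCrash("client died after the primary write")
    # Write 2: the index row, keyed by value plus primary key.
    index[f"{value}#{row_key}"] = row_key


primary, index = {}, {}
dual_write(primary, index, "row1", "red")
try:
    dual_write(primary, index, "row2", "blue", crash_between=True)
except SimulatedCrash:
    pass

print(sorted(primary))  # ['row1', 'row2']
print(sorted(index))    # ['red#row1']  (row2 cannot be found via the index)
```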
>>
>> The situation is not hopeless, because in many cases you don't need to
>> have perfectly consistent data and can afford to wait for cleanup tasks.
>> For some applications, you can ensure that the index is updated close
>> enough to the table update (using external transactions, or something
>> similar) that users would never notice.  One way to implement an
>> eventually consistent secondary index would be to mimic the way cluster
>> replication is done.
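HBase replication ships write-ahead-log edits to another cluster asynchronously. The single-process sketch below borrows that idea for an index: writes go to the primary table plus an append-only edit log, and a background indexer later replays the log into the index table. Everything here (`ToyStore`, `catch_up`, the in-memory list standing in for a log) is a hypothetical illustration, not an HBase facility, and it ignores durability and concurrency.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy single-process model, not HBase: a primary table, an index table,
    and an append-only edit log that a background indexer tails."""
    primary: dict = field(default_factory=dict)  # row key -> indexed value
    index: dict = field(default_factory=dict)    # "value#rowkey" -> row key
    log: list = field(default_factory=list)      # (row key, old value, new value)
    applied: int = 0                             # how far the indexer has read

    def put(self, row_key, value):
        # The client path only touches the primary table and the log;
        # the index is NOT updated here.
        old = self.primary.get(row_key)
        self.primary[row_key] = value
        self.log.append((row_key, old, value))

    def catch_up(self, max_edits=None):
        """Apply up to max_edits pending log entries (all of them if None)."""
        end = len(self.log)
        if max_edits is not None:
            end = min(end, self.applied + max_edits)
        for row_key, old, new in self.log[self.applied:end]:
            if old is not None:
                self.index.pop(f"{old}#{row_key}", None)  # drop stale entry
            self.index[f"{new}#{row_key}"] = row_key
        self.applied = end


store = ToyStore()
store.put("row1", "red")
store.put("row1", "blue")
print(store.index)        # {}  (index lags the primary table)
store.catch_up(max_edits=1)
print(store.index)        # {'red#row1': 'row1'}
store.catch_up()
print(store.index)        # {'blue#row1': 'row1'}
```

Between `put` and `catch_up` the index is stale, which is exactly the "out of date" trade-off described in outcome 2 above.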
>>
>> However, what I have described is difficult to do generically -- and
>> there are engineering tradeoffs that need to be made.  If you absolutely
>> need a transactional and consistent secondary index, I would suggest
>> using Oracle, MySQL, or another relational database, where this was
>> designed in as a primary feature.  Just don't complain that they are too
>> slow or don't scale as well as HBase.
>>
>> </rant>
>>
>> Sorry for the rant.  If you want to have a secondary index, here is what
>> you need to do:
>> Modify your application so that every time you write to the primary
>> table, you also write to a secondary table, keyed off the values you
>> want to search on.  If you can't guarantee that the values form a
>> secondary key (i.e. are unique across your entire table), you can make
>> your key a compound key (see, for example, how "tsuna" designed
>> OpenTSDB) with your primary key as a component.
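A minimal sketch of that write path, assuming one index table per indexed column and a NUL separator that never occurs in indexed values (both assumptions of this sketch, not OpenTSDB's actual key layout). Dicts stand in for tables, and `index_key` and `put_with_index` are hypothetical helpers. Putting the value first and the primary key second keeps duplicate values unique while still sorting index rows by value.

```python
# Toy model with plain dicts, not the HBase API. One index table per indexed
# column (an assumption of this sketch).
SEP = "\x00"  # assumption: indexed values never contain a NUL character


def index_key(value, primary_key):
    # Compound key: the searchable value first, then the primary key, so
    # duplicate values still produce unique, value-sorted index rows.
    return f"{value}{SEP}{primary_key}"


def put_with_index(primary, index, row_key, column, value):
    row = primary.setdefault(row_key, {})
    old = row.get(column)
    if old is not None:
        # The value changed, so the old index row must be removed as well.
        index.pop(index_key(old, row_key), None)
    row[column] = value                        # write to the primary table
    index[index_key(value, row_key)] = ""      # write to the index table


primary, color_index = {}, {}
put_with_index(primary, color_index, "user42", "color", "red")
put_with_index(primary, color_index, "user7", "color", "red")
put_with_index(primary, color_index, "user9", "color", "green")
put_with_index(primary, color_index, "user9", "color", "blue")
print(sorted(color_index))
# ['blue\x00user9', 'red\x00user42', 'red\x00user7']
```

The two writes are still separate, so the non-atomicity discussed earlier applies unchanged.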
>>
>> Then, when you need to query, you can do range queries over the
>> secondary table to retrieve the keys in the primary table, and then
>> fetch the full data rows.
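The two-step read can be sketched with a sorted list of index keys and `bisect`, standing in for a scan that starts at the value prefix and stops once keys leave that prefix. The key layout (`value + "\x00" + primary key`) and the helper names are assumptions of this sketch, not HBase APIs. Index entries whose primary row no longer exists are skipped, which is one simple way to tolerate a stale index.

```python
import bisect

SEP = "\x00"  # assumed compound-key separator: value + SEP + primary key


def prefix_scan(sorted_keys, prefix):
    """Yield keys that start with prefix, like a scan whose start row is the
    prefix and which stops at the first key past the prefix range."""
    start = bisect.bisect_left(sorted_keys, prefix)
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        yield key


def query_by_value(primary, sorted_index_keys, value):
    prefix = value + SEP
    # Step 1: range query over the index to collect primary keys.
    row_keys = [k[len(prefix):] for k in prefix_scan(sorted_index_keys, prefix)]
    # Step 2: fetch full rows; skip index entries whose row is gone (stale index).
    return {rk: primary[rk] for rk in row_keys if rk in primary}


primary = {"user42": {"color": "red"}, "user7": {"color": "red"},
           "user9": {"color": "blue"}}
index_keys = sorted(["red\x00user42", "red\x00user7", "blue\x00user9"])
print(query_by_value(primary, index_keys, "red"))
# {'user42': {'color': 'red'}, 'user7': {'color': 'red'}}
```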
>>
>> Dave
>>
>> -----Original Message-----
>> From: Wei Shung Chung [mailto:weishung@gmail.com]
>> Sent: Friday, March 25, 2011 12:04 AM
>> To: user@hbase.apache.org
>> Subject: Re: Stargate+hbase
>>
>> I need to use secondary indexing too; hopefully this important feature
>> will be made available soon :)
>>
>> On Mar 25, 2011, at 12:48 AM, Stack <stack@duboce.net> wrote:
>>
>>> There is no native support for secondary indices in HBase (currently).
>>> You will have to manage it yourself.
>>> St.Ack
>>>
>>> On Thu, Mar 24, 2011 at 10:47 PM, sreejith P. K. <sreejithpk@nesote.com> wrote:
>>>> I have tried secondary indexing, but it seems I am missing something.
>>>> Could you please explain how to do this using secondary indexing?
>>>>
>>>> I have tried a layout like this:
>>>>
>>>>   row1    ColumnFamily1:kwd1
>>>>           ColumnFamily1:kwd2
>>>>           ColumnFamily1:kwd3
>>>>           ColumnFamily1:kwd2
>>>>
>>>>   row2    ColumnFamily1:kwd1
>>>>           ColumnFamily1:kwd2
>>>>           ColumnFamily1:kwd4
>>>>           ColumnFamily1:kwd5
>>>>
>>>> I need to get all rows that contain both kwd1 and kwd2.
>>>>
>>>> Please help.
>>>> Thanks
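Applied to the layout above, the compound-key approach gives an inverted index: one index row per (keyword, row) pair, keyed `keyword + "\x00" + row key`. An AND query is then one prefix scan per keyword followed by a set intersection. This is a toy sketch with a sorted Python list standing in for the index table. The separator and the helper names are assumptions, not part of HBase or Stargate.

```python
import bisect

SEP = "\x00"  # assumption: keywords never contain a NUL character


def build_keyword_index(rows):
    """rows maps row key -> keywords (the qualifiers under ColumnFamily1).
    Returns sorted index row keys of the form keyword + SEP + row key."""
    return sorted({f"{kw}{SEP}{rk}" for rk, kws in rows.items() for kw in kws})


def rows_with_keyword(index_keys, keyword):
    # Prefix scan over the sorted index, e.g. all keys starting "kwd1\x00".
    prefix = keyword + SEP
    found = set()
    for key in index_keys[bisect.bisect_left(index_keys, prefix):]:
        if not key.startswith(prefix):
            break
        found.add(key[len(prefix):])
    return found


def rows_with_all(index_keys, keywords):
    # AND query: one prefix scan per keyword, then intersect the row-key sets.
    sets = [rows_with_keyword(index_keys, kw) for kw in keywords]
    return set.intersection(*sets) if sets else set()


# The layout from the question (the repeated kwd2 in row1 collapses to one).
rows = {
    "row1": ["kwd1", "kwd2", "kwd3", "kwd2"],
    "row2": ["kwd1", "kwd2", "kwd4", "kwd5"],
}
idx = build_keyword_index(rows)
print(sorted(rows_with_all(idx, ["kwd1", "kwd2"])))  # ['row1', 'row2']
print(sorted(rows_with_all(idx, ["kwd1", "kwd4"])))  # ['row2']
```

The separator also keeps `kwd1` from matching index rows for `kwd10`, because `"kwd1\x00"` is not a prefix of `"kwd10\x00..."`.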
>>>>
>>>>
>>>> On Thu, Mar 24, 2011 at 9:57 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>
>>>>> What you are asking for is a secondary index, and it doesn't exist at
>>>>> the moment in HBase (let alone REST). Googling a bit for "hbase
>>>>> secondary indexing" will show you how people usually do it.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Mar 24, 2011 at 6:18 AM, sreejith P. K. <sreejithpk@nesote.com> wrote:
>>>>>> Is it possible, using the Stargate interface to HBase, to fetch all
>>>>>> rows where more than one column family:<qualifier> is present?
>>>>>>
>>>>>> For example: select rows that contain both keyword:a and keyword:b?
>>>>>>
>>>>>> Thanks
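Without a secondary index, the general way to answer this question is to read every row and test it for the required columns. A toy version is sketched below, with a dict standing in for the table. It is not the Stargate/REST API or an HBase server-side filter, and `scan_rows_with_columns` is a hypothetical name. Its cost grows with the size of the whole table, which is what the index-based replies in this thread avoid.

```python
def scan_rows_with_columns(table, required_columns):
    """Full-table scan: return row keys whose row holds every required
    "family:qualifier" column. Touches every row, which is the cost an
    index is meant to avoid."""
    required = set(required_columns)
    return [rk for rk in sorted(table) if required.issubset(table[rk])]


table = {
    "row1": {"keyword:a": "", "keyword:b": ""},
    "row2": {"keyword:a": ""},
    "row3": {"keyword:a": "", "keyword:b": "", "keyword:c": ""},
}
print(scan_rows_with_columns(table, ["keyword:a", "keyword:b"]))  # ['row1', 'row3']
```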
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sreejith PK
>>>> Nesote Technologies (P) Ltd
>>>>
>>
>
