hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: HBASE - select distinct query against the rowkey
Date Thu, 20 Dec 2012 16:48:34 GMT
There is no concept of a transaction in the NoSQL world, or at least not in HBase.

Writes to a single row are atomic. Note that you *could* hold a lock, but it's not really
a good idea for a client to hold one.

Don't know if it's really a problem though...
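
A rough sketch of why the missing transaction may not hurt much here. It uses plain Python dicts as stand-ins for the two tables, so no HBase client is involved, and `users`, `user_ids`, and `write_message` are made-up names for illustration only. The idea is to write the user-id row first and the message row second. Re-writing the same user id is idempotent, so a failure between the two writes leaves at worst a user id with no message yet, and a retry repairs it.

```python
# In-memory stand-ins for two HBase tables (assumption: rowkey -> value).
users = {}      # message rows keyed by "<userid>_<messageid>_<timestamp>"
user_ids = {}   # one row per distinct user id


def make_rowkey(userid, messageid, timestamp):
    """Build the composite rowkey described in the thread."""
    return f"{userid}_{messageid}_{timestamp}"


def write_message(userid, messageid, timestamp, payload, fail_between=False):
    # Step 1: record the user id. Writing the same row twice is harmless.
    user_ids[userid] = b""
    if fail_between:
        # Simulate the client dying between the two writes.
        raise ConnectionError("simulated failure before the message write")
    # Step 2: write the message row itself.
    users[make_rowkey(userid, messageid, timestamp)] = payload


# First attempt fails half way: the user id is recorded, the message is not.
try:
    write_message("u42", "m1", 1356022114, b"hello", fail_between=True)
except ConnectionError:
    pass

# Retrying the same write repairs the state without creating duplicates.
write_message("u42", "m1", 1356022114, b"hello")
```

Writing in the opposite order would be the riskier choice in this sketch, since a failure could then leave a message whose user id never shows up in the distinct list.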

HTH 

-Mike

On Dec 20, 2012, at 10:08 AM, Shengjie Min <kelvin.msj@gmail.com> wrote:

> Thanks Michael,
> 
>> Not sure why you have timestamp in the key... assuming that message id
>> would be incremented therefore rows would be in time order anyways.
> 
> I will need to run queries like "give me the messages from timestamp1 to
> timestamp2".
> 
>> You will want to use a separate table.
> That's what I thought as well. If I don't have a separate table, I will
> end up doing a table scan. But what about atomicity? What if a write
> succeeds on one table and fails on the other? HBase has no concept
> of a transaction in this case.
> 
> Shengjie
> 
> 
> On 20 December 2012 15:59, Michael Segel <michael_segel@hotmail.com> wrote:
> 
>> Not sure why you have timestamp in the key... assuming that message id
>> would be incremented therefore rows would be in time order anyways.
>> 
>> But to answer your question...
>> You will want to use a separate table.
>> 
>> In both instances you will end up doing a full table scan, however the
>> number of rows in a distinct user table would be far fewer than in your
>> users table.
>> 
>> 
>> HTH
>> 
>> -Mike
>> 
>> On Dec 20, 2012, at 8:55 AM, Shengjie Min <kelvin.msj@gmail.com> wrote:
>> 
>>> I have an HBase table called "users". The rowkey consists of three parts:
>>> 
>>>  1. userid
>>>  2. messageid
>>>  3. timestamp
>>> 
>>> rowkey looks like: ${userid}_${messageid}_${timestamp}
>>> 
>>> Given I can hash the userid and make the length of the field fixed, is
>>> there any way I can do a query like this SQL query:
>>> 
>>> select distinct(userid) from users
>>> 
>>> If the rowkey doesn't allow me to query like this, does that mean I need to
>>> create a separate table that just contains all the user ids? I guess if I do
>>> something like that, it won't be atomic anymore when I insert a record,
>>> because I am dealing with two tables without a transaction.
>>> --
>>> All the best,
>>> Shengjie Min
>> 
>> 
> 
> 
> -- 
> All the best,
> Shengjie Min

