hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: HBASE - select distinct query against the rowkey
Date Thu, 20 Dec 2012 15:59:36 GMT
Not sure why you have a timestamp in the key. Assuming the message id is incremented,
the rows would be in time order anyway.

But to answer your question... 
You will want to use a separate table.

In both instances you will end up doing a full table scan; however, the number of rows in a
distinct-user table would be much smaller than in your users table.
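
As a rough illustration of the separate-table idea, here is a minimal Python sketch. It does not use the HBase client: FakeTable is a hypothetical in-memory stand-in for a table, and the table name distinct_users, the column names, and the choice of MD5 for the fixed-length userid hash are all assumptions made for the example. The point it tries to show is that writing the distinct-user row first, as an idempotent put, means a failure between the two writes would at worst leave a user row with no messages, rather than messages for a user missing from the distinct table.

```python
import hashlib


def user_prefix(userid: str) -> str:
    """Fixed-length (32 hex chars) hash of the userid; MD5 is an assumption."""
    return hashlib.md5(userid.encode("utf-8")).hexdigest()


def message_rowkey(userid: str, messageid: int, timestamp: int) -> str:
    """Row key in the ${userid}_${messageid}_${timestamp} layout from the question."""
    return f"{user_prefix(userid)}_{messageid}_{timestamp}"


class FakeTable:
    """Hypothetical in-memory stand-in for a table: row key -> {column: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, rowkey, columns):
        # Writing the same row and column again simply overwrites the value,
        # so repeated puts of the same user row are harmless.
        self.rows.setdefault(rowkey, {}).update(columns)

    def scan(self):
        # Yield rows in sorted key order, as a table scan would.
        for key in sorted(self.rows):
            yield key, self.rows[key]


def insert_message(users, distinct_users, userid, messageid, timestamp, body):
    # 1. Record the user in the small table first (idempotent).
    distinct_users.put(user_prefix(userid), {"info:userid": userid})
    # 2. Then write the message row into the large table.
    users.put(message_rowkey(userid, messageid, timestamp), {"msg:body": body})


def select_distinct_userids(distinct_users):
    # Still a full scan, but only over one row per user.
    return sorted(cols["info:userid"] for _, cols in distinct_users.scan())
```

With two FakeTable instances, calling insert_message several times for the same user adds one row per message to the users table but keeps a single row for that user in the distinct-user table, and select_distinct_userids only has to read the small table.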



On Dec 20, 2012, at 8:55 AM, Shengjie Min <kelvin.msj@gmail.com> wrote:

> I have a hbase table called "users", rowkey consists of three parts:
>   1. userid
>   2. messageid
>   3. timestamp
> rowkey looks like: ${userid}_${messageid}_${timestamp}
> Given I can hash the userid and make the length of the field fixed, is
> there any way I can do a query like the SQL query:
> select distinct(userid) from users
> If the rowkey doesn't allow me to query like this, does that mean I need to
> create a separate table that just contains all the user ids? I guess if I do
> something like that, it won't be atomic anymore when I insert a record,
> because I am dealing with two tables without a transaction.
> -- 
> All the best,
> Shengjie Min
