hbase-user mailing list archives

From "Qingyan(Evan) Liu" <qingyan...@gmail.com>
Subject Re: help needed with base schema
Date Mon, 13 Jul 2009 18:14:57 GMT
Hi Piyush,

I think you just want to fetch the 20 most recent updates for a user, don't you?
If so, you can store the updates as cell versions and let HBase keep
only 20 versions, IMO.
How about that?
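
To make the idea concrete, here is a rough sketch of how a 20-version limit would behave. It is a toy in-memory model in Python, not the HBase client API: the class VersionedTable, its put/get methods, and the MAX_VERSIONS constant are all made up for illustration. In real HBase the version limit is a column family setting, and excess versions are cleaned up by HBase itself rather than on every write.

```python
from collections import defaultdict

MAX_VERSIONS = 20  # assumed version limit, taken from the "top 20" query


class VersionedTable:
    """Toy in-memory model of versioned cells (hypothetical, not HBase code)."""

    def __init__(self, max_versions=MAX_VERSIONS):
        self.max_versions = max_versions
        # row key -> column -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        """Store one version of a cell, then trim to the newest max_versions."""
        cell = self.rows[row][column]
        cell[timestamp] = value
        excess = len(cell) - self.max_versions
        if excess > 0:
            # sorted() yields timestamps oldest first, so drop from the front.
            for old_ts in sorted(cell)[:excess]:
                del cell[old_ts]

    def get(self, row, column):
        """Return the retained (timestamp, value) pairs, newest first."""
        cell = self.rows[row][column]
        return sorted(cell.items(), reverse=True)


# Write a week's worth of updates (about 1000) for one user.
table = VersionedTable()
for t in range(1, 1001):
    table.put("userid1", "update", t, f"some update{t}")
    table.put("userid1", "sender", t, f"sender{t}")

# One read per column returns the newest 20 entries, already ordered.
updates = table.get("userid1", "update")
senders = table.get("userid1", "sender")
```

Because "update" and "sender" are written with the same timestamps, the two result lists line up entry by entry. The trade-off is that anything older than the newest 20 versions is gone, so this only fits if the older updates do not need to be read back.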

sincerely,
Evan

2009/7/13 Piyush Goel <piyushgoel84@gmail.com>:
>> Hi,
>>
>> I am trying to design a high-scale key-value storage system. The HBase
>> table for it is outlined below:
>>
>> {
>>   "userid1" : {
>>     "update" : {
>>         t3 : "some update1",
>>         t2 : "some update2",
>>         t1 : "some update3"
>>     },
>>     "sender" : {
>>         t3 : "sender3",
>>         t2 : "sender2",
>>         t1 : "sender1"
>>     }
>>   },
>>
>>   "userid2" : {
>>     "update" : {
>>         t9 : "some update9",
>>         t6 : "some update534",
>>         t1 : "some update343"
>>     },
>>     "sender" : {
>>         t9 : "sender3",
>>         t6 : "sender2",
>>         t1 : "sender1"
>>     }
>>   }
>> }
>>
>> The system is going to have around 15-20M users with around 3-4M put (write)
>> operations per day (which rules out MySQL automatically). The maximum number of
>> entries in the "update" and "sender" columns will be around 1000 (around one
>> week's updates).
>>
>> My queries would be like "For a given userid, return the top 20 updates and
>> senders based on timestamp". Is there a way to make a secondary index on
>> "userid, timestamp" which can help speed up my "get" calls? Or how can I
>> change my schema design to minimize response time for get calls?
>>
>>
>> thanks,
>> piyush
