hbase-user mailing list archives

From Peter Haidinyak <phaidin...@local.com>
Subject RE: Row Key Question
Date Wed, 16 Feb 2011 21:58:13 GMT
Originally sent to just Stack and now sent to the list.

If I assign each row key a random value, the writes will be distributed and populating HBase
will be faster. On the other hand, my scans bring back blocks of data (vendor by date), and
each block can hold tens of thousands of rows. Would retrieval be faster if the key weren't
random?
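
The trade-off in the question can be sketched outside HBase. Below is a minimal Python simulation, not the HBase client API. It assumes only that rows are kept sorted by row key and that a scan reads the contiguous range [start key, stop key). The names make_key and range_scan, the vendor names, and the timestamps are all hypothetical, made up for illustration.

```python
import bisect
import uuid

def make_key(vendor, timestamp, other):
    # Zero-pad the timestamp so string order matches numeric order.
    return f"{vendor}|{timestamp:010d}|{other}"

def range_scan(sorted_keys, start_key, stop_key):
    # Mimics a scan over [start_key, stop_key): one contiguous slice
    # of the sorted key space, found with two binary searches.
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi]

# Hypothetical sample data: 3 vendors, 5 timestamps each.
records = [(v, ts, "x")
           for v in ("acme", "globex", "initech")
           for ts in range(1297800000, 1297800005)]

# Layout 1: <vendor>|<timestamp>|<other>, stored in sorted order.
ordered = sorted(make_key(*r) for r in records)
hits = range_scan(ordered, "acme|1297800001", "acme|1297800003")
print(hits)

# Layout 2: random keys. The key says nothing about vendor or date,
# so a vendor-by-date query has to inspect every row.
random_table = {uuid.uuid4().hex: r for r in records}
full_pass = [r for r in random_table.values()
             if r[0] == "acme" and 1297800001 <= r[1] < 1297800003]
print(len(full_pass), "matches after checking", len(random_table), "rows")
```

With the vendor-first key, the vendor-by-date block is one contiguous slice. With random keys, the same rows are scattered, so the read has to touch the whole table (or rely on some separate index).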



> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Wednesday, February 16, 2011 10:52 AM
> To: user@hbase.apache.org
> Cc: Peter Haidinyak
> Subject: Re: Row Key Question
> On Wed, Feb 16, 2011 at 10:48 AM, Peter Haidinyak <phaidinyak@local.com> wrote:
>> I'm not using the Timestamp alone, it is part of a compound key.
>> My old key included
>> <timestamp>|<vendor name>|<other data>
>> My new key will include
>> <vendor name>|<timestamp>|<other data>
> Yes.  Got that.  Was just trying to give you a bit more background to 
> highlight what the lads were saying before me.
>> This is still not ideal, since a couple of vendors make up over 50% of the logs. It
>> would be nice to prefix the key with a server Id and force the row to that server. With my
>> limited knowledge I don't know how to do that yet.
> You don't want to do that (You'll learn why when you pick up more hbasics).
> Would suggest you not worry about the distribution.  That's the point 
> of hbase.  You don't have to worry about where the stuff is.
> St.Ack
