hbase-user mailing list archives

From Eric Czech <e...@nextbigsound.com>
Subject Re: Key formats and very low cardinality leading fields
Date Mon, 03 Sep 2012 19:06:52 GMT
Thanks for the response Jean-Marc!

I understand what you're saying, but consider a more extreme case: say
I'm choosing the leading number from the range 1 - 3 instead of 1 - 30.
In that case, it seems like the data for any one prefix would already
be split well across the cluster, and as long as the second value
isn't written sequentially, there wouldn't be an issue.

Is my reasoning there flawed at all?
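
To make the question concrete, here is a rough simulation sketch in
Python. It is not HBase code. It only relies on the fact that HBase
assigns a row to the region whose sorted key range contains the row key.
The split points, the region_for helper and the key shapes are all
hypothetical, chosen so that each of the prefixes 1, 2 and 3 already
spans four regions.

```python
import bisect
import random
import string
from collections import Counter

# Hypothetical region boundaries (assumption, not taken from a real table):
# each prefix "1", "2", "3" is split into four regions on the first suffix letter.
SPLIT_POINTS = sorted(f"{prefix}{letter}" for prefix in "123" for letter in "AGNT")


def region_for(row_key: str) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(SPLIT_POINTS, row_key)


def random_suffix(rng: random.Random, length: int = 6) -> str:
    """Build a non-sequential suffix of uppercase letters."""
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(length))


def regions_hit(keys) -> Counter:
    """Count how many writes land in each region."""
    return Counter(region_for(key) for key in keys)


rng = random.Random(0)

# Same leading field for every write, non-sequential second field.
random_keys = ["2" + random_suffix(rng) for _ in range(10_000)]

# Same leading field, second field written in sorted order.
sequential_keys = ["2AAA" + f"{i:03d}" for i in range(1000)]

print("random suffixes:    ", sorted(regions_hit(random_keys).items()))
print("sequential suffixes:", sorted(regions_hit(sequential_keys).items()))
```

Under these assumptions the random suffixes spread over every region
that covers prefix 2, while the sequential suffixes all land in a single
region. Whether those regions sit on different regionservers depends on
how the cluster has balanced them, which this sketch does not model.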

On Mon, Sep 3, 2012 at 2:31 PM, Jean-Marc Spaggiari
<jean-marc@spaggiari.org> wrote:
> Hi Eric,
>
> In HBase, data is stored sorted by row key, in lexicographic order.
>
> It will depend on the number of regions and regionservers you have, but
> if you write data from 23AAAAAA to 23ZZZZZZ it will most probably go
> to the same region even if the cardinality of the 2nd part of the key
> is high.
>
> If the first number keeps changing between 1 and 30 from one write to
> the next, then you will reach multiple regions/servers, if you have
> several; otherwise, you might have some hotspotting.
>
> JM
>
> 2012/9/3, Eric Czech <eric@nextbigsound.com>:
>> Hi everyone,
>>
>> I was curious whether I should expect any write hot spots if I
>> structure my composite keys so that the first field has low
>> cardinality (maybe 30 distinct values) and the next field has very
>> high cardinality and is not written sequentially.
>>
>> More concisely, I want to do this:
>>
>> Given one number between 1 and 30, write many millions of rows with
>> keys like <number chosen> : <some generally distinct, non-sequential
>> value>
>>
>> Would there be any problem with the millions of writes happening with
>> the same first field key prefix even if the second field is largely
>> unique?
>>
>> Thank you!
>>
