hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: RowKey hashing in HBase 1.0
Date Tue, 05 May 2015 20:46:41 GMT
Yes, what you described, mod(hash(rowkey), n) where n is the number of regions, will remove the
hotspotting issue.
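
A minimal sketch of that key construction, following the steps in the quoted message below. The function name `bucketed_key`, the use of the first four bytes of an md5 digest, and the zero-padded prefix are illustrative assumptions, not anything prescribed by HBase:

```python
import hashlib


def bucketed_key(rowkey: str, n_buckets: int) -> str:
    """Prefix rowkey with mod(hash(rowkey), n_buckets).

    Hypothetical helper: the prefix is derived from the key itself, so the
    same rowkey always maps to the same bucket and one get() can find it.
    """
    digest = hashlib.md5(rowkey.encode("utf-8")).digest()
    # Assumption: truncate the hash to its first 4 bytes before the modulus.
    bucket = int.from_bytes(digest[:4], "big") % n_buckets
    # Assumption: zero-pad so prefixes sort consistently as strings.
    width = len(str(n_buckets - 1))
    return f"{bucket:0{width}d}_{rowkey}"


if __name__ == "__main__":
    print(bucketed_key("1234foobar", 2000))
```

Because the prefix is computed from the key rather than drawn at random, a reader can recompute it and fetch the row directly.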

However, if your key is sequential, your regions will only be half full after a region split.


Look at it this way… 

If I have a key that is a sequential count 1, 2, 3, 4, 5 … I am always adding a new row to the
last region, and it’s always being added to the right (reading left to right). Always at
the end of the line…

So if I have 10,000 rows and I split the region… region 1 has 1 to 5,000 and region 2 has
5,001 to 10,000.

Now my next row is 10,001, the following is 10,002 … so they will be added at the tail end
of region 2 until it splits. (And so on, and so on…)
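
The walk-through above can be sketched as a toy simulation. It assumes a region splits exactly in half once it holds `max_rows` rows (real HBase splits on size, not row count) and that keys sort in insertion order; the function name and parameters are hypothetical:

```python
def simulate_sequential(num_rows: int, max_rows: int) -> list[int]:
    """Return the row count of each region after inserting sequential keys.

    Toy model: regions are ordered by key range, and a region that reaches
    max_rows rows is split into two halves.
    """
    regions = [0]
    for _ in range(num_rows):
        # A new, larger key always lands in the last region.
        regions[-1] += 1
        if regions[-1] >= max_rows:
            full = regions[-1]
            regions[-1] = full // 2            # lower half: never written again
            regions.append(full - full // 2)   # upper half: takes all new rows
    return regions


if __name__ == "__main__":
    print(simulate_sequential(27_000, 10_000))
    # prints [5000, 5000, 5000, 5000, 7000]
```

Every region except the last one stays at half of `max_rows`, which is the half-full effect described here.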

If you take a modulus of the hash, you create n buckets. Again, within each bucket, I will still
be adding a new, larger number, so it will be added to the right-hand side, or tail, of the list.

Once a region is split, that’s it: its lower half never receives another row, so it stays half full.

Bucketing will solve the hotspotting issue by creating n lists of rows, but you’re still
always adding to the end of each list.
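
A rough sketch of the same effect with bucketing, under the same toy assumptions (a region splits in half at `max_rows` rows, and keys within a bucket sort in insertion order). The md5-based bucket choice and all names are illustrative, not an HBase API:

```python
import hashlib


def simulate_bucketed(num_rows: int, n_buckets: int, max_rows: int) -> list[list[int]]:
    """Return, for each bucket, the row counts of its regions.

    Toy model: each bucket is its own sorted list of regions. Sequential keys
    are spread across buckets, but inside a bucket they still arrive in
    increasing order and land in that bucket's last region.
    """
    buckets = [[0] for _ in range(n_buckets)]
    for key in range(1, num_rows + 1):
        digest = hashlib.md5(str(key).encode("utf-8")).digest()
        regions = buckets[int.from_bytes(digest[:4], "big") % n_buckets]
        regions[-1] += 1                       # append at the tail of this bucket
        if regions[-1] >= max_rows:
            full = regions[-1]
            regions[-1] = full // 2            # lower half: never written again
            regions.append(full - full // 2)   # upper half: takes new rows
    return buckets


if __name__ == "__main__":
    for i, regions in enumerate(simulate_bucketed(20_000, 4, 1_000)):
        print(i, regions)
```

Writes are spread over n tails instead of one, yet within each bucket every region but the last still ends up half full.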

Does that make sense? 


> On May 5, 2015, at 10:04 AM, jeremy p <athomewithagroovebox@gmail.com> wrote:
> 
> Thank you for your response!
> 
> So I guess 'salt' is a bit of a misnomer.  What I used to do is this :
> 
> 1) Say that my key value is something like '1234foobar'
> 2) I obtain the hash of '1234foobar'.  Let's say that's '54824923'
> 3) I mod the hash by my number of regions.  Let's say I have 2000 regions.
> 54824923 % 2000 = 923
> 4) I prepend that value to my original key value, so my new key is
> '923_1234foobar'
> 
> Is this the same thing you were talking about?
> 
> A couple questions :
> 
> * Why would my regions only be 1/2 full?
> * Why would I only use this for sequential keys?  I would think this would
> give better performance in any situation where I don't need range scans.
> For example, let's say my key value is a person's last name.  That will
> naturally cluster around certain letters, giving me an uneven distribution.
> 
> --Jeremy
> 
> 
> 
> On Sun, May 3, 2015 at 11:46 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> 
>> Yes, don’t use a salt. Salt implies that your seed is orthogonal (read
>> random) to the base table row key.
>> You’re better off using a truncated hash (md5 is fastest) so that at least
>> you can use a single get().
>> 
>> Common?
>> 
>> Only if your row key is mostly sequential.
>> 
>> Note that even with bucketing, you will still end up with regions only 1/2
>> full with the only exception being the last region.
>> 
>>> On May 1, 2015, at 11:09 AM, jeremy p <athomewithagroovebox@gmail.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> I've been out of the HBase world for a while, and I'm just now jumping back
>>> in.
>>>
>>> As of HBase .94, it was still common to take a hash of your RowKey and use
>>> that to "salt" the beginning of your RowKey to obtain an even distribution
>>> among your region servers.  Is this still a common practice, or is there a
>>> better way to do this in HBase 1.0?
>>> 
>>> --Jeremy
>> 
>> The opinions expressed here are mine, while they may reflect a cognitive
>> thought, that is purely accidental.
>> Use at your own risk.
>> Michael Segel
>> michael_segel (AT) hotmail.com
>> 
>> 
>> 
>> 
>> 
>> 

The opinions expressed here are mine, while they may reflect a cognitive thought, that is
purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com





