hbase-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: OT - Hash Code Creation
Date Thu, 17 Mar 2011 18:32:21 GMT
Just that base-64 is not uniformly distributed relative to a binary
representation.  This is simply because every byte is one of only 64
printable characters.  If you do a 256-way pre-split based on a binary
interpretation of the first byte of the key, 64 regions will get traffic and
192 will get none.  Among other things, this can seriously mess up
benchmarking.  The situation is even worse with decimal integer
representations, where only the ten digit characters appear.
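
A minimal sketch of the effect, assuming MD5 hashes of made-up keys and a
256-way split on the first byte of the row key.  The key names and the
bucket helper are hypothetical and only illustrate the counting; nothing
here is HBase API.

```python
import base64
import hashlib


def first_byte_buckets(rowkeys):
    """Return the set of first-byte values (0..255) that the row keys hit."""
    return {key[0] for key in rowkeys}


# Hypothetical input keys; any reasonably large set shows the same pattern.
digests = [hashlib.md5(f"row-{i}".encode()).digest() for i in range(20_000)]

# Three encodings of the same 64-bit chunk of each hash.
raw_keys = [d[:8] for d in digests]                    # binary
b64_keys = [base64.b64encode(d[:8]) for d in digests]  # base-64 text
dec_keys = [str(int.from_bytes(d[:8], "big")).encode() for d in digests]  # decimal text

raw_hit = first_byte_buckets(raw_keys)
b64_hit = first_byte_buckets(b64_keys)
dec_hit = first_byte_buckets(dec_keys)

# Number of the 256 first-byte regions that receive any traffic.
print("binary :", len(raw_hit))
print("base-64:", len(b64_hit))
print("decimal:", len(dec_hit))
```

The base-64 count can never exceed 64 and the decimal count can never exceed
10, no matter how many keys are written, so a split that assumes uniform
binary keys leaves most regions idle.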

On Thu, Mar 17, 2011 at 11:19 AM, Chris Tarnas <cft@email.com> wrote:

> I'm not sure I'm clear: are you saying 64-bit chunks of MD5 keys are not
> uniformly distributed? Or that a base-64 encoding is not evenly distributed?
> thanks,
> -chris
> On Mar 17, 2011, at 10:23 AM, Ted Dunning wrote:
>
>> There can be some odd effects with this because the keys are not uniformly
>> distributed.  Beware if you are using pre-split tables because the region
>> traffic can be pretty unbalanced if you do a naive split.
>>
>> On Thu, Mar 17, 2011 at 9:20 AM, Chris Tarnas <cft@email.com> wrote:
>>
>>> I've been using base-64 encoding when I use my hashes as rowkeys - makes
>>> them printable while still being fairly dense, IIRC a 64bit key should be
>>> only 11 characters.
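
The 11-character figure quoted above can be checked with a short sketch,
assuming standard base-64 with the trailing '=' padding stripped.  The
sample key value is arbitrary.

```python
import base64
import struct

# An arbitrary 64-bit key packed as 8 big-endian bytes.
key = struct.pack(">Q", 0x0123456789ABCDEF)

encoded = base64.b64encode(key)   # standard base-64, padded to a multiple of 4
unpadded = encoded.rstrip(b"=")   # drop the padding for use as a row key

print(len(key), len(encoded), len(unpadded))
```

Eight bytes are 64 bits, and each base-64 character carries 6 bits, so 11
characters are needed; the padded form is one character longer.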
