hbase-user mailing list archives

From Michel Segel <michael_se...@hotmail.com>
Subject Re: Is it necessary to set MD5 on rowkey?
Date Thu, 20 Dec 2012 01:47:17 GMT
This is what I wrote:
>> If you salt, you will have to do a *FULL* *TABLE* *SCAN* in order to
>> retrieve the row.
>> If you do something like a salt that uses only  a preset of N combinations,
>> you will have to do N get()s in order to fetch the row.
>> 

By definition, the salt is a random number which is the first part of the one-way crypt() function.
Using some modulo function is the second half of what I said. ;-)
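
To make the two cases concrete, here is a minimal Python sketch. It is plain Python, not HBase client code, and the names random_salted_key, bucket_candidates and N_BUCKETS are hypothetical. With a truly random salt the reader cannot recompute the prefix, so it has to scan. With a salt drawn from N preset values, the reader can enumerate N candidate keys and issue one get() per candidate.

```python
import os

# Assumed bucket count, matching the "10 gets" example in this thread.
N_BUCKETS = 10

def random_salted_key(row_key: bytes) -> bytes:
    """Prefix one random byte. A later reader cannot recompute this prefix."""
    return os.urandom(1) + row_key

def bucket_candidates(row_key: bytes, n_buckets: int = N_BUCKETS) -> list:
    """Every key a reader must try when the salt is one of n_buckets values."""
    return [bytes([bucket]) + row_key for bucket in range(n_buckets)]

# A reader would issue one get() per candidate key.
candidates = bucket_candidates(b"user42")
```

The length of the candidate list is the N in "N get()s".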


Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 19, 2012, at 7:35 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> I have to disagree with the *FULL* *TABLE* *SCAN* in order to retrieve the row.
> 
> If I know that I have one byte salting between 1 and 10, I will have
> to do 10 gets to get the row. And they will most probably all be on
> different RS, so it will not be more than 1 get per server. This will
> take almost the same time as doing a simple get.
> 
> I understand your point that salting introduces some bad things, but
> on the other hand, it's easy and can still be useful. A hash gives
> you direct access with one call, but you still need to calculate the
> hash. So what's faster? Calculate the hash and do one call to one
> server? Or go directly with one call to multiple servers? It all
> depends on the way you access your data.
> 
> Personally, I'm using hash almost everywhere, but I still understand
> that some people might be able to use salting for their specific
> purposes.
> 
> JM
> 
> 2012/12/19, Michael Segel <michael_segel@hotmail.com>:
>> Ok,
>> 
>> Let's try this one more time...
>> 
>> If you salt, you will have to do a *FULL* *TABLE* *SCAN* in order to
>> retrieve the row.
>> If you do something like a salt that uses only  a preset of N combinations,
>> you will have to do N get()s in order to fetch the row.
>> 
>> This is bad. VERY BAD.
>> 
>> If you hash the row, you will get a consistent value each time you hash the
>> key.  If you use SHA-1, a collision is mathematically possible but
>> highly improbable. So people have recommended appending the
>> key to the hash to form the new key. Here, you might as well truncate the
>> hash to just the most significant byte or two and then append the key. This
>> will give you enough of an even distribution that you can avoid hot
>> spotting.
> 
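
The truncated-hash approach described in the quoted message can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not HBase client code: the function name hashed_prefix_key and the default one-byte prefix are hypothetical. Because the prefix is derived from the key itself, any reader can recompute it and fetch the row with a single get().

```python
import hashlib

def hashed_prefix_key(row_key: bytes, prefix_len: int = 1) -> bytes:
    """Prepend the first prefix_len bytes of SHA-1(row_key) to the key.

    The prefix spreads keys across regions, and keeping the full original
    key after it means a hash collision cannot merge two distinct rows.
    """
    digest = hashlib.sha1(row_key).digest()
    return digest[:prefix_len] + row_key

# Writer and reader compute the same stored key from the same row key.
stored_key = hashed_prefix_key(b"user42")
lookup_key = hashed_prefix_key(b"user42")
```

A one-byte prefix gives up to 256 distinct prefixes and two bytes give up to 65536. Which is enough depends on how many regions the table has.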
