hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Segel <michael_se...@hotmail.com>
Subject Re: Use of MD5 as row keys - is this safe?
Date Mon, 23 Jul 2012 11:55:18 GMT

Its not a question of collision attacks, although that is an indication of the strength of
the hash. 

Just because both MD5 and SHA-1 yield a key of the same length, that doesn't mean that they
will have the same chance of a collision. The algorithm has some impact on this too. 

I agree that if you're looking to hash the key, MD5 should be good enough. If you don't like
it, SHA-1 or SHA-2 are also available as part of Sun's JDK. 
(See Appendix A here: http://docs.oracle.com/javase/6/docs/technotes/guides/security/p11guide.html)

Key's longer than 512 are not exportable from the US. 

The point I was making was that if you are not comfortable in using MD5 and you want to append
the key to the hash, you may want to consider a different hash. 

In terms of Uniqueness, take a UUID , hash it, and then truncate the hash to 128 bits from
160 bits. This actually is a form of the UUID. While this could increase the likelihood of
a collision, I haven't seen one in real use. 

The idea of creating a hash and then prepending it to the key is a tad pessimistic.

That was the point. 

On Jul 22, 2012, at 9:21 AM, Ethan Jewett wrote:

> To echo Joe Pallas:
> Any fairly "random" hash algorithm producing the same length output should
> have about the same extremely small chance of producing the same output for
> two different inputs - a collision. It's a problem you need to be aware of
> no matter what hash algorithm you use. (Hash functions are mappings from a
> theoretically infinite input space to a finitely large output space, so
> they obviously generate the same output for multiple inputs.)
> SHA-1 specifically (and MD5 even more-so) has an attack that shows that
> given a specific input and output, we can calculate a new input that
> produces the same output with better than brute-force efficiency.
> Collisions and collision attacks are two different things. Collision
> attacks are a problem for cryptographic uses like signing, but how does
> this have anything to do with the problem of generating hBase row keys?
> Just use the fastest, most accessible, random-enough algorithm you can
> find, and if you are really worried about collisions then do something to
> ensure that the key will be unique. Right?
> Cheers,
> Ethan
> On Sun, Jul 22, 2012 at 2:00 PM, Michel Segel <michael_segel@hotmail.com>wrote:
>> http://en.wikipedia.org/wiki/SHA-1
>> Check out the comparisons between the different SHA algos.
>> In theory a collision was found for SHA-1, but none found for SHA-2 does
>> that mean that a collision doesn't exist? No, it means that it hasn't
>> happened yet and the odds are that it won't be found. Possible? Yes,
>> however, highly improbable. You have a better chance of winning the lotto...
>> The point was that if you are going to hash your key,then concatenate the
>> initial key, you would be better off looking at the SHA-1 option. You have
>> to consider a couple of factors...
>> 1: availability of the algo. SHA-1 is in the standard java API and is
>> readily available.
>> 2: speed. Is SHA-1fast enough? Maybe, depending on your requirements. For
>> most, I'll say probably.
>> 3: Size of Key. SHA-1 is probably be smaller than having an MD-5 hash and
>> the original key added.
>> Just food for thought...
>> Sent from a remote device. Please excuse any typos...
>> Mike Segel
>> On Jul 20, 2012, at 3:35 PM, Joe Pallas <pallas@cs.stanford.edu> wrote:
>>> On Jul 20, 2012, at 12:16 PM, Michel Segel wrote:
>>>> I don't believe that there has been any reports of collisions, but if.
>> You are concerned you could use the SHA-1 for generating the hash.
>> Relatively speaking, SHA-1is slower, but still fast enough for most
>> applications.
>>> Every hash function can have collisions, by definition.  If the
>> correctness of your design depends on collisions being impossible, rather
>> than very rare, then your design is faulty.
>>> Cryptographic hash functions have the property that it is
>> computationally hard to create inputs that match a given output.  That
>> doesn’t in itself make cryptographic hash functions better than other hash
>> functions for avoiding hot-spotting.  (But it does usually make
>> cryptographic hash functions more expensive to compute than other hash
>> functions.)
>>> You may want to look at <http://www.strchr.com/hash_functions>  and <
>> http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633
>>> .
>>> Hope this helps,
>>> joe

View raw message