hbase-user mailing list archives

From Ethan Jewett <esjew...@gmail.com>
Subject Re: Use of MD5 as row keys - is this safe?
Date Sun, 22 Jul 2012 14:21:45 GMT
To echo Joe Pallas:

Any fairly "random" hash algorithm producing the same length output should
have about the same extremely small chance of producing the same output for
two different inputs - a collision. It's a problem you need to be aware of
no matter what hash algorithm you use. (Hash functions are mappings from a
theoretically infinite input space to a finitely large output space, so
they obviously generate the same output for multiple inputs.)
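
To put a rough number on "extremely small", here is a minimal sketch of the usual birthday-bound approximation, p ~= 1 - exp(-n(n-1) / 2^(b+1)), for n keys and a b-bit hash. It assumes the hash output is uniformly distributed, which is an idealization, and the function name is just something made up for illustration.

```python
import math

def collision_probability(num_keys: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_keys distinct inputs
    share a hash value, assuming a uniformly distributed hash_bits-bit hash
    (birthday-bound approximation, not an exact figure)."""
    exponent = -num_keys * (num_keys - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is very close to zero.
    return -math.expm1(exponent)

# Compare a 128-bit digest (MD5-sized) with a 160-bit one (SHA-1-sized)
# for a billion row keys.
for name, bits in [("128-bit", 128), ("160-bit", 160)]:
    print(name, collision_probability(10**9, bits))
```

Under that assumption, a billion keys with a 128-bit hash give a probability on the order of 10^-21, and only the output length enters the formula, not which algorithm produced it.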

SHA-1 specifically (and MD5 even more so) has a known attack showing that
two different inputs producing the same output can be found with better
than brute-force efficiency.

Collisions and collision attacks are two different things. Collision
attacks are a problem for cryptographic uses like signing, but how does
this have anything to do with the problem of generating HBase row keys?
Just use the fastest, most accessible, random-enough algorithm you can
find, and if you are really worried about collisions then do something to
ensure that the key will be unique. Right?
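
One way to "do something to ensure that the key will be unique" is the approach mentioned elsewhere in this thread: use the hash only as a prefix and append the original key. A minimal sketch, assuming byte-string keys and MD5 as the prefix hash (any fixed-length hash would do); `make_row_key` is a hypothetical helper, not an HBase API.

```python
import hashlib

def make_row_key(original_key: bytes) -> bytes:
    """Hypothetical helper: fixed-length hash prefix + original key.

    The prefix spreads keys across the key space; the appended original
    key keeps row keys distinct even if two prefixes happen to collide.
    """
    prefix = hashlib.md5(original_key).digest()  # always 16 bytes
    return prefix + original_key

row_key = make_row_key(b"user123")
print(len(row_key), row_key.hex())
```

Because the prefix has a fixed length, two different original keys can never produce the same row key, so a hash collision only costs a bit of distribution rather than correctness. The trade-off is a longer key.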

Cheers,
Ethan

On Sun, Jul 22, 2012 at 2:00 PM, Michel Segel <michael_segel@hotmail.com> wrote:

> http://en.wikipedia.org/wiki/SHA-1
>
> Check out the comparisons between the different SHA algos.
>
> In theory a collision was found for SHA-1, but none has been found for
> SHA-2. Does that mean that a collision doesn't exist? No, it means that it
> hasn't happened yet and the odds are that it won't be found. Possible? Yes,
> but highly improbable. You have a better chance of winning the lotto...
>
> The point was that if you are going to hash your key, then concatenate the
> initial key, you would be better off looking at the SHA-1 option. You have
> to consider a couple of factors...
> 1: Availability of the algo. SHA-1 is in the standard Java API and is
> readily available.
> 2: Speed. Is SHA-1 fast enough? Maybe, depending on your requirements. For
> most, I'll say probably.
> 3: Size of key. SHA-1 is probably smaller than having an MD5 hash and
> the original key added.
>
> Just food for thought...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jul 20, 2012, at 3:35 PM, Joe Pallas <pallas@cs.stanford.edu> wrote:
>
> >
> > On Jul 20, 2012, at 12:16 PM, Michel Segel wrote:
> >
> >> I don't believe that there have been any reports of collisions, but if
> >> you are concerned you could use SHA-1 for generating the hash.
> >> Relatively speaking, SHA-1 is slower, but still fast enough for most
> >> applications.
> >
> > Every hash function can have collisions, by definition.  If the
> > correctness of your design depends on collisions being impossible, rather
> > than very rare, then your design is faulty.
> >
> > Cryptographic hash functions have the property that it is
> > computationally hard to create inputs that match a given output.  That
> > doesn’t in itself make cryptographic hash functions better than other hash
> > functions for avoiding hot-spotting.  (But it does usually make
> > cryptographic hash functions more expensive to compute than other hash
> > functions.)
> >
> > You may want to look at <http://www.strchr.com/hash_functions> and
> > <http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633>.
> >
> > Hope this helps,
> > joe
> >
> >
>
