hbase-user mailing list archives

From Jonathan Bishop <jbishop....@gmail.com>
Subject Use of MD5 as row keys - is this safe?
Date Fri, 20 Jul 2012 16:22:07 GMT
Hi,

I know it is commonly suggested to use an MD5 checksum to create a row
key from some other identifier, such as a string or long. This is usually
done to guard against hot-spotting and seems to work well.

My concern is that there is no guard against collision when this is done - two
different strings or longs could produce the same row key. Although this is
very unlikely, it is bothersome to consider this possibility for large
systems.
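
To put a rough number on "very unlikely", here is a small Python sketch of
the usual birthday-bound approximation. It assumes MD5 output behaves like a
uniformly random 128-bit value and that nobody is deliberately crafting
colliding inputs (MD5 is known to be weak against that); the function name is
just made up for this example.

```python
def md5_collision_probability(n_keys: int) -> float:
    """Approximate chance that at least two of n_keys distinct ids share an MD5.

    Birthday-bound approximation p ~= n * (n - 1) / (2 * 2**128), which is
    only meaningful while the result is much smaller than 1.
    """
    space = 2 ** 128  # number of possible 128-bit MD5 digests
    return n_keys * (n_keys - 1) / (2 * space)


if __name__ == "__main__":
    # Print the estimate for a few table sizes.
    for n in (10 ** 6, 10 ** 9, 10 ** 12):
        print(f"{n:>16,d} keys -> p ~= {md5_collision_probability(n):.3e}")
```

Under those assumptions, even a trillion rows gives an estimate on the order
of 1e-15, so an accidental collision is far rarer than most other failure
modes, although it is still not zero.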

So what I usually do is concatenate the MD5 with the original identifier...

MD5(id) + id

which ensures that the row key is both evenly distributed (thanks to the MD5
prefix) and unique (thanks to the appended id).
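
As a minimal Python sketch of that key layout (the function names are
hypothetical, and the id is assumed to be a string encoded as UTF-8; a long
would need a fixed-width byte encoding instead):

```python
import hashlib

MD5_LEN = 16  # an MD5 digest is always 16 bytes


def make_row_key(identifier: str) -> bytes:
    """Build the row key as MD5(id) + id, both as raw bytes."""
    raw = identifier.encode("utf-8")
    return hashlib.md5(raw).digest() + raw


def original_id(row_key: bytes) -> str:
    """Recover the id by dropping the fixed-length MD5 prefix."""
    return row_key[MD5_LEN:].decode("utf-8")


if __name__ == "__main__":
    key = make_row_key("user-42")
    print(key.hex())
    print(original_id(key))
```

Because the id itself is part of the key, two different ids can never produce
the same row key even if their MD5 values happened to collide, and the id can
be read back from the key. The cost is a longer key: 16 bytes plus the length
of the id.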

Is this necessary, or is it the common practice to just use the MD5
checksum itself?

Thanks,

Jon
