hadoop-common-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: [jira] Commented: (HADOOP-80) binary key
Date Wed, 15 Mar 2006 08:13:32 GMT
Owen O'Malley (JIRA) wrote:
>> 2. Why bother to use md5 for hashCode()?  That could be expensive.  Why not
>> implement this like java.util.Arrays.hashCode() and UTF8.hashCode():
> Yeah, I considered doing something lighter than md5, but using md5 prevents
> pathological cases from doing bad things. We also use md5 a lot around here,
> so it is a really useful default for us, but it might make sense to have a
> lighter hash alternative. However, since in map/reduce the hash function is
> only used for partitioning the map output, it seemed better to use a known
> good hash function than taking a chance on a fast but sloppy hash function.

You can find an FNV hash implementation here: http://www.getopt.org 
(Apache license). Computationally it's similar in complexity to the 
above hashing schemes, but gives much better distribution. Perhaps worth 
a try.
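
For illustration, here is a minimal sketch of what such a hash could look like over a binary key. It assumes the 32-bit FNV-1a variant with the standard published offset basis and prime; it is not the code from the site above, and the class name FnvHashSketch and the helper partitionFor are hypothetical names invented for this example.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of a 32-bit FNV-1a hash over raw bytes (hypothetical class name). */
public class FnvHashSketch {
    // Standard 32-bit FNV parameters.
    private static final int FNV_OFFSET_BASIS_32 = 0x811c9dc5;
    private static final int FNV_PRIME_32 = 0x01000193;

    /** Hashes data[offset .. offset+length) with FNV-1a: XOR the byte, then multiply. */
    public static int fnv1a32(byte[] data, int offset, int length) {
        int hash = FNV_OFFSET_BASIS_32;
        for (int i = offset; i < offset + length; i++) {
            hash ^= (data[i] & 0xff);   // treat the byte as unsigned
            hash *= FNV_PRIME_32;       // int overflow wraps mod 2^32, as FNV requires
        }
        return hash;
    }

    public static int fnv1a32(byte[] data) {
        return fnv1a32(data, 0, data.length);
    }

    /**
     * Assumed partitioning rule for illustration: clear the sign bit,
     * then take the remainder by the number of partitions.
     */
    public static int partitionFor(byte[] key, int numPartitions) {
        return (fnv1a32(key) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "foobar".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("hash=0x%08x partition=%d%n",
                fnv1a32(key), partitionFor(key, 16));
    }
}
```

The per-byte cost is one XOR and one multiply, which is comparable to the multiply-and-add loop in java.util.Arrays.hashCode(). Whether the distribution is good enough for partitioning real keys would still need to be measured on representative data.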

Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
