hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1385) MD5Hash has a bad hash function
Date Fri, 18 May 2007 22:27:16 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12497062 ]

Owen O'Malley commented on HADOOP-1385:
---------------------------------------

The extra function gives an explicit name to what I was doing for the hash code. Clearly there
are lots of things that you could use as a hash code, so if someone just wants the first 4
bytes of the digest they can call the new function rather than hash code.

The loop is easier to read and less likely to get wrong, and a decent optimizer could unroll
the loop for you. Granted, I expect that Java does not do this, but I don't expect this function
to be a performance bottleneck.
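
To illustrate the trade-off discussed above, here is a minimal sketch of a "first 4 bytes" function written both as a loop and unrolled by hand. It is not the content of 1385.patch: the class name, the method names, and the big-endian byte order are assumptions made for this example.

```java
// Hypothetical sketch; names and byte order are assumptions, not the actual patch.
final class QuarterDigestSketch {

    /** Loop form: packs the first 4 bytes of the digest into an int, high byte first. */
    static int firstFourBytesLoop(byte[] digest) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            // Mask to 0..255 before shifting so a negative byte is not sign-extended.
            value |= (digest[i] & 0xff) << (8 * (3 - i));
        }
        return value;
    }

    /** Hand-unrolled form: same result, but four places to get a shift or mask wrong. */
    static int firstFourBytesUnrolled(byte[] digest) {
        return ((digest[0] & 0xff) << 24)
             | ((digest[1] & 0xff) << 16)
             | ((digest[2] & 0xff) << 8)
             |  (digest[3] & 0xff);
    }

    public static void main(String[] args) {
        byte[] digest = {(byte) 0xd4, (byte) 0x1d, (byte) 0x8c, (byte) 0xd9};
        System.out.println(Integer.toHexString(firstFourBytesLoop(digest)));     // d41d8cd9
        System.out.println(Integer.toHexString(firstFourBytesUnrolled(digest))); // d41d8cd9
    }
}
```

Both forms return the same value; the loop keeps the mask and the shift in a single expression, which is the readability argument made above.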

> MD5Hash has a bad hash function
> -------------------------------
>
>                 Key: HADOOP-1385
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1385
>             Project: Hadoop
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.12.3
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>             Fix For: 0.13.0
>
>         Attachments: 1385.patch
>
>
> The MD5Hash class has a really bad hash function that will cause most md5s to hash
> to 0xFFFFFFxx, leaving only the low-order byte as meaningful. The problem comes from the
> automatic sign extension when promoting from byte to int.
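
The sign-extension effect described in the issue can be reproduced with a small sketch. This is not the original MD5Hash code: the class name, the method names, and the way the bytes are combined are assumptions chosen only to show how an unmasked byte turns the upper bits of the result into 0xFFFFFF.

```java
// Hypothetical demo of byte-to-int sign extension; not the original MD5Hash code.
final class SignExtensionDemo {

    /** Combines four bytes without masking: any negative byte is sign-extended. */
    static int combineUnmasked(byte[] b) {
        return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
    }

    /** Combines four bytes after masking each one to the range 0..255. */
    static int combineMasked(byte[] b) {
        return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16)
             | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
    }

    public static void main(String[] args) {
        // The last byte is 0x9a, which is negative as a Java byte (-102).
        byte[] b = {0x12, 0x34, 0x56, (byte) 0x9a};
        // Promoting b[3] to int yields 0xFFFFFF9A, and the OR floods the upper 24 bits.
        System.out.println(Integer.toHexString(combineUnmasked(b))); // ffffff9a
        System.out.println(Integer.toHexString(combineMasked(b)));   // 1234569a
    }
}
```

Since a uniformly distributed byte has its high bit set about half the time, an unmasked combination like this loses the upper bits for a large share of inputs, which is consistent with the behaviour reported above.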

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

