hadoop-common-dev mailing list archives

From "Shevek (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5727) Faster, simpler id.hashCode() which does not allocate memory
Date Tue, 28 Apr 2009 15:47:30 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703685#action_12703685 ]

Shevek commented on HADOOP-5727:

bq. This is a good point. It is not necessary to create an object. However, returning the
id directly might not lead to a good hash code. We may have to add a hash code implementation.

This discussion was rampant in the very early days of Java: consecutive values, including
Integers and Strings that differ only in their last character, have always had consecutive
hash codes. In practice, it does not really matter because:

* HashMap uses separate chaining (a list per bucket) rather than open addressing, so a run of
consecutive hashes followed by a hash collision does not cause a walk along a long run of
occupied slots. The chain walked because of such a collision will typically have length 2.
* HashMap applies a supplementary transformation to hash codes (a short sequence of shifts and
XORs that mixes the high bits into the low bits), so it is not necessary to ensure distribution
or uniformity of the basic hash codes. In fact, HashMap probably does better than the
application code would do.
* Other Map strategies, such as the red-black tree used by TreeMap, do not use hashCode() at all.
* In general, the issue of consecutive objects, especially numbers, generating consecutive
hash codes is accepted and understood by library authors who require a more uniform distribution
of hash codes, and they compensate for it at that point.
* The J2SE uses this strategy internally (Integer.hashCode() simply returns the int value), and
its authors spend a lot more time thinking about these problems than we do.
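
To make the point about consecutive hash codes concrete, here is a minimal sketch. It assumes a table whose length is a power of two and uses a bit-spreading step modelled on the one found in OpenJDK 8 and later (h ^ (h >>> 16)); the JDK current when this was written used a different shift-and-XOR mix, so treat the function as illustrative only. The class and method names are made up for this example.

```java
import java.util.HashSet;
import java.util.Set;

public class ConsecutiveHashDemo {
    // Illustrative bit-spreading step, modelled on OpenJDK 8+ HashMap (an assumption,
    // not the exact function of every JDK release).
    public static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    public static int bucketIndex(int hash, int tableLength) {
        return spread(hash) & (tableLength - 1);
    }

    // Number of distinct buckets hit by `count` consecutive ids starting at `start`.
    public static int distinctBuckets(int start, int count, int tableLength) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 0; i < count; i++) {
            buckets.add(bucketIndex(start + i, tableLength));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        // 16 consecutive ids in a 16-slot table: each id gets its own bucket.
        System.out.println(distinctBuckets(1000, 16, 16));
    }
}
```

In this sketch, consecutive ids are close to the best case for a chained table, since a run of them fills each bucket once before any bucket is reused.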

I therefore submit that this patch should be applied as-is.

> Faster, simpler id.hashCode() which does not allocate memory
> ------------------------------------------------------------
>                 Key: HADOOP-5727
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5727
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Shevek
>         Attachments: 00_id-noallocate.patch, 03_id-noallocate.patch
> Integer.valueOf allocates memory if the integer is not in the object-cache, which is
> the vast majority of cases for the task id. It is possible to compute the hash code of an
> integer without going via the integer cache, and hence avoid allocating memory.
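
As a rough illustration of the issue description above (not the attached patch itself), the sketch below compares a boxing hashCode() with a non-allocating one. SimpleId is a hypothetical stand-in for an id class and is not the actual Hadoop code. It relies only on the documented behaviour that Integer.hashCode() returns the primitive int value, and that Integer.valueOf caches at least the range -128 to 127.

```java
public class IdHashDemo {
    // Hypothetical id class used only for this illustration.
    public static final class SimpleId {
        private final int id;

        public SimpleId(int id) {
            this.id = id;
        }

        // Boxing variant: Integer.valueOf may allocate a new Integer when id
        // falls outside the cached range.
        public int boxedHash() {
            return Integer.valueOf(id).hashCode();
        }

        // Non-allocating variant: same value, no Integer object involved.
        @Override
        public int hashCode() {
            return id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof SimpleId && ((SimpleId) o).id == id;
        }
    }

    // True when both variants produce the same hash code for the given id.
    public static boolean sameHash(int id) {
        SimpleId s = new SimpleId(id);
        return s.boxedHash() == s.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(sameHash(42));
        System.out.println(sameHash(200904281));
    }
}
```

Both variants return the same value, so callers should see no change in behaviour; the only difference is that the second never creates an Integer.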

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
