Hi,
As currently focus on performance, I find this JIRA(4064) and do
some investigation on it.
The Performance improvement of java.util.HashMap is about two
aspects: one is hash algorithm that make hashcode randomly distributed
enough, just as Tim's fix/comments in Harmony4064, the other is about
to make hash algorithm itself perform fast enough. HashMap is so
widely used so that both two aspects may be very important.
I strongly agree with Tim that hash algorithm that make hashcode
randomly distributed is very important, for HashMap.put(), get(), etc,
depends on it and it should perform better if the hash algorithm can
make it randomly distributed. On the other hand, the hash algorithm
should perform very fast as every HashMap.put and get method call this
hash algorithm, if it is slow itself, then its benfits on randomly
distribution became meaningless.
I do a lot of benchmark and microbenchmarks, it shows that (a &
b) is much faster than (a % b). And if b is Math.pow(2,n)1, the two
operations just get the same result. That's why if we choose
"&(Math.pow(2,n)1)" (here Math.pow(2,n) is the size of bucket of a
hashmap)to replace "%", by this way we improve performance greatly at
least 2% ~ 3% in some benchmarks . This shows if we avoid slow
operation like mod(%), multiply(*), division(/) and so forth, we can
get performance improvement.
Yes, I do think to randomly distrubute the hashcode is very
important and I suggest we can use other machnisms other than mod
prime to make it happen. One thought is rightshift the hashcode and
combine bits together (by AND or OR or Exclusive Or operation) into
low order(e.g, (a & (a >> 8) ^ (a >> 16)  (a >> 24)) &
(bucketSize1), a small test on it shows that it does distributed more
randomly, but it may be better algorithms. In this way the hashcode
may be more random destributed and performs fast. And in this way we
may keep a balance of a both fast and good hash algorithm.
Any comments/suggestions? Thanks!
2007/6/6, Tim Ellison (JIRA) <jira@apache.org>:
>
> [ https://issues.apache.org/jira/browse/HARMONY4064?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel#action_12501905
]
>
> Tim Ellison commented on HARMONY4064:
> 
>
> Robert. You patch undoes the change I made to make the length of the elements array
a prime number, so that it is now always a power of two. The problem is that identity hash
codes are not randomly distributed, they typically reflect object address alignments which
will be factors of two, so we don't get a good distribution across all the possible buckets.
>
> You can see the effect in this simple 'simulation':
>
> public class HashingTest {
>
> public static void main(String[] args) {
> int numBuckets = 16;
>
> int[] distrib = new int[numBuckets];
> for (int i = 0; i < distrib.length; i++) {
> distrib[i] = 0;
> }
>
> for (int i = 0; i < 1000; i++) {
> int hashCode = new Object().hashCode();
> int bucket = hashCode & (numBuckets  1);
> distrib[bucket] = distrib[bucket] + 1;
> }
>
> System.out.println("Bucket\tList_length");
> for (int i = 0; i < distrib.length; i++) {
> System.out.println(i + "\t" + distrib[i]);
> }
> }
> }
>
> I suggest that we have to ensure the elements array length is chosen more carefully,
or that we modify the hashCodes to avoid these collisions.
>
> > [classlib][luni] Performance improvement of java.util.HashMap
> > 
> >
> > Key: HARMONY4064
> > URL: https://issues.apache.org/jira/browse/HARMONY4064
> > Project: Harmony
> > Issue Type: Improvement
> > Components: Classlib
> > Reporter: Robert Hu
> > Assignee: Tim Ellison
> > Attachments: HARMONY4064.diff
> >
> >
> > The performance of HashMap can be improved, the hash method is improved.
> > By running the following test code, our HashMap can be improved from 110% time cost
(compared by RI) to 90% time cost (compared by RI).
> > public class TestHashMap {
> > public static void main(String[] args) {
> > Random ran = new Random();
> > HashMap map = new HashMap();
> > int elementNum = 500;
> > int times = 10000;
> > int[] rans = new int[elementNum];
> > for (int i = 0; i < elementNum; i++)
> > rans[i] = ran.nextInt(Integer.MAX_VALUE);
> > long start = System.currentTimeMillis();
> > for (int i = 0; i < elementNum; i++)
> > map.put(rans[i], "b");
> > System.out.println(System.currentTimeMillis()  start);
> > start = System.currentTimeMillis();
> > for (int i = 0; i < elementNum; i++)
> > for(int j = 0; j< times; j++){
> > map.get(rans[i]);
> > }
> > System.out.println(System.currentTimeMillis()  start);
> > }
> > }
>
> 
> This message is automatically generated by JIRA.
> 
> You can reply to this email to add a comment to the issue online.
>
>

Best Regards!
Jimmy, Jing Lv
China Software Development Lab, IBM
