harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jimmy,Jing Lv" <firep...@gmail.com>
Subject Re: Performance improvement of java.util.HashMap (Was Re: [jira] Commented: (HARMONY-4064) [classlib][luni] Performance improvement of java.util.HashMap)
Date Thu, 28 Jun 2007 03:07:39 GMT
Thanks so much Yuri !
But I'm a little concern before I click on these pages, I'm not sure
that we can touch that code/algorithm for Harmony development, is that
copyright-ed or patent-ed?

2007/6/27, Yuri Dolgov <dolgov.g.yuri@gmail.com>:
> Hi Jimmy,
> Here are couple of interesting links on hash functions implementation:
> http://burtleburtle.net/bob/hash/doobs.html
> http://www.azillionmonkeys.com/qed/hash.html
>
> In my point of view usage of shifts should give some benefit as in
> performance so in a distribution quality.
>
> Thanks,
> Yuri
>
>
>
> On 6/27/07, Jimmy,Jing Lv <firepure@gmail.com> wrote:
> >
> > Hi,
> >
> >     As currently focus on performance, I find this JIRA(4064) and do
> > some investigation on it.
> >
> >     The Performance improvement of java.util.HashMap is about two
> > aspects: one is hash algorithm that make hashcode randomly distributed
> > enough, just as Tim's fix/comments in Harmony-4064, the other is about
> > to make hash algorithm itself perform fast enough. HashMap is so
> > widely used so that both two aspects may be very important.
> >
> >     I strongly agree with Tim that hash algorithm that make hashcode
> > randomly distributed is very important, for HashMap.put(), get(), etc,
> > depends on it and it should perform better if the hash algorithm can
> > make it randomly distributed. On the other hand, the hash algorithm
> > should perform very fast as every HashMap.put and get method call this
> > hash algorithm, if it is slow itself, then its benfits on randomly
> > distribution became meaningless.
> >
> >     I do a lot of benchmark and micro-benchmarks,  it shows that (a &
> > b) is much faster than (a % b). And if b is Math.pow(2,n)-1, the two
> > operations just get the same result. That's why if we choose
> > "&(Math.pow(2,n)-1)" (here Math.pow(2,n) is the size of bucket of a
> > hashmap)to replace "%", by this way we improve performance greatly at
> > least 2% ~ 3% in some benchmarks . This shows if we avoid slow
> > operation like mod(%), multiply(*), division(/) and so forth, we can
> > get performance improvement.
> >
> >     Yes, I do think to randomly distrubute the hashcode is very
> > important and I suggest we can use other machnisms other than mod
> > prime to make it happen. One thought is right-shift the hashcode and
> > combine bits together (by AND or OR or Exclusive Or operation) into
> > low order(e.g, (a & (a >> 8) ^ (a >> 16) | (a >> 24)) &
> > (bucketSize-1), a small test on it shows that it does distributed more
> > randomly, but it may be better algorithms. In this way the hashcode
> > may be more random destributed and performs fast. And in this way we
> > may keep a balance of a both fast and good hash algorithm.
> >
> > Any comments/suggestions? Thanks!
> >
> >
> > 2007/6/6, Tim Ellison (JIRA) <jira@apache.org>:
> > >
> > >     [
> > https://issues.apache.org/jira/browse/HARMONY-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501905]
> > >
> > > Tim Ellison commented on HARMONY-4064:
> > > --------------------------------------
> > >
> > > Robert.  You patch undoes the change I made to make the length of the
> > elements array a prime number, so that it is now always a power of two.  The
> > problem is that identity hash codes are not randomly distributed, they
> > typically reflect object address alignments which will be factors of two, so
> > we don't get a good distribution across all the possible buckets.
> > >
> > > You can see the effect in this simple 'simulation':
> > >
> > > public class HashingTest {
> > >
> > >     public static void main(String[] args) {
> > >         int numBuckets = 16;
> > >
> > >         int[] distrib = new int[numBuckets];
> > >         for (int i = 0; i < distrib.length; i++) {
> > >             distrib[i] = 0;
> > >         }
> > >
> > >         for (int i = 0; i < 1000; i++) {
> > >             int hashCode = new Object().hashCode();
> > >             int bucket = hashCode & (numBuckets - 1);
> > >             distrib[bucket] = distrib[bucket] + 1;
> > >         }
> > >
> > >         System.out.println("Bucket\tList_length");
> > >         for (int i = 0; i < distrib.length; i++) {
> > >             System.out.println(i + "\t" + distrib[i]);
> > >         }
> > >     }
> > > }
> > >
> > > I suggest that we have to ensure the elements array length is chosen
> > more carefully, or that we modify the hashCodes to avoid these collisions.
> > >
> > > > [classlib][luni] Performance improvement of java.util.HashMap
> > > > -------------------------------------------------------------
> > > >
> > > >                 Key: HARMONY-4064
> > > >                 URL:
> > https://issues.apache.org/jira/browse/HARMONY-4064
> > > >             Project: Harmony
> > > >          Issue Type: Improvement
> > > >          Components: Classlib
> > > >            Reporter: Robert Hu
> > > >            Assignee: Tim Ellison
> > > >         Attachments: HARMONY-4064.diff
> > > >
> > > >
> > > > The performance of HashMap can be improved, the hash method is
> > improved.
> > > > By running the following test code, our HashMap can be improved from
> > 110% time cost (compared by RI) to 90% time cost (compared by RI).
> > > > public class TestHashMap {
> > > >     public static void main(String[] args) {
> > > >         Random ran = new Random();
> > > >         HashMap map = new HashMap();
> > > >         int elementNum = 500;
> > > >         int times = 10000;
> > > >         int[] rans = new int[elementNum];
> > > >         for (int i = 0; i < elementNum; i++)
> > > >             rans[i] = ran.nextInt(Integer.MAX_VALUE);
> > > >         long start = System.currentTimeMillis();
> > > >         for (int i = 0; i < elementNum; i++)
> > > >             map.put(rans[i], "b");
> > > >         System.out.println(System.currentTimeMillis() - start);
> > > >         start = System.currentTimeMillis();
> > > >         for (int i = 0; i < elementNum; i++)
> > > >             for(int j = 0; j< times; j++){
> > > >                 map.get(rans[i]);
> > > >             }
> > > >         System.out.println(System.currentTimeMillis() - start);
> > > >     }
> > > > }
> > >
> > > --
> > > This message is automatically generated by JIRA.
> > > -
> > > You can reply to this email to add a comment to the issue online.
> > >
> > >
> >
> >
> > --
> >
> > Best Regards!
> >
> > Jimmy, Jing Lv
> > China Software Development Lab, IBM
> >
>


-- 

Best Regards!

Jimmy, Jing Lv
China Software Development Lab, IBM

Mime
View raw message