harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sergey Kuksenko" <sergey.kukse...@gmail.com>
Subject Re: Performance improvement of java.util.HashMap (Was Re: [jira] Commented: (HARMONY-4064) [classlib][luni] Performance improvement of java.util.HashMap)
Date Fri, 29 Jun 2007 13:34:42 GMT
Hi All,

I wish to include myself into discussion and express some my thought in this
area.
I am absolutely agree with the idea of using bitwise operations & (or power
of two remainder) instead of  % (arbirary remainder) because of power of two
remainder is really faster and was proved by my tests and etc...

First of all I wish to note that if HashMap works with user-defined HashCode
it doesn't matter which reminder is used: power fo two (&) or prime(or
arbitrary) number reminder (%). In case if user defined HashCode we can't
predict hash and the hash can be suitable as for & and for %.
But prime number reminder has one disadavntages (except speed of operation).
When HashMap uses prime number (or some irreglar) we get negative effect of
unused part of memory page. Size of memory pages is power of two and it is
better to fill it completely. Espesially it is visible on large hash maps.
Unfortunately I can privide any numbers at the moment.
Thus in my opinion power of two reminder gives boost in time of operation
plus benefits from more natural memory usage (layouts etc.)

As to default hashCode which is IdentityHashCode.
There are two ways here:
1. Add new hashing function (additional hashing) inside HashMap. As was
suggested here.
2. Change IdentityHashCode implementation. There are no rules that
IdentityHashCode should return exactly address of object. And last zero bits
from IdentityHashCode (which caused by object alignment) can be removed
inside IdentityHashCode. Other words - to add additional hashing inside
IdentityHashCode.

The first solution has one advantage - it can be done only in classlib. The
second variant should be implemented in all VMs which are based on Harmony's
classlib. But this way has other advantages -  Well-formed IdentityHashCode
will helps not only for HashMap, but for Hashtable, IdentityHashMap and
moreover for all hashing implemented by users.
In the second variant we don't add rehashing function into HashMap. It is
common practice when advanced Java developers tune their HashCode functions.
If we add rehashing function into HashMap we may break their tuning, without
knowledge of internal rehashing functions it is impossible to tune HashCode
function.
>From my opinion we should think about the last argument against adding
rehashing into HashMap and move rehashing into VM.




On 6/27/07, Jimmy,Jing Lv <firepure@gmail.com> wrote:
>
> Hi,
>
>    As currently focus on performance, I find this JIRA(4064) and do
> some investigation on it.
>
>    The Performance improvement of java.util.HashMap is about two
> aspects: one is hash algorithm that make hashcode randomly distributed
> enough, just as Tim's fix/comments in Harmony-4064, the other is about
> to make hash algorithm itself perform fast enough. HashMap is so
> widely used so that both two aspects may be very important.
>
>    I strongly agree with Tim that hash algorithm that make hashcode
> randomly distributed is very important, for HashMap.put(), get(), etc,
> depends on it and it should perform better if the hash algorithm can
> make it randomly distributed. On the other hand, the hash algorithm
> should perform very fast as every HashMap.put and get method call this
> hash algorithm, if it is slow itself, then its benfits on randomly
> distribution became meaningless.
>
>    I do a lot of benchmark and micro-benchmarks,  it shows that (a &
> b) is much faster than (a % b). And if b is Math.pow(2,n)-1, the two
> operations just get the same result. That's why if we choose
> "&(Math.pow(2,n)-1)" (here Math.pow(2,n) is the size of bucket of a
> hashmap)to replace "%", by this way we improve performance greatly at
> least 2% ~ 3% in some benchmarks . This shows if we avoid slow
> operation like mod(%), multiply(*), division(/) and so forth, we can
> get performance improvement.
>
>    Yes, I do think to randomly distrubute the hashcode is very
> important and I suggest we can use other machnisms other than mod
> prime to make it happen. One thought is right-shift the hashcode and
> combine bits together (by AND or OR or Exclusive Or operation) into
> low order( e.g, (a & (a >> 8) ^ (a >> 16) | (a >> 24)) &
> (bucketSize-1), a small test on it shows that it does distributed more
> randomly, but it may be better algorithms. In this way the hashcode
> may be more random destributed and performs fast. And in this way we
> may keep a balance of a both fast and good hash algorithm.
>
> Any comments/suggestions? Thanks!
>
>
> 2007/6/6, Tim Ellison (JIRA) <jira@apache.org>:
> >
> >     [ https://issues.apache.org/jira/browse/HARMONY-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501905
> ]
> >
> > Tim Ellison commented on HARMONY-4064:
> > --------------------------------------
> >
> > Robert.  You patch undoes the change I made to make the length of the
> elements array a prime number, so that it is now always a power of two.  The
> problem is that identity hash codes are not randomly distributed, they
> typically reflect object address alignments which will be factors of two, so
> we don't get a good distribution across all the possible buckets.
> >
> > You can see the effect in this simple 'simulation':
> >
> > public class HashingTest {
> >
> >     public static void main(String[] args) {
> >         int numBuckets = 16;
> >
> >         int[] distrib = new int[numBuckets];
> >         for (int i = 0; i < distrib.length; i++) {
> >             distrib[i] = 0;
> >         }
> >
> >         for (int i = 0; i < 1000; i++) {
> >             int hashCode = new Object().hashCode();
> >             int bucket = hashCode & (numBuckets - 1);
> >             distrib[bucket] = distrib[bucket] + 1;
> >         }
> >
> >         System.out.println("Bucket\tList_length");
> >         for (int i = 0; i < distrib.length; i++) {
> >             System.out.println(i + "\t" + distrib[i]);
> >         }
> >     }
> > }
> >
> > I suggest that we have to ensure the elements array length is chosen
> more carefully, or that we modify the hashCodes to avoid these collisions.
> >
> > > [classlib][luni] Performance improvement of java.util.HashMap
> > > -------------------------------------------------------------
> > >
> > >                 Key: HARMONY-4064
> > >                 URL:
> https://issues.apache.org/jira/browse/HARMONY-4064
> > >             Project: Harmony
> > >          Issue Type: Improvement
> > >          Components: Classlib
> > >            Reporter: Robert Hu
> > >            Assignee: Tim Ellison
> > >         Attachments: HARMONY-4064.diff
> > >
> > >
> > > The performance of HashMap can be improved, the hash method is
> improved.
> > > By running the following test code, our HashMap can be improved from
> 110% time cost (compared by RI) to 90% time cost (compared by RI).
> > > public class TestHashMap {
> > >     public static void main(String[] args) {
> > >         Random ran = new Random();
> > >         HashMap map = new HashMap();
> > >         int elementNum = 500;
> > >         int times = 10000;
> > >         int[] rans = new int[elementNum];
> > >         for (int i = 0; i < elementNum; i++)
> > >             rans[i] = ran.nextInt(Integer.MAX_VALUE );
> > >         long start = System.currentTimeMillis();
> > >         for (int i = 0; i < elementNum; i++)
> > >             map.put(rans[i], "b");
> > >         System.out.println (System.currentTimeMillis() - start);
> > >         start = System.currentTimeMillis();
> > >         for (int i = 0; i < elementNum; i++)
> > >             for(int j = 0; j< times; j++){
> > >                 map.get(rans[i]);
> > >             }
> > >         System.out.println(System.currentTimeMillis() - start);
> > >     }
> > > }
> >
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
> >
> >
>
>
> --
>
> Best Regards!
>
> Jimmy, Jing Lv
> China Software Development Lab, IBM
>



-- 
Best regards,
---
Sergey Kuksenko.
Intel Enterprise Solutions Software Division.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message