harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jimmy,Jing Lv" <firep...@gmail.com>
Subject Re: Performance improvement of java.util.HashMap (Was Re: [jira] Commented: (HARMONY-4064) [classlib][luni] Performance improvement of java.util.HashMap)
Date Fri, 29 Jun 2007 14:13:45 GMT
2007/6/29, Sergey Kuksenko <sergey.kuksenko@gmail.com>:
> Hi All,
>
> I wish to include myself into discussion and express some my thought in this
> area.
> I am absolutely agree with the idea of using bitwise operations & (or power
> of two remainder) instead of  % (arbirary remainder) because of power of two
> remainder is really faster and was proved by my tests and etc...
>
> First of all I wish to note that if HashMap works with user-defined HashCode
> it doesn't matter which reminder is used: power fo two (&) or prime(or
> arbitrary) number reminder (%). In case if user defined HashCode we can't
> predict hash and the hash can be suitable as for & and for %.
> But prime number reminder has one disadavntages (except speed of operation).
> When HashMap uses prime number (or some irreglar) we get negative effect of
> unused part of memory page. Size of memory pages is power of two and it is
> better to fill it completely. Espesially it is visible on large hash maps.
> Unfortunately I can privide any numbers at the moment.
> Thus in my opinion power of two reminder gives boost in time of operation
> plus benefits from more natural memory usage (layouts etc.)
>
> As to default hashCode which is IdentityHashCode.
> There are two ways here:
> 1. Add new hashing function (additional hashing) inside HashMap. As was
> suggested here.
> 2. Change IdentityHashCode implementation. There are no rules that
> IdentityHashCode should return exactly address of object. And last zero bits
> from IdentityHashCode (which caused by object alignment) can be removed
> inside IdentityHashCode. Other words - to add additional hashing inside
> IdentityHashCode.
>
> The first solution has one advantage - it can be done only in classlib. The
> second variant should be implemented in all VMs which are based on Harmony's
> classlib. But this way has other advantages -  Well-formed IdentityHashCode
> will helps not only for HashMap, but for Hashtable, IdentityHashMap and
> moreover for all hashing implemented by users.

Ah, I never think of it, so you mean IdentityHashCode can be
implemented in vm? Doesn't vm return directly the address as the
IdentityHashCode?(it is easiest and fastest way)

> In the second variant we don't add rehashing function into HashMap. It is
> common practice when advanced Java developers tune their HashCode functions.
> If we add rehashing function into HashMap we may break their tuning, without
> knowledge of internal rehashing functions it is impossible to tune HashCode
> function.
> From my opinion we should think about the last argument against adding
> rehashing into HashMap and move rehashing into VM.
>

Yes, the improved identityHashCode may help, but usually users also
define their own hashcode, then the identityHashCode may not help,
while rehash in classlib still works, that's why I guess improvement
in classlib is better :)

>
>
>
> On 6/27/07, Jimmy,Jing Lv <firepure@gmail.com> wrote:
> >
> > Hi,
> >
> >    As currently focus on performance, I find this JIRA(4064) and do
> > some investigation on it.
> >
> >    The Performance improvement of java.util.HashMap is about two
> > aspects: one is hash algorithm that make hashcode randomly distributed
> > enough, just as Tim's fix/comments in Harmony-4064, the other is about
> > to make hash algorithm itself perform fast enough. HashMap is so
> > widely used so that both two aspects may be very important.
> >
> >    I strongly agree with Tim that hash algorithm that make hashcode
> > randomly distributed is very important, for HashMap.put(), get(), etc,
> > depends on it and it should perform better if the hash algorithm can
> > make it randomly distributed. On the other hand, the hash algorithm
> > should perform very fast as every HashMap.put and get method call this
> > hash algorithm, if it is slow itself, then its benfits on randomly
> > distribution became meaningless.
> >
> >    I do a lot of benchmark and micro-benchmarks,  it shows that (a &
> > b) is much faster than (a % b). And if b is Math.pow(2,n)-1, the two
> > operations just get the same result. That's why if we choose
> > "&(Math.pow(2,n)-1)" (here Math.pow(2,n) is the size of bucket of a
> > hashmap)to replace "%", by this way we improve performance greatly at
> > least 2% ~ 3% in some benchmarks . This shows if we avoid slow
> > operation like mod(%), multiply(*), division(/) and so forth, we can
> > get performance improvement.
> >
> >    Yes, I do think to randomly distrubute the hashcode is very
> > important and I suggest we can use other machnisms other than mod
> > prime to make it happen. One thought is right-shift the hashcode and
> > combine bits together (by AND or OR or Exclusive Or operation) into
> > low order( e.g, (a & (a >> 8) ^ (a >> 16) | (a >> 24)) &
> > (bucketSize-1), a small test on it shows that it does distributed more
> > randomly, but it may be better algorithms. In this way the hashcode
> > may be more random destributed and performs fast. And in this way we
> > may keep a balance of a both fast and good hash algorithm.
> >
> > Any comments/suggestions? Thanks!
> >
> >
> > 2007/6/6, Tim Ellison (JIRA) <jira@apache.org>:
> > >
> > >     [ https://issues.apache.org/jira/browse/HARMONY-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501905
> > ]
> > >
> > > Tim Ellison commented on HARMONY-4064:
> > > --------------------------------------
> > >
> > > Robert.  You patch undoes the change I made to make the length of the
> > elements array a prime number, so that it is now always a power of two.  The
> > problem is that identity hash codes are not randomly distributed, they
> > typically reflect object address alignments which will be factors of two, so
> > we don't get a good distribution across all the possible buckets.
> > >
> > > You can see the effect in this simple 'simulation':
> > >
> > > public class HashingTest {
> > >
> > >     public static void main(String[] args) {
> > >         int numBuckets = 16;
> > >
> > >         int[] distrib = new int[numBuckets];
> > >         for (int i = 0; i < distrib.length; i++) {
> > >             distrib[i] = 0;
> > >         }
> > >
> > >         for (int i = 0; i < 1000; i++) {
> > >             int hashCode = new Object().hashCode();
> > >             int bucket = hashCode & (numBuckets - 1);
> > >             distrib[bucket] = distrib[bucket] + 1;
> > >         }
> > >
> > >         System.out.println("Bucket\tList_length");
> > >         for (int i = 0; i < distrib.length; i++) {
> > >             System.out.println(i + "\t" + distrib[i]);
> > >         }
> > >     }
> > > }
> > >
> > > I suggest that we have to ensure the elements array length is chosen
> > more carefully, or that we modify the hashCodes to avoid these collisions.
> > >
> > > > [classlib][luni] Performance improvement of java.util.HashMap
> > > > -------------------------------------------------------------
> > > >
> > > >                 Key: HARMONY-4064
> > > >                 URL:
> > https://issues.apache.org/jira/browse/HARMONY-4064
> > > >             Project: Harmony
> > > >          Issue Type: Improvement
> > > >          Components: Classlib
> > > >            Reporter: Robert Hu
> > > >            Assignee: Tim Ellison
> > > >         Attachments: HARMONY-4064.diff
> > > >
> > > >
> > > > The performance of HashMap can be improved, the hash method is
> > improved.
> > > > By running the following test code, our HashMap can be improved from
> > 110% time cost (compared by RI) to 90% time cost (compared by RI).
> > > > public class TestHashMap {
> > > >     public static void main(String[] args) {
> > > >         Random ran = new Random();
> > > >         HashMap map = new HashMap();
> > > >         int elementNum = 500;
> > > >         int times = 10000;
> > > >         int[] rans = new int[elementNum];
> > > >         for (int i = 0; i < elementNum; i++)
> > > >             rans[i] = ran.nextInt(Integer.MAX_VALUE );
> > > >         long start = System.currentTimeMillis();
> > > >         for (int i = 0; i < elementNum; i++)
> > > >             map.put(rans[i], "b");
> > > >         System.out.println (System.currentTimeMillis() - start);
> > > >         start = System.currentTimeMillis();
> > > >         for (int i = 0; i < elementNum; i++)
> > > >             for(int j = 0; j< times; j++){
> > > >                 map.get(rans[i]);
> > > >             }
> > > >         System.out.println(System.currentTimeMillis() - start);
> > > >     }
> > > > }
> > >
> > > --
> > > This message is automatically generated by JIRA.
> > > -
> > > You can reply to this email to add a comment to the issue online.
> > >
> > >
> >
> >
> > --
> >
> > Best Regards!
> >
> > Jimmy, Jing Lv
> > China Software Development Lab, IBM
> >
>
>
>
> --
> Best regards,
> ---
> Sergey Kuksenko.
> Intel Enterprise Solutions Software Division.
>


-- 

Best Regards!

Jimmy, Jing Lv
China Software Development Lab, IBM

Mime
View raw message