harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jimmy,Jing Lv" <firep...@gmail.com>
Subject Re: Performance improvement of java.util.HashMap (Was Re: [jira] Commented: (HARMONY-4064) [classlib][luni] Performance improvement of java.util.HashMap)
Date Fri, 29 Jun 2007 13:57:32 GMT
Properly our function is not good enough, perhaps we may find some
operater other than '|' and '^', e.g, '+', '-' or something? Though I
guess '|' and '^' are the fastest.

2007/6/29, Yuri Dolgov <dolgov.g.yuri@gmail.com>:
> Hi Jimmy,
>
> I've done couple of experiments with different hash functions (including all
> the functions proposed in this mail thread), but none of them worked
> satisfactorily for hashmap with small number of elemts (32-64) when I've
> changed only  upper bits. Do you have any progress so far?
>
>
> Thanks,
> Yuri
>
> On 6/28/07, Jimmy,Jing Lv <firepure@gmail.com> wrote:
> >
> > 2007/6/28, Tim Ellison <t.p.ellison@gmail.com>:
> > > Jimmy,Jing Lv wrote:
> > > > 2007/6/28, Tim Ellison <t.p.ellison@gmail.com>:
> > > >> Jimmy,Jing Lv wrote:
> > > >> > One thought is right-shift the hashcode and combine bits together
> > (by
> > > >> > AND or OR or Exclusive Or operation) into low order(e.g, (a &
(a >>
> > > >> > 8) ^ (a >> 16) | (a >> 24)) & (bucketSize-1),
a small test on it
> > > >> > shows that it does distributed more randomly, but it may be better
> > > >> > algorithms.
> > > >>
> > > >> Why so complex bit shifting?  How did you settle on this?
> > > >>
> > > >> If you are optimizing for the identity hashcode case then you can
> > > >> probably ignore the bottom 'y' bits due to object alignment (and top
> > 'z'
> > > >> bits for address range limits), so just do (a >> 8) & (bucketSize
-
> > 1)
> > > >> assuming the objects are 8-bit aligned.
> > > >
> > > > It sounds interesting and reasonable :)
> > > > But I'm afraid not all hashcode() returns the object's memory
> > > > address,
> > >
> > > Understood.  The hashCode() can return any int value, so in principle
> > > you have a random set of 32 bits an no bit manipulation is required.
> > >
> > > However, in practice I think we can optimize for typical hashcode
> > > values.  The debate is what is 'typical'?  What is the most common type
> > > of HashMap key? (then we can look at it's hashCode implementation).
> > >
> >
> > Ah yes I hope someone may know what is 'typical' or common hashcode? :)
> > But i'm afraid everyone who program in java may define his own
> > hashcode. After all, the basic principle of hashcode is to make it as
> > dispersive as possible, that's why I think we may use all 32 bits to
> > find the bucket.
> >
> > > > in some other cases, the difference of hashcodes may remain
> > > > in buttom 'y' bits or top 'z' bits, at that time our HashMap will
> > > > perform very slow.
> > >
> > > You are right, I should not have thrown away those top bits.  So how
> > > about a *rotate* right by eight bits operation?
> > >
> > > > So the best way is to use all 32 bit of a hashcode, and take extra
> > > > operator '|' and '^' does not cost much time.( though I'm not sure if
> > > > the the right-shift paramter '8' is proper)
> > >
> > > The operation you suggested was
> > >   a & (a >> 8) ^ (a >> 16) | (a >> 24)
> > >
> > > can you please explain how you came up with this?
> > >
> >
> > If we make it small, like 4, we have to spend much more time like:
> > a & (a >> 4) ^ (a >> 8) | (a >> 12)^(a >> 16) ^ (a >>
20) | (a >> 24)...
> > and if make it bigger, we may still lose the information after the
> > operation &(2^n-1) if the parameter is a bit bigger than n, so I'm try
> > to find a balance here, though the parameter '8' may be not best.
> > As you know, the basic bucket of HashMap is 16( that is 2^4), and I
> > have no idea what is the most commen size of the hashMap bucket? Or we
> > may still use the size of the bucket as the parameter? As a result:
> > a & (a >> n) ^ (a >> n*2) | (a >> n*3), here bucket size is
2^n
> >
> > I have no idea what is better, I'll try to benckmark and find some
> > hash theory if any. Thanks to Yuri, I take a glance of two papers he
> > offered, though one is copyright-ed, the other offered many classic
> > hash algorithm, I may study and find out a best way by myself (of
> > cource, avoid copying the code :) )
> >
> > > Regards,
> > > Tim
> > >
> >
> >
> > --
> >
> > Best Regards!
> >
> > Jimmy, Jing Lv
> > China Software Development Lab, IBM
> >
>


-- 

Best Regards!

Jimmy, Jing Lv
China Software Development Lab, IBM

Mime
View raw message