harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yuri Dolgov" <dolgov.g.y...@gmail.com>
Subject Re: Performance improvement of java.util.HashMap (Was Re: [jira] Commented: (HARMONY-4064) [classlib][luni] Performance improvement of java.util.HashMap)
Date Fri, 29 Jun 2007 12:32:13 GMT
Hi Jimmy,

I've done couple of experiments with different hash functions (including all
the functions proposed in this mail thread), but none of them worked
satisfactorily for hashmap with small number of elemts (32-64) when I've
changed only  upper bits. Do you have any progress so far?


Thanks,
Yuri

On 6/28/07, Jimmy,Jing Lv <firepure@gmail.com> wrote:
>
> 2007/6/28, Tim Ellison <t.p.ellison@gmail.com>:
> > Jimmy,Jing Lv wrote:
> > > 2007/6/28, Tim Ellison <t.p.ellison@gmail.com>:
> > >> Jimmy,Jing Lv wrote:
> > >> > One thought is right-shift the hashcode and combine bits together
> (by
> > >> > AND or OR or Exclusive Or operation) into low order(e.g, (a &
(a >>
> > >> > 8) ^ (a >> 16) | (a >> 24)) & (bucketSize-1), a small
test on it
> > >> > shows that it does distributed more randomly, but it may be better
> > >> > algorithms.
> > >>
> > >> Why so complex bit shifting?  How did you settle on this?
> > >>
> > >> If you are optimizing for the identity hashcode case then you can
> > >> probably ignore the bottom 'y' bits due to object alignment (and top
> 'z'
> > >> bits for address range limits), so just do (a >> 8) & (bucketSize
-
> 1)
> > >> assuming the objects are 8-bit aligned.
> > >
> > > It sounds interesting and reasonable :)
> > > But I'm afraid not all hashcode() returns the object's memory
> > > address,
> >
> > Understood.  The hashCode() can return any int value, so in principle
> > you have a random set of 32 bits an no bit manipulation is required.
> >
> > However, in practice I think we can optimize for typical hashcode
> > values.  The debate is what is 'typical'?  What is the most common type
> > of HashMap key? (then we can look at it's hashCode implementation).
> >
>
> Ah yes I hope someone may know what is 'typical' or common hashcode? :)
> But i'm afraid everyone who program in java may define his own
> hashcode. After all, the basic principle of hashcode is to make it as
> dispersive as possible, that's why I think we may use all 32 bits to
> find the bucket.
>
> > > in some other cases, the difference of hashcodes may remain
> > > in buttom 'y' bits or top 'z' bits, at that time our HashMap will
> > > perform very slow.
> >
> > You are right, I should not have thrown away those top bits.  So how
> > about a *rotate* right by eight bits operation?
> >
> > > So the best way is to use all 32 bit of a hashcode, and take extra
> > > operator '|' and '^' does not cost much time.( though I'm not sure if
> > > the the right-shift paramter '8' is proper)
> >
> > The operation you suggested was
> >   a & (a >> 8) ^ (a >> 16) | (a >> 24)
> >
> > can you please explain how you came up with this?
> >
>
> If we make it small, like 4, we have to spend much more time like:
> a & (a >> 4) ^ (a >> 8) | (a >> 12)^(a >> 16) ^ (a >>
20) | (a >> 24)...
> and if make it bigger, we may still lose the information after the
> operation &(2^n-1) if the parameter is a bit bigger than n, so I'm try
> to find a balance here, though the parameter '8' may be not best.
> As you know, the basic bucket of HashMap is 16( that is 2^4), and I
> have no idea what is the most commen size of the hashMap bucket? Or we
> may still use the size of the bucket as the parameter? As a result:
> a & (a >> n) ^ (a >> n*2) | (a >> n*3), here bucket size is 2^n
>
> I have no idea what is better, I'll try to benckmark and find some
> hash theory if any. Thanks to Yuri, I take a glance of two papers he
> offered, though one is copyright-ed, the other offered many classic
> hash algorithm, I may study and find out a best way by myself (of
> cource, avoid copying the code :) )
>
> > Regards,
> > Tim
> >
>
>
> --
>
> Best Regards!
>
> Jimmy, Jing Lv
> China Software Development Lab, IBM
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message