harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jimmy,Jing Lv" <firep...@gmail.com>
Subject Re: Performance improvement of java.util.HashMap (Was Re: [jira] Commented: (HARMONY-4064) [classlib][luni] Performance improvement of java.util.HashMap)
Date Thu, 28 Jun 2007 09:35:02 GMT
2007/6/28, Tim Ellison <t.p.ellison@gmail.com>:
> Jimmy,Jing Lv wrote:
> > 2007/6/28, Tim Ellison <t.p.ellison@gmail.com>:
> >> Jimmy,Jing Lv wrote:
> >> > One thought is right-shift the hashcode and combine bits together (by
> >> > AND or OR or Exclusive Or operation) into low order(e.g, (a & (a >>
> >> > 8) ^ (a >> 16) | (a >> 24)) & (bucketSize-1), a small test
on it
> >> > shows that it does distributed more randomly, but it may be better
> >> > algorithms.
> >>
> >> Why so complex bit shifting?  How did you settle on this?
> >>
> >> If you are optimizing for the identity hashcode case then you can
> >> probably ignore the bottom 'y' bits due to object alignment (and top 'z'
> >> bits for address range limits), so just do (a >> 8) & (bucketSize
- 1)
> >> assuming the objects are 8-bit aligned.
> >
> > It sounds interesting and reasonable :)
> > But I'm afraid not all hashcode() returns the object's memory
> > address,
>
> Understood.  The hashCode() can return any int value, so in principle
> you have a random set of 32 bits an no bit manipulation is required.
>
> However, in practice I think we can optimize for typical hashcode
> values.  The debate is what is 'typical'?  What is the most common type
> of HashMap key? (then we can look at it's hashCode implementation).
>

Ah yes I hope someone may know what is 'typical' or common hashcode? :)
But i'm afraid everyone who program in java may define his own
hashcode. After all, the basic principle of hashcode is to make it as
dispersive as possible, that's why I think we may use all 32 bits to
find the bucket.

> > in some other cases, the difference of hashcodes may remain
> > in buttom 'y' bits or top 'z' bits, at that time our HashMap will
> > perform very slow.
>
> You are right, I should not have thrown away those top bits.  So how
> about a *rotate* right by eight bits operation?
>
> > So the best way is to use all 32 bit of a hashcode, and take extra
> > operator '|' and '^' does not cost much time.( though I'm not sure if
> > the the right-shift paramter '8' is proper)
>
> The operation you suggested was
>   a & (a >> 8) ^ (a >> 16) | (a >> 24)
>
> can you please explain how you came up with this?
>

If we make it small, like 4, we have to spend much more time like:
a & (a >> 4) ^ (a >> 8) | (a >> 12)^(a >> 16) ^ (a >> 20)
| (a >> 24)...
and if make it bigger, we may still lose the information after the
operation &(2^n-1) if the parameter is a bit bigger than n, so I'm try
to find a balance here, though the parameter '8' may be not best.
As you know, the basic bucket of HashMap is 16( that is 2^4), and I
have no idea what is the most commen size of the hashMap bucket? Or we
may still use the size of the bucket as the parameter? As a result:
a & (a >> n) ^ (a >> n*2) | (a >> n*3), here bucket size is 2^n

I have no idea what is better, I'll try to benckmark and find some
hash theory if any. Thanks to Yuri, I take a glance of two papers he
offered, though one is copyright-ed, the other offered many classic
hash algorithm, I may study and find out a best way by myself (of
cource, avoid copying the code :) )

> Regards,
> Tim
>


-- 

Best Regards!

Jimmy, Jing Lv
China Software Development Lab, IBM

Mime
View raw message