2007/6/28, Tim Ellison <t.p.ellison@gmail.com>:
> Jimmy,Jing Lv wrote:
> > 2007/6/28, Tim Ellison <t.p.ellison@gmail.com>:
> >> Jimmy,Jing Lv wrote:
> >> > One thought is rightshift the hashcode and combine bits together (by
> >> > AND or OR or Exclusive Or operation) into low order(e.g, (a & (a >>
> >> > 8) ^ (a >> 16)  (a >> 24)) & (bucketSize1), a small test
on it
> >> > shows that it does distributed more randomly, but it may be better
> >> > algorithms.
> >>
> >> Why so complex bit shifting? How did you settle on this?
> >>
> >> If you are optimizing for the identity hashcode case then you can
> >> probably ignore the bottom 'y' bits due to object alignment (and top 'z'
> >> bits for address range limits), so just do (a >> 8) & (bucketSize
 1)
> >> assuming the objects are 8bit aligned.
> >
> > It sounds interesting and reasonable :)
> > But I'm afraid not all hashcode() returns the object's memory
> > address,
>
> Understood. The hashCode() can return any int value, so in principle
> you have a random set of 32 bits an no bit manipulation is required.
>
> However, in practice I think we can optimize for typical hashcode
> values. The debate is what is 'typical'? What is the most common type
> of HashMap key? (then we can look at it's hashCode implementation).
>
Ah yes I hope someone may know what is 'typical' or common hashcode? :)
But i'm afraid everyone who program in java may define his own
hashcode. After all, the basic principle of hashcode is to make it as
dispersive as possible, that's why I think we may use all 32 bits to
find the bucket.
> > in some other cases, the difference of hashcodes may remain
> > in buttom 'y' bits or top 'z' bits, at that time our HashMap will
> > perform very slow.
>
> You are right, I should not have thrown away those top bits. So how
> about a *rotate* right by eight bits operation?
>
> > So the best way is to use all 32 bit of a hashcode, and take extra
> > operator '' and '^' does not cost much time.( though I'm not sure if
> > the the rightshift paramter '8' is proper)
>
> The operation you suggested was
> a & (a >> 8) ^ (a >> 16)  (a >> 24)
>
> can you please explain how you came up with this?
>
If we make it small, like 4, we have to spend much more time like:
a & (a >> 4) ^ (a >> 8)  (a >> 12)^(a >> 16) ^ (a >> 20)
 (a >> 24)...
and if make it bigger, we may still lose the information after the
operation &(2^n1) if the parameter is a bit bigger than n, so I'm try
to find a balance here, though the parameter '8' may be not best.
As you know, the basic bucket of HashMap is 16( that is 2^4), and I
have no idea what is the most commen size of the hashMap bucket? Or we
may still use the size of the bucket as the parameter? As a result:
a & (a >> n) ^ (a >> n*2)  (a >> n*3), here bucket size is 2^n
I have no idea what is better, I'll try to benckmark and find some
hash theory if any. Thanks to Yuri, I take a glance of two papers he
offered, though one is copyrighted, the other offered many classic
hash algorithm, I may study and find out a best way by myself (of
cource, avoid copying the code :) )
> Regards,
> Tim
>

Best Regards!
Jimmy, Jing Lv
China Software Development Lab, IBM
