harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiao-Feng Li" <xiaofeng...@gmail.com>
Subject Re: [classlib] Some further study on hashcode and HashMap(Was Re: Performance improvement of java.util.HashMap)
Date Wed, 04 Jul 2007 08:33:54 GMT
On 7/4/07, Jimmy,Jing Lv <firepure@gmail.com> wrote:
> Hi XiaoFeng,
>
>      Ah, I may be explain myself very well here :) The bit shifting
> algorism is NOT used for randomization itself, but for trying to
> gather all infomation of 32 bits into log(2,buckectsize) bits, if the
> hashcode itself is not random enough, the indexing algorism will
> surely make a bad index. So I've made two hypothesis above. The more
> random the hashcode is, the better the shifting-algorism works. On the
> other hand, as the lower bits are important, if the last bit is
> wasted, it does harm to this shifting-algorism.
>      At the first I only check IBMVME, finding it always ends with a
> even number, which means the the last bit of default hashcode of an
> object is always '0', as a result the last bit is useless for hashing.
> And by checking DRLVM, I find it similar,  ends with 2 bits as '00',
> so we waste 2 bits here.
>      Hashcode is used for hash, as a result,  I believe obj_addr >> 2
> will improve hashcode randomness at a very low cost. I was thinking if
> we take it in the classlib, however we can not tell if the hashcode is
> from obj_addr or user define. As a result, we'd better put it in VM.
> This cost is worth, for the more random the hoshcode is, the better
> hash-related classes performs (That's the reason why we discuss
> HashMap indexing algorism, this shifting algorism will properly
> consume more time than using hashcode alone in this single method,
> however it do improve a lot in the whole)
>      Will it do some harm to other part of vm/classlib if we modify
> IdentityHashCode in VM (or kernel class java.lang.Object)?  If not, I
> guess we'd better modify IdentityHashCode to improve performance.

Thanks for the explanation. No, I don't think this >>2 costs anything
if implemented in VM. Probably you are right the >>2 helps to remove
some useless info.

Well, maybe I should read some random number theory, since I think
although >>2  removes the meaningless bits in LSB, it introduces two
meaningless bits in MSB. If you treat the 32 bit equally when encoding
their info into log(num_bucket) bits, the bits in LSB and MSB give you
the same effect. Or you assume the lower bits are more significant by
default?

Thanks,
xiaofeng


> 2007/7/4, Xiao-Feng Li <xiaofeng.li@gmail.com>:
> > On 7/4/07, Jimmy,Jing Lv <firepure@gmail.com> wrote:
> > > Hi All,
> > >
> <SNIP>
> > >
> > >     To sum up, 1) I believe Sergey's idea do helps a lot in
> > > hash-related classes, which requires VM guys to do some modification
> > > in IdentityHashCode (a simple guess, object address >> 2 ?)
> >
> > Jimmy, do you suggest to use obj_addr >> 2 or a more complicated
> > hashing in VM? I am not sure if this >>2 really helps if the bucket
> > indexing does bit-shift or more randomization itself.
> >
> > (Note if we use a formula for hashing, it will be computed every time
> > it's accessed, we need consider the overhead of the computation. >>2
> > is very cheap, but as said above, I doubt about its effectiveness.)
> >
> > Thanks,
> > xiaofeng
> >
> <SNIP>
> > >
> >
> >
> > --
> > http://xiao-feng.blogspot.com
> >
>
>
> --
>
> Best Regards!
>
> Jimmy, Jing Lv
> China Software Development Lab, IBM
>


-- 
http://xiao-feng.blogspot.com

Mime
View raw message