harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiao-Feng Li" <xiaofeng...@gmail.com>
Subject Re: [classlib] Some further study on hashcode and HashMap(Was Re: Performance improvement of java.util.HashMap)
Date Wed, 04 Jul 2007 09:07:54 GMT
On 7/4/07, Jimmy,Jing Lv <firepure@gmail.com> wrote:
> 2007/7/4, Xiao-Feng Li <xiaofeng.li@gmail.com>:
> > On 7/4/07, Jimmy,Jing Lv <firepure@gmail.com> wrote:
> > > Hi XiaoFeng,
> > >
> > >      Ah, I may be explain myself very well here :) The bit shifting
>
> Oops! I mean I may 'NOT' (instead of 'be') explain myself in the first mail :)
>
> > > algorism is NOT used for randomization itself, but for trying to
> > > gather all infomation of 32 bits into log(2,buckectsize) bits, if the
> > > hashcode itself is not random enough, the indexing algorism will
> > > surely make a bad index. So I've made two hypothesis above. The more
> > > random the hashcode is, the better the shifting-algorism works. On the
> > > other hand, as the lower bits are important, if the last bit is
> > > wasted, it does harm to this shifting-algorism.
> > >      At the first I only check IBMVME, finding it always ends with a
> > > even number, which means the the last bit of default hashcode of an
> > > object is always '0', as a result the last bit is useless for hashing.
> > > And by checking DRLVM, I find it similar,  ends with 2 bits as '00',
> > > so we waste 2 bits here.
> > >      Hashcode is used for hash, as a result,  I believe obj_addr >> 2
> > > will improve hashcode randomness at a very low cost. I was thinking if
> > > we take it in the classlib, however we can not tell if the hashcode is
> > > from obj_addr or user define. As a result, we'd better put it in VM.
> > > This cost is worth, for the more random the hoshcode is, the better
> > > hash-related classes performs (That's the reason why we discuss
> > > HashMap indexing algorism, this shifting algorism will properly
> > > consume more time than using hashcode alone in this single method,
> > > however it do improve a lot in the whole)
> > >      Will it do some harm to other part of vm/classlib if we modify
> > > IdentityHashCode in VM (or kernel class java.lang.Object)?  If not, I
> > > guess we'd better modify IdentityHashCode to improve performance.
> >
> > Thanks for the explanation. No, I don't think this >>2 costs anything
> > if implemented in VM. Probably you are right the >>2 helps to remove
> > some useless info.
> >
> > Well, maybe I should read some random number theory, since I think
> > although >>2  removes the meaningless bits in LSB, it introduces two
> > meaningless bits in MSB. If you treat the 32 bit equally when encoding
> > their info into log(num_bucket) bits, the bits in LSB and MSB give you
> > the same effect. Or you assume the lower bits are more significant by
> > default?
> >
>
> Yes,  the second hypothesis I made here is that the lower bits are
> more important than higher bits, we focus on lower bits more than
> higher bits (though not throw higher bits away) however it is
> imposible for them to be equal in indexing, especially in small
> buckets. We choose lower bits for most of time hashcodes alter in
> lower bits. And move meaningless bits to MSB here fits this algorithm
> :)

Ok, got it. Now I decide to modify the hashcode implementation with the >>2. :-)

> > Thanks,
> > xiaofeng
> >
> >
> > > 2007/7/4, Xiao-Feng Li <xiaofeng.li@gmail.com>:
> > > > On 7/4/07, Jimmy,Jing Lv <firepure@gmail.com> wrote:
> > > > > Hi All,
> > > > >
> > > <SNIP>
> > > > >
> > > > >     To sum up, 1) I believe Sergey's idea do helps a lot in
> > > > > hash-related classes, which requires VM guys to do some modification
> > > > > in IdentityHashCode (a simple guess, object address >> 2 ?)
> > > >
> > > > Jimmy, do you suggest to use obj_addr >> 2 or a more complicated
> > > > hashing in VM? I am not sure if this >>2 really helps if the bucket
> > > > indexing does bit-shift or more randomization itself.
> > > >
> > > > (Note if we use a formula for hashing, it will be computed every time
> > > > it's accessed, we need consider the overhead of the computation. >>2
> > > > is very cheap, but as said above, I doubt about its effectiveness.)
> > > >
> > > > Thanks,
> > > > xiaofeng
> > > >
> > > <SNIP>
> > > > >
> > > >
> > > >
> > > > --
> > > > http://xiao-feng.blogspot.com
> > > >
> > >
> > >
> > > --
> > >
> > > Best Regards!
> > >
> > > Jimmy, Jing Lv
> > > China Software Development Lab, IBM
> > >
> >
> >
> > --
> > http://xiao-feng.blogspot.com
> >
>
>
> --
>
> Best Regards!
>
> Jimmy, Jing Lv
> China Software Development Lab, IBM
>


-- 
http://xiao-feng.blogspot.com

Mime
View raw message