harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Cornwall" <andrew.pack...@gmail.com>
Subject Re: [classlib][pack200][performance] HashCode optimizations (was: Re: [jira] Updated: (HARMONY-5907) [classlib][pack200]CPUTF8.hashCode() is slow)
Date Thu, 24 Jul 2008 00:52:25 GMT
I recently tried a multiplicative PRNG as the hashcode rather than a simple
counter. This brought the time back up to about the same as using
Object.hashCode(). However, it's not very intuitive, and given that
Object.hashCode() isn't a bottleneck, I'd suggest we continue using that.

On Fri, Jul 18, 2008 at 3:49 PM, Andrew Cornwall <andrew.pack200@gmail.com>
wrote:

> Just to come back to this: I tried replacing Object.hashCode() with a
> static counter. The performance was significantly worse: >5 minutes instead
> of 34 seconds using my test case, even when using a large prime as the
> counter increment rather than a smaller number.
>
>
> On Mon, Jul 14, 2008 at 2:08 PM, Aleksey Shipilev <
> aleksey.shipilev@gmail.com> wrote:
>
>> That's the question worth experimenting.
>> But taking into consideration that hashcode will not be generated
>> frequently, Object.hashCode() seem to be well too.
>>
>> On Tue, Jul 15, 2008 at 12:37 AM, Andrew Cornwall
>> <andrew.pack200@gmail.com> wrote:
>> > I think Object.hashCode() will be better distributed than a static
>> increment
>> > - wouldn't that have better performance in Hash* objects?
>> >
>> >
>> > On Mon, Jul 14, 2008 at 11:44 AM, Aleksey Shipilev <
>> > aleksey.shipilev@gmail.com> wrote:
>> >
>> >> Nevertheless, Object.hashCode() performance is unclear and as such it
>> >> should be specifically avoided if not required explicitly. Will static
>> >> increment suit better?
>> >>
>> >> Thanks,
>> >> Aleksey.
>> >>
>> >> On Mon, Jul 14, 2008 at 10:38 PM, Andrew Cornwall
>> >> <andrew.pack200@gmail.com> wrote:
>> >> > Sorry - I didn't explain fully. I intended that my code would cache
>> the
>> >> > Object.hashCode() rather than recomputing the hashCode each time.
>> >> >
>> >> > On Mon, Jul 14, 2008 at 11:25 AM, Aleksey Shipilev <
>> >> > aleksey.shipilev@gmail.com> wrote:
>> >> >
>> >> >> I think that's bad for performance. Using Object.hashCode() leads
to
>> >> >> System.identityHashCode(), which is handled by VM. The exact
>> mechanism
>> >> >> is VM-dependent, but at least on Harmony it's pretty slow. If you
>> want
>> >> >> to mark each instance as not-identical to another (warning here,
you
>> >> >> may break something), then I suggest to use static increment, e.g.:
>> >> >>
>> >> >> private static int hashcodeBase = 1;
>> >> >> private int cachedHashCode = hashcodeBase++;
>> >> >>
>> >> >> This may end up with cache collisions if objects are created from
>> >> >> different threads, but that's not the case for now.
>> >> >>
>> >> >> Anyway, Andrew, your code lacks caching again ;)
>> >> >>
>> >> >> Thanks,
>> >> >> Aleksey.
>> >> >>
>> >> >> On Mon, Jul 14, 2008 at 9:51 PM, Andrew Cornwall
>> >> >> <andrew.pack200@gmail.com> wrote:
>> >> >> > And in fact, I just tried the following (which makes even
more
>> sense):
>> >> >> >
>> >> >> >  - add objectHashCode() to ClassFileEntry:
>> >> >> >    protected int objectHashCode() {
>> >> >> >        return super.hashCode();
>> >> >> >    }
>> >> >> >
>> >> >> > - change generateHashCode in ByteCode to return objectHashCode()
>> >> >> >   private int generateHashCode() {
>> >> >> >        hashcodeComputed = true;
>> >> >> >        return objectHashCode();
>> >> >> >   }
>> >> >> >
>> >> >> > Since ByteCodes are equal if and only if they are identical,
this
>> >> seems
>> >> >> to
>> >> >> > be the right thing to do. What do you think?
>> >> >> >
>> >> >> >
>> >> >> > On Mon, Jul 14, 2008 at 10:44 AM, Andrew Cornwall <
>> >> >> andrew.pack200@gmail.com>
>> >> >> > wrote:
>> >> >> >
>> >> >> >> I applied Aleksey's changes, and they look pretty good.
I
>> disagree
>> >> with
>> >> >> >> Sian to some degree about ByteCode. On my VM (which isn't
>> Harmony), a
>> >> >> new
>> >> >> >> empty array's hashCode() is dependent on the array's location
in
>> >> memory,
>> >> >> and
>> >> >> >> not the array's contents. In other words:
>> >> >> >>
>> >> >> >>         int[] x = new int[3];
>> >> >> >>         System.out.println(x.hashCode());
>> >> >> >>         x[1] = 5;
>> >> >> >>         System.out.println(x.hashCode());
>> >> >> >>
>> >> >> >> prints the same value for in both cases. rewrite.hashCode()
is a
>> >> handy
>> >> >> (if
>> >> >> >> lazy) way to distinguish among different instances.
>> >> >> >>
>> >> >> >> If we take rewrite out of hashCode(), HashMap and HashSet
get
>> really
>> >> >> slow -
>> >> >> >> essentially a linear search among all the ByteCodes of
the same
>> form.
>> >> >> This
>> >> >> >> brings my test case from 39 seconds up to 1:02.
>> >> >> >>
>> >> >> >> Perhaps the right thing to do is to give each unique instance
of
>> >> >> ByteCode
>> >> >> >> an integer ID which is used in creating the hashCode rather
than
>> >> relying
>> >> >> on
>> >> >> >> rewrite to give us uniqueness?
>> >> >> >>
>> >> >> >>
>> >> >> >> On Mon, Jul 14, 2008 at 7:19 AM, Sian January <
>> >> >> sianjanuary@googlemail.com>
>> >> >> >> wrote:
>> >> >> >>
>> >> >> >>> Ok - we'll wait and see what Andrew says.  The only
one that I'm
>> not
>> >> >> happy
>> >> >> >>> with is Bytecode.hashCode, because rewrite always
seems to be an
>> >> empty
>> >> >> >>> array
>> >> >> >>> at the point when generateHashCode is called so it's
a bit
>> >> misleading
>> >> >> >>> using
>> >> >> >>> it.  I think it should be ok to just remove that line,
and still
>> >> cache
>> >> >> the
>> >> >> >>> hashCode.
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> On 14/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com>
>> wrote:
>> >> >> >>> >
>> >> >> >>> > Sian,
>> >> >> >>> >
>> >> >> >>> > Actually I had tried to extend Andrew's approach
to these
>> classes
>> >> >> >>> > first, but somehow I caught the degradation,
that leaved me no
>> >> choice
>> >> >> >>> > except the lazy initialization. My concern is,
the
>> >> >> >>> > constructor-initialized hashcode is really depend
on usage
>> pattern
>> >> >> for
>> >> >> >>> > each specific class, while lazy initialization
has more
>> guarantees
>> >> to
>> >> >> >>> > be performance-stable. Moreover, I suspect the
lazy
>> initialization
>> >> >> can
>> >> >> >>> > degrade performance much less because the only
overhead it
>> causes
>> >> is
>> >> >> >>> > checking the value of boolean field. On the other
hand, the
>> >> >> >>> > constructor initialization may degrade performance
a lot since
>> the
>> >> >> >>> > generation of hashCode is expensive.
>> >> >> >>> >
>> >> >> >>> > I can recheck which classes favor lazy initialization
and
>> which
>> >> are
>> >> >> >>> > not, but I think it's not valuable in terms of
efficiency. I
>> mean
>> >> >> here
>> >> >> >>> > that the boost connected with changing lazy initialization
to
>> >> >> >>> > constructor one is much lower than boost from
caching hashcode
>> >> >> anyway.
>> >> >> >>> >
>> >> >> >>> > Can we accept the patch in this form and revisit
this
>> difference
>> >> >> later?
>> >> >> >>> > It would be better to focus on more profitable
areas for
>> >> improvements
>> >> >> >>> for
>> >> >> >>> > now.
>> >> >> >>> >
>> >> >> >>> > P.S. I had asked Andrew to recheck whether my
patch works as
>> fast
>> >> as
>> >> >> >>> > his, also to check lazy initialization approach.
On my tests
>> the
>> >> >> boost
>> >> >> >>> > is stable and good.
>> >> >> >>> >
>> >> >> >>> > Thanks,
>> >> >> >>> > Aleksey.
>> >> >> >>> >
>> >> >> >>> > On Mon, Jul 14, 2008 at 5:25 PM, Sian January
>> >> >> >>> > <sianjanuary@googlemail.com> wrote:
>> >> >> >>> > > Hi Aleksey,
>> >> >> >>> > >
>> >> >> >>> > > It's a really good idea to extend this patch
to cover some
>> more
>> >> >> >>> classes,
>> >> >> >>> > but
>> >> >> >>> > > I think Andrew's method is faster (a final
cachedHashCode
>> field
>> >> >> that
>> >> >> >>> is
>> >> >> >>> > > initialized in the constructor).  The only
reason I see to
>> do it
>> >> >> later
>> >> >> >>> > would
>> >> >> >>> > > be if we thought some of these objects never
had hashCode
>> called
>> >> on
>> >> >> >>> them,
>> >> >> >>> > > but I don't think that's the case.  Would
you be able to try
>> >> that
>> >> >> >>> method
>> >> >> >>> > > instead in your patch and see if I'm right
about it being
>> >> faster?
>> >> >> >>> > >
>> >> >> >>> > > Thanks,
>> >> >> >>> > >
>> >> >> >>> > > Sian
>> >> >> >>> > >
>> >> >> >>> > >
>> >> >> >>> > >
>> >> >> >>> > > On 12/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com
>> >
>> >> >> wrote:
>> >> >> >>> > >>
>> >> >> >>> > >> Andrew,
>> >> >> >>> > >>
>> >> >> >>> > >> I had attached the patch to HARMONY-5907,
covering several
>> >> first
>> >> >> >>> > >> methods. Can you confirm this patch
helps for your
>> scenario?
>> >> >> >>> > >>
>> >> >> >>> > >> Thanks,
>> >> >> >>> > >> Aleksey.
>> >> >> >>> > >>
>> >> >> >>> > >> On Sat, Jul 12, 2008 at 1:58 PM, Aleksey
Shipilev
>> >> >> >>> > >> <aleksey.shipilev@gmail.com> wrote:
>> >> >> >>> > >> > And the sorted list:
>> >> >> >>> > >> >
>> >> >> >>> > >> > 95462388         bc.cputf8
>> >> >> >>> > >> > 18646908         bc.bytecode
>> >> >> >>> > >> > 15118425         bc.cpclass
>> >> >> >>> > >> > 14928914         bc.cpnametype
>> >> >> >>> > >> > 12103799         bc.cpmethref
>> >> >> >>> > >> > 5159994  bc.cpfieldref
>> >> >> >>> > >> > 3420605  bc.methref
>> >> >> >>> > >> > 1840965  bc.cpstring
>> >> >> >>> > >> > 839916   bc.codeattr
>> >> >> >>> > >> > 839916   bc.locvarattr
>> >> >> >>> > >> > 839916   bc.linenumattr
>> >> >> >>> > >> > 430234   bc.cpmethod
>> >> >> >>> > >> > 277144   bc.cpfield
>> >> >> >>> > >> > 263753   bc.attr
>> >> >> >>> > >> > 153811   bc.cpinteger
>> >> >> >>> > >> > 121856   bc.newattr
>> >> >> >>> > >> > 93471    bc.cvalattr
>> >> >> >>> > >> > 72492    bc.excpattr
>> >> >> >>> > >> > 57428    bc.srcfileattr
>> >> >> >>> > >> > 57428    bc.srcfileattr
>> >> >> >>> > >> > 48104    bc.cplong
>> >> >> >>> > >> > 40362    bc.innerclass
>> >> >> >>> > >> > 5593     bc.depattr
>> >> >> >>> > >> > 3255     bc.cpfloat
>> >> >> >>> > >> > 1638     bc.cpdouble
>> >> >> >>> > >> > 532      attrlayout
>> >> >> >>> > >> > 0       archive
>> >> >> >>> > >> > 0        attrdef
>> >> >> >>> > >> > 0        newattrband
>> >> >> >>> > >> > 0        bc.anndefarg
>> >> >> >>> > >> > 0        bc.rtannattr
>> >> >> >>> > >> > 0        classbands
>> >> >> >>> > >> > 0        filebands
>> >> >> >>> > >> > 0        metabandgr
>> >> >> >>> > >> > 0        segheader
>> >> >> >>> > >> > 0        bc.remattr
>> >> >> >>> > >> > 0        bc.annattr
>> >> >> >>> > >> > 0        bc.cpconst
>> >> >> >>> > >> > 0        bc.cpmember
>> >> >> >>> > >> > 0        bc.signattr
>> >> >> >>> > >> > 0        bandset
>> >> >> >>> > >> > 0        bcbands
>> >> >> >>> > >> > 0        cpbands
>> >> >> >>> > >> > 0        icbands
>> >> >> >>> > >> > 0        ictuple
>> >> >> >>> > >> > 0        segment
>> >> >> >>> > >> > 0        segopts
>> >> >> >>> > >> > 0        bc.classf
>> >> >> >>> > >> > 0        bc.cpref
>> >> >> >>> > >> > 0        bc.opmgr
>> >> >> >>> > >> > 0        bc.rtattr
>> >> >> >>> > >> > 0        segcp
>> >> >> >>> > >> > 0        bc.ccp
>> >> >> >>> > >> > 0        attrlayoutmap
>> >> >> >>> > >> > 0        bc.encmethattr
>> >> >> >>> > >> > 0        bc.exptableent
>> >> >> >>> > >> > 0        bc.locvartable
>> >> >> >>> > >> > 0        bc.signattr
>> >> >> >>> > >> >
>> >> >> >>> > >> > Thanks,
>> >> >> >>> > >> > Aleksey.
>> >> >> >>> > >> >
>> >> >> >>> > >> > On Sat, Jul 12, 2008 at 1:50 PM,
Aleksey Shipilev
>> >> >> >>> > >> > <aleksey.shipilev@gmail.com>
wrote:
>> >> >> >>> > >> >> Hi, Andrew!
>> >> >> >>> > >> >>
>> >> >> >>> > >> >> I had updated the internal
profiler to support
>> hashCode()
>> >> >> probes
>> >> >> >>> [1],
>> >> >> >>> > >> >> to extend your effort in hashcode
optimization. There
>> are
>> >> bunch
>> >> >> of
>> >> >> >>> > >> >> heavily used hashcodes, most
of them are going to
>> >> >> >>> Object.hashCode()
>> >> >> >>> > >> >> and then to System.identityHashCode().
We can
>> >> cache/implement
>> >> >> >>> > hashcode
>> >> >> >>> > >> >> for these classes. Here's the
profile:
>> >> >> >>> > >> >>
>> >> >> >>> > >> >> Hashcodes:
>> >> >> >>> > >> >>  archive:       0
>> >> >> >>> > >> >>  attrdef:       0
>> >> >> >>> > >> >>  attrlayout:    532
>> >> >> >>> > >> >>  attrlayoutmap: 0
>> >> >> >>> > >> >>  bandset:       0
>> >> >> >>> > >> >>  bcbands:       0
>> >> >> >>> > >> >>  classbands:    0
>> >> >> >>> > >> >>  cpbands:       0
>> >> >> >>> > >> >>  filebands:     0
>> >> >> >>> > >> >>  icbands:       0
>> >> >> >>> > >> >>  ictuple:       0
>> >> >> >>> > >> >>  metabandgr:    0
>> >> >> >>> > >> >>  newattrband:   0
>> >> >> >>> > >> >>  segcp:         0
>> >> >> >>> > >> >>  segheader:     0
>> >> >> >>> > >> >>  segment:       0
>> >> >> >>> > >> >>  segopts:       0
>> >> >> >>> > >> >>  bc.attr:        263753
>> >> >> >>> > >> >>  bc.remattr:     0
>> >> >> >>> > >> >>  bc.anndefarg:   0
>> >> >> >>> > >> >>  bc.annattr:     0
>> >> >> >>> > >> >>  bc.bytecode:    18646908
>> >> >> >>> > >> >>  bc.ccp:         0
>> >> >> >>> > >> >>  bc.classf:      0
>> >> >> >>> > >> >>  bc.codeattr:    839916
>> >> >> >>> > >> >>  bc.cvalattr:    93471
>> >> >> >>> > >> >>  bc.cpclass:     15118425
>> >> >> >>> > >> >>  bc.cpconst:     0
>> >> >> >>> > >> >>  bc.cpdouble:    1638
>> >> >> >>> > >> >>  bc.cpfield:     277144
>> >> >> >>> > >> >>  bc.cpfieldref:  5159994
>> >> >> >>> > >> >>  bc.cpfloat:     3255
>> >> >> >>> > >> >>  bc.cpinteger:   153811
>> >> >> >>> > >> >>  bc.methref:     3420605
>> >> >> >>> > >> >>  bc.cplong:      48104
>> >> >> >>> > >> >>  bc.cpmember:    0
>> >> >> >>> > >> >>  bc.cpmethod:    430234
>> >> >> >>> > >> >>  bc.cpmethref:   12103799
>> >> >> >>> > >> >>  bc.cpnametype:  14928914
>> >> >> >>> > >> >>  bc.cpref:       0
>> >> >> >>> > >> >>  bc.cpstring:    1840965
>> >> >> >>> > >> >>  bc.cputf8:      95462388
>> >> >> >>> > >> >>  bc.depattr:     5593
>> >> >> >>> > >> >>  bc.encmethattr: 0
>> >> >> >>> > >> >>  bc.excpattr:    72492
>> >> >> >>> > >> >>  bc.exptableent: 0
>> >> >> >>> > >> >>  bc.innerclass:  40362
>> >> >> >>> > >> >>  bc.linenumattr: 839916
>> >> >> >>> > >> >>  bc.locvarattr:  839916
>> >> >> >>> > >> >>  bc.locvartable: 0
>> >> >> >>> > >> >>  bc.newattr:     121856
>> >> >> >>> > >> >>  bc.opmgr:       0
>> >> >> >>> > >> >>  bc.rtattr:      0
>> >> >> >>> > >> >>  bc.rtannattr:   0
>> >> >> >>> > >> >>  bc.signattr:    0
>> >> >> >>> > >> >>  bc.srcfileattr: 57428
>> >> >> >>> > >> >>
>> >> >> >>> > >> >> Would you like to produce the
patch?
>> >> >> >>> > >> >> I think it would be funny :)
>> >> >> >>> > >> >>
>> >> >> >>> > >> >> Thanks,
>> >> >> >>> > >> >> Aleksey.
>> >> >> >>> > >> >>
>> >> >> >>> > >> >> [1] https://issues.apache.org/jira/browse/HARMONY-5905
>> >> >> >>> > >> >>
>> >> >> >>> > >> >> On Sat, Jul 12, 2008 at 12:48
AM, Andrew Cornwall (JIRA)
>> >> >> >>> > >> >> <jira@apache.org> wrote:
>> >> >> >>> > >> >>>
>> >> >> >>> > >> >>>     [
>> >> >> >>> > >>
>> >> >> >>> >
>> >> >> >>>
>> >> >>
>> >>
>> https://issues.apache.org/jira/browse/HARMONY-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> >> >> >>> > ]
>> >> >> >>> > >> >>>
>> >> >> >>> > >> >>> Andrew Cornwall updated
HARMONY-5907:
>> >> >> >>> > >> >>> -------------------------------------
>> >> >> >>> > >> >>>
>> >> >> >>> > >> >>>    Attachment: main.patch
>> >> >> >>> > >> >>>
>> >> >> >>> > >> >>> main.patch includes change
to CPUTF8.java
>> >> >> >>> > >> >>>
>> >> >> >>> > >> >>>
>> >> >> >>> > >> >>>> [classlib][pack200]CPUTF8.hashCode()
is slow
>> >> >> >>> > >> >>>> --------------------------------------------
>> >> >> >>> > >> >>>>
>> >> >> >>> > >> >>>>                 Key:
HARMONY-5907
>> >> >> >>> > >> >>>>                 URL:
>> >> >> >>> > >> https://issues.apache.org/jira/browse/HARMONY-5907
>> >> >> >>> > >> >>>>             Project:
Harmony
>> >> >> >>> > >> >>>>          Issue Type:
Improvement
>> >> >> >>> > >> >>>>    Affects Versions:
5.0M6
>> >> >> >>> > >> >>>>         Environment:
Latest pack200
>> >> >> >>> > >> >>>>            Reporter:
Andrew Cornwall
>> >> >> >>> > >> >>>>         Attachments:
main.patch
>> >> >> >>> > >> >>>>
>> >> >> >>> > >> >>>>
>> >> >> >>> > >> >>>> The unpack process
spends a lot of time doing
>> >> >> CPUTF8.hashCode()
>> >> >> >>> -
>> >> >> >>> > >> which does String.hashCode(). We can
save about 1.5 seconds
>> of
>> >> my
>> >> >> 39
>> >> >> >>> > second
>> >> >> >>> > >> test case (about 4%) by caching the
hashCode. (I thought we
>> did
>> >> >> this
>> >> >> >>> > before
>> >> >> >>> > >> - or maybe I dreamt it?)
>> >> >> >>> > >> >>>
>> >> >> >>> > >> >>> --
>> >> >> >>> > >> >>> This message is automatically
generated by JIRA.
>> >> >> >>> > >> >>> -
>> >> >> >>> > >> >>> You can reply to this email
to add a comment to the
>> issue
>> >> >> online.
>> >> >> >>> > >> >>>
>> >> >> >>> > >> >>>
>> >> >> >>> > >> >>
>> >> >> >>> > >> >
>> >> >> >>> > >>
>> >> >> >>> > >
>> >> >> >>> > >
>> >> >> >>> > >
>> >> >> >>> > > --
>> >> >> >>> > > Unless stated otherwise above:
>> >> >> >>> > > IBM United Kingdom Limited - Registered
in England and Wales
>> >> with
>> >> >> >>> number
>> >> >> >>> > > 741598.
>> >> >> >>> > > Registered office: PO Box 41, North Harbour,
Portsmouth,
>> >> Hampshire
>> >> >> PO6
>> >> >> >>> > 3AU
>> >> >> >>> > >
>> >> >> >>> >
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> --
>> >> >> >>> Unless stated otherwise above:
>> >> >> >>> IBM United Kingdom Limited - Registered in England
and Wales
>> with
>> >> >> number
>> >> >> >>> 741598.
>> >> >> >>> Registered office: PO Box 41, North Harbour, Portsmouth,
>> Hampshire
>> >> PO6
>> >> >> 3AU
>> >> >> >>>
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message