harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aleksey Shipilev" <aleksey.shipi...@gmail.com>
Subject Re: [classlib][pack200][performance] HashCode optimizations (was: Re: [jira] Updated: (HARMONY-5907) [classlib][pack200]CPUTF8.hashCode() is slow)
Date Mon, 14 Jul 2008 18:25:13 GMT
I think that's bad for performance. Using Object.hashCode() leads to
System.identityHashCode(), which is handled by VM. The exact mechanism
is VM-dependent, but at least on Harmony it's pretty slow. If you want
to mark each instance as not-identical to another (warning here, you
may break something), then I suggest to use static increment, e.g.:

private static int hashcodeBase = 1;
private int cachedHashCode = hashcodeBase++;

This may end up with cache collisions if objects are created from
different threads, but that's not the case for now.

Anyway, Andrew, your code lacks caching again ;)

Thanks,
Aleksey.

On Mon, Jul 14, 2008 at 9:51 PM, Andrew Cornwall
<andrew.pack200@gmail.com> wrote:
> And in fact, I just tried the following (which makes even more sense):
>
>  - add objectHashCode() to ClassFileEntry:
>    protected int objectHashCode() {
>        return super.hashCode();
>    }
>
> - change generateHashCode in ByteCode to return objectHashCode()
>   private int generateHashCode() {
>        hashcodeComputed = true;
>        return objectHashCode();
>   }
>
> Since ByteCodes are equal if and only if they are identical, this seems to
> be the right thing to do. What do you think?
>
>
> On Mon, Jul 14, 2008 at 10:44 AM, Andrew Cornwall <andrew.pack200@gmail.com>
> wrote:
>
>> I applied Aleksey's changes, and they look pretty good. I disagree with
>> Sian to some degree about ByteCode. On my VM (which isn't Harmony), a new
>> empty array's hashCode() is dependent on the array's location in memory, and
>> not the array's contents. In other words:
>>
>>         int[] x = new int[3];
>>         System.out.println(x.hashCode());
>>         x[1] = 5;
>>         System.out.println(x.hashCode());
>>
>> prints the same value for in both cases. rewrite.hashCode() is a handy (if
>> lazy) way to distinguish among different instances.
>>
>> If we take rewrite out of hashCode(), HashMap and HashSet get really slow -
>> essentially a linear search among all the ByteCodes of the same form. This
>> brings my test case from 39 seconds up to 1:02.
>>
>> Perhaps the right thing to do is to give each unique instance of ByteCode
>> an integer ID which is used in creating the hashCode rather than relying on
>> rewrite to give us uniqueness?
>>
>>
>> On Mon, Jul 14, 2008 at 7:19 AM, Sian January <sianjanuary@googlemail.com>
>> wrote:
>>
>>> Ok - we'll wait and see what Andrew says.  The only one that I'm not happy
>>> with is Bytecode.hashCode, because rewrite always seems to be an empty
>>> array
>>> at the point when generateHashCode is called so it's a bit misleading
>>> using
>>> it.  I think it should be ok to just remove that line, and still cache the
>>> hashCode.
>>>
>>>
>>> On 14/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com> wrote:
>>> >
>>> > Sian,
>>> >
>>> > Actually I had tried to extend Andrew's approach to these classes
>>> > first, but somehow I caught the degradation, that leaved me no choice
>>> > except the lazy initialization. My concern is, the
>>> > constructor-initialized hashcode is really depend on usage pattern for
>>> > each specific class, while lazy initialization has more guarantees to
>>> > be performance-stable. Moreover, I suspect the lazy initialization can
>>> > degrade performance much less because the only overhead it causes is
>>> > checking the value of boolean field. On the other hand, the
>>> > constructor initialization may degrade performance a lot since the
>>> > generation of hashCode is expensive.
>>> >
>>> > I can recheck which classes favor lazy initialization and which are
>>> > not, but I think it's not valuable in terms of efficiency. I mean here
>>> > that the boost connected with changing lazy initialization to
>>> > constructor one is much lower than boost from caching hashcode anyway.
>>> >
>>> > Can we accept the patch in this form and revisit this difference later?
>>> > It would be better to focus on more profitable areas for improvements
>>> for
>>> > now.
>>> >
>>> > P.S. I had asked Andrew to recheck whether my patch works as fast as
>>> > his, also to check lazy initialization approach. On my tests the boost
>>> > is stable and good.
>>> >
>>> > Thanks,
>>> > Aleksey.
>>> >
>>> > On Mon, Jul 14, 2008 at 5:25 PM, Sian January
>>> > <sianjanuary@googlemail.com> wrote:
>>> > > Hi Aleksey,
>>> > >
>>> > > It's a really good idea to extend this patch to cover some more
>>> classes,
>>> > but
>>> > > I think Andrew's method is faster (a final cachedHashCode field that
>>> is
>>> > > initialized in the constructor).  The only reason I see to do it later
>>> > would
>>> > > be if we thought some of these objects never had hashCode called on
>>> them,
>>> > > but I don't think that's the case.  Would you be able to try that
>>> method
>>> > > instead in your patch and see if I'm right about it being faster?
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Sian
>>> > >
>>> > >
>>> > >
>>> > > On 12/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com>
wrote:
>>> > >>
>>> > >> Andrew,
>>> > >>
>>> > >> I had attached the patch to HARMONY-5907, covering several first
>>> > >> methods. Can you confirm this patch helps for your scenario?
>>> > >>
>>> > >> Thanks,
>>> > >> Aleksey.
>>> > >>
>>> > >> On Sat, Jul 12, 2008 at 1:58 PM, Aleksey Shipilev
>>> > >> <aleksey.shipilev@gmail.com> wrote:
>>> > >> > And the sorted list:
>>> > >> >
>>> > >> > 95462388         bc.cputf8
>>> > >> > 18646908         bc.bytecode
>>> > >> > 15118425         bc.cpclass
>>> > >> > 14928914         bc.cpnametype
>>> > >> > 12103799         bc.cpmethref
>>> > >> > 5159994  bc.cpfieldref
>>> > >> > 3420605  bc.methref
>>> > >> > 1840965  bc.cpstring
>>> > >> > 839916   bc.codeattr
>>> > >> > 839916   bc.locvarattr
>>> > >> > 839916   bc.linenumattr
>>> > >> > 430234   bc.cpmethod
>>> > >> > 277144   bc.cpfield
>>> > >> > 263753   bc.attr
>>> > >> > 153811   bc.cpinteger
>>> > >> > 121856   bc.newattr
>>> > >> > 93471    bc.cvalattr
>>> > >> > 72492    bc.excpattr
>>> > >> > 57428    bc.srcfileattr
>>> > >> > 57428    bc.srcfileattr
>>> > >> > 48104    bc.cplong
>>> > >> > 40362    bc.innerclass
>>> > >> > 5593     bc.depattr
>>> > >> > 3255     bc.cpfloat
>>> > >> > 1638     bc.cpdouble
>>> > >> > 532      attrlayout
>>> > >> > 0       archive
>>> > >> > 0        attrdef
>>> > >> > 0        newattrband
>>> > >> > 0        bc.anndefarg
>>> > >> > 0        bc.rtannattr
>>> > >> > 0        classbands
>>> > >> > 0        filebands
>>> > >> > 0        metabandgr
>>> > >> > 0        segheader
>>> > >> > 0        bc.remattr
>>> > >> > 0        bc.annattr
>>> > >> > 0        bc.cpconst
>>> > >> > 0        bc.cpmember
>>> > >> > 0        bc.signattr
>>> > >> > 0        bandset
>>> > >> > 0        bcbands
>>> > >> > 0        cpbands
>>> > >> > 0        icbands
>>> > >> > 0        ictuple
>>> > >> > 0        segment
>>> > >> > 0        segopts
>>> > >> > 0        bc.classf
>>> > >> > 0        bc.cpref
>>> > >> > 0        bc.opmgr
>>> > >> > 0        bc.rtattr
>>> > >> > 0        segcp
>>> > >> > 0        bc.ccp
>>> > >> > 0        attrlayoutmap
>>> > >> > 0        bc.encmethattr
>>> > >> > 0        bc.exptableent
>>> > >> > 0        bc.locvartable
>>> > >> > 0        bc.signattr
>>> > >> >
>>> > >> > Thanks,
>>> > >> > Aleksey.
>>> > >> >
>>> > >> > On Sat, Jul 12, 2008 at 1:50 PM, Aleksey Shipilev
>>> > >> > <aleksey.shipilev@gmail.com> wrote:
>>> > >> >> Hi, Andrew!
>>> > >> >>
>>> > >> >> I had updated the internal profiler to support hashCode()
probes
>>> [1],
>>> > >> >> to extend your effort in hashcode optimization. There
are bunch of
>>> > >> >> heavily used hashcodes, most of them are going to
>>> Object.hashCode()
>>> > >> >> and then to System.identityHashCode(). We can cache/implement
>>> > hashcode
>>> > >> >> for these classes. Here's the profile:
>>> > >> >>
>>> > >> >> Hashcodes:
>>> > >> >>  archive:       0
>>> > >> >>  attrdef:       0
>>> > >> >>  attrlayout:    532
>>> > >> >>  attrlayoutmap: 0
>>> > >> >>  bandset:       0
>>> > >> >>  bcbands:       0
>>> > >> >>  classbands:    0
>>> > >> >>  cpbands:       0
>>> > >> >>  filebands:     0
>>> > >> >>  icbands:       0
>>> > >> >>  ictuple:       0
>>> > >> >>  metabandgr:    0
>>> > >> >>  newattrband:   0
>>> > >> >>  segcp:         0
>>> > >> >>  segheader:     0
>>> > >> >>  segment:       0
>>> > >> >>  segopts:       0
>>> > >> >>  bc.attr:        263753
>>> > >> >>  bc.remattr:     0
>>> > >> >>  bc.anndefarg:   0
>>> > >> >>  bc.annattr:     0
>>> > >> >>  bc.bytecode:    18646908
>>> > >> >>  bc.ccp:         0
>>> > >> >>  bc.classf:      0
>>> > >> >>  bc.codeattr:    839916
>>> > >> >>  bc.cvalattr:    93471
>>> > >> >>  bc.cpclass:     15118425
>>> > >> >>  bc.cpconst:     0
>>> > >> >>  bc.cpdouble:    1638
>>> > >> >>  bc.cpfield:     277144
>>> > >> >>  bc.cpfieldref:  5159994
>>> > >> >>  bc.cpfloat:     3255
>>> > >> >>  bc.cpinteger:   153811
>>> > >> >>  bc.methref:     3420605
>>> > >> >>  bc.cplong:      48104
>>> > >> >>  bc.cpmember:    0
>>> > >> >>  bc.cpmethod:    430234
>>> > >> >>  bc.cpmethref:   12103799
>>> > >> >>  bc.cpnametype:  14928914
>>> > >> >>  bc.cpref:       0
>>> > >> >>  bc.cpstring:    1840965
>>> > >> >>  bc.cputf8:      95462388
>>> > >> >>  bc.depattr:     5593
>>> > >> >>  bc.encmethattr: 0
>>> > >> >>  bc.excpattr:    72492
>>> > >> >>  bc.exptableent: 0
>>> > >> >>  bc.innerclass:  40362
>>> > >> >>  bc.linenumattr: 839916
>>> > >> >>  bc.locvarattr:  839916
>>> > >> >>  bc.locvartable: 0
>>> > >> >>  bc.newattr:     121856
>>> > >> >>  bc.opmgr:       0
>>> > >> >>  bc.rtattr:      0
>>> > >> >>  bc.rtannattr:   0
>>> > >> >>  bc.signattr:    0
>>> > >> >>  bc.srcfileattr: 57428
>>> > >> >>
>>> > >> >> Would you like to produce the patch?
>>> > >> >> I think it would be funny :)
>>> > >> >>
>>> > >> >> Thanks,
>>> > >> >> Aleksey.
>>> > >> >>
>>> > >> >> [1] https://issues.apache.org/jira/browse/HARMONY-5905
>>> > >> >>
>>> > >> >> On Sat, Jul 12, 2008 at 12:48 AM, Andrew Cornwall (JIRA)
>>> > >> >> <jira@apache.org> wrote:
>>> > >> >>>
>>> > >> >>>     [
>>> > >>
>>> >
>>> https://issues.apache.org/jira/browse/HARMONY-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>> > ]
>>> > >> >>>
>>> > >> >>> Andrew Cornwall updated HARMONY-5907:
>>> > >> >>> -------------------------------------
>>> > >> >>>
>>> > >> >>>    Attachment: main.patch
>>> > >> >>>
>>> > >> >>> main.patch includes change to CPUTF8.java
>>> > >> >>>
>>> > >> >>>
>>> > >> >>>> [classlib][pack200]CPUTF8.hashCode() is slow
>>> > >> >>>> --------------------------------------------
>>> > >> >>>>
>>> > >> >>>>                 Key: HARMONY-5907
>>> > >> >>>>                 URL:
>>> > >> https://issues.apache.org/jira/browse/HARMONY-5907
>>> > >> >>>>             Project: Harmony
>>> > >> >>>>          Issue Type: Improvement
>>> > >> >>>>    Affects Versions: 5.0M6
>>> > >> >>>>         Environment: Latest pack200
>>> > >> >>>>            Reporter: Andrew Cornwall
>>> > >> >>>>         Attachments: main.patch
>>> > >> >>>>
>>> > >> >>>>
>>> > >> >>>> The unpack process spends a lot of time doing
CPUTF8.hashCode()
>>> -
>>> > >> which does String.hashCode(). We can save about 1.5 seconds of
my 39
>>> > second
>>> > >> test case (about 4%) by caching the hashCode. (I thought we did
this
>>> > before
>>> > >> - or maybe I dreamt it?)
>>> > >> >>>
>>> > >> >>> --
>>> > >> >>> This message is automatically generated by JIRA.
>>> > >> >>> -
>>> > >> >>> You can reply to this email to add a comment to the
issue online.
>>> > >> >>>
>>> > >> >>>
>>> > >> >>
>>> > >> >
>>> > >>
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Unless stated otherwise above:
>>> > > IBM United Kingdom Limited - Registered in England and Wales with
>>> number
>>> > > 741598.
>>> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire
PO6
>>> > 3AU
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> Unless stated otherwise above:
>>> IBM United Kingdom Limited - Registered in England and Wales with number
>>> 741598.
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>>
>>
>>
>

Mime
View raw message