harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aleksey Shipilev" <aleksey.shipi...@gmail.com>
Subject Re: [classlib][pack200][performance] HashCode optimizations (was: Re: [jira] Updated: (HARMONY-5907) [classlib][pack200]CPUTF8.hashCode() is slow)
Date Mon, 14 Jul 2008 18:44:15 GMT
Nevertheless, Object.hashCode() performance is unclear and as such it
should be specifically avoided if not required explicitly. Will static
increment suit better?

Thanks,
Aleksey.

On Mon, Jul 14, 2008 at 10:38 PM, Andrew Cornwall
<andrew.pack200@gmail.com> wrote:
> Sorry - I didn't explain fully. I intended that my code would cache the
> Object.hashCode() rather than recomputing the hashCode each time.
>
> On Mon, Jul 14, 2008 at 11:25 AM, Aleksey Shipilev <
> aleksey.shipilev@gmail.com> wrote:
>
>> I think that's bad for performance. Using Object.hashCode() leads to
>> System.identityHashCode(), which is handled by VM. The exact mechanism
>> is VM-dependent, but at least on Harmony it's pretty slow. If you want
>> to mark each instance as not-identical to another (warning here, you
>> may break something), then I suggest to use static increment, e.g.:
>>
>> private static int hashcodeBase = 1;
>> private int cachedHashCode = hashcodeBase++;
>>
>> This may end up with cache collisions if objects are created from
>> different threads, but that's not the case for now.
>>
>> Anyway, Andrew, your code lacks caching again ;)
>>
>> Thanks,
>> Aleksey.
>>
>> On Mon, Jul 14, 2008 at 9:51 PM, Andrew Cornwall
>> <andrew.pack200@gmail.com> wrote:
>> > And in fact, I just tried the following (which makes even more sense):
>> >
>> >  - add objectHashCode() to ClassFileEntry:
>> >    protected int objectHashCode() {
>> >        return super.hashCode();
>> >    }
>> >
>> > - change generateHashCode in ByteCode to return objectHashCode()
>> >   private int generateHashCode() {
>> >        hashcodeComputed = true;
>> >        return objectHashCode();
>> >   }
>> >
>> > Since ByteCodes are equal if and only if they are identical, this seems
>> to
>> > be the right thing to do. What do you think?
>> >
>> >
>> > On Mon, Jul 14, 2008 at 10:44 AM, Andrew Cornwall <
>> andrew.pack200@gmail.com>
>> > wrote:
>> >
>> >> I applied Aleksey's changes, and they look pretty good. I disagree with
>> >> Sian to some degree about ByteCode. On my VM (which isn't Harmony), a
>> new
>> >> empty array's hashCode() is dependent on the array's location in memory,
>> and
>> >> not the array's contents. In other words:
>> >>
>> >>         int[] x = new int[3];
>> >>         System.out.println(x.hashCode());
>> >>         x[1] = 5;
>> >>         System.out.println(x.hashCode());
>> >>
>> >> prints the same value for in both cases. rewrite.hashCode() is a handy
>> (if
>> >> lazy) way to distinguish among different instances.
>> >>
>> >> If we take rewrite out of hashCode(), HashMap and HashSet get really
>> slow -
>> >> essentially a linear search among all the ByteCodes of the same form.
>> This
>> >> brings my test case from 39 seconds up to 1:02.
>> >>
>> >> Perhaps the right thing to do is to give each unique instance of
>> ByteCode
>> >> an integer ID which is used in creating the hashCode rather than relying
>> on
>> >> rewrite to give us uniqueness?
>> >>
>> >>
>> >> On Mon, Jul 14, 2008 at 7:19 AM, Sian January <
>> sianjanuary@googlemail.com>
>> >> wrote:
>> >>
>> >>> Ok - we'll wait and see what Andrew says.  The only one that I'm not
>> happy
>> >>> with is Bytecode.hashCode, because rewrite always seems to be an empty
>> >>> array
>> >>> at the point when generateHashCode is called so it's a bit misleading
>> >>> using
>> >>> it.  I think it should be ok to just remove that line, and still cache
>> the
>> >>> hashCode.
>> >>>
>> >>>
>> >>> On 14/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com> wrote:
>> >>> >
>> >>> > Sian,
>> >>> >
>> >>> > Actually I had tried to extend Andrew's approach to these classes
>> >>> > first, but somehow I caught the degradation, that leaved me no
choice
>> >>> > except the lazy initialization. My concern is, the
>> >>> > constructor-initialized hashcode is really depend on usage pattern
>> for
>> >>> > each specific class, while lazy initialization has more guarantees
to
>> >>> > be performance-stable. Moreover, I suspect the lazy initialization
>> can
>> >>> > degrade performance much less because the only overhead it causes
is
>> >>> > checking the value of boolean field. On the other hand, the
>> >>> > constructor initialization may degrade performance a lot since
the
>> >>> > generation of hashCode is expensive.
>> >>> >
>> >>> > I can recheck which classes favor lazy initialization and which
are
>> >>> > not, but I think it's not valuable in terms of efficiency. I mean
>> here
>> >>> > that the boost connected with changing lazy initialization to
>> >>> > constructor one is much lower than boost from caching hashcode
>> anyway.
>> >>> >
>> >>> > Can we accept the patch in this form and revisit this difference
>> later?
>> >>> > It would be better to focus on more profitable areas for improvements
>> >>> for
>> >>> > now.
>> >>> >
>> >>> > P.S. I had asked Andrew to recheck whether my patch works as fast
as
>> >>> > his, also to check lazy initialization approach. On my tests the
>> boost
>> >>> > is stable and good.
>> >>> >
>> >>> > Thanks,
>> >>> > Aleksey.
>> >>> >
>> >>> > On Mon, Jul 14, 2008 at 5:25 PM, Sian January
>> >>> > <sianjanuary@googlemail.com> wrote:
>> >>> > > Hi Aleksey,
>> >>> > >
>> >>> > > It's a really good idea to extend this patch to cover some
more
>> >>> classes,
>> >>> > but
>> >>> > > I think Andrew's method is faster (a final cachedHashCode
field
>> that
>> >>> is
>> >>> > > initialized in the constructor).  The only reason I see to
do it
>> later
>> >>> > would
>> >>> > > be if we thought some of these objects never had hashCode
called on
>> >>> them,
>> >>> > > but I don't think that's the case.  Would you be able to try
that
>> >>> method
>> >>> > > instead in your patch and see if I'm right about it being
faster?
>> >>> > >
>> >>> > > Thanks,
>> >>> > >
>> >>> > > Sian
>> >>> > >
>> >>> > >
>> >>> > >
>> >>> > > On 12/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com>
>> wrote:
>> >>> > >>
>> >>> > >> Andrew,
>> >>> > >>
>> >>> > >> I had attached the patch to HARMONY-5907, covering several
first
>> >>> > >> methods. Can you confirm this patch helps for your scenario?
>> >>> > >>
>> >>> > >> Thanks,
>> >>> > >> Aleksey.
>> >>> > >>
>> >>> > >> On Sat, Jul 12, 2008 at 1:58 PM, Aleksey Shipilev
>> >>> > >> <aleksey.shipilev@gmail.com> wrote:
>> >>> > >> > And the sorted list:
>> >>> > >> >
>> >>> > >> > 95462388         bc.cputf8
>> >>> > >> > 18646908         bc.bytecode
>> >>> > >> > 15118425         bc.cpclass
>> >>> > >> > 14928914         bc.cpnametype
>> >>> > >> > 12103799         bc.cpmethref
>> >>> > >> > 5159994  bc.cpfieldref
>> >>> > >> > 3420605  bc.methref
>> >>> > >> > 1840965  bc.cpstring
>> >>> > >> > 839916   bc.codeattr
>> >>> > >> > 839916   bc.locvarattr
>> >>> > >> > 839916   bc.linenumattr
>> >>> > >> > 430234   bc.cpmethod
>> >>> > >> > 277144   bc.cpfield
>> >>> > >> > 263753   bc.attr
>> >>> > >> > 153811   bc.cpinteger
>> >>> > >> > 121856   bc.newattr
>> >>> > >> > 93471    bc.cvalattr
>> >>> > >> > 72492    bc.excpattr
>> >>> > >> > 57428    bc.srcfileattr
>> >>> > >> > 57428    bc.srcfileattr
>> >>> > >> > 48104    bc.cplong
>> >>> > >> > 40362    bc.innerclass
>> >>> > >> > 5593     bc.depattr
>> >>> > >> > 3255     bc.cpfloat
>> >>> > >> > 1638     bc.cpdouble
>> >>> > >> > 532      attrlayout
>> >>> > >> > 0       archive
>> >>> > >> > 0        attrdef
>> >>> > >> > 0        newattrband
>> >>> > >> > 0        bc.anndefarg
>> >>> > >> > 0        bc.rtannattr
>> >>> > >> > 0        classbands
>> >>> > >> > 0        filebands
>> >>> > >> > 0        metabandgr
>> >>> > >> > 0        segheader
>> >>> > >> > 0        bc.remattr
>> >>> > >> > 0        bc.annattr
>> >>> > >> > 0        bc.cpconst
>> >>> > >> > 0        bc.cpmember
>> >>> > >> > 0        bc.signattr
>> >>> > >> > 0        bandset
>> >>> > >> > 0        bcbands
>> >>> > >> > 0        cpbands
>> >>> > >> > 0        icbands
>> >>> > >> > 0        ictuple
>> >>> > >> > 0        segment
>> >>> > >> > 0        segopts
>> >>> > >> > 0        bc.classf
>> >>> > >> > 0        bc.cpref
>> >>> > >> > 0        bc.opmgr
>> >>> > >> > 0        bc.rtattr
>> >>> > >> > 0        segcp
>> >>> > >> > 0        bc.ccp
>> >>> > >> > 0        attrlayoutmap
>> >>> > >> > 0        bc.encmethattr
>> >>> > >> > 0        bc.exptableent
>> >>> > >> > 0        bc.locvartable
>> >>> > >> > 0        bc.signattr
>> >>> > >> >
>> >>> > >> > Thanks,
>> >>> > >> > Aleksey.
>> >>> > >> >
>> >>> > >> > On Sat, Jul 12, 2008 at 1:50 PM, Aleksey Shipilev
>> >>> > >> > <aleksey.shipilev@gmail.com> wrote:
>> >>> > >> >> Hi, Andrew!
>> >>> > >> >>
>> >>> > >> >> I had updated the internal profiler to support
hashCode()
>> probes
>> >>> [1],
>> >>> > >> >> to extend your effort in hashcode optimization.
There are bunch
>> of
>> >>> > >> >> heavily used hashcodes, most of them are going
to
>> >>> Object.hashCode()
>> >>> > >> >> and then to System.identityHashCode(). We can
cache/implement
>> >>> > hashcode
>> >>> > >> >> for these classes. Here's the profile:
>> >>> > >> >>
>> >>> > >> >> Hashcodes:
>> >>> > >> >>  archive:       0
>> >>> > >> >>  attrdef:       0
>> >>> > >> >>  attrlayout:    532
>> >>> > >> >>  attrlayoutmap: 0
>> >>> > >> >>  bandset:       0
>> >>> > >> >>  bcbands:       0
>> >>> > >> >>  classbands:    0
>> >>> > >> >>  cpbands:       0
>> >>> > >> >>  filebands:     0
>> >>> > >> >>  icbands:       0
>> >>> > >> >>  ictuple:       0
>> >>> > >> >>  metabandgr:    0
>> >>> > >> >>  newattrband:   0
>> >>> > >> >>  segcp:         0
>> >>> > >> >>  segheader:     0
>> >>> > >> >>  segment:       0
>> >>> > >> >>  segopts:       0
>> >>> > >> >>  bc.attr:        263753
>> >>> > >> >>  bc.remattr:     0
>> >>> > >> >>  bc.anndefarg:   0
>> >>> > >> >>  bc.annattr:     0
>> >>> > >> >>  bc.bytecode:    18646908
>> >>> > >> >>  bc.ccp:         0
>> >>> > >> >>  bc.classf:      0
>> >>> > >> >>  bc.codeattr:    839916
>> >>> > >> >>  bc.cvalattr:    93471
>> >>> > >> >>  bc.cpclass:     15118425
>> >>> > >> >>  bc.cpconst:     0
>> >>> > >> >>  bc.cpdouble:    1638
>> >>> > >> >>  bc.cpfield:     277144
>> >>> > >> >>  bc.cpfieldref:  5159994
>> >>> > >> >>  bc.cpfloat:     3255
>> >>> > >> >>  bc.cpinteger:   153811
>> >>> > >> >>  bc.methref:     3420605
>> >>> > >> >>  bc.cplong:      48104
>> >>> > >> >>  bc.cpmember:    0
>> >>> > >> >>  bc.cpmethod:    430234
>> >>> > >> >>  bc.cpmethref:   12103799
>> >>> > >> >>  bc.cpnametype:  14928914
>> >>> > >> >>  bc.cpref:       0
>> >>> > >> >>  bc.cpstring:    1840965
>> >>> > >> >>  bc.cputf8:      95462388
>> >>> > >> >>  bc.depattr:     5593
>> >>> > >> >>  bc.encmethattr: 0
>> >>> > >> >>  bc.excpattr:    72492
>> >>> > >> >>  bc.exptableent: 0
>> >>> > >> >>  bc.innerclass:  40362
>> >>> > >> >>  bc.linenumattr: 839916
>> >>> > >> >>  bc.locvarattr:  839916
>> >>> > >> >>  bc.locvartable: 0
>> >>> > >> >>  bc.newattr:     121856
>> >>> > >> >>  bc.opmgr:       0
>> >>> > >> >>  bc.rtattr:      0
>> >>> > >> >>  bc.rtannattr:   0
>> >>> > >> >>  bc.signattr:    0
>> >>> > >> >>  bc.srcfileattr: 57428
>> >>> > >> >>
>> >>> > >> >> Would you like to produce the patch?
>> >>> > >> >> I think it would be funny :)
>> >>> > >> >>
>> >>> > >> >> Thanks,
>> >>> > >> >> Aleksey.
>> >>> > >> >>
>> >>> > >> >> [1] https://issues.apache.org/jira/browse/HARMONY-5905
>> >>> > >> >>
>> >>> > >> >> On Sat, Jul 12, 2008 at 12:48 AM, Andrew Cornwall
(JIRA)
>> >>> > >> >> <jira@apache.org> wrote:
>> >>> > >> >>>
>> >>> > >> >>>     [
>> >>> > >>
>> >>> >
>> >>>
>> https://issues.apache.org/jira/browse/HARMONY-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> >>> > ]
>> >>> > >> >>>
>> >>> > >> >>> Andrew Cornwall updated HARMONY-5907:
>> >>> > >> >>> -------------------------------------
>> >>> > >> >>>
>> >>> > >> >>>    Attachment: main.patch
>> >>> > >> >>>
>> >>> > >> >>> main.patch includes change to CPUTF8.java
>> >>> > >> >>>
>> >>> > >> >>>
>> >>> > >> >>>> [classlib][pack200]CPUTF8.hashCode()
is slow
>> >>> > >> >>>> --------------------------------------------
>> >>> > >> >>>>
>> >>> > >> >>>>                 Key: HARMONY-5907
>> >>> > >> >>>>                 URL:
>> >>> > >> https://issues.apache.org/jira/browse/HARMONY-5907
>> >>> > >> >>>>             Project: Harmony
>> >>> > >> >>>>          Issue Type: Improvement
>> >>> > >> >>>>    Affects Versions: 5.0M6
>> >>> > >> >>>>         Environment: Latest pack200
>> >>> > >> >>>>            Reporter: Andrew Cornwall
>> >>> > >> >>>>         Attachments: main.patch
>> >>> > >> >>>>
>> >>> > >> >>>>
>> >>> > >> >>>> The unpack process spends a lot of time
doing
>> CPUTF8.hashCode()
>> >>> -
>> >>> > >> which does String.hashCode(). We can save about 1.5 seconds
of my
>> 39
>> >>> > second
>> >>> > >> test case (about 4%) by caching the hashCode. (I thought
we did
>> this
>> >>> > before
>> >>> > >> - or maybe I dreamt it?)
>> >>> > >> >>>
>> >>> > >> >>> --
>> >>> > >> >>> This message is automatically generated by
JIRA.
>> >>> > >> >>> -
>> >>> > >> >>> You can reply to this email to add a comment
to the issue
>> online.
>> >>> > >> >>>
>> >>> > >> >>>
>> >>> > >> >>
>> >>> > >> >
>> >>> > >>
>> >>> > >
>> >>> > >
>> >>> > >
>> >>> > > --
>> >>> > > Unless stated otherwise above:
>> >>> > > IBM United Kingdom Limited - Registered in England and Wales
with
>> >>> number
>> >>> > > 741598.
>> >>> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire
>> PO6
>> >>> > 3AU
>> >>> > >
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Unless stated otherwise above:
>> >>> IBM United Kingdom Limited - Registered in England and Wales with
>> number
>> >>> 741598.
>> >>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
>> 3AU
>> >>>
>> >>
>> >>
>> >
>>
>

Mime
View raw message