harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Cornwall" <andrew.pack...@gmail.com>
Subject Re: [classlib][pack200][performance] HashCode optimizations (was: Re: [jira] Updated: (HARMONY-5907) [classlib][pack200]CPUTF8.hashCode() is slow)
Date Mon, 14 Jul 2008 20:37:01 GMT
I think Object.hashCode() will be better distributed than a static increment
- wouldn't that have better performance in Hash* objects?


On Mon, Jul 14, 2008 at 11:44 AM, Aleksey Shipilev <
aleksey.shipilev@gmail.com> wrote:

> Nevertheless, Object.hashCode() performance is unclear and as such it
> should be specifically avoided if not required explicitly. Will static
> increment suit better?
>
> Thanks,
> Aleksey.
>
> On Mon, Jul 14, 2008 at 10:38 PM, Andrew Cornwall
> <andrew.pack200@gmail.com> wrote:
> > Sorry - I didn't explain fully. I intended that my code would cache the
> > Object.hashCode() rather than recomputing the hashCode each time.
> >
> > On Mon, Jul 14, 2008 at 11:25 AM, Aleksey Shipilev <
> > aleksey.shipilev@gmail.com> wrote:
> >
> >> I think that's bad for performance. Using Object.hashCode() leads to
> >> System.identityHashCode(), which is handled by VM. The exact mechanism
> >> is VM-dependent, but at least on Harmony it's pretty slow. If you want
> >> to mark each instance as not-identical to another (warning here, you
> >> may break something), then I suggest to use static increment, e.g.:
> >>
> >> private static int hashcodeBase = 1;
> >> private int cachedHashCode = hashcodeBase++;
> >>
> >> This may end up with cache collisions if objects are created from
> >> different threads, but that's not the case for now.
> >>
> >> Anyway, Andrew, your code lacks caching again ;)
> >>
> >> Thanks,
> >> Aleksey.
> >>
> >> On Mon, Jul 14, 2008 at 9:51 PM, Andrew Cornwall
> >> <andrew.pack200@gmail.com> wrote:
> >> > And in fact, I just tried the following (which makes even more sense):
> >> >
> >> >  - add objectHashCode() to ClassFileEntry:
> >> >    protected int objectHashCode() {
> >> >        return super.hashCode();
> >> >    }
> >> >
> >> > - change generateHashCode in ByteCode to return objectHashCode()
> >> >   private int generateHashCode() {
> >> >        hashcodeComputed = true;
> >> >        return objectHashCode();
> >> >   }
> >> >
> >> > Since ByteCodes are equal if and only if they are identical, this
> seems
> >> to
> >> > be the right thing to do. What do you think?
> >> >
> >> >
> >> > On Mon, Jul 14, 2008 at 10:44 AM, Andrew Cornwall <
> >> andrew.pack200@gmail.com>
> >> > wrote:
> >> >
> >> >> I applied Aleksey's changes, and they look pretty good. I disagree
> with
> >> >> Sian to some degree about ByteCode. On my VM (which isn't Harmony),
a
> >> new
> >> >> empty array's hashCode() is dependent on the array's location in
> memory,
> >> and
> >> >> not the array's contents. In other words:
> >> >>
> >> >>         int[] x = new int[3];
> >> >>         System.out.println(x.hashCode());
> >> >>         x[1] = 5;
> >> >>         System.out.println(x.hashCode());
> >> >>
> >> >> prints the same value for in both cases. rewrite.hashCode() is a
> handy
> >> (if
> >> >> lazy) way to distinguish among different instances.
> >> >>
> >> >> If we take rewrite out of hashCode(), HashMap and HashSet get really
> >> slow -
> >> >> essentially a linear search among all the ByteCodes of the same form.
> >> This
> >> >> brings my test case from 39 seconds up to 1:02.
> >> >>
> >> >> Perhaps the right thing to do is to give each unique instance of
> >> ByteCode
> >> >> an integer ID which is used in creating the hashCode rather than
> relying
> >> on
> >> >> rewrite to give us uniqueness?
> >> >>
> >> >>
> >> >> On Mon, Jul 14, 2008 at 7:19 AM, Sian January <
> >> sianjanuary@googlemail.com>
> >> >> wrote:
> >> >>
> >> >>> Ok - we'll wait and see what Andrew says.  The only one that I'm
not
> >> happy
> >> >>> with is Bytecode.hashCode, because rewrite always seems to be an
> empty
> >> >>> array
> >> >>> at the point when generateHashCode is called so it's a bit
> misleading
> >> >>> using
> >> >>> it.  I think it should be ok to just remove that line, and still
> cache
> >> the
> >> >>> hashCode.
> >> >>>
> >> >>>
> >> >>> On 14/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com>
wrote:
> >> >>> >
> >> >>> > Sian,
> >> >>> >
> >> >>> > Actually I had tried to extend Andrew's approach to these
classes
> >> >>> > first, but somehow I caught the degradation, that leaved me
no
> choice
> >> >>> > except the lazy initialization. My concern is, the
> >> >>> > constructor-initialized hashcode is really depend on usage
pattern
> >> for
> >> >>> > each specific class, while lazy initialization has more guarantees
> to
> >> >>> > be performance-stable. Moreover, I suspect the lazy initialization
> >> can
> >> >>> > degrade performance much less because the only overhead it
causes
> is
> >> >>> > checking the value of boolean field. On the other hand, the
> >> >>> > constructor initialization may degrade performance a lot since
the
> >> >>> > generation of hashCode is expensive.
> >> >>> >
> >> >>> > I can recheck which classes favor lazy initialization and
which
> are
> >> >>> > not, but I think it's not valuable in terms of efficiency.
I mean
> >> here
> >> >>> > that the boost connected with changing lazy initialization
to
> >> >>> > constructor one is much lower than boost from caching hashcode
> >> anyway.
> >> >>> >
> >> >>> > Can we accept the patch in this form and revisit this difference
> >> later?
> >> >>> > It would be better to focus on more profitable areas for
> improvements
> >> >>> for
> >> >>> > now.
> >> >>> >
> >> >>> > P.S. I had asked Andrew to recheck whether my patch works
as fast
> as
> >> >>> > his, also to check lazy initialization approach. On my tests
the
> >> boost
> >> >>> > is stable and good.
> >> >>> >
> >> >>> > Thanks,
> >> >>> > Aleksey.
> >> >>> >
> >> >>> > On Mon, Jul 14, 2008 at 5:25 PM, Sian January
> >> >>> > <sianjanuary@googlemail.com> wrote:
> >> >>> > > Hi Aleksey,
> >> >>> > >
> >> >>> > > It's a really good idea to extend this patch to cover
some more
> >> >>> classes,
> >> >>> > but
> >> >>> > > I think Andrew's method is faster (a final cachedHashCode
field
> >> that
> >> >>> is
> >> >>> > > initialized in the constructor).  The only reason I see
to do it
> >> later
> >> >>> > would
> >> >>> > > be if we thought some of these objects never had hashCode
called
> on
> >> >>> them,
> >> >>> > > but I don't think that's the case.  Would you be able
to try
> that
> >> >>> method
> >> >>> > > instead in your patch and see if I'm right about it being
> faster?
> >> >>> > >
> >> >>> > > Thanks,
> >> >>> > >
> >> >>> > > Sian
> >> >>> > >
> >> >>> > >
> >> >>> > >
> >> >>> > > On 12/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com>
> >> wrote:
> >> >>> > >>
> >> >>> > >> Andrew,
> >> >>> > >>
> >> >>> > >> I had attached the patch to HARMONY-5907, covering
several
> first
> >> >>> > >> methods. Can you confirm this patch helps for your
scenario?
> >> >>> > >>
> >> >>> > >> Thanks,
> >> >>> > >> Aleksey.
> >> >>> > >>
> >> >>> > >> On Sat, Jul 12, 2008 at 1:58 PM, Aleksey Shipilev
> >> >>> > >> <aleksey.shipilev@gmail.com> wrote:
> >> >>> > >> > And the sorted list:
> >> >>> > >> >
> >> >>> > >> > 95462388         bc.cputf8
> >> >>> > >> > 18646908         bc.bytecode
> >> >>> > >> > 15118425         bc.cpclass
> >> >>> > >> > 14928914         bc.cpnametype
> >> >>> > >> > 12103799         bc.cpmethref
> >> >>> > >> > 5159994  bc.cpfieldref
> >> >>> > >> > 3420605  bc.methref
> >> >>> > >> > 1840965  bc.cpstring
> >> >>> > >> > 839916   bc.codeattr
> >> >>> > >> > 839916   bc.locvarattr
> >> >>> > >> > 839916   bc.linenumattr
> >> >>> > >> > 430234   bc.cpmethod
> >> >>> > >> > 277144   bc.cpfield
> >> >>> > >> > 263753   bc.attr
> >> >>> > >> > 153811   bc.cpinteger
> >> >>> > >> > 121856   bc.newattr
> >> >>> > >> > 93471    bc.cvalattr
> >> >>> > >> > 72492    bc.excpattr
> >> >>> > >> > 57428    bc.srcfileattr
> >> >>> > >> > 57428    bc.srcfileattr
> >> >>> > >> > 48104    bc.cplong
> >> >>> > >> > 40362    bc.innerclass
> >> >>> > >> > 5593     bc.depattr
> >> >>> > >> > 3255     bc.cpfloat
> >> >>> > >> > 1638     bc.cpdouble
> >> >>> > >> > 532      attrlayout
> >> >>> > >> > 0       archive
> >> >>> > >> > 0        attrdef
> >> >>> > >> > 0        newattrband
> >> >>> > >> > 0        bc.anndefarg
> >> >>> > >> > 0        bc.rtannattr
> >> >>> > >> > 0        classbands
> >> >>> > >> > 0        filebands
> >> >>> > >> > 0        metabandgr
> >> >>> > >> > 0        segheader
> >> >>> > >> > 0        bc.remattr
> >> >>> > >> > 0        bc.annattr
> >> >>> > >> > 0        bc.cpconst
> >> >>> > >> > 0        bc.cpmember
> >> >>> > >> > 0        bc.signattr
> >> >>> > >> > 0        bandset
> >> >>> > >> > 0        bcbands
> >> >>> > >> > 0        cpbands
> >> >>> > >> > 0        icbands
> >> >>> > >> > 0        ictuple
> >> >>> > >> > 0        segment
> >> >>> > >> > 0        segopts
> >> >>> > >> > 0        bc.classf
> >> >>> > >> > 0        bc.cpref
> >> >>> > >> > 0        bc.opmgr
> >> >>> > >> > 0        bc.rtattr
> >> >>> > >> > 0        segcp
> >> >>> > >> > 0        bc.ccp
> >> >>> > >> > 0        attrlayoutmap
> >> >>> > >> > 0        bc.encmethattr
> >> >>> > >> > 0        bc.exptableent
> >> >>> > >> > 0        bc.locvartable
> >> >>> > >> > 0        bc.signattr
> >> >>> > >> >
> >> >>> > >> > Thanks,
> >> >>> > >> > Aleksey.
> >> >>> > >> >
> >> >>> > >> > On Sat, Jul 12, 2008 at 1:50 PM, Aleksey Shipilev
> >> >>> > >> > <aleksey.shipilev@gmail.com> wrote:
> >> >>> > >> >> Hi, Andrew!
> >> >>> > >> >>
> >> >>> > >> >> I had updated the internal profiler to support
hashCode()
> >> probes
> >> >>> [1],
> >> >>> > >> >> to extend your effort in hashcode optimization.
There are
> bunch
> >> of
> >> >>> > >> >> heavily used hashcodes, most of them are
going to
> >> >>> Object.hashCode()
> >> >>> > >> >> and then to System.identityHashCode(). We
can
> cache/implement
> >> >>> > hashcode
> >> >>> > >> >> for these classes. Here's the profile:
> >> >>> > >> >>
> >> >>> > >> >> Hashcodes:
> >> >>> > >> >>  archive:       0
> >> >>> > >> >>  attrdef:       0
> >> >>> > >> >>  attrlayout:    532
> >> >>> > >> >>  attrlayoutmap: 0
> >> >>> > >> >>  bandset:       0
> >> >>> > >> >>  bcbands:       0
> >> >>> > >> >>  classbands:    0
> >> >>> > >> >>  cpbands:       0
> >> >>> > >> >>  filebands:     0
> >> >>> > >> >>  icbands:       0
> >> >>> > >> >>  ictuple:       0
> >> >>> > >> >>  metabandgr:    0
> >> >>> > >> >>  newattrband:   0
> >> >>> > >> >>  segcp:         0
> >> >>> > >> >>  segheader:     0
> >> >>> > >> >>  segment:       0
> >> >>> > >> >>  segopts:       0
> >> >>> > >> >>  bc.attr:        263753
> >> >>> > >> >>  bc.remattr:     0
> >> >>> > >> >>  bc.anndefarg:   0
> >> >>> > >> >>  bc.annattr:     0
> >> >>> > >> >>  bc.bytecode:    18646908
> >> >>> > >> >>  bc.ccp:         0
> >> >>> > >> >>  bc.classf:      0
> >> >>> > >> >>  bc.codeattr:    839916
> >> >>> > >> >>  bc.cvalattr:    93471
> >> >>> > >> >>  bc.cpclass:     15118425
> >> >>> > >> >>  bc.cpconst:     0
> >> >>> > >> >>  bc.cpdouble:    1638
> >> >>> > >> >>  bc.cpfield:     277144
> >> >>> > >> >>  bc.cpfieldref:  5159994
> >> >>> > >> >>  bc.cpfloat:     3255
> >> >>> > >> >>  bc.cpinteger:   153811
> >> >>> > >> >>  bc.methref:     3420605
> >> >>> > >> >>  bc.cplong:      48104
> >> >>> > >> >>  bc.cpmember:    0
> >> >>> > >> >>  bc.cpmethod:    430234
> >> >>> > >> >>  bc.cpmethref:   12103799
> >> >>> > >> >>  bc.cpnametype:  14928914
> >> >>> > >> >>  bc.cpref:       0
> >> >>> > >> >>  bc.cpstring:    1840965
> >> >>> > >> >>  bc.cputf8:      95462388
> >> >>> > >> >>  bc.depattr:     5593
> >> >>> > >> >>  bc.encmethattr: 0
> >> >>> > >> >>  bc.excpattr:    72492
> >> >>> > >> >>  bc.exptableent: 0
> >> >>> > >> >>  bc.innerclass:  40362
> >> >>> > >> >>  bc.linenumattr: 839916
> >> >>> > >> >>  bc.locvarattr:  839916
> >> >>> > >> >>  bc.locvartable: 0
> >> >>> > >> >>  bc.newattr:     121856
> >> >>> > >> >>  bc.opmgr:       0
> >> >>> > >> >>  bc.rtattr:      0
> >> >>> > >> >>  bc.rtannattr:   0
> >> >>> > >> >>  bc.signattr:    0
> >> >>> > >> >>  bc.srcfileattr: 57428
> >> >>> > >> >>
> >> >>> > >> >> Would you like to produce the patch?
> >> >>> > >> >> I think it would be funny :)
> >> >>> > >> >>
> >> >>> > >> >> Thanks,
> >> >>> > >> >> Aleksey.
> >> >>> > >> >>
> >> >>> > >> >> [1] https://issues.apache.org/jira/browse/HARMONY-5905
> >> >>> > >> >>
> >> >>> > >> >> On Sat, Jul 12, 2008 at 12:48 AM, Andrew
Cornwall (JIRA)
> >> >>> > >> >> <jira@apache.org> wrote:
> >> >>> > >> >>>
> >> >>> > >> >>>     [
> >> >>> > >>
> >> >>> >
> >> >>>
> >>
> https://issues.apache.org/jira/browse/HARMONY-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> >> >>> > ]
> >> >>> > >> >>>
> >> >>> > >> >>> Andrew Cornwall updated HARMONY-5907:
> >> >>> > >> >>> -------------------------------------
> >> >>> > >> >>>
> >> >>> > >> >>>    Attachment: main.patch
> >> >>> > >> >>>
> >> >>> > >> >>> main.patch includes change to CPUTF8.java
> >> >>> > >> >>>
> >> >>> > >> >>>
> >> >>> > >> >>>> [classlib][pack200]CPUTF8.hashCode()
is slow
> >> >>> > >> >>>> --------------------------------------------
> >> >>> > >> >>>>
> >> >>> > >> >>>>                 Key: HARMONY-5907
> >> >>> > >> >>>>                 URL:
> >> >>> > >> https://issues.apache.org/jira/browse/HARMONY-5907
> >> >>> > >> >>>>             Project: Harmony
> >> >>> > >> >>>>          Issue Type: Improvement
> >> >>> > >> >>>>    Affects Versions: 5.0M6
> >> >>> > >> >>>>         Environment: Latest pack200
> >> >>> > >> >>>>            Reporter: Andrew Cornwall
> >> >>> > >> >>>>         Attachments: main.patch
> >> >>> > >> >>>>
> >> >>> > >> >>>>
> >> >>> > >> >>>> The unpack process spends a lot
of time doing
> >> CPUTF8.hashCode()
> >> >>> -
> >> >>> > >> which does String.hashCode(). We can save about 1.5
seconds of
> my
> >> 39
> >> >>> > second
> >> >>> > >> test case (about 4%) by caching the hashCode. (I
thought we did
> >> this
> >> >>> > before
> >> >>> > >> - or maybe I dreamt it?)
> >> >>> > >> >>>
> >> >>> > >> >>> --
> >> >>> > >> >>> This message is automatically generated
by JIRA.
> >> >>> > >> >>> -
> >> >>> > >> >>> You can reply to this email to add a
comment to the issue
> >> online.
> >> >>> > >> >>>
> >> >>> > >> >>>
> >> >>> > >> >>
> >> >>> > >> >
> >> >>> > >>
> >> >>> > >
> >> >>> > >
> >> >>> > >
> >> >>> > > --
> >> >>> > > Unless stated otherwise above:
> >> >>> > > IBM United Kingdom Limited - Registered in England and
Wales
> with
> >> >>> number
> >> >>> > > 741598.
> >> >>> > > Registered office: PO Box 41, North Harbour, Portsmouth,
> Hampshire
> >> PO6
> >> >>> > 3AU
> >> >>> > >
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Unless stated otherwise above:
> >> >>> IBM United Kingdom Limited - Registered in England and Wales with
> >> number
> >> >>> 741598.
> >> >>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire
> PO6
> >> 3AU
> >> >>>
> >> >>
> >> >>
> >> >
> >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message