harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Cornwall" <andrew.pack...@gmail.com>
Subject Re: [classlib][pack200][performance] Profiling unpacking scenario
Date Tue, 05 Aug 2008 22:44:14 GMT
I'm not sure about [5]=5931 - if I remember right, I'd taken out ByteCode
processing in add(), and that led to problems resolving bytecodes which
referred to things in the classpool. But that was a while ago; I'm
interested in trying it and seeing if there are any differences in my wad of
40M or so testcases.

The only other patch I've looked at so far has been [2]=5928. That's a nifty
change.

All in all, it looks like you've done a lot of good work!


On Tue, Aug 5, 2008 at 3:16 PM, Aleksey Shipilev <aleksey.shipilev@gmail.com
> wrote:

> Sian, Andrew,
>
> An update here. I had updated the profiler [1]  and run it over on JDT
> unpacking scenario several times. That's what we have today (times are
> msecs, indentation resembles call hierarchy):
>
> Unpack: 220529
>
>  segment read:  28853
>   cpBands:     13791
>   adBands:     214
>   icBands:     343
>   cbBands:     9158
>   bcBands:     4694
>   fbBands:     118
>   fBits:       407
>
>  segment parse: 176929
>   header:      0
>   cpBands:     0
>   adBands:     0
>   icBands:     1
>   cbBands:     0
>
>   bcBands:     77245
>      exceptn:  656
>      newCA:    71145
>       getBC:   16372
>       extOpnd: 26776
>       fixup:   1885
>      methAttr: 591
>      curAttr:  1808
>
>   fbBands:     0
>
>   buildCF:     82057
>     ccp.addN:  36182
>     ccp.addNW: 433
>     ccp.resv:  27452
>     ic.getIC:  11717
>
>   cfWrite:     17379
>
>  segment write: 14349
>
>
> As you can see in commits, I had filed a couple of JIRAs with the
> bunch of pack200 optimizations [2,3,4,5], here what I got with all
> them applied:
>
> Unpack: 193165 (-14% in total)
>
>  segment read:  28334
>   cpBands:     13034
>   adBands:     249
>   icBands:     341
>   cbBands:     9299
>   bcBands:     4645
>   fbBands:     169
>   fBits:       459
>
>  segment parse: 150032
>   header:      1
>   cpBands:     0
>   adBands:     0
>   icBands:     82
>   cbBands:     0
>
>   bcBands:     73936
>      exceptn:  633
>      newCA:    67871
>       getBC:   13874    <--- (-18% due to [2])
>       extOpnd: 26583
>       fixup:   1808
>      methAttr: 615
>      curAttr:  1663
>
>   fbBands:     0
>
>   buildCF:     58319
>     ccp.addN:  26199  <---- (-38% due to [5])
>     ccp.addNW: 424
>     ccp.resv:  23642   <---- (-16% due to [5])
>     ic.getIC:  2245    <--- (-80% due to [3,4])
>
>   cfWrite:     17463
>
>  segment write: 14413
>
>
> Of course, the boosts are diminished with the performance overheads of
> profiling. But still, this profile gives pretty good insight on what's
> going on. CodeAttribute ["newCA" is the "new CodeAttribute(...)"] is
> the next candidate for optimization, I guess.
>
> Sian, Andrew, can you please review the patches? I'm particularly
> interested in [5], because it's proof-of-concept and kind of
> controversial.
>
> Thanks,
> Aleksey.
>
> [1] "classlib][pack200] Internal profiler for pack200"
> https://issues.apache.org/jira/browse/HARMONY-5905
>
> [2] [classlib][pack200][performance] java.util.HashMap usage optimization
> https://issues.apache.org/jira/browse/HARMONY-5928
>
> [3] [classlib][pack200][performance] Segment.computeIcStored rewrite
> https://issues.apache.org/jira/browse/HARMONY-5929
>
> [4] [classlib][pack200][performance] IcBands.getRelevantIcTuples rewrite
> https://issues.apache.org/jira/browse/HARMONY-5930
>
> [5] [classlib][pack200][performance] Some ClassConstantPool content
> may not be needed
> https://issues.apache.org/jira/browse/HARMONY-5931
>
>
> On Thu, Jul 10, 2008 at 7:59 PM, Aleksey Shipilev
> <aleksey.shipilev@gmail.com> wrote:
> > I had quickly drafted the internal Java profiler for pack200 at [1].
> > Here are the results of profiling for 50Mb Eclipse JDT jar, times are
> > microsecs, identation emulates the call tree. Some of the label are
> > not distinguishable, but you may look up probe positions in the patch.
> >
> > Unpack: 38311
> >  segment unpack: 38217
> >   parse segment: 11575
> >     parse header:  0
> >     parse ADB:     78
> >     parse bcbands: 5342
> >       parse1:      453
> >       parse2:      93
> >       select:      252
> >       attrlayout:  0
> >       methods:     3997
> >     parse cbands:  3358
> >       classattr:   1002
> >       code:        1636
> >       fields:      173
> >       methods:     515
> >     parse cpbands: 2483
> >     parse fbands:  63
> >     parse icbands: 16
> >   write jar: 26642
> >     build classf:  21111
> >       sfattrs:     47
> >       cfattrs:     0
> >       fields:      218
> >       interfaces:  0
> >       methods:     362
> >       addNested:   8146
> >       inner:       3051
> >       final:       8774
> >     write classf:  1934
> >       constpool:   1015
> >       interfaces:  0
> >       attributes:  31
> >       methods:     827
> >       fields:      31
> >     write primit:  486
> >
> > That's the point where one can take the method and optimize it locally :)
> >
> > Thanks,
> > Aleksey.
> >
> > [1] https://issues.apache.org/jira/browse/HARMONY-5905
> >
> > On Wed, Jul 9, 2008 at 9:20 PM, Aleksey Shipilev
> > <aleksey.shipilev@gmail.com> wrote:
> >> I had disabled the compression in my test to throw away ZIP overhead
> >> and focus on pack200 performance only. Thus the performance data is
> >> not relevant to previous measurements. The data are assumed with
> >> HARMONY-5900 incorporated.
> >>
> >> Harmony's pack200: 43 secs (3.5 Mb/secs)
> >> Sun's pack200: 9 secs (16.6 Mb/secs)
> >>
> >> Profile:
> >>
> >> 22.0% java.util.HashMap.*
> >> 11.4% java.io.FileInputStream.readBytes()
> >>  7.5% o.a.h.unpack200.bytecode.ClassConstantPool.addNested()
> >>  6.9% java.util.zip.*
> >>  5.6% o.a.h.pack200.BHSDCodec.decode()
> >>  4.8% java.lang.*
> >>  4.4% o.a.h.unpack200.IcBands.getRelevantIcTuples()
> >>  3.9%
> o.a.h.unpack200.bytecode.forms.NoArgumentForm.setByteCodeOperands()
> >>  3.2% o.a.h.unpack200.bytecode.ClassConstantPool.* (other)
> >>  3.0% o.a.h.unpack200.bytecode.CodeAttribute.*
> >>  2.8% java.io.FileOutputStream.writeBytes()
> >>  2.8% o.a.h.unpack200.bytecode.ByteCode.*
> >>  2.75% java.util.TreeMap.*
> >>
> >> Note ArrayList is gone!
> >> It seems like BHSDCodec.decode(), IcBands.getRelevanticTuples() and
> >> NoArgumentForm.setByteCodeOperands() are next candidates for tuning.
> >> After that, the performance improvement is not possible without deep
> >> changes, like overall algorithmic improvements. Anyway, that should be
> >> first, but I'm not familiar with the code yet. This can't stop us
> >> though ;)
> >>
> >> Thanks,
> >> Aleksey.
> >>
> >> On Wed, Jul 9, 2008 at 7:27 PM, Sian January <
> sianjanuary@googlemail.com> wrote:
> >>> Thanks for doing that Aleksey.  In fact I think Sun's was 20 or 30
> times
> >>> faster before we started doing any performance optimizations, but it
> looks
> >>> like there's still some ground that we could make up!
> >>>
> >>>
> >>>
> >>> On 08/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I took the liberty of profiling of pack200 implementation on unpacking
> >>>> scenario. Source data was obtained from Eclipse JDT jars, repacked in
> >>>> single 60 Mb jar file, then packed with pack200 from Sun's JDK (-E9
> >>>> used), resulting in 20 Mb pack200-compressed file. Then Sun JDK
> >>>> 1.6.0_05 (Windows, -server) was used together with hprof (cpu=time)
to
> >>>> obtain the profile. My patch from HARMONY-5900 is onboard. The head
of
> >>>> the profile looks like this:
> >>>>
> >>>> 4.76%
> org.apache.harmony.unpack200.bytecode.ClassConstantPool.addNested
> >>>> 4.22% java.util.HashMap.getEntry
> >>>> 2.99% java.util.AbstractList$Itr.next
> >>>> 2.92% java.util.AbstractList$Itr.hasNext
> >>>> 2.84% java.util.ArrayList.get
> >>>> 2.43% java.util.AbstractList$Itr.next
> >>>> 2.41% java.util.HashMap.containsKey
> >>>> 2.15% org.apache.harmony.unpack200.IcBands.getRelevantIcTuples
> >>>> 2.00% java.util.HashSet.contains
> >>>> 1.57% java.io.DataOutputStream.writeUTF
> >>>>
> >>>> Composite occupancy:
> >>>>
> >>>> 18.4% java.util.AbstractList
> >>>> 18.0% java.util.HashMap
> >>>> 15.8% java.util.ArrayList
> >>>> 10.5% o.a.h.unpack200.bytecode.ClassConstantPool.*
> >>>> 5.3%  o.a.h.unpack200.bytecode.CPUTF8.* (hashcode mostly)
> >>>> 4.5% java.io.*
> >>>> 4.5% java.lang.String.*
> >>>> 4.4% o.a.h.unpack200.bytecode.ByteCode.*
> >>>> 3.9%  o.a.h.unpack200.bytecode.Ic{Tuple|Bands}.*
> >>>> 14.7% other
> >>>>
> >>>> So the main concern is Collections usage. ClassConstantPool uses Lists
> >>>> excessively, so I suspect the significant amount of time is spent
> >>>> there.
> >>>>
> >>>> NB:
> >>>> Timings for the scenario (the less the better):
> >>>> Harmony's pack200: 67 secs
> >>>> Sun's pack200: 6 secs
> >>>>
> >>>> Yep, 10 times faster.
> >>>>
> >>>> Thanks,
> >>>> Aleksey.
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Unless stated otherwise above:
> >>> IBM United Kingdom Limited - Registered in England and Wales with
> number
> >>> 741598.
> >>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
> 3AU
> >>>
> >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message