harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sian January" <sianjanu...@googlemail.com>
Subject Re: [classlib][pack200][performance] Profiling unpacking scenario
Date Mon, 11 Aug 2008 09:16:33 GMT
Hi Aleksey, Andrew,

Just a quick note to say I've been out of the office on vacation for the
last two weeks, but I'll defnitely try and review those patches either today
or in the next couple of days.

Thanks,

Sian


On 06/08/2008, Andrew Cornwall <andrew.pack200@gmail.com> wrote:
>
> I just ran my batch of testcases with all 4 of your changes ([2] through
> [5]). They appear to generate the same output as without the changes, but
> at
> improved speed. Based on this, I'd recommend we incorporate them in HEAD.
> Sian, do you see similar behaviour?
>
> On Tue, Aug 5, 2008 at 4:22 PM, Andrew Cornwall <andrew.pack200@gmail.com
> >wrote:
>
> > I don't see any differences in results with your patch [5]=5931 when
> > running against my big batch of tests. That surprises me, honestly - I
> think
> > when I did my test I commented out the "nestedEntries.addAll(byteCodes)"
> in
> > CodeAttribute.getNestedClassFileEntries(), and doing that certainly
> causes
> > problems.
> >
> > I guess once the nested entries of the ByteCodes have been figured out
> and
> > added to the class pool, the bytecodes themselves are no longer needed...
> > which is why your patch works and mine didn't.
> >
> > Cool stuff!
> >
> >
> > On Tue, Aug 5, 2008 at 3:44 PM, Andrew Cornwall <
> andrew.pack200@gmail.com>wrote:
> >
> >> I'm not sure about [5]=5931 - if I remember right, I'd taken out
> ByteCode
> >> processing in add(), and that led to problems resolving bytecodes which
> >> referred to things in the classpool. But that was a while ago; I'm
> >> interested in trying it and seeing if there are any differences in my
> wad of
> >> 40M or so testcases.
> >>
> >> The only other patch I've looked at so far has been [2]=5928. That's a
> >> nifty change.
> >>
> >> All in all, it looks like you've done a lot of good work!
> >>
> >>
> >>
> >> On Tue, Aug 5, 2008 at 3:16 PM, Aleksey Shipilev <
> >> aleksey.shipilev@gmail.com> wrote:
> >>
> >>> Sian, Andrew,
> >>>
> >>> An update here. I had updated the profiler [1]  and run it over on JDT
> >>> unpacking scenario several times. That's what we have today (times are
> >>> msecs, indentation resembles call hierarchy):
> >>>
> >>> Unpack: 220529
> >>>
> >>>  segment read:  28853
> >>>   cpBands:     13791
> >>>   adBands:     214
> >>>   icBands:     343
> >>>   cbBands:     9158
> >>>   bcBands:     4694
> >>>   fbBands:     118
> >>>   fBits:       407
> >>>
> >>>  segment parse: 176929
> >>>   header:      0
> >>>   cpBands:     0
> >>>   adBands:     0
> >>>   icBands:     1
> >>>   cbBands:     0
> >>>
> >>>   bcBands:     77245
> >>>      exceptn:  656
> >>>      newCA:    71145
> >>>       getBC:   16372
> >>>       extOpnd: 26776
> >>>       fixup:   1885
> >>>      methAttr: 591
> >>>      curAttr:  1808
> >>>
> >>>   fbBands:     0
> >>>
> >>>   buildCF:     82057
> >>>     ccp.addN:  36182
> >>>     ccp.addNW: 433
> >>>     ccp.resv:  27452
> >>>     ic.getIC:  11717
> >>>
> >>>   cfWrite:     17379
> >>>
> >>>  segment write: 14349
> >>>
> >>>
> >>> As you can see in commits, I had filed a couple of JIRAs with the
> >>> bunch of pack200 optimizations [2,3,4,5], here what I got with all
> >>> them applied:
> >>>
> >>> Unpack: 193165 (-14% in total)
> >>>
> >>>  segment read:  28334
> >>>   cpBands:     13034
> >>>   adBands:     249
> >>>   icBands:     341
> >>>   cbBands:     9299
> >>>   bcBands:     4645
> >>>   fbBands:     169
> >>>   fBits:       459
> >>>
> >>>  segment parse: 150032
> >>>   header:      1
> >>>   cpBands:     0
> >>>   adBands:     0
> >>>   icBands:     82
> >>>   cbBands:     0
> >>>
> >>>   bcBands:     73936
> >>>      exceptn:  633
> >>>      newCA:    67871
> >>>       getBC:   13874    <--- (-18% due to [2])
> >>>       extOpnd: 26583
> >>>       fixup:   1808
> >>>      methAttr: 615
> >>>      curAttr:  1663
> >>>
> >>>   fbBands:     0
> >>>
> >>>   buildCF:     58319
> >>>     ccp.addN:  26199  <---- (-38% due to [5])
> >>>     ccp.addNW: 424
> >>>     ccp.resv:  23642   <---- (-16% due to [5])
> >>>     ic.getIC:  2245    <--- (-80% due to [3,4])
> >>>
> >>>   cfWrite:     17463
> >>>
> >>>  segment write: 14413
> >>>
> >>>
> >>> Of course, the boosts are diminished with the performance overheads of
> >>> profiling. But still, this profile gives pretty good insight on what's
> >>> going on. CodeAttribute ["newCA" is the "new CodeAttribute(...)"] is
> >>> the next candidate for optimization, I guess.
> >>>
> >>> Sian, Andrew, can you please review the patches? I'm particularly
> >>> interested in [5], because it's proof-of-concept and kind of
> >>> controversial.
> >>>
> >>> Thanks,
> >>> Aleksey.
> >>>
> >>> [1] "classlib][pack200] Internal profiler for pack200"
> >>> https://issues.apache.org/jira/browse/HARMONY-5905
> >>>
> >>> [2] [classlib][pack200][performance] java.util.HashMap usage
> optimization
> >>> https://issues.apache.org/jira/browse/HARMONY-5928
> >>>
> >>> [3] [classlib][pack200][performance] Segment.computeIcStored rewrite
> >>> https://issues.apache.org/jira/browse/HARMONY-5929
> >>>
> >>> [4] [classlib][pack200][performance] IcBands.getRelevantIcTuples
> rewrite
> >>> https://issues.apache.org/jira/browse/HARMONY-5930
> >>>
> >>> [5] [classlib][pack200][performance] Some ClassConstantPool content
> >>> may not be needed
> >>> https://issues.apache.org/jira/browse/HARMONY-5931
> >>>
> >>>
> >>> On Thu, Jul 10, 2008 at 7:59 PM, Aleksey Shipilev
> >>> <aleksey.shipilev@gmail.com> wrote:
> >>> > I had quickly drafted the internal Java profiler for pack200 at [1].
> >>> > Here are the results of profiling for 50Mb Eclipse JDT jar, times are
> >>> > microsecs, identation emulates the call tree. Some of the label are
> >>> > not distinguishable, but you may look up probe positions in the
> patch.
> >>> >
> >>> > Unpack: 38311
> >>> >  segment unpack: 38217
> >>> >   parse segment: 11575
> >>> >     parse header:  0
> >>> >     parse ADB:     78
> >>> >     parse bcbands: 5342
> >>> >       parse1:      453
> >>> >       parse2:      93
> >>> >       select:      252
> >>> >       attrlayout:  0
> >>> >       methods:     3997
> >>> >     parse cbands:  3358
> >>> >       classattr:   1002
> >>> >       code:        1636
> >>> >       fields:      173
> >>> >       methods:     515
> >>> >     parse cpbands: 2483
> >>> >     parse fbands:  63
> >>> >     parse icbands: 16
> >>> >   write jar: 26642
> >>> >     build classf:  21111
> >>> >       sfattrs:     47
> >>> >       cfattrs:     0
> >>> >       fields:      218
> >>> >       interfaces:  0
> >>> >       methods:     362
> >>> >       addNested:   8146
> >>> >       inner:       3051
> >>> >       final:       8774
> >>> >     write classf:  1934
> >>> >       constpool:   1015
> >>> >       interfaces:  0
> >>> >       attributes:  31
> >>> >       methods:     827
> >>> >       fields:      31
> >>> >     write primit:  486
> >>> >
> >>> > That's the point where one can take the method and optimize it
> locally
> >>> :)
> >>> >
> >>> > Thanks,
> >>> > Aleksey.
> >>> >
> >>> > [1] https://issues.apache.org/jira/browse/HARMONY-5905
> >>> >
> >>> > On Wed, Jul 9, 2008 at 9:20 PM, Aleksey Shipilev
> >>> > <aleksey.shipilev@gmail.com> wrote:
> >>> >> I had disabled the compression in my test to throw away ZIP overhead
> >>> >> and focus on pack200 performance only. Thus the performance data
is
> >>> >> not relevant to previous measurements. The data are assumed with
> >>> >> HARMONY-5900 incorporated.
> >>> >>
> >>> >> Harmony's pack200: 43 secs (3.5 Mb/secs)
> >>> >> Sun's pack200: 9 secs (16.6 Mb/secs)
> >>> >>
> >>> >> Profile:
> >>> >>
> >>> >> 22.0% java.util.HashMap.*
> >>> >> 11.4% java.io.FileInputStream.readBytes()
> >>> >>  7.5% o.a.h.unpack200.bytecode.ClassConstantPool.addNested()
> >>> >>  6.9% java.util.zip.*
> >>> >>  5.6% o.a.h.pack200.BHSDCodec.decode()
> >>> >>  4.8% java.lang.*
> >>> >>  4.4% o.a.h.unpack200.IcBands.getRelevantIcTuples()
> >>> >>  3.9%
> >>> o.a.h.unpack200.bytecode.forms.NoArgumentForm.setByteCodeOperands()
> >>> >>  3.2% o.a.h.unpack200.bytecode.ClassConstantPool.* (other)
> >>> >>  3.0% o.a.h.unpack200.bytecode.CodeAttribute.*
> >>> >>  2.8% java.io.FileOutputStream.writeBytes()
> >>> >>  2.8% o.a.h.unpack200.bytecode.ByteCode.*
> >>> >>  2.75% java.util.TreeMap.*
> >>> >>
> >>> >> Note ArrayList is gone!
> >>> >> It seems like BHSDCodec.decode(), IcBands.getRelevanticTuples()
and
> >>> >> NoArgumentForm.setByteCodeOperands() are next candidates for tuning.
> >>> >> After that, the performance improvement is not possible without
deep
> >>> >> changes, like overall algorithmic improvements. Anyway, that should
> be
> >>> >> first, but I'm not familiar with the code yet. This can't stop
us
> >>> >> though ;)
> >>> >>
> >>> >> Thanks,
> >>> >> Aleksey.
> >>> >>
> >>> >> On Wed, Jul 9, 2008 at 7:27 PM, Sian January <
> >>> sianjanuary@googlemail.com> wrote:
> >>> >>> Thanks for doing that Aleksey.  In fact I think Sun's was 20
or 30
> >>> times
> >>> >>> faster before we started doing any performance optimizations,
but
> it
> >>> looks
> >>> >>> like there's still some ground that we could make up!
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> On 08/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com>
> wrote:
> >>> >>>>
> >>> >>>> Hi,
> >>> >>>>
> >>> >>>> I took the liberty of profiling of pack200 implementation
on
> >>> unpacking
> >>> >>>> scenario. Source data was obtained from Eclipse JDT jars,
repacked
> >>> in
> >>> >>>> single 60 Mb jar file, then packed with pack200 from Sun's
JDK
> (-E9
> >>> >>>> used), resulting in 20 Mb pack200-compressed file. Then
Sun JDK
> >>> >>>> 1.6.0_05 (Windows, -server) was used together with hprof
> (cpu=time)
> >>> to
> >>> >>>> obtain the profile. My patch from HARMONY-5900 is onboard.
The
> head
> >>> of
> >>> >>>> the profile looks like this:
> >>> >>>>
> >>> >>>> 4.76%
> >>> org.apache.harmony.unpack200.bytecode.ClassConstantPool.addNested
> >>> >>>> 4.22% java.util.HashMap.getEntry
> >>> >>>> 2.99% java.util.AbstractList$Itr.next
> >>> >>>> 2.92% java.util.AbstractList$Itr.hasNext
> >>> >>>> 2.84% java.util.ArrayList.get
> >>> >>>> 2.43% java.util.AbstractList$Itr.next
> >>> >>>> 2.41% java.util.HashMap.containsKey
> >>> >>>> 2.15% org.apache.harmony.unpack200.IcBands.getRelevantIcTuples
> >>> >>>> 2.00% java.util.HashSet.contains
> >>> >>>> 1.57% java.io.DataOutputStream.writeUTF
> >>> >>>>
> >>> >>>> Composite occupancy:
> >>> >>>>
> >>> >>>> 18.4% java.util.AbstractList
> >>> >>>> 18.0% java.util.HashMap
> >>> >>>> 15.8% java.util.ArrayList
> >>> >>>> 10.5% o.a.h.unpack200.bytecode.ClassConstantPool.*
> >>> >>>> 5.3%  o.a.h.unpack200.bytecode.CPUTF8.* (hashcode mostly)
> >>> >>>> 4.5% java.io.*
> >>> >>>> 4.5% java.lang.String.*
> >>> >>>> 4.4% o.a.h.unpack200.bytecode.ByteCode.*
> >>> >>>> 3.9%  o.a.h.unpack200.bytecode.Ic{Tuple|Bands}.*
> >>> >>>> 14.7% other
> >>> >>>>
> >>> >>>> So the main concern is Collections usage. ClassConstantPool
uses
> >>> Lists
> >>> >>>> excessively, so I suspect the significant amount of time
is spent
> >>> >>>> there.
> >>> >>>>
> >>> >>>> NB:
> >>> >>>> Timings for the scenario (the less the better):
> >>> >>>> Harmony's pack200: 67 secs
> >>> >>>> Sun's pack200: 6 secs
> >>> >>>>
> >>> >>>> Yep, 10 times faster.
> >>> >>>>
> >>> >>>> Thanks,
> >>> >>>> Aleksey.
> >>> >>>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> --
> >>> >>> Unless stated otherwise above:
> >>> >>> IBM United Kingdom Limited - Registered in England and Wales
with
> >>> number
> >>> >>> 741598.
> >>> >>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire
> >>> PO6 3AU
> >>> >>>
> >>> >>
> >>> >
> >>>
> >>
> >>
> >
>



-- 
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message