harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aleksey Shipilev" <aleksey.shipi...@gmail.com>
Subject Re: [classlib][pack200][performance] Profiling unpacking scenario
Date Tue, 05 Aug 2008 23:37:46 GMT
Andrew, I believe that's a side effect of HARMONY-5906, see separate thread :)

Thanks,
Aleksey.

On Wed, Aug 6, 2008 at 3:22 AM, Andrew Cornwall
<andrew.pack200@gmail.com> wrote:
> I don't see any differences in results with your patch [5]=5931 when running
> against my big batch of tests. That surprises me, honestly - I think when I
> did my test I commented out the "nestedEntries.addAll(byteCodes)" in
> CodeAttribute.getNestedClassFileEntries(), and doing that certainly causes
> problems.
>
> I guess once the nested entries of the ByteCodes have been figured out and
> added to the class pool, the bytecodes themselves are no longer needed...
> which is why your patch works and mine didn't.
>
> Cool stuff!
>
> On Tue, Aug 5, 2008 at 3:44 PM, Andrew Cornwall <andrew.pack200@gmail.com>wrote:
>
>> I'm not sure about [5]=5931 - if I remember right, I'd taken out ByteCode
>> processing in add(), and that led to problems resolving bytecodes which
>> referred to things in the classpool. But that was a while ago; I'm
>> interested in trying it and seeing if there are any differences in my wad of
>> 40M or so testcases.
>>
>> The only other patch I've looked at so far has been [2]=5928. That's a
>> nifty change.
>>
>> All in all, it looks like you've done a lot of good work!
>>
>>
>>
>> On Tue, Aug 5, 2008 at 3:16 PM, Aleksey Shipilev <
>> aleksey.shipilev@gmail.com> wrote:
>>
>>> Sian, Andrew,
>>>
>>> An update here. I had updated the profiler [1]  and run it over on JDT
>>> unpacking scenario several times. That's what we have today (times are
>>> msecs, indentation resembles call hierarchy):
>>>
>>> Unpack: 220529
>>>
>>>  segment read:  28853
>>>   cpBands:     13791
>>>   adBands:     214
>>>   icBands:     343
>>>   cbBands:     9158
>>>   bcBands:     4694
>>>   fbBands:     118
>>>   fBits:       407
>>>
>>>  segment parse: 176929
>>>   header:      0
>>>   cpBands:     0
>>>   adBands:     0
>>>   icBands:     1
>>>   cbBands:     0
>>>
>>>   bcBands:     77245
>>>      exceptn:  656
>>>      newCA:    71145
>>>       getBC:   16372
>>>       extOpnd: 26776
>>>       fixup:   1885
>>>      methAttr: 591
>>>      curAttr:  1808
>>>
>>>   fbBands:     0
>>>
>>>   buildCF:     82057
>>>     ccp.addN:  36182
>>>     ccp.addNW: 433
>>>     ccp.resv:  27452
>>>     ic.getIC:  11717
>>>
>>>   cfWrite:     17379
>>>
>>>  segment write: 14349
>>>
>>>
>>> As you can see in commits, I had filed a couple of JIRAs with the
>>> bunch of pack200 optimizations [2,3,4,5], here what I got with all
>>> them applied:
>>>
>>> Unpack: 193165 (-14% in total)
>>>
>>>  segment read:  28334
>>>   cpBands:     13034
>>>   adBands:     249
>>>   icBands:     341
>>>   cbBands:     9299
>>>   bcBands:     4645
>>>   fbBands:     169
>>>   fBits:       459
>>>
>>>  segment parse: 150032
>>>   header:      1
>>>   cpBands:     0
>>>   adBands:     0
>>>   icBands:     82
>>>   cbBands:     0
>>>
>>>   bcBands:     73936
>>>      exceptn:  633
>>>      newCA:    67871
>>>       getBC:   13874    <--- (-18% due to [2])
>>>       extOpnd: 26583
>>>       fixup:   1808
>>>      methAttr: 615
>>>      curAttr:  1663
>>>
>>>   fbBands:     0
>>>
>>>   buildCF:     58319
>>>     ccp.addN:  26199  <---- (-38% due to [5])
>>>     ccp.addNW: 424
>>>     ccp.resv:  23642   <---- (-16% due to [5])
>>>     ic.getIC:  2245    <--- (-80% due to [3,4])
>>>
>>>   cfWrite:     17463
>>>
>>>  segment write: 14413
>>>
>>>
>>> Of course, the boosts are diminished with the performance overheads of
>>> profiling. But still, this profile gives pretty good insight on what's
>>> going on. CodeAttribute ["newCA" is the "new CodeAttribute(...)"] is
>>> the next candidate for optimization, I guess.
>>>
>>> Sian, Andrew, can you please review the patches? I'm particularly
>>> interested in [5], because it's proof-of-concept and kind of
>>> controversial.
>>>
>>> Thanks,
>>> Aleksey.
>>>
>>> [1] "classlib][pack200] Internal profiler for pack200"
>>> https://issues.apache.org/jira/browse/HARMONY-5905
>>>
>>> [2] [classlib][pack200][performance] java.util.HashMap usage optimization
>>> https://issues.apache.org/jira/browse/HARMONY-5928
>>>
>>> [3] [classlib][pack200][performance] Segment.computeIcStored rewrite
>>> https://issues.apache.org/jira/browse/HARMONY-5929
>>>
>>> [4] [classlib][pack200][performance] IcBands.getRelevantIcTuples rewrite
>>> https://issues.apache.org/jira/browse/HARMONY-5930
>>>
>>> [5] [classlib][pack200][performance] Some ClassConstantPool content
>>> may not be needed
>>> https://issues.apache.org/jira/browse/HARMONY-5931
>>>
>>>
>>> On Thu, Jul 10, 2008 at 7:59 PM, Aleksey Shipilev
>>> <aleksey.shipilev@gmail.com> wrote:
>>> > I had quickly drafted the internal Java profiler for pack200 at [1].
>>> > Here are the results of profiling for 50Mb Eclipse JDT jar, times are
>>> > microsecs, identation emulates the call tree. Some of the label are
>>> > not distinguishable, but you may look up probe positions in the patch.
>>> >
>>> > Unpack: 38311
>>> >  segment unpack: 38217
>>> >   parse segment: 11575
>>> >     parse header:  0
>>> >     parse ADB:     78
>>> >     parse bcbands: 5342
>>> >       parse1:      453
>>> >       parse2:      93
>>> >       select:      252
>>> >       attrlayout:  0
>>> >       methods:     3997
>>> >     parse cbands:  3358
>>> >       classattr:   1002
>>> >       code:        1636
>>> >       fields:      173
>>> >       methods:     515
>>> >     parse cpbands: 2483
>>> >     parse fbands:  63
>>> >     parse icbands: 16
>>> >   write jar: 26642
>>> >     build classf:  21111
>>> >       sfattrs:     47
>>> >       cfattrs:     0
>>> >       fields:      218
>>> >       interfaces:  0
>>> >       methods:     362
>>> >       addNested:   8146
>>> >       inner:       3051
>>> >       final:       8774
>>> >     write classf:  1934
>>> >       constpool:   1015
>>> >       interfaces:  0
>>> >       attributes:  31
>>> >       methods:     827
>>> >       fields:      31
>>> >     write primit:  486
>>> >
>>> > That's the point where one can take the method and optimize it locally
>>> :)
>>> >
>>> > Thanks,
>>> > Aleksey.
>>> >
>>> > [1] https://issues.apache.org/jira/browse/HARMONY-5905
>>> >
>>> > On Wed, Jul 9, 2008 at 9:20 PM, Aleksey Shipilev
>>> > <aleksey.shipilev@gmail.com> wrote:
>>> >> I had disabled the compression in my test to throw away ZIP overhead
>>> >> and focus on pack200 performance only. Thus the performance data is
>>> >> not relevant to previous measurements. The data are assumed with
>>> >> HARMONY-5900 incorporated.
>>> >>
>>> >> Harmony's pack200: 43 secs (3.5 Mb/secs)
>>> >> Sun's pack200: 9 secs (16.6 Mb/secs)
>>> >>
>>> >> Profile:
>>> >>
>>> >> 22.0% java.util.HashMap.*
>>> >> 11.4% java.io.FileInputStream.readBytes()
>>> >>  7.5% o.a.h.unpack200.bytecode.ClassConstantPool.addNested()
>>> >>  6.9% java.util.zip.*
>>> >>  5.6% o.a.h.pack200.BHSDCodec.decode()
>>> >>  4.8% java.lang.*
>>> >>  4.4% o.a.h.unpack200.IcBands.getRelevantIcTuples()
>>> >>  3.9%
>>> o.a.h.unpack200.bytecode.forms.NoArgumentForm.setByteCodeOperands()
>>> >>  3.2% o.a.h.unpack200.bytecode.ClassConstantPool.* (other)
>>> >>  3.0% o.a.h.unpack200.bytecode.CodeAttribute.*
>>> >>  2.8% java.io.FileOutputStream.writeBytes()
>>> >>  2.8% o.a.h.unpack200.bytecode.ByteCode.*
>>> >>  2.75% java.util.TreeMap.*
>>> >>
>>> >> Note ArrayList is gone!
>>> >> It seems like BHSDCodec.decode(), IcBands.getRelevanticTuples() and
>>> >> NoArgumentForm.setByteCodeOperands() are next candidates for tuning.
>>> >> After that, the performance improvement is not possible without deep
>>> >> changes, like overall algorithmic improvements. Anyway, that should
be
>>> >> first, but I'm not familiar with the code yet. This can't stop us
>>> >> though ;)
>>> >>
>>> >> Thanks,
>>> >> Aleksey.
>>> >>
>>> >> On Wed, Jul 9, 2008 at 7:27 PM, Sian January <
>>> sianjanuary@googlemail.com> wrote:
>>> >>> Thanks for doing that Aleksey.  In fact I think Sun's was 20 or
30
>>> times
>>> >>> faster before we started doing any performance optimizations, but
it
>>> looks
>>> >>> like there's still some ground that we could make up!
>>> >>>
>>> >>>
>>> >>>
>>> >>> On 08/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com>
wrote:
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> I took the liberty of profiling of pack200 implementation on
>>> unpacking
>>> >>>> scenario. Source data was obtained from Eclipse JDT jars, repacked
in
>>> >>>> single 60 Mb jar file, then packed with pack200 from Sun's JDK
(-E9
>>> >>>> used), resulting in 20 Mb pack200-compressed file. Then Sun
JDK
>>> >>>> 1.6.0_05 (Windows, -server) was used together with hprof (cpu=time)
>>> to
>>> >>>> obtain the profile. My patch from HARMONY-5900 is onboard. The
head
>>> of
>>> >>>> the profile looks like this:
>>> >>>>
>>> >>>> 4.76%
>>> org.apache.harmony.unpack200.bytecode.ClassConstantPool.addNested
>>> >>>> 4.22% java.util.HashMap.getEntry
>>> >>>> 2.99% java.util.AbstractList$Itr.next
>>> >>>> 2.92% java.util.AbstractList$Itr.hasNext
>>> >>>> 2.84% java.util.ArrayList.get
>>> >>>> 2.43% java.util.AbstractList$Itr.next
>>> >>>> 2.41% java.util.HashMap.containsKey
>>> >>>> 2.15% org.apache.harmony.unpack200.IcBands.getRelevantIcTuples
>>> >>>> 2.00% java.util.HashSet.contains
>>> >>>> 1.57% java.io.DataOutputStream.writeUTF
>>> >>>>
>>> >>>> Composite occupancy:
>>> >>>>
>>> >>>> 18.4% java.util.AbstractList
>>> >>>> 18.0% java.util.HashMap
>>> >>>> 15.8% java.util.ArrayList
>>> >>>> 10.5% o.a.h.unpack200.bytecode.ClassConstantPool.*
>>> >>>> 5.3%  o.a.h.unpack200.bytecode.CPUTF8.* (hashcode mostly)
>>> >>>> 4.5% java.io.*
>>> >>>> 4.5% java.lang.String.*
>>> >>>> 4.4% o.a.h.unpack200.bytecode.ByteCode.*
>>> >>>> 3.9%  o.a.h.unpack200.bytecode.Ic{Tuple|Bands}.*
>>> >>>> 14.7% other
>>> >>>>
>>> >>>> So the main concern is Collections usage. ClassConstantPool
uses
>>> Lists
>>> >>>> excessively, so I suspect the significant amount of time is
spent
>>> >>>> there.
>>> >>>>
>>> >>>> NB:
>>> >>>> Timings for the scenario (the less the better):
>>> >>>> Harmony's pack200: 67 secs
>>> >>>> Sun's pack200: 6 secs
>>> >>>>
>>> >>>> Yep, 10 times faster.
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Aleksey.
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Unless stated otherwise above:
>>> >>> IBM United Kingdom Limited - Registered in England and Wales with
>>> number
>>> >>> 741598.
>>> >>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire
PO6
>>> 3AU
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>

Mime
View raw message