harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sian January" <sianjanu...@googlemail.com>
Subject Re: [classlib][pack200][performance] Profiling unpacking scenario
Date Wed, 13 Aug 2008 13:47:30 GMT
Hi Aleksey,

I've reviewed those four JIRAs now.  5928 and 5931 look good - I'm
happy to commit those after the code freeze is over. Re. 5931 -
Bytecodes do not need to be in the ClassConstantPool, but their
children do. I did try not adding them a while ago, but it didn't work
because their children didn't get added properly.  It looks like your
rewrite of addNestedEntries() has fixed that now, so that's looking
good now.

5929 and 5930 do not pass the long-running tests for me.  Are you
still able to run those and investigate in more detail?

Thanks,

Sian

On 11/08/2008, Sian January <sianjanuary@googlemail.com> wrote:
> Hi Aleksey, Andrew,
>
> Just a quick note to say I've been out of the office on vacation for the
> last two weeks, but I'll defnitely try and review those patches either today
> or in the next couple of days.
>
> Thanks,
>
> Sian
>
>
> On 06/08/2008, Andrew Cornwall <andrew.pack200@gmail.com> wrote:
> > I just ran my batch of testcases with all 4 of your changes ([2] through
> > [5]). They appear to generate the same output as without the changes, but
> at
> > improved speed. Based on this, I'd recommend we incorporate them in HEAD.
> > Sian, do you see similar behaviour?
> >
> > On Tue, Aug 5, 2008 at 4:22 PM, Andrew Cornwall
> <andrew.pack200@gmail.com>wrote:
> >
> > > I don't see any differences in results with your patch [5]=5931 when
> > > running against my big batch of tests. That surprises me, honestly - I
> think
> > > when I did my test I commented out the
> "nestedEntries.addAll(byteCodes)" in
> > > CodeAttribute.getNestedClassFileEntries(), and doing
> that certainly causes
> > > problems.
> > >
> > > I guess once the nested entries of the ByteCodes have been figured out
> and
> > > added to the class pool, the bytecodes themselves are no longer
> needed...
> > > which is why your patch works and mine didn't.
> > >
> > > Cool stuff!
> > >
> > >
> > > On Tue, Aug 5, 2008 at 3:44 PM, Andrew Cornwall
> <andrew.pack200@gmail.com>wrote:
> > >
> > >> I'm not sure about [5]=5931 - if I remember right, I'd taken out
> ByteCode
> > >> processing in add(), and that led to problems resolving bytecodes which
> > >> referred to things in the classpool. But that was a while ago; I'm
> > >> interested in trying it and seeing if there are any differences in my
> wad of
> > >> 40M or so testcases.
> > >>
> > >> The only other patch I've looked at so far has been [2]=5928. That's a
> > >> nifty change.
> > >>
> > >> All in all, it looks like you've done a lot of good work!
> > >>
> > >>
> > >>
> > >> On Tue, Aug 5, 2008 at 3:16 PM, Aleksey Shipilev <
> > >> aleksey.shipilev@gmail.com> wrote:
> > >>
> > >>> Sian, Andrew,
> > >>>
> > >>> An update here. I had updated the profiler [1]  and run it over on
JDT
> > >>> unpacking scenario several times. That's what we have today (times
are
> > >>> msecs, indentation resembles call hierarchy):
> > >>>
> > >>> Unpack: 220529
> > >>>
> > >>>  segment read:  28853
> > >>>   cpBands:     13791
> > >>>   adBands:     214
> > >>>   icBands:     343
> > >>>   cbBands:     9158
> > >>>   bcBands:     4694
> > >>>   fbBands:     118
> > >>>   fBits:       407
> > >>>
> > >>>  segment parse: 176929
> > >>>   header:      0
> > >>>   cpBands:     0
> > >>>   adBands:     0
> > >>>   icBands:     1
> > >>>   cbBands:     0
> > >>>
> > >>>   bcBands:     77245
> > >>>      exceptn:  656
> > >>>      newCA:    71145
> > >>>       getBC:   16372
> > >>>       extOpnd: 26776
> > >>>       fixup:   1885
> > >>>      methAttr: 591
> > >>>      curAttr:  1808
> > >>>
> > >>>   fbBands:     0
> > >>>
> > >>>   buildCF:     82057
> > >>>     ccp.addN:  36182
> > >>>     ccp.addNW: 433
> > >>>     ccp.resv:  27452
> > >>>     ic.getIC:  11717
> > >>>
> > >>>   cfWrite:     17379
> > >>>
> > >>>  segment write: 14349
> > >>>
> > >>>
> > >>> As you can see in commits, I had filed a couple of JIRAs with the
> > >>> bunch of pack200 optimizations [2,3,4,5], here what I got with all
> > >>> them applied:
> > >>>
> > >>> Unpack: 193165 (-14% in total)
> > >>>
> > >>>  segment read:  28334
> > >>>   cpBands:     13034
> > >>>   adBands:     249
> > >>>   icBands:     341
> > >>>   cbBands:     9299
> > >>>   bcBands:     4645
> > >>>   fbBands:     169
> > >>>   fBits:       459
> > >>>
> > >>>  segment parse: 150032
> > >>>   header:      1
> > >>>   cpBands:     0
> > >>>   adBands:     0
> > >>>   icBands:     82
> > >>>   cbBands:     0
> > >>>
> > >>>   bcBands:     73936
> > >>>      exceptn:  633
> > >>>      newCA:    67871
> > >>>       getBC:   13874    <--- (-18% due to [2])
> > >>>       extOpnd: 26583
> > >>>       fixup:   1808
> > >>>      methAttr: 615
> > >>>      curAttr:  1663
> > >>>
> > >>>   fbBands:     0
> > >>>
> > >>>   buildCF:     58319
> > >>>     ccp.addN:  26199  <---- (-38% due to [5])
> > >>>     ccp.addNW: 424
> > >>>     ccp.resv:  23642   <---- (-16% due to [5])
> > >>>     ic.getIC:  2245    <--- (-80% due to [3,4])
> > >>>
> > >>>   cfWrite:     17463
> > >>>
> > >>>  segment write: 14413
> > >>>
> > >>>
> > >>> Of course, the boosts are diminished with the performance overheads
of
> > >>> profiling. But still, this profile gives pretty good insight on what's
> > >>> going on. CodeAttribute ["newCA" is the "new CodeAttribute(...)"] is
> > >>> the next candidate for optimization, I guess.
> > >>>
> > >>> Sian, Andrew, can you please review the patches? I'm particularly
> > >>> interested in [5], because it's proof-of-concept and kind of
> > >>> controversial.
> > >>>
> > >>> Thanks,
> > >>> Aleksey.
> > >>>
> > >>> [1] "classlib][pack200] Internal profiler for pack200"
> > >>> https://issues.apache.org/jira/browse/HARMONY-5905
> > >>>
> > >>> [2] [classlib][pack200][performance]
> java.util.HashMap usage optimization
> > >>> https://issues.apache.org/jira/browse/HARMONY-5928
> > >>>
> > >>> [3] [classlib][pack200][performance]
> Segment.computeIcStored rewrite
> > >>> https://issues.apache.org/jira/browse/HARMONY-5929
> > >>>
> > >>> [4] [classlib][pack200][performance]
> IcBands.getRelevantIcTuples rewrite
> > >>> https://issues.apache.org/jira/browse/HARMONY-5930
> > >>>
> > >>> [5] [classlib][pack200][performance] Some
> ClassConstantPool content
> > >>> may not be needed
> > >>> https://issues.apache.org/jira/browse/HARMONY-5931
> > >>>
> > >>>
> > >>> On Thu, Jul 10, 2008 at 7:59 PM, Aleksey Shipilev
> > >>> <aleksey.shipilev@gmail.com> wrote:
> > >>> > I had quickly drafted the internal Java profiler for pack200 at
[1].
> > >>> > Here are the results of profiling for 50Mb Eclipse JDT jar, times
> are
> > >>> > microsecs, identation emulates the call tree. Some of the label
are
> > >>> > not distinguishable, but you may look up probe positions in the
> patch.
> > >>> >
> > >>> > Unpack: 38311
> > >>> >  segment unpack: 38217
> > >>> >   parse segment: 11575
> > >>> >     parse header:  0
> > >>> >     parse ADB:     78
> > >>> >     parse bcbands: 5342
> > >>> >       parse1:      453
> > >>> >       parse2:      93
> > >>> >       select:      252
> > >>> >       attrlayout:  0
> > >>> >       methods:     3997
> > >>> >     parse cbands:  3358
> > >>> >       classattr:   1002
> > >>> >       code:        1636
> > >>> >       fields:      173
> > >>> >       methods:     515
> > >>> >     parse cpbands: 2483
> > >>> >     parse fbands:  63
> > >>> >     parse icbands: 16
> > >>> >   write jar: 26642
> > >>> >     build classf:  21111
> > >>> >       sfattrs:     47
> > >>> >       cfattrs:     0
> > >>> >       fields:      218
> > >>> >       interfaces:  0
> > >>> >       methods:     362
> > >>> >       addNested:   8146
> > >>> >       inner:       3051
> > >>> >       final:       8774
> > >>> >     write classf:  1934
> > >>> >       constpool:   1015
> > >>> >       interfaces:  0
> > >>> >       attributes:  31
> > >>> >       methods:     827
> > >>> >       fields:      31
> > >>> >     write primit:  486
> > >>> >
> > >>> > That's the point where one can take the method and optimize it
> locally
> > >>> :)
> > >>> >
> > >>> > Thanks,
> > >>> > Aleksey.
> > >>> >
> > >>> > [1]
> https://issues.apache.org/jira/browse/HARMONY-5905
> > >>> >
> > >>> > On Wed, Jul 9, 2008 at 9:20 PM, Aleksey Shipilev
> > >>> > <aleksey.shipilev@gmail.com> wrote:
> > >>> >> I had disabled the compression in my test to throw away ZIP
> overhead
> > >>> >> and focus on pack200 performance only. Thus the performance
data is
> > >>> >> not relevant to previous measurements. The data are assumed
with
> > >>> >> HARMONY-5900 incorporated.
> > >>> >>
> > >>> >> Harmony's pack200: 43 secs (3.5 Mb/secs)
> > >>> >> Sun's pack200: 9 secs (16.6 Mb/secs)
> > >>> >>
> > >>> >> Profile:
> > >>> >>
> > >>> >> 22.0% java.util.HashMap.*
> > >>> >> 11.4% java.io.FileInputStream.readBytes()
> > >>> >>  7.5%
> o.a.h.unpack200.bytecode.ClassConstantPool.addNested()
> > >>> >>  6.9% java.util.zip.*
> > >>> >>  5.6% o.a.h.pack200.BHSDCodec.decode()
> > >>> >>  4.8% java.lang.*
> > >>> >>  4.4%
> o.a.h.unpack200.IcBands.getRelevantIcTuples()
> > >>> >>  3.9%
> > >>>
> o.a.h.unpack200.bytecode.forms.NoArgumentForm.setByteCodeOperands()
> > >>> >>  3.2% o.a.h.unpack200.bytecode.ClassConstantPool.*
> (other)
> > >>> >>  3.0% o.a.h.unpack200.bytecode.CodeAttribute.*
> > >>> >>  2.8% java.io.FileOutputStream.writeBytes()
> > >>> >>  2.8% o.a.h.unpack200.bytecode.ByteCode.*
> > >>> >>  2.75% java.util.TreeMap.*
> > >>> >>
> > >>> >> Note ArrayList is gone!
> > >>> >> It seems like BHSDCodec.decode(), IcBands.getRelevanticTuples()
and
> > >>> >> NoArgumentForm.setByteCodeOperands() are next
> candidates for tuning.
> > >>> >> After that, the performance improvement is not possible without
> deep
> > >>> >> changes, like overall algorithmic improvements. Anyway, that
should
> be
> > >>> >> first, but I'm not familiar with the code yet. This can't
stop us
> > >>> >> though ;)
> > >>> >>
> > >>> >> Thanks,
> > >>> >> Aleksey.
> > >>> >>
> > >>> >> On Wed, Jul 9, 2008 at 7:27 PM, Sian January <
> > >>> sianjanuary@googlemail.com> wrote:
> > >>> >>> Thanks for doing that Aleksey.  In fact I think Sun's
was 20 or 30
> > >>> times
> > >>> >>> faster before we started doing any performance optimizations,
but
> it
> > >>> looks
> > >>> >>> like there's still some ground that we could make up!
> > >>> >>>
> > >>> >>>
> > >>> >>>
> > >>> >>> On 08/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com>
> wrote:
> > >>> >>>>
> > >>> >>>> Hi,
> > >>> >>>>
> > >>> >>>> I took the liberty of profiling of pack200 implementation
on
> > >>> unpacking
> > >>> >>>> scenario. Source data was obtained from Eclipse JDT
jars,
> repacked
> > >>> in
> > >>> >>>> single 60 Mb jar file, then packed with pack200 from
Sun's JDK
> (-E9
> > >>> >>>> used), resulting in 20 Mb pack200-compressed file.
Then Sun JDK
> > >>> >>>> 1.6.0_05 (Windows, -server) was used together with
hprof
> (cpu=time)
> > >>> to
> > >>> >>>> obtain the profile. My patch from HARMONY-5900 is
onboard. The
> head
> > >>> of
> > >>> >>>> the profile looks like this:
> > >>> >>>>
> > >>> >>>> 4.76%
> > >>>
> org.apache.harmony.unpack200.bytecode.ClassConstantPool.addNested
> > >>> >>>> 4.22% java.util.HashMap.getEntry
> > >>> >>>> 2.99% java.util.AbstractList$Itr.next
> > >>> >>>> 2.92% java.util.AbstractList$Itr.hasNext
> > >>> >>>> 2.84% java.util.ArrayList.get
> > >>> >>>> 2.43% java.util.AbstractList$Itr.next
> > >>> >>>> 2.41% java.util.HashMap.containsKey
> > >>> >>>> 2.15%
> org.apache.harmony.unpack200.IcBands.getRelevantIcTuples
> > >>> >>>> 2.00% java.util.HashSet.contains
> > >>> >>>> 1.57% java.io.DataOutputStream.writeUTF
> > >>> >>>>
> > >>> >>>> Composite occupancy:
> > >>> >>>>
> > >>> >>>> 18.4% java.util.AbstractList
> > >>> >>>> 18.0% java.util.HashMap
> > >>> >>>> 15.8% java.util.ArrayList
> > >>> >>>> 10.5%
> o.a.h.unpack200.bytecode.ClassConstantPool.*
> > >>> >>>> 5.3%  o.a.h.unpack200.bytecode.CPUTF8.*
> (hashcode mostly)
> > >>> >>>> 4.5% java.io.*
> > >>> >>>> 4.5% java.lang.String.*
> > >>> >>>> 4.4% o.a.h.unpack200.bytecode.ByteCode.*
> > >>> >>>> 3.9%  o.a.h.unpack200.bytecode.Ic{Tuple|Bands}.*
> > >>> >>>> 14.7% other
> > >>> >>>>
> > >>> >>>> So the main concern is Collections usage. ClassConstantPool
uses
> > >>> Lists
> > >>> >>>> excessively, so I suspect the significant amount of
time is spent
> > >>> >>>> there.
> > >>> >>>>
> > >>> >>>> NB:
> > >>> >>>> Timings for the scenario (the less the better):
> > >>> >>>> Harmony's pack200: 67 secs
> > >>> >>>> Sun's pack200: 6 secs
> > >>> >>>>
> > >>> >>>> Yep, 10 times faster.
> > >>> >>>>
> > >>> >>>> Thanks,
> > >>> >>>> Aleksey.
> > >>> >>>>
> > >>> >>>
> > >>> >>>
> > >>> >>>
> > >>> >>> --
> > >>> >>> Unless stated otherwise above:
> > >>> >>> IBM United Kingdom Limited - Registered in England and
Wales with
> > >>> number
> > >>> >>> 741598.
> > >>> >>> Registered office: PO Box 41, North Harbour, Portsmouth,
Hampshire
> > >>> PO6 3AU
> > >>> >>>
> > >>> >>
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
>
>
> --
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


-- 
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Mime
View raw message