Return-Path: Delivered-To: apmail-harmony-dev-archive@www.apache.org Received: (qmail 20225 invoked from network); 6 Aug 2008 00:05:16 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 6 Aug 2008 00:05:16 -0000 Received: (qmail 30921 invoked by uid 500); 6 Aug 2008 00:05:13 -0000 Delivered-To: apmail-harmony-dev-archive@harmony.apache.org Received: (qmail 30894 invoked by uid 500); 6 Aug 2008 00:05:13 -0000 Mailing-List: contact dev-help@harmony.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@harmony.apache.org Delivered-To: mailing list dev@harmony.apache.org Received: (qmail 30883 invoked by uid 99); 6 Aug 2008 00:05:13 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Aug 2008 17:05:13 -0700 X-ASF-Spam-Status: No, hits=2.0 required=10.0 tests=HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of andrew.pack200@gmail.com designates 209.85.134.189 as permitted sender) Received: from [209.85.134.189] (HELO mu-out-0910.google.com) (209.85.134.189) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 Aug 2008 00:04:18 +0000 Received: by mu-out-0910.google.com with SMTP id w9so2415962mue.0 for ; Tue, 05 Aug 2008 17:04:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to :subject:in-reply-to:mime-version:content-type:references; bh=YbSkyYzK+oXGAAE3HiUt6IqHY2q7TyEnHz5XfyciRU8=; b=ivm3PFWHRpFqFD9c/P2+DJH4pGCsh/f8au/gTqvF0MVa5fx/lSDn81REEPCSE7yKJD 2pr7VpsT1TvQ3JDcI/wR8V9cW7p5WrQtHAcQ0W2wZflgYMKeLmtoDO9Sm94Skfp8fnpp 2RAFJG6zGCSaqLP2usYT0RviB7xBEs2FfShfw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:in-reply-to:mime-version :content-type:references; b=KIDyfEr0fdOb/KkFic4ACoOawptmAJXPlGDTj8PDBc7j/UvPSMOXapznMULHwNeVF+ EOD5nlQglbo2mtHy8Kn8FzPx+LVoVJGMayzZwk9/r6WK/NGYE+niubNg1g2YbuEsHdgl YguKobrHYow+j3FekBzmNM0Yt8bsoABkScGLs= Received: by 10.103.224.17 with SMTP id b17mr1059403mur.16.1217981066796; Tue, 05 Aug 2008 17:04:26 -0700 (PDT) Received: by 10.103.12.9 with HTTP; Tue, 5 Aug 2008 17:04:26 -0700 (PDT) Message-ID: <6aa6e3690808051704n776b6d32s97ae44acae08f017@mail.gmail.com> Date: Tue, 5 Aug 2008 17:04:26 -0700 From: "Andrew Cornwall" To: dev@harmony.apache.org Subject: Re: [classlib][pack200][performance] Profiling unpacking scenario In-Reply-To: <6aa6e3690808051622r411df7b1t8eb2ad77da9e2066@mail.gmail.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_38683_20405622.1217981066811" References: <4bebff790807081043g4d42c754q53ff0006d2a170bf@mail.gmail.com> <4bebff790807091020v3bc0d6eaj83e3a3b33892bb9c@mail.gmail.com> <4bebff790807100859s5d36ea11w5fe128be3a5375bd@mail.gmail.com> <4bebff790808051516x7916173awaab7a57d59d5c098@mail.gmail.com> <6aa6e3690808051544s152165dcu3c41434d6539db40@mail.gmail.com> <6aa6e3690808051622r411df7b1t8eb2ad77da9e2066@mail.gmail.com> X-Virus-Checked: Checked by ClamAV on apache.org ------=_Part_38683_20405622.1217981066811 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline I just ran my batch of testcases with all 4 of your changes ([2] through [5]). They appear to generate the same output as without the changes, but at improved speed. Based on this, I'd recommend we incorporate them in HEAD. Sian, do you see similar behaviour? On Tue, Aug 5, 2008 at 4:22 PM, Andrew Cornwall wrote: > I don't see any differences in results with your patch [5]=5931 when > running against my big batch of tests. That surprises me, honestly - I think > when I did my test I commented out the "nestedEntries.addAll(byteCodes)" in > CodeAttribute.getNestedClassFileEntries(), and doing that certainly causes > problems. > > I guess once the nested entries of the ByteCodes have been figured out and > added to the class pool, the bytecodes themselves are no longer needed... > which is why your patch works and mine didn't. > > Cool stuff! > > > On Tue, Aug 5, 2008 at 3:44 PM, Andrew Cornwall wrote: > >> I'm not sure about [5]=5931 - if I remember right, I'd taken out ByteCode >> processing in add(), and that led to problems resolving bytecodes which >> referred to things in the classpool. But that was a while ago; I'm >> interested in trying it and seeing if there are any differences in my wad of >> 40M or so testcases. >> >> The only other patch I've looked at so far has been [2]=5928. That's a >> nifty change. >> >> All in all, it looks like you've done a lot of good work! >> >> >> >> On Tue, Aug 5, 2008 at 3:16 PM, Aleksey Shipilev < >> aleksey.shipilev@gmail.com> wrote: >> >>> Sian, Andrew, >>> >>> An update here. I had updated the profiler [1] and run it over on JDT >>> unpacking scenario several times. That's what we have today (times are >>> msecs, indentation resembles call hierarchy): >>> >>> Unpack: 220529 >>> >>> segment read: 28853 >>> cpBands: 13791 >>> adBands: 214 >>> icBands: 343 >>> cbBands: 9158 >>> bcBands: 4694 >>> fbBands: 118 >>> fBits: 407 >>> >>> segment parse: 176929 >>> header: 0 >>> cpBands: 0 >>> adBands: 0 >>> icBands: 1 >>> cbBands: 0 >>> >>> bcBands: 77245 >>> exceptn: 656 >>> newCA: 71145 >>> getBC: 16372 >>> extOpnd: 26776 >>> fixup: 1885 >>> methAttr: 591 >>> curAttr: 1808 >>> >>> fbBands: 0 >>> >>> buildCF: 82057 >>> ccp.addN: 36182 >>> ccp.addNW: 433 >>> ccp.resv: 27452 >>> ic.getIC: 11717 >>> >>> cfWrite: 17379 >>> >>> segment write: 14349 >>> >>> >>> As you can see in commits, I had filed a couple of JIRAs with the >>> bunch of pack200 optimizations [2,3,4,5], here what I got with all >>> them applied: >>> >>> Unpack: 193165 (-14% in total) >>> >>> segment read: 28334 >>> cpBands: 13034 >>> adBands: 249 >>> icBands: 341 >>> cbBands: 9299 >>> bcBands: 4645 >>> fbBands: 169 >>> fBits: 459 >>> >>> segment parse: 150032 >>> header: 1 >>> cpBands: 0 >>> adBands: 0 >>> icBands: 82 >>> cbBands: 0 >>> >>> bcBands: 73936 >>> exceptn: 633 >>> newCA: 67871 >>> getBC: 13874 <--- (-18% due to [2]) >>> extOpnd: 26583 >>> fixup: 1808 >>> methAttr: 615 >>> curAttr: 1663 >>> >>> fbBands: 0 >>> >>> buildCF: 58319 >>> ccp.addN: 26199 <---- (-38% due to [5]) >>> ccp.addNW: 424 >>> ccp.resv: 23642 <---- (-16% due to [5]) >>> ic.getIC: 2245 <--- (-80% due to [3,4]) >>> >>> cfWrite: 17463 >>> >>> segment write: 14413 >>> >>> >>> Of course, the boosts are diminished with the performance overheads of >>> profiling. But still, this profile gives pretty good insight on what's >>> going on. CodeAttribute ["newCA" is the "new CodeAttribute(...)"] is >>> the next candidate for optimization, I guess. >>> >>> Sian, Andrew, can you please review the patches? I'm particularly >>> interested in [5], because it's proof-of-concept and kind of >>> controversial. >>> >>> Thanks, >>> Aleksey. >>> >>> [1] "classlib][pack200] Internal profiler for pack200" >>> https://issues.apache.org/jira/browse/HARMONY-5905 >>> >>> [2] [classlib][pack200][performance] java.util.HashMap usage optimization >>> https://issues.apache.org/jira/browse/HARMONY-5928 >>> >>> [3] [classlib][pack200][performance] Segment.computeIcStored rewrite >>> https://issues.apache.org/jira/browse/HARMONY-5929 >>> >>> [4] [classlib][pack200][performance] IcBands.getRelevantIcTuples rewrite >>> https://issues.apache.org/jira/browse/HARMONY-5930 >>> >>> [5] [classlib][pack200][performance] Some ClassConstantPool content >>> may not be needed >>> https://issues.apache.org/jira/browse/HARMONY-5931 >>> >>> >>> On Thu, Jul 10, 2008 at 7:59 PM, Aleksey Shipilev >>> wrote: >>> > I had quickly drafted the internal Java profiler for pack200 at [1]. >>> > Here are the results of profiling for 50Mb Eclipse JDT jar, times are >>> > microsecs, identation emulates the call tree. Some of the label are >>> > not distinguishable, but you may look up probe positions in the patch. >>> > >>> > Unpack: 38311 >>> > segment unpack: 38217 >>> > parse segment: 11575 >>> > parse header: 0 >>> > parse ADB: 78 >>> > parse bcbands: 5342 >>> > parse1: 453 >>> > parse2: 93 >>> > select: 252 >>> > attrlayout: 0 >>> > methods: 3997 >>> > parse cbands: 3358 >>> > classattr: 1002 >>> > code: 1636 >>> > fields: 173 >>> > methods: 515 >>> > parse cpbands: 2483 >>> > parse fbands: 63 >>> > parse icbands: 16 >>> > write jar: 26642 >>> > build classf: 21111 >>> > sfattrs: 47 >>> > cfattrs: 0 >>> > fields: 218 >>> > interfaces: 0 >>> > methods: 362 >>> > addNested: 8146 >>> > inner: 3051 >>> > final: 8774 >>> > write classf: 1934 >>> > constpool: 1015 >>> > interfaces: 0 >>> > attributes: 31 >>> > methods: 827 >>> > fields: 31 >>> > write primit: 486 >>> > >>> > That's the point where one can take the method and optimize it locally >>> :) >>> > >>> > Thanks, >>> > Aleksey. >>> > >>> > [1] https://issues.apache.org/jira/browse/HARMONY-5905 >>> > >>> > On Wed, Jul 9, 2008 at 9:20 PM, Aleksey Shipilev >>> > wrote: >>> >> I had disabled the compression in my test to throw away ZIP overhead >>> >> and focus on pack200 performance only. Thus the performance data is >>> >> not relevant to previous measurements. The data are assumed with >>> >> HARMONY-5900 incorporated. >>> >> >>> >> Harmony's pack200: 43 secs (3.5 Mb/secs) >>> >> Sun's pack200: 9 secs (16.6 Mb/secs) >>> >> >>> >> Profile: >>> >> >>> >> 22.0% java.util.HashMap.* >>> >> 11.4% java.io.FileInputStream.readBytes() >>> >> 7.5% o.a.h.unpack200.bytecode.ClassConstantPool.addNested() >>> >> 6.9% java.util.zip.* >>> >> 5.6% o.a.h.pack200.BHSDCodec.decode() >>> >> 4.8% java.lang.* >>> >> 4.4% o.a.h.unpack200.IcBands.getRelevantIcTuples() >>> >> 3.9% >>> o.a.h.unpack200.bytecode.forms.NoArgumentForm.setByteCodeOperands() >>> >> 3.2% o.a.h.unpack200.bytecode.ClassConstantPool.* (other) >>> >> 3.0% o.a.h.unpack200.bytecode.CodeAttribute.* >>> >> 2.8% java.io.FileOutputStream.writeBytes() >>> >> 2.8% o.a.h.unpack200.bytecode.ByteCode.* >>> >> 2.75% java.util.TreeMap.* >>> >> >>> >> Note ArrayList is gone! >>> >> It seems like BHSDCodec.decode(), IcBands.getRelevanticTuples() and >>> >> NoArgumentForm.setByteCodeOperands() are next candidates for tuning. >>> >> After that, the performance improvement is not possible without deep >>> >> changes, like overall algorithmic improvements. Anyway, that should be >>> >> first, but I'm not familiar with the code yet. This can't stop us >>> >> though ;) >>> >> >>> >> Thanks, >>> >> Aleksey. >>> >> >>> >> On Wed, Jul 9, 2008 at 7:27 PM, Sian January < >>> sianjanuary@googlemail.com> wrote: >>> >>> Thanks for doing that Aleksey. In fact I think Sun's was 20 or 30 >>> times >>> >>> faster before we started doing any performance optimizations, but it >>> looks >>> >>> like there's still some ground that we could make up! >>> >>> >>> >>> >>> >>> >>> >>> On 08/07/2008, Aleksey Shipilev wrote: >>> >>>> >>> >>>> Hi, >>> >>>> >>> >>>> I took the liberty of profiling of pack200 implementation on >>> unpacking >>> >>>> scenario. Source data was obtained from Eclipse JDT jars, repacked >>> in >>> >>>> single 60 Mb jar file, then packed with pack200 from Sun's JDK (-E9 >>> >>>> used), resulting in 20 Mb pack200-compressed file. Then Sun JDK >>> >>>> 1.6.0_05 (Windows, -server) was used together with hprof (cpu=time) >>> to >>> >>>> obtain the profile. My patch from HARMONY-5900 is onboard. The head >>> of >>> >>>> the profile looks like this: >>> >>>> >>> >>>> 4.76% >>> org.apache.harmony.unpack200.bytecode.ClassConstantPool.addNested >>> >>>> 4.22% java.util.HashMap.getEntry >>> >>>> 2.99% java.util.AbstractList$Itr.next >>> >>>> 2.92% java.util.AbstractList$Itr.hasNext >>> >>>> 2.84% java.util.ArrayList.get >>> >>>> 2.43% java.util.AbstractList$Itr.next >>> >>>> 2.41% java.util.HashMap.containsKey >>> >>>> 2.15% org.apache.harmony.unpack200.IcBands.getRelevantIcTuples >>> >>>> 2.00% java.util.HashSet.contains >>> >>>> 1.57% java.io.DataOutputStream.writeUTF >>> >>>> >>> >>>> Composite occupancy: >>> >>>> >>> >>>> 18.4% java.util.AbstractList >>> >>>> 18.0% java.util.HashMap >>> >>>> 15.8% java.util.ArrayList >>> >>>> 10.5% o.a.h.unpack200.bytecode.ClassConstantPool.* >>> >>>> 5.3% o.a.h.unpack200.bytecode.CPUTF8.* (hashcode mostly) >>> >>>> 4.5% java.io.* >>> >>>> 4.5% java.lang.String.* >>> >>>> 4.4% o.a.h.unpack200.bytecode.ByteCode.* >>> >>>> 3.9% o.a.h.unpack200.bytecode.Ic{Tuple|Bands}.* >>> >>>> 14.7% other >>> >>>> >>> >>>> So the main concern is Collections usage. ClassConstantPool uses >>> Lists >>> >>>> excessively, so I suspect the significant amount of time is spent >>> >>>> there. >>> >>>> >>> >>>> NB: >>> >>>> Timings for the scenario (the less the better): >>> >>>> Harmony's pack200: 67 secs >>> >>>> Sun's pack200: 6 secs >>> >>>> >>> >>>> Yep, 10 times faster. >>> >>>> >>> >>>> Thanks, >>> >>>> Aleksey. >>> >>>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> >>> Unless stated otherwise above: >>> >>> IBM United Kingdom Limited - Registered in England and Wales with >>> number >>> >>> 741598. >>> >>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire >>> PO6 3AU >>> >>> >>> >> >>> > >>> >> >> > ------=_Part_38683_20405622.1217981066811--