Return-Path: Delivered-To: apmail-harmony-dev-archive@www.apache.org Received: (qmail 46001 invoked from network); 14 Jul 2008 18:25:42 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 14 Jul 2008 18:25:42 -0000 Received: (qmail 57698 invoked by uid 500); 14 Jul 2008 18:25:41 -0000 Delivered-To: apmail-harmony-dev-archive@harmony.apache.org Received: (qmail 57662 invoked by uid 500); 14 Jul 2008 18:25:41 -0000 Mailing-List: contact dev-help@harmony.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@harmony.apache.org Delivered-To: mailing list dev@harmony.apache.org Received: (qmail 57651 invoked by uid 99); 14 Jul 2008 18:25:41 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 14 Jul 2008 11:25:41 -0700 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of aleksey.shipilev@gmail.com designates 209.85.198.249 as permitted sender) Received: from [209.85.198.249] (HELO rv-out-0708.google.com) (209.85.198.249) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 14 Jul 2008 18:24:49 +0000 Received: by rv-out-0708.google.com with SMTP id k29so4766933rvb.0 for ; Mon, 14 Jul 2008 11:25:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to :subject:in-reply-to:mime-version:content-type :content-transfer-encoding:content-disposition:references; bh=hcJeEE+mh7/MXq0U5iK8Hp5p1FfG8RfaPbfPgMgIy6o=; b=Pgs0flv0K5fSecMoQSq+UDOG+4H+xBYzZVQkPMIFHlCXmYrArwvJLSBcd4MqmHbG3W XJfbLa4VMKYE2Ay/HF78iPZodNqSCavlqsC4U89fdYG6OHOe7NY5uYYXhs2BHGZnK3Yi alrOoJzVO+aZdnBkO9w9DKysI3h7Q+3jECJtM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:in-reply-to:mime-version :content-type:content-transfer-encoding:content-disposition :references; b=nbBtY6D3jlrvtamEUOme7Prem5yaIckhnsA0Arz8mHtQtApeh8zYD2GJ0bzDtSB0Yi mJt9Ce9lNWLkSDn5CePQyHNn4upLCfBDG9nABvqj8irFSH9rQSY2LVX3N7XFawj7iaaW CBP24dCsb3jB/LwU5N5k2Kbb2529AActn3Ayc= Received: by 10.140.249.20 with SMTP id w20mr6820645rvh.21.1216059913388; Mon, 14 Jul 2008 11:25:13 -0700 (PDT) Received: by 10.141.167.15 with HTTP; Mon, 14 Jul 2008 11:25:13 -0700 (PDT) Message-ID: <4bebff790807141125t32d2b558ydc74e538dcb6325c@mail.gmail.com> Date: Mon, 14 Jul 2008 22:25:13 +0400 From: "Aleksey Shipilev" To: dev@harmony.apache.org Subject: Re: [classlib][pack200][performance] HashCode optimizations (was: Re: [jira] Updated: (HARMONY-5907) [classlib][pack200]CPUTF8.hashCode() is slow) In-Reply-To: <6aa6e3690807141051w3805929ataaaf1c2e5110e542@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <4bebff790807120250y45efe696je7c2faf7f99d041@mail.gmail.com> <4bebff790807120258m2fd68802s384c258bf86874b7@mail.gmail.com> <4bebff790807120353m63db6992h2af9105f2f4005c8@mail.gmail.com> <4bebff790807140653i51fdd90enb1240c297a11e92f@mail.gmail.com> <6aa6e3690807141044g3a990723qde0db5443e094810@mail.gmail.com> <6aa6e3690807141051w3805929ataaaf1c2e5110e542@mail.gmail.com> X-Virus-Checked: Checked by ClamAV on apache.org I think that's bad for performance. Using Object.hashCode() leads to System.identityHashCode(), which is handled by VM. The exact mechanism is VM-dependent, but at least on Harmony it's pretty slow. If you want to mark each instance as not-identical to another (warning here, you may break something), then I suggest to use static increment, e.g.: private static int hashcodeBase = 1; private int cachedHashCode = hashcodeBase++; This may end up with cache collisions if objects are created from different threads, but that's not the case for now. Anyway, Andrew, your code lacks caching again ;) Thanks, Aleksey. On Mon, Jul 14, 2008 at 9:51 PM, Andrew Cornwall wrote: > And in fact, I just tried the following (which makes even more sense): > > - add objectHashCode() to ClassFileEntry: > protected int objectHashCode() { > return super.hashCode(); > } > > - change generateHashCode in ByteCode to return objectHashCode() > private int generateHashCode() { > hashcodeComputed = true; > return objectHashCode(); > } > > Since ByteCodes are equal if and only if they are identical, this seems to > be the right thing to do. What do you think? > > > On Mon, Jul 14, 2008 at 10:44 AM, Andrew Cornwall > wrote: > >> I applied Aleksey's changes, and they look pretty good. I disagree with >> Sian to some degree about ByteCode. On my VM (which isn't Harmony), a new >> empty array's hashCode() is dependent on the array's location in memory, and >> not the array's contents. In other words: >> >> int[] x = new int[3]; >> System.out.println(x.hashCode()); >> x[1] = 5; >> System.out.println(x.hashCode()); >> >> prints the same value for in both cases. rewrite.hashCode() is a handy (if >> lazy) way to distinguish among different instances. >> >> If we take rewrite out of hashCode(), HashMap and HashSet get really slow - >> essentially a linear search among all the ByteCodes of the same form. This >> brings my test case from 39 seconds up to 1:02. >> >> Perhaps the right thing to do is to give each unique instance of ByteCode >> an integer ID which is used in creating the hashCode rather than relying on >> rewrite to give us uniqueness? >> >> >> On Mon, Jul 14, 2008 at 7:19 AM, Sian January >> wrote: >> >>> Ok - we'll wait and see what Andrew says. The only one that I'm not happy >>> with is Bytecode.hashCode, because rewrite always seems to be an empty >>> array >>> at the point when generateHashCode is called so it's a bit misleading >>> using >>> it. I think it should be ok to just remove that line, and still cache the >>> hashCode. >>> >>> >>> On 14/07/2008, Aleksey Shipilev wrote: >>> > >>> > Sian, >>> > >>> > Actually I had tried to extend Andrew's approach to these classes >>> > first, but somehow I caught the degradation, that leaved me no choice >>> > except the lazy initialization. My concern is, the >>> > constructor-initialized hashcode is really depend on usage pattern for >>> > each specific class, while lazy initialization has more guarantees to >>> > be performance-stable. Moreover, I suspect the lazy initialization can >>> > degrade performance much less because the only overhead it causes is >>> > checking the value of boolean field. On the other hand, the >>> > constructor initialization may degrade performance a lot since the >>> > generation of hashCode is expensive. >>> > >>> > I can recheck which classes favor lazy initialization and which are >>> > not, but I think it's not valuable in terms of efficiency. I mean here >>> > that the boost connected with changing lazy initialization to >>> > constructor one is much lower than boost from caching hashcode anyway. >>> > >>> > Can we accept the patch in this form and revisit this difference later? >>> > It would be better to focus on more profitable areas for improvements >>> for >>> > now. >>> > >>> > P.S. I had asked Andrew to recheck whether my patch works as fast as >>> > his, also to check lazy initialization approach. On my tests the boost >>> > is stable and good. >>> > >>> > Thanks, >>> > Aleksey. >>> > >>> > On Mon, Jul 14, 2008 at 5:25 PM, Sian January >>> > wrote: >>> > > Hi Aleksey, >>> > > >>> > > It's a really good idea to extend this patch to cover some more >>> classes, >>> > but >>> > > I think Andrew's method is faster (a final cachedHashCode field that >>> is >>> > > initialized in the constructor). The only reason I see to do it later >>> > would >>> > > be if we thought some of these objects never had hashCode called on >>> them, >>> > > but I don't think that's the case. Would you be able to try that >>> method >>> > > instead in your patch and see if I'm right about it being faster? >>> > > >>> > > Thanks, >>> > > >>> > > Sian >>> > > >>> > > >>> > > >>> > > On 12/07/2008, Aleksey Shipilev wrote: >>> > >> >>> > >> Andrew, >>> > >> >>> > >> I had attached the patch to HARMONY-5907, covering several first >>> > >> methods. Can you confirm this patch helps for your scenario? >>> > >> >>> > >> Thanks, >>> > >> Aleksey. >>> > >> >>> > >> On Sat, Jul 12, 2008 at 1:58 PM, Aleksey Shipilev >>> > >> wrote: >>> > >> > And the sorted list: >>> > >> > >>> > >> > 95462388 bc.cputf8 >>> > >> > 18646908 bc.bytecode >>> > >> > 15118425 bc.cpclass >>> > >> > 14928914 bc.cpnametype >>> > >> > 12103799 bc.cpmethref >>> > >> > 5159994 bc.cpfieldref >>> > >> > 3420605 bc.methref >>> > >> > 1840965 bc.cpstring >>> > >> > 839916 bc.codeattr >>> > >> > 839916 bc.locvarattr >>> > >> > 839916 bc.linenumattr >>> > >> > 430234 bc.cpmethod >>> > >> > 277144 bc.cpfield >>> > >> > 263753 bc.attr >>> > >> > 153811 bc.cpinteger >>> > >> > 121856 bc.newattr >>> > >> > 93471 bc.cvalattr >>> > >> > 72492 bc.excpattr >>> > >> > 57428 bc.srcfileattr >>> > >> > 57428 bc.srcfileattr >>> > >> > 48104 bc.cplong >>> > >> > 40362 bc.innerclass >>> > >> > 5593 bc.depattr >>> > >> > 3255 bc.cpfloat >>> > >> > 1638 bc.cpdouble >>> > >> > 532 attrlayout >>> > >> > 0 archive >>> > >> > 0 attrdef >>> > >> > 0 newattrband >>> > >> > 0 bc.anndefarg >>> > >> > 0 bc.rtannattr >>> > >> > 0 classbands >>> > >> > 0 filebands >>> > >> > 0 metabandgr >>> > >> > 0 segheader >>> > >> > 0 bc.remattr >>> > >> > 0 bc.annattr >>> > >> > 0 bc.cpconst >>> > >> > 0 bc.cpmember >>> > >> > 0 bc.signattr >>> > >> > 0 bandset >>> > >> > 0 bcbands >>> > >> > 0 cpbands >>> > >> > 0 icbands >>> > >> > 0 ictuple >>> > >> > 0 segment >>> > >> > 0 segopts >>> > >> > 0 bc.classf >>> > >> > 0 bc.cpref >>> > >> > 0 bc.opmgr >>> > >> > 0 bc.rtattr >>> > >> > 0 segcp >>> > >> > 0 bc.ccp >>> > >> > 0 attrlayoutmap >>> > >> > 0 bc.encmethattr >>> > >> > 0 bc.exptableent >>> > >> > 0 bc.locvartable >>> > >> > 0 bc.signattr >>> > >> > >>> > >> > Thanks, >>> > >> > Aleksey. >>> > >> > >>> > >> > On Sat, Jul 12, 2008 at 1:50 PM, Aleksey Shipilev >>> > >> > wrote: >>> > >> >> Hi, Andrew! >>> > >> >> >>> > >> >> I had updated the internal profiler to support hashCode() probes >>> [1], >>> > >> >> to extend your effort in hashcode optimization. There are bunch of >>> > >> >> heavily used hashcodes, most of them are going to >>> Object.hashCode() >>> > >> >> and then to System.identityHashCode(). We can cache/implement >>> > hashcode >>> > >> >> for these classes. Here's the profile: >>> > >> >> >>> > >> >> Hashcodes: >>> > >> >> archive: 0 >>> > >> >> attrdef: 0 >>> > >> >> attrlayout: 532 >>> > >> >> attrlayoutmap: 0 >>> > >> >> bandset: 0 >>> > >> >> bcbands: 0 >>> > >> >> classbands: 0 >>> > >> >> cpbands: 0 >>> > >> >> filebands: 0 >>> > >> >> icbands: 0 >>> > >> >> ictuple: 0 >>> > >> >> metabandgr: 0 >>> > >> >> newattrband: 0 >>> > >> >> segcp: 0 >>> > >> >> segheader: 0 >>> > >> >> segment: 0 >>> > >> >> segopts: 0 >>> > >> >> bc.attr: 263753 >>> > >> >> bc.remattr: 0 >>> > >> >> bc.anndefarg: 0 >>> > >> >> bc.annattr: 0 >>> > >> >> bc.bytecode: 18646908 >>> > >> >> bc.ccp: 0 >>> > >> >> bc.classf: 0 >>> > >> >> bc.codeattr: 839916 >>> > >> >> bc.cvalattr: 93471 >>> > >> >> bc.cpclass: 15118425 >>> > >> >> bc.cpconst: 0 >>> > >> >> bc.cpdouble: 1638 >>> > >> >> bc.cpfield: 277144 >>> > >> >> bc.cpfieldref: 5159994 >>> > >> >> bc.cpfloat: 3255 >>> > >> >> bc.cpinteger: 153811 >>> > >> >> bc.methref: 3420605 >>> > >> >> bc.cplong: 48104 >>> > >> >> bc.cpmember: 0 >>> > >> >> bc.cpmethod: 430234 >>> > >> >> bc.cpmethref: 12103799 >>> > >> >> bc.cpnametype: 14928914 >>> > >> >> bc.cpref: 0 >>> > >> >> bc.cpstring: 1840965 >>> > >> >> bc.cputf8: 95462388 >>> > >> >> bc.depattr: 5593 >>> > >> >> bc.encmethattr: 0 >>> > >> >> bc.excpattr: 72492 >>> > >> >> bc.exptableent: 0 >>> > >> >> bc.innerclass: 40362 >>> > >> >> bc.linenumattr: 839916 >>> > >> >> bc.locvarattr: 839916 >>> > >> >> bc.locvartable: 0 >>> > >> >> bc.newattr: 121856 >>> > >> >> bc.opmgr: 0 >>> > >> >> bc.rtattr: 0 >>> > >> >> bc.rtannattr: 0 >>> > >> >> bc.signattr: 0 >>> > >> >> bc.srcfileattr: 57428 >>> > >> >> >>> > >> >> Would you like to produce the patch? >>> > >> >> I think it would be funny :) >>> > >> >> >>> > >> >> Thanks, >>> > >> >> Aleksey. >>> > >> >> >>> > >> >> [1] https://issues.apache.org/jira/browse/HARMONY-5905 >>> > >> >> >>> > >> >> On Sat, Jul 12, 2008 at 12:48 AM, Andrew Cornwall (JIRA) >>> > >> >> wrote: >>> > >> >>> >>> > >> >>> [ >>> > >> >>> > >>> https://issues.apache.org/jira/browse/HARMONY-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel >>> > ] >>> > >> >>> >>> > >> >>> Andrew Cornwall updated HARMONY-5907: >>> > >> >>> ------------------------------------- >>> > >> >>> >>> > >> >>> Attachment: main.patch >>> > >> >>> >>> > >> >>> main.patch includes change to CPUTF8.java >>> > >> >>> >>> > >> >>> >>> > >> >>>> [classlib][pack200]CPUTF8.hashCode() is slow >>> > >> >>>> -------------------------------------------- >>> > >> >>>> >>> > >> >>>> Key: HARMONY-5907 >>> > >> >>>> URL: >>> > >> https://issues.apache.org/jira/browse/HARMONY-5907 >>> > >> >>>> Project: Harmony >>> > >> >>>> Issue Type: Improvement >>> > >> >>>> Affects Versions: 5.0M6 >>> > >> >>>> Environment: Latest pack200 >>> > >> >>>> Reporter: Andrew Cornwall >>> > >> >>>> Attachments: main.patch >>> > >> >>>> >>> > >> >>>> >>> > >> >>>> The unpack process spends a lot of time doing CPUTF8.hashCode() >>> - >>> > >> which does String.hashCode(). We can save about 1.5 seconds of my 39 >>> > second >>> > >> test case (about 4%) by caching the hashCode. (I thought we did this >>> > before >>> > >> - or maybe I dreamt it?) >>> > >> >>> >>> > >> >>> -- >>> > >> >>> This message is automatically generated by JIRA. >>> > >> >>> - >>> > >> >>> You can reply to this email to add a comment to the issue online. >>> > >> >>> >>> > >> >>> >>> > >> >> >>> > >> > >>> > >> >>> > > >>> > > >>> > > >>> > > -- >>> > > Unless stated otherwise above: >>> > > IBM United Kingdom Limited - Registered in England and Wales with >>> number >>> > > 741598. >>> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 >>> > 3AU >>> > > >>> > >>> >>> >>> >>> -- >>> Unless stated otherwise above: >>> IBM United Kingdom Limited - Registered in England and Wales with number >>> 741598. >>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >>> >> >> >