Return-Path: Delivered-To: apmail-harmony-dev-archive@www.apache.org Received: (qmail 44647 invoked from network); 14 Jul 2008 20:37:40 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 14 Jul 2008 20:37:40 -0000 Received: (qmail 65569 invoked by uid 500); 14 Jul 2008 20:37:37 -0000 Delivered-To: apmail-harmony-dev-archive@harmony.apache.org Received: (qmail 65546 invoked by uid 500); 14 Jul 2008 20:37:37 -0000 Mailing-List: contact dev-help@harmony.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@harmony.apache.org Delivered-To: mailing list dev@harmony.apache.org Received: (qmail 65535 invoked by uid 99); 14 Jul 2008 20:37:37 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 14 Jul 2008 13:37:37 -0700 X-ASF-Spam-Status: No, hits=2.0 required=10.0 tests=HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of andrew.pack200@gmail.com designates 216.239.58.186 as permitted sender) Received: from [216.239.58.186] (HELO gv-out-0910.google.com) (216.239.58.186) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 14 Jul 2008 20:36:45 +0000 Received: by gv-out-0910.google.com with SMTP id c6so933170gvd.22 for ; Mon, 14 Jul 2008 13:37:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to :subject:in-reply-to:mime-version:content-type:references; bh=Dbw41O2qIt4Ato1gTbV/ho5h6Vv3EL58WkGaxeDnD1o=; b=N1IIu4oVqt3K9eea7AkTrdTj4abO5xINK5ozHxONaDrYe7jrFVqQgKYstLtPQ94TH6 Iyw4b3EeFsjgz8FBM4aa8XFdgb6qLv2Kf2rN/ZP6Uq5uCRMQlpnUTnblQufGB/J0+lGk uW45wNLeQXDIB5ZDw55R5Uh+Lv85RE4kn7JWs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:in-reply-to:mime-version :content-type:references; b=UB+YufBv7Uz9dx8tVSNbUSHkfmF5Xq6twO9s7Ahixi4SEfaDaVfwXvuLhOZXo4cLiN Ib1cbXVDSgHpjAEEdexXnS3MfzUhULNpVpqN+TTjTaeirn8ePRZeWV5U/xjR7QzQoBtc KvZhISxIs+5fvqFCfY/+s4ZrJxgo2mtC6SyOA= Received: by 10.103.250.11 with SMTP id c11mr8236655mus.23.1216067821500; Mon, 14 Jul 2008 13:37:01 -0700 (PDT) Received: by 10.103.173.10 with HTTP; Mon, 14 Jul 2008 13:37:01 -0700 (PDT) Message-ID: <6aa6e3690807141337g6e917dd3qd0fb222f3ce1bbaa@mail.gmail.com> Date: Mon, 14 Jul 2008 13:37:01 -0700 From: "Andrew Cornwall" To: dev@harmony.apache.org Subject: Re: [classlib][pack200][performance] HashCode optimizations (was: Re: [jira] Updated: (HARMONY-5907) [classlib][pack200]CPUTF8.hashCode() is slow) In-Reply-To: <4bebff790807141144o78f57c5er70fcf3b50088c843@mail.gmail.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_34768_7742566.1216067821476" References: <4bebff790807120250y45efe696je7c2faf7f99d041@mail.gmail.com> <4bebff790807120353m63db6992h2af9105f2f4005c8@mail.gmail.com> <4bebff790807140653i51fdd90enb1240c297a11e92f@mail.gmail.com> <6aa6e3690807141044g3a990723qde0db5443e094810@mail.gmail.com> <6aa6e3690807141051w3805929ataaaf1c2e5110e542@mail.gmail.com> <4bebff790807141125t32d2b558ydc74e538dcb6325c@mail.gmail.com> <6aa6e3690807141138qff2a570nc0b111240d86440c@mail.gmail.com> <4bebff790807141144o78f57c5er70fcf3b50088c843@mail.gmail.com> X-Virus-Checked: Checked by ClamAV on apache.org ------=_Part_34768_7742566.1216067821476 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline I think Object.hashCode() will be better distributed than a static increment - wouldn't that have better performance in Hash* objects? On Mon, Jul 14, 2008 at 11:44 AM, Aleksey Shipilev < aleksey.shipilev@gmail.com> wrote: > Nevertheless, Object.hashCode() performance is unclear and as such it > should be specifically avoided if not required explicitly. Will static > increment suit better? > > Thanks, > Aleksey. > > On Mon, Jul 14, 2008 at 10:38 PM, Andrew Cornwall > wrote: > > Sorry - I didn't explain fully. I intended that my code would cache the > > Object.hashCode() rather than recomputing the hashCode each time. > > > > On Mon, Jul 14, 2008 at 11:25 AM, Aleksey Shipilev < > > aleksey.shipilev@gmail.com> wrote: > > > >> I think that's bad for performance. Using Object.hashCode() leads to > >> System.identityHashCode(), which is handled by VM. The exact mechanism > >> is VM-dependent, but at least on Harmony it's pretty slow. If you want > >> to mark each instance as not-identical to another (warning here, you > >> may break something), then I suggest to use static increment, e.g.: > >> > >> private static int hashcodeBase = 1; > >> private int cachedHashCode = hashcodeBase++; > >> > >> This may end up with cache collisions if objects are created from > >> different threads, but that's not the case for now. > >> > >> Anyway, Andrew, your code lacks caching again ;) > >> > >> Thanks, > >> Aleksey. > >> > >> On Mon, Jul 14, 2008 at 9:51 PM, Andrew Cornwall > >> wrote: > >> > And in fact, I just tried the following (which makes even more sense): > >> > > >> > - add objectHashCode() to ClassFileEntry: > >> > protected int objectHashCode() { > >> > return super.hashCode(); > >> > } > >> > > >> > - change generateHashCode in ByteCode to return objectHashCode() > >> > private int generateHashCode() { > >> > hashcodeComputed = true; > >> > return objectHashCode(); > >> > } > >> > > >> > Since ByteCodes are equal if and only if they are identical, this > seems > >> to > >> > be the right thing to do. What do you think? > >> > > >> > > >> > On Mon, Jul 14, 2008 at 10:44 AM, Andrew Cornwall < > >> andrew.pack200@gmail.com> > >> > wrote: > >> > > >> >> I applied Aleksey's changes, and they look pretty good. I disagree > with > >> >> Sian to some degree about ByteCode. On my VM (which isn't Harmony), a > >> new > >> >> empty array's hashCode() is dependent on the array's location in > memory, > >> and > >> >> not the array's contents. In other words: > >> >> > >> >> int[] x = new int[3]; > >> >> System.out.println(x.hashCode()); > >> >> x[1] = 5; > >> >> System.out.println(x.hashCode()); > >> >> > >> >> prints the same value for in both cases. rewrite.hashCode() is a > handy > >> (if > >> >> lazy) way to distinguish among different instances. > >> >> > >> >> If we take rewrite out of hashCode(), HashMap and HashSet get really > >> slow - > >> >> essentially a linear search among all the ByteCodes of the same form. > >> This > >> >> brings my test case from 39 seconds up to 1:02. > >> >> > >> >> Perhaps the right thing to do is to give each unique instance of > >> ByteCode > >> >> an integer ID which is used in creating the hashCode rather than > relying > >> on > >> >> rewrite to give us uniqueness? > >> >> > >> >> > >> >> On Mon, Jul 14, 2008 at 7:19 AM, Sian January < > >> sianjanuary@googlemail.com> > >> >> wrote: > >> >> > >> >>> Ok - we'll wait and see what Andrew says. The only one that I'm not > >> happy > >> >>> with is Bytecode.hashCode, because rewrite always seems to be an > empty > >> >>> array > >> >>> at the point when generateHashCode is called so it's a bit > misleading > >> >>> using > >> >>> it. I think it should be ok to just remove that line, and still > cache > >> the > >> >>> hashCode. > >> >>> > >> >>> > >> >>> On 14/07/2008, Aleksey Shipilev wrote: > >> >>> > > >> >>> > Sian, > >> >>> > > >> >>> > Actually I had tried to extend Andrew's approach to these classes > >> >>> > first, but somehow I caught the degradation, that leaved me no > choice > >> >>> > except the lazy initialization. My concern is, the > >> >>> > constructor-initialized hashcode is really depend on usage pattern > >> for > >> >>> > each specific class, while lazy initialization has more guarantees > to > >> >>> > be performance-stable. Moreover, I suspect the lazy initialization > >> can > >> >>> > degrade performance much less because the only overhead it causes > is > >> >>> > checking the value of boolean field. On the other hand, the > >> >>> > constructor initialization may degrade performance a lot since the > >> >>> > generation of hashCode is expensive. > >> >>> > > >> >>> > I can recheck which classes favor lazy initialization and which > are > >> >>> > not, but I think it's not valuable in terms of efficiency. I mean > >> here > >> >>> > that the boost connected with changing lazy initialization to > >> >>> > constructor one is much lower than boost from caching hashcode > >> anyway. > >> >>> > > >> >>> > Can we accept the patch in this form and revisit this difference > >> later? > >> >>> > It would be better to focus on more profitable areas for > improvements > >> >>> for > >> >>> > now. > >> >>> > > >> >>> > P.S. I had asked Andrew to recheck whether my patch works as fast > as > >> >>> > his, also to check lazy initialization approach. On my tests the > >> boost > >> >>> > is stable and good. > >> >>> > > >> >>> > Thanks, > >> >>> > Aleksey. > >> >>> > > >> >>> > On Mon, Jul 14, 2008 at 5:25 PM, Sian January > >> >>> > wrote: > >> >>> > > Hi Aleksey, > >> >>> > > > >> >>> > > It's a really good idea to extend this patch to cover some more > >> >>> classes, > >> >>> > but > >> >>> > > I think Andrew's method is faster (a final cachedHashCode field > >> that > >> >>> is > >> >>> > > initialized in the constructor). The only reason I see to do it > >> later > >> >>> > would > >> >>> > > be if we thought some of these objects never had hashCode called > on > >> >>> them, > >> >>> > > but I don't think that's the case. Would you be able to try > that > >> >>> method > >> >>> > > instead in your patch and see if I'm right about it being > faster? > >> >>> > > > >> >>> > > Thanks, > >> >>> > > > >> >>> > > Sian > >> >>> > > > >> >>> > > > >> >>> > > > >> >>> > > On 12/07/2008, Aleksey Shipilev > >> wrote: > >> >>> > >> > >> >>> > >> Andrew, > >> >>> > >> > >> >>> > >> I had attached the patch to HARMONY-5907, covering several > first > >> >>> > >> methods. Can you confirm this patch helps for your scenario? > >> >>> > >> > >> >>> > >> Thanks, > >> >>> > >> Aleksey. > >> >>> > >> > >> >>> > >> On Sat, Jul 12, 2008 at 1:58 PM, Aleksey Shipilev > >> >>> > >> wrote: > >> >>> > >> > And the sorted list: > >> >>> > >> > > >> >>> > >> > 95462388 bc.cputf8 > >> >>> > >> > 18646908 bc.bytecode > >> >>> > >> > 15118425 bc.cpclass > >> >>> > >> > 14928914 bc.cpnametype > >> >>> > >> > 12103799 bc.cpmethref > >> >>> > >> > 5159994 bc.cpfieldref > >> >>> > >> > 3420605 bc.methref > >> >>> > >> > 1840965 bc.cpstring > >> >>> > >> > 839916 bc.codeattr > >> >>> > >> > 839916 bc.locvarattr > >> >>> > >> > 839916 bc.linenumattr > >> >>> > >> > 430234 bc.cpmethod > >> >>> > >> > 277144 bc.cpfield > >> >>> > >> > 263753 bc.attr > >> >>> > >> > 153811 bc.cpinteger > >> >>> > >> > 121856 bc.newattr > >> >>> > >> > 93471 bc.cvalattr > >> >>> > >> > 72492 bc.excpattr > >> >>> > >> > 57428 bc.srcfileattr > >> >>> > >> > 57428 bc.srcfileattr > >> >>> > >> > 48104 bc.cplong > >> >>> > >> > 40362 bc.innerclass > >> >>> > >> > 5593 bc.depattr > >> >>> > >> > 3255 bc.cpfloat > >> >>> > >> > 1638 bc.cpdouble > >> >>> > >> > 532 attrlayout > >> >>> > >> > 0 archive > >> >>> > >> > 0 attrdef > >> >>> > >> > 0 newattrband > >> >>> > >> > 0 bc.anndefarg > >> >>> > >> > 0 bc.rtannattr > >> >>> > >> > 0 classbands > >> >>> > >> > 0 filebands > >> >>> > >> > 0 metabandgr > >> >>> > >> > 0 segheader > >> >>> > >> > 0 bc.remattr > >> >>> > >> > 0 bc.annattr > >> >>> > >> > 0 bc.cpconst > >> >>> > >> > 0 bc.cpmember > >> >>> > >> > 0 bc.signattr > >> >>> > >> > 0 bandset > >> >>> > >> > 0 bcbands > >> >>> > >> > 0 cpbands > >> >>> > >> > 0 icbands > >> >>> > >> > 0 ictuple > >> >>> > >> > 0 segment > >> >>> > >> > 0 segopts > >> >>> > >> > 0 bc.classf > >> >>> > >> > 0 bc.cpref > >> >>> > >> > 0 bc.opmgr > >> >>> > >> > 0 bc.rtattr > >> >>> > >> > 0 segcp > >> >>> > >> > 0 bc.ccp > >> >>> > >> > 0 attrlayoutmap > >> >>> > >> > 0 bc.encmethattr > >> >>> > >> > 0 bc.exptableent > >> >>> > >> > 0 bc.locvartable > >> >>> > >> > 0 bc.signattr > >> >>> > >> > > >> >>> > >> > Thanks, > >> >>> > >> > Aleksey. > >> >>> > >> > > >> >>> > >> > On Sat, Jul 12, 2008 at 1:50 PM, Aleksey Shipilev > >> >>> > >> > wrote: > >> >>> > >> >> Hi, Andrew! > >> >>> > >> >> > >> >>> > >> >> I had updated the internal profiler to support hashCode() > >> probes > >> >>> [1], > >> >>> > >> >> to extend your effort in hashcode optimization. There are > bunch > >> of > >> >>> > >> >> heavily used hashcodes, most of them are going to > >> >>> Object.hashCode() > >> >>> > >> >> and then to System.identityHashCode(). We can > cache/implement > >> >>> > hashcode > >> >>> > >> >> for these classes. Here's the profile: > >> >>> > >> >> > >> >>> > >> >> Hashcodes: > >> >>> > >> >> archive: 0 > >> >>> > >> >> attrdef: 0 > >> >>> > >> >> attrlayout: 532 > >> >>> > >> >> attrlayoutmap: 0 > >> >>> > >> >> bandset: 0 > >> >>> > >> >> bcbands: 0 > >> >>> > >> >> classbands: 0 > >> >>> > >> >> cpbands: 0 > >> >>> > >> >> filebands: 0 > >> >>> > >> >> icbands: 0 > >> >>> > >> >> ictuple: 0 > >> >>> > >> >> metabandgr: 0 > >> >>> > >> >> newattrband: 0 > >> >>> > >> >> segcp: 0 > >> >>> > >> >> segheader: 0 > >> >>> > >> >> segment: 0 > >> >>> > >> >> segopts: 0 > >> >>> > >> >> bc.attr: 263753 > >> >>> > >> >> bc.remattr: 0 > >> >>> > >> >> bc.anndefarg: 0 > >> >>> > >> >> bc.annattr: 0 > >> >>> > >> >> bc.bytecode: 18646908 > >> >>> > >> >> bc.ccp: 0 > >> >>> > >> >> bc.classf: 0 > >> >>> > >> >> bc.codeattr: 839916 > >> >>> > >> >> bc.cvalattr: 93471 > >> >>> > >> >> bc.cpclass: 15118425 > >> >>> > >> >> bc.cpconst: 0 > >> >>> > >> >> bc.cpdouble: 1638 > >> >>> > >> >> bc.cpfield: 277144 > >> >>> > >> >> bc.cpfieldref: 5159994 > >> >>> > >> >> bc.cpfloat: 3255 > >> >>> > >> >> bc.cpinteger: 153811 > >> >>> > >> >> bc.methref: 3420605 > >> >>> > >> >> bc.cplong: 48104 > >> >>> > >> >> bc.cpmember: 0 > >> >>> > >> >> bc.cpmethod: 430234 > >> >>> > >> >> bc.cpmethref: 12103799 > >> >>> > >> >> bc.cpnametype: 14928914 > >> >>> > >> >> bc.cpref: 0 > >> >>> > >> >> bc.cpstring: 1840965 > >> >>> > >> >> bc.cputf8: 95462388 > >> >>> > >> >> bc.depattr: 5593 > >> >>> > >> >> bc.encmethattr: 0 > >> >>> > >> >> bc.excpattr: 72492 > >> >>> > >> >> bc.exptableent: 0 > >> >>> > >> >> bc.innerclass: 40362 > >> >>> > >> >> bc.linenumattr: 839916 > >> >>> > >> >> bc.locvarattr: 839916 > >> >>> > >> >> bc.locvartable: 0 > >> >>> > >> >> bc.newattr: 121856 > >> >>> > >> >> bc.opmgr: 0 > >> >>> > >> >> bc.rtattr: 0 > >> >>> > >> >> bc.rtannattr: 0 > >> >>> > >> >> bc.signattr: 0 > >> >>> > >> >> bc.srcfileattr: 57428 > >> >>> > >> >> > >> >>> > >> >> Would you like to produce the patch? > >> >>> > >> >> I think it would be funny :) > >> >>> > >> >> > >> >>> > >> >> Thanks, > >> >>> > >> >> Aleksey. > >> >>> > >> >> > >> >>> > >> >> [1] https://issues.apache.org/jira/browse/HARMONY-5905 > >> >>> > >> >> > >> >>> > >> >> On Sat, Jul 12, 2008 at 12:48 AM, Andrew Cornwall (JIRA) > >> >>> > >> >> wrote: > >> >>> > >> >>> > >> >>> > >> >>> [ > >> >>> > >> > >> >>> > > >> >>> > >> > https://issues.apache.org/jira/browse/HARMONY-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel > >> >>> > ] > >> >>> > >> >>> > >> >>> > >> >>> Andrew Cornwall updated HARMONY-5907: > >> >>> > >> >>> ------------------------------------- > >> >>> > >> >>> > >> >>> > >> >>> Attachment: main.patch > >> >>> > >> >>> > >> >>> > >> >>> main.patch includes change to CPUTF8.java > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> > >> >>>> [classlib][pack200]CPUTF8.hashCode() is slow > >> >>> > >> >>>> -------------------------------------------- > >> >>> > >> >>>> > >> >>> > >> >>>> Key: HARMONY-5907 > >> >>> > >> >>>> URL: > >> >>> > >> https://issues.apache.org/jira/browse/HARMONY-5907 > >> >>> > >> >>>> Project: Harmony > >> >>> > >> >>>> Issue Type: Improvement > >> >>> > >> >>>> Affects Versions: 5.0M6 > >> >>> > >> >>>> Environment: Latest pack200 > >> >>> > >> >>>> Reporter: Andrew Cornwall > >> >>> > >> >>>> Attachments: main.patch > >> >>> > >> >>>> > >> >>> > >> >>>> > >> >>> > >> >>>> The unpack process spends a lot of time doing > >> CPUTF8.hashCode() > >> >>> - > >> >>> > >> which does String.hashCode(). We can save about 1.5 seconds of > my > >> 39 > >> >>> > second > >> >>> > >> test case (about 4%) by caching the hashCode. (I thought we did > >> this > >> >>> > before > >> >>> > >> - or maybe I dreamt it?) > >> >>> > >> >>> > >> >>> > >> >>> -- > >> >>> > >> >>> This message is automatically generated by JIRA. > >> >>> > >> >>> - > >> >>> > >> >>> You can reply to this email to add a comment to the issue > >> online. > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> > >> >> > >> >>> > >> > > >> >>> > >> > >> >>> > > > >> >>> > > > >> >>> > > > >> >>> > > -- > >> >>> > > Unless stated otherwise above: > >> >>> > > IBM United Kingdom Limited - Registered in England and Wales > with > >> >>> number > >> >>> > > 741598. > >> >>> > > Registered office: PO Box 41, North Harbour, Portsmouth, > Hampshire > >> PO6 > >> >>> > 3AU > >> >>> > > > >> >>> > > >> >>> > >> >>> > >> >>> > >> >>> -- > >> >>> Unless stated otherwise above: > >> >>> IBM United Kingdom Limited - Registered in England and Wales with > >> number > >> >>> 741598. > >> >>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire > PO6 > >> 3AU > >> >>> > >> >> > >> >> > >> > > >> > > > ------=_Part_34768_7742566.1216067821476--