Message-ID: <4C74358C.6020400@itumi.biz>
Date: Tue, 24 Aug 2010 17:11:40 -0400
From: Chad La Joie
Organization: Itumi, LLC
To: security-dev@xml.apache.org
Subject: Re: Status of == vs equals() RESULTS
References:
 <4C56D21A.20007@itumi.biz> <4C600E7D.9030408@itumi.biz> <4C601D6B.7040607@itumi.biz> <4C609F40.9010804@brettingham-moore.net> <4C613DA1.8060009@itumi.biz> <4C72D21A.7060304@itumi.biz>

Okay, I'll prepare a patch for you by the end of the week.

On 8/24/10 2:23 PM, Colm O hEigeartaigh wrote:
> Sounds fine to me.
>
> Colm.
>
> On Mon, Aug 23, 2010 at 8:55 PM, Chad La Joie wrote:
>> Okay, getting back to this.
>>
>> I tried my tests again, this time with:
>> - a 7.5MB SAML metadata document (so lots of comparisons)
>> - 100 warm-up runs, then 100 timed runs
>> - an explicit GC between runs, to keep collection from happening
>>   during the runs, since the DOMs were so large
>>
>> No real difference in results: equals() was faster.
>>
>> So, at this point, I can't see any reason to use anything other than
>> equals(). It's the correct way of doing the comparison, in that it
>> always returns the proper result, and the JVM definitely seems to be
>> optimizing its use.
>>
>> On 8/10/10 7:53 AM, Chad La Joie wrote:
>>>
>>> Okay, I certainly have a number of SAML documents lying around, so I'll
>>> try with those as well. And, of course, I'll report back the results I
>>> get.
>>>
>>> On 8/10/10 4:46 AM, Raul Benito wrote:
>>>>
>>>> As the original author of the change from equals() to == for interned
>>>> namespaces, I can tell you that originally, on JRE 1.4 and 1.5 and
>>>> with my data (verification of a SAML/Liberty AuthnReq in multi-threaded
>>>> tests, with the old Juice JCE provider), the change was 10% to 20%
>>>> faster. SAML is one of the real-world examples of signing, and it has
>>>> URLs with common prefixes and of the same length.
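The benchmark methodology Chad describes (warm-up runs, then timed runs, with an explicit GC between runs) can be sketched as a small hand-rolled harness. This is a hypothetical illustration, not the harness used in this thread: it does not parse a SAML document, and the class name, method names, list size and sample URIs are assumptions chosen for the example. It times content comparison (equals()) against identity comparison (==) over a list of interned namespace URIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

/**
 * Hypothetical harness (not the code used in this thread): warm-up runs,
 * then timed runs, with an explicit GC request before each timed run.
 */
public class CompareBench {

    // Written after each measurement so the JIT cannot treat the work as dead code.
    static volatile int blackhole;

    // Counts entries equal to 'target' by content.
    static int countEquals(List<String> uris, String target) {
        int n = 0;
        for (String s : uris) {
            if (s.equals(target)) n++;
        }
        return n;
    }

    // Counts entries identical to 'target'; only correct if all strings are interned.
    static int countIdentity(List<String> uris, String target) {
        int n = 0;
        for (String s : uris) {
            if (s == target) n++;
        }
        return n;
    }

    // Returns the mean wall-clock time of one timed run, in nanoseconds (runs must be > 0).
    static long meanNanos(IntSupplier task, int warmups, int runs) {
        int sink = 0;
        for (int i = 0; i < warmups; i++) {
            sink += task.getAsInt(); // untimed: gives the JIT a chance to compile the hot path
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            System.gc(); // only a request; the JVM is free to ignore it
            long start = System.nanoTime();
            sink += task.getAsInt();
            total += System.nanoTime() - start;
        }
        blackhole = sink;
        return total / runs;
    }

    // Builds 'count' interned copies of a few namespace URIs.
    static List<String> sampleUris(int count) {
        String[] base = {
            "urn:oasis:names:tc:SAML:2.0:assertion",
            "urn:oasis:names:tc:SAML:2.0:metadata",
            "http://www.w3.org/2000/09/xmldsig#"
        };
        List<String> uris = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // new String(...) is a fresh object; intern() maps it to the canonical copy
            uris.add(new String(base[i % base.length]).intern());
        }
        return uris;
    }

    public static void main(String[] args) {
        List<String> uris = sampleUris(100_000);
        String target = "urn:oasis:names:tc:SAML:2.0:metadata"; // a literal, so already interned
        long eq = meanNanos(() -> countEquals(uris, target), 100, 100);
        long id = meanNanos(() -> countIdentity(uris, target), 100, 100);
        System.out.println("equals(): " + eq + " ns/run, ==: " + id + " ns/run");
    }
}
```

As Clive notes, benchmarking in a JVM is notoriously difficult, so numbers from a harness like this are only indicative; running it under -server, as he suggests, is closer to the real deployment case.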
>>>>
>>>> The Juice provider also helps remove the signing/digest cost (a
>>>> verification is two c14n operations: one of the signed part and one of
>>>> the signature), but I think a c14n alone is a good way of measuring it.
>>>> Also take into account that the == vs. equals() debate is more a
>>>> memory-workload and cache problem: if we have to iterate over every
>>>> char again and again just to see that two strings are not equal, we
>>>> trash the cache. (That's why I used multiple threads: to simulate a
>>>> server decoding requests with more or less the same code, but at
>>>> different times and with different "workloads".)
>>>> Nevertheless, if your tests with a more modern JRE show .equals()
>>>> behaving better, just go ahead and kiss == goodbye.
>>>>
>>>> Clive, using .hashCode() for strings in this case is not a big
>>>> speed-up, as it has to go through all the chars of the string
>>>> (trashing the cache again), multiplying and adding them into an
>>>> integer, instead of failing at the first differing char or reducing
>>>> straight to a boolean.
>>>>
>>>> Regards,
>>>>
>>>>
>>>> On Tue, Aug 10, 2010 at 2:37 AM, Clive Brettingham-Moore wrote:
>>>>
>>>> Have to agree .equals is the way to go, since the correctness of == is
>>>> too reliant on what must be considered implementation optimisations in
>>>> the parser.
>>>>
>>>> Benchmarking in the JVM is notoriously difficult, but it does look like
>>>> there is no gross difference, which should kill any objections to doing
>>>> it correctly.
>>>>
>>>> Since I recently spent far too long researching this for an unrelated
>>>> problem, I'll add my 10c to the detailed discussion.
>>>>
>>>> On 10/08/10 01:23, Chad La Joie wrote:
>>>>
>>>>> Not necessarily; there are a number of not-equal checks in there that
>>>>> should, in theory, perform better if you use only ==. In such a case,
>>>>> != will be just a single check, while !equals() will result in a
>>>>> char-by-char comparison.
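Clive's point that the correctness of == depends on how the parser produced the strings, and Chad's point that != is a single reference check, can be shown with a minimal sketch. DSIG_NS is the real XML Signature namespace URI; the class and method names are hypothetical, and building the string from a char buffer stands in, as an assumption, for a parser that does not intern namespace URIs.

```java
/**
 * Hypothetical illustration of why == on namespace URIs depends on how the
 * parser produced the strings, while equals() does not.
 */
public class NamespaceCheck {

    // XML Signature namespace; a string literal, so the JVM interns it automatically.
    static final String DSIG_NS = "http://www.w3.org/2000/09/xmldsig#";

    // A single reference check: correct only if 'uri' is the same object as DSIG_NS.
    static boolean isDsigIdentity(String uri) {
        return uri == DSIG_NS;
    }

    // Compares contents, so it is correct however 'uri' was created (and null-safe).
    static boolean isDsigEquals(String uri) {
        return DSIG_NS.equals(uri);
    }

    public static void main(String[] args) {
        // Simulates a parser that builds the URI from a char buffer without interning.
        char[] buffer = DSIG_NS.toCharArray();
        String fromParser = new String(buffer, 0, buffer.length);

        System.out.println(isDsigIdentity(fromParser));          // false
        System.out.println(isDsigEquals(fromParser));            // true
        System.out.println(isDsigIdentity(fromParser.intern())); // true
    }
}
```

The identity check only becomes correct once something (here an explicit intern() call) guarantees a single canonical instance, which is exactly the parser behaviour Clive describes as an implementation optimisation rather than a contract.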
>>>>
>>>> Actually, the next thing String.equals tests is length equality, so
>>>> character comparison will only be reached if the strings are the same
>>>> length.
>>>>
>>>> Since the char-by-char comparison returns on the first mismatch, only
>>>> same-length strings with shared prefixes will show the expected
>>>> slowness. (Namespace URIs are likely to share prefixes, but I think they
>>>> are not particularly likely to be the same length unless actually
>>>> equal.) Thus String.equals is only likely to be slow when comparing
>>>> long strings that are equal but are distinct objects (so the intern or
>>>> alternative string-pooling techniques needed for == also benefit
>>>> .equals, without all the nasty loopholes: even if .equals is
>>>> occasionally slow, at least it is always right).
>>>>
>>>> In circumstances with repeated tests involving many length and prefix
>>>> matches, adding a hash code inequality test
>>>> ((s1.hashCode() == s2.hashCode()) && s1.equals(s2)) could prevent
>>>> practically all char-by-char checks in the not-equal cases (but if the
>>>> same strings are never reused, the hash code calculation could be an
>>>> issue; NB: intern results in a hash calculation for all strings
>>>> anyway). Pooling is still needed to speed up matches for equality,
>>>> though.
>>>>
>>>> Re VM options, I feel -server is definitely the right test bed, both
>>>> because of its more aggressive JIT and because the code is likely to
>>>> see its heaviest real-world use in -server VMs.
>>>>
>>>>
>>>
>>
>> --
>> Chad La Joie
>> http://itumi.biz
>> trusted identities, delivered
>>
>
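The order of checks Clive describes (identity, then length, then a char loop that stops at the first mismatch) and his hash-code guard can be modelled in a short sketch. This is a simplified stand-in written for illustration, not the JDK's String.equals source (which, for one thing, does not count comparisons); the class name, method names and example URIs are hypothetical.

```java
/**
 * Simplified model (not the JDK source) of the checks described above for
 * String.equals, plus the suggested hash-code guard.
 */
public class EqualsModel {

    // Number of character comparisons performed by the last call to modelEquals.
    static int lastCharCompares;

    static boolean modelEquals(String a, String b) {
        lastCharCompares = 0;
        if (a == b) return true;                    // same object: pooled strings stop here
        if (a.length() != b.length()) return false; // different lengths: no char loop at all
        for (int i = 0; i < a.length(); i++) {
            lastCharCompares++;
            if (a.charAt(i) != b.charAt(i)) return false; // first mismatch ends the loop
        }
        return true;
    }

    // Unequal hash codes prove the strings differ, so equals() only runs on a hash match.
    // OpenJDK's String caches its hash code in the object, so the full pass over the
    // chars is paid once per String object rather than once per comparison.
    static boolean hashGuardedEquals(String a, String b) {
        return a.hashCode() == b.hashCode() && a.equals(b);
    }

    public static void main(String[] args) {
        String v1 = "http://example.org/ns/v1";    // hypothetical namespace URIs
        String v2 = "http://example.org/ns/v2";    // same length, long shared prefix
        String shortNs = "http://example.org/ns";  // different length

        System.out.println(modelEquals(v1, v2) + " after " + lastCharCompares + " char compares");
        // prints "false after 24 char compares"
        System.out.println(modelEquals(v1, shortNs) + " after " + lastCharCompares + " char compares");
        // prints "false after 0 char compares"
        System.out.println(modelEquals(v1, v1) + " after " + lastCharCompares + " char compares");
        // prints "true after 0 char compares"
        System.out.println(hashGuardedEquals(v1, v2)); // prints "false"
    }
}
```

The output shows the worst case Clive identifies: only the same-length pair with a shared prefix walks the whole string, while the length check and the identity check (which is what pooling buys) avoid the char loop entirely.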