Mailing-List: contact security-dev-help@xml.apache.org; run by ezmlm
Precedence: bulk
Reply-To: security-dev@xml.apache.org
Received-SPF: neutral (nike.apache.org: local policy)
Message-ID: <4C74358C.6020400@itumi.biz>
Date: Tue, 24 Aug 2010 17:11:40 -0400
From: Chad La Joie <lajoie@itumi.biz>
Organization: Itumi, LLC
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US;
 rv:1.9.2.8) Gecko/20100802 Thunderbird/3.1.2
MIME-Version: 1.0
To: security-dev@xml.apache.org
Subject: Re: Status of == vs equals() RESULTS
References: <4C56D21A.20007@itumi.biz>	<4C600E7D.9030408@itumi.biz>
	<D55E3EC422FE1E43A478CFAD9676DDF31278C2CC89@IBIUSMBSB.ibi.com>
	<4C601D6B.7040607@itumi.biz>	<4C609F40.9010804@brettingham-moore.net>
	<AANLkTim8A9j1rk3=jmc5rYFdpOf32YyBhzq5e8H5Pwbw@mail.gmail.com>
	<4C613DA1.8060009@itumi.biz>	<4C72D21A.7060304@itumi.biz>
 <AANLkTinH7A5VC02Hzj2pA-umqUf3bY1cnyw8VfsiE8b2@mail.gmail.com>
In-Reply-To: <AANLkTinH7A5VC02Hzj2pA-umqUf3bY1cnyw8VfsiE8b2@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Okay, I'll prepare a patch for you by the end of the week.

On 8/24/10 2:23 PM, Colm O hEigeartaigh wrote:
> Sounds fine to me.
>
> Colm.
>
> On Mon, Aug 23, 2010 at 8:55 PM, Chad La Joie <lajoie@itumi.biz> wrote:
>> Okay, getting back to this.
>>
>> I tried my tests again this time with:
>>   - a 7.5MB SAML metadata document (so lots of comparisons)
>>   - 100 warm up runs then 100 timed runs
>>   - an explicit GC between each run to keep it from happening during the runs
>> since the DOMs were so large
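The methodology above (warm-up runs, then timed runs, with an explicit GC between runs) can be sketched roughly as follows. This is not the harness used for the reported numbers; the class name CompareBench, the sample namespace string, and the operation counts are assumptions for illustration only, and System.gc() is only a request to the JVM.

```java
import java.util.function.BooleanSupplier;

final class CompareBench {
    // Hypothetical sample data: two distinct String objects with equal content.
    static final String NS_A = "http://www.w3.org/2000/09/xmldsig#";
    static final String NS_B = new String(NS_A);

    // Accumulates results so the JIT cannot discard the comparisons entirely.
    static int sink;

    /** Runs the warm-up iterations, then returns the mean nanoseconds per timed run. */
    static long meanNanos(BooleanSupplier comparison, int warmups, int runs, int opsPerRun) {
        for (int i = 0; i < warmups; i++) {
            runOnce(comparison, opsPerRun);
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            System.gc(); // ask for a collection between runs, outside the timed region
            long start = System.nanoTime();
            runOnce(comparison, opsPerRun);
            total += System.nanoTime() - start;
        }
        return total / runs;
    }

    /** Performs the comparison opsPerRun times and records how often it was true. */
    static void runOnce(BooleanSupplier comparison, int opsPerRun) {
        int hits = 0;
        for (int i = 0; i < opsPerRun; i++) {
            if (comparison.getAsBoolean()) {
                hits++;
            }
        }
        sink += hits;
    }

    public static void main(String[] args) {
        long viaEquals = meanNanos(() -> NS_A.equals(NS_B), 100, 100, 100_000);
        long viaIdentity = meanNanos(() -> NS_A == NS_B, 100, 100, 100_000);
        System.out.println("equals(): " + viaEquals + " ns/run, ==: " + viaIdentity + " ns/run");
    }
}
```

A micro-benchmark like this is easily distorted by JIT optimizations, so the numbers it prints should be treated as indicative at best.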
>>
>> No real difference from my earlier results: equals() was still faster.
>>
>> So, at this point, I can't see any reason to do anything other than
>> equals().  It's the actual correct way of doing the comparison in that it
>> will always return the proper result and the JVM definitely seems to be
>> optimizing its use.
>>
>> On 8/10/10 7:53 AM, Chad La Joie wrote:
>>>
>>> Okay, I certainly have a number of SAML documents lying around so I'll
>>> try with those as well. And, of course, I'll report back the results I
>>> get.
>>>
>>> On 8/10/10 4:46 AM, Raul Benito wrote:
>>>>
>>>> As the original author of the change from equals() to == for interned
>>>> namespaces, I can tell you that originally, on 1.4 and 1.5 and with my
>>>> data (the verification of a SAML/Liberty AuthnReq in a multi-threaded
>>>> test, using the old Juice JCE provider), the change was 10% to 20%
>>>> faster. SAML is one of the real-world examples of signing, and it has
>>>> some URLs with common prefixes and the same length.
>>>> The Juice provider also helps to get rid of the signing/digest cost (a
>>>> verification is two c14n runs, one of the signed part and one of the
>>>> signature), but I think a single c14n is a good way of measuring it.
>>>> Also take into account that the == vs equals debate is more of a memory
>>>> and cache workload problem: if we have to iterate over every char, over
>>>> and over, just to find that two strings are not equal, we thrash the
>>>> cache. (That's why I used the multi-threaded test, to simulate a server
>>>> decoding requests with more or less the same code, but at different
>>>> times and under different "workloads".)
>>>> Nevertheless, if you have tested with a more modern JRE and .equals is
>>>> behaving better, just go ahead and kiss goodbye to the ==.
>>>>
>>>> Clive, using .hashCode for strings in this case is not a big
>>>> speed-up, as it has to go through all the chars of the string,
>>>> thrashing the cache again, and multiplying and adding the result into
>>>> an integer, instead of failing at the first different char or just
>>>> reducing to a boolean.
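To make the cost described above concrete, here is a small sketch of the multiply-and-add loop that the String.hashCode documentation specifies (s[0]*31^(n-1) + ... + s[n-1]). The class and method names are hypothetical; the JDK's own implementation may be organized differently, and it caches the result after the first call.

```java
final class HashSketch {
    /** Visits every char of the string, unlike equals(), which can stop at the first mismatch. */
    static int hashSketch(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow wraps, as in the documented formula
        }
        return h;
    }

    public static void main(String[] args) {
        String ns = "http://www.w3.org/2000/09/xmldsig#";
        System.out.println(hashSketch(ns) == ns.hashCode());
    }
}
```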
>>>>
>>>> Regards,
>>>>
>>>>
>>>> On Tue, Aug 10, 2010 at 2:37 AM, Clive Brettingham-Moore
>>>> <xmlsec@brettingham-moore.net>
>>>> wrote:
>>>>
>>>> Have to agree .equals is the way to go, since correctness of == is too
>>>> reliant on what must be considered implementation optimisations in the
>>>> parser.
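A minimal illustration of why == is only correct when both sides are known to be interned. The class name InternDemo and the way the second string is built are assumptions for the example; in the real code, whether a namespace URI is interned depends on the parser that produced it.

```java
final class InternDemo {
    /** Reference comparison: true only if both variables point at the same object. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "http://www.w3.org/2000/09/xmldsig#"; // literals are interned
        String built = new StringBuilder("http://www.w3.org/2000/09/")
                .append("xmldsig#")
                .toString();                                    // a new, non-interned object

        System.out.println(sameObject(literal, built));          // false
        System.out.println(literal.equals(built));               // true
        System.out.println(sameObject(literal, built.intern())); // true
    }
}
```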
>>>>
>>>> Benchmarking in JVM is notoriously difficult, but it does look like
>>>> there is no gross difference, which should kill any objections to doing
>>>> it correctly.
>>>>
>>>> Since I recently spent far too long researching this for an unrelated
>>>> problem, I'll add my 10c to the detailed discussion.
>>>>
>>>> On 10/08/10 01:23, Chad La Joie wrote:
>>>>
>>>>> Not necessarily, there are a number of not-equal checks in there that
>>>>> should, in theory, perform better if you use == only. In such a
>>>>> case, the use of != will just be a single check while !equals() will
>>>>> result in a char-by-char comparison.
>>>>
>>>> Actually, the next thing String.equals tests (after the identity
>>>> check) is length equality, so character comparison will only be
>>>> reached if the strings are the same length.
>>>>
>>>> Since the char-by-char comparison returns on the first mismatch, only
>>>> same-length strings with shared prefixes will show the expected
>>>> slowness. (Namespace URIs are likely to share prefixes, but I think are
>>>> not particularly likely to be the same length, unless actually equal.)
>>>> Thus String.equals is only likely to be slow when comparing long
>>>> strings that are distinct objects but equal in content. So intern, or
>>>> the alternative string pooling techniques needed for ==, also benefit
>>>> .equals, without all the nasty loopholes: even if .equals is
>>>> occasionally slow, at least it is always right.
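The order of checks described above can be sketched as follows. This is an illustrative re-implementation, not the JDK source; the names EqualsSketch, equalsSketch, and charsExamined are hypothetical, and both methods assume non-null arguments.

```java
final class EqualsSketch {
    /** Mirrors the described order: identity, then length, then char-by-char. */
    static boolean equalsSketch(String a, String b) {
        if (a == b) {
            return true;                 // same object: no chars examined
        }
        if (a.length() != b.length()) {
            return false;                // different lengths: no chars examined
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return false;            // stop at the first mismatch
            }
        }
        return true;
    }

    /** Counts how many char positions equalsSketch would examine for this pair. */
    static int charsExamined(String a, String b) {
        if (a == b || a.length() != b.length()) {
            return 0;
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i + 1;
            }
        }
        return a.length();
    }
}
```

The worst case is two distinct objects with equal content, where every char is examined; pooling the strings turns that case into the identity fast path.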
>>>>
>>>> In circumstances where you are doing repeated tests with many length
>>>> and prefix matches, adding a hash code pre-check
>>>> (s1.hashCode() == s2.hashCode() && s1.equals(s2)) could prevent
>>>> practically all char-by-char checks for the !equal cases (but if the
>>>> same strings are never used repeatedly, the hash code calculation
>>>> could be an issue; nb intern results in hash calculation for all
>>>> strings anyway)... pooling is still needed to speed up matches for
>>>> equality though.
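A sketch of the hash pre-check suggested above, assuming non-null strings; the class and method names are hypothetical. Because String caches its hash after the first computation, the pre-check is cheap only when the same String objects are compared repeatedly. Equal hashes do not prove equality, so equals() is still needed.

```java
final class HashGuard {
    /** Rejects most unequal pairs on the cached hash before falling back to equals(). */
    static boolean guardedEquals(String s1, String s2) {
        return s1.hashCode() == s2.hashCode() && s1.equals(s2);
    }

    public static void main(String[] args) {
        // Different hashes: rejected without a char-by-char comparison.
        System.out.println(guardedEquals("http://example.org/ns/a", "http://example.org/ns/b"));
        // "Aa" and "BB" collide (both hash to 2112), so equals() decides.
        System.out.println(guardedEquals("Aa", "BB"));
    }
}
```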
>>>>
>>>> Re VM options, I feel -server is definitely the right test bed, both
>>>> because of the more aggressive JIT, and also because the code is
>>>> likely to see its heaviest real-world use in -server VMs.
>>>>
>>>>
>>>
>>
>> --
>> Chad La Joie
>> http://itumi.biz
>> trusted identities, delivered
>>
>

-- 
Chad La Joie
http://itumi.biz
trusted identities, delivered