santuario-dev mailing list archives

From Chad La Joie
Subject Re: Status of == vs equals() RESULTS
Date Tue, 24 Aug 2010 21:11:40 GMT
Okay, I'll prepare a patch for you by the end of the week.

On 8/24/10 2:23 PM, Colm O hEigeartaigh wrote:
> Sounds fine to me.
> Colm.
> On Mon, Aug 23, 2010 at 8:55 PM, Chad La Joie wrote:
>> Okay, getting back to this.
>> I tried my tests again this time with:
>>   - a 7.5MB SAML metadata document (so lots of comparisons)
>>   - 100 warm up runs then 100 timed runs
>>   - an explicit GC between each run to keep it from happening during the runs
>> since the DOMs were so large
>> No real difference in results. equals() was faster.
>> So, at this point, I can't see any reason to do anything other than
>> equals().  It's the actual correct way of doing the comparison in that it
>> will always return the proper result and the JVM definitely seems to be
>> optimizing its use.
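The measurement procedure described above (warm-up runs, then timed runs, with an explicit GC between runs) could be sketched roughly as follows. This is not the test code used in the thread: the class name, the helper names, the sample namespace list, and the counting workload are all assumptions made for illustration, and a real comparison would run the actual canonicalization over the SAML metadata document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical harness; none of these names come from the Santuario code base.
final class EqualsVsIdentityBench {

    static final String DSIG_NS = "http://www.w3.org/2000/09/xmldsig#";

    // String literals are interned, so == and equals() agree on this data.
    static final String[] SAMPLE_URIS = {
        DSIG_NS,
        "http://www.w3.org/2001/04/xmlenc#",
        "urn:oasis:names:tc:SAML:2.0:metadata"
    };

    // Builds a list of n namespace strings by cycling through SAMPLE_URIS.
    static List<String> sampleNamespaces(int n) {
        List<String> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(SAMPLE_URIS[i % SAMPLE_URIS.length]);
        }
        return out;
    }

    // Stand-in workload: count matches using equals().
    static int countWithEquals(List<String> values, String target) {
        int hits = 0;
        for (String v : values) {
            if (v.equals(target)) hits++;
        }
        return hits;
    }

    // Stand-in workload: count matches using reference identity.
    static int countWithIdentity(List<String> values, String target) {
        int hits = 0;
        for (String v : values) {
            if (v == target) hits++;
        }
        return hits;
    }

    // Runs `warmups` untimed iterations, then `timed` timed iterations with a
    // GC request before each one, and returns the average time in nanoseconds.
    static long averageNanos(IntSupplier workload, int warmups, int timed) {
        if (timed <= 0) throw new IllegalArgumentException("timed must be > 0");
        int sink = 0; // accumulate results so the JIT cannot discard the work
        for (int i = 0; i < warmups; i++) {
            sink += workload.getAsInt();
        }
        long total = 0;
        for (int i = 0; i < timed; i++) {
            System.gc(); // only a request; the JVM may ignore it
            long start = System.nanoTime();
            sink += workload.getAsInt();
            total += System.nanoTime() - start;
        }
        if (sink == Integer.MIN_VALUE) System.out.println(sink);
        return total / timed;
    }

    public static void main(String[] args) {
        List<String> values = sampleNamespaces(1_000_000);
        long eq = averageNanos(() -> countWithEquals(values, DSIG_NS), 100, 100);
        long id = averageNanos(() -> countWithIdentity(values, DSIG_NS), 100, 100);
        System.out.println("equals(): " + eq + " ns/run, ==: " + id + " ns/run");
    }
}
```

Numbers from a harness like this vary with the JVM version and flags, so they only indicate whether there is a gross difference between the two comparisons.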
>> On 8/10/10 7:53 AM, Chad La Joie wrote:
>>> Okay, I certainly have a number of SAML documents lying around so I'll
>>> try with those as well. And, of course, I'll report back the results I
>>> get.
>>> On 8/10/10 4:46 AM, Raul Benito wrote:
>>>> As the original author of the change from equals() to == for interned
>>>> namespaces, I can say that originally, on 1.4 and 1.5 and with my data
>>>> (the verification of a SAML/Liberty AuthnReq in multi-threaded tests,
>>>> with the old Juice JCE provider), the change was 10% to 20% faster.
>>>> SAML is one of the real-world examples of signing, and it has some URLs
>>>> with common prefixes and the same length.
>>>> The Juice provider also helps to get rid of the signing/digest cost (a
>>>> verification is two c14n passes: one of the signed part and one of the
>>>> signature), but I think just a c14n is a good way to measure it.
>>>> Also take into account that the == vs equals debate is more of a
>>>> memory/cache workload problem: if we have to iterate over every char
>>>> again and again just to see that two strings are not equal, we thrash
>>>> the cache. (That's why I used multiple threads, to simulate a server
>>>> decoding requests with more or less the same code, but at different
>>>> times and under different "workloads".)
>>>> Nevertheless, if you have tested with a more modern JRE and the .equals
>>>> code is behaving better, just go ahead and kiss == goodbye.
>>>> Clive, using .hashCode for strings in this case is not a big speed-up,
>>>> as it is going to go through all the chars of the string, thrashing
>>>> the cache again, and multiplying and adding the result into an integer,
>>>> instead of failing at the first differing char or just reducing to a
>>>> boolean.
>>>> Regards,
>>>> On Tue, Aug 10, 2010 at 2:37 AM, Clive Brettingham-Moore wrote:
>>>> Have to agree .equals is the way to go, since correctness of == is too
>>>> reliant on what must be considered implementation optimisations in the
>>>> parser.
>>>> Benchmarking in JVM is notoriously difficult, but it does look like
>>>> there is no gross difference, which should kill any objections to doing
>>>> it correctly.
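The correctness concern raised above can be shown with a small sketch. The class and method names are hypothetical; the only assumption is a parser that builds a namespace string at run time without interning it.

```java
// Hypothetical demo class; not part of any library.
final class InternPitfall {

    static final String DSIG_NS = "http://www.w3.org/2000/09/xmldsig#";

    // Stands in for a parser that creates a fresh, non-interned String.
    static String parsedCopy(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String fromParser = parsedCopy(DSIG_NS);

        // Same characters, different object: identity comparison misses it.
        System.out.println(fromParser == DSIG_NS);          // false
        // Content comparison gives the right answer regardless of interning.
        System.out.println(fromParser.equals(DSIG_NS));     // true
        // == only works once both sides have been interned.
        System.out.println(fromParser.intern() == DSIG_NS); // true
    }
}
```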
>>>> Since I recently spent far too long researching this for an unrelated
>>>> problem, I'll add my 10c to the detailed discussion.
>>>> On 10/08/10 01:23, Chad La Joie wrote:
>>>>> Not necessarily, there are a number of not equal checks in there that
>>>>> should, in theory, perform better if you use only ==. In such a
>>>>> case, the use of != will just be a single check while !equals() will
>>>>> result in a char-by-char comparison.
>>>> Actually, after the reference check, the next thing String.equals tests
>>>> is length equality - so character comparison will only be reached if
>>>> the strings are the same length.
>>>> Since the char-by-char comparison returns on the first mismatch, only
>>>> same-length strings with shared prefixes will show the expected
>>>> slowness. (Namespace URIs are likely to share prefixes, but I think are
>>>> not particularly likely to be the same length, unless actually equal)...
>>>> thus String.equals is only likely to be slow when comparing long strings
>>>> that are distinct objects but equal in content (so the intern or
>>>> alternative string pooling techniques needed for == also benefit .equals,
>>>> without all the nasty loopholes: even if .equals is occasionally slow,
>>>> at least it is always right).
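The order of checks described above can be sketched as follows. This is a simplified rendering for illustration, not the JDK source; sketchEquals and comparisonsMade are hypothetical names, and comparisonsMade exists only to show how many character comparisons each case costs.

```java
// Hypothetical illustration of the check order; not the JDK implementation.
final class EqualsSketch {

    static boolean sketchEquals(String a, String b) {
        if (a == b) return true;                      // 1. same object
        if (a == null || b == null) return false;
        if (a.length() != b.length()) return false;   // 2. length mismatch
        for (int i = 0; i < a.length(); i++) {        // 3. char by char
            if (a.charAt(i) != b.charAt(i)) return false;
        }
        return true;
    }

    // Number of character comparisons the sketch above would perform for two
    // non-null strings, counting the comparison that finds the mismatch.
    static int comparisonsMade(String a, String b) {
        if (a == b || a.length() != b.length()) return 0;
        int n = 0;
        for (int i = 0; i < a.length(); i++) {
            n++;
            if (a.charAt(i) != b.charAt(i)) break;
        }
        return n;
    }
}
```

Under this sketch, two namespace URIs of different lengths cost no character comparisons at all, while two distinct String objects with identical content cost one comparison per character, which is the slow case identified above.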
>>>> In circumstances where repeated tests hit many length and prefix
>>>> matches, adding a hash code inequality test ((s1.hashCode()==
>>>> s2.hashCode())&&s1.equals(s2)) could prevent practically all
>>>> char-by-char checks for !equal cases (but if the same strings are never
>>>> used repeatedly, the hash code calculation could be an issue; nb intern
>>>> results in hash calculation for all strings anyway)... pooling is still
>>>> needed to speed up matches for equality though.
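The hash code pre-check suggested above could be wrapped in a helper like the one below. The class and method names are hypothetical. The sketch relies on String caching its hash code after the first computation, which Sun/OpenJDK implementations do, so only the first call per string walks every character.

```java
// Hypothetical helper; not part of any library.
final class HashGuard {

    // Rejects most unequal pairs on the int comparison alone; equals() is
    // reached only when the hash codes match (equal strings or a collision).
    static boolean hashGuardedEquals(String s1, String s2) {
        return s1.hashCode() == s2.hashCode() && s1.equals(s2);
    }
}
```

The guard only helps with inequality: two equal strings always have equal hash codes, so a match still falls through to equals(), which is why pooling is still needed to make positive matches cheap.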
>>>> Re VM options, I feel -server is definitely the right test bed,
>>>> both because of the more aggressive JIT, and also because the code is
>>>> likely to see heaviest real world cases in -server VMs.
>> --
>> Chad La Joie
>> trusted identities, delivered

Chad La Joie
trusted identities, delivered
