santuario-dev mailing list archives

From "Pellerin, Clement" <>
Subject RE: Status of == vs equals() RESULTS
Date Mon, 09 Aug 2010 15:13:31 GMT
In JDK 1.5, String.equals() begins with:

public boolean equals(Object anObject) {
	if (this == anObject) {
	    return true;
	}
	// ... (remaining comparison elided)
}

Since String is a final class, the JIT compiler is free to in-line String.equals().
This is such a common case that I bet the JIT compiler team made it a special case to in-line
at least the beginning of String.equals() at every invocation site.

If your test bed only uses interned Strings, equals() will return early at that identity check,
with the same behavior as == for equal strings.
Is it possible your test bed calls String.equals() with an overwhelming percentage of equal
strings?
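A minimal sketch of the distinction, relying only on standard java.lang.String behavior (string literals are interned, and intern() returns the pooled instance). When both arguments are the same interned object, equals() takes the identity shortcut shown above, which is exactly what == tests; the class name is illustrative.

```java
public class InternEqualsDemo {
    public static void main(String[] args) {
        String literal = "xmlns";                                        // string literals are interned
        String built = new StringBuilder("xml").append("ns").toString(); // equal contents, new object
        String canonical = built.intern();                               // the pooled instance, i.e. literal

        // Same object: == is true, and equals() returns at its this == anObject check.
        System.out.println(literal == canonical);         // true
        System.out.println(literal.equals(canonical));    // true

        // Equal contents, distinct objects: == misses the match, equals() finds it.
        System.out.println(literal == built);             // false
        System.out.println(literal.equals(built));        // true
    }
}
```

So a test bed whose names all come from the intern pool would exercise only the cheap first branch of equals().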

-----Original Message-----
From: Chad La Joie [] 
Sent: Monday, August 09, 2010 10:20 AM
Subject: Re: Status of == vs equals() RESULTS

So, I have some unexpected results from this work.

I implemented a helper class that checked the equality of element local 
names, attribute local names, namespace URIs, and namespace prefixes 
(i.e. everything that Xerces always interns).  Then I made sure to 
replace all ==, !=, and equals() comparisons that I could find with the appropriate call.
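The helper class itself is not shown here. A sketch in the spirit of the NamespaceEqualityChecker proposal in the quoted message, with hypothetical names (NameEqualityChecker, EqualsChecker, IdentityChecker), might look like this:

```java
/** Hypothetical helper: one place that decides how parser-interned XML names are compared. */
interface NameEqualityChecker {
    boolean localNamesEqual(String a, String b);      // element and attribute local names
    boolean namespaceUrisEqual(String a, String b);
    boolean prefixesEqual(String a, String b);
}

/** Safe default: compares contents, so it is correct whether or not the parser interns names. */
final class EqualsChecker implements NameEqualityChecker {
    // Null-safe content comparison; the a == b test is the same shortcut String.equals() takes.
    private static boolean same(String a, String b) {
        return a == b || (a != null && a.equals(b));
    }
    public boolean localNamesEqual(String a, String b) { return same(a, b); }
    public boolean namespaceUrisEqual(String a, String b) { return same(a, b); }
    public boolean prefixesEqual(String a, String b) { return same(a, b); }
}

/** Identity comparison: correct only if every string passed in is guaranteed to be interned. */
final class IdentityChecker implements NameEqualityChecker {
    public boolean localNamesEqual(String a, String b) { return a == b; }
    public boolean namespaceUrisEqual(String a, String b) { return a == b; }
    public boolean prefixesEqual(String a, String b) { return a == b; }
}
```

With every call site going through one checker instance, switching strategies is a one-line change rather than a hunt for every ==.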

To test, I picked the Canonicalizer20010315ExclusiveTest test case and 
made two alterations to the test22*excl methods:
   - do one c14n operation outside the timing loop, just to make sure all the 
classes are in memory, constants are loaded, etc.
   - in a 100-iteration loop, create a new canonicalizer, canonicalize a 
DOM tree, and time it with nanosecond resolution
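A sketch of that procedure, plus the reduction of the per-run times to the min/max/median/avg/total rows reported in the tables. Assumptions: the canonicalization step is replaced by a generic Runnable stand-in, timing uses System.nanoTime(), and for an even number of runs the median is taken as the mean of the two middle samples (the message does not say how it was computed).

```java
import java.util.Arrays;

/** Hypothetical timing harness mirroring the procedure described above. */
final class TimingHarness {

    /** One untimed warm-up run, then `iterations` timed runs; returns nanoseconds per run. */
    static long[] time(Runnable workload, int iterations) {
        workload.run(); // warm-up outside the timing loop (class loading, constant resolution)
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            workload.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Returns {min, max, median, avg, total}; even-count median = mean of the two middle samples. */
    static double[] summarize(long[] samples) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        long total = 0;
        for (long s : sorted) total += s;
        double median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        return new double[] {sorted[0], sorted[n - 1], median, (double) total / n, total};
    }

    public static void main(String[] args) {
        // Stand-in workload; the real test created a new canonicalizer and canonicalized a DOM tree.
        Runnable workload = new Runnable() {
            public void run() {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 1000; i++) sb.append(i);
            }
        };
        double[] s = summarize(time(workload, 100));
        System.out.printf("min %.0f  max %.0f  median %.0f  avg %.1f  total %.0f%n",
                s[0], s[1], s[2], s[3], s[4]);
    }
}
```

The avg row is simply total divided by the number of runs (for example, 10376000 / 100 = 103760 in the first table). Note that a single untimed run loads classes but may not give HotSpot enough invocations to have compiled all of the hot paths, so the timed runs can mix interpreted and compiled code.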

I did this for the example2_2_1.xml[1], example2_2_2.xml[2], and 
example2_2_3.xml[3] input files (test221excl, test222excl, and test223excl, respectively).

Here are the results, in nanoseconds.  "total" is the total time spent 
in all 100 runs, i.e. the sum of the 100 individual results.

test221excl
         equals()    ==
min     101000     99000
max     123000     191000
median  103000     105000
avg     103760     106540
total   10376000   10654000

test222excl
         equals()    ==
min     99000      101000
max     192000     128000
median  100000     108000
avg     102110     108480
total   10211000   10848000

test223excl (an XPath nodeset canonicalization)
         equals()    ==
min     254000     248000
max     290000     353000
median  266000     265000
avg     266820     265800
total   26682000   26580000

So, what these numbers appear to suggest is that, in fact, equals() is 
more often faster than ==: its average is lower in two of the three tests.  
This seems counter-intuitive unless the JVM has a specialized optimization 
for the String.equals() method.
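To put "more often faster" in proportion, a small sketch that computes the relative difference between the two reported averages in each table (values copied from the "avg" rows above; the class and method names are illustrative):

```java
public class RelativeDifference {
    /** Percent by which the == average exceeds the equals() average; negative means == was faster. */
    static double percentSlowerWithIdentity(double equalsAvg, double identityAvg) {
        return (identityAvg - equalsAvg) / equalsAvg * 100.0;
    }

    public static void main(String[] args) {
        String[] tables = {"table 1", "table 2", "table 3 (test223excl)"};
        // {equals() avg, == avg} in nanoseconds, copied from the "avg" rows above
        double[][] avgs = {{103760, 106540}, {102110, 108480}, {266820, 265800}};
        for (int i = 0; i < tables.length; i++) {
            System.out.printf("%s: %+.2f%%%n", tables[i],
                    percentSlowerWithIdentity(avgs[i][0], avgs[i][1]));
        }
    }
}
```

This gives roughly +2.7%, +6.2%, and -0.4%. In every table the gap between the averages is much smaller than the min-to-max spread of either column.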

Can anyone see where my testing is likely to be flawed?


On 8/2/10 10:11 AM, Chad La Joie wrote:
> So, while I don't have my access yet, Colm asked me if I'd take a look
> at the == vs equals() issue (relevant bugs: 40897[1], 45637[2], 46681[3]).
> My executive summary is that clearly, as things stand, the current code
> favors optimization over correctness. Rarely is this a good thing.
> Colm notes[4] that the reliance on interned strings (and thus the
> ability to use ==) occurs sporadically throughout the code and not just
> within the ElementChecker implementations. He specifically mentioned
> the various C14N implementations, and indeed == is used about six
> times there for string comparison.
> My recommendation, then, is twofold:
> - Ensure that nothing other than namespace bits are compared via ==. I
> don't know that this occurs but the code should definitely be reviewed
> to ensure that.
> - Create a new "NamespaceEqualityChecker" that provides methods for
> checking the various bits of a namespace (URIs, prefixes) and use it
> anywhere that either == or equals() is used today. Implementations based
> on == and equals() would be provided, with the default implementation
> being equals()-based. A configuration option should then be made
> available to control which impl gets used. Additionally, it might even
> be possible to add some smarts that could detect known "good" parsers
> that use interning and automatically use the == based implementation.
> I do not recommend changing any part of the code without addressing the
> whole codebase (i.e. all the =='s need to be fixed or no change should
> be made) because of the possibility of creating new, unwanted, effects.
> The current functionality is undesirable but better the devil you know.
> I think that this should be addressed in the upcoming 1.4.4 release. If
> quick consensus can be reached, I'm willing to do the work in a window
> of time I have available over the next 2-3 weeks.
> [1]
> [2]
> [3]
> [4]

Chad La Joie
trusted identities, delivered
