xerces-j-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wolfgang Hoschek <whosc...@lbl.gov>
Subject Re: [Performance] Artifical EOFException in XMLEntityScanner.load()
Date Fri, 03 Sep 2004 07:24:00 GMT
Neeraj, actually there seem to be two separate issues (sorry for mixing  
them up):

1) The exception throwing hotspot (see prior mail for profiles):
	Observed when using an XMLReader without validation
	
2) Xerces HEAD is 2x faster than 1.5 RC xerces internal:
	Observed when using an XMLReader with validation against a W3C XML  
Schema when two schemas instead of one schema are specified with the  
"http://apache.org/xml/properties/schema/external-schemaLocation"  
property. Two schemas vs. one schema is the only diff in the setup.  
There is no performance regression when using one schema only. Again,  
the profiles below seem to indicate Exception raising to make the  
difference.
About the usage: In the setup the documents themselves do not specify  
schema locations - the property does. The schemas are pretty  
straightforward - nothing fancy. We are reusing the XMLReader object  
for each parse via a ThreadLocal. We initialize the XMLReader with a  
grammar pool along the following lines:

	// These are stored in the order of preference.
	private static final String[] schemaParsers = {
		"org.apache.xerces.parsers.SAXParser",
		"com.sun.org.apache.xerces.internal.parsers.SAXParser" // JDK 1.5
	};

	// These are stored in the order of preference.
	private static final String[] schemaGrammarPools = {
		"org.apache.xerces.util.XMLGrammarPoolImpl",
		"com.sun.org.apache.xerces.internal.util.XMLGrammarPoolImpl" // JDK  
1.5
	};

		XMLReader parser;
		for (int i = 0; i < schemaParsers.length; i++) {
			try {
				parser = XMLReaderFactory.createXMLReader(schemaParsers[i]);
                    boolean wantReusableParser = true;
				if (wantReusableParser) {
					// This improves performance and, more importantly, prevents xerces
					// bugs/exceptions when the Builder is used more than once (at  
least for
					// xerces-2.6.2). I'm not sure how to actually reproduce those  
problems :-(
					// See http://xml.apache.org/xerces2-j/faq-grammars.html
				  
	parser.setProperty("http://apache.org/xml/properties/internal/grammar- 
pool",
						Class.forName(schemaGrammarPools[i]).newInstance());
				}
				setupFeaturesAndProps(parser);
               } catch (....
               ...
          }



Here are the profiler snippets:

JDK 1.5 RC with internally packaged xerces (two W3C XML Schemas):
------------------------------------------------------

      Compiled + native   Method
   8.5%   355  +     0    ExceptionBlob
   3.4%   141  +     1     
com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleStar 
tElement


          Stub + native   Method
  38.2%     0  +  1603    java.lang.Throwable.fillInStackTrace
   1.4%     0  +    60    java.lang.StrictMath.floor
  39.6%     0  +  1663    Total stub

   Thread-local ticks:
   0.1%     3             Blocked (of total)
   0.1%     3             Class loader
   0.1%     6             Compilation
   0.1%     4             Unknown: thread_state


Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM

   Thread-local ticks:
100.0%     1             Blocked (of total)


Global summary of 85.75 seconds:
100.0%  4223             Received ticks
   0.5%    21             Received GC ticks
   6.7%   285             Compilation
   0.1%     3             Class loader
   0.1%     4             Unknown code

real    1m26.020s
user    1m23.680s
sys     0m1.410s



JDK 1.5 RC with xerces CVS HEAD (two W3C XML Schemas):
------------------------------------------------------
      Compiled + native   Method
   6.1%   132  +     0     
org.apache.xerces.impl.xs.XMLSchemaValidator.handleStartElement
   4.7%   102  +     0    java.nio.ByteBuffer.<init>
   4.0%    87  +     0     
org.apache.xerces.impl.dv.xs.XSSimpleTypeDecl.normalize
   3.6%    78  +     0     
org.apache.xerces.impl.dv.xs.AnyURIDV.getActualValue
   3.4%    10  +    64    org.apache.xerces.impl.xs.XSModelImpl.<init>

          Stub + native   Method
   4.6%     0  +   101    java.lang.Throwable.fillInStackTrace
   2.1%     0  +    45    java.lang.StrictMath.floor
   6.7%     0  +   146    Total stub

   Thread-local ticks:
   0.3%     7             Blocked (of total)
   0.0%     1             Class loader
   0.2%     4             Compilation
   0.1%     3             Unknown: thread_state


Flat profile of 0.02 secs (1 total ticks): DestroyJavaVM

   Thread-local ticks:
100.0%     1             Blocked (of total)


Global summary of 45.67 seconds:
100.0%  2207             Received ticks
   0.9%    19             Received GC ticks
  12.4%   274             Compilation
   0.0%     1             Class loader
   0.1%     3             Unknown code

real    0m45.928s
user    0m43.250s
sys     0m0.220s

Any insights?
Wolfgang.

On Sep 2, 2004, at 11:14 PM, Neeraj Bajaj wrote:

> Interestingly we identified the same problem few weeks back and  i  
> have the fix available on my local machine but didn't put back because  
> it's not that clean.
> Currently the fix includes making some changes in  
> DocumentScannerImpl.java & XMLEntityScanner based on the depth of  
> entities and current entity being null.
> I was looking for some thing more clean which could tell the scanner  
> about the end of Document.  I will look into this again and put back  
> the change in next
> few days.
>
>> I have a server app that parsers millions of smallish documents.
>>
>> Performance has been improved at lot by reusing XMLReaders. It's  
>> pretty  good but could perhaps get better in light of the (perhaps  
>> dubious?)  hints given by the profiler snippet below.
>>
>> Accordingly, the theory is that throwing an (artifical) EOFException  
>> in  XMLEntityScanner.load() at the end of each document/entity  
>> consumes  some 25% (JDK 1.5) and some 15% (JDK 1.4.2) of the total  
>> execution  time, the single hottest spot in the program. Probably due  
>> too the  heavy nature of exceptions and in particular   
>> Throwable.fillInStackTrace(). If this can indeed be confirmed by   
>> others, would it perhaps be possibly (and correct) to restructure the  
>>  relevant xerces internals to avoid raising artificial exceptions for  
>>  what appears to be normal program control flow (the documents and   
>> streams are fine)?
>>
>> Configuration: Sun JDK 1.5 RC and Sun JDK 1.4.2, xerces CVS head  
>> [never  using the JDK internal xerces which appears to be twice as  
>> slow in this  case, for whatever reason]
>
> JDK 1.5 RC contains almost latest Xerces. Could you tell what are you  
> doing so that we can identify the problem and fix it ?
>
>
> Thanks,
> Neeraj
>
>>
>> Here is the JDK 1.5 profiler snippet (java -server -Xprof):
>> -----------------------------------------------------------
>>          Stub + native   Method
>>  28.6%     0  +   487    java.lang.Throwable.fillInStackTrace
>>  28.6%     0  +   487    Total stub
>>
>>   Thread-local ticks:
>>   0.1%     1             Blocked (of total)
>>   0.1%     2             Class loader
>>   0.1%     2             Compilation
>>   0.2%     3             Unknown: thread_state
>>
>> Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM
>>
>>   Thread-local ticks:
>> 100.0%     1             Blocked (of total)
>>
>>
>> Global summary of 35.44 seconds:
>> 100.0%  1718             Received ticks
>>   0.7%    12             Received GC ticks
>>   9.7%   167             Compilation
>>   0.1%     2             Class loader
>>   0.2%     3             Unknown code
>>
>> real    0m35.715s
>> user    0m34.170s
>> sys     0m0.190s
>>
>>
>> Here is the JDK 1.4 profiler snippet (java -server -Xprof):
>> -----------------------------------------------------------
>>         Stub + native   Method
>>  12.7%     4  +   239    java.lang.Throwable.fillInStackTrace
>>  12.7%     4  +   239    Total stub
>>
>>   Runtime stub + native  Method
>>   0.2%     3  +     0    _rethrow_Java
>>   0.2%     3  +     0    Total runtime stubs
>>
>>   Thread-local ticks:
>>   3.1%    61             Blocked (of total)
>>   0.4%     7             Interpreter
>>   0.1%     2             Compilation
>>   4.9%    93             Unknown: running frame
>>
>>
>> Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
>>
>>   Thread-local ticks:
>> 100.0%     1             Blocked (of total)
>>
>>
>> Global summary of 43.25 seconds:
>> 100.0%  2071             Received ticks
>>   3.8%    79             Received GC ticks
>>   6.2%   128             Compilation
>>   0.5%    10             Other VM operations
>>   0.3%     7             Interpreter
>>   4.5%    93             Unknown code
>>
>> real    0m43.517s
>> user    0m42.100s
>> sys     0m0.530s
>>
>>
>>
>> Trace via java -server -agentlib:hprof=cpu=samples,depth=30:
>> -----------------------------------------------------------
>> TRACE 300347:
>>         java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown   
>> line)
>>         java.lang.Throwable.<init>(Throwable.java:181)
>>         java.lang.Exception.<init>(Exception.java:29)
>>         java.io.IOException.<init>(IOException.java:28)
>>         java.io.EOFException.<init>(EOFException.java:32)
>>         org.apache.xerces.impl.XMLEntityScanner.load(<Unknown   
>> Source>:Unknown line)
>>         org.apache.xerces.impl.XMLEntityScanner.skipSpaces(<Unknown   
>> Source>:Unknown line)
>>           
>> org.apache.xerces.impl.XMLDocumentScannerImpl$TrailingMiscDispatcher.d 
>> is patch(<Unknown Source>:Unknown line)
>>           
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(<Un 
>> kn own Source>:Unknown line)
>>         org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown   
>> Source>:Unknown line)
>>         org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown   
>> Source>:Unknown line)
>>         org.apache.xerces.parsers.XMLParser.parse(<Unknown   
>> Source>:Unknown line)
>>         org.apache.xerces.parsers.AbstractSAXParser.parse(<Unknown   
>> Source>:Unknown line)
>>         nu.xom.Builder.build(Builder.java:786)
>>         nu.xom.Builder.build(Builder.java:569)
>>          
>> gov.lbl.dsd.firefish.trash.XMLXomBench.main(XMLXomBench.java:62)
>>
>>
>> I guess the relevant block is:
>> -----------------------------------------------------------
>>
>> XMLEntityScanner.load(...):
>>             ...
>>             if (changeEntity) {
>>                 fEntityManager.endEntity();
>>                 if (fCurrentEntity == null) {
>>                     throw new EOFException();
>>                 }
>>                 // handle the trailing edges
>>                 if (fCurrentEntity.position == fCurrentEntity.count) {
>>                     load(0, true);
>>                 }
>>             }
>>
>>
>> Comments?
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
>> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
>>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Mime
View raw message