xerces-j-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Neeraj Bajaj <Neeraj.Ba...@Sun.COM>
Subject Re: [Performance] Artifical EOFException in XMLEntityScanner.load()
Date Fri, 03 Sep 2004 08:34:27 GMT


Wolfgang Hoschek wrote:

> Neeraj, actually there seem to be two separate issues (sorry for 
> mixing  them up):
>
> 1) The exception throwing hotspot (see prior mail for profiles):
>     Observed when using an XMLReader without validation

This is a problem when you are parsing small sized documents and is 
reproducible with  1.5 RC and Xerces.  Right ?
This is understandable as when validation is not being done there isn't 
any overhead of reading schema, doing validation etc.
So %time spent in throwing exception becomes high.

>     
> 2) Xerces HEAD is 2x faster than 1.5 RC xerces internal:
>     Observed when using an XMLReader with validation against a W3C 
> XML  Schema when two schemas instead of one schema are specified with 
> the  
> "http://apache.org/xml/properties/schema/external-schemaLocation"  
> property. Two schemas vs. one schema is the only diff in the setup.  
> There is no performance regression when using one schema only. 

Ok.

> Again,  the profiles below seem to indicate Exception raising to make 
> the  difference.
> About the usage: In the setup the documents themselves do not specify  
> schema locations - the property does. The schemas are pretty  
> straightforward - nothing fancy. We are reusing the XMLReader object  
> for each parse via a ThreadLocal. We initialize the XMLReader with a  
> grammar pool along the following lines:

Ah.. so you are using GrammarPool.  Difference in performance means that 
it is possible  that Grammars are not being cached. I will look into 
this too. It might not have been tested well.
Actually,  JAXP 1.3 (JSR 206) [1]  introduces a new validation framework 
[2] (javax.xml.validation.SchemaFactory)  It allows you to preparse 
Schema and get hold of schema object, against which application can 
parse various instance documents.  (It is similar to "Translets" where 
one can transform XML documents into different format as per the style 
sheet.) Schemas are already parsed and
available as Object, this reduces the overhead of reading schema, 
constructing internal memory reprsentation (as with GrammarCaching). It 
also makes sure that instance document would
be validated against the Schema you have reference and not against any 
un trusted Schema.  This part have been tested pretty well.

If you have free cycles, Can you also try setting the schemaLocations 
property using JAXP 1.2 [2]  
"http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://java.sun.com/xml/jaxp/properties/schemaSource" and see if the 
result is same. This would help in debugging the problem.


Thanks,
Neeraj

[1] http://www.jcp.org/en/jsr/detail?id=206
[2] 
http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/validation/package-summary.html
[3] http://java.sun.com/xml/jaxp/change-requests-11.html

>
>     // These are stored in the order of preference.
>     private static final String[] schemaParsers = {
>         "org.apache.xerces.parsers.SAXParser",
>         "com.sun.org.apache.xerces.internal.parsers.SAXParser" // JDK 1.5
>     };
>
>     // These are stored in the order of preference.
>     private static final String[] schemaGrammarPools = {
>         "org.apache.xerces.util.XMLGrammarPoolImpl",
>         "com.sun.org.apache.xerces.internal.util.XMLGrammarPoolImpl" 
> // JDK  1.5
>     };
>
>         XMLReader parser;
>         for (int i = 0; i < schemaParsers.length; i++) {
>             try {
>                 parser = 
> XMLReaderFactory.createXMLReader(schemaParsers[i]);
>                    boolean wantReusableParser = true;
>                 if (wantReusableParser) {
>                     // This improves performance and, more 
> importantly, prevents xerces
>                     // bugs/exceptions when the Builder is used more 
> than once (at  least for
>                     // xerces-2.6.2). I'm not sure how to actually 
> reproduce those  problems :-(
>                     // See 
> http://xml.apache.org/xerces2-j/faq-grammars.html
>                   
>     parser.setProperty("http://apache.org/xml/properties/internal/grammar- 
> pool",
>                         
> Class.forName(schemaGrammarPools[i]).newInstance());
>                 }
>                 setupFeaturesAndProps(parser);
>               } catch (....
>               ...
>          }
>
>
>
> Here are the profiler snippets:
>
> JDK 1.5 RC with internally packaged xerces (two W3C XML Schemas):
> ------------------------------------------------------
>
>      Compiled + native   Method
>   8.5%   355  +     0    ExceptionBlob
>   3.4%   141  +     1     
> com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleStar 
> tElement
>
>
>          Stub + native   Method
>  38.2%     0  +  1603    java.lang.Throwable.fillInStackTrace
>   1.4%     0  +    60    java.lang.StrictMath.floor
>  39.6%     0  +  1663    Total stub
>
>   Thread-local ticks:
>   0.1%     3             Blocked (of total)
>   0.1%     3             Class loader
>   0.1%     6             Compilation
>   0.1%     4             Unknown: thread_state
>
>
> Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM
>
>   Thread-local ticks:
> 100.0%     1             Blocked (of total)
>
>
> Global summary of 85.75 seconds:
> 100.0%  4223             Received ticks
>   0.5%    21             Received GC ticks
>   6.7%   285             Compilation
>   0.1%     3             Class loader
>   0.1%     4             Unknown code
>
> real    1m26.020s
> user    1m23.680s
> sys     0m1.410s
>
>
>
> JDK 1.5 RC with xerces CVS HEAD (two W3C XML Schemas):
> ------------------------------------------------------
>      Compiled + native   Method
>   6.1%   132  +     0     
> org.apache.xerces.impl.xs.XMLSchemaValidator.handleStartElement
>   4.7%   102  +     0    java.nio.ByteBuffer.<init>
>   4.0%    87  +     0     
> org.apache.xerces.impl.dv.xs.XSSimpleTypeDecl.normalize
>   3.6%    78  +     0     
> org.apache.xerces.impl.dv.xs.AnyURIDV.getActualValue
>   3.4%    10  +    64    org.apache.xerces.impl.xs.XSModelImpl.<init>
>
>          Stub + native   Method
>   4.6%     0  +   101    java.lang.Throwable.fillInStackTrace
>   2.1%     0  +    45    java.lang.StrictMath.floor
>   6.7%     0  +   146    Total stub
>
>   Thread-local ticks:
>   0.3%     7             Blocked (of total)
>   0.0%     1             Class loader
>   0.2%     4             Compilation
>   0.1%     3             Unknown: thread_state
>
>
> Flat profile of 0.02 secs (1 total ticks): DestroyJavaVM
>
>   Thread-local ticks:
> 100.0%     1             Blocked (of total)
>
>
> Global summary of 45.67 seconds:
> 100.0%  2207             Received ticks
>   0.9%    19             Received GC ticks
>  12.4%   274             Compilation
>   0.0%     1             Class loader
>   0.1%     3             Unknown code
>
> real    0m45.928s
> user    0m43.250s
> sys     0m0.220s
>
> Any insights?
> Wolfgang.
>
> On Sep 2, 2004, at 11:14 PM, Neeraj Bajaj wrote:
>
>> Interestingly we identified the same problem few weeks back and  i  
>> have the fix available on my local machine but didn't put back 
>> because  it's not that clean.
>> Currently the fix includes making some changes in  
>> DocumentScannerImpl.java & XMLEntityScanner based on the depth of  
>> entities and current entity being null.
>> I was looking for some thing more clean which could tell the scanner  
>> about the end of Document.  I will look into this again and put back  
>> the change in next
>> few days.
>>
>>> I have a server app that parsers millions of smallish documents.
>>>
>>> Performance has been improved at lot by reusing XMLReaders. It's  
>>> pretty  good but could perhaps get better in light of the (perhaps  
>>> dubious?)  hints given by the profiler snippet below.
>>>
>>> Accordingly, the theory is that throwing an (artifical) 
>>> EOFException  in  XMLEntityScanner.load() at the end of each 
>>> document/entity  consumes  some 25% (JDK 1.5) and some 15% (JDK 
>>> 1.4.2) of the total  execution  time, the single hottest spot in the 
>>> program. Probably due  too the  heavy nature of exceptions and in 
>>> particular   Throwable.fillInStackTrace(). If this can indeed be 
>>> confirmed by   others, would it perhaps be possibly (and correct) to 
>>> restructure the   relevant xerces internals to avoid raising 
>>> artificial exceptions for   what appears to be normal program 
>>> control flow (the documents and   streams are fine)?
>>>
>>> Configuration: Sun JDK 1.5 RC and Sun JDK 1.4.2, xerces CVS head  
>>> [never  using the JDK internal xerces which appears to be twice as  
>>> slow in this  case, for whatever reason]
>>
>>
>> JDK 1.5 RC contains almost latest Xerces. Could you tell what are 
>> you  doing so that we can identify the problem and fix it ?
>>
>>
>> Thanks,
>> Neeraj
>>
>>>
>>> Here is the JDK 1.5 profiler snippet (java -server -Xprof):
>>> -----------------------------------------------------------
>>>          Stub + native   Method
>>>  28.6%     0  +   487    java.lang.Throwable.fillInStackTrace
>>>  28.6%     0  +   487    Total stub
>>>
>>>   Thread-local ticks:
>>>   0.1%     1             Blocked (of total)
>>>   0.1%     2             Class loader
>>>   0.1%     2             Compilation
>>>   0.2%     3             Unknown: thread_state
>>>
>>> Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM
>>>
>>>   Thread-local ticks:
>>> 100.0%     1             Blocked (of total)
>>>
>>>
>>> Global summary of 35.44 seconds:
>>> 100.0%  1718             Received ticks
>>>   0.7%    12             Received GC ticks
>>>   9.7%   167             Compilation
>>>   0.1%     2             Class loader
>>>   0.2%     3             Unknown code
>>>
>>> real    0m35.715s
>>> user    0m34.170s
>>> sys     0m0.190s
>>>
>>>
>>> Here is the JDK 1.4 profiler snippet (java -server -Xprof):
>>> -----------------------------------------------------------
>>>         Stub + native   Method
>>>  12.7%     4  +   239    java.lang.Throwable.fillInStackTrace
>>>  12.7%     4  +   239    Total stub
>>>
>>>   Runtime stub + native  Method
>>>   0.2%     3  +     0    _rethrow_Java
>>>   0.2%     3  +     0    Total runtime stubs
>>>
>>>   Thread-local ticks:
>>>   3.1%    61             Blocked (of total)
>>>   0.4%     7             Interpreter
>>>   0.1%     2             Compilation
>>>   4.9%    93             Unknown: running frame
>>>
>>>
>>> Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
>>>
>>>   Thread-local ticks:
>>> 100.0%     1             Blocked (of total)
>>>
>>>
>>> Global summary of 43.25 seconds:
>>> 100.0%  2071             Received ticks
>>>   3.8%    79             Received GC ticks
>>>   6.2%   128             Compilation
>>>   0.5%    10             Other VM operations
>>>   0.3%     7             Interpreter
>>>   4.5%    93             Unknown code
>>>
>>> real    0m43.517s
>>> user    0m42.100s
>>> sys     0m0.530s
>>>
>>>
>>>
>>> Trace via java -server -agentlib:hprof=cpu=samples,depth=30:
>>> -----------------------------------------------------------
>>> TRACE 300347:
>>>         
>>> java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown   line)
>>>         java.lang.Throwable.<init>(Throwable.java:181)
>>>         java.lang.Exception.<init>(Exception.java:29)
>>>         java.io.IOException.<init>(IOException.java:28)
>>>         java.io.EOFException.<init>(EOFException.java:32)
>>>         org.apache.xerces.impl.XMLEntityScanner.load(<Unknown   
>>> Source>:Unknown line)
>>>         
>>> org.apache.xerces.impl.XMLEntityScanner.skipSpaces(<Unknown   
>>> Source>:Unknown line)
>>>           
>>> org.apache.xerces.impl.XMLDocumentScannerImpl$TrailingMiscDispatcher.d 
>>> is patch(<Unknown Source>:Unknown line)
>>>           
>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(<Un 
>>> kn own Source>:Unknown line)
>>>         org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown   
>>> Source>:Unknown line)
>>>         org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown   
>>> Source>:Unknown line)
>>>         org.apache.xerces.parsers.XMLParser.parse(<Unknown   
>>> Source>:Unknown line)
>>>         org.apache.xerces.parsers.AbstractSAXParser.parse(<Unknown   
>>> Source>:Unknown line)
>>>         nu.xom.Builder.build(Builder.java:786)
>>>         nu.xom.Builder.build(Builder.java:569)
>>>          
>>> gov.lbl.dsd.firefish.trash.XMLXomBench.main(XMLXomBench.java:62)
>>>
>>>
>>> I guess the relevant block is:
>>> -----------------------------------------------------------
>>>
>>> XMLEntityScanner.load(...):
>>>             ...
>>>             if (changeEntity) {
>>>                 fEntityManager.endEntity();
>>>                 if (fCurrentEntity == null) {
>>>                     throw new EOFException();
>>>                 }
>>>                 // handle the trailing edges
>>>                 if (fCurrentEntity.position == fCurrentEntity.count) {
>>>                     load(0, true);
>>>                 }
>>>             }
>>>
>>>
>>> Comments?
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
>>> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
>> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
>



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Mime
View raw message