xerces-j-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wolfgang Hoschek <whosc...@lbl.gov>
Subject Re: [Performance] Artifical EOFException in XMLEntityScanner.load()
Date Fri, 03 Sep 2004 16:31:32 GMT
On Sep 3, 2004, at 1:34 AM, Neeraj Bajaj wrote:

>
>
> Wolfgang Hoschek wrote:
>
>> Neeraj, actually there seem to be two separate issues (sorry for  
>> mixing  them up):
>>
>> 1) The exception throwing hotspot (see prior mail for profiles):
>>     Observed when using an XMLReader without validation
>
> This is a problem when you are parsing small sized documents and is  
> reproducible with  1.5 RC and Xerces.  Right ?

Right.

> This is understandable as when validation is not being done there  
> isn't any overhead of reading schema, doing validation etc.
> So %time spent in throwing exception becomes high.

The %time spent in throwing exceptions is higher in 1.5 RC (25% vs.  
15%) according to the profiles I sent in the first mail. Perhaps  
raising those (artificial?) exceptions could be avoided in the first  
place via some code restructuring.

>
>>     2) Xerces HEAD is 2x faster than 1.5 RC xerces internal:
>>     Observed when using an XMLReader with validation against a W3C  
>> XML  Schema when two schemas instead of one schema are specified with  
>> the   
>> "http://apache.org/xml/properties/schema/external-schemaLocation"   
>> property. Two schemas vs. one schema is the only diff in the setup.   
>> There is no performance regression when using one schema only.
>
> Ok.
>
>> Again,  the profiles below seem to indicate Exception raising to make  
>> the  difference.
>> About the usage: In the setup the documents themselves do not specify  
>>  schema locations - the property does. The schemas are pretty   
>> straightforward - nothing fancy. We are reusing the XMLReader object   
>> for each parse via a ThreadLocal. We initialize the XMLReader with a   
>> grammar pool along the following lines:
>
> Ah.. so you are using GrammarPool.  Difference in performance means  
> that it is possible  that Grammars are not being cached. I will look  
> into this too. It might not have been tested well.
> Actually,  JAXP 1.3 (JSR 206) [1]  introduces a new validation  
> framework [2] (javax.xml.validation.SchemaFactory)  It allows you to  
> preparse Schema and get hold of schema object, against which  
> application can parse various instance documents.  (It is similar to  
> "Translets" where one can transform XML documents into different  
> format as per the style sheet.) Schemas are already parsed and
> available as Object, this reduces the overhead of reading schema,  
> constructing internal memory reprsentation (as with GrammarCaching).  
> It also makes sure that instance document would
> be validated against the Schema you have reference and not against any  
> un trusted Schema.  This part have been tested pretty well.

Great. Unfortunately at this time we are reluctant to make JAXP 1.3 a  
dependency :-(.

>
> If you have free cycles, Can you also try setting the schemaLocations  
> property using JAXP 1.2 [2]   
> "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
> "http://java.sun.com/xml/jaxp/properties/schemaSource" and see if the  
> result is same. This would help in debugging the problem.

Tried the JAXP 1.2 setting you suggested (still with  
XMLGrammarPoolImpl), and it improves considerably, but there is still a  
regression with factor 1.5x (40 vs. 55 secs) due too extra exceptions  
(5% vs. 20% Throwable.fillInStackTrace). As always, using java -server.  
Here are the profiles for that:

JDK 1.5 RC with internally packaged xerces (two W3C XML Schemas):
------------------------------------------------------
      Compiled + native   Method
   4.1%   107  +     0    ExceptionBlob
   3.5%    92  +     0     
com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleStar 
tElement

          Stub + native   Method
  19.2%     0  +   500    java.lang.Throwable.fillInStackTrace
   1.3%     0  +    34    java.lang.StrictMath.floor
   1.3%     0  +    33    java.lang.Object.clone
  21.7%     0  +   567    Total stub

   Thread-local ticks:
   0.2%     5             Blocked (of total)
   0.0%     1             Class loader
   0.2%     5             Compilation


Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM

   Thread-local ticks:
100.0%     1             Blocked (of total)


Global summary of 54.15 seconds:
100.0%  2634             Received ticks
   0.6%    17             Received GC ticks
   9.9%   261             Compilation
   0.0%     1             Class loader

real    0m54.399s
user    0m52.460s
sys     0m0.150s



JDK 1.5 RC with xerces CVS HEAD (two W3C XML Schemas):
------------------------------------------------------

      Compiled + native   Method
   6.4%   126  +     0     
org.apache.xerces.impl.xs.XMLSchemaValidator.handleStartElement

          Stub + native   Method
   5.3%     0  +   103    java.lang.Throwable.fillInStackTrace
   3.4%     0  +    66    java.lang.Object.clone
   3.0%     0  +    59    java.lang.StrictMath.floor
  11.7%     0  +   228    Total stub

   Thread-local ticks:
   0.1%     1             Blocked (of total)
   0.1%     2             Class loader
   0.1%     2             Compilation
   0.3%     6             Unknown: thread_state


Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM

   Thread-local ticks:
100.0%     1             Blocked (of total)


Global summary of 40.60 seconds:
100.0%  1963             Received ticks
   0.4%     7             Received GC ticks
  11.2%   220             Compilation
   0.1%     2             Class loader
   0.3%     6             Unknown code

real    0m40.909s
user    0m39.250s
sys     0m0.200s

Thanks again for looking into this.
Wolfgang.

>
>
> Thanks,
> Neeraj
>
> [1] http://www.jcp.org/en/jsr/detail?id=206
> [2]  
> http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/validation/package- 
> summary.html
> [3] http://java.sun.com/xml/jaxp/change-requests-11.html
>
>>
>>     // These are stored in the order of preference.
>>     private static final String[] schemaParsers = {
>>         "org.apache.xerces.parsers.SAXParser",
>>         "com.sun.org.apache.xerces.internal.parsers.SAXParser" // JDK  
>> 1.5
>>     };
>>
>>     // These are stored in the order of preference.
>>     private static final String[] schemaGrammarPools = {
>>         "org.apache.xerces.util.XMLGrammarPoolImpl",
>>         "com.sun.org.apache.xerces.internal.util.XMLGrammarPoolImpl"  
>> // JDK  1.5
>>     };
>>
>>         XMLReader parser;
>>         for (int i = 0; i < schemaParsers.length; i++) {
>>             try {
>>                 parser =  
>> XMLReaderFactory.createXMLReader(schemaParsers[i]);
>>                    boolean wantReusableParser = true;
>>                 if (wantReusableParser) {
>>                     // This improves performance and, more  
>> importantly, prevents xerces
>>                     // bugs/exceptions when the Builder is used more  
>> than once (at  least for
>>                     // xerces-2.6.2). I'm not sure how to actually  
>> reproduce those  problems :-(
>>                     // See  
>> http://xml.apache.org/xerces2-j/faq-grammars.html
>>                        
>> parser.setProperty("http://apache.org/xml/properties/internal/ 
>> grammar- pool",
>>                          
>> Class.forName(schemaGrammarPools[i]).newInstance());
>>                 }
>>                 setupFeaturesAndProps(parser);
>>               } catch (....
>>               ...
>>          }
>>
>>
>>
>> Here are the profiler snippets:
>>
>> JDK 1.5 RC with internally packaged xerces (two W3C XML Schemas):
>> ------------------------------------------------------
>>
>>      Compiled + native   Method
>>   8.5%   355  +     0    ExceptionBlob
>>   3.4%   141  +     1      
>> com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleSt 
>> ar tElement
>>
>>
>>          Stub + native   Method
>>  38.2%     0  +  1603    java.lang.Throwable.fillInStackTrace
>>   1.4%     0  +    60    java.lang.StrictMath.floor
>>  39.6%     0  +  1663    Total stub
>>
>>   Thread-local ticks:
>>   0.1%     3             Blocked (of total)
>>   0.1%     3             Class loader
>>   0.1%     6             Compilation
>>   0.1%     4             Unknown: thread_state
>>
>>
>> Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM
>>
>>   Thread-local ticks:
>> 100.0%     1             Blocked (of total)
>>
>>
>> Global summary of 85.75 seconds:
>> 100.0%  4223             Received ticks
>>   0.5%    21             Received GC ticks
>>   6.7%   285             Compilation
>>   0.1%     3             Class loader
>>   0.1%     4             Unknown code
>>
>> real    1m26.020s
>> user    1m23.680s
>> sys     0m1.410s
>>
>>
>>
>> JDK 1.5 RC with xerces CVS HEAD (two W3C XML Schemas):
>> ------------------------------------------------------
>>      Compiled + native   Method
>>   6.1%   132  +     0      
>> org.apache.xerces.impl.xs.XMLSchemaValidator.handleStartElement
>>   4.7%   102  +     0    java.nio.ByteBuffer.<init>
>>   4.0%    87  +     0      
>> org.apache.xerces.impl.dv.xs.XSSimpleTypeDecl.normalize
>>   3.6%    78  +     0      
>> org.apache.xerces.impl.dv.xs.AnyURIDV.getActualValue
>>   3.4%    10  +    64    org.apache.xerces.impl.xs.XSModelImpl.<init>
>>
>>          Stub + native   Method
>>   4.6%     0  +   101    java.lang.Throwable.fillInStackTrace
>>   2.1%     0  +    45    java.lang.StrictMath.floor
>>   6.7%     0  +   146    Total stub
>>
>>   Thread-local ticks:
>>   0.3%     7             Blocked (of total)
>>   0.0%     1             Class loader
>>   0.2%     4             Compilation
>>   0.1%     3             Unknown: thread_state
>>
>>
>> Flat profile of 0.02 secs (1 total ticks): DestroyJavaVM
>>
>>   Thread-local ticks:
>> 100.0%     1             Blocked (of total)
>>
>>
>> Global summary of 45.67 seconds:
>> 100.0%  2207             Received ticks
>>   0.9%    19             Received GC ticks
>>  12.4%   274             Compilation
>>   0.0%     1             Class loader
>>   0.1%     3             Unknown code
>>
>> real    0m45.928s
>> user    0m43.250s
>> sys     0m0.220s
>>
>> Any insights?
>> Wolfgang.
>>
>> On Sep 2, 2004, at 11:14 PM, Neeraj Bajaj wrote:
>>
>>> Interestingly we identified the same problem few weeks back and  i   
>>> have the fix available on my local machine but didn't put back  
>>> because  it's not that clean.
>>> Currently the fix includes making some changes in   
>>> DocumentScannerImpl.java & XMLEntityScanner based on the depth of   
>>> entities and current entity being null.
>>> I was looking for some thing more clean which could tell the scanner  
>>>  about the end of Document.  I will look into this again and put  
>>> back  the change in next
>>> few days.
>>>
>>>> I have a server app that parsers millions of smallish documents.
>>>>
>>>> Performance has been improved at lot by reusing XMLReaders. It's   
>>>> pretty  good but could perhaps get better in light of the (perhaps   
>>>> dubious?)  hints given by the profiler snippet below.
>>>>
>>>> Accordingly, the theory is that throwing an (artifical)  
>>>> EOFException  in  XMLEntityScanner.load() at the end of each  
>>>> document/entity  consumes  some 25% (JDK 1.5) and some 15% (JDK  
>>>> 1.4.2) of the total  execution  time, the single hottest spot in  
>>>> the program. Probably due  too the  heavy nature of exceptions and  
>>>> in particular   Throwable.fillInStackTrace(). If this can indeed be  
>>>> confirmed by   others, would it perhaps be possibly (and correct)  
>>>> to restructure the   relevant xerces internals to avoid raising  
>>>> artificial exceptions for   what appears to be normal program  
>>>> control flow (the documents and   streams are fine)?
>>>>
>>>> Configuration: Sun JDK 1.5 RC and Sun JDK 1.4.2, xerces CVS head   
>>>> [never  using the JDK internal xerces which appears to be twice as   
>>>> slow in this  case, for whatever reason]
>>>
>>>
>>> JDK 1.5 RC contains almost latest Xerces. Could you tell what are  
>>> you  doing so that we can identify the problem and fix it ?
>>>
>>>
>>> Thanks,
>>> Neeraj
>>>
>>>>
>>>> Here is the JDK 1.5 profiler snippet (java -server -Xprof):
>>>> -----------------------------------------------------------
>>>>          Stub + native   Method
>>>>  28.6%     0  +   487    java.lang.Throwable.fillInStackTrace
>>>>  28.6%     0  +   487    Total stub
>>>>
>>>>   Thread-local ticks:
>>>>   0.1%     1             Blocked (of total)
>>>>   0.1%     2             Class loader
>>>>   0.1%     2             Compilation
>>>>   0.2%     3             Unknown: thread_state
>>>>
>>>> Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM
>>>>
>>>>   Thread-local ticks:
>>>> 100.0%     1             Blocked (of total)
>>>>
>>>>
>>>> Global summary of 35.44 seconds:
>>>> 100.0%  1718             Received ticks
>>>>   0.7%    12             Received GC ticks
>>>>   9.7%   167             Compilation
>>>>   0.1%     2             Class loader
>>>>   0.2%     3             Unknown code
>>>>
>>>> real    0m35.715s
>>>> user    0m34.170s
>>>> sys     0m0.190s
>>>>
>>>>
>>>> Here is the JDK 1.4 profiler snippet (java -server -Xprof):
>>>> -----------------------------------------------------------
>>>>         Stub + native   Method
>>>>  12.7%     4  +   239    java.lang.Throwable.fillInStackTrace
>>>>  12.7%     4  +   239    Total stub
>>>>
>>>>   Runtime stub + native  Method
>>>>   0.2%     3  +     0    _rethrow_Java
>>>>   0.2%     3  +     0    Total runtime stubs
>>>>
>>>>   Thread-local ticks:
>>>>   3.1%    61             Blocked (of total)
>>>>   0.4%     7             Interpreter
>>>>   0.1%     2             Compilation
>>>>   4.9%    93             Unknown: running frame
>>>>
>>>>
>>>> Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
>>>>
>>>>   Thread-local ticks:
>>>> 100.0%     1             Blocked (of total)
>>>>
>>>>
>>>> Global summary of 43.25 seconds:
>>>> 100.0%  2071             Received ticks
>>>>   3.8%    79             Received GC ticks
>>>>   6.2%   128             Compilation
>>>>   0.5%    10             Other VM operations
>>>>   0.3%     7             Interpreter
>>>>   4.5%    93             Unknown code
>>>>
>>>> real    0m43.517s
>>>> user    0m42.100s
>>>> sys     0m0.530s
>>>>
>>>>
>>>>
>>>> Trace via java -server -agentlib:hprof=cpu=samples,depth=30:
>>>> -----------------------------------------------------------
>>>> TRACE 300347:
>>>>         java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown  
>>>>   line)
>>>>         java.lang.Throwable.<init>(Throwable.java:181)
>>>>         java.lang.Exception.<init>(Exception.java:29)
>>>>         java.io.IOException.<init>(IOException.java:28)
>>>>         java.io.EOFException.<init>(EOFException.java:32)
>>>>         org.apache.xerces.impl.XMLEntityScanner.load(<Unknown    
>>>> Source>:Unknown line)
>>>>         org.apache.xerces.impl.XMLEntityScanner.skipSpaces(<Unknown  
>>>>   Source>:Unknown line)
>>>>            
>>>> org.apache.xerces.impl.XMLDocumentScannerImpl$TrailingMiscDispatcher 
>>>> .d is patch(<Unknown Source>:Unknown line)
>>>>            
>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(< 
>>>> Un kn own Source>:Unknown line)
>>>>         org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown    
>>>> Source>:Unknown line)
>>>>         org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown    
>>>> Source>:Unknown line)
>>>>         org.apache.xerces.parsers.XMLParser.parse(<Unknown    
>>>> Source>:Unknown line)
>>>>         org.apache.xerces.parsers.AbstractSAXParser.parse(<Unknown   
>>>>  Source>:Unknown line)
>>>>         nu.xom.Builder.build(Builder.java:786)
>>>>         nu.xom.Builder.build(Builder.java:569)
>>>>           
>>>> gov.lbl.dsd.firefish.trash.XMLXomBench.main(XMLXomBench.java:62)
>>>>
>>>>
>>>> I guess the relevant block is:
>>>> -----------------------------------------------------------
>>>>
>>>> XMLEntityScanner.load(...):
>>>>             ...
>>>>             if (changeEntity) {
>>>>                 fEntityManager.endEntity();
>>>>                 if (fCurrentEntity == null) {
>>>>                     throw new EOFException();
>>>>                 }
>>>>                 // handle the trailing edges
>>>>                 if (fCurrentEntity.position ==  
>>>> fCurrentEntity.count) {
>>>>                     load(0, true);
>>>>                 }
>>>>             }
>>>>
>>>>
>>>> Comments?
>>>>
>>>>
>>>> -------------------------------------------------------------------- 
>>>> -
>>>> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
>>>> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
>>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
>>> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
>> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
>>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Mime
View raw message