Return-Path: Delivered-To: apmail-xml-xerces-j-dev-archive@www.apache.org Received: (qmail 80262 invoked from network); 3 Sep 2004 16:31:49 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur-2.apache.org with SMTP; 3 Sep 2004 16:31:49 -0000 Received: (qmail 66579 invoked by uid 500); 3 Sep 2004 16:31:47 -0000 Delivered-To: apmail-xml-xerces-j-dev-archive@xml.apache.org Received: (qmail 66552 invoked by uid 500); 3 Sep 2004 16:31:46 -0000 Mailing-List: contact xerces-j-dev-help@xml.apache.org; run by ezmlm Precedence: bulk Reply-To: xerces-j-dev@xml.apache.org list-help: list-unsubscribe: list-post: Delivered-To: mailing list xerces-j-dev@xml.apache.org Received: (qmail 66539 invoked by uid 99); 3 Sep 2004 16:31:46 -0000 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests= X-Spam-Check-By: apache.org Received-SPF: pass (hermes.apache.org: local policy) Received: from [128.3.41.24] (HELO mta1.lbl.gov) (128.3.41.24) by apache.org (qpsmtpd/0.28) with ESMTP; Fri, 03 Sep 2004 09:31:45 -0700 Received: from mta1.lbl.gov (localhost [127.0.0.1]) by mta1.lbl.gov (8.12.10/8.12.10) with ESMTP id i83GVdj5027240 for ; Fri, 3 Sep 2004 09:31:42 -0700 (PDT) Received: from [10.0.1.4] (adsl-67-125-77-9.dsl.snfc21.pacbell.net [67.125.77.9]) by mta1.lbl.gov (8.12.10/8.12.10) with ESMTP id i83GVWKE027220; Fri, 3 Sep 2004 09:31:32 -0700 (PDT) In-Reply-To: <41382C93.9070401@sun.com> References: <98622227-FD6C-11D8-AA38-000A95BD16CE@lbl.gov> <41380BC7.1050206@sun.com> <398E0F6C-FD7A-11D8-AA38-000A95BD16CE@lbl.gov> <41382C93.9070401@sun.com> Mime-Version: 1.0 (Apple Message framework v619) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Content-Transfer-Encoding: 7bit Cc: Neeraj.Bajaj@Sun.COM From: Wolfgang Hoschek Subject: Re: [Performance] Artifical EOFException in XMLEntityScanner.load() Date: Fri, 3 Sep 2004 09:31:32 -0700 To: xerces-j-dev@xml.apache.org X-Mailer: Apple Mail (2.619) X-Virus-Checked: Checked X-Spam-Rating: minotaur-2.apache.org 1.6.2 0/1000/N On Sep 3, 2004, at 1:34 AM, Neeraj Bajaj wrote: > > > Wolfgang Hoschek wrote: > >> Neeraj, actually there seem to be two separate issues (sorry for >> mixing them up): >> >> 1) The exception throwing hotspot (see prior mail for profiles): >> Observed when using an XMLReader without validation > > This is a problem when you are parsing small sized documents and is > reproducible with 1.5 RC and Xerces. Right ? Right. > This is understandable as when validation is not being done there > isn't any overhead of reading schema, doing validation etc. > So %time spent in throwing exception becomes high. The %time spent in throwing exceptions is higher in 1.5 RC (25% vs. 15%) according to the profiles I sent in the first mail. Perhaps raising those (artificial?) exceptions could be avoided in the first place via some code restructuring. > >> 2) Xerces HEAD is 2x faster than 1.5 RC xerces internal: >> Observed when using an XMLReader with validation against a W3C >> XML Schema when two schemas instead of one schema are specified with >> the >> "http://apache.org/xml/properties/schema/external-schemaLocation" >> property. Two schemas vs. one schema is the only diff in the setup. >> There is no performance regression when using one schema only. > > Ok. > >> Again, the profiles below seem to indicate Exception raising to make >> the difference. >> About the usage: In the setup the documents themselves do not specify >> schema locations - the property does. The schemas are pretty >> straightforward - nothing fancy. We are reusing the XMLReader object >> for each parse via a ThreadLocal. We initialize the XMLReader with a >> grammar pool along the following lines: > > Ah.. so you are using GrammarPool. Difference in performance means > that it is possible that Grammars are not being cached. I will look > into this too. It might not have been tested well. > Actually, JAXP 1.3 (JSR 206) [1] introduces a new validation > framework [2] (javax.xml.validation.SchemaFactory) It allows you to > preparse Schema and get hold of schema object, against which > application can parse various instance documents. (It is similar to > "Translets" where one can transform XML documents into different > format as per the style sheet.) Schemas are already parsed and > available as Object, this reduces the overhead of reading schema, > constructing internal memory reprsentation (as with GrammarCaching). > It also makes sure that instance document would > be validated against the Schema you have reference and not against any > un trusted Schema. This part have been tested pretty well. Great. Unfortunately at this time we are reluctant to make JAXP 1.3 a dependency :-(. > > If you have free cycles, Can you also try setting the schemaLocations > property using JAXP 1.2 [2] > "http://java.sun.com/xml/jaxp/properties/schemaLanguage", > "http://java.sun.com/xml/jaxp/properties/schemaSource" and see if the > result is same. This would help in debugging the problem. Tried the JAXP 1.2 setting you suggested (still with XMLGrammarPoolImpl), and it improves considerably, but there is still a regression with factor 1.5x (40 vs. 55 secs) due too extra exceptions (5% vs. 20% Throwable.fillInStackTrace). As always, using java -server. Here are the profiles for that: JDK 1.5 RC with internally packaged xerces (two W3C XML Schemas): ------------------------------------------------------ Compiled + native Method 4.1% 107 + 0 ExceptionBlob 3.5% 92 + 0 com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleStar tElement Stub + native Method 19.2% 0 + 500 java.lang.Throwable.fillInStackTrace 1.3% 0 + 34 java.lang.StrictMath.floor 1.3% 0 + 33 java.lang.Object.clone 21.7% 0 + 567 Total stub Thread-local ticks: 0.2% 5 Blocked (of total) 0.0% 1 Class loader 0.2% 5 Compilation Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM Thread-local ticks: 100.0% 1 Blocked (of total) Global summary of 54.15 seconds: 100.0% 2634 Received ticks 0.6% 17 Received GC ticks 9.9% 261 Compilation 0.0% 1 Class loader real 0m54.399s user 0m52.460s sys 0m0.150s JDK 1.5 RC with xerces CVS HEAD (two W3C XML Schemas): ------------------------------------------------------ Compiled + native Method 6.4% 126 + 0 org.apache.xerces.impl.xs.XMLSchemaValidator.handleStartElement Stub + native Method 5.3% 0 + 103 java.lang.Throwable.fillInStackTrace 3.4% 0 + 66 java.lang.Object.clone 3.0% 0 + 59 java.lang.StrictMath.floor 11.7% 0 + 228 Total stub Thread-local ticks: 0.1% 1 Blocked (of total) 0.1% 2 Class loader 0.1% 2 Compilation 0.3% 6 Unknown: thread_state Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM Thread-local ticks: 100.0% 1 Blocked (of total) Global summary of 40.60 seconds: 100.0% 1963 Received ticks 0.4% 7 Received GC ticks 11.2% 220 Compilation 0.1% 2 Class loader 0.3% 6 Unknown code real 0m40.909s user 0m39.250s sys 0m0.200s Thanks again for looking into this. Wolfgang. > > > Thanks, > Neeraj > > [1] http://www.jcp.org/en/jsr/detail?id=206 > [2] > http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/validation/package- > summary.html > [3] http://java.sun.com/xml/jaxp/change-requests-11.html > >> >> // These are stored in the order of preference. >> private static final String[] schemaParsers = { >> "org.apache.xerces.parsers.SAXParser", >> "com.sun.org.apache.xerces.internal.parsers.SAXParser" // JDK >> 1.5 >> }; >> >> // These are stored in the order of preference. >> private static final String[] schemaGrammarPools = { >> "org.apache.xerces.util.XMLGrammarPoolImpl", >> "com.sun.org.apache.xerces.internal.util.XMLGrammarPoolImpl" >> // JDK 1.5 >> }; >> >> XMLReader parser; >> for (int i = 0; i < schemaParsers.length; i++) { >> try { >> parser = >> XMLReaderFactory.createXMLReader(schemaParsers[i]); >> boolean wantReusableParser = true; >> if (wantReusableParser) { >> // This improves performance and, more >> importantly, prevents xerces >> // bugs/exceptions when the Builder is used more >> than once (at least for >> // xerces-2.6.2). I'm not sure how to actually >> reproduce those problems :-( >> // See >> http://xml.apache.org/xerces2-j/faq-grammars.html >> >> parser.setProperty("http://apache.org/xml/properties/internal/ >> grammar- pool", >> >> Class.forName(schemaGrammarPools[i]).newInstance()); >> } >> setupFeaturesAndProps(parser); >> } catch (.... >> ... >> } >> >> >> >> Here are the profiler snippets: >> >> JDK 1.5 RC with internally packaged xerces (two W3C XML Schemas): >> ------------------------------------------------------ >> >> Compiled + native Method >> 8.5% 355 + 0 ExceptionBlob >> 3.4% 141 + 1 >> com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleSt >> ar tElement >> >> >> Stub + native Method >> 38.2% 0 + 1603 java.lang.Throwable.fillInStackTrace >> 1.4% 0 + 60 java.lang.StrictMath.floor >> 39.6% 0 + 1663 Total stub >> >> Thread-local ticks: >> 0.1% 3 Blocked (of total) >> 0.1% 3 Class loader >> 0.1% 6 Compilation >> 0.1% 4 Unknown: thread_state >> >> >> Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM >> >> Thread-local ticks: >> 100.0% 1 Blocked (of total) >> >> >> Global summary of 85.75 seconds: >> 100.0% 4223 Received ticks >> 0.5% 21 Received GC ticks >> 6.7% 285 Compilation >> 0.1% 3 Class loader >> 0.1% 4 Unknown code >> >> real 1m26.020s >> user 1m23.680s >> sys 0m1.410s >> >> >> >> JDK 1.5 RC with xerces CVS HEAD (two W3C XML Schemas): >> ------------------------------------------------------ >> Compiled + native Method >> 6.1% 132 + 0 >> org.apache.xerces.impl.xs.XMLSchemaValidator.handleStartElement >> 4.7% 102 + 0 java.nio.ByteBuffer. >> 4.0% 87 + 0 >> org.apache.xerces.impl.dv.xs.XSSimpleTypeDecl.normalize >> 3.6% 78 + 0 >> org.apache.xerces.impl.dv.xs.AnyURIDV.getActualValue >> 3.4% 10 + 64 org.apache.xerces.impl.xs.XSModelImpl. >> >> Stub + native Method >> 4.6% 0 + 101 java.lang.Throwable.fillInStackTrace >> 2.1% 0 + 45 java.lang.StrictMath.floor >> 6.7% 0 + 146 Total stub >> >> Thread-local ticks: >> 0.3% 7 Blocked (of total) >> 0.0% 1 Class loader >> 0.2% 4 Compilation >> 0.1% 3 Unknown: thread_state >> >> >> Flat profile of 0.02 secs (1 total ticks): DestroyJavaVM >> >> Thread-local ticks: >> 100.0% 1 Blocked (of total) >> >> >> Global summary of 45.67 seconds: >> 100.0% 2207 Received ticks >> 0.9% 19 Received GC ticks >> 12.4% 274 Compilation >> 0.0% 1 Class loader >> 0.1% 3 Unknown code >> >> real 0m45.928s >> user 0m43.250s >> sys 0m0.220s >> >> Any insights? >> Wolfgang. >> >> On Sep 2, 2004, at 11:14 PM, Neeraj Bajaj wrote: >> >>> Interestingly we identified the same problem few weeks back and i >>> have the fix available on my local machine but didn't put back >>> because it's not that clean. >>> Currently the fix includes making some changes in >>> DocumentScannerImpl.java & XMLEntityScanner based on the depth of >>> entities and current entity being null. >>> I was looking for some thing more clean which could tell the scanner >>> about the end of Document. I will look into this again and put >>> back the change in next >>> few days. >>> >>>> I have a server app that parsers millions of smallish documents. >>>> >>>> Performance has been improved at lot by reusing XMLReaders. It's >>>> pretty good but could perhaps get better in light of the (perhaps >>>> dubious?) hints given by the profiler snippet below. >>>> >>>> Accordingly, the theory is that throwing an (artifical) >>>> EOFException in XMLEntityScanner.load() at the end of each >>>> document/entity consumes some 25% (JDK 1.5) and some 15% (JDK >>>> 1.4.2) of the total execution time, the single hottest spot in >>>> the program. Probably due too the heavy nature of exceptions and >>>> in particular Throwable.fillInStackTrace(). If this can indeed be >>>> confirmed by others, would it perhaps be possibly (and correct) >>>> to restructure the relevant xerces internals to avoid raising >>>> artificial exceptions for what appears to be normal program >>>> control flow (the documents and streams are fine)? >>>> >>>> Configuration: Sun JDK 1.5 RC and Sun JDK 1.4.2, xerces CVS head >>>> [never using the JDK internal xerces which appears to be twice as >>>> slow in this case, for whatever reason] >>> >>> >>> JDK 1.5 RC contains almost latest Xerces. Could you tell what are >>> you doing so that we can identify the problem and fix it ? >>> >>> >>> Thanks, >>> Neeraj >>> >>>> >>>> Here is the JDK 1.5 profiler snippet (java -server -Xprof): >>>> ----------------------------------------------------------- >>>> Stub + native Method >>>> 28.6% 0 + 487 java.lang.Throwable.fillInStackTrace >>>> 28.6% 0 + 487 Total stub >>>> >>>> Thread-local ticks: >>>> 0.1% 1 Blocked (of total) >>>> 0.1% 2 Class loader >>>> 0.1% 2 Compilation >>>> 0.2% 3 Unknown: thread_state >>>> >>>> Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM >>>> >>>> Thread-local ticks: >>>> 100.0% 1 Blocked (of total) >>>> >>>> >>>> Global summary of 35.44 seconds: >>>> 100.0% 1718 Received ticks >>>> 0.7% 12 Received GC ticks >>>> 9.7% 167 Compilation >>>> 0.1% 2 Class loader >>>> 0.2% 3 Unknown code >>>> >>>> real 0m35.715s >>>> user 0m34.170s >>>> sys 0m0.190s >>>> >>>> >>>> Here is the JDK 1.4 profiler snippet (java -server -Xprof): >>>> ----------------------------------------------------------- >>>> Stub + native Method >>>> 12.7% 4 + 239 java.lang.Throwable.fillInStackTrace >>>> 12.7% 4 + 239 Total stub >>>> >>>> Runtime stub + native Method >>>> 0.2% 3 + 0 _rethrow_Java >>>> 0.2% 3 + 0 Total runtime stubs >>>> >>>> Thread-local ticks: >>>> 3.1% 61 Blocked (of total) >>>> 0.4% 7 Interpreter >>>> 0.1% 2 Compilation >>>> 4.9% 93 Unknown: running frame >>>> >>>> >>>> Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM >>>> >>>> Thread-local ticks: >>>> 100.0% 1 Blocked (of total) >>>> >>>> >>>> Global summary of 43.25 seconds: >>>> 100.0% 2071 Received ticks >>>> 3.8% 79 Received GC ticks >>>> 6.2% 128 Compilation >>>> 0.5% 10 Other VM operations >>>> 0.3% 7 Interpreter >>>> 4.5% 93 Unknown code >>>> >>>> real 0m43.517s >>>> user 0m42.100s >>>> sys 0m0.530s >>>> >>>> >>>> >>>> Trace via java -server -agentlib:hprof=cpu=samples,depth=30: >>>> ----------------------------------------------------------- >>>> TRACE 300347: >>>> java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown >>>> line) >>>> java.lang.Throwable.(Throwable.java:181) >>>> java.lang.Exception.(Exception.java:29) >>>> java.io.IOException.(IOException.java:28) >>>> java.io.EOFException.(EOFException.java:32) >>>> org.apache.xerces.impl.XMLEntityScanner.load(>>> Source>:Unknown line) >>>> org.apache.xerces.impl.XMLEntityScanner.skipSpaces(>>> Source>:Unknown line) >>>> >>>> org.apache.xerces.impl.XMLDocumentScannerImpl$TrailingMiscDispatcher >>>> .d is patch(:Unknown line) >>>> >>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(< >>>> Un kn own Source>:Unknown line) >>>> org.apache.xerces.parsers.DTDConfiguration.parse(>>> Source>:Unknown line) >>>> org.apache.xerces.parsers.DTDConfiguration.parse(>>> Source>:Unknown line) >>>> org.apache.xerces.parsers.XMLParser.parse(>>> Source>:Unknown line) >>>> org.apache.xerces.parsers.AbstractSAXParser.parse(>>> Source>:Unknown line) >>>> nu.xom.Builder.build(Builder.java:786) >>>> nu.xom.Builder.build(Builder.java:569) >>>> >>>> gov.lbl.dsd.firefish.trash.XMLXomBench.main(XMLXomBench.java:62) >>>> >>>> >>>> I guess the relevant block is: >>>> ----------------------------------------------------------- >>>> >>>> XMLEntityScanner.load(...): >>>> ... >>>> if (changeEntity) { >>>> fEntityManager.endEntity(); >>>> if (fCurrentEntity == null) { >>>> throw new EOFException(); >>>> } >>>> // handle the trailing edges >>>> if (fCurrentEntity.position == >>>> fCurrentEntity.count) { >>>> load(0, true); >>>> } >>>> } >>>> >>>> >>>> Comments? >>>> >>>> >>>> -------------------------------------------------------------------- >>>> - >>>> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org >>>> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org >>>> >>> >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org >>> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org >> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org >> > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org > For additional commands, e-mail: xerces-j-dev-help@xml.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org For additional commands, e-mail: xerces-j-dev-help@xml.apache.org