Return-Path: Delivered-To: apmail-xml-xerces-j-dev-archive@www.apache.org Received: (qmail 36950 invoked from network); 3 Sep 2004 07:24:19 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur-2.apache.org with SMTP; 3 Sep 2004 07:24:19 -0000 Received: (qmail 81963 invoked by uid 500); 3 Sep 2004 07:24:15 -0000 Delivered-To: apmail-xml-xerces-j-dev-archive@xml.apache.org Received: (qmail 81929 invoked by uid 500); 3 Sep 2004 07:24:14 -0000 Mailing-List: contact xerces-j-dev-help@xml.apache.org; run by ezmlm Precedence: bulk Reply-To: xerces-j-dev@xml.apache.org list-help: list-unsubscribe: list-post: Delivered-To: mailing list xerces-j-dev@xml.apache.org Received: (qmail 81913 invoked by uid 99); 3 Sep 2004 07:24:14 -0000 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests= X-Spam-Check-By: apache.org Received-SPF: pass (hermes.apache.org: local policy) Received: from [128.3.41.24] (HELO mta1.lbl.gov) (128.3.41.24) by apache.org (qpsmtpd/0.28) with ESMTP; Fri, 03 Sep 2004 00:24:13 -0700 Received: from mta1.lbl.gov (localhost [127.0.0.1]) by mta1.lbl.gov (8.12.10/8.12.10) with ESMTP id i837O7j5001510 for ; Fri, 3 Sep 2004 00:24:10 -0700 (PDT) Received: from [10.0.1.4] (adsl-67-125-77-9.dsl.snfc21.pacbell.net [67.125.77.9]) by mta1.lbl.gov (8.12.10/8.12.10) with ESMTP id i837O0KE001502; Fri, 3 Sep 2004 00:24:00 -0700 (PDT) In-Reply-To: <41380BC7.1050206@sun.com> References: <98622227-FD6C-11D8-AA38-000A95BD16CE@lbl.gov> <41380BC7.1050206@sun.com> Mime-Version: 1.0 (Apple Message framework v619) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <398E0F6C-FD7A-11D8-AA38-000A95BD16CE@lbl.gov> Content-Transfer-Encoding: 7bit Cc: Neeraj.Bajaj@Sun.COM From: Wolfgang Hoschek Subject: Re: [Performance] Artifical EOFException in XMLEntityScanner.load() Date: Fri, 3 Sep 2004 00:24:00 -0700 To: xerces-j-dev@xml.apache.org X-Mailer: Apple Mail (2.619) X-Virus-Checked: Checked X-Spam-Rating: minotaur-2.apache.org 1.6.2 0/1000/N Neeraj, actually there seem to be two separate issues (sorry for mixing them up): 1) The exception throwing hotspot (see prior mail for profiles): Observed when using an XMLReader without validation 2) Xerces HEAD is 2x faster than 1.5 RC xerces internal: Observed when using an XMLReader with validation against a W3C XML Schema when two schemas instead of one schema are specified with the "http://apache.org/xml/properties/schema/external-schemaLocation" property. Two schemas vs. one schema is the only diff in the setup. There is no performance regression when using one schema only. Again, the profiles below seem to indicate Exception raising to make the difference. About the usage: In the setup the documents themselves do not specify schema locations - the property does. The schemas are pretty straightforward - nothing fancy. We are reusing the XMLReader object for each parse via a ThreadLocal. We initialize the XMLReader with a grammar pool along the following lines: // These are stored in the order of preference. private static final String[] schemaParsers = { "org.apache.xerces.parsers.SAXParser", "com.sun.org.apache.xerces.internal.parsers.SAXParser" // JDK 1.5 }; // These are stored in the order of preference. private static final String[] schemaGrammarPools = { "org.apache.xerces.util.XMLGrammarPoolImpl", "com.sun.org.apache.xerces.internal.util.XMLGrammarPoolImpl" // JDK 1.5 }; XMLReader parser; for (int i = 0; i < schemaParsers.length; i++) { try { parser = XMLReaderFactory.createXMLReader(schemaParsers[i]); boolean wantReusableParser = true; if (wantReusableParser) { // This improves performance and, more importantly, prevents xerces // bugs/exceptions when the Builder is used more than once (at least for // xerces-2.6.2). I'm not sure how to actually reproduce those problems :-( // See http://xml.apache.org/xerces2-j/faq-grammars.html parser.setProperty("http://apache.org/xml/properties/internal/grammar- pool", Class.forName(schemaGrammarPools[i]).newInstance()); } setupFeaturesAndProps(parser); } catch (.... ... } Here are the profiler snippets: JDK 1.5 RC with internally packaged xerces (two W3C XML Schemas): ------------------------------------------------------ Compiled + native Method 8.5% 355 + 0 ExceptionBlob 3.4% 141 + 1 com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleStar tElement Stub + native Method 38.2% 0 + 1603 java.lang.Throwable.fillInStackTrace 1.4% 0 + 60 java.lang.StrictMath.floor 39.6% 0 + 1663 Total stub Thread-local ticks: 0.1% 3 Blocked (of total) 0.1% 3 Class loader 0.1% 6 Compilation 0.1% 4 Unknown: thread_state Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM Thread-local ticks: 100.0% 1 Blocked (of total) Global summary of 85.75 seconds: 100.0% 4223 Received ticks 0.5% 21 Received GC ticks 6.7% 285 Compilation 0.1% 3 Class loader 0.1% 4 Unknown code real 1m26.020s user 1m23.680s sys 0m1.410s JDK 1.5 RC with xerces CVS HEAD (two W3C XML Schemas): ------------------------------------------------------ Compiled + native Method 6.1% 132 + 0 org.apache.xerces.impl.xs.XMLSchemaValidator.handleStartElement 4.7% 102 + 0 java.nio.ByteBuffer. 4.0% 87 + 0 org.apache.xerces.impl.dv.xs.XSSimpleTypeDecl.normalize 3.6% 78 + 0 org.apache.xerces.impl.dv.xs.AnyURIDV.getActualValue 3.4% 10 + 64 org.apache.xerces.impl.xs.XSModelImpl. Stub + native Method 4.6% 0 + 101 java.lang.Throwable.fillInStackTrace 2.1% 0 + 45 java.lang.StrictMath.floor 6.7% 0 + 146 Total stub Thread-local ticks: 0.3% 7 Blocked (of total) 0.0% 1 Class loader 0.2% 4 Compilation 0.1% 3 Unknown: thread_state Flat profile of 0.02 secs (1 total ticks): DestroyJavaVM Thread-local ticks: 100.0% 1 Blocked (of total) Global summary of 45.67 seconds: 100.0% 2207 Received ticks 0.9% 19 Received GC ticks 12.4% 274 Compilation 0.0% 1 Class loader 0.1% 3 Unknown code real 0m45.928s user 0m43.250s sys 0m0.220s Any insights? Wolfgang. On Sep 2, 2004, at 11:14 PM, Neeraj Bajaj wrote: > Interestingly we identified the same problem few weeks back and i > have the fix available on my local machine but didn't put back because > it's not that clean. > Currently the fix includes making some changes in > DocumentScannerImpl.java & XMLEntityScanner based on the depth of > entities and current entity being null. > I was looking for some thing more clean which could tell the scanner > about the end of Document. I will look into this again and put back > the change in next > few days. > >> I have a server app that parsers millions of smallish documents. >> >> Performance has been improved at lot by reusing XMLReaders. It's >> pretty good but could perhaps get better in light of the (perhaps >> dubious?) hints given by the profiler snippet below. >> >> Accordingly, the theory is that throwing an (artifical) EOFException >> in XMLEntityScanner.load() at the end of each document/entity >> consumes some 25% (JDK 1.5) and some 15% (JDK 1.4.2) of the total >> execution time, the single hottest spot in the program. Probably due >> too the heavy nature of exceptions and in particular >> Throwable.fillInStackTrace(). If this can indeed be confirmed by >> others, would it perhaps be possibly (and correct) to restructure the >> relevant xerces internals to avoid raising artificial exceptions for >> what appears to be normal program control flow (the documents and >> streams are fine)? >> >> Configuration: Sun JDK 1.5 RC and Sun JDK 1.4.2, xerces CVS head >> [never using the JDK internal xerces which appears to be twice as >> slow in this case, for whatever reason] > > JDK 1.5 RC contains almost latest Xerces. Could you tell what are you > doing so that we can identify the problem and fix it ? > > > Thanks, > Neeraj > >> >> Here is the JDK 1.5 profiler snippet (java -server -Xprof): >> ----------------------------------------------------------- >> Stub + native Method >> 28.6% 0 + 487 java.lang.Throwable.fillInStackTrace >> 28.6% 0 + 487 Total stub >> >> Thread-local ticks: >> 0.1% 1 Blocked (of total) >> 0.1% 2 Class loader >> 0.1% 2 Compilation >> 0.2% 3 Unknown: thread_state >> >> Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM >> >> Thread-local ticks: >> 100.0% 1 Blocked (of total) >> >> >> Global summary of 35.44 seconds: >> 100.0% 1718 Received ticks >> 0.7% 12 Received GC ticks >> 9.7% 167 Compilation >> 0.1% 2 Class loader >> 0.2% 3 Unknown code >> >> real 0m35.715s >> user 0m34.170s >> sys 0m0.190s >> >> >> Here is the JDK 1.4 profiler snippet (java -server -Xprof): >> ----------------------------------------------------------- >> Stub + native Method >> 12.7% 4 + 239 java.lang.Throwable.fillInStackTrace >> 12.7% 4 + 239 Total stub >> >> Runtime stub + native Method >> 0.2% 3 + 0 _rethrow_Java >> 0.2% 3 + 0 Total runtime stubs >> >> Thread-local ticks: >> 3.1% 61 Blocked (of total) >> 0.4% 7 Interpreter >> 0.1% 2 Compilation >> 4.9% 93 Unknown: running frame >> >> >> Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM >> >> Thread-local ticks: >> 100.0% 1 Blocked (of total) >> >> >> Global summary of 43.25 seconds: >> 100.0% 2071 Received ticks >> 3.8% 79 Received GC ticks >> 6.2% 128 Compilation >> 0.5% 10 Other VM operations >> 0.3% 7 Interpreter >> 4.5% 93 Unknown code >> >> real 0m43.517s >> user 0m42.100s >> sys 0m0.530s >> >> >> >> Trace via java -server -agentlib:hprof=cpu=samples,depth=30: >> ----------------------------------------------------------- >> TRACE 300347: >> java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown >> line) >> java.lang.Throwable.(Throwable.java:181) >> java.lang.Exception.(Exception.java:29) >> java.io.IOException.(IOException.java:28) >> java.io.EOFException.(EOFException.java:32) >> org.apache.xerces.impl.XMLEntityScanner.load(> Source>:Unknown line) >> org.apache.xerces.impl.XMLEntityScanner.skipSpaces(> Source>:Unknown line) >> >> org.apache.xerces.impl.XMLDocumentScannerImpl$TrailingMiscDispatcher.d >> is patch(:Unknown line) >> >> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(> kn own Source>:Unknown line) >> org.apache.xerces.parsers.DTDConfiguration.parse(> Source>:Unknown line) >> org.apache.xerces.parsers.DTDConfiguration.parse(> Source>:Unknown line) >> org.apache.xerces.parsers.XMLParser.parse(> Source>:Unknown line) >> org.apache.xerces.parsers.AbstractSAXParser.parse(> Source>:Unknown line) >> nu.xom.Builder.build(Builder.java:786) >> nu.xom.Builder.build(Builder.java:569) >> >> gov.lbl.dsd.firefish.trash.XMLXomBench.main(XMLXomBench.java:62) >> >> >> I guess the relevant block is: >> ----------------------------------------------------------- >> >> XMLEntityScanner.load(...): >> ... >> if (changeEntity) { >> fEntityManager.endEntity(); >> if (fCurrentEntity == null) { >> throw new EOFException(); >> } >> // handle the trailing edges >> if (fCurrentEntity.position == fCurrentEntity.count) { >> load(0, true); >> } >> } >> >> >> Comments? >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org >> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org >> > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org > For additional commands, e-mail: xerces-j-dev-help@xml.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org For additional commands, e-mail: xerces-j-dev-help@xml.apache.org