From: Eric.D.Friedman@WellsFargo.COM
To: axis-dev@xml.apache.org
Subject: RE: performance issues in WSDL2Java
Date: Thu, 26 Dec 2002 17:06:43 -0700
Message-ID: <8F6C90BF40FFD211948B0001FA7E51661186CED8@xcem-casfo-13.wellsfargo.com>

Drat, hit send too soon. Here's the second set of metrics, with a cache
in place on getComplexElementExtensionBase:

CPU SAMPLES BEGIN (total = 2444) Thu Dec 26 16:07:51 2002
rank   self  accum   count trace method
   1 41.33% 41.33%    1010   443 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
   2 23.73% 65.06%     580   445 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
   3  4.54% 69.60%     111   446 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
   4  2.82% 72.42%      69   455 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
   5  0.70% 73.12%      17   369 java.lang.String.substring
   6  0.65% 73.77%      16   325 org.apache.xerces.xni.XMLString.toString
   7  0.49% 74.26%      12   574 java.io.FileOutputStream.close0
   8  0.49% 74.75%      12   419 java.lang.String.substring
   9  0.41% 75.16%      10   371 java.lang.StringBuffer.
  10  0.41% 75.57%      10   468 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
  11  0.41% 75.98%      10   475 org.apache.axis.wsdl.symbolTable.Utils.getQNameFromPrefixedName
  12  0.41% 76.39%      10   501 org.apache.axis.wsdl.symbolTable.Utils.getQNameFromPrefixedName
  13  0.41% 76.80%      10   473 java.util.HashMap.addEntry

-----Original Message-----
From: Eric.D.Friedman@WellsFargo.COM [mailto:Eric.D.Friedman@WellsFargo.COM]
Sent: Thursday, December 26, 2002 4:06 PM
To: axis-dev@xml.apache.org
Subject: performance issues in WSDL2Java

I'm running WSDL2Java on some very large schemas (> 900 elements and
types) and have run into several performance bottlenecks. These are bad
enough to cause the generator to grind away for hours without producing
anything. After spending some time with a profiler (Hprof), I've
identified the following problems:

* In ...axis.wsdl.symbolTable.SymbolTable, there's a poorly chosen data
  structure that yields O(n^2) performance (possibly worse) -- the
  "types" Vector is subjected to multiple linear searches, in some cases
  from within nested loops.

  Suggestion: replace the types Vector with two Maps, one for
  QName -> Element and one for QName -> Type.

* In javax.xml.namespace.QName, the localName and namespaceURI are
  cached using String.intern(). This gets hit *a lot* from within
  org.apache.axis.wsdl.symbolTable.Utils.getNodeQName(). Removing the
  interned Strings and changing the QName.equals() implementation to use
  String.equals() instead of reference comparison yields a significant
  speedup.

* org.apache.axis.wsdl.symbolTable.SchemaUtils.getComplexElementExtensionBase()
  gets called many, many times from recursive invocations of
  org.apache.axis.wsdl.symbolTable.Utils.getDerivedTypes(). The
  invocation count approaches O(n^2), where n is the size of the types
  collection.
  Suggestion: most of the invocations of
  getComplexElementExtensionBase(Node, SymbolTable) are redundant -- the
  extension base of the complex type defined within Node does not change
  across the recursive calls, but the search for that extension base is
  quite expensive. A cache of previously searched Nodes is very helpful
  here. However, this method is static, so this resolution introduces a
  new problem -- how to scope the cache so that multiple instances of
  SymbolTable can coexist in the same VM? Two possibilities:

    * make the cache a parameter to the method
    * make the method an instance method rather than a class method.

In case anyone is interested, here are the CPU profiler "samples" when
the changes to types/QName are made but no cache is added. Most of these
calls to getNodeQName come from within getComplexElementExtensionBase.
Note that these are just *samples*, not an actual count of the # of
invocations made. Nonetheless, the methods identified above dominate the
performance profile of the app.

Note also that this sample was from a 10 minute run (WSDL2Java timed out
at that point) in the HotSpot server VM on an 8-way Solaris 8 machine,
Java 1.4.1, 32 gigs of RAM.

CPU SAMPLES BEGIN (total = 11690) Thu Dec 26 15:38:13 2002
rank   self  accum   count trace method
   1 27.19% 27.19%    3179   439 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
   2 17.62% 44.82%    2060   445 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
   3 16.05% 60.86%    1876   437 org.apache.axis.wsdl.symbolTable.Utils.getQNameFromPrefixedName
   4 11.14% 72.00%    1302   438 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
   5  6.63% 78.63%     775   440 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
   6  3.13% 81.76%     366   454 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
   7  2.57% 84.33%     300   448 org.apache.axis.wsdl.symbolTable.SchemaUtils.getComplexElementExtensionBase
   8  2.20% 86.53%     257   414 java.lang.String.substring
   9  1.80% 88.33%     211   465 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
  10  1.18% 89.51%     138   456 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
  11  0.78% 90.29%      91   458 org.apache.axis.wsdl.symbolTable.Utils.getNodeQName
  12  0.76% 91.05%      89   450 java.lang.String.substring

With a cache for getComplexElementExtensionBase added, this gets better,
though the quadratic algorithm for finding base types still dominates
the clock, as the second set of metrics above shows.
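
To make the first of the two cache-scoping options described earlier
concrete (passing the cache as a parameter to the static method), here
is a minimal sketch. It is not the Axis code: TypeNode,
searchExtensionBase and getExtensionBase are hypothetical stand-ins, and
the real method takes an org.w3c.dom.Node and a SymbolTable. The point
is only that each caller (for example, each SymbolTable instance) can
own its own Map, so the method can stay static without sharing state
across the VM.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class ExtensionBaseCacheSketch {

    /** Hypothetical stand-in for a schema node (the real code uses org.w3c.dom.Node). */
    static final class TypeNode {
        final String name;
        final String extensionBase; // null when the type extends nothing

        TypeNode(String name, String extensionBase) {
            this.name = name;
            this.extensionBase = extensionBase;
        }
    }

    /** Counts how often the expensive search actually runs. */
    static int searches = 0;

    /** Placeholder for the expensive search; the real one walks the DOM. */
    static String searchExtensionBase(TypeNode node) {
        searches++;
        return node.extensionBase;
    }

    /**
     * Static lookup with the cache supplied by the caller. containsKey is
     * used so that a null result ("no extension base") is cached as well.
     */
    static String getExtensionBase(TypeNode node, Map<TypeNode, String> cache) {
        if (cache.containsKey(node)) {
            return cache.get(node);
        }
        String base = searchExtensionBase(node);
        cache.put(node, base);
        return base;
    }

    public static void main(String[] args) {
        TypeNode base = new TypeNode("Base", null);
        TypeNode derived = new TypeNode("Derived", "Base");

        // One cache per (hypothetical) symbol table, keyed by node identity.
        Map<TypeNode, String> cache = new IdentityHashMap<>();

        for (int i = 0; i < 1000; i++) {
            getExtensionBase(base, cache);
            getExtensionBase(derived, cache);
        }
        System.out.println("lookups: 2000, expensive searches: " + searches);
    }
}
```

The second option (an instance method) would amount to the same thing
with the Map held as a field of the owning object instead of being
passed in on every call.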