From: "James M Snell"
To: axis-dev@xml.apache.org
Cc: xerces-j-dev@xml.apache.org
Date: Tue, 3 Apr 2001 22:03:10 -0700
Subject: Re: cvs commit: xml-axis/java/src/org/apache/axis/utils XMLUtils.java

Arnaud,

These are the best answers I've heard yet ;-) ... thanks for taking the time...

Ok... so, we have this built-in modularity in Xerces 2... this modularity would make it very easy to pull things out that we don't need. I'll start working with that and see what I can get figured out.
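Here is a minimal sketch of what pulling out unneeded work can look like from application code. It assumes the standard JAXP API (javax.xml.parsers) running on a Xerces-derived parser, such as the one bundled with current JDKs, and not the Xerces2 XNI configuration classes discussed in this thread. The names LeanParserConfig and newLeanBuilder are hypothetical. The sketch turns off DTD validation and, through the Xerces feature URI, the loading of external DTDs, so no grammar is read for messages that never need one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class LeanParserConfig {
    // Hypothetical helper: builds a DocumentBuilder with grammar-related work off.
    // Assumption: a Xerces-based JAXP implementation (e.g. the JDK default);
    // other implementations may reject the Xerces-specific feature URI below.
    static DocumentBuilder newLeanBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);     // no DTD validation
        factory.setNamespaceAware(true);  // SOAP envelopes rely on namespaces
        // Xerces feature: do not fetch an external DTD when not validating.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder();
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = newLeanBuilder();
        String xml = "<env:Envelope xmlns:env='http://schemas.xmlsoap.org/soap/envelope/'>"
                + "<env:Body/></env:Envelope>";
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getLocalName()); // prints "Envelope"
    }
}
```

Whether the validator component is physically removed from the pipeline or merely bypassed depends on the parser's internal configuration. This sketch only shows the switches an application can flip.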
- James Snell
     Software Engineer, Emerging Technologies, IBM
     jasnell@us.ibm.com (online)
     jsnell@lemoorenet.com (offline)


Please respond to axis-dev@xml.apache.org
To: xerces-dev@xml.apache.org
cc: axis-dev@xml.apache.org
Subject: Re: cvs commit: xml-axis/java/src/org/apache/axis/utils XMLUtils.java



James M Snell wrote:
>
> Now, perhaps the Xerces2 guys can help us out with this: how much of a
> speed enhancement will the Xerces2 parser have over Xerces1?

We don't know, because depending on how it's used, the performance will vary greatly. In general, what I can tell you is that Xerces1 is considered to be *too* optimized and to suffer from it, in the sense that its code is really hard to read.

This might surprise you, because in your application Xerces1 appears to give very poor performance. But I believe this is because you're using it in a way that is very different from what it's been tuned for. Specifically, if I understand correctly, you parse thousands of small documents. Xerces was optimized to parse one big document as fast as possible, so the requirements are quite different. There is probably some room for improvement in the direction you need, because nobody has ever looked at the cost of resetting the parser between two parses.

As far as Xerces2 is concerned, the requirements set so far are about modularity, not performance, so in general it is expected to be slower rather than faster. But it would probably benefit from the same optimization work on reset.

> What we do need is an extremely
> fast, extremely efficient way of quickly extracting information from an
> XML structure.

While Xerces2 is expected to be slower in general, in some cases, because it is more modular, it may be configured in a way that leads to much better performance. For instance, if you never use a grammar, the validator can easily be removed completely from the "pipeline". Earlier tests have shown very promising results on that front...
-- Arnaud Le Hors - IBM Cupertino, XML Strategy Group
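The reset point above can be illustrated with the usage pattern it matters for: one parser reused across many small documents instead of a new parser per message. This is a sketch assuming modern JAXP, whose DocumentBuilder.reset() (added in Java 5) postdates this thread. ReusedParser and rootName are hypothetical names, and nothing here measures what a reset actually costs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ReusedParser {
    // Created once and reused; building the factory and builder is the
    // one-time setup cost that a new-parser-per-message approach repeats.
    private final DocumentBuilder builder;

    ReusedParser() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        this.builder = factory.newDocumentBuilder();
    }

    // Parses one small message and returns its root element's local name.
    // The builder is reset afterwards so the next parse starts from the
    // same state as a freshly created builder.
    String rootName(String xml) throws Exception {
        try {
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getLocalName();
        } finally {
            builder.reset();
        }
    }

    public static void main(String[] args) throws Exception {
        ReusedParser parser = new ReusedParser();
        List<String> messages = List.of("<a/>", "<b><c/></b>", "<d x='1'/>");
        for (String m : messages) {
            System.out.println(parser.rootName(m)); // prints a, then b, then d
        }
    }
}
```

DocumentBuilder is not thread-safe, so a server handling concurrent requests would need one builder per thread (for example, held in a ThreadLocal) and could not share a single instance.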