axis-java-dev mailing list archives

From "James M Snell" <>
Subject Re: cvs commit: xml-axis/java/src/org/apache/axis/utils XMLUtils.java
Date Wed, 04 Apr 2001 05:03:10 GMT

These are the best answers I've heard yet ;-) ... thanks for taking the 

Ok... so, we have this built-in modularity in Xerces 2...  this modularity 
would make it very easy to pull out the things that we don't need.  I'll start 
working with that and see what I can get figured out. 

- James Snell
     Software Engineer, Emerging Technologies, IBM (online) (offline)

Please respond to 
Subject:        Re: cvs commit: xml-axis/java/src/org/apache/axis/utils XMLUtils.java

James M Snell wrote:
>    Now, perhaps the Xerces2 guys can help us out with this:  how much of
>    speed enhancement will the Xerces2 parser have over Xerces1?

We don't know because depending on how it's used the performance will
vary greatly. In general, what I can tell you is that Xerces1 is
considered to be *too* optimized and to suffer from it in the sense that
its code is really hard to read. This might surprise you because in your
application Xerces1 appears to give very poor performance. But I believe
this is because you're using it in a way which is very different from
what it's been tuned for. Specifically, if I understand correctly, you
parse thousands of small documents. Xerces was optimized to parse one
big document as fast as possible. So the requirements are quite
different. There is probably some room for improvement in the direction
you need, because nobody has ever looked at the cost of resetting the
parser (between two parses).
As far as Xerces2 is concerned the requirements that have been set so
far are not about performance but about modularity. So it is expected to
be slower rather than faster in general. But it would probably benefit
from the same optimization work on reset.
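
To make the "thousands of small documents" pattern concrete, here is a
minimal sketch of reusing one parser instance and resetting it between
parses instead of building a new parser per document. It is an
illustration only: it uses the generic JAXP API rather than the Xerces
classes discussed in this thread, it assumes a JAXP implementation that
supports SAXParser.reset(), and the class and method names
(ReusedParserSketch, ElementCounter, countElements) are made up for the
example. It says nothing about how expensive the reset actually is.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Axis or Xerces.
final class ReusedParserSketch {

    /** Counts start tags; stands in for whatever a real handler would extract. */
    static final class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    /** Parses every document with a single parser, resetting it between parses. */
    static int countElements(String[] documents) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser(); // created once, outside the loop

        ElementCounter counter = new ElementCounter();
        for (String doc : documents) {
            parser.parse(new InputSource(new StringReader(doc)), counter);
            parser.reset(); // return the parser to its initial configuration
        }
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        String[] docs = { "<a><b/></a>", "<c/>" };
        System.out.println(countElements(docs));
    }
}
```

Timing this loop against a variant that calls newSAXParser() for every
document would be one way to see how much of the per-message cost is
setup and reset rather than parsing.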

> What we do need is an extremely
> fast, extremely efficient way of quickly extracting information from an
> XML structure.

While Xerces2 is expected to be slower in general, in some cases, because
it is more modular, it may be set up in a way which leads to much better
performance. For instance, if you never use a grammar, the validator can
easily be removed completely from the "pipeline". Earlier tests have
shown very promising results on that front...
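
As a rough illustration of running without the validator, the sketch
below parses the same document with validation switched off and on
through the generic JAXP API. This is an assumption-laden example, not
the Xerces2 configuration mechanism described above: whether a given
parser really drops the validator component from its pipeline when
validation is off is up to the implementation, and the names
(ValidationToggleSketch, ErrorCounter, validityErrors) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Axis or Xerces.
final class ValidationToggleSketch {

    /** Counts recoverable errors, which is how SAX reports validity violations. */
    static final class ErrorCounter extends DefaultHandler {
        int errors;

        @Override
        public void error(SAXParseException e) {
            errors++;
        }
    }

    /** Parses xml once and returns the number of validity errors reported. */
    static int validityErrors(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating); // false asks for a non-validating parser
        SAXParser parser = factory.newSAXParser();

        ErrorCounter counter = new ErrorCounter();
        parser.parse(new InputSource(new StringReader(xml)), counter);
        return counter.errors;
    }

    public static void main(String[] args) throws Exception {
        // Well-formed, but invalid against its own internal DTD (a is declared EMPTY).
        String xml = "<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>";
        System.out.println("validation off: " + validityErrors(xml, false));
        System.out.println("validation on:  " + validityErrors(xml, true));
    }
}
```

With validation off the document is only checked for well-formedness, so
the grammar is never consulted and no validity errors are reported.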

Arnaud Le Hors - IBM Cupertino, XML Strategy Group
