Mailing-List: contact axis-dev-help@xml.apache.org; run by ezmlm
Precedence: bulk
Reply-To: axis-dev@xml.apache.org
Importance: Normal
To: axis-dev@xml.apache.org
Cc: xerces-j-dev@xml.apache.org
Subject: Re: cvs commit: xml-axis/java/src/org/apache/axis/utils
 XMLUtils.java
Message-ID: <OF566275B9.CE0626F1-ON88256A24.001AE9AE@LocalDomain>
From: "James M Snell" <jasnell@us.ibm.com>
Date: Tue, 3 Apr 2001 22:03:10 -0700
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

Arnaud,

These are the best answers I've heard yet ;-) ... thanks for taking the 
time... 

Ok... so, we have this built in modularity in Xerces 2...  this modularity 
would make it very easy to pull things out that we don't need.  I'll start 
working with that and see what I can get figured out. 

- James Snell
     Software Engineer, Emerging Technologies, IBM
     jasnell@us.ibm.com (online)
     jsnell@lemoorenet.com (offline)

Please respond to axis-dev@xml.apache.org 
To:     xerces-dev@xml.apache.org
cc:     axis-dev@xml.apache.org 
Subject:        Re: cvs commit: xml-axis/java/src/org/apache/axis/utils XMLUtils.java


James M Snell wrote:
>
> Now, perhaps the Xerces2 guys can help us out with this: how much of a
> speed enhancement will the Xerces2 parser have over Xerces1?

We don't know because depending on how it's used the performance will
vary greatly. In general, what I can tell you is that Xerces1 is
considered to be *too* optimized and to suffer from it in the sense that
its code is really hard to read. This might surprise you because in your
application Xerces1 appears to give very poor performance. But I believe
this is because you're using it in a way which is very different from
what it's been tuned for. Specifically, if I understand correctly, you
parse thousands of small documents. Xerces was optimized to parse one
big document as fast as possible. So the requirements are quite
different. There is probably some room for improvement in the direction
you need, because nobody has ever looked at the cost of resetting the
parser (between two parses).
As far as Xerces2 is concerned, the requirements that have been set so
far are not about performance but about modularity. So it is expected to
be slower rather than faster in general. But it would probably benefit
from the same optimization work on reset.
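
A minimal sketch of the "thousands of small documents" pattern described
above, assuming the standard JAXP API of a current JDK rather than the
Xerces1 classes available at the time. The class and method names
(ReusedParserSketch, rootNames) are hypothetical. It creates one
DocumentBuilder and resets it between parses, which is the reset step
whose cost is in question:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical illustration class; not part of Axis or Xerces.
class ReusedParserSketch {

    // Parses every small document with the same DocumentBuilder and
    // returns the local name of each document's root element.
    static String[] rootNames(String[] docs) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // The builder is created once, outside the loop.
        DocumentBuilder builder = factory.newDocumentBuilder();

        String[] names = new String[docs.length];
        for (int i = 0; i < docs.length; i++) {
            byte[] bytes = docs[i].getBytes(StandardCharsets.UTF_8);
            Document doc = builder.parse(new ByteArrayInputStream(bytes));
            names[i] = doc.getDocumentElement().getLocalName();
            // Return the builder to its initial configuration before reuse.
            builder.reset();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String[] docs = { "<a/>", "<b><c/></b>" };
        for (String name : rootNames(docs)) {
            System.out.println(name);
        }
    }
}
```

Timing the loop with and without a fresh builder per document would be
one way to see how much of the total cost is setup and reset rather than
parsing itself.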

> What we do need is an extremely
> fast, extremely efficient way of quickly extracting information from an
> XML structure.

While Xerces2 is expected to be slower in general, in some cases, because
it is more modular, it may be configured in a way that leads to much better
performance. For instance, if you never use a grammar, the validator can
easily be removed completely from the "pipeline". Earlier tests have
shown very promising results on that front...
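
As a rough illustration of running without a validator, the sketch below
asks JAXP for a non-validating SAX parser and counts elements. It assumes
the standard JAXP API of a current JDK and does not show the Xerces2
pipeline configuration itself; the names NonValidatingSketch and
countElements are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Axis or Xerces.
class NonValidatingSketch {

    // Counts the start tags in a document using a non-validating parser.
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);   // no grammar, so no validation
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        final int[] count = { 0 };
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><c/></a>"));
    }
}
```

Whether a given parser actually drops the validation stage when asked
this way, or merely skips its checks, depends on the implementation.
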
--
Arnaud  Le Hors - IBM Cupertino, XML Strategy Group