axis-java-dev mailing list archives

From "Steve Graham" <>
Subject Re: The Great Debate: Xml Parsers
Date Wed, 21 Mar 2001 21:10:44 GMT
Performance/Scalability support with standards support is ideal.
If forced to choose, performance/scalability has to trump standards

At the end of the day what is used internally is irrelevant to the outside
world, but the performance of Axis and its ability to scale IS important.
If we can provide a DOM api in the programming model, as an option, then


Steve Graham
(919)254-0615 (T/L 444)
Web Services Architect
Emerging Internet Technologies

James M Snell/Fresno/IBM@IBMUS on 03/21/2001 12:25:20 PM

Please respond to

Subject:  The Great Debate: Xml Parsers


(I'm cross-posting this to the Xerces-dev list so our friends on the
parser-side of things can follow along and join in)

As many of you know, we've had discussions in the past about which Xml
Parser to use as the core of the Axis message processing API.  Throughout
the course of this discussion, we've touched on several issues that have
become core requirements of Axis and need to drive our decision.  These
requirements are:

   1  Axis must not force the entire message object model to be in memory
at one time.  In other words, DOM is out.
   2  Axis must be very fast and very scalable in order to be widely
adopted over other Web Service implementation platforms.
   3  We must be able to independently parse individual elements of the
message either as raw bits, SAX, the Axis defined Message API, DOM or
whatever else the user wants.
   4  We must be able to fully support SOAP semantics (i.e. multiref
elements, id/href, etc.) without an overly negative impact on performance
(see numbers 1 and 2).
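
To make requirements 1 and 3 concrete, here is a minimal sketch of pull-style
processing: the caller asks for the next event and stops as soon as it has
what it needs, so no tree for the whole message is ever built. This is an
illustration only. It uses the JDK's StAX API (javax.xml.stream), which
postdates this thread and is not XPP's interface; the class and method names
(PullSketch, firstBodyChild) are made up for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of Axis or XPP.
public class PullSketch {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // True when the reader is positioned on a soap:Body start tag.
    private static boolean isSoapBody(XMLStreamReader reader) {
        return SOAP_NS.equals(reader.getNamespaceURI())
                && "Body".equals(reader.getLocalName());
    }

    // Returns the local name of the first child element of soap:Body,
    // or null if the Body is empty or missing. Parsing stops at that
    // point; the rest of the message is never read.
    public static String firstBodyChild(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            boolean inBody = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (inBody) {
                        return reader.getLocalName();
                    }
                    inBody = isSoapBody(reader);
                } else if (event == XMLStreamConstants.END_ELEMENT && inBody) {
                    return null; // Body closed without any child element
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<soap:Envelope xmlns:soap=\"" + SOAP_NS + "\">"
                + "<soap:Header><h:trace xmlns:h=\"urn:example\"/></soap:Header>"
                + "<soap:Body><m:getQuote xmlns:m=\"urn:example\">"
                + "<symbol>IBM</symbol></m:getQuote></soap:Body>"
                + "</soap:Envelope>";
        System.out.println(firstBodyChild(xml)); // prints "getQuote"
    }
}
```

A DOM-based version would have to parse and hold the entire envelope before
the first element could be inspected, which is what requirement 1 rules out.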

We've looked at Xerces, we've looked at JDOM, and most recently I've been
doing some work with a new Xml Pull Parser developed originally by
Aleksander Slominski as part of a research project for Indiana Univ. Below
is a basic summary of our thoughts thus far:

Xerces 1.x ->  Our concern with Xerces 1.x DOM is that it is slow, huge,
and complicated.  These are the standard complaints with DOM that we've
all heard (note to the Xerces guys:  I eagerly await the release of
Xerces2 ! :-) ....)  It just won't scale well in the types of environments
in which we foresee Axis being deployed, which include limited-capacity
devices such as handhelds (where it probably wouldn't work at all, simply
due to its size).

We also looked at SAX as an alternative but quickly determined that SAX
just was not adequate for proper SOAP processing that also met the
requirements mentioned above.  (for those of you who weren't part of that
discussion, I will not rehash it here, ping me later and I'll give you the

JDOM -> While JDOM is smaller and faster than Xerces and DOM, which is
nice, it still does not meet our requirements listed above.  An additional
issue raised internally at IBM was that JDOM is nowhere near being a
standard yet.  (As some of you may know, the current Axis codebase uses
JDOM for its message processing).  We've all pretty much decided already
that JDOM should be removed from the core and should be replaced with a
lightweight XML parser that meets the requirements.

Xml Pull Parser (XPP) -> XPP is a lightweight (23k) pull parser that is
completely namespace aware and XML 1.0 compliant.  Its interface needs
quite a bit of work so I've been working with the author on getting it
cleaned up.  XPP has two advantages: 1. it's small, 2. it's fast.  The
parser was originally implemented as part of a research project comparing
the performance of various parsers in relation to SOAP-deserialization.
I'll have to try to dig up the results of their tests again, but XPP
outperformed nearly everything else available.   XPP would meet each of
our requirements once the interface redesign is complete.  This interface
redesign includes building a SAX layer over the parser's primary

Now, here's what we need to decide:

Which is more important: Performance/Scalability or Standards support?

From earlier decisions, I believe that we have agreed that performance and
scalability in the case of Axis far outweigh standards support within the
core engine itself as long as there are hooks specifically designed into
the engine that allow full standards support if the developer wishes it.
Thus the reason we were going to provide our own Axis Message API with
hooks for optionally processing the message with SAX or DOM.  (i.e. if the
developer wants to tank their performance by using DOM, so be it)
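
One way to picture such a hook is an element that keeps only its raw text
and builds a DOM tree if, and only if, the developer asks for one. The
sketch below is an assumption made for illustration, not the Axis Message
API: LazyElement, asRaw, asDom and domBuilt are hypothetical names, and the
DOM is produced with the standard JAXP DocumentBuilderFactory.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of Axis.
public class LazyElement {
    private final String rawXml; // the element as received, unparsed
    private Document dom;        // built on the first asDom() call only

    public LazyElement(String rawXml) {
        this.rawXml = rawXml;
    }

    // Cheap path: hand back the raw text without parsing anything.
    public String asRaw() {
        return rawXml;
    }

    // Reports whether the DOM cost has been paid yet.
    public boolean domBuilt() {
        return dom != null;
    }

    // Opt-in path: parse into a DOM tree on demand and cache the result.
    public Document asDom() throws Exception {
        if (dom == null) {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            dom = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rawXml)));
        }
        return dom;
    }

    public static void main(String[] args) throws Exception {
        LazyElement element = new LazyElement("<m:getQuote xmlns:m=\"urn:example\"/>");
        System.out.println(element.domBuilt()); // false: nothing parsed yet
        System.out.println(element.asDom().getDocumentElement().getLocalName());
        System.out.println(element.domBuilt()); // true: DOM built on request
    }
}
```

Callers that never touch asDom() never pay for a tree, which is the
trade-off described above.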

I would like to invite the Xerces guys to join this discussion so that we
may figure out how to resolve this issue.  I understand now that Xerces 2
includes a Pull Parser interface of its own along with a low level
interface that enables modularization, but many of us here either haven't
heard of it yet or aren't quite sure what it could mean for Axis.  Could
anybody on the Xerces team explain this in greater depth for us?

- James Snell
     Software Engineer, Emerging Technologies, IBM
