axis-java-dev mailing list archives

From "Ted Leung" <twle...@sauria.com>
Subject Re: The Great Debate: Xml Parsers
Date Thu, 22 Mar 2001 07:08:18 GMT

----- Original Message -----
From: "Sam Ruby" <rubys@us.ibm.com>
To: <xerces-j-dev@xml.apache.org>
Sent: Wednesday, March 21, 2001 1:26 PM
Subject: The Great Debate: Xml Parsers


> Cross posting to xerces-j-dev.
>
> - Sam Ruby
>
> ---------------------- Forwarded by Sam Ruby/Raleigh/IBM on 03/21/2001
> 02:36 PM ---------------------------
>
> James M Snell/Fresno/IBM@IBMUS on 03/21/2001 12:25:20 PM
>
> Please respond to axis-dev@xml.apache.org
>
> To:   axis-dev@xml.apache.org
> cc:   xerces-dev@xml.apache.org
> Subject:  The Great Debate: Xml Parsers
>
>
>
> All,
>
> (I'm cross-posting this to the Xerces-dev list so our friends on the
> parser-side of things can follow along and join in)
>
> As many of you know, we've had discussions in the past about which Xml
> Parser to use as the core of the Axis message processing API.  Throughout
> the course of this discussion, we've touched on several issues that have
> become core requirements of Axis and need to drive our decision.  These
> requirements are:
>
>    1  Axis must not force the entire message object model to be in memory
> at one time.  In other words, DOM is out.

Seems to me that JDOM should be out on this count also.

>    2  Axis must be very fast and very scalable in order to be widely
> adopted over other Web Service implementation platforms

You couldn't be more right.

>    3  We must be able to independently parse individual elements of the
> message either as raw bits, SAX, the Axis defined Message API, DOM or
> whatever else the user wants.

Why?

>    4  We must be able to fully support SOAP semantics (i.e. multiref
> elements, id/href, etc) without an overly negative impact on performance
> (see number 1 and 2)
>
> We've looked at Xerces, we've looked at JDOM, and most recently I've been
> doing some work with a new Xml Pull Parser developed originally by
> Aleksander Slominski as part of a research project for Indiana Univ. Below
> is a basic summary of our thoughts thus far:
>
> Xerces 1.x ->  Our concern with Xerces 1.x DOM is that it is slow, huge,
> and complicated.  These are the standard complaints with DOM that we've
> all heard (note to the Xerces guys:  I eagerly await the release of
> Xerces2! :-) ...)  It just won't scale well in the types of environments
> in which we foresee Axis being deployed (which include limited-capacity
> devices such as handhelds, where it probably wouldn't work at all
> simply because of its size).
>
> We also looked at SAX as an alternative but quickly determined that SAX
> just was not adequate for proper SOAP processing that also met the
> requirements mentioned above.  (for those of you who weren't part of that
> discussion, I will not rehash it here, ping me later and I'll give you the
> rundown).

I'd like to know why that is, especially since you are talking about
building a SAX layer atop XPP below.
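
For context, here is a minimal sketch of the push (SAX) style under discussion, written against the standard JAXP SAX API. The class name SaxPushSketch and the method bodyChildren are made up for illustration and are not part of Axis or Xerces. Because the parser drives the handler, the handler has to keep its own depth bookkeeping to know where it is in the envelope:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not an Axis or Xerces class.
class SaxPushSketch {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Returns the local names of the direct children of the SOAP Body. */
    static List<String> bodyChildren(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        List<String> names = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            int depth = 0;       // current element nesting depth
            int bodyDepth = -1;  // depth of the Body element, -1 while outside it

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                depth++;
                if (bodyDepth < 0 && SOAP_NS.equals(uri) && "Body".equals(local)) {
                    bodyDepth = depth;
                } else if (bodyDepth >= 0 && depth == bodyDepth + 1) {
                    names.add(local);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (depth == bodyDepth) {
                    bodyDepth = -1;  // leaving Body
                }
                depth--;
            }
        };

        // The parser pushes every event for the whole document to the handler.
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }
}
```

Note that the handler sees every event in the document, including those for parts of the message it does not care about, and it cannot stop early without throwing an exception.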

> JDOM -> While JDOM is smaller and faster than Xerces and DOM, which is
> nice, it still does not meet our requirements listed above.  An additional
> issue raised internally at IBM was that JDOM is nowhere near being a
> standard yet.  (As some of you may know, the current Axis codebase uses
> JDOM for its message processing).  We've all pretty much decided already
> that JDOM should be removed from the core and should be replaced with a
> lightweight XML parser that meets the requirements.
>
> Xml Pull Parser (XPP) -> XPP is a lightweight (23k) pull parser that is
> completely namespace aware and XML 1.0 compliant.  Its interface needs
> quite a bit of work so I've been working with the author on getting it
> cleaned up.  XPP has two advantages: 1. it's small, 2. it's fast.  The
> parser was originally implemented as part of a research project comparing
> the performance of various parsers in relation to SOAP-deserialization.
> I'll have to try to dig up the results of their tests again, but XPP
> outperformed nearly everything else available.   XPP would meet each of
> our requirements once the interface redesign is complete.  This interface
> redesign includes building a SAX layer over the parser's primary
> interface.
>
> Now, here's what we need to decide:
>
> Which is more important: Performance/Scalability or Standards support?

PERFORMANCE -- It's already bad enough that you're trying to do RPC-like
things with text files.  VCs aren't dropping out of the sky to buy kids
E10Ks or S80s any more.

> From earlier decisions, I believe that we have agreed that performance
> and
> scalability in the case of Axis far outweigh standards support within the
> core engine itself as long as there are hooks specifically designed into
> the engine that allow full standards support if the developer wishes it.
> Thus the reason we were going to provide our own Axis Message API with
> hooks for optionally processing the message with SAX or DOM.  (i.e. if the
> developer wants to tank their performance by using DOM, so be it)
>
> I would like to invite the Xerces guys to join this discussion so that we
> may figure out how to resolve this issue.  I understand now that Xerces 2
> includes a Pull Parser interface of its own along with a low level
> interface that enables modularization, but many of us here either haven't
> heard of it yet or aren't quite sure what it could mean for Axis.  Could
> anybody on the Xerces team explain this in greater depth for us?

Actually, Xerces 1 contains a pull parser interface as well, but it's
poorly documented and mostly used internally.  If getting the "product
out" is the key, then neither this API nor its descendant API in Xerces2
is for you.
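
As an illustration of the pull style, here is a rough sketch using the StAX API (javax.xml.stream) that ships with current JDKs. StAX postdates this thread and is only a stand-in: it is not the XPP interface or either of the Xerces pull interfaces mentioned above, and PullSketch and operationName are hypothetical names. The point is only that the caller asks for the next event and can stop as soon as it has what it needs:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; StAX stands in for a pull parser.
class PullSketch {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Returns the local name of the first child of the SOAP Body, or null if there is none. */
    static String operationName(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            boolean inBody = false;
            while (reader.hasNext()) {
                int event = reader.next();  // the caller pulls one event at a time
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (inBody) {
                        return reader.getLocalName();  // stop here; the rest is never requested
                    }
                    inBody = SOAP_NS.equals(reader.getNamespaceURI())
                            && "Body".equals(reader.getLocalName());
                } else if (event == XMLStreamConstants.END_ELEMENT && inBody) {
                    return null;  // Body closed without any child element
                }
            }
            return null;  // no Body element found
        } finally {
            reader.close();
        }
    }
}
```

No tree is built and no state machine is needed beyond a single flag, which is roughly the property the first requirement in the quoted message asks for.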

However, Axis is an xml.apache.org project, as is Xerces.  It seems
perfectly reasonable to me that you guys push requirements on us, just as
Scott and the Xalan developers have done (and should continue to do).  I
would like to see us engage in a vigorous and public discussion of your
requirements and why Xerces is not suitable.  It's a known fact/bug that
Xerces 1 performance on small documents is poor.  It's also true that very
little effort has been expended on rectifying that.  So far the only real
requirement that I can see coming from Axis is that we give you good
performance on small documents.  Am I missing something?
In my book it's okay if in the short term Axis has to use XPP, but in the
long term, both projects should be trying to find a way to make the ASF
SOAP stack a truly ASF stack.

FYI, I posted a SOAP-related performance study to xerces-j-dev within the
last few weeks.  I'm glad to see you guys coming to the party, especially
since you are the ones who are going to keep us from getting Hailstormed.

> - James Snell
>      Software Engineer, Emerging Technologies, IBM
>      jasnell@us.ibm.com (online)
>      jsnell@lemoorenet.com (offline)

