axis-java-dev mailing list archives

Subject RE: The Great Debate: Xml Parsers
Date Wed, 21 Mar 2001 22:32:07 GMT

>    1  Axis must not force the entire message object model to be in memory
> at one time.  In other words, DOM is out.

OK, hang on a sec.  There are some pretty massive concerns around dealing
with any kind of streaming model, concerns which I don't believe have been
adequately addressed yet.  Until we resolve how we're building this, and
what the object model for the messages really looks like, I am not
personally ruling out using DOM or something much like it.

From my point of view, it is MUCH more important to get v1.0 out the door
than it is to get *all* of the requirements met.  In particular I've been
thinking about this one, and frankly I'm willing to give it up and just use
JDOM/DOM internally if that gets us a working engine in the nearer term.
This is not to say I don't support the goal; I just don't see it happening
yet, and I'm leaning more towards an "extreme programming" type viewpoint on
this project: get v1.0 out, collect feedback, refactor for v2.  I'm willing
to be convinced otherwise, if we can make good progress.
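
To make the streaming-versus-DOM trade-off concrete, here is a minimal sketch of pull-style processing that never builds a tree of the whole message. It uses the JDK's StAX `XMLStreamReader` purely as a stand-in for a pull parser; it is not XPP's interface, and the class and method names (`PullSketch`, `bodyChildNames`) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only: StAX stands in for "some pull parser".
class PullSketch {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Collects the local names of the direct children of soap:Body, one event at a time. */
    static List<String> bodyChildNames(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        int depth = 0;       // current element nesting depth
        int bodyDepth = -1;  // depth of soap:Body while we are inside it, else -1
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                if (bodyDepth < 0
                        && SOAP_NS.equals(reader.getNamespaceURI())
                        && "Body".equals(reader.getLocalName())) {
                    bodyDepth = depth;
                } else if (bodyDepth > 0 && depth == bodyDepth + 1) {
                    names.add(reader.getLocalName());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (depth == bodyDepth) {
                    bodyDepth = -1;  // leaving soap:Body
                }
                depth--;
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<soap:Envelope xmlns:soap='" + SOAP_NS + "'>"
                + "<soap:Header><h/></soap:Header>"
                + "<soap:Body><getQuote><symbol>IBM</symbol></getQuote><getTime/></soap:Body>"
                + "</soap:Envelope>";
        System.out.println(bodyChildNames(xml));
    }
}
```

The only state kept here is two integers and the result list. The open question in this thread is how much more state a real engine would need once headers, handlers and multiref values enter the picture.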

>    2  Axis must be very fast and very scalable in order to be widely 
> adopted over other Web Service implementation platforms

Yes, although what "fast enough" and "scalable enough" mean is somewhat open
to debate.

>    3  We must be able to independently parse individual elements of the
> message either as raw bits, SAX, the Axis defined Message API, DOM or
> whatever else the user wants.

OK, yes.  +1!

>    4  We must be able to fully support SOAP semantics (i.e. multiref
> elements, id/href, etc) without an overly negative impact on performance
> (see number 1 and 2)

Yeah baby!
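
One reason requirement 4 pulls against requirements 1 and 2: with SOAP encoding's `id`/`href` attributes, an `href="#x"` may appear before the element carrying `id="x"`, so a single-pass reader has to remember something about what it has seen. The sketch below only detects such forward references. It again uses the JDK's StAX reader as a stand-in pull parser, and `MultiRefSketch` and `forwardReferences` are hypothetical names, not part of Axis or XPP.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only: flags hrefs whose target id has not been seen yet.
class MultiRefSketch {
    /**
     * Maps each href target (without the leading '#') to true if the matching
     * id had NOT yet been seen when the href was read (a forward reference).
     */
    static Map<String, Boolean> forwardReferences(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Set<String> seenIds = new HashSet<>();
        Map<String, Boolean> result = new LinkedHashMap<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                // A null namespace argument means the attribute's namespace is not checked.
                String id = reader.getAttributeValue(null, "id");
                if (id != null) {
                    seenIds.add(id);
                }
                String href = reader.getAttributeValue(null, "href");
                if (href != null && href.startsWith("#")) {
                    String target = href.substring(1);
                    result.put(target, !seenIds.contains(target));
                }
            }
        }
        reader.close();
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<r><a href='#late'/><v id='early'/><b href='#early'/><w id='late'/></r>";
        System.out.println(forwardReferences(xml));
    }
}
```

A backward reference can be resolved from a table of values already read. A forward one means either buffering the referring element or patching it later, and that bookkeeping is where the performance cost would come from.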

> We've looked at Xerces, we've looked at JDOM, and most recently I've been
> doing some work with a new Xml Pull Parser developed originally by
> Aleksander Slominski as part of a research project for Indiana Univ. Below
> is a basic summary of our thoughts thus far:
>
> Xerces 1.x ->  Our concern with Xerces 1.x DOM is that it is slow, huge,
> and complicated.  These are the standard complaints with DOM that we've
> all heard (note to the Xerces guys:  I eagerly await the release of
> Xerces2 ! :-) ....)  It just won't scale well in the types of environments
> that we foresee Axis being deployed in, which include limited capacity
> devices such as handhelds (in which case it probably wouldn't work at all
> due simply to its size).
>
> We also looked at SAX as an alternative but quickly determined that SAX
> just was not adequate for proper SOAP processing that also met the
> requirements mentioned above.  (For those of you who weren't part of that
> discussion, I will not rehash it here; ping me later and I'll give you the
> rundown.)
>
> JDOM -> While JDOM is smaller and faster than Xerces and DOM, which is
> nice, it still does not meet our requirements listed above.  An additional
> issue raised internally at IBM was that JDOM is nowhere near being a
> standard yet.  (As some of you may know, the current Axis codebase uses
> JDOM for its message processing.)  We've all pretty much decided already
> that JDOM should be removed from the core and should be replaced with a
> lightweight XML parser that meets the requirements.

Just speaking for myself, I haven't decided that yet.

> Xml Pull Parser (XPP) -> XPP is a lightweight (23k) pull parser that is
> completely namespace aware and XML 1.0 compliant.  Its interface needs
> quite a bit of work so I've been working with the author on getting it
> cleaned up.  XPP has two advantages: 1. it's small, 2. it's fast.  The
> parser was originally implemented as part of a research project comparing
> the performance of various parsers in relation to SOAP-deserialization.
> I'll have to try to dig up the results of their tests again, but XPP
> outperformed nearly everything else available.  XPP would meet each of
> our requirements once the interface redesign is complete.  This interface
> redesign includes building a SAX layer over the parser's primary
> interface.
>
> Now, here's what we need to decide:
> Which is more important: Performance/Scalability or Standards support?

My opinion - if you can get the same product out, and it meets the goals
outlined above, with either but not both of these things, I'd certainly pick
performance/scalability.  However, as mentioned above, getting the product
out is priority 1.

> From earlier decisions, I believe that we have agreed that performance and
> scalability in the case of Axis far outweigh standards support within the
> core engine itself as long as there are hooks specifically designed into
> the engine that allow full standards support if the developer wishes it.
> That is why we were going to provide our own Axis Message API with
> hooks for optionally processing the message with SAX or DOM.  (i.e. if the
> developer wants to tank their performance by using DOM, so be it)


> I would like to invite the Xerces guys to join this discussion so that we
> may figure out how to resolve this issue.  I understand now that Xerces 2
> includes a Pull Parser interface of its own along with a low level
> interface that enables modularization, but many of us here either haven't
> heard of it yet or aren't quite sure what it could mean for Axis.  Could
> anybody on the Xerces team explain this in greater depth for us?
>
> - James Snell
>      Software Engineer, Emerging Technologies, IBM
