axis-java-dev mailing list archives

From gdani...@macromedia.com
Subject RE: cvs commit: xml-axis/java/src/org/apache/axis/utils XMLUtils.java
Date Mon, 02 Apr 2001 20:33:54 GMT

Hi Sam!

I agree with the spirit of everything you say here.  For the benefit of
myself as well as the others who may not be as in tune with SOAP, I'm going
to quickly run down some bullet points about the environment we're in.
These are in no particular order, but cover what I consider the important
facets of the job we have to do.  This begins to describe our requirements,
I hope.

* SOAP is XML.  It's basically structured as follows:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="insert-important-url-here">
 <SOAP-ENV:Header>
  <header-entry />
 </SOAP-ENV:Header>
 <SOAP-ENV:Body>
  <body-entry />
 </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

* Inside the header and body entries may be XML-encoded language objects,
particularly ones which are encoded as specified in the SOAP spec [1].  The
encoding (in section 5 of the spec) calls out the use of the XML Schema
basic types, plus a few other rules about structures and arrays.

* One feature of the SOAP section 5 encoding is "multi-ref accessors", which
work like this:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="insert-important-url-here"
                   xmlns:foo="urn:foo"
                   xmlns:xsi="schema-instance-uri"
                   xmlns:xsd="schema-data-uri">
 <SOAP-ENV:Header>
  <foo:header href="#1" />
 </SOAP-ENV:Header>
 <SOAP-ENV:Body>
  <foo:body href="#1" />
  <foo:actualElement id="1" xsi:type="xsd:int">5</foo:actualElement>
 </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

  (both the foo:header and the foo:body are references to the same integer)

* To deserialize multi-ref accessors, we may need to look arbitrarily far
ahead in the document for the element with the correct id.  This makes a
straight-ahead "streaming" approach (process the XML in order as it comes
in) somewhat challenging.  Also, different pieces of code may desire to
process particular headers in an order different from that in which they are
serialized in the XML.

* There is some concern that the XML, especially the body entries, may get
to be really large (giant base-64-encoded documents, for instance), hence we
are somewhat cautious about assuming we need to pull the whole document into
memory before processing.  I note that there is a school of thought here (to
which I subscribe, btw) that says it's pilot error to try and send a huge
chunk of data inside your XML; rather you should take such things and attach
them per the SOAP with Attachments spec [2].

* We need this stuff to be parsed into some usable form very quickly and
efficiently.

* Some developers will want direct access to the XML within a particular
part of the envelope as DOM, or JDOM, or perhaps SAX events.

* Graham Glass claims to parse XML into an internal object model (I suspect
he parses the whole document before processing, btw) EXTREMELY quickly using
his Electric XML parser [3].  This model is used for SOAP processing.

* W3C XML Protocol [4] will be arriving on the scene at some point.  We'd
like to abstract out as much of the SOAPness as possible so that Axis can
easily become XMLP-compatible as soon as possible.
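
To make the multi-ref bullet above concrete, here is a minimal Java sketch of
resolving references once the whole document is in memory as DOM.  It is not
Axis code.  The class name MultiRefSketch and its methods are hypothetical,
the sample uses the SOAP 1.1 spellings (Envelope/Header/Body, href), and the
namespace URIs are filled in only so the sample parses.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
final class MultiRefSketch {
    static final String SAMPLE =
        "<SOAP-ENV:Envelope"
      + " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
      + " xmlns:foo=\"urn:foo\""
      + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
      + " xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
      + "<SOAP-ENV:Header><foo:header href=\"#1\"/></SOAP-ENV:Header>"
      + "<SOAP-ENV:Body>"
      + "<foo:body href=\"#1\"/>"
      + "<foo:actualElement id=\"1\" xsi:type=\"xsd:int\">5</foo:actualElement>"
      + "</SOAP-ENV:Body>"
      + "</SOAP-ENV:Envelope>";

    // Parse the whole document into a namespace-aware DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // One pass over every element, remembering those that carry an id.
    static Map<String, Element> indexById(Document doc) {
        Map<String, Element> byId = new HashMap<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("id")) {
                byId.put(e.getAttribute("id"), e);
            }
        }
        return byId;
    }

    // Follow a local href="#id"; elements without one resolve to themselves.
    static Element resolve(Element accessor, Map<String, Element> byId) {
        String href = accessor.getAttribute("href");
        if (!href.startsWith("#")) {
            return accessor;
        }
        Element target = byId.get(href.substring(1));
        if (target == null) {
            throw new IllegalArgumentException("dangling reference: " + href);
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        Map<String, Element> ids = indexById(doc);
        Element header = (Element) doc.getElementsByTagNameNS("urn:foo", "header").item(0);
        Element body = (Element) doc.getElementsByTagNameNS("urn:foo", "body").item(0);
        System.out.println("same target: " + (resolve(header, ids) == resolve(body, ids)));
        System.out.println("value: " + resolve(header, ids).getTextContent());
    }
}
```

The id index can only be built after the last element has been read, which is
the look-ahead problem described above: a streaming reader that meets
href="#1" in the header has nothing to resolve it against yet.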

Is there other stuff I've left out, folks?

OK, so as I said, I agree with Sam's points here.  The first thing I'd like
to do is some basic performance testing of various XML parsing models.  I do
not see a real streaming approach being all that viable for Axis v1.0 (I'm
open to argument on that).  If that is the case, we're talking about parsing
the document into some object model.  As I see it, we can either: 1) use a
pre-existing model like DOM or JDOM, or 2) use SAX or a pull parser such as
XPP to parse into our own SOAP-specific object model.

Option 2 might be faster.  Option 1 gains us a standard programming model
(i.e. when developers ask us for JDOM/DOM we can just give it to them), plus
perhaps a speedier development cycle.
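
As a rough illustration of option 2, the sketch below uses a SAX handler to
fill a tiny SOAP-specific model: just the names of the header entries and the
body entries.  EnvelopeSketch and its fields are hypothetical names, not
anything in Axis, and a real model would obviously hold much more than names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SOAP-specific model populated directly from SAX events.
final class EnvelopeSketch extends DefaultHandler {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    static final String SAMPLE =
        "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + SOAP_NS + "\" xmlns:foo=\"urn:foo\">"
      + "<SOAP-ENV:Header><foo:header-entry/></SOAP-ENV:Header>"
      + "<SOAP-ENV:Body><foo:body-entry><foo:child/></foo:body-entry></SOAP-ENV:Body>"
      + "</SOAP-ENV:Envelope>";

    final List<String> headerEntries = new ArrayList<>();
    final List<String> bodyEntries = new ArrayList<>();

    private int depth = 0;          // 1 = Envelope, 2 = Header/Body, 3 = entries
    private List<String> current;   // list for the section we are inside, or null

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        depth++;
        if (depth == 2) {
            if (SOAP_NS.equals(uri) && "Header".equals(local)) {
                current = headerEntries;
            } else if (SOAP_NS.equals(uri) && "Body".equals(local)) {
                current = bodyEntries;
            } else {
                current = null;
            }
        } else if (depth == 3 && current != null) {
            // Record each direct child of Header or Body as {namespace}localName.
            current.add("{" + uri + "}" + local);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (depth == 2) {
            current = null;         // leaving Header or Body
        }
        depth--;
    }

    static EnvelopeSketch parse(String xml) throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setNamespaceAware(true);
        EnvelopeSketch model = new EnvelopeSketch();
        f.newSAXParser().parse(new InputSource(new StringReader(xml)), model);
        return model;
    }

    public static void main(String[] args) throws Exception {
        EnvelopeSketch model = parse(SAMPLE);
        System.out.println("header entries: " + model.headerEntries);
        System.out.println("body entries:   " + model.bodyEntries);
    }
}
```

The trade-off shows up even at this size: nothing beyond the entry names is
kept, so the model stays small, but a developer who asks for the body entry
as DOM or JDOM cannot be handed one without a second conversion step.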

I'd like to do the simplest possible thing that gives us the desired
results.

Jason, do you have any numbers/stats as to whether parsing into JDOM using
SAX is faster than a typical DOM parse in, say, Xerces?
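
For the kind of basic measurement discussed above, a timing harness could
look roughly like the following.  It compares a full JAXP DOM parse with a
bare SAX pass over the same synthetic envelope.  The class name, the
synthetic message, and the iteration counts are all assumptions made for the
sketch.  JDOM is left out because it would need the JDOM library on the
classpath, and no results are quoted here because they depend entirely on the
parser and JVM in use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical micro-benchmark; treat any numbers it prints as indicative only.
final class ParseTimingSketch {
    interface Parse {
        int elements(byte[] xml) throws Exception;
    }

    private static final DocumentBuilderFactory DOM = DocumentBuilderFactory.newInstance();
    private static final SAXParserFactory SAX = SAXParserFactory.newInstance();
    static {
        DOM.setNamespaceAware(true);
        SAX.setNamespaceAware(true);
    }

    // Build an envelope whose body holds the requested number of small entries.
    static byte[] syntheticEnvelope(int entries) {
        StringBuilder sb = new StringBuilder();
        sb.append("<e:Envelope xmlns:e=\"http://schemas.xmlsoap.org/soap/envelope/\"")
          .append(" xmlns:foo=\"urn:foo\"><e:Body>");
        for (int i = 0; i < entries; i++) {
            sb.append("<foo:item>").append(i).append("</foo:item>");
        }
        sb.append("</e:Body></e:Envelope>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Full DOM parse; counting the elements forces the tree to be walked.
    static int domElements(byte[] xml) throws Exception {
        Document doc = DOM.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        return doc.getElementsByTagName("*").getLength();
    }

    // SAX pass that only counts start tags and builds no tree.
    static int saxElements(byte[] xml) throws Exception {
        final int[] count = {0};
        SAX.newSAXParser().parse(new ByteArrayInputStream(xml), new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    // Average nanoseconds per parse after a warm-up of the same length.
    // Parser creation is included in each timed call.
    static long nanosPerParse(Parse p, byte[] xml, int iterations) throws Exception {
        for (int i = 0; i < iterations; i++) {
            p.elements(xml);
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            p.elements(xml);
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = syntheticEnvelope(1000);
        System.out.println("DOM ns/parse: " + nanosPerParse(ParseTimingSketch::domElements, xml, 200));
        System.out.println("SAX ns/parse: " + nanosPerParse(ParseTimingSketch::saxElements, xml, 200));
    }
}
```

The SAX figure is a floor, not a like-for-like comparison: building any
object model on top of the SAX events (JDOM or a SOAP-specific one) adds cost
that this pass does not measure.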

Over and out for now,

--Glen

[1] http://www.w3.org/TR/soap
[2] http://www.w3.org/TR/SOAP-attachments
[3] http://www.themindelectric.com/products/xml/xml.html
[4] http://www.w3.org/2000/xp/

> -----Original Message-----
> From: Sam Ruby [mailto:rubys@us.ibm.com]
> Sent: Monday, April 02, 2001 4:00 PM
> To: axis-dev@xml.apache.org
> Cc: xerces-dev@xml.apache.org; jhunter@collab.net
> Subject: RE: cvs commit: xml-axis/java/src/org/apache/axis/utils
> XMLUtils.java
> 
> 
> Glen Daniels wrote:
> >
> > OK, here's my suggestion.  Take it with appropriate salt.
> >
> > DOM is pretty much a pain in the ass to work with, and we're Java
> > developers, with access to JDOM.  JDOM is screaming along in terms of
> > functionality, and they now deal just fine with JAXP on the bottom end,
> > so you can use whatever parser you want underneath there.  JDOM is also
> > going to be rolled into the Java standard fairly soon (JSR-102, I think).
> >
> > Until we figure out what we're "really" doing about XML parsing and
> > modeling, I think we'd move much faster with JDOM, and that's where I
> > think we should be.  Besides, if we're going to end up using some other
> > model like pull or whatever anyway, why should it matter if we use JDOM
> > or DOM right now?
> >
> > Suggestion : put JDOM back for now, and feel free to use the JAXP
> > interface to pick a parser.
> 
> OK, here's my suggestion.  Take it with appropriate salt.
> 
> Warning: the message is a real downer.  Parental discretion advised.
> 
> The xml-soap implementation continues to be popular.  It is getting ever
> more interoperable with other implementations (thanks Glen!).
> 
> The biggest gripe I hear is that it can't process as many messages per
> second as some other implementations.  Some say it is Java's fault, but I
> see some boasting orders of magnitude improvements over Apache with their
> Java implementations.  Others have noticed perhaps a 20% improvement with
> C/C++.
> 
> Some measurements suggest that up to 75% of the time is in the parser.
> Even if we accept that at face value, we have to conclude that 25% of the
> time is not, and even if the parser were eliminated entirely we will never
> see an order of magnitude improvement by just fixing the parser.
> 
> I believe that some new thinking is required.  It likely will require some
> cooperation with the parser team (hence why I am copying the Xerces mailing
> list, and for that matter Jason too in order to get a JDOM perspective) to
> pull it off.
> 
> Meanwhile, my response to "if we're going to end up with some other model
> like pull or whatever anyway" is that I don't think it much matters what
> you work with right now as it will likely be DOA.
> 
> Let's start by setting some priorities, and expressing them with concrete
> scenarios and test cases.  Let's start with a trivial implementation which
> simply reads from a socket and sends back a canned reply, and measure that.
> No parser, no servlet engine, simply Java code.  Then let's slowly introduce
> more function, measuring the impact and determining if the impact is
> reasonable and if not what is the alternative.
> 
> Meanwhile, let's figure out a concrete way to express our requirements to
> the parser team in a way that helps them understand what our needs are.
> 
> Thoughts?
> 
> - Sam Ruby
> 
> Disclaimer:  IMHO, a parser and a servlet engine are requirements; don't
> take any of the above as an indication to the contrary.
> 
