axis-java-dev mailing list archives

From "Ted Leung" <twle...@sauria.com>
Subject Re: Reasons for pull parser for streaming
Date Fri, 23 Mar 2001 18:03:46 GMT

----- Original Message -----
From: "James M Snell" <jasnell@us.ibm.com>
To: <axis-dev@xml.apache.org>
Sent: Thursday, March 22, 2001 11:08 PM
Subject: Re: Reasons for pull parser for streaming


> Andy,
>
> What part of XPP is not XML Compliant? (I'm not arguing the point, I'm
> bowing to your greater wisdom on the matter :-) ...)

Well, Xerces1 is passing all of OASIS's XML conformance tests.  Is XPP?
That's one part of it.

> And to be fair (and glib at the same time ;-) ..), the Xerces Pull Parser
> API is proprietary also ;-)  It just happens to generate standard
> interfaces as a result of calling its proprietary methods.

Yes, but I think it's safe to say that there are going to be more eyes and
hands working on the Xerces codebase to improve it than the XPP codebase.

> Let's definitely plan on discussing this soon (the sooner the better).  If
> we can walk away with both a better Axis and a better Xerces then we've
> done our jobs well, if one succeeds without the other, then we're shooting
> ourselves in the foot.
>
> 1. We need something now that works
> 2. We need something now that is fast
> 3. We can't take three months to figure it out
>
> Bottom line: we need to prove that Xerces can do the job.  I don't care if
> it's the pull parser interface or Sax as long as we can get it done
> quickly and as long as we can guarantee that the thing is fast.

I think that 2 things need to happen:

1) we do the work to make X1 fast on small documents.
2) we get the right API for pull parsing, and write an adapter between the
right API and the internal pull API.

For 1, we need a benchmark suite and some volunteers with access to JProbe
or OptimizeIt.  This is something the Xerces folks should do.

For 2, we need to see some use cases where doing things with XPP/Pull is
"better" than SAX.  By use cases, I mean something that looks like code,
even if it's not executable.  This is something the Axis folks should do.
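
As a rough illustration of the kind of use case meant here, the sketch
below reads one piece of a document, hands control back to the caller, and
later resumes on the same thread. It is only a sketch under assumptions: it
uses the JDK's javax.xml.stream (StAX) reader as a stand-in for a pull API,
not XPP's API or the Xerces internal pull methods, and the class name
PausableReader and the element names header/body are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: wraps a pull reader so the caller decides when to advance.
class PausableReader {
    private final XMLStreamReader reader;

    PausableReader(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Deliver adjacent text as one chunk.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        reader = factory.createXMLStreamReader(new StringReader(xml));
    }

    // Advance to the next start tag with the given name and return its text.
    // Returns null if the document ends first. The element is assumed to
    // contain text only.
    String readTextOf(String name) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals(name)) {
                return reader.getElementText();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<msg><header>route-me</header><body>payload</body></msg>";
        PausableReader r = new PausableReader(xml);

        // Parse only as far as the header; the parser is now idle.
        String header = r.readTextOf("header");
        System.out.println("header = " + header);

        // The caller can do other work here, then resume without a second thread.
        String body = r.readTextOf("body");
        System.out.println("body = " + body);
    }
}
```

With a SAX handler, parse() would not return between the two reads, which
is the situation a blocking helper thread would otherwise have to work
around.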

> - James Snell
>      Software Engineer, Emerging Technologies, IBM
>      jasnell@us.ibm.com (online)
>      jsnell@lemoorenet.com (offline)
>
> Please respond to axis-dev@xml.apache.org
> To:     axis-dev@xml.apache.org
> cc:
> Subject:        Re: Reasons for pull parser for streaming
>
>
>
> Glen Daniels wrote:
> > Check me on this - it seemed to us that to make a SAX based "push"
> > parser build up a piece of the object model, then stop, then continue
> > again, etc. you would need to have a separate thread blocking and
> > unblocking since there's no way in SAX to "pause" the event stream and
> > let the thread that initiated the parse continue.  It was this concern
> > which motivated the turn towards "true" pull parsing.
>
> Internally, Xerces is a pull parser. The external parse() method
> is really just a loop around the pull parsing methods. The only
> difference between Xerces and XPP is the choice of API and the
> way that the information is communicated to the application.
>
> In XPP, you loop, each time checking the "token" returned to see
> what kind of thing it is. Depending on the type, you can call
> methods to retrieve the information that you are interested in.
> In Xerces, there is no token returned. Instead, calls are made
> via the internal API (which is then emitted via the standard
> API like DOM and SAX, depending on the parser instance).
>
> The actual stopping point is very similar between Xerces and
> XPP. Each "piece" of the XML document is considered a stopping
> point. For example: a start element, end element, text content,
> etc.
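
The two styles described above can be sketched side by side. This is an
illustration under assumptions, not the XPP API or the Xerces internal API:
the pull half uses the JDK's javax.xml.stream (StAX) reader as a stand-in
for a token-returning parser, the push half uses standard SAX callbacks,
and the class name PullVersusPush is hypothetical. Both halves stop at the
same pieces (start element, end element, text); only who drives the loop
differs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PullVersusPush {

    // Pull style: the application owns the loop and checks each token type.
    static List<String> pull(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            int token = reader.next();
            switch (token) {
                case XMLStreamConstants.START_ELEMENT:
                    seen.add("start:" + reader.getLocalName());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    seen.add("end:" + reader.getLocalName());
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (!reader.isWhiteSpace()) {
                        seen.add("text:" + reader.getText());
                    }
                    break;
                default:
                    break; // ignore other token types in this sketch
            }
        }
        reader.close();
        return seen;
    }

    // Push style: the parser owns the loop and calls back into the handler.
    static List<String> push(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                seen.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                seen.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length);
                if (!text.trim().isEmpty()) {
                    seen.add("text:" + text);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b>hi</b></a>";
        System.out.println("pull: " + pull(xml));
        System.out.println("push: " + push(xml));
    }
}
```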
>
> My concerns with adding a dependency on XPP are the following:
>
>   1) XPP is not XML compliant
>   2) XPP uses a proprietary API
>   3) XPP's pull parsing API doesn't buy you much over SAX,
>      in my opinion
>
> We should work on meeting your requirements in the Xerces
> codebase and centralize on that parser for Apache projects
> that use XML.
>
> > > I think that the Axis team was unaware that Xerces actually
> > > is a pull parser masquerading as a push parser. :)
> >
> > We'd heard rumours, but none of us were brave enough to go look in the
> > code... :)
>
> I'm more familiar with the code but you can find it by looking
> at the public methods on the SAXParser (or DOMParser) up through
> the class hierarchy. That's how I found it because I've never
> taken advantage of the pull parsing mechanism before.
>
> --
> Andy Clark * IBM, TRL - Japan * andyc@apache.org
>
>

