xml-general mailing list archives

From Aleksander Slominski <as...@cs.indiana.edu>
Subject Re: How to start writing a non-blocking SAX parser
Date Sun, 05 May 2002 01:53:59 GMT

Andy Clark wrote:

> So putting a real pull parsing API on top of the output from
> an XNI pull parser configuration is what should be done. But
> what pull parsing API should that be?

> > it gathers information from callback(s) to return just one event to the user.
> > if XNI parse(false) triggered too many callbacks the
> > xni2xmlpull parser should throw an exception (it is _not_ tested ...)
>
> I briefly skimmed the code. My first impression is that I
> would have done it differently. (But I think that of any code
> written by someone else... ;)

i am now writing a second implementation of the XMLPULL API; it should have
a better design and a cleaner implementation ...

> First, I think I would prefer a different API for pull parsing.
> Just from an object oriented standpoint, I don't like having
> all of the accessor methods on the XmlPullParser interface. I
> would have chosen to return different event objects. Then the
> event object would have public fields for its data (to avoid
> method calls) and specific methods for added functionality.

that looks like a compromise, but is it a good one?
in the XMLPULL API the choice was to avoid creating
event objects, to minimize memory footprint in J2ME
environments, which makes the XmlPullParser interface
work as a specialized iterator.

IMHO if the API is just for J2SE, and given the current very good
JIT compilers (like HotSpot in JDK 1.3+) that can very efficiently
handle inlining and the creation of lots of small objects, it is much
better to keep event objects similar to other Java APIs and
expose get/set methods instead of public fields.
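
to make the two styles concrete, here is a minimal sketch of both. nothing in it is taken from XMLPULL or Xerces: XmlEvent, EventObjectParser and CursorParser are hypothetical names, and the "parser" just walks over pre-tokenized arrays instead of reading real XML. the first style allocates one object per event and uses get methods; the second keeps all state in the parser, so an event is only valid until the next call to next().

```java
// Hypothetical event object in the usual Java style (get methods, no public fields).
final class XmlEvent {
    static final int START_TAG = 1, TEXT = 2, END_TAG = 3, END_DOCUMENT = 4;
    private final int type;
    private final String value; // element name or text content

    XmlEvent(int type, String value) {
        this.type = type;
        this.value = value;
    }

    int getType() { return type; }
    String getValue() { return value; }
}

// Style 1 (hypothetical): one new object per event; the caller may keep it around.
final class EventObjectParser {
    private final int[] types;
    private final String[] values;
    private int pos = 0;

    EventObjectParser(int[] types, String[] values) {
        this.types = types;
        this.values = values;
    }

    XmlEvent nextEvent() {
        if (pos >= types.length) {
            return new XmlEvent(XmlEvent.END_DOCUMENT, null);
        }
        XmlEvent e = new XmlEvent(types[pos], values[pos]);
        pos++;
        return e;
    }
}

// Style 2 (hypothetical): the parser is a specialized iterator. No objects are
// allocated per event; accessors describe the current event only.
final class CursorParser {
    private final int[] types;
    private final String[] values;
    private int pos = -1; // before the first event

    CursorParser(int[] types, String[] values) {
        this.types = types;
        this.values = values;
    }

    int next() {
        if (pos < types.length) pos++;
        return getEventType();
    }

    int getEventType() {
        if (pos < 0) return 0; // no event read yet
        return pos < types.length ? types[pos] : XmlEvent.END_DOCUMENT;
    }

    String getValue() {
        return (pos >= 0 && pos < types.length) ? values[pos] : null;
    }
}
```

with the event-object style an earlier event stays readable after the parser has moved on; with the cursor style the caller has to copy whatever it wants to keep.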


> Some of the extra functionality that I'm referring to would
> be methods that make processing XML documents in a pull manner
> as easy as possible. The XmlPullParser API has some methods
> that do these things. For example, the "nextTag" method lets
> the application skip intervening text and just jump to the
> element boundaries. Very nice feature.

we discussed it for some time on xmlpull-dev and we were
not even sure whether it belongs in the API, but the feeling is that nextTag and
nextText are badly needed and make processing of SOAP or similar
XML data bindings much easier ...

> But I would also like
> to have a method that allows me to skip to a start element's
> end tag, returning all of the text within that element.

all of those functions can be easily built on top of the XMLPULL API
and exposed as a utility class, instead of requiring an overly detailed
description of the method implementation in the interface ...
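
as a rough illustration of such a utility, the sketch below implements "skip to the end tag and return all of the text inside" on top of a tiny pull interface. MiniPullParser is a hypothetical stand-in (it is not the real XmlPullParser interface, only the same next()/accessor idea), ListPullParser is a toy that replays pre-tokenized events so the code can run, and PullUtil.skipToEndTag is a made-up name.

```java
// Hypothetical minimal pull interface: call next(), then ask about the current event.
interface MiniPullParser {
    int START_TAG = 1, TEXT = 2, END_TAG = 3, END_DOCUMENT = 4;

    int next();          // advance to the next event and return its type
    int getEventType();  // type of the current event (0 before the first next())
    String getText();    // text of the current event, or null if it is not TEXT
}

// Toy implementation over pre-tokenized events, only so the sketch can run.
// The last event in the arrays is expected to be END_DOCUMENT.
final class ListPullParser implements MiniPullParser {
    private final int[] types;
    private final String[] values;
    private int pos = -1; // before the first event

    ListPullParser(int[] types, String[] values) {
        this.types = types;
        this.values = values;
    }

    public int next() {
        if (pos < types.length - 1) pos++;
        return types[pos];
    }

    public int getEventType() {
        return pos < 0 ? 0 : types[pos];
    }

    public String getText() {
        return (pos >= 0 && types[pos] == TEXT) ? values[pos] : null;
    }
}

// Utility class kept outside the parser interface.
final class PullUtil {
    private PullUtil() {}

    // Precondition: the parser is positioned on a START_TAG.
    // Postcondition: the parser is on the matching END_TAG; the result is all
    // text seen in between, including text inside nested elements.
    static String skipToEndTag(MiniPullParser p) {
        if (p.getEventType() != MiniPullParser.START_TAG) {
            throw new IllegalStateException("parser must be on a START_TAG");
        }
        StringBuilder text = new StringBuilder();
        int depth = 1;
        while (depth > 0) {
            int type = p.next();
            if (type == MiniPullParser.START_TAG) {
                depth++;
            } else if (type == MiniPullParser.END_TAG) {
                depth--;
            } else if (type == MiniPullParser.TEXT) {
                text.append(p.getText());
            } else if (type == MiniPullParser.END_DOCUMENT) {
                throw new IllegalStateException("document ended inside an element");
            }
        }
        return text.toString();
    }
}
```

the utility needs nothing beyond next(), getEventType() and getText(), so the interface itself does not have to describe how it is implemented.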

> Second, assuming that we start from the XmlPullParser API,
> then there are a few things that need to be handled within an
> implementation driven by an XNI pull parser configuration. I
> already mentioned it but I'll state it again for completeness:
> event queueing. Due to the pipeline nature of the XNI parser
> configuration, when working with generic configurations you
> can't guarantee that only one event (or at least one event
> that you then forward as a pull parsing event) will occur
> during a single call to "parse".

that sounds like a good engineering decision and i will
try to implement it in xni2xmlpull - it will make the
implementation more robust :-)
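
a minimal sketch of the queueing idea follows, under stated assumptions: Callbacks and StepSource are hypothetical interfaces standing in for an XNI-style handler and an incremental parse step (they are not the XNI API), and events are plain strings to keep it short. each step may fire zero, one or many callbacks; the adapter buffers them and hands out exactly one per next() call.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical callback interface standing in for an XNI/SAX-style handler.
interface Callbacks {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

// Hypothetical incremental source: each step() may fire any number of
// callbacks (including none) and returns false once the input is exhausted.
interface StepSource {
    boolean step(Callbacks handler);
}

// Adapter that turns push callbacks into one-at-a-time pull events.
final class QueueingPullAdapter implements Callbacks {
    private final Deque<String> queue = new ArrayDeque<>();
    private final StepSource source;
    private boolean more = true;

    QueueingPullAdapter(StepSource source) {
        this.source = source;
    }

    // Callbacks only enqueue; they never hand events to the user directly.
    public void startElement(String name) { queue.add("START " + name); }
    public void characters(String text)   { queue.add("TEXT " + text); }
    public void endElement(String name)   { queue.add("END " + name); }

    // Returns the next buffered event, driving the source only when the
    // queue is empty. Returns null at end of input.
    String next() {
        while (queue.isEmpty() && more) {
            more = source.step(this);
        }
        return queue.poll();
    }
}
```

with a queue in place the adapter no longer has to throw when a single step produces more than one callback; it also copes with steps that produce none.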

> And since event queueing would then require you to buffer
> event information, I have a specific feature that I would
> add to the Xerces implementation to make it perform a little
> better. Arguably, one of the biggest wastes of time would be
> the copying of character content from the "characters" and
> "ignorableWhitespace" callbacks into a buffer so that runs
> of contiguous text can be returned together. So I would add
> a feature to Xerces so that the entity scanner (which is
> actually implemented by the entity manager) would not re-use
> the character buffers. That way I would not have to copy
> any characters at all because I would know that the contents
> of the char buffers would not be over-written.

that sounds like a great addition to Xerces 2 - i had something similar
in XPP2, called setBufferShrinkable() - it worked through an additional
interface, XmlPullParserBufferControl, that could be implemented
by the parser and detected with the instanceof operator, but it will certainly
work just as well as a SAX-like feature.
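
the instanceof detection pattern looks roughly like the sketch below. the interface and method names come from the description above, but the boolean parameter is an assumption, and BasicParser, PlainParser, TunableParser and ParserSetup are hypothetical names used only for illustration.

```java
// Optional capability interface. The method name follows the text above;
// the boolean parameter is an assumed signature.
interface XmlPullParserBufferControl {
    void setBufferShrinkable(boolean shrinkable);
}

// Hypothetical base parser type, for illustration only.
interface BasicParser {
    String name();
}

// A parser that does not offer buffer control.
final class PlainParser implements BasicParser {
    public String name() { return "plain"; }
}

// A parser that additionally implements the optional interface.
final class TunableParser implements BasicParser, XmlPullParserBufferControl {
    private boolean shrinkable = true;

    public String name() { return "tunable"; }

    public void setBufferShrinkable(boolean shrinkable) {
        this.shrinkable = shrinkable;
    }

    boolean isBufferShrinkable() { return shrinkable; }
}

final class ParserSetup {
    private ParserSetup() {}

    // Asks the parser not to shrink its buffers if it supports that option.
    // Returns true if the option was applied, false if it is unsupported.
    static boolean keepBuffers(BasicParser parser) {
        if (parser instanceof XmlPullParserBufferControl) {
            ((XmlPullParserBufferControl) parser).setBufferShrinkable(false);
            return true;
        }
        return false;
    }
}
```

a SAX-like feature would replace the instanceof check with a named on/off switch on the parser; the caller still has to handle the case where the parser does not recognize it.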

thanks,

alek


