xml-general mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: SAX
Date Mon, 15 May 2000 18:58:42 GMT
Scott Boag/CAM/Lotus wrote:
> Stefano Mazzocchi <stefano@apache.org> wrote:
> > Oh, c'mon. This is what happens when you have a hammer and everything
> > looks like a nail: XPath is not a database query language and XSLT is
> > not a database query optimizer.
> XSLT is very much a type of query language, and processing hundreds of
> documents is well within what you can do with it.  In any case, there will
> be a full XML Query language at some point, and the same issue will apply.
> > Also, even if you pre-analyze your stylesheet and you know that the
> > element <foo> and <bar> are never processed, how do you know where to
> > stop?
> Good point.  Yeah, you would still have to do some processing of the
> stream.  But that's different from having to make String objects and having
> to decode the entire stream.

I disagree: SAX is based on char[] arrays, and decoding _must_ be
performed anyway to know whether the SAX events in question need to be
thrown or not.

Let's have an example where a SAX producer feeds a SAX consumer:

 - the SAX consumer knows that "some" of the content will be filtered
 - so it tells the SAX producer to avoid sending it through the pipe

So far so good.

The possible SAX producers are:

- adapters: behave passively on a resource, transforming this (normally
serialized) resource into a stream of SAX events.
- generators: behave actively, generating the SAX events directly.

Both could receive information from the pipeline about what to throw
and what to avoid throwing.

For adapters, the costs of decoding and parsing must be paid anyway. The
only time saved is the creation of the char[] array and the calling of
the method (which is very likely to be optimized by hotspot engines).
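
To make the adapter case concrete, here is a minimal sketch, assuming
the stock JAXP parser and the SAX2 helper class XMLFilterImpl. The names
SkippingAdapter, SkipFilter, TextCollector and parseSkipping are made up
for this mail and are not part of any API. The filter drops the <foo>
subtree before it reaches the consumer, but the parser underneath still
has to read and decode every character of it:

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class SkippingAdapter {

    /** Hypothetical filter: drops every subtree rooted at an element named in 'skipped'. */
    static class SkipFilter extends XMLFilterImpl {
        private final Set<String> skipped;
        private int depth = 0; // greater than 0 while inside a skipped subtree

        SkipFilter(XMLReader parent, Set<String> skipped) {
            super(parent);
            this.skipped = skipped;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0 || skipped.contains(qName)) {
                depth++; // entering (or nesting deeper into) a skipped subtree
                return;
            }
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (depth > 0) {
                depth--; // leaving one level of the skipped subtree
                return;
            }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth > 0) {
                return; // the parser already decoded this text; we only avoid forwarding it
            }
            super.characters(ch, start, length);
        }
    }

    /** Consumer that simply collects the character data it receives. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses 'xml', hiding the skipped subtrees, and returns the text the consumer saw. */
    static String parseSkipping(String xml, Set<String> skipped) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        SkipFilter filter = new SkipFilter(parser, skipped);
        TextCollector consumer = new TextCollector();
        filter.setContentHandler(consumer);
        filter.parse(new InputSource(new StringReader(xml)));
        return consumer.text.toString();
    }
}
```

Note that a well-formedness error inside the skipped <foo> subtree still
aborts the parse, which is another way of seeing that the adapter cannot
avoid reading the content it has been told not to deliver.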

For generators, parts of the code could be skipped, thus saving time.

My point is: if you know that a pipeline is "filtering out" some
content that is very expensive to generate (think of a JDBC query
transformed into SAX events by the generator), your time is much better
spent writing a better generator than creating unnecessary complexity
in the API.

I haven't changed my mind: this would add implementation complexity to
both producers and consumers without adding significant functionality.

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------
