xml-general mailing list archives

From "Scott Boag/CAM/Lotus" <Scott_B...@lotus.com>
Subject Re: SAX
Date Sat, 13 May 2000 03:00:19 GMT

Michal Mosiewicz <mimo@interdata.com.pl> wrote:
> I'm talking about improvement
> that is possible along the whole processing path - from content
> generators, which may sometimes not be required to generate full content,
> to translators, and finally to the serializer, which is able to get the
> information about cacheable parts of the document and remember them in
> serialized form.

I agree with Mike.  Something along the lines of what he's saying (i.e.
return codes to skip sections) could be very cool.  It would be very
interesting to pre-analyze a stylesheet, and then skip sections that
couldn't possibly be processed.  The primary wins would be that the section
of the buffer wouldn't have to be decoded to Unicode, and the string
objects that have to be passed wouldn't have to be created for that
section.  Also, a streaming XSLT process (i.e. pre-analyze the stylesheet
to see if an internal tree needs to be built, and, if not, do the whole
transform in streaming mode without building a tree) should be able to tell
the parser to quit if it has completed the transformation.  Consider how
important this would be if you were using an extended XSLT processor as a
query engine to retrieve partial sections of hundreds of documents...
perhaps you just want to get the <description>...</description> sections.
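
As a rough sketch of the "tell the parser to quit" idea: SAX has no return code for this, so the usual workaround is to raise an exception from the handler once the interesting section has been seen. The example below uses Python's xml.sax rather than the Java API under discussion, and the names StopParsing, DescriptionGrabber and first_description are made up for illustration. It only shows the early exit, not the savings in decoding or string creation.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class StopParsing(Exception):
    """Hypothetical signal: raised by the handler to abort the parse."""


class DescriptionGrabber(ContentHandler):
    """Collects the text of the first <description> element, then quits."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # > 0 while inside a <description> element
        self.parts = []       # text chunks collected so far
        self.events_seen = 0  # number of startElement events delivered

    def startElement(self, name, attrs):
        self.events_seen += 1
        if name == "description":
            self.depth += 1

    def characters(self, content):
        if self.depth:
            self.parts.append(content)

    def endElement(self, name):
        if name == "description":
            self.depth -= 1
            if self.depth == 0:
                # We have what we came for; stop the parser here.
                raise StopParsing


def first_description(xml_bytes):
    """Return (description text, number of start events seen before quitting)."""
    handler = DescriptionGrabber()
    try:
        xml.sax.parseString(xml_bytes, handler)
    except StopParsing:
        pass  # expected: the handler asked the parser to stop
    return "".join(handler.parts), handler.events_seen
```

With a document such as `<doc><title>t</title><description>hello</description><body>...</body></doc>`, the handler never receives events for anything after `</description>`.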

I wonder if this might be better done as a callback to the producer
instance?  You could do this without breaking the current ContentHandler,
and it's something the Xerces people could do as a prototype, without
having to struggle with a proposal to the SAX folks (it's always better if
you come to something like this with a working implementation).  Also, you
could specify much more detailed information with a callback.  (I'm tempted
to say you could specify the section to be skipped with a subset of an XPath
match pattern... but that would probably be impractical in terms of the
complexity vs. the actual performance gain?).
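
To make the shape of such a callback concrete, here is a minimal sketch, again in Python's xml.sax rather than Java. No real parser exposes this hook as far as I know, so SkippingFilter below is a hypothetical stand-in that sits between the parser and the consumer and asks a should_skip callback, given a simple slash-separated element path (a far cry from real XPath match patterns), whether a subtree should be dropped. Because it filters after the fact, it shows only the interface; a real producer would have to honor the callback before decoding to get the performance win.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class SkippingFilter(ContentHandler):
    """Hypothetical stand-in for a producer that honors a skip callback."""

    def __init__(self, downstream, should_skip):
        super().__init__()
        self.downstream = downstream    # the real consumer of events
        self.should_skip = should_skip  # callable: path string -> bool
        self.path = []                  # names of currently open elements
        self.skip_depth = None          # len(path) where skipping began

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.skip_depth is not None:
            return                      # already inside a skipped subtree
        if self.should_skip("/" + "/".join(self.path)):
            self.skip_depth = len(self.path)
            return
        self.downstream.startElement(name, attrs)

    def characters(self, content):
        if self.skip_depth is None:
            self.downstream.characters(content)

    def endElement(self, name):
        if self.skip_depth is None:
            self.downstream.endElement(name)
        elif len(self.path) == self.skip_depth:
            self.skip_depth = None      # leaving the root of the skipped subtree
        self.path.pop()


class Recorder(ContentHandler):
    """Toy consumer that records the element names it is shown."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def seen_elements(xml_bytes, should_skip):
    """Parse xml_bytes and return the element names the consumer received."""
    recorder = Recorder()
    xml.sax.parseString(xml_bytes, SkippingFilter(recorder, should_skip))
    return recorder.names
```

For example, `seen_elements(doc, lambda path: path == "/doc/head")` hides the whole `<head>` subtree from the consumer without any change to the ContentHandler interface.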

> 3. There is also a potentially much larger gain in the serializer part,
> because this could allow for structure level caching of the result, i.e.
> you could potentially decide to cache some fragments of the output, not
> necessarily whole documents. Currently it's theoretically possible, but
> the document producer is not able to know if it is not required to provide
> some data, because the cached output is still valid.

Interesting theory, but it's hard for me to see what kind of "structure
level caching" could be done, even if the serializer had Schema information
about the result tree.  Could you give an example?  I'm always very
interested to consider performance gains that could be made in the

