cocoon-dev mailing list archives

From Torsten Curdt <>
Subject Re: XMLByteStreamInterpreter
Date Fri, 18 Jan 2002 19:08:12 GMT
> > On the one hand, both components become more flexible to use. You
> > can compile and interpret arbitrary XML nodes.

Exactly, and I don't want to lose this.

> > But on the other hand, we lose validity checking of the interpreted
> > byte stream. Currently the byte stream must contain a valid XML document.
> > With your proposed change, the byte stream can contain any block of
> > XML, and I think it is very hard to check whether all opened elements are
> > closed (i.e. whether an endElement event is sent for each startElement event).

But the current implementation didn't do any validation at all. The compiler
takes all events and stores them, and the interpreter loops (while(true)!!)
until it finds an endDocument. I think this is really bad, so I changed it
to make the interpreter emit all the stored events.
(What goes in should also come out.)
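A minimal sketch of that record-and-replay idea, assuming a made-up event encoding (one code byte per event followed by writeUTF strings; this is NOT Cocoon's actual byte-stream format). SketchByteCompiler and SketchByteInterpreter are hypothetical names. The interpreter stops when the input is exhausted instead of spinning until it sees endDocument, so a fragment comes out exactly as it went in:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Records SAX events into bytes. Hypothetical encoding, not Cocoon's real one. */
class SketchByteCompiler extends DefaultHandler {
    static final int START_DOCUMENT = 0, END_DOCUMENT = 1,
                     START_ELEMENT = 2, END_ELEMENT = 3, CHARACTERS = 4;

    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    byte[] toByteArray() { return bytes.toByteArray(); }

    // writeUTF limits each string to 64 KB of modified UTF-8; acceptable for a sketch.
    private void event(int code, String... strings) throws SAXException {
        try {
            out.writeByte(code);
            for (String s : strings) out.writeUTF(s);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override public void startDocument() throws SAXException { event(START_DOCUMENT); }
    @Override public void endDocument() throws SAXException { event(END_DOCUMENT); }

    @Override public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        event(START_ELEMENT, uri, local, qName);
        try {
            out.writeInt(atts.getLength());
            for (int i = 0; i < atts.getLength(); i++) {
                out.writeUTF(atts.getURI(i));
                out.writeUTF(atts.getLocalName(i));
                out.writeUTF(atts.getQName(i));
                out.writeUTF(atts.getType(i));
                out.writeUTF(atts.getValue(i));
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override public void endElement(String uri, String local, String qName) throws SAXException {
        event(END_ELEMENT, uri, local, qName);
    }

    @Override public void characters(char[] ch, int start, int length) throws SAXException {
        event(CHARACTERS, new String(ch, start, length));
    }
}

/** Replays whatever was recorded; end of input ends the replay (no while(true)). */
class SketchByteInterpreter {
    static void deserialize(byte[] data, ContentHandler handler) throws SAXException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            while (in.available() > 0) {
                int code = in.readUnsignedByte();
                switch (code) {
                    case SketchByteCompiler.START_DOCUMENT: handler.startDocument(); break;
                    case SketchByteCompiler.END_DOCUMENT: handler.endDocument(); break;
                    case SketchByteCompiler.START_ELEMENT: {
                        String uri = in.readUTF(), local = in.readUTF(), qName = in.readUTF();
                        AttributesImpl atts = new AttributesImpl();
                        int n = in.readInt();
                        for (int i = 0; i < n; i++) {
                            // uri, localName, qName, type, value: same order as written
                            atts.addAttribute(in.readUTF(), in.readUTF(), in.readUTF(),
                                              in.readUTF(), in.readUTF());
                        }
                        handler.startElement(uri, local, qName, atts);
                        break;
                    }
                    case SketchByteCompiler.END_ELEMENT:
                        handler.endElement(in.readUTF(), in.readUTF(), in.readUTF());
                        break;
                    case SketchByteCompiler.CHARACTERS: {
                        char[] text = in.readUTF().toCharArray();
                        handler.characters(text, 0, text.length);
                        break;
                    }
                    default:
                        throw new SAXException("Unknown event code " + code);
                }
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }
}
```

Because the loop condition is the amount of remaining input, a recorded fragment without endDocument simply ends instead of hanging.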

I also found a >=/= bug, and I am still wondering why this has worked
for so long and so well ;)
Maybe someone can cross-check this so I can remove the comment in the
code, but I'm pretty sure it's correct now.

> I have a "WellFormednessCheckerPipe" on my TODO list for our projects.
> It's an XMLPipe that, as its name implies, checks that all elements
> are well balanced, namespaces are properly defined, etc.
> This is something that could be placed in front of XMLByteStreamInterpreter.

This sounds cool.
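For illustration, a rough sketch of such a checker as a SAX filter, assuming it is built on the JDK's org.xml.sax.helpers.XMLFilterImpl rather than Cocoon's XMLPipe interface. WellFormednessCheckerSketch is a hypothetical name, and only element nesting is checked; the namespace checks mentioned above are left out:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical sketch of a "WellFormednessCheckerPipe": forwards events
 * unchanged but fails fast on mis-nested or unclosed elements.
 * Namespace checks are intentionally omitted.
 */
class WellFormednessCheckerSketch extends XMLFilterImpl {
    private final Deque<String> open = new ArrayDeque<>();

    /** True when every startElement so far has a matching endElement (useful for fragments). */
    boolean isBalanced() { return open.isEmpty(); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        open.push(qName);
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (open.isEmpty()) {
            throw new SAXException("</" + qName + "> has no matching start tag");
        }
        String expected = open.pop();
        if (!expected.equals(qName)) {
            throw new SAXException("expected </" + expected + "> but got </" + qName + ">");
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void endDocument() throws SAXException {
        if (!open.isEmpty()) {
            throw new SAXException("document ended with <" + open.peek() + "> still open");
        }
        super.endDocument();
    }
}
```

isBalanced() covers fragments, which never receive endDocument(); for whole documents the check happens in endDocument(). The per-event cost is one stack push or pop plus a string comparison.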

> > Another problem I see is that the interpreter is an XMLProducer
> > and I think that the contract for an XMLProducer is to stream
> > a whole document. So we shouldn't break that contract.

Hm... but I do also see a need for XML fragments that are not full
documents (look e.g. at the xscript stuff; IIRC I have seen it there,
too). Always using an EmbeddedXMLPipe to work around this is IMHO a bit
ugly; maybe we really should separate those concerns by using a
different interface.

Why not have:

  XMLFragmentByteCompiler implements XMLSerializer
  XMLFragmentByteInterpreter implements XMLFragment (with the toSAX() method)


  XMLDocumentByteCompiler implements XMLSerializer
  XMLDocumentByteInterpreter implements XMLProducer

> There's XMLByteStreamFragment just for that. Currently, it pipes the
> output of XMLByteStreamInterpreter through an EmbeddedXMLPipe that
> strips out start/endDocument().
> > So it seems better to enforce that the compiler only compiles complete
> > documents.
> This is limiting: the compiler is very useful for buffering content,
> be it a document or a fragment. It avoids the overhead of DOM when you
> just want to hold the content but not look at what's inside. See for
> example the "capture" logicsheet.
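Roughly how the EmbeddedXMLPipe trick described above works, sketched with the JDK's XMLFilterImpl. StripDocumentEventsFilter and emitAsFragment are hypothetical names, not Cocoon classes: a complete document is parsed, and every event except startDocument()/endDocument() is forwarded, so the receiver only ever sees a fragment.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical stand-in for EmbeddedXMLPipe: drops start/endDocument, forwards the rest. */
class StripDocumentEventsFilter extends XMLFilterImpl {
    @Override public void startDocument() { /* swallowed: downstream sees a fragment */ }
    @Override public void endDocument() { /* swallowed: downstream sees a fragment */ }

    /** Parses a complete document but delivers only its body events to target. */
    static void emitAsFragment(String xml, ContentHandler target) throws Exception {
        StripDocumentEventsFilter filter = new StripDocumentEventsFilter();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        filter.setParent(factory.newSAXParser().getXMLReader());
        filter.setContentHandler(target);
        filter.parse(new InputSource(new StringReader(xml)));
    }
}
```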


> > Another possibility is to explicitly add methods to the XMLSerializer
> > and XMLDeserializer which tell that not a whole document is processed
> > but only a fragment.
> We can also consider that the choice between document and fragment
> depends on the context where the data is deserialized, but isn't known
> when XML is serialized.


> We could then say:
> - XMLDeserializer is for documents (as it extends XMLProducer) and
> *always* calls start/endDocument(),
> - XMLByteStreamFragment is for fragments and *never* calls
> start/endDocument(),
> In that case, the compiler doesn't need to store start/endDocument
> events, because this is determined by the deserialization context.

...but maybe with a cleaner separation. What about:

  AbstractXMLByteCompiler implements XMLSerializer {
    //shared code
  }

  XMLFragmentByteCompiler extends AbstractXMLByteCompiler {
   //will just store events - should leave out start/endDocument
  }

  XMLDocumentByteCompiler extends AbstractXMLByteCompiler {
   //will be a bit more picky - could leave out start/endDocument, depends on the contract
  }

  BTW: Carsten, don't you think checking well-formedness at byte-compile
  time will slow down the caching system quite a bit? Is it really
  necessary to enforce this?

  AbstractXMLByteInterpreter {
    //shared code
  }

  XMLFragmentByteInterpreter extends AbstractXMLByteInterpreter implements XMLFragment {
    // will never spit out start/endDocument
    public void toSAX(ContentHandler handler);
  }

  XMLDocumentByteInterpreter extends AbstractXMLByteInterpreter implements XMLProducer {
    // will always spit out start/endDocument
  }

What about this?
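A compact sketch of the proposed split, under simplifying assumptions: the "byte stream" is replaced by an in-memory list of replayable events, only elements and character data are recorded (no prefix mappings, comments or PIs), and all class names are hypothetical stand-ins for the XMLFragmentByte*/XMLDocumentByte* classes above. Following the suggestion quoted earlier, the compiler drops start/endDocument and the interpreter subclass decides whether to add them:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** One buffered SAX call; a hypothetical in-memory stand-in for the byte stream. */
interface BufferedEvent {
    void replay(ContentHandler handler) throws SAXException;
}

/** Compiler side: stores body events only; start/endDocument are inherited no-ops. */
class BodyEventCompiler extends DefaultHandler {
    private final List<BufferedEvent> events = new ArrayList<>();

    List<BufferedEvent> events() { return Collections.unmodifiableList(events); }

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        AttributesImpl copy = new AttributesImpl(atts); // parsers may reuse the Attributes object
        events.add(h -> h.startElement(uri, local, qName, copy));
    }

    @Override public void endElement(String uri, String local, String qName) {
        events.add(h -> h.endElement(uri, local, qName));
    }

    @Override public void characters(char[] ch, int start, int length) {
        char[] copy = Arrays.copyOfRange(ch, start, start + length); // ch may be reused too
        events.add(h -> h.characters(copy, 0, copy.length));
    }
}

/** Shared interpreter code. */
abstract class AbstractEventInterpreter {
    private final List<BufferedEvent> events;

    AbstractEventInterpreter(List<BufferedEvent> events) { this.events = events; }

    protected void replayBody(ContentHandler handler) throws SAXException {
        for (BufferedEvent e : events) e.replay(handler);
    }
}

/** Fragment flavour: never emits start/endDocument (compare XMLFragment.toSAX). */
class FragmentEventInterpreter extends AbstractEventInterpreter {
    FragmentEventInterpreter(List<BufferedEvent> events) { super(events); }

    public void toSAX(ContentHandler handler) throws SAXException { replayBody(handler); }
}

/** Document flavour: always wraps the body in start/endDocument. */
class DocumentEventInterpreter extends AbstractEventInterpreter {
    DocumentEventInterpreter(List<BufferedEvent> events) { super(events); }

    public void emitDocument(ContentHandler handler) throws SAXException {
        handler.startDocument();
        replayBody(handler);
        handler.endDocument();
    }
}
```

Keeping the start/endDocument decision in the interpreter means one compiled buffer can serve both the fragment and the document context.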
