cocoon-dev mailing list archives

From Sylvain Wallez
Subject Re: [cocoon3] Stax Pipelines
Date Wed, 03 Dec 2008 07:56:45 GMT
Andreas Pieber wrote:
> First of all, my name is Andreas and I'm one of the students working on the StAX 
> implementation for cocoon. Therefore hello from my colleagues and me.

Hi Andreas and colleagues!

> Secondly, this is my first post ever to the mailing list of an open source 
> project, and it is such a long post to answer. Thank you Sylvain ;) 
> Nevertheless I'm going to try my best.

Doh, sorry for that. But at least this brought some material for the 
discussion :-P

> We (if I say we, I mean us students, strongly influenced by Reinhard and Steven 
> :)) also thought about the problems you described and came to the same 
> conclusion.

Good to hear!

> Therefore we're trying another approach. Pulling StAX-XmlEvents 
> through the entire pipeline from the end. 
> In other words, if we have a simple pipe of the following form:
> Producer - Transformer - Serializer
> the Serializer would have in its start method some code like:
> while(parent.hasNext()){
> 	xmlOutputWriter.add(parent.getNext());
> }
> retrieving the next event from the Transformer in this case and writing it into an 
> XmlOutputWriter. The Transformer in turn calls the getNext method on the 
> Starter (in this case), which retrieves the XmlEvents directly from the 
> XmlInputReader.
> In this approach the Transformer needs (of course) some kind of buffer since in 
> response to one sibling from the parent much new content could be produced by 
> the transformer. This content is only retrieved one by one while the next 
> pipeline component calls getNext which explains the need for some kind of 
> buffer.
> Of course this buffer and some more helper code have to be provided to avoid 
> code duplication and to help the developer.

I thought about that approach as well, but it doesn't avoid state 
management, which is the main complexity that Stax is supposed to solve. 
This is still callback-based processing, although here we have pull 
callbacks rather than push callbacks.
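
For illustration, here is a minimal Java sketch of the pull-only pipe from the
quoted text. The PullComponent interface, the Starter and GreetingTransformer
classes and the <greet> element are all made up for this example (they are not
Cocoon classes); only the javax.xml.stream calls are standard StAX API. It
shows why the transformer needs a buffer as soon as one input event can yield
several output events.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical pull interface; hasNext/getNext mirror the quoted loop. */
interface PullComponent {
    boolean hasNext() throws XMLStreamException;
    XMLEvent getNext() throws XMLStreamException;
}

/** Start of the pipe: hands out events straight from the StAX reader. */
final class Starter implements PullComponent {
    private final XMLEventReader reader;
    Starter(XMLEventReader reader) { this.reader = reader; }
    public boolean hasNext() { return reader.hasNext(); }
    public XMLEvent getNext() throws XMLStreamException { return reader.nextEvent(); }
}

/** Inserts the text "hello " after every <greet> start tag (invented behaviour). */
final class GreetingTransformer implements PullComponent {
    private final PullComponent parent;
    private final Deque<XMLEvent> buffer = new ArrayDeque<>();
    private final XMLEventFactory events = XMLEventFactory.newInstance();

    GreetingTransformer(PullComponent parent) { this.parent = parent; }

    public boolean hasNext() throws XMLStreamException {
        return !buffer.isEmpty() || parent.hasNext();
    }

    public XMLEvent getNext() throws XMLStreamException {
        if (buffer.isEmpty()) {
            // One parent event may produce several output events: queue them all.
            XMLEvent e = parent.getNext();
            buffer.add(e);
            if (e.isStartElement()
                    && "greet".equals(e.asStartElement().getName().getLocalPart())) {
                buffer.add(events.createCharacters("hello "));
            }
        }
        return buffer.poll();
    }
}

final class PullPipeSketch {
    /** Plays the Serializer role: pulls every event through the pipe and writes it. */
    static String run(String xml) {
        try {
            XMLEventReader in =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            PullComponent parent = new GreetingTransformer(new Starter(in));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            while (parent.hasNext()) {
                writer.add(parent.getNext());
            }
            writer.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<doc><greet>world</greet></doc>"));
    }
}
```

Note that getNext() is still invoked once per output event, so anything the
transformer must remember between two calls has to live in fields such as the
buffer.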

Now you're right: a single pull callback can consume several input 
events that are related, thus making it easy to process a subtree of 
several closely related elements from the input. It would for example 
radically simplify the implementation of the I18nTransformer where 
<i18n:translate> and <i18n:choose> have a nested structure.

But in many situations the elements of interest to a transformer enclose 
large document sections that are to be propagated without modification. 
Examples are JXTemplateTransformer or FormsTransformer (but does anybody 
still use these instead of their generator replacements?), 
RoleFilterTransformer, SQLTransformer, LuceneIndexTransformer, 
MailTransformer, etc.

In that case, if we want to avoid processing the full input when 
reacting to a start element in order to keep the benefits of streaming, 
we have to use state management very similar to what would be needed for 
a SAX implementation.
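
To make the point concrete, here is a small Java sketch of a streaming pull
transformer that has to carry state between calls. The ShoutTransformer class
and the <shout> element are invented for the example; EventReaderDelegate and
the other javax.xml.stream types are standard StAX API. The depth field plays
the same role as the state a SAX handler would keep between callbacks.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.stream.util.EventReaderDelegate;

/** Upper-cases text inside <shout> (invented element) without buffering the subtree. */
final class ShoutTransformer extends EventReaderDelegate {
    // State kept across calls: nesting depth inside <shout>, 0 when outside.
    private int depth;
    private final XMLEventFactory events = XMLEventFactory.newInstance();

    ShoutTransformer(XMLEventReader parent) { super(parent); }

    @Override
    public XMLEvent nextEvent() throws XMLStreamException {
        XMLEvent e = super.nextEvent();
        if (e.isStartElement()) {
            boolean isShout = "shout".equals(e.asStartElement().getName().getLocalPart());
            if (depth > 0 || isShout) {
                depth++;
            }
        } else if (e.isEndElement()) {
            if (depth > 0) {
                depth--;
            }
        } else if (e.isCharacters() && depth > 0) {
            String upper = e.asCharacters().getData().toUpperCase(Locale.ROOT);
            return events.createCharacters(upper);
        }
        return e; // everything else streams through unchanged
    }
}

final class StatefulPullSketch {
    static String run(String xml) {
        try {
            XMLEventReader in =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            writer.add(new ShoutTransformer(in)); // pulls events one at a time
            writer.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<doc>a<shout>b<i>c</i></shout>d</doc>"));
    }
}
```

The alternative, reading the whole <shout> subtree inside a single nextEvent()
call, would keep the state in local variables but would give up streaming for
that subtree.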

I also have the feeling that because of the need for state management, 
we'll end up with quite complex structures, because of the mix of a 
callback and state automata approach with the pull approach where state 
is kept in the method calls stack and local variables.

Now I'd love to be proven wrong, since after considering these issues 
I've never actually experimented with this approach.

> One big "problem" in this approach is that the "flow direction of events" is 
> completely inverted. This means that StAX and SAX components would not be able 
> to work "directly" together. But in a push-pull approach, too, a conversion 
> between StAX and SAX events has to be done, and furthermore this problem could 
> be tackled by writing wrappers or adapters around the SAX components and adding 
> them to a StAX pipe.

Absolutely. Converting Stax to SAX is fairly trivial, but the other way 
around requires buffering or multithreading. Have you looked at 
Stax-Utils [1]? It contains many classes to ease the SAX <-> Stax conversion.
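
As a rough illustration of the easy direction (Stax to SAX), the following
Java sketch pulls from an XMLStreamReader and pushes into a SAX
ContentHandler. It is deliberately simplified and is not the Stax-Utils
implementation: namespace prefix mappings, comments and processing
instructions are ignored, and the StaxToSaxSketch class with its drive and
trace helpers is invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class StaxToSaxSketch {
    private static String orEmpty(String s) { return s == null ? "" : s; }

    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    /** Pulls every event from the reader and pushes it into the SAX handler. */
    static void drive(XMLStreamReader in, ContentHandler out)
            throws XMLStreamException, SAXException {
        out.startDocument();
        while (in.hasNext()) {
            switch (in.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    AttributesImpl atts = new AttributesImpl();
                    for (int i = 0; i < in.getAttributeCount(); i++) {
                        String local = in.getAttributeLocalName(i);
                        atts.addAttribute(orEmpty(in.getAttributeNamespace(i)), local,
                                qName(in.getAttributePrefix(i), local),
                                "CDATA", in.getAttributeValue(i));
                    }
                    out.startElement(orEmpty(in.getNamespaceURI()), in.getLocalName(),
                            qName(in.getPrefix(), in.getLocalName()), atts);
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    out.endElement(orEmpty(in.getNamespaceURI()), in.getLocalName(),
                            qName(in.getPrefix(), in.getLocalName()));
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    out.characters(in.getTextCharacters(), in.getTextStart(),
                            in.getTextLength());
                    break;
                default:
                    break; // comments, PIs, DTD etc. are skipped in this sketch
            }
        }
        out.endDocument();
    }

    /** Test helper: records the SAX callbacks produced for a small document. */
    static List<String> trace(String xml) {
        List<String> calls = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                calls.add("start:" + q);
            }
            @Override public void endElement(String u, String l, String q) {
                calls.add("end:" + q);
            }
            @Override public void characters(char[] ch, int start, int length) {
                calls.add("text:" + new String(ch, start, length));
            }
        };
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            drive(factory.createXMLStreamReader(new StringReader(xml)), handler);
        } catch (XMLStreamException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(trace("<a><b>hi</b></a>"));
    }
}
```

The reverse direction has no such simple loop: a SAX parser pushes all its
callbacks from a single parse() call, so the events must either be buffered or
produced on a second thread before a consumer can pull them.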

> At the moment we're developing a prototype for such a "pull only pipe" to get 
> some experience with it.

Even if I may seem a bit negative above, keep up this work. As I 
said, I haven't actually experimented with Stax-based state management, so 
maybe my feelings were wrong and I'm very interested in seeing what you 
can come up with.

Now there's one very interesting use case for Stax we should not forget: 
communication with remote APIs in a xmlrpc-style where the response body 
contains both status information useful to a controller, and actual data 
that can be used by a pipeline. In that case, the application controller 
should be able to pull a few events from the response until it has all 
the necessary information to decide what to do next, and then replay the 
full response event stream into a pipeline.

A typical example is the Flickr "REST" response [2], which BTW is 
actually not REST at all since the status code is in the response body 
rather than in the HTTP status. A typical controller for this API would be:

  InputStream flickrResponse = callFlickerAPI("foo");
  PushBackStreamReader input = new PushBackStreamReader(flickrResponse);
  if ("ok".equals(input.getAttributeValue(null, "status"))) {
      // go back to the first event in the stream
      Pipeline pipe = new Pipeline();
      pipe.setGenerator(input);
      ... build the pipeline and run it ...
  } else {
      sendErrorResponse("Flickr failed");
  }

(note that in "pipe.setGenerator(input)" I don't care if the pipeline is 
Stax-based or SAX-based with a Stax to SAX converter)
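
The "peek, then replay" idea can be sketched with the standard StAX event API
alone. In the Java sketch below, the PeekThenReplaySketch class and its handle
method are invented for the example, an XMLEventWriter stands in for the
pipeline, and the "status" attribute name simply follows the snippet above.
The controller remembers the few events it consumed so that the pipeline still
sees the complete stream.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

final class PeekThenReplaySketch {
    /** Returns the replayed document on success, or an error message otherwise. */
    static String handle(String responseXml) {
        try {
            XMLEventReader in = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(responseXml));

            // Controller part: pull events until the root element, remembering them.
            List<XMLEvent> seen = new ArrayList<>();
            StartElement root = null;
            while (root == null && in.hasNext()) {
                XMLEvent e = in.nextEvent();
                seen.add(e);
                if (e.isStartElement()) {
                    root = e.asStartElement();
                }
            }
            Attribute status =
                root == null ? null : root.getAttributeByName(new QName("status"));
            if (status == null || !"ok".equals(status.getValue())) {
                return "error: Flickr failed";
            }

            // Pipeline part (stand-in): replay the remembered events, then the rest.
            StringWriter out = new StringWriter();
            XMLEventWriter pipeline =
                XMLOutputFactory.newInstance().createXMLEventWriter(out);
            for (XMLEvent e : seen) {
                pipeline.add(e);
            }
            pipeline.add(in); // streams the remaining events without buffering them
            pipeline.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("<rsp status=\"ok\"><photos/></rsp>"));
        System.out.println(handle("<rsp status=\"fail\"><err/></rsp>"));
    }
}
```

Only the events up to the root start tag are held in memory; the body of the
response is still streamed.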

> I hope I was able to point out the nub of our thoughts. So, what do you think?

Yes, you got it! And sorry for throwing at you a large email for your 
first participation :-)

But you'll quickly learn that cocoon-dev is a friendly place where 
everybody can voice his opinions... and have them challenged :-P



Sylvain Wallez
