cocoon-dev mailing list archives

From: Daniel Fagerstrom <>
Subject: [RT] Input Pipelines: Storage and Selection (was Re: [RT] Input Pipelines (long))
Date: Tue, 07 Jan 2003 10:55:21 GMT
Stefano Mazzocchi wrote:
 > Hmmm, maybe deep architectural discussions are good during the holiday
 > season... we'll see :)
Not for me; I've been away from computers for a while. But you and
Nicola Ken seem to have had an interesting discussion :)

The discussion about input pipelines can be divided into two parts:

1. Improving the handling of the input stream in Cocoon. This is needed
for web services; it is also needed to make it possible to implement a
writable cocoon: protocol, something that IMO would be very useful for
reusing functionality in Cocoon, especially from blocks.

2. Using two pipelines, executed in sequence, to respond to input in
Cocoon. The first pipeline (called the input pipeline) is responsible
for reading the input, from request parameters or from the input stream,
transforming it to an appropriate format and storing it in e.g. a
session parameter, a file or a DB. After the input pipeline there is an
ordinary (output) pipeline that is responsible for generating the
response. The output pipeline is executed after the input pipeline has
completed; as a consequence, actions and selections in the output
pipeline can depend e.g. on whether the handling of the input succeeded
and on the data that was stored by the input pipeline.

Here I will focus on your comments on the second part of the proposal.

 > Daniel Fagerstrom wrote:
 >> In Sitemaps
 >> -----------
 >> In a sitemap an input pipeline could be used e.g. for implementing a
 >> web service:
 >> <match pattern="myservice">
 >>   <generate type="xml">
 >>     <parameter name="scheme" value="myInputFormat.scm"/>
 >>   </generate>
 >>   <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
 >>   <serialize type="dom-session" non-terminating="true">
 >>     <parameter name="dom-name" value="input"/>
 >>   </serialize>
 >>   <select type="pipeline-state">
 >>     <when test="success">
 >>       <act type="my-business-logic"/>
 >>       <generate type="xsp" src="collectTheResult.xsp"/>
 >>       <serialize type="xml"/>
 >>     </when>
 >>     <when test="non-valid">
 >>       <!-- produce an error document -->
 >>     </when>
 >>   </select>
 >> </match>
 >> Here we first have an input pipeline that reads and validates XML
 >> input, transforms it to some appropriate format and stores the result
 >> as a DOM tree in a session attribute. A serializer normally means
 >> that the pipeline should be executed and thereafter an exit from the
 >> sitemap. I used the attribute non-terminating="true" to mark that
 >> the input pipeline should be executed but that there is more to do in
 >> the sitemap afterwards.
 >> After the input pipeline there is a selector that selects the output
 >> pipeline depending on whether the input pipeline succeeded or not.
 >> This use of selection has some relation to the discussion about
 >> pipe-aware selection (see [3] and the references therein). It would
 >> solve at least my main use cases for pipe-aware selection, without
 >> having its drawbacks: Stefano considered pipe-aware selection a mix
 >> of concerns; selection should be based on metadata (pipeline state)
 >> rather than on data (pipeline content). There were also some people
 >> who didn't like my buffering of all input to the pipe-aware selector.
 >> IMO the use of selectors above solves both of these issues.
 >> The output pipeline starts with an action that takes care of the
 >> business logic for the application. This is IMHO a more legitimate
 >> use for actions than the current mix of input handling and business
 >> logic.
 > Wouldn't the following pipeline achieve the same functionality you want
 > without requiring changes to the architecture?
 > <match pattern="myservice">
 >   <generate type="payload"/>
 >   <transform type="validator">
 >     <parameter name="scheme" value="myInputFormat.scm"/>
 >   </transform>
 >   <select type="pipeline-state">
 >     <when test="valid">
 >       <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
 >       <transform type="my-business-logic"/>
 >       <serialize type="xml"/>
 >     </when>
 >     <otherwise>
 >       <!-- produce an error document -->
 >     </otherwise>
 >   </select>
 > </match>

Yes, it would achieve about the same functionality as I want, and it
could easily be implemented with the help of the small extensions to the
sitemap interpreter that I implemented for pipe-aware selection [3].

I think it could be interesting to do a detailed comparison of our
proposals: how the input stream and validation are handled, how
selection based on pipeline state is performed, whether storage of the
input is done in a serializer or in a transformer, and how the new
output is created.

Input Stream

For input stream handling you used

   <generate type="payload"/>

Is the payload generator equivalent to the StreamGenerator? Or does it
do something more, like switching parsers depending on the MIME type of
the input stream?

I used

   <generate type="xml"/>

The idea is that if no src attribute is given, the sitemap interpreter
automatically connects the generator to the input stream of the
environment (the input stream from the HTTP request in the servlet case;
in other cases it is less clear). This behavior was inspired by the
handling of standard input in Unix pipelines.

Nicola Ken proposed:

   <generate type="xml" src="inputstream://"/>

I prefer this solution to mine, as it doesn't require any change to the
sitemap interpreter. I also believe that it is easier to understand, as
it is more explicit. It also (as Nicola Ken has explained) gives good
SoC: the URI in the src attribute describes where to read the resource
from, e.g. input stream, file, CVS, HTTP, FTP, etc., and the generator
is responsible for how to parse the resource. If we develop an input
stream protocol, all the work invested in the existing generators can
immediately be reused in web services.
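
To make this concrete, a minimal sketch of such a protocol
implementation could look like the code below. The Source and
SourceFactory interfaces are simplified stand-ins for the real source
abstractions, and how the request body is obtained from the environment
is left open; the point is only that the protocol delivers a stream and
leaves all parsing to the generator:

   import java.io.IOException;
   import java.io.InputStream;

   // Simplified stand-ins for the real source abstractions.
   interface Source {
       InputStream getInputStream() throws IOException;
       String getURI();
   }

   interface SourceFactory {
       Source getSource(String uri) throws IOException;
   }

   /** Hypothetical factory for an "inputstream:" protocol: it ignores
    *  the rest of the URI and simply hands out the request body of the
    *  current environment, so any existing generator can parse it. */
   class InputStreamSourceFactory implements SourceFactory {
       private final InputStream requestBody; // from the environment

       InputStreamSourceFactory(InputStream requestBody) {
           this.requestBody = requestBody;
       }

       public Source getSource(final String uri) {
           return new Source() {
               public InputStream getInputStream() { return requestBody; }
               public String getURI() { return uri; }
           };
       }
   }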


Validation

Should validation be part of the parsing of the input, as in:

   <generate type="xml">
     <parameter name="scheme" value="myInputFormat.scm"/>

or should it be a separate transformation step:

   <transform type="validator">
     <parameter name="scheme" value="myInputFormat.scm"/>

or maybe the responsibility of the protocol as Nicola Ken proposed in 
one of his posts:

   <generate type="xml" src="inputstream:myInputFormat.scm"/>

This is not a question about architecture but rather one about finding 
"best practices".

I don't think validation should be part of the protocol. It would mean
that the protocol has to take care of the parsing, and that would muddle
the SoC that Nicola Ken has argued for in his other posts, where the
protocol is responsible for locating and delivering the stream and the
generator is responsible for parsing it.

Should validation be part of the generator or a transform step? I don't
know. If the input is not XML, as for the ParserGenerator, I guess the
validation must take place in the generator. If the XML parser validates
the input as part of the parsing, it is more practical to let the
generator be responsible for validation (IIRC Xerces2 has an internal
pipeline structure and performs validation in a transformer-like way, so
for Xerces2 it would probably be as efficient to do validation in a
transformer as in a generator). Otherwise it seems to give better SoC to
separate the parsing and the validation steps, so that we can have one
validation transformer for each schema language.

In some cases it might be practical to augment the XML document with
error information, to be able to give more exact user feedback on where
the errors are located. For such applications it seems more natural to
me to have validation in a transformer.

A question that might have architectural consequences is how the
validation step should report validation errors. If the input is not
parseable at all, there is not much more to do than throw an exception
and let the ordinary internal error handler report the situation. If
some of the elements or attributes in the input have the wrong type, we
probably want to return more detailed feedback than just the internal
error page. Some possible validation error reporting mechanisms are:
storing an error report object in the environment, e.g. in the object
model; augmenting the XML document with error reporting attributes or
elements; throwing an exception that contains a detailed error
description object; or a combination of these mechanisms.

Mixing data and state information was considered bad practice in the
discussion about pipe-aware selection (see the references in [3]); that
rules out using augmentation of the XML document as the only error
reporting mechanism. Throwing an exception would AFAIU lead to
difficulties in giving customized error reports. So I believe it would
be best to put some kind of state-describing object in the environment,
and possibly combine this with augmentation of the XML document.
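
To make this concrete: the validation step could put something like the
following descriptor in the object model, under some well-known key. The
class name and key below are made up for the example:

   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   import java.util.Map;

   /** Hypothetical descriptor a validation step stores in the object
    *  model; both the class and the key name are assumptions. */
   class ValidationResult {
       static final String OBJECT_MODEL_KEY = "validation-result";

       private final List errors = new ArrayList(); // of String messages

       void addError(String location, String message) {
           errors.add(location + ": " + message);
       }

       boolean isValid() {
           return errors.isEmpty();
       }

       List getErrors() {
           return Collections.unmodifiableList(errors);
       }

       /** Called by the validating component at the end of its run. */
       void storeIn(Map objectModel) {
           objectModel.put(OBJECT_MODEL_KEY, this);
       }
   }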

Pipe State Dependent Selection

For selecting the response based on whether the input document is valid
or not, you suggest the following:

   <transform type="validator">
     <parameter name="scheme" value="myInputFormat.scm"/>
   <select type="pipeline-state">
     <when test="valid">
       <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>

As I mentioned earlier, this could easily be implemented with the
"pipe-aware selection" code I submitted in [3]. Let us see how it would
work.

The PipelineStateSelector cannot be executed at pipeline construction
time as ordinary selectors are. The part of the pipeline before the
selector, including the ValidatorTransformer, must have been executed
before the selection is performed. This can be implemented by letting
the PipelineStateSelector implement a special marker interface, say
PipelineStateAware, so that it can get special treatment in the
selection part of the sitemap interpreter.
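
A minimal sketch of the marker interface and of a selector using it
could look like this. The select() method is simplified compared to the
real selector interface, and it reuses the hypothetical ValidationResult
descriptor from above:

   import java.util.Map;

   /** Proposed marker interface: marks a selector that must be
    *  evaluated after the preceding part of the pipeline has run. */
   interface PipelineStateAware {
   }

   /** Hypothetical selector body, testing state left in the object
    *  model by the validation step. */
   class PipelineStateSelector implements PipelineStateAware {
       boolean select(String expression, Map objectModel) {
           ValidationResult result = (ValidationResult)
               objectModel.get(ValidationResult.OBJECT_MODEL_KEY);
           if ("valid".equals(expression)) {
               return result != null && result.isValid();
           }
           return false; // unknown tests fail, so <otherwise> is chosen
       }
   }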

When the sitemap interpreter gets a PipelineStateAware selector, it
first ends the currently constructed pipeline with a serializer that
stores its SAX input in e.g. a DOM tree; the pipeline is then processed
and the DOM tree with the cached result is stored in e.g. the object
model. In the next step the selector is executed, and it can base its
decision on the result from the first part of the pipeline. If the
ValidationTransformer puts a validation result descriptor in the object
model, the PipelineStateSelector can perform tests on this descriptor.
In the last step a new pipeline is constructed where the generator reads
from the stored DOM tree; in the example above, the first transformer
will be an XSLTransformer.
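
In rough code the special treatment could look like this. All names are
invented for the sketch; only the control flow matters:

   import java.util.Map;

   /** Sketch of the selection step in the sitemap interpreter. */
   class SitemapInterpreterSketch {
       void handleSelect(Object selector,
                         PipelineUnderConstruction pipeline,
                         Map objectModel) throws Exception {
           if (selector instanceof PipelineStateAware) {
               // 1. End the current pipeline with a buffering
               //    serializer and execute it; the validator runs now.
               DOMBufferSerializer buffer = new DOMBufferSerializer();
               pipeline.setSerializer(buffer);
               pipeline.process();
               objectModel.put("buffered-input", buffer.getDocument());
               // 2. The selector can now test state left behind by the
               //    executed components (e.g. a ValidationResult).
               // 3. A new pipeline is built whose generator reads from
               //    the buffered DOM tree.
           }
           // Ordinary selectors are evaluated at construction time.
       }
   }

   /** Invented helper types, just enough for the sketch. */
   interface PipelineUnderConstruction {
       void setSerializer(Object serializer);
       void process() throws Exception;
   }

   class DOMBufferSerializer {
       org.w3c.dom.Document getDocument() { return null; /* sketch */ }
   }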

An alternative and more explicit way to describe the pipeline state
dependent selection above is:

   <transform type="validator">
     <parameter name="scheme" value="myInputFormat.scm"/>
   <serialize type="object-model-dom" non-terminating="true">
     <parameter name="name" value="validated-input"/>
   <select type="pipeline-state">
     <when test="valid">
       <generate type="object-model-dom">
         <parameter name="name" value="validated-input"/>
       <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>

Here the extension to the current Cocoon semantics is put in the
serializer instead of the selector. The sitemap interpreter treats a
non-terminating serializer as an ordinary serializer in the sense that
it puts the serializer at the end of the current pipeline and executes
it. The difference is that, instead of returning to the caller of the
sitemap interpreter, it creates a new current pipeline and continues to
interpret the components after the serializer, in this case a selector.
The sitemap interpreter will also ignore the output stream of the
serializer; the serializer is supposed to have side effects. The new
current pipeline will then get an ObjectModelDOMGenerator as generator
and an XSLTTransformer as its first transformer.
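
Sketched in the same invented vocabulary as above, the non-terminating
serializer case differs only in where the continuation happens:

   /** Continuation of the interpreter sketch; names still invented. */
   class SerializeStepSketch {
       void handleSerialize(Object serializer, boolean nonTerminating,
                            PipelineUnderConstruction pipeline)
               throws Exception {
           pipeline.setSerializer(serializer);
           pipeline.process(); // side effects (storing etc.) happen here
           if (!nonTerminating) {
               return;         // ordinary case: leave the sitemap
           }
           // non-terminating="true": ignore the serializer's output
           // stream, create a new current pipeline and continue
           // interpreting the sitemap after the serializer.
       }
   }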

I prefer this construction to the more implicit one because it is more
obvious what it does, and because it gives more freedom in how to store
the user input. Some people seem to prefer to store user input in Java
beans, and in some applications session parameters might be a better
place than the object model.

Pipelines with Side Effects

A common pattern in pipelines that handle input (at least in the
applications that I write) is that the first half of the pipeline takes
care of the input and ends with a transformer that stores it. The
transformer can be e.g. the SQLTransformer (with insert or update
statements), the WriteDOMSessionTransformer or the
SourceWritingTransformer. These transformers have side effects, they
store something, and they return an XML document that tells whether they
succeeded or not. A conclusion from the threads about pipe-aware
selection was that sending metadata, like whether the operation
succeeded or not, through the pipeline is bad practice, and especially
that we should not allow selection based on such content. Given that
these transformers basically translate XML input to a binary format and
generate an XML output that we are supposed to ignore, it would IMO be
more natural to see them as some kind of serializer.

The second half of the pipeline creates the response; here it is less
obvious what transformer to use. I normally use an XSLTTransformer,
typically ignoring its input stream and only creating an XML document
that is rendered into e.g. HTML by a subsequent transformer.

I think that it would be more natural to replace the pattern:

   <transform type="store something, return state info"/>
   <transform type="create a response document, ignore input"/>

with:

   <serialize type="store something, put state info in the environment"/>
   <generate type="create a response document" src="response document"/>

If we give the serializer a destination attribute, all the existing
serializers could also be used for storing input in files etc.:

   <serialize type="xml" dest="xmldb://..." non-terminating="true"/>

This would give the same SoC that I argued for in the context of input:
the serializer is responsible for how to serialize from XML to the
binary data format, and the destination is responsible for where to
store the data.
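
As a sketch, a destination-aware serializer could be wired roughly as
below, assuming some resolver that can open an output stream for a
writable URI. The interfaces are simplified stand-ins, not the actual
Cocoon/Excalibur API:

   import java.io.IOException;
   import java.io.OutputStream;

   /** Stand-in for a resolver of writable destinations. */
   interface WritableSourceResolver {
       OutputStream openFor(String destUri) throws IOException;
   }

   /** Hypothetical wrapper that sends a serializer's output to a
    *  dest="..." URI (e.g. "xmldb://...") instead of to the HTTP
    *  response. */
   class DestinationSerializer {
       private final WritableSourceResolver resolver;
       private final String destUri; // value of the dest attribute

       DestinationSerializer(WritableSourceResolver resolver,
                             String destUri) {
           this.resolver = resolver;
           this.destUri = destUri;
       }

       /** The interpreter would hand this stream to the wrapped
        *  serializer instead of the response output stream. */
       OutputStream getOutputStream() throws IOException {
           return resolver.openFor(destUri);
       }
   }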


I am afraid that I pose more questions than I answer in this RT. Many of
them are of a "best practice" character, have no architectural
consequences, and do not have to be answered right now. There are,
however, some questions that need an answer:

How should pipeline components, like the validation transformer, report 
state information? Placing some kind of state object in the object model 
would be one possibility, but I don't know.

We seem to agree that there is a need for selection in pipelines based
on the state of the computation in the part of the pipeline that
precedes the selection. Here we have two proposals:

1. Introduce pipeline state aware selectors (e.g. by letting the 
selector implement a marker interface), and give such selectors special 
treatment in the sitemap interpreter.

2. Extend the semantics of serializers so that the sitemap interpreter
can continue to interpret the sitemap after a serializer (e.g. via a new
non-terminating attribute for serializers).

I prefer the second proposal.

Both proposals can be implemented with no backward compatibility
problems at all, by requiring the selectors or serializers that need the
extended semantics to implement a special marker interface, and by
adding code that reacts to that marker interface in the sitemap
interpreter.

To use serializers more generally for storing things, as I proposed
above, the Serializer interface would need to extend the
SitemapModelComponent interface.
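
For reference, the change would look roughly like this. The setup()
signature follows the general shape of the Cocoon sitemap component
interfaces, but the types here are simplified stand-ins, so treat it as
a sketch:

   import java.util.Map;

   // Simplified stand-ins so the sketch is self-contained.
   interface SourceResolver { }
   interface Parameters { }

   /** Roughly the shape of SitemapModelComponent: gives a component
    *  access to the object model and its sitemap parameters. */
   interface SitemapModelComponent {
       void setup(SourceResolver resolver, Map objectModel,
                  String src, Parameters par) throws Exception;
   }

   /** The proposed change: serializers gain access to the object
    *  model, so they can store things and report state like
    *  generators and transformers already can. */
   interface Serializer extends SitemapModelComponent {
       // existing SAX consumer methods elided
   }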


What do you think?

Daniel Fagerstrom


[3] [Contribution] Pipe-aware selection
