From: Daniel Fagerstrom <dani...@nada.kth.se>
Subject: Re: [RT] Input Pipelines (long)
Date: Wed, 18 Dec 2002 23:27:31 GMT
Nicola Ken Barozzi wrote:
> 
> Daniel Fagerstrom wrote:
> [...]
> 
> Cocoon is symmetric, if you see it as it really is: a system that
> transforms a Request into a Response.
> 
> The problem arises in the way we have defined the request and the
> response: the Request is a URL, the response is a Stream.
> 
> So actually Cocoon transforms URIs into a stream.
> 
> The sitemap is the system that demultiplexes URIs by associating them
> with the actual source of the data. This makes Cocoon richer than a
> system that just hands over an entity to transform: Cocoon uses
> indirect references (URLs) instead.
> 
> The Stream as an input is a specialization, so I can say in the
> request to get stuff from the stream.
> 
> More on this later.
> 
>> In a sitemap an input pipeline could be used e.g. for implementing a
>> web service:
>>
>> <match pattern="myservice">
>>   <generate type="xml">
>>     <parameter name="scheme" value="myInputFormat.scm"/>
>>   </generate>
>>   <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
>>   <serialize type="dom-session" non-terminating="true">
>>     <parameter name="dom-name" value="input"/>
>>   </serialize>
>>   <select type="pipeline-state">
>>     <when test="success">
>>       <act type="my-business-logic"/>
>>       <generate type="xsp" src="collectTheResult.xsp"/>
>>       <serialize type="xml"/>
>>     </when>
>>     <when test="non-valid">
>>       <!-- produce an error document -->
>>     </when>
>>   </select>
>> </match>
> 
> 
> What you correctly point out is that the Generation phase does not
> get the source, but just transforms it to SAX.
<snip/>
> But IMHO this has the deficiency of fixing the source from the input.
My intention was that, when the src attribute is not used, the
generator should read the input stream.

> Think about having good Source Protocols.
> 
> We could write:
> 
>  <match pattern="myservice">
>    <generate type="xml" src="inputstream:myInputFormat.scm"/>
>    ...
>  </match>
> 
> This can easily make all my Generators able to work with the new system 
> right away.
This seems to be a better solution. Could you please expand on why you
put the scheme in the inputstream: protocol?

> 
>> Here we have first an input pipeline that reads and validates xml
>> input, transforms it to some appropriate format and stores the result
>> as a dom-tree in a session attribute. A serializer normally means that
>> the pipeline should be executed and thereafter an exit from the
>> sitemap. I used the attribute non-terminating="true" to mark that
>> the input pipeline should be executed but that there is more to do in
>> the sitemap afterwards.
> 
> 
> Pipelines can already call one another.
> We add the serializer at the end, but it's basically skipped, thus
> giving your pipeline example.
The idea is to use two pipelines, executed in sequence, for processing
a post. First the input pipeline is responsible for reading the input
data, transforming it to an appropriate format and storing it. After
that, the stored data can be used by the business logic, which can be
called from an action. After the action, an ordinary output pipeline is
executed for publishing the result of the business logic, for sending
the next form page, etc.

In this scenario the serializer in the input pipeline is responsible
for storing the input data and thus cannot be skipped. Furthermore, as
we are going to execute two pipelines in sequence, the first serializer
must not mean an exit from the sitemap, as it normally would.

I think it gives better SoC and reuse of components to let a serializer
be responsible for storing input data than to use transformers for
that. The write-DOM-session transformer, the source-writing
transformer, the SQLTransformer used for inserting data, and the
session transformer would IMHO be more natural as serializers.
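
To make this concrete, here is a minimal sketch of the serializer-side
logic I have in mind for the dom-session case. Illustration only: the
class name is made up, a plain Map stands in for the real session, and
the wiring as an actual Cocoon Serializer component is left out.

import java.util.Map;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.xml.sax.ContentHandler;

/** Illustration only: collects the SAX events of the input pipeline
 *  into a DOM tree and stores it under a name in a session-like
 *  attribute map. */
public class DomStoringHandler {

    private final TransformerHandler handler;  // identity transform, SAX -> DOM
    private final DOMResult result = new DOMResult();
    private final Map session;                 // stands in for the session
    private final String domName;              // e.g. "input"

    public DomStoringHandler(Map session, String domName) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        this.handler = factory.newTransformerHandler();
        this.handler.setResult(result);
        this.session = session;
        this.domName = domName;
    }

    /** The pipeline feeds its SAX output into this ContentHandler. */
    public ContentHandler getContentHandler() {
        return handler;
    }

    /** Called after endDocument(): store the finished DOM tree. */
    public void store() {
        session.put(domName, result.getNode());
    }
}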

> I would think that with the blocks discussion there has been some 
> advancement on the definition of pipeline fragments.
> I didn't follow it closely though, anyone care to comment?
> 
>> After the input pipeline there is a selector that selects the output
>> pipeline depending on whether the input pipeline succeeded or not.
>> This use of selection has some relation to the discussion about
>> pipe-aware selection (see [3] and the references therein). It would
>> solve at least my main use cases for pipe-aware selection, without
>> having its drawbacks: Stefano considered pipe-aware selection a mix
>> of concerns; selection should be based on meta data (pipeline state)
>> rather than on data (pipeline content). There were also some people
>> who didn't like my use of buffering of all input to the pipe-aware
>> selector. IMO the use of selectors above solves both of these issues.
> 
> 
> I don't see this. Can you please expand here?
1. Selection should be based on pipeline state instead of pipeline data.
   First the input pipeline is executed and is able to set the state of
the pipeline. After that, ordinary selects can be used for deciding how
to construct the output pipeline. The selectors for the output pipeline
have no access to the pipeline content and are used in exactly the same
way as selectors always are used (a rough sketch of such a selector
follows after point 2).

2. No use of buffering within the pipeline. IIRC some people were
concerned that pipe-aware selection, being based on buffering the sax
events before the selection, could be very inefficient if there is
much data in the pipeline. As my main use case for pipe-aware selection
was to use it after transformers with side effects and after validation
of user-submitted input data, I never saw it as a problem, since the
amount of data in the mentioned cases typically is quite small. Anyway,
with input pipelines, selection is restricted to cases where the input
was going to be stored by the system anyhow.
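
For point 1, a rough sketch of what the selection logic of such a
"pipeline-state" selector could look like (the class and attribute
names are hypothetical, and making it a real Cocoon Selector component
is left out):

import java.util.Map;

/** Hypothetical sketch: the input pipeline records its outcome (e.g.
 *  "success" or "non-valid") under a well-known attribute name, and
 *  the selector compares only this piece of meta data - it never looks
 *  at the pipeline content itself. */
public class PipelineStateSelector {

    public static final String STATE_ATTRIBUTE = "pipeline-state"; // assumed name

    /** E.g. select("success", objectModel) for <when test="success">. */
    public boolean select(String test, Map objectModel) {
        Object state = objectModel.get(STATE_ATTRIBUTE);
        return state != null && state.toString().equals(test);
    }
}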

> [...]
> 
>> In Flowscripts
>> --------------
>>
>> IIRC the discussion and examples of input for flowscripts this far
>> have mainly dealt with request-parameter-based input. If we want to
>> use flowscripts for describing e.g. web service flow, more advanced
>> input handling is needed. IMO it would be an excellent SoC to use
>> output pipelines for the presentation of the data used in the system,
>> input pipelines for going from input to system data, java objects (or
>> some other programming language) for describing business logic
>> working on the data within the system, and flowscripts for connecting
>> all this in an appropriate temporal order.
> 
> 
> Hmmm, this seems like a compelling use case.
> Could you please add a concrete use-case/example for this?
> Thanks :-)
One use case (if combined with persistent storage of continuations)
would be a workflow system.

Besides that, input pipelines are IMO very useful for handling request
parameters from forms as well. In all webapps that we build at my
company, we use absolute xpaths as request parameter names and then use
a generator that builds an xml document from the name/value pairs. This
xml input is then possibly transformed to another format and thereafter
stored in a db or as a dom tree in a session attribute.
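
As an illustration of that parameter-to-xml step, here is a
stripped-down sketch in plain DOM (not our actual generator; it assumes
the parameter names are simple absolute xpaths of element steps that
all share the same root element):

import java.util.Iterator;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Sketch: builds an xml document from request parameters whose names
 *  are absolute xpaths, e.g. the parameter "/person/name" = "Daniel"
 *  becomes <person><name>Daniel</name></person>. */
public class XPathParamsToDom {

    public static Document build(Map params) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        for (Iterator i = params.entrySet().iterator(); i.hasNext();) {
            Map.Entry param = (Map.Entry) i.next();
            String[] steps = ((String) param.getKey()).substring(1).split("/");
            Node current = doc;
            for (int s = 0; s < steps.length; s++) {
                current = getOrCreateChild(doc, current, steps[s]);
            }
            current.appendChild(doc.createTextNode((String) param.getValue()));
        }
        return doc;
    }

    /** Returns the child element with the given name, creating it if
     *  it does not exist yet. */
    private static Element getOrCreateChild(Document doc, Node parent,
                                            String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        Element child = doc.createElement(name);
        parent.appendChild(child);
        return child;
    }
}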

A flowscript that uses input pipelines might look like:

handleForm("formPage1.html", "storeData1");
if (objectModel["state"] == "success")
   doBusinessLogic1(...);
...

Where formPage1.html is an output pipeline that produces a form and
storeData1 handles and stores the input.
> 
>> For Reusability Between Blocks
>> ------------------------------
>>
>> There have been some discussions about how to reuse functionality
>> between blocks in Cocoon (see the threads [1] and [2] for
>> background). IMO (cf. my post in the thread [1]), a natural way of
>> exporting pipeline functionality is by extending the cocoon pseudo
>> protocol, so that it accepts input as well as produces output. The
>> protocol should also be extended so that input as well as output can
>> be any octet stream, not just xml.
>>
>> If we extend generators so that their input can be set by the
>> environment (as proposed in the discussion about input pipelines), we
>> have what is needed for creating a writable cocoon protocol. The web
>> service example in the section "In Sitemaps" could also be used as an
>> internal service, exported from a block.
>>
>> Both input and output for the extended cocoon protocol can be either
>> xml or non-xml, which gives us 4 cases:
>>
>> xml input, xml output: could be used from a "pipeline"-transformer;
>> the input to the transformer is redirected to the protocol and the
>> output from the protocol is redirected to the output of the
>> transformer.
>>
>> non-xml input, xml output: could be used from a generator.
>>
>> xml input, non-xml output: could be used from a serializer.
>>
>> non-xml input, non-xml output: could be used from a reader if the
>> input is ignored, from a "writer" if the output is ignored and from a
>> "reader-writer" if both are used.
>>
>> Generators that accept xml should of course also accept sax events
>> for efficiency reasons, and serializers that produce xml should for
>> the same reason also be able to produce sax events.
> 
> 
> Also this seems interesting.
> 
> Please add concrete examples here too, possibly applied to blocks.
> I know it's hard, but it would really help.
What I tried to describe is just a somewhat different approach to
describing reusable pipeline fragments between blocks, so for use cases
please see Sylvain's and Stefano's original posts in the threads [1]
and [2].

Let's take a look at an example from Sylvain's post (in [1]) to
illustrate what I have in mind:

    <map:match pattern="a_page">
      <map:generate src="an_xdoc.xml"/>
      <map:transform type="pipeline" src="xdoc2skinnedHtml"/>
      <map:serialize type="html"/>
    </map:match>

    <map:match pattern="xdoc2skinnedHtml">
      <map:generate type="dont_care"/>
      <map:transform type="i18n"/>
      <map:transform type="xdoc2html.xsl"/>
      <map:transform type="htmlskin.xsl"/>
      <map:serialize type="dont_care"/>
    </map:match>

Here the idea is that when xdoc2skinnedHtml is used from a pipeline
transformer, the generator and the serializer are not used and only the
sub-pipeline consisting of the three transformers in the middle is
used. This behaviour is inspired by the cocoon: protocol, where the
serializer is skipped.

Several people found the removal of parts of the pipeline (the
generator and the serializer), depending on the usage context,
confusing. Carsten wrote that:
"It is correct, that internally in most cases the serializer
of a pipeline is ignored, when the cocoon protocol is used.
But this is only because of performance."
and that a pipeline used from the cocoon protocol is supposed to end
with an xml serializer. I agree with this and think that it would be
better to express the example above as (cf. my post in [1]):

    <map:match pattern="a_page">
      <map:generate src="an_xdoc.xml"/>
      <map:transform type="pipeline" src="cocoon:xdoc2skinnedHtml"/>
      <map:serialize type="html"/>
    </map:match>

    <map:match pattern="xdoc2skinnedHtml">
      <map:generate src="inputstream:xdoc.scm"/>
      <map:transform type="i18n"/>
      <map:transform type="xdoc2html.xsl"/>
      <map:transform type="htmlskin.xsl"/>
      <map:serialize type="xml"/>
    </map:match>

Here the cocoon: protocol is supposed to be a writable source. The
function of the pipeline transformer is that it serializes its xml
input, redirects it to the writable source in the src attribute, parses
the xml output stream from the source, and outputs the result from the
parser as sax events. Of course the serialize-parse steps should be
optimized away, but this should be considered an implementation detail,
not part of the semantics.
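
A rough sketch of that serialize-write-parse round trip, using only
standard SAX/TrAX; the WritableSource type below is an assumption
standing in for the proposed writable cocoon: source, not an existing
Cocoon interface:

import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

/** Sketch of the pipeline transformer semantics described above. */
public class PipelineTransformerSketch {

    /** Assumed abstraction of a writable cocoon: source: the request
     *  entity is written to its output stream and the response is read
     *  from its input stream. Not an existing Cocoon interface. */
    public interface WritableSource {
        OutputStream getOutputStream() throws Exception;
        InputStream getInputStream() throws Exception;
    }

    /** Input side: returns a ContentHandler that serializes the
     *  incoming SAX events into the writable source. */
    public static ContentHandler inputSide(WritableSource source)
            throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler();
        serializer.setResult(new StreamResult(source.getOutputStream()));
        return serializer;
    }

    /** Output side: once the input side has finished, parse the xml
     *  produced by the source and forward the SAX events to the next
     *  stage of the surrounding pipeline. */
    public static void outputSide(WritableSource source, ContentHandler next)
            throws Exception {
        XMLReader parser =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        parser.setContentHandler(next);
        parser.parse(new InputSource(source.getInputStream()));
    }
}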

By further generalizing the cocoon: protocol so that it allows non-xml 
output (and input) it can be used for the pipeline serializer that 
Sylvain proposed as well. For the pipeline generator the cocoon: 
protocol can be used as is.

> 
> It seems that what you propose Cocoon already mostly has, but it's more 
> the use-case and some minor additions that have to be put forward.
> 
>> Conclusion
>> ----------
>>
>> The ability to handle structured input (e.g. xml) in a convenient way
>> will probably be an important requirement on webapp frameworks in the
>> near future.
>>
>> By removing the asymmetry between generators and serializers, by letting
>> the input of a generator be set by the context and the output of a
>> serializer be set from the sitemap, Cocoon could IMO be as good in
>> handling input as it is today in producing output.
> 
> 
> Cocoon already does this, no?
> Can't we use the cocoon:// protocol to get the results of a pipeline 
> from another one? What would change?
As said above, the cocoon protocol should be writable as well as
readable and allow for non-xml input and output. The block protocol
could use the same ideas and thus give a good way of exporting
functionality.

To realize the above ideas we would need to implement the inputstream
protocol, which in turn would require that the Request interface be
extended with a getInputStream() method. The cocoon protocol should be
extended as described. The proposed extension of the serializer for use
in input pipelines would require serializers to implement
SitemapModelComponent.
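
To be concrete about the Request extension, it could be as small as the
following (a hypothetical sketch mirroring what the servlet API already
offers; the interface name is made up):

import java.io.IOException;
import java.io.InputStream;

/** Hypothetical extension of the Cocoon Request interface: expose the
 *  raw request entity so that an "inputstream:" source (or a generator
 *  without a src attribute) can read posted data directly. */
public interface StreamableRequest {

    /** The octet stream of the request body, e.g. a posted xml document. */
    InputStream getInputStream() throws IOException;
}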

Thank you for your comments.

/Daniel Fagerstrom

<snip/>

>> References
>> ----------
>>
>> [1] [RT] Using pipeline as sitemap components (long)
>> http://marc.theaimsgroup.com/?t=103787330400001&r=1&w=2
>>
>> [2] [RT] reconsidering pipeline semantics
>> http://marc.theaimsgroup.com/?t=102562575200001&r=2&w=2
>>
>> [3] [Contribution] Pipe-aware selection
>> http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101735848009654&w=2