cocoon-dev mailing list archives

From Sylvain Wallez <sylvain.wal...@anyware-tech.com>
Subject Re: [RT] reconsidering pipeline semantics
Date Tue, 02 Jul 2002 21:17:17 GMT
Stefano Mazzocchi wrote:

>In light of the discussion on blocks, Sylvain pointed out that cocoon
>services should be mapped to pipelines and not to resources directly.
>
>This consideration triggered a few RT that I would like to share with
>you and trigger further discussion.
>
>NOTE: this is nothing related to blocks or flow, but only at the sitemap
>semantics.
>
>                                 - o -
>
>What is a pipeline
>------------------
>
>The first and major architectural contribution that Cocoon brought in
>the web world is the ability to compose web services using the "pipe and
>filters" design pattern. (I'm using 'web services' in the original sense
>of the term: any service that is related to the web)
>
>Cocoon decided to follow an XML-oriented approach to pipelines, forcing
>everything into the XML realm and working on it from there. So
>Cocoon's pipeline concept is something of an extension to the original GoF
>"pipe and filters" pattern: in fact, the Cocoon pipeline implements both
>the 'pipe and filters' and 'adaptor' patterns.
>
>Why? well, this comes from the fact that the HTTP protocol is not XML
>oriented (unlike SOAP, for example). So, in order to perform XML piping,
>we need to adapt in and out from the generic octet-stream world.
>
>So, unlike the UNIX pipeline which doesn't need adaptation (since the
>STDIN/OUT streams are all octet-oriented), Cocoon needed to create ways
>to adapt to the rest of the world which is not XML oriented.
>
>For this reason, while a UNIX pipeline is composed like this
>
> input -> filter -[pipe]-> filter -[pipe]-> filter -> output
>
>a Cocoon pipeline is composed like this
>
> input -> adaptor -[pipe]-> filter -[pipe]-> adaptor -> output
>
>unfortunately, the above picture isn't entirely correct since the two
>adaptors can't be exchanged, thus they are, in fact, different entities:
>the first adapts an octet-based world to an XML-based world, the other
>does the opposite. They are not symmetrical. In Cocoon terminology, the
>first adapter is a generator, the second is a serializer.
>
>We call 'Cocoon pipeline' the collection of all filters (transformers)
>and adapters (generator and serializer) because there cannot be a
>pipeline without adapters.
>
>I think it's time to challenge this concept.
>
>                            - o -
>
>What are sitemap resources?
>---------------------------
>
>Let me tell you: they are a mistake, a mistake I made trying to reduce
>the sitemap verbosity and fix a problem that hadn't yet emerged at
>that time. Early optimization is the root of all evil and I see that
>now: resources overlap with pipelines.
>
>Let me show you why. Consider this sitemap snippet:
>
> <sitemap>
>  <resources>
>   <resource name="blah">
>    <generate ../>
>    <transform ../>
>    <serialize ../>
>   </resource>
>  </resources>
> 
>  <pipelines>
>   <pipeline internal-only="true">
>    <match pattern="*">
>     <call resource="blah"/>
>    </match>
>   </pipeline>
>  </pipelines>
> </sitemap>
>
>and now this
>
> <sitemap>
>  <pipelines>
>   <pipeline name="blah">
>    <generate ../>
>    <transform ../>
>    <serialize ../>
>   </pipeline>
>
>   <pipeline>
>    <match pattern="*">
>     <call pipeline="blah"/>
>    </match>
>   </pipeline>
>  </pipelines>
> </sitemap>
>
>which one is more semantically consistent? Can you say "named XSLT
>templates"?
>
>Composing pipelines
>-------------------
>
>Let me assume the above syntax gets introduced. At this point, we have
>four different ways to call a pipeline:
>
> - as a pipeline
> - as a generator
> - as a transformer
> - as a serializer
>
>let me write the code so you understand what I mean:
>
>[using a pipeline as a pipeline] (as today)
>
>   <pipeline>
>    <match pattern="*">
>     <call pipeline="blah"/>
>    </match>
>   </pipeline>
>
>nothing fancy here. Used mainly for verbosity reduction when the same
>pipeline is used in different places.
>
>[using a pipeline as a generator]
>
>   <pipeline>
>    <match pattern="*">
>     <call pipeline="blah"/>
>     <transform .../>
>     <serialize ../>
>    </match>
>   </pipeline>
>
>in this case, the 'serializer' of the called pipeline is not used and
>the output of the last transformer of the named pipeline is connected
>to the input of the transformer right after the call.
>
>This is equivalent to *overloading* the serializer of the called
>pipeline with the rest of the pipeline in place.
>
>[using a pipeline as a transformer]
>
>   <pipeline>
>    <match pattern="*">
>     <generate ../>
>     <call pipeline="blah"/>
>     <serialize ../>
>    </match>
>   </pipeline>
>
>where both the generator and the serializer of the named pipeline are
>not used.
>
>This is equivalent to *overloading* both the generator and the
>serializer of the called pipeline with the rest of the pipeline in
>place.
>
>[using a pipeline as a serializer]
>
>   <pipeline>
>    <match pattern="*">
>     <generate ../>
>     <transform ../>
>     <call pipeline="blah"/>
>    </match>
>   </pipeline>
>
>where the generator of the named pipeline is not used.
>
>This is equivalent to *overloading* the generator of the called pipeline
>with the rest of the pipeline in place.
>
>                               - o -
>
>So, here is what I propose:
>
> - add the 'pipeline' attribute to 'map:call'
> - add the 'name' attribute to 'map:pipeline'
> - deprecate the 'map:resources' element
> - deprecate 'internal-only' attribute of 'map:pipeline' 
>   [because named pipelines become implicitly internal-only]
> - allow 'map:call' to be executed in any place, performing the pipeline
>   overloading behavior I explained above.
>
>What do you think?  
>  
>

Although it is OK to call named pipelines _inside_ a sitemap (that's
just a name change for resources), I don't like it for _inter-sitemap_
calls, as can or will be the case for subsitemaps and blocks: up to
now, the input contract of the sitemap has been the environment, and the
choice of pipeline is most often driven by the request URI. Does calling
named pipelines mean you want to add a new property to the environment,
like the view and action we have today?

IMO, the called pipeline should be defined by a URI, just as we
already do with the "cocoon:" pseudo-protocol. This wouldn't introduce
yet another naming scheme and would keep the existing sitemap contract.

Of course, we must keep today's resources as "named pipeline snippets" 
inside a single sitemap. To answer Peter's request, we can allow a 
resource to be not terminated (i.e. not contain a serializer). I even 
think the treeprocessor already handles this (needs to be verified, though).
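
As a rough sketch of what such an unterminated resource could look like
(the resource name, match patterns and file names are made up for
illustration, and whether the treeprocessor really accepts a resource
without a serializer is exactly the point that still needs verifying):

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <!-- component declarations (map:components) omitted -->

  <map:resources>
    <!-- hypothetical resource: generator + transformer, no serializer -->
    <map:resource name="page-content">
      <map:generate src="docs/page.xml"/>
      <map:transform src="stylesheets/page.xsl"/>
    </map:resource>
  </map:resources>

  <map:pipelines>
    <map:pipeline>
      <!-- each caller terminates the pipeline with its own serializer -->
      <map:match pattern="page.html">
        <map:call resource="page-content"/>
        <map:serialize type="html"/>
      </map:match>

      <map:match pattern="page.xml">
        <map:call resource="page-content"/>
        <map:serialize type="xml"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

Here the serializer is always written next to the call, so nothing has
to be overloaded implicitly.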

                              --o0o--

The second point I'm not comfortable with is implicit overloading. I 
have the feeling the associated behaviour will be difficult to predict 
and will make the sitemap hard to read by requiring lots of "look-ahead".

Consider serializer overloading. The current sitemap definition says
that a pipeline is terminated when a <map:serialize> or <map:read> is
encountered. With implicit overloading semantics, this rule is no
longer valid, as the calling pipeline _may_ or _may not_ define another
serializer. And since any <map:serialize> that's present _below_ the
<map:call> can theoretically terminate the pipeline, knowing whether
the called pipeline's serializer is overloaded requires traversing the
entire remaining part of the sitemap, even if all remaining serializers
are enclosed in <map:match> elements that will never match.
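
To illustrate the look-ahead problem, here is a sketch written in the
_proposed_ syntax (the 'name' attribute on map:pipeline and the
'pipeline' attribute on map:call do not exist today; patterns and file
names are invented for the example):

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <!-- component declarations (map:components) omitted -->

  <map:pipelines>
    <!-- named pipeline, proposed syntax only -->
    <map:pipeline name="blah">
      <map:generate src="docs/page.xml"/>
      <map:transform src="stylesheets/page.xsl"/>
      <map:serialize type="html"/>
    </map:pipeline>

    <map:pipeline>
      <map:match pattern="*.html">
        <map:call pipeline="blah"/>
        <!-- Is the html serializer of "blah" used or overloaded?
             A serializer follows the call, but it sits in a match
             that cannot succeed once "*.html" has matched. -->
        <map:match pattern="*.pdf">
          <map:serialize type="xml"/>
        </map:match>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

A reader (or the sitemap engine) has to inspect everything after the
<map:call> to decide which serializer finally applies.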

So I'm in favor of more explicit semantics that clearly define what
the caller wants to use in the called pipeline. For this, we can use the
existing <map:generate>, <map:transform> and <map:serialize>:

* use the full pipeline (generator & transformers & serializer). We
already have it today:
  <map:redirect-to uri="cocoon:/pipeline_uri"/>

* use the generator & transformers (ignore the serializer). We already
have it today:
  <map:generate type="file" src="cocoon:/pipeline_uri"/>

* use the transformation part (ignore the generator & serializer):
  <map:transform type="pipeline" src="pipeline_uri"/>

* use the transformers & serializer (ignore the generator):
  <map:serialize type="pipeline" src="pipeline_uri"/>
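
To make the two forms that already exist today a bit more concrete,
here is a minimal sketch of how they could sit in a sitemap. The match
patterns, file names and stylesheets are hypothetical; only the
"cocoon:" URIs and the elements themselves come from the list above:

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <!-- component declarations (map:components) omitted -->

  <map:pipelines>
    <map:pipeline>
      <!-- hypothetical pipeline, reachable as cocoon:/content/... -->
      <map:match pattern="content/*">
        <map:generate src="docs/{1}.xml"/>
        <map:transform src="stylesheets/content.xsl"/>
        <map:serialize type="xml"/>
      </map:match>

      <!-- full pipeline: hand the request over to content/* as is -->
      <map:match pattern="raw/*">
        <map:redirect-to uri="cocoon:/content/{1}"/>
      </map:match>

      <!-- generator + transformers: use content/* as the source, then
           continue with a local transformer and serializer -->
      <map:match pattern="*.html">
        <map:generate type="file" src="cocoon:/content/{1}"/>
        <map:transform src="stylesheets/content2html.xsl"/>
        <map:serialize type="html"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

In both cases the called pipeline is selected by a URI, and what the
caller takes from it is visible right at the point of the call.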

The first two notations, although currently in use, could be changed to
something more consistent with the last two, which are new:
  <map:redirect-to pipeline="pipeline_uri"/>
  <map:generate type="pipeline" src="pipeline_uri"/>

Thoughts?

Sylvain

-- 
Sylvain Wallez
 Anyware Technologies                  Apache Cocoon
 http://www.anyware-tech.com           mailto:sylvain@apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org

