cocoon-dev mailing list archives

From Sylvain Wallez
Subject [RT] Using pipeline as sitemap components (long)
Date Thu, 21 Nov 2002 10:05:30 GMT
The discussions around Stefano's "Cocoon blocks version 1.1" showed the 
need for pipelines to provide not only resources, but also services, 
identified by their URI.

This document defines this concept of "pipeline service", which, as we 
will see, consists in using pipelines as sitemap components (generator, 
transformer and serializer). It is separated from the blocks design 
document since pipeline services can be used without blocks, even if 
they will be mostly useful in that context.

What is a pipeline ?

The concept of pipeline, a central part of the Cocoon architecture, is a 
chain of components handling XML documents as SAX events. By "handling", 
we mean 3 different things :

- generate : at the start of the chain, produce an initial document and 
feed the next component in the chain with the result.

- transform : take the content produced by preceding components in the 
chain (either a generator or another transformer), transform it and feed 
the next component in the chain with the result.

- serialize : take the content produced by the preceding component in 
the chain (either a generator or a transformer), and convert this XML 
stream to a binary stream.

These 3 concepts are represented using only 2 interfaces, XMLProducer 
and XMLConsumer :
- a generator is an XMLProducer,
- a transformer is an XMLConsumer _and_ an XMLProducer,
- a serializer is an XMLConsumer.
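
To make the mapping of three roles onto two interfaces concrete, here is a minimal sketch. It is written in Python for brevity and is not Cocoon's Java API: only the names XMLProducer and XMLConsumer come from the text above. The method names (set_consumer, start_element, characters, end_element) and the three sample components are assumptions, loosely modelled on SAX.

```python
from abc import ABC, abstractmethod


class XMLConsumer(ABC):
    """Receives SAX-like events (method names are assumptions)."""

    @abstractmethod
    def start_element(self, name): ...

    @abstractmethod
    def characters(self, text): ...

    @abstractmethod
    def end_element(self, name): ...


class XMLProducer(ABC):
    """Emits events to whichever consumer is plugged in after it."""

    def set_consumer(self, consumer):
        self.consumer = consumer


class HelloGenerator(XMLProducer):
    """Generator: a producer only (hypothetical example component)."""

    def generate(self):
        self.consumer.start_element("greeting")
        self.consumer.characters("hello")
        self.consumer.end_element("greeting")


class UpperCaseTransformer(XMLConsumer, XMLProducer):
    """Transformer: a consumer and a producer at once."""

    def start_element(self, name):
        self.consumer.start_element(name)

    def characters(self, text):
        self.consumer.characters(text.upper())

    def end_element(self, name):
        self.consumer.end_element(name)


class TextSerializer(XMLConsumer):
    """Serializer: a consumer only; turns events into bytes."""

    def __init__(self):
        self.chunks = []

    def start_element(self, name):
        self.chunks.append("<%s>" % name)

    def characters(self, text):
        self.chunks.append(text)

    def end_element(self, name):
        self.chunks.append("</%s>" % name)

    def output(self):
        return "".join(self.chunks).encode("utf-8")


def run_hello_pipeline():
    """Chain generator -> transformer -> serializer and run it."""
    generator = HelloGenerator()
    transformer = UpperCaseTransformer()
    serializer = TextSerializer()
    generator.set_consumer(transformer)
    transformer.set_consumer(serializer)
    generator.generate()
    return serializer.output()
```

Calling run_hello_pipeline() chains the three components exactly as a sitemap pipeline would: the generator feeds the transformer, which feeds the serializer.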

The "cocoon:" protocol

Up to now, we've considered pipelines as a "final" concept. This means 
that a pipeline has to be considered as a whole : it handles a request 
and answers with the result of its execution.

Well, in fact, we "nearly" considered it as final. Consider the 
"cocoon:" protocol that is so useful. What happens if we write the 
following :

  <map:match pattern="first-uri">
    <map:generate type="file" src="cocoon://other-uri"/>
    <map:transform src="foo.xslt"/>
    ...
  </map:match>

We're simply using another pipeline as the starting point of the current 
one. We have used a pipeline as the generator of another one.

Most often, the "other-uri" builds a pipeline that is terminated by a 
<map:serialize type="xml"/> because we want it to produce xml for the 
calling generator. But this serializer is a fake : you can put any 
serializer you like, it doesn't matter. What happens under the hood is 
that the SAX events produced by the component immediately preceding the 
serializer are used as the output of the generator in the calling pipeline.

So in the above example, when requesting "first-uri", we actually chain 
the generators and transformers of "other-uri" to the transformers and 
serializer of "first-uri".

Pipelines as generators

This leads to a first conclusion : using a pipeline as a generator means 
using the SAX events produced by the last XMLProducer of that pipeline, 
i.e. the last transformer or the generator if there are no transformers.
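
This first conclusion can be sketched as follows. The model is deliberately simplistic and entirely an assumption: a pipeline is a dict holding a generator, a list of transformers and a serializer, and SAX events are plain strings. The names produce_events, pipeline_generator, run, other_uri and first_uri are hypothetical and exist only for this illustration.

```python
def produce_events(pipeline):
    """Run a pipeline up to its last XMLProducer, ignoring the serializer."""
    events = pipeline["generator"]()
    for transform in pipeline["transformers"]:
        events = transform(events)
    return events


def pipeline_generator(called_pipeline):
    """Build a generator that replays the events of the called pipeline."""
    return lambda: produce_events(called_pipeline)


def run(pipeline):
    """Run a complete pipeline, serializer included."""
    return pipeline["serializer"](produce_events(pipeline))


# Hypothetical stand-ins for the "other-uri" and "first-uri" pipelines.
other_uri = {
    "generator": lambda: ["a", "b"],
    "transformers": [lambda events: [e.upper() for e in events]],
    # Any serializer can be declared here: the caller never invokes it.
    "serializer": lambda events: b"never used by the caller",
}
first_uri = {
    "generator": pipeline_generator(other_uri),
    "transformers": [lambda events: events + ["c"]],
    "serializer": lambda events: ",".join(events).encode("utf-8"),
}
```

Running first_uri chains the generator and transformer of other_uri to the transformer and serializer of first_uri; the serializer declared in other_uri is never called.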

Since we've used a pipeline as a generator, let's introduce a new 
generator for this purpose, instead of using the "file" one, which fools 
us into thinking we use a full pipeline when it actually strips out the 
serializer :

  <map:match pattern="first-uri">
    <map:generate type="pipeline" src="/other-uri"/>
    <map:transform src="foo.xslt"/>
    ...
  </map:match>

I don't see a need for a new sitemap element such as "map:call-pipeline" 
or "map:generate-from-pipeline". What we want is to generate the initial 
content of the current pipeline, and for this we just use a particular 
implementation of a generator, as we already do for files, XSP, etc.

Pipelines as serializers

We've seen how to use a pipeline as the generator of another one, let's 
consider now the other end of the chain : using a pipeline as a serializer.

Let's suppose we have defined a pipeline that gets an XML document in the 
xdoc DTD and formats it to PDF. This can be for example :

  <map:match pattern="doc2pdf">
    <map:generate src="an_xdoc.xml"/>
    <map:transform src="doc2fo.xslt"/>
    <map:serialize type="fo2pdf"/>
  </map:match>

The interesting part here isn't the initial document, but the chaining 
of a stylesheet that produces an XSL-FO version of its input and the FOP 
serializer. This is the typical example of what is called a "service" in 
the current block specification.

Now how do we reuse this in other pipelines ? Yes, we can define a 
<map:resource>. But this resource will be available only in the current 
sitemap, and not in other sitemaps nor blocks.

What does "reusing" this actually mean ? It means producing an xdoc 
document and _serializing_ it to PDF. We don't actually care if there is 
a serializer to PDF that directly accepts xdocs or if there are one or 
more transformations before serializing.

This leads to a second conclusion : using a pipeline as a serializer 
means sending the SAX events of the calling pipeline to the first 
XMLConsumer of the called pipeline.
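
As a rough sketch of this second conclusion, under the same simplifying assumptions (a pipeline is a dict holding a generator, a list of transformers and a serializer; events are plain strings): the "pipeline" serializer skips the generator of the called pipeline and enters at its first XMLConsumer. All function and variable names below are hypothetical, and the lambdas are mere stand-ins for doc2fo.xslt and the fo2pdf serializer.

```python
def pipeline_serializer(called_pipeline):
    """Build a serializer from a pipeline, skipping its generator."""
    def serialize(events):
        # Enter at the first XMLConsumer: the first transformer if there
        # is one, otherwise the serializer itself.
        for transform in called_pipeline["transformers"]:
            events = transform(events)
        return called_pipeline["serializer"](events)
    return serialize


def run(pipeline):
    """Run a complete pipeline, serializer included."""
    events = pipeline["generator"]()
    for transform in pipeline["transformers"]:
        events = transform(events)
    return pipeline["serializer"](events)


# Hypothetical stand-in for the "doc2pdf" pipeline.
doc2pdf = {
    "generator": lambda: ["an_xdoc"],  # ignored when used as a serializer
    "transformers": [lambda events: ["fo:" + e for e in events]],
    "serializer": lambda events: ("PDF[" + " ".join(events) + "]").encode("utf-8"),
}
# Calling pipeline: its own generator, then the "pipeline" serializer.
caller = {
    "generator": lambda: ["title", "para"],
    "transformers": [],
    "serializer": pipeline_serializer(doc2pdf),
}
```

The calling pipeline never sees whether the PDF is produced directly or after one or more transformations, which is the point made above.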

How do we use this ? Well, just as for the generator, let's define a new 
"pipeline" serializer :

  <map:generate src="another_xdoc.xml"/>
  <map:serialize type="pipeline" src="doc2pdf"/>

Note : the "src" attribute doesn't currently exist on <map:serialize>, 
but it seems the most natural and consistent way to name the called 
pipeline. Whether this translates to implementing SitemapModelComponent 
or not is another story.

Pipelines as transformers

And here comes the last use of a pipeline : as a transformer. Let's 
consider the following :

  <map:match pattern="a_page">
    <map:generate src="an_xdoc.xml"/>
    <map:transform type="i18n"/>
    <map:transform src="xdoc2html.xsl"/>
    <map:transform src="htmlskin.xsl"/>
    <map:serialize type="html"/>
  </map:match>

The 3 transformers define a transformation service that takes an xdoc as 
input and produces some skinned html. To achieve reusability, we would 
like to have an "xdoc2skinnedHtml" transformer. We can write this as 
follows :

   <map:match pattern="a_page">
     <map:generate src="an_xdoc.xml"/>
     <map:transform type="pipeline" src="xdoc2skinnedHtml"/>
     <map:serialize type="html"/>
   </map:match>

   <map:match pattern="xdoc2skinnedHtml">
     <map:generate type="dont_care"/>
     <map:transform type="i18n"/>
     <map:transform src="xdoc2html.xsl"/>
     <map:transform src="htmlskin.xsl"/>
     <map:serialize type="dont_care"/>
   </map:match>

This leads to a third conclusion : using a pipeline as a transformer 
means feeding the SAX events of the calling pipeline to the first 
transformer of the called pipeline, and sending the output of the last 
transformer of the called pipeline to the next XMLConsumer of the 
calling pipeline.

Note : if there are no transformers in the called pipeline (i.e. it's 
only a generator and a serializer), the "pipeline" transformer does 
nothing and only copies its input to its output.
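
A minimal sketch of this third conclusion, including the identity case from the note, again assuming that a pipeline is a dict holding a generator, a list of transformers and a serializer, and that events are plain strings. The names are hypothetical and the lambdas only stand in for the i18n, xdoc2html.xsl and htmlskin.xsl steps.

```python
def pipeline_transformer(called_pipeline):
    """Build a transformer that uses only the transformers of a pipeline."""
    def transform(events):
        # The generator and serializer of the called pipeline are ignored.
        for step in called_pipeline["transformers"]:
            events = step(events)
        return events
    return transform


# Hypothetical stand-in for the "xdoc2skinnedHtml" pipeline.
xdoc2skinned_html = {
    "generator": lambda: ["dont_care"],
    "transformers": [
        lambda events: [e + "+i18n" for e in events],
        lambda events: [e + "+html" for e in events],
        lambda events: [e + "+skin" for e in events],
    ],
    "serializer": lambda events: b"dont_care",
}

# A called pipeline made of only a generator and a serializer.
no_transformers = {
    "generator": lambda: ["dont_care"],
    "transformers": [],
    "serializer": lambda events: b"dont_care",
}
```

With no_transformers, the resulting transformer returns its input unchanged, which matches the note: it only copies its input to its output.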

Relation to blocks

Up to now, we made no mention of blocks. The "src" attribute of the new 
"pipeline" sitemap components is a URI that is interpreted like the part 
that follows the first "/" in the "cocoon:" protocol :
- "/pipeline-uri" is resolved by calling the root sitemap,
- "pipeline-uri" is resolved by calling the current sitemap.

We can now introduce blocks :
- "block:foo:pipeline-uri" is resolved by calling the "foo" block.
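
The three resolution rules can be summarized by the sketch below. The function name resolve_pipeline_uri and the (target, uri) return shape are assumptions made for illustration; the text above only states which sitemap or block handles each form of URI.

```python
def resolve_pipeline_uri(src, current_sitemap="current"):
    """Return (target, pipeline_uri) for the src of a "pipeline" component."""
    if src.startswith("block:"):
        # "block:foo:pipeline-uri" is handled by the "foo" block.
        _, block_name, uri = src.split(":", 2)
        return ("block:" + block_name, uri)
    if src.startswith("/"):
        # "/pipeline-uri" is handled by the root sitemap.
        return ("root", src[1:])
    # "pipeline-uri" is handled by the current sitemap.
    return (current_sitemap, src)
```

A malformed block URI such as "block:foo" raises a ValueError in this sketch; how such a case should really be reported is left open here.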

So if we consider the transformer example above, and move the 
"xdoc2skinnedHtml" pipeline to a "skin" block, our sitemap becomes :

  <map:match pattern="a_page">
    <map:generate src="an_xdoc.xml"/>
    <map:transform type="pipeline" src="block:skin:xdoc2skinnedHtml"/>
    <map:serialize type="html"/>
  </map:match>

Questions and answers

Q: What about caching when we call a pipeline ?

A: This should integrate smoothly : the cache key and validity of the 
"pipeline" generator, transformer and serializer are the composition of 
cache keys and validities of the used components of the called pipeline.
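
One possible reading of this composition is sketched below. How keys and validities are actually combined is not specified above, so the choices here are assumptions: the composite key is the tuple of the keys of the used components, and the composite is valid only if every used component is valid. Each component is modelled as a dict with hypothetical "key" and "valid" entries.

```python
def used_components(pipeline, role):
    """Components of the called pipeline that a "pipeline" component uses."""
    generator = [pipeline["generator"]]
    transformers = list(pipeline["transformers"])
    serializer = [pipeline["serializer"]]
    if role == "generator":
        return generator + transformers
    if role == "transformer":
        return transformers
    if role == "serializer":
        return transformers + serializer
    raise ValueError("unknown role: %r" % role)


def composite_key(pipeline, role):
    """Compose the cache keys of the used components, in pipeline order."""
    return tuple(c["key"] for c in used_components(pipeline, role))


def composite_is_valid(pipeline, role):
    """The composite is valid only if every used component is valid."""
    return all(c["valid"] for c in used_components(pipeline, role))


# Hypothetical cache data for the "xdoc2skinnedHtml" pipeline.
xdoc2skinned_html = {
    "generator": {"key": "dont_care", "valid": False},
    "transformers": [
        {"key": "i18n", "valid": True},
        {"key": "xdoc2html.xsl", "valid": True},
        {"key": "htmlskin.xsl", "valid": True},
    ],
    "serializer": {"key": "dont_care", "valid": True},
}
```

Used as a transformer, this pipeline stays cacheable even though its generator is marked invalid, because the generator is not among the used components.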


Q: Doesn't this deprecate the use of the "cocoon:" protocol ?

A: No. The only notation that may be deprecated is <map:generate 
type="file" src="cocoon://xxx"/> that can now be written <map:generate 
type="pipeline" src="/xxx"/>. Other uses of the "cocoon" protocol keep 
their usefulness.


Q: I want to define a pipeline that will be used only as a 
transformation service. Why must I write a <map:generate> and a 
<map:serialize> in its definition ?

A: Because the sitemap, as a pipeline building language, must be able to 
determine the start of a pipeline and its end, even if not all its 
components are used. Like opening and closing braces in Java, the 
generator begins the pipeline definition and the serializer ends it.

Ok. Thanks for reading so far. What are your thoughts about this ? If we 
agree on it, I'll update the Cocoon blocks document so that block 
services are shown as "pipeline" sitemap components.


Sylvain Wallez                                  Anyware Technologies 
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
