cocoon-dev mailing list archives

From Sylvain Wallez
Subject [RT] Implementation of VPCs and "multi-relative" source resolving (long)
Date Tue, 16 Nov 2004 22:52:40 GMT
Hi team,

Recently Vadim started to scratch the VPC itch (for those who wonder, 
VPC = "virtual sitemap component"), and he pinged me a few times on ICQ 
to discuss their implementation. And I'm happy to say that we found what 
I consider a simple and elegant solution to many problems, even ones we 
have today.

Enough teasing, let's explain it all now. Warning: you may get lost if 
you don't know how the pipeline and sitemap engine work :-)

Implementing VPCs means that we have to wrap a ProcessingNode (the 
execution unit of the sitemap interpreter) in an implementation of 
Generator, Transformer, Serializer or Reader. The problem is that these 
objects have very different lifecycles:
- a VPC's ProcessingNode is attached to a Processor instance (a sitemap) 
and is known at sitemap build time.
- a sitemap component (G, T, S or R) is poolable and therefore created 
on demand when used.

This "on demand" creation means it can occur e.g. in a subsitemap, i.e. 
in an environment context (used to resolve relative URIs) that is 
different from the one where it was actually declared. We see some 
effects of this with the I18nTransformer, which sometimes fails to load 
message catalogues if its first use is in a child sitemap of the one 
where it was declared. We'll come back to this later.

                           --- oOo ---

So what object or data does a component have access to that is 
directly related to the actual sitemap where it was declared? Such an 
object is the ServiceManager (SM), as each sitemap defines a new SM to 
hold its components. However, extending the SM interface to provide 
access to additional data is a bad idea as it ties components to a 
particular extension of the Avalon framework contract. Now if we look at 
the Avalon lifecycle, we see the Contextualizable interface, where the 
SM passes to the components it manages the Avalon Context it itself was 
contextualized with.

Currently, there's only one Context object throughout the whole Cocoon 
system, created by the top-level environment (servlet or CLI) and 
holding either webapp-wide data (e.g. the work directory, the servlet 
context, etc) or request-specific data (object model, etc). We can 
change this so that each sitemap defines its own Avalon Context that 
will be passed to every Contextualizable component managed by its local SM.

That per-sitemap Context can then hold any information known at 
sitemap-build time that could be needed by VPCs, whatever environment 
they are created in (subsitemap, remote blocks, etc).
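
To make the idea concrete, here is a minimal sketch of a context that 
inherits entries from its parent. SitemapContext and the key names are 
hypothetical stand-ins invented for illustration; this is not the Avalon 
Context API, only the lookup behaviour a per-sitemap context would need.

```java
import java.util.HashMap;
import java.util.Map;

class SitemapContextDemo {
    public static void main(String[] args) {
        // Top-level context: webapp-wide data plus the root sitemap's base URI.
        SitemapContext root = new SitemapContext(null);
        root.put("work-directory", "/tmp/work");
        root.put("sitemap-base-uri", "file:/webapp/");

        // Child sitemap context: overrides only what is specific to it.
        SitemapContext child = new SitemapContext(root);
        child.put("sitemap-base-uri", "file:/webapp/sub/");

        System.out.println(child.get("work-directory"));   // inherited from root
        System.out.println(child.get("sitemap-base-uri")); // overridden locally
        System.out.println(root.get("sitemap-base-uri"));  // root is unchanged
    }
}

/** Hypothetical per-sitemap context (assumption: not the Avalon API). */
class SitemapContext {
    private final SitemapContext parent; // null for the top-level context
    private final Map<String, Object> entries = new HashMap<>();

    SitemapContext(SitemapContext parent) {
        this.parent = parent;
    }

    void put(String key, Object value) {
        entries.put(key, value);
    }

    /** Looks up the key locally first, then walks up the parent chain. */
    Object get(String key) {
        if (entries.containsKey(key)) {
            return entries.get(key);
        }
        if (parent != null) {
            return parent.get(key);
        }
        throw new IllegalArgumentException("No context entry for key: " + key);
    }
}
```

A component contextualized by the child sitemap's SM would then see the 
child's base URI, whatever environment it is later instantiated in.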

So the TreeBuilder can put in the context a Map for each kind of sitemap 
component that associates VPC names to the corresponding ProcessingNode. 
The VPC sitemap component implementation can then, _whatever the 
environment it is created in_, get its associated processing node and 
invoke it to build a partial pipeline that will behave like a "regular" 

"Partial" pipeline means that we will need some special implementations 
of the Pipeline interface that accept incomplete pipelines. Vadim 
already started working on this and for example, the pipeline for a 
virtual serializer won't accept a generator, but will accept zero or 
more transformers and will require a serializer.
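
The rules for a virtual serializer's pipeline could be sketched as 
follows. This is only an illustration of the constraints described 
above, not Vadim's implementation: VirtualSerializerPipeline and its 
methods are hypothetical, and components are represented by plain type 
names rather than real Cocoon components.

```java
import java.util.ArrayList;
import java.util.List;

class VirtualSerializerPipelineDemo {
    public static void main(String[] args) {
        VirtualSerializerPipeline pipeline = new VirtualSerializerPipeline();
        pipeline.addTransformer("xslt");
        pipeline.setSerializer("html");
        System.out.println(pipeline.describe());

        try {
            pipeline.setGenerator("file");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

/** Hypothetical partial pipeline for a virtual serializer (assumption). */
class VirtualSerializerPipeline {
    private final List<String> transformers = new ArrayList<>();
    private String serializer;

    /** A virtual serializer starts mid-stream, so a generator is refused. */
    void setGenerator(String type) {
        throw new IllegalStateException(
                "a virtual serializer cannot contain a generator: " + type);
    }

    /** Zero or more transformers are accepted, but only before the serializer. */
    void addTransformer(String type) {
        if (serializer != null) {
            throw new IllegalStateException("cannot add a transformer after the serializer");
        }
        transformers.add(type);
    }

    void setSerializer(String type) {
        if (serializer != null) {
            throw new IllegalStateException("serializer already set");
        }
        serializer = type;
    }

    /** The pipeline is usable only once a serializer has been set. */
    void checkComplete() {
        if (serializer == null) {
            throw new IllegalStateException("a virtual serializer requires a serializer");
        }
    }

    String describe() {
        checkComplete();
        List<String> steps = new ArrayList<>(transformers);
        steps.add(serializer);
        return String.join(" -> ", steps);
    }
}
```

Virtual generators and transformers would get the mirror-image rules 
(generator required and no serializer, or transformers only).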

Now that we know how to implement VPCs as regular components, on to 
source resolving...

                          --- oOo ---

The problem with source resolving is that the base URI used to resolve 
relative URIs changes when we enter a subsitemap: relative sources are 
relative to the directory containing the "current" sitemap.

That means that the base URI used to resolve e.g. the "src" attribute of 
a <map:generate> is the one of the sitemap containing that statement, 
and not the sitemap where the component was declared, which can be a 
parent sitemap of the current one.

This isn't a problem with URIs part of a statement ("src" attribute and 
<map:parameter>) but is a real problem for URIs part of the component 
configuration. That's what happens with the I18nTransformer as catalogue 
locations are URIs defined in the component declaration, thus relative 
to the sitemap where the component is _declared_. Unfortunately, they 
are resolved relatively to where the component is first _instantiated_, 
which can occur randomly in any of the current sitemap and its child 
sitemaps, depending on how pools are managed. The practical result is 
that we cannot reliably declare an i18n transformer for use by a tree of 

Now that we have a per-sitemap Avalon Context, we can also store in that 
context the base URI of the sitemap declaring the component. The i18n 
transformer just has to use that base URI to access the catalogues 
defined in its configuration.

That's what I called "multi-relative" source resolving in the subject of 
this post: URIs coming from a component's configuration will have to be 
resolved relatively to the base URI contained in the Avalon context, 
whereas URIs coming from sitemap statements are resolved using the 
base URI of the sitemap that is currently executing.
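
As a small illustration of the two resolution rules, here is a sketch 
using plain java.net.URI. MultiRelativeResolver and the example paths 
are made up for this message; the real thing would go through Cocoon's 
source resolver rather than java.net.URI.

```java
import java.net.URI;

class MultiRelativeDemo {
    public static void main(String[] args) {
        // The transformer is declared in /webapp/sitemap.xmap but used from
        // the child sitemap in /webapp/sub/ (both paths are hypothetical).
        MultiRelativeResolver resolver = new MultiRelativeResolver(
                URI.create("file:/webapp/"),
                URI.create("file:/webapp/sub/"));

        // Catalogue location from the component configuration.
        System.out.println(resolver.resolveConfigUri("i18n/messages.xml"));
        // "src" attribute of a statement in the executing sitemap.
        System.out.println(resolver.resolveStatementUri("content/page.xml"));
    }
}

/** Hypothetical helper holding the two base URIs (assumption). */
class MultiRelativeResolver {
    private final URI declaringSitemapBase; // would come from the per-sitemap context
    private final URI currentSitemapBase;   // would come from the executing environment

    MultiRelativeResolver(URI declaringSitemapBase, URI currentSitemapBase) {
        this.declaringSitemapBase = declaringSitemapBase;
        this.currentSitemapBase = currentSitemapBase;
    }

    /** URIs from the component configuration: relative to the declaring sitemap. */
    URI resolveConfigUri(String uri) {
        return declaringSitemapBase.resolve(uri);
    }

    /** URIs from sitemap statements: relative to the sitemap currently executing. */
    URI resolveStatementUri(String uri) {
        return currentSitemapBase.resolve(uri);
    }
}
```

With these inputs the catalogue resolves to file:/webapp/i18n/messages.xml 
no matter which sitemap first instantiates the component, while the 
statement URI resolves to file:/webapp/sub/content/page.xml.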

Still following? Now let's see source resolving in VPCs...

                          --- oOo ---

With VPCs, the problem is worse than with regular components, as VPCs 
are components defined by sitemap snippets with their "src" and 
<map:parameter>. So what does "relative" mean in this context? Is it 
relative to the calling sitemap or relative to the sitemap that defines 
the VPC? The answer is "it depends"!

It depends on whether the URI is passed from the calling environment 
(it's then relative to the calling sitemap) or is some local data used 
by the VPC implementation such as an XSLT (it's then relative to the 
sitemap defining the VPC).

So how do we distinguish them? A solution was proposed [1] where we 
added some typing information to the sitemap statements calling the VPC, 
so that URIs could be absolutized before the actual call.

That is actually wrong, as it forces the user of a component to 
explicitly indicate that some particular action should be taken on a 
parameter, whereas this information is related to the implementation of 
the component. Furthermore, forgetting to specify that absolutization 
has to be performed can lead to weird behaviours difficult to debug.

So, it's the VPC's responsibility to make explicit in its definition 
what values coming from the caller have to be absolutized relatively to 
the calling sitemap.

For this, I propose that VPC definitions have additional statements 
defining what parameters have to be absolutized, e.g.:

<map:generator name="foo">
  <map:absolutize param="src"/>
  <map:absolutize param="bar"/>

  <map:generate type="file" src="{src}">
    <map:parameter name="baz" value="{bar}"/>
  </map:generate>
  <map:transform src="data/{skin}.xslt"/>
</map:generator>

The input parameters "src" (actually the "src" attribute in the calling 
statement) and "bar" are first absolutized relatively to the calling 
sitemap, and then the base URI of the sitemap defining the VPC becomes 
the new relative context, used e.g. to resolve "data/{skin}.xslt".

That way, we can also implement multi-relative source resolving in 
sitemap statements.
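
In code, the two-step behaviour might look like the sketch below. 
VpcParameterAbsolutizer, the parameter values and the paths are all 
hypothetical, and java.net.URI stands in for the real source resolver.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VpcAbsolutizeDemo {
    public static void main(String[] args) {
        URI callerBase = URI.create("file:/webapp/sub/"); // calling sitemap
        URI definingBase = URI.create("file:/webapp/");   // sitemap defining the VPC

        Map<String, String> callerParams = new HashMap<>();
        callerParams.put("src", "content/page.xml");
        callerParams.put("bar", "extra.xml");
        callerParams.put("skin", "blue");

        // Step 1: absolutize the declared parameters against the caller.
        Map<String, String> params = VpcParameterAbsolutizer.absolutize(
                Arrays.asList("src", "bar"), callerBase, callerParams);
        System.out.println(params.get("src"));
        System.out.println(params.get("bar"));
        System.out.println(params.get("skin"));

        // Step 2: everything else is relative to the defining sitemap.
        System.out.println(definingBase.resolve("data/" + params.get("skin") + ".xslt"));
    }
}

/** Hypothetical helper applying the map:absolutize declarations (assumption). */
class VpcParameterAbsolutizer {
    /**
     * Returns a copy of the caller's parameters in which every parameter named
     * in toAbsolutize is resolved against the calling sitemap's base URI.
     * Parameters that are not listed, or not supplied, are left untouched.
     */
    static Map<String, String> absolutize(List<String> toAbsolutize,
                                          URI callerBase,
                                          Map<String, String> callerParams) {
        Map<String, String> result = new HashMap<>(callerParams);
        for (String name : toAbsolutize) {
            String value = callerParams.get(name);
            if (value != null) {
                result.put(name, callerBase.resolve(value).toString());
            }
        }
        return result;
    }
}
```

Here "src" becomes file:/webapp/sub/content/page.xml and "bar" becomes 
file:/webapp/sub/extra.xml, "skin" is passed through as is, and the 
stylesheet resolves to file:/webapp/data/blue.xslt.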

We may actually want to go a bit further by allowing any computation to 
provide input parameters using input modules, e.g.

<map:generator name="foo">
  <map:parameter name="src" value="{absolutize:{src}}"/>
  ...
</map:generator>

But the source-resolving problem is not finished...

                          --- oOo ---

The last source-resolving problem is related to URIs that may be present 
in the SAX stream, e.g. XInclude URIs. What are they relative to?

My feeling here is that we need to distinguish for a single VPC the base 
URI used to resolve URIs within the setup phase (i.e. "src" and 
<map:parameter>) and the base URI used to resolve URIs during the 
processing phase.

That could be achieved using an additional attribute on the component 
declaration, i.e. in the above example something like

<map:generator name="foo" stream-uris-base="local|caller">
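
Reading "local|caller" as "one of these two values", the choice of base 
URI for in-stream URIs could be sketched like this. StreamUrisBase is a 
hypothetical name, and the attribute itself is only a proposal.

```java
import java.net.URI;
import java.util.Locale;

class StreamUrisBaseDemo {
    public static void main(String[] args) {
        URI definingBase = URI.create("file:/webapp/");   // sitemap defining the VPC
        URI callerBase = URI.create("file:/webapp/sub/"); // sitemap calling the VPC

        StreamUrisBase mode = StreamUrisBase.parse("caller");
        URI base = mode.select(definingBase, callerBase);

        // An XInclude href found in the SAX stream is resolved against that base.
        System.out.println(base.resolve("fragments/header.xml"));
    }
}

/** Hypothetical model of the proposed stream-uris-base attribute (assumption). */
enum StreamUrisBase {
    LOCAL, CALLER;

    /** Parses the attribute value; anything but "local" or "caller" is rejected. */
    static StreamUrisBase parse(String attributeValue) {
        return valueOf(attributeValue.trim().toUpperCase(Locale.ROOT));
    }

    /** Picks the base URI used during the processing phase. */
    URI select(URI definingSitemapBase, URI callingSitemapBase) {
        return this == LOCAL ? definingSitemapBase : callingSitemapBase;
    }
}
```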

Now we should have considered every source-resolving problem :-)

                          --- oOo ---

Ok, thanks for reading so far.

As a conclusion, the main change in the current architecture that leads 
to solving a great number of problems is that we will now have a 
per-sitemap Avalon Context rather than a single webapp-wide one.

That context will contain:
- ProcessingNodes to be wrapped as regular components,
- the base URI of the associated sitemap,
and will of course inherit all other entries from its parent context.

Once we have that, many things will follow and although there are still 
some details to be sorted out such as in-stream URIs, I think we now 
have an answer to most if not all the nasty questions that were somehow 
blocking the implementation of VPCs.

And as VPCs are an important part of the real blocks puzzle, the next 
step will be to integrate all this with the new kernel.

Thanks a lot to Vadim for starting the work on VPCs and triggering all 
these thoughts.

Thoughts, comments?



Sylvain Wallez                                  Anyware Technologies 
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
