cocoon-dev mailing list archives

From Daniel Fagerstrom <dani...@nada.kth.se>
Subject Re: [RT] Implementation of VPCs and "multi-relative" source resolving (long)
Date Mon, 10 Jan 2005 22:57:52 GMT
A somewhat late answer ;) But I have spent some more time thinking about 
this during the discussion about exporting flowscript functions from 
blocks.

Sylvain's original mail can be found at 
http://marc.theaimsgroup.com/?t=110064560900003&r=1&w=2.

What I will discuss is rather subtle and involved stuff, but discussing 
the gory details is IMO important for getting a robust implementation of 
VPCs that behave in an "expected" way.

Before discussing source resolving in VPCs I would like to recall the 
concepts of static and dynamic binding from programming language theory, 
which IMO give some further insight into the situation.

Static and Dynamic Variable Binding
===================================

Let's look at an example program:

var x=1;

function a() {
  var y=2;
  return function() {var z=3; return x+y+z;}
}

function b() {
  var y=4;
  var c=a();
  return c();
}

Here the question is: what should b() return?

In x+y+z it is rather obvious that x should be bound to the globally 
defined value 1 and z to the locally defined value 3, illustrating that 
local and global variables have a rather obvious interpretation. But what 
about the non-local variable y? Here there are two reasonable alternatives:

Static binding: variables are bound in the context where they are 
defined, here y=2, and b()==6.
Dynamic binding: variables are bound in the context where they are 
executed, here y=4, and b()==8.

In early implementations of Lisp, dynamic binding was used, but that 
gives poor isolation, as you must take the context where the function is 
executed into account to understand what it does, rather than just 
looking at the definition. Since then it has been generally accepted that 
static binding is better than dynamic binding for functions.
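
The two strategies can be compared side by side in a small sketch. 
JavaScript itself binds statically, so the dynamic variant below is only 
simulated with an explicit stack of bindings that follows the call chain; 
dynamicEnv, lookup, withBinding, aDyn and bDyn are names made up for this 
illustration.

```javascript
// Static binding: the closure keeps the y from a(), where it was defined.
var x = 1;

function a() {
  var y = 2;
  return function () { var z = 3; return x + y + z; };
}

function b() {
  var y = 4; // never seen by the closure returned from a()
  var c = a();
  return c();
}

// Dynamic binding, simulated: non-local names are looked up in a stack of
// bindings that mirrors the active calls, not the definition site.
var dynamicEnv = [{ x: 1 }];

function lookup(name) {
  for (var i = dynamicEnv.length - 1; i >= 0; i--) {
    if (name in dynamicEnv[i]) return dynamicEnv[i][name];
  }
  throw new Error("unbound variable: " + name);
}

function withBinding(bindings, body) {
  dynamicEnv.push(bindings);
  try { return body(); } finally { dynamicEnv.pop(); }
}

function aDyn() {
  return withBinding({ y: 2 }, function () {
    // The binding y=2 is popped again as soon as aDyn returns.
    return function () {
      return withBinding({ z: 3 }, function () {
        return lookup("x") + lookup("y") + lookup("z");
      });
    };
  });
}

function bDyn() {
  return withBinding({ y: 4 }, function () {
    var c = aDyn();
    return c(); // y is now found in bDyn's binding, y=4
  });
}

console.log(b());    // 6
console.log(bDyn()); // 8
```

To understand bDyn one has to know who is on the call stack, which is 
exactly the isolation problem mentioned above.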

In object oriented languages, member variables used in member functions 
are statically bound (which is a necessity for getting object 
orientation). On the other hand, when we write a class B that extends 
another class A, we can see that the member functions are dynamically 
bound. I.e. if we call a function in A that uses a function that is 
defined in both A and B, the latter will be used (this is of course 
somewhat more complicated than pure dynamic binding, as we have fallback 
to functions higher up in the class hierarchy).

So we can see that static binding is good when you _use_ a function from 
somewhere else while dynamic binding is good for _extending_ something.
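
A minimal sketch of the extension case, with made-up class and method 
names: render() is written once in A, but the title() it calls is chosen 
by the object it runs on, with fallback to A for anything B does not 
define.

```javascript
class A {
  render() { return this.title() + " / " + this.footer(); }
  title() { return "title from A"; }
  footer() { return "footer from A"; }
}

class B extends A {
  // Overrides title(); footer() falls back to the version in A.
  title() { return "title from B"; }
}

console.log(new A().render()); // title from A / footer from A
console.log(new B().render()); // title from B / footer from A
```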

VPC Source Resolving
====================

So what does this have to do with source resolving in VPCs? Well, using a 
VPC is like calling a function or a procedure, resolving an absolute URI 
is like dereferencing a global variable, and resolving a relative URI is 
like dereferencing a non-local variable. So what binding strategy should 
we use for relative URIs in VPCs?

Sylvain Wallez wrote:
<snip/>

> The problem with source resolving is that the base URI used to resolve 
> relative URIs changes when we enter a subsitemap: relative sources are 
> relative to the directory containing the "current" sitemap.
>
> That means that the base URI used to resolve e.g. the "src" attribute 
> of a <map:generate> is the one of the sitemap containing that 
> statement, and not the sitemap where the component was declared, which 
> can be a parent sitemap of the current one.
>
> This isn't a problem with URIs part of a statement ("src" attribute 
> and <map:parameter>) but is a real problem for URIs part of the 
> component configuration. That's what happens with the I18nTransformer 
> as catalogue locations are URIs defined in the component declaration, 
> thus relative to the sitemap where the component is _declared_. 
> Unfortunately, they are resolved relatively to where the component is 
> first _instantiated_, which can occur randomly in any of the current 
> sitemap and its child sitemaps, depending on how pools are managed. 
> The practical result is that we cannot reliably declare an i18n 
> transformer for use by a tree of subsitemaps.
>
> Now that we have a per-sitemap Avalon Context, we can also store in 
> that context the base URI of the sitemap declaring the component. The 
> i18n transformer just has to use that base URI to access the 
> catalogues defined in its configuration.
>
> That's what I called "multi-relative" source resolving in the subject 
> of this post: URIs coming from a component configuration will have to 
> be resolved relatively to the base URI contained in the Avalon 
> context, whereas URIs coming from sitemap statements are resolved 
> using the relative URI of the sitemap that is currently executing.

Expressed in the above terminology, we could say that components today 
use a dynamic binding strategy for relative URIs, which creates 
unexpected and unwanted behaviour. Sylvain describes a mechanism for 
using static binding instead. Excellent IMO.
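
The difference for a catalogue location taken from a component 
configuration can be sketched with the standard URL class. The sitemap 
locations and the catalogue path are hypothetical; in Cocoon the base URI 
would come from the per-sitemap Avalon context rather than from a 
variable.

```javascript
// Hypothetical locations: the component is declared in the parent sitemap
// but happens to be instantiated first while a subsitemap is executing.
var declaringSitemap = "file:///app/sitemap.xmap";
var executingSitemap = "file:///app/sub/sitemap.xmap";
var catalogue = "i18n/messages.xml"; // relative URI from the configuration

function resolve(relative, base) {
  return new URL(relative, base).href;
}

// Static binding: relative to where the component is declared.
console.log(resolve(catalogue, declaringSitemap));
// file:///app/i18n/messages.xml

// Dynamic binding: relative to whatever sitemap is running at that moment.
console.log(resolve(catalogue, executingSitemap));
// file:///app/sub/i18n/messages.xml
```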

> Still following? Now let's see source resolving in VPCs...
>
>                          --- oOo ---
>
> With VPCs, the problem is worse than with regular components, as VPCs 
> are components defined by sitemap snippets with their "src" and 
> <map:parameter>. So what does "relative" mean in this context? Is it 
> relative to the calling sitemap or relative to the sitemap that 
> defines the VPC? The result is "it depends"!
>
> It depends on whether the URI is passed from the calling environment 
> (it's then relative to the calling sitemap) or is some local data used 
> by the VPC implementation such as an XSLT (it's then relative to the 
> sitemap defining the VPC).
>
> So how do we distinguish them? A solution was proposed [1] where we 
> added some typing information to the sitemap statements calling the 
> VPC, so that URIs could be absolutized before the actual call.
>
> That is actually wrong, as it forces the user of a component to 
> explicitly indicate that some particular action should be taken on a 
> parameter, whereas this information is related to the implementation 
> of the component. Furthermore, forgetting to specify that 
> absolutization has to be performed can lead to weird behaviours 
> difficult to debug.
>
> So, it's the VPC's responsibility to make explicit in its definition 
> what values coming from the caller have to be absolutized relatively 
> to the calling sitemap.
>
> For this, I propose that VPC definitions have additional statements 
> defining what parameters have to be absolutized, e.g.:
>
> <map:generator name="foo">
>  <map:absolutize param="src"/>
>  <map:absolutize param="bar"/>
>
>  <map:generate type="file" src="{src}">
>    <map:parameter name="baz" value="bar"/>
>  </map:generate>
>  <map:transform src="data/{skin}.xslt"/>
> </map:generator>
>
> The input parameters "src" (actually the "src" attribute in the 
> calling statement) and "bar" are first absolutized relatively to the 
> calling sitemap, and then the base URI of the sitemap defining the VPC 
> becomes the new relative context, used e.g. to resolve 
> "data/{skin}.xslt".
>
> That way, we can also implement multi-relative source resolving in 
> sitemap statements.


Here too I agree with the analysis of the situation: relative URIs 
within the VPC should be resolved relative to the sitemap (block) they 
are defined in, i.e. static binding. URIs passed as parameters to VPCs 
should be resolved relative to the calling sitemap (block). However, 
there are some subtleties in the parameter passing that make me suggest 
a somewhat different implementation.
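
As a rough sketch of the rule, following Sylvain's absolutize step: the 
parameters that the VPC lists are made absolute against the caller's 
base URI before the call, and everything else inside the VPC is resolved 
against the VPC's own base. The base URIs, the callVpc function and the 
parameter values are assumptions made for the example.

```javascript
// Hypothetical base URIs of the calling sitemap and of the sitemap
// that defines the VPC.
var callerBase = "file:///blocks/b/sitemap.xmap";
var vpcBase = "file:///blocks/a/sitemap.xmap";

// Parameters the VPC definition lists in <map:absolutize param="..."/>.
var absolutized = ["src", "bar"];

function callVpc(params) {
  // Step 1: absolutize the listed parameters relative to the caller.
  var resolved = {};
  Object.keys(params).forEach(function (name) {
    resolved[name] = absolutized.indexOf(name) >= 0
      ? new URL(params[name], callerBase).href
      : params[name];
  });
  // Step 2: inside the VPC every URI is resolved against the VPC's base.
  // An already absolute URI is left unchanged by this step.
  return {
    generate: new URL(resolved.src, vpcBase).href,
    transform: new URL("data/" + resolved.skin + ".xslt", vpcBase).href
  };
}

var result = callVpc({ src: "docs/index.xml", skin: "blue" });
console.log(result.generate);  // file:///blocks/b/docs/index.xml
console.log(result.transform); // file:///blocks/a/data/blue.xslt
```

Note that this only works if the absolute URI produced in step 1 is 
reachable from inside the VPC.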

I will describe things as if each block or sitemap has its own source 
resolver, that knows how to resolve relative URIs in that context (and 
that also has access to all public resources with absolute URIs). I 
find it easier to describe the behaviour in such terms. Whether that is 
a good implementation strategy or not is another question.

Now, the "ideal" solution would IMO be that the VPC declares its URI 
input parameters as Source and that the framework resolves the input 
URIs with the caller's source resolver. That would give complete 
isolation between the source resolver of the caller and the source 
resolver of the block. This solution is in practice not possible, as the 
SitemapModelComponent interface (that is used by most sitemap 
components) takes a String and a SourceResolver as arguments rather than 
a Source. And changing the SitemapModelComponent interface would suddenly 
make the Avalon change to Serviceable seem like a relatively popular 
decision ;) So we have no other choice than giving the pipeline 
components in the VPC URI strings as arguments.

Sylvain solves this problem by transforming relative input URIs to 
absolute URIs (relative to the caller's context). This absolute URI can 
then be resolved by the VPC's resolver. This is a good solution that 
gives the correct semantics IMO. But it imposes some restrictions on 
what we can do.

Say that we have a block B that wants to apply a VPC from block A on 
some of its files or some of its internal pipelines. This is certainly a 
relevant and useful thing to be able to do. But it creates problems, 
as neither the files nor the internal pipelines are reachable from the 
global context.

One way to solve this would be to make all resources reachable from the 
global context, but I find that very unattractive, as it takes away the 
isolation between blocks that IMO is one of the most important reasons 
for introducing them. Another possibility is to require a block to make 
all resources that it wants to use in external VPCs available through 
its sitemap. But that also breaks isolation. Still another possibility 
is to send the resolver of the calling component to the called one, but 
that is even worse: it means that a component must make its internals 
available to all components that it wants to use, and furthermore the 
called component must know when it should use its own and when it should 
use its caller's source resolver.

I haven't found any simple and elegant solution to this problem, but at 
least I think that I have a possible solution:

* A source parameter in the VPC is declared as a Source and resolved to 
a Source by the framework, which uses the caller's source resolver. This 
is like in the "ideal" case described above.
* The resolved Source object is put in some temporary place where it can 
be reached by a special protocol, "param:arg1" say.
* Then this URI, "param:arg1", is used as parameter to the internal 
components in the VPC.

More complicated than I would have liked, but AFAICS it should solve the 
problems that I outlined above.
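
The three steps can be sketched as follows. This is only an illustration 
of the idea, not Cocoon code: a Source is modelled as a plain object, and 
makeResolver, invokeVpc, sourceParams and the file locations are all 
invented for the example.

```javascript
function hasOwn(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// A resolver that can only reach the files of its own block.
function makeResolver(base, files) {
  return {
    resolve: function (uri) {
      var href = new URL(uri, base).href;
      if (!hasOwn(files, href)) throw new Error("not reachable: " + href);
      return { uri: href, content: files[href] };
    }
  };
}

// Framework side of a VPC call.
function invokeVpc(vpc, callerResolver, args) {
  var slots = {};        // temporary place for the resolved Sources
  var internalArgs = {};
  Object.keys(args).forEach(function (name) { internalArgs[name] = args[name]; });
  vpc.sourceParams.forEach(function (name) {
    // Step 1: resolve with the caller's resolver.
    slots[name] = callerResolver.resolve(args[name]);
    // Steps 2 and 3: hand the VPC a "param:" URI instead of the original.
    internalArgs[name] = "param:" + name;
  });
  var resolver = {
    resolve: function (uri) {
      if (uri.indexOf("param:") === 0) {
        var name = uri.slice("param:".length);
        if (!hasOwn(slots, name)) throw new Error("unknown parameter: " + name);
        return slots[name];
      }
      return vpc.resolver.resolve(uri); // everything else stays in the VPC's block
    }
  };
  return vpc.run(internalArgs, resolver);
}

var blockB = makeResolver("file:///blocks/b/", {
  "file:///blocks/b/private/doc.xml": "<doc/>"
});

var vpcA = {
  sourceParams: ["src"],
  resolver: makeResolver("file:///blocks/a/", {
    "file:///blocks/a/style.xslt": "<xsl/>"
  }),
  run: function (args, resolver) {
    var input = resolver.resolve(args.src);     // args.src is "param:src"
    var style = resolver.resolve("style.xslt"); // local to block A
    return { arg: args.src, input: input.uri, style: style.uri };
  }
};

var out = invokeVpc(vpcA, blockB, { src: "private/doc.xml" });
console.log(out.arg);   // param:src
console.log(out.input); // file:///blocks/b/private/doc.xml
console.log(out.style); // file:///blocks/a/style.xslt
```

Block A never sees block B's resolver or the real location of the file, 
and its own resolver still refuses B's private URIs if asked directly.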

                         --- oOo ---

Ok, isn't this overly paranoid, do we need this level of isolation?

IMO we need that. One of the reasons that OO has become so popular and 
successful for building large systems is that it provides mechanisms for 
isolating components. Without isolation you must check _all_ code in 
_all_ components if something within a component was changed in an 
unexpected way.

Also, even if we are probably not going to have class loader isolation 
in the first version of blocks, we should at least design for isolation 
to the best of our knowledge.

> We may actually want to go a bit further by allowing any computation 
> to provide input parameters using input modules, e.g.
> <map:generator name="foo">
>  <map:parameter name="src" value="{absolutize:{src}}"/>
>  ...

I prefer explicit declaration of all input parameters, so that one can 
see the contract without needing to browse the VPC's implementation.

> But the source-resolving problem is not finished...
>
>                          --- oOo ---
>
> The last source-resolving problem is related to URIs that may be 
> present in the SAX stream, e.g. XInclude URIs. What are they relative to?
>
> My feeling here is that we need to distinguish for a single VPC the 
> base URI used to resolve URIs within the setup phase (i.e. "src" and 
> <map:parameter>) and the base URI used to resolve URIs during the 
> processing phase.
>
> That could be achieved using an additional attribute on the component 
> declaration, i.e. in the above example something like
>
> <map:generator name="foo" stream-uris-base="local|caller">

First, I wouldn't like a VPC to be able to resolve URIs in its caller's 
context. This is based on my opinions about isolation discussed above. 
If the caller wants to use XIncludes that involve relative URIs, it can 
use an XIncludeTransformer on the stream before passing it to a VPC. 
Second, running an XIncludeTransformer on an input stream of a VPC means 
that all internal URIs in the VPC are exposed. But if the VPC writer 
finds that ok, I would assume that resolving them in the VPC context 
would be the most expected result.

I think it could be a good idea to make the XIncludeTransformer (and 
similar things) configurable so that one can require them to only 
resolve absolute URIs for VPC usage on input streams.
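
Such a restriction could look roughly like this. The function names are 
made up and no such option exists in the XIncludeTransformer today; the 
check simply tests for a leading URI scheme as defined in RFC 3986.

```javascript
// True if the URI starts with a scheme such as "http:" or "cocoon:".
function isAbsoluteUri(uri) {
  return /^[A-Za-z][A-Za-z0-9+.\-]*:/.test(uri);
}

// Hypothetical include step that refuses relative URIs from the stream.
function resolveInclude(href, resolve) {
  if (!isAbsoluteUri(href)) {
    throw new Error("relative include refused: " + href);
  }
  return resolve(href);
}

console.log(isAbsoluteUri("cocoon:/foo/bar")); // true
console.log(isAbsoluteUri("data/part.xml"));   // false
```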

<snip/>

Polymorphic Blocks
==================

So far I (and before me Sylvain) have proposed that static binding is 
the best strategy for VPCs that are _used_ by some other sitemap or 
block. But as a side note it might be worth mentioning that we can make 
good use of dynamic binding as well.

In Stefano's Cocoon Blocks document 
http://wiki.apache.org/cocoon/BlockIntroduction a mechanism for block 
inheritance is described where a block B can extend a block A. Say that 
block A makes /foo/bar available through its sitemap; then block B can 
override /foo/bar by defining it in its own sitemap. If B doesn't define 
it, the version from block A will be used as fallback.

We can push this further by allowing dynamic resolution of relative 
URIs. We could introduce a protocol "dynamic:" (or maybe "polymorphic:") 
that resolves URIs according to the dynamic strategy.

If block A uses "dynamic:/foo/bar" within some of its pipelines, the 
_extending_ block B will be able to override the default behaviour by 
providing its own version of /foo/bar. This is very useful if you have 
a block that uses some default configuration documents and content 
files. Then you can customize its behaviour step by step by providing 
your own versions of what you want to change.
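
A small sketch of how such a protocol could behave, assuming that the 
framework remembers which block the request was addressed to. Block, 
find, run and the pipeline paths other than /foo/bar are hypothetical 
names for this illustration.

```javascript
function Block(name, parent, pipelines) {
  this.name = name;
  this.parent = parent;       // the block this one extends, or null
  this.pipelines = pipelines; // path -> function(resolve)
}

// Look for a pipeline in this block, falling back to the extended blocks.
Block.prototype.find = function (path) {
  for (var block = this; block !== null; block = block.parent) {
    if (Object.prototype.hasOwnProperty.call(block.pipelines, path)) {
      return { block: block, pipeline: block.pipelines[path] };
    }
  }
  throw new Error("no pipeline for " + path);
};

// entry: the block the request was addressed to (the most derived one).
// start: the block where the lookup for this particular URI begins.
function run(entry, start, path) {
  var hit = start.find(path);
  var resolve = function (uri) {
    if (uri.indexOf("dynamic:") === 0) {
      // Dynamic: start again from the extending block.
      return run(entry, entry, uri.slice("dynamic:".length));
    }
    // Static: start from the block that defines the running pipeline.
    return run(entry, hit.block, uri);
  };
  return hit.pipeline(resolve);
}

var blockA = new Block("A", null, {
  "/foo/bar": function () { return "A default"; },
  "/page": function (resolve) { return "page[" + resolve("dynamic:/foo/bar") + "]"; },
  "/static-page": function (resolve) { return "page[" + resolve("/foo/bar") + "]"; }
});

var blockB = new Block("B", blockA, {
  "/foo/bar": function () { return "B override"; }
});

console.log(run(blockA, blockA, "/page"));        // page[A default]
console.log(run(blockB, blockB, "/page"));        // page[B override]
console.log(run(blockB, blockB, "/static-page")); // page[A default]
```

Only the pipeline that asks for "dynamic:" sees B's override; the 
statically resolved one keeps using A's own version.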

We have used something like that through some simple "sitemap magic" in 
a couple of our applications for reusing common parts, with good results.

                         --- oOo ---

WDYT?

/Daniel

