cocoon-dev mailing list archives

From Sylvain Wallez <sylv...@apache.org>
Subject [RT] ComponentizedProcessor (was RE: Migrating TreeProcessor to Fortress)
Date Tue, 11 Nov 2003 20:48:16 GMT
Hi all,

Here's an RT about Unico's proposal of "flattening" the sitemap for the
migration to Fortress. Please read carefully: this has a lot of
implications.


Introduction
------------
Today is a public holiday in France. We "celebrate" (should we enjoy
that?) the end of World War I, and this is the occasion to explain to
children what their great-grandfathers went through a century ago,
hoping this won't happen again. I was doing some DIY at home, and
manual work freezes my brain. So while digging in the garden, I was
thinking of Unico's "iconoclast" proposal about the sitemap engine. Yes,
the treeprocessor is still somehow "my baby", and seeing it shaken up as
it is these days makes me think a lot about it.

And then came the sudden revelation: Unico's idea is brilliant and its 
implications go far beyond the migration to Fortress.


Implications
------------
Considering every sitemap statement as a component makes it very easy to
implement a number of features that either have been wanted for a long
time but were never implemented because of their complexity, or will be
needed for blocks:

1/ Virtual components
Virtual components are sitemap snippets that can be used in place of
"regular" components. In many languages, these are called "macros". With
sitemap statements as components, virtual components are a breeze to
implement: just look up the component, and see if what's returned is a
regular sitemap component (e.g. a Serializer) or a ProcessingNode. If
it's a regular sitemap component, add it to the pipeline; otherwise,
invoke the ProcessingNode.

What I'm not sure about here is whether it's possible (or even
desirable) to have two different implementation interfaces for a single
role.

2/ Resources inheritance
Resources are nothing more than untyped virtual components (yeah
Stefano, I know, they should be serializers). So if a resource isn't
defined in a sitemap, we go up to the parent sitemap's component manager
and look up the resource there.

3/ Block-defined sitemap components
A block can provide sitemap (and other) components to other blocks,
including virtual components. Nothing special here actually, except that
block inheritance is implemented, once again, by the parent relationship
of component managers.

4/ View inheritance
Views are nothing more than virtual serializers, with the main 
difference that their hint is defined at runtime by the "cocoon-view" 
parameter. And since these are components, lookup goes up to the parent 
sitemap if a view is not declared in a given sitemap, thus providing 
inheritance.
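
The lookup described in 1/ to 4/ can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the actual
Cocoon or Avalon API: Manager, RegularComponent, the one-method
ProcessingNode interface and the "pretty-html" hint are all hypothetical
stand-ins. It shows the two ideas together: a failed local lookup goes
up to the parent manager, and the result is either added to the pipeline
or invoked, depending on an "instanceof" check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VirtualComponentSketch {

    /** Hypothetical stand-in for the ProcessingNode contract. */
    interface ProcessingNode {
        void invoke(List<String> pipeline);
    }

    /** Hypothetical stand-in for a regular sitemap component (e.g. a Serializer). */
    static final class RegularComponent {
        final String name;
        RegularComponent(String name) { this.name = name; }
    }

    /** Minimal hierarchical manager: a failed local lookup goes up to the parent. */
    static final class Manager {
        private final Manager parent;
        private final Map<String, Object> components = new HashMap<>();

        Manager(Manager parent) { this.parent = parent; }

        void put(String role, String hint, Object component) {
            components.put(role + "/" + hint, component);
        }

        Object lookup(String role, String hint) {
            Object found = components.get(role + "/" + hint);
            if (found != null) return found;
            if (parent != null) return parent.lookup(role, hint);
            throw new IllegalArgumentException("No component for " + role + "/" + hint);
        }
    }

    /** Adds a regular component to the pipeline, or invokes a virtual one. */
    static void addToPipeline(Manager manager, String role, String hint, List<String> pipeline) {
        Object component = manager.lookup(role, hint);
        if (component instanceof ProcessingNode) {
            ((ProcessingNode) component).invoke(pipeline);
        } else {
            pipeline.add(((RegularComponent) component).name);
        }
    }

    /** A sub-sitemap uses a virtual serializer declared only in its parent. */
    static List<String> demo() {
        Manager root = new Manager(null);
        Manager sub = new Manager(root);
        root.put("serializer", "html", new RegularComponent("html-serializer"));
        // Virtual serializer: a sitemap snippet made of a transformer and a serializer.
        ProcessingNode prettyHtml = steps -> {
            steps.add("tidy-transformer");
            steps.add("html-serializer");
        };
        root.put("serializer", "pretty-html", prettyHtml);

        // For a view, the hint would come from the "cocoon-view" parameter at runtime.
        List<String> pipeline = new ArrayList<>();
        addToPipeline(sub, "serializer", "pretty-html", pipeline);
        return pipeline;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the sub-sitemap never declares "pretty-html" itself; it
is found through the parent relationship, which is the whole inheritance
mechanism.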


Side note: relative URIs
------------------------
The various considerations about inheritance above lead to the question
of how relative source URIs are resolved (Carsten raised this issue some
time ago): what is the base URI that should be used by the resolver?

My opinion is that the base URI should be the one of the sitemap 
_handling_ the request. This means that "jumping" to another sitemap 
through virtual components or view inheritance should not affect the 
base URI.

However, there are many situations where we want to use a source
relative to the _current_ sitemap regardless of how it's called. For
this, I propose a new protocol similar to how "context:" behaves with
the root sitemap, but for non-root sitemaps. The "sitemap:" protocol
comes to mind, but I'm not sure this is a good name.
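
To make the distinction concrete, here is a small sketch of the two
resolution rules. It is an assumption-laden illustration, not Cocoon's
SourceResolver: the class and method names, the two base URIs and the
exact "sitemap:" prefix handling are all hypothetical. Plain relative
URIs resolve against the sitemap handling the request, while "sitemap:"
URIs resolve against the sitemap in which the statement is written.

```java
import java.net.URI;

public class SitemapUriSketch {

    private static final String PREFIX = "sitemap:";

    /**
     * @param uri          the URI as written in the sitemap statement
     * @param handlingBase base URI of the sitemap handling the request
     * @param currentBase  base URI of the sitemap containing the statement
     */
    static URI resolve(String uri, URI handlingBase, URI currentBase) {
        if (uri.startsWith(PREFIX)) {
            // Relative to the current sitemap, however it was reached.
            return currentBase.resolve(uri.substring(PREFIX.length()));
        }
        // Default: relative to the sitemap handling the request.
        return handlingBase.resolve(uri);
    }

    public static void main(String[] args) {
        URI handling = URI.create("file:/webapp/shop/");
        URI current = URI.create("file:/webapp/common/");
        System.out.println(resolve("page.xsl", handling, current));
        System.out.println(resolve("sitemap:view.xsl", handling, current));
    }
}
```

With these hypothetical paths, a view inherited from "common" and used
by "shop" would find "page.xsl" under shop/ but "sitemap:view.xsl" under
common/.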


Performance considerations
--------------------------
When writing the TreeProcessor, great care was taken to pre-analyse
everything possible in order to achieve maximum runtime speed. So far I
have found only two points where this new approach degrades
performance:

- it's not possible to choose the ProcessingNode implementation
depending on the class of a component as is done, e.g., in
MatchNodeBuilder. In the end, the cost is just an "instanceof" check to
choose the right behaviour.

- the mapping from view names to their labels is pre-computed in the
TreeProcessor for each individual sitemap component, so that the view's
ProcessingNode (if any) can be found directly with the view name (see
SitemapLanguage.getViewsForStatement and e.g. GenerateNode.invoke()).
But, considering that views are only marginally used in a production
environment, the few extra lookups can be considered negligible.


Implementation
--------------
The implementation mainly consists of merging the code of the
ProcessingNodeBuilder classes into the corresponding ProcessingNode
classes.

The initial "flattening" transformation can be implemented in XSL, whose
simplicity will make it possible to implement, at this level, some
semantic checks that would be difficult to implement otherwise.

However, an important requirement is to keep the location information of
sitemap statements. For this I suggest augmenting the sitemap SAX stream
by adding Locator information in a "location" attribute on every
element. This way, the initial location information can survive any kind
of transformation. This augmentation can also be useful in several other
contexts such as Woody (it would avoid the dependency on Xerces in
DomUil.LocationTrackingDOMParser).
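
The augmentation could look roughly like the SAX filter below. This is a
sketch under assumptions, not existing Cocoon code: the class name, the
un-namespaced "location" attribute and the "systemId:line:column" value
format are all hypothetical choices made for illustration (a real
implementation would probably put the attribute in its own namespace to
avoid clashes). Only standard org.xml.sax and JAXP classes are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class LocationFilterSketch extends XMLFilterImpl {

    private Locator locator;

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator; // remember it so startElement can read positions
        super.setDocumentLocator(locator);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl augmented = new AttributesImpl(atts);
        if (locator != null) {
            // Assumed value format: systemId:line:column
            String value = locator.getSystemId() + ":" + locator.getLineNumber()
                    + ":" + locator.getColumnNumber();
            augmented.addAttribute("", "location", "location", "CDATA", value);
        }
        super.startElement(uri, localName, qName, augmented);
    }

    /** Parses the XML through the filter and returns "element@location" for each element. */
    static List<String> locationsOf(String xml, String systemId) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            LocationFilterSketch filter = new LocationFilterSketch();
            filter.setParent(reader);
            final List<String> seen = new ArrayList<>();
            // Downstream consumer: sees the attribute like any other.
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String local, String q, Attributes a) {
                    seen.add(local + "@" + a.getValue("location"));
                }
            });
            InputSource source = new InputSource(new StringReader(xml));
            source.setSystemId(systemId);
            filter.parse(source);
            return seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(locationsOf("<a>\n  <b/>\n</a>", "file:///sitemap.xmap"));
    }
}
```

Because the location travels as ordinary attribute data, a downstream
XSL transformation only has to copy it along for the information to
survive.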

From a security and abuse point of view, I'm wondering if all sitemap
statement components should be made visible to other components through 
the container. If we don't want this, the sitemap engine could consist 
of two component managers, one containing the "public" statements such 
as views, resources, virtual components and the contents of 
<map:component>, and a child "private" manager containing other sitemap 
statements. This may also allow the public container to be less loaded 
and therefore faster.


Conclusion
----------
This new approach seems to have very few drawbacks (hope I did not miss
something important), and will lead to a dramatic simplification of the
sitemap engine, the most noticeable effect being that the number of
classes will be halved.

There's only one implication on Cocoon's core: the ProcessingNode 
interface is now a public contract between processors, since this is 
what all these components implement.

The only criticism (yes, there needs to be some ;-) is that I took great
care in the TreeProcessor to separate build-time code and run-time code,
while the ComponentizedProcessor will merge them in a single class. That
separation allows all build-time data structures to be garbage
collected, since we will never need them again. I also had the secret
hope of being able to serialize the processing tree, in order to use a
pre-built tree on small devices (remember, I run Cocoon in small
places), but this proved to be difficult if not impossible because
components have a lot of relations with non-serializable objects.

I'm wondering if we should write this new sitemap engine in the 2.2
branch or if it should go in 2.1. Fortress isn't a requirement to
implement this, and it will allow us to provide view and resource
inheritance before 2.2 is out.

And I also think we should consider this approach when migrating Woody
to CocoonForms, since Woody uses the same mechanism as the
TreeProcessor to build widget definition trees.

Thanks again Unico for this brilliant idea.

What do you think, folks?

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com
