cocoon-dev mailing list archives

From Sylvain Wallez <>
Subject Re: Overview of Treebuilder/Sitemap Builder?
Date Sat, 18 Oct 2003 17:48:25 GMT wrote:

>Berin Loritsch wrote:
>>Stefano Mazzocchi wrote:
>>>On Friday, Oct 17, 2003, at 20:36 Europe/Rome, Berin Loritsch wrote:
>>>>Anyone have a logical view of how the Treebuilder is supposed to work?  It
>>>>would definitely help me in refactoring things.  As it is now, the Treebuilder is
>>>>tightly integrated with the ECM, so it is something that won't work right
>>>>away...  I am just trying to get to a place where I can compile 2.2 so that my
>>>>testcases will run and I can verify what I am doing works.
>>>Sylvain knows this and, AFAIK, he's one of the few (only?) that does. Which is
>>>also something that I'm particularly comfortable with, even if this is not clearly
>>>Sylvain's fault if what he writes works as expected and nobody has to go there and
>>>fix it ;-)
>>>But a description of the internals of the tree processor would be helpful not
>>>only for migration but also for future reference (refactor? cleanup? profile? whatever)
>>I am slowly getting up to speed, and I will eventually get there.  The important thing
>>is to grok the big picture, which will help me with the details.
>Main steps
>The TreeProcessor is set to get the Processor role in the cocoon.roles file.
>During the configuration of the TreeProcessor an
>ExtendedComponentSelector (builderSelector) is set up using the
>configuration file "treeprocessor-builtins.xml".
>While calling TreeProcessor.process(environment), i.e. the method that 
>takes the environment, applies the sitemap on it and produces the  output,
>the following things happen:
>* The method setupRootNode is called (if necessary) and the
>builderSelector is used to get a TreeBuilder (builder). The build method 
>on the builder is called with the sitemap as argument and a tree of 
>ProcessingNodes corresponding to the sitemap is returned.
>* The sitemap is then executed by calling the invoke method for the root 
>node.
>Building the tree
>In Cocoon using "treeprocessor-builtins.xml" SitemapLanguage that  extends
>DefaultTreeBuilder is used as TreeBuilder. Within the
>DefaultTreeBuilder (during execution of the build method) a RoleManager 
>is set up based on the "roles" section of "treeprocessor-builtins.xml" 
>and a ExtendedComponentSelector is set up based on the "nodes" section. 
>The "nodes" section associates the sitemap concepts to the appropriate 
>ProcessingNodeBuilders. It also configures a ProcessingNodeBuilder so 
>that it knows what type of children it is allowed to have and which ones 
>that are forbidden.
>The build process starts (in the method createTree) by creating the 
>ProcessingNodeBuilder (rootBuilder) that corresponds to the root element 
>in the sitemap, associate the rootBuilder to the current TreeBuilder and 
>call the rootBuilder.buildNode method with the configuration tree  created
>from the sitemap.
>The FooNodeBuilder.buildNode method creates and returns a FooNode object
>and recursively creates the child nodes of the object by creating and
>executing the corresponding builder objects.
>Executing the tree
>While (recursively) executing the invoke(environment, context) method for 
>the node objects in the tree a Pipeline object is constructed that is 
>stored in the context object (other things happen as well). When a 
>SerializeNode is invoked, the current Pipeline is processed and the 
>output is stored in the environment.
>I built a Cocoon inspired signal processing framework about a year ago 
>and tried to reuse Sylvain's framework. While most of it is very
>general, there are some Cocoon specific details in the Context and 
>Environment interfaces, so I ended up building something similar but 
>simpler instead.

Nice explanation, Daniel! I'm happy to see that other people understand 

However, I'd like to add some background to this to explain why it does 
work this way, some additional details and what we could eventually 
refactor to ease the migration to Fortress.

I started the TreeProcessor for two reasons.

The first reason was that the sitemap engine at that time was compiled 
into a Java class like XSP. But the sitemap logicsheet was very complex 
and recompiling a large sitemap took ages (more than 20 seconds on the 
samples sitemap), leading to painful try/fail cycles. We needed 
something faster.

The second reason was that at that time (autumn 2001), a number of RTs 
were written related to what we called "flowmaps" and later led to 
flowscript. These RTs were describing new ways to build a pipeline to 
take flow into account, but no real code was written to test these 
ideas, because deeply changing the way the sitemap code was generated 
was very painful: finding one's way through the 2000-line XSLT was not easy.

So I decided to consider another approach, based on an evaluation tree 
(hence TreeProcessor), each node in the tree corresponding to a xxxmap 
instruction (sitemap or flowmap).

An additional motivation for me was that it would require me to heavily 
use the Avalon concepts and therefore increase my knowledge in this 
area. This was mostly written at home, and my wife deserves many thanks, 
because this thing took my brain day and night for more than 2 months ;-)

The main idea of the TreeProcessor is that each kind of instruction 
(e.g. <map:act>, <map:generate>, etc.) is described by two classes:
- a ProcessingNode, the runtime object that will execute the instruction,
- a ProcessingNodeBuilder, responsible for creating the ProcessingNode 
with the appropriate data and/or childnodes, extracted from attributes, 
child elements, etc.
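
The node/builder split described above can be illustrated with a minimal sketch. This is not the actual Cocoon code: the `Node` and `NodeBuilder` interfaces, their simplified signatures, and the `GenerateNode` classes are hypothetical stand-ins for ProcessingNode and ProcessingNodeBuilder, and a plain attribute map replaces the Avalon configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NodePairSketch {
    /** Runtime side: executes one instruction (hypothetical, simplified signature). */
    interface Node {
        boolean invoke(List<String> trace);
    }

    /** Build-time side: turns the attributes of one element into a runtime node. */
    interface NodeBuilder {
        Node buildNode(Map<String, String> attributes);
    }

    /** Runtime node for a generate-like instruction; it keeps only run-time data. */
    static final class GenerateNode implements Node {
        private final String src;

        GenerateNode(String src) {
            this.src = src;
        }

        public boolean invoke(List<String> trace) {
            trace.add("generate " + src);
            return false; // a generator alone does not complete a pipeline
        }
    }

    /** Builder that validates the element and extracts what the node needs. */
    static final class GenerateNodeBuilder implements NodeBuilder {
        public Node buildNode(Map<String, String> attributes) {
            String src = attributes.get("src");
            if (src == null) {
                throw new IllegalArgumentException("missing 'src' attribute");
            }
            return new GenerateNode(src);
        }
    }

    public static void main(String[] args) {
        Node node = new GenerateNodeBuilder().buildNode(Map.of("src", "docs/index.xml"));
        List<String> trace = new ArrayList<>();
        boolean done = node.invoke(trace);
        System.out.println(trace + " done=" + done);
    }
}
```

The point of the split is that validation and parsing live only in the builder, so the node that survives at run time carries nothing but the data it executes with.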

Implementing the sitemap language then translates into writing the 
appropriate ProcessingNodeBuilder classes for all statements of the 
language. But since we were discussing flowmaps and other pipeline 
construction approaches, I wanted this to be easily extensible, and even 
allow the simultaneous use of different languages in the system 
(sitemap/flowmap). This is why <map:mount> supports an additional 
undocumented and never used "language" attribute (see MountNodeBuilder).

So the TreeProcessor configuration contains the definition of 
TreeBuilder implementations for various "languages", the sitemap being 
the only one we have today. The whole configuration document is actually 
a ComponentSelector for TreeBuilder implementations. The SitemapLanguage 
class is the implementation of TreeBuilder for the sitemap language. A 
TreeBuilder builds a processing node tree based on a file (e.g. 
sitemap.xmap) that is read in an Avalon configuration (this was chosen 
for its ease of use compared to raw DOM).

Obviously, this initial selector can be removed and the sitemap language 
be the only one available, as we now have the flowscript and it's very 
unlikely that we will redesign a new pipeline language in the near (or 
even distant) future.

Roles, selectors and <map:components>

The <map:components> section of a sitemap is used to configure a 
ComponentManager (child of either the parent sitemap's manager or the 
main manager), and the <roles> section of the TreeProcessor 
configuration defines a RoleSelector that is used by this manager. For 
the sitemap, it defines the shorthands that will map <map:generators>, 
<map:selectors>, etc, to a special "ComponentsSelector" (yeah, the name 
could be better).

This ComponentsSelector handles the <map:components> syntax ("src" and 
not "class", etc), and holds the "default" attribute, view labels and 
mime types for each hint (these are not known by the components themselves).

AFAIU, Fortress allows defaults for a collection of components 
implementing the same role, but I don't know how we can handle the 
additional "label" and "mime-type", which are not handled by the 
component itself.

Can we imagine a "fake" selector that routes calls to select() to the 
manager and handles this additional information on its own?
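
To make the question concrete, here is a rough sketch of what such a "fake" selector might look like. Everything in it is an assumption made for illustration: the class name, the `describe`/`labelFor`/`mimeTypeFor` methods, and the use of a plain `Function` as a stand-in for the container lookup. It does not use or mirror any real Fortress or ECM API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class MetadataSelectorSketch {
    /** Per-hint data that the component itself does not know about. */
    static final class HintInfo {
        final String label;
        final String mimeType;

        HintInfo(String label, String mimeType) {
            this.label = label;
            this.mimeType = mimeType;
        }
    }

    private final Function<String, Object> manager; // stand-in for the container lookup
    private final String defaultHint;
    private final Map<String, HintInfo> infos = new HashMap<>();

    MetadataSelectorSketch(Function<String, Object> manager, String defaultHint) {
        this.manager = manager;
        this.defaultHint = defaultHint;
    }

    /** Records the view label and mime type declared for a hint. */
    void describe(String hint, String label, String mimeType) {
        infos.put(hint, new HintInfo(label, mimeType));
    }

    private String resolve(String hint) {
        return hint != null ? hint : defaultHint;
    }

    /** Routes the lookup to the manager, falling back to the default hint. */
    Object select(String hint) {
        return manager.apply(resolve(hint));
    }

    /** Returns the declared label, or null when none was declared. */
    String labelFor(String hint) {
        HintInfo info = infos.get(resolve(hint));
        return info == null ? null : info.label;
    }

    /** Returns the declared mime type, or null when none was declared. */
    String mimeTypeFor(String hint) {
        HintInfo info = infos.get(resolve(hint));
        return info == null ? null : info.mimeType;
    }

    public static void main(String[] args) {
        MetadataSelectorSketch serializers =
                new MetadataSelectorSketch(hint -> "component:" + hint, "html");
        serializers.describe("html", "content", "text/html");
        System.out.println(serializers.select(null) + " " + serializers.mimeTypeFor(null));
    }
}
```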

Building the processing tree

The second section in a language configuration, <nodes>, defines a 
ComponentSelector for ProcessingNodeBuilders. For each element 
encountered in the sitemap source file, the corresponding node builder 
is fetched from this selector with the local name of the element as the 
selection hint (i.e. <map:act> will lead to the hint "act").

The content of each <node> element is the specific Avalon configuration 
of the corresponding ProcessingNodeBuilder and mostly defines the allowed 
child statements.
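
The lookup by local name and the allowed-children check can be sketched as follows. This is only an illustration under stated assumptions: a plain map stands in for the ComponentSelector, the builder names are placeholder strings rather than real classes, and the `register`/`select`/`allows` methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class BuilderLookupSketch {
    /** Stand-in for one <node> entry: a builder name plus its allowed children. */
    static final class Entry {
        final String builderName;
        final Set<String> allowedChildren;

        Entry(String builderName, Set<String> allowedChildren) {
            this.builderName = builderName;
            this.allowedChildren = allowedChildren;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void register(String hint, String builderName, String... allowedChildren) {
        entries.put(hint, new Entry(builderName, Set.of(allowedChildren)));
    }

    /** Strips the namespace prefix: "map:act" becomes "act". */
    static String localName(String qName) {
        int colon = qName.indexOf(':');
        return colon < 0 ? qName : qName.substring(colon + 1);
    }

    /** Selects the entry for an element, using its local name as the hint. */
    Entry select(String qName) {
        Entry entry = entries.get(localName(qName));
        if (entry == null) {
            throw new IllegalArgumentException("No node builder for <" + qName + ">");
        }
        return entry;
    }

    /** Tells whether the child element is allowed inside the parent element. */
    boolean allows(String parentQName, String childQName) {
        return select(parentQName).allowedChildren.contains(localName(childQName));
    }

    public static void main(String[] args) {
        BuilderLookupSketch nodes = new BuilderLookupSketch();
        nodes.register("match", "MatchBuilder", "act", "generate", "serialize");
        nodes.register("act", "ActBuilder", "generate");
        System.out.println(nodes.select("map:act").builderName);
        System.out.println(nodes.allows("map:act", "map:serialize"));
    }
}
```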

Now a sitemap is not a tree, but a graph because of resources and views 
that can be called from any point in the sitemap. To handle this, 
building the processing tree follows two phases:
- the whole node tree is built, and nodes that other nodes can link (or 
jump) to are registered in the common TreeBuilder by their respective 
node builders (see TreeBuilder.registerNode()).
- then those node builders that implement 
LinkedProcessingNodeBuilder are asked to link their node, which they do by 
fetching the appropriate node registered in the first phase.
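
The two phases can be sketched like this. It is a simplified illustration, not the real code: `Tree`, `ResourceNode`, `CallNode`, `Linkable` and `addCall` are hypothetical names, with `Linkable` playing the role of LinkedProcessingNodeBuilder and a plain map playing the role of the registry behind TreeBuilder.registerNode().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseLinkSketch {
    /** A node that other nodes can jump to, e.g. a resource. */
    static final class ResourceNode {
        final String name;

        ResourceNode(String name) {
            this.name = name;
        }
    }

    /** A node that jumps to a resource; its target is unknown until phase 2. */
    static final class CallNode {
        final String targetName;
        ResourceNode target;

        CallNode(String targetName) {
            this.targetName = targetName;
        }
    }

    /** Phase-2 callback implemented by builders whose node needs linking. */
    interface Linkable {
        void link(Map<String, ResourceNode> registry);
    }

    static final class CallNodeBuilder implements Linkable {
        final CallNode node;

        CallNodeBuilder(String targetName) {
            this.node = new CallNode(targetName);
        }

        public void link(Map<String, ResourceNode> registry) {
            ResourceNode target = registry.get(node.targetName);
            if (target == null) {
                throw new IllegalStateException("Unknown resource '" + node.targetName + "'");
            }
            node.target = target;
        }
    }

    static final class Tree {
        private final Map<String, ResourceNode> registry = new HashMap<>();
        private final List<Linkable> pending = new ArrayList<>();

        /** Phase 1: make a node available as a jump target. */
        void registerNode(ResourceNode node) {
            registry.put(node.name, node);
        }

        /** Phase 1: build a call node and remember that it still needs linking. */
        CallNode addCall(String targetName) {
            CallNodeBuilder builder = new CallNodeBuilder(targetName);
            pending.add(builder);
            return builder.node;
        }

        /** Phase 2: resolve every pending link, then drop the build-time builders. */
        void linkAll() {
            for (Linkable linkable : pending) {
                linkable.link(registry);
            }
            pending.clear();
        }
    }

    public static void main(String[] args) {
        Tree tree = new Tree();
        CallNode call = tree.addCall("footer");        // used before it is defined
        tree.registerNode(new ResourceNode("footer")); // defined later in the file
        tree.linkAll();
        System.out.println(call.target.name);
    }
}
```

Deferring the link to a second pass is what lets a call appear in the source file before the resource it refers to.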

We then obtain an evaluation tree (in reality a graph) that is ready for 
use. All build-time related components are then released.

It is to be noted also, that a ProcessingNode is considered as a 
"non-managed component": with the help of the LifecycleHelper class, the 
TreeBuilder honours any of the Avalon lifecycle interfaces that a node 
implements. This is required as many nodes require access to the 
component selectors defined by <map:components>. Disposable nodes are 
collected in a list that the TreeProcessor traverses when needed 
(sitemap change or system disposal).

Great care has been taken to cleanly separate build-time and run-time 
code and data, to ensure the smallest memory occupation and the fastest 
possible execution. This led this interpreted engine to be a bit faster 
at runtime than the compiled one (build time is more than 20 times faster).

An optimisation that is done and may be relevant to migration to 
Fortress is that ThreadSafe components are looked up as part of the tree 
building and never looked up again later (see e.g. MatchNode). AFAIU, 
lifestyle interfaces no longer exist in Fortress, so this optimisation 
may be difficult to do, if not impossible.

Building a pipeline

When a request has to be processed, the TreeProcessor calls invoke() on 
the root node of the evaluation tree. This method has two parameters: 
the environment defining the request, and an InvokeContext that mainly 
holds the pipeline that is being built and the stack of sitemap variables.

The invoke method executes all processing nodes (depth first) until one 
of them returns "true", meaning that a pipeline was successfully built. 
Examples of nodes that return true are serializers, readers and redirects.
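
The depth-first traversal can be sketched as below. The names are hypothetical and much simplified: `Context` stands in for InvokeContext and only records pipeline steps as strings, the request is reduced to a URI string, and `match`, `step` and `serialize` are illustrative factories rather than the real node classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InvokeSketch {
    /** Stand-in for InvokeContext: holds the pipeline being built. */
    static final class Context {
        final List<String> pipeline = new ArrayList<>();
    }

    interface Node {
        /** Returns true once a complete pipeline has been built. */
        boolean invoke(String uri, Context ctx);
    }

    /** Invokes children in order and stops at the first one that returns true. */
    static boolean invokeAll(List<Node> children, String uri, Context ctx) {
        for (Node child : children) {
            if (child.invoke(uri, ctx)) {
                return true;
            }
        }
        return false;
    }

    /** Container node: descends into its children only when the URI matches. */
    static Node match(String prefix, Node... children) {
        List<Node> kids = Arrays.asList(children);
        return (uri, ctx) -> uri.startsWith(prefix) && invokeAll(kids, uri, ctx);
    }

    /** Intermediate node: adds a step but does not end the pipeline. */
    static Node step(String name) {
        return (uri, ctx) -> {
            ctx.pipeline.add(name);
            return false;
        };
    }

    /** Terminal node: adds the last step and reports success. */
    static Node serialize(String type) {
        return (uri, ctx) -> {
            ctx.pipeline.add("serialize:" + type);
            return true;
        };
    }

    public static void main(String[] args) {
        Node root = match("",
                match("docs/", step("generate"), step("transform"), serialize("html")));
        Context ctx = new Context();
        boolean built = root.invoke("docs/index", ctx);
        System.out.println(built + " " + ctx.pipeline);
    }
}
```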

If the environment is external, the pipeline is executed as soon as it 
is ended (i.e. in the reader or serializer node). But if the environment 
is internal (i.e. a "cocoon:" source), it is not, meaning the pipeline 
is returned to the SitemapSource, ready for later execution if so 
requested (e.g. by a Source.getInputStream()).
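
The difference between the two cases can be sketched as follows. This is an assumption-laden illustration only: the `Environment` class, its `output` and `pending` fields, and `endPipeline` are hypothetical, and a `Supplier<String>` stands in for the assembled pipeline.

```java
import java.util.function.Supplier;

public class DeferredPipelineSketch {
    /** Stand-in for the environment: external requests get output immediately. */
    static final class Environment {
        final boolean external;
        String output;            // set when the pipeline runs right away
        Supplier<String> pending; // set when execution is left to the caller

        Environment(boolean external) {
            this.external = external;
        }
    }

    /** Called by a terminal node (reader or serializer) once the pipeline is complete. */
    static boolean endPipeline(Environment env, Supplier<String> pipeline) {
        if (env.external) {
            env.output = pipeline.get(); // execute now
        } else {
            env.pending = pipeline;      // hand back for later execution
        }
        return true;
    }

    public static void main(String[] args) {
        Supplier<String> pipeline = () -> "<html/>";

        Environment external = new Environment(true);
        endPipeline(external, pipeline);
        System.out.println("external output: " + external.output);

        Environment internal = new Environment(false);
        endPipeline(internal, pipeline);
        // Nothing has run yet; the caller decides when (and whether) to run it.
        System.out.println("internal output: " + internal.pending.get());
    }
}
```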

Phew... I finally explained the whole thing in depth. I'm no longer the 
only one to know ;-)
I'll also put this into the wiki.


Sylvain Wallez                                  Anyware Technologies 
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -
