cocoon-dev mailing list archives

From "Nicola Ken Barozzi" <>
Subject [RT] Cocoon subcomponent object model (was: Re: is cocoon symmetry a holy grail?)
Date Fri, 15 Feb 2002 02:45:22 GMT
From: "Vadim Gritsenko" <>

> I have another one, it provides different functionality but it features
> similar approach. As I don't have a name for this (multiplexer?), here
> is the diagram:
>                   - pipeline1 -
>                  /              \
> request -> A -> X - pipeline2 - X -> C -> response
>                  \              /
>                   - pipelineN -
> Explanation:
> 1. Request goes in
> 2. Pipeline is being constructed from A, X, C
> 3. SAX events passed from the A to X, where they are dispatched (same as
> separator) to several other pipelines
> 4. SAX events passed from these events reassembled into the one SAX
> stream by the same instance of X component
> 5. Result passed down the original pipeline to the C
> 6. C spits out the response

Oh my, I've seen this in ApacheCon two more than a year ago, we knew it was
going to come out again! ;-)

The first comment that comes to me is that IMHO, to have better performance
you need to have good control over what is happening, and that leads to
KISS. The whole concept of making pipelines split, recombine and branch
could make it difficult to maintain control.

But the concept is intriguing. IMHO it could be transformed into another
concept, a sub-component object model.

We have been seeing the picture from a sitemap POV, but never talked about
helping the developer in writing the components themselves.

So, since it's 3:26 and I can't get to sleep, here's my first RT.

Cocoon Sub-Component Object Model


This RT describes a finer grained object model for Cocoon that is meant
to attain a better separation of concerns and usability.


Cocoon has a macro object model based on the pipeline metaphor.

Each Cocoon "object" is a pipeline component and can be of three major
kinds:
1. Generator: initiates the XML pipeline by converting generic data into XML.
2. Transformer: filters the XML events.
3. Serializer: converts the resulting XML into something useful for the

The pipelines are defined in a sitemap that specifies order, parameters and
condition of pipeline components.

This componentization is useful because it enforces separation of concerns
between content providers, graphic-layout designers, developers and site

Cocoon1 made life easy for the first two and quite hard for the last, who
had the data he is responsible for scattered in all three kinds of
components. The sitemap of Cocoon2 changed this and put things where they

My opinion is that developers are not yet taken correctly into account.
While the other three have a componentization which is sufficient for their
part of the work, developers suffer from the lack of it. Usually a developer
has to write a component, and doesn't have a (sub) component model to deal
with.

Ok, it's not really true, there are XSPs.
But in many respects they are not sufficient:
- XSPs are hard to write
- XSPs mix (declarative) XML and (procedural) Java in an unmaintainable and
  undebuggable tangle
- XSPs cannot aid writing transformers
- XSPs must have their main tag
- XSPs do not automagically scale well (no automatic pooling or brokering)
- XSPs have slooow startup and are not good for dynamic pages that change
- XSPs are a nightmare to debug (just try ;-) )
- XSPs have the 64k limit
- XSP taglibs are hard to understand, write and maintain

Also, Cocoon components do not have scope and filter all events coming
in (security: I don't want sensitive tags passing in a transformer that is
useful but not completely known).

Cocoon doesn't have context scoping for session or global values.

As you can see these remarks are not few, but IMHO they all come from two
simple shortcomings of Cocoon:
- The coexistence between Java and XML is a key problem.
- The current component model is too coarse grained to help pipeline
component writers.

A finer grained object model could also have the notion of context

These have nothing to do with, and do not necessarily endanger, the
existence of:
- XSP syntax.
- Current level of object abstractions for other roles.

How can we solve this?
Here are some possibilities.

First we have to change slightly the notion of Cocoon pipeline components by
introducing scope.
Pipeline components need not access <all> SAX events but only what pertains
to them. This also means that the pipeline could be evaluated eventually in
fashion, improving scalability in heavy processor intensive or high latency
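
To make the scope idea a bit more concrete, here is a minimal sketch using the plain SAX API (org.xml.sax), not any existing Cocoon class. The names ScopeFilterSketch, ScopeFilter and seenBy are made up for illustration, the page wrapper element in the sample XML is an assumption, and Set.of needs Java 9 or later. The filter forwards only the elements that are in the component's scope, so the component never sees the rest.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class ScopeFilterSketch {

    /** Hypothetical filter: forwards only in-scope elements and their content. */
    static class ScopeFilter extends XMLFilterImpl {
        private final Set<String> scope;
        private int depthInScope = 0; // > 0 while inside an in-scope element

        ScopeFilter(XMLReader parent, Set<String> scope) {
            super(parent);
            this.scope = scope;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (depthInScope > 0 || scope.contains(qName)) {
                depthInScope++;
                super.startElement(uri, local, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (depthInScope > 0) {
                depthInScope--;
                super.endElement(uri, local, qName);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depthInScope > 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Returns "tag:name" for every start tag a component with this scope gets to see. */
    static List<String> seenBy(Set<String> scope, String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ScopeFilter filter = new ScopeFilter(reader, scope);
        List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                seen.add(qName + ":" + atts.getValue("name"));
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // The <page> wrapper is an assumption made for this sketch.
        String xml = "<page><longquery name=\"account\"/><query name=\"username\"/></page>";
        System.out.println(seenBy(Set.of("query"), xml)); // prints [query:username]
    }
}
```

A component scoped to "query" is never handed the longquery tag, which is the security point made earlier about sensitive tags.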

For example let's say that we have this XML:
<longquery name="account"/>
<query name="username"/>
Let's say that in another file (the developer's sitemap) is written that
query tags must be processed by the foo.sql.QueryTransformer and the
longquery tags by acme.sql.BankTransformer.
As SAX events come into action the start page tag is directly sent to the
Then the acme.sql.BankTransformer is given only the longquery tag and starts
processing in a non-blocking fashion.
This means that SAX events can continue and, in parallel,
foo.sql.QueryTransformer can start processing its tag.
Now the pipeline has to wait for the first transformer to finish because
embedded tags like page cannot be processed in non-blocking fashion. When
they finish, their output events are emitted in order, and finally the last
page tag.

As you can see, if there are transformers that take longer to perform (also
because of the latency of DBs and the like) they can be performed this way
in non-blocking fashion, speeding up total response time.
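
A rough sketch of that flow in modern Java (CompletableFuture did not exist when this was written, so take it as an illustration only). The HANDLERS map stands in for the developer's sitemap, the two lambdas are hypothetical stand-ins for acme.sql.BankTransformer and foo.sql.QueryTransformer, and the page wrapper and result tags are assumptions. Every handler is started without waiting for the previous one, and the results are then emitted in document order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class NonBlockingDispatchSketch {

    /** Tag name -> handler; a stand-in for the "developer's sitemap" mapping. */
    static final Map<String, Function<String, String>> HANDLERS = new LinkedHashMap<>();
    static {
        // Hypothetical slow handler (think high-latency DB call).
        HANDLERS.put("longquery", name -> {
            sleep(200);
            return "<result for=\"" + name + "\" slow=\"true\"/>";
        });
        // Hypothetical fast handler.
        HANDLERS.put("query", name -> "<result for=\"" + name + "\"/>");
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Each tag is {tagName, nameAttribute}. All handlers are started first,
     * then their outputs are collected in the original document order.
     */
    static List<String> process(List<String[]> tags) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (String[] tag : tags) {
            String tagName = tag[0];
            String name = tag[1];
            // Tags nobody claims are passed through unchanged.
            Function<String, String> handler = HANDLERS.getOrDefault(
                    tagName, n -> "<" + tagName + " name=\"" + n + "\"/>");
            pending.add(CompletableFuture.supplyAsync(() -> handler.apply(name)));
        }
        List<String> out = new ArrayList<>();
        out.add("<page>");                 // start tag goes straight through
        for (CompletableFuture<String> f : pending) {
            out.add(f.join());             // wait here, in document order
        }
        out.add("</page>");                // end tag only after everything finished
        return out;
    }

    public static void main(String[] args) {
        List<String[]> tags = new ArrayList<>();
        tags.add(new String[] {"longquery", "account"});
        tags.add(new String[] {"query", "username"});
        process(tags).forEach(System.out::println);
    }
}
```

The fast handler finishes long before the slow one, but its output is still written second, so the response keeps document order while the total time is roughly that of the slowest handler.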

A global context-aware object broker could also be inserted in the scheme.
This doesn't really change the framework, it's just a useful addition.

Now let's explain how a finer-grained object model can be devised.
First of all it must be capable of specifying a pipeline component as a sum
of smaller components possibly only by writing XML described "glue".
It's like:
pipeline component : pipeline = smaller component : pipeline component
Which basically means that these smaller components are a second level of
indirection with regards to the pipeline.

What guided the specification of the pipeline components?
The fact that they had to
- Interface XML with other streams.
- Transform XML.

Basically they had to
- detokenize
- make-change grammar
- retokenize

So it's all about interfacing generic streams to XML so as to be able to
transform them the XML way with Transformers.
In our case it's about interfacing XML to Java to be able to transform it
with Java Objects (beans, EJB, etc.).
This means that we could:
1. Turn XML tokens into something meaningful to Java: variables and data
2. Call Java methods on them to have results.
3. Retransform Java data structures into XML tokens.

The great thing is that phase 1 is usually quite long and cumbersome to
write but is essentially the same code over and over, the usual "if"s in the
SAX event handlers.
I think that a basic set of "(De)Tokenizers" can be used in 95% of cases. A
commonly used one would for example store a variable with the same name as
the tag it's in when it has certain parents.
Phase 2 is where the real "coding" takes place.
Phase 3 is easy to write, and it's the only part of XSPs which really works.
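
To show what I mean by the three phases, here is a minimal sketch on top of the plain SAX API. Everything in it is hypothetical: ChildTextDetokenizer, the transfer/from/to/amount tags and the summary output tag are made up for the example and are not Cocoon classes. Phase 1 is the reusable part (it stores the text of each child of a given parent under the child's tag name), phase 2 is an ordinary Java method, and phase 3 turns the result back into XML.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DetokenizerSketch {

    /** Phase 1: reusable detokenizer, no business logic in the SAX handler. */
    static class ChildTextDetokenizer extends DefaultHandler {
        private final String parent;
        private final Deque<String> path = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();
        final Map<String, String> variables = new HashMap<>();

        ChildTextDetokenizer(String parent) {
            this.parent = parent;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            path.push(qName);
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            path.pop();
            // Store the variable only when the tag sits directly under the given parent.
            if (parent.equals(path.peek())) {
                variables.put(qName, text.toString().trim());
            }
        }
    }

    /** Phase 2: the real "coding", plain Java with no XML in sight. */
    static String describe(String from, String to, int amount) {
        return from + " pays " + to + " " + amount;
    }

    /** Phase 3: retokenize the Java result into XML (minimal escaping only). */
    static String retokenize(String value) {
        String escaped = value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<summary>" + escaped + "</summary>";
    }

    static String run(String xml) throws Exception {
        ChildTextDetokenizer detokenizer = new ChildTextDetokenizer("transfer");
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), detokenizer);
        Map<String, String> v = detokenizer.variables;
        String result = describe(v.get("from"), v.get("to"), Integer.parseInt(v.get("amount")));
        return retokenize(result);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<transfer><from>alice</from><to>bob</to><amount>25</amount></transfer>";
        System.out.println(run(xml)); // prints <summary>alice pays bob 25</summary>
    }
}
```

The usual "if"s of the SAX handlers live only in the detokenizer, which could be reused for any parent tag, while describe() can be written and tested without touching XML at all.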

As you can see, XSPs don't have a reusable phase 1 and are cumbersome with
phase 2. This is because they mix them into the same phase, putting Java
code directly on the page with the <xsp:logic> tag.

Here the separation is done by relegating the interaction of Java and XML to
the simple and reusable contract of (De)Tokenizers.
In this way the coding can be done in Java and simply mapped to XML with
reusable components.

Seeing this globally the pipeline should work this way:
- (Generation) Tokenize and make SAX events from streams
- Filter events and dispatch to Transformers
- For each new Transformer (in parallel if necessary and requested)
  - de-tokenize events and convert to Java.
  - Call methods
  - Retokenize
- Serialize SAX events to stream

An illustrative image is also attached to this mail.

Stefano, could you please lend me your asbestos garments, you don't need
them anymore AFAIK ;-)

Nicola Ken Barozzi       
            - verba volant, scripta manent -
   (discussions get forgotten, just code remains)
