cocoon-dev mailing list archives

From: Sylvain Wallez
Subject: [RT][long] Cocoon 3.0: the necessary mutation
Date: Fri, 02 Dec 2005 17:59:32 GMT

Hi all,

For many years, I have been more than happy with Cocoon, enjoying the 
power and ease it brought for both publication and webapp projects. Over 
the last months however, other feelings have emerged: there are things 
that are definitely overly complex in Cocoon, and there have been some 
emerging frameworks leading to "wow, cool!" reactions rather than "yeah, 
yet another one". Also, I strongly believe that the Ajax revolution is 
quickly making obsolete the traditional reload-page-on-user-action model 
that prevailed on the web until recently, and that it requires frameworks 
that help build these new kinds of applications.

All this to say that if we want Cocoon to have a bright future, it must 
go through a mutation. This RT dumps my current ideas about this, 
depicting something that IMO better fits today's needs while taking into 
account many years of innovation and what we learned from them.

                       -- oOo --

First of all, let's consider the place where Cocoon fits in the large 
family of webapp development frameworks.

On one end, we have pure-J2EE things like Struts or JSF. They lead to 
writing a lot of Java classes, lots of XML config files, and try/fail 
cycles require compile/deploy/restart, even if some tools ease the task. 
Despite their heavyweight development process, they are widely accepted 
in large companies, both because the J2EE stamp pleases managers and 
because of the vast number of enterprise-grade libraries available.

On the other end, we have scripted frameworks like Ruby on Rails, 
Django[1], etc. The try/fail cycle is basically save/reload, and writing 
simple stuff is very fast because of the use of convention over 
configuration and runtime generation of data models (e.g. through 
database introspection). Now recent comments[2] show that going beyond 
the basic stuff may not be that easy.

Cocoon actually sits in the middle of this spectrum:
- it's a Java servlet and can therefore use almost anything that's 
written in Java. The contents of our blocks show this well! As such it 
is somehow J2EE compliant and can be deployed in large companies, even 
if we have to convince managers that it's better than Struts.
- it's a scripted framework: sitemap, XSL, templates, flowscript. Save 
and reload! And this goes further in 2.2 with auto reloading and 
compiling classloaders.

So theoretically, Cocoon could be the "RoR of J2EE". But in its current 
incarnation it won't be. The learning curve is too steep, some 
architectural choices imposed by Cocoon actually get in the way of 
developers using the new emerging development practices, and 5 years of 
legacy have led to a rather confusing picture, with tons of legacy 
components and many inconsistencies.

Now Cocoon has also introduced a number of super-cool features and 
innovations that definitely make sense, but in a more lightweight and 
consistent environment.

So let's draw a picture of what Cocoon 3.0 could be. I use a major 
version number as the ideas outlined below are more than likely to 
require a new code base, even if many code snippets can be reused from 
the current code.

                       -- oOo --

Giving its real role to the controller

When we introduced flowscript, we decided that <map:pipeline> should be 
the central switchboard through which *all* requests go, and introduced 
<map:call function>. This leads most webapps written in Cocoon to have 
their sitemap starting with something like:

  <map:match pattern="do_*">
    <map:call function="do_{1}"/>
  </map:match>

Why in hell do we have to go through the sitemap to call a function and 
then go back to the sitemap through cocoon.sendPage()? This not only 
clutters up the sitemap with cut'n'pasted snippets, but also makes the 
flowscript a second-class citizen in the application.

So I think we should change the semantics of the sitemap a bit, along 
with the priorities of its various components:

- the sitemap is the configuration of the overall request processing of 
the application (or a subpart of it in the case of mounts). It defines 
the configuration of that request processing, which is composed of 
components (<map:components>), controllers (<map:flow>) and views 
(<map:pipeline>). And I even think these 3 parts should really be split 
in different files, i.e. moving components to a xconf and pipeline 
definitions to e.g. "pipelines.xml".

- the processing flow in a sitemap goes *first* in the controller if 
there is one, and *second* in the view. Going to a <map:pipeline> to 
call back a function should really be an exceptional case, or even 

- since it's no longer called by the sitemap, the controller defines a 
single entry point, such as "process()". A builtin default 
implementation provides an equivalent to <map:call 
function="public_{request:sitemapURI}"/>, thus automatically publishing 
any public_xxx flowscript function. This is similar to 
HttpServlet.service(), which calls doGet(), doPost(), etc. depending on 
the HTTP method but still allows overriding service().

- to allow sophisticated implementations of process() where needed, the 
matchers and selectors are made available to the controller, so that 
it can check the request environment as easily as the pipeline 
statements in sitemap.xmap do (also have a look at the Django 
dispatcher[3]). This would allow writing something like:

  var match = cocoon.matchers.wildcard("admin/*");
  if (match) {
      if (authenticateAdmin()) {
          // call the function named by the '*'
      } else {
      }
  }

- calling cocoon.sendPage(uri) goes directly to the <map:pipeline> 
section of the sitemap, to build a view. This seems obvious, but has an 
interesting side effect: there is no longer any need to invent a private 
URL space such as "view-*" to have two-step processing (controller/view) 
of the requests. We can even say that "cocoon.sendPage(null)" calls the 
sitemap with the current request URI untouched.
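
To make the dispatch rules above more concrete, here is a minimal, 
self-contained sketch in plain JavaScript. It is not Cocoon code: 
wildcard(), process(), dispatch(), the "admin_" naming convention and the 
status objects are all hypothetical stand-ins, and the matcher only 
understands a '*' that covers a single path segment.

```javascript
// Hypothetical stand-in for a wildcard matcher: '*' matches one path segment.
// Returns the match array (index 0 is the whole URI, 1..n the '*' parts),
// or null when the URI does not match.
function wildcard(pattern, uri) {
  const source = pattern
    .split("*")
    .map(part => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("([^/]*)");
  return new RegExp("^" + source + "$").exec(uri);
}

// Looks up a controller function by name and calls it, or reports 404.
function dispatch(name, controller) {
  const fn = controller[name];
  if (typeof fn !== "function") return { status: 404 };
  return fn.call(controller);
}

// Single entry point of the controller. By default every function named
// public_<uri> is published; "admin/*" URIs are guarded first.
// (The "admin_" prefix is an assumption made for this example only.)
function process(uri, controller) {
  const match = wildcard("admin/*", uri);
  if (match) {
    if (!controller.authenticateAdmin()) return { status: 403 };
    return dispatch("admin_" + match[1], controller);
  }
  return dispatch("public_" + uri, controller);
}
```

With a controller object that defines public_home(), calling 
process("home", controller) runs it without any sitemap entry, which is 
the point of the default implementation described above.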

Note: the controller examples in this RT are written in JavaScript, but 
JavaFlow should be considered on an equal footing, as Cocoon's user base 
is two-sided, composed of people coming from the webdesign/php world, 
and others coming from the J2EE world. Also, JavaFlow should be the 
language of choice for builtin reusable helper controllers.

                       -- oOo --

Expression languages

Do you know how many expression languages there are in Cocoon? Java, 
JavaScript, XPath, XReporter, JEXL, etc. There's also all the 
micro-languages defined by each of the input modules: many of them use 
XPath, but not all...

Also, the way to access a given piece of data is not the same in the sitemap 
(e.g. "{request-param:foo}") and in JXTG 
("${cocoon.request.getParameter('foo')}" or even 

We should restrict the number of languages to the useful minimum, and 
ensure they can be used consistently everywhere. This useful minimum 
seems to me to be JavaScript, XPath and Java (using Janino[4]).

As for the syntax, I think we should use the simple "{..}" notation, 
with no initial character. To choose among the 3 expression languages, 
we have to pick a default one, and use prefixed expressions for the 
other ones. I consider JS to be the most versatile and thus the 
default language.

That means we'll have "{cocoon.request.remoteHost}" or 
"{xpath:$cocoon/request/remoteHost}" or 

About XPath, I'm a bit skeptical about its actual usefulness with non-XML 
objects, where it often looks weird. However, we need to be able to call 
XPath on DOM parts of a non-DOM data model, e.g. 
"{xpath(cocoon.session.attributes.userDoc, '/meta/dc:title')}". And 
interestingly, this sample shows that a namespace prefix table must be 
available in the expression context for *all* languages.
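
As an illustration of the "{..}" notation with a default language and 
prefixed alternatives, here is a small sketch in plain JavaScript. The 
names parseExpression(), substitute() and the evaluators table are 
hypothetical, only the JS evaluator is implemented, and expression bodies 
containing braces are not supported by this simple regular expression.

```javascript
// Prefixes recognised at the start of an expression body; anything else
// is treated as the default language (JavaScript).
// The prefix list is an assumption based on the examples in this message.
const PREFIX = /^(xpath|java):/;

function parseExpression(body) {
  const m = PREFIX.exec(body);
  return m
    ? { lang: m[1], code: body.slice(m[0].length) }
    : { lang: "js", code: body };
}

const evaluators = {
  // Default language: evaluate the body as a JavaScript expression
  // with the 'cocoon' object in scope.
  js: (code, cocoon) =>
    new Function("cocoon", "return (" + code + ");")(cocoon),
};

// Replaces every {..} in a template with the value of its expression.
function substitute(template, cocoon, langs = evaluators) {
  return template.replace(/\{([^{}]+)\}/g, (whole, body) => {
    const { lang, code } = parseExpression(body);
    const evaluate = langs[lang];
    if (!evaluate) {
      throw new Error("No evaluator registered for '" + lang + "'");
    }
    return String(evaluate(code, cocoon));
  });
}
```

An XPath or Java evaluator would be registered in the same table under 
its prefix, so that every context (sitemap, templates, controller) shares 
one syntax and one "cocoon" object.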

All this also means that we need a well-defined "cocoon" object defined 
identically in all contexts. Additional top-level objects can be 
available to provide context-specific data, such as "flow.sendPage()", 
"sitemap.resolve('../1')" or "template.consumer".

                       -- oOo --

Content-aware pipelines

Cocoon 1 had a DOM-based processing, meaning transformations could be 
chosen according to the pipeline content. Cocoon 2, when switching to 
SAX-based streamed pipelines, abandoned this ability. This hasn't been a 
real problem for a long time, as datasources were mostly passive 
documents of a well-known structure.

Now things have changed a lot, and we have to deal with heterogeneous 
data types and content-driven processing. Let's take some real-life 
examples:
- Content syndication: a feed's URL can provide RSS 0.9, 1.0, 2.0 or 
Atom. How can we decide what processing has to be applied on a feed if 
we don't know what's inside?
- Forrest's infamous SourceTypeAction[5] identifies a document's type 
using pull parsing
- SOAP requests: why is SOAP so badly integrated with Cocoon? We 
basically need to delegate to Axis that will then call a Java class. Why 
so? Because we're unable to choose the service to be called depending on 
the request's content.
- finally, the ESB buzz is turning into real projects, and requires 
content-based routing of messages.

There were some proposals to implement content-aware selectors[6] but 
they never materialized because of the impedance mismatch between a SAX 
stream and the usage of DOM (so Cocoon-1-ish!) that was proposed to 
implement them.

Now Forrest's SourceTypeAction shows us the way: pull parsing.

So let's switch pipelines from SAX push to StAX pull (JSR 173, see[7]). 
Content-aware matchers and selectors can then grab just the amount of 
information they need from the pipeline to make their decision. And 
unlike the SourceTypeAction, which has to resolve the source twice 
(once for pull, once for push), the pipeline engine can 
transparently buffer the StAX events consumed by matchers and selectors 
to replay them in the next pipeline component.
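
The buffering idea can be sketched in a few lines of plain JavaScript. 
This is only an illustration of the principle, not StAX: 
ReplayablePullSource, rootElementName() and the event objects are 
hypothetical, and a real engine would stop buffering once the routing 
decision has been made.

```javascript
// Wraps any iterator of parse events and remembers what has been pulled,
// so that a matcher can look ahead and the next component can start over.
class ReplayablePullSource {
  constructor(iterator) {
    this.iterator = iterator;
    this.buffer = [];   // events already pulled from the underlying source
    this.position = 0;  // index of the next event to hand out
  }

  // Returns the next event, or null at the end of the stream.
  next() {
    if (this.position < this.buffer.length) {
      return this.buffer[this.position++];
    }
    const step = this.iterator.next();
    if (step.done) return null;
    this.buffer.push(step.value);
    this.position++;
    return step.value;
  }

  // Makes the following next() calls replay from the first buffered event.
  rewind() {
    this.position = 0;
  }
}

// Content-aware matcher: pulls only until the root element is known,
// then rewinds so the next component sees the complete stream.
function rootElementName(source) {
  let name = null;
  let event;
  while ((event = source.next()) !== null) {
    if (event.type === "start") {
      name = event.name;
      break;
    }
  }
  source.rewind();
  return name;
}

// Sample event stream standing in for a parsed feed.
function* feedEvents() {
  yield { type: "comment", text: "generated" };
  yield { type: "start", name: "rss" };
  yield { type: "start", name: "channel" };
  yield { type: "end", name: "channel" };
  yield { type: "end", name: "rss" };
}
```

Calling rootElementName(new ReplayablePullSource(feedEvents())) consumes 
only the first two events of the underlying stream, and the source is 
resolved once.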

Using pull pipelines doesn't mean we have to trash everything. 
Converting DOM to/from StAX is straightforward, and so is StAX->SAX. The 
SAX->StAX conversion is less easy and requires either buffering or a 
separate thread.

Using pull pipelines also has an interesting side effect on 
aggregations, as they can easily be inlined by pulling events 
successively from partial pipelines (i.e. without a serializer), e.g.:

  <map:aggregate element="root">
    <map:part>
      <map:generate src="header.xml"/>
      <map:transform src="header2html.xsl"/>
    </map:part>
    <map:part src="content/{1}.xml"/>
  </map:aggregate>
  <map:transform src="layout.xsl"/>

Actually, writing

  <map:part src="foo"/>

is equivalent to writing

  <map:part>
    <map:generate type="file" src="foo"/>
  </map:part>

                       -- oOo --

Dynamic pipelines

Yes, you read that right: dynamic pipelines. This is what naturally comes 
next after content-aware pipelines: with use cases like webservices 
and ESBs, content-based routing is not enough and we also need 
controller-driven routing.

For simpler cases, we already have cocoon.processPipelineTo() (and the 
more versatile PipelineUtils class), but having to call the sitemap and 
invent a private URL just to perform a transformation is really overkill.

I'd like to be able to write the following in a flowscript:

  var pipeline = flow.newPipeline("non-caching");
  pipeline.addTransformer("xslt", "normalize.xsl");
  if (cocoon.matchers.xpath("/foo/bar[@id = '" +
          cocoon.session.attributes.bar_id + "']", pipeline)) {
  } else {
  }

What we can see above is that we don't even need a serializer for the 
pipeline to be useful, as we can pull events from it as soon as it has a 
generator. And that generator could well be another pipeline built 
somewhere else.
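
A rough sketch of such a serializer-less pipeline in plain JavaScript 
follows. The Pipeline class, its method names and the event objects are 
hypothetical and do not reflect an existing Cocoon API; transformers are 
modelled as functions from one iterable of events to another.

```javascript
// A pipeline is an event source (the generator) plus a chain of
// transformers. Nothing runs until events are pulled from events().
class Pipeline {
  constructor() {
    this.source = null;
    this.transformers = [];
  }

  // The source is any iterable of events, or another Pipeline.
  setGenerator(source) {
    this.source = source;
    return this;
  }

  addTransformer(transformer) {
    this.transformers.push(transformer);
    return this;
  }

  // Builds the chain lazily and returns an iterator over output events.
  events() {
    if (this.source === null) {
      throw new Error("A pipeline needs a generator");
    }
    let stream = typeof this.source.events === "function"
      ? this.source.events()
      : this.source;
    for (const transformer of this.transformers) {
      stream = transformer(stream);
    }
    return stream[Symbol.iterator]();
  }
}

// Example transformer: lower-cases element names, one event at a time.
function* lowerCaseNames(events) {
  for (const e of events) {
    yield e.name ? Object.assign({}, e, { name: e.name.toLowerCase() }) : e;
  }
}

// Controller-side decision: pull events only until the first element.
function firstElement(pipeline) {
  for (const e of pipeline.events()) {
    if (e.type === "start") return e.name;
  }
  return null;
}
```

Because events() accepts another Pipeline as its source, a pipeline built 
somewhere else can be reused as the generator of a new one, and the 
controller can inspect its output without ever serializing it.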

Basically, the pipeline engine becomes a very general-purpose object 
that can be used not only in the sitemap (to build views), but also in 
the controller for content-driven business logic decisions.

This programmatic building of pipelines can also be used by Cocoon 
components themselves to implement some built-in transformations, e.g. 
converting an XMLSchema to a CForms definition, without having to 
copy/paste the corresponding sitemap instructions in user sitemaps, or 
having to call a system-defined sitemap. IMO, the lack of reusable 
system pipelines is one of the reasons why there haven't been many 
off-the-shelf products or applications built on top of Cocoon.

Being able to directly use pipelines can also ease the integration of 
Cocoon as a transformation engine in other environments, such as an 
advanced message transformer in the ServiceMix ESB[8].

                       -- oOo --

Controller-driven responses

The advent of Ajax applications leads to a radical change in web 
application architectures. Many requests don't lead to producing a view, 
but to sending data and/or control information. Having to call a 
pipeline for this is really useless and overkill, as we don't 
need any kind of processing.

We therefore need the controller to be able to directly send a 
non-processed response. We already have an example of this in the Ajax 
stuff for CForms[9] to send a simple <bu:continue> when form interaction 
is finished and a full page reload is needed. Another example is data 
transmission with an Ajax client using JSON[10].

So we need additional "sendxxx" methods in the controller: sendText(), 
sendObject(), sendBytes() and why not sendStream().
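
As a sketch of what such methods might look like, here is a plain 
JavaScript version. Only the method names come from the list above; the 
signatures, the content types and the choice of JSON for sendObject() 
are assumptions, and "response" is any object offering 
setHeader(name, value) and end(body), as Node's http.ServerResponse does.

```javascript
// Sends a string as-is, without going through any pipeline.
function sendText(response, text, contentType = "text/plain; charset=utf-8") {
  response.setHeader("Content-Type", contentType);
  response.end(text);
}

// Assumption: objects are serialized as JSON for an Ajax client.
function sendObject(response, object) {
  sendText(response, JSON.stringify(object), "application/json; charset=utf-8");
}

// Sends raw bytes (e.g. a Buffer or Uint8Array) unchanged.
function sendBytes(response, bytes, contentType = "application/octet-stream") {
  response.setHeader("Content-Type", contentType);
  response.end(bytes);
}
```

A controller function handling an Ajax request could then end with a 
single sendObject(response, { cartSize: 3 }) call instead of inventing a 
pipeline whose only job is to echo the data.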

Ajax applications also require aggregations defined at the controller 
level. Let's consider an Ajax shopping cart application: the page 
displays the items catalogue and a sidebar with the current content of 
the shopping cart. When the user browses the items, only the catalogue 
area needs to be refreshed on the page. When he adds an item to the 
cart, both areas need to be refreshed at once to show the updated cart. 
The knowledge of what parts of the page need to be refreshed is in the 
controller. One solution is to call a pipeline that will generate an 
XInclude that itself will call other pipelines, but that's smelly and 
doesn't allow giving different view data to each of the pipelines.

To allow this, we need something like:
      ["catalogue", { paginator: paginator }],
      ["cart-sidebar", { cart: cart }]

                       -- oOo --

Core components

Moving to pull pipelines isn't the only important core change: we need 
to move away from Avalon for good. Now what container will we use? We 
don't care: Cocoon 3.0 will be written as POJOs, and will come with a 
"default" container. Will it be Spring, Hivemind, Pico? I don't know. We 
may even provide configurations for several containers, as does 

                       -- oOo --

OK, thanks for reading this far.

My impression is that with all these changes, Cocoon will be sexy again. 
Add a bit of runtime analysis of databases and automatic generation of 
CForms to the picture, and you have something that has the same 
productivity as RoR, but in a J2EE environment. It also includes what I 
learned when working on Ajax and the consequences it has on the overall 
system architecture.

You have certainly noticed that the above is more about the controller 
than about the sitemap. This is because not many changes are needed 
there, except content-aware matchers and selectors. But a more featured 
controller will allow us to trash a great number of pipeline components 
that were invented to circumvent controller limitations. The code base 
will shrink.

There are also a number of simplifications that can be done by using 
builtin conventions over configuration, but I'll write about this later.

Tell me your thoughts. Am I completely off-track, or do you also want to 
build this great new thing?



Sylvain Wallez                        Anyware Technologies           
Apache Software Foundation Member     Research & Technology Director
