cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject [RT] Cocoon Blocks
Date Fri, 28 Jun 2002 17:04:39 GMT
Even if the flowscript discussion isn't finished, I think we have
reached an important conclusion on that side: a flowscript isn't a
sitemap replacer, but a sitemap augmenter.

Architecturally, this has a major importance: the flowscript and all its
design details become part of the sitemap's concern island and we can
independently work on something that is therefore not directly connected
with the flowscript.

In the past, this architectural assumption wasn't recognized and for
this reason, the design of the Cocoon Blocks couldn't be finished
without finishing that part (or, at least, understanding that the two
aren't directly related).

In the past, the concern islands were these three:

    block            sitemap             flowmap

the most general way to connect them is

               block
              /     \
             /       \
        sitemap --- flowmap

but now we know that the (blocks-flowmap) contract isn't necessary
(rather: it's perceived as dangerous because it allows pipelines to be
assembled in scripts, and this creates concern overlap with the
sitemap).

The current situation is

     block ---- sitemap --- flowscript

which, from the block's point of view, becomes

     block --- sitemap

since the sitemap makes the flowscript concern totally hidden inside its
own.

This contract analysis tells us that as long as the concern topology
remains this, there is no overlap or interference between the design
work on the flowscript and the design work on the Cocoon blocks.

All right, so we can focus on blocks.

                            - o -

A step back: what are the problems we are trying to solve
---------------------------------------------------------

Cocoon is a framework implemented as an application.

A 'framework' is supposed to give services to entities included in it,
while an application is supposed to be executed by a containing
framework.

While the above might sound weird at first, this is a very common
situation: an operating system is a framework implemented as an
application run at boot time. At the same time, an application server is
a framework implemented as an application. But even a browser is a
framework implemented as an application.

So, there is no inherently bad design in this concept, *but* the
framework must be implemented in such a way that it's inherently *easy*
to deploy/install/plug-in/connect/attach/inject/link an internal
application that must be executed by the framework.

Cocoon lacks this.

Let me give you an example: I would like to be able to package my stuff
that I wrote to be run *by* cocoon and deploy it on Cocoon, maybe even
at runtime.

The parallel is easily made: servlets and WAR archives. The Servlet API
introduced in a later release the concept of a WAR (Web ARchive) package
that includes all the resources needed for the servlet/jsp-based web
application to run, including libraries, resources, files and
everything.

So, the parallel I want to draw is simple:

  WAR (Web ARchive) -> tomcat (or other servlet container)
  COB (COcoon Block) -> cocoon

so, a WAR package is for tomcat what a COB will be for Cocoon.

In very short terms: a way for you to deploy your stuff on Cocoon
without hassle (including special libraries, resources and what not).

Are we really cloning the servlet API?
--------------------------------------

Many people from the pure J2EE world (even Apache people) believe that
Cocoon is just an attempt to rewrite the servlet API for XML. In a sense
it's true: the servlet API wasn't designed for pipelines and the
deployment descriptor wasn't designed for serious URI space mapping.

So, while the Servlet API introduces components (servlets and filters)
that are based on streams of bytes/chars, Cocoon introduces components
designed to be part of a pipeline [since 1997 I thought about a way to
allow servlet chaining to be feasible, that is probably what triggered
the idea of pipeline components for Cocoon].

Anyway, looking at this parallel, Cocoon really lacks a way to deploy
its applications easily within a 'naked' container that includes only
the basic and default machinery.

                                   - o -

I'm pretty sure that if I stopped here and went on describing the schema
of the COB descriptor file and so on, people would love it, thank me,
run to their boss to tell them and blah blah..

Sure, we could stop here, we could clone the WAR concept inside Cocoon,
allow you to deploy your stuff and you won't be missing anything.

But there is one thing that the servlet API architects didn't consider
(not even myself at that time since I was part of that group):
polymorphism.

Ok, a few blank lines so you think about it....









what does it mean to have a polymorphic package?


















Applying Avalon COP philosophy over again
-----------------------------------------

If you have ever worked with Avalon, you know the feeling: at first it
doesn't make any sense at all. It's a mess of stupid and very abstract
interfaces... but after a while, a pattern emerges and it sticks.

Some might think that Avalon (probably Cocoon itself) includes infecting
'memes' and I agree. [Look up the name on google if you don't know what
I'm talking about]

Once you start using COP (component oriented programming), it's very
hard to go back (so much so that many abuse it and over-componentize
their systems... even Cocoon itself suffers from this problem on some
parts).

COP is based on IoC (Inversion of Control) and SoC (Separation of
Concerns) [for those who still don't know about them!] and while the
servlet API makes extensive use of the IoC metapattern, SoC doesn't play
a clear and defined role (they tried to patch it with RequestDispatcher,
which is the biggest hack I have ever seen; I even voted against it but
I was overruled).

Anyway, even if the servlet API, internally, shows use of IoC and SoC,
externally, from the WAR point of view, there is *absolutely* no notion
of it: a WAR is a package that includes a single and isolated
application.

Period. That's it. There are many mechanisms that enforce the clear
separation between different WARs. So, they implement monolithic web
applications and this is *by design*.

A step up: Blocks as cocoon application components
--------------------------------------------------

If we design cocoon blocks as 'isolated units of application deployment'
we fall back in the good old web trap: making web applications
interoperate in the same URI space is a MAJOR PITA with *ANY* web
technology.

I'm talking about making Bugzilla and Horde IMP share the same look and
feel. Try it!

'coherence' is a value, especially on professional sites, but coherence
shouldn't mean that everything has to be written by the same team!

Sure, I want Cocoon blocks to ease deployment of cocoon-based web
applications, but this is a secondary byproduct: what I really want is
to make it possible to *share* cocoon web applications as we currently
do with Avalon components.

                                   - o -

Ok, enough introduction, let's get to the meat.

WARNING: what follows is the result of ideas collected from many people,
and was cleared in all parts with real-life discussion between Giacomo
and myself a few weeks ago. Anyway, what follows is still part of the RT
flow, so it must only be considered as a proposal and not something
carved in stone.

Cocoon Blocks
-------------

A Cocoon block is a zipped archive, just like JARs and WARs.

The extension of a cocoon block is .cob (for COcoon Block). The MIME
type is yet to be determined (might be required for over-the-net block
download).

A Cocoon Block (COB from now on) includes a directory called

 /BLOCK-INF

which contains all the block metadata and the resources that must not be
directly referenceable from other blocks (for example, jars, classes or
file resources made available thru the classloader). The directories

 /BLOCK-INF/classes
 /BLOCK-INF/jar

are used for classes and jar files. This follows the WAR paradigm.

The main COB descriptor file is found at

 /BLOCK-INF/block.info

[FIXME: can this create conflicts with Avalon blocks?]

This file MUST be an XML file, containing markup with a cob-specific
namespace and will include the following information:

 1) block implementation metadata (name, author, license, URL of the
project and so on)
 2) role(s): the URI(s) of the behavioral role(s) this block implements
and exposes [optional]
 3) dependencies: the URI(s) of the behavioral roles this block expects,
along with the prefixes used by the block as shortcuts in protocol
resolution (see below for the meaning of this) [optional]
 4) sitemap: the location inside the block file space of the sitemap
[optional]

Visually, the block metadata can be pictured like this:

 
                    implementation metadata
                               ^ 
                               |
 (exposed behaviors)? <---- [block] ----> (required behaviors)?

Also, the /BLOCK-INF/ directory contains the 'roles' file for Avalon
components:

 /BLOCK-INF/block.roles
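
To make the layout concrete, here is a minimal sketch in Python (purely
illustrative, nothing like this exists in Cocoon) that builds a tiny
in-memory .cob and separates the protected /BLOCK-INF/ entries from the
rest. The file contents and the Demo.class entry are made-up
placeholders; the only thing taken from the proposal is that a COB is a
plain zip with a /BLOCK-INF/ directory.

```python
import io
import zipfile

# Everything under this directory is private to the block (per the proposal).
PROTECTED_PREFIX = "BLOCK-INF/"


def build_demo_cob() -> bytes:
    """Build a tiny in-memory .cob (a plain zip); contents are placeholders."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as cob:
        cob.writestr("BLOCK-INF/block.info", "<block/>")
        cob.writestr("BLOCK-INF/block.roles", "")
        cob.writestr("BLOCK-INF/classes/Demo.class", b"")
        cob.writestr("stylesheets/site2xhtml.xslt", "")
    return buf.getvalue()


def split_entries(cob_bytes: bytes):
    """Return (protected, public) entry names of a .cob archive, sorted."""
    with zipfile.ZipFile(io.BytesIO(cob_bytes)) as cob:
        names = cob.namelist()
    protected = sorted(n for n in names if n.startswith(PROTECTED_PREFIX))
    public = sorted(n for n in names if not n.startswith(PROTECTED_PREFIX))
    return protected, public


protected, public = split_entries(build_demo_cob())
```

Only the entries in `public` would ever be reachable from another block.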


What is a 'block behavior'?
---------------------------

If you are familiar with Avalon, you probably understood the idea (it's
very similar to the concept of Avalon roles), but if not it might be a
little difficult, so let me write you an example of this:

let's take Forrest and decouple it into two blocks:

 1) one block provides the document production
 2) another block provides the skinning and presentation layers

Currently, it is already done like this, but the change of the skin
(this is how the second block is currently called) must be done by hand:
there is no cocoon machinery in place to make this possible.

So, let us assume the machinery is now in place:

forrest itself becomes a block, but in order to function, it needs
access to the stylesheets contained in the skin, which, in order to
simplify decoupling, we want to implement as another block.

Result: 

 forrest.cob/BLOCK-INF/block.info

is something like

 <block>
  <metadata>
   <name>Forrest</name>
   <organization>ASF</organization>
   ...
  </metadata>
  <dependencies>
   <block behavior="http://xml.apache.org/forrest/skin/1.0" prefix="skin"/>
  </dependencies>
  <sitemap location="sitemap.xmap"/>
 </block>

while:

 skin.cob/BLOCK-INF/block.info

is something like

 <block>
  <metadata>
   <name>Xmas Skin</name>
   ...
  </metadata>
  <behaviors>
   <behavior uri="http://xml.apache.org/forrest/skin/1.0"/>
  </behaviors>
 </block>
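
As a rough sketch of what the container would extract from these two
descriptors, here is some illustrative Python. The element and attribute
names are the ones used in the examples above; the dictionary layout and
the function name are my own assumptions, and the cob-specific namespace
is left out exactly as it is in the examples.

```python
import xml.etree.ElementTree as ET

FORREST_INFO = """
<block>
 <metadata><name>Forrest</name></metadata>
 <dependencies>
  <block behavior="http://xml.apache.org/forrest/skin/1.0" prefix="skin"/>
 </dependencies>
 <sitemap location="sitemap.xmap"/>
</block>
"""

SKIN_INFO = """
<block>
 <metadata><name>Xmas Skin</name></metadata>
 <behaviors>
  <behavior uri="http://xml.apache.org/forrest/skin/1.0"/>
 </behaviors>
</block>
"""


def parse_block_info(xml_text):
    """Extract name, exposed behaviors, dependencies and sitemap location."""
    root = ET.fromstring(xml_text.strip())
    sitemap = root.find("sitemap")
    return {
        "name": root.findtext("metadata/name"),
        # behaviors this block exposes (may be empty)
        "behaviors": [b.get("uri") for b in root.findall("behaviors/behavior")],
        # prefix -> required behavior URI (may be empty)
        "dependencies": {
            d.get("prefix"): d.get("behavior")
            for d in root.findall("dependencies/block")
        },
        # None when the block does not expose a sitemap
        "sitemap": sitemap.get("location") if sitemap is not None else None,
    }


forrest = parse_block_info(FORREST_INFO)
skin = parse_block_info(SKIN_INFO)
```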

Now: suppose you have your naked cocoon running in your favorite servlet
container, and you want to deploy forrest.cob, here is a possible
sequence of actions on a hypothetical web interface on top of Cocoon
(a-la Tomcat Manager)

 1) upload the forrest.cob to Cocoon
 2) Cocoon scans /BLOCK-INF/, reads block.info and finds out that
Forrest depends on a block with the given role
 3) then it connects to the uber "Cocoon Block Librarian" web service
(hosted somewhere around *.apache.org) and asks for the list of blocks
that exhibit that required behavior.
 4) the librarian returns a list of those blocks, so the user chooses
one, or the manager allows the user to deploy their own block that
implements the required behavior.
 5) Cocoon checks that all dependencies are met, then unpacks the blocks
 6) Since 'forrest.cob' exposes a sitemap, the deployment manager asks
the deploying user where he/she wants to *mount* that block in the
managed URI space.
 7) If no collisions in the URI spaces are found, the blocks are made
available for servicing. [note: the skin block doesn't expose a sitemap
so it's not mounted on the URI space]
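
The checks in steps 5 to 7 can be sketched like this (illustrative
Python; the librarian lookup of steps 3 and 4 is skipped). The block
dictionaries, the function names and the idea that a "collision" simply
means "same mount point already taken" are all assumptions of mine, not
decisions.

```python
SKIN_BEHAVIOR = "http://xml.apache.org/forrest/skin/1.0"


class DeploymentError(Exception):
    """Raised when a block cannot be deployed."""


def unmet_dependencies(block, deployed):
    """Behavior URIs that `block` requires and no deployed block exposes."""
    exposed = {uri for b in deployed for uri in b["behaviors"]}
    return sorted(set(block["dependencies"].values()) - exposed)


def deploy(block, mount_point, deployed, mounts):
    """Check dependencies, then mount the block if it exposes a sitemap."""
    missing = unmet_dependencies(block, deployed)
    if missing:
        raise DeploymentError("unmet behaviors: " + ", ".join(missing))
    if block["sitemap"] is not None:
        # Assumption: a collision is an already-used mount point.
        if mount_point in mounts:
            raise DeploymentError("URI space collision at " + mount_point)
        mounts[mount_point] = block["name"]
    deployed.append(block)


skin = {"name": "Xmas Skin", "behaviors": [SKIN_BEHAVIOR],
        "dependencies": {}, "sitemap": None}
forrest = {"name": "Forrest", "behaviors": [],
           "dependencies": {"skin": SKIN_BEHAVIOR}, "sitemap": "sitemap.xmap"}

deployed, mounts = [], {}
deploy(skin, None, deployed, mounts)           # no sitemap, so nothing is mounted
deploy(forrest, "/forrest", deployed, mounts)  # dependency on the skin is now met
```

Deploying forrest before any skin block would raise DeploymentError,
which is the point where a real manager would go and ask the librarian.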

A big issue: resource dereferencing
-----------------------------------

Security concerns aside, the above scenario shows one major issue:
blocks are managed, deployed and mounted by the container. There is no
way (and there should not be one) for a block to directly access another
block because this would ruin IoC.

So, one block doesn't know where the blocks it depends on are located,
both on disk *and* on the URI space as well.

The proposed solution is to use block-specific protocols to identify the
dereferenced resources.

For example, the forrest.cob/sitemap.xmap file could contain a global
matcher which works like this:

   <map:match pattern="**/*.html">
    <map:aggregate element="site">
     <map:part src="cocoon:/{1}/book-{1}/{2}.xml"/>
     <map:part src="cocoon:/{1}/tab-{1}/{2}.xml"/>
     <map:part src="cocoon:/body-{1}/{2}.xml" label="content"/>
    </map:aggregate>
    <map:transform src="block:skin:/stylesheets/site2xhtml.xslt"/>
    <map:serialize/>
   </map:match>

please note the

 block:skin:/stylesheets/site2xhtml.xslt

which indicates

 block -> use the block protocol
 skin -> use the 'skin' prefix to lookup the block behavior URI and thus
the block which implements it for this block (the block manager knows
this)
 /stylesheets/site2xhtml.xslt -> since the 'skin' block doesn't expose a
sitemap, give me the file located in that position of the internal block
file space (except /BLOCK-INF/ which is protected)

[in case the block exposes a sitemap, the block: protocol connects to
the URI space exposed by the sitemap... before you start suggesting a
block-raw: protocol to get access to that, think twice because, to me,
it smells like FS a lot!]
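
Here is a minimal sketch of how the block manager could resolve such a
URI for the simple case where the target block has no sitemap
(illustrative Python; the two lookup tables and all function names are
hypothetical). It only shows the prefix -> behavior -> block chain and
the /BLOCK-INF/ protection.

```python
import posixpath


def parse_block_uri(uri):
    """Split 'block:<prefix>:/<path>' into (prefix, path)."""
    scheme, prefix, path = uri.split(":", 2)
    if scheme != "block" or not path.startswith("/"):
        raise ValueError("not a block: URI: " + uri)
    return prefix, path


def normalize(path):
    """Collapse '..', '.' and repeated slashes so they can't hide BLOCK-INF."""
    return "/" + posixpath.normpath(path).lstrip("/")


def is_protected(path):
    """True if the (normalized) path points into /BLOCK-INF/."""
    clean = normalize(path)
    return clean == "/BLOCK-INF" or clean.startswith("/BLOCK-INF/")


def resolve(uri, prefixes, providers):
    """Return (providing block, path inside its file space)."""
    prefix, path = parse_block_uri(uri)
    behavior = prefixes[prefix]      # declared in the caller's block.info
    block_name = providers[behavior]  # chosen by the block manager at deploy time
    if is_protected(path):
        raise PermissionError("/BLOCK-INF/ is not accessible: " + uri)
    return block_name, normalize(path)


prefixes = {"skin": "http://xml.apache.org/forrest/skin/1.0"}
providers = {"http://xml.apache.org/forrest/skin/1.0": "Xmas Skin"}
target = resolve("block:skin:/stylesheets/site2xhtml.xslt", prefixes, providers)
```

Note that the calling block never sees where "Xmas Skin" lives on disk;
it only gets back something the container can open on its behalf.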

Dereferencing navigation
------------------------

Not only the sitemap needs to connect to the resources contained in the
blocks on which the block depends, but the resulting pages do as well.

In fact, suppose you have a block that exposes a web service and another
one that exposes a web application that wraps that web service. For
sure, the generated web page will have to have a URI to connect to that
service, since it's the client's browser that makes the call (unless we
want to virtualize everything thru the sitemaps, but I wouldn't suggest
it).

So, a possible solution is to use the "block:" protocol in the pages as
well and have a URI-mapping transformer right before the serialization
stage.

For example, something like

<form action="block:web-service:/post">...</form>

is transformed into

<form action="/servizio-web/post">...</form>
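
A rough sketch of that mapping step (illustrative Python): a real
transformer would work on the SAX events in the pipeline, while this one
cheats and rewrites the serialized markup with a regular expression. The
mount table and the function name are assumptions, and /servizio-web is
simply where the deployer happened to mount the web-service block in the
example above.

```python
import re

# block:<prefix>:/<path>, where the path stops at whitespace, quotes or brackets
BLOCK_URI = re.compile(r"""block:([A-Za-z0-9_-]+):(/[^\s"'<>]*)""")


def map_block_uris(markup, mounts):
    """Replace block: URIs with the URI where the target block is mounted."""
    def replace(match):
        prefix, path = match.group(1), match.group(2)
        if prefix not in mounts:
            return match.group(0)  # unknown prefix: leave the URI untouched
        return mounts[prefix].rstrip("/") + path
    return BLOCK_URI.sub(replace, markup)


mounts = {"web-service": "/servizio-web"}
page = map_block_uris('<form action="block:web-service:/post">...</form>', mounts)
```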

                                 - o -

Some design decisions taken
---------------------------


o) NO BEHAVIOR VALIDATION: 

I thought a lot about it but I think that having 'behavior description
languages' (such as the WSDL-equivalent for blocks) is going to be
terribly complicated, expensive to implement and hard to use and
enforce, even for simple blocks which don't expose a sitemap and are
just repositories for information.

For this reason, there is no validation taking place: if a block
implements a particular behavior and exposes it thru its descriptor
file, Cocoon automatically assumes it implements the behavior correctly.

In the future, we might think of adding a behavior description layer to
enforce a little more validation, but I fear the complexity (for
example) of validating stylesheets against a particular required
behavior.

IMO, only human trial and error, plus patching, will allow
interoperability.


o) VERSIONING AS PART OF THE BEHAVIOR URI

The behavior URI *MUST* terminate with a /x.y that indicates the
major.minor version of the behavior that a block implements.

On dependencies, each block must be able to specify the 'ranges' of
versioning that it is known to work with. For example

  <block behavior="http://xml.apache.org/forrest/skin/1.x" prefix="skin"/>

But I haven't really thought about the patterns that could be used for
this. 
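
Just to have something to shoot at, here is one possible reading of the
'1.x' example (illustrative Python): an 'x' in the required URI matches
any number in that position, and everything before the final /x.y must
be identical. This is only an assumption to start the discussion, not a
decided pattern, and the function names are made up.

```python
def split_behavior(uri):
    """Split '<base>/<major>.<minor>' into (base, major, minor)."""
    base, _, version = uri.rpartition("/")
    major, minor = version.split(".")
    return base, major, minor


def satisfies(provided, required):
    """True if the provided behavior URI falls in the required range.

    Assumption: 'x' in the required version is a wildcard for that field.
    """
    p_base, p_major, p_minor = split_behavior(provided)
    r_base, r_major, r_minor = split_behavior(required)
    if p_base != r_base:
        return False
    return all(r == "x" or r == p
               for r, p in ((r_major, p_major), (r_minor, p_minor)))


SKIN = "http://xml.apache.org/forrest/skin/"
ok = satisfies(SKIN + "1.0", SKIN + "1.x")
```

This says nothing about whether 1.3 should be accepted where 1.0 is
required, which is exactly the kind of question that is still open.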

Please, help on this.


o) CROSS-BLOCK SECURITY

Even if I don't think anybody is stupid enough to use a single Cocoon
instance to run a full ISP and ask for sandboxing of the single blocks,
cross-block security is a big concern, especially since you might be
deploying components on the fly in a binary format.

So, first thing is to protect the /BLOCK-INF/ directory.

The second thing is to wrap each block with its own classloader,
connected to the block dependency map, so that each class discovery is
done only on the class space of the dependent blocks.
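
The lookup rule can be sketched like this (illustrative Python, not real
classloader code): a class requested by a block is searched only in that
block's own class space and in the class spaces of the blocks it depends
on. Whether dependencies of dependencies should be visible is not
decided; this sketch assumes they are. The block and class names are
made up.

```python
def visible_blocks(block, depends_on):
    """The block itself plus everything reachable thru its dependencies."""
    seen, stack = [], [block]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited (also guards against cycles)
        seen.append(current)
        stack.extend(depends_on.get(current, []))
    return seen


def find_class(name, block, depends_on, class_space):
    """Return the block that supplies `name` to `block`, or None."""
    for candidate in visible_blocks(block, depends_on):
        if name in class_space.get(candidate, ()):
            return candidate
    return None


depends_on = {"forrest": ["skin"], "skin": [], "webmail": []}
class_space = {
    "forrest": {"org.example.Forrest"},
    "skin": {"org.example.SkinHelper"},
    "webmail": {"org.example.Mail"},
}
owner = find_class("org.example.SkinHelper", "forrest", depends_on, class_space)
```

Here forrest can see the skin's classes but not webmail's, and the skin
cannot see forrest's, since the dependency only goes one way.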

[NOTE: this doesn't prevent people from using blocks as trojans, but we
won't host blocks which don't come with the source code so we solve that
problem].


o) COCOON MANAGER SECURITY

The cocoon manager might be a block itself that connects to specific
cocoon internals and provides a web interface for it. So, it can be
removed or disabled when put on production.

Also, the feature of automatic discovery of blocks thru the 'cocoon
block library' can be turned off or substituted with your own (even the
'cocoon block library' could be a block, so you could have your own
block library on your system instead of connecting to the apache one).


o) OPTIONAL COP 

The block.info file makes it *optional* to expose behaviors or to depend
on them. This allows the COP model to nicely downgrade to the good old
single-archive WAR paradigm for those who don't care about block
polymorphism.


Possible Problems
-----------------

1) classloading performance:

since classloading will become more complicated, it will be slower, but
this will impact only the startup performance not the runtime
performance so no real issues here.

2) possible reduced portability of Cocoon:

some servlet containers don't like servlets to come up with their own
classloaders. In those environments a block-enabled Cocoon might simply
not work. This doesn't mean that Cocoon won't work, but that blocks
can't be deployed.

NOTE: the next servlet API might fix that by requiring a better
classloading behavior by the containers.

3) difficult block interoperability

without a way to automatically validate if a block implements a behavior
correctly, the type of that component is inherently weak and might lead
to problems that might become hard to fix.

The block manager *must* be able to *clone* a block and let you modify
one clone without disturbing the other. [but these are implementation
details and we'll see in the future how serious this problem becomes. in
fact, sitemap pipelines aren't validated either, but nobody has had
enough itch to scratch this]

4) difficult transition

When we have blocks, it's easy to imagine that there will be pure-code
blocks that wrap around libraries and provide only sitemap components
(think FOP, POI, Batik and so on).

In that case, a 'naked Cocoon' becomes "de facto" backward incompatible
because some sitemap components (which are now included by default in
Cocoon) might not be present anymore, unless you wrap your code in
blocks and you depend explicitly on those blocks that expose that
specific behavior.

So, some work is required.

This might force us to call a block-enabled Cocoon: Cocoon 3.0

                                 - o -

Conclusions
-----------

I think I have exposed a detailed plan on how to implement blocks and
solve a number of issues we are having:

 1) allow users to 'compose' Cocoon only with those modules they need
 2) allow users to easily deploy their stuff on cocoon
 3) allow users to easily reuse web applications components without
sacrificing coherence
 4) allow users to be helped by Cocoon to 'fill the gaps': Cocoon
suggests which components are required and fetches them automatically
(apt-get like)
 5) allow the Cocoon communities to clearly separate concerns between
the core and the application-level stuff (a-la Zope)
 6) allow, for the first time in the history of the web, the use of
polymorphism and COP at a web application level.

That's all folks.

Fire your comments and try to tear it apart: I'm pretty confident this
is really a big thing for Cocoon!

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------

