From Paul Russell <pauljamesruss...@gmail.com>
Subject [RT] The Silkworm Experiment
Date Tue, 22 Feb 2005 11:14:59 GMT
Hi all, long time no see!

First of all, a bit of a pre-emptive apology. Some of you might
remember me, but a lot of you probably don't. I was involved in Cocoon
back in the early days of Cocoon 2, but I got a job working for a
company that didn't use it, and so fell out of the loop. That said,
I've kept a vague eye on things, and I'm still in touch with
Luminas[1], my ex-employer.

Recently, I've been looking at Cocoon again, partly just because I had
a bit more time, and partly because my work is starting to swing back
that way. I've been thinking about what would make Cocoon even better,
and have come up with a few ideas for what you might consider to be a
'Cocoon 3'. What I'm going to suggest is a bit of a departure from
where we are right now, and is certainly not a backwards compatible
step. I'm conscious that I might well touch on some discussions that
have already been had (particularly around blocks and containers etc),
but please take this e-mail in the spirit it's intended: /I'm just
spouting ideas/. I'm not claiming to be up to speed with where
Cocoon is right now, and the Cocoon mailing lists are way too high
traffic for me to catch up on the last four years in a hurry.
Implications of this:

* If you disagree with anything I say, it probably means I'm wrong. No problem.
* If I'm touching on discussions that have already been had, then apologies.
* If you think I'm talking from my posterior, and this whole thing is
a big mistake, then you're probably right. This is a [RT], expect
random thoughts.

I've deliberately avoided calling this 'the proposed cocoon 3' or
anything like that, because I don't think that's an acceptable thing
to do. However, obviously I needed a name for this proposal, so that I
can refer to it without saying 'this proposal' all the time. I decided
to call it Silkworm, I guess for similar reasons to 'Butterfly'. This
is only a code-name, don't worry.

Right, that's the disclaimer out of the way. On with the good stuff:

1. Architectural priorities
2. A guided tour
 2.1 'Sitemap Tool'-based configuration
 2.2 Pipeline Components and assemblies
 2.3 Channel negotiation
 2.4 Branching pipelines
 2.5 Pipeline 'shadows'
 2.6 Aside: Long-lived requests
 2.7 Plug-in architecture: Blocks++
 2.8 Block Stacking
 2.9 Build tooling
 2.10 Development tooling
3 Summary

=====--------
= 1. Architectural priorities

* To lower the barrier to entry for new users. Specifically:
  * Radically simplify configuration files.
  * Plugin-based architecture so that new features neatly 'wire
themselves in' to the framework and tools.
  * Build (i.e. Ant/Maven) and GUI (Eclipse? IDEA?) tooling support
* To create an architecture capable of supporting the /next/ five
years' worth of progress.
* To promote re-use by increasing the level of abstraction at which
the site administrator works. One 'top level' component might actually
be implemented using ten smaller ones.

=====--------
= 2. A guided tour

From the architectural priorities, we can derive some concrete goals:

--------
= 2.1 'Sitemap Tool'-based configuration
Nothing to do with IDEs and the like, this one; it's more akin to the
concept of a 'Tool' in Photoshop etc. Site administrators configure
their sites by bolting together a number of 'tools' to perform the
functions they require. Under Silkworm, the sitemap uses tools, not
components.
Tools are an abstraction over lower level components. One tool may
yield more than one component. For example, a trivial sitemap under
Silkworm might look like:

<?xml version="1.0"?>

<sitemap xmlns="urn:cocoon/sitemap/core">
  <html-serialize version="xhtml-1.0"/>
  <xslt-transform stylesheet="context:/ss/basic-skin.xsl"/>
  <uri-xml-generate uri="context:/content/index.xml"/>
</sitemap>

That's it. No component declarations, no nothing. A couple of things
to note here:
* Each tool:
  * Is defined in a plug-in somewhere. The 'Kernel' itself contains no tools.
  * Contributes its own mark-up syntax to the sitemap (in its own
namespace -- in the above example, all elements happen to be part of
the 'core' namespace).
  * Is responsible for knowing how to 'realise itself' into a component tree.

Another thing to note is that <sitemap> is 'just another tool'. Like
any other tool, it generates a component tree.

There's no magic here; it's perfectly possible to provide alternative
root sitemap functionality simply by contributing another Tool called
<my-magic-sitemap>, for example. You could even have a root sitemap
element with no children, that generates its entire component tree
from some other source (a database or whatever). This isn't FS -- it's
just a natural by-product of this tools-based architecture.
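
To make this concrete, here's a minimal sketch of what the Tool
contract might look like. Every name and signature here is
hypothetical -- it's just to illustrate the idea (PipelineAssembly is
introduced in section 2.2):

import org.w3c.dom.Element;

// Hypothetical sketch only; none of these names exist yet.
// A Tool owns a chunk of sitemap mark-up (in its own namespace) and
// knows how to realise that mark-up into pipeline components.
public interface Tool {
    /** The namespace URI of the sitemap elements this tool claims. */
    String getNamespaceUri();

    /**
     * Realise the given sitemap element into one or more components,
     * attaching them to the assembly under construction.
     */
    void realise(Element sitemapElement, PipelineAssembly target);
}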

--------
= 2.2 Pipeline Components and assemblies
Components operate one layer of abstraction below Tools. They don't
form part of the Sitemap; they form part of the Pipeline.

At this stage, we'll assume that a Component is pretty similar to an
existing Cocoon component (actually, they're quite different, but
we'll tackle that later!).

As mentioned in section 2.1, each tool may realise itself into one or
more components. However, we don't really want a gigantic 'swarm' of
components kicking around -- it'd be nice to have them organised into
a bit of a hierarchy so that we can visualise their structure more
easily.

In order to support this, we introduce the concept of an Assembly.
Assemblies are just like components (in fact, the Assembly interface
extends that of a component), except that they can actually /hold/
other components (i.e. they are an 'assembly of components').

+--------------------+
| pipeline-component |
+--------------------+
          ^
          |
          |
+---------+----------+
| pipeline-assembly  |
+--------------------+

The inside of an assembly looks just like another pipeline, because
actually the whole pipeline is just an Assembly created by the 'root'
Tool.

As mentioned above, Assemblies are just like Components. The 'input'
and 'output' of a given assembly are just the input of its 'first'
element and the output of its 'last'.

           ^
+----------|--------+
| pipeline-assembly |
|          |        |
|   +------+------+ |
|   | component 1 | |
|   +------+------+ |
|          |        |
|   +------+------+ |
|   | component 2 | |
|   +------+------+ |
|          |        |
+----------|--------+
           ^
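
In code, the relationship in the diagrams above might look something
like this -- again, a purely hypothetical sketch:

import java.util.List;

// Hypothetical sketch: an assembly is-a component that also holds
// other components (the classic Composite pattern). The input of the
// assembly is the input of its first child; its output is the output
// of its last child.
public interface PipelineComponent {
    String getName();
}

public interface PipelineAssembly extends PipelineComponent {
    void add(PipelineComponent child);
    List<PipelineComponent> getChildren(); // in pipeline order
}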

--------
= 2.3 Channel negotiation
Okay, now we get onto the real differences between components in C2,
and in Silkworm. In C2, components are split into four main
categories:

* Generators - generate XML from somewhere
* Transformers - transform one kind of XML into another
* Serializers - transform XML data into binary data to send to the client
* Readers - generate binary data from somewhere to send to the client

The reason for these different classes of component is mainly that
they each have a different 'profile' -- they each send and accept
different types of data.

Under the Silkworm architecture, we relax this restriction. Each
component may send and receive data using arbitrary data structures.
Obviously, we need to ensure that components placed next to each other
in the pipeline can talk to each other (i.e. they share a common
language). We achieve this using an automatic negotiation protocol
(and associated service) to 'discover' an appropriate communication
method between each pair of components. Specifically, we achieve this
using Channels, Ports and End Specifications.

A Channel encapsulates a particular communication paradigm. Each
Channel has an input and an output end, each with an End
Specification. Ports are attached to pipeline components, and are the
conduits through which the components send and receive data. Each port
is capable of 'rating' an End Specification as to its suitability for
that particular port. A 'Channel Negotiation Service' is responsible
for negotiating an appropriate channel for each pair of ports in a
given pipeline.
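
A rough sketch of how these three concepts might hang together --
hypothetical names throughout:

// Hypothetical sketch of the negotiation contracts.
public interface EndSpecification {
    // Describes one end of a channel, e.g. 'SAX events', 'DOM tree',
    // 'byte stream'.
}

public interface Channel {
    EndSpecification getInputEnd();   // what the channel accepts
    EndSpecification getOutputEnd();  // what the channel delivers
}

public interface Port {
    /**
     * Rate the given end specification for this port; higher is
     * better, negative means 'unusable'.
     */
    int rate(EndSpecification spec);
}

public interface ChannelNegotiationService {
    /**
     * Pick, from the registered channels, the one whose ends are
     * rated best by the output port of one component and the input
     * port of the next.
     */
    Channel negotiate(Port upstreamOutput, Port downstreamInput);
}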

The negotiated channels would obviously need to be overridable for
specific needs, but for the vast majority of cases the defaults should
suffice. Some example channels:

* SAX - just transports SAX events
* SAX in, DOM out - aggregates SAX events into a DOM, and then passes it on.
* DOM in, SAX out - serializes a DOM into SAX events.
* DOM - just transports a DOM.
* Byte stream

As you can see from the above examples, it is perfectly possible for a
channel to act as an 'adaptor' from one data format to another. SAX
and DOM are a classic example -- there are some processes that are
inherently 'better' with SAX, and some that are better with DOM. The
negotiation approach allows the system to decide the best
communication path through the pipeline. For example, the negotiation
could lead to:

   URI-Reader -[byte-stream]-> XML parser -[SAX]-> XSLT
     -[SAX2DOM]-> SVG2JPEG -[byte-stream]-> output

This means that the SVG2JPEG serializer doesn't have to worry about
processing SAX events; it can just deal with DOM objects (disclaimer:
this is just an example, I know Batik can talk SAX).

Equally, one can envisage scenarios where it would be more efficient
to do three DOM steps in a row (passing a DOM by reference) than to
convert to SAX at each step, as we would right now.

--------
= 2.4 Branching pipelines

In C2, all components have at most one input and one output. This
makes sense because of the specialised component roles that Cocoon
uses. It wouldn't make sense for a Transformer to have more than one
output, for example.

Obviously, we still needed to be able to conditionally include
elements in the pipeline, so the Matcher and Selector were invented.
These are evaluated 'out of band' when the pipeline is constructed
before each request, to select which components get to appear within
that pipeline.

As time passed, people started to talk about aggregating content in a
manner more flexible than was possible using 'include transformers'
and the like. We introduced new markup and semantics in the sitemap to
cope with this new functionality.

In Silkworm, we simply allow any pipeline component to support any
number of input and output ports. Each component may use as many or as
few of those ports as it wishes for a given request. Components
implementing conditional pipeline execution will typically have one
output port, and a number of input ports. They will only use /one/ of
those input ports for each request. Components performing aggregation
will have one output port and a number of input ports. They will use
/all/ of these input ports for each request. Components performing
segregation (i.e. splitting content in two, a la the Fragment
Extractor) will have one input port but two or more output ports.
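
Re-using the hypothetical Port sketch from section 2.3, an aggregating
component might look roughly like this:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one output port, many input ports, and /all/
// inputs are consumed for every request. A conditional component
// would look the same, but read from just one input per request.
public class AggregatingComponent {
    private final List<Port> inputs = new ArrayList<Port>();
    private Port output;

    public void addInputPort(Port port)  { inputs.add(port); }
    public void setOutputPort(Port port) { this.output = port; }

    public List<Port> getInputPorts() { return inputs; }
    public Port getOutputPort()       { return output; }

    // The per-request behaviour (reading from every input and merging
    // onto the output) would live in the component's shadow -- see 2.5.
}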

Because of the generality of this architecture, the pipeline can be
arbitrarily complex, depending on the required functionality. The
pipeline as a whole forms a directed acyclic graph:

/|\        ^ ^
 |         | |
 |         o |
 |        / \|
 |       o   o
 |      / \ / 
 |     o   o
 |      \ /|
 |       o |
 |      /| |
 |     | | |
 |     ^ ^ ^

An important side-effect of this architecture is that under Silkworm,
there is only one pipeline. Whereas C2 builds its pipeline for each
request, Silkworm builds a single (rather large!) pipeline at
initialisation, and uses this pipeline for each request.

--------
= 2.5 Pipeline 'shadows'

While some pipeline components will be able to service requests in an
entirely stateless manner, it is expected that most components will
need to allocate objects to requests for the duration of that
request's execution.

When a request is received, a 'shadow' of the pipeline is created.
Each component uses a 'shadow model', which specifies how that
component's shadows are managed:

* singleton - a single instance of the component shadow is maintained
for all requests.
* request scoped - a new instance is created for each request.
* request scoped, pooled - an instance is allocated to each request,
but these instances are pooled and re-used.

Component shadows are lazily instantiated, so components that are not
used (because they're up the dead path of a conditional component, for
example) will never be instantiated.
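
A hypothetical sketch of how the shadow models might be expressed,
re-using the PipelineComponent sketch from section 2.2:

// Hypothetical sketch. A ShadowModel decides how shadow instances are
// handed out; there would be one implementation per policy above
// (singleton, request scoped, request scoped + pooled).
public interface ComponentShadow {
    void release(); // called when the owning request completes
}

public interface ShadowModel {
    /**
     * Obtain a shadow of the given component for the given request.
     * Implementations instantiate lazily, on first use.
     */
    ComponentShadow acquire(PipelineComponent component,
                            Object requestKey);
}

// e.g. SingletonShadowModel, RequestScopedShadowModel,
//      PooledShadowModel.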

--------
= 2.6 Aside: Long-lived requests

Something which occurred to me long ago, and which caused me to create
the Fragment Extractor in Cocoon 2, is that because of the way the web
was conceived, we often send tightly related content in separate
request-response pairs to the client. A common example of this is a
dynamic graph in the middle of a page of statistics. It's a dead-common
scenario: the graph is logically part of the same page as the rest of
the content. It has no independent existence per se, and it will
always be displayed whenever the page is. The fact that they happen to
be sent as separate requests is an implementation artefact, rather
than something inherent to the content itself.

Cocoon 2 is the premier example of what I would call 'logical
content handling' frameworks. It advocates the separation of concerns
between content, presentation and business logic. My feeling is that
we could extend this concept to support the separation of concerns
with regard to exactly how content is sent to the client. For example
-- iirc -- in future versions of XHTML, it will be possible to embed
SVG images directly into the body of a document. The browser will spot
the SVG and render it to the screen. This is how it should be;
however, we are not quite there yet. Wouldn't it be nice to be able to
do this now, logically at least?

Using the branching pipelines concept introduced above, it would be
possible to do this within Silkworm with a clean architecture, and
eventually to support automatic decisions about whether to just leave
the content inline for an advanced browser to deal with, or whether to
take the hard work off its hands and render it server-side. Better
still, because of the tools architecture, this would be just one line
within the sitemap -- or none at all, if it were provided as a block
contributing against a mount-point in the default sitemap tool.

I'm not going to go into implementation details here, because I'm not
clear yet how this would work. If you have any ideas, feel free to
chip in! High-level options are basically:

* Generate all the content and attachments up-front and store it
  somewhere for later retrieval. Feels like a bit of a cop-out -- the
  existing fragment extractor would be more efficient than this!
* Keep the pipeline shadow for a request with attachments around, and
  then direct requests for different 'attachments' to different areas
  of the pipeline. This would only require the data that drives the
  pipelines to be cached, which could be a very small amount of XML,
  for example -- much more efficient.

This is just an idea at the moment, so don't worry if it seems a bit crazy.

--------
= 2.7 Plug-in architecture: Blocks++
Cocoon 2 uses the Avalon IoC framework to create, configure and
'wire up' its services. This requires a configuration file listing
each service to be configured, along with that service's
configuration. This file is seldom touched, but takes up a significant
chunk of the 'line count' for a typical application.

Silkworm could use Hivemind[2] to support a truly modular
architecture, similar to the concept of Blocks currently supported in
C2, but where blocks are parsed and aggregated at run time. The
Hivemind-based infrastructure would allow (at minimum) the following
components to be contributed by plug-in blocks:

* Sitemap Tools
* Pipeline Components and shadows
* Pipeline shadow models
* Channel implementations
* Expression languages(?)
* Named sitemap fragments (including the 'root' sitemap)
* Auto-mounted sitemap fragments (e.g. /debug, /status etc)
* JMX MBeans

Using Hivemind would enable all these implementations and services to
wire themselves up automatically. For example, dropping a module
containing a channel implementation into the classpath would be enough
to ensure that the implementation is automatically presented as a
candidate for negotiation. Equally, sitemap tool contributions would
not only contain high-level configuration information, but also
artefacts such as an icon and documentation, allowing IDE plug-ins to
present the tool in a graphical toolbar. Adding a module to the
classpath will automatically make it appear in all relevant locations.
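
For instance, a sitemap tool contribution might carry its tooling
artefacts through an interface along these lines (entirely
hypothetical, re-using the Tool sketch from section 2.1):

import java.net.URL;

// Hypothetical sketch: the metadata a block contributes alongside a
// sitemap tool, so that IDE plug-ins can present the tool graphically
// without knowing anything about it in advance.
public interface ToolContribution {
    Tool getTool();          // the tool itself (see section 2.1)
    URL getIcon();           // toolbar icon for graphical editors
    URL getDocumentation();  // user-facing docs for the tool
}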

Auto-mounted sitemap fragments allow modules to contribute a sitemap
fragment and specify that it is mounted on a particular URI within the
web application. This would enable us to have a 'debug' module, for
example, which is automatically mapped to '/debug' whenever the module
is in the classpath. Of course, this contribution wouldn't just
contain the sitemap, but also meta-data about it, enabling us to
display auto-mount tables in a 'status' module if desired -- itself
contributed to '/status' by an auto-mounted module.

All this infrastructure can be used by the /users/ of Silkworm as well
as the committers. For example, we could write a 'Skin' module which
provides for pluggable 'skins' for applications (including the built
in debug/status modules etc). Silkworm users could then write their
own 'Skin' modules, and they would be automatically applied to the
relevant applications.

--------
= 2.8 Block Stacking

Hivemind works by building a registry of all available modules, and
wiring them all together. It is not designed to change these bindings
at run-time, so we'd need to build that ourselves.

By default, Hivemind works by locating modules on the classpath.
Silkworm still supports this mode of operation. It would be perfectly
possible to build a Silkworm application just by adding all your
required modules (blocks) to the classpath of the application.

However, supporting the ability to hot-swap modules, and to add and
remove modules at runtime, requires something a little different. This
is where block stacking comes in.

The BlockStacker forms the core of this capability. It is a Hivemind
service like any other; however, it has a rather special role: it is
responsible for creating new Hivemind registries from a set of blocks,
and for detecting when something changes that means the registry needs
to be rebuilt. The BlockStacker has access (via Hivemind) to a
collection of BlockProviders and a collection of BlockSelectors. The
initial Hivemind registry is called the 'bootstrap registry'.

+--------------+
| BlockStacker |
+--+--------+--+
   |        |
   |        +----------------+------------------------+
   |                         |                        |
+--+-------------------+ +---+------------------+ +---+--------------+
| IbiblioBlockProvider | | DefaultBlockSelector | | DepBlockSelector |
+----------------------+ +----------------------+ +------------------+

The BlockStacker has the following responsibilities:
* Soliciting lists of required blocks from the BlockSelectors.
* Loading required blocks using the block providers.
* Constructing a new Registry from these blocks.
* Starting the BlockStacker in the new registry.
* Issuing notifications to interested parties when a new registry is created.

BlockProviders have the following responsibilities:
* To load, by name, blocks to which they have access.
* To issue notifications that something has changed which requires a
registry rebuild.

BlockProviders are /not/ responsible for specifying /which/ blocks to
load, only for loading those that are required. There may well be a
number of BlockProviders installed, and these form a 'chain of
responsibility' -- when a block is required, each BlockProvider
attempts to load the block in turn, until one succeeds or we run out
of providers.

Determining exactly /which/ blocks should be loaded is the
responsibility of BlockSelectors. BlockSelectors have the following
responsibilities:

* Providing lists of blocks which the selector thinks should be loaded.
* Providing notifications when this list changes.

It is anticipated that there would be two block selectors by default
in the Silkworm core:

* The DefaultBlockSelector
  Selects blocks which the Silkworm team determines are 'core' blocks
  that everyone should have loaded. The list is expected to be small.
* The DependencyBlockSelector
  Selects blocks based on the stated dependencies of blocks that have
  already been loaded. Exactly /how/ these dependencies are stated has
  yet to be determined, but they may take the form of a separate
  dependency descriptor, or form part of the Hivemind configuration of
  the block.

As mentioned towards the top of this section, the user is perfectly
entitled to install blocks by placing them in the classpath of the
application itself. This is particularly useful because it allows the
application itself to become 'just another block'. This block has
dependencies like any other, so no specific block selector is required
to specify which blocks should be loaded for a given application.

Because the BlockStacker, when it creates a new registry, starts a new
BlockStacker as part of that registry, this entire mechanism is itself
extensible via blocks. A block can contribute a new BlockSelector or
BlockProvider just as it would contribute any other service.
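
Pulling the responsibilities above together, the contracts might look
roughly like this (hypothetical sketch; notification plumbing
omitted):

import java.util.List;

// Hypothetical sketch of the block-stacking contracts.
public interface Block {
    String getName();
}

public interface BlockProvider {
    /**
     * Load the named block, or return null if this provider doesn't
     * have it. Providers form a chain of responsibility: each is
     * tried in turn until one succeeds or we run out.
     */
    Block load(String name);
}

public interface BlockSelector {
    /** The blocks this selector believes should be loaded. */
    List<String> selectBlocks();
}

public interface BlockStacker {
    /**
     * Solicit block lists from the selectors, load the blocks via the
     * providers, build a fresh Hivemind registry from them, and start
     * a new BlockStacker inside it.
     */
    void rebuildRegistry();
}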

As well as creating new registries, we also want to ensure that old
registries are destroyed. Special 'routing policy' components in the
top-level request handling infrastructure decide whether a given
request (HTTP or otherwise) should be handled by the 'current'
registry, or whether it should be delegated to the 'old' registry.
These components can signal that it may now be possible to destroy an
old registry, but they can also veto a destruction if required. As
soon as the new BlockStacker is started in the new registry, it issues
a request to destroy the old registry, which may be vetoed if there
are still resources in the old registry that are in use. Likely
routing policies might be:

* Request based
  Dispose of the old registry when all in-flight requests have
  completed; meanwhile, route all new requests to the new registry.
* Session based
  Dispose of the old registry when all in-flight sessions have been
  destroyed; meanwhile, route all new sessions to the new registry.
* Quiescent (maybe)
  Wait for /all/ sessions to expire before switching to the new
  registry. Risky, because it might lead to the switch never being
  made, but might be necessary if there is lots of inter-session
  communication.
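
A routing policy's contract might be as simple as this (hypothetical
sketch; 'Registry' stands in for whatever handle we keep on a built
registry, and 'Object request' for the top-level request type):

// Hypothetical sketch: decides which registry services a request, and
// participates in deciding when an old registry may be destroyed.
public interface RoutingPolicy {
    /** Route the request to the current or an old registry. */
    Registry route(Object request);

    /**
     * Called when destruction of an old registry is proposed; return
     * false to veto (e.g. while in-flight requests or live sessions
     * still reference it).
     */
    boolean mayDestroy(Registry oldRegistry);
}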
  
--------
= 2.9 Build tooling

It is suggested that due to the highly modular architecture proposed
in this e-mail, Silkworm might be an appropriate candidate for a
Maven-based build. I am conscious that there is a lot of controversy
surrounding Maven. I have used it quite a lot lately, and have found
that despite it not being perfect, it allows me to get my job done
more quickly.

Silkworm lends itself nicely to this kind of build system, too,
because of the number of inter-block dependencies that are likely to
develop over time. Including a block in the build-time dependency list
for another block would lead to it automatically being included in any
Hivemind registries constructed during JUnit integration test cases,
etc.

Maven automatically manages the dependencies between sub-projects in a
build: using Maven would mean that the build order for all our own
blocks is determined automatically.

Maven also automatically handles external dependencies (i.e. on other
libraries, such as Hivemind), which would radically cut down the size
of our source distribution. It could allow people to download and
compile individual blocks (because they need a vendor branch, for
example) without having to download the entire source tree, and Maven
would automatically download the latest version of any dependency
blocks.

Additionally, it would make sense for us to make it easy for /users/
of Silkworm to use Maven, if they so desire. This would likely take
the form of a Maven plug-in allowing users to quickly create skeletons
of new applications and blocks. Maven would automatically handle
auto-deploying applications to Tomcat/JBoss (and hopefully soon
Geronimo!), so we wouldn't need to get involved in that side of
things. If possible (I've not got this deep into Maven yet), the
Silkworm plugin should also provide the ability to pre-download
required blocks into the project itself as dependencies, to cater for
people who don't want to download these things automatically at
runtime -- I know my employers would want this, for example.

We don't want to leave Ant users on their own, though. It would
be pretty trivial to provide a skeleton Ant project that people can
use to create their own applications. Because of the new blocks
infrastructure, it would be quite a small distribution -- only the
Silkworm kernel would need to be part of the distribution itself;
everything else could be dynamically loaded at runtime. Again, Ant
tasks could be provided for downloading blocks as dependencies at
build time.

--------
= 2.10 Development tooling

Cocoon 2 is already a fairly complex tool. Silkworm offers even more
flexibility. C2 is crying out for really good tools; don't get me
wrong, I know there are some out there -- all I mean is that this is
something we want to perpetuate and improve on.

The Silkworm architecture is designed from the ground up to support a
tooling-based approach as a complement to simply editing the source
code. The tools contributed by blocks aren't just designed to be used
from the sitemap XML; they can each provide an icon, documentation
etc. to allow them to be visualised on screen. This doesn't just apply
to development tooling, but to debug consoles, status displays and the
like.

Because of the Block Stacker, we can automatically discover which
'tools' are available within a given application, and expose them in
graphical editors, content-assist sitemap editors and the like. It may
be possible for us to support the concept of a 'tooling block' -- an
'XSLT block' could refer to an XSLT tool block, so that if the XSLT
block is available in a particular application, so is an XSLT editor.
Equally, the 'debug' block could expose an XMLRPC interface, and refer
to a debug tooling block, so that when the debug block is installed in
an application, you automatically get debug tooling in Eclipse.

These concepts are just ideas at the moment -- I don't have enough
experience of Eclipse to know how possible they are to implement.
However, I think it's safe to say we should be sticking to the
following principles:

* Make sure we /design/ for tooling, rather than adding it as an afterthought.
* Make sure working /without/ a tool is not a second-class citizen in any way.

--------
= 3 Summary

I had intended to write some sections about how the Silkworm
architecture would work for a Cocoon developer, and a separate section
about how it would feel to /use/ Silkworm. However, this e-mail is
already getting very long, and anything else would just make it
unreadable. If there is sufficient interest in this 'proposal', I'm
more than happy to put this additional content together.

So, let me know what you think! Am I mad? Is this a bad/good idea?
What have I missed? Is this something we should take further, or just
a distraction?

To restate, it's been a while since I've been involved in Cocoon, so:
* If you disagree with anything I say, it probably means I'm wrong. No problem.
* If I'm touching on discussions that have already been had, then apologies.
* If you think I'm talking from my posterior, and this whole thing is
a big mistake, then you're probably right. This is a [RT], expect
random thoughts.

Thanks,

Paul

[1] http://www.luminas.co.uk/
[2] http://jakarta.apache.org/hivemind/
