cocoon-dev mailing list archives

From Stefano Mazzocchi <>
Subject [RT] Content aggregation
Date Tue, 03 Oct 2000 20:30:54 GMT

Welcome to a new episode of the infamous RT series of the Cocoon
development mailing list... for the new lurkers, RT means Random
Thoughts, and if a mail starts with [RT] it can contain potentially
good or crappy ideas; to decide which, the author needs your feedback.

RTs are my favorite research tool: throw stones in the lake and see what
sinks and what floats.

Ok, here we go.

                           ------------- o -------------

Everybody knows that dynamically generated web content is less expensive
than human-generated content. This is why the web turned from mostly
static (human generated) to mostly dynamic (machine generated).

Don't get me wrong: almost all data is human generated, but by that I
mean written directly by authors with authoring tools or editors; raw
database data (even if inserted by humans) is another story.

A web publishing system "cooks" "raw" information to present it on the
web as requested.
If the "raw" information in a page comes from a single data source, the
content is said to be "presented" or "published"... but if the "raw"
information in a page comes from more than one data source, the content
must first be "aggregated" (combined) before being "presented".

First problem: what defines the aggregation? Whose concern is it?

The aggregation can be pictured in graphical terms as a layout of areas
but, more abstractly, it is a collection of connected resources, each
with a particular role.

The aggregation of a particular resource is, in the end, tree-shaped,
but each resource doesn't need to know how its internal resources are
generated (this is enforced through a solid resource-addressing
contract, which allows stronger SoC).

NB. SoC = Separation of Concerns.

So we don't lose generality if we focus on one-level aggregation, which
is something like this:

 parent res
  +-- child res
  +-- child res
  +-- child res

Second problem: the same aggregation structure is normally shared by
many different resources, so the request identifies one of the internal
resources rather than the parent one.

Example: in such a resource structure

   +-- logo
   +-- navigation
   +-- page

the page resource depends on the request, while the top resource (the
aggregation structure itself) doesn't change.

Let's try to picture how a sitemap could approach this:

 <map:match pattern="docs/*">
  <map:generate type="aggregator">
   <part name="logo" uri="images/logo" aggregates="no"/>
   <part name="navigation" uri="navigation/docs"/>
   <part name="page" uri="pages/{1}" view="content"/>
  </map:generate>
  <map:serialize type="xml"/>
 </map:match>

where the "aggregator" generator is able to perform internal redirection
and create something like:

 <str:structure xmlns:str="" xmlns:xlink="http://www.w3.org/1999/xlink">

  <str:part name="logo" xlink:href="/images/logo"/>

  <str:part name="navigation" xlink:href="/navigation/docs">
   <nav:bar xmlns:nav="">
    <nav:group name="main">
     <nav:link xlink:href="whatever"/>
     <nav:link xlink:href="whatever-else"/>
     <nav:group name="nested">
      <nav:link xlink:href="whatever-nested"/>
      <nav:link xlink:href="whatever-else-nested"/>
     </nav:group>
    </nav:group>
   </nav:bar>
  </str:part>

  <str:part name="page" xlink:href="/pages/simple-page">
   <page xmlns="">
     <title>This is a simple page</title>
   </page>
  </str:part>

 </str:structure>

which includes all the information necessary to create the page without
containing any style information (otherwise we'd break SoC!!).

Third problem: is an aggregating generator enough? Is the notion of
"content aggregation" important enough to be placed directly into the
sitemap?

The second question should be answered keeping in mind that, for
performance reasons, the aggregator should avoid serializing and
deserializing SAX events.
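For comparison, the sitemap that later shipped with Cocoon 2 answered the second question by putting aggregation into the sitemap itself, with a map:aggregate element whose parts are typically fetched from internal pipelines through the cocoon: protocol. The fragment below is only a sketch of the docs/* example in that style: the wrapper element names "document", "navigation" and "content" are arbitrary choices, and the navigation/docs and pages/{1} pipelines are assumed to be defined elsewhere in the same sitemap.

```xml
<map:match pattern="docs/*">
  <!-- map:aggregate wraps all the parts in a new root element ("document"). -->
  <map:aggregate element="document">
    <!-- Each part is pulled from an internal pipeline (cocoon:/ = current
         sitemap) and wrapped in the element named by its "element" attribute.
         These pipelines are assumed to exist elsewhere in the sitemap. -->
    <map:part src="cocoon:/navigation/docs" element="navigation"/>
    <map:part src="cocoon:/pages/{1}" element="content"/>
  </map:aggregate>
  <!-- The logo is linked rather than aggregated, so it is not a part here. -->
  <map:serialize type="xml"/>
</map:match>
```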

                        -------------- o -------------

Ok, now we have aggregated content.... what do we do with it?

We have to style it.

Smart, isn't it?

There are two different style concerns: area placement (layout) and
content adaptation (style). Normally, the two are handled by different
people: the layout writer is usually not an artist but rather a
user-interface designer who looks at the different places where
information could go and understands where things work best.

From the simplest web site to the most complex portal, layout is a
fundamental part of every 2D interface: newspapers invented the idea of
2D layout and visual design patterns... while the "look" of the page is
defined by its style, the "feel" of the page is defined by its layout.

Cocoon must therefore allow these concerns to be separated.

How? Well, let's try to come up with a reasonable solution... we now
have a highly orthogonal document which contains n+2 namespaces, where n
is the number of aggregated resources (each resource to be aggregated
*MUST* output a namespace: this will be absolutely required, and Cocoon
might signal an error when no namespace is present, to avoid ambiguity
between the aggregated parts).

(+2 'cause of the xlink and structure namespaces)
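The namespace rule is what lets each later transformer recognize "its" part. As an illustration only (Cocoon would more likely enforce the rule inside the aggregator itself), here is a sketch of the same check written as an XSLT 1.0 pass. It assumes the aggregated parts are wrapped in str:part elements as in the example above, and binds str to a placeholder URI, since the real structure namespace is not given here.

```xml
<?xml version="1.0"?>
<!-- Hypothetical namespace check: copies the aggregated document unchanged,
     but stops the transformation if any str:part has a child element
     that is not in a namespace. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://example.org/hypothetical/structure">

  <!-- Identity template: copy every attribute and node as-is. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A part whose content has no namespace violates the rule: abort. -->
  <xsl:template match="str:part[*[namespace-uri() = '']]">
    <xsl:message terminate="yes">
      <xsl:text>Aggregated part "</xsl:text>
      <xsl:value-of select="@name"/>
      <xsl:text>" has no namespace</xsl:text>
    </xsl:message>
  </xsl:template>
</xsl:stylesheet>
```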

ok, let's see...

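 <map:match pattern="docs/*">
  <map:generate type="aggregator">
   <!-- parts as in the sitemap example above -->
  </map:generate>
  <map:transform src="/www/graphics/fancy/structure2html.xsl"/>
  <map:transform src="/www/graphics/fancy/navbar2html.xsl"/>
  <map:transform src="/www/graphics/fancy/doc2html.xsl"/>
  <map:serialize type="html"/>
 </map:match>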


 - structure2html generates the layout frame and copies all the other
   parts untouched.
 - navbar2html stylizes the navbar but leaves all the rest untouched.
 - doc2html does the same thing for the doc.
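A minimal sketch of what navbar2html.xsl could look like, assuming the nav: vocabulary from the aggregated example above and a placeholder namespace URI for it (the real one is not given). The identity template copies every node it does not explicitly handle, which is exactly the "leave all the rest untouched" behavior; only elements in the nav namespace are turned into HTML lists.

```xml
<?xml version="1.0"?>
<!-- navbar2html.xsl (sketch): rewrites only the nav namespace. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:nav="http://example.org/hypothetical/nav"
    exclude-result-prefixes="nav xlink">

  <!-- Identity template: everything outside the nav "band" passes through. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The navbar becomes an HTML list. -->
  <xsl:template match="nav:bar">
    <ul class="navbar">
      <xsl:apply-templates/>
    </ul>
  </xsl:template>

  <!-- A group becomes a list item holding a nested list. -->
  <xsl:template match="nav:group">
    <li>
      <xsl:value-of select="@name"/>
      <ul>
        <xsl:apply-templates/>
      </ul>
    </li>
  </xsl:template>

  <!-- A link becomes a list item with an anchor (the href doubles as label). -->
  <xsl:template match="nav:link">
    <li><a href="{@xlink:href}"><xsl:value-of select="@xlink:href"/></a></li>
  </xsl:template>
</xsl:stylesheet>
```

doc2html.xsl would follow the same pattern with templates for the page vocabulary; what changes from one filter to the next is only the namespace it matches.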

Think about the structure document as a bandwidth spectrum:

     |s|  n  |          p          |s|

and each transformer as a stop-band filter (I hope you have some
electronics background here):

 ----+ +---------------------------+ +------
     | |                           | |         s -> html
     +-+                           +-+

 ------+     +------------------------------
       |     |                                 n -> html
       +-----+

 ------------+                     +--------
             |                     |           p -> html
             +---------------------+

Each filter transforms the band where it is "low" and leaves untouched
the bands where it is "high".

The result is:

     |             html              |

which can be sent directly to the browser.

NOTE: the "logo" is not aggregated but linked. This is because HTML
doesn't include images (as PDF does, for example) but links them. For
the image, aggregating the binary result is useless; for the navbar, on
the other hand, one could aggregate the content to build the navbar
inside the HTML, or choose NOT to aggregate it and leave it out for a
Flash or SVG object.
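Along the same lines, structure2html.xsl can build the layout frame and, at the same time, honor the linked/aggregated distinction: a str:part with no child elements was linked, so it becomes an img reference, while an aggregated part is unwrapped and its content copied untouched for the filters that follow. This is a sketch only: the root element name str:structure, the table-based layout and the placeholder namespace URI are assumptions, not part of the proposal.

```xml
<?xml version="1.0"?>
<!-- structure2html.xsl (sketch): layout frame plus pass-through of parts. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:str="http://example.org/hypothetical/structure"
    exclude-result-prefixes="str xlink">

  <!-- Identity template: aggregated content is copied for later filters. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The (assumed) structure root becomes the page frame: the layout. -->
  <xsl:template match="str:structure">
    <html>
      <body>
        <table>
          <tr>
            <td colspan="2"><xsl:apply-templates select="str:part[@name='logo']"/></td>
          </tr>
          <tr>
            <td><xsl:apply-templates select="str:part[@name='navigation']"/></td>
            <td><xsl:apply-templates select="str:part[@name='page']"/></td>
          </tr>
        </table>
      </body>
    </html>
  </xsl:template>

  <!-- A part without child elements was linked, not aggregated:
       emit a reference (priority 0.5 beats the plain str:part rule). -->
  <xsl:template match="str:part[not(*)]">
    <img src="{@xlink:href}" alt="{@name}"/>
  </xsl:template>

  <!-- An aggregated part: drop the wrapper, keep its content as-is. -->
  <xsl:template match="str:part">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
```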

The same thing could be said when generating content aggregation for PDF
books or HTML frames or WML decks.

                                  ----------- o -----------

I believe the above is a clear, well-defined separation of the concerns
involved in aggregated web content... since almost all web content is
aggregated, I think we need to solve this in Cocoon directly and not in
higher-level packages (such as Jetspeed), which can use these abilities
yet remain focused on their specific tasks.

There are things that must still be addressed, and I'll try to do so in
the near future (hopefully with your direct help):

 - how does the above model impact an already established digital
publishing workflow?
 - where do existing (and yet to exist) authoring tools come to help?
 - is XSLT the right language for transformations? Do we need to define
a simpler, single-namespace-filtering, transformation-by-example
language to decouple operations and simplify them?

Ok, this should keep your neurons busy for a while.

Try to apply the above model to your own experience and tell me what you
think about it.

As always, praise, comments, flames and suggestions are welcome :)

And this is all from the fried brain (with a fever... yes, I have a bad
cold :( ). Stay tuned for a new episode where your heroes:

 - find a way to solve the above problems
 - implement it in a day
 - kick the ass of every publishing framework in existence
 - become rich and famous, and have a bunch of beautiful, XML-loving
ladies come to them :)

Ok, ok, enough :)

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<>                             Friedrich Nietzsche
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------
