cocoon-dev mailing list archives

From Stefano Mazzocchi <>
Subject [C2] Link filtering and Content aggregation
Date Mon, 02 Oct 2000 11:03:31 GMT
Stylebook is a great little tool. It's more or less a black box and
"styles" are a collection of closely connected XSLT stylesheets that use
special URIs to obtain access to the stylebook internals.

It was written to work, it was written under pressure, it was written by
Pier alone without a community to give him feedback.

It's great, but it's so full of design problems (not Pier's fault, being
based on C1 design it inherited all the design problems that C1 has)
that it is much easier to throw it away than to clean it up.

NOTE: I'm sure there will be people who continue to use stylebook
even when C2 is out. Why? Well, stylebook is *dead simple* when you
do what you are supposed to do... it's a tool for secretaries and it
works great for sites that share the same layout model: sitebar on the
left, logo on top, content in the main frame.

So far, all the xml.apache projects use stylebook, and recently jetspeed,
turbine and avalon started to use it as well (turbine and avalon have
their own skins, too).

If you only used stylebook, I know you love it and don't see any
problems with it: it's a magic tool that does the job for you (more or
less an autodoc)... but if you ever tried to write a skin for it...
well, you know what I mean when I say there are problems.

So, while stylebook is able to perform static generation only and
Cocoon1 only dynamic generation (there is a CLI but it's pretty
useless), C2 is able to do both and, here is the magic, using the exact
same sitemap!!!!

So, if your site works as expected on your browser, it will look exactly
the same once a snapshot is taken and saved on disk. This is not meant
to replace Cocoon2 dynamic generation (a proxy speeds up C2 more
effectively and is much better integrated); it is mainly meant for
documentation generation.

So, by indicating what browser you are simulating from the CLI, you
could either obtain a WML snapshot or a PDF snapshot or an HTML
snapshot... it depends on your sitemap, but you don't have to duplicate
your efforts because if it works on the web it works from command line
as well.
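
To make the idea concrete, here is a minimal sketch of how the simulated
browser could decide which kind of snapshot gets written. Everything in it
is an assumption made for illustration: the browser-name fragments, the
format table, and the function names are hypothetical and are not C2 code.

```python
# Hypothetical sketch: the same selection logic serves both a live request
# and a command-line snapshot; only the simulated browser string differs.

# Assumed mapping from a browser-name fragment to an output format.
FORMATS_BY_BROWSER = {
    "wap": "wml",
    "acrobat": "pdf",
}
DEFAULT_FORMAT = "html"


def select_format(user_agent):
    """Pick the output format that would be chosen for this browser."""
    agent = user_agent.lower()
    for fragment, fmt in FORMATS_BY_BROWSER.items():
        if fragment in agent:
            return fmt
    return DEFAULT_FORMAT


def snapshot_name(uri, user_agent):
    """Name of the file a CLI snapshot of `uri` would be saved under."""
    stem = uri.strip("/") or "index"
    return f"{stem}.{select_format(user_agent)}"
```

The point of the sketch is only that nothing here is specific to the
command line: whatever picks the format for a real browser can pick it for
a simulated one.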

NOTE: there are limitations, and things like sessions and cookies are
not yet handled by the CLI, but this is just because I didn't have time
to implement them, not because it is not possible to do so (even if it
will admittedly be hard to fully emulate a browsing experience from the
command line).

But using the exact same sitemap creates a few problems that we still
have to address:

Link Filtering

Today, C2 follows _all_ valid local links. This means that there is no
way to prevent C2 from taking a snapshot of, for example, a timestamping
XSP page, which would not have much meaning as a static file.
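
The behaviour described above can be sketched roughly as follows. This is
not the C2 crawler: the in-memory SITE table, the URIs in it, and the
function names are made up for illustration, and a real crawler would
extract links from generated pages instead of looking them up.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical site: each local URI maps to the links found on that page.
SITE = {
    "/index": ["/user-guide", "/clock.xsp", "http://example.org/"],
    "/user-guide": ["/index"],
    "/clock.xsp": [],
}


def is_local(link):
    """A link is local when it names no scheme and no host."""
    parts = urlparse(link)
    return not parts.scheme and not parts.netloc


def crawl(start):
    """Breadth-first walk that follows every valid local link once."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        uri = queue.popleft()
        order.append(uri)
        for link in SITE.get(uri, []):
            target = urljoin(uri, link)
            if is_local(link) and target in SITE and target not in seen:
                seen.add(target)
                queue.append(target)
    return order
```

Note that the hypothetical "/clock.xsp" page ends up in the snapshot list
simply because something links to it; the crawler has no way to know that
it should be left out.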

Initially, I was able to block crawling using specific xlink
roles, but I recently understood this breaks SoC between the document
writer (which is responsible for links as content enhancement) and the
sitemap maintainer (which is responsible for resource generation and
link enforcement).

The URI space is the main contract on a web site and the sitemap is the
only place where this contract is enforced.

IMO, we need to expand the sitemap semantics to allow resources to be
blocked from CLI crawling. The best way, IMO, is to add a specific
attribute to the resource indicating elements... these elements are

 - match
 - mount

and we just have to choose an attribute name among

 - crawl
 - crawlable
 - walk
 - walkable
 - ???

for example

 <map:match pattern="someuri" crawl="no">

will return a specific error number to the CLI requesting the page.
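
As a rough sketch of what this could mean in practice, the following
checks a requested URI against a small sitemap fragment and reports a
status back to the CLI. All the specifics are assumptions: the namespace
URI is a placeholder, "crawl" is just one of the candidate attribute
names, the status numbers are made up, and fnmatch stands in for whatever
pattern matching the sitemap really uses.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical sitemap fragment. The namespace URI, the "crawl" attribute
# name, and its "no" value are placeholders for whatever gets agreed on.
SITEMAP = """
<map:sitemap xmlns:map="urn:example:sitemap">
  <map:match pattern="clock" crawl="no"/>
  <map:match pattern="docs/*"/>
</map:sitemap>
"""

NS = {"map": "urn:example:sitemap"}
OK, NOT_CRAWLABLE, NOT_FOUND = 0, 1, 2  # made-up status numbers


def cli_request(uri, sitemap_xml=SITEMAP):
    """Return the status the CLI would get for `uri`; first match wins."""
    root = ET.fromstring(sitemap_xml)
    for match in root.findall("map:match", NS):
        # fnmatch is a stand-in for the real sitemap pattern matcher.
        if fnmatch.fnmatchcase(uri, match.get("pattern")):
            if match.get("crawl", "yes") == "no":
                return NOT_CRAWLABLE
            return OK
    return NOT_FOUND
```

With something like this, the decision stays in the sitemap: the document
writer links to whatever makes sense as content, and the sitemap
maintainer decides what the CLI is allowed to snapshot.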

What do you think?

Content Aggregation

A key feature of a publishing system is the ability to mix different
sources of documents into the same page. JetSpeed does this in a very
specific way; we, on the other hand, should try to be as general as
possible, also to allow things like JetSpeed to operate on top of us and
inherit our functionality.

So, the main thing is the creation of a way for pages to include content
generated from other sitemap resources rather than external ones.

It was already proposed to use the "cocoon:" protocol to access them, so that

 <sitebar xinclude:href="cocoon:/sitebar"/>

is expanded at runtime as

  <item xlink:href=".."/>
  <item xlink:href="index"/>
  <item xlink:href="user-guide"/>

that can be later on translated into HTML and, for example, saved on
disk by the CLI.
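
A minimal sketch of that expansion step, under several assumptions: the
namespace URIs are placeholders, the RESOURCES table stands in for what
other sitemap resources would generate, and I assume the generated items
become children of the including element (the proposal does not say
whether the element is filled or replaced).

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; the real ones are not given in the text.
XINCLUDE = "urn:example:xinclude"
XLINK = "urn:example:xlink"
HREF = f"{{{XINCLUDE}}}href"

# Hypothetical output of other sitemap resources, keyed by local URI.
# The <items> wrapper exists only so the fragment has a single root.
RESOURCES = {
    "/sitebar": (
        f'<items xmlns:xlink="{XLINK}">'
        '<item xlink:href=".."/>'
        '<item xlink:href="index"/>'
        '<item xlink:href="user-guide"/>'
        "</items>"
    ),
}


def expand(element):
    """Fill each element carrying a cocoon: include with the resource content."""
    for child in element:
        expand(child)
    target = element.attrib.get(HREF, "")
    if target.startswith("cocoon:"):
        del element.attrib[HREF]
        generated = ET.fromstring(RESOURCES[target[len("cocoon:"):]])
        element.extend(list(generated))
    return element


page = ET.fromstring(
    f'<page xmlns:xinclude="{XINCLUDE}">'
    '<sitebar xinclude:href="cocoon:/sitebar"/>'
    "</page>"
)
expand(page)
```

After expand(page) the sitebar element holds the three items and no longer
carries the include attribute, so later stages (HTML translation, saving
to disk) never see the "cocoon:" reference.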

This will completely emulate stylebook functionality and allow us to
write our docs using the same exact layout we are using now but using C2
as an engine.

Ok, let's discuss these two things so that we can have a release.

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<>                             Friedrich Nietzsche
