cocoon-dev mailing list archives

From Stefano Mazzocchi <>
Subject Cocoon Offline mode and sitemap changes
Date Tue, 23 May 2000 12:58:48 GMT
Instead of waiting for Paul to initiate the discussion, I went ahead and
did it myself after taking a deep look at his (very nice) code.

Paul added an offline processing module to Cocoon2, following the
site-walking model of web spiders.

Here is the doc fragment he added to the sitemap:

  <offline target="target">
      <startpoint uri="/welcome.html"/>
      <handler type="text/html"/>
  </offline>

which says:

for offline operation, use the specified sitewalker, start from
/welcome.html, and process whatever is returned with MIME type
text/html using the given handler.
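The site-walking model described above can be sketched roughly like this. It is a minimal Python illustration under stated assumptions, not Paul's code: `crawl`, `html_links`, the `fetch` callable and the in-memory `SITE` are hypothetical stand-ins for Cocoon's rendering and for the configured handler. It shows the `<startpoint>` seeding a breadth-first walk and the MIME type of each response selecting the link parser; a response type with no registered handler is generated but never parsed for links.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for Cocoon rendering a URI: returns (MIME type, body).
Fetcher = Callable[[str], Tuple[str, str]]
# A "handler": extracts outgoing links from a body of one MIME type.
LinkParser = Callable[[str], List[str]]


class _HrefCollector(HTMLParser):
    """Collects href values from <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def html_links(body: str) -> List[str]:
    """Hypothetical text/html handler."""
    collector = _HrefCollector()
    collector.feed(body)
    collector.close()
    return collector.links


def crawl(start: str, fetch: Fetcher, handlers: Dict[str, LinkParser]) -> Dict[str, str]:
    """Breadth-first site walk from `start`; returns {uri: mime} for every page generated."""
    seen: Dict[str, str] = {}
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue
        mime, body = fetch(uri)
        seen[uri] = mime
        parser = handlers.get(mime)
        if parser is None:
            continue  # no handler for this MIME type: its links are never followed
        for link in parser(body):
            if link not in seen:
                queue.append(link)
    return seen


# Toy in-memory "site" used in place of a running Cocoon.
SITE = {
    "/welcome.html": ("text/html", '<a href="/docs.html">docs</a> <a href="/manual.pdf">pdf</a>'),
    "/docs.html": ("text/html", '<a href="/welcome.html">home</a>'),
    "/manual.pdf": ("application/pdf", "%PDF-1.3 ..."),
}

pages = crawl("/welcome.html", lambda uri: SITE[uri], {"text/html": html_links})
print(pages)
# {'/welcome.html': 'text/html', '/docs.html': 'text/html', '/manual.pdf': 'application/pdf'}
```

If `/manual.pdf` contained links of its own, this crawler would never see them unless someone wrote and registered an `application/pdf` handler.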

After a close analysis, three actors can be identified:

1) the offline generator
2) the crawler
3) the link parser

Paul wrote all three, and they do their job very well. The problem is
that they are totally XML-unaware. And this is, IMO, a big design fault.

Let's get deeper:

1) the offline generator. The class that implements this is


I have no problem keeping this as it is, but suggestions are welcome.

2) the crawler (Paul called it "sitewalker", but I much prefer "crawler").

Paul identified the need for multiple crawlers to generate a site. Is
this flexibility syndrome? Should each target have one crawler? Should
we have more than one entry point?

3) the link parser.

This is the most important design decision, and I believe that Paul's
idea of MIME-driven link parsing, while clever, may become very
dangerous. Suppose we generate FO + SVG: do we have to parse them back
to get the links? Do we have to write a link parser for every meaningful
MIME type our formatters support?

I still believe XLink is the solution.

Cocoon must be able to recognize crawlers and give them the "original"
XML view of the file, before adaptation.
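To illustrate why XLink would remove the need for per-MIME link parsers: if the crawler receives the original XML view, a single namespace-aware scan finds every link regardless of which vocabulary the page uses. This is a minimal Python sketch that assumes links are marked with XLink 1.0 `xlink:href` attributes; the `<page>`/`<ref>`/`<figure>` vocabulary and the `xlink_targets` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# ElementTree spells namespaced attributes as "{namespace-uri}local-name".
# Namespace URI as defined by the W3C XLink 1.0 Recommendation.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def xlink_targets(xml_text: str) -> List[str]:
    """Return every xlink:href value in document order, whatever the element names are."""
    root = ET.fromstring(xml_text)
    return [el.get(XLINK_HREF) for el in root.iter() if el.get(XLINK_HREF) is not None]


# Hypothetical "original" XML view of a page, before any adaptation.
ORIGINAL_VIEW = """<page xmlns:xlink="http://www.w3.org/1999/xlink">
  <title>Hello</title>
  <para>See the <ref xlink:type="simple" xlink:href="/data/report">report</ref>.</para>
  <figure xlink:type="simple" xlink:href="/images/logo.svg"/>
</page>"""

print(xlink_targets(ORIGINAL_VIEW))  # ['/data/report', '/images/logo.svg']
```

The same function would work unchanged on any other XML vocabulary, which is the point: the link parser depends on XLink, not on the MIME type the page is eventually serialized to.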

But how can Cocoon enforce the creation of a semantic view before

I believe the sitemap needs to be changed to allow this, but I still
don't know how to do it.

Something like

<process uri="hello">
  <generator name="file">
   <parameter name="location" value="../hello.xml"/>
  </generator>
  <filter name="xslt">
   <parameter name="stylesheet" value="..."/>
  </filter>
  <serializer name="html">
   <parameter name="contentType" value="text/html"/>
  </serializer>
</process>

<process uri="data/report">
  <generator name="file">
   <parameter name="location" value="../report.xsp"/>
  </generator>
  <filter name="xsp">
   <parameter name="logicsheet" value="..."/>
  </filter>
  <filter name="rdf-izer"/>
  <filter name="xslt">
   <parameter name="stylesheet" value="..."/>
  </filter>
  <serializer name="html">
   <parameter name="contentType" value="text/html"/>
  </serializer>
</process>

which indicates -clearly- the difference between an original XML source
and some adapted view (which is optional, of course).

This is needed because the generator/filter/serializer chain doesn't
clearly indicate _where_ semantic information is added, transformed or
lost, so we must indicate it explicitly.
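The split the proposed sitemap is meant to express, an explicit boundary between the stages that add or transform meaning and the stages that only adapt it for presentation, can be modelled roughly as below. This is a hedged Python sketch, not Cocoon sitemap syntax: `Pipeline`, `semantic`, `adaptation`, `for_crawler` and the string-wrapping `tag` stages are hypothetical toys standing in for real generators, filters and serializers. How Cocoon would actually recognize a crawler request is still the open question; here it is just a boolean flag.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Stage = Callable[[str], str]


@dataclass
class Pipeline:
    """Hypothetical model of a <process> entry with an explicit semantic/adaptation boundary."""

    semantic: List[Stage]  # generator + filters that add or transform meaning
    adaptation: List[Stage] = field(default_factory=list)  # presentation-only stages

    def run(self, source: str, for_crawler: bool = False) -> str:
        doc = source
        for stage in self.semantic:
            doc = stage(doc)
        if for_crawler:
            return doc  # the "original" XML view, before adaptation
        for stage in self.adaptation:
            doc = stage(doc)
        return doc


def tag(name: str) -> Stage:
    """Toy stage: wraps the document in <name>...</name>."""
    return lambda doc: f"<{name}>{doc}</{name}>"


# Loosely mirrors data/report: xsp + rdf-izer are semantic, xslt/html is adaptation.
report = Pipeline(semantic=[tag("xsp"), tag("rdf")], adaptation=[tag("html")])
print(report.run("data"))                    # <html><rdf><xsp>data</xsp></rdf></html>
print(report.run("data", for_crawler=True))  # <rdf><xsp>data</xsp></rdf>
```

Keeping the boundary as data in the pipeline, rather than inferring it from component types, reflects the argument above: nothing about a filter's type says whether it preserves semantics, so the sitemap has to say so.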


Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<>                             Friedrich Nietzsche