cocoon-dev mailing list archives

From Stefano Mazzocchi <>
Subject Re: Retuning Sitemap Design
Date Thu, 10 Jan 2002 21:27:07 GMT
Berin Loritsch wrote:
> Stefano Mazzocchi wrote:
> > The Cocoon sitemap concept that is currently implemented is more than
> > two years old. Not much, but older than some W3C recommendations for the
> > XML model.
> Amazing, isn't it?
> I am sorry it took me so long to jump in here, but I do want to make a
> few points here:
> 1) Our focus could be better spent by virtualizing the Sitemap.  This
>     approach allows us to explore different markups and possibilities
>     without tying ourselves to one implementation.  The role of a Sitemap
>     in its simplest form is to map a Request to a Response.

Hmmm, I disagree.

Our focus would *not* be better spent by virtualizing something so
important as the sitemap.

Before making it possible for everybody to add their own markup language
to something so critical, I want to get the impressions and the design right
first, and complete the documentation later.

I'm pretty sure every one of us could come up with another
request-to-pipeline mapping language (not even in XML syntax!), but that
would *reduce* focus and would clearly not help Cocoon to solidify.

Yes, guys, the direction is "solidify", not 'expand into ultimate

Don't get me wrong, I love to design and I love to innovate, but without
solidity we don't get users, without users we don't get feedback, and
without feedback our stuff gets academic and misses the point.

Now it's time to stabilize: making the sitemap engine pluggable at this
time *hurts* the project rather than helping it.

> 2) By forcing the semantics of the Sitemap to be procedural instead of
>     declarative, we have in effect created a mini scripting language.
>     This does not help administrators.

Again, I disagree. The sitemap semantics mix declarativity (for
example, matchers) and procedurality (the direct pipeline components).

I recently had to go back to writing some httpd.conf configurations (which
are completely declarative) for a site, and I can't tell you how much I
missed the sitemap semantics that allowed me to say exactly what I wanted.

BTW, Apache 2.0 will very likely require more procedurality in order to
drive its layered I/O (i.e. pipelines of modules).

Moreover, who said web administrators should be the ones who write a

> 3) The Sitemap in its current incarnation requires too much systemic
>     knowledge of the administrator.  The reason the strict filesystem-to-URI
>     mapping is so popular among vendors is that it is so easy to maintain.
>     We already know the dangers of that approach when a site reorganization
>     has to be made, though.

Again, the one-to-one URI-to-file mapping is a hack. A big-time hack that
generates many more problems than it solves! If you have been around the
web enough, you know what I'm talking about.

The sitemap *removes* this power from their hands. And this is a

Now an administrator knows that Cocoon is handling part of the entire
URI space, and this is all they have to know about it.

Somebody else takes care of the sitemaps and delegates the various
sitemaps to the people responsible for that. This rebalances the

> All your points are very important, and should be addressed in the current
> implementation of the Sitemap.  I also agree that all Sitemaps should be validated,
> or at least that we provide a tool to validate them against a webapp context.

> However, getting back to point 1, by virtualizing the Sitemap we make the
> provision that a configuration for the Sitemap (i.e. pipeline declaration)
> is implementation-dependent.  This allows us to support the current procedural
> <matcher> and <selector> oriented Sitemap for legacy situations, but to
> explore more declarative approaches in future implementations (point 2).

A sitemap is a contract.

Big-time contract. Probably the most important one we currently ship with
Cocoon. For sure, the most used.

By suggesting that this contract be made 'pluggable' for the sake of
innovation, you are basically proposing the equivalent of using introspection
on the pipeline component interfaces so that the interfaces themselves can be
made pluggable, making it easier to innovate.

Apply this to Avalon and you know what I mean.

I'd rather patch the current contract than provide a flexible one.

Sounds *very* similar to FS (flexibility syndrome) to me.

> It is my observation that 60-70% of all pipelines depend strictly
> on the URI space in real life situations.  By providing for a markup
> like this:
> <map:pipelines>
>    <map:pipeline uri="re:foo\/bar-\([0-9]*\).html">
>      <map:generate src="docs/foo/bar-{1}.xml"/>
>      <map:transform/>
>      <map:serialize/>
>    </map:pipeline>
>    <map:pipeline uri="wildcard:foo/bar/baz-*.html">
>      <map:generate src="docs/foo/bar/baz-{1}.xml"/>
>      <map:transform/>
>      <map:serialize/>
>    </map:pipeline>
> </map:pipelines>
> We allow the Sitemap implementation to expand all the valid URIs at initialization
> time, and preassemble the Pipelines so that they can be accessed by a simple
> Hashmap lookup.

Tell me: why is this not possible with the current matchers?

> Assuming the drive space looks like this:
> docs/foo/
>      bar-1.xml
>      bar-500.xml
>      bar-zztop.xml
>      bar/
>        baz-1.xml
>        baz-500.xml
>        baz-zztop.xml
> the URI space in the Map would be like this:
> foo/bar-1.html
> foo/bar-500.html
> foo/bar/baz-1.html
> foo/bar/baz-500.html
> foo/bar/baz-zztop.html
> If you notice, foo/bar-zztop.html will never be matched because "zztop" is not
> a string of numbers.
> This allows a very quick test of which resource to return--and if there are no
> matching resources!  It also allows the validator to test if there are any files
> in your context that are dead weight.
> A HashMap lookup is far more efficient than the procedural approach encouraged
> today.

I think you are mistaking an implementation detail for a design issue.
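As a rough illustration of the pre-expansion Berin proposes, and of how it could live entirely inside an engine implementation, here is a minimal Java sketch: each rule's source template is matched against a file listing once at startup, a candidate URI is kept only if the rule's own URI pattern accepts it, and request dispatch becomes a single HashMap lookup. The Rule record, the {1} template convention and the hand-translated wildcard are assumptions for illustration, not Cocoon APIs; the regex reads Berin's escaped parentheses as a capturing group, and the sample data is the drive listing quoted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipelineTableSketch {

    /** Hypothetical rule: a URI pattern, the URI it builds from {1}, and its source file. */
    record Rule(Pattern uriPattern, String uriTemplate, String srcTemplate) {}

    // Sample data taken from the quoted rules and drive listing.
    static final List<Rule> SAMPLE_RULES = List.of(
        new Rule(Pattern.compile("foo/bar-([0-9]*)\\.html"),
                 "foo/bar-{1}.html", "docs/foo/bar-{1}.xml"),
        // the wildcard "foo/bar/baz-*.html", written as an equivalent regex
        new Rule(Pattern.compile("foo/bar/baz-([^/]*)\\.html"),
                 "foo/bar/baz-{1}.html", "docs/foo/bar/baz-{1}.xml"));

    static final List<String> SAMPLE_FILES = List.of(
        "docs/foo/bar-1.xml", "docs/foo/bar-500.xml", "docs/foo/bar-zztop.xml",
        "docs/foo/bar/baz-1.xml", "docs/foo/bar/baz-500.xml", "docs/foo/bar/baz-zztop.xml");

    /** Turns "docs/foo/bar-{1}.xml" into a regex whose group 1 captures the {1} part. */
    static Pattern templateToRegex(String template) {
        int i = template.indexOf("{1}");
        return Pattern.compile(Pattern.quote(template.substring(0, i)) + "([^/]*)"
                + Pattern.quote(template.substring(i + 3)));
    }

    /** Expands every rule against the known files once, at initialization time. */
    static Map<String, String> expand(List<Rule> rules, List<String> files) {
        Map<String, String> table = new HashMap<>();
        for (Rule rule : rules) {
            Pattern src = templateToRegex(rule.srcTemplate());
            for (String file : files) {
                Matcher m = src.matcher(file);
                if (!m.matches()) {
                    continue;
                }
                String uri = rule.uriTemplate().replace("{1}", m.group(1));
                // Keep the URI only if the rule's own URI pattern accepts it.
                if (rule.uriPattern().matcher(uri).matches()) {
                    table.putIfAbsent(uri, file); // first matching rule wins
                }
            }
        }
        return table;
    }

    /** Files that no URI maps to: the "dead weight" a validator could report. */
    static List<String> deadWeight(Map<String, String> table, List<String> files) {
        List<String> unused = new ArrayList<>(files);
        unused.removeAll(table.values());
        return unused;
    }

    public static void main(String[] args) {
        Map<String, String> table = expand(SAMPLE_RULES, SAMPLE_FILES);
        // Request-time dispatch is a single hash lookup.
        System.out.println(table.get("foo/bar-500.html"));   // docs/foo/bar-500.xml
        System.out.println(table.get("foo/bar-zztop.html")); // null
        System.out.println(deadWeight(table, SAMPLE_FILES)); // [docs/foo/bar-zztop.xml]
    }
}
```

Note that nothing in this table changes what a matcher means to the sitemap author; it is only a different way for an engine to evaluate the same rules, which is why it can be read as an implementation detail.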

> For the 30-40% of the time when you truly need dynamic pipelines, they can be
> achieved by non-URI-related matches inside of a pipeline, applied only to one or
> more components in the pipeline.
> An additional benefit of the declarative approach is that you clearly demarcate
> what you expect a given pipeline to be, so that pipelines whose sole role is within
> an Aggregation do not have to supply a serializer.

Stop right here! 

I totally agree that since the aggregating semantics were added *after*
the initial design phase, the sitemap lacks the semantics to express the
difference between internal and external pipelines.

But again, I'd rather patch what we already have than make it
possible for everybody to create a new sitemap.

BTW, remember: we declared a versioned sitemap namespace exactly to
allow innovation to take place without breaking backward compatibility.
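To make the missing internal/external distinction concrete, here is a minimal Java sketch, assuming a hypothetical per-pipeline "internal only" flag (not existing sitemap syntax): an external pipeline must declare a serializer, an internal one used only inside an aggregation may omit it and hand on SAX events, and an internal-only pipeline refuses external requests. The record and method names are illustrative assumptions.

```java
import java.util.Optional;

public class PipelineKindSketch {

    /** Hypothetical pipeline declaration; serializer is empty when none is declared. */
    record PipelineDecl(String name, boolean internalOnly, Optional<String> serializer) {

        /** External pipelines must end in a serializer; internal ones may hand SAX events on. */
        void validate() {
            if (!internalOnly && serializer.isEmpty()) {
                throw new IllegalStateException(
                    "external pipeline '" + name + "' declares no serializer");
            }
        }

        /** An internal-only pipeline answers internal requests (e.g. aggregation) only. */
        boolean accepts(boolean externalRequest) {
            return !(internalOnly && externalRequest);
        }
    }

    public static void main(String[] args) {
        PipelineDecl part = new PipelineDecl("news-part", true, Optional.empty());
        part.validate();                          // passes: no serializer needed
        System.out.println(part.accepts(true));   // false: hidden from the outside
        System.out.println(part.accepts(false));  // true: reachable from an aggregation

        PipelineDecl page = new PipelineDecl("news-page", false, Optional.empty());
        try {
            page.validate();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());   // external pipeline 'news-page' declares no serializer
        }
    }
}
```

Keeping the flag explicit, rather than inferring "internal" from the absence of a serializer, follows the same preference for verbosity that keeps the author aware of what each pipeline is for.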

> Another addition to the procedural approach to minimize the impact of point 3
> is to select the serializer depending on the expected mime-type and source of
> the input.  For instance, a pipeline with a mime-type attribute of "image/png"
> and a source of Reader will not choose a Serializer.  However, the same mime-type
> with a SAX source would choose the SVG Serializer.

I thought about this *a lot*, and I think that automating pipeline
assembly might become a source of unexpected behavior, *very* hard to

My current perception is that this sort of 'componentization help'
should come from a sitemap authoring tool and not by the interpreting
engine itself.

But I have to admit I don't have a clear vision of this myself.
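To make Berin's serializer-selection rule concrete, here is a minimal Java sketch, assuming a two-valued source classification (Reader vs SAX) and a small mime-type table; the enum, the table and the serializer names are hypothetical placeholders, not Cocoon component names. It shows only the rule quoted above: a Reader source gets no serializer, a SAX source gets one looked up by the expected mime-type.

```java
import java.util.Map;
import java.util.Optional;

public class SerializerChoiceSketch {

    /** Hypothetical classification of where a pipeline's content comes from. */
    enum SourceKind { READER, SAX }

    /** Illustrative mime-type to serializer-name table; the names are placeholders. */
    static final Map<String, String> SERIALIZER_FOR_MIME = Map.of(
        "image/png", "svg",
        "text/html", "html",
        "text/xml", "xml");

    /**
     * A Reader already produces the final bytes, so no serializer is chosen;
     * a SAX source gets one picked from the expected mime-type.
     * An unlisted mime-type yields empty rather than a guess.
     */
    static Optional<String> chooseSerializer(String mimeType, SourceKind source) {
        if (source == SourceKind.READER) {
            return Optional.empty();
        }
        return Optional.ofNullable(SERIALIZER_FOR_MIME.get(mimeType));
    }

    public static void main(String[] args) {
        System.out.println(chooseSerializer("image/png", SourceKind.READER)); // Optional.empty
        System.out.println(chooseSerializer("image/png", SourceKind.SAX));    // Optional[svg]
    }
}
```

Returning an empty result for an unlisted mime-type, instead of falling back to some default, is one way to keep the "unexpected behavior" concern visible: the engine refuses to guess, and the choice is pushed back to the author or an authoring tool.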

> By automagically determining the serializer types, you never have to explicitly
> declare them making all pipelines whether they are part of an Aggregation or not
> appear similar.

Remember that sometimes verbosity *is* useful! In this case, an explicit
serializer forces the author to think about what's going on.

> Currently, by forcing an explicit matching of all parts of a pipeline, you force
> the administrator to know too much of the Cocoon domain.  That is counter-productive
> when an administrator's job is defined as managing URI space and ensuring the system
> is running.

This is the most important point: the administrator's job IS NOT TO
DEFINE THE URI SPACE; it is to make sure that the server runs as expected.

The URI space is the most visible and important contract on the server.

It's sublimated *content* in itself.

Why in hell should you give the same person control over both the top-most
and the bottom-most parts of a system?

It would be like letting the printers of a newspaper decide the
article titles.

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<>                             Friedrich Nietzsche
