cocoon-dev mailing list archives

From Berin Loritsch <blorit...@apache.org>
Subject Re: Retuning Sitemap Design
Date Thu, 10 Jan 2002 15:03:37 GMT
Stefano Mazzocchi wrote:

> The Cocoon sitemap concept that is currently implemented is more than
> two years old. Not much, but older than some W3C recommendations for the
> XML model.


Amazing isn't it?

I am sorry it took me so long to jump in, but I do want to make a
few points:

1) Our focus could be better spent by virtualizing the Sitemap.  This
    approach allows us to explore different markups and possibilities
    without tying ourselves to one implementation.  The Role of a Sitemap
    in its simplest form is to map a Request to a Response.

2) By forcing the semantics of the Sitemap to be procedural instead of
    declarative, we have in effect created a mini scripting language.
    This does not help administrators.

3) The Sitemap in its current incarnation requires too much systemic
    knowledge of the administrator. The reason the strict filesystem-to-URI
    mapping is so popular among vendors is that it is so easy to maintain.
    We already know the dangers of that approach when a site has to be
    reorganized, though.

All your points are very important, and they should fix the current
implementation of the Sitemap.  I also agree that all Sitemaps should be
validated, or at least that we should provide a tool to validate them
against a webapp context.

However, getting back to point 1, by virtualizing the Sitemap we make the
provision that a configuration for the Sitemap (i.e. the pipeline declaration)
is implementation dependent.  This allows us to support the current procedural
<matcher> and <selector> oriented Sitemap for legacy situations, and to
explore more declarative approaches in future implementations (point 2).
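To make "virtualizing" a little more concrete, here is a rough Java sketch of
what I have in mind. Every name in it (Sitemap, ProceduralSitemap,
DeclarativeSitemap, resolve) is hypothetical and not existing Cocoon API, and
a "response" is reduced to the source path a pipeline would generate from. The
point is only that callers see one contract while the markup behind it varies.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical contract: in its simplest form, map a request to a response.
// Here the "response" is just the source the pipeline would start from.
interface Sitemap {
    Optional<String> resolve(String requestUri);
}

// Stand-in for the current style: rules are tried one by one, in order.
class ProceduralSitemap implements Sitemap {
    private final Map<String, Function<String, String>> rules = new LinkedHashMap<>();

    void addRule(String uriPrefix, Function<String, String> toSource) {
        rules.put(uriPrefix, toSource);
    }

    @Override
    public Optional<String> resolve(String requestUri) {
        for (Map.Entry<String, Function<String, String>> rule : rules.entrySet()) {
            if (requestUri.startsWith(rule.getKey())) {
                return Optional.of(rule.getValue().apply(requestUri));
            }
        }
        return Optional.empty();
    }
}

// Stand-in for a declarative style: the whole URI space is known up front.
class DeclarativeSitemap implements Sitemap {
    private final Map<String, String> uriToSource;

    DeclarativeSitemap(Map<String, String> uriToSource) {
        this.uriToSource = Map.copyOf(uriToSource);
    }

    @Override
    public Optional<String> resolve(String requestUri) {
        return Optional.ofNullable(uriToSource.get(requestUri));
    }
}

class SitemapDemo {
    public static void main(String[] args) {
        ProceduralSitemap legacy = new ProceduralSitemap();
        legacy.addRule("foo/", uri -> "docs/" + uri.replace(".html", ".xml"));
        Sitemap declared =
            new DeclarativeSitemap(Map.of("foo/bar-1.html", "docs/foo/bar-1.xml"));

        // The caller does not care which implementation it is talking to.
        for (Sitemap sitemap : new Sitemap[] { legacy, declared }) {
            System.out.println(sitemap.resolve("foo/bar-1.html"));
        }
    }
}
```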

It is my observation that 60-70% of all pipelines are dependent strictly
on the URI space in real-life situations.  By providing for a markup
like this:

<map:pipelines>
   <map:pipeline uri="re:foo\/bar-\([0-9]*\).html">
     <map:generate src="docs/foo/bar-{1}.xml"/>
     <map:transform/>
     <map:serialize/>
   </map:pipeline>

   <map:pipeline uri="wildcard:foo/bar/baz-*.html">
     <map:generate src="docs/foo/bar/baz-{1}.xml"/>
     <map:transform/>
     <map:serialize/>
   </map:pipeline>
</map:pipelines>


we allow the Sitemap implementation to expand all the valid URIs at initialization
time, and preassemble the Pipelines so that they can be accessed by a simple
HashMap lookup.

Assuming the drive space looks like this:

docs/foo/
     bar-1.xml
     bar-500.xml
     bar-zztop.xml

     bar/
       baz-1.xml
       baz-500.xml
       baz-zztop.xml

the URI space in the Map would be like this:

foo/bar-1.html
foo/bar-500.html
foo/bar/baz-1.html
foo/bar/baz-500.html
foo/bar/baz-zztop.html

Notice that foo/bar-zztop.html will never be matched, because "zztop" is not
a string of digits.
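The expansion above could be sketched roughly as follows. This is an
illustration under assumptions, not a proposed implementation: the class and
method names are made up, the file listing is an in-memory list rather than a
real directory scan, each pipeline is written "inverted" as a regex over source
files plus a format string for the public URI, and the wildcard "*" is assumed
not to cross a "/".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UriSpaceSketch {
    // One declared pipeline, inverted: a regex over source files plus a
    // format string that rebuilds the public URI from the captured group.
    static final class Rule {
        final Pattern source;
        final String uriFormat;

        Rule(String sourceRegex, String uriFormat) {
            this.source = Pattern.compile(sourceRegex);
            this.uriFormat = uriFormat;
        }
    }

    // Done once at initialization: every source file that some rule accepts
    // contributes exactly one URI to the map.
    static Map<String, String> expand(List<Rule> rules, List<String> files) {
        Map<String, String> uriToSource = new HashMap<>();
        for (String file : files) {
            for (Rule rule : rules) {
                Matcher m = rule.source.matcher(file);
                if (m.matches()) {
                    uriToSource.put(String.format(rule.uriFormat, m.group(1)), file);
                    break; // first matching pipeline wins
                }
            }
        }
        return uriToSource;
    }

    static List<Rule> exampleRules() {
        return List.of(
            // re:foo/bar-([0-9]*).html  ->  docs/foo/bar-{1}.xml
            new Rule("docs/foo/bar-([0-9]*)\\.xml", "foo/bar-%s.html"),
            // wildcard:foo/bar/baz-*.html  ->  docs/foo/bar/baz-{1}.xml
            new Rule("docs/foo/bar/baz-([^/]*)\\.xml", "foo/bar/baz-%s.html"));
    }

    static List<String> exampleFiles() {
        return List.of(
            "docs/foo/bar-1.xml", "docs/foo/bar-500.xml", "docs/foo/bar-zztop.xml",
            "docs/foo/bar/baz-1.xml", "docs/foo/bar/baz-500.xml",
            "docs/foo/bar/baz-zztop.xml");
    }

    public static void main(String[] args) {
        Map<String, String> map = expand(exampleRules(), exampleFiles());
        // Sorted copy only for readable printing; lookups use the HashMap.
        new TreeMap<>(map).forEach((uri, src) -> System.out.println(uri + " <- " + src));
        System.out.println(map.containsKey("foo/bar-zztop.html"));
    }
}
```

At request time the work is then a single map.get(uri), and a null result means
no resource matches.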

This allows a very quick test of which resource to return, or of whether no
resource matches at all.  It also allows the validator to test if there are any
files in your context that are dead weight.
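The dead-weight check could look something like the sketch below. Again this
is only an assumption about how a validator might work: DeadWeightSketch and
unmapped are invented names, the file list is in memory, and the source
patterns are the hand-inverted forms of the two pipelines in the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class DeadWeightSketch {
    // Returns the files that no declared pipeline would ever read.
    static List<String> unmapped(List<Pattern> sourcePatterns, List<String> files) {
        List<String> dead = new ArrayList<>();
        for (String file : files) {
            boolean used = false;
            for (Pattern pattern : sourcePatterns) {
                if (pattern.matcher(file).matches()) {
                    used = true;
                    break;
                }
            }
            if (!used) {
                dead.add(file);
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = List.of(
            Pattern.compile("docs/foo/bar-([0-9]*)\\.xml"),
            Pattern.compile("docs/foo/bar/baz-([^/]*)\\.xml"));
        List<String> files = List.of(
            "docs/foo/bar-1.xml", "docs/foo/bar-500.xml", "docs/foo/bar-zztop.xml",
            "docs/foo/bar/baz-1.xml", "docs/foo/bar/baz-500.xml",
            "docs/foo/bar/baz-zztop.xml");
        // Only the file that no pipeline can reach is reported.
        System.out.println(unmapped(patterns, files));
    }
}
```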

A HashMap lookup is far more efficient than the procedural approach encouraged
today.

For the 30-40% of the time when you truly need dynamic pipelines, that can be
achieved with non-URI-related matches inside a pipeline, which are applied only
to one or more components in that pipeline.

An additional benefit of the declarative approach is that you clearly demarcate
what you expect a given pipeline to be, so that pipelines whose sole role is within
an Aggregation do not have to supply a serializer.

Another addition to the procedural approach to minimize the impact of point 3
is to select the serializer depending on the expected mime-type and source of
the input.  For instance, a pipeline with a mime-type attribute of "image/png"
and a source of Reader will not choose a Serializer.  However, the same mime-type
with a SAX source would choose the SVG Serializer.

By automagically determining the serializer types, you never have to declare
them explicitly, making all pipelines, whether they are part of an Aggregation
or not, appear similar.
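A minimal sketch of that selection rule, assuming a pipeline knows its declared
mime-type and whether its input arrives from a Reader or as SAX events. The
enum, the method, and the serializer names ("svg2png", "html", "xml") are
placeholders chosen for illustration, not a statement of what Cocoon ships.

```java
import java.util.Optional;

class SerializerChoiceSketch {
    enum SourceKind { READER, SAX }

    // Picks a serializer name from the declared mime-type and the input kind.
    // An empty result means "no serializer needed".
    static Optional<String> chooseSerializer(String mimeType, SourceKind source) {
        if (source == SourceKind.READER) {
            // A Reader already produces the final bytes, so nothing to serialize.
            return Optional.empty();
        }
        switch (mimeType) {
            case "image/png":
                return Optional.of("svg2png"); // SAX input rendered as an image
            case "text/html":
                return Optional.of("html");
            case "text/xml":
                return Optional.of("xml");
            default:
                throw new IllegalArgumentException("no serializer known for " + mimeType);
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseSerializer("image/png", SourceKind.READER));
        System.out.println(chooseSerializer("image/png", SourceKind.SAX));
    }
}
```

Failing loudly on an unknown mime-type is a design choice in this sketch; a
validator could report the same condition at initialization time instead.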

Currently, by forcing an explicit matching of all parts of a pipeline, you force
the administrator to know too much of the Cocoon domain.  That is counter-productive
when an administrator's job is defined as managing URI space and ensuring the system
is running.

-- 

"They that give up essential liberty to obtain a little temporary safety
  deserve neither liberty nor safety."
                 - Benjamin Franklin

