cocoon-dev mailing list archives

From Berin Loritsch
Subject Re: Matchers vs. Selectors [was Re: Retuning Sitemap Design]
Date Tue, 15 Jan 2002 20:23:15 GMT
Stefano Mazzocchi wrote:

> Berin Loritsch wrote:
>>My approach assumes that the regexp is used in the unrolling process.  I.e.
>>when the sitemap is set up, the sitemap checks for files that match the
>>patterns in the "src" attributes like this:
>>     store-1001/
>>        section1/index.html
>>        section2/index.html
>>     store-2056/
>>        section1/index.html
>>        section2/index.html
>>Such a directory structure will unroll to the following lookup values in
>>the hashmap:
>>So that during the runtime of the system, the request is easily matched
>>with a simple get on a hashmap.
>>This also provides another layer of validation: No uri is added that
>>cannot be resolved--offering a quick 404 detection.
> ????
> The above is possible *only* if you assume that Cocoon already knows
> all the URIs that it will serve, but this is *almost never* the case,
> unless you go back to the URI-2-filesystem type of matching that is
> *exactly* what I want to stay away from.

Notice the key word "unrolling".  The resources are dynamically unrolled
when the sitemap is created.  It is possible to figure out the actual
URIs from the combination of pattern and "src" attribute looking at the

> Think of a wildcard matching of
>  movies/*/*/*
> where
>  {1} is a country
>  {2} is a city
>  {3} is a movie-title
> how in hell are you going to 'unroll' this?
> I'm very puzzled.

You are missing the key pieces.  You need three pieces of information in
order to unroll a URI.

1) the pattern
2) the src attribute on the generator/transformer(s)
3) the resource location

Are there some things that can't be unrolled?  Sure, but a great many
things can.  In many installations, a URI will represent a collection of
resources on the hard drive that the sitemap then assembles into the final

With your pattern given above, we might have something like this:

<map:match pattern="movies/*/*/*">
   <map:generate src="docs/movies/{1}/{2}/{3}.xml"/>
   <map:transform type="i18n" src="{1}"/>
   <map:transform src="site2xhtml.xsl"/>
   <map:serialize/>
</map:match>

We now have two of the three parts.
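To make those two parts concrete, here is a minimal Java sketch of how the pattern and the "src" attribute combine. The wildcard pattern captures {1}, {2} and {3} from a request URI, and those values are then substituted into the src template. The class and method names (WildcardSketch, match, substitute) are hypothetical. The sketch assumes each "*" matches within a single path segment and never crosses a "/". It is an illustration, not Cocoon's own matcher code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardSketch {

    // Matches {1}, {2}, ... placeholders in a src template.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\d+)\\}");

    // Turns "movies/*/*/*" into a regex. Each '*' becomes a capture group that
    // stays inside one path segment (assumption: '*' never crosses a '/').
    public static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(Pattern.quote(literal.toString())).append("([^/]*)");
                literal.setLength(0);
            } else {
                literal.append(c);
            }
        }
        regex.append(Pattern.quote(literal.toString()));
        return Pattern.compile(regex.toString());
    }

    // Returns the captured values for {1}..{n}, or null if the URI does not match.
    public static List<String> match(String wildcard, String uri) {
        Matcher m = compile(wildcard).matcher(uri);
        if (!m.matches()) {
            return null;
        }
        List<String> captures = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            captures.add(m.group(i));
        }
        return captures;
    }

    // Replaces each {n} with captures.get(n - 1) in a single pass. A captured
    // value that itself contains "{2}" is therefore not expanded again.
    public static String substitute(String template, List<String> captures) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            String value = (n >= 1 && n <= captures.size()) ? captures.get(n - 1) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> captures = match("movies/*/*/*", "movies/italy/rome/roman-holiday");
        System.out.println(captures);
        System.out.println(substitute("docs/movies/{1}/{2}/{3}.xml", captures));
    }
}
```

Running main prints [italy, rome, roman-holiday] followed by docs/movies/italy/rome/roman-holiday.xml. That gives the src file a request would need. Whether the file actually exists depends on the third piece, the resource location.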

The unrolling algorithm would perform the following steps:

1) look in ${context}/docs/movies and get the set of all directories
        (the candidates for {1}; there is another level below them)
2) look in each directory in the set for subdirectories (the
        candidates for {2}) and refine the set to include all matches
        so far, discarding false matches from the previous step.
3) look in each directory in the new set for all "*.xml" files
        (the candidates for {3}), refining the set to include only
        definite matches.

That approach is the monolithic "all in advance" algorithm.
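Below is a minimal Java sketch of this "all in advance" walk. It makes several assumptions. The ${context} root is passed in as a Path. The resulting URIs are built from a URI template, "movies/{1}/{2}/{3}", that reuses the same captures as the src template. Each placeholder appears at most once in the src template. UnrollSketch, unroll, walk and fill are hypothetical names. The sketch walks the tree depth-first instead of building an explicit set per level, but it keeps and discards the same candidates as the three steps above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnrollSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\d+)\\}");

    // Returns every URI (built from uriTemplate) whose src file exists under context.
    public static Map<String, Path> unroll(Path context, String srcTemplate, String uriTemplate)
            throws IOException {
        Map<String, Path> unrolled = new TreeMap<>();
        walk(context, srcTemplate.split("/"), 0, new TreeMap<>(), uriTemplate, unrolled);
        return unrolled;
    }

    private static void walk(Path current, String[] segments, int depth,
                             Map<Integer, String> bound, String uriTemplate,
                             Map<String, Path> out) throws IOException {
        if (depth == segments.length) {
            // Step 3: only an existing regular file is a definite match.
            if (Files.isRegularFile(current)) {
                out.put(fill(uriTemplate, bound), current);
            }
            return;
        }
        String segment = segments[depth];
        if (!PLACEHOLDER.matcher(segment).find()) {
            // Literal segment such as "docs" or "movies": descend directly.
            walk(current.resolve(segment), segments, depth + 1, bound, uriTemplate, out);
            return;
        }
        if (!Files.isDirectory(current)) {
            return; // false match from the previous step: discard it
        }
        // Regex for this segment: "{3}.xml" matches "roman-holiday.xml", binding {3}.
        List<Integer> indexes = new ArrayList<>();
        StringBuilder regex = new StringBuilder();
        Matcher ph = PLACEHOLDER.matcher(segment);
        int literalStart = 0;
        while (ph.find()) {
            regex.append(Pattern.quote(segment.substring(literalStart, ph.start()))).append("(.+)");
            indexes.add(Integer.parseInt(ph.group(1)));
            literalStart = ph.end();
        }
        regex.append(Pattern.quote(segment.substring(literalStart)));
        Pattern segmentPattern = Pattern.compile(regex.toString());
        boolean last = depth == segments.length - 1;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(current)) {
            for (Path entry : entries) {
                // Steps 1 and 2: intermediate levels must be directories.
                if (!last && !Files.isDirectory(entry)) {
                    continue;
                }
                Matcher m = segmentPattern.matcher(entry.getFileName().toString());
                if (!m.matches()) {
                    continue;
                }
                Map<Integer, String> next = new TreeMap<>(bound);
                for (int g = 0; g < indexes.size(); g++) {
                    next.put(indexes.get(g), m.group(g + 1));
                }
                walk(entry, segments, depth + 1, next, uriTemplate, out);
            }
        }
    }

    // Substitutes the bound {n} values into a template in one pass.
    static String fill(String template, Map<Integer, String> bound) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = bound.getOrDefault(Integer.parseInt(m.group(1)), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        Path context = Paths.get(args.length > 0 ? args[0] : ".");
        unroll(context, "docs/movies/{1}/{2}/{3}.xml", "movies/{1}/{2}/{3}")
            .forEach((uri, file) -> System.out.println(uri + " -> " + file));
    }
}
```

The cost is a full directory scan whenever the sitemap is set up. That cost is the reason for the incremental alternative.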
An alternative algorithm, one that is less time consuming and that
also works for resources that are not on the drive, is the incremental
approach.

The sitemap always checks the unrolled URIs first.  If the URI
is not in there, it tries to resolve it the hard way.  When a
match is found, the requested URI is added to the unrolled
URI map.

This approach is preferred, as it favors real matches and speeds up
matching of those resources.  Bad URIs will always take longer to
process, but that is OK: we want to favor good URIs.
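The incremental approach amounts to a hash map placed in front of a slow resolver. Below is a minimal Java sketch. The slowResolver function is hypothetical and stands in for "resolving it the hard way", for example a wildcard match followed by a check that the file exists. IncrementalUnrollSketch and its methods are illustrative names too. Only successful lookups are remembered, so bad URIs never enter the map and always take the slow path.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class IncrementalUnrollSketch {

    // URI -> source location for every URI that has resolved successfully.
    private final Map<String, String> unrolled = new ConcurrentHashMap<>();
    // Hypothetical stand-in for resolving a URI "the hard way".
    private final Function<String, Optional<String>> slowResolver;
    // Counts slow-path lookups so the caching behaviour can be observed.
    private final AtomicInteger slowLookups = new AtomicInteger();

    public IncrementalUnrollSketch(Function<String, Optional<String>> slowResolver) {
        this.slowResolver = slowResolver;
    }

    public Optional<String> resolve(String uri) {
        // Fast path: a single hash map lookup for URIs that matched before.
        String known = unrolled.get(uri);
        if (known != null) {
            return Optional.of(known);
        }
        // Slow path: resolve the hard way and remember only real matches.
        // Two threads may both take this path for the same new URI; assuming
        // the resolver is deterministic, both store the same value.
        slowLookups.incrementAndGet();
        Optional<String> found = slowResolver.apply(uri);
        found.ifPresent(source -> unrolled.put(uri, source));
        return found;
    }

    public int slowLookups() {
        return slowLookups.get();
    }

    public static void main(String[] args) {
        // Hypothetical file set standing in for the context directory.
        Set<String> files = new HashSet<>(Arrays.asList("docs/movies/italy/rome/roman-holiday.xml"));
        IncrementalUnrollSketch sitemap = new IncrementalUnrollSketch(uri -> {
            String src = "docs/" + uri + ".xml";
            return files.contains(src) ? Optional.of(src) : Optional.<String>empty();
        });
        System.out.println(sitemap.resolve("movies/italy/rome/roman-holiday")); // slow path
        System.out.println(sitemap.resolve("movies/italy/rome/roman-holiday")); // map hit
        System.out.println(sitemap.resolve("movies/nowhere/none/nothing"));     // slow, not cached
        System.out.println("slow lookups: " + sitemap.slowLookups());
    }
}
```

Because only good URIs are stored, the map grows with the set of resources that are actually requested and found. It is not filled by a scan of everything that could match.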


"They that give up essential liberty to obtain a little temporary safety
  deserve neither liberty nor safety."
                 - Benjamin Franklin
