cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject [C2] Sitemap issues to solve
Date Fri, 16 Jun 2000 18:40:39 GMT
Ok,

C2 development is stalled, mainly because the
sitemap schema is not solid enough even for a rough implementation.

We need to remove this lock as soon as possible since too many things
are waiting for C2 to happen.

In this mail I'll outline what is not yet solved and the issues that were
brought up. I'll also suggest a way to resolve them. As always, these
are my personal opinions and I count as one, like any one of you. But
please, let's try to finish this and avoid the "second generation
syndrome" where you want it to do everything and even brew your coffee
and wash your dog.

On the other hand, let's keep thinking big: Cocoon should be able to
handle the largest site in the world and still be able to scale.

                 ---------- o ------------------

Issues to solve are:

1) Component Model
------------------

We have decided to make C2 Avalon-aware. This includes the internal
component model (which the pipeline will implement) as well as the
external component model (which Cocoon will implement to access those
required blocks such as database connection pooling, logging, thread
recycling and all those fancy services you normally need).

The sitemap has the notion of a general component. I think this might be
dangerous since it could allow cascaded sitemaps to "tweak" the
component installation and open security holes. Since I don't think
sitemaps should deal with such high level components, I suggest we
remove the 

 <component>

element from the sitemap and only leave

 <generator>
 <filter>
 <serializer>
 <matcher>
 
as possible sitemap components.



2) Matching architecture
------------------------

The latest sitemap WD is based on the idea that URIs are the main
matching parameter. This is implied by the fact that a process is
declaratively hooked to a URI pattern.

 <process uri="/users/*-??/**">
  ...
 </process>

Some identified this as a dangerous asymmetry, pointing out that this is
just another form of matching, based on the URI instead of other parameters.

So, instead of the current

 <process uri="/users/*-??/**">
  <if test="browser accepts image/svg">
   ...
  </if>
 </process>

it was proposed to do

 <matcher type="uri">
  <match pattern="/users/*-??/**">
   ...
  </match>
 </matcher>

which is more verbose for URI matching, but creates a symmetry between
all possible matching capabilities.

Let's not decide which one is better for now; let's keep going.

The other part of a matching architecture is the conditional model. A
conditional model should be boolean-equivalent to be complete. This
means that any possible boolean expression, even the most complex
one, can be represented in our model. It can be shown that any syntax
that is able to encode NAND or NOR operations is boolean-complete.
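
The NAND claim is easy to check by hand. Below is a minimal sketch (the class and method names are mine, purely illustrative) that rebuilds NOT, AND and OR from NAND alone and compares them with Java's built-in operators over every input combination.

```java
// Illustrative only: shows that NAND alone is enough to express NOT, AND and OR.
final class NandDemo {
    static boolean nand(boolean a, boolean b) {
        return !(a && b);
    }

    // NOT a == a NAND a
    static boolean not(boolean a) {
        return nand(a, a);
    }

    // a AND b == NOT (a NAND b)
    static boolean and(boolean a, boolean b) {
        return not(nand(a, b));
    }

    // a OR b == (NOT a) NAND (NOT b)
    static boolean or(boolean a, boolean b) {
        return nand(not(a), not(b));
    }

    // Compares the NAND-built operators with the built-in ones on all inputs.
    static boolean matchesBuiltins() {
        boolean[] values = { false, true };
        for (boolean a : values) {
            if (not(a) != !a) {
                return false;
            }
            for (boolean b : values) {
                if (and(a, b) != (a && b) || or(a, b) != (a || b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesBuiltins());
    }
}
```

Since every boolean expression can be written with NOT, AND and OR, a syntax that can express NAND can express all of them.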

Two conditional models were proposed (excuse the pseudo-dtd)

[procedural conditional model]

 <if></if>
 [<else-if></else-if>]*
 <else></else>

and 

[declarative conditional model]

 <choose>
  [<when></when>]+
  [<default></default>]?
 </choose>

It can easily be shown that both are equivalent to a boolean-complete
model: nesting conditionals gives AND, and the else/default branch gives
NOT.

For the sake of coherence, I agree the declarative model is better suited for
sitemap schemas, even though I was the one who proposed the procedural
model.

This leads to an interesting observation:

 <matcher type="..">
  [<match test=".."></match>]+
  [<default></default>]?
 </matcher>

is substantially equivalent to the XSLT model if the matching
capability is specified

 <choose type="xpath">
  [<when test=".."></when>]+
  [<default></default>]?
 </choose>

So, my proposal is to adopt the following matching model:

 <choose type="..">
  [<when test=".."></when>]+
  [<default></default>]?
 </choose>

where

 type=".."

is an IDREF to the "choosing" component (XXX: should we call it
"chooser" instead of "matcher"? should we keep the XSLT model? should we
use "matching"?)

and

 test=".."

indicates a boolean test that must be performed. Following the XPath
model, we should use

 <choose type="browser">
  <when test="accepts('image/svg')">
  </when>
  <default>
  </default>
 </choose>

so all tests take the form

 method(pattern)

and this should map well to all programming languages. For example,
in Java it would directly call

 public boolean method(String pattern);
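
To make the mapping concrete, here is a rough sketch of how a test string of the form method(pattern) could be split and dispatched by reflection. Everything here is an assumption for illustration: the BrowserChooser class, its constructor argument, and the TestDispatcher helper are not part of any existing Cocoon API.

```java
import java.lang.reflect.Method;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical choosing component, used only for this sketch.
class BrowserChooser {
    private final String acceptHeader;

    BrowserChooser(String acceptHeader) {
        this.acceptHeader = acceptHeader;
    }

    public boolean accepts(String mimeType) {
        return acceptHeader.contains(mimeType);
    }
}

// Hypothetical helper: turns test="method(pattern)" into a Java method call.
final class TestDispatcher {
    // A method name followed by one argument, optionally in single quotes.
    private static final Pattern TEST =
            Pattern.compile("\\s*(\\w+)\\((?:'([^']*)'|([^)]*))\\)\\s*");

    static boolean evaluate(Object chooser, String test) {
        Matcher m = TEST.matcher(test);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not of the form method(pattern): " + test);
        }
        String pattern = m.group(2) != null ? m.group(2) : m.group(3);
        try {
            // Looks up: public boolean <method>(String pattern)
            Method method = chooser.getClass().getMethod(m.group(1), String.class);
            return (Boolean) method.invoke(chooser, pattern);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + m.group(1), e);
        }
    }
}
```

With a chooser built as new BrowserChooser("text/html, image/svg"), the call TestDispatcher.evaluate(chooser, "accepts('image/svg')") ends up in accepts("image/svg"). A compiled sitemap would of course emit the call directly instead of going through reflection.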

I think that, even if a little more verbose, such a conditional model is
very balanced, very flexible and easy enough for sitemap maintainers to
understand. Also, I believe my "girlfriend test" should not be our
concern since my girlfriend will never do this job anyway :-)

Seriously, the previous draft was unbalanced toward URI-reaction.
Unbalanced schemas are a good thing when you already know what the
best design patterns for sitemap generation are. But I honestly don't, and
they might not even exist given what we know today.

So, I think it's better to be as neutral as possible for the first
sitemap generation and maybe do a redesign later on.

IMPLEMENTATION NOTES:

A sitemap can be interpreted or compiled. Given the amount of "XML ->
java" code generation experience we have on this project, it makes
perfect sense, IMO, to compile the sitemaps.

This might even reuse all the XSP machinery for that and reduce
sitemap generation to just a single XSLT->Java logicsheet. This might
allow us to change the schema and adapt the sitemap-interpretation code
just as easily, but I want to hear more from experts in this area before
continuing on this (Ricardo, what do you think?).

Anyway, this doesn't impact the sitemap schema in any way.



3) Parameter percolation
------------------------

Another thing that was embedded into <process uri=""> was the pattern
paradigm used to fragment the uri into pieces that were used later down
the pipe.

This is a _vital_ feature that allows component programmers and sitemap
maintainers to separate their concerns.

This is what I call "parameter percolation".

In the current WD we are able to perform things like

  <process uri="/\([0-9]\{4\}\)/\([0-9]\{2\}\)/">
    <set-parameter name="year" value="$1"/>
    <generator name="serverpages" src="/$1/dailynews.xsp"/>
    <filter type="xslt" src:local="./stylesheet/news.xsl"/>
    <serializer type="html"/>
  </process>

now we have

  <choose type="uri">
   <when test="/\([0-9]\{4\}\)/\([0-9]\{2\}\)/">
    <generator name="serverpages" src="/$1/dailynews.xsp"/>
    <filter type="xslt" src:local="./stylesheet/news.xsl"/>
    <serializer type="html"/>
   </when>
  </choose>
    
but how do we know what "$1" is?

Let us try to write the java code for the URI chooser

 public class RegexpURIChooser implements Chooser {
   ??? default(String test) {
     ...
   }
 }

where

 ??? indicates the type of object the method should return
 "default" is the name of the "default" method

but we might want to implement something like

 public class GenericURIChooser implements Chooser {
   
   // default is wildcard mapping
   ??? default(String test) {
     ...
   }

   // this uses regexp
   ??? regexp(String test) {
     ...
   }
 }

now, how do we "percolate" the parameters? One possibility is to use

  ??? -> java.util.Map

(java2 collections finally!!!!)

and test whether the returned object is null to obtain its boolean value.
So, if the result is null, the test failed; otherwise, the Map contains the
parameters that can percolate through the pipeline and be accessed with
"$name" inside the sitemap.

Note: the scope of these parameters is the enclosing <when></when>
element, and these parameter Maps should be cascaded as well, with
more deeply nested conditional elements taking precedence over the
outer ones.

For example

 <choose type="uri">
  <when test="/\([0-9]\{4\}\)/\([0-9]\{2\}\)/">
   <generator name="serverpages" src="/$1/dailynews.xsp"/>
   <choose type="browser">
    <when test="name(Mozilla *)">
     <filter type="xslt" src:local="./style/mozilla-$1/news.xsl"/>
     <serializer type="html"/>
    </when>
    <when test="name(*MSIE 5.*)">
     <serializer type="xml"/>
    </when>
    <default>
     <filter type="xslt" src:local="./style/default/news.xsl"/>
     <serializer type="html"/>
    </default>
   </choose>
  </when>
 </choose>

where "$1" means different things depending on its location.
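
Here is a minimal sketch of how the Map-or-null idea and the cascaded scopes could fit together. All of it is assumption: the Chooser interface shown here, the extra "input" argument, and the ParameterScope class are illustrative names, and the method is called "match" because "default" is a reserved word in Java.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical interface; the real signature is still an open question.
interface Chooser {
    // Returns the captured parameters, or null when the test fails.
    Map<String, String> match(String test, String input);
}

final class RegexpURIChooser implements Chooser {
    @Override
    public Map<String, String> match(String test, String uri) {
        Matcher m = Pattern.compile(test).matcher(uri);
        if (!m.matches()) {
            return null; // null plays the role of "false"
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            params.put(String.valueOf(i), m.group(i)); // "$1", "$2", ...
        }
        return params;
    }
}

// One scope per matched <when>; lookups fall back to the enclosing scope.
final class ParameterScope {
    private static final Pattern REFERENCE = Pattern.compile("\\$(\\w+)");
    private final Map<String, String> params;
    private final ParameterScope parent;

    ParameterScope(Map<String, String> params, ParameterScope parent) {
        this.params = params;
        this.parent = parent;
    }

    // Innermost scope wins; returns null if no scope defines the name.
    String lookup(String name) {
        for (ParameterScope s = this; s != null; s = s.parent) {
            if (s.params.containsKey(name)) {
                return s.params.get(name);
            }
        }
        return null;
    }

    // Replaces each "$name" with its value; unknown names are left untouched.
    String expand(String text) {
        Matcher m = REFERENCE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = lookup(m.group(1));
            m.appendReplacement(out,
                    Matcher.quoteReplacement(value != null ? value : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Note that java.util.regex writes groups as ( ) and repetitions as { } without the backslashes used in the sitemap examples, so the test above would be passed as /([0-9]{4})/([0-9]{2})/ in this sketch. With an outer scope built from the URI match and an inner scope built from the browser match, "$1" expands to the year in the generator and to the browser capture in the filter, while "$2" is still reachable from the inner scope.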

NOTE: there is a very high risk of people _abusing_ these features to
move complex logic into the sitemap instead of keeping it inside the
choosing components. For example, something like this

 <choose type="browser">
  <when test="version()">
   <filter type="xslt" src:local="./style/mozilla-$version/news.xsl"/>
   <serializer type="html"/>
  </when>
 </choose>

is perfectly legal even if this doesn't choose anything!



4) Sitemap customization
------------------------

Some people would like to be able to create their own sitemap schemas.
This is a very common situation in the XML world, where standard schemas
are used as strong contracts but may be too complex for their needs, while
proprietary schemas fit perfectly but are non-standard (so they don't
make contracts).

The XML model proposes XSLT as a solution for this problem: there is a
complete but complex standard schema, a simpler but non-standard schema
and some transformation logic that is able to transform the simpler one
into the standard one.

This is exactly the pattern that stylebook used, but it gave many the
impression that the simpler schema was the standard one. In this
case, the schema adaptation model was helpful for normal users, but
harmful for power users.

While I think the XSLT model is _not_ harmful by itself even when applied
to the sitemap schema, Cocoon should define its main sitemap schema and
let users define their own simpler schemas if they like to do so.

A way to transparently make this happen, even for cascaded sitemaps, is
the use of namespace reaction. This means that Cocoon will interpret the
sitemap based on the namespace URI used for the sitemap elements.

The main Cocoon sitemap will be found at

  http://xml.apache.org/cocoon/sitemap/[version]

and will be "the only one" Cocoon is able to process. In the future, if
required, the sitemap engine should be able to "adapt" simpler schemas
to the default one by using XSLT transformations indicated in the Cocoon
configuration file (cocoon.xconf). NOTE: _not_ in the sitemap itself, and
no PI is used for this, since sitemap writers should not be aware of where
the adaptation XSLT sheet is located.
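
A rough sketch of what namespace reaction could look like on the engine side, under several assumptions: the standard namespace is taken from the URI above with version "1.0" filled in, the registry stands in for whatever cocoon.xconf would declare, and the class name, method names and stylesheet paths are invented for illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: picks the adaptation stylesheet from the root namespace.
final class NamespaceReaction {
    // Assumed value: the sitemap namespace with version "1.0".
    static final String STANDARD_NS = "http://xml.apache.org/cocoon/sitemap/1.0";

    // Stand-in for the namespace -> stylesheet entries of cocoon.xconf.
    private final Map<String, String> adaptationSheets = new HashMap<>();

    void register(String namespaceUri, String stylesheet) {
        adaptationSheets.put(namespaceUri, stylesheet);
    }

    // Reads the namespace URI of the sitemap's root element.
    static String rootNamespace(String sitemapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(sitemapXml)))
                    .getDocumentElement()
                    .getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse sitemap", e);
        }
    }

    // Returns null for the standard schema, else the stylesheet to apply first.
    String adaptationFor(String sitemapXml) {
        String ns = rootNamespace(sitemapXml);
        if (STANDARD_NS.equals(ns)) {
            return null;
        }
        String sheet = adaptationSheets.get(ns);
        if (sheet == null) {
            throw new IllegalArgumentException("Unknown sitemap namespace: " + ns);
        }
        return sheet;
    }
}
```

The sitemap writer only picks a namespace; where the stylesheet lives stays a matter of engine configuration.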

If this is implemented, we are able to create unbalanced sitemap schemas
like

 <process uri="">
 </process>

simply by using the adaptation sheet

 <xsl:template match="process">
  <choose type="uri">
   <when test="{@uri}">
    <xsl:apply-templates/>
   </when>
  </choose>
 </xsl:template>

that should make everybody happy and allow site managers to "craft"
their sitemap schemas around the abilities of the cascaded sitemap
owners.

Note, this also allows schema i18n, for example

 <processa indirizzo="..">
  ...
 </processa>

with

 <xsl:template match="processa">
  <choose type="uri">
   <when test="{@indirizzo}">
    <xsl:apply-templates/>
   </when>
  </choose>
 </xsl:template>

NOTE2: this does _not_ imply any difference in the original sitemap
schema, only at the implementation level. For this reason, this can
be implemented when/if the need emerges, and it doesn't impact C2 normal
operation. (Again, the beauty of the separation of concerns.)


5) Sitemap Version
------------------

Tying the sitemap version number (found in the namespace) to the Cocoon
version is nonsense, since Cocoon is very likely to change more often
than the sitemap schema (at least, we all hope :)

So, I think we should use version "1.0".

The versioning scheme should be

 major.minor

where minor is incremented every time something is added but the schema
remains backward compatible, and major is incremented when the schema is
no longer backward compatible.
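
One possible reading of this scheme, as a small sketch: an engine that understands version X.Y can process any sitemap with the same major and a minor no greater than Y. The SitemapVersion class and the canProcess rule are my assumptions, not something decided yet.

```java
// Illustrative only: a major.minor version with a compatibility check.
final class SitemapVersion {
    final int major;
    final int minor;

    private SitemapVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Parses strings such as "1.0"; anything else is rejected.
    static SitemapVersion parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected major.minor, got: " + text);
        }
        return new SitemapVersion(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    // Assumed rule: same major, and the document does not use a newer minor.
    boolean canProcess(SitemapVersion document) {
        return major == document.major && document.minor <= minor;
    }
}
```

So an engine at "1.2" would accept sitemaps declared as "1.0" or "1.2", but not "1.3" (it may use additions the engine does not know) and not "2.0" (incompatible by definition).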


6) Resource Loading Architecture
--------------------------------

Since we want to separate the sitemap schema from the resource loading,
we defined another namespace for the resource locating attributes. This
namespace will be

 http://xml.apache.org/cocoon/loaders/[version]

where version should be "1.0" and the versioning scheme is the same as above.



Phew, that was long.

Let's see how you guys digest this one. :)

Of course, comments are very much appreciated.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------

