Mailing-List: contact cocoon-dev-help@xml.apache.org; run by ezmlm
Message-ID: <3A93D7F5.A6273761@apache.org>
Date: Wed, 21 Feb 2001 16:00:05 +0100
From: Stefano Mazzocchi <stefano@apache.org>
Organization: Apache Software Foundation
MIME-Version: 1.0
To: cocoon-dev@xml.apache.org
Subject: Re: [c2] Cocoon XInclude
References: <CHEGLKNOIJHFNDNBCIAJEEIMCAAA.tcurdt@dff.st>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Torsten Curdt wrote:
> 
> [snip]
> >                            Cocoon XInclude
> >                            ===============
> >
> > Cocoon requires a way to specify content aggregating behavior.
> 
> Oh, yes!!:)
> 
> > This is defined by making possible for a generated page to trigger a
> > Cocoon internal subrequest and substitute the triggering content with
> > the content generated from the internal subrequest.
> 
> So a generator can trigger another generator via this subrequest.

More or less, yeah, but not directly.

You do something like this

 /resource [ G ---> T -*-> T ---> T ] ---> S
                       |
                [/resource/include]
                       |
                       + [ <--- T <--- G ]

where '*' is the transparent including mechanism that reacts to the
"cocoon include" namespace (whether it reacts to an element or an
attribute is yet to be defined).
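
To make the '*' step concrete, here is a minimal sketch in Java of a
SAX2 filter that reacts to an include element. Nothing here is decided:
the namespace URI, the IncludeFilter class and the SubRequest interface
are hypothetical names invented for illustration, and the subrequest is
a stand-in that simply emits a few SAX events instead of running a real
pipeline through the sitemap.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class IncludeFilter extends XMLFilterImpl {
    // Hypothetical namespace URI: the real one is not defined yet.
    static final String INCLUDE_NS = "http://example.org/cocoon/include";

    /** Hypothetical stand-in for an internal Cocoon subrequest. */
    interface SubRequest {
        void generate(String uri, ContentHandler out) throws SAXException;
    }

    private final SubRequest subRequest;
    private int skipDepth = 0; // > 0 while inside an include element

    IncludeFilter(XMLReader parent, SubRequest subRequest) {
        super(parent);
        this.subRequest = subRequest;
    }

    @Override
    public void startElement(String ns, String local, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0) { skipDepth++; return; } // swallow nested markup
        if (INCLUDE_NS.equals(ns) && "include".equals(local)) {
            String uri = atts.getValue(INCLUDE_NS, "uri");
            // Crude check (an assumption): any ':' is treated as a protocol.
            if (uri == null || uri.contains(":")) {
                throw new SAXException("include URI must be internal: " + uri);
            }
            skipDepth = 1;
            // The subrequest result replaces the triggering element.
            subRequest.generate(uri, getContentHandler());
            return;
        }
        super.startElement(ns, local, qName, atts);
    }

    @Override
    public void endElement(String ns, String local, String qName) throws SAXException {
        if (skipDepth > 0) { skipDepth--; return; }
        super.endElement(ns, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) super.characters(ch, start, length);
    }

    /** Runs the filter over a tiny page and returns the resulting element stream. */
    static String demo() {
        String xml = "<page xmlns:include='" + INCLUDE_NS + "'><sidebar>"
                + "<include:include include:uri='sidebar'/></sidebar></page>";
        StringBuilder out = new StringBuilder();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            IncludeFilter filter = new IncludeFilter(reader, (uri, handler) -> {
                // Fake pipeline result: one element with some text.
                handler.startElement("", "menu", "menu", new AttributesImpl());
                char[] text = ("content of " + uri).toCharArray();
                handler.characters(text, 0, text.length);
                handler.endElement("", "menu", "menu");
            });
            filter.setContentHandler(new DefaultHandler() {
                @Override public void startElement(String ns, String local, String q, Attributes a) {
                    out.append('<').append(local).append('>');
                }
                @Override public void endElement(String ns, String local, String q) {
                    out.append("</").append(local).append('>');
                }
                @Override public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The rest of the pipeline never sees the include element, only the
events the subrequest produced in its place.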

> This sounds cool ... so inclusion of serverpages should be possible, correct?

Of course, this allows you to "include into one pipeline result the
result of other pipelines". It's *WAY* more than including other
serverpages :)

> Assuming this: we are talking about generator based inclusion.

No, like I said (and hopefully you can see from the picture above; if
not, let me know), this is 'pipeline based', not just generator based.

> I always felt XInclude being a transformer is a little unnatural.

Yes, I felt exactly the same and using component cache atomicity as our
metric clearly indicates why.

> Isn't including more generating than a transforming issue?

Totally.
 
> [snip]
> 
> > 1) the URI must be local and internal, therefore it must *NOT* contain a
> > protocol identifier: this enforces SoC by placing direct resource
> > control on the sitemap and avoids losing aggregation information around
> > the system.
> >
> > I repeat this since it's very important: allowing the aggregation of
> > resources directly instead of passing thru the sitemap, creates the same
> > problems that the document() XPath function generates, making site
> > administration a nightmare and placing site growth saturation with
> > concern overlap.
> 
> So how would you accomplish external aggregation then?

Ask yourself this question: how can Cocoon map external resources if the
sitemap processes only requests made to Cocoon?

The answer, in this context, is obvious: go get them. Which translates
into: write a generator to get them (or use one shipped with the
distribution, which is much more likely to happen).

Donald is '-1' on restricting 'cocoon include' to internal resources;
here I try to explain why I believe allowing direct external inclusion
would be harmful.

                                - o -

As usual, I'll use the metric of SoC, where a design can be 'judged' on
how much overlap it creates between existing 'concern islands' (this is
a term I got from the 'knowledge management' world). Of course, the
value of a design is inversely proportional to the overlap that it
creates between concern islands.

The concern islands map will be the usual 'cocoon pyramid of contracts'.

Let us start by assuming that cocoon implements transparent including as
specified in the picture above: it is sufficient to generate an
element/attribute with the specific namespace for Cocoon to react,
perform a subrequest and include the result in place of the element.

Now, in general, two possible subrequests can be made

 1) internal (no protocol specified in the URI)

 2) external (protocol specified in the URI)

While it is *obvious* that this inclusion mechanism must be able, in the
end, to obtain resources created both internally and externally, what we
are judging (using the SoC metric) is the functionality of allowing
*direct* external inclusion, instead of forcing external inclusion to go
thru an internal resource map.

To do this, we must understand which concern island generates the
include-triggering instruction. It could be:

 - content island:  the trigger is placed directly into the document
(for example, a document fragment (i.e. license) that is included in
many files and maintained separately)

 - logic island:  the trigger is dynamically generated (for example, in
a portal-like application, where aggregation is stateful and
user-driven)

It must be noted at this point that the presentation island (style)
should never trigger content aggregation. This doesn't mean that, for
example, stylesheets cannot be split into multiple files; it means that
style doesn't need to include pipeline results. Otherwise SoC is broken,
since content generation is not its concern.

In case of direct external inclusion, the content or logic island
overlaps with the administration island since the administration (i.e.
those who manage the sitemap, not those who manage the web server!)
cannot control directly the behavior of the included resource.

This is the problem: scalability is hurt by the fact that more contracts
need to be created if the responsibility of direct inclusion is
delegated by the administration island to the content/logic islands.

In general, the contracts between these islands should only be schemas
and internal URI spaces (and environment parameters for the logic
island); what is external should be 'mediated' by a central authority,
which is technologically represented by the sitemap.

So, instead of doing

 <include uri="http://www.cnn.com/news/today/headlines.rss"/>

you should do

 <include uri="/news/today/headlines"/>

and let the administration map this resource to the external one.

This has several benefits:

1) changes are localized, thus they spread automatically throughout the
entire site. For example, if the newsfeed is changed from cnn.com to,
say, cnet.com because the management did a specific deal with them, this
doesn't impact the rest of the system in any way... especially
because...

2) resources can be 'adapted' in a central way. For example, the cnn.com
news schema might be semantically equivalent to the cnet.com news
schema, but might require transformation to be adapted to the schema our
system uses. 

Allowing direct external inclusion means that islands other than
administration must be aware of 'adaptation issues' with the external
world, and this creates unnecessary (and harmful) overlap between the
concerns.

This adaptation could also imply the 'namespacization' of the external
resource, or, even more likely, the 'XML-ization' of non-XML resources
(text feeds, email, news, SMS messages, MPEG-7 streams etc..)
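
As a rough sketch of this mediation (not actual sitemap syntax, which is
a separate matter), the administration island can be thought of as
owning a table from internal URIs to external sources plus an adapter.
ResourceMap, Mapping and the adapter below are hypothetical names, the
example.org feed URL is made up, and the adapter is a trivial string
replacement standing in for a real transformation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class ResourceMap {
    /** Where an internal resource really comes from, and how to adapt it. */
    static final class Mapping {
        final String source;
        final UnaryOperator<String> adapter;

        Mapping(String source, UnaryOperator<String> adapter) {
            this.source = source;
            this.adapter = adapter;
        }
    }

    private final Map<String, Mapping> mappings = new HashMap<>();

    /** Only the administration island is expected to call this. */
    void map(String internalUri, String source, UnaryOperator<String> adapter) {
        mappings.put(internalUri, new Mapping(source, adapter));
    }

    /** Content and logic islands may only ask for internal URIs. */
    Mapping resolve(String uri) {
        if (uri.contains("://")) {
            throw new IllegalArgumentException("direct external inclusion: " + uri);
        }
        Mapping mapping = mappings.get(uri);
        if (mapping == null) {
            throw new IllegalArgumentException("no mapping for " + uri);
        }
        return mapping;
    }

    public static void main(String[] args) {
        ResourceMap sitemap = new ResourceMap();
        sitemap.map("/news/today/headlines",
                "http://www.cnn.com/news/today/headlines.rss",
                UnaryOperator.identity());
        System.out.println(sitemap.resolve("/news/today/headlines").source);

        // Switching feeds is a single change; documents that include
        // /news/today/headlines are untouched. The URL is hypothetical.
        sitemap.map("/news/today/headlines",
                "http://feeds.example.org/headlines.xml",
                feed -> feed.replace("story", "headline"));
        Mapping mapping = sitemap.resolve("/news/today/headlines");
        System.out.println(mapping.source);
        System.out.println(mapping.adapter.apply("<story/>"));
    }
}
```

The documents only ever name /news/today/headlines; both the source and
the adaptation live in one place.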

So, while forcing internal inclusion seems limiting, it only imposes a
design pattern that is carefully chosen to minimize abuse, and thus
reduce overlap and increase separation, and therefore scalability and
return on investment.

The fact that even the XML experts on this list don't automatically see
the value of this limitation adds even more value to the concept.

                                - o -

Continuing with Donald's concerns about the proposal:

1) nested content: the idea of using the markup nested inside the
include element as an error message is a very clever one, I love it!
Great idea! Definitely +1!

2) element vs. attribute: yes, element filtering is easier than
attribute filtering in SAX2 and probably more performant. 

From a general perspective, attribute filtering is semantically more
reasonable, even if less immediate. In fact, it's much easier to
understand something like

 <sidebar include:uri="sidebar"/>

than

 <include:include include:uri="sidebar"/>

but then again since the element is normally ignored even something like
this

 <sidebar>
  <include:include include:uri="sidebar"/>
 </sidebar>

is meaningful enough.

Well, I'd say we go for the element for performance reasons. Allowing
for attribute behavior in the future is a piece of cake anyway.

3) inclusion loops: if we force all included resources to be internal,
then every inclusion has to pass thru the sitemap. This allows us to
watch for inclusion loops by tracking the inclusion path of the entering
URI.

For example

 [/home] includes [/home/sidebar] 
                  [/news] 
                  [/mail/headers/new]
                  [/cvs/last-commits]

 [/news] includes [/synd/slashdot.org/] 
                  [/synd/xmlhack.com/]
                  [/synd/freshmeat.net/]
                  [/mail/headers/new]

is no problem even if there is multiple inclusion while
 
 [/home] includes [/home/recursive]

 [/home/recursive] includes [/home]

will generate an error for the second inclusion and avoid it. 

The algorithm to check this is left to the reader as an exercise :)
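
For those who want a head start on the exercise, here is one possible
sketch in Java, assuming the inclusion relation is available as a plain
map from a URI to the URIs it includes. In a real system the includes
would be discovered while the pipelines run; the InclusionPath class and
its methods are hypothetical names. The idea is to keep the current
inclusion path as a stack and refuse any URI that is already on it.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InclusionPath {
    // URIs currently being processed, from the entering URI downwards.
    private final Deque<String> path = new ArrayDeque<>();

    /** Walks the includes of a URI; fails if a URI re-enters its own path. */
    void process(String uri, Map<String, List<String>> includes) {
        if (path.contains(uri)) {
            throw new IllegalStateException(
                    "inclusion loop: " + String.join(" -> ", path) + " -> " + uri);
        }
        path.addLast(uri);
        try {
            for (String child : includes.getOrDefault(uri, Collections.emptyList())) {
                process(child, includes);
            }
        } finally {
            // Leaving the resource: siblings may include it again safely.
            path.removeLast();
        }
    }

    /** Convenience wrapper: true if processing the entry URI hits a loop. */
    static boolean loops(Map<String, List<String>> includes, String entry) {
        try {
            new InclusionPath().process(entry, includes);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = new HashMap<>();
        site.put("/home", Arrays.asList("/home/sidebar", "/news", "/mail/headers/new"));
        site.put("/news", Arrays.asList("/synd/slashdot.org/", "/mail/headers/new"));
        System.out.println(loops(site, "/home")); // multiple inclusion, no loop

        Map<String, List<String>> bad = new HashMap<>();
        bad.put("/home", Arrays.asList("/home/recursive"));
        bad.put("/home/recursive", Arrays.asList("/home"));
        System.out.println(loops(bad, "/home")); // loop on the second inclusion
    }
}
```

Because a URI is removed from the path when its processing ends,
[/mail/headers/new] can be included from both [/home] and [/news]
without tripping the check.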

It's incredible to see how such a simple namespace reaction can lead to
such an incredibly powerful publishing system.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------