Message-ID: <3A93D7F5.A6273761@apache.org>
Date: Wed, 21 Feb 2001 16:00:05 +0100
From: Stefano Mazzocchi
Organization: Apache Software Foundation
To: cocoon-dev@xml.apache.org
Subject: Re: [c2] Cocoon XInclude

Torsten Curdt wrote:
>
> > [snip]
> >
> > Cocoon XInclude
> > ===============
> >
> > Cocoon requires a way to specify content aggregating behavior.
>
> Oh, yes!! :)
>
> > This is defined by making it possible for a generated page to trigger a
> > Cocoon internal subrequest and substitute the triggering content with
> > the content generated from the internal subrequest.
>
> So a generator can trigger another generator via this subrequest.

More or less, yeah, but not directly. You do something like this
(G = generator, T = transformer, S = serializer):

  /resource
  [ G ---> T -*-> T ---> T ] ---> S
              |
              |  [/resource/include]
              |
              +  [ <--- T <--- G ]

where '*' is the transparent including mechanism that reacts on the
"cocoon include" namespace (element or attribute reaction is yet to be
defined).

> This sounds cool ... so inclusion of serverpages should be possible, correct?

Of course: this allows you to "include into one pipeline result the
result of other pipelines". It's *WAY* more than including other
serverpages :)

> Assuming this: we are talking about generator based inclusion.
No: like I said (and hopefully you can see it from the picture above; if
not, let me know), this is 'pipeline based' inclusion, not only
generation.

> I always felt XInclude being a transformer is a little unnatural.

Yes, I felt exactly the same, and using component cache atomicity as our
metric clearly shows why.

> Isn't including more of a generating than a transforming issue?

Totally.

> [snip]
>
> > 1) the URI must be local and internal, therefore it must *NOT* contain a
> > protocol identifier: this enforces SoC by placing direct resource
> > control on the sitemap and avoids losing aggregation information around
> > the system.
> >
> > I repeat this since it's very important: allowing the aggregation of
> > resources directly, instead of passing thru the sitemap, creates the
> > same problems that the XSLT document() function generates, making site
> > administration a nightmare and making site growth saturate because of
> > concern overlap.
>
> So how would you accomplish external aggregation then?

Ask yourself this question: how can Cocoon map external resources if the
sitemap processes only requests made to Cocoon? The answer, in this
context, is obvious: go get them. Which translates into: write a
generator to get them (or use one shipped with the distribution, which is
much more likely to happen).

Donald is '-1' on restricting 'cocoon include' to internal resources;
here I try to explain why I believe allowing direct external inclusion
would be harmful.

                                - o -

As usual, I'll use the SoC metric, where a design is 'judged' by how
much overlap it creates between existing 'concern islands' (a term I got
from the 'knowledge management' world). Of course, the value of a design
is inversely proportional to the overlap it creates between concern
islands.

The concern islands map will be the usual 'cocoon pyramid of contracts'.
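Point 1 of the proposal quoted above (no protocol identifier in an include URI) is simple enough to check mechanically. The following is a minimal Java sketch, assuming the include target arrives as a plain string; the class and method names are hypothetical and not part of Cocoon. It uses java.net.URI to reject anything carrying a scheme (http:, file:, ...) or a network authority (//host/...), so only sitemap-resolved paths such as /news get through.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical helper (not a Cocoon API): decides whether an include URI
 * is "local and internal", i.e. carries no protocol identifier.
 */
final class IncludeUriPolicy {

    private IncludeUriPolicy() {}

    /** Returns true only for URIs the sitemap must resolve, such as "/news". */
    static boolean isInternal(String src) {
        try {
            URI uri = new URI(src);
            // A scheme ("http:", "file:", ...) means direct external access;
            // an authority ("//cnn.com/...") points at another host.
            return uri.getScheme() == null && uri.getAuthority() == null;
        } catch (URISyntaxException e) {
            return false; // malformed URIs are never treated as internal
        }
    }

    /** Throws if the include target would bypass the sitemap. */
    static String requireInternal(String src) {
        if (!isInternal(src)) {
            throw new IllegalArgumentException(
                "external include not allowed, map it in the sitemap: " + src);
        }
        return src;
    }
}
```

With this policy, isInternal("/news") is true while isInternal("http://cnn.com/news") and isInternal("//cnn.com/news") are false; the external feed has to be given an internal URI in the sitemap first.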
Let us start by assuming that Cocoon implements transparent including as
specified in the picture above: it is sufficient to generate an
element/attribute with the specific namespace for Cocoon to react,
perform a subrequest and include the result, removing the element.

Now, in general, two kinds of subrequest can be made:

 1) internal (no protocol specified in the URI)
 2) external (protocol specified in the URI)

While it is *obvious* that this inclusion mechanism must, in the end, be
able to obtain resources created both internally and externally, here we
are judging (using the SoC metric) the functionality of allowing *direct*
external inclusion, instead of forcing external inclusion to go thru an
internal resource map.

To do this, we must understand which concern island generates the
include-triggering instruction. It could be:

 - content island: the trigger is placed directly into the document (for
   example, a document fragment (e.g. a license) that is included in many
   files and maintained separately)

 - logic island: the trigger is dynamically generated (for example, in a
   portal-like application, where aggregation is stateful and user-driven)

It must be noted at this point that the presentation island (style)
should never trigger content aggregation. This doesn't mean that, for
example, stylesheets cannot be split across multiple files; it means that
style doesn't need to include pipeline results, otherwise SoC is broken,
since content generation is not its concern.

In case of direct external inclusion, the content or logic island
overlaps with the administration island, since the administration (i.e.
those who manage the sitemap, not those who manage the web server!)
cannot directly control the behavior of the included resource.

This is the problem: scalability is hurt by the fact that more contracts
need to be created if the responsibility for direct inclusion is
delegated by the administration island to the content/logic islands.
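The namespace reaction assumed at the start of this argument (the '*' in the picture) can be sketched as a SAX2 filter sitting between two pipeline components. This is a minimal illustration, not Cocoon's implementation: the namespace URI, the element name "include" and the "src" attribute are hypothetical, and the subrequest is abstracted as a callback that replays the sub-pipeline's SAX events into the main stream. In line with the argument above, it refuses any URI that carries a protocol.

```java
import java.net.URI;
import java.net.URISyntaxException;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical callback standing in for an internal sitemap subrequest. */
interface Subrequest {
    /**
     * Replays the body events (no startDocument/endDocument) of the pipeline
     * mapped to {@code uri} into {@code out}.
     */
    void generate(String uri, ContentHandler out) throws SAXException;
}

/**
 * Sketch of a transparent include step: elements named "include" in a
 * hypothetical namespace are removed, together with anything nested inside
 * them, and replaced by the result of an internal subrequest.
 */
class IncludeFilter extends XMLFilterImpl {
    static final String NS = "http://example.org/cocoon/include"; // hypothetical

    private final Subrequest subrequest;
    private int skipDepth = 0; // > 0 while inside an include element

    IncludeFilter(Subrequest subrequest) {
        this.subrequest = subrequest;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0) { // markup nested in an include element is dropped
            skipDepth++;
            return;
        }
        if (NS.equals(uri) && "include".equals(local)) {
            String src = atts.getValue("", "src"); // hypothetical, un-prefixed attribute
            if (src == null || !isInternal(src)) {
                throw new SAXException("only internal URIs may be included: " + src);
            }
            subrequest.generate(src, getContentHandler()); // splice sub-pipeline output
            skipDepth = 1;
            return;
        }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (skipDepth > 0) { // also swallows the include element's own end tag
            skipDepth--;
            return;
        }
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** No scheme ("http:") and no authority ("//host"): the sitemap must resolve it. */
    private static boolean isInternal(String src) {
        try {
            URI u = new URI(src);
            return u.getScheme() == null && u.getAuthority() == null;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

Because the filter only forwards or suppresses events, it stays a single streaming step; the sub-pipeline's output is spliced in exactly where the include element was.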
In general, the contracts between these islands should only be schemas
and internal URI spaces (and environment parameters for the logic
island); whatever is external should be 'mediated' by a central
authority, which is technologically represented by the sitemap.

So, instead of doing [...] you should do [...] and let the administration
map this resource to the external one.

This has several benefits:

 1) changes are localized in one place, yet they spread automatically
    throughout the entire site. For example, if the newsfeed is changed
    from cnn.com to, say, cnet.com because the management made a specific
    deal with them, this doesn't impact the rest of the system in any
    way... especially because...

 2) resources can be 'adapted' in a central way. For example, the cnn.com
    news schema might be semantically equivalent to the cnet.com news
    schema, but might require a transformation to be adapted to the schema
    our system uses. Allowing direct external inclusion means that islands
    other than administration must be aware of 'adaptation issues' with
    the external world, and this creates unnecessary (and harmful) overlap
    between the concerns. This adaptation could also imply the
    'namespacization' of the external resource or, even more likely, the
    'XML-ization' of non-XML resources (text feeds, email, news, SMS
    messages, MPEG-7 streams, etc.)

So, while forcing internal inclusion seems limiting, it only imposes a
design pattern that is carefully chosen to minimize abuse, thus reducing
overlap and increasing separation, and therefore scalability and return
on investment. The fact that even XML experts on this list don't
automatically see the value of this limitation adds even more value to
the concept.

                                - o -

Continuing with Donald's concerns about the proposal:

 1) nested content: the idea of using the markup nested inside the
    include element as an error message is a very clever one, I love it!
    Great idea! Definitely +1!

 2) element vs.
    attribute: yes, element filtering is easier than attribute filtering
    in SAX2 and probably more performant. From a general perspective,
    attribute filtering is more semantically reasonable, even if not
    immediate; in fact, it's much easier to understand something like
    [...] than [...], but then again, since the element is normally
    ignored, even something like this [...] is meaningful enough.

    Well, I'd say we go for the element for performance reasons. Allowing
    for attribute behavior in the future is a piece of cake anyway.

 3) inclusion loops: if we force all included resources to be internal,
    then they have to pass thru the sitemap, which allows us to watch over
    inclusion loops by tracking the inclusion path of the entering URI.
    For example

      [/home] includes
        [/home/sidebar]
        [/news]
        [/mail/headers/new]
        [/cvs/last-commits]

      [/news] includes
        [/synd/slashdot.org/]
        [/synd/xmlhack.com/]
        [/sync/freshmeat.net/]
        [/mail/headers/new]

    is no problem, even if there is multiple inclusion, while

      [/home] includes
        [/home/recursive]

      [/home/recursive] includes
        [/home]

    will generate an error for the second inclusion and avoid it.

    The algorithm to check this is left to the reader as an exercise :)

It's incredible to see how such a simple namespace reaction can lead to
such an incredibly powerful publishing system.

--
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
                                    Friedrich Nietzsche
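The loop check that the message leaves "to the reader as an exercise" could look like the following minimal Java sketch. Assumptions: the include graph is given up front as a map from internal URI to the URIs it includes (in Cocoon this information would instead surface as the subrequests are made), and the class name is hypothetical. The key point is that it tracks the current inclusion *path* (the chain of ancestors), not every URI seen so far.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: expands includes depth-first and rejects inclusion loops. */
final class InclusionLoopChecker {
    private final Map<String, List<String>> includes; // URI -> URIs it includes

    InclusionLoopChecker(Map<String, List<String>> includes) {
        this.includes = includes;
    }

    /** Returns every URI in inclusion order (repeats allowed); throws on a loop. */
    List<String> expand(String root) {
        List<String> order = new ArrayList<>();
        expand(root, new ArrayDeque<>(), order);
        return order;
    }

    private void expand(String uri, Deque<String> path, List<String> order) {
        if (path.contains(uri)) {
            // uri is already one of its own ancestors: including it again never ends
            throw new IllegalStateException(
                "inclusion loop: " + String.join(" -> ", path) + " -> " + uri);
        }
        order.add(uri);
        path.addLast(uri); // enter uri
        for (String child : includes.getOrDefault(uri, List.of())) {
            expand(child, path, order);
        }
        path.removeLast(); // leave uri: siblings may include it again
    }
}
```

With the first example from the message, /mail/headers/new is reached twice (once directly from /home and once via /news) and that is accepted, because it is never its own ancestor; a global "already seen" set would wrongly reject it. For /home -> /home/recursive -> /home the second inclusion of /home is rejected. Throwing keeps the sketch short; an implementation could instead report the error in place of the offending include and carry on.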