Return-Path:
Mailing-List: contact cocoon-users-help@xml.apache.org; run by ezmlm
Delivered-To: mailing list cocoon-users@xml.apache.org
Received: (qmail 58270 invoked from network); 25 Aug 2000 14:37:45 -0000
Received: from smtp1.libero.it (193.70.192.51) by locus.apache.org with SMTP; 25 Aug 2000 14:37:45 -0000
Received: from apache.org (151.20.72.154) by smtp1.libero.it; 25 Aug 2000 16:37:44 +0200
Message-ID: <39A6554B.B133E45F@apache.org>
Date: Fri, 25 Aug 2000 13:15:23 +0200
From: Stefano Mazzocchi
Organization: Apache Software Foundation
X-Mailer: Mozilla 4.72 [en] (Win98; I)
X-Accept-Language: en,it
MIME-Version: 1.0
To: cocoon-users@xml.apache.org
CC: Cocoon, Scott Boag
Subject: Re: comments !!
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Rating: locus.apache.org 1.6.2 0/1000/N

Robin Green wrote:
>
> >Any comments on this article:
> >
> >http://www.oasis-open.org/cover/lie-foch.html
> >
> >Manpreet Singh.

I've discussed the above issues with many of the XSL WG people on
several occasions, and I came to the conclusion that there is some
truth in the article: FO are "somewhat" harmful in the sense that the
XSL WG chose to fight with the CSS WG for some historical reasons.

> I agree that losing semantic information is bad, and that this cannot really
> be legislated against.

Please, try to define "semantic information". You can't.

There is no such thing as "global semantics": each semantics is tied to
the context it lives in. So, there is nothing _more_ semantic in Hello
World! than in Hello World! than in Hello World! than in Hello World!
just a different context of interpretation.

Look at XHTML: many use it as a "very simple semantic markup" if you
leave out the font="" align="" and blah blah attributes that define
semantics in the layout context (a.k.a. style). The Cocoon document DTD
enforces this, reusing the HTML tags where appropriate (no use in
creating new names for tags that work).

But FO is _somewhat_ different from the other markup and gives a "sense
of incoherence" with the other W3C schemas.... I spent 18 months trying
to find out "what" this incoherence is and where it comes from, and now
I think I've got it. It's due to several things combined:

1) wrong name: XSL stands for eXtensible Stylesheet Language.... but
neither FO nor XSLT has anything to do with style. Style is the process
of adding semantics for the layout context; it doesn't have anything to
do with tree transformation or with defining elements that describe
that semantics.

CSS is the only "true" stylesheet because it "adds information
orthogonally". This is what "considered harmful" means in the article:
XSLT is not orthogonal.

I proposed to the XSL WG that they rename the languages to

 XTL - eXtensible Transformation Language
 FO - Formatting Objects

but they still think XSL is the new DSSSL and that this would mean
throwing away their past.

Sure, they have the argument that XTL would become too complex if
turned into a "general" transformation language. Well, as an argument,
it's as weak as anything: people are already planning to use XSLT
extensively in B2B to transform one schema into another, and styling
done in XSLT (without the use of a final CSS) is already considered bad
practice.

They simply don't want to admit they've been wrong since day one in
fighting CSS instead of adopting it. This leads to the second part of
the problem:

2) FO cannot be styled with CSS. They made sure something like this is
not possible, or, at least, not recommended in the spec, unlike SVG (a
much better effort in all senses), which makes CSS the very core of its
styling part, defining semantics with the graphic elements and keeping
the style at the CSS level.

Why can't FO do the same? Why are we _forced_ to use XSLT to
"transform" (note, not "style") something into FO?

This is the key problem: tree transformation can be used for styling,
but it's bad practice. It should be avoided.

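The difference between transforming a tree and styling it orthogonally
can be sketched in a few lines. This is only an illustration, not how
XSLT or CSS are implemented: the element names, the RENAME table and
the STYLE table are all made up for the example, and Python's standard
xml.etree.ElementTree stands in for a real processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from an "unknown" vocabulary to a known one.
RENAME = {"memo": "html", "heading": "h1", "para": "p"}

# Hypothetical style rules, kept outside the tree (the CSS role).
STYLE = {"h1": {"font-weight": "bold"}, "p": {"text-align": "justify"}}


def transform(node):
    """Build a NEW tree in the known vocabulary (the transformation role).

    Attributes are not copied; the sample document has none.
    """
    out = ET.Element(RENAME.get(node.tag, node.tag))
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(transform(child))
    return out


def style_for(node):
    """Look up layout properties for a node without touching the tree."""
    return STYLE.get(node.tag, {})


source = ET.fromstring(
    "<memo><heading>Hello World!</heading><para>Some text.</para></memo>"
)
known = transform(source)

before = ET.tostring(known, encoding="unicode")
styles = {child.tag: style_for(child) for child in known}
after = ET.tostring(known, encoding="unicode")
```

Here transform() produces a different tree, while style_for() only
attaches information next to the tree: the serialized document is the
same before and after the style lookup.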
So, instead of turning this into a "style war", why don't we do the
right thing

 unknown schema -> transformation -> known schema [+ style]

where "unknown" is in the context of the program that has to "consume"
the schema (browser, B2B consumer, or other), "known" means a schema
that is known in that context, and style information is optional.

A few examples of this would be

 docbook -> xtl -> fo + css
 myDTD -> xtl -> xhtml + css
 biztalk -> xtl -> ebxml
 tableDTD -> xtl -> svg + css

This would finally fix the symmetry, bring peace to the "style war" and
finally "separate concerns" between working groups, thus maximizing
throughput.

A single WG (XSL) has been responsible for

 - tree transformations (XSLT)
 - tree queries (XPath)
 - formatting objects (XSL)

Sure, when it started it simply had one concern

 - apply DSSSL to XML

but it turned out to be something entirely different, and they did a
good job in separating the specs. Now they should finish the good job
and finally separate the WG into more focused groups, one for each of
the concerns they have.... but, hell, they recently rechartered to keep
going exactly the same way.

Sharon Adler already asked me: "why do you care about how we work?" I
don't, really. I'm only concerned about what you guys produce as a
byproduct of that work, and what I see, especially in the "general
vision", is not what I like (I love the technology, but I don't like
the ideas behind it... this indicates structural problems to me).

[Scott, I copied you on this because I'd like to hear your comments
(Scott is a member of the XSL WG)]

> However, I think the author of the article is missing the wider point:
> Remember, it is very easy to write XML that is almost or completely
> meaningless, either because it is based on a proprietary format or because
> it is not designed to be easily parsed for useful meaning.

The author is caught in the "S"-trap: XSL vs. CSS both define
'stylesheets', there is clear overlap, which is better?

There is 'no' overlap whatsoever: the XSL WG should try to enforce this
instead of keeping on fighting the "style war", and remove that damn
"s" from their language names!!!!

> There is a Plain English Campaign to stop the use of unnecessary jargon by
> public officials, here in Britain. Perhaps, analogously, someone should
> start a Semantic XML Campaign, to campaign for semantically-rich uses of XML
> and semantic preservation over networks? It's not a very inspiring subject,
> sure, but it's an important one from a software engineering point of view.

Careful, this is something different: "unnecessary jargon" can be
translated into "keep the semantics in the appropriate context"... or,
more technically, don't send me a schema I can't understand, or that I
can't translate into something I understand.

A big vocabulary is a sort of transformation: what you call "jargon" is
a schema that your thinking does not use frequently, or that you might
not know entirely.

Such a campaign is almost equal to the "this page is valid HTML 4.0"
campaign: it gives maximum visibility to the use of the appropriate
schema for the required context.

> The other thing is market demand driving greater semantics, of course - and
> I think in terms of searching at least, sites will find it very advantageous
> in terms of getting targeted hits, to use richer markup in promoting their
> sites electronically (I'm not thinking of spam, but search engine metatags
> etc.) - and then there's B2B, of course.

This is where, as we say in Italian, the "donkey falls" :) (no offense
intended)

If you think that having more "semantic" schemas will ease searching,
you are not only wrong, you are missing a lot of the W3C effort.

The problem of XML (and SGML as well) is the 'babel syndrome': there
will be tons of schemas, sure, lots of semantic content, but how do you
search it if you don't know the schema? It's turning a language into a
babel of strongly typed dialects.

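A small sketch of the 'babel syndrome', assuming two made-up schemas
that mark up the same fact. The element names, the sample documents and
both helper functions are hypothetical, and Python's standard
xml.etree.ElementTree (with its limited XPath support) stands in for a
schema-aware search engine.

```python
import xml.etree.ElementTree as ET

# Two hypothetical schemas describing the same travel agency.
DOC_A = "<agency><name>Island Tours</name><destination>Bali</destination></agency>"
DOC_B = "<biz><title>Island Tours</title><goesTo>Bali</goesTo></biz>"

docs = {"a": ET.fromstring(DOC_A), "b": ET.fromstring(DOC_B)}


def structural_search(path, value):
    """Return ids of documents where an element matching `path` has text `value`."""
    return [
        doc_id
        for doc_id, root in docs.items()
        if any(elem.text == value for elem in root.findall(path))
    ]


def text_search(value):
    """Return ids of documents whose text content contains `value` anywhere."""
    return [
        doc_id
        for doc_id, root in docs.items()
        if any(value in text for text in root.itertext())
    ]


# The structural query only works for the schema the searcher already knows.
known_schema_hits = structural_search(".//destination", "Bali")

# Plain text search ignores the markup and finds both documents.
any_schema_hits = text_search("Bali")
```

The structural query matches only document "a", because the path
".//destination" means nothing in the second schema; the text search
matches both, but it has thrown the markup away to do so.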
Today, search engines know HTML and try to "estimate" heuristically the
semantic meaning of a particular piece of content, to rate it in a
significant way: their success is based on the "quality" of such
heuristics.

People think: when the web is made of XML documents, searching will be
much more "semantic".

Wrong! Dead wrong!

Let us suppose we have such a web (which will take decades to be
created, if ever): you want to book your summer vacation, a trip to the
island of Java, and you want to find out if there is a travel agency
that is cheaper than the one down the street.

What do you search for? How do you know what markup has been used to
publish the page of that Java travel agency? Ok, let's guess XHTML...
then what?

Hmmm, the search engine accepts XPath queries, but how do you know what
element has been used to mark up what you're looking for? It's clearly
a dead end: it won't pass my father's test, it would die out.

So, let's search for the textual content first: "Java Travel Agency"
with the EN language (hoping the agency has xml:lang="en" text in their
pages).

The result is a list of schemas (not pages!) that contain that textual
reference.

 - programming language
 - geographical information
 - military operations
 - travelling

then you "refine" your visit thru it. (This has been used (and
patented?) recently by a company called xyzsearch or something like
that.)

But what if the list is something like

 - XUIURL schema
 - eBUKfj schema
 - DDLT schema

now what? You iterate thru them to find out, big deal!

Sure, more semantic information means "potentially" better searches.
But don't "assume" we'll have them: the road is long and bumpy, and
very few people seem to understand that (luckily the W3C director
surely does).

--
Stefano Mazzocchi
One must still have chaos in oneself to be able to give birth to a
dancing star.
Friedrich Nietzsche
--------------------------------------------------------------------
Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------