xml-general mailing list archives

From Brett McLaughlin <bmcla...@algx.net>
Subject Re: A mathematical vision of XML leads to interesting conclusions
Date Sat, 18 Dec 1999 15:16:42 GMT

Stefano Mazzocchi wrote:
> People,
> it's no secret that I think one of the biggest design mistakes in XML was
> keeping it SGML compatible, but it's easy to say this now and I think I
> would have probably made the same mistake.
> The XML spec is almost entirely dedicated to the DTD definition and,
> unfortunately, the DTD patterns are old and don't keep up well with the
> XML world. Mostly, the problem is lack of namespace orthogonality, but
> hopefully the XSchema effort will patch DTD problems. Anyway, I still
> think that XML, XML-namespaces and XSchema are not orthogonal specs and
> they should be merged into a wider XML spec... keeping syntax
> compatible, of course, but allowing a more solid namespace driven
> validation.

Keep in mind also (and I'm agreeing here) that even though XML Schema
goes a long way, XML Schema itself conforms to a DTD; even though you
only worry about writing a schema now, the flat, limited structure is
still underneath the whole thing.  In keeping with your math analogies
(which are appropriate), using a third- or fourth-order derivative can
be really useful, but the fact remains that you are still dealing with
the original equation.  As long as there is no "unified" specification
as you mention, everything is still governed by the XML 1.0 spec, which
is not ideal.

> If you think about it, the namespace idea is the real key to XML
> success: namespaces are "versors" of an "infinite-dimension" solution
> space. Topologically speaking, while SGML is a single infinite
> dimension, XML is an infinite set of infinite dimensions. Mathematically
> speaking, you can create a one to one relationship with all the points
> of XML to SGML (like it's done considering, for example,
> "xsl:stylesheet" like a one dimensional SGML element, rather than the
> "stylesheet" element of the "xsl" namespace). This may imply
> that SGML and XML have the same "multidimensional" volume. Thus,
> namespaces don't alter the topological tissue of XML (which remains flat
> and one-dimensional), but simply add "classes" of elements to allow
> elements to share the same name, but have different meanings.

If anyone isn't clear on this, explain to someone who is XML illiterate
why every element reference in something like a DTD or Schema must
include the complete namespace prefix plus the element name
(<namespace:element>).  Then field all the questions about why you can't
do something like:

  <element name="element">
    <archetype contains="empty"/>

and so on.  It will give you an amazing understanding of how flawed XML
that uses namespaces really is in terms of its multidimensional value,
and how little namespaces really change things (today).
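
To make the one-to-one mapping concrete, here is a minimal sketch.  The
choice of Python and its standard xml.etree.ElementTree module is an
assumption for illustration, not something used in this thread.  A
namespace-aware parser turns each prefixed name into a single
(URI, local name) pair, which is still just one flat name per element:

```python
import xml.etree.ElementTree as ET

# Namespace URI from the XSLT 1.0 Recommendation.
XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Same element, written with two different prefixes.
doc = ET.fromstring(f'<xsl:stylesheet xmlns:xsl="{XSL_NS}" version="1.0"/>')
same = ET.fromstring(f'<t:stylesheet xmlns:t="{XSL_NS}" version="1.0"/>')
# An element with the same local name but no namespace at all.
flat = ET.fromstring('<stylesheet version="1.0"/>')

# ElementTree stores the expanded name as one string: "{uri}local".
print(doc.tag)              # {http://www.w3.org/1999/XSL/Transform}stylesheet
print(doc.tag == same.tag)  # True: the prefix itself carries no meaning
print(doc.tag == flat.tag)  # False: a different "class" of element
```

The prefix is only shorthand for the URI.  A DTD is not namespace-aware,
so it only ever sees the literal string "xsl:stylesheet", which is why
every reference has to spell the prefix out.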

> The importance of thinking about XML as a multidimensional language is
> incredible since you inherit all the geometric patterns we are well used
> to: projection, orthogonality, clipping, translation, rotation can all
> have meanings applied in the XML world. Meanings that, like any powerful
> design pattern, express much more in a word than in a thousand pages.
> So, continuing in the topological equivalence, XPointer defines geometric
> points in the XML space and XLink defines translation vectors, arcs or
> vector sets, from one point (the XPoint where the XLink is present) to
> one, more or a chain of points.
> But there are two important differences between XML and a mathematical
> vision:
>  - XSLT is not a rotation (unlike math transforms): this is because
> information is created and lost during the transformation. This implies
> that there cannot be such thing as an anti-XSLT.

This is the key problem, IMHO.  Without a set of formulaic approaches to
transformation (instead, you have this concept of a tree/rtf and
nodesets that come across and are mutated), there is a real problem.
This is also why I think we still see a significant speed hit in
transformations.  It is fairly widely recognized that representing
transformations mathematically and performing them algorithmically is
_extremely_ fast, because matrix (linear) mathematics and the
properties of inverses allow for tremendous shortcuts.  But matrices
are inherently orthogonal, which is exactly the point Stefano is making
about XML (that it is not).
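
The inverse property mentioned above can be sketched with 2x2 matrices.
This is a toy illustration in Python (an assumed language; the names
ROTATE_90, PROJECT_X, apply and transpose are made up for the example),
not a model of how any XSLT processor works.  A rotation can be undone
by its transpose, while a projection discards information and so has no
inverse, which is the situation Stefano describes for XSLT:

```python
def apply(m, v):
    """Multiply the 2x2 matrix m by the 2-vector v."""
    return tuple(sum(m[r][c] * v[c] for c in range(2)) for r in range(2))


def transpose(m):
    """Return the transpose of the 2x2 matrix m."""
    return tuple(tuple(m[c][r] for c in range(2)) for r in range(2))


ROTATE_90 = ((0, -1), (1, 0))  # orthogonal: its inverse is its transpose
PROJECT_X = ((1, 0), (0, 0))   # drops the y coordinate entirely

point = (3, 4)
rotated = apply(ROTATE_90, point)
restored = apply(transpose(ROTATE_90), rotated)
print(rotated, restored)  # (-4, 3) (3, 4)

# Two different inputs collapse to the same output, so nothing can
# recover the original: there is no "anti-projection".
print(apply(PROJECT_X, (3, 4)), apply(PROJECT_X, (3, 9)))  # (3, 0) (3, 0)
```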

>  - XSLT mixes patterns from both transformations, generic functions and
> convolution, but not allowing to go further down with the mathematical
> equivalence.
> But one thing should be noted: there is a lot in common between the
> convolution pattern and the OOP inheritance pattern.
> While there is a proposed XInclude specification that aims to unify the
> need for inclusion of external things, there is very little concern
> about the application of more general object-oriented design patterns,
> such as inheritance.
> Donald and I both came to the conclusion that such inheritance facility
> would allow great simplification of the use of XML as data container,
> also allowing easier data-binding with OO languages.
> Let's make an example:
>  <page xml:extends="template.xml">
>   <title>Hello world!</title>
>   <body>
>    <p>This is to say hello!</p>
>    <p><strong>Hello!</strong></p>
>   </body>
>  </page>
> where "template.xml" is
>  <page>
>   <author>John Smith</author>
>   <body>Yet to be written!</body>
>   <legal>Copyright (c) Foo Inc.</legal>
>  </page>
> and after the parsing you get
>  <page>
>   <title>Hello world!</title>
>   <author>John Smith</author>
>   <body>
>    <p>This is to say hello!</p>
>    <p><strong>Hello!</strong></p>
>   </body>
>   <legal>Copyright (c) Foo Inc.</legal>
>  </page>
> which is _very_ handy, especially for XML web publishing.
> True, the above can be done thru XSLT transformations, but it's much
> more complex and Donald and I both think XSLT may fail to cover all the
> cases for useful inheritance.

OK.  Do you have any "formal" spec written up?  There are several
important items that come to mind:

- I see how you allow elements to be overridden.  What about adding to them?
For example:

  <author>John Smith</author>
  <body>Yet to be written</body>
  <legal>Copyright (c) Foo Inc.</legal>

I want to have my result look like:

  <author>John Smith</author>
  <author>Brett McLaughlin</author>
  <legal>Copyright (c) Foo Inc.</legal>

How could I do this?  In case you are wondering, the importance of this
is the incremental building of a page.  What I am slowly working toward
is that the biggest problem we have today using XML in Java is that DOM
is a hog, and although SAX2 looks like it will allow us to move away
from an in-memory model, we still need XML techniques to incrementally
add to an XML document.  With a template system, this gets more complex,
because the changes may sometimes be additions, sometimes replacements.
So the ability to take a template, possibly already parsed or in memory,
and not just replace elements/nodesets but add to them, and how to
represent the difference in XML, becomes important.
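
One way the replace-versus-add distinction could be represented is
sketched below.  Everything here is hypothetical: the merge="append"
attribute and the extend() function are invented for illustration and
come from no spec or proposal, and the sketch works on an in-memory tree
(Python's standard xml.etree.ElementTree, an assumed choice), so it does
nothing for the DOM memory concern.  Children of the page replace
same-named children of the template unless they are marked
merge="append"; names the template lacks go to the front, in order:

```python
import copy
import xml.etree.ElementTree as ET


def extend(template, page):
    """Merge page into a copy of template (hypothetical semantics)."""
    result = copy.deepcopy(template)
    front = 0  # where the next brand-new element is inserted
    for child in page:
        child = copy.deepcopy(child)
        # Hypothetical marker: "append" adds, anything else replaces.
        mode = child.attrib.pop("merge", "replace")
        same = [i for i, e in enumerate(result) if e.tag == child.tag]
        if not same:
            result.insert(front, child)
            front += 1
        elif mode == "append":
            result.insert(same[-1] + 1, child)  # after the last match
        else:
            result[same[0]] = child  # override the first match in place
    return result


template = ET.fromstring(
    "<page><author>John Smith</author><body>Yet to be written!</body>"
    "<legal>Copyright (c) Foo Inc.</legal></page>"
)
page = ET.fromstring(
    "<page><title>Hello world!</title>"
    '<author merge="append">Brett McLaughlin</author></page>'
)

merged = extend(template, page)
print([e.tag for e in merged])
# ['title', 'author', 'author', 'body', 'legal']
print([a.text for a in merged.findall("author")])
# ['John Smith', 'Brett McLaughlin']
```

Resolving the reference to the template file and deciding what "same
element" means beyond a matching name are left open here; those are
exactly the questions a formal spec would have to answer.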

> This is why we ask for your comments on such thing, hopefully to be
> included in the XInclude effort or in another W3C note, but think that
> inheritance should be a fundamental feature of the XML model.

I think you want this separate from XInclude.  What you are trying to do
is really push towards XML 2.0 (in my mind) because you are moving XML
towards more of a data representation than a markup language.  If that
doesn't make sense to someone, consider that one of XML's biggest
problems IMHO is that at its core, it is simply a metadata framework. 
XML without namespaces, DTDs/XML Schema, XSL, XSLT, is almost useless,
because another application (including a client) can't do anything
useful with that data.  It isn't constrained, so what structure is it
in?  It can't be transformed, because I don't know how.
However, XML 1.0 makes that framework syntax available without setting
up a unification between it and a framework multiverse
<slipping-into-math/>.  In other words, there should be a coherent set
of formulae that can be applied to any well-formed XML document, without
needing any other XML constructs (namespace, schema, etc), and result in
another XML document.  These are the things Stefano is getting at,
because without these defined behaviors, applications have all this
overhead (using DTD, Schema, XSL) to perform the simplest of tasks (like
inheritance - today we have to use XSL/T). If these formulae are defined
(in XML 2.0) then there would be mathematical-esque set behavior that
could dictate how to handle inheritance _without_ having to resort to
XSL/T.  Remember in college when you learned how to add matrices:

[A] + [B] = [C]  

This could be done.  But how do you do:

[XML_1] + [XML_2] = ???

This needs to be defined.  +1 Stefano, let me know if you need help.

> Awaiting for your comments and sorry for the non-math people around :)

No, it was the right approach.


> --
> Stefano Mazzocchi      One must still have chaos in oneself to be
>                           able to give birth to a dancing star.
> <stefano@apache.org>                             Friedrich Nietzsche
