xml-commons-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [ANN] XInclude processor for xml-commons
Date Tue, 06 May 2003 22:03:09 GMT
On 5/6/03 7:59 AM, Arnaud Le Hors wrote:

> To end, I'll add a note on the debate over the "infoset-messing validation".
> Here again there is no universal solution. 
> If all you are dealing with is
> XML documents - this is text -, validation merely means checking that your
> document validates against a set of constraints. You couldn't care less
> about any infoset augmentations in an environment where everything is text.
> On the other hand, if what you are dealing with is XML data - this is
> objects serialized in XML -, validation is the way you reconcile serialized
> objects with their real type. Infoset augmentations are then fundamental and
> can hardly be considered as messy... 

Wait a second.

I agree with this vision, but I wouldn't call it "infoset augmentation";
I'd call it "infoset normalization". "Augmenting" means "adding
information", and that is what I personally consider bad, because it
alters the infoset (read: "information set") of the XML stream.

Just like DOM Node.normalize() changes the tree without changing the
information contained in the tree, your 'normalization' of types changes
the tree but doesn't change the information it includes.
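To make the analogy concrete, here is a minimal sketch of what normalize() does: two adjacent text nodes are structurally two nodes, but they carry one piece of information, and normalize() merges them without losing anything. (Note: Node.getTextContent() is DOM Level 3, so it post-dates this thread; it's used here only to show the information is unchanged.)

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeDemo {

    // Build a <root> element holding two adjacent text nodes:
    // structurally two nodes, but a single piece of information.
    static Element buildUnnormalized() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode("Hello, "));
        root.appendChild(doc.createTextNode("world"));
        return root;
    }

    public static void main(String[] args) throws Exception {
        Element root = buildUnnormalized();
        System.out.println(root.getChildNodes().getLength()); // two text nodes

        root.normalize();

        // The tree changed (the adjacent text nodes were merged into one),
        // but the information it contains is identical.
        System.out.println(root.getChildNodes().getLength());
        System.out.println(root.getTextContent());
    }
}
```

The tree before and after differs in shape, not in content: that is the sense in which normalization leaves the infoset's information untouched.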

Why am I so picky about this? Consider caching: when the infoset is
normalized, the information it contains is not changed, just morphed.
So if an XML stream wasn't valid before validation, it's not valid
after; and if it was valid before validation, it is still valid after.

This means: normalizing the infoset doesn't influence the ergodic period
of that infoset.

But if I have "infoset augmentation" (say, external entity evaluation),
this no longer holds.
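A small sketch of why entity evaluation is augmentation rather than normalization: the entity's value lives in the DTD, not in the element content, so expansion pulls new information into the infoset. (This uses an internal entity so the example is self-contained; an external entity makes the problem worse, since the added information lives in a separate resource that can change independently of the cached document.)

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityDemo {

    // Parse an XML string with a JAXP DocumentBuilder, which expands
    // entity references by default, and return the element's text.
    static String expandedText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // "Apache" appears nowhere in the element content; the parser
        // adds it from the DTD. The post-parse infoset contains more
        // information than the document's own content.
        String xml = "<!DOCTYPE doc [<!ENTITY who \"Apache\">]>"
                   + "<doc>Hello &who;</doc>";
        System.out.println(expandedText(xml));
    }
}
```

For a cache, this means the expanded result depends on something outside the document itself, so the document alone is no longer a sufficient cache key.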

This is why DTDs are, and must be, considered harmful in a heavily
cache-based environment like Cocoon.

Yes, there are ways to work around this, but none is as elegant as
separating concerns between infoset-normalizing pipeline stages and
infoset-augmenting pipeline stages.

> So, beware of over simplistic characterizations... :-)

I hope this helps outline my point on this.

