From Ross Gardler <>
Subject Re: xinclude
Date Mon, 11 Sep 2006 09:00:22 GMT
Jim Dixon wrote:
> On Fri, 8 Sep 2006, Ross Gardler wrote:
>>>I've thought about this a bit more.  One of the problems here is that
>>>adding xi:include elements has unexpected results.
>>>If the DTD is extended as above, then the validator will, I think, not
>>>check beyond the xi:include element, and so a document may validate
>>>even though what is being XIncluded is nonsense.  I can write
>>>  <p><xi:include href="rubbish.xml"/></p>
>>>and validation will succeed, because the xi:include element has the
>>>pattern required by the DTD even though rubbish.xml isn't XML at all.
>>Good point.
>>>The expected behavior is that the validator recognizes that what is being
>>>XIncluded is XML (as it is by default) and goes through to validate that
>>>as well, silently replacing the xi:include element with whatever is
>>>XIncluded.  I think that some parsers do this - perhaps only if an
>>>option is set - but most don't.
>>Does Xalan do it? This is the default parser for Forrest. A healthy
> Uhm, do you mean Xerces?  From what I can see Xalan is unaware of
> XIncludes.

Yes, I meant Xerces. I often get the names Xerces and Xalan mixed up, sorry.

>>>A better approach would be to process the XIncludes before validation,
>>>stripping off the xmlns:xi attribute from the document element and
>>>replacing xi:includes with whatever they resolve to.  This should be
>>>cheaper than it might seem: unless the xmlns:xi is present, the
>>>document is simply handed on to the validator untouched.
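A minimal sketch of that pre-processing step follows. It is written in Python with the standard library's xml.etree.ElementInclude module, not with Xerces or anything in Forrest, and the function name expand_xincludes is hypothetical. A document with no xi:include elements is returned untouched. Otherwise each include is expanded, and a ParseError surfaces when the included file is not XML (the rubbish.xml case). The sketch does no DTD validation itself. It assumes the returned text is then passed to whatever validator is already in use.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def expand_xincludes(source_text, loader=None):
    """Return the text that should be handed to the validator (hypothetical helper).

    If the document contains no xi:include elements it is returned unchanged.
    Otherwise every xi:include is replaced by the content it resolves to.
    Included content that is not well-formed XML raises
    xml.etree.ElementTree.ParseError instead of slipping past validation.
    """
    root = ET.fromstring(source_text)
    if next(root.iter(XI_INCLUDE), None) is None:
        # Nothing to include: pass the original document through untouched.
        return source_text
    # Replace each xi:include with what it resolves to. With loader=None the
    # default loader opens hrefs as file paths relative to the working directory.
    ElementInclude.include(root, loader=loader)
    # ElementTree only declares namespaces that are still in use, so the
    # now-unused xmlns:xi declaration disappears on serialization.
    return ET.tostring(root, encoding="unicode")
```

ElementTree drops the DOCTYPE declaration when it re-serializes, so a real pipeline would need to re-attach it before DTD validation. The sketch also tests for include elements rather than for the xmlns:xi declaration itself, which keeps it simple at the cost of always parsing the document once.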
>>I can't see an easy way of doing this as, in many cases, the included
>>content is generated by Forrest. In fact, this would be a problem if the
>>parser were doing the includes.
> I am baffled.  How would it be a problem if the parser was doing the
> XIncludes?

David points out in another message that validate-xdocs runs before 
Forrest performs any transformations on content: it only validates the 
*source* documents.

This means that if a source document XIncludes another source document 
that is available statically on disk or over the network, as in your use 
case, then the approach above will work.

However, if a source document includes content that is generated 
dynamically, for example pulled from a database, an RSS feed or a Jira 
instance, then we would have to start Forrest to generate those sources. 
Since we validate source documents before Forrest is started, we end up 
in a Catch-22.

One solution would be to start a running instance of Forrest (i.e. 
forrest run) and have Xerces retrieve the XIncluded content from that 
instance so it can be validated. But this is clumsy and, I would guess, 
non-trivial.
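Purely as an illustration, the running-instance idea could look like the Python sketch below. It is a loader for xml.etree.ElementInclude.include that reads static sources from disk and requests everything else from a running Forrest instance. Several pieces are assumptions, not how Forrest actually works: the base URL http://localhost:8888/, the idea that Forrest serves the generated XML at a path matching the href, and the names make_forrest_loader and fetch_url.

```python
import os
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical: the address a local `forrest run` instance is assumed to use.
FORREST_BASE_URL = "http://localhost:8888/"


def fetch_url(url):
    """Return the raw bytes served at url (used only for generated sources)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def make_forrest_loader(base_url=FORREST_BASE_URL, fetch=fetch_url):
    """Build a loader with the (href, parse, encoding) signature that
    xml.etree.ElementInclude.include expects.

    Hypothetical resolution rule: an href that exists on disk is a static
    source; anything else is assumed to be generated by Forrest and is
    requested from the running instance.
    """
    def loader(href, parse, encoding=None):
        if os.path.exists(href):
            # Static content: read it straight from disk.
            with open(href, "rb") as f:
                data = f.read()
        else:
            # Dynamic content: ask the (assumed) running Forrest instance.
            data = fetch(urljoin(base_url, href))
        if parse == "xml":
            # Non-XML content raises ET.ParseError here, static or dynamic.
            return ET.fromstring(data)
        # parse="text": return the content as a decoded string.
        return data.decode(encoding or "utf-8")

    return loader
```

It also shows where the clumsiness comes from: validating a source document now depends on a live server being up and serving the right paths.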

My point is that any solution created to better support the first use 
case (including static content) must also work for the second use case 
(including dynamic content).


