forrest-dev mailing list archives

From Jim Dixon <...@dixons.org>
Subject Re: xinclude
Date Mon, 11 Sep 2006 07:31:01 GMT
On Fri, 8 Sep 2006, Ross Gardler wrote:

> > I've thought about this a bit more.  One of the problems here is that
> > adding xi:include elements has unexpected results.
> >
> > If the DTD is extended as above, then the validator will, I think, not
> > check beyond the xi:include element, and so a document may validate
> > even though what is being XIncluded is nonsense.  I can write
> >   <p><xi:include href="rubbish.xml"/></p>
> > and validation will succeed, because the xi:include element has the
> > pattern required by the DTD even though rubbish.xml isn't XML at all.
>
> Good point.
>
> > The expected behavior is that the validator recognizes that what is being
> > XIncluded is XML (as it is by default) and goes through to validate that
> > as well, silently replacing the xi:include element with whatever is
> > XIncluded.  I think that some parsers do this - perhaps only if an
> > option is set - but most don't.
>
> Does Xalan do it? This is the default parser for Forrest. A healthy

Uhm, do you mean Xerces?  From what I can see Xalan is unaware of
XIncludes.

> warning in the docs and output of the validate task may be sufficient
> for those using a different parser.

I have only skimmed the Forrest build files but Xerces must be handling
XInclusion, because after all Forrest works.  If I XInclude foo.txt in
a document, then its contents appear on the page fetched by my browser.

> > A better approach would be to process the XIncludes before validation,
> > stripping off the xmlns:xi attribute from the document element and
> > replacing xi:includes with whatever they resolve to.  This should be
> > cheaper than it might seem: unless the xmlns:xi is present, the
> > document is simply handed on to the validator untouched.
>
> I can't see an easy way of doing this as, in many cases, the included
> content is generated by Forrest. In fact, this would be a problem if the
> parser were doing the includes.

I am baffled.  How would it be a problem if the parser was doing the
XIncludes?

Some people build XML documents by writing chapters or sections separately
and then XIncluding them into one master document.  That is, the top-level
document consists of a preamble followed by a series of xi:includes.
This is quite a sensible approach in many circumstances.  But if you do
this in Forrest using a DTD modified along the lines that I have taken, it
means that you can't actually validate the document, because the
validating parser will just check that the xi:includes in the top-level
document are permitted, find that they are, and then go on, ignoring the
contents of the chapters/sections.
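
To illustrate, here is a minimal sketch in Python (standard library only, not anything Forrest itself uses). The element names and the hrefs chapter1.xml and rubbish.xml are made up for the example. The parser here does no XInclude processing, so all it ever sees are the xi:include elements themselves; whatever they point at is never opened, let alone checked.

```python
import xml.etree.ElementTree as ET

XI = "http://www.w3.org/2001/XInclude"

# Hypothetical master document: a preamble followed by xi:includes.
MASTER = f"""<document xmlns:xi="{XI}">
  <header><title>Master</title></header>
  <body>
    <xi:include href="chapter1.xml"/>
    <xi:include href="rubbish.xml"/>
  </body>
</document>"""


def unresolved_includes(root):
    """Return the href of every xi:include element still present in the tree."""
    return [el.get("href") for el in root.iter(f"{{{XI}}}include")]


# Parsing succeeds even though neither target exists or has been read.
root = ET.fromstring(MASTER)
pending = unresolved_includes(root)
print(pending)  # ['chapter1.xml', 'rubbish.xml']
```

A validator working on this tree can only confirm that xi:include is allowed where it appears; the included content is still just an href.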

To make this particular example (XIncluded sections) work, you would have
to add xi:include to local.sections in the DTD.  However, anyone new to
Forrest but familiar with XInclude will expect to be able to use
xi:include in many places.

You can handle this with complex DTDs or by writing XSLT scripts to
replace the xi:includes with what they represent.  But this is perverse.
Think C: you don't change the grammar of C to explicitly recognize
#includes; you have a preprocessor that handles the inclusion and then
you parse what comes out of the preprocessor.

This is exactly how XIncludes should be handled: you make a pass that
dereferences the xi:includes, then you validate the output XML against
the DTD (one with no xi:includes in it).
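
A rough sketch of such a pass, again in Python with the standard library's xml.etree.ElementInclude rather than anything Forrest ships. The FILES table, the loader and resolve_includes are hypothetical names for the example, and the DTD validation step itself is left out because the Python standard library has no DTD validator; the resolved tree would be handed to whatever validator is in use.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical in-memory "files" standing in for documents on disk.
FILES = {
    "chapter1.xml": "<section><title>One</title><p>First chapter.</p></section>",
    "rubbish.xml": "this is not XML at all",
}


def loader(href, parse, encoding=None):
    """Resolve an href for ElementInclude; malformed XML raises ET.ParseError."""
    data = FILES[href]
    if parse == "xml":
        return ET.fromstring(data)
    return data  # parse="text": return the raw text


def resolve_includes(source):
    """Parse source and replace every xi:include with what it refers to."""
    root = ET.fromstring(source)
    if next(root.iter(XI_INCLUDE), None) is None:
        return root  # no xi:include present: hand the document on untouched
    ElementInclude.include(root, loader=loader)
    return root


MASTER = """<document xmlns:xi="http://www.w3.org/2001/XInclude">
  <body><xi:include href="chapter1.xml"/></body>
</document>"""

resolved = resolve_includes(MASTER)
# Serialized form of the resolved tree, ready to go to a validator.
output = ET.tostring(resolved, encoding="unicode")
```

With this arrangement an xi:include pointing at rubbish.xml fails in the loader with a parse error before validation is ever attempted, and the DTD needs no xi:include declarations at all. ElementTree only writes out namespace declarations for names that are actually used, so the serialized output should no longer carry xmlns:xi either.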

--
Jim Dixon  jdd@dixons.org   tel +44 117 982 0786  mobile +44 797 373 7881
