forrest-dev mailing list archives

From Jeff Turner <>
Subject [RT] Entities in XML docs
Date Fri, 27 Dec 2002 13:43:32 GMT

Stylebook has a nice feature whereby a project can create a file,
entities.ent, containing XML entity definitions for use in project XML
files.  Here is a sample from Xalan's entities.ent:

<?xml encoding="US-ASCII"?>

<!ENTITY xslt "Xalan">
<!ENTITY xslt4j "Xalan-Java">
<!ENTITY xslt4j2 "Xalan-Java 2">
<!ENTITY xslt4j-dist "xalan-j_2_4_D1">
<!ENTITY xslt4j-dist-bin "&xslt4j-dist;-bin">
<!ENTITY xslt4j-dist-src "&xslt4j-dist;-src">
<!ENTITY xslt4j-current "&xslt4j; version 2.4.D1">
<!ENTITY xslt4j-distdir "">
<!ENTITY xml4j "Xerces-Java">
<!ENTITY xml4j1 "Xerces-Java 1">
<!ENTITY xml4j2 "Xerces-Java 2">
<!ENTITY xml4j-used "&xml4j; 2.0.1">
<!ENTITY xml4j-jar "xercesImpl.jar">
<!ENTITY xslt4c "Xalan-C++">
<!ENTITY xml4c "Xerces-C++">
<!ENTITY download "The &xslt4j-current; download from includes &xml4j-jar;
from &xml4j-used; and xml-apis.jar. For version
information about the contents of xml-apis.jar, see the JAR manifest.">

<!ENTITY xsltcwhatsnewhead '<li><link anchor="xsltc">XSLTC</link></li>'>


This entities.ent file is automatically included in book.dtd through this
parameter entity declaration:

<!ENTITY % externalEntity SYSTEM "sbk:/sources/entities.ent">
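To show what a parser does with definitions like these, here is a minimal Java
sketch using the JDK's built-in DOM parser (JAXP). It copies three declarations
from the sample into an internal DTD subset, so no entities.ent file or sbk:
protocol is involved. The class name is hypothetical. The nested &xslt4j;
inside xslt4j-current is expanded when xslt4j-current is used, and the markup
inside xsltcwhatsnewhead becomes real elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Parses a document whose internal DTD subset holds declarations copied from
// the entities.ent sample, to show how the parser expands them.
class EntityExpansionDemo {

    static final String DOC =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE doc [\n"
        + "  <!ENTITY xslt4j \"Xalan-Java\">\n"
        + "  <!ENTITY xslt4j-current \"&xslt4j; version 2.4.D1\">\n"
        + "  <!ENTITY xsltcwhatsnewhead '<li><link anchor=\"xsltc\">XSLTC</link></li>'>\n"
        + "]>\n"
        + "<doc><p>&xslt4j-current;</p><ul>&xsltcwhatsnewhead;</ul></doc>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // True is already the default; set explicitly to make the intent visible.
        factory.setExpandEntityReferences(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(DOC);
        // The &xslt4j; reference inside xslt4j-current is expanded as well.
        System.out.println(doc.getElementsByTagName("p").item(0).getTextContent());
        // The markup in xsltcwhatsnewhead is parsed into a real <link> element.
        System.out.println(doc.getElementsByTagName("link").getLength());
    }
}
```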

Reusing snippets of content like this seems a pretty nice feature. In Forrest,
we have a few options for getting the same effect:

1) Emulate the Stylebook solution in document-v11.dtd:

<!ENTITY % externalEntity SYSTEM "context://entities.ent">

Currently, this just results in an 'unknown protocol: context' error, which is
odd, because I thought the XML parser would have an EntityResolver set that
understands Cocoon protocols. Or is this just wishful thinking?
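A resolver of the kind that would be needed here could look roughly like the
following hypothetical sketch. It uses only the standard SAX EntityResolver
interface, not Cocoon's actual source-resolution code: a context://path system
ID is mapped onto a file under an assumed webapp root directory, and every
other ID is left to the parser's default handling by returning null.

```java
import java.io.File;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: maps "context://path" onto a file under a webapp root.
class ContextEntityResolver implements EntityResolver {
    private static final String PREFIX = "context://";
    private final File contextRoot;

    ContextEntityResolver(File contextRoot) {
        this.contextRoot = contextRoot;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null || !systemId.startsWith(PREFIX)) {
            return null; // Not ours: the parser resolves ordinary URIs itself.
        }
        // Sketch only: no checks against "../" escaping the context root.
        File target = new File(contextRoot, systemId.substring(PREFIX.length()));
        // Return a file: URI; the parser opens the stream itself.
        InputSource source = new InputSource(target.toURI().toString());
        source.setPublicId(publicId);
        return source;
    }
}
```

It would be registered with DocumentBuilder.setEntityResolver(...) or
XMLReader.setEntityResolver(...) before parsing. SAX parsers consult the
resolver for external parameter entities as well as for the external DTD
subset, which is what the %externalEntity declaration relies on.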

The problem with this general approach is that XML docs can no longer be
validated outside Cocoon, e.g. from a catalog-aware editor. IMHO that
makes this approach unacceptable.

2) Tell users to do it themselves.  Each XML file would have something like:

<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
"document-v11.dtd" [
<!ENTITY % local-ents SYSTEM "entities.ent">
%local-ents;
]>
Simple, effective, and it doesn't lock users into using only Forrest. The only
problem is that it assumes rather more XML knowledge than I'd expect most doc
editors to have. I think this should be our default solution, unless something
better comes up.
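One way to check that option 2 works without Forrest is to parse such a
document with the JDK's own JAXP parser. The sketch below writes a stub
document-v11.dtd (hypothetical one-line content standing in for the real DTD),
an entities.ent and a document into a directory, then parses the document;
relative system IDs resolve against the document's location. The %local-ents;
reference is what pulls the declarations in; declaring the parameter entity
alone has no effect.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;

// Option 2 with a plain parser: no Cocoon protocols and no custom resolver.
class LocalEntitiesDemo {

    static String textOf(Path dir) throws Exception {
        // Stub standing in for the real document-v11.dtd (hypothetical content).
        Files.write(dir.resolve("document-v11.dtd"),
            "<!ELEMENT document ANY>\n".getBytes(StandardCharsets.US_ASCII));
        Files.write(dir.resolve("entities.ent"),
            "<!ENTITY xml4j \"Xerces-Java\">\n".getBytes(StandardCharsets.US_ASCII));
        Path doc = dir.resolve("index.xml");
        Files.write(doc, String.join("\n",
            "<?xml version=\"1.0\"?>",
            "<!DOCTYPE document PUBLIC \"-//APACHE//DTD Documentation V1.1//EN\"",
            "\"document-v11.dtd\" [",
            "<!ENTITY % local-ents SYSTEM \"entities.ent\">",
            "%local-ents;",
            "]>",
            "<document>&xml4j;</document>").getBytes(StandardCharsets.US_ASCII));

        // Both "document-v11.dtd" and "entities.ent" resolve relative to index.xml.
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(doc.toFile())
            .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("forrest-ents");
        System.out.println(textOf(dir));
    }
}
```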

3) Avoid XML entities altogether.

3.1) Use XInclude. E.g., given an entities.xml file:

<entities xmlns:xi="http://www.w3.org/2001/XInclude">
  <entity id="xml4j">Xerces-Java</entity>
  <entity id="xml4j1">Xerces-Java 1</entity>
  <entity id="xml4j2">Xerces-Java 2</entity>

  <entity id="xslt4j-current">
    <xi:include href="#xslt4j"/> version 2.4.D1
  </entity>
  <entity id="download">
      The <xi:include href="#xslt4j-current"/> download includes ...
  </entity>
</entities>

to include an entity, we'd use:

<xi:include href="../entities.xml#download"/>
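To make the nested #id references concrete, here is a hypothetical, text-only
stand-in for XInclude processing of entities.xml. It is not a conforming
XInclude processor (the final XInclude 1.0 Recommendation moved fragment
addressing into a separate xpointer attribute, and this sketch handles neither
external files nor markup); it only follows href="#id" links between <entity>
elements, recursing and detecting cycles. The class name is made up.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical: follows href="#id" links between <entity> elements.
class EntityIncluder {
    static final String XI_NS = "http://www.w3.org/2001/XInclude";
    private final Map<String, Element> byId = new HashMap<>();

    EntityIncluder(String entitiesXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to recognise the xi: namespace
        NodeList entities = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(entitiesXml)))
            .getElementsByTagName("entity");
        for (int i = 0; i < entities.getLength(); i++) {
            Element entity = (Element) entities.item(i);
            byId.put(entity.getAttribute("id"), entity);
        }
    }

    /** Returns the entity's text with all #id includes expanded, trimmed. */
    String expand(String id) {
        return expand(id, new HashSet<>());
    }

    private String expand(String id, Set<String> inProgress) {
        Element entity = byId.get(id);
        if (entity == null) {
            throw new IllegalArgumentException("No entity with id " + id);
        }
        if (!inProgress.add(id)) {
            throw new IllegalStateException("Include cycle through " + id);
        }
        StringBuilder out = new StringBuilder();
        for (Node n = entity.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                out.append(n.getNodeValue());
            } else if (n.getNodeType() == Node.ELEMENT_NODE
                    && XI_NS.equals(n.getNamespaceURI())
                    && "include".equals(n.getLocalName())) {
                String href = ((Element) n).getAttribute("href");
                if (!href.startsWith("#")) {
                    throw new IllegalArgumentException("Only #id hrefs are handled: " + href);
                }
                out.append(expand(href.substring(1), inProgress));
            }
        }
        inProgress.remove(id);
        // Trim the indentation around multi-line entity bodies.
        return out.toString().trim();
    }
}
```

With an entities.xml that also defines xslt4j as Xalan-Java, expand("download")
yields "The Xalan-Java version 2.4.D1 download includes ...".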

With a SimpleMappingMetaModule, we can simplify that to:

<xi:include href="res:download"/>

This method has the limitation that values cannot be included halfway inside an
attribute. E.g., we couldn't have:

<s1 title="The <xi:include href="#xml4j"/> project">
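The contrast with XML entities can be checked directly: an internal entity
reference is legal inside an attribute value, while any '<' there is a
well-formedness error. A minimal JAXP sketch (the class name is hypothetical):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Entity reference in an attribute: expanded. Element in an attribute: fatal error.
class AttributeDemo {
    static final String WITH_ENTITY =
        "<!DOCTYPE s1 [<!ENTITY xml4j \"Xerces-Java\">]>"
        + "<s1 title=\"The &xml4j; project\"/>";
    static final String WITH_INCLUDE =
        "<s1 xmlns:xi=\"http://www.w3.org/2001/XInclude\""
        + " title=\"The <xi:include href=\"#xml4j\"/> project\"/>";

    static String title(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement().getAttribute("title");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(title(WITH_ENTITY));
        try {
            title(WITH_INCLUDE);
        } catch (SAXParseException e) {
            // '<' is not allowed in attribute values, so the parse fails.
            System.out.println("not well-formed: " + e.getMessage());
        }
    }
}
```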

Another disadvantage is that it imposes XInclude (and namespaces) on docs. We
currently have a DTD-based architecture that can't really handle namespaces.

It is also a PITA having to modify the DTD to support xi:include. Do we define
it as an inline or block-level element? We really need both. Then, when users
want to use DocBook, they must first hack the DTD to allow xi:include.

3.2) We implement a SearchReplaceTransformer, which replaces ${variables} with
values. E.g., entities.xml:

<entities>
  <xml4j1>Xerces-Java 1</xml4j1>
  <xml4j2>Xerces-Java 2</xml4j2>
  <xslt4j-current>
    ${xslt4j} version 2.4.D1
  </xslt4j-current>
  <download>
      The ${xslt4j-current} download includes ...
  </download>
</entities>

This seems a lot more intuitive than XInclude, and doesn't require modifying
DTDs.  We could go all the way and use one of the expression languages in
Jakarta Commons, like jexl[1].
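The substitution itself is small. Below is a minimal sketch of ${name}
expansion over a plain Map, assuming that values may themselves contain ${...}
(as xslt4j-current and download do) and that unknown names are left as
written; both behaviours are assumptions, not decisions made in this proposal.
The class name is hypothetical, and Matcher.appendReplacement with a
StringBuilder needs Java 9 or later.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical ${name} expander: recursive, leaves unknown names, rejects cycles.
class VariableExpander {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");
    private final Map<String, String> values;

    VariableExpander(Map<String, String> values) {
        this.values = values;
    }

    String expand(String text) {
        return expand(text, new HashSet<>());
    }

    private String expand(String text, Set<String> inProgress) {
        Matcher m = VAR.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = values.get(name);
            String replacement;
            if (value == null) {
                replacement = m.group(); // unknown variable: keep "${name}" as written
            } else if (!inProgress.add(name)) {
                throw new IllegalStateException("Cyclic definition of ${" + name + "}");
            } else {
                replacement = expand(value, inProgress); // values may nest ${...}
                inProgress.remove(name);
            }
            // quoteReplacement stops '$' and '\' in values being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, with xslt4j mapped to Xalan-Java and xslt4j-current and download
defined as in entities.xml, expand("${download}") gives
"The Xalan-Java version 2.4.D1 download includes ...".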

Are there any more options I haven't thought of?

My current preference is to go with 3.2 and implement it with InputModules, the
same way LinkRewriterTransformer works. Using XInclude would involve less
coding, but the DTD problems would be too horrible.
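Since Cocoon transformers work on SAX events, a SearchReplaceTransformer would
most likely be a SAX filter. The following hypothetical sketch is built on
org.xml.sax.helpers.XMLFilterImpl rather than Cocoon's transformer classes; the
Function<String, String> lookup is a placeholder for whatever supplies values
(InputModules in this proposal) and is not Cocoon's InputModule API. Because it
rewrites attribute values as well as text, it also covers the
title="The ${xml4j} project" case that XInclude cannot.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical SAX filter sketching a SearchReplaceTransformer.
class SearchReplaceFilter extends XMLFilterImpl {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");
    private final Function<String, String> lookup;
    // A parser may split one run of text over several characters() calls, so
    // text is buffered and substituted only when the next markup event arrives.
    private final StringBuilder pending = new StringBuilder();

    SearchReplaceFilter(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    /** Replaces each ${name} the lookup knows; unknown names stay as written. */
    String substitute(String s) {
        Matcher m = VAR.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = lookup.apply(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value != null ? value : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    private void flushText() throws SAXException {
        if (pending.length() > 0) {
            char[] chars = substitute(pending.toString()).toCharArray();
            pending.setLength(0);
            super.characters(chars, 0, chars.length);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        flushText();
        // Unlike xi:include, substitution also works inside attribute values.
        AttributesImpl rewritten = new AttributesImpl(atts);
        for (int i = 0; i < rewritten.getLength(); i++) {
            rewritten.setValue(i, substitute(rewritten.getValue(i)));
        }
        super.startElement(uri, localName, qName, rewritten);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        flushText();
        super.endElement(uri, localName, qName);
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        flushText();
        super.processingInstruction(target, data);
    }

    @Override
    public void endDocument() throws SAXException {
        flushText();
        super.endDocument();
    }
}
```

To use it, give the filter a parent XMLReader with setParent(...) and a
downstream ContentHandler, then call parse(...) on the filter. Lexical events
(comments, CDATA sections) are not considered in this sketch.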



