forrest-dev mailing list archives

From David Crossley <cross...@indexgeo.com.au>
Subject rationalise storage of DTDs and other entities
Date Fri, 01 Mar 2002 07:24:44 GMT
One of the issues that Steven touched on in another
thread is where to store the DTDs and other entities
so that they are readily available to all XML instance
documents. We also want to maintain only one copy
of these entities.

Sorry that this is long - i am trying to ensure that we look
at all issues early. Also, i am providing background so
that other people on forrest-dev, who are not necessarily
familiar with Cocoon, can see where we are coming from.
Parts of that background will have bearing on where we
decide to store the DTDs in Forrest.

I am using the term "entities" to refer to all external bits that
are required to build an XML instance document, i.e. its DTD,
any character entity sets that are declared by either the DTD
or the XML instance, and potentially other external entities.

--------------------------------------
Background - Cocoon
----------------
Cocoon originally had its DTDs at xdocs/dtd/*.dtd
This was when Cocoon had a flat directory structure.
The DTDs were conveniently directly underneath all
xdocs/*.xml and their document type declarations used
a basic default System Identifier, e.g. dtd/document-v10.dtd

Then the Cocoon xdocs were re-organised to have
a hierarchy, i.e. sub-directories under xdocs/
That meant that the default System Identifiers needed to
start using tricks with ../../ to refer to their DTD. Messy.
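
To illustrate why this gets messy, here is a minimal sketch
that computes the relative System Identifier a document would
need at each depth. The document paths are hypothetical; only
xdocs/dtd/document-v10.dtd comes from the description above.

```python
import posixpath


def relative_system_id(doc_path, dtd_path):
    """Return the relative System Identifier from a document to its DTD."""
    # An empty dirname means the document sits in the current directory.
    doc_dir = posixpath.dirname(doc_path) or "."
    return posixpath.relpath(dtd_path, start=doc_dir)


DTD = "xdocs/dtd/document-v10.dtd"

# Hypothetical documents at increasing depth below xdocs/.
for doc in (
    "xdocs/index.xml",
    "xdocs/userdocs/index.xml",
    "xdocs/userdocs/concepts/catalog.xml",
):
    print(doc, "->", relative_system_id(doc, DTD))
```

Every move of a document to a different depth changes the
identifier that it must declare.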

Additionally there were other documents that were outside
the xdocs/ directory, e.g. changes.xml at the top-level.
These needed default System Identifiers with hard-coded
pathnames. Even more messy.

At around the same time the Entity Catalog resolver support
was added to Cocoon [1]. This allowed DTDs and entity sets
to be placed in a centralised location. The XML instance
documents could declare their Public Identifiers and the
entity resolver could ignore the default System Identifiers
and locate the relevant DTDs via their Public Identifiers.
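
The lookup can be sketched roughly as follows. This is not
Cocoon's resolver, just a toy illustration of the idea. The
catalog entries, the Public Identifiers and the filenames
are assumptions made for the example.

```python
import re

# Illustrative catalog text in the OASIS plain-text style.
# The identifiers and filenames below are assumptions.
CATALOG = '''
-- hypothetical entries for illustration only --
PUBLIC "-//APACHE//DTD Documentation V1.0//EN" "document-v10.dtd"
PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN//XML" "ISOlat1.pen"
'''

ENTRY = re.compile(r'^PUBLIC\s+"([^"]+)"\s+"([^"]+)"\s*$')


def parse_catalog(text, base_dir):
    """Map each Public Identifier to a path under base_dir."""
    mapping = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:  # comment and blank lines simply do not match
            public_id, filename = match.groups()
            mapping[public_id] = base_dir.rstrip("/") + "/" + filename
    return mapping


def resolve(mapping, public_id, system_id):
    """Look up by Public Identifier; system_id is deliberately ignored.

    Returns None when the catalog has no entry for public_id.
    """
    return mapping.get(public_id)


catalog = parse_catalog(CATALOG, "webapp/resources/entities")
print(resolve(catalog, "-//APACHE//DTD Documentation V1.0//EN",
              "../../dtd/document-v10.dtd"))
```

Whatever System Identifier the document declares, a catalog
hit always yields the one centralised copy.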

We decided to put the DTDs and entity sets together with
other resources at webapp/resources/entities/
However, we needed to leave a copy of the DTDs at their
original location under xdocs/dtd/ as a belt-and-braces
solution while the entity resolver capability was being
developed. In this way the entity resolver could fail and
yet the parser could still fall back to using the hard-coded
System Identifiers.
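
A minimal sketch of that belt-and-braces behaviour, assuming
the catalog is a plain dictionary and that locate_dtd is a
hypothetical helper (not part of Cocoon):

```python
import posixpath


def locate_dtd(catalog, public_id, system_id, doc_dir):
    """Prefer the catalog; otherwise fall back to the System Identifier.

    Returns (path, source) where source names which route was used.
    """
    local = catalog.get(public_id)
    if local is not None:
        return local, "catalog"
    # Resolver miss: interpret the System Identifier relative
    # to the directory that holds the document.
    fallback = posixpath.normpath(posixpath.join(doc_dir, system_id))
    return fallback, "system-identifier"


PUBLIC_ID = "-//APACHE//DTD Documentation V1.0//EN"  # assumed identifier
working = {PUBLIC_ID: "webapp/resources/entities/document-v10.dtd"}
broken = {}  # simulates a resolver that finds nothing

print(locate_dtd(working, PUBLIC_ID, "../dtd/document-v10.dtd", "xdocs/userdocs"))
print(locate_dtd(broken, PUBLIC_ID, "../dtd/document-v10.dtd", "xdocs/userdocs"))
```

The second call only succeeds because the duplicate copy under
xdocs/dtd/ still exists, which is the cost of the fallback.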

Now that the entity resolver is working for Cocoon, the
DTDs could be stored in just one directory, probably
webapp/resources/entities/  Anyway, this issue has not
yet been raised on cocoon-dev.

Existing filesystem structure for Cocoon ...
webapp/resources/entities/*.dtd
src/documentation/xdocs/dtd/*.dtd

--------------------------------------
Background - Forrest
----------------
The CVS for xml-forrest has recently been set up and
is based on Krysalis Centipede. This in turn was based
on Cocoon, so it brought with it a similar filesystem
structure for the entities. It also brought similar duplication
because that issue is still unresolved in Cocoon.

Meanwhile Forrest is starting to develop the next version
of the DTDs. It has them stored at a different location, together
with new OASIS Catalogs.

By the way, i verified that the catalog entity resolver of
Cocoon is working properly inside Forrest by raising the
verbosity level and tweaking the document type declaration
in index.xml and entries in the OASIS Catalog. Would someone
on Windows please verify this too? Perhaps Ken has done
so already for Centipede.

Existing filesystem structure for Forrest ...
src/resources/entities/*.dtd
src/documentation/xdocs/dtd/*.dtd
src/resources/schemas/DTD/*.dtd
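
To see how much duplication there is across these three
directories, something like the following sketch could be run
from the top of a checkout. find_duplicate_dtds is a
hypothetical helper, and comparing by content hash is just
one possible way to define "the same DTD".

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# The three locations listed above.
DTD_DIRS = [
    "src/resources/entities",
    "src/documentation/xdocs/dtd",
    "src/resources/schemas/DTD",
]


def find_duplicate_dtds(root, dirs):
    """Return groups of *.dtd files whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for d in dirs:
        directory = Path(root, d)
        if not directory.is_dir():
            continue  # tolerate a location that does not exist
        for dtd in sorted(directory.glob("*.dtd")):
            digest = hashlib.sha256(dtd.read_bytes()).hexdigest()
            by_digest[digest].append(dtd.relative_to(root).as_posix())
    return [paths for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicate_dtds(".", DTD_DIRS):
        print("identical copies:", ", ".join(group))
```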

--------------------------------------
Proposed storage of external entities for Forrest
---------------------------------
Here are the alternatives that i see. We may need some
discussion before we can decide.

A) under src/resources/some-dir-structure/

B) under src/documentation/xdocs/some-dir-structure/

"some-dir-structure" either has sub-directories ...
schemas/dtd/*.dtd
schemas/entities/*.pen

or it is flat ...
entities/*.dtd
entities/*.pen

By the way, the word "schemata" is actually the plural
of "schema", if good grammar matters. That is why i chose
the directory name "entities" for Cocoon - to avoid that
issue :-)

I currently lean towards A, because it should be entirely
independent of Forrest's own documentation. I also prefer
a flat structure because there are not really all that many
entities involved.

--------------------------------------
Other issues 
---------
1) Need to decide where to store the ISO*.pen character entity
sets. Cocoon has them dumped in the same directory as the
DTDs. Forrest currently has them in a separate directory.

2) Other projects, such as Centipede and Cocoon, will still want
to ship a collection of external entities and an OASIS Catalog.

3) How should default System Identifiers be expressed for
the XML instance documents of Forrest's own documentation?
3a) Use filesystem-based URIs with a relative pathname to
the actual DTD, e.g. ../dtd/document-v10.dtd
3b) Use a bare filename which does not itself locate the DTD
and just let the entity resolver do the work, e.g. "document-v11.dtd"
3c) Use URL-based default System Identifiers,
e.g. "http://xml.apache.org/dtd/document-v11.dtd"
I prefer 3b because it is the easiest.

4) Should there be an actual DTD file located at the URL in 3c?
We need to be careful with this, because we do not want
to encourage crappy XML tools that do not use an entity
resolver. Such tools waste bandwidth by retrieving the DTD
from xml.apache.org every time that the XML instance is
parsed. They should instead resolve the request to a local
copy.

5) We need to encourage any project that uses Forrest
to provide proper document type declarations in their
XML instance documents. Leading by example is best.
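
For comparison, the three System Identifier styles from issue 3
can be sketched as document type declarations. The System
Identifiers are the examples given above; the root element name
and the Public Identifiers are assumptions for illustration.

```python
# (label, assumed Public Identifier, System Identifier from issue 3)
STYLES = [
    ("3a", "-//APACHE//DTD Documentation V1.0//EN",
     "../dtd/document-v10.dtd"),
    ("3b", "-//APACHE//DTD Documentation V1.1//EN",
     "document-v11.dtd"),
    ("3c", "-//APACHE//DTD Documentation V1.1//EN",
     "http://xml.apache.org/dtd/document-v11.dtd"),
]


def doctype(root, public_id, system_id):
    """Build a document type declaration with both identifiers."""
    return f'<!DOCTYPE {root} PUBLIC "{public_id}" "{system_id}">'


def needs_network_without_resolver(system_id):
    """True if a parser lacking an entity resolver would fetch remotely."""
    return system_id.startswith(("http://", "https://"))


for label, public_id, system_id in STYLES:
    # "document" as the root element is an assumption.
    print(label, doctype("document", public_id, system_id))
    print("   remote fetch without resolver:",
          needs_network_without_resolver(system_id))
```

Only 3c causes the remote retrieval described in issue 4 when
a tool has no entity resolver. With 3b such a tool finds no
DTD at all unless a copy sits beside the document.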

--------------------------------------
[1] Entity resolution with catalogs
http://xml.apache.org/cocoon/userdocs/concepts/catalog.html
