forrest-dev mailing list archives

From "Steven Noels" <stev...@outerthought.org>
Subject RE: rationalise storage of DTDs and other entities
Date Thu, 28 Feb 2002 11:39:27 GMT
David Crossley wrote:

> Sorry that it is long - i am trying to ensure that we look
> at all issues early. Also, i am providing background so
> that other people on forrest-dev, who are not necessarily
> familiar with Cocoon, can see where we are coming from.
> Parts of that background will have bearing on where we
> decide to store the DTDs in Forrest.

It sums it up very nicely, so the length was appropriate :-)

> I am using the term "entities" to refer to all external bits that
> are required to build an XML instance document, i.e. its DTD,
> any character entity sets that are declared by either the DTD
> or the XML instance, and potential other external entities.
>

<snip>cocoon-history</snip>

>
> Now that the entity resolver is working for Cocoon, the
> storage of DTDs could be at just one directory, probably
> webapp/resources/entities/  Anyway, this issue has not
> yet been raised on cocoon-dev.

We should test and raise it if necessary - I would stick all
Forrest-related stuff in the same webapp, unless we are planning to make
the DTDs available over HTTP (as Sun is doing for their server.xml and
the like).

> Existing filesystem structure for Cocoon ...
> webapp/resources/entities/*.dtd
> src/documentation/xdocs/dtd/*.dtd
>
> --------------------------------------
> Background - Forrest
> ----------------
> The CVS for xml-forrest has recently been set up and
> is based on Krysalis Centipede. This in turn was based
> on Cocoon, so it brought with it a similar filesystem
> structure for the entities. It also brought similar duplication
> due to the still-standing issue.
>
> Meanwhile Forrest is starting to develop the next version
> of the DTDs. It has them stored at a different location, together
> with new OASIS Catalogs.
>
> By the way, i verified that the catalog entity resolver of
> Cocoon is working properly inside Forrest by raising the
> verbosity level and tweaking the document type declaration
> in index.xml and entries in the OASIS Catalog. Would someone
> on Windows please verify this too? Perhaps Ken has done
> so already for Centipede.

We should make use of the entity resolver the default, which means
cleaning up the docs inside current CVS and providing some template docs
for each doctype.
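
To illustrate what such a resolver consults, here is a minimal sketch of
reading entries from a plain-text OASIS catalog. It assumes the TR9401-style
syntax (XML catalogs look different), and the catalog text, the public
identifier and the target paths are all hypothetical placeholders, not
Forrest's real entries.

```python
import shlex

# Hypothetical catalog content; identifiers and paths are placeholders.
CATALOG = '''
-- sample catalog --
PUBLIC "-//EXAMPLE//DTD Document V1.1//EN" "dtd/document-v11.dtd"
SYSTEM "document-v11.dtd" "dtd/document-v11.dtd"
'''


def parse_catalog(text):
    """Return (public, system) dicts mapping identifiers to local paths."""
    public, system = {}, {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and single-line "-- comment --" entries.
        if not line or line.startswith("--"):
            continue
        tokens = shlex.split(line)  # honours the double-quoted identifiers
        if len(tokens) != 3:
            continue  # this sketch only handles three-token entries
        keyword, key, target = tokens
        if keyword == "PUBLIC":
            public[key] = target
        elif keyword == "SYSTEM":
            system[key] = target
    return public, system
```

A template doc for a given doctype would then only need a declaration
whose public or system identifier appears in one of these two tables.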

> Existing filesystem structure for Forrest ...
> src/resources/entities/*.dtd
> src/documentation/xdocs/dtd/*.dtd
> src/resources/schemas/DTD/*.dtd
>
> --------------------------------------
> Proposed storage of external entities for Forrest
> ---------------------------------
> Here are the alternatives that i see. We may need some
> discussion before we can decide.
>
> A) under src/resources/some-dir-structure/

I would go for A)

> B) under src/documentation/xdocs/some-dir-structure/
>
> "some-dir-structure" either has sub-directories ...
> schemas/dtd/*.dtd
> schemas/entities/*.pen
>
> or it is flat ...
> entities/*.dtd
> entities/*.pen

not flat:

src/resources/schema/dtd
src/resources/schema/entities
src/resources/schema/relax

> By the way, the word "schemata" is actually the plural
> of "schema", if good grammar matters. That is why i chose
> the directory name "entities" for Cocoon - to avoid that
> issue :-)

My colleague told me once that using plurals for directory names is
making explicit what is already implied: directories are 'made' to
contain multiple items of some kind, so the plural is superfluous. Oh
well... ;-)

> I currently lean towards A, because it should be entirely
> independent of Forrest's own documentation. I also prefer
> a flat structure because there are not really all that many
> entities involved.

hm - flat is good for directories containing the same type of
'entities', which is not the case anymore - i would prefer some
structure.

>
> --------------------------------------
> Other issues
> ---------
> 1) Need to decide where to store the ISO*.pen character entity
> sets. Cocoon has them dumped in the same directory as the
> DTDs. Forrest currently has them in a separate directory.

Should stay like that.

I don't like these entities anyhow: documents should be using proper
Unicode encoding, and where necessary character references, instead of
these remnants of the SGML era. We should avoid entities like the plague:
http://www.textuality.com/xml/xmlSW.html makes no reference to entities
(or even DTDs) anymore; see also
http://www.xml.com/pub/a/2002/02/20/deviant.html.
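
As a rough sketch of what moving from named entities to character
references could look like: the snippet below rewrites named references
into numeric ones. It borrows Python's HTML 4 entity table, which overlaps
with the ISO sets but is not identical to the ISO*.pen files, so treat the
name coverage as an assumption. The function name is made up for the
example.

```python
import re
from html.entities import name2codepoint

# The five references predefined by XML itself need no DTD, so keep them.
XML_BUILTINS = {"amp", "lt", "gt", "quot", "apos"}
_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_char_refs(text):
    """Replace known named entity references with decimal character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_BUILTINS or name not in name2codepoint:
            return match.group(0)  # leave built-ins and unknown names untouched
        return "&#%d;" % name2codepoint[name]
    return _ENTITY_REF.sub(replace, text)
```

Once a document contains only numeric references (or plain Unicode
characters), it no longer depends on the character entity sets at all.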

> 2) Other projects, such as Centipede and Cocoon, will still want
> to ship a collection of external entities and an OASIS Catalog.
>
> 3) How should default System Identifiers be expressed for
> the XML instance documents of Forrest's own documentation?

"filename.dtd" unless we are planning to make them public as you
describe underneath.

> 3a) Use filesystem-based URIs with a relative pathname to
> the actual DTD, e.g. ../dtd/document-v10.dtd
> 3b) Use a filename which does not yield the DTD and just
> lets the entity resolver do the work, e.g. "document-v11.dtd"
> 3c) Use URL-based default System Identifiers,
> e.g. "http://xml.apache.org/dtd/document-v11.dtd"
> I prefer 3b because it is the easiest.

3b: enforce usage of the entity resolver
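
A minimal sketch of what 3b implies, written against Python's SAX API
rather than Cocoon's actual resolver: the system identifier is only a
name, and the resolver hands back a local copy. The in-memory table and
the one-line DTD are hypothetical stand-ins for files under something like
src/resources/schema/dtd/.

```python
import io
import xml.sax
from xml.sax.handler import EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical local store; the DTD body is a placeholder, not the real one.
LOCAL_DTDS = {"document-v11.dtd": b"<!ELEMENT document (#PCDATA)>"}


class LocalDtdResolver(EntityResolver):
    """Map a system identifier to a local copy by its final path segment."""

    def resolveEntity(self, publicId, systemId):
        name = systemId.rsplit("/", 1)[-1]
        if name not in LOCAL_DTDS:
            return systemId  # unknown entity: fall back to default handling
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(LOCAL_DTDS[name]))
        return source


def make_local_parser():
    """Build a SAX parser that asks LocalDtdResolver for external entities."""
    parser = xml.sax.make_parser()
    # External entity handling may be off by default, so enable it explicitly.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(LocalDtdResolver())
    return parser
```

Because the lookup uses only the last path segment, the same resolver
would also catch a 3c-style URL and serve the local copy instead of going
to the network.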

> 4) Should there be an actual DTD file located at that URL?
> We need to be careful with this, because we do not want
> to encourage crappy xml tools that do not use an entity
> resolver. Instead, such tools are wasting bandwidth by
> retrieving the DTD from xml.apache.org every time that
> the XML instance is parsed. They should resolve the
> request to a local copy.
>
> 5) We need to encourage any project that uses Forrest
> to provide proper document type declarations in their
> XML instance documents. Lead by example is best.

yes indeed

I still am quite unclear on what the difference will be between the CVS
structure and the 'unit of deployment', but am lacking some time to
properly investigate.

The main idea behind committing Centipede as-is was to give us a flying
start: it should be easier now to edit and remove, instead of endless
tinkering on the mailing list. So everybody, feel free to attack what we
have in CVS now ;-)

</Steven>

