cocoon-users mailing list archives

From David Crossley <>
Subject powerful catalog entity resolver could easily be added
Date Mon, 30 Jul 2001 05:20:36 GMT
(Sorry to approach the page limit ... it is worth explaining.)

To our local Cocoon-2.0b2 I have added a CatalogEntityResolver
onto the entityResolver hook that is provided by the Xerces
parser. Cocoon can now utilise the power of OASIS Catalogs
or XML Catalogs. These provide a standards-based mechanism to
resolve Public Identifiers and System Identifiers to local
filenames or other identifiers or even to remote network
resources. So references to external DTDs, sets of character
entities such as mathematical symbols, fragments of XML
documents, complete sub-documents, non-xml data chunks (like
images), etc. can all be centrally managed and resolved locally.

The XML documents that we want to serve with Apache Cocoon
already exist in another information system. The XML document
instances declare their Document Type Definition (DTD) as an
external file. This external DTD also includes entity sets
such as ISOnum, ISOlat1, etc. The DTD declaration has both a
Formal Public Identifier and a System Identifier, and the
System Identifier points to a remote URL. These XML instances
cannot be changed.

Whether you have validation=yes or not, the parser will still
want to resolve all of the entities that are required by the
XML instance. So it will happily go across the network to get
them. It will do this every time that the document is
processed. This is obviously a needless overhead. Additionally,
if your Cocoon is an off-line server then it is broken because
it cannot retrieve the network-based resources.
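
To illustrate the point above, here is a minimal Java sketch (not
taken from Cocoon) that parses a document with validation switched
off and records every identifier the parser asks to have resolved.
The class name, the example.org URL and the public identifier are
made up for the illustration. It assumes a JAXP parser with default
settings, such as Xerces, which loads the external DTD even when it
is not validating.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Cocoon.
class ResolveWithoutValidation {

    /** Parses xml with validation off; returns the system ids the parser asked for. */
    static List<String> requestedSystemIds(String xml) throws Exception {
        List<String> requested = new ArrayList<>();

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // validation is explicitly off

        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());
        reader.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            // Hand back an empty stand-in so that nothing is fetched
            // from the network during this demonstration.
            return new InputSource(new StringReader(""));
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        // Made-up identifiers, for illustration only.
        String xml = "<!DOCTYPE doc PUBLIC \"-//EXAMPLE//DTD Doc//EN\" "
                   + "\"http://example.org/dtd/doc.dtd\"><doc/>";
        System.out.println(requestedSystemIds(xml));
    }
}
```

Without a resolver installed, each of the recorded identifiers
would be a network fetch on every parse.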

As far as I know, the sitemap cannot be used to specify the
location of these resources, because this resolution of the
external entities is under control of the guts of the parser
and the XML model.

The following article eloquently describes the need for all
parsers and XML frameworks to be capable of utilising entity
resolvers. Very few do that yet (SP/nsgmls and XMetaL) while
others have hooks that are not utilised. Arbortext has made its
Java code available in the public domain.

If You Can Name It, You Can Claim It!
by Norman Walsh

There are also some other links which extol entity management:

The Walsh document made it very easy to hook up the entity
resolver. A handful of lines were added to the code in
components/parser/ to load the catalogs and to
set the entity resolver. Excellent, Cocoon is now automatically
using the local entities and there is a speedup in processing.
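
As a rough sketch of the general idea only: this is not the code
that was added to Cocoon, and it is not the catalog classes from
the Walsh article. It is a plain SAX EntityResolver that maps known
identifiers to local copies, with a hard-coded map standing in for
a real catalog file. The class name MapEntityResolver, its mapPublic
and mapSystem methods, and all identifiers and paths are
hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical stand-in for a catalog-backed resolver.
class MapEntityResolver implements EntityResolver {
    private final Map<String, String> publicIds = new HashMap<>();
    private final Map<String, String> systemIds = new HashMap<>();

    /** Maps a public identifier to a local URI. */
    MapEntityResolver mapPublic(String publicId, String localUri) {
        publicIds.put(publicId, localUri);
        return this;
    }

    /** Maps a system identifier to a local URI. */
    MapEntityResolver mapSystem(String systemId, String localUri) {
        systemIds.put(systemId, localUri);
        return this;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Try the public identifier first, then the system identifier.
        String local = (publicId != null) ? publicIds.get(publicId) : null;
        if (local == null && systemId != null) {
            local = systemIds.get(systemId);
        }
        if (local == null) {
            return null; // unknown: the parser falls back to its default behaviour
        }
        InputSource source = new InputSource(local);
        source.setPublicId(publicId);
        return source;
    }

    public static void main(String[] args) {
        // Made-up identifiers and path, for illustration only.
        MapEntityResolver resolver = new MapEntityResolver()
            .mapPublic("-//EXAMPLE//DTD Doc//EN", "file:///opt/entities/doc.dtd");
        InputSource source = resolver.resolveEntity(
            "-//EXAMPLE//DTD Doc//EN", "http://example.org/dtd/doc.dtd");
        System.out.println(source.getSystemId());
    }
}
```

Such a resolver would be installed on the parser with
XMLReader.setEntityResolver(). A real catalog resolver would load
its mappings from catalog files, so that the administrator can
maintain them without touching any code.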

I believe that this capability should be added to the core
Apache Cocoon. The code changes can be supplied, if appropriate.

regards, David Crossley

> Date: Fri, 23 Feb 2001 14:03:43 +1100
> From: David Crossley <>
> To:
> Back in July/August 2000 there was a small discussion
> on this list about XML Public Identifier resolution.
> There is a very good article by Norm Walsh explaining the
> importance of using an SGML Open Catalog (OASIS Catalog)
> for resolving Public Identifiers to local file copies
> of the relevant DTDs and other entities.
> This document also provides access to Java classes for
> implementing catalogs for entity management. The Cocoon
> administrator at each server should be able to add new
> entries to their XML catalog.
> regards, David Crossley
> ---------------------------------------
> > Date: Sat, 05 Aug 2000 20:58:50 +0200
> > Subject: [Cocoon Devel] Re: DTD PUBLIC ID resolution
> > From: Stefano Mazzocchi
> >
> > Hans Ulrich Niedermann wrote:
> > >
> > > I'd like to have a mechanism that maps some known PUBLIC IDs from the
> > > <!DOCTYPE> declaration to the corresponding local URIs (similar to
> > > SGML catalog files). This would allow one to write XML files with the
> > > "canonical" URI for the used DTDs and still use a local copy for
> > > validation and default value gathering, which increases both
> > > reliability and speed.
> > >
> > > Do you think such a mechanism makes sense?
> >
> > Sure it does, it's called "catalog" and it goes back to the old SGML
> > days.
> >
> > > Has anybody seen such a thing implemented yet?
> >
> > I'm pretty sure all good parsers implement one (I know Xerces does)
> >
> > > Where could/should such a thing be hooked into the C2 processing chain?
> >
> > If we use Xerces, we can use their API and provide the catalog
> > ourselves.... or use directly SAX EntityResolver...hmmmm, probably
> > better using SAX anyway...
> >
> > > Where and how should the configuration, i.e. the mapping from PUBLIC
> > > to SYSTEM be stored?
> >
> > Good question. I haven't thought about it (yet). Should the sitemap
> > contain the semantics to describe schema catalogs as well?
> > 
> > > I don't mean to distract you from important things but perhaps we
> > > should think about it before every API and config file spec is set
> > > into stone.
> >
> > Totally. Thanks for bringing this up.
> >
> > > I'd be willing to contribute some code if/when I can figure out how
> > > the C2 internals really are supposed to work.
> >
> > Same here :)
> >
> > Anyway, I'll dive into C2 very soon, expect tons of
> > "what-the-hell-is-this?" :)
> >
> > --
> > Stefano Mazzocchi      One must still have chaos in oneself to be
> >                           able to give birth to a dancing star.
> > <>                             Friedrich Nietzsche
> > --------------------------------------------------------------------
