From Earl Hood <e...@earlhood.com>
Subject Re: resolver should be able to parse catalog files without needing to resolve external entities?
Date Sat, 24 Oct 2009 17:22:50 GMT
On October 23, 2009 at 17:19, someone wrote:

> Here's an example of a catalog.xml file distributed in the Debian and
> Ubuntu w3c-dtd-xhtml package,
> http://www.sfu.ca/~jdbates/tmp/debian/200910230/catalog.xml
> It starts with,
> <?xml version='1.0'?>
> <!DOCTYPE catalog PUBLIC "-//GlobalTransCorp//DTD XML Catalogs V1.0-Based
> Extension V1.0//EN"
>     "http://globaltranscorp.org/oasis/catalog/xml/tr9401.dtd">
> [...]

I think they should fix it so the system identifier is set
to a pathname on the local file system.

Also, the public identifier used is not the standard public
identifier, "-//OASIS//DTD XML Catalogs V1.1//EN".  So even
if the resolver provided intrinsic recognition of
the "-//OASIS//DTD XML Catalogs V1.1//EN" identifier, it
would still be of no use in this case.

One can argue that the w3c-dtd-xhtml package has a bug in
its distribution, since it provides no facility to resolve
the DTD to the local file system.  The system identifier
should be set to the pathname where the w3c-dtd-xhtml
installer places the catalog DTD.

> I understand comment #4,
> https://bugs.launchpad.net/ubuntu/+source/w3c-dtd-xhtml/+bug/400259/comments/4
> - to be suggesting that org.apache.xml.resolver is not following the
> encouragement of,
> http://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html#s.bootstrap
> "Implementations are encouraged to provide some sort of bootstrapping
> functionality to resolve external identifiers and URIs that the
> implementation needs to load catalog entry files."

It is not a requirement:

  Conformant processors are not required to be able to perform
  resolution of those identifiers through the XML Catalog.

The word "should" is used in other text instead of "must".  Also,
the following is stated:

  Users can avoid any problems that might arise by limiting the
  external identifiers and URIs used to those that do not need
  resolution. Note that this only applies to external identifiers and
  URIs that must be resolved in order to load the catalog entry file.

> - and to be suggesting that not following this encouragement is a bug
> Is maybe my understanding wrong - or either of these suggestions wrong?

The recommendations of the OASIS document are beneficial, but
they are only recommendations, not requirements.  So the "bug"
reports are really enhancement requests.

IMO, the work-around for the problem is easy, and is directly
suggested by the OASIS document: Use system identifiers that
are resolvable without needing a catalog.

I think the underlying technical reason the resolver library
does not provide intrinsic resolution of the catalog DTD is that
the library does not know where the DTD may be installed on any
given system that uses the resolver.  Since other software systems
include the resolver in their distribution, the DTD itself may not
even be available.

A possible method of always knowing how to find the catalog DTD is
for the resolver to include the DTD in the resolver.jar file itself.
The resolver could register a custom (internal) resolver to the XML
parser when reading catalog files so any references to the DTD can
be resolved via a classpath resource lookup.  I'm not sure it
is worth the effort to do this when simple work-arounds exist for
the problem.
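
For anyone who wants to try it, here is a minimal sketch of what such
an internal resolver might look like.  It is not code from the resolver
library: the class name BundledDtdResolver and the resource path passed
to it are hypothetical, and it assumes the DTD has been packaged as a
classpath resource.  It relies only on the standard SAX EntityResolver
interface.

```java
import java.io.InputStream;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch: serves one bundled DTD from the classpath when
 * the parser asks for a known public identifier.
 */
final class BundledDtdResolver implements EntityResolver {
    private final String publicId;      // identifier to recognize
    private final String resourcePath;  // assumed location of the DTD on the classpath
    private final ClassLoader loader;   // loader used for the resource lookup

    BundledDtdResolver(String publicId, String resourcePath, ClassLoader loader) {
        this.publicId = publicId;
        this.resourcePath = resourcePath;
        this.loader = loader;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Not the identifier we know about: returning null tells the
        // parser to fall back to its default handling of the system ID.
        if (!this.publicId.equals(publicId)) {
            return null;
        }
        InputStream in = loader.getResourceAsStream(resourcePath);
        if (in == null) {
            // DTD is not bundled after all; again defer to the parser.
            return null;
        }
        InputSource source = new InputSource(in);
        source.setPublicId(publicId);
        // Keep the original system ID so relative references still have a base.
        source.setSystemId(systemId);
        return source;
    }
}
```

An instance would be handed to the parser that reads catalog files,
for example through XMLReader.setEntityResolver() or
DocumentBuilder.setEntityResolver(), before parsing starts.  Note that
this only helps for the public identifier it is constructed with, so a
catalog using a non-standard identifier like the one above would still
fall through to the system identifier.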

I'm sure patches are welcome if anyone wants to implement this.


