xml-commons-dev mailing list archives

From Craeg K Strong <cstr...@arielpartners.com>
Subject Re: Question/enhancement request for Norman Walsh's resolver 1.1 library
Date Tue, 07 May 2002 03:49:17 GMT
Norman Walsh wrote:
> / Craeg K Strong <cstrong@arielpartners.com> was heard to say:
> |  > Does anyone know how I might go about getting in touch with the
> |  > xml-commons maintainers/owners?   Thanks!
> That would be me. I have some refactoring in mind too. Please let me know
> what you have in mind.
>                                         Be seeing you,
>                                           norm

Indeed.  My original charter was to integrate the xml-commons resolver
facilities into the new ant 1.5 <xmlcatalog> datatype.

This has, however, become a larger and more ambitious effort than I first thought...

Bare-bones Ant currently supports PUBLIC and URI entries.   Of course,
it uses its own syntax unrelated to any of the standards :-(
Here is an example:

    <dtd
       publicId = "-//ArielPartners//DTD XML Article V1.0//EN"
       location = "com/arielpartners/knowledgebase/dtd/article.dtd"/>
    <entity
        publicId = "LogoImage"
        location = "com/arielpartners/images/ariel-logo-large.gif"/>

In standard-speak, <dtd> roughly corresponds to PUBLIC.
<entity> roughly corresponds to URI.

They implemented a lookup scheme that does the following:

a) consider BASE to be the ant file basedir property (this is a Project global)
b) look up an entry in the filesystem, prepending the basedir if necessary
     (the location attribute may specify a relative or absolute pathname)
c) failing that, look up an entry in the classpath using a ClassLoader,
    WITHOUT prepending the basedir.  The most famous example where this
    is needed is j2ee.jar: if you load it up in an emacs buffer,
    you will see many DTD files in there!
d) failing that, look up an entry in URL space by attempting to construct
    a URL object from the location attribute.
e) failing that, give up.
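
The lookup order in (a) through (e) can be sketched in Java roughly as
follows.  This is an illustration only, not Ant's actual code: the class
name CatalogLookupSketch, the method lookup() and its parameters are made
up for the example, with baseDir standing in for the project basedir.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical helper showing the filesystem -> classpath -> URL order. */
class CatalogLookupSketch {

    /**
     * Returns a URL for the location, or null if every step fails.
     * baseDir plays the role of the Ant basedir (step a).
     */
    static URL lookup(String location, File baseDir, ClassLoader loader) {
        // (b) filesystem: prepend basedir only for relative locations
        File file = new File(location);
        if (!file.isAbsolute()) {
            file = new File(baseDir, location);
        }
        if (file.isFile()) {
            try {
                return file.toURI().toURL();
            } catch (MalformedURLException e) {
                // fall through to the classpath step
            }
        }

        // (c) classpath: use the location as written, WITHOUT basedir
        URL fromClasspath = loader.getResource(location);
        if (fromClasspath != null) {
            return fromClasspath;
        }

        // (d) URL space: succeeds only if the location is an absolute URL
        try {
            return new URI(location).toURL();
        } catch (URISyntaxException | MalformedURLException
                 | IllegalArgumentException e) {
            // (e) give up
            return null;
        }
    }
}
```

Note that step (c) only works because the location is still in the form
it was written in; once it has been made absolute there is nothing left
that a ClassLoader could match.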

NOTE:  step (c) requires that we maintain entries _sans BASE_.
This is a problem if we are going to use the resolver library, since
CatalogEntries are automatically made absolute by the following code,
which starts at line 864 of Catalog.java, in addEntry():

    } else if (type == PUBLIC) {
       String publicid = PublicId.normalize(entry.getEntryArg(0));
       String systemid = makeAbsolute(normalizeURI(entry.getEntryArg(1)));

       entry.setEntryArg(0, publicid);
       entry.setEntryArg(1, systemid);

       Debug.message(4, "PUBLIC", publicid, systemid);



OK, if you've read this far.. :-) here's what I would like to do.

1) if resolver.jar is not found on the ant classpath, change NOTHING.
<dtd> and <entity> sub-elements are supported as before.  Additional
sub-elements will be ignored with a warning sent to the BuildLogger.
This is essential, b/c 90% of ant users probably do not need resolvers.
In this way, we avoid classpath bloat :-)
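
A minimal sketch of how the "is resolver.jar there?" check might look,
assuming a plain Class.forName probe is good enough.  ResolverProbe and
isAvailable() are hypothetical names; only the CatalogResolver class name
comes from the resolver library itself.

```java
/** Hypothetical probe: can this class loader see the resolver library? */
class ResolverProbe {

    // Class shipped in resolver.jar; used here only as a marker.
    static final String RESOLVER_CLASS =
        "org.apache.xml.resolver.tools.CatalogResolver";

    static boolean isAvailable(String className, ClassLoader loader) {
        try {
            // initialize=false: check presence without running static init
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // not on the classpath (or not loadable): keep the old behaviour
            return false;
        }
    }
}
```

If isAvailable(ResolverProbe.RESOLVER_CLASS, loader) returns false, the
datatype would simply carry on with <dtd> and <entity> as before.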

2) If resolver.jar IS found on the classpath, add support for an additional
<catalogfiles> sub-element.  <catalogfiles> specifies a FileSet which
is a set of external catalogs, in either XML or original format.

    <catalogfiles dir="/my/catalogs" includes="**/catalog, **/catalog.xml"/>
    <dtd
       publicId = "-//ArielPartners//DTD XML Article V1.0//EN"
       location = "com/arielpartners/knowledgebase/dtd/article.dtd"/>

This recursively includes all files called "catalog" or "catalog.xml" in the
filesystem starting at /my/catalogs.
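
For readers who do not have the FileSet pattern rules in their head, the
selection above amounts to something like the following.  This is plain
java.nio rather than Ant's FileSet machinery, and CatalogFileFinder is a
made-up name for the illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for the "**/catalog, **/catalog.xml" includes. */
class CatalogFileFinder {

    /** True if the last path element is "catalog" or "catalog.xml". */
    static boolean isCatalogFile(Path p) {
        Path name = p.getFileName();
        if (name == null) {
            return false;
        }
        String n = name.toString();
        return n.equals("catalog") || n.equals("catalog.xml");
    }

    /** Walks root recursively and collects the matching regular files. */
    static List<Path> find(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(CatalogFileFinder::isCatalogFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```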

3) Here is the tricky part.   Add support for classpath lookup for those
entries WITHIN EXTERNAL CATALOGS.    This requires several changes:

a) I need to override org.apache.xml.resolver.Catalog.addEntry() to
somehow grab the uri attribute _before_ the BASE gets prepended to make
it absolute.  One way to do this is to have it call back to
org.apache.tools.ant.types.XMLCatalog and add it to my Vector of entries.
Another way would be to refactor org.apache.xml.resolver.Catalog to
separately maintain URI and BASE entries and append them only when needed.

b) I need to add a hook to org.apache.xml.resolver.tools.CatalogResolver
to attempt to resolve resources in the classpath before giving up.
The classpath resolution should happen AFTER the local filesystem lookup
but BEFORE looking up in URL space.   In order to do this
I had to override both:
public InputSource resolveEntity (String publicId, String systemId)
public Source resolve(String href, String base)

If instead these methods' lookup algorithms were factored out into
separate calls for each step a la the Gang of Four "Template Method" pattern,
I could override one or another, or substitute my own algorithm skeleton
which called most of your methods.  This would result in way less
code duplication :-)
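
To make (a) and (b) a bit more concrete, here is a rough sketch of the
shape I have in mind.  None of this is existing resolver code:
DeferredEntry and StepwiseResolver are hypothetical names, and the real
classes would of course work with InputSource and Source rather than
plain strings.

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical entry: location as written, with BASE kept apart (a). */
final class DeferredEntry {
    private final String publicId;
    private final String rawLocation; // exactly as it appears in the catalog
    private final URI base;           // catalog BASE, not yet applied

    DeferredEntry(String publicId, String rawLocation, URI base) {
        this.publicId = publicId;
        this.rawLocation = rawLocation;
        this.base = base;
    }

    String publicId() { return publicId; }

    /** For classpath lookup: the location sans BASE. */
    String rawLocation() { return rawLocation; }

    /** For filesystem/URL lookup: BASE is applied only when asked for. */
    URI absoluteLocation() { return base.resolve(rawLocation); }
}

/** Hypothetical resolver skeleton in Template Method style (b). */
abstract class StepwiseResolver {

    /** The skeleton: filesystem, then classpath, then URL space. */
    Optional<String> resolve(DeferredEntry entry) {
        Optional<String> result = fromFileSystem(entry);
        if (!result.isPresent()) {
            result = fromClasspath(entry);
        }
        if (!result.isPresent()) {
            result = fromUrlSpace(entry);
        }
        return result;
    }

    protected abstract Optional<String> fromFileSystem(DeferredEntry entry);

    /** Hook: does nothing by default; a subclass may add classpath lookup. */
    protected Optional<String> fromClasspath(DeferredEntry entry) {
        return Optional.empty();
    }

    protected abstract Optional<String> fromUrlSpace(DeferredEntry entry);
}
```

With something like this, the Ant side would only need to override
fromClasspath() and could reuse the other steps untouched.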


 >whew!<     Since I am already knee deep in this stuff, I would
be happy to help out in any way I can -- submitting patches or whatnot.

Your Thoughts?

