xml-commons-dev mailing list archives

From Norman Walsh <...@nwalsh.com>
Subject Re: Question/enhancement request for Norman Walsh's resolver 1.1 library
Date Tue, 14 May 2002 14:01:38 GMT
/ Craeg K Strong <cstrong@arielpartners.com> was heard to say:
| <xmlcatalog>
|      <dtd
|        publicId = "-//ArielPartners//DTD XML Article V1.0//EN"
|        location = "com/arielpartners/knowledgebase/dtd/article.dtd"/>
|      <entity
|         publicId = "LogoImage"
|         location = "com/arielpartners/images/ariel-logo-large.gif"/>
| </xmlcatalog>
|
| In standard-speak, <dtd> roughly corresponds to PUBLIC.
| <entity> roughly corresponds to URI.

Well, I'd be inclined to say that entity corresponds to "system"
rather than "uri" since entities are identified by external
identifiers in XML. But it should really be the combination of system
and public identifiers that's used, not just public.

| They implemented a lookup scheme that does the following:
|
| a) consider BASE to be the ant file basedir property (this is a Project global)
| b) lookup an entry in the filesystem, prepending the basedir if necessary
|      (the location attribute may specify a relative or absolute pathname)
| c) failing that, lookup an entry in the classpath using a ClassLoader--
|     WITHOUT prepending the basedir.  The most famous example where this
|     is needed is j2ee.jar -- if you load it up in an emacs buffer,
|     you will see many DTD files in there!
| d) failing that, lookup an entry in URL space by attempting to construct
|     a URL object from the location attribute.
| e) failing that, give up.
|
| NOTE:  step (c) requires that we maintain entries _sans BASE_.
| This is a problem if we are going to use the resolver library, since
| CatalogEntries are automatically made absolute according to this
| code snippet starting at line 864 of Catalog.java.addEntry()

Right, but that's the semantic of catalogs; relative URIs in a catalog
are relative to the catalog file location. It would be wrong to change
that in the catalog code.
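
A minimal sketch of that rule, using only java.net.URI. The class name
and the example paths are made up for illustration and are not taken
from Catalog.java:

```java
import java.net.URI;

final class CatalogBaseDemo {
    /**
     * Resolves an entry's URI against the URI of the catalog file that
     * contains it. Absolute entry URIs come back unchanged.
     */
    static URI resolveEntry(URI catalogFile, String entryUri) {
        return catalogFile.resolve(entryUri);
    }

    public static void main(String[] args) {
        // Hypothetical catalog location, for illustration only.
        URI catalog = URI.create("file:///my/catalogs/catalog.xml");
        System.out.println(resolveEntry(catalog, "dtd/article.dtd"));
        System.out.println(resolveEntry(catalog, "http://example.org/article.dtd"));
    }
}
```

The point is only that the base is the catalog file, not the Ant
basedir, so moving the catalog moves everything it refers to by
relative URI.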

It looks to me like you need to insert catalog resolution after step
d. And then the fact that the catalog URIs are absolute shouldn't
cause any backwards incompatibility.
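
Roughly, the order I have in mind could be sketched like this. All
names here are hypothetical, and the catalog step is left as a
pluggable function rather than a real call into the resolver library:

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class LookupOrderSketch {
    /** Step (b): filesystem lookup, prepending basedir to relative paths. */
    static Optional<String> fromFilesystem(File basedir, String location) {
        File f = new File(location);
        if (!f.isAbsolute()) {
            f = new File(basedir, location);
        }
        return f.exists() ? Optional.of(f.toURI().toString()) : Optional.empty();
    }

    /** Step (c): classpath lookup, WITHOUT prepending basedir. */
    static Optional<String> fromClasspath(ClassLoader loader, String location) {
        URL url = loader.getResource(location);
        return url == null ? Optional.empty() : Optional.of(url.toString());
    }

    /** Step (d): succeed only if the location parses as a URL. */
    static Optional<String> fromUrl(String location) {
        try {
            return Optional.of(new URL(location).toString());
        } catch (MalformedURLException e) {
            return Optional.empty();
        }
    }

    /** Runs the steps in order and returns the first hit, if any. */
    static Optional<String> firstMatch(String location,
            List<Function<String, Optional<String>>> steps) {
        for (Function<String, Optional<String>> step : steps) {
            Optional<String> hit = step.apply(location);
            if (hit.isPresent()) {
                return hit;
            }
        }
        return Optional.empty();
    }
}
```

The caller would pass the filesystem, classpath, and URL steps first
and the catalog lookup last, so existing <dtd> and <entity> entries
behave as they do today.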

| OK, if you've read this far.. :-) here's what I would like to do.
|
| 1) if resolver.jar is not found on the ant classpath, change NOTHING.
| <dtd> and <entity> sub-elements are supported as before.  Additional
| sub-elements will be ignored with a warning sent to the BuildLogger.
| This is essential, b/c 90% of ant users probably do not need resolvers.
| In this way, we avoid classpath bloat :-)

Ok.
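
One way to make that check, as a sketch: probe for a resolver class
without initializing it. The helper class and method names are
hypothetical; org.apache.xml.resolver.tools.CatalogResolver is the
class you mention below.

```java
final class ResolverDetection {
    /** Returns true if the named class can be loaded by the given loader. */
    static boolean isClassAvailable(String className, ClassLoader loader) {
        try {
            // 'false' means: load the class but do not run static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ResolverDetection.class.getClassLoader();
        boolean haveResolver = isClassAvailable(
                "org.apache.xml.resolver.tools.CatalogResolver", loader);
        System.out.println("resolver.jar on classpath: " + haveResolver);
    }
}
```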

| 2) If resolver.jar IS found on the classpath, add support for an additional
| <catalogfiles> sub-element.  <catalogfiles> specifies a FileSet which
| is a set of external catalogs, in either XML or original format.
| Example:
|
| <xmlcatalog>
|     <catalogfiles dir="/my/catalogs" includes="**/catalog, **/catalog.xml"/>
|     <dtd
|        publicId = "-//ArielPartners//DTD XML Article V1.0//EN"
|        location = "com/arielpartners/knowledgebase/dtd/article.dtd"/>
| </xmlcatalog>
|
| This recursively includes all files called "catalog" or "catalog.xml" in the
| filesystem starting at /my/catalogs.

Fine. But why have you put the <dtd> entry inside the <xmlcatalog> entry?
It seems to me that those are unrelated.

| 3) Here is the tricky part.   Add support for classpath lookup for those
| entries WITHIN EXTERNAL CATALOGS.    This requires several changes and
| refactorings:
|
| a) I need to override org.apache.xml.resolver.Catalog.addEntry() to
| somehow grab the uri attribute _before_ the BASE gets prepended to make
| it absolute.  One way to do this is to have it call back to
| org.apache.ant.types.XMLCatalog and add it to my Vector of entries.
| Another way would be to refactor org.apache.xml.resolver.Catalog to
| separately maintain URI and BASE entries and append them only when needed.

Well, like I said before, my inclination is to leave these absolute.
If you end up doing catalog resolution, you should use catalog
semantics.

| b) I need to add a hook to org.apache.xml.resolver.tools.CatalogResolver
| to attempt to resolve resources in the classpath before giving up.

I'm confused. What resolution order are you thinking of? I would have
thought you'd look in the classpath first, before calling the catalog
resolver.

| The classpath resolution should happen AFTER the local filesystem lookup
| but BEFORE looking up in URL space.   In order to do this
| I had to override both:
| public InputSource resolveEntity (String publicId, String systemId)
| and
| public Source resolve(String href, String base)
|
| If instead these methods' lookup algorithms were factored out into
| separate calls for each step a la the Gang of Four "Template Method" pattern,
| I could override one or another, or substitute my own algorithm skeleton
| which called most of your methods.  This would result in way less
| code duplication :-)

Yes, that would probably be ok. But I'd like to understand what you
have in mind a little better first.
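
To check that I'm reading the proposal correctly, here is a sketch of
what I take the factoring to be. The class and method names are
invented for illustration; this is not the current CatalogResolver
code:

```java
class StepwiseResolver {
    /**
     * The algorithm skeleton: try each step in order and stop at the
     * first one that returns a non-null result.
     */
    public String resolve(String location) {
        String result = lookupInFilesystem(location);
        if (result == null) {
            result = lookupInClasspath(location);
        }
        if (result == null) {
            result = lookupAsUrl(location);
        }
        return result;
    }

    // Each step is a separate protected method, so a subclass can
    // override one step without copying the whole of resolve().
    // In this sketch the defaults find nothing.
    protected String lookupInFilesystem(String location) {
        return null;
    }

    protected String lookupInClasspath(String location) {
        return null;
    }

    protected String lookupAsUrl(String location) {
        return null;
    }
}
```

A subclass in Ant would then override only lookupInClasspath(), or
replace resolve() itself to change the order of the steps.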

                                        Be seeing you,
                                          norm

-- 
Norman.Walsh@Sun.COM   | Happiness is a how, not a what; a talent, not
XML Standards Engineer | an object.--Hermann Hesse
XML Technology Center  | 
Sun Microsystems, Inc. | 
