commons-dev mailing list archives

From Craig McClanahan
Subject Re: [digester] can't resolve relative entities ?
Date Mon, 29 Mar 2004 04:17:04 GMT
robert burrell donkin wrote:

> hi paul
> On 27 Mar 2004, at 00:31, Paul Libbrecht wrote:
>> Dear Digester-Gurus...
>> After spending a lot of time trying to determine whether a buggy
>> dom4j is responsible for the entity resolution errors in maven
>> project parsing, I finally realized that Digester may be the cause.
>> We start with a guess: Digester.parse(File) is weird (around lines
>> 1527...): it doesn't store the reference to the file at all, but
>> still offers itself as the EntityResolver. How can it resolve an
>> entity if it doesn't know the path?
> in many ways, digester builds a more user-friendly interface on top of 
> SAX. the usual philosophy is to offer easy, out-of-the-box support for 
> the most common use cases and then offer access to SAX for those who 
> need more sophisticated solutions.
> entity resolution is a good example of this. digester offers simple 
> support for common use cases by offering itself as the default entity 
> resolver. digester maintains a simple map of publicIDs to URLs and a
> method for users (and digester) to register them.
> though this is better than the default adopted by most parsers, this 
> approach has many limitations. the standard advice for users who need 
> more sophisticated support is to register a separate EntityResolver. (the
> business of creating and maintaining DTD catalog programs is best left 
> to specialist components.)
>> The problem shows up clearly while building the jelly taglibs: the
>> project.xml of each taglib extends ../taglib-project.xml, which
>> itself should reference ../commonDeps.ent by means of the DTD
>> internal subset.
>> As this is buggy, the current jelly CVS contains a copy of 
>> commonDeps.ent.
> i've taken a fresh look at the specs and i think that buggy is 
> probably too strong a word. '../taglib-project.xml' is not a URI and
> so parsers can legitimately refuse to resolve it but most common 
> parsers interpret this as a path relative to the file (i think, please 
> correct me if i'm wrong).
> i've taken a look at the digester source and it's probable that
> digester does not allow parsers to apply this feature since it will
> always interpret a system id as a URI and then rely on java to find
> it. it should be possible to alter the entity resolution code so that
> (when the URI is a relative file url) the java relative path is tested
> first and null returned if the file does not exist, allowing the parser
> to use its default resolution code which (i think) should find the
> file relative to the file path.
> does this sound like it would fix the jelly problems?
> and can anyone else see any problems with this approach?
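
A rough sketch of the fallback proposed above, not Digester's actual code: look up registered public IDs first, then accept a file: system id only if that file exists, and otherwise return null so that the parser applies its own default resolution (returning null is the documented SAX EntityResolver contract). The class name FallbackEntityResolver and its register method are hypothetical.

```java
import java.io.File;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver illustrating the "try, then defer to the parser" idea.
class FallbackEntityResolver implements EntityResolver {

    // Public ID -> URL of a known copy of the entity.
    private final Map<String, String> registered = new HashMap<>();

    void register(String publicId, String url) {
        registered.put(publicId, url);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // 1. Registered public IDs win.
        String url = (publicId == null) ? null : registered.get(publicId);
        if (url != null) {
            return new InputSource(url);
        }
        // 2. A file: system id is used only if the file really exists.
        if (systemId != null && systemId.startsWith("file:")) {
            try {
                File file = new File(URI.create(systemId));
                if (file.isFile()) {
                    return new InputSource(file.toURI().toString());
                }
            } catch (IllegalArgumentException e) {
                // Not a usable hierarchical file URI; fall through.
            }
        }
        // 3. Returning null asks the parser to resolve the entity itself.
        return null;
    }
}
```

Such a resolver would be installed with XMLReader.setEntityResolver() on the underlying parser.
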
One important ingredient in using relative references for entity 
resolution is to use the appropriate Digester.parse() method.  If you 
use the one that takes an InputStream, as an example, there is no way 
for the SAX parser or Digester to know what the absolute URL of that 
resource is, and therefore no way to resolve relative references.  On 
the other hand, if you use the entry point that takes a URL, or a 
(properly formatted) InputSource, then you are providing enough 
information for the parser to resolve relative references without doing 
anything else at all.

As an example of this approach, this is a (slightly simplified) version 
of the logic that Struts uses to construct an InputSource for parsing 
struts-config.xml files:

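    String path = "/WEB-INF/struts-config.xml";
    InputStream stream = getServletContext().getResourceAsStream(path);
    URL url = getServletContext().getResource(path);
    // The absolute URL becomes the system id used to resolve relative references
    InputSource source = new InputSource(url.toExternalForm());
    // Supply the document bytes from the stream opened above
    source.setByteStream(stream);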

In this way, relative entity references in the struts-config.xml 
document get resolved to other XML documents in the "/WEB-INF" directory
of my webapp, with no extra muss or fuss.  The same approach will work 
for non-webapp based applications as well, as long as you always 
configure the InputSource with an absolute URL.
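
For the non-webapp case, the same idea can be sketched with plain SAX (no Digester), assuming a modern JDK whose default parser loads external entities. The class name RelativeEntityDemo, its helper methods, and the file names project.xml and common.ent are made up for illustration; the only point is that the InputSource carries the file's absolute URL as its system id.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class RelativeEntityDemo {

    // Collects character data so we can see whether the entity was expanded.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Parses the file through an InputSource whose system id is an absolute URL.
    static String parseWithSystemId(File xml) throws Exception {
        InputSource source = new InputSource(xml.toURI().toURL().toExternalForm());
        try (InputStream in = new FileInputStream(xml)) {
            source.setByteStream(in);
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TextCollector handler = new TextCollector();
            reader.setContentHandler(handler);
            reader.parse(source);
            return handler.text.toString().trim();
        }
    }

    // Writes a document and a sibling entity file into a fresh temp directory.
    static File writeSampleFiles() throws Exception {
        Path dir = Files.createTempDirectory("entity-demo");
        Files.write(dir.resolve("common.ent"), "shared text".getBytes(StandardCharsets.UTF_8));
        String doc = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE project [ <!ENTITY common SYSTEM \"common.ent\"> ]>\n"
                + "<project>&common;</project>\n";
        Path xml = dir.resolve("project.xml");
        Files.write(xml, doc.getBytes(StandardCharsets.UTF_8));
        return xml.toFile();
    }

    public static void main(String[] args) throws Exception {
        // The relative reference "common.ent" is resolved against project.xml's URL.
        System.out.println(parseWithSystemId(writeSampleFiles()));
    }
}
```

A source built this way should be usable with Digester.parse(InputSource) as well, since it is the same SAX InputSource type.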

> - robert
