commons-dev mailing list archives

From robert burrell donkin <robertburrelldon...@blueyonder.co.uk>
Subject Re: [digester] can't resolve relative entities ?
Date Mon, 29 Mar 2004 19:52:44 GMT
On 29 Mar 2004, at 18:52, Craig McClanahan wrote:
> Paul Libbrecht wrote:
>
>> I think Digester.parse(java.io.File) should do it for me, or?
>> (this method does build an input-source with correct URL, btw)
>> There's even, in the maven code, efforts towards making this an 
>> absolute path.
>>
> In theory it should ... but if it doesn't, you can easily construct a 
> URL for a file and use the technique I described.
>
>> But the problem remains: if you look at the code of Digester.java, 
>> there's nothing that keeps the URL of the file! And the call to the 
>> method configure() is without any parameter!
>>
> But that's a feature, not a bug :-).  No code in Digester is 
> necessary, because it's all handled by the SAX parser underneath.
>
>> I do think, contrary to what Robert claims, that XML-compliance 
>> requires relative-system-id-entities to be resolved completely as 
>> long as we have a URL.
>>
> Correct relative entity resolution also requires users to correctly 
> utilize what the JAXP APIs provide.  If you don't provide an absolute 
> URL for the document being parsed, relative URL references will fail.  
> If you do provide an absolute URL, entity references will work in a 
> manner totally transparent to Digester, because this is a feature 
> built in to the underlying SAX based parser.
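To illustrate Craig's point, here is a minimal sketch that uses plain JAXP rather than Digester itself. It hands the parser an absolute file: URL as the document's system ID, so a relative reference such as '../whatever.dtd' gets resolved against the document's location. The class name, the temp-directory layout and the sample DTD are all hypothetical, made up for the demo.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AbsoluteSystemIdDemo {

    /** Parses xmlFile, handing the parser an absolute file: URL as its system ID. */
    public static void parseWithAbsoluteUrl(File xmlFile) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // toURI() gives e.g. file:/tmp/entity-demo123/conf/doc.xml; the parser uses
        // this as the base when it meets the relative reference "../whatever.dtd".
        InputSource source = new InputSource(xmlFile.toURI().toString());
        parser.parse(source, new DefaultHandler());
    }

    /** Writes TMP/whatever.dtd and TMP/conf/doc.xml, which refers to ../whatever.dtd. */
    public static File createSample() throws IOException {
        Path root = Files.createTempDirectory("entity-demo");
        Files.write(root.resolve("whatever.dtd"),
                "<!ELEMENT root EMPTY>".getBytes(StandardCharsets.UTF_8));
        Path doc = Files.createDirectory(root.resolve("conf")).resolve("doc.xml");
        Files.write(doc, "<!DOCTYPE root SYSTEM \"../whatever.dtd\"><root/>"
                .getBytes(StandardCharsets.UTF_8));
        return doc.toFile();
    }

    public static void main(String[] args) throws Exception {
        // If the DTD could not be located, parse() would typically throw an exception.
        parseWithAbsoluteUrl(createSample());
        System.out.println("parsed; ../whatever.dtd was resolved against the document");
    }
}
```

Digester never sees the URL here. The resolution happens entirely inside the SAX parser, which is exactly what Craig describes.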

'../whatever.dtd' is not a url. XML parsers can therefore reject it 
and still be specification compliant. (the url should be something like 
'file:../whatever.dtd'.) digester makes an attempt to resolve the url 
in the standard java way, which is more than the xml specification 
requires in this case.
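A quick way to see that '../whatever.dtd' is not a url on its own terms is to ask java.net.URL, which refuses any string without a protocol. The sketch below also prints whether java.net.URI sees a scheme. The class and method names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class RelativeReferenceCheck {

    /** True if java.net.URL accepts the string by itself, i.e. it names a protocol. */
    public static boolean isStandaloneUrl(String systemId) {
        try {
            new URL(systemId); // throws MalformedURLException when there is no protocol
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] ids = {"../whatever.dtd", "file:../whatever.dtd", "file:/tmp/whatever.dtd"};
        for (String id : ids) {
            // URI.isAbsolute() is true exactly when the URI has a scheme component.
            System.out.println(id + " -> URL? " + isStandaloneUrl(id)
                    + ", has scheme? " + URI.create(id).isAbsolute());
        }
    }
}
```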

but paul has highlighted an area where the digester could be improved: 
the resolution of relative file urls. digester resolves these using 
the standard java system, which can (in many common situations) 
conflict with the system outlined in the xml specification, where a 
relative system identifier is resolved relative to the document in 
which it is declared.
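The conflict can be sketched by resolving the same relative reference against two different bases. This assumes that the "standard java system" means resolving against the JVM's current working directory, which is an assumption, not a description of Digester's code. The class name and the example paths are hypothetical.

```java
import java.net.URI;
import java.nio.file.Paths;

public class RelativeResolution {

    /** Resolves a relative system ID against a base URI; URI.resolve also normalizes "..". */
    public static URI resolve(URI base, String relativeSystemId) {
        return base.resolve(relativeSystemId);
    }

    public static void main(String[] args) {
        String relative = "../whatever.dtd";
        // XML-specification style: the base is the document containing the reference.
        URI document = URI.create("file:/home/paul/project/conf/doc.xml");
        // Working-directory style (assumed): the base is wherever the JVM was started.
        URI workingDir = Paths.get("").toAbsolutePath().toUri();
        System.out.println("relative to document:    " + resolve(document, relative));
        System.out.println("relative to working dir: " + resolve(workingDir, relative));
    }
}
```

With the example document the first line comes out as file:/home/paul/project/whatever.dtd. The second depends on where the program happens to be run, which is exactly the source of the surprise.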

the SAX specification says 'If the system identifier is a URL, the SAX 
parser must resolve it fully before reporting it to the application.' 
from a search, the exact meaning of this phrase seems to be in doubt. 
i'd hope that '../whatever.dtd' would be passed to the EntityResolver 
as an absolute file URL, but this behaviour may well not be present 
in many common parsers.

but SAX does give an option that digester doesn't really exploit at the 
moment: returning null from resolveEntity. this tells the SAX parser to 
open the system identifier itself, in its standard way. i'd say that 
this should definitely be an option in this particular circumstance.
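Both points, what a given parser actually reports and what returning null does, can be checked with a small sketch using plain JAXP (no Digester). The handler records the system ID it is offered and returns null, so the parser opens the DTD itself. The class names and the temp-file layout are hypothetical. The recorded value depends on the parser in use, so no particular output is implied.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NullResolverDemo {

    /** Handler whose resolveEntity records the system ID and then returns null. */
    static class RecordingHandler extends DefaultHandler {
        String reportedSystemId;

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            reportedSystemId = systemId; // whatever form this parser chose to report
            return null;                 // null: the parser opens the system ID itself
        }
    }

    /** Parses xmlFile and returns the system ID the resolver was given for the DTD. */
    public static String parseAndReport(File xmlFile) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordingHandler handler = new RecordingHandler();
        parser.parse(new InputSource(xmlFile.toURI().toString()), handler);
        return handler.reportedSystemId;
    }

    /** Writes TMP/whatever.dtd and TMP/conf/doc.xml, which refers to ../whatever.dtd. */
    public static File createSample() throws IOException {
        Path root = Files.createTempDirectory("null-resolver");
        Files.write(root.resolve("whatever.dtd"),
                "<!ELEMENT root EMPTY>".getBytes(StandardCharsets.UTF_8));
        Path doc = Files.createDirectory(root.resolve("conf")).resolve("doc.xml");
        Files.write(doc, "<!DOCTYPE root SYSTEM \"../whatever.dtd\"><root/>"
                .getBytes(StandardCharsets.UTF_8));
        return doc.toFile();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("resolver was given: " + parseAndReport(createSample()));
    }
}
```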

i'd suggest adding a test for bad URLs (probably something like those 
that don't contain a ':' at or after zero-based position 2). for bad 
urls, digester would try to resolve them using the standard java 
process. if this fails, digester would return null, leaving the parser 
to cope with the problem.
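A minimal sketch of that proposal follows. It is a standalone EntityResolver, not Digester's code. It assumes that the "standard java process" means treating the string as a file path relative to the working directory. It also guesses that starting the ':' search at index 2 is meant to let Windows drive letters such as 'C:\dtds\x.dtd' count as bad. What digester does today for good URLs is not reproduced; the sketch simply returns null for them. The class name is hypothetical.

```java
import java.io.File;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class FallbackEntityResolver implements EntityResolver {

    /** Heuristic from the proposal: a "bad" URL has no ':' at or after index 2. */
    public static boolean isBadUrl(String systemId) {
        // indexOf returns -1 (never throws) when fromIndex is past the end of the string.
        return systemId.indexOf(':', 2) < 0;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null || !isBadUrl(systemId)) {
            // Good URL: existing handling would apply here; this sketch defers to the parser.
            return null;
        }
        // Bad URL: try the assumed "standard java" way, a path relative to the working dir.
        File candidate = new File(systemId);
        if (candidate.isFile()) {
            return new InputSource(candidate.toURI().toString());
        }
        // Could not resolve it ourselves: return null and leave it to the parser.
        return null;
    }
}
```

Returning null in the last branch is the key design choice. A parser that resolves relative system IDs against the document will still succeed, and one that does not will report its own error.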

i think that this should ensure that, in situations where a good URL is 
specified (such as the cases craig outlined earlier), digester works 
as at present. where the URL is not so well specified, this change 
should give the behaviour users expect: that of the parser they are 
using.

- robert


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

