commons-user mailing list archives

From Simon Kitching
Subject Re: [Digester] UnknownHostException when file contains SYSTEM dtd
Date Fri, 22 Jul 2005 00:12:51 GMT
On Thu, 2005-07-21 at 14:51 -0500, Mike Miller wrote:
> After struggling with a Digester problem all day, I finally found a posting that was
> about 1 ½ years old that helped solve my problem, but I would like to see if someone can
> provide an explanation so that I can learn and understand what was happening.
>
> The problem: I have several files that I am processing with the Digester. The XML
> files and the DTD reside in the same directory within my web application. The first couple
> of lines of one of the files are shown below, using only a SYSTEM id.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE root SYSTEM "ReportType.dtd">
> <root>
> ...
>
> When calling digester.parse() using a File object, the call results in an
> "UnknownHostException c", where c is the Windows drive where my files are located.
> Apparently the systemId was generated as
> file://c/mydir/conf/creport/reporttypes/ReportType.dtd
> and the c is interpreted as a machine/host name.
>
> Changing the code to call digester.parse() with the String parameter providing the full
> path of the file works.
>
> Looking at the Digester code, I guess this may be more of a SAX question, because I can
> see where the parse() methods convert their input into an InputSource, but why does the
> parse() version that takes a File call setSystemId()?

The reason that setSystemId is called is so that resources referenced
from the xml document (esp. the DTD file) are looked up relative to the
original file parsed. You say in your example above that the xml file
and the dtd are in "the same directory" but if we never tell the xml
parser where the xml was read from, how's it going to find the dtd? By
default, if you pass an InputSource (which just wraps a stream) to the
parser without specifying the systemid, then any relative references to
DTDs etc. are just looked up relative to the current working directory
of the application - the parser can't possibly deduce the real original
source of a stream.
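
To make that concrete, here is a minimal sketch of the idea. It is not Digester's actual code: the class SystemIdSketch, its two methods, and the path conf/report.xml are made up for illustration. It wraps a stream in an InputSource, records the file's location as the systemId, and then resolves a relative SYSTEM value against it using plain URI resolution, which is roughly what a parser does.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of Digester or SAX.
class SystemIdSketch {

    // Wrap a stream in an InputSource and record where the document came from.
    static InputSource sourceFor(File xmlFile, InputStream in) {
        InputSource source = new InputSource(in);
        // File.toURI() yields an absolute file: URI for the document.
        source.setSystemId(xmlFile.toURI().toString());
        return source;
    }

    // Resolve a relative SYSTEM value against the document's systemId.
    static String resolve(InputSource source, String systemValue) {
        return URI.create(source.getSystemId()).resolve(systemValue).toString();
    }

    public static void main(String[] args) {
        File xml = new File("conf/report.xml"); // hypothetical path
        InputStream in = new ByteArrayInputStream(
                "<root/>".getBytes(StandardCharsets.UTF_8));
        InputSource source = sourceFor(xml, in);
        System.out.println(source.getSystemId());
        // The DTD ends up in the same directory as the XML file.
        System.out.println(resolve(source, "ReportType.dtd"));
    }
}
```

Without the setSystemId call, getSystemId() returns null and there is no base location to resolve "ReportType.dtd" against.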

Error messages generated by the parser also include the systemId of the
document: if this isn't set then the error messages can be less than
helpful.

Note that InputSource.setSystemId has nothing to do with the SYSTEM value
in the xml document, other than that it sets a base location that is used
for lookups if the SYSTEM value is a relative path.

The UnknownHostException "c" stuff isn't something that has been
reported before as far as I know. I am somewhat surprised you are seeing
this, as I would have thought what you are doing would be common and
therefore other people would have encountered this.
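
For what it's worth, the "c" in the exception is consistent with how a URI of the reported shape is parsed: whatever follows "file://" up to the next slash is the host part. The sketch below only uses java.net.URI to show that; the class name FileUriHostSketch is made up, and the second URI (with an empty host and the drive letter in the path) is my assumption of what a well-formed systemId for that file would look like, not something taken from Digester.

```java
import java.net.URI;

// Hypothetical demo class; it only inspects how URIs are split into parts.
class FileUriHostSketch {

    // Host component of the URI, or null if the URI has none.
    static String hostOf(String uri) {
        return URI.create(uri).getHost();
    }

    // Path component of the URI.
    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        // The systemId from the report: "c" lands in the host position.
        String reported = "file://c/mydir/conf/creport/reporttypes/ReportType.dtd";
        // Assumed well-formed variant: empty host, drive letter inside the path.
        String expected = "file:///c:/mydir/conf/creport/reporttypes/ReportType.dtd";

        System.out.println(hostOf(reported) + " " + pathOf(reported));
        System.out.println(hostOf(expected) + " " + pathOf(expected));
    }
}
```

In the first form the host is "c", which would explain a lookup for a machine of that name; in the second the host is null and the path starts with /c:/.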

I don't use MS-Windows so can't help you with debugging but if you can
provide a patch I'll check and commit it.

BTW, which version of Digester are you using? Bugzilla#28739 has been
fixed in 1.7 which might be related to your issue.


