commons-user mailing list archives

From: "Iantosca, Jonathan"
Subject: RE: Digester & doctype declaration
Date: Mon, 29 Dec 2003 15:23:48 GMT

Thanks for all the info...greatly appreciated!

I ended up invoking Digester.parse with a String, and everything works fine.

Thanks again,

-----Original Message-----
From: Simon Kitching
Sent: Sunday, December 28, 2003 5:28 PM
To: Jakarta Commons Users List
Subject: Re: Digester & doctype declaration

On Mon, 2003-12-29 at 08:10, Paul Libbrecht wrote:
> On 28-Dec-03, at 19:21, Craig R. McClanahan wrote:
> > Quoting "Iantosca, Jonathan":
> >
> >> Hello,
> >>
> >> I'm trying to digest an xml file with the following doctype
> >> declaration.
> >>
> >> <!DOCTYPE adaptor SYSTEM "woadaptor.dtd">
> >>
> >> I keep getting an error when this declaration is in the xml
> >> document. As soon as I remove it, the digester has no problems.
> >> Also, before parsing, I'm calling the Digester's setValidating
> >> method, passing in false.
> >>
> >> Any Thoughts?
> >>
> >> -Jon
> >>
> >
> > In order for the XML parser to be able to resolve this relative URL
> > ("woadaptor.dtd"), it has to know the URL of the document (that
> > contains this line) that you are actually parsing.  In turn, that
> > means you need to use one of the Digester.parse() methods that
> > provides this information -- either a File, an InputSource, or a
> > String.  Don't use the one that takes an InputStream.
> >
> > Craig
> Dare I add to this that turning validation off does not mean the
> DTD will not be loaded. DTDs provide, among other things, default
> values for attributes, and hence need to be read on every parse.
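
To make the two points above concrete, here is a rough sketch using the plain JDK SAX parser rather than Digester itself (I have not run it against Digester). It feeds the parser a byte stream but also sets a system ID on the InputSource, so the relative "woadaptor.dtd" can be resolved, and it shows a DTD default attribute arriving even though validation is off. The class name SystemIdDemo, the method rootMode and the "mode" attribute are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Digester.
final class SystemIdDemo {

    /**
     * Parses xml from a byte stream, using systemId as the base URL for
     * relative references, and returns the root element's "mode" attribute
     * (null if the attribute is absent).
     */
    static String rootMode(String xml, String systemId) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // same idea as Digester#setValidating(false)
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final String[] mode = new String[1];
        final boolean[] seenRoot = new boolean[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (!seenRoot[0]) {          // only look at the first element
                    seenRoot[0] = true;
                    mode[0] = atts.getValue("mode");
                }
            }
        });

        InputSource source = new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Without this the parser has no document URL to resolve
        // "woadaptor.dtd" against.
        source.setSystemId(systemId);
        reader.parse(source);
        return mode[0];
    }
}
```

If woadaptor.dtd sits next to the document and declares a default for "mode", rootMode returns that default even for a bare <adaptor/> element, which is why the DTD still gets read with validation switched off.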

Hey, might as well add my $0.02 worth too :-)

If the XML declaration specifies standalone="yes" then that tells
the xml parser that there *aren't* any default values or other stuff in
the external DTD that would affect parsing, so in that case the xml
parser is allowed to skip loading the DTD. I'm not sure if xerces skips
DTDs when standalone is set...

Alternatively, some parsers (xerces at least) have a parser-specific
feature to prevent loading of any external files (see documentation on
custom parser features). The Digester#setFeature method passes its
parameters down to the underlying parser, or you can create & configure
the parser instance yourself rather than let Digester create one. Of
course you'd need to be sure what concrete parser was going to be used
in order to take advantage of this.
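
As a rough sketch of the feature approach, assuming the parser in use is Xerces (or the Xerces-derived parser bundled with the JDK), something like the following should work. The feature URI is the Xerces "load-external-dtd" feature; other parsers may reject it with a SAXNotRecognizedException. The class and method names are made up for illustration, and the example uses the plain JAXP API rather than Digester, though the same feature string could presumably be handed to Digester#setFeature.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Digester.
final class SkipExternalDtd {

    // Xerces-specific feature id; only honoured when validation is off.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses xml without fetching its external DTD; returns the root element name. */
    static String rootName(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        factory.setFeature(LOAD_EXTERNAL_DTD, false); // throws if unsupported
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first element
                }
            }
        });

        // No system ID is supplied, yet the missing DTD is never looked up.
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }
}
```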

And finally you can write your own EntityResolver to customise how DTDs
(and other external entities referenced from an XML document) are
located during parsing. See Digester#setEntityResolver, or create &
configure the parser instance yourself. This is probably the most
portable/flexible way to handle doctype declarations in your input
files. A quick and ugly hack is to always return an empty stream when
asked to locate a DTD. Of course if the DTD *does* declare default
values for attributes, etc., then the result of parsing won't be


