commons-user mailing list archives

From Simon Kitching <si...@ecnetwork.co.nz>
Subject Re: [Digester] How do I get Digester to ignore the <!DOCTYPE> tag
Date Wed, 14 Jul 2004 23:52:27 GMT
On Thu, 2004-07-15 at 09:26, Craig McClanahan wrote:
> Paolo Valladolid wrote:
> 
> >I need to use Digester to parse XML that has been retrieved from a
> >database.  The XML I'm working with was received from elsewhere (i.e. not
> >created by our team).  How do I get Digester to ignore the <!DOCTYPE>
> >tag?  I've tried setValidating( false ) and it did not work.
>
> The setValidating(false) call does indeed tell Digester to not validate 
> the XML data.  However, it does *not* tell the underlying XML parser to 
> skip the DOCTYPE, and there is no API in JAXP to say that sort of thing.
> 
> If your problem is unresolved entities, one thing you can do is to 
> provide your own EntityResolver whose resolveEntity() method 
> always returns null.  That way, the parser won't go traipsing around the 
> network trying to find things that it can't.

Hi Paolo,

I'm presuming the problem is that you have a DOCTYPE like this:
 <!DOCTYPE mydoc SYSTEM "http://www.acme.com/mydtd.dtd">
and want to suppress loading of the referenced document, or have a DTD
which declares <!ENTITY ....> and want to suppress loading of the
entity. 

In other words, you don't want to ignore the DOCTYPE, you want to
suppress loading of external entities.

Craig's suggestion of writing an EntityResolver will work, but he has
made a minor mistake: if you return *null* from the entity resolver
class, then the parser will apply its normal resolving rules, including
retrieving the entity (e.g. the DTD) from the specified URL.

This is explicitly stated in the javadoc for the
org.xml.sax.EntityResolver class.

In order to ignore remote entities, you can instead get your
EntityResolver to return an InputSource that wraps an empty InputStream.
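
A minimal sketch of such a resolver, using plain SAX/JAXP so it runs
without Digester on the classpath. The class name EmptyEntityResolver,
the countElements helper and the http://example.invalid/ URL are made
up for illustration. How you install the resolver on a Digester
instance depends on your Digester version, so check its javadoc.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Resolves every external entity (including the external DTD) to an empty document. */
public class EmptyEntityResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Returning null would make the parser fetch systemId itself;
        // an InputSource over zero bytes makes the entity empty instead.
        return new InputSource(new ByteArrayInputStream(new byte[0]));
    }

    /** Parses xml with this resolver installed and returns the number of elements seen. */
    public static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(new EmptyEntityResolver());
        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE doc SYSTEM \"http://example.invalid/mydtd.dtd\">\n"
                + "<doc><item/></doc>";
        System.out.println(countElements(xml)); // prints 2, without touching the network
    }
}
```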

Note, however, that this can change the *meaning* of your XML document.
For example, if the DTD declares a default value for an attribute, then
ignoring the DTD means the attribute will not get its expected value.
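
To see the effect on attribute defaults, here is a small self-contained
sketch: a made-up DTD gives <item> a default "color" attribute, and the
same document is parsed once with that DTD supplied and once with an
empty one. The DtdDefaultsDemo class, the DTD and the URL are all
hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DtdDefaultsDemo {
    // Hypothetical DTD that gives <item> a default "color" attribute.
    static final String DTD = "<!ELEMENT doc (item)*>\n"
            + "<!ELEMENT item EMPTY>\n"
            + "<!ATTLIST item color CDATA \"red\">\n";

    static final String XML = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE doc SYSTEM \"http://example.invalid/mydtd.dtd\">\n"
            + "<doc><item/></doc>";

    /** Parses XML with dtdText standing in for the external DTD; returns item's color or null. */
    static String itemColor(final String dtdText) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Whatever the DOCTYPE points at, hand the parser dtdText instead.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader(dtdText)));
        final String[] color = {null};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("item") && color[0] == null) {
                    color[0] = atts.getValue("color");
                }
            }
        });
        reader.parse(new InputSource(new StringReader(XML)));
        return color[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(itemColor(DTD)); // prints red: default supplied by the DTD
        System.out.println(itemColor(""));  // prints null: DTD ignored, default lost
    }
}
```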

In general, it is better to keep a local copy of the DTD and use an
EntityResolver to return that local copy rather than an empty stream.
Still, if you *know* that the DTD doesn't contain this sort of thing,
returning an InputSource that wraps an empty stream will work fine.
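
A sketch of the local-copy approach, assuming you have saved the DTD to
a file: a resolver that maps known system IDs to local files and falls
back to an empty stream for anything unregistered. LocalDtdResolver,
its register method and attributeOf are hypothetical names, and the
temporary file in main just stands in for your saved copy. (I believe
Digester also has its own register(publicId, entityURL) method for
mapping public IDs to local copies; check the javadoc for your version.)

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Serves known DTDs from local files; anything else resolves to an empty stream. */
public class LocalDtdResolver implements EntityResolver {
    private final Map<String, Path> localCopies = new HashMap<>();

    /** Maps a remote system identifier to a local file holding a copy of that DTD. */
    public LocalDtdResolver register(String systemId, Path localFile) {
        localCopies.put(systemId, localFile);
        return this;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        Path local = localCopies.get(systemId);
        if (local != null) {
            InputSource source = new InputSource(Files.newInputStream(local));
            // Lets relative references inside the DTD resolve against the local copy.
            source.setSystemId(local.toUri().toString());
            return source;
        }
        // Unregistered entity: empty stream instead of letting the parser fetch it.
        return new InputSource(new ByteArrayInputStream(new byte[0]));
    }

    /** Parses xml with the given resolver; returns attr of the first element named elem. */
    public static String attributeOf(String xml, EntityResolver resolver, String elem, String attr)
            throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        final String[] value = {null};
        final boolean[] seen = {false};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (!seen[0] && qName.equals(elem)) {
                    seen[0] = true;
                    value[0] = atts.getValue(attr);
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return value[0];
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a DTD you downloaded once and keep alongside the application.
        Path dtd = Files.createTempFile("mydtd", ".dtd");
        dtd.toFile().deleteOnExit();
        Files.write(dtd, ("<!ELEMENT doc (item)*>\n<!ELEMENT item EMPTY>\n"
                + "<!ATTLIST item color CDATA \"red\">\n").getBytes(StandardCharsets.UTF_8));
        LocalDtdResolver resolver = new LocalDtdResolver()
                .register("http://example.invalid/mydtd.dtd", dtd);
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE doc SYSTEM \"http://example.invalid/mydtd.dtd\">\n"
                + "<doc><item/></doc>";
        System.out.println(attributeOf(xml, resolver, "item", "color")); // prints red
    }
}
```

Falling back to an empty stream for unknown IDs keeps the parser off the
network; returning null there instead would restore the parser's
default fetching behaviour.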

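If you happen to know that the underlying XML parser is Xerces, you
can instead use the setFeature method to turn off loading of the
external DTD (the feature is
http://apache.org/xml/features/nonvalidating/load-external-dtd, and it
is only honoured when validation is off). However, this is
parser-specific; see the Xerces documentation on "features" for more
info.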
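
A sketch of the Xerces route, using the JDK's bundled parser (which is
derived from Xerces and recognises the same feature name).
XercesSkipExternalDtd and countElements are made-up names. On a parser
that doesn't recognise the feature, setFeature throws rather than
silently ignoring it. To use such a parser with Digester you would need
to hand Digester the pre-configured parser; I believe Digester 1.x has a
Digester(SAXParser) constructor for this, but check your version.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class XercesSkipExternalDtd {
    // Xerces-specific feature name; other parsers may reject it.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Returns the number of elements in xml, parsed without reading the external DTD. */
    public static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);                 // the feature is ignored when validating
        factory.setFeature(LOAD_EXTERNAL_DTD, false); // throws if the parser doesn't know it
        XMLReader reader = factory.newSAXParser().getXMLReader();
        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE doc SYSTEM \"http://example.invalid/mydtd.dtd\">\n"
                + "<doc><item/></doc>";
        System.out.println(countElements(xml)); // prints 2; the DTD URL is never fetched
    }
}
```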

By the way, this has nothing to do with Digester itself; it applies to
JAXP parsing in general. So you may be better off asking on a list
about XML parsing and JAXP.

Regards,

Simon


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

