commons-dev mailing list archives

From Apache Wiki <>
Subject [Jakarta-commons Wiki] Update of "Digester/FAQ" by SimonKitching
Date Thu, 16 Jun 2005 06:28:34 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jakarta-commons Wiki" for change notification.

The following page has been changed by SimonKitching:

The comment on the change is:
Added info on entity resolvers

    digester.addCallParam("map/entry", 1, true);
+ === Why does Digester read the DTD even when validation is disabled? ===
+ A DTD can affect the meaning of a document, so an XML parser still needs to read it even when validation is disabled.
+ Note that this is a fundamental feature of XML parsing and has nothing to do with Digester.
+ For example, the DTD may define default values for XML attributes:
+ {{{
+   <!ATTLIST some-element some-attribute CDATA "some-default-value">
+ }}}
+ When the DTD is present, and the user specifies
+ {{{
+   <some-element/>
+ }}}
+ the XML parser will report that the element has an attribute "some-attribute" with value "some-default-value".
+ But if the DTD is ignored (not read) then the element would be reported as having no attributes.
+ The DTD can also define entities that can be referenced from the document. Without the DTD, references to those entities cannot be resolved.
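The attribute-default behaviour described above can be checked with the JDK's own parser, without Digester. The following is a minimal sketch: the class name {{{DtdDefaultsDemo}}} and its helper are illustrative only, and the DTD is placed in the internal subset so that the example needs no external file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative class, not part of Digester.
class DtdDefaultsDemo {
    // Same ATTLIST as in the FAQ text, carried in the internal DTD subset.
    static final String WITH_DTD =
        "<!DOCTYPE some-element [\n"
        + "  <!ATTLIST some-element some-attribute CDATA \"some-default-value\">\n"
        + "]>\n"
        + "<some-element/>";

    static final String WITHOUT_DTD = "<some-element/>";

    // Parses the XML with validation disabled and returns the value of
    // "some-attribute" on the root element ("" if the attribute is absent).
    static String rootAttribute(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // already the default; stated for clarity
        DocumentBuilder builder = factory.newDocumentBuilder();
        Element root = builder.parse(new InputSource(new StringReader(xml)))
                              .getDocumentElement();
        return root.getAttribute("some-attribute");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("with DTD:    [" + rootAttribute(WITH_DTD) + "]");
        System.out.println("without DTD: [" + rootAttribute(WITHOUT_DTD) + "]");
    }
}
```

Even though validation is off, the first document should report the defaulted attribute, while the second reports none.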
+ === How can I use a local version of a DTD referenced from an xml document? ===
+ When an XML document contains {{{<!DOCTYPE rootelement PUBLIC "xxxx" "yyyy">}}} (where xxxx is the PUBLIC identifier and yyyy is the SYSTEM identifier), the XML parser used by Digester will try to load yyyy in order to process the DTD. As noted in the previous FAQ entry, this occurs even when validation is disabled.
+ The SYSTEM identifier is not portable. Usually it is a reference to a local file that is only useful on the machine the document was created on. Even when it is an http reference, it is unwise for the receiver to download the specified file each time the document needs to be parsed (if it is accessible at all).
+ The PUBLIC identifier is portable: it is essentially a key used to look up the real location of the corresponding resource. An application receiving a document from a remote source is expected to register local copies of the relevant documents by public id, so that the lookup returns a local copy. This solves the problem of passing XML documents between host machines.
+ To support this, the method Digester.register(String publicId, String entityURL) can be used to specify which local file (or http url) should be read instead. Digester acts as an !EntityResolver for the XML parser it creates, and uses the registered mappings in the !EntityResolver.resolveEntity method. This mapping applies to all "external entities" referenced by the XML document being parsed, not just the DTD (though XML documents don't typically use external entities other than the DTD).
+ Note that this method takes a ''PUBLIC'' id only, not a ''SYSTEM'' id. A document that is meant to be used across machines which omits the PUBLIC identifier is broken.
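The lookup-by-public-id mechanism can be illustrated with the JDK alone. The sketch below is not Digester's implementation: {{{PublicIdResolver}}}, its {{{register}}} method, the public id {{{-//Example//DTD Demo//EN}}} and the {{{example.invalid}}} URL are all made up for the example. It only mimics the idea behind Digester.register, namely an !EntityResolver that maps a public id to a local URL and otherwise leaves resolution to the parser.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver that looks up external entities by PUBLIC id.
class PublicIdResolver implements EntityResolver {
    private final Map<String, String> urlsByPublicId = new HashMap<>();

    // Same shape as the Digester.register(publicId, entityURL) call described above.
    void register(String publicId, String entityUrl) {
        urlsByPublicId.put(publicId, entityUrl);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String localUrl = (publicId == null) ? null : urlsByPublicId.get(publicId);
        if (localUrl == null) {
            return null; // not registered: the parser falls back to the SYSTEM id
        }
        InputSource source = new InputSource(localUrl);
        source.setPublicId(publicId);
        return source;
    }

    // Writes a DTD to a temp file, registers it, and parses a document whose
    // SYSTEM id is deliberately unreachable. Returns the defaulted attribute.
    static String demo() throws Exception {
        Path dtd = Files.createTempFile("some-element", ".dtd");
        try {
            Files.write(dtd,
                "<!ATTLIST some-element some-attribute CDATA \"some-default-value\">"
                    .getBytes(StandardCharsets.UTF_8));

            PublicIdResolver resolver = new PublicIdResolver();
            resolver.register("-//Example//DTD Demo//EN", dtd.toUri().toString());

            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver);

            String xml =
                "<!DOCTYPE some-element PUBLIC \"-//Example//DTD Demo//EN\" "
                + "\"http://example.invalid/some-element.dtd\">\n"
                + "<some-element/>";
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getAttribute("some-attribute");
        } finally {
            Files.deleteIfExists(dtd);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Because the public id is registered, the parser reads the local copy and never touches the SYSTEM URL.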
+ If you do have to deal with a broken XML document that has only a SYSTEM id and no PUBLIC id, then you will need to create an !EntityResolver and pass it to the Digester.setEntityResolver method. If you really do want to ignore the DTD, you can write your own !EntityResolver class in about 10 lines; it just needs to return an empty stream. Before doing this, however, re-read the FAQ entry describing why the DTD is read even when validation is disabled.
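A resolver that returns an empty stream might look like the sketch below. The class name {{{IgnoreDtdDemo}}} and the {{{example.invalid}}} URL are illustrative. The demo attaches the resolver to a plain JDK {{{DocumentBuilder}}}; with Digester, the same object would be passed to Digester.setEntityResolver as described above. Note that this resolver returns empty content for ''every'' external entity, not only the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Illustrative class, not part of Digester.
class IgnoreDtdDemo {
    // Answers every external-entity request with an empty document, so the
    // parser never opens the SYSTEM id. Any attribute defaults or entities
    // the real DTD would have defined are lost.
    static final EntityResolver EMPTY_RESOLVER =
        (publicId, systemId) -> new InputSource(new StringReader(""));

    // Parses a document whose DTD location is unreachable and returns the
    // number of attributes reported on the root element.
    static int rootAttributeCount() throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(EMPTY_RESOLVER);

        String xml =
            "<!DOCTYPE some-element SYSTEM \"http://example.invalid/missing.dtd\">\n"
            + "<some-element/>";
        Element root = builder.parse(new InputSource(new StringReader(xml)))
                              .getDocumentElement();
        return root.getAttributes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("attributes on root: " + rootAttributeCount());
    }
}
```

The parse succeeds without any network or file access, but the element is reported with no attributes, which is exactly the trade-off the earlier FAQ entry warns about.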
+ An alternative to writing your own !EntityResolver is to use a real !EntityResolver such as the one available from:
+ More information on !EntityResolver behaviour can be found here:
