xml-general mailing list archives

From Jeff Turner <j...@socialchange.net.au>
Subject [ANN] DoctypeChanger: pre-parse DOCTYPE manipulation
Date Mon, 19 Nov 2001 09:29:57 GMT

I've written up a Java utility, called DoctypeChanger, that I hope could
be useful to many people who exchange XML documents. I've mentioned it
on jakarta-commons and briefly on general@xml, but I thought I should
introduce it properly:

Probably the first "XML interoperability" issue that many users
encounter is when they are on the receiving end of XML with a DOCTYPE
declaration. Assuming one wants to validate, there are a number of
situations in which your parser will barf:

 - You are offline, or otherwise cannot retrieve the specified DTD.
 - The DOCTYPE declaration's SYSTEM id may be relative to someone else's
   system ("./dtds/foo.dtd").
 - The DOCTYPE declaration's PUBLIC or SYSTEM id might be just plain wrong.
 - If the incoming XML doesn't have a DOCTYPE declaration, there is no
   way to force the parser to validate against a local DTD [1].

In short, an incoming DOCTYPE declaration is either "incorrect",
"inaccessible", "non-existent" or "correct".

By writing a custom SAX EntityResolver or using an entity catalog, one
can deal with "incorrect" and "inaccessible". The remaining case,
"non-existent", is, AFAIK, unsolvable with mainstream techniques.
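As a rough illustration of the resolver approach, here is a minimal
sketch using SAX's EntityResolver interface with a JAXP validating
parser. The class name LocalDtdResolver, the inlined DTD and the
"foo.dtd" match rule are assumptions made up for this example; a real
resolver would more likely map known PUBLIC or SYSTEM ids to DTD files
on local disk.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class LocalDtdResolver implements EntityResolver {
    // Hypothetical DTD, inlined so the sketch is self-contained.
    static final String LOCAL_DTD = "<!ELEMENT foo (#PCDATA)>";

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Assumed rule: anything that names foo.dtd gets the local copy,
        // wherever the sender's SYSTEM id claims it lives.
        if (systemId != null && systemId.endsWith("foo.dtd")) {
            InputSource src = new InputSource(new StringReader(LOCAL_DTD));
            src.setSystemId(systemId);
            return src;
        }
        return null; // let the parser resolve everything else as usual
    }

    static Document parseValidating(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(new LocalDtdResolver());
        // Turn validation problems into exceptions instead of console noise.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored here */ }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Note that the resolver is only consulted when the document actually
has a DOCTYPE declaration, which is why it cannot help with the
"non-existent" case.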

Simon St.Laurent wrote a Java stream filter to solve this, which
replaces or adds DOCTYPE declarations on the fly [1]. I have since
generalized and extended it, so that one can now add, modify, replace
and conditionally replace the DOCTYPE declaration (based on the old
value).
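To make the "add" case concrete, here is a deliberately naive sketch of
the idea. It is not DoctypeChanger's API and it is not a stream filter;
the class DoctypeSketch and the method addDoctypeIfMissing are
hypothetical names for illustration. It works on a whole String and
uses a plain substring test, so it would be fooled by "<!DOCTYPE"
appearing inside a comment or CDATA section.

```java
public class DoctypeSketch {
    /**
     * Returns xml with the given DOCTYPE declaration inserted if the
     * document does not already contain one.
     */
    static String addDoctypeIfMissing(String xml, String doctype) {
        if (xml.contains("<!DOCTYPE")) {
            return xml; // leave an existing declaration alone
        }
        int insertAt = 0;
        if (xml.startsWith("<?xml")) {
            // A DOCTYPE must come after the XML declaration, if any.
            int end = xml.indexOf("?>");
            if (end >= 0) {
                insertAt = end + 2;
            }
        }
        return xml.substring(0, insertAt) + doctype + xml.substring(insertAt);
    }
}
```

The result can then be handed to a validating parser, which will
validate against whatever DTD the inserted declaration names.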

The documentation (including background, rationale, examples) is
available at:


And the code can be downloaded here:


It's under the Mozilla Public License 1.1, for historical reasons (it's
APL-compatible, right?).

I hope people find this useful :) Feedback very welcome.


[1] http://www.simonstl.com/projects/doctypes/
