lucene-java-user mailing list archives

From Steven Rowe <>
Subject Re: Registering a local dtd file for use with Digester
Date Thu, 22 Feb 2007 17:38:03 GMT
Hi Mike,

> I have a collection of XML files that I would like to parse using Digester
> in order to index them for Lucene. A DTD file has been supplied for the XML
> files, but none of those files has a <!DOCTYPE ...> line associating them
> with the DTD file. Can the Digester's register function be used to tell it
> to use that DTD file for such things as entity resolution? If so, how do I
> do it? I don't understand how to specify a pathname for a local file in
> terms of a publicId and an entityURL. If register can't be used for this
> purpose, is there another way to do it? Thanks.

Your issue will almost certainly be better addressed in a Digester
forum - your problem has nothing to do with Lucene.

A hint: it looks like you can create a Digester instance with an
externally created SAX parser[1], on which you can set the entity
resolver to an extended DefaultHandler2[2] (Java 1.5) which overrides
the getExternalSubset() method (specified by the EntityResolver2
interface[3]) to return an InputSource containing your desired DTD.

Something like (warning - untested; stolen in part from the Digester
FAQ[1]):

  SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
  parser.getXMLReader().setEntityResolver(new DefaultHandler2() {
    @Override
    public InputSource getExternalSubset(String name, String baseURI) {
      return new InputSource(/* put your DTD here */);
    }
  });
  Digester digester = new Digester(parser);
  // add digester rules here
  digester.parse(/* put your input document here */);
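
To check the getExternalSubset() idea on its own, here is a minimal
sketch that leaves Digester out and uses only the SAX classes bundled
with the JDK. The class name ExternalSubsetDemo, the method parseText,
and the "greeting" entity are made up for illustration, and it assumes
the default JDK parser honors EntityResolver2:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class; not part of Digester or Lucene.
class ExternalSubsetDemo {

    // Parses an XML string that has no DOCTYPE, supplying the given DTD text
    // as its external subset, and returns the collected character data.
    static String parseText(String xml, final String dtd) throws Exception {
        final StringBuilder text = new StringBuilder();

        DefaultHandler2 handler = new DefaultHandler2() {
            // Asked for by the parser when the document declares no DOCTYPE.
            @Override
            public InputSource getExternalSubset(String name, String baseURI) {
                return new InputSource(new StringReader(dtd));
            }

            // Collect character data so entity expansion can be observed.
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        XMLReader reader = parser.getXMLReader();
        reader.setEntityResolver(handler);
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The document references an entity that only the DTD defines.
        String dtd = "<!ENTITY greeting \"hello from the DTD\">";
        String xml = "<doc>&greeting;</doc>";
        System.out.println(parseText(xml, dtd));
    }
}
```

For a DTD on disk, the InputSource could wrap a FileReader or a file:
URL instead of the StringReader. With Digester in the picture, the same
handler could presumably also be passed to digester.setEntityResolver()
in case the resolver set on the reader does not take effect; that part
is untested here.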

Hope it helps,

[1] Digester FAQ (instantiating Digester with an external SAX parser):

[2] DefaultHandler2 (enables external DTD resolution with no DOCTYPE in
the XML document):

[3] EntityResolver2 (implemented by DefaultHandler2):
