lucene-java-user mailing list archives

From Steven Rowe <sar...@syr.edu>
Subject Re: Registering a local dtd file for use with Digester
Date Thu, 22 Feb 2007 17:38:03 GMT
Hi Mike,

> I have a collection of XML files that I would like to parse using Digester
> in order to index them for Lucene. A DTD file has been supplied for the XML
> files, but none of those files has a <!DOCTYPE ...> line associating them
> with the DTD file. Can the Digester's register function be used to tell it
> to use that DTD file for such things as entity resolution? If so, how do I
> do it? I don't understand how to specify a pathname for a local file in
> terms of a publicId and an entityURL. If register can't be used for this
> purpose, is there another way to do it? Thanks.

Your question will almost certainly be better addressed on a Digester
forum; your problem has nothing to do with Lucene.

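A hint: it looks like you can create a Digester instance with an
externally created SAX parser[1] and set its entity resolver to an
extended DefaultHandler2[2] (Java 1.5) that overrides the
getExternalSubset() method (specified by the EntityResolver2
interface[3]) to return an InputSource containing your desired DTD.

Something like (warning: untested; stolen in part from the Digester
FAQ[1]):

  // imports: javax.xml.parsers.*, org.xml.sax.*, org.xml.sax.ext.*,
  //          java.io.IOException, org.apache.commons.digester.Digester
  SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
  Digester digester = new Digester(parser);
  // Register the resolver on the Digester, not on parser.getXMLReader():
  // Digester installs its own entity resolver on the reader when it
  // parses, replacing one set there directly.
  digester.setEntityResolver(new DefaultHandler2() {
    public InputSource getExternalSubset(String name, String baseURI)
        throws SAXException, IOException {
      // Called for documents that declare no external subset (no DOCTYPE)
      return new InputSource(/* put your DTD here, e.g. a file: URL */);
    }
  });
  // add digester rules here
  Object root = digester.parse(/* put your input document here */);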
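
To check the getExternalSubset() mechanism on its own before wiring in
Digester, here is a minimal sketch that uses only the JDK's built-in SAX
parser (no Digester). It assumes that parser honors EntityResolver2, which
the SAX feature http://xml.org/sax/features/use-entity-resolver2 enables by
default where supported. The class names ExternalSubsetDemo, DtdSupplier and
TextCollector and the sample DTD and document are made up for illustration.
The DTD is supplied from a string here; for a local file you would return an
InputSource built from the file's URL string instead (see the comment).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class ExternalSubsetDemo {

    /** Hypothetical resolver: supplies a DTD for documents that declare none. */
    static class DtdSupplier extends DefaultHandler2 {
        private final String dtd;

        DtdSupplier(String dtd) {
            this.dtd = dtd;
        }

        @Override
        public InputSource getExternalSubset(String name, String baseURI)
                throws SAXException, IOException {
            // name is the root element name; baseURI is the document's base URI.
            // For a local DTD file (hypothetical path) you might instead return:
            // new InputSource(new java.io.File("/path/to/your.dtd").toURI().toString())
            return new InputSource(new StringReader(dtd));
        }
    }

    /** Accumulates all character data, including text from expanded entities. */
    static class TextCollector extends DefaultHandler2 {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses xml (which has no DOCTYPE) using dtd as its external subset. */
    static String parse(String xml, String dtd) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TextCollector collector = new TextCollector();
            reader.setContentHandler(collector);
            reader.setEntityResolver(new DtdSupplier(dtd));
            reader.parse(new InputSource(new StringReader(xml)));
            return collector.text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The entity "product" is declared only in the supplied DTD.
        String dtd = "<!ENTITY product 'Lucene'>";
        String xml = "<doc>Indexing with &product;</doc>";
        System.out.println(parse(xml, dtd)); // prints "Indexing with Lucene"
    }
}
```

With Digester, the same DtdSupplier instance would be passed to
digester.setEntityResolver() rather than set on the XMLReader directly.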

Hope it helps,
Steve

[1] Digester FAQ (instantiating Digester with an external SAX parser):
<http://wiki.apache.org/jakarta-commons/Digester/FAQ#head-8ac8fa70e2db185845fadec56785cd53eab8d3f9>

[2] DefaultHandler2 (enables external DTD resolution with no DOCTYPE in
the XML document):
<http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/ext/DefaultHandler2.html>

[3] EntityResolver2 (implemented by DefaultHandler2):
<http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/ext/EntityResolver2.html>



