commons-dev mailing list archives

From Dave Martin <dmar...@altoweb.com>
Subject RE: [digester] forcing a specific DTD
Date Tue, 13 Nov 2001 18:46:27 GMT


> -----Original Message-----
> From: Craig R. McClanahan [mailto:craigmcc@apache.org]
> Sent: Tuesday, November 13, 2001 9:30 AM
> To: Jakarta Commons Developers List
> Subject: Re: [digester] forcing a specific DTD
> 
> 
> On Tue, 13 Nov 2001, Tal Dayan wrote:
> 
> > Date: Tue, 13 Nov 2001 09:04:43 -0800
> > From: Tal Dayan <tal@zapta.com>
> > Reply-To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>
> > To: commons-dev@jakarta.apache.org
> > Subject: [digester] forcing a specific DTD
> >
> >
> > We plan to use Digester for parsing XML-based configuration files and
> > would like to use a DTD to save some validation code. From the Digester
> > documentation, it seems that DTD-based validation works as follows: the
> > user specifies an arbitrary DTD, and digester.parse() makes sure the XML
> > document matches the user's specified DTD.
> >
> > We have two problems with this approach:
> >
> > 1. When we parse a file, we know what DTD it should conform to, so there
> > is no need to require the user to type it (e.g. when Tomcat reads
> > server.xml, the user does not care about the DTD; it should be up to
> > Tomcat to do the magic of validating it).
> >
> > 2. The user can specify an arbitrary DTD, but we want to validate against
> > a *specific* DTD. It does not help us much to know that the XML conforms
> > to some arbitrary DTD the user specified (this reminds me of an old joke:
> > a man is asked by the bank manager to identify himself; he pulls a
> > picture from his pocket and says 'that's me').
> >
> > Any ideas on how to address these issues?
> >
> > A good solution would be, for example, a digester.forceSpecificDTD(...)
> > method that forces a specific DTD for that instance of the parser. This
> > makes sense since the digester is configured anyway (via the set of
> > rules) to parse a specific type of XML document.
> >
> > Thanks,
> >
> > Tal
> >
> 
> This sounds like it might be an interesting idea, but I don't know how to
> implement it :-(.  Digester uses a SAX parser via the JAXP/1.1 APIs
> underneath the covers.  How do you tell the parser to use an arbitrary DTD
> instead of whatever is specified in the document being parsed?
> 
> Craig
> 
> 

If you assume that the XML document contains a DOCTYPE declaration, then
you could register an EntityResolver that simply returns an InputSource for
the 'forced' DTD. This won't work if the XML document has no DOCTYPE
declaration, however, since the EntityResolver will not get invoked in that
case.
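
A minimal sketch of that EntityResolver idea, assuming plain JAXP/SAX rather
than Digester itself. The ForcedDtdResolver and ForcedDtdDemo names, the
inline DTD, and the example.invalid system ID are all made up for
illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Digester: answers every external-entity
// request with the same "forced" DTD text.
class ForcedDtdResolver implements EntityResolver {
    private final String dtdText;

    ForcedDtdResolver(String dtdText) {
        this.dtdText = dtdText;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Ignore whatever the document named and hand back our own DTD.
        InputSource source = new InputSource(new StringReader(dtdText));
        source.setSystemId(systemId);
        return source;
    }
}

class ForcedDtdDemo {
    // Made-up DTD, for illustration only.
    static final String DTD =
        "<!ELEMENT config (item*)>\n<!ELEMENT item (#PCDATA)>\n";

    // Parses xml with validation switched on and returns the messages of
    // any validation errors the parser reported.
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setEntityResolver(new ForcedDtdResolver(DTD));
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        // The system ID the document declares is never fetched.
        String head =
            "<!DOCTYPE config SYSTEM \"http://example.invalid/whatever.dtd\">";
        System.out.println(validate(head + "<config><item>a</item></config>"));
        System.out.println(validate(head + "<config><bogus/></config>"));
    }
}
```

Wiring the resolver into Digester is left out here, since that depends on
which hooks the Digester version in use exposes.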

In practice, it's usually appropriate to mandate that incoming XML must
declare its document type if it is to be validated. For the odd case where
this isn't possible, I don't know of any elegant solution. The following
ugly hack might work, although I have never had to use it:

  1. Implement an EntityResolver that returns the 'forced' DTD the _first_
     time it is invoked, and returns null all subsequent times (so that any
     subsequent external entities get handled in the standard way).
  2. Always pass the XML input stream through an adapter that detects
     whether the beginning of the document contains a DOCTYPE declaration,
     and artificially 'inserts' one if it doesn't.
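
Step 1 might look roughly like the sketch below. OneShotDtdResolver is a
hypothetical name, and the code assumes that the first entity the parser
asks for is the external DTD subset, which may not hold if the document's
internal subset pulls in external entities of its own:

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: substitutes the forced DTD for the first external
// entity requested, then steps aside. Use a fresh instance for each parse.
class OneShotDtdResolver implements EntityResolver {
    private final String dtdText;
    private boolean used = false;

    OneShotDtdResolver(String dtdText) {
        this.dtdText = dtdText;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (used) {
            // Returning null tells the parser to resolve the entity itself.
            return null;
        }
        used = true;
        // Assumption: this first request is the one for the DTD.
        return new InputSource(new StringReader(dtdText));
    }
}
```

The resolver is stateful, so sharing one instance between parses would
silently disable the override after the first document.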

One could argue that step #1 is bad practice, since if an XML document
actually declares a different document type from the one you want to
validate against, that's a pretty strong indicator that you've got the wrong
format! So maybe #2 alone would be the more appropriate solution.
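
For #2, a rough sketch of the DOCTYPE check follows. To keep it short it
works on a String that already holds the whole document rather than
wrapping the input stream, and the DoctypeAdapter name and the forced.dtd
system ID are invented for the example. The root element name in the
inserted declaration has to match the document's actual root element for
validation to pass:

```java
// Hypothetical helper: makes sure a document carries a DOCTYPE declaration.
class DoctypeAdapter {

    // Returns xml unchanged if its prolog already contains a DOCTYPE;
    // otherwise inserts the given declaration just before the root element.
    static String ensureDoctype(String xml, String doctype) {
        // A byte-order mark, if present, must stay the first character.
        int i = xml.startsWith("\uFEFF") ? 1 : 0;
        while (i < xml.length()) {
            if (Character.isWhitespace(xml.charAt(i))) {
                i++;
            } else if (xml.startsWith("<?", i)) {
                // XML declaration or processing instruction: skip it.
                int end = xml.indexOf("?>", i + 2);
                if (end < 0) return xml; // malformed; let the parser complain
                i = end + 2;
            } else if (xml.startsWith("<!--", i)) {
                // Comment in the prolog: skip it.
                int end = xml.indexOf("-->", i + 4);
                if (end < 0) return xml;
                i = end + 3;
            } else if (xml.startsWith("<!DOCTYPE", i)) {
                return xml; // already declares a document type
            } else {
                // Reached the root element without seeing a DOCTYPE.
                return xml.substring(0, i) + doctype + "\n" + xml.substring(i);
            }
        }
        return xml;
    }

    public static void main(String[] args) {
        String doctype = "<!DOCTYPE config SYSTEM \"forced.dtd\">";
        System.out.println(
            ensureDoctype("<?xml version=\"1.0\"?>\n<config/>", doctype));
    }
}
```

A real adapter would more likely buffer only the start of the stream, but
configuration files are usually small enough that reading the whole
document first is a tolerable shortcut.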

Of course, I'm not suggesting adding this capability to the Digester code.

-DHM


