commons-dev mailing list archives

From "Craig R. McClanahan" <craig...@apache.org>
Subject RE: [digester] forcing a specific DTD
Date Tue, 13 Nov 2001 19:17:05 GMT
See below.

On Tue, 13 Nov 2001, Dave Martin wrote:

> Date: Tue, 13 Nov 2001 10:46:27 -0800
> From: Dave Martin <dmartin@altoweb.com>
> Reply-To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>
> To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>
> Subject: RE: [digester] forcing a specific DTD
>
>
>
> > -----Original Message-----
> > From: Craig R. McClanahan [mailto:craigmcc@apache.org]
> > Sent: Tuesday, November 13, 2001 9:30 AM
> > To: Jakarta Commons Developers List
> > Subject: Re: [digester] forcing a specific DTD
> >
> >
> > On Tue, 13 Nov 2001, Tal Dayan wrote:
> >
> > > Date: Tue, 13 Nov 2001 09:04:43 -0800
> > > From: Tal Dayan <tal@zapta.com>
> > > Reply-To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>
> > > To: commons-dev@jakarta.apache.org
> > > Subject: [digester] forcing a specific DTD
> > >
> > >
> > > We plan to use Digester for parsing XML-based configuration files and
> > > would like to use a DTD to save some validation code. From the Digester
> > > documentation it seems that DTD-based validation works as follows: the
> > > user specifies an arbitrary DTD, and digester.parse() makes sure the XML
> > > document matches the user's specified DTD.
> > >
> > > We have two problems with this approach:
> > >
> > > 1. When we parse a file, we know what DTD it should conform to, so there
> > > is no need to require the user to type it (e.g. when Tomcat reads
> > > server.xml, the user does not care about the DTD; it should be up to
> > > Tomcat to do the magic of validating it).
> > >
> > > 2. The user can specify an arbitrary DTD, but we want to validate against
> > > a *specific* DTD. It does not help us much to know that the XML conforms
> > > to some arbitrary DTD the user specified (this reminds me of an old joke:
> > > a man is asked by the bank manager to identify himself; he pulls a
> > > picture from his pocket and says 'that's me').
> > >
> > > Any idea how to address these issues?
> > >
> > > A good solution would be, for example, a digester.forceSpecificDTD(...)
> > > method that forces a specific DTD for that instance of the parser. This
> > > makes sense since the digester is configured anyway (via the set of
> > > rules) to parse a specific type of XML document.
> > >
> > > Thanks,
> > >
> > > Tal
> > >
> >
> > This sounds like it might be an interesting idea, but I don't know how to
> > implement it :-(.  Digester uses a SAX parser via the JAXP/1.1 APIs
> > underneath the covers.  How do you tell the parser to use an arbitrary DTD
> > instead of whatever is specified in the document being parsed?
> >
> > Craig
> >
> >
>
> If you made the assumption that the XML document contains a DOCTYPE
> definition, then you could register an EntityResolver that simply returns an
> InputSource on the 'forced' DTD. This won't work if the XML document has no
> DOCTYPE definition, however, since the EntityResolver will not get invoked
> in this case.
> In practice, it's usually appropriate to mandate that if the incoming XML
> wants to be validated, it must declare its document type. For the odd case
> where this isn't possible, I don't know of any elegant solution. The
> following ugly hack might work, although I haven't ever had to use it:
>
>   1. Implement an EntityResolver that returns the 'forced' DTD the _first_
> time it is invoked, and returns null all subsequent times (so that any
> subsequent external entities get handled in the standard way).
>   2. Always pass the XML input stream through an adapter that detects if the
> beginning of the document contains a DOCTYPE declaration, and artificially
> 'inserts' one if it doesn't.
>
> One could argue that step #1 is bad practice, since if an XML document
> actually declares a different document type from the one you want to
> validate against, that's a pretty strong indicator that you've got the wrong
> format! So maybe #2 alone would be the more appropriate solution.
>
> Of course, I'm not suggesting adding this capability to the Digester code.
>

Digester can already do something better than Option #1, via the
register() method.  The only restrictions are:
* The input document must contain a <!DOCTYPE> declaration for the DTD.
* The public identifier in the DOCTYPE must match the first argument
  to the register() method.
* Digester will look up the alternate DTD by using the URL
  passed as the second argument to the register() method.

The primary purpose for this is to support validated parsing even when you
are disconnected from the Internet.  For example, Tomcat 4 (the current
development branch) uses Digester to parse the web application deployment
descriptor (web.xml).  It registers internal copies of the DTDs (one for
2.2 and one for 2.3) so that you can run without access to the
java.sun.com site that is referenced in the system identifiers.
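
To make the mechanism concrete, here is a minimal sketch of the same idea
written against plain JAXP/SAX rather than Digester itself. It is not
Digester's implementation: the class names (LocalDtdSketch, LocalDtdResolver),
the public identifier, the example.invalid system identifier and the tiny DTD
are all hypothetical, and the DTD is held in memory instead of being looked up
through a URL as the second argument to register() would be.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LocalDtdSketch {

    /** Hypothetical resolver: maps a DOCTYPE public identifier to a local DTD. */
    public static class LocalDtdResolver implements EntityResolver {
        private final Map<String, String> dtds = new HashMap<>();

        /** Associate a public identifier with DTD text kept locally. */
        public void register(String publicId, String dtdText) {
            dtds.put(publicId, dtdText);
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            String dtd = (publicId == null) ? null : dtds.get(publicId);
            if (dtd == null) {
                return null; // unknown identifier: let the parser resolve it normally
            }
            InputSource source = new InputSource(new StringReader(dtd));
            source.setSystemId(systemId);
            return source;
        }
    }

    /** Parse with DTD validation on; return false if a validity error is reported. */
    public static boolean isValid(String xml, EntityResolver resolver) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        final boolean[] ok = { true };
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                ok[0] = false; // validity errors arrive here
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // well-formedness errors abort the parse
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return ok[0];
    }

    public static void main(String[] args) throws Exception {
        String publicId = "-//Example//DTD Config 1.0//EN"; // hypothetical
        LocalDtdResolver resolver = new LocalDtdResolver();
        resolver.register(publicId,
                "<!ELEMENT config (item+)>\n<!ELEMENT item EMPTY>");

        String doctype = "<!DOCTYPE config PUBLIC \"" + publicId
                + "\" \"http://example.invalid/config.dtd\">";
        System.out.println(isValid(doctype + "<config><item/></config>", resolver));
        System.out.println(isValid(doctype + "<config><bogus/></config>", resolver));
    }
}
```

Because the resolver answers for the registered public identifier, the system
identifier in the DOCTYPE is never fetched over the network. As the
restrictions above imply, a document with no DOCTYPE, or with a different
public identifier, never reaches the local DTD.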

> -DHM

Craig



