commons-dev mailing list archives

From Dave Martin <dmar...@altoweb.com>
Subject RE: [digester] forcing a specific DTD
Date Wed, 14 Nov 2001 03:10:32 GMT
I can only speak for myself, but I would summarize it as:

1. If you can enforce that the XML file has a DOCTYPE declaration, you can
configure the Digester to use a specified DTD, based on the PUBLIC
identifier in the DOCTYPE declaration. (See the 'register' method in the
Digester class.)

2. If it is possible for the XML file to omit the DOCTYPE declaration, and
you still want to validate the XML, you have a slight problem, because the
SAX standard currently doesn't provide a mechanism to force validation. (SAX
says: "if validation is 'on', _and_ the XML contains DOCTYPE, then validate
against the specified DTD"). 
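
To make point 1 concrete, here is a minimal sketch of keying a DTD off the
PUBLIC identifier. It uses only the JDK's SAX classes rather than Digester
itself, so it illustrates the idea behind 'register' and is not Digester's
actual code. The class name, the public identifier, and the DTD text are all
made up for the example.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class PublicIdValidation {
    // Hypothetical public identifier and DTD, for illustration only.
    static final String PUBLIC_ID = "-//Example//DTD Config 1.0//EN";
    static final String DTD =
        "<!ELEMENT config (item*)>\n<!ELEMENT item (#PCDATA)>\n";

    /** Parses xml with validation on and returns the number of validation errors. */
    static int countValidationErrors(String xml) throws Exception {
        // Registry of known public identifiers and the DTD text to use for each.
        final Map<String, String> registry = new HashMap<>();
        registry.put(PUBLIC_ID, DTD);

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final int[] errors = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                String dtd = registry.get(publicId);
                if (dtd == null) {
                    return null; // unknown identifier: let the parser resolve it
                }
                InputSource source = new InputSource(new StringReader(dtd));
                source.setSystemId(systemId);
                return source;
            }

            @Override
            public void error(SAXParseException e) {
                errors[0]++; // validation errors are recoverable; just count them
            }
        };
        reader.setContentHandler(handler);
        reader.setEntityResolver(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return errors[0];
    }
}
```

The SYSTEM identifier in the document is never fetched for a registered
PUBLIC identifier, which is what lets the application, not the document,
decide which DTD is used.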

One workaround is to trick the XML parser into validating, by intercepting
the XML input stream and prepending a DOCTYPE declaration to it. I do not
believe it is appropriate to implement this capability in the Digester
itself.
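
As a rough sketch of that workaround (application code, not anything in
Digester), the helper below adds a DOCTYPE declaration to a document held as
a String. The class and method names are hypothetical. It assumes the input
has no DOCTYPE of its own, and it places the declaration after a leading XML
declaration if one is present, since the XML declaration must stay first.

```java
public class DoctypePrepender {
    /**
     * Returns xml with a DOCTYPE declaration added. Assumes xml does not
     * already contain a DOCTYPE declaration.
     */
    static String withDoctype(String xml, String rootName,
                              String publicId, String systemId) {
        String doctype = "<!DOCTYPE " + rootName
            + " PUBLIC \"" + publicId + "\" \"" + systemId + "\">\n";

        int insertAt = 0;
        if (xml.startsWith("<?xml")) {
            // Keep the XML declaration (or leading processing instruction) first.
            int end = xml.indexOf("?>");
            if (end < 0) {
                throw new IllegalArgumentException("unterminated XML declaration");
            }
            insertAt = end + 2;
        }
        return xml.substring(0, insertAt) + doctype + xml.substring(insertAt);
    }
}
```

The result can then be handed to a validating parser whose entity resolver
maps the chosen PUBLIC identifier to the expected DTD.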

That said, the simplest solution to #2 is to skip validation when the XML
file omits a DOCTYPE declaration. This can be treacherous, however, because
if a DTD defines default values, it's possible for the same XML file to be
parsed differently, depending on whether validation is used. 
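
A small illustration of the default-value hazard, using only the JDK's SAX
parser. The element and attribute names are invented. Strictly speaking, the
difference shown here comes from whether a DTD is read at all, not from the
validation flag itself: the same <item/> element reports a 'mode' attribute
when a DTD supplies a default, and none when the DOCTYPE is missing.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DtdDefaults {
    // Same element content; only the first document carries a DTD
    // (as an internal subset, to keep the example self-contained).
    static final String WITH_DTD =
        "<!DOCTYPE item [<!ELEMENT item EMPTY>"
        + "<!ATTLIST item mode CDATA \"fast\">]><item/>";
    static final String WITHOUT_DTD = "<item/>";

    /** Returns the value of the 'mode' attribute on <item>, or null if absent. */
    static String modeOf(String xml) throws Exception {
        final String[] seen = {null};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("item".equals(qName)) {
                    seen[0] = atts.getValue("mode");
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return seen[0];
    }
}
```

Application code that relies on the defaulted value would behave differently
for the two inputs, even though the element itself is written identically.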

-DHM

> -----Original Message-----
> From: Tal Dayan [mailto:tal@zapta.com]
> Sent: Tuesday, November 13, 2001 6:44 PM
> To: Jakarta Commons Developers List
> Subject: RE: [digester] forcing a specific DTD
> 
> 
> So, what is the consensus of the list regarding DTD validation of parsed
> files of known type?
> Is there a safe way to do it now (and if so, how)? If not, does it make sense
> to add more support for enforcing validation?
> 
> Tal
> 
> > -----Original Message-----
> > From: Dave Martin [mailto:dmartin@altoweb.com]
> > Sent: Tuesday, November 13, 2001 11:53 AM
> > To: Jakarta Commons Developers List
> > Subject: RE: [digester] forcing a specific DTD
> >
> >
> > IMHO, if the input XML document declares a document type other than what is
> > expected, the appropriate action should be to terminate the parse due to a
> > validation failure. (I.e. "If you're not what you say you are, how can I
> > trust you?")
> > The Digester's register() method, which, as Craig pointed out, allows you to
> > key a specific DTD off of the DOCTYPE's public identifier, should be
> > appropriate for almost all cases. (Since if the XML cannot declare the
> > location of the DTD via the SYSTEM identifier, it should at least be able to
> > identify its type via the PUBLIC identifier.)
> > On the other hand, in the scenario where the XML input does not contain any
> > DOCTYPE declaration, one might want to 'assume' it follows a particular
> > document type, and validate it against that DTD to verify that assumption.
> >
> > -DHM
> >
> > > -----Original Message-----
> > > From: Arun M. Thomas [mailto:AMammenT@yahoo.com]
> > > Sent: Tuesday, November 13, 2001 11:37 AM
> > > To: Jakarta Commons Developers List
> > > Subject: RE: [digester] forcing a specific DTD
> > >
> > >
> > > You're absolutely right, this wouldn't work without a DOCTYPE
> > > declaration.  However, it wouldn't matter what the contents of that
> > > declaration were.  (The user could modify it to his heart's content
> > > and still have the document validated against the same DTD.)
> > >
> > > -AMT
> > >
> > > -----Original Message-----
> > > From: craigmcc@localhost [mailto:craigmcc@localhost] On Behalf Of Craig R. McClanahan
> > > Sent: Tuesday, November 13, 2001 10:17 AM
> > > To: Jakarta Commons Developers List
> > > Subject: RE: [digester] forcing a specific DTD
> > >
> > >
> > > Would something like this work even in the absence of a <DOCTYPE>
> > > declaration at all in the file being parsed?  I thought that this was the
> > > only time resolveEntity() was called.
> > >
> > > Craig
> > >
> > >
> > > On Tue, 13 Nov 2001, Arun M. Thomas wrote:
> > >
> > > > Date: Tue, 13 Nov 2001 10:20:42 -0800
> > > > From: Arun M. Thomas <AMammenT@yahoo.com>
> > > > Reply-To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>
> > > > To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>
> > > > Subject: RE: [digester] forcing a specific DTD
> > > >
> > > > Craig,
> > > >
> > > > Despite my previous response to Tal, it should be possible to do this
> > > > by instantiating the SAXParser with a subclass of DefaultHandler which
> > > > overrides the resolveEntity method.  I had to do exactly this in the
> > > > last application on which I worked using the JAXP 1.0 API.  In that
> > > > case, we provided a custom implementation of EntityResolver which
> > > > always returned an InputSource for the same DTD.  It appears that JAXP 1.1
> > > > has hidden the EntityResolver under the DefaultHandler class, so providing
> > > > an implementation of that method which is customized to return a specific
> > > > DTD should suffice.
> > > >
> > > > In the Digester case, it means a potentially simple modification to the
> > > > resolveEntity method of Digester (which is a DefaultHandler).  I've
> > > > included a diff of a quick patch below as a suggestion, and attached a copy
> > > > of the modified version of Digester.
> > > >
> > > > Cheers,
> > > > -AMT
> > > >
> > > > cvs diff Digester.java (in directory
> > > > C:\Dev\jakarta-commons\digester\src\java\org\apache\commons\digester\)
> > > > Index: Digester.java
> > > > ===================================================================
> > > > RCS file: /home/cvspublic/jakarta-commons/digester/src/java/org/apache/commons/digester/Digester.java,v
> > > > retrieving revision 1.23
> > > > diff -r1.23 Digester.java
> > > > 161c161,171
> > > > <
> > > > ---
> > > > > 		/**
> > > > > 		 * Works only in a JAXP1.1 world, but allows the user to supply a fixed URL against
> > > > > 		 * which all documents will be validated.  The supplied parameter will be used by
> > > > > 		 * the {@link #resolveEntity(String, String)} method.
> > > > > 		 */
> > > > > 		public Digester(String fixedDTDUrl) {
> > > > > 			super();
> > > > >
> > > > > 			this.fixedDTDUrl = fixedDTDUrl;
> > > > > 		}
> > > > >
> > > > 313a324,328
> > > > > 		/**
> > > > > 		 * URL which may be supplied against which all documents should be
> > > > > 		 * validated regardless of the public and system identifiers.
> > > > > 		 */
> > > > > 		private String fixedDTDUrl = null;
> > > > 1042a1058,1060
> > > > > 	if (fixedDTDUrl != null)
> > > > > 		dtdURL = fixedDTDUrl;
> > > > >
> > > >
> > > > -----Original Message-----
> > > > From: craigmcc@localhost [mailto:craigmcc@localhost] On Behalf Of Craig R. McClanahan
> > > > Sent: Tuesday, November 13, 2001 9:30 AM
> > > > To: Jakarta Commons Developers List
> > > > Subject: Re: [digester] forcing a specific DTD
> > > >
> > > > ...
> > > >
> > > > This sounds like it might be an interesting idea, but I don't know how to
> > > > implement it :-(.  Digester uses a SAX parser via the JAXP/1.1 APIs
> > > > underneath the covers.  How do you tell the parser to use an arbitrary DTD
> > > > instead of whatever is specified in the document being parsed?
> > > >
> > > > Craig
> > > >
> > >
> > >
> > > --
> > > To unsubscribe, e-mail:
> > > <mailto:commons-dev-unsubscribe@jakarta.apache.org>
> > > For additional commands, e-mail:
> > > <mailto:commons-dev-help@jakarta.apache.org>