commons-dev mailing list archives

From "Tal Dayan" <...@zapta.com>
Subject RE: [digester] forcing a specific DTD
Date Wed, 14 Nov 2001 04:14:50 GMT
Let me see if I understand your suggestions correctly. There are two error cases:

1. There is no DOCTYPE at all. Since no validation will take place, we want to
abort the parse. How can we currently detect this case?

2. The document has a DOCTYPE, but the public ID or URL is incorrect. Does the
current version of Digester provide a way to detect this case?

Parsing the document without validating it against the correct DTD is not an
option, since we don't want to write all of the validation code by hand. At a
minimum we need to detect the error condition (that is, no validation, or
validation against the wrong DTD) and abort the parse.
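
To make the two cases concrete, here is a rough sketch of how the detection
might look with plain SAX, outside of Digester. It assumes a parser that
supports the SAX2 lexical-handler property and EntityResolver2 (the JDK's
bundled parser does). The class name DoctypeGuard, the public ID, the
example.invalid URL and the tiny in-memory DTD are all made up for
illustration; none of this is existing Digester API.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper, not part of Digester.
class DoctypeGuard extends DefaultHandler2 {
    // Assumed public ID and DTD, for illustration only.
    static final String EXPECTED_PUBLIC_ID = "-//Example//DTD Config 1.0//EN";
    static final String LOCAL_DTD =
        "<!ELEMENT config (item*)>\n<!ELEMENT item (#PCDATA)>";

    private boolean doctypeSeen = false;

    // LexicalHandler callback: fires when the parser reads the DOCTYPE.
    // Case 2: reject a DOCTYPE whose public ID is not the expected one.
    @Override
    public void startDTD(String name, String publicId, String systemId)
            throws SAXException {
        doctypeSeen = true;
        if (!EXPECTED_PUBLIC_ID.equals(publicId)) {
            throw new SAXException("unexpected DOCTYPE public ID: " + publicId);
        }
    }

    // Serve the DTD from memory so the URL in the document is never fetched.
    // This sketch assumes the DTD is the only external entity.
    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        return new InputSource(new StringReader(LOCAL_DTD));
    }

    // Case 1: an element arrives although no DOCTYPE was seen.
    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) throws SAXException {
        if (!doctypeSeen) {
            throw new SAXException("missing DOCTYPE declaration");
        }
    }

    // Validation errors abort the parse instead of being ignored.
    @Override
    public void error(SAXParseException e) throws SAXException {
        if (!doctypeSeen) {
            throw new SAXException("missing DOCTYPE declaration");
        }
        throw e;
    }

    // Returns "ok" or "rejected: <reason>".
    static String check(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            DoctypeGuard guard = new DoctypeGuard();
            reader.setContentHandler(guard);
            reader.setEntityResolver(guard);
            reader.setErrorHandler(guard);
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", guard);
            reader.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (Exception e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String doctype = "<!DOCTYPE config PUBLIC \"" + EXPECTED_PUBLIC_ID
            + "\" \"http://example.invalid/config.dtd\">";
        System.out.println(check(doctype + "<config><item>a</item></config>"));
        System.out.println(check("<config><item>a</item></config>"));
    }
}
```

Something along these lines inside Digester itself would let a caller state
which public ID it expects and have everything else rejected up front.
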

This looks like a very general XML parsing issue, so it would be great if
Digester came out of the box with an easy way to address it.

Tal


> -----Original Message-----
> From: Dave Martin [mailto:dmartin@altoweb.com]
> Sent: Tuesday, November 13, 2001 7:11 PM
> To: Jakarta Commons Developers List
> Subject: RE: [digester] forcing a specific DTD
>
>
> I can only speak for myself, but I would summarize it as:
>
> 1. If you can enforce that the XML file has a DOCTYPE declaration, you can
> configure the Digester to use a specified DTD, based on the PUBLIC
> identifier in the DOCTYPE declaration. (See the 'register' method in the
> Digester class.)
>
> 2. If it is possible for the XML file to omit the DOCTYPE declaration, and
> you still want to validate the XML, you have a slight problem, because the
> SAX standard currently doesn't provide a mechanism to force validation.
> (SAX says: "if validation is 'on', _and_ the XML contains DOCTYPE, then
> validate against the specified DTD").
>
> One workaround is to trick the XML parser into validating, by intercepting
> the XML input stream and prepending a DOCTYPE declaration to it. I do not
> believe it is appropriate to implement this capability in the Digester
> itself.
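
A minimal sketch of the prepend-a-DOCTYPE workaround described above, done
in application code rather than in Digester. It works on a String for
brevity and assumes the input has no DOCTYPE of its own; the class name
DoctypeInjector, the public ID, the example.invalid URL and the in-memory
DTD are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Digester.
class DoctypeInjector {
    // Assumed identifiers and DTD, for illustration only.
    static final String PUBLIC_ID = "-//Example//DTD Config 1.0//EN";
    static final String SYSTEM_ID = "http://example.invalid/config.dtd";
    static final String LOCAL_DTD =
        "<!ELEMENT config (item*)>\n<!ELEMENT item (#PCDATA)>";

    // Inserts a DOCTYPE after the XML declaration if there is one,
    // otherwise at the very start of the document.
    static String withDoctype(String xml, String rootElement) {
        String doctype = "<!DOCTYPE " + rootElement + " PUBLIC \""
            + PUBLIC_ID + "\" \"" + SYSTEM_ID + "\">\n";
        int close = xml.indexOf("?>");
        if (xml.startsWith("<?xml") && close >= 0) {
            int end = close + 2;
            return xml.substring(0, end) + "\n" + doctype + xml.substring(end);
        }
        return doctype + xml;
    }

    // Parses with validation on; the DTD is served from memory.
    static boolean validates(String xml) {
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(LOCAL_DTD));
            }

            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e; // make validation errors abort the parse
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setEntityResolver(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "<?xml version=\"1.0\"?><config><item>a</item></config>";
        System.out.println(validates(withDoctype(raw, "config")));
    }
}
```

A real version would wrap the input stream instead of building a String,
and would have to decide what to do when a DOCTYPE is already present.
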
>
> That said, the simplest solution to #2 is to simply not validate if the XML
> file omits a DOCTYPE declaration. This can be treacherous, however, because
> if a DTD defines default values, it's possible for the same XML file to be
> parsed differently, depending on whether validation is used.
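
The default-value concern can be illustrated with a small sketch. It
compares the same element parsed with and without a DTD that declares an
attribute default. An internal DTD subset is used so that nothing has to be
fetched; the class name DefaultValueDemo and the element and attribute
names are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, for illustration only.
class DefaultValueDemo {
    // Returns the value of the "kind" attribute as reported by the parser
    // for the <item> element, or null if the parser reports none.
    static String kindOf(String xml) {
        final String[] kind = new String[1];
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        if ("item".equals(qName)) {
                            kind[0] = attrs.getValue("kind");
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return kind[0];
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE item [<!ELEMENT item EMPTY>"
            + "<!ATTLIST item kind CDATA \"plain\">]>";
        // Same element, once with the DTD and once without it.
        System.out.println(kindOf(dtd + "<item/>"));
        System.out.println(kindOf("<item/>"));
    }
}
```

With the DTD in place the handler is handed the declared default; without
it the attribute is simply absent, so downstream code sees different data
for the same element.
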
>
> -DHM
>
> > -----Original Message-----
> > From: Tal Dayan [mailto:tal@zapta.com]
> > Sent: Tuesday, November 13, 2001 6:44 PM
> > To: Jakarta Commons Developers List
> > Subject: RE: [digester] forcing a specific DTD
> >
> >
> > So, what is the consensus of the list regarding DTD validation of parsed
> > files of known type? Is there a safe way to do it now (and if so, how)?
> > If not, does it make sense to add more support for enforcing validation?
> >
> > Tal
> >
> > > -----Original Message-----
> > > From: Dave Martin [mailto:dmartin@altoweb.com]
> > > Sent: Tuesday, November 13, 2001 11:53 AM
> > > To: Jakarta Commons Developers List
> > > Subject: RE: [digester] forcing a specific DTD
> > >
> > >
> > > IMHO, if the input XML document declares a document type other than
> > > what is expected, the appropriate action should be to terminate the
> > > parse due to a validation failure. (I.e. "If you're not what you say
> > > you are, how can I trust you?")
> > > The Digester's register() method, which, as Craig pointed out, allows
> > > you to key a specific DTD off of the DOCTYPE's public identifier,
> > > should be appropriate for almost all cases. (Since if the XML cannot
> > > declare the location of the DTD via the SYSTEM identifier, it should at
> > > least be able to identify its type via the PUBLIC identifier.)
> > > On the other hand, in the scenario where the XML input does not contain
> > > any DOCTYPE declaration, one might want to 'assume' it follows a
> > > particular document type, and validate it against that DTD to verify
> > > that assumption.
> > >
> > > -DHM
> > >
> > > > -----Original Message-----
> > > > From: Arun M. Thomas [mailto:AMammenT@yahoo.com]
> > > > Sent: Tuesday, November 13, 2001 11:37 AM
> > > > To: Jakarta Commons Developers List
> > > > Subject: RE: [digester] forcing a specific DTD
> > > >
> > > >
> > > > You're absolutely right, this wouldn't work without a DOCTYPE
> > > > declaration.  However, it wouldn't matter what the contents of that
> > > > declaration were.  (The user could modify it to his heart's content
> > > > and still have the document validated against the same DTD.)
> > > >
> > > > -AMT
> > > >
> > > > -----Original Message-----
> > > > From: craigmcc@localhost [mailto:craigmcc@localhost] On Behalf Of Craig R. McClanahan
> > > > Sent: Tuesday, November 13, 2001 10:17 AM
> > > > To: Jakarta Commons Developers List
> > > > Subject: RE: [digester] forcing a specific DTD
> > > >
> > > >
> > > > Would something like this work even in the absence of a <!DOCTYPE>
> > > > declaration at all in the file being parsed?  I thought
> > > > resolveEntity() was only called when one is present.
> > > >
> > > > Craig
> > > >
> > > >
> > > > On Tue, 13 Nov 2001, Arun M. Thomas wrote:
> > > >
> > > > > Date: Tue, 13 Nov 2001 10:20:42 -0800
> > > > > From: Arun M. Thomas <AMammenT@yahoo.com>
> > > > > > Reply-To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>
> > > > > > To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>
> > > > > Subject: RE: [digester] forcing a specific DTD
> > > > >
> > > > > Craig,
> > > > >
> > > > > > Despite my previous response to Tal, it should be possible to do
> > > > > > this by instantiating the SAXParser with a subclass of
> > > > > > DefaultHandler which overrides the resolveEntity method.  I had to
> > > > > > do exactly this in the last application on which I worked using
> > > > > > the JAXP 1.0 API.  In that case, we provided a custom
> > > > > > implementation of EntityResolver which always returned an
> > > > > > InputSource for the same DTD.  It appears that JAXP 1.1 has hidden
> > > > > > the EntityResolver under the DefaultHandler class, so providing an
> > > > > > implementation of that method which is customized to return a
> > > > > > specific DTD should suffice.
> > > > > >
> > > > > > In the Digester case, it means a potentially simple modification
> > > > > > to the resolveEntity method of Digester (which is a
> > > > > > DefaultHandler).  I've included a diff of a quick patch below as a
> > > > > > suggestion, and attached a copy of the modified version of
> > > > > > Digester.
> > > > >
> > > > > Cheers,
> > > > > -AMT
> > > > >
> > > > > > cvs diff Digester.java (in directory
> > > > > > C:\Dev\jakarta-commons\digester\src\java\org\apache\commons\digester\)
> > > > > > Index: Digester.java
> > > > > > ===================================================================
> > > > > > RCS file: /home/cvspublic/jakarta-commons/digester/src/java/org/apache/commons/digester/Digester.java,v
> > > > > > retrieving revision 1.23
> > > > > > diff -r1.23 Digester.java
> > > > > > 161c161,171
> > > > > > <
> > > > > > ---
> > > > > > > 		/**
> > > > > > > 		 * Works only in a JAXP1.1 world, but allows the user to supply a fixed URL against
> > > > > > > 		 * which all documents will be validated.  The supplied parameter will be used by
> > > > > > > 		 * the {@link #resolveEntity(String, String)} method.
> > > > > > > 		 */
> > > > > > > 		public Digester(String fixedDTDUrl) {
> > > > > > > 			super();
> > > > > > >
> > > > > > > 			this.fixedDTDUrl = fixedDTDUrl;
> > > > > > > 		}
> > > > > > >
> > > > > > 313a324,328
> > > > > > > 		/**
> > > > > > > 		 * URL which may be supplied against which all documents should be
> > > > > > > 		 * validated regardless of the public and system identifiers.
> > > > > > > 		 */
> > > > > > > 		private String fixedDTDUrl = null;
> > > > > > 1042a1058,1060
> > > > > > > 	if (fixedDTDUrl != null)
> > > > > > > 		dtdURL = fixedDTDUrl;
> > > > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > > From: craigmcc@localhost [mailto:craigmcc@localhost] On Behalf Of Craig R. McClanahan
> > > > > Sent: Tuesday, November 13, 2001 9:30 AM
> > > > > To: Jakarta Commons Developers List
> > > > > Subject: Re: [digester] forcing a specific DTD
> > > > >
> > > > > ...
> > > > >
> > > > > > This sounds like it might be an interesting idea, but I don't know
> > > > > > how to implement it :-(.  Digester uses a SAX parser via the
> > > > > > JAXP/1.1 APIs underneath the covers.  How do you tell the parser to
> > > > > > use an arbitrary DTD instead of whatever is specified in the
> > > > > > document being parsed?
> > > > >
> > > > > Craig
> > > > >
> > > >
> > > >
> > > > --
> > > > To unsubscribe, e-mail:
> > > > <mailto:commons-dev-unsubscribe@jakarta.apache.org>
> > > > For additional commands, e-mail:
> > > > <mailto:commons-dev-help@jakarta.apache.org>
> > > >
> > > >
> > > > --
> > > > To unsubscribe, e-mail:
> > > > <mailto:commons-dev-unsubscribe@jakarta.apache.org>
> > > > For additional commands, e-mail:
> > > > <mailto:commons-dev-help@jakarta.apache.org>
> > > >
> > >
> > > --
> > > To unsubscribe, e-mail:
> > <mailto:commons-dev-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:commons-dev-help@jakarta.apache.org>
> >
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:commons-dev-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:commons-dev-help@jakarta.apache.org>
> >
>
> --
> To unsubscribe, e-mail:
<mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:commons-dev-help@jakarta.apache.org>


--
To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>

