commons-dev mailing list archives

From "Tal Dayan" <...@zapta.com>
Subject RE: [digester] forcing a specific DTD
Date Wed, 14 Nov 2001 20:52:00 GMT
You are right, this is a workaround for a deficiency in the SAX standard.
Hopefully, a future version of the SAX API will address it.

Meanwhile, we are using Jeff's Doctype Changer with great success. It gives
excellent access to the original DOCTYPE and full control over the DOCTYPE
that the parser will see.

BTW, in our case, the application treats a DOCTYPE in the original XML
document as an error. We are dealing with configuration files, and we don't
bother the users with DOCTYPE statements. With the Doctype Changer, we
simply insert the correct DOCTYPE to trigger DTD-based validation.
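
To make the "insert the correct DOCTYPE" step concrete, here is a minimal
sketch in plain Java. It does not use the Doctype Changer API. The class
name DoctypeInserter, the method name and the SYSTEM identifier are made up
for illustration, and the sketch assumes the input has no DOCTYPE of its own
and that any XML declaration is well formed and starts at the first
character.

```java
final class DoctypeInserter {
    private DoctypeInserter() {}

    /**
     * Returns the document text with a DOCTYPE declaration added.
     * The DOCTYPE goes after the XML declaration when there is one,
     * because the XML declaration must stay at the very start.
     * Assumption: xml does not already contain a DOCTYPE.
     */
    static String insertDoctype(String xml, String rootElement, String dtdUri) {
        String doctype = "<!DOCTYPE " + rootElement + " SYSTEM \"" + dtdUri + "\">\n";
        int at = 0;
        if (xml.startsWith("<?xml")) {
            // Skip past the closing "?>" of the XML declaration.
            at = xml.indexOf("?>") + 2;
        }
        return xml.substring(0, at) + doctype + xml.substring(at);
    }
}
```

The result can then be handed to a validating parser, which now sees a
DOCTYPE and validates against the DTD it names.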

Tal

> -----Original Message-----
> From: Dave Martin [mailto:dmartin@altoweb.com]
> Sent: Wednesday, November 14, 2001 10:32 AM
> To: Jakarta Commons Developers List
> Subject: RE: [digester] forcing a specific DTD
>
>
> I didn't mean to say that lack of a DOCTYPE should be treated as an error
> condition. In fact, what's an error and what isn't is really up to the
> application that parses the XML. Think of your XML processor as having an
> implicit contract which imposes constraints on what the incoming XML looks
> like. The details of that contract are up to you. Hence, whether or not the
> lack of a DOCTYPE is an error depends on the XML processor's contract. The
> contract could be: "if there is no DOCTYPE, this is an error", or it could
> be "if there is no DOCTYPE, I'll validate it against my DTD anyway."
>
> The problem with the current SAX standard is that it provides no
> straightforward way to enforce such contracts. Instead, it imposes a default
> contract, which says: "I will validate against the DTD you specify in your
> DOCTYPE; if you do not include a DOCTYPE, I will not validate."
>
> Jeff Turner's DoctypeChanger looks like a decent workaround for this
> deficiency in the SAX standard. To me it would seem a bit crufty to
> actually incorporate this kind of functionality within the Digester, since
> if you need to control the doctype in this way, you can simply pass a
> DoctypeChangerStream to the Digester.parse() method.
>
> The point of my last email was essentially to suggest a contract: "if there
> is no DOCTYPE, validate against my DTD; if there is a DOCTYPE, and it
> contains the public id that I expect, validate against my DTD; if there is
> a DOCTYPE, and it contains some unknown public id, terminate with a
> validation failure exception." Such a contract could be enforced using the
> DoctypeChanger framework, with a custom implementation of the
> DoctypeGenerator interface.
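
The three-way contract quoted above can be sketched as a small decision
function. This is only an illustration in plain Java, not an implementation
of the DoctypeGenerator interface, whose signature is not shown in this
thread. The class DoctypeContract, the Action enum and the regular
expression are assumptions, and the regex is deliberately naive: it would be
fooled by a DOCTYPE that appears inside a comment, and it only understands
double-quoted public ids.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DoctypeContract {
    /** What the caller should do before handing the document to the parser. */
    enum Action { INSERT_DOCTYPE, KEEP_DOCTYPE }

    // Naive match for <!DOCTYPE root ...>, capturing a double-quoted PUBLIC id if present.
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE\\s+\\S+(?:\\s+PUBLIC\\s+\"([^\"]*)\")?");

    private DoctypeContract() {}

    static Action check(String xml, String expectedPublicId) {
        Matcher m = DOCTYPE.matcher(xml);
        if (!m.find()) {
            // No DOCTYPE: validate against our own DTD anyway.
            return Action.INSERT_DOCTYPE;
        }
        String publicId = m.group(1); // null when the DOCTYPE has no PUBLIC id
        if (expectedPublicId.equals(publicId)) {
            // The DOCTYPE names the expected document type.
            return Action.KEEP_DOCTYPE;
        }
        // A DOCTYPE with an unknown (or missing) public id is a contract violation.
        throw new IllegalArgumentException("Unexpected DOCTYPE public id: " + publicId);
    }
}
```

A caller would add a DOCTYPE on INSERT_DOCTYPE, parse as-is on
KEEP_DOCTYPE, and let the exception abort the parse otherwise.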
>
> -DHM
>
> > -----Original Message-----
> > From: Tal Dayan [mailto:tal@zapta.com]
> > Sent: Tuesday, November 13, 2001 8:15 PM
> > To: Jakarta Commons Developers List
> > Subject: RE: [digester] forcing a specific DTD
> >
> >
> > Let me see if I get your suggestions right. There are two error cases:
> >
> > 1. There is no DOCTYPE at all. Since there will be no validation, we want
> > to abort the parsing. How can we currently detect this case?
> >
> > 2. The document has a DOCTYPE but the public id or URL is incorrect. Does
> > the current version of Digester provide a way to detect this case?
> >
> > Simply parsing the document without validation (and using the correct DTD)
> > is not an option since we don't want to write explicitly the entire
> > validation code, so at least we need to detect the error condition (that
> > is, no validation or validation using a wrong DTD) and abort the parsing.
> >
> > This looks like a very general XML parsing issue, so it would be great if
> > Digester came out of the box with an easy way to address it.
> >
> > Tal
> >
> >
> > > -----Original Message-----
> > > From: Dave Martin [mailto:dmartin@altoweb.com]
> > > Sent: Tuesday, November 13, 2001 7:11 PM
> > > To: Jakarta Commons Developers List
> > > Subject: RE: [digester] forcing a specific DTD
> > >
> > >
> > > I can only speak for myself, but I would summarize it as:
> > >
> > > 1. If you can enforce that the XML file has a DOCTYPE declaration, you
> > > can configure the Digester to use a specified DTD, based on the PUBLIC
> > > identifier in the DOCTYPE declaration. (See the 'register' method in the
> > > Digester class.)
> > >
> > > 2. If it is possible for the XML file to omit the DOCTYPE declaration,
> > > and you still want to validate the XML, you have a slight problem,
> > > because the SAX standard currently doesn't provide a mechanism to force
> > > validation. (SAX says: "if validation is 'on', _and_ the XML contains
> > > DOCTYPE, then validate against the specified DTD").
> > >
> > > One workaround is to trick the XML parser into validating, by
> > > intercepting the XML input stream and prepending a DOCTYPE declaration
> > > to it. I do not believe it is appropriate to implement this capability
> > > in the Digester itself.
> > >
> > > That said, the simplest solution to #2 is to simply not validate if the
> > > XML file omits a doctype declaration. This can be treacherous, however,
> > > because if a DTD defines default values, it's possible for the same XML
> > > file to be parsed differently, depending on whether validation is used.
> > >
> > > -DHM
> > >
> > > > -----Original Message-----
> > > > From: Tal Dayan [mailto:tal@zapta.com]
> > > > Sent: Tuesday, November 13, 2001 6:44 PM
> > > > To: Jakarta Commons Developers List
> > > > Subject: RE: [digester] forcing a specific DTD
> > > >
> > > >
> > > > So, what is the consensus of the list regarding DTD validation of
> > > > parsed files of known type?
> > > > Is there a safe way to do it now (if so, how)? If not, does it make
> > > > sense to add more support for enforcing validation?
> > > >
> > > > Tal
> > > >
> > > > > -----Original Message-----
> > > > > From: Dave Martin [mailto:dmartin@altoweb.com]
> > > > > Sent: Tuesday, November 13, 2001 11:53 AM
> > > > > To: Jakarta Commons Developers List
> > > > > Subject: RE: [digester] forcing a specific DTD
> > > > >
> > > > >
> > > > > IMHO, if the input XML document declares a document type other than
> > > > > what is expected, the appropriate action should be to terminate the
> > > > > parse due to a validation failure. (I.e. "If you're not what you say
> > > > > you are, how can I trust you?")
> > > > > The Digester's register() method, which, as Craig pointed out, allows
> > > > > you to key a specific DTD off of the DOCTYPE's public identifier,
> > > > > should be appropriate for almost all cases. (Since if the XML cannot
> > > > > declare the location of the DTD via the SYSTEM identifier, it should
> > > > > at least be able to identify its type via the PUBLIC identifier.)
> > > > > On the other hand, in the scenario where the XML input does not
> > > > > contain any DOCTYPE declaration, one might want to 'assume' it follows
> > > > > a particular document type, and validate it against that DTD to verify
> > > > > that assumption.
> > > > >
> > > > > -DHM
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Arun M. Thomas [mailto:AMammenT@yahoo.com]
> > > > > > Sent: Tuesday, November 13, 2001 11:37 AM
> > > > > > To: Jakarta Commons Developers List
> > > > > > Subject: RE: [digester] forcing a specific DTD
> > > > > >
> > > > > >
> > > > > > You're absolutely right, this wouldn't work without a DOCTYPE
> > > > > > declaration.  However, it wouldn't matter what the contents of that
> > > > > > declaration were.  (The user could modify it to his heart's content
> > > > > > and still have the document validated against the same DTD.)
> > > > > >
> > > > > > -AMT
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: craigmcc@localhost [mailto:craigmcc@localhost] On Behalf Of Craig R. McClanahan
> > > > > > Sent: Tuesday, November 13, 2001 10:17 AM
> > > > > > To: Jakarta Commons Developers List
> > > > > > Subject: RE: [digester] forcing a specific DTD
> > > > > >
> > > > > >
> > > > > > Would something like this work even in the absence of a <DOCTYPE>
> > > > > > declaration at all in the file being parsed?  I thought that this
> > > > > > was the only time resolveEntity() was called.
> > > > > >
> > > > > > Craig
> > > > > >
> > > > > >
> > > > > > On Tue, 13 Nov 2001, Arun M. Thomas wrote:
> > > > > >
> > > > > > > Date: Tue, 13 Nov 2001 10:20:42 -0800
> > > > > > > From: Arun M. Thomas <AMammenT@yahoo.com>
> > > > > > > Reply-To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>
> > > > > > > To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>
> > > > > > > Subject: RE: [digester] forcing a specific DTD
> > > > > > >
> > > > > > > Craig,
> > > > > > >
> > > > > > > Despite my previous response to Tal, it should be possible to do
> > > > > > > this by instantiating the SAXParser with a subclass of
> > > > > > > DefaultHandler which overrides the resolveEntity method.  I had to
> > > > > > > do exactly this in the last application on which I worked using
> > > > > > > the JAXP1.0 API.  In that case, we provided a custom
> > > > > > > implementation of EntityResolver which always returned an
> > > > > > > InputSource for the same DTD.  It appears that JAXP1.1 has hidden
> > > > > > > the EntityResolver under the DefaultHandler class, so providing an
> > > > > > > implementation of that method which is customized to return a
> > > > > > > specific DTD should suffice.
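
The resolveEntity override described above might look roughly like the
following. It is a sketch under stated assumptions rather than the attached
Digester patch: the class FixedDtdHandler is made up, and it keeps the DTD
as an in-memory string instead of a URL so the example is self-contained.
Only standard JAXP and SAX calls are used. Note that the document must still
carry some DOCTYPE for resolveEntity to be consulted for the DTD, and that
this simple version answers every external entity request with the same DTD
text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class FixedDtdHandler extends DefaultHandler {
    private final String dtdText;
    final List<String> errors = new ArrayList<>();

    FixedDtdHandler(String dtdText) {
        this.dtdText = dtdText;
    }

    /** Ignores the public and system identifiers and always serves the same DTD. */
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        return new InputSource(new StringReader(dtdText));
    }

    /** Collects validation errors instead of silently ignoring them. */
    @Override
    public void error(SAXParseException e) {
        errors.add(e.getMessage());
    }

    /** Parses xml with validation on and returns the validation errors found. */
    static List<String> validate(String xml, String dtdText) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        FixedDtdHandler handler = new FixedDtdHandler(dtdText);
        // SAXParser.parse registers the handler as the entity resolver as well.
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.errors;
    }
}
```

With this handler, whatever identifiers the user writes in the DOCTYPE, the
document is validated against the one DTD supplied by the application.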
> > > > > > >
> > > > > > > In the Digester case, it means a potentially simple modification
> > > > > > > to the resolveEntity method of Digester (which is a
> > > > > > > DefaultHandler).  I've included a diff of a quick patch below as a
> > > > > > > suggestion, and attached a copy of the modified version of
> > > > > > > Digester.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > -AMT
> > > > > > >
> > > > > > > cvs diff Digester.java (in directory C:\Dev\jakarta-commons\digester\src\java\org\apache\commons\digester\)
> > > > > > > Index: Digester.java
> > > > > > > ===================================================================
> > > > > > > RCS file: /home/cvspublic/jakarta-commons/digester/src/java/org/apache/commons/digester/Digester.java,v
> > > > > > > retrieving revision 1.23
> > > > > > > diff -r1.23 Digester.java
> > > > > > > 161c161,171
> > > > > > > <
> > > > > > > ---
> > > > > > > > 		/**
> > > > > > > > 		 * Works only in a JAXP1.1 world, but allows the user to supply a fixed URL against
> > > > > > > > 		 * which all documents will be validated.  The supplied parameter will be used by
> > > > > > > > 		 * the {@link #resolveEntity(String, String)} method.
> > > > > > > > 		 */
> > > > > > > > 		public Digester(String fixedDTDUrl) {
> > > > > > > > 			super();
> > > > > > > >
> > > > > > > > 			this.fixedDTDUrl = fixedDTDUrl;
> > > > > > > > 		}
> > > > > > > >
> > > > > > > 313a324,328
> > > > > > > > 		/**
> > > > > > > > 		 * URL which may be supplied against which all documents should be
> > > > > > > > 		 * validated regardless of the public and system identifiers.
> > > > > > > > 		 */
> > > > > > > > 		private String fixedDTDUrl = null;
> > > > > > > 1042a1058,1060
> > > > > > > > 	if (fixedDTDUrl != null)
> > > > > > > > 		dtdURL = fixedDTDUrl;
> > > > > > > >
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: craigmcc@localhost [mailto:craigmcc@localhost] On Behalf Of Craig R. McClanahan
> > > > > > > Sent: Tuesday, November 13, 2001 9:30 AM
> > > > > > > To: Jakarta Commons Developers List
> > > > > > > Subject: Re: [digester] forcing a specific DTD
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > This sounds like it might be an interesting idea, but I don't know
> > > > > > > how to implement it :-(.  Digester uses a SAX parser via the
> > > > > > > JAXP/1.1 APIs underneath the covers.  How do you tell the parser
> > > > > > > to use an arbitrary DTD instead of whatever is specified in the
> > > > > > > document being parsed?
> > > > > > >
> > > > > > > Craig
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > To unsubscribe, e-mail:
> > > > > > <mailto:commons-dev-unsubscribe@jakarta.apache.org>
> > > > > > For additional commands, e-mail:
> > > > > > <mailto:commons-dev-help@jakarta.apache.org>

