xml-general mailing list archives

From Jeff Turner <j...@socialchange.net.au>
Subject Re: Forcing validation to a particular DTD
Date Tue, 06 Nov 2001 11:06:52 GMT
On Tue, Nov 06, 2001 at 05:51:16PM +0900, Andy Clark wrote:
> Scott Dewitt wrote:
> > I would like to make sure when opening a document that it conforms 
> > to the grammar for this vocabulary so that I can intelligently 
> > notify the user if the document is not of the correct type.
> 
> The closest that you can get is using an entity resolver but
> this doesn't get you very far. For one thing, if the document
> didn't include a doctype line, then the entity resolver is
> never called. Or perhaps the tool that Jeff mentioned would
> be useful.
> 
> However, be careful about what solution you use. There are
> certainly pros and cons with each approach. For example, does
> Jeff's tool handle all the various encodings?
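
For reference, the entity-resolver approach Andy describes might look
roughly like the sketch below, using the standard SAX EntityResolver
interface. The class name, the one-line DTD and the call counter are
made up for illustration, and validation is left off to keep it short;
the point is only that the resolver can substitute its own DTD when a
doctype is present, and is never consulted when there is none.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ForcedDtdResolver implements EntityResolver {
    // Hypothetical DTD standing in for "the grammar for this vocabulary".
    static final String DTD = "<!ELEMENT note (#PCDATA)>";

    int calls = 0;

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Ignore whatever DTD the document names and hand back our own.
        calls++;
        return new InputSource(new StringReader(DTD));
    }

    // Parses the given document text and reports how often the resolver ran.
    static int resolverCalls(String xml) {
        try {
            ForcedDtdResolver resolver = new ForcedDtdResolver();
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return resolver.calls;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling resolverCalls on a document with a DOCTYPE line naming an
external DTD should report at least one call; calling it on a bare
"<note>hi</note>" should report none, which is the gap Andy points out.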

Hmm, I never thought of that :) I'd imagine it would break horribly with
UTF-16 or UCS-4.

Actually, if an XML stream arrives with an external encoding identifier
such as a MIME type, then the charset parameter of that MIME type is
authoritative. Appendix F of the XML 1.0 spec says:

  "If an XML entity is delivered with a MIME type of text/xml, then the
  charset parameter on the MIME type determines the character encoding
  method; all other heuristics and sources of information are solely for
  error recovery."

So in the general case the filter cannot solve this problem. A
FilterInputStream sees only the bytes, not transport metadata such as
MIME types, so it cannot reliably know the encoding.

Does that sound right? If so, I'll add a note to the docs: "feed this
only ASCII-compatible byte streams". Fortunately that covers most XML in
practice, 95% at a guess.
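
A rough sketch of the byte-level guessing a filter could still do on its
own, following the autodetection table in Appendix F of the XML 1.0
spec. The class and method names are made up for illustration, and the
result is only a guess at the encoding family: it cannot override a
MIME charset it never sees.

```java
public class XmlEncodingSniffer {
    // Guesses the encoding family from the first bytes of an XML entity.
    static String guess(byte[] head) {
        int[] b = new int[4];
        for (int i = 0; i < 4; i++) {
            b[i] = i < head.length ? head[i] & 0xFF : -1; // -1 marks "no byte"
        }
        // UCS-4 patterns come first: FF FE 00 00 also starts like a UTF-16LE BOM.
        if (match(b, 0x00, 0x00, 0xFE, 0xFF) || match(b, 0xFF, 0xFE, 0x00, 0x00)
                || match(b, 0x00, 0x00, 0x00, 0x3C) || match(b, 0x3C, 0x00, 0x00, 0x00)) {
            return "UCS-4";
        }
        if (match(b, 0xFE, 0xFF)) return "UTF-16BE";        // byte order mark
        if (match(b, 0xFF, 0xFE)) return "UTF-16LE";        // byte order mark
        if (match(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";     // byte order mark
        if (match(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE"; // "<?" without BOM
        if (match(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE"; // "<?" without BOM
        if (match(b, 0x3C, 0x3F, 0x78, 0x6D)) return "ASCII-compatible"; // "<?xm"
        if (match(b, 0x4C, 0x6F, 0xA7, 0x94)) return "EBCDIC";
        return "unknown";
    }

    // True if the leading bytes equal the wanted values.
    private static boolean match(int[] b, int... want) {
        for (int i = 0; i < want.length; i++) {
            if (b[i] != want[i]) return false;
        }
        return true;
    }
}
```

A filter could call guess on the first four bytes and refuse anything
that does not come back as "ASCII-compatible" or "UTF-8", rather than
breaking silently on UTF-16 or UCS-4 input.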

> And what does it do with an internal subset? Does it swallow it
> completely or allow general entity declarations to pass through for
> use by the XML parser?

That's up to the user's event handler. The handler can swallow the
internal subset, modify and replace it, or pass it through intact. It's
very flexible that way.


--Jeff

> etc, etc...
>
> We're currently looking at how to provide a grammar caching
> mechanism in Xerces2 which would be a solution to this problem.
> But there is currently no code to do this. So if you have any
> ideas, please let us know. (We generally talk about design
> work in the xerces-j-dev mailing list.)
> 
> -- 
> Andy Clark * IBM, TRL - Japan * andyc@apache.org
