Date: Tue, 6 Nov 2001 22:06:52 +1100
From: Jeff Turner
To: general@xml.apache.org
Subject: Re: Forcing validation to a particular DTD

On Tue, Nov 06, 2001 at 05:51:16PM +0900, Andy Clark wrote:
> Scott Dewitt wrote:
> > I would like to make sure when opening a document that it conforms
> > to the grammar for this vocabulary so that I can intelligently
> > notify the user if the document is not of the correct type.
>
> The closest that you can get is using an entity resolver but
> this doesn't get you very far. For one thing, if the document
> didn't include a doctype line, then the entity resolver is
> never called. Or perhaps the tool that Jeff mentioned would
> be useful.
>
> However, be careful about what solution you use. There are
> certainly pros and cons with each approach. For example, does
> Jeff's tool handle all the various encodings?

Hmm.. never thought of that :) I'd imagine it would break horribly with
UTF-16 or UCS-4.

Actually, if an XML stream arrives with an external encoding identifier
like a MIME type, then the MIME type is authoritative.
The XML spec says:

  "If an XML entity is delivered with a MIME type of text/xml, then the
  charset parameter on the MIME type determines the character encoding
  method; all other heuristics and sources of information are solely
  for error recovery."

So this is an unsolvable problem. The FilterInputStream doesn't get
access to things like MIME types, so cannot reliably know the encoding.

Does that sound right? If so, I'll add a note to the docs: "only feed
this ASCII-compatible byte streams". Fortunately that is 95% of XML.

> And what does it do with an internal subset? Does it swallow it
> completely or allow general entity declarations to pass through for
> use by the XML parser?

That's up to the user's event handler. It could be swallowed, modified
and replaced or left intact. It's very flexible that way.

--Jeff

> etc, etc...
>
> We're currently looking at how to provide a grammar caching
> mechanism in Xerces2 which would be a solution to this problem.
> But there is currently no code to do this. So if you have any
> ideas, please let us know. (We generally talk about design
> work in the xerces-j-dev mailing list.)
>
> --
> Andy Clark * IBM, TRL - Japan * andyc@apache.org
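
The encoding point made in this message (an external MIME charset wins,
and looking at the bytes is only a fallback) could be sketched roughly
as below. This is an illustration only, not code from the tool under
discussion: the class EncodingGuess and its methods are hypothetical
names, the byte patterns are a subset of the heuristics in Appendix F
of the XML 1.0 spec, and falling back to sniffing when no charset
parameter is present is a simplifying assumption.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the tool discussed above.
class EncodingGuess {

    // Returns the charset parameter of a Content-Type value, or null if there is none.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional double quotes around the value.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // True if data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Guesses an encoding family from the first bytes of the stream.
    static String sniff(byte[] head) {
        // 4-byte patterns first: the UTF-32LE BOM begins with the UTF-16LE BOM.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0x00, 0x00, 0x00, 0x3C)) return "UTF-32BE"; // '<' without BOM
        if (startsWith(head, 0x3C, 0x00, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE"; // "<?" without BOM
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        // Everything else is treated as ASCII-compatible; reading the encoding
        // declaration would be needed to refine this beyond the UTF-8 default.
        return "UTF-8";
    }

    // The external charset, when given, is authoritative; sniffing is the fallback.
    static String guess(String contentType, byte[] head) {
        String external = charsetFromContentType(contentType);
        return external != null ? external : sniff(head);
    }
}
```

A stream filter that only sees bytes can call sniff() but never has the
Content-Type string that guess() needs, which is the limitation
described above.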