tomcat-dev mailing list archives

From David Brownell <>
Subject Re: [PATCH] Tomcat on EBCDIC machines...
Date Sat, 20 Nov 1999 23:30:22 GMT
"Preston L. Bannister" wrote:
> From Appendix F [of the XML spec]:
> "The second possible case occurs when the XML entity is accompanied
> by encoding information, as in some file systems and some
> network protocols. When multiple sources of information are available,
> their relative priority and the preferred method of handling
> conflict should be specified as part of the higher-level protocol
> used to deliver XML. ..."
> You could interpret the XML document as "accompanied by encoding
> information" in the System property for "file.encoding". 

True, and I've gone down that path but rejected it since it
caused many more problems than it solved.

That's based on user feedback from using versions of that XML
package which actually tried that approach.  Consider that
while you're seeing a fairly blatant failure mode, there are
a lot of more subtle ones to consider too.  And yet they all
can be addressed by the simple policy advice I've given, but
which you don't much seem to like.  I'm speaking as a person
who's tried several such policies before settling on the only
one that didn't cause ongoing problems from multiple user
communities.  (Including yours, if you'd take that advice.)

> This is
> exactly the case, as the text document (when not ASCII/UTF-8) is
> in exactly the character set specified by "file.encoding".

The problem is that the system "file.encoding" property is quite
definitely NOT the correct one in many cases.  Enough that it's
not worth even considering it as the default for XML data.  It's
not always set correctly, and even if it is correct then it's not
guaranteed to be applicable to any given file.  Trusting it was
a surefire recipe for widely reported problems with very screwy
symptoms, which were hard to diagnose (and often hard to fix).
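
To make the contrast concrete, here is a minimal sketch in modern Java. It is not from the original thread and assumes the JDK's bundled JAXP parser. The class name `ParseFromBytes` and both of its methods are hypothetical. Handing the parser raw bytes lets it apply the document's own BOM and XML declaration. Wrapping the same bytes in a `Reader` commits to the JVM default charset before the parser ever sees the declaration, which is the `file.encoding` trap described above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseFromBytes {
    // Preferred: give the parser the raw bytes. It works out the encoding
    // itself from the BOM and XML declaration (XML 1.0, Appendix F) and
    // never consults the file.encoding system property.
    public static Document parseBytes(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml));
    }

    // Risky: wrapping the bytes in a Reader decodes them with the JVM default
    // charset before the parser reads the declaration, so encoding="..." can
    // no longer affect decoding. On older JDKs that default follows
    // file.encoding; since JDK 18 it is UTF-8 (JEP 400).
    public static Document parseWithDefaultCharset(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), Charset.defaultCharset());
        return builder.parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // A Latin-1 document that declares its own encoding.
        byte[] xml = "<?xml version='1.0' encoding='ISO-8859-1'?><name>Jos\u00e9</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        String text = parseBytes(xml).getDocumentElement().getTextContent();
        System.out.println("Jos\u00e9".equals(text)); // prints true
    }
}
```

The byte-stream path decodes correctly whatever `file.encoding` happens to be. The `Reader` path only works when the default charset happens to match the file.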

Hence that current advice:  _always_ use an XML declaration unless
you're using UTF-8 (or its ASCII subset, or UTF-16).  That's a
simple policy that's been proven to work well in many more
environments than just your EBCDIC one.
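
The policy can be sketched as a small detector that follows the autodetection table in Appendix F of the XML 1.0 specification. This is an illustrative sketch, not code from the XML package discussed here. `XmlEncodingPolicy` and its methods are hypothetical names, UCS-4 is ignored, and decoding an EBCDIC declaration with `IBM037` is an assumption (the characters a declaration uses are largely the same across common EBCDIC code pages). Note that `file.encoding` is never consulted. Either a BOM or UTF-16 byte pattern identifies the encoding, or the declaration names it, or the document is taken as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingPolicy {
    // Matches the encoding="..." pseudo-attribute of a leading XML declaration.
    private static final Pattern ENCODING_DECL = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the encoding name to use for the document, or null when the
    // document breaks the policy (EBCDIC without an encoding= declaration).
    public static String detect(byte[] doc) {
        // Byte order marks: UTF-16 and UTF-8 need no declaration.
        // (UCS-4 BOMs such as FF FE 00 00 are not handled in this sketch.)
        if (startsWith(doc, 0xFE, 0xFF) || startsWith(doc, 0xFF, 0xFE)) return "UTF-16";
        if (startsWith(doc, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // "<?" encoded as UTF-16 without a BOM.
        if (startsWith(doc, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(doc, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        // "<?xm" in an ASCII-compatible encoding: the declaration decides,
        // and a declaration without encoding= means UTF-8.
        if (startsWith(doc, 0x3C, 0x3F, 0x78, 0x6D)) {
            return declared(doc, StandardCharsets.US_ASCII, "UTF-8");
        }
        // "<?xm" in EBCDIC: the declaration must name the code page.
        if (startsWith(doc, 0x4C, 0x6F, 0xA7, 0x94)) {
            if (!Charset.isSupported("IBM037")) return null;
            return declared(doc, Charset.forName("IBM037"), null);
        }
        // No BOM and no declaration: the document has to be UTF-8.
        return "UTF-8";
    }

    // Decodes the first bytes just far enough to read encoding="...".
    private static String declared(byte[] doc, Charset cs, String fallback) {
        String head = new String(doc, 0, Math.min(doc.length, 200), cs);
        Matcher m = ENCODING_DECL.matcher(head);
        return m.find() ? m.group(1) : fallback;
    }

    // True if doc begins with the given unsigned byte values.
    private static boolean startsWith(byte[] doc, int... prefix) {
        if (doc.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((doc[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

For example, passing the US-ASCII bytes of `<?xml version='1.0' encoding='ISO-8859-1'?><a/>` to `detect` yields `"ISO-8859-1"`, while a document with no declaration at all yields `"UTF-8"`.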

- Dave
