tomcat-dev mailing list archives

From "Preston L. Bannister" <>
Subject RE: Tomcat on IBM OS/390?
Date Fri, 19 Nov 1999 15:56:43 GMT
From: David Brownell []

> First try it without transcoding anything at all.  It should
> work just fine.  If it doesn't, that's a bug.
> Then if that works, try the transcoding but make sure you give
> the appropriate XML declaration, like
> 	<?xml version='1.0' encoding='EBCDIC-CP-US'?>
> If you don't provide such a declaration, things are pretty
> much guaranteed to break -- as in, if they don't, it's a bug.

Do you know that EBCDIC is an entirely different character encoding than ASCII?  For example,
if you are using an InputStreamReader with an ASCII-variant encoding (UTF-8, etc.), then the
string:

  <?xml version='1.0' encoding='EBCDIC-CP-US'?>

when rendered into EBCDIC and read with an ASCII-variant encoding (UTF-8, etc.), will look like:


(doubtless the above string won't make it through mail unscathed :)

The same string rendered in ASCII and read with EBCDIC encoding looks like:


(roughly as rendered by vi - telnet ignores most of the characters :)
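
The two mismatches described above can be reproduced in a few lines. This is only a sketch: it uses Python rather than Java for brevity, and it assumes that code page 037 (Python's `cp037` codec) is the EBCDIC variant in question. The helper name `reinterpret` is made up for the illustration.

```python
DECL = "<?xml version='1.0' encoding='EBCDIC-CP-US'?>"


def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec, then decode the same bytes with another."""
    # errors="replace" keeps the demo from raising on byte sequences
    # that are invalid in the reading codec (common with UTF-8).
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    # The first five characters, "<?xml", as raw bytes in each encoding.
    print(DECL.encode("cp037")[:5].hex())   # 4c6fa79493
    print(DECL.encode("ascii")[:5].hex())   # 3c3f786d6c

    # EBCDIC bytes read with ASCII-variant encodings.
    print(repr(reinterpret(DECL, "cp037", "latin-1")))
    print(repr(reinterpret(DECL, "cp037", "utf-8")))

    # ASCII bytes read as EBCDIC.
    print(repr(reinterpret(DECL, "ascii", "cp037")))
```

Printing with `repr` keeps control characters visible instead of letting the terminal swallow them.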

In either case, if you pick the wrong encoding before reading the first line of the file, you
will not be able to make sense of the line that declares the character encoding.

Bit of a chicken and egg problem :).

I suppose that you could try reading an XML file first using the default encoding, and then
using UTF-8.  You only really need to do this on EBCDIC platforms.  Seems a bit of a hack.
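
A less hacky alternative to trying encodings in turn is the one outlined in Appendix F of the XML 1.0 specification: look at the first few raw bytes to guess the encoding family, then use that guess only to read the declaration. The sketch below is illustrative, not a full implementation. The function names are hypothetical, `cp037` is an assumed stand-in for "some EBCDIC flavor", and only the EBCDIC, UTF-16 BOM, and ASCII-compatible cases are handled.

```python
import re

EBCDIC_SIG = bytes.fromhex("4c6fa794")  # "<?xm" encoded in EBCDIC

# Matches the encoding pseudo-attribute inside an XML declaration.
_DECL = re.compile(
    r"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_family(head: bytes) -> str:
    """Guess a provisional codec from the first bytes of the document."""
    if head.startswith(EBCDIC_SIG):
        return "cp037"    # assumption: treat any EBCDIC flavor as cp037
    if head.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"   # byte order mark present
    return "utf-8"        # ASCII-compatible default


def declared_encoding(data: bytes):
    """Return (provisional codec, declared encoding name or None)."""
    provisional = sniff_family(data[:4])
    # Only the start of the file is decoded with the provisional codec.
    head = data[:200].decode(provisional, errors="replace")
    match = _DECL.match(head)
    return provisional, (match.group(1) if match else None)


if __name__ == "__main__":
    doc = "<?xml version='1.0' encoding='EBCDIC-CP-US'?>\n<a/>"
    print(declared_encoding(doc.encode("cp037")))
    print(declared_encoding(doc.encode("ascii")))
```

Because the declaration is restricted to characters that exist in both families, the provisional guess only has to be close enough to read that one line.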

> >	  It seems like readable text on ASCII machines should
> > be readable text on EBCDIC machines.  This
> > means that ASCII text files should become EBCDIC on EBCDIC machines.
> >
> > Should *.xml be UTF8 on EBCDIC machines?  Seems a bit awkward
> > for the administrator, but that's how I read the intent from my XML
> > book.
> EBCDIC should work just fine -- if, and only if, you declare
> it correctly.  That XML book would be wrong; use whatever
> encoding you really want, but if it's not UTF-8 or UTF-16
> (or honest 7-bit ASCII) you _must_ declare its encoding.

The book is Charles Goldfarb/Paul Prescod's "The XML Handbook" though the interpretation is
mine :).

My interpretation is that the "native" character set of XML is Unicode, and that if the byte
encoding of the character set has
enough in common with UTF-8 to read the first line, the remainder of the file can be in another
character set.
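
That interpretation can be sketched as a two-phase read: decode just enough of the file with an ASCII-compatible codec to find the declaration, then decode the whole file with whatever it names. The function `decode_xml` is hypothetical, the fallback to UTF-8 follows the rule quoted above for undeclared documents, and the sketch assumes the declared name is one Python's codec registry recognizes.

```python
import re

# Matches the encoding pseudo-attribute inside an XML declaration.
_DECL = re.compile(
    r"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def decode_xml(data: bytes) -> str:
    """Decode an XML document stored in an ASCII-compatible encoding."""
    # Phase 1: latin-1 maps every byte to a character, so it can read
    # the ASCII-only declaration without raising on later bytes.
    head = data[:200].decode("latin-1")
    match = _DECL.match(head)
    # Phase 2: decode everything with the declared encoding, or UTF-8
    # when there is no declaration.
    encoding = match.group(1) if match else "utf-8"
    return data.decode(encoding)


if __name__ == "__main__":
    doc = "<?xml version='1.0' encoding='ISO-8859-1'?><p>caf\u00e9</p>"
    print(decode_xml(doc.encode("latin-1")))
```

This only works when the real encoding shares the ASCII range; an EBCDIC file defeats phase 1, which is the chicken-and-egg problem again.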
