tomcat-dev mailing list archives

From David Brownell
Subject Re: [PATCH] Tomcat on EBCDIC machines...
Date Sat, 20 Nov 1999 19:20:09 GMT
"Preston L. Bannister" wrote:
> "Preston L. Bannister" wrote:
> > There is also an issue with the *.xml and *.dtd files needing
> > to be ASCII (awkward) or edited (to specify EBCDIC).
> From: David Brownell
> > And what is that issue?  XML has specified defaults, and if for
> > some reason Tomcat tries to change that it'll violate a pretty
> > basic web standard.
> If I translate the *.xml files to EBCDIC (which is the right thing
> to do for text files), then I have to edit each and every *.xml file
> to change the encoding= attribute.
> I'd call this a bug :).

I'd call it a fact of life.  If you modify any other kind of data,
you have to do it in the right way ... why should modifying XML be
different?  The way to transcode XML involves updating any XML or
text declaration.  

Attached, please find a transcoding utility that was distributed
with Sun's XML parser (the "transcode" example).  It should do
that work correctly.  You will need to use the "cp037" encoding
name instead of EBCDIC-CP-US, due to JDK issues, though.  (The
names are equivalent, but the JDK only understands the cryptic
one ... there's a whole table of such names in most XML parsers,
since the JDK's table is missing some basic entries.)
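
For anyone reading without the attachment, the idea can be sketched
roughly as below.  This is not the Sun "transcode" example itself; the
names XmlTranscodeSketch, rewriteDeclaration and transcode are made up
for illustration, and the sketch only looks at a declaration at the
very start of the text.  It decodes the input, rewrites (or adds) the
encoding= pseudo-attribute, and only then encodes to the target charset.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: hypothetical names, not the Sun utility.
final class XmlTranscodeSketch {
    // Matches an existing encoding pseudo-attribute, e.g. encoding="UTF-8".
    private static final Pattern ENCODING =
        Pattern.compile("encoding\\s*=\\s*([\"'])[A-Za-z][A-Za-z0-9._-]*\\1");
    // Matches the version pseudo-attribute, e.g. version="1.0".
    private static final Pattern VERSION =
        Pattern.compile("version\\s*=\\s*([\"'])[^\"']*\\1");

    // Returns xml with its XML/text declaration naming encodingName.
    static String rewriteDeclaration(String xml, String encodingName) {
        String attr = "encoding=\"" + encodingName + "\"";
        boolean hasDecl = xml.startsWith("<?xml") && xml.length() > 5
            && Character.isWhitespace(xml.charAt(5));
        if (!hasDecl) {
            // No declaration at all: add one in front.
            return "<?xml version=\"1.0\" " + attr + "?>\n" + xml;
        }
        int end = xml.indexOf("?>");
        if (end < 0) {
            throw new IllegalArgumentException("unterminated declaration");
        }
        String decl = xml.substring(0, end);
        String rest = xml.substring(end);
        Matcher enc = ENCODING.matcher(decl);
        if (enc.find()) {
            // Replace the old encoding name with the new one.
            decl = enc.replaceFirst(Matcher.quoteReplacement(attr));
        } else {
            Matcher ver = VERSION.matcher(decl);
            if (ver.find()) {
                // Encoding must follow version in an XML declaration.
                decl = decl.substring(0, ver.end()) + " " + attr
                    + decl.substring(ver.end());
            } else {
                decl = decl + " " + attr;
            }
        }
        return decl + rest;
    }

    // Decodes input using 'from', fixes the declaration, encodes as toName.
    static byte[] transcode(byte[] input, Charset from, String toName) {
        String text = new String(input, from);
        return rewriteDeclaration(text, toName)
            .getBytes(Charset.forName(toName));
    }
}
```

A call such as transcode(bytes, StandardCharsets.UTF_8, "cp037") would
then produce the EBCDIC form, assuming the JDK in use actually ships
that charset under that name.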

> It seems that once the XML parser figures out that the first line is
> in EBCDIC, then if no more explicit encoding is specified it should
> use EBCDIC for the remainder of the file.

Which EBCDIC?  There are over a dozen variants.  And in any case the
XML specification REQUIRES (!!) an encoding declaration here.  As it says in
section 4.3.3 of the spec, "Parsed entities which are stored in an
encoding other than UTF-8 or UTF-16 must begin with a text declaration
containing an encoding declaration."
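
To make the "which EBCDIC?" point concrete: Appendix F of the XML spec
lets a parser guess the encoding family from the first four bytes, and
for EBCDIC those are 4C 6F A7 94 ("<?xm").  That tells the parser "some
flavor of EBCDIC" and nothing more, so it still has to read the
encoding declaration to learn the actual code page.  A minimal sketch
of that first check, with hypothetical names (EbcdicSniffSketch,
looksLikeEbcdic):

```java
// Illustrative sketch only: hypothetical names, not any parser's API.
final class EbcdicSniffSketch {
    // "<?xm" as encoded in EBCDIC, per XML 1.0 Appendix F.
    private static final int[] EBCDIC_XML_PREFIX = {0x4C, 0x6F, 0xA7, 0x94};

    // True if the entity starts with "<?xm" in some EBCDIC flavor.
    // This identifies the family only, not the specific code page.
    static boolean looksLikeEbcdic(byte[] head) {
        if (head == null || head.length < EBCDIC_XML_PREFIX.length) {
            return false;
        }
        for (int i = 0; i < EBCDIC_XML_PREFIX.length; i++) {
            // Mask to compare as unsigned byte values.
            if ((head[i] & 0xFF) != EBCDIC_XML_PREFIX[i]) {
                return false;
            }
        }
        return true;
    }
}
```

An entity with no declaration never produces those four bytes, so the
check has nothing to go on and the UTF-8/UTF-16 defaults apply.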

- Dave