tomcat-dev mailing list archives

From "Preston L. Bannister" <pres...@home.com>
Subject RE: [PATCH] Add EBCDIC support for text files in Tomcat.
Date Sat, 25 Mar 2000 06:37:35 GMT
(I know this is long, but I'm soliciting opinions at the end :).

I am going to suggest that this patch not be incorporated.  The code is
nice and clean, but the strategy is probably wrong.

A couple months back I went through the exercise of making Tomcat work
on an EBCDIC machine (an IBM 390 box specifically).

The changes did get incorporated, and *most* things in Tomcat worked
correctly on an EBCDIC machine.  Then the very next major release broke
the EBCDIC support, or to be more exact the only-send-ASCII-to-the-web
support :).  The reason for this is that a number of commonly used
constructors have two forms: one that takes an explicit
character encoding, and a more commonly used constructor that assumes
the character encoding is the value of the "file.encoding" property.
A programmer on an ASCII machine (just about everyone) will use the
default encoding form of the constructor without knowing that this
will pose a problem on non-ASCII machines.
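To make the pitfall concrete, here is a minimal Java sketch of the two forms of the InputStreamReader constructor. The class and method names are illustrative, not Tomcat code. It uses the java.nio.charset.Charset overload from later JDKs; the JDK 1.1-era equivalent takes the encoding name as a String.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncoding {

    // Reads a whole stream as text in the given charset. Because the charset
    // is passed explicitly, the result does not depend on file.encoding.
    static String readAll(InputStream in, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "caf" + e-acute, as it would be stored on disk in ISO 8859-1.
        byte[] page = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Default-encoding form: decodes with whatever file.encoding is,
        // so on an EBCDIC default these bytes would become different text.
        try (InputStreamReader implicit =
                 new InputStreamReader(new ByteArrayInputStream(page))) {
            System.out.println("default form uses " + implicit.getEncoding());
        }

        // Explicit form: the same text on every platform.
        System.out.println(readAll(new ByteArrayInputStream(page),
                                   StandardCharsets.ISO_8859_1));
    }
}
```

The explicit form is the one that survives being written on an ASCII machine and run on an EBCDIC one.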

In other words it is impractical to keep Tomcat compatible with a
default file.encoding of EBCDIC.

Another point that bothered me was that if web pages are stored on
disk using the "native" character encoding then they would have to
be translated to Unicode (on read) and then to ASCII (on write).

For the vast majority of (ASCII) machines this is a complete waste
of time as the encoding on disk and on the web are the same.

For the much smaller number of EBCDIC machines this is exactly what
you need - but only if you are going to store your web pages as EBCDIC.

After thinking about the above I changed my strategy.

I would suggest that EBCDIC -> ASCII translation be done when the web
pages are first created, and that Tomcat always assume that text on
disk is in ASCII (or maybe UTF8).

For each time a web page is created or updated by its author,
it will be viewed hundreds, thousands or even millions of times.
It makes a lot more sense to do the EBCDIC -> ASCII translation at
the time of publication, rather than on each and every request.
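A minimal sketch of publish-time translation, assuming pages are converted once by a small stand-alone tool rather than by Tomcat on each request. PublishTranscoder and its command-line arguments are hypothetical, and EBCDIC code page names such as IBM1047 are only available on JVMs that ship the extended charsets.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PublishTranscoder {

    // Converts one page from the author's encoding to the web encoding.
    // Intended to run once, when the page is published.
    static void publish(Path source, Charset authorEncoding,
                        Path target, Charset webEncoding) throws IOException {
        // Files.newBufferedReader/newBufferedWriter report bad input as
        // exceptions instead of substituting, so a page that cannot be
        // represented in the web encoding fails here, not at request time.
        try (BufferedReader in = Files.newBufferedReader(source, authorEncoding);
             BufferedWriter out = Files.newBufferedWriter(target, webEncoding)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: PublishTranscoder <source> <author-charset> <target>");
            return;
        }
        // On an EBCDIC host the author charset might be e.g. "IBM1047";
        // whether that name is available depends on the JVM.
        publish(Paths.get(args[0]), Charset.forName(args[1]),
                Paths.get(args[2]), StandardCharsets.ISO_8859_1);
    }
}
```

Failing loudly at publication is the point of the design: the author finds out once, instead of every reader getting a mangled page.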

I would also suggest that Tomcat always be run with a default character
encoding of ASCII (or maybe UTF8).  This means that the file.encoding
should be overridden when starting the JVM for Tomcat, like:

  java -Dfile.encoding=8859_1 ...(remaining options)

With this change Tomcat works very nicely on an EBCDIC machine, and
continues to work just as well on ASCII machines.
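As a safeguard, a startup check along these lines could warn when the JVM was not started with a web-compatible default. This is a hypothetical helper, not part of tomcat.sh or Tomcat. Note that on JDK 18 and later (JEP 400) the default charset is already UTF-8, and setting file.encoding to values other than UTF-8 or COMPAT has unspecified behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

public class EncodingCheck {

    // True if the named charset is one of the web-compatible defaults
    // discussed here: ISO 8859-1, UTF-8, or plain US-ASCII.
    // Aliases such as "8859_1" and "UTF8" resolve to the same Charset.
    static boolean isWebCompatible(String charsetName) {
        if (charsetName == null) {
            return false;
        }
        try {
            Charset cs = Charset.forName(charsetName);
            return cs.equals(StandardCharsets.ISO_8859_1)
                || cs.equals(StandardCharsets.UTF_8)
                || cs.equals(StandardCharsets.US_ASCII);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return false; // unknown name: certainly not a known web encoding
        }
    }

    public static void main(String[] args) {
        String enc = System.getProperty("file.encoding");
        if (isWebCompatible(enc)) {
            System.out.println("file.encoding=" + enc + " is web-compatible");
        } else {
            System.err.println("WARNING: file.encoding=" + enc
                + "; restart the JVM with -Dfile.encoding=8859_1");
        }
    }
}
```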

If you still want to read EBCDIC web pages then I would suggest that
this be both optional and more general.  One possible approach might
be to *optionally* use a subclass of DefaultServlet that always
translates from the disk encoding to ASCII.  To be more general, a
look-aside table could be used to determine the character encoding for
files in a particular directory.

I don't believe that this should be done by the default implementation.
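If the optional route were taken anyway, the look-aside might look roughly like this sketch: a table from directories to disk encodings, plus a copy loop that re-encodes from that encoding to the web encoding. EncodingLookaside, its methods and the directory names are hypothetical; UTF-16 stands in for an EBCDIC code page, which may not be installed on every JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-directory encoding table; not part of Tomcat.
public class EncodingLookaside {
    private final Map<Path, Charset> byDirectory = new HashMap<>();
    private final Charset fallback;

    public EncodingLookaside(Charset fallback) {
        this.fallback = fallback;
    }

    public void register(Path directory, Charset diskEncoding) {
        byDirectory.put(directory.toAbsolutePath().normalize(), diskEncoding);
    }

    // Walks up from the file's directory; the nearest registered directory wins.
    public Charset encodingFor(Path file) {
        Path dir = file.toAbsolutePath().normalize().getParent();
        while (dir != null) {
            Charset cs = byDirectory.get(dir);
            if (cs != null) {
                return cs;
            }
            dir = dir.getParent();
        }
        return fallback;
    }

    // Re-encodes text from the disk encoding to the web encoding. The
    // (stream, Charset) constructors replace malformed or unmappable input
    // rather than throwing. Neither stream is closed; the caller owns both.
    public static void transcode(InputStream in, Charset from,
                                 OutputStream out, Charset to) throws IOException {
        Reader reader = new InputStreamReader(in, from);
        Writer writer = new OutputStreamWriter(out, to);
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        EncodingLookaside table = new EncodingLookaside(StandardCharsets.ISO_8859_1);
        table.register(Paths.get("webapps/legacy"), StandardCharsets.UTF_16);

        Path page = Paths.get("webapps/legacy/docs/index.html");
        Charset disk = table.encodingFor(page);
        byte[] stored = "<p>Hello</p>".getBytes(disk);

        ByteArrayOutputStream web = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(stored), disk, web, StandardCharsets.ISO_8859_1);
        System.out.println(disk + " -> " + web.toString("ISO-8859-1"));
    }
}
```

Files outside any registered directory fall back to the web encoding, so the common case costs only a map lookup.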

----------------
Opinions please!
----------------

1.  We should recommend that Tomcat be run with a web-compatible default
    character encoding.

This means I'll alter tomcat.sh to always specify the value to the Java
interpreter, and check in some form of the above text with Tomcat for
future reference as (say) EBCDIC.txt.

2.  The default character encoding should be either ISO 8859-1 (8859_1,
    a superset of ASCII) or perhaps UTF8.  HTTP/1.1 specifies ISO 8859-1
    as the default for text.  I would like to suggest that UTF8 might be
    a good default.

So far as I can tell UTF8 is a strict superset of ASCII.  So for the case
where the original data is ASCII the use of UTF8 wouldn't change anything.
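A quick Java check of the superset claim: for pure ASCII text, US-ASCII, ISO 8859-1 and UTF-8 produce identical bytes, and they diverge only once non-ASCII characters appear. The class name is illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiSubset {

    // True if encoding s as US-ASCII, UTF-8 and ISO-8859-1
    // all produce exactly the same bytes.
    static boolean sameBytes(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(ascii, s.getBytes(StandardCharsets.UTF_8))
            && Arrays.equals(ascii, s.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("<html>Hello</html>")); // true: pure ASCII
        System.out.println(sameBytes("caf\u00e9"));          // false: e-acute is not ASCII
    }
}
```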

In the case where the data is more than just ASCII, the 8859_1 encoding
cannot represent characters outside ISO 8859-1; depending on the API
used, these are either silently replaced (typically with '?') or cause
exceptions to be thrown.  It is quite likely that applications previously
only exercised with ASCII will deal poorly with the unexpected encoding
exceptions.
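What actually happens with 8859_1 depends on the API. In the sketch below, String.getBytes silently substitutes '?' for a character outside ISO 8859-1 (the euro sign, U+20AC), while a freshly created CharsetEncoder, whose default error action is to report, throws instead. Names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class Latin1Limits {

    // Lenient path: String.getBytes substitutes '?' for unmappable characters.
    static byte[] lenient(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Strict path: a new encoder reports unmappable characters as exceptions.
    static byte[] strict(String s) throws CharacterCodingException {
        ByteBuffer bb = StandardCharsets.ISO_8859_1.newEncoder()
                .encode(CharBuffer.wrap(s));
        byte[] out = new byte[bb.remaining()]; // copy only the encoded bytes
        bb.get(out);
        return out;
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // EURO SIGN: not in ISO 8859-1
        System.out.println("lenient: "
            + new String(lenient(euro), StandardCharsets.ISO_8859_1));
        try {
            strict(euro);
        } catch (CharacterCodingException e) {
            System.out.println("strict: " + e);
        }
    }
}
```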

If the default encoding is UTF8 then non-ASCII characters will be encoded
and decoded correctly.  Code used outside the ASCII-only world is much more
likely to "just work".
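A round-trip check makes the difference visible: encode a string, decode it with the same charset, and compare. In this sketch, 8859_1 preserves ASCII and accented Latin-1 text but loses characters such as the euro sign or CJK ideographs, while UTF8 preserves all of them. The class name is illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTrip {

    // Encodes then decodes s with the same charset; true if the text survives.
    static boolean roundTrips(String s, Charset cs) {
        return new String(s.getBytes(cs), cs).equals(s);
    }

    public static void main(String[] args) {
        String[] samples = { "Hello", "caf\u00e9", "\u20AC 10", "\u65e5\u672c" };
        for (String s : samples) {
            System.out.println(s
                + "  8859_1=" + roundTrips(s, StandardCharsets.ISO_8859_1)
                + "  UTF8=" + roundTrips(s, StandardCharsets.UTF_8));
        }
    }
}
```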

Personally I feel that the only practical alternative for (1) is to use the
web encoding as the default encoding.  I suspect that the best choice for
the default web encoding (2) is UTF8, but I might have missed some downside
to deviating (even to a superset) from ASCII.

Opinions??


> -----Original Message-----
> From: Clere, Jean-Frederic [mailto:jfrederic.clere@fujitsu.siemens.es]
> Sent: Friday, March 24, 2000 8:17 AM
> To: Tomcat-Dev (E-mail)
> Subject: [PATCH] Add EBCDIC support for text files in Tomcat.
>
>
> I am porting Tomcat to a BS2000 (Siemens EBCDIC mainframe),
> and I have modified:
> ./jakarta-tomcat/src/share/org/apache/tomcat/servlets/DefaultServlet.java
> The problem is that text documents such as HTML files have to be editable
> on the EBCDIC machine, that is, stored in EBCDIC.
> So two conversions are needed: native (EBCDIC) to Java (Unicode), and then
> Java to ASCII.
>
> Enclosed is the output of diff -u -w between my patched file and the CVS file.
>
> --- jakarta-tomcat/src/share/org/apache/tomcat/servlets/DefaultServlet.java	Wed Mar 22 22:52:15 2000
> +++ jakarta-tomcat/src/share/org/apache/tomcat/servlets/DefaultServlet.java	Fri Mar 24 16:48:01 2000
> @@ -289,6 +289,9 @@
>  	FileInputStream in=null;
>  	try {
>  	    in = new FileInputStream(file);
> +	    if (mimeType.startsWith("text"))
> +	        serveStreamNative(in, request, response);
> +	    else
>  	    serveStream(in, request, response);
>  	} catch (FileNotFoundException e) {
>  	    // Figure out what we're serving
> @@ -311,6 +314,22 @@
>  	}
>      }
>
> +    // Sends the text file, wich are encoded in machine encoding.
> +    private void serveStreamNative(InputStream in, HttpServletRequest request,
> +        HttpServletResponse response)
> +    throws IOException {
> +        // like the serveStream, we try the stream and the writer.
> +
> +	try {
> +	    ServletOutputStream out = response.getOutputStream();
> +	    serveStreamAsStreamNative(in, out);
> +	} catch (IllegalStateException ise) {
> +	    PrintWriter out = response.getWriter();
> +            // as it uses an InputStreamReader it does the conversion.
> +	    serveStreamAsWriter(in, out);
> +	}
> +    }
> +
>      private void serveStream(InputStream in, HttpServletRequest request,
>          HttpServletResponse response)
>      throws IOException {
> @@ -341,6 +360,27 @@
>  	    out.write(buf, 0, read);
>  	}
>      }
> +
> +    private void serveStreamAsStreamNative(InputStream in, OutputStream out)
> +    throws IOException {
> +	char[] buf = new char[1024];
> +	int read = 0;
> +
> +	// Here the input and the output have to be converted.
> +        // input  default encoding output ASCII (ISO8859_1).
> +        OutputStreamWriter output = null;
> +	try {
> +            output = new OutputStreamWriter(out,"ISO8859_1");
> +	} catch(UnsupportedEncodingException e) {
> +	    output = new OutputStreamWriter(out); // try without conversion.
> +	}
> +	InputStreamReader input = new InputStreamReader(in);
> +
> +	while ((read = input.read(buf)) != -1) {
> +	    output.write(buf, 0, read);
> +	}
> +	output.flush();
> +    }
>
>      private void serveStreamAsWriter(InputStream in, PrintWriter out)
>      throws IOException {
>
>
>
>  <<DefaultServlet.patch.txt>>
>
> Jean-Frédéric Clère
> EP LP DC22 (BCN)
> Fujitsu Siemens Computers
> Phone + 34 93 480 4209
> Fax     + 34 93 480 4201
> Mail mailto:jfrederic.clere@fujitsu.siemens.es

