tomcat-dev mailing list archives

From Costin Manolache <cos...@eng.sun.com>
Subject Re: [PATCH] Add EBCDIC support for text files in Tomcat.
Date Sat, 25 Mar 2000 08:01:53 GMT
My vote:
- for this release - do nothing, the problem is too deep and we may break
something else ( it's not only about EBCDIC - internationalization is probably
broken in a few places too )

- for next release - review the code and make sure it does the right thing. We
need to use the local encoding ( we can't ask the user to convert everything to
UTF8 ) and keep track of this, add comments, and make sure we follow the rules (
including the accept-encoding and all other relevant headers).
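
( A minimal sketch of what "use the local encoding and keep track of it" could look like, not the actual DefaultServlet code. The class and method names are hypothetical, and UTF-16BE only stands in for whatever encoding the files really have on disk, e.g. an EBCDIC code page. The point is that both encodings are passed explicitly instead of being left to the platform default. )

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {
    // Hypothetical helper: decode with the encoding used on disk,
    // encode with the encoding announced to the client.
    static void transcode(InputStream in, Charset diskEncoding,
                          OutputStream out, Charset wireEncoding) throws IOException {
        Reader reader = new InputStreamReader(in, diskEncoding);
        Writer writer = new OutputStreamWriter(out, wireEncoding);
        char[] buf = new char[1024];
        int read;
        while ((read = reader.read(buf)) != -1) {
            writer.write(buf, 0, read);
        }
        writer.flush(); // push any buffered encoder output to the stream
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file stored in a non-ASCII local encoding.
        byte[] onDisk = "<html>ok</html>".getBytes(StandardCharsets.UTF_16BE);
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(onDisk), StandardCharsets.UTF_16BE,
                  wire, StandardCharsets.ISO_8859_1);
        System.out.println(new String(wire.toByteArray(), StandardCharsets.ISO_8859_1));
    }
}
```
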

I think we can clean up what's broken - if someone can spend some time to review,
comment and clean everything. DefaultServlet is in very bad shape, so it has to
be rewritten anyway, and I think it is the most important variable.

( Keep in mind that tomcat can be used "integrated" in Apache, IIS and NES, we
need to follow the same rules as the static server - I don't think we have any
choice or option in this area - we can't change the way people work because it's
simpler for us )

Costin

"Preston L. Bannister" wrote:

> (I know this is long, but I'm soliciting opinions at the end :).
>
> I am going to suggest that this patch not be incorporated.  The code is
> nice and clean, but the strategy is probably wrong.
>
> A couple months back I went through the exercise of making Tomcat work
> on an EBCDIC machine (an IBM 390 box specifically).
>
> The changes did get incorporated, and *most* things in Tomcat worked
> correctly on an EBCDIC machine.  Then the very next major release broke
> the EBCDIC support, or to be more exact the only-send-ASCII-to-the-web
> support :).  The reason for this is that a number of commonly used
> constructors have two forms, a constructor that takes an explicit
> character encoding, and a more commonly used constructor that assumes
> the character encoding is the value of the "file.encoding" property.
> A programmer on an ASCII machine (just about everyone) will use the
> default encoding form of the constructor without knowing that this
> will pose a problem on non-ASCII machines.
>
> In other words it is impractical to keep Tomcat compatible with a
> default file.encoding of EBCDIC.
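
( To illustrate the two constructor forms described above, here is a small sketch using InputStreamReader. The class and method names are hypothetical. The first helper depends on the JVM's default encoding, the second gives the same result on any machine. )

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class ConstructorForms {
    // Default form: the result depends on the JVM's default encoding
    // (historically the "file.encoding" property).
    static String readWithDefault(byte[] data) throws IOException {
        return drain(new InputStreamReader(new ByteArrayInputStream(data)));
    }

    // Explicit form: the caller names the encoding, so the result is portable.
    static String readWithExplicit(byte[] data, String encoding) throws IOException {
        return drain(new InputStreamReader(new ByteArrayInputStream(data), encoding));
    }

    private static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int read;
        while ((read = reader.read(buf)) != -1) {
            sb.append(buf, 0, read);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] ascii = {0x48, 0x69}; // "Hi" in ASCII / ISO 8859-1
        System.out.println(readWithExplicit(ascii, "ISO8859_1"));
        // On an ASCII-compatible default this matches the line above;
        // on an EBCDIC default it would not.
        System.out.println(readWithDefault(ascii));
    }
}
```
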
>
> Another point that bothered me was that if web pages are stored on
> disk using the "native" character encoding then they would have to
> be translated to Unicode (on read) and then to ASCII (on write).
>
> For the vast majority of (ASCII) machines this is a complete waste
> of time as the encoding on disk and on the web are the same.
>
> For the much smaller number of EBCDIC machines this is exactly what
> you need - but only if you are going to store your web pages as EBCDIC.
>
> After thinking about the above I changed my strategy.
>
> I would suggest that EBCDIC -> ASCII translation be done when the web
> pages are first created, and that Tomcat always assume that text on
> disk is in ASCII (or maybe UTF8).
>
> For each time a web page is created or updated by its author,
> it will be viewed hundreds, thousands or even millions of times.
> It makes a lot more sense to do the EBCDIC -> ASCII translation at
> the time of publication, rather than on each and every request.
>
> I would also suggest that Tomcat always be run with a default character
> encoding of ASCII (or maybe UTF8).  This means that the file.encoding
> should be overridden when starting the JVM for Tomcat, like:
>
>   java -Dfile.encoding=8859_1 ...(remaining options)
>
> With this change Tomcat works very nicely on an EBCDIC machine, and
> will continue to work just as well as on ASCII machines.
>
> If you still want to read EBCDIC web pages then I would suggest that
> this be both an optional item and be made more general.  One possible
> approach might be to *optionally* use a subclass of DefaultServlet that
> would always do a disk-encoding to ASCII translation.  To be more general
> a look-aside could be used to determine the character encoding for files
> in a particular directory.
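
( One way the per-directory look-aside might be sketched. Everything here is hypothetical: the class name, the methods, and the use of the longest matching directory prefix are assumptions for illustration, and UTF-16BE again stands in for a real disk encoding. )

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class DirectoryEncodings {
    private final Map<String, Charset> byDirectory = new LinkedHashMap<>();
    private final Charset fallback;

    DirectoryEncodings(Charset fallback) {
        this.fallback = fallback;
    }

    // Register the encoding used for files under a directory.
    void register(String directory, Charset encoding) {
        String key = directory.endsWith("/") ? directory : directory + "/";
        byDirectory.put(key, encoding);
    }

    // The longest registered directory prefix wins; otherwise use the fallback.
    Charset encodingFor(String path) {
        String best = null;
        for (String dir : byDirectory.keySet()) {
            if (path.startsWith(dir) && (best == null || dir.length() > best.length())) {
                best = dir;
            }
        }
        return best == null ? fallback : byDirectory.get(best);
    }

    public static void main(String[] args) {
        DirectoryEncodings encodings = new DirectoryEncodings(StandardCharsets.ISO_8859_1);
        encodings.register("/docs/legacy", StandardCharsets.UTF_16BE);
        System.out.println(encodings.encodingFor("/docs/legacy/index.html"));
        System.out.println(encodings.encodingFor("/docs/index.html"));
    }
}
```
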
>
> I don't believe that this should be done by the default implementation.
>
> ----------------
> Opinions please!
> ----------------
>
> 1.  We should recommend that Tomcat be run with a web-compatible default
>     character encoding.
>
> This means I'll alter tomcat.sh to always specify the value to the Java
> interpreter, and checkin some form of the above text with Tomcat for
> future reference as (say) EBCDIC.txt.
>
> 2.  The default character encoding should be either ASCII (8859_1) or
>     perhaps UTF8.  The web standard for HTTP is ASCII.  I would like to
>     suggest that UTF8 might be a good default.
>
> So far as I can tell UTF8 is a strict superset of ASCII.  So for the case
> where the original data is ASCII the use of UTF8 wouldn't change anything.
>
> In the case where the data is more than just ASCII, the 8859_1 encoding
> will (I believe) cause exceptions to be thrown.  It is quite likely that
> applications previously only exercised with ASCII will deal poorly with
> the unexpected encoding exceptions.
>
> If the default encoding is UTF8 then non-ASCII characters will be encoded
> and decoded correctly.  Code used outside the ASCII-only world is much more
> likely to "just work".
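
( A small check of the two encodings discussed above, as a sketch with hypothetical class and method names. It relies on the documented behaviour of String.getBytes(Charset), which substitutes the charset's replacement byte for characters it cannot encode rather than throwing. So with 8859_1 the risk in this code path is silent data loss, not an exception. )

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingBehavior {
    // True when both charsets produce identical bytes for the string.
    static boolean sameBytes(String s, Charset a, Charset b) {
        return Arrays.equals(s.getBytes(a), s.getBytes(b));
    }

    // Encode and decode again with the same charset.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // EURO SIGN, not representable in ISO 8859-1

        // Pure ASCII text encodes to the same bytes in US-ASCII and UTF-8.
        System.out.println(sameBytes("plain ASCII", StandardCharsets.US_ASCII, StandardCharsets.UTF_8));

        // ISO 8859-1 cannot represent the character; it is replaced by '?'.
        System.out.println(roundTrip(euro, StandardCharsets.ISO_8859_1).equals("?"));

        // UTF-8 preserves it.
        System.out.println(roundTrip(euro, StandardCharsets.UTF_8).equals(euro));
    }
}
```
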
>
> Personally I feel that the only practical alternative for (1) is to use the
> web encoding as the default encoding.  I suspect that the best choice for
> the default web encoding (2) is UTF8, but I might have missed some downside
> to deviating (even to a superset) from ASCII.
>
> Opinions??
>
> > -----Original Message-----
> > From: Clere, Jean-Frederic [mailto:jfrederic.clere@fujitsu.siemens.es]
> > Sent: Friday, March 24, 2000 8:17 AM
> > To: Tomcat-Dev (E-mail)
> > Subject: [PATCH] Add EBCDIC support for text files in Tomcat.
> >
> >
> > I am porting tomcat to a BS2000 (Siemens EBCDIC mainframe).
> > And I have arranged:
> > ./jakarta-tomcat/src/share/org/apache/tomcat/servlets/DefaultServlet.java
> > The problem is that text documents like html files have to be editable on
> > an EBCDIC machine, that is, they are stored as EBCDIC.
> > So some conversions are needed: native (EBCDIC) to Java, and then Java to ASCII.
> >
> > Find enclosed the result of a diff -u -w between my patch and the CVS file.
> >
> > --- jakarta-tomcat/src/share/org/apache/tomcat/servlets/DefaultServlet.java Wed Mar 22 22:52:15 2000
> > +++ jakarta-tomcat/src/share/org/apache/tomcat/servlets/DefaultServlet.java Fri Mar 24 16:48:01 2000
> > @@ -289,6 +289,9 @@
> >       FileInputStream in=null;
> >       try {
> >           in = new FileInputStream(file);
> > +         if (mimeType.startsWith("text"))
> > +             serveStreamNative(in, request, response);
> > +         else
> >           serveStream(in, request, response);
> >       } catch (FileNotFoundException e) {
> >           // Figure out what we're serving
> > @@ -311,6 +314,22 @@
> >       }
> >      }
> >
> > +    // Sends text files, which are encoded in the machine encoding.
> > +    private void serveStreamNative(InputStream in, HttpServletRequest request,
> > +        HttpServletResponse response)
> > +    throws IOException {
> > +        // like the serveStream, we try the stream and the writer.
> > +
> > +     try {
> > +         ServletOutputStream out = response.getOutputStream();
> > +         serveStreamAsStreamNative(in, out);
> > +     } catch (IllegalStateException ise) {
> > +         PrintWriter out = response.getWriter();
> > +            // as it uses an InputStreamReader it does the conversion.
> > +         serveStreamAsWriter(in, out);
> > +     }
> > +    }
> > +
> >      private void serveStream(InputStream in, HttpServletRequest request,
> >          HttpServletResponse response)
> >      throws IOException {
> > @@ -341,6 +360,27 @@
> >           out.write(buf, 0, read);
> >       }
> >      }
> > +
> > +    private void serveStreamAsStreamNative(InputStream in, OutputStream out)
> > +    throws IOException {
> > +     char[] buf = new char[1024];
> > +     int read = 0;
> > +
> > +     // Here the input and the output have to be converted.
> > +        // input  default encoding output ASCII (ISO8859_1).
> > +        OutputStreamWriter output = null;
> > +     try {
> > +            output = new OutputStreamWriter(out,"ISO8859_1");
> > +     } catch(UnsupportedEncodingException e) {
> > +         output = new OutputStreamWriter(out); // try without conversion.
> > +     }
> > +     InputStreamReader input = new InputStreamReader(in);
> > +
> > +     while ((read = input.read(buf)) != -1) {
> > +         output.write(buf, 0, read);
> > +     }
> > +     output.flush();
> > +    }
> >
> >      private void serveStreamAsWriter(InputStream in, PrintWriter out)
> >      throws IOException {
> >
> >
> >
> >  <<DefaultServlet.patch.txt>>
> >
> > Jean-Frédéric Clère
> > EP LP DC22 (BCN)
> > Fujitsu Siemens Computers
> > Phone + 34 93 480 4209
> > Fax     + 34 93 480 4201
> > Mail mailto:jfrederic.clere@fujitsu.siemens.es
>

