tomcat-users mailing list archives

From "Konstantin Kolinko" <knst.koli...@gmail.com>
Subject Re: [OT] Basic int/char conversion question
Date Thu, 01 Jan 2009 23:35:00 GMT
2009/1/1 André Warnier <aw@ice-sa.com>:
> Hi.
>
> This has nothing specific to Tomcat, it's just a problem I'm having as a
> non-Java expert in modifying an existing webapp.
> I hope someone on this list can answer quickly, or send me to the
> appropriate place to find out.  I have tried to find, but get somewhat lost
> in the Java docs.
>
> Problem :
> an existing webapp reads from a socket connected to an external program.
> The input stream is created as follows :
> fromApp = socket.getInputStream();
> The read is as follows :
> StringBuffer buf = new StringBuffer(2000);
> int ic;
> while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>           buf.append((char)ic);
>
> This is wrong, because it assumes that the input stream is always in an
> 8-bit default platform encoding, which it isn't.
>
> How do I do this correctly, assuming that I do know that the incoming stream
> is an 8-bit stream (like iso-8859-x), and I do know which 8-bit encoding is
> being used (such as iso-8859-1 or iso-8859-2) ?
> I cannot change the InputStream into something else, because there are a
> zillion other places where this webapp tests on the read byte's value,
> numerically.
>
> I mean, to append correctly to "buf" what was read in the "int", knowing
> that the proper encoding (charset) of "fromApp" is "X", how do I write this
> ?
>

1. Using ISO-8859-1 does not lose any information. That is, if you later
write the string out to an ISO-8859-1 stream, you will get back exactly the
same 8-bit ISO-8859-2 bytes that were in the input.

If you need correct Unicode characters, though, you can convert the string
by calling String.getBytes(encoding) and then new String(bytes, encoding):

new String(str.getBytes("ISO-8859-1"), "ISO-8859-2")
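To illustrate the one-liner above, here is a minimal sketch of the round trip. The class name RecodeSketch and the method recode are made up for this example, and it assumes Java 7 or later (for StandardCharsets) and a JRE that ships the ISO-8859-2 charset, which is not one of the charsets every JRE is required to provide.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original webapp.
class RecodeSketch {
    // Assumption: ISO-8859-2 is available in the running JRE.
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // Takes a string that was built with (char) casts of raw bytes
    // (which is the same as decoding as ISO-8859-1) and re-decodes
    // the original bytes as ISO-8859-2.
    static String recode(String castString) {
        byte[] original = castString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, LATIN2);
    }

    public static void main(String[] args) {
        // Byte 0xA1 cast to char gives U+00A1; in ISO-8859-2 that byte is U+0104.
        String cast = String.valueOf((char) 0xA1);
        String fixed = recode(cast);
        System.out.println(Integer.toHexString(fixed.charAt(0))); // prints 104
    }
}
```

This only works because every byte value 0..255 maps to exactly one char in ISO-8859-1, so the getBytes step recovers the input bytes unchanged.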

2. Well, the above, and all the others' tips I have read in this thread so far
are the right ones. Those are what you should do when you are engineering
and writing a well-made application. That is, you have to go with
InputStreamReader, String, CharsetDecoder APIs and that will take care of
various encodings, including multi-byte ones.
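For comparison, the InputStreamReader route mentioned above might look roughly like the following. This is only a sketch: the class name ReaderSketch and the method readUntilSub are invented here, and it assumes the charset name passed in is supported by the JRE. Note that InputStreamReader may read ahead from the underlying stream, so it does not mix safely with other code that keeps calling read() on the same InputStream directly.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the original webapp.
class ReaderSketch {
    // Decodes characters from the stream until SUB (hex 1A) or end of stream.
    static String readUntilSub(InputStream in, String charsetName) throws IOException {
        Reader reader = new InputStreamReader(in, Charset.forName(charsetName));
        StringBuilder buf = new StringBuilder(2000);
        int c;
        // read() returns a decoded char (0..65535) or -1 at end of stream.
        // In the ISO-8859-x encodings byte 0x1A decodes to U+001A.
        while ((c = reader.read()) != 0x1A && c != -1) {
            buf.append((char) c);
        }
        return buf.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { 0x41, (byte) 0xA1, 0x1A, 0x42 };
        String s = readUntilSub(new ByteArrayInputStream(data), "ISO-8859-2");
        System.out.println(s.length()); // prints 2
    }
}
```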

In your case, when you are tailoring some oddly (badly) written specific
application to your specific environment, and do not expect much, there is a
simple approach: implement this conversion by using a lookup table.

You will just need some static table of 256 chars and you are done.

For example,

package mypackage;
import java.io.UnsupportedEncodingException;

public class TranslationTable {
  private static char[] table;

  static {
     // "static initialization" block

     byte[] bytes = new byte[256];
     for (int i=0; i<bytes.length; i++){
        bytes[i] = (byte) i;
     }

     try {
        table = new String(bytes, "ISO-8859-2").toCharArray();
     } catch (UnsupportedEncodingException ex) {
        ex.printStackTrace();
        //System.exit(1);
        throw new Error("Class initialization failed", ex);
     }
  }

  public static char lookup(int i) {
     // will throw ArrayIndexOutOfBoundsException if i is -1,
     // but that should be OK
     return table[i];
  }
}

and replace

>           buf.append((char)ic);

with

  buf.append(TranslationTable.lookup(ic));
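To see the whole thing end to end, here is a self-contained sketch of the read loop with a lookup table. The names LookupLoopSketch, buildTable and readMessage are made up for the example, and a ByteArrayInputStream stands in for the socket stream. It uses Charset.forName instead of the encoding name, which avoids the checked UnsupportedEncodingException, and StringBuilder, which assumes Java 5 or later. The table trick is only valid for single-byte encodings, where 256 bytes decode to exactly 256 chars.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical stand-alone variant of the lookup-table approach.
class LookupLoopSketch {
    // Decodes all 256 byte values once; index = unsigned byte value.
    // Assumption: charsetName is a single-byte charset known to the JRE.
    static char[] buildTable(String charsetName) {
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        return new String(bytes, Charset.forName(charsetName)).toCharArray();
    }

    // Same loop as in the webapp: stop at SUB (26) or end of stream.
    static String readMessage(InputStream fromApp, char[] table) throws IOException {
        StringBuilder buf = new StringBuilder(2000);
        int ic;
        while ((ic = fromApp.read()) != 26 && ic != -1) {
            buf.append(table[ic]); // ic is 0..255 here, never -1
        }
        return buf.toString();
    }

    public static void main(String[] args) throws IOException {
        char[] table = buildTable("ISO-8859-2");
        byte[] data = { 0x41, (byte) 0xB1, 26, 0x42 };
        String s = readMessage(new ByteArrayInputStream(data), table);
        System.out.println(Integer.toHexString(s.charAt(1))); // prints 105
    }
}
```

The numeric tests on the byte value elsewhere in the webapp stay as they are, since the stream is still read byte by byte.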

Also, I would replace StringBuffer with StringBuilder if you are running on
Java 5 or later, but that is another story.

Best regards,
Konstantin Kolinko

