From: "Konstantin Kolinko"
To: "Tomcat Users List"
Subject: Re: [OT] Basic int/char conversion question
Date: Fri, 2 Jan 2009 02:35:00 +0300
Message-ID: <427155180901011535j774f114h37cd76a75a892a04@mail.gmail.com>
In-Reply-To: <495CEBBF.7060107@ice-sa.com>

2009/1/1 André Warnier :
> Hi.
>
> This has nothing specific to Tomcat, it's just a problem I'm having as a
> non-java expert in modifying an existing webapp.
> I hope someone on this list can answer quickly, or send me to the
> appropriate place to find out. I have tried to find, but get somewhat lost
> in the Java docs.
>
> Problem :
> an existing webapp reads from a socket connected to an external program.
> The input stream is created as follows :
> fromApp = socket.getInputStream();
> The read is as follows :
> StringBuffer buf = new StringBuffer(2000);
> int ic;
> while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>     buf.append((char)ic);
>
> This is wrong, because it assumes that the input stream is always in an
> 8-bit default platform encoding, which it isn't.
>
> How do I do this correctly, assuming that I do know that the incoming stream
> is an 8-bit stream (like iso-8859-x), and I do know which 8-bit encoding is
> being used (such as iso-8859-1 or iso-8859-2) ?
> I cannot change the InputStream into something else, because there are a
> zillion other places where this webapp tests on the read byte's value,
> numerically.
>
> I mean, to append correctly to "buf" what was read in the "int", knowing
> that the proper encoding (charset) of "fromApp" is "X", how do I write this
> ?
>

1. Using iso-8859-1 does not lose any information. That is, if you later
write the string out to an iso-8859-1 stream, you get back exactly the
8-bit bytes of iso-8859-2 that were in the input.

If you need correct Unicode, though, you can convert the text by calling
String.getBytes(encoding) and new String(bytes, encoding):

    new String(str.getBytes("ISO-8859-1"), "ISO-8859-2")

2. Well, the above, and all the other tips I have read in this thread so
far, are the right ones. Those are what you should do when you are
engineering and writing a well-made application. That is, you have to go
with the InputStreamReader, String, and CharsetDecoder APIs, and those will
take care of various encodings, including multi-byte ones.

In your case, when you are tailoring some oddly (badly) written specific
application to your specific environment, and do not expect much, there is
a simple approach: implement this conversion by using a lookup table. You
just need a static table of 256 chars and you are done.

For example,

    package mypackage;

    import java.io.UnsupportedEncodingException;

    public class TranslationTable {
        private static char[] table;

        static { // "static initialization" block
            byte[] bytes = new byte[256];
            for (int i=0; i [...]

[...] replace

    buf.append((char)ic);

with

    buf.append(TranslationTable.lookup(ic));

Also, I would replace StringBuffer with StringBuilder if you are running
on Java 5 or later, but that is another story.

Best regards,
Konstantin Kolinko
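
The lookup-table class in the message above is cut off in the archive shortly after the start of its `for` loop, so the original body is not recoverable. What follows is only a minimal sketch of how such a table might be built, not the author's code. The choice of ISO-8859-2 as the source charset, the `buildTable` helper, and the `SOURCE` and `TABLE` names are assumptions made for illustration; only the class name and the `lookup(ic)` call come from the message. It uses the `Charset`-based `String` constructor (Java 6 or later) rather than the charset-name overload the original import suggests.

```java
import java.nio.charset.Charset;

// Hypothetical completion of the truncated TranslationTable idea.
final class TranslationTable {
    // Assumed encoding of the socket stream; replace with the real one.
    private static final Charset SOURCE = Charset.forName("ISO-8859-2");

    // TABLE[b] holds the Unicode char for byte value b (0..255).
    private static final char[] TABLE = buildTable();

    private TranslationTable() {
    }

    private static char[] buildTable() {
        byte[] bytes = new byte[256];
        for (int i = 0; i < 256; i++) {
            bytes[i] = (byte) i; // every possible byte value, in order
        }
        // Decode all 256 bytes at once; a single-byte charset yields 256 chars.
        char[] chars = new String(bytes, SOURCE).toCharArray();
        if (chars.length != 256) {
            throw new IllegalStateException(SOURCE + " is not a single-byte charset");
        }
        return chars;
    }

    // value must be in 0..255, as returned by InputStream.read();
    // the caller has to handle -1 (end of stream) before calling this.
    static char lookup(int value) {
        return TABLE[value];
    }
}
```

With a class along these lines, the read loop keeps its numeric tests on `ic` unchanged and only the append call switches to `TranslationTable.lookup(ic)`, as the message describes.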