From: André Warnier
Date: Tue, 13 Jan 2009 13:09:40 +0100
To: Tomcat Users List
Subject: Re: [OT] Basic int/char conversion question
Message-ID: <496C8484.4030407@ice-sa.com>
In-Reply-To: <496BE901.90607@christopherschultz.net>

Hi.

Christopher Schultz wrote:
> André,
>
> André Warnier wrote:
>> an existing webapp reads from a socket connected to an external program.
>> The input stream is created as follows :
>>   fromApp = socket.getInputStream();
>> The read is as follows :
>>   StringBuffer buf = new StringBuffer(2000);
>>   int ic;
>>   while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>>       buf.append((char)ic);
>>
>> This is wrong, because it assumes that the input stream is always in an
>> 8-bit default platform encoding, which it isn't.
>
> Does it?
>
> The only assumption I see here is that the byte code 0x1a has a special
> meaning. Since ASCII is usually the lowest common denominator for
> character encodings, is this a bad assumption?

Considering the often devious ways in which character encoding questions
can come back to bite one, I am not so sure.
By doing a read(), the app currently "consumes" one byte, whether it
matches 0x1A or not. If the input stream used a multi-byte encoding,
UTF-16 for instance, that byte might be one half of a two-byte code unit
and happen to have the integer value 0x1A, although its meaning would be
totally different.
(UTF-8 is safe in this one respect: every byte of a multi-byte UTF-8
sequence is 0x80 or higher, so 0x1A can only ever be the SUB character
itself. The (char) cast would still mangle any non-ASCII UTF-8 text,
though.)

>> How do I do this correctly, assuming that I do know that the incoming
>> stream is an 8-bit stream (like iso-8859-x), and I do know which 8-bit
>> encoding is being used (such as iso-8859-1 or iso-8859-2) ?
>> I cannot change the InputStream into something else, because there are a
>> zillion other places where this webapp tests on the read byte's value,
>> numerically.

And there are other places where the "byte" is being tested against
values other than 0x1A.
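
For the constraint quoted above (a known 8-bit encoding, and numeric byte
tests that must keep working), one possible approach is to leave the read
loop on raw bytes and decode only once the terminator has been found. The
following is a minimal sketch under that assumption, not code from the
webapp: the class name SubTerminatedReader and the method readMessage are
made up for illustration, and the charset is assumed to come from some
external setting.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

class SubTerminatedReader {
    // 0x1A (decimal 26) is the SUB control character used as terminator.
    private static final int SUB = 0x1A;

    // Hypothetical helper: collects raw bytes up to SUB or end of stream,
    // then decodes them in a single step with the given charset.
    static String readMessage(InputStream fromApp, Charset charset) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream(2000);
        int ic;
        // Same numeric tests as the original loop; ic is still a byte value.
        while ((ic = fromApp.read()) != SUB && ic != -1) {
            raw.write(ic);
        }
        return new String(raw.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // 'a', then byte 0xA9, then the SUB terminator, then a trailing byte.
        byte[] wire = { 'a', (byte) 0xA9, 0x1A, 'x' };
        InputStream in = new ByteArrayInputStream(wire);
        String text = readMessage(in, Charset.forName("ISO-8859-2"));
        System.out.println(text.length());          // prints 2
        System.out.println((int) text.charAt(1));   // prints 352 (U+0160)
    }
}
```

With ISO-8859-2 the byte 0xA9 decodes to U+0160, whereas the (char) cast
in the original loop would always produce U+00A9, which is only right for
iso-8859-1. The other places that test the byte numerically would not
need to change with this approach, since nothing is decoded until the
whole message has been read.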
> I like Chuck's suggestion to use an InputStreamReader because the
> interfaces are (at least accidentally) the same, at least for the method
> in question.

Me too. It is the most logical, and the one which I would apply if I
were to rewrite this app from scratch. I would also have the other app
(the one which sends this stream to the webapp) send some kind of prefix
to the stream, indicating the encoding used. (Or at least have both that
app and the webapp have some external parameter telling them
respectively what to send and what to expect).

> I'm not sure how you would modify an entire application to
> "fix" this code everywhere, though.

Right. I was trying to find a magic shortcut. At first I was hoping that
I could just do some kind of "string replace patch" with Notepad,
directly on the compiled classes. Unfortunately, considering these byte
tests in several places, I can't.

Thanks again for all the suggestions though.
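
For reference, the InputStreamReader suggestion discussed above could
look roughly like the sketch below. It is an illustration only: the class
ReaderBasedRead and its methods wrap and readMessage are hypothetical
names, and the charset is assumed to be known from outside the stream.
Reader.read() returns an int just as InputStream.read() does, so the
comparison against 26 and -1 stays the same, but the value is now a
decoded character rather than a raw byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderBasedRead {
    // Wrap the socket stream once and keep the Reader for the life of the
    // connection: InputStreamReader may read ahead, so creating a new one
    // per message could lose bytes that were already buffered.
    static Reader wrap(InputStream fromApp, Charset charset) {
        return new InputStreamReader(fromApp, charset);
    }

    // Same loop shape as the original, but ic is now a decoded character
    // (0..65535) or -1 at end of stream.
    static String readMessage(Reader in) throws IOException {
        StringBuilder buf = new StringBuilder(2000);
        int ic;
        while ((ic = in.read()) != 26 && ic != -1) {
            buf.append((char) ic);
        }
        return buf.toString();
    }

    public static void main(String[] args) throws IOException {
        // Two SUB-terminated messages; the first holds a two-byte UTF-8 sequence.
        byte[] wire = { (byte) 0xC3, (byte) 0xA9, 0x1A, 'b', 0x1A };
        Reader in = wrap(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        System.out.println(readMessage(in).length());   // prints 1
        System.out.println(readMessage(in));            // prints b
    }
}
```

Tests against ASCII control values such as 0x1A behave the same with a
Reader, but any test against a value above 0x7F would then be comparing
against a decoded character instead of a byte, so each of those places
would still need to be reviewed by hand.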