tomcat-users mailing list archives

From "Caldarale, Charles R" <Chuck.Caldar...@unisys.com>
Subject RE: [OT] Basic int/char conversion question
Date Fri, 02 Jan 2009 01:13:49 GMT
> From: André Warnier [mailto:aw@ice-sa.com]
> Subject: Re: [OT] Basic int/char conversion question
>
> I must say that I find it a bit curious that Java does not
> have an easy out-of-the-box method to convert a byte to a
> char, with a character filter specifier.

This would be possible only for single-byte (8-bit) character sets.  Since Java tries to be
general, you must feed the converter a stream of bytes rather than one byte at a time, because
in a multi-byte encoding a single byte need not be a complete character.  If you already have
an array of bytes, it can be wrapped in a ByteArrayInputStream and then further wrapped
in an InputStreamReader, resulting in proper translation of the bytes to Unicode characters.
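
A minimal sketch of that wrapping, in case it helps.  The class and method names
(ByteDecodeSketch, decode) are made up for illustration, and it assumes the JRE ships the
ISO-8859-2 charset, which the spec does not strictly guarantee:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class ByteDecodeSketch {
    // Hypothetical helper: decodes a byte array by wrapping it in a
    // ByteArrayInputStream and then in an InputStreamReader.
    static String decode(byte[] bytes, String charsetName) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), Charset.forName(charsetName))) {
            int c;
            // read() returns one UTF-16 code unit at a time, or -1 at end of stream
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream, but the Reader API declares it
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'A', then 0xA1 (A-with-ogonek in ISO-8859-2), then a line feed
        byte[] input = { 0x41, (byte) 0xA1, 0x0A };
        for (char ch : decode(input, "ISO-8859-2").toCharArray()) {
            System.out.printf("U+%04X%n", (int) ch);
        }
    }
}
```

The charset is named once, on the reader, so the same code should work unchanged for a
multi-byte encoding such as UTF-8.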

> I know that I don't lose information by converting
> iso-8859-2 (thinking it is iso-8859-1) to Unicode
> one way, then re-converting this Unicode to iso-8859-2
> (re-using the iso-8859-1 filter).  I will get the
> same bytes in the end.

That may be true for 8859-1 and 8859-2, but I suspect it's not true in general.  The preferred
mappings for a Unicode character in a given encoding may not necessarily be the exact bytes
given on input, especially if they've been sent through the wrong converter to begin with.
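
As a rough illustration of where the round trip breaks, the sketch below decodes bytes with a
given charset and re-encodes them with the same one.  The names (RoundTripSketch,
survivesRoundTrip) are invented for the example, and UTF-8 is used only as a convenient
"wrong converter"; it is not one of the charsets from your case:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripSketch {
    // Decodes with the given charset, re-encodes with the same charset,
    // and reports whether the original bytes came back unchanged.
    static boolean survivesRoundTrip(byte[] original, Charset charset) {
        String decoded = new String(original, charset);
        byte[] reencoded = decoded.getBytes(charset);
        return Arrays.equals(original, reencoded);
    }

    // Every possible byte value, 0x00 through 0xFF.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps each byte to exactly one character, so nothing is lost
        System.out.println(survivesRoundTrip(allByteValues(), StandardCharsets.ISO_8859_1));
        // A lone 0xE9 is malformed UTF-8; the decoder substitutes U+FFFD,
        // which re-encodes as three different bytes
        System.out.println(survivesRoundTrip(new byte[] { (byte) 0xE9 }, StandardCharsets.UTF_8));
    }
}
```

The first check should print true and the second false: once the decoder has substituted a
replacement character, the original byte is gone.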

> Even a servlet filter won't do it, because by that time the
> response is committed.

It will if you wrap the response object and don't commit the real one until you've set the
desired header in the filter.

> For the same reason, I cannot just replace the InputStream
> by something that would translate these bytes on-the-fly to
> Unicode chars, because for high iso-8859-2 bytes, it would
> generate internal codes that no longer fall into values
> 0-255, and that may create a problem somewhere deep in code
> I haven't yet looked at.

I suspect that won't be a problem, unless the code is looking for something in the upper ranges.
The example you posted showed it looking at control codes, which are the same in Unicode
and any ISO-8859 variant.  If the code is looking at high-order bytes, it's seriously flawed
already.
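
If you want to convince yourself, something along these lines compares the two ranges.  The
class and method names (ControlCodeSketch, decodeOne, lowRangeMatches) are hypothetical, and
it again assumes ISO-8859-2 is available in the JRE:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ControlCodeSketch {
    // Decodes a single byte value (0-255) with a single-byte charset.
    static char decodeOne(int byteValue, Charset charset) {
        return new String(new byte[] { (byte) byteValue }, charset).charAt(0);
    }

    // True if bytes 0x00-0x7F (control codes and ASCII) decode to the
    // Unicode code point with the same numeric value.
    static boolean lowRangeMatches(Charset charset) {
        for (int b = 0; b < 0x80; b++) {
            if (decodeOne(b, charset) != (char) b) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        System.out.println(lowRangeMatches(latin2));
        // The same high byte lands on different code points in the two charsets
        System.out.printf("8859-1: U+%04X%n", (int) decodeOne(0xA1, StandardCharsets.ISO_8859_1));
        System.out.printf("8859-2: U+%04X%n", (int) decodeOne(0xA1, latin2));
    }
}
```

Byte 0xA1 stays at U+00A1 under 8859-1 but becomes U+0104 under 8859-2, so only tests against
values above 0x7F can be affected by switching to a proper decoder.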

I still think the easiest thing for you to do is put in the InputStreamReader wrapper, and
run your test cases.  You should certainly examine the code for any erroneous tests, but those
should be corrected rather than extending the existing kludge.

 - Chuck

