tomcat-users mailing list archives

From André Warnier
Subject Re: [OT] Basic int/char conversion question
Date Fri, 02 Jan 2009 00:40:19 GMT
To Konstantin and all the others who have responded,
many thanks for all the tips, especially since this was quite a bit
off-topic.
I need some time to digest the tips though, and choose the best way
according to the code that was dumped in my lap.

I must say that I find it a bit curious that Java does not have an easy 
out-of-the-box method to convert a byte to a char, with a character 
filter specifier. Something like
char mychar = toChar(int,charset) (or int.toChar(charset))
Oh well, maybe Java 7..
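For what it is worth, such a helper is short to write by hand. The
sketch below assumes the value is a single byte (0-255) and that the
charset is a single-byte one. The names ByteToChar and toChar are made
up for illustration; they are not part of the JDK.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not a JDK API: decode one byte value with a given charset.
class ByteToChar {
    static char toChar(int b, Charset charset) {
        // Wrap the single byte in an array and let String do the decoding.
        // Only meaningful for single-byte charsets such as the iso-8859-x family.
        return new String(new byte[] { (byte) b }, charset).charAt(0);
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        char c = toChar(0xA3, latin2);
        // Prints the Unicode value in hex: 141 (U+0141, L with stroke)
        System.out.println(Integer.toHexString(c));
    }
}
```

Creating a String per byte is wasteful, so this is probably only
reasonable if the conversion is not in a hot loop.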

To Konstantin in particular :
I know that I don't lose information by converting iso-8859-2 (thinking 
it is iso-8859-1) to Unicode one way, then re-converting this Unicode to 
iso-8859-2 (re-using the iso-8859-1 filter).  I will get the same bytes 
in the end.
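
A small check of that round trip, as a sketch only. The class name
Latin1RoundTrip is invented for the example; the 256 byte values stand
in for arbitrary iso-8859-2 input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    static byte[] roundTrip(byte[] raw) {
        // Decode as iso-8859-1: each byte 0x00-0xFF becomes the char with the same value.
        String s = new String(raw, StandardCharsets.ISO_8859_1);
        // Encode with the same charset: each char maps back to the original byte.
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Every possible byte value, whatever encoding it was really meant to be.
        byte[] raw = new byte[256];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }
        System.out.println(Arrays.equals(raw, roundTrip(raw))); // true
    }
}
```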
The problem is that this is a servlet writing the result to the response 
object.  And if I tell it to use iso-8859-1 for the response, it 
automatically also sets the response Content-Type to iso-8859-1.
Which in this case is wrong, because the browser then gets confused.
And as I have found out, it is quite hard to change this Content-Type 
header after-the-fact.
Even a servlet filter won't do it, because by that time the response is
already committed.
Even the front-end Apache can't do it, because it won't let you change
the Content-Type header.

So my problem is in reverse :
The servlet must set the response output encoding to iso-8859-2, in 
order to produce the correct Content-Type for the browser. To produce 
correct iso-8859-2 from the internal Unicode string, this Unicode string 
must have the proper Unicode chars corresponding to the iso-8859-2 
characters I want to output.
But the servlet reads those bytes as ints, and does a bunch of internal 
tests and manipulations on them, without taking into account that they 
could be anything other than iso-8859-1.
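
To make the mismatch concrete, here is a sketch of what happens to one
high byte when the internal string was built with the wrong assumption
and is then encoded as iso-8859-2. The class and method names are
invented for the example; the real servlet code is not shown here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WrongCharsOnOutput {
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // Decode one byte with the charset 'in', then encode the result as
    // iso-8859-2, which is what a response writer set to iso-8859-2 would do.
    static int recode(int b, Charset in) {
        String s = new String(new byte[] { (byte) b }, in);
        return s.getBytes(LATIN2)[0] & 0xFF;
    }

    public static void main(String[] args) {
        // 0xA3 is L-with-stroke (U+0141) in iso-8859-2,
        // but the pound sign (U+00A3) in iso-8859-1.
        System.out.println(Integer.toHexString(recode(0xA3, LATIN2)));                     // a3
        System.out.println(Integer.toHexString(recode(0xA3, StandardCharsets.ISO_8859_1))); // 3f
    }
}
```

The pound sign does not exist in iso-8859-2, so String.getBytes
substitutes a question mark (0x3f) and the original byte is lost.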

For the same reason, I cannot just replace the InputStream by something 
that would translate these bytes on-the-fly to Unicode chars, because 
for high iso-8859-2 bytes, it would generate internal codes that no 
longer fall into the range 0-255, and that may create a problem somewhere 
deep in code I haven't yet looked at.
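
The effect is easy to reproduce outside the servlet. The sketch below
wraps a byte stream in an InputStreamReader; DecodingReader and
firstChar are names made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class DecodingReader {
    // Return the first decoded char of 'raw', interpreting the bytes as iso-8859-2.
    static int firstChar(byte[] raw) {
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(raw), Charset.forName("ISO-8859-2"))) {
            return in.read();
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 0xB1 is a-with-ogonek in iso-8859-2, which is U+0105 in Unicode.
        int c = firstChar(new byte[] { (byte) 0xB1 });
        System.out.println(c + " > 255 ? " + (c > 255)); // 261 > 255 ? true
    }
}
```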

I think I have to go back to examine that code, and see how often this 
StringBuffer is being used/manipulated.  If not too often, I might 
replace it by a byte buffer, and do the conversion all at once each time 
it is being written out.
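
Roughly what I have in mind, as a sketch only. RawByteBuffer, append
and toText are hypothetical names, and the real code may well need more
operations than these two.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical stand-in for the StringBuffer: it keeps the raw bytes untouched.
class RawByteBuffer {
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    // Store the value exactly as read from the input; it stays within 0-255,
    // so existing tests on the int values keep working.
    void append(int b) {
        buf.write(b);
    }

    // Decode the whole buffer in one step, only when it is written out.
    String toText(Charset charset) {
        return new String(buf.toByteArray(), charset);
    }

    public static void main(String[] args) {
        RawByteBuffer rb = new RawByteBuffer();
        rb.append(0xA3); // L-with-stroke in iso-8859-2
        rb.append('a');
        String out = rb.toText(Charset.forName("ISO-8859-2"));
        System.out.println(out.length() + " " + Integer.toHexString(out.charAt(0))); // 2 141
    }
}
```

The resulting String then holds the proper Unicode chars, so a response
writer set to iso-8859-2 should produce the original bytes again.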
