tomcat-users mailing list archives

From "Johnny Kewl" <j...@kewlstuff.co.za>
Subject Re: Migrating to tomcat 6 gives formatted currency amounts problem
Date Fri, 12 Sep 2008 21:44:19 GMT

----- Original Message ----- 
From: "André Warnier" <aw@ice-sa.com>
To: "Tomcat Users List" <users@tomcat.apache.org>
Sent: Friday, September 12, 2008 10:56 PM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem


> Just for the sake of completeness:
>
> Christopher Schultz wrote:
>>
>> André,
>>
>> André Warnier wrote:
>>> It is on the way through that servlet that they get "corrupted", unless
>>> I start Tomcat with LC_CTYPE="iso-8859-1".
>>
>> What do the HTTP headers say when the file is served correctly versus
>> when it is not? I suspect that the encoding is either set incorrectly or
>> not set at all unless you specify LC_CTYPE.
>>
>
>>> So my question remains, I think: what could be going on in that servlet
>>> so that:
>>> - if LC_CTYPE is not set in the environment *of Tomcat* when it starts,
>>> the upper iso-8859-1 characters in the pages are replaced by "?"
>>> - if LC_CTYPE is set to "iso-8859-1" in the Tomcat environment when it
>>> starts, then the pages delivered by the servlet are correct
>>> ?
>>
>> My guess is that the magic servlet here is using the platform's default
>> encoding in the HTTP headers, which may be incorrect for the static file
>> in question.
>>
>>> I am not very qualified in Java, but could it be something like :
>>> - the servlet reads those documents with some InputStream, without
>>> specifying a character set or encoding
>>
>> Note that InputStreams are encoding-less. It may sound like semantics, but
>> encodings only come into play when you are dealing with
>> character-oriented streams, which in Java are called Readers and
>> Writers. Note that neither InputStream nor OutputStream has any methods
>> that deal with the char data type.
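To make Chris's distinction concrete, here is a minimal sketch in plain Java (no servlet API; the class and method names are made up for illustration). Reading from an InputStream yields raw byte values, while an InputStreamReader only produces chars after decoding the bytes with a charset, named explicitly here as ISO-8859-1:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StreamsVsReaders {
    // Byte-oriented: read() hands back each byte as 0..255, no charset involved.
    static int[] readBytes(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int[] out = new int[data.length];
        int i = 0;
        int b;
        while ((b = in.read()) != -1) {
            out[i++] = b;
        }
        return out;
    }

    // Character-oriented: the Reader decodes bytes into chars using the given charset.
    static String readChars(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data),
                                              StandardCharsets.ISO_8859_1)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {0x41, (byte) 0xE9}; // "Aé" as stored in an iso-8859-1 file
        System.out.println(Arrays.toString(readBytes(data)));  // [65, 233]
        System.out.println((int) readChars(data).charAt(1));   // 233, i.e. U+00E9
    }
}
```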
>>
>>> and by default that means to use
>>> Tomcat's idea of its default LC_CTYPE for those InputStreams ?
>>> - or the servlet outputs the document via an OutputStream without
>>> specifying an encoding etc..
>>
>> I'll bet a binary stream of data is being sent (that is, with no
>> interpretation or encoding) and that the JVM's default encoding is being
>> advertised by the server in the HTTP headers. That would certainly cause
>> the problem.
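Chris's scenario can be sketched in plain Java as well: the bytes leave the server untouched, and only the charset label decides what the browser shows. Here `renderAs` is a hypothetical stand-in for the browser decoding the body with whatever charset the Content-Type header advertised, and UTF-8 is just an example of a "wrong" advertised default:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SameBytesDifferentLabel {
    // Hypothetical stand-in for the browser: decode the body using the
    // charset named in the response's Content-Type header.
    static String renderAs(byte[] body, Charset advertised) {
        // new String(byte[], Charset) never throws on bad input; it
        // substitutes U+FFFD for byte sequences that are invalid in 'advertised'.
        return new String(body, advertised);
    }

    public static void main(String[] args) {
        byte[] body = {0x63, 0x61, 0x66, (byte) 0xE9}; // "café" in iso-8859-1, sent as-is
        String right = renderAs(body, StandardCharsets.ISO_8859_1);
        String wrong = renderAs(body, StandardCharsets.UTF_8);
        System.out.println(right.equals("caf\u00e9"));  // true
        System.out.println(wrong.contains("\uFFFD"));   // true: 0xE9 alone is not valid UTF-8
    }
}
```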
>>
> The last time I looked, the HTTP headers sent along with the documents 
> were the same in both cases.
>
> It is physically (if that's the appropriate expression in this case) the 
> "high" iso-8859-1 characters (bytes) in the HTML document that are being 
> replaced by "?" (a single-byte low-ASCII question mark) on the way from the 
> disk file to the browser, via the servlet.
> And if the LC_CTYPE of java (and Tomcat) is set to "iso-8859-1" in the 
> Tomcat startup script, it is no longer the case.
>
> So I (now) believe that Chuck's earlier explanation is the correct one: 
> the servlet reads the disk document with a Reader (thanks Chris), without 
> specifying an encoding when it opens this Reader.
> The effect is thus as follows:
> - if the LC_CTYPE environment variable is not set for Java and Tomcat, 
> this Reader is opened using whichever encoding happens to be the JVM's 
> default at that point. Obviously, in this case it is not iso-8859-1.
> The servlet thus reads the iso-8859-1 data, but with the wrong decoder.
> I guess then that this decoder replaces anything that does not fit into 
> that default encoding with a "?". (Would it do that, or would it trigger an 
> exception?)
> So that is what the servlet reads, and it passes it unchanged to its 
> Writer and on to the browser.
> (Alternatively, it is at the level of the servlet's Writer that the 
> wrong encoding is used, or both.)
> - if the LC_CTYPE variable is set to "iso-8859-1", then this 
> Reader and Writer default to that encoding, and everything works fine.
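André's guess can be checked with a small, self-contained sketch. It assumes, for illustration only, that the JVM default without LC_CTYPE is US-ASCII (what a plain POSIX/C locale typically gives), and it passes that charset explicitly so the result does not depend on the machine; `SloppyCopy` and its methods are hypothetical names. As for the JDK behaviour: an InputStreamReader does not throw on bad bytes, it substitutes U+FFFD, and an OutputStreamWriter that cannot encode a char writes the charset's replacement byte, which is "?" for US-ASCII. Only a CharsetDecoder explicitly set to REPORT raises an exception:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SloppyCopy {
    // Copies file bytes through a Reader and a Writer, both using 'assumed',
    // which stands in for the JVM default picked when no charset is given.
    static byte[] copyVia(byte[] file, Charset assumed) {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(file), assumed)) {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (Writer out = new OutputStreamWriter(sink, assumed)) {
                char[] buf = new char[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n); // undecodable bytes arrive here as U+FFFD
                }
            } // closing the Writer flushes it; unencodable chars became '?'
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A strict decoder reports the bad byte instead of silently replacing it.
    static boolean strictDecodeFails(byte[] file, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(file));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] file = {0x63, 0x61, 0x66, (byte) 0xE9}; // "café" stored as iso-8859-1
        byte[] sloppy = copyVia(file, StandardCharsets.US_ASCII);
        byte[] fine = copyVia(file, StandardCharsets.ISO_8859_1);
        System.out.println(new String(sloppy, StandardCharsets.US_ASCII));      // caf?
        System.out.println(Arrays.equals(fine, file));                          // true
        System.out.println(strictDecodeFails(file, StandardCharsets.US_ASCII)); // true
    }
}
```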
>
> Fortunately setting the LC_CTYPE in the Tomcat startup script does not 
> seem to affect other applications on the server; that is probably because 
> this particular servlet is the only sloppy one, which does not explicitly 
> specify an encoding when reading or writing stuff.
> (It's also because in this case, there are not many other servlets apart 
> from the sloppy one).
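The usual cure for such a "sloppy" servlet is to name the encoding wherever bytes become chars or chars become bytes. Below is a minimal sketch of the reading side, assuming the documents really are iso-8859-1; `ExplicitEncodingCopy`, `copyDocument` and `roundTrip` are hypothetical names. In a real servlet, `out` would be the response Writer, obtained after calling `response.setContentType("text/html; charset=ISO-8859-1")` so that the Writer's encoding and the header agree:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitEncodingCopy {
    // Read the document with an explicitly named charset, so the result
    // no longer depends on LC_CTYPE or on the JVM's default encoding.
    static void copyDocument(Path doc, Charset docCharset, Writer out) throws IOException {
        try (Reader in = Files.newBufferedReader(doc, docCharset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        out.flush();
    }

    // Writes the bytes to a temp file, copies them through copyDocument, returns the text.
    static String roundTrip(byte[] fileBytes, Charset docCharset) {
        try {
            Path doc = Files.createTempFile("page", ".html");
            try {
                Files.write(doc, fileBytes);
                StringWriter out = new StringWriter();
                copyDocument(doc, docCharset, out);
                return out.toString();
            } finally {
                Files.delete(doc);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] page = {0x63, 0x61, 0x66, (byte) 0xE9}; // "café" in iso-8859-1
        System.out.println(roundTrip(page, StandardCharsets.ISO_8859_1).equals("caf\u00e9")); // true
    }
}
```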
>
> Now, I'm writing the above without solid knowledge of Java or Tomcat, 
> so it's mostly guesswork. If someone has a good reason for 
> shooting this down as an explanation, I'm still open to it.
>
>
> I'll post another question under another title, I think this thread is 
> long enough by now.
>
> Thanks to all though.

By George... I think you've got it... the locale encoding is taking precedence 
over the header.
In theory... in newer servlet versions that will no longer happen... the header now 
overrules the locale encoding.
If you do decide to look at this link...
http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#core-locale
What's happening to you is described at the very bottom ;)
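To see which default a given Tomcat JVM actually ended up with, something like the following can be run, or its lines dropped into a test servlet; `DefaultEncodingCheck` is a hypothetical name. One caveat for later readers: since Java 18 (JEP 400) the default charset is UTF-8 regardless of the locale, and the locale-derived encoding is reported separately in the `native.encoding` property (Java 17 and later), so the LC_CTYPE behaviour described in this thread applies to the older JVMs of that era:

```java
import java.nio.charset.Charset;

class DefaultEncodingCheck {
    // Collects the settings that decide what "no explicit encoding" means.
    static String describe() {
        return "defaultCharset=" + Charset.defaultCharset().name()
            // Derived from the locale at JVM startup on older JVMs,
            // unless overridden with -Dfile.encoding=...
            + " file.encoding=" + System.getProperty("file.encoding")
            // Locale-derived encoding (only present on Java 17 and later).
            + " native.encoding=" + System.getProperty("native.encoding")
            // What the Tomcat startup script exported, if anything.
            + " LC_CTYPE=" + System.getenv("LC_CTYPE");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```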

---------------------------------------------------------------------------
HARBOR : http://www.kewlstuff.co.za/index.htm
The most powerful application server on earth.
The only real POJO Application Server.
See it in Action : http://www.kewlstuff.co.za/cd_tut_swf/whatisejb1.htm
---------------------------------------------------------------------------

 


---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

