From Christopher Schultz <>
Subject Re: Char Encoding text streams on Tomcat 5.5 and Linux
Date Wed, 02 Dec 2009 16:17:05 GMT
On 12/2/2009 2:40 AM, Elli Albek wrote:
> On your Linux box type “locale” + enter. The results should be UTF-8. If not
> change it.

I can have my locale set to whatever I'd like, thank you very much.

> You can also set it in the file encoding java environment
> variable as suggested above as extra safety measure.

Well, you can check its value. By default, it's UTF-8 on my system (as
mentioned in my post, if you read the whole thing).
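If you'd rather check this from inside a JVM than from the shell, here is a minimal sketch. The class name `DefaultEncodingCheck` is just for illustration. It prints the `file.encoding` system property, the JVM's default charset, and the `LANG` environment variable. What it reports depends on the platform, the locale, and any `-Dfile.encoding=...` flag. On Java 18 and later the default charset is UTF-8 regardless of locale (JEP 400), so this check is mostly meaningful on older JVMs, such as those running Tomcat 5.5.

```java
import java.nio.charset.Charset;

// Illustrative check of the encoding-related settings a JVM sees.
// Values depend on platform, locale, JVM version and command-line flags.
public class DefaultEncodingCheck {
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name()
                // LANG is what the shell's `locale` output is usually derived from; may be null
                + ", LANG=" + System.getenv("LANG");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```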

> Tomcat’s logic of determining the encoding from the request only applies
> when Tomcat is parsing text in the request.

Tomcat's logic of determining the encoding for the request is mandated
by the servlet spec and the HTTP spec. In this case, Tomcat /was/
parsing text in the request: we are talking about POST data, here. Elli,
please read the posts before replying.

> However if you read from the stream directly, using request.getInputStream()
> you are getting binary data. When you create a Reader from that input
> stream you need to specify the encoding, or it will default to the file
> system encoding.

Yes. If you look at the source code to Tomcat, you'll see that the
encoding used comes from the request headers, or defaults to ISO-8859-1.
I'm not going to repeat this again.
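To make that rule concrete, here is a simplified sketch. The class and method names (`RequestCharset.resolve`) are hypothetical, and this is not Tomcat's code. It takes the `charset` parameter from a `Content-Type` header value if one is present, and otherwise falls back to ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical illustration of "charset from the request headers, else ISO-8859-1".
// Simplified: it does not handle escaped quoted-strings, and Charset.forName
// throws for unknown charset names instead of reporting a friendly error.
public class RequestCharset {
    static Charset resolve(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim();
                    // Strip optional surrounding quotes, e.g. charset="UTF-8"
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    if (!name.isEmpty()) {
                        return Charset.forName(name);
                    }
                }
            }
        }
        // The client supplied no charset: fall back to ISO-8859-1.
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(resolve("application/x-www-form-urlencoded"));                // ISO-8859-1
        System.out.println(resolve("application/x-www-form-urlencoded; charset=UTF-8")); // UTF-8
    }
}
```

Note that a browser submitting a form often omits the charset parameter entirely, which is exactly the case where the ISO-8859-1 fallback applies.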

If you create your own Reader (which you shouldn't be doing), you're on
your own. In this case, nobody was creating their own Reader.
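If someone does read `request.getInputStream()` and wrap it in their own Reader, the charset has to be passed explicitly rather than left to the platform default. Here is a minimal sketch. A `ByteArrayInputStream` stands in for the request stream, and `ExplicitReader.readAll` is a hypothetical helper. Decoding the same UTF-8 bytes as ISO-8859-1 shows the kind of garbling a mismatch produces.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: decode a raw byte stream with an explicitly chosen charset.
public class ExplicitReader {
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        // The charset is given explicitly; the platform default is never consulted.
        try (Reader r = new BufferedReader(new InputStreamReader(in, cs))) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 'é' encodes as 0xC3 0xA9
        // Matching charset: the text round-trips.
        System.out.println(readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8));
        // Mismatched charset: each of the two UTF-8 bytes becomes its own char.
        System.out.println(readAll(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1));
    }
}
```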

> The fact that Tomcat is using ISO-8859-1 to read characters is not relevant
> if you are reading from the stream directly and use your own Reader to
> convert to characters.

...which I wasn't doing. Rather than repeatedly complaining about how you
haven't read this thread properly, I'm simply going to stop.

> I am assuming this is a likely cause, since the XML
> parsing succeeds

Okay, I can't help myself: when did we start talking about XML?

-chris
