hc-httpclient-users mailing list archives

From Mike Kyle <m_t_k_nos...@yahoo.co.uk>
Subject Re: UTF8 problem?
Date Wed, 29 Oct 2008 06:38:34 GMT

Thanks for the info. I ended up suspecting as much. I confirmed it by locally patching
java.net.SocketOutputStream and checking that the bytes written to the socket were exactly
as reported by your logging (which they were).
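
A less invasive way to make the same check, without modifying a JDK class, is to wrap an output stream in a filter that records every byte in hex before passing it on. The following is only a sketch: HexTapOutputStream is a hypothetical helper, not part of HttpClient or the JDK, and how to make HttpClient write through such a wrapper is not shown here.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper for illustration; not part of HttpClient or the JDK.
final class HexTapOutputStream extends FilterOutputStream {
    private final StringBuilder hex = new StringBuilder();

    HexTapOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        record(b);
        out.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            record(buf[i]);
        }
        // Forward the whole chunk in one call instead of byte by byte.
        out.write(buf, off, len);
    }

    // Append one byte to the trace as two uppercase hex digits.
    private void record(int b) {
        if (hex.length() > 0) {
            hex.append(' ');
        }
        hex.append(String.format("%02X", b & 0xFF));
    }

    // Space-separated hex dump of everything written so far.
    String hexDump() {
        return hex.toString();
    }
}
```

Writing the UTF-8 encoding of "\u4E2D\u6587" through this wrapper into a java.io.ByteArrayOutputStream and then calling hexDump() gives a trace that can be compared byte for byte with the wire log.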

It seems that I was being misled by the output from my HTTP sniffer (org.apache.axis.utils.tcpmon)
and a failure on the part of the receiving program.
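
One way a monitoring tool can mislead here is sketched below. If a tool decodes the captured bytes as UTF-8 before displaying them, the six bytes on the wire show up as the two original characters, which can look as if the characters themselves had been transmitted. How tcpmon actually renders payloads is an assumption, not something established in this thread; SnifferDisplayDemo and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; shows display-side decoding only.
final class SnifferDisplayDemo {
    // The six bytes quoted in the thread: UTF-8 for U+4E2D U+6587.
    static final byte[] WIRE = {
        (byte) 0xE4, (byte) 0xB8, (byte) 0xAD,
        (byte) 0xE6, (byte) 0x96, (byte) 0x87
    };

    // What a viewer shows if it decodes the payload as UTF-8.
    static String shownAsUtf8(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }

    // What a viewer shows if it maps each byte to one Latin-1 character.
    static String shownAsLatin1(byte[] wire) {
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(shownAsUtf8(WIRE).length());   // 2 characters
        System.out.println(shownAsLatin1(WIRE).length()); // 6 characters
    }
}
```

The bytes are identical in both cases; only the viewer's choice of charset differs, so a hex view is the safer thing to compare against the wire log.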

Cheers, Mike

From: Oleg Kalnichevski <olegk@apache.org>
To: HttpClient User Discussion <httpclient-users@hc.apache.org>
Sent: Saturday, 25 October, 2008 13:23:16
Subject: Re: UTF8 problem?

On Thu, 2008-10-23 at 09:36 +0000, Mike Kyle wrote:
> I set an EntityEnclosingMethod request entity to be a ByteArrayRequestEntity. This entity
> has the Java characters "\u4E2D\u6587" as the corresponding UTF8 bytes (UTF8 = 0xE4,0xB8,0xAD,0xE6,0x96,0x87).
> This is confirmed by logging httpclient.wire.content. There I see the UTF8 values.


What is logged in the wire log is exactly what gets written to the
underlying socket. I do not think HttpClient is the culprit.
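
For readers who want to reproduce the wire log, a minimal sketch of switching it on programmatically follows. It assumes HttpClient 3.x logging through Commons Logging with the SimpleLog backend on the classpath; WireLogSetup is a hypothetical class name, and the properties need to be set before any HttpClient class creates its loggers.

```java
// Hypothetical setup class; assumes Commons Logging's SimpleLog is available.
final class WireLogSetup {
    static void enableWireLog() {
        // Route Commons Logging to its built-in SimpleLog implementation.
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
        // "httpclient.wire" is the parent category of httpclient.wire.header
        // and httpclient.wire.content, so this enables both.
        System.setProperty("org.apache.commons.logging.simplelog.log.httpclient.wire", "debug");
    }
}
```

With a different logging backend (for example log4j) the category name stays the same, but the level is set in that backend's own configuration instead.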


> However what appears to really get transmitted is the corresponding Java characters rather
> than the UTF8 values! As this is supposedly a UTF8 encoded XML document the receiver is not
> best pleased. This is confirmed by performing HTTP sniffing using org.apache.axis.utils.tcpmon. My
> suspicion is that somehow a character handler is intervening?
> Debugging HttpConnection implied that the output stream is a BufferedOutputStream wrapping
> a java.net.SocketOutputStream. I had assumed that the socket streams would be byte oriented.
> The content type is set to 'text/xml; chartset="utf-8"'.
> I am normally using HttpClient 3.0 + but the latest 3.1 appeared to react exactly the

