hc-httpclient-users mailing list archives

From "Feng Jiang" <feng.a.ji...@gmail.com>
Subject Re: how does httpclient detect element-charset?
Date Mon, 11 Jun 2007 10:38:27 GMT
I agree with you, but I do find that a lot of servers behave this way. Some
Location headers are in GBK, and some are in UTF-8. The only thing I can do
is hack around it in my own code.
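
As a rough illustration of that kind of hack (a sketch only, under my own
assumptions: LocationDecoder and decode() are hypothetical names and not part
of HttpClient), one can take the raw header bytes, try strict UTF-8 first,
and fall back to a caller-supplied charset when the bytes are not valid UTF-8:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; HttpClient has no such class.
final class LocationDecoder {
    private LocationDecoder() {}

    // Decodes raw header bytes as strict UTF-8; if they are not well-formed
    // UTF-8, decodes them with the given fallback charset instead.
    static String decode(byte[] raw, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so assume the fallback charset.
            return new String(raw, fallback);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "http://www.abc.com/\u4e2d\u6587/hello/world"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

On a JVM that ships the GBK charset, the fallback would be
Charset.forName("GBK"). This is only a heuristic: some GBK byte sequences are
also well-formed UTF-8, so it can pick the wrong charset.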

I think HttpClient should provide a mechanism to handle it. HttpClient has a
dummy detector for the charset of the URL, which always returns
"US-ASCII", but it allows the user to override it.
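
To make the idea concrete, here is a minimal sketch of such an overridable
detector. Everything in it is an assumption for illustration:
ElementCharsetDetector and ElementDecoder are hypothetical names and do not
exist in HttpClient.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical extension point; HttpClient does not define this interface.
interface ElementCharsetDetector {
    Charset detect(byte[] rawElement);
}

// Hypothetical decoder that asks a pluggable detector which charset to use.
final class ElementDecoder {
    // Default "dummy" detector: always answers US-ASCII.
    private ElementCharsetDetector detector = raw -> StandardCharsets.US_ASCII;

    // Lets the user replace the default detector with their own.
    void setDetector(ElementCharsetDetector detector) {
        if (detector == null) {
            throw new IllegalArgumentException("detector must not be null");
        }
        this.detector = detector;
    }

    // Decodes the raw bytes with whatever charset the detector reports.
    String decode(byte[] rawElement) {
        return new String(rawElement, detector.detect(rawElement));
    }
}
```

With the default detector, non-ASCII bytes come out as replacement
characters; after calling setDetector(raw -> StandardCharsets.UTF_8), a
UTF-8 encoded Location value would decode intact.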

Feng

On 6/11/07, Oleg Kalnichevski <olegk@apache.org> wrote:
>
> On Mon, 2007-06-11 at 17:27 +0800, Feng Jiang wrote:
> > Hi all,
> >
> > I think the implementation of HttpMethodParams#getHttpElementCharset() has
> > a problem. By default, HttpClient uses US-ASCII as the charset to decode
> > HTTP elements, such as some headers.
> >
> > But I have met some servers whose Location header is in some other
> > charset, such as UTF-8, so HttpClient cannot handle the redirection
> > correctly (in my application, I handle it myself). For example, one
> > server responds with a header like this:
> >
> > Location: http://www.abc.com/****(some chinese character)/hello/world
> >
> > The above URL contains some Chinese characters in some other charset, such
> > as GBK. The right behavior for HttpClient would be: 1. detect the charset
> > of the URL; 2. decode the URL in that charset to a java.lang.String; 3.
> > construct the correct header instance.
> >
> > Am I right?
> >
>
> Not really. The use of non-ASCII characters in HTTP head elements (such
> as headers or a request line) is a violation of the HTTP specification.
> You can explicitly override the standard charset with a non-standard one
> such as UTF-8 or GBK by setting the 'http.protocol.element-charset'
> parameter, but I do not think HttpClient should attempt to 'guess' the
> charset being used.
>
> For details see:
>
> http://jakarta.apache.org/commons/httpclient/charencodings.html
>
> Oleg
>
> > Feng
