hc-dev mailing list archives

From "Sung-Gu" <jeri...@apache.org>
Subject Re: Encoding of special characters in request URI
Date Fri, 11 Jul 2003 10:01:10 GMT

Please see the constructors that take a charset argument.
They may help you write your own code for this.

Sung-Gu
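
For readers of the archive, here is a minimal sketch of what "your own code" could look like for the problem described in the quoted thread, using only JDK classes rather than any HttpClient API. The class name QueryStringSketch and the helper formUrlEncode are hypothetical and exist only for illustration. The assumption is that the resulting, already-escaped string would then be handed to the method as a plain query string (for example through a String-based setQueryString overload) instead of going through setQueryString(NameValuePair[]); check that against the HttpClient version in use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryStringSketch {

    /**
     * Form-urlencodes name/value pairs with a caller-chosen charset.
     * Hypothetical helper, not part of HttpClient.
     */
    public static String formUrlEncode(String[][] pairs, String charset) {
        StringBuilder sb = new StringBuilder();
        try {
            for (String[] pair : pairs) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(pair[0], charset))
                  .append('=')
                  .append(URLEncoder.encode(pair[1], charset));
            }
        } catch (UnsupportedEncodingException e) {
            // Unknown charset name: report it as a caller error.
            throw new IllegalArgumentException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The umlauts a, o, u are written as Unicode escapes so the
        // example does not depend on the source file encoding.
        String[][] params = { { "q", "\u00e4\u00f6\u00fc" } };
        System.out.println(formUrlEncode(params, "UTF-8"));      // q=%C3%A4%C3%B6%C3%BC
        System.out.println(formUrlEncode(params, "ISO-8859-1")); // q=%E4%F6%FC
        System.out.println(formUrlEncode(params, "US-ASCII"));   // q=%3F%3F%3F
    }
}
```

The output is pure ASCII in all three cases; only the input charset decides which bytes get escaped, and with US-ASCII the umlauts are already lost before escaping.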

----- Original Message ----- 
From: "Martin Schnyder" <martin.schnyder@insonic.com>
To: "Commons HttpClient Project" <commons-httpclient-dev@jakarta.apache.org>
Sent: Friday, July 11, 2003 4:53 PM
Subject: RE: Encoding of special characters in request URI


> Thanks for all the replies.
> Unfortunately, the method URI.setDefaultProtocolCharset(CHARSET) that Manuel
> proposed doesn't help in my case. It may work for PostMethod, but it doesn't
> for GetMethod when HttpMethodBase.setQueryString(NameValuePair[]) is used.
> setQueryString(NameValuePair[]) already selects the charset for the
> encoding (US-ASCII) and passes it through to URI.encode(), which
> finally encodes the strings. There, String.getBytes(charset), when called
> with US-ASCII, converts all special characters such as the German umlauts
> to ASCII code 63 (question mark).
>
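
The replacement behaviour described in the paragraph above can be reproduced with the JDK alone. This is only a small sketch; the class name AsciiReplacementDemo and the helper asciiBytes are made up for the example and are not part of HttpClient.

```java
import java.nio.charset.StandardCharsets;

class AsciiReplacementDemo {

    /** Encodes the text as US-ASCII and returns the unsigned byte values. */
    public static int[] asciiBytes(String text) {
        // getBytes substitutes the charset's replacement byte ('?', 63)
        // for every character that US-ASCII cannot represent.
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        int[] values = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            values[i] = raw[i] & 0xFF;
        }
        return values;
    }

    public static void main(String[] args) {
        // "Gr" + u-umlaut + sharp s + "e", written with Unicode escapes.
        for (int v : asciiBytes("Gr\u00fc\u00dfe")) {
            System.out.print(v + " ");
        }
        System.out.println(); // 71 114 63 63 101
    }
}
```

Once the bytes are 63, percent-encoding can no longer recover the original characters, which is why the charset has to be chosen before this step.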
> There is currently no way to specify a charset other than US-ASCII for
> the encoding with HttpMethodBase.setQueryString(NameValuePair[]). I think it
> would be good if URI.getDefaultProtocolCharset() were used for the charset
> instead of the constant US-ASCII (UTF-8 would then be the default), or if
> there were another way to specify a different charset.
>
> Martin
>
>
> > -----Original Message-----
> > From: Oleg Kalnichevski [mailto:olegk@apache.org]
> > Sent: Donnerstag, 10. Juli 2003 19:12
> > To: Commons HttpClient Project
> > Subject: Re: Encoding of special characters in request URI
> >
> >
> > This is one of many 'shady' areas of the HTTP spec. Basically there is
> > no standard way for the client to communicate to the server what character
> > encoding should be used to decode query parameters. I believe some browsers
> > use 'Accept-Charset' or 'Accept-Language' headers to negotiate the locale
> > settings to be used by the server. But I am not sure if these headers
> > can be used to determine what character encoding should be used to decode
> > URL-encoded data.
> >
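
To illustrate the point above about the server not knowing the client's charset, here is a small JDK-only sketch. The class name CharsetMismatchDemo and the helper roundTrip are hypothetical; the "client" and "server" are simply the encode and decode halves of one method, not real HTTP endpoints.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class CharsetMismatchDemo {

    /** URL-encodes with one charset, then URL-decodes with another. */
    public static String roundTrip(String text, String clientCharset, String serverCharset) {
        try {
            String onTheWire = URLEncoder.encode(text, clientCharset); // ASCII only
            return URLDecoder.decode(onTheWire, serverCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00e4"; // a-umlaut as a Unicode escape

        // Both sides agree: the value survives.
        System.out.println(text.equals(roundTrip(text, "UTF-8", "UTF-8"))); // true

        // Client sends UTF-8, server assumes ISO-8859-1: two wrong characters.
        System.out.println("\u00c3\u00a4".equals(roundTrip(text, "UTF-8", "ISO-8859-1"))); // true

        // Client sends ISO-8859-1, server assumes UTF-8: the value is not recovered.
        System.out.println(text.equals(roundTrip(text, "ISO-8859-1", "UTF-8"))); // false
    }
}
```

Whatever default the client picks, it only helps if the receiving side decodes with the same charset.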
> > I think we definitely should not be using US-ASCII by default. The
> > whole point of URL encoding is to escape non-ASCII characters. I suggest
> > UTF-8 be used by default.
> >
> > Oleg
> >
> >
> >
> > On Thu, 2003-07-10 at 17:48, Michael Becke wrote:
> > > Hello Martin,
> > >
> > > This is a good question, one that I am not positive I know the answer
> > > to.  The HTTP request line (containing the query params) must be
> > > US-ASCII.  That I am sure of.  The catch is that form-urlencoding
> > > strings makes them ASCII, regardless of the original charset.  So
> > > HttpMethod.setQueryString(NameValuePair[]) is assuming that the
> > > inputs (query params) are ASCII when really only the output (encoded
> > > params) should be ASCII.
> > >
> > > The question is how does one determine, on the client and the server,
> > > what the charset of the query params is?  The request charset can be
> > > specified with the Content-Type header, but this is meant to apply to
> > > the request entity, not the headers.  I have a feeling that we should
> > > probably be using the content charset anyway.  My reasoning here is that
> > > an HTML form can be sent via a GET (query params) or POST (post content).
> > > In both cases the content must be form-urlencoded and my feeling is
> > > that it should be done the same for both.
> > >
> > > What does everyone else think?
> > >
> > > Mike
> > >
> > > Martin Schnyder wrote:
> > > > When I use the GetMethod class to send text with special characters
> > > > (German Umlaute "äöü") in the request parameters, the special
> > > > characters are not encoded correctly. This happens when I use method
> > > > HttpMethodBase.setQueryString(NameValuePair[] params)
> > > > to set the query parameters.
> > > >
> > > > I saw that Release 2.0 Beta 2 fixed that with bug fix 20481. Special
> > > > characters are now encoded differently but still wrong, as far as I
> > > > can see.
> > > >
> > > > Method HttpMethodBase.setQueryString(NameValuePair[]) calls
> > > > formUrlEncode(params, HttpConstants.HTTP_ELEMENT_CHARSET) to encode the
> > > > parameters. The value of HTTP_ELEMENT_CHARSET is US-ASCII. When I change
> > > > the charset to HttpConstants.DEFAULT_CONTENT_CHARSET (which is
> > > > ISO-8859-1), the German "Umlaute" are encoded correctly. I checked that
> > > > with the code in CVS HEAD. Is this a bug, or should only US-ASCII
> > > > characters really be supported in a request URI?
> > > >
> > > > Regards,
> > > > Martin Schnyder
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: commons-httpclient-dev-unsubscribe@jakarta.apache.org
> > > > For additional commands, e-mail: commons-httpclient-dev-help@jakarta.apache.org
> > > >
