hc-dev mailing list archives

From "Martin Schnyder" <martin.schny...@insonic.com>
Subject RE: Encoding of special characters in request URI
Date Fri, 11 Jul 2003 07:53:10 GMT
Thanks for all the replies.
Unfortunately, the method URI.setDefaultProtocolCharset(CHARSET) that Manuel
proposed doesn't help in my case. It may work for PostMethod but it doesn't
for GetMethod when HttpMethodBase.setQueryString(NameValuePair[]) is used.
Method setQueryString(NameValuePair[]) already selects the charset for the
encoding (US-ASCII) and this is passed through to the method URI.encode()
that finally encodes the strings. There, the method
String.getBytes(charset), when called with US-ASCII, converts every
non-ASCII character, such as the German umlauts, to ASCII code 63 (question mark).
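
To illustrate the replacement described above, here is a minimal sketch that
only uses the JDK, not HttpClient itself. The class name AsciiLossDemo and the
helper toAscii are made up for this example, and StandardCharsets assumes a
modern JDK (Java 7 or later). String.getBytes(Charset) is documented to
substitute the charset's replacement byte for unmappable characters, which
for US-ASCII is the question mark.

```java
import java.nio.charset.StandardCharsets;

public class AsciiLossDemo {
    // Hypothetical helper: encodes the text as US-ASCII bytes.
    // Characters that US-ASCII cannot represent are replaced, not rejected.
    static byte[] toAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String umlauts = "\u00e4\u00f6\u00fc"; // the three umlauts a, o, u
        for (byte b : toAscii(umlauts)) {
            // Each umlaut comes out as byte 63, the code for '?'
            System.out.println(b);
        }
    }
}
```

Once the original characters have been replaced this way, no later
percent-encoding step can recover them.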

There is currently no way to specify a charset other than US-ASCII for
the encoding done by HttpMethodBase.setQueryString(NameValuePair[]). I think
it would be good if the method used URI.getDefaultProtocolCharset() instead
of the constant US-ASCII (UTF-8 would then be used as the default), or if
there were another way to specify a different charset.
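
As a rough illustration of why the choice of charset matters, the sketch
below percent-encodes the same value with two different charsets. It uses
java.net.URLEncoder from the JDK as a stand-in, not HttpClient's own
URI.encode(), so the exact escaping rules may differ from what HttpClient
does. The class name QueryCharsetDemo and the helper encode are made up for
this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryCharsetDemo {
    // Hypothetical helper: percent-encodes one query value with the named charset.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String value = "\u00e4\u00f6\u00fc"; // the three umlauts a, o, u
        System.out.println(encode(value, "UTF-8"));      // %C3%A4%C3%B6%C3%BC
        System.out.println(encode(value, "ISO-8859-1")); // %E4%F6%FC
    }
}
```

Both results are pure ASCII on the wire, but the server has to know which
charset was used in order to decode them back to the original characters.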

Martin


> -----Original Message-----
> From: Oleg Kalnichevski [mailto:olegk@apache.org]
> Sent: Thursday, 10 July 2003 19:12
> To: Commons HttpClient Project
> Subject: Re: Encoding of special characters in request URI
>
>
> This is one of many 'shady' areas of the HTTP spec. Basically there is
> no standard way for the client to communicate to the server what character
> encoding has been used to encode query parameters. I believe some browsers
> use 'Accept-Charset' or 'Accept-Language' headers to negotiate the locale
> settings to be used by the server. But I am not sure if these headers
> can be used to determine what character encoding should be used to decode
> URL-encoded data.
>
> I think we definitely should not be using US-ASCII by default. The
> whole point of URL encoding is to escape non-ASCII characters. I suggest
> UTF-8 be used by default.
>
> Oleg
>
>
>
> On Thu, 2003-07-10 at 17:48, Michael Becke wrote:
> > Hello Martin,
> >
> > This is a good question, one that I am not positive I know the answer
> > to.  The HTTP request line (containing the query params) must be
> > US-ASCII.  That I am sure of.  The catch is that form urlencoding
> > strings makes them ASCII, regardless of the original charset.  So
> > HttpMethod.setQueryString(NameValuePair[]) is assuming that the
> > inputs(query params) are ASCII when really only the output(encoded
> > params) should be ASCII.
> >
> > The question is how does one determine, on the client and the server,
> > what the charset of the query params is?  The request charset can be
> > specified with the Content-Type header, but this is meant to apply to
> > the request entity, not the headers.  I have a feeling that we should
> > probably be using the content charset anyway.  My reasoning here is that
> > an HTML form can be sent via a GET(query params) or POST(post content).
> >   In both cases the content must be form urlencoded and my feeling is
> > that it should be done the same for both.
> >
> > What does everyone else think?
> >
> > Mike
> >
> > Martin Schnyder wrote:
> > > When I use the GetMethod class to send text with special characters (German
> > > Umlaute "äöü") in the request parameters, the special characters are not
> > > encoded correctly. This happens when I use method
> > > HttpMethodBase.setQueryString(NameValuePair[] params)
> > > to set the query parameters.
> > >
> > > I saw that Release 2.0 Beta 2 fixed that with bug fix 20481. Special
> > > characters are now encoded differently but still wrong, as far as I can see.
> > >
> > > Method HttpMethodBase.setQueryString(NameValuePair[]) calls
> > > formUrlEncode(params, HttpConstants.HTTP_ELEMENT_CHARSET) to encode the
> > > parameters. The value of HTTP_ELEMENT_CHARSET is US-ASCII. When I change the
> > > charset to HttpConstants.DEFAULT_CONTENT_CHARSET (which is ISO-8859-1), the
> > > German "Umlaute" are encoded correctly. I checked that with the code in CVS
> > > HEAD. Is this a bug or should really only the US-ASCII characters be
> > > supported in a request URI?
> > >
> > > Regards,
> > > Martin Schnyder
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: commons-httpclient-dev-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: commons-httpclient-dev-help@jakarta.apache.org
> > >

