hc-dev mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject Re: URLEncodeUtils - change in format behaviour since 4.2
Date Tue, 26 Jun 2012 12:33:23 GMT
On Tue, 2012-06-26 at 11:41 +0100, sebb wrote:
> On 26 June 2012 08:46, Oleg Kalnichevski <olegk@apache.org> wrote:
> > On Tue, 2012-06-26 at 02:00 +0100, sebb wrote:
> >> The escaping of non-alphanumeric characters by the format methods is no
> >> longer quite the same as that done by java.net.URLEncoder.encode.
> >>
> >> The former allows the chars in ".-*_!'()" to pass through without
> >> conversion, whereas the latter only allows ".-*_" unchanged.
> >> The latter is also how browsers behave when escaping form fields.
> >>
> >> I think the behaviour should be consistent with URLEncoder and browsers.
> >> That was in fact the behaviour with 4.2, which delegated the escaping
> >> to URLEncoder.
> >> I think the code should revert to using URLEncoder/URLDecoder.
> >>
> >> There is still a need for the extended path, query and fragment
> >> escape/unescape methods, but perhaps these belong in URIBuilder?
> >> If not, maybe they should be in a separate class anyway?
> >>
> >
> > Would not that lead to inconsistent behavior when the same query form
> > gets encoded differently depending on whether it is enclosed in the
> > request URI or in the request body?
> 
> I don't think so, I think encodeFormFields could use a different safe
> character set without problems, so long as the safe set is a subset of
> all possible safe query characters. In fact the UNRESERVED BitSet is
> only currently used in URLEncodedUtils#encodeFormFields(), so I don't
> see how changing encodeFormFields to use a different safe set can
> affect anything.
> 
> Besides, AFAIK 4.2 did not have a problem with using a more limited safe set.
> 
> > Browsers do a lot of silly stuff to maximize compatibility with all
> > sorts of broken software out there. I am not sure we need to do
> > likewise.
> 
> Well-written software will be able to deal with form data that has
> some additional safe characters encoded, so I don't think there is any
> problem in playing safe here.
> 
> [But if we do decide to change the safe list from the one previously
> used, it needs to be flagged up in the release notes.]
> 

Likewise, well-written software should be able to deal with form data
containing valid URL-encoded content. To me this is more about doing the
right thing than about making sure some broken code is unaffected.
Having said all that, I see no problem with reducing the set of safe
characters in the URI query to the bare minimum.

Oleg
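
For reference, the difference in safe characters discussed in this thread can be checked directly against java.net.URLEncoder. The following is only a minimal sketch: the class name SafeCharsDemo and its helper methods are hypothetical, invented for illustration, and are not part of HttpClient. It tests each character of ".-*_!'()" and reports which ones URLEncoder leaves unchanged.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of HttpClient.
public class SafeCharsDemo {

    // Encodes a string with java.net.URLEncoder using UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Returns the characters of `candidates` that URLEncoder leaves unchanged.
    static String passedThrough(String candidates) {
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            String s = String.valueOf(c);
            if (enc(s).equals(s)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Characters the 4.2 format methods are reported to pass through.
        System.out.println(passedThrough(".-*_!'()")); // prints .-*_
        System.out.println(enc("a!b'c(d)e"));          // prints a%21b%27c%28d%29e
    }
}
```

Running it shows that URLEncoder escapes "!", "'", "(" and ")" while passing ".-*_" through, which matches the behaviour described at the start of the thread.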



