hc-dev mailing list archives

From Ortwin Glück (JIRA) <j...@apache.org>
Subject [jira] Issue Comment Edited: (HTTPCLIENT-787) Redirects with spaces in them are not handled correctly
Date Wed, 16 Jul 2008 15:56:31 GMT

    [ https://issues.apache.org/jira/browse/HTTPCLIENT-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613996#action_12613996
] 

oglueck edited comment on HTTPCLIENT-787 at 7/16/08 8:54 AM:
------------------------------------------------------------------

This is a server bug, not a client issue. And you have heard it before: "HttpClient is not
a browser". Also, as mentioned dozens of times before, URLs cannot be escaped once they are
represented as a series of bytes. Yes, in this particular case it is sort of possible. But
please consider:

a) we can't know the right encoding for the URL. UTF-8 is only a recommendation. So assuming
an ASCII-compatible encoding is arbitrary.
b) Replacing spaces with %20 is correct for spaces in URI components (like path), but it is
wrong for spaces in application/x-www-form-urlencoded values of HTML forms (query string):
they use a plus sign + to escape spaces.
     And we have no reason to assume that the query string uses x-www-form-urlencoded. It
could just use anything.
    As specified here: http://www.w3.org/TR/html401/interact/forms.html#form-content-type
    see also: http://marc.info/?l=httpclient-commons-dev&m=116859139319469&w=2
  
   Sample: http://people.apache.org/~oglueck/composite path/servlet?composite name=composite value
   is correctly escaped like so: http://people.apache.org/~oglueck/composite%20path/servlet?composite+name=composite+value
c) you can implement your own redirect handler that can handle all sorts of malformed responses
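
To make points a) and b) concrete, here is a minimal sketch using only JDK classes. The class
and method names (EscapingDemo, escapePath, escapeFormValue) are made up for illustration and
are not part of HttpClient. It shows that the same space is escaped differently depending on
where it sits, and that the same non-ASCII character yields different escapes depending on the
charset one assumes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

public class EscapingDemo {

    // Escapes a path as a URI component: a space becomes %20.
    static String escapePath(String host, String path) {
        try {
            // The multi-argument URI constructor quotes characters that are illegal in a path.
            return new URI("http", host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Escapes a value as application/x-www-form-urlencoded: a space becomes +.
    // The charset is a parameter because the right one cannot be known from the URL alone.
    static String escapeFormValue(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // http://people.apache.org/~oglueck/composite%20path/servlet
        System.out.println(escapePath("people.apache.org", "/~oglueck/composite path/servlet"));

        // composite+name=composite+value
        System.out.println(escapeFormValue("composite name", "UTF-8")
                + "=" + escapeFormValue("composite value", "UTF-8"));

        // Same character (e with acute accent), two different escapes: %C3%A9 and %E9
        System.out.println(escapeFormValue("\u00e9", "UTF-8"));
        System.out.println(escapeFormValue("\u00e9", "ISO-8859-1"));
    }
}
```

A blanket replacement of " " with "%20" cannot tell these cases apart, which is the objection
raised above.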

To be fair, for *most* servers out there this shouldn't be a problem, because:
a) they expect URI encodings to be UTF-8
b) they all have "compatible" (broken) parsers that allow + and %20 to be used interchangeably

However, the really relevant point is: if the server does not even care to escape the space
character, it will most likely not escape any other non-URI characters either, probably because
of some careless programming. Such a server or application grossly violates the HTTP protocol
and should be considered broken.

I would like to mark this issue as invalid.

Maybe a good thing to have would be a "CompatibilityRedirectHandler" that imitates the convenient
behaviour of popular browsers. Consider contributing one.
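
As a rough idea of what the core of such a handler might do, here is a sketch of a lenient
repair step for a malformed Location value. It is not an existing HttpClient class: the names
LenientLocation and repair are hypothetical, and the wiring into a redirect handler is left out
because it depends on the HttpClient version. It deliberately makes the arbitrary choices
criticised above: it assumes UTF-8, and it writes spaces as %20 everywhere, including the query
string.

```java
import java.nio.charset.StandardCharsets;

public class LenientLocation {

    // ASCII punctuation passed through unchanged: RFC 3986 unreserved and reserved
    // characters, plus '%' so that escapes the server already produced are kept as-is.
    private static final String PASS_THROUGH = "-._~:/?#[]@!$&'()*+,;=%";

    // Percent-encodes every byte that may not appear literally in a URI.
    // ASSUMPTION: the header value is interpreted as UTF-8, which is a guess.
    static String repair(String location) {
        StringBuilder out = new StringBuilder();
        for (byte b : location.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum || (c < 0x80 && PASS_THROUGH.indexOf(c) >= 0)) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(repair(
                "http://people.apache.org/~oglueck/composite path/servlet?composite name=composite value"));
    }
}
```

Note that for the sample URL this produces composite%20name=composite%20value in the query
string rather than the + form, so it only works against servers that accept both.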

      was (Author: oglueck):
    This is a server bug, not a client issue. And you have heard it before: "HttpClient is
not a browser". Also, as mentioned dozens of times before, URLs cannot be escaped once they
are represented as a series of bytes. Yes, in this particular case it is sort of possible.
But please consider:

a) we can't know the right encoding for the URL. UTF-8 is only a recommendation. So assuming
an ASCII-compatible encoding is arbitrary.
b) Replacing spaces with %20 is correct for spaces in URI components (like path), but it is
wrong for spaces in application/x-www-form-urlencoded values of HTML forms (query string):
they use a plus sign + to escape spaces.
     And we have no reason to assume that the query string uses x-www-form-urlencoded. It
could just use anything.
    As specified here: http://www.w3.org/TR/html401/interact/forms.html#form-content-type
    see also: http://marc.info/?l=httpclient-commons-dev&m=116859139319469&w=2
  
   Sample: http://people.apache.org/~oglueck/composite path/servlet?composite name=composite value
   is correctly escaped like so: http://people.apache.org/~oglueck/composite%20path/servlet?composite+name=composite+value

To be fair, for *most* servers out there this shouldn't be a problem, because:
a) they expect URI encodings to be UTF-8
b) they all have "compatible" (broken) parsers that allow + and %20 to be used interchangeably
c) you can implement your own redirect handler that can handle all sorts of malformed responses

However, the really relevant point is: if the server does not even care to escape the space
character, it will most likely not escape any other non-URI characters either, probably because
of some careless programming. Such a server or application grossly violates the HTTP protocol
and should be considered broken.

I would like to mark this issue as invalid.

Maybe a good thing to have would be a "CompatibilityRedirectHandler" that imitates the convenient
behaviour of popular browsers. Consider contributing one.
  
> Redirects with spaces in them are not handled correctly
> -------------------------------------------------------
>
>                 Key: HTTPCLIENT-787
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-787
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient
>            Reporter: Dave Clemmer
>            Priority: Minor
>
> If a redirect address has spaces in it (yes, I know, the person creating that situation
> should be beaten, but, alas, that is not an option), they are not converted to %20 before
> opening, and, hence, fail to open.
> changing line 107 of DefaultRedirectHandler to
> String location = locationHeader.getValue().replaceAll (" ", "%20");
> seems to fix it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org