hc-httpclient-users mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject Re: double-slash in url causes circular redirect
Date Wed, 29 Jun 2011 19:10:28 GMT
On Wed, 2011-06-29 at 12:21 +0200, khiem nguyen wrote:
> Hi, I tried to retrieve the content of this link:
> 
> http://de.tommy.com//Sale/600000,de_DE,sc.html
> 
> 
> and got a circular redirect. Logging tells me that HttpClient sends:
> GET /Sale/600000,de_DE,sc.html
> and the server responds with a redirect back to
> http://de.tommy.com//Sale/600000,de_DE,sc.html
> 
> wget behaves like a browser and returns the content.
> 
> 
> with telnet:
> 
> 
> telnet de.tommy.com 80
> Trying 89.202.105.72...
> Connected to de.tommy.com.
> Escape character is '^]'.
> GET /Sale/600000,de_DE,sc.html HTTP/1.1
> Host:de.tommy.com
> 
> HTTP/1.1 301 Moved Permanently
> Date: Wed, 29 Jun 2011 10:11:15 GMT
> Server: Apache
> Content-Length: 0
> Set-Cookie: dwsid=CvVvWMuShdGfstjxicXY9lJb8Fk8gkMT8xV8zGEU_X1Y81Rt4F-469BS_cTJZ4hHcE7f5NVeacb1VKcXHFEKGg==; path=/; HttpOnly
> Cache-Control: no-cache,no-store,must-revalidate
> Pragma: no-cache
> Expires: Thu, 01 Dec 1994 16:00:00 GMT
> Location: http://de.tommy.com//Sale/600000,de_DE,sc.html
> Vary: Accept-Encoding
> Accept-Ranges: bytes
> Content-Type: text/plain
> 
> Connection closed by foreign host.
> -----
> 
> 
> telnet de.tommy.com 80
> Trying 89.202.105.72...
> Connected to de.tommy.com.
> Escape character is '^]'.
> GET //Sale/600000,de_DE,sc.html HTTP/1.1
> Host: de.tommy.com
> 
> HTTP/1.1 200 OK
> Date: Wed, 29 Jun 2011 10:07:11 GMT
> Server: Apache
> Set-Cookie: ....
> ....content
> 
> 
> ...
> 
> It seems like HttpClient strips out one of the two slashes.
> Is it a bug, or is the server misconfigured? (I guess they use rewrite
> rules or something similar, but it's not rare.)
> 
> How can I fix this?
> Thanks
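
The difference between the two telnet requests can be reproduced with java.net.URI alone. This is an illustrative sketch, not HttpClient code: `DoubleSlashDemo` and its `collapseLeadingSlashes` helper are hypothetical names, and the helper only imitates the effect seen in the log (one leading slash dropped), not HttpClient's actual normalization logic. It also shows why such paths are ambiguous: taken on its own, `//Sale/...` parses as a network-path reference whose host is `Sale`.

```java
import java.net.URI;

public class DoubleSlashDemo {

    // Hypothetical helper for illustration only (not HttpClient's code):
    // collapses a run of leading slashes in a path down to a single slash,
    // which turns "//Sale/..." into "/Sale/...".
    static String collapseLeadingSlashes(String path) {
        int n = 0;
        while (n < path.length() && path.charAt(n) == '/') {
            n++;
        }
        return n > 1 ? path.substring(n - 1) : path;
    }

    public static void main(String[] args) {
        URI url = URI.create("http://de.tommy.com//Sale/600000,de_DE,sc.html");

        // As an absolute URL, the host is de.tommy.com and the raw path keeps both slashes.
        System.out.println(url.getHost());     // de.tommy.com
        System.out.println(url.getRawPath());  // //Sale/600000,de_DE,sc.html

        // Collapsing the leading slashes yields the request-target that this
        // server answered with a 301 back to the double-slash URL.
        System.out.println(collapseLeadingSlashes(url.getRawPath())); // /Sale/600000,de_DE,sc.html

        // Reused on its own as a relative reference, "//Sale/..." is a network-path
        // reference: "Sale" becomes the host instead of the first path segment.
        URI relative = URI.create(url.getRawPath());
        System.out.println(relative.getHost()); // Sale
        System.out.println(URI.create("http://de.tommy.com/").resolve(relative));
        // http://Sale/600000,de_DE,sc.html
    }
}
```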

The redirect returned by the server is malformed.

http://www.ietf.org/rfc/rfc2396.txt

---
3.3. Path Component

   The path component contains data, specific to the authority (or the
   scheme if there is no authority component), identifying the resource
   within the scope of that scheme and authority.

      path          = [ abs_path | opaque_part ]

      path_segments = segment *( "/" segment )
      segment       = *pchar *( ";" param )
      param         = *pchar

      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","

   The path may consist of a sequence of path segments separated by a
   single slash "/" character.  Within a path segment, the characters
   "/", ";", "=", and "?" are reserved.  Each path segment may include a
   sequence of parameters, indicated by the semicolon ";" character.
   The parameters are not significant to the parsing of relative
   references.

---
The path element of the URI is not supposed to contain multiple consecutive
slashes. Such URIs are ambiguous, and whichever way HttpClient tries to
normalize them, it cannot get it right every time. You have two options
here: turn off automatic redirect handling and process redirects manually,
or implement a custom RedirectStrategy.
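
A minimal sketch of the first option (handling redirects manually), assuming Java 16+ and using only java.net.URI rather than HttpClient's API. `ManualRedirects`, `Fetcher`, `Response` and `MAX_REDIRECTS` are hypothetical names, and the in-memory fake server mimics the telnet session quoted above. Two design points matter here: the `Location` value is resolved with `URI.resolve` without any path normalization, and loop detection compares URIs exactly, so `/Sale/...` and `//Sale/...` count as different targets. A real `Fetcher` (for example one built on `HttpURLConnection` with `setInstanceFollowRedirects(false)`) would also have to send the raw path on the request line unchanged; that is an assumption this sketch does not verify.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ManualRedirects {

    // Hypothetical minimal response: status code plus Location header (null if absent).
    record Response(int status, String location) {}

    // Hypothetical transport abstraction: performs a GET for exactly the URI given.
    interface Fetcher {
        Response get(URI uri) throws Exception;
    }

    static final int MAX_REDIRECTS = 5;

    // Follows 301/302/303/307/308 responses by hand. The Location value is
    // resolved against the current URI and the path is never normalized,
    // so "//Sale/..." stays "//Sale/...".
    static URI follow(Fetcher fetcher, URI start) throws Exception {
        Set<String> seen = new HashSet<>();
        URI current = start;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            // Exact string comparison: only a truly identical URI counts as a loop.
            if (!seen.add(current.toString())) {
                throw new IllegalStateException("Circular redirect to " + current);
            }
            Response r = fetcher.get(current);
            int s = r.status();
            boolean redirect = s == 301 || s == 302 || s == 303 || s == 307 || s == 308;
            if (!redirect || r.location() == null) {
                return current; // URI that produced the final, non-redirect response
            }
            current = current.resolve(r.location());
        }
        throw new IllegalStateException("Too many redirects from " + start);
    }

    public static void main(String[] args) throws Exception {
        // Fake server mimicking the telnet session: "/Sale/..." redirects to the
        // double-slash URL, while "//Sale/..." returns 200.
        Map<String, Response> server = new HashMap<>();
        server.put("/Sale/600000,de_DE,sc.html",
                new Response(301, "http://de.tommy.com//Sale/600000,de_DE,sc.html"));
        server.put("//Sale/600000,de_DE,sc.html", new Response(200, null));
        Fetcher fake = uri -> server.getOrDefault(uri.getRawPath(), new Response(404, null));

        URI result = follow(fake, URI.create("http://de.tommy.com/Sale/600000,de_DE,sc.html"));
        System.out.println(result); // http://de.tommy.com//Sale/600000,de_DE,sc.html
    }
}
```

Run against the fake server, `follow` ends at the double-slash URL after a single redirect instead of reporting a loop.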

Hope this helps

Oleg   





