hc-httpclient-users mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject Re: HTTPClient - HTTP Gets broken with there is a #anchor in the Redirect (301) URL
Date Mon, 24 Oct 2011 13:31:36 GMT
On Mon, 2011-10-24 at 14:15 +1100, Jack Hatch wrote:
> Hey all,
> 
> Bit of a weird one. I'm using HTTPClient 4.1.2, and it seems that whenever
> it finds a URL with something like a '#' in it, it does a full GET with
> the # in the URL.
> 
> For example, trying to get the URL http://stks.co/eWt will redirect to the
> URL
> http://news.ichinastock.com/2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter.
> Now this URL is live, but the problem is that HTTPClient sends a GET request
> with the URI set to
> /2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter
> which causes the server to send back a 404 page not found.
> 
> Looking at the GET sent by IE, Firefox and cURL, they all strip out the #...
> from the end of the URI, so for example the cURL GET request URI is set as
> URI: /2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/ -
> all the #... have been removed. This is for the exact same entry URL of
> http://stks.co/eWt.
> 
> As a test, sending this raw URL into HTTPClient (i.e. HttpGet httpget = new
> HttpGet("
> http://news.ichinastock.com/2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter
> ");) gives the same 404 not found result.
> The issue is I don't know whether the URL has an #anchor in it, as it comes
> from a short URL service...
> 
> So the question is are there any settings in HTTPClient that can be set so
> that things like the trailing #... can be auto removed from URLs. Or how
> would I go about manually removing this from URLs (remember that I would
> need to capture all redirect URLs as well)?
> 
> Cheers!

You can use a custom RedirectStrategy and reformat / modify redirect
locations as you see fit. Most likely all you need is to subclass the
DefaultRedirectStrategy and override its #createLocationURI method.

Oleg
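
The fragment-stripping step described in the reply can be sketched on its
own, using only java.net.URI. The class and method names below
(FragmentStripper, stripFragment) are hypothetical and not part of
HttpClient. In HttpClient 4.1.x the helper would presumably be called from
a subclass of DefaultRedirectStrategy that overrides
createLocationURI(String location), passing the stripped string on to the
superclass method, with the subclass installed via setRedirectStrategy on
the client; check the exact signatures against the Javadoc for your
version. The same helper could be applied to the initial request URL
before constructing the HttpGet.

```java
import java.net.URI;

// Hypothetical helper, not part of HttpClient: removes the "#fragment"
// part of a URL so that it is never sent to the server.
final class FragmentStripper {

    private FragmentStripper() {
    }

    // Returns the location without its fragment. In a URI, '#' is only
    // legal as the fragment delimiter, so everything from the first '#'
    // onward can be dropped.
    static URI stripFragment(String location) {
        String trimmed = location.trim();
        int hash = trimmed.indexOf('#');
        String withoutFragment = (hash >= 0) ? trimmed.substring(0, hash) : trimmed;
        // URI.create throws IllegalArgumentException on a malformed URI.
        return URI.create(withoutFragment);
    }

    public static void main(String[] args) {
        String redirect = "http://news.ichinastock.com/2011/10/"
                + "jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/"
                + "#.Tpw-xG61XjU.twitter";
        // Prints the URL ending in ".../acquire-yahoo/" with no fragment.
        System.out.println(stripFragment(redirect));
    }
}
```

Cutting the string before parsing, rather than rebuilding the URI from its
components, avoids re-encoding any percent-escapes in the path or query.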



