hc-dev mailing list archives

From Vincent Chain <vc_0...@yahoo.com>
Subject Cannot use HttpClient to search google
Date Sun, 11 Jul 2004 07:15:08 GMT
I am running into an interesting issue. I suspect the problem is not in HttpClient itself,
but I haven't figured out how to make it work yet. I am using version 2.0.
 
What I am trying to do is simple enough: I want to use HttpClient to simulate a typical browser
request to search Google, for example with a query like
 
http://www.google.com/search?hl=en&ie=UTF-8&q=sql+server+trace
 
I used code like the following:
 
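// Classes are from org.apache.commons.httpclient (HttpClient) and
// org.apache.commons.httpclient.methods (GetMethod).
HttpClient client = new HttpClient();
GetMethod m = new GetMethod("http://www.google.com/search?hl=en&ie=UTF-8&q=sql+server+trace");
int s = client.executeMethod(m); // s is the HTTP status code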
 
Now s is always 403 for me (and this 403 should have nothing to do with a proxy). The body
of the response is basically Google saying that the request is forbidden (apparently because it
reached a host that the client is not supposed to use)... The response is too large for this
email, but it looks like this:
 
<html><head><title>403 Forbidden</title>....
<blockquote><H1>Forbidden</H1>Your client does not have permission to get
URL <code>/search?hl=en&amp;ie=UTF-8&amp;q=sql+server+trace</code> from
this server.  (Client IP address: xx.xx.xx.xx)<br><br>Please see Google's Terms
of Service posted at http://www.google.com/terms_of_service.html
....
 
I guess the main reason is that Google uses Akamai's network to distribute load. When I run
nslookup for www.google.com on my server, the DNS records returned have very short TTLs:
from several seconds to a couple of minutes. I guess this forces a browser to issue another
DNS query the next time I do a search. With HttpClient, however, the same IP address always
seems to be used for www.google.com, and the short lifetime of the DNS entry seems to be
ignored. I think that because HttpClient opened a socket to this 'old' IP address, Google
somehow decided it was not a valid request and rejected it.
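
To test the DNS-caching guess above independently of HttpClient, a minimal sketch along the
following lines may help. It assumes the JVM's own address cache is what keeps the old IP
around, and it uses the standard networkaddress.cache.ttl security property. The class name
DnsTtlCheck, its two methods, and the 30-second value are hypothetical choices for
illustration only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of HttpClient.
class DnsTtlCheck {

    // Ask the JVM to cache successful lookups for at most `seconds` seconds.
    // The property should be set before the first lookup in the process,
    // because the JVM may read it only once. Returns the value read back.
    static String configureTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", String.valueOf(seconds));
        return Security.getProperty("networkaddress.cache.ttl");
    }

    // Resolve a host name and return every address as a string.
    // Returns an empty list if the lookup fails.
    static List<String> resolve(String host) {
        List<String> result = new ArrayList<String>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                result.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Lookup failed; leave the list empty.
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("cache ttl = " + configureTtl(30));
        System.out.println("addresses = " + resolve("www.google.com"));
    }
}
```

If the addresses printed by this check change between lookups spaced further apart than the
TTL while the 403 persists, the JVM cache is probably not the cause and something else about
the request would be worth comparing against what a browser sends.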
 
I did a quick check of the HttpClient code, and it seems to me that the socket it uses to open
the connection is a plain java.net.Socket (DefaultProtocolSocketFactory::createSocket),
so I guess HttpClient is not directly responsible for the problem here. Nevertheless, I
wonder whether anyone else is having a similar issue? Especially since some of the HttpClient
sample code uses http://www.google.com, shouldn't it have the same problem?
 
Thanks a lot for any tips.
 
 
 
 

		