commons-user mailing list archives

From "Frank W. Zammetti" <fzli...@omnytex.com>
Subject Re: [HTTPClient] Unable to get post working
Date Fri, 23 Sep 2005 22:47:36 GMT
Hi Oleg,

Oleg Kalnichevski wrote:
> Frank,
> 
> (1) Google search works with HTTP GET type of requests. It does not
> implement HTTP POST

Yeah, that would certainly explain it :)
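
The plain HTML form in the original message has no method attribute, so a browser submits it as GET, with the fields encoded into the query string rather than a request body. A minimal sketch of that encoding, using only the JDK, might look like the following. The class and method names (LuckyQueryUrl, buildUrl, encode) are hypothetical, and UTF-8 is an assumed charset; the URL and field names are taken from the quoted form. It only builds a string and sends nothing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of HttpClient: builds the URL a browser
// would request when the quoted form is submitted with GET.
final class LuckyQueryUrl {

    static final String SEARCH_URL = "http://www.google.com/search";

    // Form-encodes one value (space becomes '+', reserved characters
    // become %XX). UTF-8 is an assumption made for this sketch.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Joins the search terms with single spaces and appends the three
    // form fields (hl, q, btnI) as a query string.
    static String buildUrl(List<String> searchTerms) {
        StringBuilder query = new StringBuilder();
        for (Iterator<String> it = searchTerms.iterator(); it.hasNext();) {
            query.append(it.next());
            if (it.hasNext()) {
                query.append(' ');
            }
        }
        return SEARCH_URL
            + "?hl=" + encode("en")
            + "&q=" + encode(query.toString())
            + "&btnI=" + encode("I'm Feeling Lucky");
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(Arrays.asList("apache", "commons")));
    }
}
```

With the HttpClient 3.x API used in the thread, the equivalent would presumably be a GetMethod with the fields passed to setQueryString(NameValuePair[]), since addParameter() belongs to PostMethod.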

> (2) By using HttpClient to execute requests against Google's services
> you violate Google's Terms of Service [1]:
> ========================================================================================
> No Automated Querying 
> 
> You may not send automated queries of any sort to Google's system
> without express permission in advance from Google. Note that "sending
> automated queries" includes, among other things: 
> 
>       * using any software which sends queries to Google to determine
>         how a website or webpage "ranks" on Google for various queries;
>       * "meta-searching" Google; and
>       * performing "offline" searches on Google.
> 
> Please do not write to Google to request permission to "meta-search"
> Google for a research project, as such requests will not be granted.
> ========================================================================================

I was not aware of this; thank you for pointing it out.  I certainly 
do not want to be violating anything.  On the other hand, what I'm doing 
is arguably *not* an automated query, since it requires direct user 
action, but I'm not sure how Google would view it :)

But, in the interest of not running into trouble, I'll go use their Web 
Service interface instead, since it's pretty much designed explicitly for 
this.  The 1000-queries-per-day limit it has isn't a problem while I'm 
just doing a POC.

> It is absolutely trivial to fool Google Search into believing that
> requests are being executed using a common browser such as IE but I
> strongly urge you to not use HttpClient for this end

I will take your advice, thank you.

Frank

> Oleg
> 
> [1] http://www.google.com/terms_of_service.html
> 
> On Fri, 2005-09-23 at 13:50 -0400, Frank W. Zammetti wrote:
> 
>>Hi all... I'm trying to use HTTPClient to mimic Google's standard search 
>>form, as if a user clicked the "I'm Feeling Lucky" button, and sending 
>>the redirect back to the client (this is done from a servlet).  However, 
>>I can't seem to get the query parameters to go across.  Here's the code 
>>I've written:
>>
>>   public String doSearch(ArrayList searchTerms) {
>>     System.setProperty("org.apache.commons.logging.Log",
>>       "org.apache.commons.logging.impl.SimpleLog");
>>     System.setProperty("org.apache.commons.logging.simplelog.showdatetime",
>>       "true");
>>     System.setProperty("org.apache.commons.logging.simplelog.log.httpclient.wire.header",
>>       "debug");
>>     System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.commons.httpclient",
>>       "debug");
>>     Log log = LogFactory.getLog(RandomSiteServlet.class);
>>     String redirectLocation = null;
>>     String GOOGLE_SEARCH_URL = "http://www.google.com/search";
>>     HttpClient client  = new HttpClient();
>>     PostMethod theMethod = new PostMethod(GOOGLE_SEARCH_URL);
>>     theMethod.setFollowRedirects(false);
>>     theMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
>>       new DefaultHttpMethodRetryHandler(3, false));
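>>     String searchQuery = "";
>>     for (Iterator it = searchTerms.iterator(); it.hasNext();) {
>>       // Separate terms with a space; starting from "" rather than null
>>       // avoids the literal text "null" being prepended by +=.
>>       if (searchQuery.length() > 0) {
>>         searchQuery += " ";
>>       }
>>       searchQuery += (String)it.next();
>>     }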
>>     theMethod.addParameter("hl", "en");
>>     theMethod.addParameter("q", searchQuery);
>>     theMethod.addParameter("btnI", "I'm Feeling Lucky");
>>     try {
>>       int statusCode = client.executeMethod(theMethod);
>>       byte[] responseBody = theMethod.getResponseBody();
>>       if (statusCode == HttpStatus.SC_MOVED_PERMANENTLY  ||
>>         statusCode == HttpStatus.SC_MOVED_TEMPORARILY ||
>>         statusCode == HttpStatus.SC_SEE_OTHER ||
>>         statusCode == HttpStatus.SC_TEMPORARY_REDIRECT) {
>>         Header locationHeader = theMethod.getResponseHeader("location");
>>         if (locationHeader != null) {
>>           redirectLocation = locationHeader.getValue();
>>         } else {
>>           System.err.println("Could not get location header");
>>         }
>>       } else {
>>         System.err.println("Status code " + statusCode + " not expected");
>>       }
>>     } catch (HttpException e) {
>>       System.err.println("Fatal protocol violation: " + e.getMessage());
>>       e.printStackTrace();
>>     } catch (IOException e) {
>>       System.err.println("Fatal transport error: " + e.getMessage());
>>       e.printStackTrace();
>>     } finally {
>>       theMethod.releaseConnection();
>>     }
>>     return redirectLocation;
>>   }
>>
>>The idea is, I call this from doPost() in my servlet, passing it a list 
>>of search terms (words)... I POST the request, mimicking the form, and 
>>send a redirect back to the client to the address returned by Google 
>>(I'm Feeling Lucky returns a simple redirect, code 301 from what I can 
>>see).
>>
>>I have a plain old HTML form that does this perfectly, here it is for 
>>reference:
>>
>><html>
>>   <head>
>>     <title></title>
>>   </head>
>>   <body>
>>     <form name="f" action="http://www.google.com/search">
>>       <input type="hidden" value="en" name="hl">
>>       <input name="q">
>>       <input type="submit" value="I'm Feeling Lucky" name="btnI">
>>     </form>
>>   </body>
>></html>
>>
>>However, when I try it with the above code, I get a 501 code back.  The 
>>complete wire trace is below, but the bottom line, as near as I can 
>>tell, is that the parameters are not added to the request.  I've tried 
>>creating a NameValuePair array and doing myMethod.setRequestBody(data); 
>>instead of using addParameter(), but I get the same result.  This is my 
>>first attempt at using HTTPClient, so this is sure to be something dumb 
>>on my part, but I can't seem to find the problem.  Any ideas?  Thanks!
>>
>>-- 
>>Frank W. Zammetti
>>Founder and Chief Software Architect
>>Omnytex Technologies
>>http://www.omnytex.com
>>
>>... wire trace (note: a few of my own code's messages are mixed in) ...
>>
>>2005/09/23 13:36:26:534 EDT [DEBUG] HttpClient - -Java version: 1.4.2
>>2005/09/23 13:36:26:534 EDT [DEBUG] HttpClient - -Java vendor: Sun 
>>Microsystems Inc.
>>2005/09/23 13:36:26:534 EDT [DEBUG] HttpClient - -Java class path: 
>>C:\java\j2sdk1.4.2\lib\tools.jar;..\bin\bootstrap.jar
>>2005/09/23 13:36:26:534 EDT [DEBUG] HttpClient - -Operating system name: 
>>Windows XP
>>2005/09/23 13:36:26:534 EDT [DEBUG] HttpClient - -Operating system 
>>architecture: x86
>>2005/09/23 13:36:26:534 EDT [DEBUG] HttpClient - -Operating system 
>>version: 5.1
>>2005/09/23 13:36:26:550 EDT [DEBUG] HttpClient - -SUN 1.42: SUN (DSA 
>>key/parameter generation; DSA signing; SHA-1, MD5 digests; SecureRandom; 
>>X.509 certificates; JKS keystore; PKIX CertPathValidator; PKIX 
>>CertPathBuilder; LDAP, Collection CertStores)
>>2005/09/23 13:36:26:550 EDT [DEBUG] HttpClient - -SunJSSE 1.42: Sun JSSE 
>>provider(implements RSA Signatures, PKCS12, SunX509 key/trust factories, 
>>SSLv3, TLSv1)
>>2005/09/23 13:36:26:550 EDT [DEBUG] HttpClient - -SunRsaSign 1.42: SUN's 
>>provider for RSA signatures
>>2005/09/23 13:36:26:550 EDT [DEBUG] HttpClient - -SunJCE 1.42: SunJCE 
>>Provider (implements DES, Triple DES, AES, Blowfish, PBE, 
>>Diffie-Hellman, HMAC-MD5, HMAC-SHA1)
>>2005/09/23 13:36:26:550 EDT [DEBUG] HttpClient - -SunJGSS 1.0: Sun 
>>(Kerberos v5)
>>2005/09/23 13:36:26:565 EDT [DEBUG] DefaultHttpParams - -Set parameter 
>>http.useragent = Jakarta Commons-HttpClient/3.0-rc3
>>2005/09/23 13:36:26:581 EDT [DEBUG] DefaultHttpParams - -Set parameter 
>>http.protocol.version = HTTP/1.1
>>2005/09/23 13:36:26:581 EDT [DEBUG] DefaultHttpParams - -Set parameter 
>>http.connection-manager.class = class 
>>org.apache.commons.httpclient.SimpleHttpConnectionManager
>>2005/09/23 13:36:26:581 EDT [DEBUG] DefaultHttpParams - -Set parameter 
>>http.protocol.cookie-policy = rfc2109
>>2005/09/23 13:36:26:581 EDT [DEBUG] DefaultHttpParams - -Set parameter 
>>http.protocol.element-charset = US-ASCII
>>2005/09/23 13:36:26:581 EDT [DEBUG] DefaultHttpParams - -Set parameter 
>>http.protocol.content-charset = ISO-8859-1
>>2005/09/23 13:36:26:612 EDT [DEBUG] DefaultHttpParams - -Set parameter 
>>http.method.retry-handler = 
>>org.apache.commons.httpclient.DefaultHttpMethodRetryHandler@90d8ea
>>2005/09/23 13:36:26:628 EDT [DEBUG] DefaultHttpParams - -Set parameter 
>>http.dateparser.patterns = [EEE, dd MMM yyyy HH:mm:ss zzz, EEEE, 
>>dd-MMM-yy HH:mm:ss zzz,EEE MMM d HH:mm:ss yyyy, EEE, dd-MMM-yyyy 
>>HH:mm:ss z, EEE, dd-MMM-yyyy HH-mm-ssz, EEE, dd MMM yy HH:mm:ss z, EEE 
>>dd-MMM-yyyy HH:mm:ss z, EEE dd MMM yyyy HH:mm:ss z, EEE dd-MMM-yyyy 
>>HH-mm-ss z, EEE dd-MMM-yy HH:mm:ss z, EEE dd MMM yy HH:mm:ss z, 
>>EEE,dd-MMM-yy HH:mm:ss z, EEE,dd-MMM-yyyy HH:mm:ss z, EEE, dd-MM-yyyy 
>>HH:mm:ss z]
>>2005/09/23 13:36:26:784 EDT [DEBUG] DefaultHttpParams - -Set parameter 
>>http.method.retry-handler = 
>>org.apache.commons.httpclient.DefaultHttpMethodRetryHandler@1963b3e
>>Submitting query request...
>>2005/09/23 13:36:26:831 EDT [DEBUG] HttpConnection - -Open connection to 
>>www.google.com:80
>>2005/09/23 13:36:26:956 EDT [DEBUG] header - ->> "POST /search 
>>HTTP/1.1[\r][\n]"
>>2005/09/23 13:36:26:971 EDT [DEBUG] HttpMethodBase - -Adding Host 
>>request header
>>2005/09/23 13:36:27:018 EDT [DEBUG] HttpMethodBase - -Default charset 
>>used: ISO-8859-1
>>2005/09/23 13:36:27:081 EDT [DEBUG] HttpMethodBase - -Default charset 
>>used: ISO-8859-1
>>2005/09/23 13:36:27:112 EDT [DEBUG] header - ->> "User-Agent: Jakarta 
>>Commons-HttpClient/3.0-rc3[\r][\n]"
>>2005/09/23 13:36:27:159 EDT [DEBUG] header - ->> "Host: 
>>www.google.com[\r][\n]"
>>2005/09/23 13:36:27:190 EDT [DEBUG] header - ->> "Content-Length: 
>>44[\r][\n]"
>>2005/09/23 13:36:27:221 EDT [DEBUG] header - ->> "Content-Type: 
>>application/x-www-form-urlencoded[\r][\n]"
>>2005/09/23 13:36:27:268 EDT [DEBUG] header - ->> "[\r][\n]"
>>2005/09/23 13:36:27:300 EDT [DEBUG] EntityEnclosingMethod - -Request 
>>body sent
>>2005/09/23 13:36:27:362 EDT [DEBUG] header - -<< "HTTP/1.1 501 Not 
>>Implemented[\r][\n]"
>>2005/09/23 13:36:27:393 EDT [DEBUG] header - -<< "Content-Type: 
>>text/html[\r][\n]"
>>2005/09/23 13:36:27:440 EDT [DEBUG] header - -<< "Server: GWS/2.1[\r][\n]"
>>2005/09/23 13:36:27:471 EDT [DEBUG] header - -<< "Content-Length: 
>>1236[\r][\n]"
>>2005/09/23 13:36:27:503 EDT [DEBUG] header - -<< "Date: Fri, 23 Sep 2005 
>>17:35:58 GMT[\r][\n]"
>>2005/09/23 13:36:27:550 EDT [DEBUG] header - -<< "Cneonction: Close[\r][\n]"
>>2005/09/23 13:36:27:581 EDT [DEBUG] HttpMethodBase - -Buffering response 
>>body
>>2005/09/23 13:36:27:596 EDT [DEBUG] HttpMethodBase - -Resorting to 
>>protocol version default close connection policy
>>2005/09/23 13:36:27:659 EDT [DEBUG] HttpMethodBase - -Should NOT close 
>>connection, using HTTP/1.1
>>2005/09/23 13:36:27:706 EDT [DEBUG] HttpConnection - -Releasing 
>>connection back to connection manager.
>>Status code 501 not expected
>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: commons-user-help@jakarta.apache.org

-- 
Frank W. Zammetti
Founder and Chief Software Architect
Omnytex Technologies
http://www.omnytex.com

