hc-httpclient-users mailing list archives

From jyu...@aol.com
Subject Re: How to "mimic a browser" for threaded web sites?
Date Tue, 03 Jul 2007 03:54:12 GMT

It looks like I was way off base on this one. For the moment, forget my hypothesis of multithreading.
This web site has something much more interesting that I did not understand at the time.
I used the methodology in your primer "ForAbsoluteBeginners" to do a study of this website.
This is what I did ... and what I found.

First, I set up a program to GET the Logon Page. Here is the program I used. As you can
see, except for the URL, it is exactly the sample program in the HttpClientTutorial.

import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;
import org.apache.commons.httpclient.params.HttpMethodParams;

import java.io.*;

public class ConnectToSiteNew {

  private static String url = "https://ais4.tiaa-cref.org/customerinquiry/accountHome.do";

  public static void main(String[] args) {
    // Create an instance of HttpClient.
    HttpClient client = new HttpClient();

    // Create a method instance.
    GetMethod method = new GetMethod(url);

    // Provide custom retry handler if necessary
    method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
        new DefaultHttpMethodRetryHandler(3, false));

    try {
      // Execute the method.
      int statusCode = client.executeMethod(method);

      if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Method failed: " + method.getStatusLine());
      }

      // Read the response body.
      byte[] responseBody = method.getResponseBody();

      // Deal with the response.
      // Use caution: ensure correct character encoding and is not binary data
      System.out.println(new String(responseBody));

    } catch (HttpException e) {
      System.err.println("Fatal protocol violation: " + e.getMessage());
      e.printStackTrace();
    } catch (IOException e) {
      System.err.println("Fatal transport error: " + e.getMessage());
      e.printStackTrace();
    } finally {
      // Release the connection.
      method.releaseConnection();
    }
  }
}

The first hint that there is something unusual here is that the URL appears to refer to a
script. You can run the above Java program. It works and retrieves a Logon page that looks
the same as the one you would get with a browser. The difference is that the Java program
skips the home page and goes directly to the "Logon page". (I tried this with a browser as
well, instead of going to the home page "tiaa-cref.org" and clicking the "logon" button.)
The unusual feature of the Logon Page is that it is different for each user. Each user gets
his/her own, custom generated logon form. You can see that the program takes a while to execute,
while the site generates the Logon Page, but it does work. I dumped the standard output (System.out)
to a file so that I could examine it.

Next I did an analysis of the Logon form. I searched it for <input .../> statements.
I found the two usual ones for entering the user id and password:

<input type="text" tabindex="1" name="user" id="user" .../>
<input type="password" tabindex="2" name="password" .../>

I also found some statements that assign constant values to certain names:

<input type="hidden" name="DK" value="" />
<input type="hidden" name="SMAUTHREASON" value="0" />

But the interesting ones were the following three that are unique to my session:

<input type="hidden" name="TARGET" value="https://ais4.tiaa-cref.org/selfservices/secureresource/redirect.do?targetURL=https://ais4.tiaa-cref.org/customerinquiry/accountHome.do"/>

<input type="hidden" name="SMAGENTNAME" value="vAWNg3iV8aADFepETR44Ovi5r0zNV8p2k6u11LgIee9yVDlbNk3m1lHN1QOMpE3h"
/>

<input type="hidden" name="REALMOID" value="06-000ad955-678a-1334-9f02-83ab87ebff3f" />

Pressing the "submit" button on the logon form seems to submit the logon form to yet another
script:

<input type="image" class="submit_button" src="../docs/images/login.png" alt="Log in" onclick='return
submitLoginForm("https://ais4.tiaa-cref.org/forms/tiaacref.fcc","/selfservices/sso/login.do?command=validateForm"
)'/>
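
The search for <input .../> statements described above can also be done in code. What follows is only a sketch, assuming the page has already been read into a String and that every attribute value is double-quoted, as in the fragments shown above. The class name LogonFormFields and the method extract are hypothetical names chosen for illustration, and a regular expression is a shortcut here, not a real HTML parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogonFormFields {

    // Matches one <input ...> tag, including tags wrapped across several lines.
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches one double-quoted attribute such as name="TARGET".
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([\\w-]+)\\s*=\\s*\"([^\"]*)\"");

    // Returns name -> value for every <input> that has a name attribute,
    // in document order. Inputs without a value attribute map to "".
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        Matcher tag = INPUT_TAG.matcher(html);
        while (tag.find()) {
            String name = null;
            String value = "";
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                if (attr.group(1).equalsIgnoreCase("name")) {
                    name = attr.group(2);
                } else if (attr.group(1).equalsIgnoreCase("value")) {
                    value = attr.group(2);
                }
            }
            if (name != null) {
                fields.put(name, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample input copied from the form fragments quoted in this message.
        String html =
                "<input type=\"text\" tabindex=\"1\" name=\"user\" id=\"user\"/>\n"
              + "<input type=\"hidden\" name=\"DK\" value=\"\" />\n"
              + "<input type=\"hidden\" name=\"SMAUTHREASON\" value=\"0\" />\n"
              + "<input type=\"hidden\" name=\"REALMOID\" "
              + "value=\"06-000ad955-678a-1334-9f02-83ab87ebff3f\" />\n";
        for (Map.Entry<String, String> field : extract(html).entrySet()) {
            System.out.println(field.getKey() + " = [" + field.getValue() + "]");
        }
    }
}
```

Because the session-specific values (TARGET, SMAGENTNAME, REALMOID) change from one request to the next, they would have to be extracted fresh from each downloaded page and not hard-coded.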

What I would like to do is expand the above program to: 

1) GET the logon form (I have already done this)

2) "hold onto" the form while I insert the values for user id and password

3) submit the form and follow the redirects as usual.

How do I do 2) and 3)?
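
One possible way to think about steps 2) and 3), offered as a sketch and not as a tested solution for this site: "holding onto" the form amounts to keeping its name/value pairs, adding the user id and password, and sending them back as an application/x-www-form-urlencoded POST body. The example below shows only that encoding step, using the standard library. The class name LogonFormBody and the sample credentials are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class LogonFormBody {

    // Joins the fields as name=value pairs separated by '&',
    // URL-encoding every name and value.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"));
                body.append('=');
                body.append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        // Hidden fields as found in the downloaded logon form.
        fields.put("SMAUTHREASON", "0");
        fields.put("REALMOID", "06-000ad955-678a-1334-9f02-83ab87ebff3f");
        // Values a user would type in (made-up placeholders).
        fields.put("user", "my user id");
        fields.put("password", "p&ss=word");
        System.out.println(encode(fields));
    }
}
```

With Commons HttpClient 3.x the encoding does not normally have to be done by hand: a PostMethod accepts each pair through addParameter(name, value). Reusing the same HttpClient instance for the GET and the POST keeps the cookies from the first request. The POST target is an assumption here; the onclick handler suggests https://ais4.tiaa-cref.org/forms/tiaacref.fcc, but the JavaScript function submitLoginForm would need to be read to confirm that. As far as I know, PostMethod does not follow redirects automatically, so the Location header of a 3xx response would have to be read and fetched with a new GetMethod.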

Jerry

-----Original Message-----
From: Roland Weber <ossfwot@dubioso.net>
To: HttpClient User Discussion <httpclient-user@jakarta.apache.org>
Sent: Sun, 8 Apr 2007 1:27 pm
Subject: Re: How to "mimic a browser" for threaded web sites?

Hi Jerry,

> Sorry to take such a long time between requests for help. Before I make
> major changes to my application, how can I test to see whether I can still
> use the httpclient "simple connection" and mimic the sequential requests as
> you suggest or whether I really need a multithreaded connection?

The hard way: trial and error. I don't know of any spec that
*requires* a browser to open multiple connections, so I find
it hard to believe that any web application would rely on that.
Even if there are multiple windows, that doesn't mean that
more than one of them is executing a request at any one time.
Requests are most often generated by the user clicking some
link or button, and users don't click in multiple windows at
the same time.

Just analyse the web application as you used to, noting cases
where the returned page is displayed in a different window.
If it is plain HTML, that is done by the target="windowname"
attribute in links. Then try to run the sequence of requests
generated by multiple windows from a single thread.

hope that helps,
  Roland





 

