From: Bruce Atherton
To: jmeter-user@jakarta.apache.org
Date: Mon, 10 Dec 2001 09:52:43 -0800
Subject: Using JMeter for Archiving a Website?

I am trying to archive the contents of an extranet website which is mostly
dynamic content. I'd like to record its contents every day and store them in
a format where you can open up the website and browse it as it existed on any
given day.

I posted a message on Usenet, and one respondent suggested I look at JMeter
for a solution. I was wondering whether anyone on this list had set up JMeter
to do something similar, or had other suggestions as to how I could accomplish
my task, involving JMeter or not. I'm willing to code some Java if that would
help.

Some of the features I require for this website snapshot program:

1. Parse the HTML and extract further URLs to follow, just like any spider
   does.
2. Provide support for URL encoding of a session ID.
3. Parse forms to recognize submit URLs and the field data that must be
   returned in a POST, including hidden fields.
4. Allow setting a configuration file to provide the data that should be
   returned for a particular field in a form (for example, setting what should
   be returned in "username" and "password" fields).
5. Support regular expressions so that you can make sure the session is going
   the way it should. For example, if you get "Login Failed" in the returned
   HTML, you should be able to recognize that as an error condition.
6. Replace any absolute URLs with relative ones, so that if you open the
   archive on disk it will look and act exactly the same way the web site did
   that day.
7. Do depth-first searches (which a user could conceivably do) rather than
   breadth-first (which a user could not do), so that context within the
   session is kept sensible. (A rough sketch of items 1, 6, and 7 appears
   below.)

Any pointers, suggestions, guidelines? I'd be most appreciative of any
information. Thanks.
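
To make items 1, 6, and 7 concrete, here is a minimal sketch in Java of the
kind of depth-first snapshot crawler described above. Everything in it is an
assumption for illustration only: the class name SnapshotCrawler, the
regex-based link extraction, and the example host www.example.com and
"archive" directory. It is not anything JMeter provides, and form handling,
session IDs, and error detection (items 2 through 5) are left out.

    // Hypothetical sketch only: regex-based link extraction, no form
    // handling, no session IDs, no error detection.
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Writer;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class SnapshotCrawler {

        // Naive href extraction; a real spider would use an HTML parser.
        private static final Pattern HREF = Pattern.compile(
                "href\\s*=\\s*[\"']([^\"'#]+)[\"']", Pattern.CASE_INSENSITIVE);

        private final Set<String> visited = new HashSet<String>();
        private final String host;       // only this host is mirrored
        private final File archiveDir;   // local snapshot directory

        public SnapshotCrawler(String host, File archiveDir) {
            this.host = host;
            this.archiveDir = archiveDir;
        }

        // Depth-first (item 7): each link is followed as soon as it is found,
        // which keeps the fetch order close to what a browsing user would do.
        public void crawl(URL url) throws IOException {
            if (!host.equals(url.getHost()) || !visited.add(url.toString())) {
                return;
            }
            String html = fetch(url);
            save(url, html);
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                try {
                    // Resolves relative links against the current page (item 1).
                    crawl(new URL(url, m.group(1)));
                } catch (MalformedURLException e) {
                    // Skip javascript: and other non-fetchable links.
                }
            }
        }

        private String fetch(URL url) throws IOException {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
            in.close();
            return sb.toString();
        }

        // Item 6, simplified: rewrites absolute links to this host into
        // root-relative ones before writing the page under the archive
        // directory. Browsing the snapshot straight from disk would need
        // truly relative paths computed per page.
        private void save(URL url, String html) throws IOException {
            String rewritten = html.replace("http://" + host + "/", "/");
            String path = url.getPath();
            File out = new File(archiveDir,
                    path.equals("") || path.equals("/") ? "index.html"
                                                        : path.substring(1));
            out.getParentFile().mkdirs();
            Writer w = new FileWriter(out);
            w.write(rewritten);
            w.close();
        }

        public static void main(String[] args) throws IOException {
            // Example values only.
            SnapshotCrawler crawler =
                    new SnapshotCrawler("www.example.com", new File("archive"));
            crawler.crawl(new URL("http://www.example.com/"));
        }
    }

Running it against a real site would still need the session-ID handling,
form posting, and "Login Failed" checks listed above, which is where a tool
like JMeter (or custom code driving it) would come in.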