commons-user mailing list archives

From Christian Sell <christian.s...@netcologne.de>
Subject Re: [httpclient] Parse response for NameValuePairs?
Date Mon, 07 Apr 2003 07:37:19 GMT
Hi all,

I think John's need is best covered by either HttpUnit or HtmlUnit (both 
@sf.net). Both tools present easy APIs to access the elements of an 
HTML response and to perform processing on them (e.g., follow a link).

BTW, HttpUnit uses HttpURLConnection and either JTidy or Neko under the 
covers, while HtmlUnit uses HttpClient and Neko.

Christian


Bernhard Wagner wrote:
> Hi John and Adrian
> 
> It might be worth also looking at NekoHTML which apparently is a more robust
> HTML-parser than JTidy
> http://www.apache.org/~andyc/neko/doc/html/.
> 
> Other things to consider might be:
> - webtest uses a competitor to HttpClient (httpunit) and allows scripting
>   of interactions with a website via XML http://webtest.canoo.com
>   (for testing purposes, though)
> - Talking of scripting: Jelly http://jakarta.apache.org/commons/jelly/.
>   Jelly might be used to script your session in XML, too (uses HttpClient).
>   Instead of traversing the DOM yourself you might use a declarative
>   approach using XPath (readily available in Jelly) to search for, e.g.,
>   the radio button.
> 
> 
> Bernhard (the heretic ;-)
> 
> 
>>-----Original Message-----
>>From: Adrian Sutton [mailto:adrian.sutton@ephox.com]
>>Sent: Monday, 7 April 2003 00:59
>>To: 'Jakarta Commons Users List'
>>Subject: RE: [httpclient] Parse response for NameValuePairs?
>>
>>
>>Hi John,
>>I never even thought of that meaning for name value pairs. :)
>>
>>Essentially, the way I'd approach this is to get an HTML parser (say
>>JTidy; start at http://tidy.sf.net/ and I'm pretty sure there's a link
>>there) and use that.  This is outside the scope of HttpClient, since we
>>wash our hands of all processing once we get the response from the server
>>for you.
>>That said, we are beginning a collection of useful utilities that are
>>outside the scope of HttpClient but are commonly used with HttpClient, and
>>this would fit that description, so if you do wind up writing it and
>>were kind enough to donate it under the Apache license, that would be
>>greatly appreciated.
>>
>>Our application makes heavy use of JTidy for all kinds of weird and
>>wonderful stuff related to HTML so I can probably help along those lines
>>even though I've never had to extract INPUT tag elements (yet). :)
>>
>>I'd recommend you start with Tidy's parseDOM method, which returns an
>>org.w3c.dom.Document object, then iterate over each element in the tree
>>recursively, looking for either any INPUT element or any INPUT element
>>with a TYPE="radio" attribute, depending on your requirements.  Then
>>extract the name and value from that element and store them somewhere
>>for later processing.
>>One key thing to note is that JTidy is a nasty port of the C Tidy
>>implementation and does not support international characters properly
>>(particularly double-byte characters).  There is a patch available
>>somewhere that fixes this, though, and as long as the name and value
>>don't contain double-byte characters it won't matter that the rest of
>>the HTML gets corrupted anyway.
>>
>>A less robust but probably simpler and faster solution would be to just do
>>simple string parsing on the HTML, but you'd then have to worry about
>>whether or not the element was commented out, whether it was inside the
>>same form you're talking about, whether it was in a textarea, or (my
>>personal worst nightmare) whether the "HTML" was completely invalid (and
>>believe me, you'd be amazed at how bad HTML can be and still display in a
>>browser correctly).
>>Tidy can deal with invalid HTML really well, which is why I recommend
>>using it.
>>
>>Hope that helps, let me know if there's anything else I can help with.
>>
>>Adrian Sutton, Software Engineer
>>Ephox Corporation
>>www.ephox.com
>>
>>
>>-----Original Message-----
>>From: John Burke [mailto:johnburke@earthlink.net]
>>Sent: Saturday, 5 April 2003 9:44 AM
>>To: Jakarta Commons Users List
>>Subject: Re: [httpclient] Parse response for NameValuePairs?
>>
>>
>>Hello Adrian,
>>I'm using httpclient to automate the process of using an online
>>reservation system.  This is the first project I have used it for,
>>so I am grasping at some new concepts.  At one point in the
>>script, I execute a GET method and I get a few kilobytes from the
>>server.  Buried in there is an <INPUT TYPE="radio" NAME="xxx"
>>VALUE="yyy"> tag.  There may be more than one, but I have to respond
>>with my choice in the next POST method if I want to continue the
>>reservation process.  I thought the GetMethod class might have a
>>method that culls the server response for these gems and returns
>>a set of NameValuePair class instances.  It would make building
>>the POST method a little easier for the newbies, but I may just be lazy.
>>How would you handle this?
>>Thanks for your response.
>>John
>>
>>On Thursday, Apr 3, 2003, at 22:48 America/New_York, Adrian Sutton
>>wrote:
>>
>>
>>>Hi John,
>>>I take it you wanted to get the name and value of all the headers
>>>returned in the response.  You can use the getResponseHeaders() method
>>>of any HttpMethod, which will return an array of Headers.
>>>
>>>Each header, though, can contain multiple values, so you'd have to
>>>iterate over the headers and then over the values for each header.
>>>Something like:
>>>
>>>Header[] headers = method.getResponseHeaders();
>>>for (int i = 0; i < headers.length; i++) {
>>>	String headerName = headers[i].getName();
>>>	HeaderElement[] elements = headers[i].getValues();
>>>	for (int j = 0; j < elements.length; j++) {
>>>		HeaderElement el = elements[j];
>>>		// At this stage you have the header and its value.
>>>		// See below for information on some "funky" headers
>>>	}
>>>}
>>>
>>>Some headers can contain multiple values within the header value; cookie
>>>headers in particular do this.  You can repeat the pattern above to
>>>iterate over the parameters of the HeaderElement to get a name/value
>>>pair for each element if that's what you want.  It really depends on
>>>what level of detail you need to go to.
>>>
>>>Why were you wanting to do this?
>>>
>>>Adrian Sutton, Software Engineer
>>>Ephox Corporation
>>>www.ephox.com
>>>
>>>
>>>-----Original Message-----
>>>From: John Burke [mailto:johnburke@earthlink.net]
>>>Sent: Friday, 4 April 2003 1:17 PM
>>>To: commons-user@jakarta.apache.org
>>>Subject: [httpclient] Parse response for NameValuePairs?
>>>
>>>
>>>Hi, I've looked through the API docs but didn't find what I wanted.
>>>I'm wondering if there is a method that will find and return all
>>>NameValuePairs from a given response?  If not, can anyone please
>>>suggest a few lines of code?  Thanks.
>>>John
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: commons-user-help@jakarta.apache.org
>>>
>>>
>>>
>>
>>John Burke
>>Booz | Allen | Hamilton Inc.
>>(732) 935-5120
>>burke_john@bah.com
>>
>>
>>
> 
> 
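Adrian's suggestion in the quoted message (obtain an org.w3c.dom.Document, then walk the tree recursively looking for INPUT elements with TYPE="radio") could be sketched roughly as follows. This is only a sketch under stated assumptions: the JDK's own DocumentBuilder stands in for JTidy's parseDOM so that the example runs without extra jars, which means the sample markup has to be well-formed XHTML with lowercase attribute names. The class and method names (RadioInputCollector, collect, radioPairs) and the sample form are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of HttpClient or JTidy.
public class RadioInputCollector {

    // Depth-first walk: records "name=value" for every <input type="radio">.
    static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            if ("input".equalsIgnoreCase(el.getTagName())
                    && "radio".equalsIgnoreCase(el.getAttribute("type"))) {
                out.add(el.getAttribute("name") + "=" + el.getAttribute("value"));
            }
        }
        // Comment nodes have no children, so commented-out markup is skipped.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    // Assumption: the input is well-formed XHTML. With JTidy, the Document
    // would come from Tidy's parseDOM instead of this JDK parser.
    static List<String> radioPairs(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        List<String> pairs = new ArrayList<>();
        collect(doc.getDocumentElement(), pairs);
        return pairs;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><form>"
                + "<input type=\"radio\" name=\"seat\" value=\"aisle\"/>"
                + "<input type=\"radio\" name=\"seat\" value=\"window\"/>"
                + "<input type=\"text\" name=\"note\" value=\"x\"/>"
                + "</form></body></html>";
        System.out.println(radioPairs(page)); // prints [seat=aisle, seat=window]
    }
}
```

The collected pairs could then be turned into the parameters of the follow-up POST; that step is left out here because it depends on which choice the script is meant to submit.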
> 
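Bernhard's declarative alternative in the quoted message (search with XPath instead of traversing the DOM by hand) might look something like the sketch below. It uses the JDK's javax.xml.xpath package as a stand-in for the XPath support in Jelly, so it shows the idea rather than a Jelly script. The class name RadioInputXPath, the method radioPairs, and the sample markup are hypothetical, and the input is again assumed to be well-formed XHTML with lowercase element and attribute names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper illustrating the XPath approach with JDK classes only.
public class RadioInputXPath {

    // Selects every <input type="radio"> in document order and
    // returns each one as "name=value".
    static List<String> radioPairs(String xhtml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "//input[@type='radio']",
                new InputSource(new StringReader(xhtml)),
                XPathConstants.NODESET);
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element el = (Element) nodes.item(i);
            pairs.add(el.getAttribute("name") + "=" + el.getAttribute("value"));
        }
        return pairs;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><form>"
                + "<input type=\"radio\" name=\"seat\" value=\"aisle\"/>"
                + "<input type=\"checkbox\" name=\"meal\" value=\"veg\"/>"
                + "</form></body></html>";
        System.out.println(radioPairs(page)); // prints [seat=aisle]
    }
}
```

The expression is case-sensitive, so markup that keeps uppercase INPUT and TYPE would need a matching expression or a parser that normalizes case first.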


