commons-user mailing list archives

From "Bernhard Wagner" ...@xmlizer.biz>
Subject RE: [httpclient] Parse response for NameValuePairs?
Date Sun, 06 Apr 2003 23:31:38 GMT
Hi John and Adrian

It might also be worth looking at NekoHTML, which is apparently a more
robust HTML parser than JTidy:
http://www.apache.org/~andyc/neko/doc/html/

Other things to consider might be:
- webtest uses a competitor to HttpClient (httpunit) and allows scripting
  of interactions with a website via XML: http://webtest.canoo.com
  (for testing purposes, though)
- Talking of scripting: Jelly http://jakarta.apache.org/commons/jelly/.
  Jelly might be used to script your session in XML, too (uses HttpClient).
  Instead of traversing the DOM yourself, you might use a declarative
  approach using XPath (readily available in Jelly) to search for, e.g.,
  the radio button.
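
To make the declarative idea concrete, here is a minimal sketch in plain
Java rather than Jelly. It assumes the page has already been turned into
well-formed markup; the JDK XML parser below is only a stand-in for
NekoHTML or JTidy, and the class name, method name and sample page are made
up for illustration. XPath is case-sensitive, so the expression has to
match however your HTML parser normalizes element and attribute names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RadioXPath {
    // Returns "name=value" for every element matched by //input[@type='radio'].
    static List<String> findRadios(String wellFormedMarkup) {
        try {
            // Assumption: the input is well-formed. Real-world HTML usually
            // is not, which is why an HTML parser would be used here instead.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//input[@type='radio']", doc, XPathConstants.NODESET);
            List<String> pairs = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                Element input = (Element) hits.item(i);
                pairs.add(input.getAttribute("name") + "=" + input.getAttribute("value"));
            }
            return pairs;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or query markup", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample page: two radio buttons and one text field.
        String page = "<html><body><form>"
                + "<input type='radio' name='seat' value='aisle'/>"
                + "<input type='radio' name='seat' value='window'/>"
                + "<input type='text' name='comment' value=''/>"
                + "</form></body></html>";
        System.out.println(findRadios(page)); // prints [seat=aisle, seat=window]
    }
}
```

The result is a list rather than a map because radio buttons in one group
share the same name and differ only in their value.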


Bernhard (the heretic ;-)

> -----Original Message-----
> From: Adrian Sutton [mailto:adrian.sutton@ephox.com]
> Sent: Montag, 7. April 2003 00:59
> To: 'Jakarta Commons Users List'
> Subject: RE: [httpclient] Parse response for NameValuePairs?
>
>
> Hi John,
> I never even thought of that meaning for name value pairs. :)
>
> Essentially, the way I'd approach this is to get an HTML parser (like, say,
> JTidy; start at http://tidy.sf.net/ and I'm pretty sure there's a link
> there) and use that.  This is outside of the scope of HttpClient since we
> wash our hands of all processing once we get the response from the server
> for you.  That said, we are beginning a collection of useful utilities that
> are outside of the scope of HttpClient but are commonly used with
> HttpClient, and this would fit that description, so if you do wind up
> writing it and were kind enough to donate it under the Apache license, that
> would be greatly appreciated.
>
> Our application makes heavy use of JTidy for all kinds of weird and
> wonderful stuff related to HTML so I can probably help along those lines
> even though I've never had to extract INPUT tag elements (yet). :)
>
> I'd recommend you start with Tidy's parseDOM method, which returns an
> org.w3c.dom.Document object, then iterate over each element in the tree
> recursively, looking for either any INPUT element or any INPUT element
> with a TYPE="radio" attribute, depending on your requirements.  Then
> extract the name and value from that element and store them somewhere for
> later processing.
> One key thing to note is that JTidy is a nasty port of the C Tidy
> implementation and does not support international characters properly
> (particularly double byte characters).  There is a patch available
> somewhere that fixes this, though, and as long as the name and value
> don't contain double byte characters it wouldn't matter that the rest of
> the HTML got corrupted anyway.
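
The recursive walk described above could look roughly like the sketch
below. It works on any org.w3c.dom.Document, so in practice the document
would come from Tidy's parseDOM; here the JDK XML parser and a well-formed
sample string stand in for it so that the example runs on its own. The
class and method names and the sample markup are assumptions, not part of
JTidy or HttpClient.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RadioWalker {
    // Visits node and all of its descendants; every INPUT element whose
    // TYPE is "radio" contributes one {name, value} pair to out.
    static void collect(Node node, List<String[]> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            // Attribute names are looked up exactly as written here; adjust
            // the case to whatever your HTML parser produces.
            if ("input".equalsIgnoreCase(el.getTagName())
                    && "radio".equalsIgnoreCase(el.getAttribute("type"))) {
                out.add(new String[] { el.getAttribute("name"), el.getAttribute("value") });
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    // Stand-in for the HTML parser: accepts well-formed markup only.
    static Document parse(String wellFormedMarkup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: one radio button nested in a table, one hidden field.
        String page = "<form><table><tr><td>"
                + "<input type='radio' name='xxx' value='yyy'/>"
                + "</td></tr></table>"
                + "<input type='hidden' name='session' value='42'/></form>";
        List<String[]> radios = new ArrayList<>();
        collect(parse(page), radios);
        for (String[] pair : radios) {
            System.out.println(pair[0] + " = " + pair[1]); // prints xxx = yyy
        }
    }
}
```

The collected pairs can then be stored for later processing, for example
when building the next POST.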
>
> A less robust but probably simpler and faster solution would be to just do
> simple string parsing on the HTML, but you'd then have to worry about
> whether or not the element was commented out, if it was inside the same
> form you're talking about, if it was in a textarea or (my personal worst
> nightmare) if the "HTML" was completely invalid (and believe me, you'd be
> amazed at how bad HTML can be and still display in a browser correctly).
> Tidy can deal with invalid HTML really well, which is why I recommend using
> it.
>
> Hope that helps, let me know if there's anything else I can help with.
>
> Adrian Sutton, Software Engineer
> Ephox Corporation
> www.ephox.com
>
>
> -----Original Message-----
> From: John Burke [mailto:johnburke@earthlink.net]
> Sent: Saturday, 5 April 2003 9:44 AM
> To: Jakarta Commons Users List
> Subject: Re: [httpclient] Parse response for NameValuePairs?
>
>
> Hello Adrian,
> I'm using httpclient to automate the process of using an online
> reservation system.  This is the first project I have used it for
> so I am grasping at some new concepts.  At one point in the
> script, I execute a GET method and I get a few kilobytes from the
> server.  Buried in there is an <INPUT TYPE="radio" NAME="xxx"
> VALUE="yyy"> tag.  There may be more than one, but I have to respond
> with my choice in the next POST method if I want to continue the
> reservation process.  I thought the GetMethod class might have a
> method that culls the server response for these gems and returns
> a set of NameValuePair class instances.  It would make building
> the post method a little easier for the newbies, but I may just be lazy.
> How would you handle this?
> Thanks for your response.
> John
>
> On Thursday, Apr 3, 2003, at 22:48 America/New_York, Adrian Sutton
> wrote:
>
> > Hi John,
> > I take it you wanted to get the name and value of all the headers
> > returned in the response.  You can use the getResponseHeaders() method
> > in any HttpMethod, which will return an array of Headers.
> >
> > Each header, though, can contain multiple values, so you'd have to
> > iterate over the headers and then over the values for each header.
> > Something like:
> >
> > Header[] headers = method.getResponseHeaders();
> > for (int i = 0; i < headers.length; i++) {
> > 	String headerName = headers[i].getName();
> > 	HeaderElement[] elements = headers[i].getValues();
> > 	for (int j = 0; j < elements.length; j++) {
> > 		HeaderElement el = elements[j];
> > 		// At this stage you have the header and its value.
> > 		// See below for information on some "funky" headers
> > 	}
> > }
> >
> > Some headers can contain multiple values within the header value; in
> > particular, cookie headers do this.  You can repeat the pattern above to
> > iterate over the parameters of the HeaderElement to get a name/value
> > pair for each element if that's what you want.  It really depends on what
> > level of detail you need to go to.
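
To show what that nesting looks like, here is a standalone sketch that
splits one header value into elements and then into name=value parameters.
It deliberately uses plain string splitting and is not HttpClient's own
HeaderElement parsing; the class name, method name and sample header value
are invented for illustration. Being naive, it would mis-split values that
contain commas or semicolons inside quoted strings or dates.

```java
import java.util.ArrayList;
import java.util.List;

class HeaderSplit {
    // Outer list: one entry per comma-separated element of the header value.
    // Inner list: that element's semicolon-separated "name=value" parts.
    static List<List<String>> split(String headerValue) {
        List<List<String>> elements = new ArrayList<>();
        for (String element : headerValue.split(",")) {
            List<String> params = new ArrayList<>();
            for (String param : element.split(";")) {
                String trimmed = param.trim();
                if (!trimmed.isEmpty()) {
                    params.add(trimmed);
                }
            }
            if (!params.isEmpty()) {
                elements.add(params);
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        // Hypothetical cookie-style header value carrying two elements.
        for (List<String> element : split("sid=abc; path=/, lang=en; path=/docs")) {
            System.out.println(element);
        }
        // prints [sid=abc, path=/] and then [lang=en, path=/docs]
    }
}
```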
> >
> > Why were you wanting to do this?
> >
> > Adrian Sutton, Software Engineer
> > Ephox Corporation
> > www.ephox.com
> >
> >
> > -----Original Message-----
> > From: John Burke [mailto:johnburke@earthlink.net]
> > Sent: Friday, 4 April 2003 1:17 PM
> > To: commons-user@jakarta.apache.org
> > Subject: [httpclient] Parse response for NameValuePairs?
> >
> >
> > Hi, I've looked through the API docs but didn't find what I wanted.
> > I'm wondering if there is a method that will find and return all
> > NameValuePairs from a given response?  If not, can anyone please suggest
> > a few lines of code?  Thanks.
> > John
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: commons-user-help@jakarta.apache.org
> >
> >
> John Burke
> Booz | Allen | Hamilton Inc.
> (732) 935-5120
> burke_john@bah.com
>
>

