From: "Bernhard Wagner"
To: "Jakarta Commons Users List"
Subject: RE: [httpclient] Parse response for NameValuePairs?
Date: Mon, 7 Apr 2003 01:31:38 +0200

Hi John and Adrian

It might also be worth looking at NekoHTML, which is apparently a more
robust HTML parser than JTidy: http://www.apache.org/~andyc/neko/doc/html/

Other things to consider might be:
- webtest uses a competitor to HttpClient (httpunit) and allows scripting
  of interactions with a website via XML: http://webtest.canoo.com
  (for testing purposes, though)
- Talking of scripting: Jelly, http://jakarta.apache.org/commons/jelly/.
  Jelly might be used to script your session in XML, too (it uses
  HttpClient). Instead of traversing the DOM yourself, you might use a
  declarative approach with XPath (readily available in Jelly) to search
  for, e.g., the radio button.

Bernhard (the heretic ;-)

> -----Original Message-----
> From: Adrian Sutton [mailto:adrian.sutton@ephox.com]
> Sent: Monday, 7 April 2003 00:59
> To: 'Jakarta Commons Users List'
> Subject: RE: [httpclient] Parse response for NameValuePairs?
>
>
> Hi John,
> I never even thought of that meaning for name value pairs. :)
>
> Essentially, the way I'd take this is to get an HTML parser (like, say,
> JTidy; start at http://tidy.sf.net/ and I'm pretty sure there's a link
> there) and use that. This is outside of the scope of HttpClient since we
> wash our hands of all processing once we get the response from the server
> for you. That said, we are beginning a collection of useful utilities that
> are outside of the scope of HttpClient but are commonly used with
> HttpClient, and this would fit that description, so if you do wind up
> writing it and were kind enough to donate it under the Apache license,
> that would be greatly appreciated.
>
> Our application makes heavy use of JTidy for all kinds of weird and
> wonderful stuff related to HTML, so I can probably help along those lines
> even though I've never had to extract INPUT tag elements (yet). :)
>
> I'd recommend you start with Tidy's parseDOM method, which returns an
> org.w3c.dom.Document object, then iterate over each element in the tree
> recursively, looking for either any INPUT element or any INPUT element
> with a TYPE="radio" attribute, depending on your requirements. Then
> extract the name and value from that element and store them somewhere for
> later processing. One key thing to note is that JTidy is a nasty port of
> the C Tidy implementation and does not support international characters
> properly (particularly double-byte characters). There is a patch available
> somewhere that fixes this, though, and as long as the name and value
> didn't contain double-byte characters it wouldn't matter that the rest of
> the HTML got corrupted anyway.
>
> A less robust but probably simpler and faster solution would be to just do
> simple string parsing on the HTML, but you'd then have to worry about
> whether or not the element was commented out, if it was inside the same
> form you're talking about, if it was in a textarea or (my personal worst
> nightmare) if the "HTML" was completely invalid (and believe me, you'd be
> amazed at how bad HTML can be and still display in a browser correctly).
> Tidy can deal with invalid HTML really well, which is why I recommend
> using it.
>
> Hope that helps, let me know if there's anything else I can help with.
>
> Adrian Sutton, Software Engineer
> Ephox Corporation
> www.ephox.com
>
>
> -----Original Message-----
> From: John Burke [mailto:johnburke@earthlink.net]
> Sent: Saturday, 5 April 2003 9:44 AM
> To: Jakarta Commons Users List
> Subject: Re: [httpclient] Parse response for NameValuePairs?
>
>
> Hello Adrian,
> I'm using httpclient to automate the process of using an online
> reservation system. This is the first project I have used it for,
> so I am grasping at some new concepts. At one point in the
> script, I execute a get method and I get a few kilobytes from the
> server. Buried in there is a : VALUE="yyy"> tag. There may be more
> than one, but I have to respond with my choice in the next POST
> method if I want to continue the reservation process. I thought the
> getmethod class might have a method that culls the server response
> for these gems and returns a set of NameValuePair class instances.
> It would make building the post method a little easier for the
> newbies, but I may just be lazy.
> How would you handle this?
> Thanks for your response.
> John
>
> On Thursday, Apr 3, 2003, at 22:48 America/New_York, Adrian Sutton
> wrote:
>
> > Hi John,
> > I take it you wanted to get the name and value of all the headers
> > returned in the response. You can use the getResponseHeaders() method
> > in any HttpMethod, which will return an array of Headers.
> >
> > Each header, though, can contain multiple values, so you'd have to
> > iterate over the headers and then over the values for each header.
> > Something like:
> >
> > Header[] headers = method.getResponseHeaders();
> > for (int i = 0; i < headers.length; i++) {
> >     String headerName = headers[i].getName();
> >     HeaderElement[] elements = headers[i].getValues();
> >     for (int j = 0; j < elements.length; j++) {
> >         HeaderElement el = elements[j];
> >         // At this stage you have the header and its value.
> >         // See below for information on some "funky" headers
> >     }
> > }
> >
> > Some headers can contain multiple values within the header value; in
> > particular, cookie headers do this. You can repeat the pattern above to
> > iterate over the parameters of the HeaderElement to get a name value
> > pair of each element if that's what you want. It really depends what
> > level of detail you need to go to.
> >
> > Why were you wanting to do this?
> >
> > Adrian Sutton, Software Engineer
> > Ephox Corporation
> > www.ephox.com
> >
> >
> > -----Original Message-----
> > From: John Burke [mailto:johnburke@earthlink.net]
> > Sent: Friday, 4 April 2003 1:17 PM
> > To: commons-user@jakarta.apache.org
> > Subject: [httpclient] Parse response for NameValuePairs?
> >
> >
> > Hi, I've looked through the API docs but didn't find what I wanted.
> > I'm wondering if there is a method that will find and return all
> > NameValuePairs from a given response? If not, can anyone please
> > suggest a few lines of code? Thanks.
> > John
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: commons-user-help@jakarta.apache.org
> >
> John Burke
> Booz | Allen | Hamilton Inc.
> (732) 935-5120
> burke_john@bah.com
>
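
For reference, the two approaches discussed in this thread could be sketched roughly as follows: the recursive walk over an org.w3c.dom.Document that Adrian describes, and the declarative XPath query Bernhard suggests. This is only a sketch under some assumptions. The class and method names (RadioInputFinder, parseWellFormed, findRadioInputs, findRadioInputsWithXPath) are made up for illustration. To keep the example free of external dependencies, the JDK's own XML parser stands in for Tidy's parseDOM and javax.xml.xpath stands in for Jelly's XPath support, so the sample markup has to be well-formed already. Element and attribute names are assumed to be lowercase.

```java
import java.io.StringReader;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; none of these names come from HttpClient or JTidy.
final class RadioInputFinder {

    // Assumption: stand-in for Tidy's parseDOM. The JDK parser only accepts
    // well-formed markup, whereas real-world HTML would need a tolerant parser.
    static Document parseWellFormed(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed", e);
        }
    }

    // Recursive walk: collects (name, value) for every input element whose
    // type attribute is "radio", in document order.
    static List<Map.Entry<String, String>> findRadioInputs(Node node) {
        List<Map.Entry<String, String>> found = new ArrayList<>();
        collect(node, found);
        return found;
    }

    private static void collect(Node node, List<Map.Entry<String, String>> found) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            // Attribute lookup is case-sensitive; lowercase names are assumed.
            if ("input".equalsIgnoreCase(el.getTagName())
                    && "radio".equalsIgnoreCase(el.getAttribute("type"))) {
                found.add(new SimpleImmutableEntry<>(
                        el.getAttribute("name"), el.getAttribute("value")));
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, found);
        }
    }

    // Declarative alternative: one XPath expression selects the same elements.
    static List<Map.Entry<String, String>> findRadioInputsWithXPath(Document doc) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//input[@type='radio']", doc, XPathConstants.NODESET);
            List<Map.Entry<String, String>> found = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                found.add(new SimpleImmutableEntry<>(
                        el.getAttribute("name"), el.getAttribute("value")));
            }
            return found;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("invalid XPath expression", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><form>"
                + "<input type=\"radio\" name=\"seat\" value=\"aisle\"/>"
                + "<input type=\"radio\" name=\"seat\" value=\"window\"/>"
                + "<input type=\"text\" name=\"passenger\" value=\"\"/>"
                + "</form></body></html>";
        // Prints seat=aisle and seat=window, one per line.
        for (Map.Entry<String, String> pair : findRadioInputs(parseWellFormed(page))) {
            System.out.println(pair.getKey() + "=" + pair.getValue());
        }
    }
}
```

The pairs come back as a list rather than a map because radio buttons in one group share a name. Each pair could then be turned into a NameValuePair for the follow-up POST. With JTidy on the classpath, the Document returned by parseDOM could presumably be handed to findRadioInputs in place of parseWellFormed's result; whether the XPath variant works against JTidy's DOM implementation is not something this sketch verifies.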