hc-httpclient-users mailing list archives

From Charles François Rey <charlesfr....@epfl.ch>
Subject Re: Framesets
Date Thu, 04 Jun 2009 16:14:44 GMT
Here's an example:
---
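// bare minimum, lots of ways to improve how things are handled
// (imports assume HttpClient 4.x and HtmlCleaner 2.x package names)
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.impl.client.DefaultHttpClient;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

// the statements below go inside a method that declares "throws Exception"
DefaultHttpClient httpclient = new DefaultHttpClient();
String url = "http://www.w3schools.com/HTML/tryit.asp?filename=tryhtml_frame_cols";
HttpUriRequest request = new HttpGet(url);
HttpResponse response = httpclient.execute(request);
HtmlCleaner cleaner = new HtmlCleaner();
// note that HtmlCleaner is capable of downloading a URL itself, but let's
// assume that you need httpclient to do it (e.g. POST request,
// special settings, ...)
TagNode rootNode = cleaner.clean(response.getEntity().getContent());
// convert HtmlCleaner's tree into a standard W3C DOM document
Document doc = new DomSerializer(cleaner.getProperties(), true).createDOM(rootNode);
// we're just going to display the target urls of the frames
XPath xpath = XPathFactory.newInstance().newXPath();
// XPath is very useful when dealing with HTML/XML:
// this expression selects the src attribute of every frame element
NodeList nodes = (NodeList) xpath.evaluate("//frame/@src", doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
	System.out.println(nodes.item(i).getNodeValue());
}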
---

This should print the "src" values of the three frames defined in the
page at the given URL, i.e.:
	frame_a.htm
	frame_b.htm
	frame_c.htm

Those are relative paths, so you would have to resolve them against the
URL of the frameset page before fetching them.
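
As a rough sketch of that last step, the relative "src" values can be
resolved with the standard java.net.URI class. The class and method names
below (FrameUrlResolver, resolveAll) are made up for illustration, and the
sketch assumes the page has no <base> element overriding the base URL:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HttpClient or HtmlCleaner.
public class FrameUrlResolver {

    // Resolves each frame "src" value against the URL of the frameset page.
    // Assumes the page does not declare a <base href="..."> element.
    static List<String> resolveAll(String pageUrl, List<String> frameSrcs) {
        URI base = URI.create(pageUrl);
        List<String> resolved = new ArrayList<String>();
        for (String src : frameSrcs) {
            // URI.resolve applies the usual relative-reference rules:
            // the last path segment and the query of the base are dropped.
            resolved.add(base.resolve(src.trim()).toString());
        }
        return resolved;
    }

    public static void main(String[] args) {
        String pageUrl =
            "http://www.w3schools.com/HTML/tryit.asp?filename=tryhtml_frame_cols";
        List<String> srcs = Arrays.asList("frame_a.htm", "frame_b.htm", "frame_c.htm");
        for (String frameUrl : resolveAll(pageUrl, srcs)) {
            System.out.println(frameUrl);
        }
    }
}
```

Each resolved string can then be passed to new HttpGet(...) and run through
the same httpclient instance as above.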

Note on the packages used: XPath comes from the standard package
javax.xml.xpath, the HtmlCleaner library comes from
http://htmlcleaner.sourceforge.net/, and Document and NodeList come from
the standard package org.w3c.dom.

On 4 June 09, at 17:12, Charles François Rey wrote:

> Frameset is an HTML concept. HttpClient takes care of HTTP, not HTML.
>
> That being said, it is possible to follow framesets: just download the
> HTML file, parse it, and follow the frame definitions.
>
> If I had to do it, I'd use HttpClient to retrieve the HTML,
> HtmlCleaner to clean the HTML, and XPath to filter the frame "src"
> attributes.
>
> On 4 June 09, at 16:44, Scott Ward wrote:
>
>> Is it possible to follow framesets using HttpClient?  I have
>> searched all over and haven't found anything, so I thought that I
>> would try this list.
>>
>> If it is, can you direct me to it in the API or show an example?
>>
>> Any help is greatly appreciated.
>> __________________________________________
>>
>> ~Sward
