abdera-dev mailing list archives

From James M Snell <jasn...@gmail.com>
Subject Re: HTML Parser
Date Tue, 15 Jan 2008 01:00:56 GMT
Checked in.

Btw, this code lets us do some other cool things...

   URL url = new URL("http://www.snellspace.com/wp");
   Abdera abdera = Abdera.getInstance();
   Document<Element> doc =
     abdera.getParserFactory().
     getParser("html").parse(
       url.openStream());
   XPath xpath = abdera.getXPath();

   // enumerate all links in the html doc
   List<Element> nodes = xpath.selectNodes("//a", doc);
   for (Element node : nodes)
     System.out.println(node);

   // enumerate all hCards in the html doc
   List<Element> vcards =
     xpath.selectNodes(
       "//*[@class='vcard']", doc);
   for (Element node : vcards)
     System.out.println(node);
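
One caveat on the hCard query: comparing @class for equality only matches
elements whose class attribute is exactly "vcard", so something like
class="author vcard" would be skipped. A common XPath 1.0 idiom is to test
for the whitespace-separated token instead. The sketch below is only an
illustration of that idiom: it uses the JDK's built-in javax.xml.xpath on a
small well-formed XML string rather than Abdera's HTML parser, and the class
and method names (ClassTokenXPath, classToken, matchNames) are made up for
the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Abdera.
class ClassTokenXPath {

    // Builds an XPath 1.0 expression that matches any element whose class
    // attribute contains the given whitespace-separated token.
    static String classToken(String token) {
        return "//*[contains(concat(' ', normalize-space(@class), ' '), ' "
            + token + " ')]";
    }

    // Evaluates expr against a well-formed XML string and returns the
    // names of the matched elements in document order.
    static List<String> matchNames(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<body>"
            + "<div class='vcard'/>"
            + "<span class='author vcard'/>"
            + "<p class='vcards'/>"
            + "</body>";
        // Exact match on the attribute value: only the div.
        System.out.println(matchNames(xml, "//*[@class='vcard']"));
        // Token match: the div and the span, but not the p.
        System.out.println(matchNames(xml, classToken("vcard")));
    }
}
```

The expression string is plain XPath 1.0, so in principle the same string
could be handed to selectNodes in place of the exact-match one, though I
have not checked that against the Abdera code path here.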

- James

Brian Moseley wrote:
> On Jan 14, 2008 12:26 PM, James M Snell <jasnell@gmail.com> wrote:
> 
>> I could commit this but doing so means adding two new optional
>> dependency jars.  I think the function is valuable enough to justify the
>> addition but I wanted to run it past the rest of you first.
> 
> great addition. +1
> 
