commons-user mailing list archives

From "Gary Gregory" <ggreg...@seagullsoftware.com>
Subject RE: [HttpClient] Screen Scraping Components?
Date Sat, 20 Nov 2004 00:01:48 GMT
Another approach would be to use a utility to turn the page into XHTML
(XML) and then use XPath to get to anything.
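
A minimal sketch of the XPath half of that idea, assuming the page has
already been converted to well-formed XHTML by some tidy-style utility
(that conversion step is not shown). The class name, the select() helper,
and the sample markup are hypothetical; only the JDK's own javax.xml
classes are used. The sample has no XHTML namespace or DOCTYPE, which
keeps the expression simple; a real converted page may need
namespace-aware XPath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlXPathSketch {

    // Hypothetical helper: evaluates an XPath expression against
    // well-formed XHTML and returns the trimmed text of each match.
    static List<String> select(String xhtml, String expression) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent().trim());
            }
            return out;
        } catch (Exception e) {
            // Parsing or XPath failure, e.g. the input was not well-formed.
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample standing in for a converted page.
        String xhtml = "<html><body><table id=\"quotes\">"
            + "<tr><td class=\"sym\">ABC</td><td class=\"px\">12.50</td></tr>"
            + "<tr><td class=\"sym\">XYZ</td><td class=\"px\">7.25</td></tr>"
            + "</table></body></html>";
        // Prints [12.50, 7.25]
        System.out.println(select(xhtml, "//table[@id='quotes']/tr/td[@class='px']"));
    }
}
```

The appeal is that the expression follows the page structure rather than
its exact text, so small changes in whitespace or attribute order should
not break it.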

Gary

-----Original Message-----
From: Brant Hahn [mailto:brant.hahn@insightbb.com] 
Sent: Friday, November 19, 2004 2:30 PM
To: 'Jakarta Commons Users List'
Subject: [HttpClient] Screen Scraping Components?

Hi, 

 

I've been using HttpClient for a few months now.  I was wondering if
anyone out there had a recommendation for a 3rd party component for
screen scraping.  I've seen a few, including Jericho, but I generally
have to write more code than I want to when using them.  Just curious if
there is something out there that takes in regex Pattern objects (or
just regex pre-compiled strings) to easily get the data that I want off
of any page.
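
For illustration, the kind of helper described above can be sketched with
nothing but java.util.regex. This is not a third-party component; the
class name, the extract() helper, and the sample markup are hypothetical,
and the page is assumed to be the response body already fetched with
HttpClient and held as a String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexScrapeSketch {

    // Hypothetical helper: returns capture group 1 for every match of
    // the pattern in the page text. The pattern must define a group 1.
    static List<String> extract(Pattern pattern, CharSequence page) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(page);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    // Convenience overload for callers holding an uncompiled regex string.
    static List<String> extract(String regex, CharSequence page) {
        return extract(Pattern.compile(regex), page);
    }

    public static void main(String[] args) {
        // Made-up sample standing in for a fetched page.
        String page = "<tr><td class=\"sym\">ABC</td><td class=\"px\">12.50</td></tr>"
            + "<tr><td class=\"sym\">XYZ</td><td class=\"px\">7.25</td></tr>";
        Pattern price = Pattern.compile("<td class=\"px\">([^<]+)</td>");
        // Prints [12.50, 7.25]
        System.out.println(extract(price, page));
    }
}
```

A pattern like this is tied to the exact markup, so it is likely to need
adjusting whenever the page layout changes.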

 

Thanks,

Brant



