hc-dev mailing list archives

From Henri Yandell <flame...@gmail.com>
Subject Re: robots.txt parser
Date Mon, 01 Nov 2004 23:37:15 GMT
On Mon, 01 Nov 2004 23:59:01 +0100, Oleg Kalnichevski <olegk@apache.org> wrote:
> On Mon, 2004-11-01 at 20:37, Henri Yandell wrote:
> >
> > If not, would anyone be interested in http://www.osjava.org/norbert/ ?
> >
> > I'd like to put it in the sandbox and thought that it would be of a
> > lot of interest to the HttpClient project and users.
> >
> 
> Can we keep it in the sandbox for a while? As soon as HttpClient 4.0 API
> starts shaping up, the robots.txt parser could be migrated to Jakarta
> HttpClient to lay a foundation for a web crawler subcomponent.

I'll go ahead and migrate it into the sandbox at some point soon. 

On the web crawler side, there's:

http://www.osjava.org/scraping-engine/

I need to migrate it to use commons-configuration, and it already sits
on top of HttpClient. Food for thought anyway, I hope. I use it
personally and it's used at my workplace, but I haven't really pushed
it outside of my own use yet.

Hen

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-dev-help@jakarta.apache.org

