hc-httpclient-users mailing list archives

From Sebastiano Vigna <vi...@di.unimi.it>
Subject Re: GETs and Responses in different threads ?
Date Wed, 04 Mar 2015 10:03:06 GMT

> On 4 Mar 2015, at 10:54, Paul Bear <xpyazu@gmail.com> wrote:
> 
> I have a large list of URLs (about 1 million) and I want to run
> 
> - one thread that runs through the list and asynchronously sends GET
> requests
> - several worker threads that process the responses
> 
> Is it possible to separate sending GETs and processing responses in
> different threads using Apache HttpClient?
> 
> Any ideas are welcome!

Well, just use BUbiNG, our crawler. :) It will take care of politeness, etc., and download
in parallel from any number of hosts. You just have to implement the HTMLParser interface
to do the processing you need.
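
For reference, the layout asked about above (one sender, several response workers) can
be sketched in plain Java without any crawler. This is only a rough illustration, not
BUbiNG's or HttpClient's API: FetchPipeline, run, and the fetch function are hypothetical
names, and fetch stands in for whatever asynchronous GET call is actually used.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class FetchPipeline {
    // Unique sentinel object (compared by identity) that tells a worker to stop.
    private static final String STOP = new String("STOP");

    // Hypothetical helper: 'fetch' is an assumed stand-in for an asynchronous GET
    // that completes with the response body. Returns the number of bodies processed.
    static int run(List<String> urls,
                   Function<String, CompletableFuture<String>> fetch,
                   int workers) throws InterruptedException {
        BlockingQueue<String> responses = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Worker threads: take completed responses off the queue and process them.
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    for (String body = responses.take(); body != STOP; body = responses.take()) {
                        processed.incrementAndGet(); // real response processing goes here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Sender: the calling thread walks the list and only issues requests.
        CountDownLatch done = new CountDownLatch(urls.size());
        for (String url : urls) {
            fetch.apply(url).whenComplete((body, err) -> {
                try {
                    if (err == null && body != null) {
                        responses.add(body); // hand the response over to the workers
                    }
                } finally {
                    done.countDown(); // count failures too, so the sender never hangs
                }
            });
        }

        done.await();                          // every request has completed or failed
        for (int i = 0; i < workers; i++) {
            responses.add(STOP);               // one stop marker per worker
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }
}
```

Note that this sketch issues every request at once and does nothing about politeness; with
a million URLs you would want to cap the number of requests in flight, for example with a
Semaphore acquired before each fetch and released in the completion callback.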

Ciao,

					seba

