hc-httpclient-users mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject Re: HttpAsyncClient as a spider
Date Fri, 18 Jul 2014 14:02:18 GMT
On Fri, 2014-07-18 at 18:16 +0800, Li Li wrote:
> Hi all,
>     I used to use HttpComponents Client to crawl web pages. I need to
> improve it by using the async client. What I want is something like:
>    Queue<URL> needCrawlQueue;
>    Queue<String[]> htmlQueue;
>    HttpAsyncClient client;
>    int maxConcurrent = 500;
>    // when a URL finishes, get notified and call back this code
>    if (client.currentCrawlingCount < maxConcurrent) {
>        URL url = needCrawlQueue.take();
>        // request this URL
>    }
>    // when a URL finishes, get notified and call back this code;
>    // String url and String html are the callback arguments
>    htmlQueue.put(new String[]{url, html});
>     I mean I have an async client class which takes two queues.
>     If the number of unfinished URLs is less than maxConcurrent, it takes a
> URL from one queue and requests it. When a URL succeeds (or fails),
> it adds the result to the other queue.

Why do you think the use of an async client would necessarily be an
improvement? What exactly do you want to improve? Generally, a decent
blocking client with a moderate number of threads is likely to be faster
than an async one.
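
To illustrate the blocking alternative, here is a minimal sketch that uses
only JDK classes: a fixed pool of worker threads drains needCrawlQueue and
puts {url, html} pairs on htmlQueue. The class name QueueCrawler, the crawl
method, and the fetcher function are hypothetical placeholders; in a real
crawler the fetcher would wrap a blocking HTTP call. URLs are plain strings
here for brevity, and a failed fetch is reported with a null body. Treat it
as one possible shape under those assumptions, not a tuned implementation.

```java
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class QueueCrawler {

    /**
     * Drains needCrawlQueue with maxConcurrent worker threads and puts
     * {url, html} pairs on htmlQueue. The fetcher is a hypothetical stand-in
     * for a blocking HTTP call; if it throws, html is recorded as null.
     */
    public static void crawl(BlockingQueue<String> needCrawlQueue,
                             BlockingQueue<String[]> htmlQueue,
                             Function<String, String> fetcher,
                             int maxConcurrent) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        for (int i = 0; i < maxConcurrent; i++) {
            pool.execute(() -> {
                String url;
                // poll() returns null once the queue is empty, ending this worker
                while ((url = needCrawlQueue.poll()) != null) {
                    String html;
                    try {
                        html = fetcher.apply(url);
                    } catch (RuntimeException e) {
                        html = null; // failed fetch: still report the URL
                    }
                    try {
                        htmlQueue.put(new String[] {url, html});
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
        }
        pool.shutdown();
        // Wait for the workers to finish (bounded here at one hour)
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> needCrawlQueue = new LinkedBlockingQueue<>(
                Arrays.asList("http://example.org/a", "http://example.org/b"));
        BlockingQueue<String[]> htmlQueue = new LinkedBlockingQueue<>();
        // Stub fetcher; replace with a real blocking HTTP request
        crawl(needCrawlQueue, htmlQueue, url -> "<html>" + url + "</html>", 2);
        for (String[] result : htmlQueue) {
            System.out.println(result[0] + " -> " + result[1]);
        }
    }
}
```

The number of worker threads plays the role of maxConcurrent, so no
separate counter of in-flight requests is needed. If htmlQueue is bounded,
something else has to consume it while crawl is running, otherwise the
workers block on put.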

