hc-httpclient-users mailing list archives

From vigna <vi...@di.unimi.it>
Subject HttpAsyncClient and cookies
Date Tue, 18 Dec 2012 23:36:14 GMT
We are trying to use DefaultHttpAsyncClient in our new crawler. We need to
handle a few hundred connections per thread asynchronously, and it seems to be
the right candidate.

Over the last few weeks we experimented with many DefaultHttpClient instances
on a few thousand threads, and it worked well (actually, we found a couple of
bugs, as our crawls are very wide and run into every kind of server
configuration error). Consider that we crawl URLs from different sites
continuously, so we need to change the cookie store at each request, which we
do by managing the store directly.

After digging through the (scant) documentation, I really couldn't figure out
how to manage cookies with HttpAsyncClient. Any suggestion or code snippet
would be really welcome. What we need to do, basically, is:

- keep a few hundred GET requests open in parallel.
- for each request, use an AsyncByteConsumer to accumulate the content in a
buffer, and headers, cookies, etc. in some data structure.
- on completion, schedule the received data for analysis.
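
To make the shape of these three steps concrete, here is a minimal sketch using
only JDK classes, not the HttpAsyncClient API. FetchedPage and PageAccumulator
are hypothetical names invented for illustration; the three callbacks only
mimic the kind of events an asynchronous consumer would receive.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical record: what one finished fetch hands to the analysis stage. */
final class FetchedPage {
    final String url;
    final Map<String, String> headers;
    final byte[] body;

    FetchedPage(String url, Map<String, String> headers, byte[] body) {
        this.url = url;
        this.headers = headers;
        this.body = body;
    }
}

/** Hypothetical per-request accumulator, one instance per open GET. */
final class PageAccumulator {
    private final String url;
    private final BlockingQueue<FetchedPage> analysisQueue;
    // Simplification: one value per header name, so a repeated header
    // overwrites the earlier value. A real crawler would keep all values.
    private final Map<String, String> headers = new LinkedHashMap<>();
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();

    PageAccumulator(String url, BlockingQueue<FetchedPage> analysisQueue) {
        this.url = url;
        this.analysisQueue = analysisQueue;
    }

    /** Called once per response header. */
    void onHeader(String name, String value) {
        headers.put(name, value);
    }

    /** Called for each chunk of content; copies the chunk into the buffer. */
    void onBytes(ByteBuffer chunk) {
        byte[] copy = new byte[chunk.remaining()];
        chunk.get(copy);
        body.write(copy, 0, copy.length);
    }

    /** Called when the response is complete; schedules the data for analysis. */
    void onCompleted() {
        analysisQueue.add(
            new FetchedPage(url, new LinkedHashMap<>(headers), body.toByteArray()));
    }
}
```

Analysis threads would then take FetchedPage objects from the shared queue,
so the I/O side never blocks on parsing.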

All this, however, requires managing cookies, and I could not understand how
to change the cookie store for each async request, or how to get at the cookie
store in onResponseReceived().
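
For reference, the per-site cookie handling we are after looks roughly like the
sketch below. It is written against the JDK's java.net classes only, as a
stand-in, and is not HttpAsyncClient code; PerSiteCookies and its method names
are hypothetical. The point is just the pattern: pick a cookie store per
request, update it when response headers arrive, and read it back before the
next request to the same site.

```java
import java.net.CookieManager;
import java.net.CookieStore;
import java.net.HttpCookie;
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: one in-memory cookie store per site, chosen per request. */
final class PerSiteCookies {
    private final Map<String, CookieStore> stores = new ConcurrentHashMap<>();

    /** Returns the store for the URI's host, creating it on first use. */
    CookieStore storeFor(URI uri) {
        // new CookieManager() is backed by the JDK's default in-memory store.
        return stores.computeIfAbsent(
            uri.getHost(), host -> new CookieManager().getCookieStore());
    }

    /** Call when response headers arrive: record each Set-Cookie value. */
    void onResponseHeaders(URI uri, List<String> setCookieValues) {
        CookieStore store = storeFor(uri);
        for (String value : setCookieValues) {
            for (HttpCookie cookie : HttpCookie.parse(value)) {
                store.add(uri, cookie);
            }
        }
    }

    /** Call before sending: builds the Cookie header value for this request. */
    String cookieHeader(URI uri) {
        StringBuilder header = new StringBuilder();
        for (HttpCookie cookie : storeFor(uri).get(uri)) {
            if (header.length() > 0) {
                header.append("; ");
            }
            header.append(cookie.getName()).append('=').append(cookie.getValue());
        }
        return header.toString();
    }
}
```

Keying the stores by host keeps cookies from one site from leaking into
requests for another, which is what we currently get by swapping the store by
hand before each request.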

Any help appreciated!


View this message in context: http://httpcomponents.10934.n7.nabble.com/HttpAsyncClient-and-cookies-tp16798.html
Sent from the HttpClient-User mailing list archive at Nabble.com.
