hc-httpclient-users mailing list archives

From Sam Crawford <samcrawf...@gmail.com>
Subject Re: Best-Practices for Multithreaded use of HttpClient (with Cookies)?
Date Wed, 27 Jan 2010 23:22:16 GMT
I could well be mistaken, but my experience suggests that with version
4.0 you need a new HttpClient each time you deal with a different set
of cookies. Creating multiple HttpContexts used across a single
DefaultHttpClient instance did not seem to be sufficient.

That said, I only tried this briefly and didn't spend a huge amount of
time investigating it. I keep meaning to do so and to submit a bug if
I find a genuinely reproducible issue.
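
For what it's worth, the isolation in question can be sketched without HttpClient at all. The following is only an illustration under stated assumptions: it uses the JDK's java.net.CookieManager as a stand-in for HttpClient's CookieStore, it simulates responses instead of making real requests, and the names PerSessionCookies, CrawlSession, receive, cookiesFor and crawl are made up for this example. It shows two sessions against the same host, running in parallel, each keeping only its own cookies.

```java
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerSessionCookies {

    /** Hypothetical helper: one crawl session with its own private cookie state. */
    static final class CrawlSession {
        // JDK stand-in for a per-session cookie store; never shared between sessions.
        private final CookieManager cookies = new CookieManager();

        /** Simulates a response from uri that carries one Set-Cookie header. */
        void receive(URI uri, String setCookie) throws Exception {
            cookies.put(uri, Collections.singletonMap(
                    "Set-Cookie", Collections.singletonList(setCookie)));
        }

        /** Returns the Cookie header values this session would send to uri. */
        List<String> cookiesFor(URI uri) throws Exception {
            Map<String, List<String>> headers =
                    cookies.get(uri, Collections.<String, List<String>>emptyMap());
            List<String> values = headers.get("Cookie");
            return values == null ? Collections.<String>emptyList() : values;
        }
    }

    /** One sequential "login, then fetch" run with a fresh session. */
    static List<String> crawl(String sessionId) throws Exception {
        CrawlSession session = new CrawlSession();
        session.receive(URI.create("http://www.a.com/login"),
                "SESSION=" + sessionId + "; Path=/");
        return session.cookiesFor(URI.create("http://www.a.com/page2"));
    }

    public static void main(String[] args) throws Exception {
        // Two sessions run in parallel against the same (simulated) host.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<String>> first = pool.submit(() -> crawl("one"));
            Future<List<String>> second = pool.submit(() -> crawl("two"));
            System.out.println(first.get());
            System.out.println(second.get());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each session owns its cookie state, so nothing leaks between threads. Whether HttpClient 4.0 gives the same isolation with one HttpContext per session on a shared DefaultHttpClient is exactly what still needs to be verified.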



2010/1/27 Jens Mueller <supidupi007@googlemail.com>:
> Hello HC Experts,
> I would be very grateful for advice regarding my question. I have already
> spent a lot of time searching the internet, but I still have not found an
> example that answers my questions. There are a lot of examples available (also
> for the multithreaded use cases), but they only address the use case of making
> one(!!) request. I am completely uncertain how to "best" make a series of
> requests (to the same web server).
> I need to develop a simple crawler that crawls some websites for specific
> information. The basic idea is to download the individual web pages of a website
> (for example www.a.com) sequentially, but to run several of these "sequential"
> downloaders in threads for different websites (www.b.com and www.c.com) in
> parallel.
> My current concept/implementation looks like this:
> 1.  Instantiate a ThreadSafeClientConnManager (with a lot of default
> parameters). This connection manager will be used/shared by all
> DefaultHttpClient instances.
> 2.  For every web page of a website (with multiple web pages), I instantiate
> a new DefaultHttpClient for every(!!) web page request and then call the
> "httpClient.execute(httpGet)" method with the instantiated HttpGet(url).
> ==> I am increasingly wondering whether this is the correct usage of the
> DefaultHttpClient and the .execute() method. Am I doing something wrong
> here by instantiating a new DefaultHttpClient for every request of a web page?
> Or should I rather instantiate only one(!!) DefaultHttpClient and then share
> it for the sequential .execute() calls?
> To be honest, what I also have not really understood yet is the cookie
> management. Do I as the programmer have to instantiate the CookieStore
> manually
> 1. httpClient.setCookieStore(new BasicCookieStore());
> and then after calling the .execute() method "get" the Cookie store
> 2. savedcookies = httpClient.getCookieStore()
> and then reinject this cookie store for the next call to the same web page (to
> maintain state)?
> 3. httpClient.setCookieStore(savedcookies)
> Or is there some implicit magic that A) creates the cookie store
> implicitly and B) somehow shares this CookieStore among the HttpClients
> and/or HttpGets?
> Thank you very much!!
> Jens
