hc-httpclient-users mailing list archives

From "Jens Mueller supidupi007@googlemail.com" <supidupi...@googlemail.com>
Subject Best-Practices for Multithreaded use of HttpClient (with Cookies)?
Date Wed, 27 Jan 2010 19:42:21 GMT
Hello HC Experts,

I would be very grateful for advice on the following question. I have already
spent a lot of time searching the internet, but I still have not found an
example that answers it. There are a lot of examples available (also
for the multithreaded use cases), but they only address the case of making
one(!!) request. I am completely uncertain how to "best" make a series of
requests (to the same web server).

I need to develop a simple crawler that crawls some websites for specific
information. The basic idea is to download the individual pages of one website
(for example www.a.com) sequentially, but to run several of these "sequential"
downloaders in parallel threads, one per website (for example www.b.com and
www.c.com).
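
To make the intended structure concrete, here is a minimal sketch of "one
sequential downloader per website, several websites in parallel", using only
the JDK's ExecutorService. The class name SiteCrawlerSketch, the crawl method
and the fetch function are hypothetical names chosen for illustration; fetch
stands in for the real HTTP call and is not part of HttpClient.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class SiteCrawlerSketch {

    /**
     * Runs one task per website. Inside a task the pages are fetched one
     * after another; different websites are processed in parallel.
     * 'fetch' is a hypothetical stand-in for the real HTTP request.
     */
    static Map<String, List<String>> crawl(Map<String, List<String>> pagesBySite,
                                           Function<String, String> fetch,
                                           int threads) {
        Map<String, List<String>> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Map.Entry<String, List<String>> site : pagesBySite.entrySet()) {
            pool.execute(() -> {
                // Sequential downloads for this one website.
                List<String> bodies = new ArrayList<>();
                for (String url : site.getValue()) {
                    bodies.add(fetch.apply(url));
                }
                results.put(site.getKey(), bodies);
            });
        }
        pool.shutdown();
        try {
            // Wait for all per-site tasks; the timeout is an arbitrary choice.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }
}
```

Note that a failure inside fetch would only terminate that website's task in
this sketch; a real crawler would need its own error handling there.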

My current concept/implementation looks like this:

1.  Instantiate a ThreadSafeClientConnManager (using mostly default
parameters). This connection manager is shared by all DefaultHttpClient
instances.
2.  For every(!!) page request to a website (which has multiple pages), I
instantiate a new DefaultHttpClient and then call the
"httpClient.execute(httpGet)" method with a newly instantiated HttpGet(url).

==> I am increasingly wondering whether this is the correct usage of
DefaultHttpClient and its .execute() method. Am I doing something wrong
by instantiating a new DefaultHttpClient for every page request?
Or should I rather instantiate only one(!!) DefaultHttpClient and share
it across the sequential .execute() calls?

To be honest, what I also have not really understood yet is cookie
management. Do I, as the programmer, have to instantiate the CookieStore
manually
1. httpClient.setCookieStore(new BasicCookieStore());
then, after calling the .execute() method, "get" the cookie store
2. savedcookies = httpClient.getCookieStore()
and then reinject this cookie store for the next call to the same webpage (to
maintain state)?
3. httpClient.setCookieStore(savedcookies)
Or is there some implicit magic that A) creates the cookie store
implicitly and B) somehow shares this CookieStore among the HttpClients
and/or HttpGets?
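
The state-keeping asked about here can be illustrated independently of
HttpClient. The following is only a sketch using the JDK's own
java.net.CookieManager as a stand-in; it is not HttpClient's CookieStore API
and does not show how HttpClient behaves. The class PerSiteCookiesSketch and
its two methods are hypothetical names. It shows one long-lived cookie holder
per website that is fed the Set-Cookie value of each response and asked for
the cookies to send with the next request.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class PerSiteCookiesSketch {
    // One cookie holder per sequential downloader (i.e. per website).
    private final CookieManager cookies = new CookieManager();

    /** Stores the cookie from a response's Set-Cookie header value. */
    void rememberResponse(String url, String setCookieValue) {
        try {
            cookies.put(URI.create(url),
                    Collections.singletonMap("Set-Cookie",
                            Collections.singletonList(setCookieValue)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the cookie strings that apply to the next request to 'url'. */
    List<String> cookiesFor(String url) {
        try {
            Map<String, List<String>> headers =
                    cookies.get(URI.create(url), Collections.emptyMap());
            return headers.getOrDefault("Cookie", Collections.emptyList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each instance owns its own cookie holder, a cookie stored for
www.a.com stays available for later requests to www.a.com made through the
same instance, while a separate instance used for www.b.com does not see it.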

Thank you very much!!
Jens
