hc-httpclient-users mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: Best Practice to Use HttpClient in Multithreaded Environment
Date Mon, 17 Aug 2009 18:21:28 GMT
Hi Yan Cheng,

See below - but one caveat...Oleg could very well correct all of my  
comments below :)

On Aug 16, 2009, at 6:17pm, yccheok wrote:

> Hi Ken,
>
> So, in my case, I should set
>
> httpConnectionManagerParams.setDefaultMaxConnectionsPerHost(50);

Yes, if all of your requests will be going to the same domain, and
you're going to be hitting it with all 50 threads at the same time.
But that's not a normal use case - hope you're really good friends
with that site's ops team :)

E.g. in Bixo we configure HttpClient for one thread per host, as  
that's what you need for polite crawling.
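
To make the "one per host" idea concrete, here is a minimal sketch of a per-host cap built only on java.util.concurrent. It is an illustrative model, not Bixo's or HttpClient's code; the class name PerHostLimiter and its methods are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

// Illustrative model only; this is not Bixo or HttpClient code.
final class PerHostLimiter {
    private final int maxPerHost;
    // One semaphore per host name, created on first use.
    private final ConcurrentMap<String, Semaphore> permits = new ConcurrentHashMap<>();

    PerHostLimiter(int maxPerHost) {
        this.maxPerHost = maxPerHost;
    }

    /** Tries to claim a slot for the host without blocking. */
    boolean tryAcquire(String host) {
        return permits.computeIfAbsent(host, h -> new Semaphore(maxPerHost)).tryAcquire();
    }

    /** Returns a slot previously claimed with tryAcquire. */
    void release(String host) {
        Semaphore s = permits.get(host);
        if (s != null) {
            s.release();
        }
    }
}
```

With maxPerHost = 1, a second request to the same host is refused until the first is released, while requests to other hosts go ahead.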

> httpConnectionManagerParams.setMaxTotalConnections(50);
> // hostConfiguration will be obtained from HttpClient itself.
> httpConnectionManagerParams.setMaxConnectionsPerHost(hostConfiguration, 50);
>
> Is there any side effect of setting the number too high, like 1000?

I don't know the details of how HttpClient (3.x or 4.x) allocates  
connections in the pool, but I assume they only create a connection  
when one is needed, there's no free connection, and the total number  
of connections is less than this limit.

So leaving aside issues of memory requirements, max # of open sockets,  
etc. that you'd hit with 1000 active connections, I don't think there  
would be any issue with using a large value.
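
A minimal sketch of the allocation policy assumed above: reuse an idle connection if there is one, create a new one only while under the limit, otherwise report exhaustion. This models the assumption, not HttpClient's actual implementation; LazyPool and its methods are hypothetical names, and a plain Object stands in for a connection.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model of lazy allocation; not HttpClient's implementation.
final class LazyPool {
    private final int maxTotal;
    private final Deque<Object> idle = new ArrayDeque<>();
    private int created = 0;

    LazyPool(int maxTotal) {
        this.maxTotal = maxTotal;
    }

    /** Returns a connection, or null if the pool is exhausted. */
    synchronized Object acquire() {
        if (!idle.isEmpty()) {
            return idle.pop();      // reuse a free connection first
        }
        if (created < maxTotal) {
            created++;              // create only when needed and under the limit
            return new Object();
        }
        return null;                // limit reached and nothing free
    }

    /** Hands a connection back so a later acquire can reuse it. */
    synchronized void release(Object conn) {
        idle.push(conn);
    }

    /** Number of connections created so far. */
    synchronized int created() {
        return created;
    }
}
```

Under this model a limit of 1000 costs nothing by itself: a pool that is only ever asked for one connection at a time creates exactly one.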

> Compared to 100 HttpClient instances with maxConnection = 10 each, will a
> single HttpClient with maxConnection = 1000 perform better? Or does it
> depend on the situation?

I think performance will mostly depend on the servers that you're  
accessing.

See http://ken-blog.krugler.org/2009/05/19/performance-problems-with-verticalfocused-web-crawling/
for a blog post I wrote about crawl performance. This was using Bixo
and HttpClient 4.0.

> I know HttpClient maintains its own connection pool. Does "this figure"
> (1000) affect the "number of simultaneous connections allowed" at a
> given time? Or is "this figure" itself the number of connections allowed
> in the HttpClient connection pool?

There are two HttpClient-based limits for maximum number of  
simultaneous connections - the max connections per host and the max  
total connections. Assuming you are hitting 1000 different hosts, then  
you could have 1000 simultaneous connections. Though you'll also  
typically run into other limits, like running out of system memory due  
to the amount of stack space used per thread, or DNS lookups becoming  
slow, etc.
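
As a rough worked example of how the two limits combine, the ceiling on simultaneous connections is the smaller of the total limit and (number of distinct hosts x per-host limit). The sketch below is plain arithmetic, not an HttpClient API; ConnectionCeiling and maxSimultaneous are hypothetical names.

```java
// Illustrative arithmetic only; not part of HttpClient.
final class ConnectionCeiling {
    /** Upper bound on simultaneous connections given the two pool limits. */
    static int maxSimultaneous(int distinctHosts, int maxPerHost, int maxTotal) {
        long perHostBound = (long) distinctHosts * maxPerHost; // long avoids int overflow
        return (int) Math.min(perHostBound, (long) maxTotal);
    }

    public static void main(String[] args) {
        System.out.println(maxSimultaneous(1, 50, 50));     // one host, both limits at 50
        System.out.println(maxSimultaneous(1000, 1, 1000)); // 1000 hosts, one connection each
        System.out.println(maxSimultaneous(1, 2, 1000));    // one host, per-host limit dominates
    }
}
```

So a total limit of 1000 only matters when the per-host limit and the number of hosts being hit allow that many connections in the first place.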

-- Ken


> Ken Krugler wrote:
>>
>> Hi Yan Cheng,
>>
>> I haven't used HttpClient 3.x for a while - switched to 4.0 and
>> haven't looked back.
>>
>> But in general method A is going to work better. You can configure the
>> MultiThreadedHttpConnectionManager with a maximum number of connections -
>> e.g. you could pick a number equal to the max # of threads that you
>> know will be using it. If it's configured with less than the max
>> number of threads, then some of your connection requests will block
>> until a free connection becomes available - and if the wait exceeds a
>> (configurable) limit, you'll get an exception.
>>
>> In extreme situations I've run with up to 1000 threads and one
>> connection manager, so I don't think you'll hit any limits there.
>>
>> -- Ken
>>
>>
>> On Aug 16, 2009, at 6:11am, Yan Cheng Cheok wrote:
>>
>>> Hi all,
>>>
>>> Until now, I have been using HttpClient in a multithreaded environment.
>>> Each thread creates a completely new HttpClient instance whenever it
>>> initiates a connection.
>>>
>>> Recently, I discovered that this approach can leave the user with
>>> too many open ports, with most of the connections in the TIME_WAIT
>>> state.
>>>
>>> http://www.opensubscriber.com/message/commons-httpclient-dev@jakarta.apache.org/86045.html
>>>
>>> Hence, instead of doing this in each thread:
>>> HttpClient c = new HttpClient();
>>> try {
>>>   c.executeMethod(method);
>>> }
>>> catch(...) {
>>> }
>>> finally {
>>>   method.releaseConnection();
>>> }
>>>
>>>
>>> We plan to have :
>>>
>>> [METHOD A]
>>>
>>> // global_c is initialized once through
>>> // HttpClient global_c = new HttpClient(new MultiThreadedHttpConnectionManager());
>>>
>>> try {
>>>   global_c.executeMethod(method);
>>> }
>>> catch(...) {
>>> }
>>> finally {
>>>   method.releaseConnection();
>>> }
>>>
>>> In the normal situation, global_c will be accessed by 50+ threads
>>> concurrently. I was wondering whether this will cause any
>>> performance issues. Does MultiThreadedHttpConnectionManager use a
>>> lock-free mechanism to implement its thread-safety policy?
>>>
>>> Is it possible that if 10 threads are using global_c, the other 40
>>> threads will be blocked?
>>>
>>> Or would it be better if every thread creates its own HttpClient
>>> instance, but shuts down the connection manager explicitly?
>>>
>>> [METHOD B]
>>> HttpClient c = new HttpClient();
>>> try {
>>>   c.executeMethod(method);
>>> }
>>> catch(...) {
>>> }
>>> finally {
>>>   method.releaseConnection();
>>>   c.getHttpConnectionManager().shutdown();
>>> }
>>>
>>> Does c.getHttpConnectionManager().shutdown() suffer from performance
>>> issues?
>>>
>>> May I know which method (A or B) is better for an application using
>>> 50+ threads?
>>>
>>> I am using HttpClient 3.1
>>>
>>> Thanks and Regards
>>> Yan Cheng Cheok
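
The blocking behaviour described in the quoted reply (requests wait for a free connection and fail with an exception past a configurable limit) can be modelled with a semaphore and a timed acquire. This is a minimal sketch under that reading, not HttpClient's connection manager; BoundedPool is a hypothetical name and java.util.concurrent.TimeoutException stands in for whatever exception the library actually throws.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative model of a bounded pool; not HttpClient's connection manager.
final class BoundedPool {
    private final Semaphore permits;

    BoundedPool(int maxConnections) {
        // Fair semaphore: waiting threads are served in arrival order.
        this.permits = new Semaphore(maxConnections, true);
    }

    /** Blocks until a slot is free, or throws once the timeout elapses. */
    void acquire(long timeoutMillis) throws InterruptedException, TimeoutException {
        if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("no free connection within " + timeoutMillis + " ms");
        }
    }

    /** Frees a slot so a waiting thread can proceed. */
    void release() {
        permits.release();
    }
}
```

With 50 threads and a pool sized at 10, at most 10 proceed at once; the other 40 wait in acquire and either get a slot when one is released or fail when the timeout expires.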



