hc-dev mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject RE: question about performance
Date Fri, 09 Apr 2004 19:11:00 GMT
Gil,

> How about this? It turns out that the different timeouts form a small
> set: some want 10 secs, others 30 secs, others 1 minute. So I can keep a
> pool of HttpClient objects, one per timeout. This way, each HttpClient
> object can be configured with its own timeout.
> 

Are you talking about connect timeouts or socket (read) timeouts, or
both?


> But will I still get performance benefits if I use HttpClient in this
> way?

Sounds plausible. Would you be accessing different hosts with those
HttpClient instances?
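A minimal sketch of the one-client-per-timeout idea, assuming the set of timeout values is small and fixed. ClientsByTimeout and forTimeout are hypothetical names, and the client type is left generic so the sketch runs on its own; in a real application the factory would create an HttpClient and call setTimeout on it. Each client is created on first use and then shared, never once per request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical registry: one long-lived client per timeout value.
// C stands in for the client type (e.g. an HttpClient configured via setTimeout).
final class ClientsByTimeout<C> {
    private final Map<Integer, C> clients = new ConcurrentHashMap<>();
    private final IntFunction<C> factory;

    ClientsByTimeout(IntFunction<C> factory) {
        this.factory = factory;
    }

    // Returns the shared client for this timeout; the factory runs at most once per value.
    C forTimeout(int timeoutMs) {
        return clients.computeIfAbsent(timeoutMs, factory::apply);
    }

    int size() {
        return clients.size();
    }
}
```

With three timeout buckets (10 s, 30 s, 1 min) this yields exactly three clients for the lifetime of the application.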


> And will it behave correctly? I do require a timeout to be set, and
> I will have multiple concurrent requests executing within one
> HttpClient. Will the behavior be as if I specified a timeout per
> request, or am I going to get weird behavior (e.g., I set a timeout of 10,
> request R1 starts at T1, R2 starts at T1+5, and R2 times out at T1+10,
> which would be wrong)?

That should not be the case.
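Why R2 should not time out at T1+10: a socket (read) timeout set with Socket.setSoTimeout applies to each blocking read and is counted from when that read starts, not from a clock shared by all requests. The sketch below shows this with plain java.net sockets against a local server that accepts but never replies (an assumption chosen so it runs offline); it is not HttpClient code, and PerReadTimeoutDemo and blockedMillis are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo: how long does a read block before SO_TIMEOUT fires?
final class PerReadTimeoutDemo {
    static long blockedMillis(int soTimeoutMs, long startDelayMs)
            throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket silentPeer = server.accept()) { // accepts but never writes
            client.setSoTimeout(soTimeoutMs);
            Thread.sleep(startDelayMs); // simulate a request that starts later (R2 at T1+5)
            InputStream in = client.getInputStream();
            long start = System.nanoTime();
            try {
                in.read(); // blocks until data arrives or SO_TIMEOUT elapses
                throw new IllegalStateException("unexpected data from silent peer");
            } catch (SocketTimeoutException expected) {
                return (System.nanoTime() - start) / 1_000_000;
            }
        }
    }
}
```

Calling blockedMillis(200, 300) should report roughly 200 ms: the 300 ms spent before the read does not eat into the timeout.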


> Basically do you fire up that controller thread
> per request, or is there just one per HttpClient object?
> 

The controller thread is started per connect attempt. If every request
opens a new connection, that effectively means one controller thread per
request. The trick is to reuse connections as much as possible.

Another idea:

If you can live with one set of connection/socket timeout values per
target host, you may also do all the thread management yourself,
keeping an instance of HttpClient per worker thread. Just make sure you
reuse HttpClient instances that may already have a connection to the
target host open. Do not let them get garbage-collected.
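One way to do the per-worker-thread variant is a ThreadLocal, sketched below under the assumption that each worker thread lives for the lifetime of the application. PerThreadClient is a hypothetical name and the client type is generic; the factory would typically build an HttpClient with the timeout values for that worker's target host. As long as the thread is alive and the PerThreadClient itself stays reachable, the instance (and any connection it holds open) is not garbage-collected between requests.

```java
import java.util.function.Supplier;

// Hypothetical helper: one long-lived client per worker thread.
// Each thread lazily creates its own instance on first get() and reuses it afterwards.
final class PerThreadClient<C> {
    private final ThreadLocal<C> local;

    PerThreadClient(Supplier<? extends C> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }

    // Returns this thread's client, creating it only on the first call from the thread.
    C get() {
        return local.get();
    }
}
```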

Oleg

> -----Original Message-----
> From: Oleg Kalnichevski [mailto:olegk@apache.org] 
> Sent: Friday, April 09, 2004 6:58 AM
> To: Commons HttpClient Project
> Subject: RE: question about performance
> 
> Gil,
> The problem is that until Java 1.4 there was simply no way to
> enforce a connect timeout. HttpClient only 'mimics' a connect timeout at
> the expense of having a controller thread watch over the process of
> socket initialization. The controller thread attempts to instantiate a
> socket for a given period of time, and if that fails, the controller thread
> simply drops the socket on the floor, leaving it up to the garbage
> collector to clean up the mess. This is all very expensive in terms of
> resource consumption / memory allocation / garbage collection. Knowing
> this problem well, we have put a lot of effort into trying to
> reuse connections as much as possible. This approach works only if you
> keep HttpClient along with its connection manager alive. Creating an
> HttpClient instance per request completely defeats connection re-use and
> results in excessive creation/garbage-collection of objects.
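The controller-thread technique described above can be sketched roughly as follows. This is an illustration using plain java.net, not HttpClient's actual code; ControllerThreadConnect is a hypothetical name. A helper thread runs the blocking Socket constructor while the caller waits at most timeoutMs; on timeout the attempt is abandoned, which is why every new connection costs a thread and possibly a socket left for the garbage collector.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch of a pre-1.4 style connect timeout using a controller thread.
final class ControllerThreadConnect {
    static Socket connect(String host, int port, long timeoutMs)
            throws IOException, InterruptedException {
        final Object lock = new Object();
        final Socket[] socket = new Socket[1];
        final IOException[] failure = new IOException[1];

        Thread controller = new Thread(() -> {
            try {
                Socket s = new Socket(host, port); // may block for a long time
                synchronized (lock) { socket[0] = s; }
            } catch (IOException e) {
                synchronized (lock) { failure[0] = e; }
            }
        }, "connect-controller");
        controller.setDaemon(true);
        controller.start();
        controller.join(timeoutMs); // wait at most timeoutMs for the attempt

        synchronized (lock) {
            if (socket[0] != null) return socket[0];
            if (failure[0] != null) throw failure[0];
        }
        // Timed out: the attempt is abandoned. If it succeeds later, the socket is
        // never handed out and is left for the garbage collector to clean up.
        throw new SocketTimeoutException("connect timed out after " + timeoutMs + " ms");
    }
}
```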
> 
> > The only setTimeout() calls that I can find are in HttpClient, but I'll
> > have multiple concurrent requests that will want different timeouts. How
> > do I set a timeout per request?
> > 
> 
> The problem is that the 2.0 API does not allow timeouts to be controlled
> on a per-request basis. There's an open ticket for this bug:
> 
> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=24154
> 
> We are planning to fix the problem for the 3.0 release. If you are
> absolutely certain you need different timeout values on a per-request
> basis, I can even provide a fix for it this weekend. There are also plans
> to add support for the 1.4 connect timeout through reflection, to circumvent
> the problem by eliminating the controller thread when running on newer
> JDKs. The catch is that you'd have to use the unstable branch of HttpClient,
> which is still in a pre-Alpha1 state.
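The "1.4 connect timeout through reflection" idea might look something like this sketch: Socket.connect(SocketAddress, int) is looked up at runtime, so no controller thread is needed on a 1.4+ JVM, and the caller can fall back to the old approach when the method is missing. ReflectiveConnect is a hypothetical name, and this is not the code that went into HttpClient.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.Socket;

// Hypothetical sketch: use the JDK's native connect timeout, looked up reflectively.
final class ReflectiveConnect {
    // Returns a connected socket, or null if Socket.connect(SocketAddress, int) is unavailable.
    static Socket connect(String host, int port, int timeoutMs) throws Exception {
        Object address;
        Method connect;
        try {
            Class<?> socketAddressClass = Class.forName("java.net.SocketAddress");
            address = Class.forName("java.net.InetSocketAddress")
                    .getConstructor(String.class, int.class)
                    .newInstance(host, port);
            connect = Socket.class.getMethod("connect", socketAddressClass, int.class);
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return null; // older JVM: caller would fall back to a controller thread
        }
        Socket socket = new Socket(); // unconnected socket
        try {
            connect.invoke(socket, address, timeoutMs); // native timeout, no extra thread
        } catch (InvocationTargetException e) {
            socket.close();
            Throwable cause = e.getCause(); // unwrap e.g. SocketTimeoutException
            if (cause instanceof Exception) throw (Exception) cause;
            throw e;
        }
        return socket;
    }
}
```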
> 
> Oleg
> 
> 
> > -----Original Message-----
> > From: Oleg Kalnichevski [mailto:olegk@apache.org] 
> > Sent: Thursday, April 08, 2004 1:20 PM
> > To: Commons HttpClient Project
> > Subject: RE: question about performance
> > 
> > Gil,
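> > HttpClient#getHost / HttpClient#getPort return the DEFAULT host and port
> > used when only a relative request path is given:
> > 
> > import org.apache.commons.httpclient.HttpClient;
> > import org.apache.commons.httpclient.methods.GetMethod;
> > 
> > HttpClient agent = new HttpClient();
> > // relative path: the default host configuration applies
> > GetMethod get1 = new GetMethod("/relative/whatever.html");
> > // absolute URI: the target host and port come from the URI itself
> > GetMethod get2 = new GetMethod("http://www.whatever.com/absolute/whatever.html");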
> > 
> > Oleg
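As an analogy only (java.net.URI rather than HttpClient), the sketch below shows the same rule: a relative request path is completed from a default host configuration, while an absolute URI carries its own host and port and ignores the default. DefaultHostResolution and target are hypothetical names; default.example:8080 stands in for whatever default host is configured.

```java
import java.net.URI;

// Hypothetical analogy: resolve a request URI against a default host configuration.
final class DefaultHostResolution {
    // Relative paths inherit scheme, host and port from defaultHost;
    // absolute URIs are returned unchanged.
    static URI target(URI defaultHost, String requestUri) {
        return defaultHost.resolve(requestUri);
    }
}
```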
> > 
> > 
> > 
> > On Thu, 2004-04-08 at 22:01, Alvarez, Gil wrote:
> > > Ok, I considered reusing HttpClient, but when I saw methods such as
> > > HttpClient.getHost() and getPort(), they implied that at the very least
> > > it's not a thread-safe class to use. If I have multiple threads
> > > executing within one HttpClient object at the same time, and I call
> > > HttpClient.getHost(), what's going to happen?
> > > 
> > > -----Original Message-----
> > > From: Oleg Kalnichevski [mailto:olegk@apache.org] 
> > > Sent: Thursday, April 08, 2004 12:23 PM
> > > To: Commons HttpClient Project
> > > Subject: Re: question about performance
> > > 
> > > Gil,
> > > (1) First and foremost, DO reuse HttpClient instances when using the
> > > multi-threaded connection manager. The HttpClient class is thread-safe.
> > > In fact, there are no known problems with having just one instance of
> > > HttpClient per application. Using a new instance of HttpClient for
> > > processing each request totally defeats all the performance
> > > optimizations we have built into HttpClient.
> > > 
> > > (2) Use the multi-threaded connection manager, in case you do not already.
> > > 
> > > (3) Disable the stale connection check.
> > > 
> > > (4) Do not use a connect timeout, which causes a controller thread to be
> > > spawned per connection attempt.
> > > 
> > > Oleg
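Points (1), (2) and (4) all come down to connection reuse. The toy pool below (hypothetical, and far simpler than MultiThreadedHttpConnectionManager) shows the principle under that simplifying assumption: a new connection, and with it any connect-timeout machinery such as a controller thread, is only needed when no idle connection to the same host and port is available.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy pool: idle sockets kept per host:port and handed out again.
final class ToyConnectionPool {
    private final Map<String, Deque<Socket>> idle = new HashMap<>();
    private int openAttempts; // how many new connections this pool has tried to open

    Socket acquire(String host, int port) throws IOException {
        synchronized (this) {
            Deque<Socket> queue = idle.get(key(host, port));
            while (queue != null && !queue.isEmpty()) {
                Socket candidate = queue.pop();
                // isClosed() only notices sockets closed on this side; detecting
                // connections the server has dropped is what a stale check is for.
                if (!candidate.isClosed()) {
                    return candidate; // reuse: no new connect, no controller thread
                }
            }
            openAttempts++;
        }
        // Connect outside the lock so one slow host does not block other threads.
        return new Socket(host, port);
    }

    synchronized void release(String host, int port, Socket socket) {
        idle.computeIfAbsent(key(host, port), k -> new ArrayDeque<>()).push(socket);
    }

    synchronized int openAttempts() {
        return openAttempts;
    }

    private static String key(String host, int port) {
        return host + ":" + port;
    }
}
```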
> > > 
> > > On Thu, 2004-04-08 at 21:02, Alvarez, Gil wrote:
> > > > We recently ported our URL-hitting code from java.net.* to
> > > > HttpClient. We use it in a high-volume environment (20 machines are
> > > > hitting an external third party to retrieve images).
> > > > 
> > > > After the port, we saw a significant increase in the CPU cycles used
> > > > by the machines, about 2-3 times (i.e., the load on the boxes went
> > > > from about 20% of the CPU to about 50%-60%).
> > > > 
> > > > For each request, we instantiate an HttpClient object and a GetMethod
> > > > object, and shut things down afterwards.
> > > > 
> > > > In order to reduce CPU usage, what is the recommended approach?
> > > > 
> > > > Thank you.
> > > > 
> > > 
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: commons-httpclient-dev-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: commons-httpclient-dev-help@jakarta.apache.org