hc-httpclient-users mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject Re: How to limit the response size
Date Mon, 08 Aug 2005 20:54:51 GMT
Tony
While you were away we have fixed a rather nasty bug, which may also
have been the cause of the problems you were having. 

http://issues.apache.org/bugzilla/show_bug.cgi?id=35944

Could you please get the latest SVN snapshot and test your application
against it? I'll look at the logs you have posted if you confirm that
the problem still persists. It is a massive amount of data to go
through, so I would really appreciate it if I did not have to look at
it unnecessarily.

Have you seen anything of this sort in the logs or in the standard
out/standard error?

java.lang.IllegalStateException: Connection is not open
        at org.apache.commons.httpclient.HttpConnection.assertOpen(HttpConnection.java:1269)
        at org.apache.commons.httpclient.HttpConnection.isResponseAvailable(HttpConnection.java:872)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.isResponseAvailable(MultiThreadedHttpConnectionManager.java:1307)
        at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2272)
        at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1755)
        at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:180)
        at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:140)
        at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1086)

Cheers,

Oleg
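
For readers following the thread's original question (capping the response body at a fixed size), the bounded read can be sketched roughly as below. This is only a minimal illustration that uses java.io and nothing from HttpClient, so it runs standalone. The names BoundedBodyReader, Result and readAtMost are hypothetical, and the 100k limit is taken from the figure mentioned in the thread. With HttpClient 3.x the stream would presumably come from method.getResponseBodyAsStream(), and a caller that sees truncated == true would call abort() and then releaseConnection() in a finally block, as in the snippet quoted further down.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedBodyReader {

    /** Result of a capped read: the bytes kept and whether more data was pending. */
    public static final class Result {
        public final byte[] body;
        public final boolean truncated;

        Result(byte[] body, boolean truncated) {
            this.body = body;
            this.truncated = truncated;
        }
    }

    /**
     * Reads at most {@code limit} bytes from {@code in}. If the stream still has
     * data once the limit is reached, {@code truncated} is set so the caller can
     * drop the connection instead of draining it.
     */
    public static Result readAtMost(InputStream in, int limit) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int remaining = limit;
        while (remaining > 0) {
            // A single read() may return fewer bytes than requested, hence the loop.
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n == -1) {
                return new Result(out.toByteArray(), false); // body fit within the limit
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        // Limit reached: probe one more byte to see whether the body continues.
        boolean more = in.read() != -1;
        return new Result(out.toByteArray(), more);
    }

    public static void main(String[] args) throws IOException {
        byte[] big = new byte[300 * 1024]; // stands in for an oversized response
        Result r = readAtMost(new ByteArrayInputStream(big), 100 * 1024);
        System.out.println(r.body.length + " bytes kept, truncated=" + r.truncated);
    }
}
```

The probe read after the limit blocks until the server sends another byte or closes the stream, so a socket timeout is still needed for servers that stall rather than spew.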

On Mon, 2005-08-08 at 16:09 -0400, Tony Spencer wrote:
> Hi Oleg, 
> Sorry for the late reply but I was away on vacation.  I finally
> configured my logging and attempted to use the connection manager and
> yes I did see multiple occurrences of exactly what you mentioned:
> 
> Unable to get a connection, waiting..., hostConfig=HostConfiguration
> 
> I'm sending you the wire and context log privately.  Thank you very
> much for taking a look.
> 
> Tony
> 
> 
> On 7/22/05, Oleg Kalnichevski <olegk@apache.org> wrote:
> > On Fri, 2005-07-22 at 15:13 -0400, Tony Spencer wrote:
> > > Hi Oleg,
> > > > I'm not sure exactly what's going on as I haven't dug through the
> > > > source code enough but I do know that when I try using
> > > > MultiThreadedHttpConnectionManager and calling releaseConnection in
> > > > the finally block as you have done here, my bot threads start hanging
> > > > after a few hundred requests.  I am only hypothesizing that the
> > > > connections are not returning to the pool.
> > >
> > 
> > Tony,
> > 
> > Do you see something like that in the log?
> > 
> > Unable to get a connection, waiting..., hostConfig=...
> > 
> > > Anyways, if you manage to produce a context/wire log of the session, we
> > > may be able to figure out what goes wrong.
> > 
> > Oleg
> > 
> > > On 7/22/05, Oleg Kalnichevski <olegk@apache.org> wrote:
> > > > On Fri, Jul 22, 2005 at 01:07:11PM -0400, Tony Spencer wrote:
> > > > > In case anyone else is using HttpClient for a multi-threaded crawler,
> > > > > here is the solution that seems to solve all the problems in this
> > > > > discussion:
> > > > >
> > > > > > Don't use the MultiThreadedHttpConnectionManager.  You will need to
> > > > > > bail if a response body reaches a limit you define (mine is 100k).
> > > > > The only way to break the connection is to call HttpMethod.abort.
> > > > > Unfortunately this doesn't allow the HttpConnection to be safely
> > > > > returned to the connection manager's pool.
> > > >
> > > > Tony,
> > > >
> > > > Why is that? What is it that prevents the connection from being returned
> > > > back to the pool? I believe HttpMethod#releaseConnection should have no
> > > > problem handling connections that have been closed by HttpMethod#abort
> > > >
> > > > GetMethod httpget = new GetMethod("/stuff");
> > > > try {
> > > >   httpclient.executeMethod(httpget);
> > > >   // do something with the response
> > > >   // and if you get fed up, just call
> > > >   httpget.abort();
> > > > } finally {
> > > >   httpget.releaseConnection();
> > > > }
> > > >
> > > > Oleg
> > > >
> > > >
> > > > > > Instead, I found pretty
> > > > > > good performance by creating a new HttpClient (simple constructor:
> > > > > > new HttpClient()) for each thread and use it for 1,000 requests at
> > > > > > which time I destroy the current and create a new one.  I'm sure this
> > > > > > doesn't perform as well as the multi threaded manager but it ran all
> > > > > > night for me with no exceptions, no memory leaks, and pulled down 2
> > > > > > million sites in about 12 hours (running 100 threads).  Not bad.
> > > > >
> > > > > On 7/21/05, Tony Spencer <tony.spencer@gmail.com> wrote:
> > > > > > Ok, I hope you aren't getting sick of this problem. :)
> > > > > >
> > > > > > > HttpMethod.abort does solve the problem of sites that send an infinite
> > > > > > > response.  However, it seems that by calling abort we cannot properly
> > > > > > > release the connection.  I've tried calling method.releaseConnection
> > > > > > > right after abort.
> > > > > > >
> > > > > > > My usage for HttpClient is a multi-threaded crawler so I've followed
> > > > > > > the suggestions on the threading page
> > > > > > > http://jakarta.apache.org/commons/httpclient/threading.html (nice
> > > > > > > documentation by the way).  So I use the
> > > > > > > MultiThreadedHttpConnectionManager as suggested and reuse the same
> > > > > > > HttpClient over and over as suggested.  After a certain number of
> > > > > > > calls to HttpMethod.abort my HttpClient goes bad (hangs).
> > > > > > >
> > > > > > > So it appears that abort is too harsh and doesn't allow clean return
> > > > > > > of the client to the pool.  Any more suggestions?
> > > > > >
> > > > > > On 7/21/05, Tony Spencer <tony.spencer@gmail.com> wrote:
> > > > > > > > Disregard my last message.  Your suggestion did work, Oleg.  Originally
> > > > > > > > I put the abort statement after attempting to close the input stream.
> > > > > > > > Once I moved it in front of the stream close statement it worked fine.
> > > > > > > > Thank you very much.
> > > > > > >
> > > > > > > > On 7/21/05, Oleg Kalnichevski <olegk@apache.org> wrote:
> > > > > > > > > Just call HttpMethod#abort to close the underlying connection
> > > > > > > >
> > > > > > > > Oleg
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, 2005-07-21 at 16:34 -0400, Tony Spencer wrote:
> > > > > > > > > > Ok, I managed to limit the response to 8k in the following code
> > > > > > > > > > but it doesn't help with what I'm really trying to accomplish.
> > > > > > > > > > Sometimes there is a site that will spew a never-ending response.  This
> > > > > > > > > > causes HttpClient to hang indefinitely.  My code below does not solve
> > > > > > > > > > the problem.  Here is an example of a nasty site that never stops
> > > > > > > > > > sending response: http://www.tfc-charts.w2d.com/chart/dw/w (beware,
> > > > > > > > > > it may crash your browser if you browse it)
> > > > > > > > >
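> > > > > > > > > >                 InputStream is = method.getResponseBodyAsStream();
> > > > > > > > > >                 BufferedInputStream bis = new BufferedInputStream(is);
> > > > > > > > > >                 byte[] bytes = new byte[8192];
> > > > > > > > > >                 // read() may return fewer bytes than requested, or -1 at end of stream
> > > > > > > > > >                 int n = bis.read(bytes);
> > > > > > > > > >                 bis.close(); // also closes the wrapped stream
> > > > > > > > > >                 // decode only the bytes actually read, not the whole buffer
> > > > > > > > > >                 ret = new String(bytes, 0, Math.max(n, 0));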
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On 7/21/05, Tony Spencer <tony.spencer@gmail.com> wrote:
> > > > > > > > > > > I'd like to limit the size of the response but don't know how.  For
> > > > > > > > > > > instance, if the response body is greater than 100k I would like to
> > > > > > > > > > > close the connection to the site.  How can I go about doing this?  I
> > > > > > > > > > > see the available method param: BUFFER_WARN_TRIGGER_LIMIT but it only
> > > > > > > > > > > seems to control warning logging.
> > > > > > > > > > >
> > > > > > > > > > > Currently I receive the response body like so:
> > > > > > > > > > byte[] bytes = method.getResponseBody();
> > > > > > > > > >
> > > > > > > > > > Any help greatly appreciated.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
> > > > > > > > > For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
> > > > > > > > For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
> > > > > For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
> > > > >
> > > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
> > > > For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
> > > >
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
> > >
> > >
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
> > 
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org

