tomcat-dev mailing list archives

From "Bill Barker" <wbar...@wilshire.com>
Subject Re: mod_jk does not detect a hung Tomcat
Date Wed, 24 Sep 2003 19:55:15 GMT

----- Original Message -----
From: "Glenn Nielsen" <glenn@mail.more.net>
To: "Tomcat Developers List" <tomcat-dev@jakarta.apache.org>
Sent: Wednesday, September 24, 2003 12:28 PM
Subject: Re: mod_jk does not detect a hung Tomcat


>
>
> Henri Gomez wrote:
> > David Rees wrote:
> >
> >> Henri Gomez said:
> >>
> >>> Henri Gomez wrote:
> >>>
> >>>>> Nope, since you have to test not just at the protocol level but also
> >>>>> at a higher level, for instance check the full chain, up to servlet
> >>>>> handling.
> >>>>>
> >>>>>
> >>>>>> It's easy to simulate this behavior by sending a STOP signal to
> >>>>>> Tomcat.
> >>>>>>
> >>>>>> I've also attached a log from mod_jk showing the problem.  I marked
> >>>>>> the point at which processing in mod_jk stopped until I sent a CONT
> >>>>>> signal to tomcat.
> >>>>>>
> >>>>>> Does mod_jk2 have this same problem?  Is there any interest in fixing
> >>>>>> this? Does anyone have a workaround for this issue?
> >>>>>
> >>>>>
> >>>>> Well, if you have a hung tomcat, you're probably already in serious
> >>>>> trouble.
> >>>>
> >>
> >>
> >> No, actually in my case I wasn't.  I had two Tomcats running, as one was
> >> prone to locking up due to a JVM or application bug.  With a 50-50 load
> >> distribution between two Tomcats, this left me with 1/2 of the requests
> >> getting stuck and clients waiting forever and tying up Apache
> >> processes. Eventually, a DoS will be the result if action is not taken
> >> in time.  If mod_jk noticed it wasn't really alive, this wouldn't be an
> >> issue at all.
> >>
> >>
> >>>>> Anyway, if we add stuff like a time-out on ajp requests, you could be
> >>>>> stuck with long running servlets. Also jk reads requests in blocking
> >>>>> mode for performance, and adding a timeout here is not an option.
> >>>>
> >>
> >>
> >> Agreed that we wouldn't want a timeout normally to handle normal long
> >> running servlet processes, but if there was a PING/PONG added to the
> >> protocol there should be a timeout to prevent the above situation.
> >>
> >>
> >>>> When I worked on the ajp13++ (ajp14) protocol, I added a more secure auth
> >>>> mechanism at connection time.
> >>>>
> >>>> Since the communication is bidirectional, jk could detect that
> >>>> even if the connection is open, the remote didn't respond and so fall
> >>>> back to the next in the cluster configuration.
> >>>>
> >>>> But on already established connections, the problem persists.
> >>>>
> >>>> Or we should add a PING/PONG before sending any request to tomcat.
> >>>>
> >>>> It could be done as optional, but I'll work on it only if many users make
> >>>> such requirements
> >>>
> >>>
> >>> if many users ask for such feature ;)
> >>
> >>
> >>
> >> Well, you've got one so far.  ;-)  Adding a configurable option to have
> >> mod_jk verify (PING/PONG) that Tomcat is actually responding before using
> >> the connection would solve the problem and I can't imagine that it would
> >> add a lot of complexity to the code as well.  If I wasn't so rusty with
> >> my C programming and had some spare time, I would offer to help code it
> >> up. ;-)  In any case, I'll be more than happy to help test.
> >
> >
> > Well, if you could find more users or at least one tomcat committer
> > (Glenn, Remy, Costin, JFC...) who needs it, I'll add the necessary code
> > in java and C areas ;)
> >
>
>
> There may be a simple way to achieve what David is asking for without
> setting a request timeout or implementing a PING/PONG between mod_jk
> and Tomcat.
>
> What if each worker tracked the number of requests which were handled
> by the worker since the last successful completion of a request?
>
> i.e. add the following to a worker
>
> worker->last_completed  // Time in seconds since last successfully
>                         // completed request
> worker->requests_since_last_completed  // Number of requests sent to worker
>                                        // since last successful completion.
>
> Then logic could be added to try and detect an instance of Tomcat which has
> failed.  Perhaps even allow several additional worker properties to determine
> when mod_jk should consider the worker failed.

This won't work with the prefork MPM, since each Apache child process will
have its own idea of the timing.  The only way that it could tell that a
Tomcat failed is to try the request and fail :).
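
For illustration, the bookkeeping Glenn describes might look roughly like the
sketch below.  The struct and function names (worker_health, health_limits,
health_on_send, and so on) and both thresholds are made up for this example
and are not mod_jk code.  Note that under prefork every child would hold its
own private copy of this record, which is exactly the limitation above.

```c
#include <time.h>

/* Hypothetical per-worker record; one copy per Apache child process. */
typedef struct {
    time_t   last_completed;                /* time of last successful request */
    unsigned requests_since_last_completed; /* requests sent since that time   */
} worker_health;

/* Hypothetical thresholds, e.g. read from extra worker properties. */
typedef struct {
    unsigned max_pending;   /* requests tolerated without any completion */
    double   max_idle_secs; /* seconds tolerated without any completion  */
} health_limits;

/* Start with a clean record. */
static void health_init(worker_health *h, time_t now)
{
    h->last_completed = now;
    h->requests_since_last_completed = 0;
}

/* Call just before a request is forwarded to the worker. */
static void health_on_send(worker_health *h)
{
    h->requests_since_last_completed++;
}

/* Call when the worker finishes a request successfully. */
static void health_on_success(worker_health *h, time_t now)
{
    h->last_completed = now;
    h->requests_since_last_completed = 0;
}

/* Returns 1 if the worker looks failed: enough requests have been sent
 * AND too much time has passed, both without a single completion. */
static int health_looks_failed(const worker_health *h,
                               const health_limits *lim, time_t now)
{
    if (h->requests_since_last_completed < lim->max_pending)
        return 0;
    return difftime(now, h->last_completed) > lim->max_idle_secs;
}
```

Requiring both conditions is one way to avoid flagging a lightly used worker
or a single long running servlet; it does nothing for the per-child problem.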

>
> The idea needs to be fleshed out some more. But we should be able to track
> enough data about how a worker is performing to make some simple decisions.
>
> Glenn
>
> ----------------------------------------------------------------------
> Glenn Nielsen             glenn@more.net | /* Spelin donut madder    |
> MOREnet System Programming               |  * if iz ina coment.      |
> Missouri Research and Education Network  |  */                       |
> ----------------------------------------------------------------------

