httpd-dev mailing list archives

From Justin Erenkrantz <jerenkra...@ebuilt.com>
Subject Re: [PATCH] Problems with MPM threaded
Date Sat, 14 Jul 2001 06:48:55 GMT
On Sat, Jul 14, 2001 at 12:42:29AM -0400, GUMMALAM,MOHAN (HP-Cupertino,ex2) wrote:
> I propose the following patch [PATCH A]: It will partially fix the unwanted
> child deaths problem (symptoms mentioned in the mails included below).  It
> fixes the problem by making sure that perform_idle_server_maintenance() does
> not count the threads of the process that received the POD, in the calculation
> of idle_thread_count.  To do that I have used
> ap_scoreboard_image->parent[process_slot]->process_status field.  I am
> temporarily using an already defined value, SB_IDLE_DIE.  If the general
> idea is acceptable, I can work on solidifying the details.  PATCH A is
> attached below.

Have you taken a look at the patch I posted that merges the POD code in
threaded with the version in mpm_common.c?  Threaded shouldn't be doing
POD checks in threaded.c.  It's redundant and it's done incorrectly
anyway in the threaded MPM.  

Admittedly, this doesn't fix the issue of having a child who received a
POD kill its sibling threads.  More on that in a sec.

> However this patch exposes another problem in the code - by this new fix,
> although the untargeted children do not get a POD, the targeted child
> process does not die immediately either.  Here is why that happens:  In the
> worker_thread() routine in threaded.c, at the instant when worker 1.0
> (represented in the <process_slot>.<thread_slot> format, i.e, 1 is the
> process_slot and 0 is the thread_slot) gets the POD, the remaining threads
> of process 1 are all waiting at the apr_lock_acquire(accept_mutex).  If the
> web-server is really idle, the chances are slim that all the remaining
> worker threads for process 1 would acquire the lock in a very short time.
> As an effect, the remaining worker threads of process 1 do not die
> immediately.  To resolve this:
> 
> SOLUTION 1:  I plan to temporarily implement a new
> apr_lock_acquire_timeout() function, which would cause the threads waiting
> on the mutex to give up after some time.  The piece of code would look
> something like this:
> 
> while ((rv = SAFE_ACCEPT(apr_lock_acquire_timeout(accept_mutex, timevalue)))
> 	!= APR_SUCCESS) {
> 	if (check_if_timer_popped) {   /* timed out waiting for the lock */
> 		if (workers_may_exit)
> 			break;
> 	}
> 	else {                         /* apr_lock_acquire failed */
> 		ap_log_error(....);
> 		workers_may_exit = 1;
> 	}
> }
> 
> I know that this would cause some performance impact, but my guess is that
> it would not be a lot, especially if we keep the timevalue reasonably high.
> However, in order to get the functionality of
> perform_idle_server_maintenance() working right, we _will_ have to implement
> the above solution (or maybe something similar)!

-1 (my veto only matters in APR-land).  This is ugly.  How are you going 
to implement this?  The problem is that we need to wake up the threads 
that are blocked waiting on the mutex.  But, that is almost impossible to
implement portably AFAIK.  Please enlighten me if you know of a way to do
this.  You can't do a "trylock" (assuming you can do so with all variants)
and then sleep for the timeout value.  That doesn't work either.  (All of
the threads could be asleep after a failed trylock on a held lock when
someone comes along?  Oops.)

The only alternative that I can think of is pthread_cancel(), BUT the
manpage (at least on Solaris) says (man cancellation):

     A mutex is explicitly  not a cancellation point  and  should
     be held for only the minimal essential time.

pthread_cond_wait() and fcntl() are viable options according to the
manpage.  But, I'd much prefer us to use pthread_mutex_t if at all
possible for cross-process locking.  (Yeah, yeah, we require Solaris 8+ 
to do pthread_mutex_t for a cross-process lock now for robust locking.
That's what you get for having child processes - see below...)

> SOLUTION 2: One could use a slightly more _involved_ approach, where the
> dying thread could send a signal to its sibling threads, each of which will
> then handle that signal with a graceful exit.

How?  A thread doesn't have a process id associated with it.  And, I'm
curious as to whether the signal would somehow kick us out of the acquire
function.  Doubtful.  I may be wrong (probably am).  Please correct me.
We need to be able to tell the threads they need to exit, but you can't
force them to just exit (which is why pthread_cancel wouldn't work).
They need to exit on their own - otherwise, they won't cleanup.

> SOLUTION 3: We could turn our heads the other way by not worrying about this
> situation at all, since these threads would eventually die (when more
> requests are handled by the webserver).  However, by ignoring the problem,
> the purpose of perform_idle_server_maintenance() would be lost!!

And, based on my interpretation and analysis of the threaded MPM and its
intentions, we shouldn't even need to worry about this case.  Let me 
explain: 

The problem you are stating is that the POD is received by one thread in
one child process, it marks workers_may_exit and then quits.  The accept
mutex is then shifted over to another child process which doesn't have
workers_may_exit set and goes on its merry way.  The only time we check
for workers_may_exit is after the mutex has been acquired.  Therefore,
it can take a while to have all of the threads in that POD-receiving
process to see workers_may_exit.

The threaded MPM is designed (or rather should be - the implementation
doesn't suggest this) to only have ONE child process that does the
actual work.  Think about it.  A thread is by its very nature (unless 
we have some thread-poor OSes - but they shouldn't be using threads in 
the first place) designed to be isolated and cheap (cheaper than a child
process on many OSes, but not others).  There is no gain (IMHO) to
having it fork to create other servers.  The only reason is to protect
us from segfaults or other bad things.  But, as my recent foray into 
robust locks shows, you can have one child process doing threading
segfault and we'll recover fairly quickly.  (We'll detect the missing
child in the parent process when we do idle_server_maint and recreate 
it.)  Ideally, you'll see the core of the segfault and submit us a bug 
report and we'll fix it.  =)  (If only I could reproduce the segfaults 
I was receiving last night!)

Upon further review, having the StartServers directive seems bogus in a 
pure threaded MPM.  We're going crazy by doing the forking to create 
multiple child processes in a *threaded* MPM.  If we're a threaded MPM,
then we need to only concern ourselves with threads not child processes.
I believe that we shouldn't be doing forking in a threaded MPM.

So, what does this mean?  Well, if we only have one child process
(admittedly the implementation allows for more, but I think that needs
to go away), then when one worker receives the POD line, all of the
other threads are going to go away very quickly.  Why?  Because the
accept_mutex is now going to be released.  Each of the threads will
acquire the mutex in turn and then check the value of workers_may_exit.
It'll be one, so they will ignore the accept and then exit voluntarily.
Since there should only be one child process, the mutex is going to go 
to another thread in the same process - and it'll have workers_may_exit
set to 1.  So forth and so on.
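
To make that concrete, here is a rough sketch of the single-process case
using plain pthreads.  It is not the threaded.c code: drain_workers() and
exited_after_flag are hypothetical names, and the main thread stands in
for the worker that read the POD while holding the accept mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 64

static pthread_mutex_t accept_mutex = PTHREAD_MUTEX_INITIALIZER;
static int workers_may_exit;     /* read and written with accept_mutex held */
static int exited_after_flag;    /* hypothetical counter, same protection */

static void *worker_thread(void *arg)
{
    int must_exit = 0;

    (void)arg;
    while (!must_exit) {
        /* Every worker blocks here until it is its turn. */
        pthread_mutex_lock(&accept_mutex);
        /* The flag is only looked at once the mutex has been acquired. */
        must_exit = workers_may_exit;
        if (must_exit)
            exited_after_flag++;
        /* else: a real worker would accept() and serve a connection */
        pthread_mutex_unlock(&accept_mutex);
    }
    return NULL;                 /* voluntary exit, so cleanup can run */
}

/* Start nworkers threads that all wait on the accept mutex, then act
 * as the POD-receiving worker: set the flag and release the mutex.
 * Returns how many workers saw the flag and exited on their own. */
int drain_workers(int nworkers)
{
    pthread_t tids[MAX_WORKERS];
    int i, rc;

    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    workers_may_exit = 0;
    exited_after_flag = 0;

    pthread_mutex_lock(&accept_mutex);
    for (i = 0; i < nworkers; i++) {
        rc = pthread_create(&tids[i], NULL, worker_thread, NULL);
        assert(rc == 0);
    }
    workers_may_exit = 1;        /* "the POD arrived" */
    pthread_mutex_unlock(&accept_mutex);

    for (i = 0; i < nworkers; i++)
        pthread_join(tids[i], NULL);
    return exited_after_flag;
}
```

With only one process contending for the mutex, every handoff goes to a
sibling thread that sees the flag, so drain_workers(8) returns 8 as soon
as the lock has been passed around once.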

So, what does the POD now gain for us?  The ability to shut down the
child process full of worker threads cleanly.  And, I think we can do
that other ways besides POD, but I'd have to play with the code more.
I'm not blaming the POD for this - there are much bigger concerns.
The POD makes sense for prefork, but not really for a single-process,
multi-threaded MPM.

I think the threaded MPM suffers a bit from its prefork origins.  I'd
like to take a stab at reworking it to make more sense in a pure
threaded environment.  I realize what I'm volunteering myself for.
Since part of my OS focus is on Solaris (FreeBSD doesn't support threads
and Linux treats threads almost identically to processes), I realize
what a proper threading MPM will do to Solaris and its performance
(it'll do almost nothing to the other platforms).  That's what I care 
about.  =)  Not to mention the code is *very* confusing (decipherable, 
but not easy to understand - Roy rolls his eyes when I discuss the 
current threaded MPM).

And, since we have MPM, I can do this separately without affecting the
current threaded MPM code, but I wouldn't call what we have now a pure
threaded MPM.

The big thing is to kill the StartServers directive and remove the 
concept of processes in the threaded MPM.  I'm not totally sold on 
the stability of the current threaded MPM code.  I think there is 
lots of room for improvement and simplification (it took me two 
whiteboards to diagram the current threaded MPM and how it functions!).
If it doesn't fit on a napkin, it's too complicated...

Thoughts?  How far off base am I?  -- justin

