httpd-dev mailing list archives

From Ben Laurie <...@algroup.co.uk>
Subject Re: Apache 2.0 brokenness...
Date Sat, 22 Jan 2000 23:24:03 GMT
"Ralf S. Engelschall" wrote:
> 
> In article <3889D03A.BC318F98@algroup.co.uk> you wrote:
> 
> >> > If I run a single instance of the threaded version (by setting
> >> > MaxClients to 1) it works. If I run multiple instances and debug the
> >> > connection, it works. If I don't, I get connected, but the browser
> >> > hangs. I'm not sure how to get a handle on debugging this! Any ideas,
> >> > anyone?
> >> >
> >> > Platform is FreeBSD 3.2.
> >>
> >> Does the problem go away if you use -DNO_SERIALIZED_ACCEPT, Ben?
> >
> > Hmmm, yes it does.
> >
> >> If
> >> yes, then it's the mutex deadlock problem I mentioned a few months
> >> ago, which occurs with all user space threading environments (e.g.
> >> FreeBSD uthreads) because Apache 2.0 still uses flock/fcntl for the
> >> inter-process accept mutex which usually does work only in kernel space
> >> threading environments (e.g. LinuxThreads). If no, then it's some new
> >> problem.
> >
> > I must've missed that - would you mind explaining again what the problem
> > is?
> 
> Here it comes: The problem is that the threaded MPMs were originally
> developed under Linux where a thread is implemented in kernel space.
> Apache for various good reasons uses a mutex around accept() calls.
> There are various variants how this mutex is implemented: flock,
> fcntl, pthread_mutex, etc. The mutex has to be an inter-process mutex
> per intention. For non-threaded MPMs this is no problem, because
> there fcntl() or flock() is fine to call: they block the current
> process. And in a threaded MPM with kernel space based threads (e.g.
> LinuxThreads!) it still works (although it violates POSIX), there the
> fcntl()/flock() calls block the current thread only (because it _IS_
> actually a process). But now imagine what happens under any user space
> (e.g. FreeBSD uthread or GNU Pth!) threading system: there they block
> not only the thread, they block the whole process and all of its threads
> while it actually should only block a single thread. Bang! Some sort of
> a deadlock occurs which can be freed again only by the next incoming
> HTTP connection.
> 
> In short it runs this way under a user space threading environment:
> 
> 1. say we have 2 children: c1, c2
>    say each child has only one initial thread: t1a, t2a
> 2. t1a and t2a enter the accept mutex, as a result
>    c1 and c2 are both blocked until a HTTP connection arrives.
> 3. say c1 now gets a connection. the kernel awakes
>    c1, and c1 spawns a request thread t1b which immediately starts
>    running. It enters the I/O part and reads from the socket.
> 4. now say the sockets block a little bit and this
>    way the user space scheduler switches in c1 from t1b to t1a. t1a
>    again enters the accept mutex loop.
> 5. now because the mutex is a process-mutex, the
>    thread t1a blocks the whole child c1 and this way also t1b. BINGO!
>    the request processing in t1b hangs until child c1 again gets another
>    request and the user space scheduler again switches to t1b.
> 
> Is the problem now more clear, Ben?

It is, but I didn't find it immediately clear what you meant (partly
because your step 2 implies that both children are blocked on the mutex,
which isn't true), so here's an alternate description:

The two children c1 and c2 start in this state:

c1: blocked on the mutex (and hence not scheduling threads)
c2: blocked on accept

A connection comes in, which c2 accepts. c2 then releases the accept
mutex and dispatches a second thread to handle the request. Its
accepting thread then tries to acquire the mutex again. By this point c1
has unblocked on the mutex because c2 released it, so c1 now holds it
and c2 blocks, as a whole process. Of course this also stalls the thread
handling the connection, so nothing is returned to the browser until the
next incoming connection. That connection is accepted by c1, which then
releases the mutex, so c2 unblocks and its connection-handling thread
runs again (for a while at least).
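
For concreteness, the loop each child runs looks roughly like the
following. This is a Python sketch rather than Apache's C, and
serve_one, lock_file and handle are names made up for illustration.
Python threads are kernel threads on current platforms, so this will
not reproduce the hang; it only shows where the two blocking calls sit
relative to the dispatch of the handler thread.

```python
import fcntl
import threading


def serve_one(listener, lock_file, handle):
    """One pass of a serialized accept loop (illustrative sketch only).

    listener  -- a listening socket shared by all children
    lock_file -- a writable file shared by all children, used as the mutex
    handle    -- hypothetical callback that services one connection
    """
    # Blocks while another child holds the accept mutex. With user space
    # threads, this is the call that freezes every thread in the child.
    fcntl.lockf(lock_file, fcntl.LOCK_EX)
    try:
        # Blocks until a connection arrives.
        conn, _addr = listener.accept()
    finally:
        # Release the mutex so that another child can enter accept().
        fcntl.lockf(lock_file, fcntl.LOCK_UN)

    # Hand the connection to a second thread; the caller then loops
    # straight back into the lock call above.
    worker = threading.Thread(target=handle, args=(conn,))
    worker.start()
    return worker
```

The handler thread is started only after the mutex is released, so the
very next thing the accepting thread does is block on the lock again,
which is the step at which c2 stalls in the description above.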

For doubting Thomases, here's the relevant manual snippet:

     In the threaded library, the fcntl syscall is assembled to
     _thread_sys_fcntl() and fcntl() is implemented as a function which
     disables thread rescheduling, locks fd for read and write, then
     calls _thread_sys_fcntl().  Before returning, fcntl() unlocks fd
     and enables thread rescheduling.

Of course, this is a disaster and means that fcntl() simply cannot be
used as a mutex in a threaded system (at least not this kind). As you
say, flock() also behaves in the same way, and therefore also cannot be
used.

The interesting question is ... is there a way to test for systems that
behave in this way? And, what do we use instead?
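
One way to probe for this might be: have another process hold an fcntl
lock, block one thread on it, and see whether a sibling thread keeps
running in the meantime. A rough Python sketch of the idea follows. The
function name, the 0.5 second hold and the tick threshold are arbitrary
choices of this sketch, and a real build-time test for Apache would
have to be written in C against the platform's own thread library.

```python
import fcntl
import os
import subprocess
import sys
import tempfile
import threading
import time

# Helper process: takes the lock, reports it, holds it for a while.
HOLDER = """
import fcntl, sys, time
with open(sys.argv[1], "r+") as f:
    fcntl.lockf(f, fcntl.LOCK_EX)
    print("locked", flush=True)
    time.sleep(float(sys.argv[2]))
"""

MIN_TICKS = 5  # arbitrary threshold: fewer ticks means the sibling stalled


def blocking_lock_stalls_siblings(hold_seconds=0.5):
    """Return True if a thread blocked in fcntl locking stalls its siblings."""
    fd, path = tempfile.mkstemp()
    holder = subprocess.Popen(
        [sys.executable, "-c", HOLDER, path, str(hold_seconds)],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        if holder.stdout.readline().strip() != "locked":
            raise RuntimeError("helper process failed to take the lock")

        def locker():
            # Blocks until the helper process exits and drops its lock.
            fcntl.lockf(fd, fcntl.LOCK_EX)
            fcntl.lockf(fd, fcntl.LOCK_UN)

        blocked = threading.Thread(target=locker)
        blocked.start()

        # Count how often this thread gets to run while the other waits.
        ticks = 0
        while blocked.is_alive():
            time.sleep(0.01)
            ticks += 1
        return ticks < MIN_TICKS
    finally:
        holder.wait()
        holder.stdout.close()
        os.close(fd)
        os.unlink(path)
```

On a system where threads are scheduled by the kernel this should
return False, since the counting thread keeps running while the other
one waits for the lock.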

Cheers,

Ben.

--
SECURE HOSTING AT THE BUNKER! http://www.thebunker.net/hosting.htm

http://www.apache-ssl.org/ben.html

"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
     - Indira Gandhi
