httpd-dev mailing list archives

From Aaron Bannert <aa...@clove.org>
Subject Re: cvs commit: httpd-2.0 STATUS
Date Thu, 03 Jan 2002 14:03:00 GMT
On Thu, Jan 03, 2002 at 09:53:38AM -0000, jerenkrantz@apache.org wrote:
> jerenkrantz    02/01/03 01:53:38
> 
>   Modified:    .        STATUS
...
>   @@ -149,6 +149,18 @@
>                     hang. My theory is that this has to do with the
>                     pthread_cond_*() implementation in FreeBSD, but it's still
>                     possible that it is in APR.
>   +    Justin adds: Oh, FreeBSD threads are implemented entirely with 
>   +                 select()/poll()/longjmp().  Welcome to the nightmare.
>   +                 So, that means a ktrace output also has the thread 
>   +                 scheduling internals in it (since it is all the same to 
>   +                 the kernel).  Which makes it hard to distinguish between 
>   +                 our select() calls and their select() calls.  
>   +                 *bangs head on wall repeatedly*  But, some of the libc_r 
>   +                 files have a DBG_MSG #define.  This is moderately helpful
>   +                 when used with -DNO_DETACH.  The kernel scheduler isn't 
>   +                 waking up the threads on a select().  Yum.  And, I bet 
>   +                 those decrementing select calls have to do with the 
>   +                 scheduler.  Time to brush up on our OS fundamentals.

Good theory, but in my trace we were only looking at the PID of the parent
process, where there aren't any threads (technically, there is exactly one
thread). It is almost certainly a bug somewhere, since consuming unbounded
CPU while performing a non-CPU-intensive task is unexpected behavior. In our
case the parent runs waitpid() followed by select() with a timeout of
1 second (to emulate a sleep()).
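
For anyone who hasn't looked at that loop, here is a rough sketch of the
pattern, not the actual httpd code. The helper names sleep_via_select() and
reap_then_pause() are made up for illustration, and the use of WNOHANG is an
assumption on my part:

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: pause for about `seconds` by calling select()
 * with no descriptor sets, so only the timeout can end the call. */
static int sleep_via_select(long seconds)
{
    struct timeval tv;

    /* Re-initialize the timeout on every call: POSIX allows select()
     * to modify it, and some systems (Linux) do. */
    tv.tv_sec = seconds;
    tv.tv_usec = 0;

    /* Returns 0 on timeout, -1 on error (for example EINTR). */
    return select(0, NULL, NULL, NULL, &tv);
}

/* Hypothetical helper: one pass of a parent-style maintenance loop.
 * Reaps any children that have already exited (assumed non-blocking
 * via WNOHANG), then pauses. Returns the number of children reaped. */
static int reap_then_pause(long seconds)
{
    int status;
    int reaped = 0;

    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;

    /* waitpid() returned 0 (children still running) or -1 (typically
     * ECHILD, no children at all); either way, go to sleep. */
    sleep_via_select(seconds);
    return reaped;
}
```

A loop built on this should sit idle for the whole timeout on each pass. If
the select() returns early every time, the loop spins, which would look a
lot like the unbounded CPU use described above.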

The select()-based threading model just means that it's entirely in userspace
and that the threads are not preemptive. It also means that the "request
gets stuck until another one comes along and dislodges it" bug you were
seeing is going to happen on ANY platform with non-preemptive threads,
like:

Netware
FreeBSD
Cygwin (I'm guessing, since he saw the same bug)
Anyone using GNU Pth
Anyone using any other full-userspace non-preemptive thread library.

My guess is that somewhere in worker we're making a blocking call that
never posts an event the select()-based scheduler could use to trigger a
context switch.

apr_thread_yield() anyone?
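
To make the idea concrete, here is a minimal sketch of an explicit
scheduling point. It uses POSIX sched_yield() as a stand-in so that it does
not depend on APR; the names ready, setter() and wait_with_yield() are
invented for the example and do not come from worker:

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Flag set by the second thread; hypothetical stand-in for whatever
 * condition a stuck worker thread is waiting on. */
static atomic_int ready = 0;

static void *setter(void *arg)
{
    (void)arg;
    atomic_store(&ready, 1);
    return NULL;
}

/* Wait for `ready`, giving up the CPU on every iteration. Under a
 * purely cooperative scheduler, a wait loop with no such scheduling
 * point may never let setter() run. Returns the number of yields,
 * or -1 if the thread could not be created. */
static int wait_with_yield(void)
{
    pthread_t t;
    int spins = 0;

    atomic_store(&ready, 0);
    if (pthread_create(&t, NULL, setter, NULL) != 0)
        return -1;

    while (!atomic_load(&ready)) {
        sched_yield();   /* explicit scheduling point */
        spins++;
    }

    pthread_join(t, NULL);
    return spins;
}
```

On a preemptive implementation the yield is mostly a courtesy; on a
non-preemptive one it may be the only chance other threads get to run.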

-aaron
