apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Gooderum <m...@verniernetworks.com>
Subject Re: PThread cond_wait bug w/nested mutex
Date Wed, 10 Dec 2003 04:25:19 GMT

I sent an Email a while back asking about a problem with the nested 
mutexes and cond vars and got back a reply but I've now stumbled accross 
a case that bites unnested mutexes.

If the caller of apr_cond_notify() (or anyone else for that matter) 
explicitly locks the mutex while the caller to await is asleep the APR 
mutex is trashed because the apr owner field has been trashed.

This is legal in pthreads (and in fact using nested mutexes is) and is a 
common idiom to wakeup a waiting process but guarentee it doesn't run 
until some later point when you're ready.

A simple case that gets you lossage is:

    "signaler"

    apr_thread_mutex_lock(mymutex)
    apr_thread_cond_signal(mymutex)
    /* Do other stuff you want to happen before client runs */
    apr_thread_mutex_unlock(mymutex)

    "waiter"

    apr_thread_mutex_lock(mymutex)
    /* Do stuff that has to happen safely before I sleep but after I get
    the mutex */
    apr_thread_cond_wait(mycond, mymutex)
    apr_thread_mutex_unlock() <== Fails because mutex->owner is NULL

This sort of thing is most interesting where it's a nested mutex inside 
library code that may or may not be called with the mutex held - you can 
have a utility function that wakes up the another thread but maybe the 
caller also held the mutex to do other bits under its protection.

The key issue is that pthread_cond_signal/pthread_cond_await/timedwait 
guarentee that the thread atomically sleeps and unlocks the mutex and 
then atomically wakes up and reacquires the mutex in the same state it 
left it.  APR doesn't do this - the simple call through to the 
pthread_cond_xx functions breaks the atomicity/integrity of the 
mutex/cond_var pair and requires some massaging to emulate.

The general issue is that when the waiter is "asleep" the state of the 
APR 1/2 of the mutex indicates that the mutex is held by the awaiting 
thread when in fact the mutex is not held by anyone.  This is general 
issue where you share a mutex between a condition variable and 
protecting "other" external bits.
--
Mark

> I do not believe nested mutexes and condition variables are
> compatible.
>
> Do you really need nested mutexes?
>
> -aaron
>
>
> On Thursday, August 21, 2003, at 02:16  PM, Mark Gooderum wrote:
>
>> I've discovered what I think it a bug in the pthreads based 
>> cond_wait() code when used with nested mutexes.
>

------------------------------------------------------------------------

I've discovered what I think it a bug in the pthreads based cond_wait() 
code when used with nested mutexes.

The "proper" (per my experience) way to use a condition variable in the 
pthreads world is a squence like:

Waiter            Notifier
----------        -------------
mutex_lock
cond_wait
(pthreads atomically surrenders lock and waits)
                   mutex_lock
                   cond_signal
                   mutex_unlock
(pthread atomically requires the mutex and wakes up)
mutex_unlock

But if the waiter code is for instance in a library and is called with 
the nested mutex already locked the lock/cond_wait/unlock cycle always 
clears the mutex.  So a cycle like:

Waiter            Notifier
----------        -------------
mutex_lock
mutex_lock
cond_wait
(pthreads atomically surrenders lock and waits)
                   mutex_lock
                   cond_signal
                   mutex_unlock
(pthread atomically requires the mutex and wakes up)
mutex_unlock
mutex_unlock

is broken - Details follow - what happens is the mutex_lock/unlock cycle 
in the notifier trashes the APR portions of the mutex.

    APR_DECLARE(apr_status_t) apr_thread_cond_wait(apr_thread_cond_t *cond,
                                                   apr_thread_mutex_t
    *mutex)
    {
        apr_status_t rv;

        rv = pthread_cond_wait(cond->cond, &mutex->mutex);
    #ifdef PTHREAD_SETS_ERRNO
        if (rv) {
            rv = errno;
        }
    #endif
    }

The owner and owner_ref fields of the APR mutex structure are left 
alone.  The caller does a mutex_lock which works and destroys the 
owner/owner_ref fields:

    APR_DECLARE(apr_status_t) apr_thread_mutex_lock(apr_thread_mutex_t
    *mutex)
    {
        apr_status_t rv;

        if (mutex->nested) {
    ---->This doesn't hit because owner is the other waiting thread:
            if (apr_os_thread_equal(mutex->owner,
    apr_os_thread_current())) {  
                mutex->owner_ref++;
                return APR_SUCCESS;
            }
    --->So we succeed in "stealing" the mutex
            rv = pthread_mutex_lock(&mutex->mutex);
            if (rv) {
    #ifdef PTHREAD_SETS_ERRNO
                rv = errno;
    #endif
                return rv;
            }

            mutex->owner = apr_os_thread_current();
            mutex->owner_ref = 1;
            return rv;
        }
        else {
            rv = pthread_mutex_lock(&mutex->mutex);
    #ifdef PTHREAD_SETS_ERRNO
            if (rv) {
                rv = errno;
            }
    #endif
            return rv;
        }
    }

When the notifier than calls mutex_unlock() the pthread mutex is 
released and the mutex owner/owner_ref fields zeroed out.

So now the waiter thread gets the pthread mutex inside 
pthread_cond_wait() and wakes up.  The call returns to the caller.  The 
caller then trys to release the mutex - this call succeeds because we 
always release the mutex.  But the mutex is now fully released  - so if 
the caller is counting on holding the mutex with nesting he's hosed.

    APR_DECLARE(apr_status_t) apr_thread_mutex_unlock(apr_thread_mutex_t
    *mutex)
    {
        apr_status_t status;

        if (mutex->nested) {
            ---->This fails so it just releases the pthread_mutex
    (incorrectly if my original nest count was > 1)
            if (apr_os_thread_equal(mutex->owner,
    apr_os_thread_current())) {
                mutex->owner_ref--;
                if (mutex->owner_ref > 0)
                    return APR_SUCCESS;
            }
            status = pthread_mutex_unlock(&mutex->mutex);
            if (status) {
    #ifdef PTHREAD_SETS_ERRNO
                status = errno;
    #endif
                return status;
            }

            memset(&mutex->owner, 0, sizeof mutex->owner);
            mutex->owner_ref = 0;
            return status;
        }
        else {
            status = pthread_mutex_unlock(&mutex->mutex);
    #ifdef PTHREAD_SETS_ERRNO
            if (status) {
                status = errno;
            }
    #endif
            return status;
        }
    }

The fix seems to be to save and restore the owner/owner_ref fields 
around the pthread_cond_wait() call.  The same thing applies to the 
timed version of the wait but I used the shorter version here for 
clarity.  P.S.. - I'm not subscribed to the dev list so please CC me on 
replies, thanks.
--
Mark Gooderum
Vernier Networks, Inc.
mark@verniernetworks.com






 


Mime
View raw message