apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "William A. Rowe, Jr." <wr...@apache.org>
Subject Re: [PATCH] Thread Locks and SMP boxes
Date Wed, 30 Jul 2003 23:41:34 GMT
At 06:07 PM 7/30/2003, Joe Orton wrote:
>On Wed, Jul 30, 2003 at 11:21:21AM -0500, William Rowe wrote:
>> 
>> The existing unix/thread_mutex.c code is neither thread-safe
>> nor reentrant when using the 'nested' (default) thread mutexes.
>
>Why do you say "nested (default)"? The pools code is the only place I
>can find which actually uses nested mutexes. By default they are not
>nested.

You are correct - it's a platform choice, and Win32 chooses the nested
as a default because critical sections are much faster.  I made 
a misassumption on this point.


>> The bug is most often expressed on SMP boxes, where they
>> are so massively parallel that the subtle flaws of modifying the
>> mutex object AFTER we have released the lock become very
>> apparent.
>>
>> Examples include a number of error (45) Deadlock Avoided
>> alerts using Apache2/mod_ssl on Solaris, where there are both
>> internal thread locks and global locks which contain a nested
>> thread lock.
>
>This code looks like it is making lots of dubious assumptions:
>
>- pthread_t can be a structure, that's why the memset is used I presume,
>assinging 0 to a pthread_t definitely wins no prizes

We are assigning the pthread_t to the result of apr_os_thread_current()
so I presumed it's intregal enough to accept a 0/NULL reset.  memset
is obviously less atomic than such a reset, using many more cycles.

>- who says a memset-to-zero'ed pthread_t isn't a valid pthread_t?  No
>reason why it couldn't be.  If it is *not* a valid pthread_t, then
>passing it to pthread_equal gives undefined behaviour anyway!

This point *does* scare me, and it suggests some sort of requirement
for an APR_THREAD_ID_NONE that's guarenteed to be not-a-thread.

>- let's hope and pray that pthread_equal() is an atomic operation

Good point, and point taken.

>> Many more eyes on this code would be welcomed, I expect to
>> discover that we repeat this mistake elsewhere in the mutex
>> code...
>
>Like Aaron I'm not sure why the comments about reentrancy are relevant:
>you can't use pthread mutexes (and hence apr mutexes :) from a signal
>handler, anyone doing that is SOL.

So we should spell that out as a prerequisite.

>You've made lots of changes in the patch:
>
>1. changing some incr++ to ++incr which makes no differences

The result is add-and-return, as opposed to evalute and increment.
The result will be optimized away since it's unused as an rvalue,
but there always is a difference.

Evaluting some bytecode might be amusing.

>2. re-ordered the mutex_unlock code so that the ownership changes occur
>inside the critical section.

This is the *critical* point of the patch, and one that I would commit first
and prior to any other "legibility" or other insignificant aspects of the patch.

>3. added some "protection" against races
>
>1 and 3 look bogus; 2 looks like the important change here.  The only
>effect of your (3) changes would seem to be to hide races if there still
>are any?

No, the next step in 3 is to actually provide a mechanism for throwing
an exception when those cases are encountered.

>(2) does introduce a semantic change in that the caller cannot rely on
>pthread_mutex_unlock() having defined behaviour if called from a thread
>which does not own the mutex lock, but I expect that's no big deal.

If we are crossing ownership due to this or other races or reentrancy,
this is a significant change, but it should probably become a fatal exception.

>It's worth noting that most modern POSIX implementations offer
>"recursive" mutexes in which pthread_mutex_lock/unlock do all this work
>internally.

Which should be detected with autoconf.  As I invited, more eyes would
be much appreciated.

Bill


Mime
View raw message