Return-Path: Delivered-To: apmail-apr-dev-archive@apr.apache.org Received: (qmail 59425 invoked by uid 500); 16 Nov 2002 23:36:16 -0000 Mailing-List: contact dev-help@apr.apache.org; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Delivered-To: mailing list dev@apr.apache.org Received: (qmail 59404 invoked from network); 16 Nov 2002 23:36:15 -0000 To: dev@apr.apache.org Subject: apr_proc_mutex is broken From: Philip Martin Date: 16 Nov 2002 23:36:20 +0000 Message-ID: <87smy1jde3.fsf@codematters.co.uk> Lines: 65 User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Common Lisp) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: Philip Martin X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N Hello The following issue was raised in the Subversion issue tracker http://subversion.tigris.org/issues/show_bug.cgi?id=996 "While tracking down Bug 995, I've found out that apr's testglobalmutex and testprocmutex are failing for me when running in SMP mode on a Dual P3. The same tests do not fail when running in Single Processor mode or running on my Dual Athlon in SMP mode. I am not certain if this related to Bug 995, but it does appear that httpd is going into a deadlock and the subversion cmdline client eventually times out. I've also noticed that apache leaks semaphores when this happens (from ipcs)." I'm not really sure why the submitter chose to report it to Subversion, and not (as far as I can tell) to APR, but I took a look at the proc_mutex code and there does appear to be a bug. The documentation for apr_proc_mutex_create states that a proc mutex is intended to synchronize processes. Consider APR_DECLARE(apr_status_t) apr_proc_mutex_lock(apr_proc_mutex_t *mutex) { apr_status_t rv; #if APR_HAS_THREADS if (apr_os_thread_equal(mutex->owner, apr_os_thread_current())) { mutex->owner_ref++; return APR_SUCCESS; } #endif if ((rv = mutex->meth->acquire(mutex)) != APR_SUCCESS) { return rv; } #if APR_HAS_THREADS mutex->owner = apr_os_thread_current(); mutex->owner_ref = 1; #endif return APR_SUCCESS; } The first problem is the line if (apr_os_thread_equal(mutex->owner, apr_os_thread_current())) { where there is access to the shared data mutex->owner without any sort of synchronization. Now mutex->owner may not be an atomic type, in which case a totally bogus value could be obtained, and if it is an atomic type the unsynchronized access is still pointless. However the real problem is that this is supposed to be a *process* lock, and yet it is comparing *thread* IDs. Thread IDs are distinct within a process, but there is no guarantee that they are distinct across mutiple processes. Comparing thread IDs from two separate processes is undefined behaviour when using POSIX threads. There is at least one common platform (GNU glibc threads) where thread IDs are duplicated. When I run APR's testprocmutex test on a 2-way SMP Linux box it regularly fails (that it doesn't always fail is, I suspect, because it is not a particularly good test and so the processes often complete without any mutex contention). -- Philip Martin