httpd-bugs mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86
Date Fri, 15 Feb 2008 19:48:53 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=44402>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=44402


basant.kukreja@sun.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW




------- Additional Comments From basant.kukreja@sun.com  2008-02-15 11:48 -------
Thanks Ruediger for your pointer. It was really useful.

Regarding the function ap_queue_info_wait_for_idler (lines 188-196):
188:        struct recycled_pool *first_pool = queue_info->recycled_pools;
189:        if (first_pool == NULL) {
190:            break;
191:        }
192:        if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools), first_pool->next,
193:                              first_pool) == first_pool) {
194:            *recycled_pool = first_pool->pool;
195:            break;
196:        }

I will abbreviate queue_info->recycled_pools as qi->rp to keep it short.
Inside apr_atomic_casptr we acquire a mutex (a sketch of such a fallback
follows the steps below), so the pop breaks down into 3 steps:
1. Read first_pool (the current qi->rp).
2. Read first_pool->next and invoke apr_atomic_casptr.
3. Acquire the lock on qi->rp (inside apr_atomic_casptr) and compare-and-swap.
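
For reference, here is roughly what such a mutex-based casptr fallback looks
like (my paraphrase of the idea, not the actual APR source). Note that only
*mem is protected by the lock; anything read before the call, such as
first_pool->next, is not:

/* Paraphrase of a mutex-based compare-and-swap fallback; not the
 * actual APR source. Only *mem is protected by the lock. */
#include <apr_thread_mutex.h>

static apr_thread_mutex_t *lock;   /* assume a process-wide mutex, created at init */

void *mutex_based_casptr(volatile void **mem, void *with, const void *cmp)
{
    void *prev;
    apr_thread_mutex_lock(lock);
    prev = *(void **)mem;
    if (prev == (void *)cmp)       /* compare... */
        *mem = with;               /* ...and swap, atomic w.r.t. *mem only */
    apr_thread_mutex_unlock(lock);
    return prev;                   /* caller checks prev == cmp for success */
}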

There is a very clear race condition between the two statements (line 188
and line 193), i.e. between steps 2 and 3. I agree that the CAS on
&queue_info->recycled_pools itself is atomic, but the values read before it
(first_pool and first_pool->next) are not protected at all. To prove my
point, here is an example:

Suppose at a particular moment the recycled pool list is
1 --> 2 --> 3, where 1, 2, 3 are the pool nodes, so qi->rp = 1. Now consider
the following interleaving:
Thread 1 :
    first_pool = 1;
    first_pool->next = 2.

Now, before step 3 is executed, that is, before we acquire the lock on
qi->rp, a context switch happens.

Thread 2 :
    Thread 2 pops node 1 from the list, and hence the list becomes 2 --> 3.

Thread 3 :
    Thread 3 pops node 2 from the list, and hence the list becomes 3.

Thread 2 :
    Thread 2 pushes node 1 back, and the list becomes 1 --> 3.

Thread 1 :
    Thread 1 resumes: its stale first_pool->next is still 2, and qi->rp is
again 1. It acquires the lock and atomically compares qi->rp with 1 and
swaps in 2. The CAS succeeds because qi->rp was 1, but the head it installs
is the stale node 2 (which Thread 3 still owns), so the list becomes 2 (or
2 --> 3) and is corrupted. This is the classic ABA problem: the head
returned to its old value 1 while the rest of the list changed underneath.
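
To make the window concrete, here is a small standalone sketch of the
unprotected pop (an illustrative reduction of the fdqueue logic using
pthreads, not the httpd source; the race window is marked in the comments):

/* Standalone sketch of the unprotected pop (illustrative reduction of
 * the fdqueue logic, not the httpd source). Build: cc aba.c -lpthread */
#include <pthread.h>
#include <stdio.h>

struct node { struct node *next; int id; };

static struct node *head;                        /* plays the role of qi->rp */
static pthread_mutex_t cas_lock = PTHREAD_MUTEX_INITIALIZER;

/* mutex-based CAS: atomic with respect to 'head' alone */
static struct node *casptr(struct node **mem, struct node *with,
                           struct node *cmp)
{
    struct node *prev;
    pthread_mutex_lock(&cas_lock);
    prev = *mem;
    if (prev == cmp)
        *mem = with;
    pthread_mutex_unlock(&cas_lock);
    return prev;
}

static struct node *pop(void)
{
    for (;;) {
        struct node *first = head;               /* step 1 (line 188) */
        struct node *next;
        if (first == NULL)
            return NULL;
        next = first->next;                      /* step 2: unprotected read */
        /* race window: other threads may pop 'first', pop 'next', and
         * push 'first' back here; 'next' is then stale, yet ... */
        if (casptr(&head, next, first) == first) /* ... step 3 still succeeds */
            return first;
    }
}

int main(void)
{
    static struct node n3 = { NULL, 3 }, n2 = { &n3, 2 }, n1 = { &n2, 1 };
    head = &n1;                                  /* list: 1 --> 2 --> 3 */
    printf("popped node %d\n", pop()->id);
    /* a full reproducer would hammer pop/push from several threads */
    return 0;
}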
        
I believe I can prove my point with a sample standalone application. So far
I have used a separate mutex to protect both qi->rp and qi->rp->next, as in
the attached patch. With this patch, I am able to run the stress test for
more than 10 hours without any crash; without it, the crash used to happen
in less than 30 minutes. Here is the patch which I tried:
---------------------------------------------------------------------------

--- orghttpd-2.2.6/server/mpm/worker/fdqueue.c	Wed Jul 25 06:13:49 2007
+++ httpd-2.2.6/server/mpm/worker/fdqueue.c	Fri Feb 15 10:57:42 2008
@@ -25,6 +25,7 @@
 struct fd_queue_info_t {
     apr_uint32_t idlers;
     apr_thread_mutex_t *idlers_mutex;
+    apr_thread_mutex_t *queue_mutex;
     apr_thread_cond_t *wait_for_idler;
     int terminated;
     int max_idlers;
@@ -36,6 +37,7 @@
     fd_queue_info_t *qi = data_;
     apr_thread_cond_destroy(qi->wait_for_idler);
     apr_thread_mutex_destroy(qi->idlers_mutex);
+    apr_thread_mutex_destroy(qi->queue_mutex);
 
     /* Clean up any pools in the recycled list */
     for (;;) {
@@ -65,6 +67,11 @@
     if (rv != APR_SUCCESS) {
         return rv;
     }
+    rv = apr_thread_mutex_create(&qi->queue_mutex, APR_THREAD_MUTEX_DEFAULT,
+                                 pool);
+    if (rv != APR_SUCCESS) {
+        return rv;
+    }
     rv = apr_thread_cond_create(&qi->wait_for_idler, pool);
     if (rv != APR_SUCCESS) {
         return rv;
@@ -93,14 +100,14 @@
         new_recycle = (struct recycled_pool *)apr_palloc(pool_to_recycle,
                                                          sizeof(*new_recycle));
         new_recycle->pool = pool_to_recycle;
-        for (;;) {
-            new_recycle->next = queue_info->recycled_pools;
-            if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
-                                  new_recycle, new_recycle->next) ==
-                new_recycle->next) {
-                break;
-            }
-        }
+        rv = apr_thread_mutex_lock(queue_info->queue_mutex);
+        if (rv != APR_SUCCESS)
+            return rv;
+        new_recycle->next = queue_info->recycled_pools;
+        queue_info->recycled_pools = new_recycle;
+        rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
+        if (rv != APR_SUCCESS)
+            return rv;
     }
 
     /* Atomically increment the count of idle workers */
@@ -182,19 +189,18 @@
 
     /* Atomically decrement the idle worker count */
     apr_atomic_dec32(&(queue_info->idlers));
-
-    /* Atomically pop a pool from the recycled list */
-    for (;;) {
+    rv = apr_thread_mutex_lock(queue_info->queue_mutex);
+    if (rv != APR_SUCCESS)
+        return rv;
+    if (queue_info->recycled_pools) {
         struct recycled_pool *first_pool = queue_info->recycled_pools;
-        if (first_pool == NULL) {
-            break;
-        }
-        if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools), first_pool->next,
-                              first_pool) == first_pool) {
-            *recycled_pool = first_pool->pool;
-            break;
-        }
+        queue_info->recycled_pools = first_pool->next;
+        *recycled_pool = first_pool->pool;
+        first_pool->next = NULL;
     }
+    rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
+    if (rv != APR_SUCCESS)
+        return rv;
 
     if (queue_info->terminated) {
         return APR_EOF;
---------------------------------------------------------------------------

If you agree that there is a clear race condition, then to correct it I have
the following suggestions:
(a) Use a dedicated pool for each worker thread; this will avoid any locking.
It will perform better but may require a little more memory in situations
where the worker threads are not fully used. (A rough sketch of this idea
follows below.)
(b) Use some technique other than a recycled pool list which avoids the race
condition.

I am in favour of option (a) until a good idea for (b) comes to my mind.
If you agree with (a), then I can work on it and generate a patch.
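
To illustrate what (a) could look like, here is a rough standalone sketch
(the names handle_one_connection, pchild and ptrans are made up for
illustration; this is not the actual worker MPM code and not the proposed
patch):

/* Sketch of suggestion (a): each worker thread owns one transaction pool
 * and clears it between connections, so the shared recycled-pool list
 * (and its race) disappears entirely. Illustrative names only. */
#include <apr_general.h>
#include <apr_pools.h>
#include <apr_strings.h>
#include <stdio.h>

static void handle_one_connection(apr_pool_t *ptrans)
{
    /* stand-in for real request processing */
    char *buf = apr_pstrdup(ptrans, "request state lives in ptrans");
    puts(buf);
}

int main(void)
{
    apr_pool_t *pchild, *ptrans;
    int i;

    apr_initialize();
    apr_pool_create(&pchild, NULL);

    /* one private pool per worker, created once; nothing to pop or push */
    apr_pool_create(&ptrans, pchild);
    for (i = 0; i < 3; i++) {            /* stands in for the worker loop */
        handle_one_connection(ptrans);
        apr_pool_clear(ptrans);          /* reuse with no locking at all */
    }

    apr_pool_destroy(pchild);
    apr_terminate();
    return 0;
}

The tradeoff is that an idle worker keeps whatever memory its private pool
last grew to, which is the extra-memory cost mentioned in (a).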

Note: I believe that the crash will happen on Linux too. I have never run
the stress test for more than 1 hour on Linux; I will try that tonight.


-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

