Subject: Re: worker MPM: close_worker_sockets race?
From: Bojan Smojver
To: dev@httpd.apache.org
In-Reply-To: <1256875186.2488.39.camel@shrek.rexursive.com>
References: <1256173732.9540.23.camel@shrek.rexursive.com>
 <1256245093.9540.31.camel@shrek.rexursive.com>
 <1256253140.9540.33.camel@shrek.rexursive.com>
 <1256868439.2488.21.camel@shrek.rexursive.com>
 <1256875186.2488.39.camel@shrek.rexursive.com>
Date: Sat, 31 Oct 2009 13:14:20 +1100
Message-Id: <1256955260.2488.76.camel@shrek.rexursive.com>

On Fri, 2009-10-30 at 14:59 +1100, Bojan Smojver wrote:
> Of course, SIGINT plucked out of thin air.

Actually, we can just reuse WORKER_SIGNAL for all this.

-- 
Bojan

[Attachment: safe_close_worker_sockets.patch]

--- httpd-2.2.14-v/server/mpm/worker/worker.c	2007-07-18 00:48:25.000000000 +1000
+++ httpd-2.2.14/server/mpm/worker/worker.c	2009-10-31 13:13:33.934697512 +1100
@@ -226,12 +226,118 @@
  */
 #define WORKER_SIGNAL AP_SIG_GRACEFUL
 
+#ifdef HAVE_PTHREAD_KILL
+/* An array of thread and socket descriptors in use by each thread used to
+ * perform a non-graceful (forced) shutdown of the server. */
+static volatile sig_atomic_t suspend_workers;
+static struct worker_data {
+    apr_os_thread_t *thread;
+    apr_socket_t *socket;
+    volatile sig_atomic_t suspended;
+} *worker_data;
+#else
 /* An array of socket descriptors in use by each thread used to
  * perform a non-graceful (forced) shutdown of the server.
  */
 static apr_socket_t **worker_sockets;
+#endif
+
+#ifdef HAVE_PTHREAD_KILL
+static void worker_signal_handler(int sig)
+{
+    int i;
+    sigset_t signal_set;
+
+    /* find our thread */
+    for (i = 0; i < ap_threads_per_child; i++) {
+        if (worker_data[i].thread &&
+            pthread_equal(*worker_data[i].thread, pthread_self())) {
+            break;
+        }
+    }
+
+    /* just in case we overshot */
+    if (i >= ap_threads_per_child) {
+        return;
+    }
+
+    /* if we are not being suspended, exit now */
+    if (!suspend_workers) {
+        worker_data[i].suspended = 0;
+        return;
+    }
+
+    sigfillset(&signal_set);
+    sigdelset(&signal_set, WORKER_SIGNAL);
+
+    worker_data[i].suspended = 1; /* safe - thread already executing here */
+
+    sigsuspend(&signal_set);
+
+    worker_data[i].suspended = 0;
+}
+
+static void interrupt_worker_threads()
+{
+    int i;
+
+    for (i = 0; i < ap_threads_per_child; i++) {
+        if (worker_data[i].thread) {
+            /* try sending the signal twice */
+            if (pthread_kill(*worker_data[i].thread, WORKER_SIGNAL) == -1) {
+                pthread_kill(*worker_data[i].thread, WORKER_SIGNAL);
+            }
+        }
+    }
+}
+
+static void suspend_worker_threads()
+{
+    suspend_workers = 1;
+    interrupt_worker_threads();
+}
+
+static void resume_worker_threads()
+{
+    suspend_workers = 0;
+    interrupt_worker_threads();
+}
 
 static void close_worker_sockets(void)
 {
+    int i, j, csd;
+
+    suspend_worker_threads();
+
+    /* wait for threads to suspend, but press ahead after a while anyway */
+    for (j = 0; j < 5; j++) {
+        int sum = 0;
+
+        apr_sleep(SCOREBOARD_MAINTENANCE_INTERVAL);
+
+        for (i = 0; i < ap_threads_per_child; i++) {
+            sum += worker_data[i].suspended;
+        }
+
+        if (sum == ap_threads_per_child) {
+            break;
+        }
+    }
+
+    /* shut down all client sockets */
+    for (i = 0; i < ap_threads_per_child; i++) {
+        if (worker_data[i].socket) {
+            apr_os_sock_get(&csd, worker_data[i].socket);
+            if (csd != -1) {
+                shutdown(csd, SHUT_RDWR);
+            }
+        }
+    }
+
+    resume_worker_threads();
+}
+#else
+static void close_worker_sockets(void)
+{
     int i;
     for (i = 0; i < ap_threads_per_child; i++) {
         if (worker_sockets[i]) {
@@ -240,6 +346,7 @@
         }
     }
 }
+#endif
 
 static void wakeup_listener(void)
 {
@@ -836,7 +943,7 @@
 
 #ifdef HAVE_PTHREAD_KILL
     unblock_signal(WORKER_SIGNAL);
-    apr_signal(WORKER_SIGNAL, dummy_signal_handler);
+    apr_signal(WORKER_SIGNAL, worker_signal_handler);
 #endif
 
     while (!workers_may_exit) {
@@ -889,10 +996,22 @@
             continue;
         }
         is_idle = 0;
+
+#ifdef HAVE_PTHREAD_KILL
+        worker_data[thread_slot].socket = csd;
+#else
         worker_sockets[thread_slot] = csd;
+#endif
+
         bucket_alloc = apr_bucket_alloc_create(ptrans);
         process_socket(ptrans, csd, process_slot, thread_slot, bucket_alloc);
+
+#ifdef HAVE_PTHREAD_KILL
+        worker_data[thread_slot].socket = NULL;
+#else
         worker_sockets[thread_slot] = NULL;
+#endif
+
         requests_this_child--; /* FIXME: should be synchronized - aaron */
         apr_pool_clear(ptrans);
         last_ptrans = ptrans;
@@ -975,8 +1094,13 @@
         clean_child_exit(APEXIT_CHILDFATAL);
    }
 
+#ifdef HAVE_PTHREAD_KILL
+    worker_data = apr_pcalloc(pchild, ap_threads_per_child
+                              * sizeof(*worker_data));
+#else
     worker_sockets = apr_pcalloc(pchild, ap_threads_per_child
                                  * sizeof(apr_socket_t *));
+#endif
 
     loops = prev_threads_created = 0;
     while (1) {
@@ -1012,6 +1136,9 @@
             /* let the parent decide how bad this really is */
             clean_child_exit(APEXIT_CHILDSICK);
         }
+#ifdef HAVE_PTHREAD_KILL
+        apr_os_thread_get(&worker_data[i].thread, threads[i]);
+#endif
         threads_created++;
     }
     /* Start the listener only when there are workers available */