httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeff Trawick <traw...@attglobal.net>
Subject [PATCH] debugging aids for threaded.c
Date Thu, 09 Aug 2001 12:37:23 GMT
It is frustrating to try to guess what worker threads are doing at a
given instance.  You can get a bit of info out of the ps output (at
least on some platforms) but I'm generally left making a lot of
assumptions (and am often wrong :) ), so I wrote some code to tell me
exactly what the worker threads are doing.

I'm curious if anyone else would find this useful.  The concept could
be applied to other MPMs as well; it is probably most useful for
threaded MPMs, where it is problematic to find out what the workers
are doing at a particular time.

When compiled with MPM_DEBUG:

  At startup, a debug process is forked off; this debug process will
  report the state of all server processes/threads if you telnet to
  the right port; the debug processes switches to a different process
  group at startup so that if you want to freeze Apache and all of its
  threads ("kill -STOP -`cat myprefix/logs/httpd.pid`") it will still
  be around to tell you the state of things

  Worker threads set a footprint in the worker_score which indicates
  what sort of processing they are doing; for debugging an MPM, it is
  important to know if they are blocked in mutex, accept, poll,
  processing a connection, doing lingering close, and a few other
  tasks. 

When not compiled with MPM_DEBUG: 

  no change

Here is a sample output from the debug process:

$ telnet 0.0.0.0 10101
Trying 0.0.0.0...
Connected to 0.0.0.0.
Escape character is '^]'.
server process 0, pid 29859, generation 0, not quiescing
worker 0: processing connection
worker 1: processing connection
worker 2: processing connection
worker 3: processing connection
worker 4: processing connection
worker 5: processing connection
worker 6: processing connection
worker 7: processing connection
worker 8: processing connection
worker 9: processing connection
worker 10: processing connection
worker 11: processing connection
worker 12: processing connection
worker 13: processing connection
worker 14: processing connection
worker 15: processing connection
worker 16: processing connection
worker 17: processing connection
worker 18: processing connection
worker 19: processing connection
server process 1, pid 30078, generation 0, not quiescing
worker 0: processing connection
worker 1: processing connection
worker 2: processing connection
worker 3: processing connection
worker 4: processing connection
worker 5: processing connection
worker 6: processing connection
worker 7: processing connection
worker 8: processing connection
worker 9: processing connection
worker 10: processing connection
worker 11: processing connection
worker 12: processing connection
worker 13: processing connection
worker 14: processing connection
worker 15: processing connection
worker 16: processing connection
worker 17: processing connection
worker 18: processing connection
worker 19: processing connection
server process 2, pid 30102, generation 0, not quiescing
worker 0: processing connection
worker 1: processing connection
worker 2: processing connection
worker 3: processing connection
worker 4: processing connection
worker 5: processing connection
worker 6: processing connection
worker 7: processing connection
worker 8: processing connection
worker 9: processing connection
worker 10: processing connection
worker 11: exiting thread
worker 12: exiting thread
worker 13: exiting thread
worker 14: exiting thread
worker 15: exiting thread
worker 16: exiting thread
worker 17: exiting thread
worker 18: exiting thread
worker 19: exiting thread
server process 3, pid 29931, generation 0, not quiescing
worker 0: processing connection
worker 1: processing connection
worker 2: processing connection
worker 3: processing connection
worker 4: processing connection
worker 5: processing connection
worker 6: processing connection
worker 7: processing connection
worker 8: processing connection
worker 9: processing connection
worker 10: processing connection
worker 11: processing connection
worker 12: processing connection
worker 13: processing connection
worker 14: processing connection
worker 15: processing connection
worker 16: processing connection
worker 17: processing connection
worker 18: processing connection
worker 19: processing connection
server process 4, pid 29954, generation 0, not quiescing
worker 0: processing connection
worker 1: processing connection
worker 2: processing connection
worker 3: processing connection
worker 4: processing connection
worker 5: processing connection
worker 6: processing connection
worker 7: processing connection
worker 8: processing connection
worker 9: processing connection
worker 10: processing connection
worker 11: processing connection
worker 12: processing connection
worker 13: processing connection
worker 14: processing connection
worker 15: processing connection
worker 16: processing connection
worker 17: processing connection
worker 18: processing connection
worker 19: processing connection
server process 5, pid 29977, generation 0, not quiescing
worker 0: processing connection
worker 1: processing connection
worker 2: processing connection
worker 3: processing connection
worker 4: processing connection
worker 5: processing connection
worker 6: processing connection
worker 7: processing connection
worker 8: processing connection
worker 9: processing connection
worker 10: processing connection
worker 11: processing connection
worker 12: processing connection
worker 13: processing connection
worker 14: processing connection
worker 15: processing connection
worker 16: processing connection
worker 17: processing connection
worker 18: processing connection
worker 19: processing connection
server process 6, pid 30001, generation 0, quiescing
worker 0: exiting thread
worker 1: exiting thread
worker 2: exiting thread
worker 3: exiting thread
worker 4: exiting thread
worker 5: exiting thread
worker 6: exiting thread
worker 7: exiting thread
worker 8: exiting thread
worker 9: exiting thread
worker 10: exiting thread
worker 11: exiting thread
worker 12: exiting thread
worker 13: exiting thread
worker 14: exiting thread
worker 15: exiting thread
worker 16: exiting thread
worker 17: exiting thread
worker 18: exiting thread
worker 19: exiting thread
server process 7, pid 30024, generation 0, quiescing
worker 0: exiting thread
worker 1: exiting thread
worker 2: exiting thread
worker 3: exiting thread
worker 4: exiting thread
worker 5: exiting thread
worker 6: exiting thread
worker 7: exiting thread
worker 8: exiting thread
worker 9: exiting thread
worker 10: exiting thread
worker 11: exiting thread
worker 12: exiting thread
worker 13: exiting thread
worker 14: exiting thread
worker 15: exiting thread
worker 16: exiting thread
worker 17: exiting thread
worker 18: exiting thread
worker 19: exiting thread
Connection closed by foreign host.

This is not the most interesting scenario imaginable, but it shows the
general idea.

It would be nice to represent the information in a more concise
manner.

The debug process does not exit with Apache.  Sometimes this is good
(you can still retrieve the info in case Apache bailed out improperly
and/or you want to verify that all the worker threads went down
cleanly) but sometimes it is just a nuisance (you have to kill it
manually before restarting Apache).

Another idea would be to let the debug process handle a normal HTTP
request on the special port.  Then you could send a request for
mod_status output there.  Unfortunately, AFAIK there is no current way
for mod_status to let the MPM format MPM-specific information for the
report.  In fact there doesn't seem to be a way for MPMs to decide
what goes in the scoreboard.

Index: include/scoreboard.h
===================================================================
RCS file: /home/cvspublic/httpd-2.0/include/scoreboard.h,v
retrieving revision 1.30
diff -u -r1.30 scoreboard.h
--- include/scoreboard.h	2001/08/01 06:23:00	1.30
+++ include/scoreboard.h	2001/08/09 12:21:43
@@ -164,6 +164,9 @@
     server_rec *vhostrec;	/* What virtual host is being accessed? */
                                 /* SEE ABOVE FOR SAFE USAGE! */
     worker_score *next;
+#ifdef MPM_DEBUG
+    int footprint;
+#endif
 };
 
 typedef struct {
Index: server/mpm/threaded/threaded.c
===================================================================
RCS file: /home/cvspublic/httpd-2.0/server/mpm/threaded/threaded.c,v
retrieving revision 1.55
diff -u -r1.55 threaded.c
--- server/mpm/threaded/threaded.c	2001/08/05 18:08:49	1.55
+++ server/mpm/threaded/threaded.c	2001/08/09 12:21:47
@@ -134,6 +134,120 @@
     apr_threadattr_t *threadattr;
 } thread_starter;
 
+#ifdef MPM_DEBUG
+#define SETWFP(_ws_, _val_) \
+(_ws_)->footprint = (_val_)
+enum WFP_vals
+{
+    WFP_ENTER_LOOP = 1,
+    WFP_WAIT_ACCEPTMUTEX,
+    WFP_WAIT_POLL,
+    WFP_CHECK_POD,
+    WFP_ENTER_ACCEPT,
+    WFP_RELEASE_ACCEPTMUTEX,
+    WFP_PROCESS_SOCKET,
+    WFP_LINGERING_CLOSE,
+    WFP_SOCKET_PROCESSED,
+    WFP_CLEAR_PTRANS,
+    WFP_DESTROY_TPOOL,
+    WFP_EXIT_THREAD
+};
+
+static const char *WFP_strings[] =
+{
+    "BAD FOOTPRINT",
+    "entering main loop",
+    "waiting for accept mutex",
+    "polling",
+    "checking pod",
+    "in accept",
+    "releasing accept mutex",
+    "processing connection",
+    "lingering close",
+    "after processing connection",
+    "clearing ptrans pool",
+    "destroying tpool pool",
+    "exiting thread"
+};
+
+#define FATAL(fncall)                               \
+{                                                   \
+    apr_status_t rv;                                \
+    rv = fncall;                                    \
+    if (rv != APR_SUCCESS) {                        \
+        ap_log_error(APLOG_MARK, APLOG_ALERT,       \
+                     rv, ap_server_conf,            \
+                     #fncall);                      \
+        exit(1);                                    \
+    }                                               \
+}
+
+#define WARN(rv, fncall)                            \
+{                                                   \
+    rv = fncall;                                    \
+    if (rv != APR_SUCCESS) {                        \
+        ap_log_error(APLOG_MARK, APLOG_WARNING,     \
+                     rv, ap_server_conf,            \
+                     #fncall);                      \
+    }                                               \
+}
+
+static void start_debug_process(void)
+{
+    if (!fork()) {
+        /* debugger process */
+        apr_pool_t *p;
+        pid_t mypid = getpid();
+        apr_socket_t *sock, *connection;
+        apr_sockaddr_t *sa;
+        int i, j;
+        process_score *ps;
+        worker_score *ws;
+        char buf[2048];
+        apr_size_t len;
+        apr_status_t rv;
+
+        /* change process groups so that we're unaffected if Apache is
+         * frozen via SIGSTOP
+         */
+        setpgid(mypid, 0);
+
+        apr_pool_create(&p, NULL);
+        FATAL(apr_sockaddr_info_get(&sa, "127.0.0.1", APR_UNSPEC, 10101, 0, p));
+        FATAL(apr_socket_create(&sock, sa->family, SOCK_STREAM, p));
+        FATAL(apr_setsocketopt(sock, APR_SO_REUSEADDR, 1));
+        FATAL(apr_bind(sock, sa));
+        FATAL(apr_listen(sock, 5));
+        while (1) {
+            WARN(rv, apr_accept(&connection, sock, p));
+            if (rv != APR_SUCCESS)
+                continue;
+            for (i = 0; i < HARD_SERVER_LIMIT; i++) {
+                ps = &ap_scoreboard_image->parent[i];
+                len = apr_snprintf(buf, sizeof buf,
+                                   "server process %d, pid %" APR_OS_PROC_T_FMT ", generation
%d, %s\n",
+                                   i, ps->pid, ps->generation, 
+                                   ps->quiescing ? "quiescing" : "not quiescing");
+                WARN(rv, apr_send(connection, buf, &len));
+                for (j = 0; j < HARD_THREAD_LIMIT; j++) {
+                    ws = &ap_scoreboard_image->servers[i][j];
+                    if (ws->footprint != 0) { /* if ever initialized */
+                        len = apr_snprintf(buf, sizeof buf,
+                                           "worker %d: %s\n",
+                                           j, WFP_strings[ws->footprint]);
+                        WARN(rv, apr_send(connection, buf, &len));
+                    }
+                }
+            }
+            apr_socket_close(connection);
+        }
+    }
+}
+    
+#else /* MPM_DEBUG */
+#define SETWFP(val)
+#endif /* MPM_DEBUG */
+
 /*
  * The max child slot ever assigned, preserved across restarts.  Necessary
  * to deal with MaxClients changes across SIGWINCH restarts.  We use this
@@ -452,6 +566,9 @@
     conn_rec *current_conn;
     long conn_id = AP_ID_FROM_CHILD_THREAD(my_child_num, my_thread_num);
     int csd;
+#ifdef MPM_DEBUG
+    worker_score *ws = &ap_scoreboard_image->servers[my_child_num][my_thread_num];
+#endif
 
     (void) apr_os_sock_get(&csd, sock);
 
@@ -470,6 +587,7 @@
     current_conn = ap_new_connection(p, ap_server_conf, sock, conn_id);
     if (current_conn) {
         ap_process_connection(current_conn);
+        SETWFP(ws, WFP_LINGERING_CLOSE);
         ap_lingering_close(current_conn);
     }
 }
@@ -535,6 +653,9 @@
     int curr_pollfd, last_pollfd = 0;
     apr_pollfd_t *pollset;
     apr_status_t rv;
+#ifdef MPM_DEBUG
+    worker_score *ws = &ap_scoreboard_image->servers[process_slot][thread_slot];
+#endif
 
     free(ti);
 
@@ -548,6 +669,8 @@
     for(n=0 ; n <= num_listensocks ; ++n)
 	apr_poll_socket_add(pollset, listensocks[n], APR_POLLIN);
 
+    SETWFP(ws, WFP_ENTER_LOOP);
+
     /* TODO: Switch to a system where threads reuse the results from earlier
        poll calls - manoj */
     while (1) {
@@ -558,6 +681,7 @@
 
         (void) ap_update_child_status(process_slot, thread_slot, SERVER_READY, 
                                       (request_rec *) NULL);
+        SETWFP(ws, WFP_WAIT_ACCEPTMUTEX);
         if ((rv = SAFE_ACCEPT(apr_lock_acquire(accept_mutex)))
             != APR_SUCCESS) {
             ap_log_error(APLOG_MARK, APLOG_EMERG, rv, ap_server_conf,
@@ -570,6 +694,7 @@
 	    apr_status_t ret;
 	    apr_int16_t event;
 
+            SETWFP(ws, WFP_WAIT_POLL);
             ret = apr_poll(pollset, &n, -1);
             if (ret != APR_SUCCESS) {
                 if (APR_STATUS_IS_EINTR(ret)) {
@@ -589,6 +714,7 @@
             if (event & APR_POLLIN) {
                 /* A process got a signal on the shutdown pipe. Check if we're
                  * the lucky process to die. */
+                SETWFP(ws, WFP_CHECK_POD);
                 check_pipe_of_death();
                 continue;
             }
@@ -617,11 +743,13 @@
         }
     got_fd:
         if (!workers_may_exit) {
+            SETWFP(ws, WFP_ENTER_ACCEPT);
             if ((rv = apr_accept(&csd, sd, ptrans)) != APR_SUCCESS) {
                 csd = NULL;
                 ap_log_error(APLOG_MARK, APLOG_ERR, rv, ap_server_conf, 
                              "apr_accept");
             }
+            SETWFP(ws, WFP_RELEASE_ACCEPTMUTEX);
             if ((rv = SAFE_ACCEPT(apr_lock_release(accept_mutex)))
                 != APR_SUCCESS) {
                 ap_log_error(APLOG_MARK, APLOG_EMERG, rv, ap_server_conf,
@@ -630,7 +758,9 @@
                 workers_may_exit = 1;
             }
             if (csd != NULL) {
+                SETWFP(ws, WFP_PROCESS_SOCKET);
                 process_socket(ptrans, csd, process_slot, thread_slot);
+                SETWFP(ws, WFP_SOCKET_PROCESSED);
                 requests_this_child--;
             }
         }
@@ -644,9 +774,11 @@
             }
             break;
         }
+        SETWFP(ws, WFP_CLEAR_PTRANS);
         apr_pool_clear(ptrans);
     }
 
+    SETWFP(ws, WFP_DESTROY_TPOOL);
     apr_pool_destroy(tpool);
     ap_update_child_status(process_slot, thread_slot, (dying) ? SERVER_DEAD : SERVER_GRACEFUL,
         (request_rec *) NULL);
@@ -667,6 +799,8 @@
     }
     apr_lock_release(worker_thread_count_mutex);
 
+    SETWFP(ws, WFP_EXIT_THREAD);
+
     return NULL;
 }
 
@@ -1223,6 +1357,10 @@
     if (!is_graceful) {
         ap_run_pre_mpm(pconf, SB_SHARED);
     }
+
+#ifdef MPM_DEBUG
+    start_debug_process();
+#endif
 
     set_signals();
     /* Don't thrash... */

-- 
Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...

Mime
View raw message