httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Holsman <i...@cnet.com>
Subject Re: [PATCH] debugging aids for threaded.c
Date Thu, 09 Aug 2001 15:32:33 GMT
On 09 Aug 2001 08:37:23 -0400, Jeff Trawick wrote:
> It is frustrating to try to guess what worker threads are doing at a
> given instance.  You can get a bit of info out of the ps output (at
> least on some platforms) but I'm generally left making a lot of
> assumptions (and am often wrong :) ), so I wrote some code to tell me
> exactly what the worker threads are doing.
> 

my 2c

would it be worthwile adding these extra states to the scoreboard?
and then you could use something like mod-status to see what is going on
(maybe have a newcommand to enable them like ExtendedStatus does)

I'd like to see this patch as a seperate module. so if I include it then
a new process will be spawned an I can get status like info out of it.

BTW there is a hook (get_mgmt_items) which each of the MPM's could
implement to dump MPM specific info to mod-status

..Ian


> I'm curious if anyone else would find this useful.  The concept could
> be applied to other MPMs as well; it is probably most useful for
> threaded MPMs, where it is problematic to find out what the workers
> are doing at a particular time.
> 
> When compiled with MPM_DEBUG:
> 
>   At startup, a debug process is forked off; this debug process will
>   report the state of all server processes/threads if you telnet to
>   the right port; the debug processes switches to a different process
>   group at startup so that if you want to freeze Apache and all of its
>   threads ("kill -STOP -`cat myprefix/logs/httpd.pid`") it will still
>   be around to tell you the state of things
> 
>   Worker threads set a footprint in the worker_score which indicates
>   what sort of processing they are doing; for debugging an MPM, it is
>   important to know if they are blocked in mutex, accept, poll,
>   processing a connection, doing lingering close, and a few other
>   tasks. 
> 
> When not compiled with MPM_DEBUG: 
> 
>   no change
> 
> Here is a sample output from the debug process:
> 
> $ telnet 0.0.0.0 10101
> Trying 0.0.0.0...
> Connected to 0.0.0.0.
> Escape character is '^]'.
> server process 0, pid 29859, generation 0, not quiescing
> worker 0: processing connection
> worker 1: processing connection
> worker 2: processing connection
> worker 3: processing connection
> worker 4: processing connection
> worker 5: processing connection
> worker 6: processing connection
> worker 7: processing connection
> worker 8: processing connection
> worker 9: processing connection
> worker 10: processing connection
> worker 11: processing connection
> worker 12: processing connection
> worker 13: processing connection
> worker 14: processing connection
> worker 15: processing connection
> worker 16: processing connection
> worker 17: processing connection
> worker 18: processing connection
> worker 19: processing connection
> server process 1, pid 30078, generation 0, not quiescing
> worker 0: processing connection
> worker 1: processing connection
> worker 2: processing connection
> worker 3: processing connection
> worker 4: processing connection
> worker 5: processing connection
> worker 6: processing connection
> worker 7: processing connection
> worker 8: processing connection
> worker 9: processing connection
> worker 10: processing connection
> worker 11: processing connection
> worker 12: processing connection
> worker 13: processing connection
> worker 14: processing connection
> worker 15: processing connection
> worker 16: processing connection
> worker 17: processing connection
> worker 18: processing connection
> worker 19: processing connection
> server process 2, pid 30102, generation 0, not quiescing
> worker 0: processing connection
> worker 1: processing connection
> worker 2: processing connection
> worker 3: processing connection
> worker 4: processing connection
> worker 5: processing connection
> worker 6: processing connection
> worker 7: processing connection
> worker 8: processing connection
> worker 9: processing connection
> worker 10: processing connection
> worker 11: exiting thread
> worker 12: exiting thread
> worker 13: exiting thread
> worker 14: exiting thread
> worker 15: exiting thread
> worker 16: exiting thread
> worker 17: exiting thread
> worker 18: exiting thread
> worker 19: exiting thread
> server process 3, pid 29931, generation 0, not quiescing
> worker 0: processing connection
> worker 1: processing connection
> worker 2: processing connection
> worker 3: processing connection
> worker 4: processing connection
> worker 5: processing connection
> worker 6: processing connection
> worker 7: processing connection
> worker 8: processing connection
> worker 9: processing connection
> worker 10: processing connection
> worker 11: processing connection
> worker 12: processing connection
> worker 13: processing connection
> worker 14: processing connection
> worker 15: processing connection
> worker 16: processing connection
> worker 17: processing connection
> worker 18: processing connection
> worker 19: processing connection
> server process 4, pid 29954, generation 0, not quiescing
> worker 0: processing connection
> worker 1: processing connection
> worker 2: processing connection
> worker 3: processing connection
> worker 4: processing connection
> worker 5: processing connection
> worker 6: processing connection
> worker 7: processing connection
> worker 8: processing connection
> worker 9: processing connection
> worker 10: processing connection
> worker 11: processing connection
> worker 12: processing connection
> worker 13: processing connection
> worker 14: processing connection
> worker 15: processing connection
> worker 16: processing connection
> worker 17: processing connection
> worker 18: processing connection
> worker 19: processing connection
> server process 5, pid 29977, generation 0, not quiescing
> worker 0: processing connection
> worker 1: processing connection
> worker 2: processing connection
> worker 3: processing connection
> worker 4: processing connection
> worker 5: processing connection
> worker 6: processing connection
> worker 7: processing connection
> worker 8: processing connection
> worker 9: processing connection
> worker 10: processing connection
> worker 11: processing connection
> worker 12: processing connection
> worker 13: processing connection
> worker 14: processing connection
> worker 15: processing connection
> worker 16: processing connection
> worker 17: processing connection
> worker 18: processing connection
> worker 19: processing connection
> server process 6, pid 30001, generation 0, quiescing
> worker 0: exiting thread
> worker 1: exiting thread
> worker 2: exiting thread
> worker 3: exiting thread
> worker 4: exiting thread
> worker 5: exiting thread
> worker 6: exiting thread
> worker 7: exiting thread
> worker 8: exiting thread
> worker 9: exiting thread
> worker 10: exiting thread
> worker 11: exiting thread
> worker 12: exiting thread
> worker 13: exiting thread
> worker 14: exiting thread
> worker 15: exiting thread
> worker 16: exiting thread
> worker 17: exiting thread
> worker 18: exiting thread
> worker 19: exiting thread
> server process 7, pid 30024, generation 0, quiescing
> worker 0: exiting thread
> worker 1: exiting thread
> worker 2: exiting thread
> worker 3: exiting thread
> worker 4: exiting thread
> worker 5: exiting thread
> worker 6: exiting thread
> worker 7: exiting thread
> worker 8: exiting thread
> worker 9: exiting thread
> worker 10: exiting thread
> worker 11: exiting thread
> worker 12: exiting thread
> worker 13: exiting thread
> worker 14: exiting thread
> worker 15: exiting thread
> worker 16: exiting thread
> worker 17: exiting thread
> worker 18: exiting thread
> worker 19: exiting thread
> Connection closed by foreign host.
> 
> This is not the most interesting scenario imaginable, but it shows the
> general idea.
> 
> It would be nice to represent the information in a more concise
> manner.
> 
> The debug process does not exit with Apache.  Sometimes this is good
> (you can still retrieve the info in case Apache bailed out improperly
> and/or you want to verify that all the worker threads went down
> cleanly) but sometimes it is just a nuisance (you have to kill it
> manually before restarting Apache).
> 
> Another idea would be to let the debug process handle a normal HTTP
> request on the special port.  Then you could send a request for
> mod_status output there.  Unfortunately, AFAIK there is no current way
> for mod_status to let the MPM format MPM-specific information for the
> report.  In fact there doesn't seem to be a way for MPMs to decide
> what goes in the scoreboard.
> 
> Index: include/scoreboard.h
> ===================================================================
> RCS file: /home/cvspublic/httpd-2.0/include/scoreboard.h,v
> retrieving revision 1.30
> diff -u -r1.30 scoreboard.h
> --- include/scoreboard.h	2001/08/01 06:23:00	1.30
> +++ include/scoreboard.h	2001/08/09 12:21:43
> @@ -164,6 +164,9 @@
>      server_rec *vhostrec;	/* What virtual host is being accessed? */
>                                  /* SEE ABOVE FOR SAFE USAGE! */
>      worker_score *next;
> +#ifdef MPM_DEBUG
> +    int footprint;
> +#endif
>  };
>  
>  typedef struct {
> Index: server/mpm/threaded/threaded.c
> ===================================================================
> RCS file: /home/cvspublic/httpd-2.0/server/mpm/threaded/threaded.c,v
> retrieving revision 1.55
> diff -u -r1.55 threaded.c
> --- server/mpm/threaded/threaded.c	2001/08/05 18:08:49	1.55
> +++ server/mpm/threaded/threaded.c	2001/08/09 12:21:47
> @@ -134,6 +134,120 @@
>      apr_threadattr_t *threadattr;
>  } thread_starter;
>  
> +#ifdef MPM_DEBUG
> +#define SETWFP(_ws_, _val_) \
> +(_ws_)->footprint = (_val_)
> +enum WFP_vals
> +{
> +    WFP_ENTER_LOOP = 1,
> +    WFP_WAIT_ACCEPTMUTEX,
> +    WFP_WAIT_POLL,
> +    WFP_CHECK_POD,
> +    WFP_ENTER_ACCEPT,
> +    WFP_RELEASE_ACCEPTMUTEX,
> +    WFP_PROCESS_SOCKET,
> +    WFP_LINGERING_CLOSE,
> +    WFP_SOCKET_PROCESSED,
> +    WFP_CLEAR_PTRANS,
> +    WFP_DESTROY_TPOOL,
> +    WFP_EXIT_THREAD
> +};
> +
> +static const char *WFP_strings[] =
> +{
> +    "BAD FOOTPRINT",
> +    "entering main loop",
> +    "waiting for accept mutex",
> +    "polling",
> +    "checking pod",
> +    "in accept",
> +    "releasing accept mutex",
> +    "processing connection",
> +    "lingering close",
> +    "after processing connection",
> +    "clearing ptrans pool",
> +    "destroying tpool pool",
> +    "exiting thread"
> +};
> +
> +#define FATAL(fncall)                               \
> +{                                                   \
> +    apr_status_t rv;                                \
> +    rv = fncall;                                    \
> +    if (rv != APR_SUCCESS) {                        \
> +        ap_log_error(APLOG_MARK, APLOG_ALERT,       \
> +                     rv, ap_server_conf,            \
> +                     #fncall);                      \
> +        exit(1);                                    \
> +    }                                               \
> +}
> +
> +#define WARN(rv, fncall)                            \
> +{                                                   \
> +    rv = fncall;                                    \
> +    if (rv != APR_SUCCESS) {                        \
> +        ap_log_error(APLOG_MARK, APLOG_WARNING,     \
> +                     rv, ap_server_conf,            \
> +                     #fncall);                      \
> +    }                                               \
> +}
> +
> +static void start_debug_process(void)
> +{
> +    if (!fork()) {
> +        /* debugger process */
> +        apr_pool_t *p;
> +        pid_t mypid = getpid();
> +        apr_socket_t *sock, *connection;
> +        apr_sockaddr_t *sa;
> +        int i, j;
> +        process_score *ps;
> +        worker_score *ws;
> +        char buf[2048];
> +        apr_size_t len;
> +        apr_status_t rv;
> +
> +        /* change process groups so that we're unaffected if Apache is
> +         * frozen via SIGSTOP
> +         */
> +        setpgid(mypid, 0);
> +
> +        apr_pool_create(&p, NULL);
> +        FATAL(apr_sockaddr_info_get(&sa, "127.0.0.1", APR_UNSPEC, 10101, 0, p));
> +        FATAL(apr_socket_create(&sock, sa->family, SOCK_STREAM, p));
> +        FATAL(apr_setsocketopt(sock, APR_SO_REUSEADDR, 1));
> +        FATAL(apr_bind(sock, sa));
> +        FATAL(apr_listen(sock, 5));
> +        while (1) {
> +            WARN(rv, apr_accept(&connection, sock, p));
> +            if (rv != APR_SUCCESS)
> +                continue;
> +            for (i = 0; i < HARD_SERVER_LIMIT; i++) {
> +                ps = &ap_scoreboard_image->parent[i];
> +                len = apr_snprintf(buf, sizeof buf,
> +                                   "server process %d, pid %" APR_OS_PROC_T_FMT ", generation
%d, %s\n",
> +                                   i, ps->pid, ps->generation, 
> +                                   ps->quiescing ? "quiescing" : "not quiescing");
> +                WARN(rv, apr_send(connection, buf, &len));
> +                for (j = 0; j < HARD_THREAD_LIMIT; j++) {
> +                    ws = &ap_scoreboard_image->servers[i][j];
> +                    if (ws->footprint != 0) { /* if ever initialized */
> +                        len = apr_snprintf(buf, sizeof buf,
> +                                           "worker %d: %s\n",
> +                                           j, WFP_strings[ws->footprint]);
> +                        WARN(rv, apr_send(connection, buf, &len));
> +                    }
> +                }
> +            }
> +            apr_socket_close(connection);
> +        }
> +    }
> +}
> +    
> +#else /* MPM_DEBUG */
> +#define SETWFP(val)
> +#endif /* MPM_DEBUG */
> +
>  /*
>   * The max child slot ever assigned, preserved across restarts.  Necessary
>   * to deal with MaxClients changes across SIGWINCH restarts.  We use this
> @@ -452,6 +566,9 @@
>      conn_rec *current_conn;
>      long conn_id = AP_ID_FROM_CHILD_THREAD(my_child_num, my_thread_num);
>      int csd;
> +#ifdef MPM_DEBUG
> +    worker_score *ws = &ap_scoreboard_image->servers[my_child_num][my_thread_num];
> +#endif
>  
>      (void) apr_os_sock_get(&csd, sock);
>  
> @@ -470,6 +587,7 @@
>      current_conn = ap_new_connection(p, ap_server_conf, sock, conn_id);
>      if (current_conn) {
>          ap_process_connection(current_conn);
> +        SETWFP(ws, WFP_LINGERING_CLOSE);
>          ap_lingering_close(current_conn);
>      }
>  }
> @@ -535,6 +653,9 @@
>      int curr_pollfd, last_pollfd = 0;
>      apr_pollfd_t *pollset;
>      apr_status_t rv;
> +#ifdef MPM_DEBUG
> +    worker_score *ws = &ap_scoreboard_image->servers[process_slot][thread_slot];
> +#endif
>  
>      free(ti);
>  
> @@ -548,6 +669,8 @@
>      for(n=0 ; n <= num_listensocks ; ++n)
>  	apr_poll_socket_add(pollset, listensocks[n], APR_POLLIN);
>  
> +    SETWFP(ws, WFP_ENTER_LOOP);
> +
>      /* TODO: Switch to a system where threads reuse the results from earlier
>         poll calls - manoj */
>      while (1) {
> @@ -558,6 +681,7 @@
>  
>          (void) ap_update_child_status(process_slot, thread_slot, SERVER_READY, 
>                                        (request_rec *) NULL);
> +        SETWFP(ws, WFP_WAIT_ACCEPTMUTEX);
>          if ((rv = SAFE_ACCEPT(apr_lock_acquire(accept_mutex)))
>              != APR_SUCCESS) {
>              ap_log_error(APLOG_MARK, APLOG_EMERG, rv, ap_server_conf,
> @@ -570,6 +694,7 @@
>  	    apr_status_t ret;
>  	    apr_int16_t event;
>  
> +            SETWFP(ws, WFP_WAIT_POLL);
>              ret = apr_poll(pollset, &n, -1);
>              if (ret != APR_SUCCESS) {
>                  if (APR_STATUS_IS_EINTR(ret)) {
> @@ -589,6 +714,7 @@
>              if (event & APR_POLLIN) {
>                  /* A process got a signal on the shutdown pipe. Check if we're
>                   * the lucky process to die. */
> +                SETWFP(ws, WFP_CHECK_POD);
>                  check_pipe_of_death();
>                  continue;
>              }
> @@ -617,11 +743,13 @@
>          }
>      got_fd:
>          if (!workers_may_exit) {
> +            SETWFP(ws, WFP_ENTER_ACCEPT);
>              if ((rv = apr_accept(&csd, sd, ptrans)) != APR_SUCCESS) {
>                  csd = NULL;
>                  ap_log_error(APLOG_MARK, APLOG_ERR, rv, ap_server_conf, 
>                               "apr_accept");
>              }
> +            SETWFP(ws, WFP_RELEASE_ACCEPTMUTEX);
>              if ((rv = SAFE_ACCEPT(apr_lock_release(accept_mutex)))
>                  != APR_SUCCESS) {
>                  ap_log_error(APLOG_MARK, APLOG_EMERG, rv, ap_server_conf,
> @@ -630,7 +758,9 @@
>                  workers_may_exit = 1;
>              }
>              if (csd != NULL) {
> +                SETWFP(ws, WFP_PROCESS_SOCKET);
>                  process_socket(ptrans, csd, process_slot, thread_slot);
> +                SETWFP(ws, WFP_SOCKET_PROCESSED);
>                  requests_this_child--;
>              }
>          }
> @@ -644,9 +774,11 @@
>              }
>              break;
>          }
> +        SETWFP(ws, WFP_CLEAR_PTRANS);
>          apr_pool_clear(ptrans);
>      }
>  
> +    SETWFP(ws, WFP_DESTROY_TPOOL);
>      apr_pool_destroy(tpool);
>      ap_update_child_status(process_slot, thread_slot, (dying) ? SERVER_DEAD : SERVER_GRACEFUL,
>          (request_rec *) NULL);
> @@ -667,6 +799,8 @@
>      }
>      apr_lock_release(worker_thread_count_mutex);
>  
> +    SETWFP(ws, WFP_EXIT_THREAD);
> +
>      return NULL;
>  }
>  
> @@ -1223,6 +1357,10 @@
>      if (!is_graceful) {
>          ap_run_pre_mpm(pconf, SB_SHARED);
>      }
> +
> +#ifdef MPM_DEBUG
> +    start_debug_process();
> +#endif
>  
>      set_signals();
>      /* Don't thrash... */
> 
> -- 
> Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
>        http://www.geocities.com/SiliconValley/Park/9289/
>              Born in Roswell... married an alien...
-- 
Ian Holsman
Performance Measurement & Analysis
CNET Networks    -    415 364-8608


Mime
View raw message