httpd-bugs mailing list archives

Subject [Bug 54852] New: graceful restart takes very long time sometimes
Date Tue, 16 Apr 2013 08:52:07 GMT

            Bug ID: 54852
           Summary: graceful restart takes very long time sometimes
           Product: Apache httpd-2
           Version: 2.4.4
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: mpm_prefork
    Classification: Unclassified

Sometimes a graceful restart can take up to a few minutes here (the time between
"Graceful restart requested, doing restart" and "resuming normal operations")
when using the prefork MPM.

I've tracked this down to the following scenario:

1) On graceful restart the main process calls

/* kill off the idle ones */
ap_mpm_pod_killpg(pod, retained->max_daemons_limit);

and that calls dummy_connection() once per child slot. This is fast
(1-2s) in most cases, where children exist, but if the children
are gone it can take ages. Why are the children gone, and why does it take ages?

2) The children were busy, noticed that a new generation is starting, and
exited by themselves, without needing a dummy_connection() to "ping" them:

        else if (retained->my_generation !=
                 ap_scoreboard_image->global->running_generation) { /* restart? */
            /* yeah, this could be non-graceful restart, in which case the
             * parent will kill us soon enough, but why bother checking?
             */
            die_now = 1;
        }

3) How can such a scenario happen? In my case the main process works
slowly enough that it doesn't finish ap_mpm_pod_killpg() before
all children exit as in 2). Basically, in the middle of the ap_mpm_pod_killpg()
loop all children have already exited. The rest of the loop then takes a long time.

4) So we end up in a situation where the main process is in the middle
of ap_mpm_pod_killpg() while all children have already exited as in 2). The main
process keeps calling dummy_connection(), and these connect()s plus the
dummy data send and polling take a very long time (about 1-2s per
dummy_connection() call, times the number of calls). This is the primary reason
why graceful restart is sometimes painfully slow here (1-5 minutes).

5) Why do connect() and the socket send in dummy_connection() succeed if there
are no children? If they failed, the ap_mpm_pod_killpg() loop would break
and things would end quite fast.

Since all children have exited, only the main process is left, and it still
holds the listening socket, so the connections go to this socket. But the main
process doesn't call accept() and these connections are never processed. The
dummy_connection() send doesn't fail because of the way TCP works: the kernel
completes the handshake and queues the connection in the listen backlog. So the
loop in ap_mpm_pod_killpg() still runs, but each iteration takes 1-2s, multiplied
by the StartServers setting (64 or more).

I tested the listening vs. accept theory by making an external telnet connection
to the same IP/port that dummy_connection() uses: the telnet connect() succeeded,
but the telnet data was not processed, since the main process is not calling accept().
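
The same observation can be sketched as a small standalone C program. This is
only an illustration, not httpd code: the function name, the loopback address,
the backlog of 16 and the "ping" payload are all assumptions made for the
example. It opens a listening socket that never calls accept() and checks
whether a client connect() and a short send() still succeed, as they did in
the telnet test on Linux.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if connect() and a short send() both
 * succeed against a socket that listens but never calls accept(),
 * 0 if either fails, and -1 if the listener could not be set up. */
int connect_succeeds_without_accept(void)
{
    static const char probe[] = "ping"; /* arbitrary payload for the sketch */
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd, cfd, ok = 0;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 16) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }
    assert(len == sizeof(addr)); /* an IPv4 address came back */

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0) {
        close(lfd);
        return -1;
    }

    /* Nobody ever calls accept() on lfd: the kernel alone completes the
     * handshake and buffers the bytes we send. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        send(cfd, probe, sizeof(probe) - 1, 0) == (ssize_t)(sizeof(probe) - 1)) {
        ok = 1;
    }

    close(cfd);
    close(lfd);
    return ok;
}
```

From the client's point of view nothing looks wrong, which is why the loop in
the parent has no error to break on.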

6) Possible solution: detect that the children have already exited and stop
making pointless, time-consuming dummy_connection() calls.

void ap_mpm_pod_killpg(ap_pod_t *pod, int num)
{
    int i;
    int max_daemons;
    apr_status_t rv = APR_SUCCESS;

    /* we don't write anything to the pod here...  we assume
     * that the would-be reader of the pod has another way to
     * see that it is time to die once we wake it up
     * writing lots of things to the pod at once is very
     * problematic... we can fill the kernel pipe buffer and
     * be blocked until somebody consumes some bytes or
     * we hit a timeout...  if we hit a timeout we can't just
     * keep trying because maybe we'll never successfully
     * write again...  but then maybe we'll leave would-be
     * readers stranded (a number of them could be tied up for
     * a while serving time-consuming requests)
     */
    for (i = 0; i < num && rv == APR_SUCCESS; ++i) {
        process_score *ps = ap_get_scoreboard_process(i);
        pid_t pid = ps->pid;

        if (pid == 0) {
            continue; /* not every scoreboard entry is in use */
        }

        rv = dummy_connection(pod);
    }
}

Something like this: basically, call dummy_connection() only as long as we have
living children. Unfortunately I'm not sure whether this approach is always correct
(I'm not sure if the scoreboard is updated correctly when children exit in such
a case).
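
To get a feel for how much the pid check could save, here is a minimal
standalone model of the loop. It is only a sketch: struct slot and
count_wakeups() are hypothetical stand-ins for httpd's scoreboard entries and
for the loop in ap_mpm_pod_killpg(), and it assumes that an exited child's
slot really does show pid 0, which is exactly the open question above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a scoreboard entry; only the pid matters here. */
struct slot {
    pid_t pid; /* 0 means the slot is unused or the child has exited */
};

/* Count how many wake-up connections the parent would attempt.
 * skip_exited == 0 models the existing loop (one attempt per slot);
 * skip_exited != 0 models the proposed check (skip slots with pid == 0). */
int count_wakeups(const struct slot *slots, int num, int skip_exited)
{
    int i;
    int attempts = 0;

    assert(slots != NULL && num >= 0);

    for (i = 0; i < num; ++i) {
        if (skip_exited && slots[i].pid == 0) {
            continue; /* no child behind this slot, nothing to wake up */
        }
        ++attempts;
    }
    return attempts;
}
```

With 64 slots of which only 3 still hold a live child, the unchanged loop
makes 64 attempts and the checked loop makes 3. At the 1-2s per attempt seen
here, that is roughly 64-128s versus 3-6s.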

7) How to reproduce?

Set StartServers to a big number (64 or more), so that ap_mpm_pod_killpg() gets
this number passed in its "num" parameter and thus has more loop iterations
to do.

The main process needs to be slowed down while doing ap_mpm_pod_killpg(), so that
the children have a chance to exit as in 2) before ap_mpm_pod_killpg() finishes. In my
production setup this simply happens. Since it's timing-dependent, it is not
always easy to reproduce.

Anyway, here is how to make it easier to reproduce:
- add sleep(5); at the beginning of ap_mpm_pod_killpg(). This simulates the case
where the main process does its work more slowly than the children exit
- or slow down the main process in some other way (in my case, stracing the main
process was also enough to slow it down)
- ab -n 100000 -c 100 http://apache.ip
- before ab finishes, initiate a graceful restart

Relevant thread on mailing list
(unfortunately not a popular subject among devs, so I'm creating this bug report
so that it won't get lost)
