Aaron Bannert wrote:
> Ok, after wading through the code for a while I have a working theory:
>
> 1) Parent creates a child
> 2) Parent gets graceful-restart signal
> 3) Parent returns from ap_run_mpm, pconf is cleared, cross-process lock file
> is closed and removed.
> 4) Child finally gets scheduled to run the apr_proc_mutex_child_init for
> fcntl(). Oops, apr_file_open fails since step #3 above removed the file.
> Child errors out (ENOENT is returned from apr_file_open()) and dies.
> 5) Parent notices that child has died, errors out and dies completely.
sounds very possible
hopefully it is sane for the parent not to exit when a prior-generation child
reports APEXIT_CHILDFATAL; but it looks like prefork checks for
APEXIT_CHILDFATAL before checking whether the child is a current-generation
child
> In any case, can anyone else confirm that this race condition exists, and
> maybe suggest a way to synchronize a parent's shutdown with the starting
> up of an old-generation child? (E.g. the parent shouldn't remove the
> lockfile until all children are successfully started.)
it shouldn't be bad to remove the lockfile at the point where that is done
now, and certainly that new child of the old generation should exit ASAP
anyway since it has the old config; I suspect that if the parent ignores
"fatal" exits of such children we'd be okay
no guesses from me on whether this race condition is what causes the problem