Mailing-List: contact dev-help@apr.apache.org; run by ezmlm
Precedence: bulk
Sender: trawick@bellsouth.net
To: <dev@apr.apache.org>
Subject: Re: Other Child processing
References: <024801c0dd79$5eb64a00$bd431b09@sashimi>
From: Jeff Trawick <trawickj@bellsouth.net>
Date: 16 May 2001 07:34:06 -0400
In-Reply-To: <024801c0dd79$5eb64a00$bd431b09@sashimi>
Message-ID: <m3n18dr0i9.fsf@adsl-77-241-65.rdu.bellsouth.net>
Lines: 92
User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.3
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

"Bill Stoddard" <bill@wstoddard.com> writes:

> apr_proc_other_child_register()
> apr_proc_other_child_unregister()
> Register an OC and unregister an OC. These are the only OC functions that
> make sense... continuing...
> 
> apr_proc_probe_writable_fds()
> This function seems worthless and doesn't appear to be used for anything.
> You cannot make any meaningful diagnosis on an OC if the fds are not
> writable. Perhaps the pipe is full, perhaps the process is dead, ???.  No
> way of knowing how to handle this case.

agreed...

> apr_proc_other_child_read(apr_proc_t *pid, int status)
> Simply results in the call to *ocr->maintenance(APR_OC_REASON_DEATH, ...)
> which is totally non-intuitive behaviour given the name of the function.
> Presumably apr_proc_other_child_read is called after you discover an OC has
> failed and that calling maintenance with APR_OC_REASON_DEATH is the right
> thing to do.  Presumably.

I don't know if the problem is with the semantics or just the function
name.

> apr_proc_other_child_check()
> Runs the list of OCs and checks to see if they are dead or alive and calls
> *ocr->maintenance based on whether the OC is dead or alive. threaded.c calls
> this routine multiple times during -shutdown-, after the process group has
> been signaled to die. What are we checking and why?  This is just
> goofy...

I understand the logic for when they are dead but not the logic for
when they are alive.  I think we agree on this.

> Straw man proposal...
> 1. apr_proc_other_child_*register()
> Leave the register and unregister functions the same
> 
> 2. apr_proc_other_child_check()
> It makes sense to me to use this routine when you want OCs to stay up and
> alive. You would call it during idle_server_maintenance and it would detect
> when an OC has died and call maintenance to restart it. The Unix
> implementation of Apache HTTP would probably not use this routine as the MPM
> parent processes use other mechanisms to detect child death.  This would be
> good for Windows to detect child death.

Hmmm... If the Unix MPMs call this in idle_server_maintenance() then
they can ignore the death of processes they don't really know about.
If they don't ignore the death of such processes they'll need some
other API to see if a newly-deceased process was a registered
other-child.  It seems simpler just to call
apr_proc_other_child_check().

Oh, I see that this "other API" is apr_proc_other_child_maintenance(),
described below.

> 3. apr_proc_probe_writable_fds()
> Remove it and all references to it.

yep

> 4.  apr_proc_other_child_shutdown()
> Signals each OC to shut down. When the OC has died, calls maintenance
> reporting OC DEATH (rather than LOST, which would imply a restart)

So how long do we sit in here?  As long as necessary?  On Unix we
should do SIGTERM followed by SIGKILL five or so seconds later if
the process hasn't gone away.  This would solve the apparent Solaris
2.6 SNAFU affecting Apache 1.3+rotatelogs which Greg Ames mentioned on
new-httpd yesterday.
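
To make the SIGTERM-then-SIGKILL idea concrete, here is a rough sketch
using plain POSIX calls rather than APR.  terminate_child() and the
100 ms poll interval are made up for illustration; nothing like this
exists in APR today, and a real version would also have to cope with
EINTR.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not part of APR: send SIGTERM, poll for up to
 * grace_secs seconds, then fall back to SIGKILL.
 * Returns 0 if the child was reaped during the grace period, 1 if
 * SIGKILL was sent, -1 on error.  *status gets the waitpid() status. */
static int terminate_child(pid_t pid, int grace_secs, int *status)
{
    struct timespec poll_interval = { 0, 100000000L };  /* 100 ms */
    pid_t rv;
    int i;

    assert(pid > 0);

    if (kill(pid, SIGTERM) == -1)
        return -1;

    /* Ten polls per second for the length of the grace period. */
    for (i = 0; i < grace_secs * 10; i++) {
        rv = waitpid(pid, status, WNOHANG);
        if (rv == pid)
            return 0;           /* child is gone */
        if (rv == -1)
            return -1;          /* not our child, or other error */
        nanosleep(&poll_interval, NULL);
    }

    /* Still alive after the grace period: stop asking nicely. */
    if (kill(pid, SIGKILL) == -1)
        return -1;
    if (waitpid(pid, status, 0) != pid)
        return -1;
    return 1;
}
```

With something like this, apr_proc_other_child_shutdown() would have a
bounded wait per other-child (roughly grace_secs) instead of sitting
there as long as necessary.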

> 5. apr_proc_other_child_maintenance(apr_proc_t *pid, action)
> Perform specific OC maintenance.  If threaded.c detects that an OC has gone
> down, it would call...
> apr_proc_other_child_maintenance(apr_proc_t *pid, APR_OC_REASON_LOST) to
> cause the appropriate maintenance routine to be called.

I guess you mean "If a Unix MPM detects that some child process has
died and it isn't a server process, then it would call
apr_proc_other_child_maintenance() which will first see if it is a
registered other-child and if so cause the appropriate maintenance
routine to be called."
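
In other words, something along these lines.  This is only a sketch of
the lookup-then-dispatch behaviour; the struct, the enum, and the
oc_*() names are hypothetical stand-ins and not the actual APR
declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the APR types and reason codes. */
enum oc_reason { OC_REASON_DEATH, OC_REASON_LOST };

typedef void (*oc_maintenance_fn)(enum oc_reason reason, void *data,
                                  int status);

struct other_child {
    long pid;                       /* process id of the other-child */
    oc_maintenance_fn maintenance;  /* callback supplied at registration */
    void *data;                     /* opaque data passed to the callback */
    struct other_child *next;
};

static struct other_child *oc_list = NULL;

/* Add a registration to the head of the list. */
static void oc_register(struct other_child *oc)
{
    oc->next = oc_list;
    oc_list = oc;
}

/* Called by the MPM when some child that is not a server process dies.
 * Returns 1 if pid was a registered other-child (its maintenance
 * routine has then been called), 0 if the pid is unknown to us. */
static int oc_maintenance(long pid, enum oc_reason reason, int status)
{
    struct other_child *oc;

    for (oc = oc_list; oc != NULL; oc = oc->next) {
        if (oc->pid == pid) {
            assert(oc->maintenance != NULL);
            oc->maintenance(reason, oc->data, status);
            return 1;
        }
    }
    return 0;
}

/* Example callback: just counts how often it was invoked. */
static void count_calls(enum oc_reason reason, void *data, int status)
{
    (void)reason;
    (void)status;
    ++*(int *)data;
}
```

The return value lets the caller tell "that was a registered
other-child" apart from "never heard of that pid", so the MPM can log
or ignore the latter as it sees fit.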

Another missing piece is auto-cleanup of other child registrations
when the pool associated with the registration goes away.  I posted a
patch for this a couple of weeks ago.

-- 
Jeff Trawick | trawickj@bellsouth.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...