apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <...@rkbloom.net>
Subject Re: [PATCH] Re: apr_proc_mutex is broken
Date Tue, 19 Nov 2002 21:51:15 GMT

On 19 Nov 2002, Philip Martin wrote:

> <rbb@rkbloom.net> writes:
>
> >  As for the app needing to know about every use of
> > fork/apr_proc_fork in every library, that isn't true.  The app needs to
> > know about every use of apr_terminate in every library.  It is the
> > apr_terminate call that is causing the problem, not the actual fork()
> > call.
>
> APR states that apr_terminate must always be called, and suggests
> installing apr_terminate as an atexit handler.  The only APR
> application I've worked on (Subversion) does this.  This means every
> fork-and-exit is a potential problem.
>
> Suppose I writing a library that uses proc_mutex.  I do not know
> whether the application has installed apr_terminate as an atexit
> handler.

If you have a library that is calling apr_terminate in a process that you
didn't create, then you most likely have a completely different bug.  :-)
The apr_terminate function is going to call apr_pool_destroy(global_pool).
However, global_pool is a static variable, which means that you can't just
destroy it without causing massive side-effects.  To combat this, we have
a simple counter in apr_initialize/apr_terminate.  For every call to
apr_initialize, we expect a call to apr_terminate.  If one is called
without the other, things won't work properly.

If you have a library that is calling apr_initialize, then you can safely
call apr_terminate and not worry about deleting a mutex that somebody else
still needs.  However, if you have a library that is calling apr_terminate
without calling apr_initialize, then you have a bug.

> >  However, if the app removes the proc_mutex cleanup before the new
> > process is spawned, it won't have to worry about this, because the child
> > process won't have a cleanup to kill.
>
> So in practice some (lots? most? all?) libraries/applications will
> kill it.  Why bother registering it?  Let the user choose to register
> such a cleanup if required.

No, you are missing the point.  In some instances, the mutex cleanup will
be killed, but in others it won't.  There are two factors to consider when
this decision is made.  What is the underlying mutex mechanism and what
is the mutex being used for.  Regardless, the cleanup should be
registered.  In fact, the problem that you are seeing has nothing to do
with the cleanup being registered, that is a red herring.  The problem is
with the cleanup not being unregisted properly.

Follow my logic please.

In the parent process:
	apr_initialize()
		Sets static counter to 1.

	apr_proc_mutex_create()
		registers the cleanup for you.  At this point, you can
safely drop this mutex on the floor, and things will continue to work
forever.

	apr_terminate()
		destroys the global_pool, mutex is destroyed cleanly.

In the child process:
	apr_initialize()
		increments the counter by 1, now it is 2.

	apr_proc_mutex_child_init()
		Depending on mutex type, the cleanup should be killed or
not.  In general, file-based mutex will be killed, but semaphores will
not.

	apr_terminate()
		counter is decremented, setting it to 1.  _NO_ pool is
destroyed, thus no mutex is deleted.

Now, the test program had two problems.
1)  The child processes were calling apr_terminate, but they weren't
calling apr_initialize.  This left our static counter out of sync, which
is what caused the bug.

2)  We aren't calling apr_proc_mutex_child_init, which is what makes the
mutex available to the child process.  We were relying on inheritance to
get the mutex to the child process.  If we had been calling this function
(and the function was fixed to kill the cleanup if appropriate), then we
would have masked the previous bug.

At the end of the day, the test was bogus, and it wasn't good APR code,
which is what was causing the problem.  There is one change that should be
made to the underlying APR code to help hide this from the user, but the
real problem is in the test.

> There are two possible errors:
>
> - The application can fail to kill the cleanup, in which case a
>   problem may occur if apr_terminate runs.  Whether this is a problem
>   depends on what the child processes do, if they only exit in
>   exceptional circumstances the bug may not be readily apparent.
>
> - Or the application can fail to destroy the mutex, a fairly
>   conventional resource deallocation bug.
>
> I spent some time debugging a Subversion pool cleanup handler that
> erroneously deleted a temporary file when forking a child process, I
> know I would prefer the latter!

But the latter is MUCH worse when it comes to locks.  You only have so
many semaphores available to you.  A resource leak can be a real PITA to
track down, and I suspect that this general bug is biting Apache in a
couple of places because we do leak semaphores.  I currently have 12
semaphores in use by Apache.  When I stop the currently running server, I
have 9 in use (all by the web server).

Resource leaks suck, and they are just as hard to diagnose as doing
something too often.

The real fix for this problem is to fix the test code as I have stated
above so that it is a valid example of how to write APR code, and to fix
the library so that we register cleanups with a public function, and so
that apr_proc_mutex_child_init does the right thing in all cases.  I will
do the work later tonight (assuming my wife doesn't go into labor before
then) unless somebody beats me to it.

Ryan


Mime
View raw message