apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <...@rkbloom.net>
Subject Re: [PATCH] Re: apr_proc_mutex is broken
Date Wed, 20 Nov 2002 12:33:24 GMT

On Tue, 19 Nov 2002, Aaron Bannert wrote:

> On Tuesday, November 19, 2002, at 09:49  PM, <rbb@rkbloom.net> wrote:
> > By allocating the mutex in the pool they have already stated that the
> > pool
> > should control the mutex's scope.  That is the meaning of allocating a
> > varibale inside a pool.  This is why pools are passed to ALL functions.
> >
> > BTW, you can't leave this up to the app.  The abstraction layer makes
> > it
> > too complex to have the app take care of this.  I thought at first that
> > the app should just kill the cleanup, but that is too hard.  If you
> > make
> > the app register the cleanup, you will have code like:
> >
> > if (mutex->type == (fcntl || flock)) {
> >     register_cleanup
> > else
> >     no-op
> Why is that? Perhaps that means that apr_proc_mutex_t is the wrong
> abstraction for these types.

It is because like all abstractions, this one isn't perfect.  However, it
is the best we have been able to find.  That is OK, but you have to
realize it.  Read this for more explanations about why abstractions leak:

The underlying technical reason is that some mutexes need to be cleaned up
in the child while others do not.  That is fine, but hide that from the
user.  The way to do that is with the logic changes that I have described
in detail already.

> > If you read ALL of those functions, they are all:
> >
> >     if ((rv = proc_mutex_(.*)_cleanup(mutex)) == APR_SUCCESS) {
> >         apr_pool_cleanup_kill(mutex->pool, mutex, (.*)_sysv_cleanup);
> >         return APR_SUCCESS;
> >     }
> >     return rv;
> >
> > which means that the cleanup as it is registered today is equivalent to
> > apr_proc_mutex_destroy.  The functions that they call are equivalent.
> > I
> > am making the change now to use a public function for the cleanup,
> > because
> > anything else is completely bogus.
> So there will have to be some internal changes (no critical interface
> changes)
> to make this work. Fine, but what will this break anything else?

Yes, and the internal changes are simple.  I have them mostly done, but I
got tired.  I'll try to finish them today sometime.  No promises, it may
take a few days.

> Let's talk about some use cases:
> 1) parent owns semaphore
>     children created and destroyed, after which parent destroys semaphore

This is the most common case, and thus the case that we default to.

> 2) parent creates semaphore and delegates ownership to a child

This is done by having the parent kill the cleanup (which is why the
cleanup must be done with a public function).  This is essentially
identical to what happens when you have _any_ resource that the parent
creates and then delegates to the child.  The parent must kill the
cleanup, and the child must register one.

> 3) one process creates semaphore, another non-related process attaches
> to
>     the semaphore, a third takes responsibility for destroying it
> ...

Same as case 2, except the child to register the cleanup is a different

> To me it seems the only way to support all of these use cases would be
> to allow the app to explicitly create and destroy their own semaphores
> whenever they want without any magic cleanup handlers happen
> unexpectedly.

No.  This is just how APR works, and it is how it has always worked.  By
default, we install cleanup handlers, because that ensures that we don't
leak resources.  If APR didn't register the cleanup, then EVERY APR call
in Apache would look like:


And, APR would be annoying to use, because the programmer would have to
really think about resources.  APR gives the most common case for free,
the one where the allocator owns the resource.  If you want to change
that, we have a rich pool API that allows you to do so.  However, in order
for that to work, APR must use the pool API properly, and it isn't in the
mutex code.

> > If you read my messages from earlier today, I have already answered
> > all of
> > the questions that you posted in each of the 5 messages you just sent.
> > But one more time:
> >
> > The parent process allocates the mutex out of whatever pool should
> > control
> > the scope.
> This is not in dispute. In testprocmutex, the mutex was created in the
> global
> pool, and since the parent global pool lives longer than the children
> (We call waitpid to synchronize the children) it was expected that the
> mutex would live until the end. This was a false assumption.

Aaron, you specifically stated in the message I was responding to that the
pool shouldn't control the scope.  That is the only reason I went into all
of this again.  Please, don't say that it isn't in dispute when you are
the one disputing it.

> > The child processesmust call apr_proc_mutex_child_init.  That function
> > defines the scope of the mutex for the child process.  If the mutex
> > type
> > requires a cleanup in the child process, then one is registered.
> Again, not in dispute. This still needs to be added, btw.

I know.

> > As for the bug that was hit.  Programs _MUST_ have an equal number of
> > apr_initialize and apr_terminate calls.  If it doesn't, then bugs like
> > this will happen.
> I'm glad we worked through and found this to be the problem, since it
> has shown me how broken this part of APR is. Could we solve this by
> having apr_proc_create() increment the global counter? Are there
> resources that will not be cleaned up in children if we always
> call ap_initialize() in child processes (eg. when a process dies
> on Windows, will it close all its files and sockets?)?

I am really sick of hearing you say "how broken ______ is."  Whenever you
find a part of the code that you didn't help to design/write, you complain
that it is horribly broken.  It isn't.  The apr_initialize/apr_terminate
code was added for a reason.  They need to balance each other out.  We
could have the apr_proc_(fork|create) calls call apr_initialize, but that
wouldn't solve this problem, because there is still fork(), which we have
no control over.  It would be much better to just document this problem

Unless you can find a way to do this cleanly, hiding it behind the
apr_proc_foo calls is just going to make matters worse.


View raw message