apr-dev mailing list archives

From Mladen Turk <mt...@apache.org>
Subject Re: Changing the order of cleanup for some core objects
Date Tue, 22 Jul 2008 06:42:34 GMT
Bojan Smojver wrote:
> On Mon, 2008-07-21 at 22:05 +1000, Bojan Smojver wrote:
>> For example...
> After thinking about it a bit more, it appears that we really should not
> need to fiddle with locking here. Essentially, when we call this, we can
> assume that root pool and below will not be modified by another thread
> (because we can always control what root pool is), so we can just go
> ahead and search in peace.
> Comments?

In the original pool_destroy, the pool->sibling access is guarded
by a mutex. Since in a multithreaded environment a child pool might be in
the middle of the destroy process, detaching itself, I think you'll have
to guard the access to pool->sibling in destroy_safe as well.
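
To illustrate the locking point, here is a minimal sketch in C of a
child list whose sibling links are only read or changed while holding
the parent's mutex. The pool_t type and the pool_init, pool_detach and
pool_is_child helpers are hypothetical stand-ins written for this
example; they are not APR's real apr_pool_t or its internals.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified pool node; NOT APR's real apr_pool_t. */
typedef struct pool {
    struct pool  *parent;
    struct pool  *child;    /* first child */
    struct pool  *sibling;  /* next sibling in the parent's child list */
    struct pool **ref;      /* address of the pointer that points at us */
    pthread_mutex_t mutex;  /* guards this pool's child list */
} pool_t;

/* Create a pool and link it at the head of the parent's child list. */
static void pool_init(pool_t *p, pool_t *parent)
{
    p->parent = parent;
    p->child = NULL;
    p->sibling = NULL;
    p->ref = NULL;
    pthread_mutex_init(&p->mutex, NULL);
    if (parent != NULL) {
        pthread_mutex_lock(&parent->mutex);
        p->sibling = parent->child;
        if (p->sibling != NULL)
            p->sibling->ref = &p->sibling;
        parent->child = p;
        p->ref = &parent->child;
        pthread_mutex_unlock(&parent->mutex);
    }
}

/* Unlink a pool from its parent; sibling links change only under the lock. */
static void pool_detach(pool_t *p)
{
    pool_t *parent = p->parent;
    if (parent == NULL)
        return;
    pthread_mutex_lock(&parent->mutex);
    *p->ref = p->sibling;
    if (p->sibling != NULL)
        p->sibling->ref = p->ref;
    pthread_mutex_unlock(&parent->mutex);
    p->parent = NULL;
    p->sibling = NULL;
    p->ref = NULL;
}

/* Search the child list under the same lock, so a concurrent
 * pool_detach() cannot change ->sibling in the middle of the walk. */
static int pool_is_child(pool_t *parent, const pool_t *target)
{
    int found = 0;
    pthread_mutex_lock(&parent->mutex);
    for (pool_t *c = parent->child; c != NULL; c = c->sibling) {
        if (c == target) {
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&parent->mutex);
    return found;
}
```

Note that this only protects the list links. It does nothing about the
parent itself being destroyed while another thread still holds a
pointer to it.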

Also there is a problem if the root pool gets destroyed in which case
you'll be accessing zombie memory, so I don't think this will help.

As an example I'll give you the Tomcat Native APR connector.
It gets loaded into the JVM as a module (.jar + native libraries),
so it is the module that actually uses APR, not the application.
The presumption that one controls the application
and apr_initialize/apr_terminate doesn't hold any more, because the module
can be loaded/unloaded many times during the application lifetime.

Since apr_terminate (just an edge case example) can happen at
the user's choice inside a different thread, while there are multiple
threads in the middle of a blocking APR call (accepting socket
connections, for example), after the native call breaks one cannot be
sure that both the local pool and the global (root) pool will still be
valid.

The problem, as I see it, requires some event mechanism, because the
callback is actually a 'message post' to another thread, so the
callback's effect actually takes place at some future time,
and due to system load and thread context switching this can lead to
the nasty sporadic core dumps that we observe nowadays.

The reason is that in one thread APR first destroys the child pools in
one quick loop and then goes immediately to another loop that calls the
callbacks. If the system is very busy this can cause nasty sync issues,
because a function like accept can break (caused by the pool destroy)
after you call the registered callback, and since those are executed in
the context of another thread you really have no idea what's going on :)

Perhaps we'll need some sort of event mechanism for callbacks that would
wait before going on to the next callback in the loop, or something
like that. Guarding that externally would make things basically single
threaded, and that would be a performance killer.
I've got close to solving the issues by having an atomic counter
for each long native function call, causing apr_pool_destroy to
wait for all native calls to exit, but that's a nightmare to maintain
and to write the user code for.
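
The atomic counter idea can be sketched roughly as follows, using C11
atomics. This is only an illustration under stated assumptions, not the
code used in Tomcat Native: call_guard_t, guard_enter, guard_leave and
guard_drain are hypothetical names, and the extra 'closing' flag is an
assumption added here so that new calls are refused once teardown has
started.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical guard; must live outside the pool it protects. */
typedef struct {
    atomic_int in_flight;  /* native calls currently using the pool */
    atomic_int closing;    /* set once teardown has begun (assumption) */
} call_guard_t;

/* Call before entering a long native call. Returns 0 if teardown has
 * already started, in which case the caller must not touch the pool. */
static int guard_enter(call_guard_t *g)
{
    atomic_fetch_add(&g->in_flight, 1);
    if (atomic_load(&g->closing)) {
        atomic_fetch_sub(&g->in_flight, 1);
        return 0;
    }
    return 1;
}

/* Call after the native call returns, on every exit path. */
static void guard_leave(call_guard_t *g)
{
    atomic_fetch_sub(&g->in_flight, 1);
}

/* Call before destroying the pool: refuse new calls, then wait for
 * the calls already in flight to leave. */
static void guard_drain(call_guard_t *g)
{
    atomic_store(&g->closing, 1);
    while (atomic_load(&g->in_flight) > 0)
        sched_yield();
}
```

A native wrapper would bracket each blocking call with guard_enter()
and guard_leave(), and the teardown path would call guard_drain()
before destroying the pool. The wait does not by itself wake a thread
blocked in accept, so the blocking call still has to be interrupted by
some other means, and every wrapper has to remember the bracket on each
return path, which is where the maintenance burden comes from.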

