apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Branko Čibej <br...@apache.org>
Subject Re: [PATCH] deadlocking condition with APR_HAS_THREADS
Date Thu, 09 Feb 2017 07:17:00 GMT
On 09.02.2017 03:19, William A Rowe Jr wrote:
> If you don't re-join your threads to the parent thread, I can't say I
> believe that this represents a bug. But that's just my 2c from the original
> architecture.

The problem is that on Windows, threads have been killed before
apr_thread_join() is called. That causes apr_thread_join() to return
APR_INCOMPLETE, and the thread pool cleanup code does not account for that.

The long story is that we recommend the following code to initialize and
clean up APR:

    atexit(apr_terminate);
    apr_initialize();

The thread pool constructor registers a pre-cleanup handler so
theoretically, when apr_terminate() is called, the worker threads will
be join()'ed. But on Windows, threads are killed before atexit handlers
are called, causing the thread pool destructor to wait forever for
threads that are in fact not live any more.


I'd say the bug is in the thread pool destructor since it does not
behave correctly on one of the platforms it's supposed to work on.

-- Brane

> On Feb 8, 2017 4:30 PM, "Stefan" <luke1410@posteo.de> wrote:
>
>> On 2/2/2017 14:45, Branko Čibej wrote:
>>> On 02.02.2017 13:30, Stefan Hett wrote:
>>>> On 2/2/2017 1:24 PM, Branko Čibej wrote:
>>>>> On 02.02.2017 12:49, Stefan Hett wrote:
>>>>>> On 2/2/2017 12:04 PM, Nick Kew wrote:
>>>>>>> On Wed, 2017-02-01 at 00:23 +0100, Stefan wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> the issue was discovered as part of tracing down a deadlock
>>>>>>>> condition in
>>>>>>>> an SVN test [1].
>>>>>>>>
>>>>>>>> As far as I understand it, the problem lies in the C-standard
not
>>>>>>>> explicitly defining when a function registered with atexit()
is
>>>>>>>> called
>>>>>>>> in the context of thread termination [2].
>>>>>>> I had no idea atexit() was called on thread termination.  I guess
the
>>>>>>> manpage must be misleading in telling us it happens at process
exit?
>>>>>> I expressed myself poorly a bit here. I didn't want to suggest
>>>>>> atexit()-registered functions are called on thread termination but
>>>>>> rather that threads can be terminated before the atexit-registered
>>>>>> functions are called.
>>>>>> So if apr_terminate is registered in a function called from a DLL
via
>>>>>> atexit(), it can happen (and in my scenario does happen) that threads
>>>>>> got terminated before apr_terminate() is called.
>>>>> But that should not matter. apr_thread_join() does not terminate a
>>>>> thread, it waits for the thread to terminate. The thread handle should
>>>>> remain valid until apr_thread_join() is called, even if the thread
>>>>> terminated before the join().
>>>>>
>>>> This is correct. The problem is that thd_cnt is not decremented
>>>> (thd->td is valid, apr_thread_join() waits until the thread is (which
>>>> it is already) and then returns APR_INCOMPLETE - nothing decrements
>>>> the thd_cnt value for the already terminated thread).
>>> It turns out that this is a bug in APR's thread pool code, and has been
>>> fixed on trunk. APR_INCOMPLETE is the _expected_ return code from
>>> apr_thread_join() on Windows in this case, as far as I can interpret the
>>> code.
>>>
>> As it turned out, the issue is actually still present on trunk. The full
>> details behind the problem are summed up in a blog post now:
>> http://www.luke1410.de/blog/?p=95
>>
>> What do you (or the others) think. Does this warrant the patch being
>> applied? I'd really not feel too good with leaving that issue unresolved
>> in APR, given there's a fix available; but I'd certainly not commit it,
>> if it's not getting support by the established community/committers.
>>
>> If someone things the patch needs to be improved (or a different
>> approach to the problem should be taken), I'm certainly offering to come
>> up with alternatives.
>>
>> Regards,
>> Stefan
>>


Mime
View raw message