subversion-dev mailing list archives

From Stefan <>
Subject Re: ra-test.exe deadlock condition
Date Sun, 05 Feb 2017 19:57:35 GMT
On 2/4/2017 12:41, Stefan Fuhrmann wrote:
> On 31.01.2017 10:09, Stefan wrote:
>> Hi,
>> I've been looking at the cause of a deadlock when running ra-test.exe
>> with --fs-type=fsx (trunk version).
>> The most important findings are summed up here atm [1].
>> The issue was discussed with brane and danielsh on IRC (thanks for your
>> time, once again).
>> As far as I currently understand the problem, the deadlock is caused
>> by the apr_terminate() function, registered via atexit() in
>> svn_cmdline_init(), being called after the threads created by the
>> apr_thread_pool_push() calls in svn_fs_x__batch_fsync_run() have
>> already been terminated.
>> As a result, APR's thread counter (thd_cnt) gets out of sync (since
>> APR's thread_pool_func() never gets to finish), and apr_terminate()
>> then gets stuck in thread_pool_cleanup(), waiting for threads that
>> have already been terminated.
>> To me it looks like svnserve's main() already contains a safeguard
>> against a corresponding issue and calls
>> apr_thread_pool_destroy(threads) (or was this a completely different
>> scenario?). However, this does not cover the threads created by
>> svn_fs_x__batch_fsync_run().
>> Talking to danielsh and brane, it became apparent to me that the issue
>> might not be all that obvious (in the end, it might still be caused by
>> how I build SVN, resulting in the atexit-registered apr_terminate()
>> being called too late). It's also not fully clear to me at which exact
>> point (relative to the registered atexit() calls) the threads of a
>> process are terminated when the process itself terminates. If
>> atexit()-registered functions are indeed called after the threads have
>> been forcibly terminated (which is what it looks like to me at the
>> moment), this might contradict the C (C89/C99) standard, see [2]. On
>> the other hand, this thread on Stack Overflow [3] suggests that the
>> standard simply leaves undefined which comes first.
>> As danielsh suggested, I'm planning to come up with a plain, minimal
>> repro app based only on APR that demonstrates the problem, to make it
>> more obvious (and to double-check for myself) what the issue is about.
>> Regards,
>> Stefan
>> [1]
>> [2]
>> [3]
> Hi Stefan,
> I had a look at the code and found a possibly related problem.
> If you are using DLLs, this might have affected you.
> It would be nice if you could try r1781657 and see whether it
> makes any difference in your case.
> -- Stefan^2.

Hi Stefan^2,

I tested trunk r1781790, which also includes your follow-up commit
(r1781726). With that revision, the ra-test.exe test that previously
deadlocked now passes. However, test 60 (…) now deadlocks
(svnmucc.exe seems to be the process being tested here).

I'm planning to write up the details of the underlying issue, which I
think has now been traced down to its actual root cause, in a blog post,
most likely tomorrow. That should explain the issue in full detail.

