mesos-issues mailing list archives

From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-8729) Libprocess: deadlock in process::finalize
Date Tue, 27 Mar 2018 01:42:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414876#comment-16414876 ]

Benjamin Mahler commented on MESOS-8729:
----------------------------------------

Looking at the last stack:
 
...
#8 0x00007f09d2ac1aac in synchronize<std::recursive_mutex> () at ../../3rdparty/stout/include/stout/synchronized.hpp:58
#9 0x00007f09d492c37b in process::ProcessManager::use () at ../../../3rdparty/libprocess/src/process.cpp:2520
#10 0x00007f09d492e955 in process::ProcessManager::deliver () at ../../../3rdparty/libprocess/src/process.cpp:2775
// Trying to get a reference but blocked on the lock.
...
#66 0x00007f09d492e988 in process::ProcessManager::deliver () at [../../../3rdparty/libprocess/src/process.cpp:2776|https://github.com/apache/mesos/blob/2e2e38628c1b580a231ddac5270f9848ea4af7af/3rdparty/libprocess/src/process.cpp?utf8=%E2%9C%93#L2776]
// XXX Holds a reference!
...
 
This thread is in a deliver (frame #66, which holds a reference) and synchronously calls back
into deliver (frame #10), which then blocks on the lock in use() while that reference is still
held. The first thread is therefore stuck spinning while holding the lock, and this thread's
reference will never be released.
 
I understand the issue now but haven't thought through a fix.

> Libprocess: deadlock in process::finalize
> -----------------------------------------
>
>                 Key: MESOS-8729
>                 URL: https://issues.apache.org/jira/browse/MESOS-8729
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 1.6.0
>         Environment: The issue has been reproduced on Ubuntu 16.04, master branch, commit `42848653b2`.
>            Reporter: Andrei Budnik
>            Priority: Major
>              Labels: deadlock, libprocess
>         Attachments: deadlock.txt
>
>
> Since we are calling [`process::finalize()`|https://github.com/apache/mesos/blob/02ebf9986ab5ce883a71df72e9e3392a3e37e40e/src/slave/containerizer/mesos/io/switchboard_main.cpp#L157]
> before returning from the IOSwitchboard's main function, we expect all HTTP responses to be
> sent back to clients before the IOSwitchboard terminates. However, after [adding|https://reviews.apache.org/r/66147/]
> `process::finalize()`, we have seen that the IOSwitchboard can get stuck in `process::finalize()`.
> See the attached stack trace.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
