mesos-issues mailing list archives

From "Yan Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-7921) process::EventQueue sometimes crashes
Date Fri, 01 Sep 2017 20:47:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151119#comment-16151119 ]

Yan Xu commented on MESOS-7921:
-------------------------------

So the libprocess GC deletes each managed process when it exits: https://github.com/apache/mesos/blob/1ae308c2f1344d9e62e094ab11cc195c96eb5c04/3rdparty/libprocess/include/process/gc.hpp#L45

{code:title=}
  virtual void exited(const UPID& pid)
  {
    if (processes.count(pid) > 0) {
      const ProcessBase* process = processes[pid];
      processes.erase(pid);
      delete process;
    }
  }
{code}

What happens when another process that is waiting on it donates its thread to this process, and
this process terminates after it has been extracted from the run queue? Could it be destructed
before it is resumed?
 https://github.com/apache/mesos/blob/1ae308c2f1344d9e62e094ab11cc195c96eb5c04/3rdparty/libprocess/src/process.cpp#L3581-L3587

{code:title=}
  if (process != nullptr) {
    VLOG(2) << "Donating thread to " << process->pid << " while waiting";
    ProcessBase* donator = __process__;
    resume(process);
    running.fetch_sub(1);
    __process__ = donator;
  }
{code}
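
To make the ordering in question concrete, here is a minimal, single-threaded sketch of it: (1) the process is extracted from the run queue, (2) it exits and the GC drops it, (3) the donated thread resumes it. This is not libprocess code. {{FakeProcess}}, {{FakeGC}}, {{Result}} and {{donate_after_exit}} are hypothetical names, and the use of {{std::shared_ptr}} is purely an assumption for illustration: it shows that the object survives step 2 only if the dequeuing side holds its own reference. With a raw {{ProcessBase*}} and a {{delete}} in {{exited()}}, step 3 would touch freed memory. It is not a claim about how this should be fixed in libprocess.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a libprocess process; not the real ProcessBase.
struct FakeProcess {
  explicit FakeProcess(std::string id) : pid(std::move(id)) {}
  std::string pid;
  int resumed = 0;
};

// Hypothetical GC that shares ownership instead of holding raw pointers.
struct FakeGC {
  std::map<std::string, std::shared_ptr<FakeProcess>> processes;

  void manage(const std::shared_ptr<FakeProcess>& p) { processes[p->pid] = p; }

  // Same shape as exited() above: drop the GC's ownership of the process.
  void exited(const std::string& pid) { processes.erase(pid); }
};

struct Result {
  bool alive_at_resume;      // Was the object still valid at step 3?
  bool freed_after_release;  // Was it destroyed once the last owner let go?
};

Result donate_after_exit() {
  FakeGC gc;
  std::deque<std::shared_ptr<FakeProcess>> runq;

  auto p = std::make_shared<FakeProcess>("worker(1)");
  std::weak_ptr<FakeProcess> observer = p;  // Observes lifetime, owns nothing.
  gc.manage(p);
  runq.push_back(p);
  p.reset();  // From here on only the GC and the run queue own the process.

  // Step 1: the waiting thread extracts the process from the run queue.
  std::shared_ptr<FakeProcess> dequeued = runq.front();
  runq.pop_front();
  assert(dequeued != nullptr);

  // Step 2: the process terminates and the GC handles exited().
  gc.exited("worker(1)");

  // Step 3: the donated thread resumes the process. The reference taken in
  // step 1 is the only thing keeping the object alive at this point.
  Result result{};
  result.alive_at_resume = !observer.expired();
  dequeued->resumed++;

  // Once that reference is released, the object is destroyed.
  dequeued.reset();
  result.freed_after_release = observer.expired();
  return result;
}
```

In this sketch both fields of the returned {{Result}} are true: the object is still valid when it is resumed and is destroyed only after the dequeuing side releases it.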

> process::EventQueue sometimes crashes
> -------------------------------------
>
>                 Key: MESOS-7921
>                 URL: https://issues.apache.org/jira/browse/MESOS-7921
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 1.4.0
>         Environment: autotools,gcc,--verbose,GLOG_v=1 MESOS_VERBOSE=1,ubuntu:14.04,(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)
> Note that --enable-lock-free-event-queue is not enabled.
> Details: https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/injectedEnvVars/
>            Reporter: Yan Xu
>            Priority: Blocker
>         Attachments: FetcherCacheTest.CachedCustomOutputFileWithSubdirectory.log.txt, MesosContainerizerSlaveRecoveryTest.ResourceStatisticsFullLog.txt
>
>
> The following segfault is found on [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/]
> in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky and shows up
> in other tests and environments (with or without --enable-lock-free-event-queue) as well.
> {noformat: title=Configuration}
> ./bootstrap '&&' ./configure --verbose '&&' make -j6 distcheck
> {noformat}
> {noformat:title=}
> *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are using GNU date ***
> PC: @     0x2b9e2581caa0 process::EventQueue::Consumer::empty()
> *** SIGSEGV (@0x8) received by PID 751 (TID 0x2b9e31978700) from PID 8; stack trace: ***
>     @     0x2b9e29d26330 (unknown)
>     @     0x2b9e2581caa0 process::EventQueue::Consumer::empty()
>     @     0x2b9e25800a40 process::ProcessManager::resume()
>     @     0x2b9e2580f891 process::ProcessManager::init_threads()::$_9::operator()()
>     @     0x2b9e2580f7d5 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_9vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
>     @     0x2b9e2580f7a5 std::_Bind_simple<>::operator()()
>     @     0x2b9e2580f77c std::thread::_Impl<>::_M_run()
>     @     0x2b9e29fe5a60 (unknown)
>     @     0x2b9e29d1e184 start_thread
>     @     0x2b9e2a851ffd (unknown)
> make[3]: *** [CMakeFiles/check] Segmentation fault (core dumped)
> {noformat}
> A builds@mesos.apache.org query shows many such instances: https://lists.apache.org/list.html?builds@mesos.apache.org:lte=1M:process%3A%3AEventQueue%3A%3AConsumer%3A%3Aempty



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
