mesos-issues mailing list archives

From "Anand Mazumdar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-7102) Crash when sending a SIGUSR1 signal to the agent.
Date Wed, 15 Feb 2017 20:03:41 GMT

    [ https://issues.apache.org/jira/browse/MESOS-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868485#comment-15868485 ]

Anand Mazumdar commented on MESOS-7102:
---------------------------------------

Commit to 1.2.x branch
{noformat}
commit 7e5439d55fd89cb9336220d9a1847391384ea8d5
Author: Anand Mazumdar <anand@apache.org>
Date:   Fri Feb 10 15:41:11 2017 -0800

    Fixed a crash on the agent when handling the SIGUSR1 signal.

    There were some actors that were not being destructed when
    `finalize()` was being invoked. Also fixed the order of the
    destruction of objects i.e., in the reverse order of their
    creation.

    Review: https://reviews.apache.org/r/56525/
{noformat}
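
The commit message above describes two things: actors that were left undestructed, and objects being torn down in the wrong order. A minimal sketch of the ordering point follows. It is not the actual patch; `Runtime`, `Actor`, and `shutdownInReverseOrder` are hypothetical names standing in for a runtime and the actors whose destructors still call into it.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a runtime that actors depend on.
// This is an illustration only, not libprocess.
struct Runtime {
  explicit Runtime(std::vector<std::string>* log) : log_(log) {}
  ~Runtime() { log_->push_back("~Runtime"); }

  // Actors call this from their destructors, so the runtime must
  // still be alive at that point.
  void wait(const std::string& name) { log_->push_back("wait " + name); }

  std::vector<std::string>* log_;
};

// Hypothetical actor whose destructor still needs the runtime.
struct Actor {
  Actor(Runtime* runtime, std::string name)
    : runtime_(runtime), name_(std::move(name)) {}
  ~Actor() { runtime_->wait(name_); }

  Runtime* runtime_;
  std::string name_;
};

// Creates a runtime and two actors, then destroys them in the
// reverse order of creation and returns the recorded events.
std::vector<std::string> shutdownInReverseOrder() {
  std::vector<std::string> log;

  auto runtime = std::make_unique<Runtime>(&log);
  auto first = std::make_unique<Actor>(runtime.get(), "first");
  auto second = std::make_unique<Actor>(runtime.get(), "second");

  // Explicit teardown: last created, first destroyed. The runtime
  // goes last because both actors use it while being destroyed.
  second.reset();
  first.reset();
  runtime.reset();

  return log;
}
```

Calling `shutdownInReverseOrder()` records the two `wait` events before `~Runtime`. Resetting `runtime` first would leave each actor's destructor calling into a destroyed object, which is the general kind of failure the stack trace in the issue description suggests.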

> Crash when sending a SIGUSR1 signal to the agent.
> -------------------------------------------------
>
>                 Key: MESOS-7102
>                 URL: https://issues.apache.org/jira/browse/MESOS-7102
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.2.0
>         Environment: ubuntu 16.04
>            Reporter: Anand Mazumdar
>            Assignee: Anand Mazumdar
>            Priority: Critical
>              Labels: mesosphere
>             Fix For: 1.3.0
>
>
> Looks like sending a {{SIGUSR1}} to the agent crashes it. This is a regression; it used to work fine in the 1.1 release. Note that the agent does unregister with the master, and the crash happens after that.
> Steps to reproduce:
> - Start the agent.
> - Send it a {{SIGUSR1}} signal.
> The agent should crash with a stack trace similar to this:
> {noformat}
> I0209 16:19:46.210819 31977472 slave.cpp:851] Received SIGUSR1 signal from user gmann; unregistering and shutting down
> I0209 16:19:46.210960 31977472 slave.cpp:803] Agent terminating
> *** Aborted at 1486685986 (unix time) try "date -d @1486685986" if you are using GNU date ***
> PC: @     0x7fffbc4904fc _pthread_key_global_init
> *** SIGSEGV (@0x38) received by PID 88894 (TID 0x7fffc50c83c0) stack trace: ***
>     @     0x7fffbc488bba _sigtramp
>     @     0x7fe8a5d03f38 (unknown)
>     @        0x10b6d67d9 _ZZ11synchronizeINSt3__115recursive_mutexEE12SynchronizedIT_EPS3_ENKUlPS1_E_clES6_
>     @        0x10b6d67b8 _ZZ11synchronizeINSt3__115recursive_mutexEE12SynchronizedIT_EPS3_ENUlPS1_E_8__invokeES6_
>     @        0x10b6d6889 Synchronized<>::Synchronized()
>     @        0x10b6d678d Synchronized<>::Synchronized()
>     @        0x10b6a708a synchronize<>()
>     @        0x10e2f148d process::ProcessManager::wait()
>     @        0x10e2e9a78 process::wait()
>     @        0x10b30614f process::wait()
>     @        0x10c9619dc mesos::internal::slave::StatusUpdateManager::~StatusUpdateManager()
>     @        0x10c961a55 mesos::internal::slave::StatusUpdateManager::~StatusUpdateManager()
>     @        0x10b1ab035 main
>     @     0x7fffbc27b255 start
> [1]    88894 segmentation fault  bin/mesos-agent.sh --master=127.0.0.1:5050
> {noformat}
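
For readers unfamiliar with the trigger in the reproduction steps, here is a small sketch of the general pattern of treating `SIGUSR1` as a shutdown request on a POSIX system. It is not Mesos's handler; `shutdownRequested`, `onSigusr1`, and `raiseAndObserveSigusr1` are hypothetical names, and the process signals itself instead of being signalled externally.

```cpp
#include <csignal>

// Flag set by the handler. sig_atomic_t is the type that is safe to
// write from a signal handler.
static volatile std::sig_atomic_t shutdownRequested = 0;

// Hypothetical handler: only records that shutdown was requested.
// Real cleanup should happen outside the handler.
static void onSigusr1(int) { shutdownRequested = 1; }

// Installs the handler, sends SIGUSR1 to the current process, and
// reports whether the request was observed.
bool raiseAndObserveSigusr1() {
  shutdownRequested = 0;

  // Install our handler and remember the previous one.
  auto previous = std::signal(SIGUSR1, onSigusr1);

  // raise() does not return until the handler has run.
  std::raise(SIGUSR1);

  // Restore whatever handler was installed before.
  if (previous != SIG_ERR) {
    std::signal(SIGUSR1, previous);
  }

  return shutdownRequested == 1;
}
```

Keeping the handler down to setting a flag leaves the actual teardown to ordinary code, where the destruction order can be controlled.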



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
