mesos-issues mailing list archives

From "James Peach (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-7160) Parsing of perf version segfaults
Date Tue, 27 Jun 2017 00:12:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063895#comment-16063895 ]

James Peach edited comment on MESOS-7160 at 6/27/17 12:11 AM:
--------------------------------------------------------------

This morning my VM doesn't reproduce this; however, it definitely happened :)

The normal code path is that the {{exec}} failure causes an abort. The supervisor then gets
SIGTERM (need to read more code to see why). The signal handler it has installed issues SIGKILL.
If the SIGTERM delivery is delayed, then the second abort in the supervisor could trigger.

{noformat}
[pid  2738] execve("/bin/perf", ["perf", "--version"], 0x4bb6fc0 /* 21 vars */ <unfinished ...>
[pid  2737] wait4(2738,  <unfinished ...>
[pid  2738] <... execve resumed> )      = -1 ENOENT (No such file or directory)
[pid  2738] execve("/usr/sbin/perf", ["perf", "--version"], 0x4bb6fc0 /* 21 vars */) = -1 ENOENT (No such file or directory)
[pid  2738] execve("/usr/bin/perf", ["perf", "--version"], 0x4bb6fc0 /* 21 vars */) = -1 ENOENT (No such file or directory)
[pid  2738] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2738, si_uid=0} ---
...
[pid  2737] <... wait4 resumed> 0x7f27e8901f44, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid  2737] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2708, si_uid=0} ---
[pid  2738] +++ killed by SIGKILL +++
[pid  2737] +++ killed by SIGKILL +++
{noformat}
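
For anyone who wants to see the first half of the trace in isolation, here is a minimal Python sketch of a child that aborts when {{exec}} fails, with the parent reporting the terminating signal. This is not the Mesos code: the function name and the nonexistent binary name are made up for illustration, and aborting on {{exec}} failure is an assumption taken from the trace above.

```python
import os
import signal


def run_missing_binary(argv):
    """Fork a child that tries to exec argv and aborts if exec fails.

    Returns the signal number that terminated the child, or None if the
    child exited normally.
    """
    pid = os.fork()
    if pid == 0:
        # Child: execvp walks PATH, similar to the one-execve-per-directory
        # pattern visible in the strace output.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # Assumption: mirror the trace by aborting on exec failure.
            os.abort()

    # Parent: wait for the child and decode how it terminated.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None


if __name__ == "__main__":
    # Hypothetical binary name that should not exist on PATH.
    sig = run_missing_binary(["perf-binary-that-does-not-exist", "--version"])
    print(signal.Signals(sig).name if sig is not None else "exited normally")
```

On Linux the parent should see the child terminated by SIGABRT, which matches the "terminated with signal Aborted" message in the issue description.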

The SIGTERM is sent by {{Perf::finalize}}.
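
The second half of the trace (SIGTERM arrives, then the process dies from SIGKILL) can be sketched the same way. This is a rough illustration only: the function name is hypothetical, and having the handler kill its own process group is an assumption, not something confirmed from the Mesos source.

```python
import os
import signal
import time


def supervisor_killed_by():
    """Return the signal that ends a supervisor whose SIGTERM handler sends SIGKILL."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Supervisor: hypothetical stand-in for pid 2737 in the trace.
        try:
            os.close(ready_r)
            os.setpgid(0, 0)  # own process group, so killpg stays contained

            def on_sigterm(signum, frame):
                # Assumption: escalate SIGTERM to SIGKILL for the whole group.
                os.killpg(os.getpgrp(), signal.SIGKILL)

            signal.signal(signal.SIGTERM, on_sigterm)
            os.write(ready_w, b"1")  # tell the parent the handler is installed
            while True:
                time.sleep(0.1)  # idle until the signal arrives
        finally:
            os._exit(1)  # never fall through into the parent's code path

    # Parent: wait for the handler to be installed, then send SIGTERM.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)
    os.kill(pid, signal.SIGTERM)
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None


if __name__ == "__main__":
    print(signal.Signals(supervisor_killed_by()).name)
```

The parent sends SIGTERM but observes a SIGKILL termination, which is the same shape as the last three lines of the trace.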



> Parsing of perf version segfaults
> ---------------------------------
>
>                 Key: MESOS-7160
>                 URL: https://issues.apache.org/jira/browse/MESOS-7160
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Benjamin Bannier
>            Assignee: Andrei Budnik
>
> Parsing the perf version [fails with a segfault in ASF CI|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/3294/],
> {noformat}
> E0222 20:54:03.033464   805 perf.cpp:237] Failed to get perf version: Failed to execute perf: terminated with signal Aborted (core dumped)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
