mesos-issues mailing list archives

From "Anand Mazumdar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-6848) The default executor does not exit if a single task pod fails.
Date Tue, 10 Jan 2017 22:18:58 GMT

    [ https://issues.apache.org/jira/browse/MESOS-6848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816362#comment-15816362 ]

Anand Mazumdar commented on MESOS-6848:
---------------------------------------

1.1.x backport
{noformat}
commit 6798ba84a766aeaae0c3f2df455e200c57ce2b28
Author: Anand Mazumdar <anand@apache.org>
Date:   Tue Jan 10 13:52:37 2017 -0800

    Added MESOS-6848 to CHANGELOG for 1.1.1.

commit 17d44f460a553d99edcfa3ff04aefde9e72ae8ea
Author: Anand Mazumdar <anand@apache.org>
Date:   Tue Jan 10 13:08:03 2017 -0800

    Fixed a bug in the default executor around not committing suicide.

    This bug is only observed when the task group contains a single task.
    The default executor was not committing suicide when this single task
    used to exit with a non-zero status code as per the default restart
    policy.

    Review: https://reviews.apache.org/r/55157/
{noformat}

> The default executor does not exit if a single task pod fails.
> --------------------------------------------------------------
>
>                 Key: MESOS-6848
>                 URL: https://issues.apache.org/jira/browse/MESOS-6848
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Anand Mazumdar
>            Assignee: Anand Mazumdar
>            Priority: Blocker
>             Fix For: 1.1.1, 1.2.0
>
>
> If a task group has a single task and that task exits with a non-zero exit code, the default executor does not commit suicide.
> This happens because we invoke {{shutdown()}} in {{waited()}} when we notice the termination of a single container here: https://github.com/apache/mesos/blob/master/src/launcher/default_executor.cpp#L666
> but then return early after executing all the kill calls here: https://github.com/apache/mesos/blob/master/src/launcher/default_executor.cpp#L751
> However, when there is just one task in the task group, this never results in {{__shutdown}} being called, so the executor never commits suicide.
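
The control flow described in the issue can be illustrated with a toy model. This is a minimal sketch and not the actual Mesos code: `ToyExecutor`, `applyFix`, `hasTerminated()` and `finalShutdown()` (standing in for `__shutdown`) are hypothetical names, kills are modelled as completing synchronously, and the guard in `shutdown()` is only one possible way to close the gap, not necessarily what the committed patch does.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Toy model of the default executor's shutdown path (hypothetical, simplified).
class ToyExecutor
{
public:
  ToyExecutor(std::set<std::string> ids, bool fix)
    : containers(std::move(ids)), applyFix(fix) {}

  // Invoked when a container terminates with the given exit status.
  void waited(const std::string& id, int status)
  {
    containers.erase(id);

    if (shuttingDown) {
      // A kill issued by shutdown() completed; finish once all are gone.
      if (containers.empty()) {
        finalShutdown();
      }
      return;
    }

    if (status != 0) {
      shutdown();
    }
  }

  bool hasTerminated() const { return terminated; }

private:
  void shutdown()
  {
    shuttingDown = true;

    // Assumed fix: with nothing left to kill, terminate right away.
    if (applyFix && containers.empty()) {
      finalShutdown();
      return;
    }

    // Kill every remaining container; each kill ends in waited().
    const std::set<std::string> remaining = containers;
    for (const std::string& id : remaining) {
      waited(id, 9);
    }

    // Early return: termination is left to the waited() calls above.
    // With a single task there are none, so the buggy path stops here.
  }

  // Stand-in for __shutdown: the executor commits suicide.
  void finalShutdown()
  {
    assert(!terminated); // Must happen at most once.
    terminated = true;
  }

  std::set<std::string> containers;
  bool applyFix;
  bool shuttingDown = false;
  bool terminated = false;
};
```

In this model, constructing `ToyExecutor({"task"}, false)` and calling `waited("task", 1)` leaves `hasTerminated()` false, which mirrors the reported hang. The same calls with `applyFix` set to true, or with two tasks in the group, end with the executor terminated.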



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
