mesos-issues mailing list archives

From "Vinod Kone (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-9052) Default executor should commit suicide if it cannot receive HTTP responses for LAUNCH_NESTED_CONTAINER calls.
Date Thu, 05 Jul 2018 19:54:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534077#comment-16534077 ]

Vinod Kone commented on MESOS-9052:
-----------------------------------

Instead of committing suicide, the executor should shut down the current task group,
since one task/container failing to launch shouldn't impact other task groups.

Also, should this apply more generically to all calls from the executor to the agent,
or just to launch?

cc [~gkleiman]

> Default executor should commit suicide if it cannot receive HTTP responses for LAUNCH_NESTED_CONTAINER calls.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-9052
>                 URL: https://issues.apache.org/jira/browse/MESOS-9052
>             Project: Mesos
>          Issue Type: Bug
>          Components: executor
>    Affects Versions: 1.4.0, 1.5.0, 1.6.0, 1.7.0
>            Reporter: Chun-Hung Hsiao
>            Priority: Major
>
> If there is a network problem (e.g., a routing problem), it is possible that the agent
> has received {{LAUNCH_NESTED_CONTAINER}} calls from the default executor and launched the
> nested container, but the executor does not get the HTTP response. This would result in tasks
> stuck at {{TASK_STARTING}} forever. We should consider making the default executor commit
> suicide if it does not receive the response in a reasonable amount of time.
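
The timeout idea in the description, combined with the suggestion to shut down only the affected task group, could be sketched roughly as below. This is an illustrative Python sketch, not Mesos code (the default executor is written in C++). The names launch_with_deadline, send_launch, and kill_task_group are hypothetical stand-ins for "issue the LAUNCH_NESTED_CONTAINER call and wait for its HTTP response" and "kill the task group's containers". Neither the issue nor the comment proposes a specific timeout value, so it is left as a parameter.

```python
import concurrent.futures


def launch_with_deadline(send_launch, kill_task_group, timeout_seconds):
    """Wait a bounded time for the launch response.

    send_launch and kill_task_group are hypothetical callables supplied by
    the caller; they are not part of any Mesos API.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(send_launch)
    try:
        # Block until the response arrives or the deadline passes.
        future.result(timeout=timeout_seconds)
        return "LAUNCHED"
    except concurrent.futures.TimeoutError:
        # The agent may already have launched the container even though no
        # response arrived, so tear down this task group explicitly rather
        # than terminating the whole executor.
        kill_task_group()
        return "TASK_GROUP_KILLED"
    finally:
        # Do not wait for a call that may never return.
        pool.shutdown(wait=False)
```

Only the task group whose launch timed out is torn down here, so other task groups under the same executor keep running. Whether the same deadline should wrap every executor-to-agent call is the open question raised in the comment above.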



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
