mesos-issues mailing list archives

From "Jie Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-5763) Task stuck in fetching is not cleaned up after --executor_registration_timeout.
Date Fri, 01 Jul 2016 23:21:10 GMT

    [ https://issues.apache.org/jira/browse/MESOS-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359818#comment-15359818 ]

Jie Yu commented on MESOS-5763:
-------------------------------

Yep, this definitely looks like a bug to me. We'll need to backport the fix to 0.28.x and 0.27.x.
Older releases are no longer supported.

> Task stuck in fetching is not cleaned up after --executor_registration_timeout.
> -------------------------------------------------------------------------------
>
>                 Key: MESOS-5763
>                 URL: https://issues.apache.org/jira/browse/MESOS-5763
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.28.0, 1.0.0, 0.29.0
>            Reporter: Yan Xu
>            Assignee: Yan Xu
>            Priority: Critical
>             Fix For: 0.28.3, 1.0.0, 0.27.4
>
>
> When the fetching process hangs forever due to reasons such as HDFS issues, the Mesos
> containerizer attempts to destroy the container and kill the executor after
> {{--executor_registration_timeout}}. However, this reliably fails for us: the executor is
> killed by the launcher destroy and the container is destroyed, but the agent never finds out
> that the executor has terminated, thus leaving the task in the STAGING state forever.
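
To make the reported failure mode concrete, here is a minimal toy model. It is a sketch under stated assumptions, not Mesos code: the names ToyAgent, registration_timeout, executor_terminated and the TERMINAL state are all hypothetical, and the model only captures the invariant the report describes. Destroying the container after the registration timeout must also tell the agent that the executor is gone; if that notification is lost, the task never leaves STAGING.

```python
from enum import Enum


class TaskState(Enum):
    STAGING = "STAGING"
    # Hypothetical stand-in for whatever terminal state the agent reports.
    TERMINAL = "TERMINAL"


class ToyAgent:
    """Hypothetical toy model of the agent/containerizer interaction; not Mesos code."""

    def __init__(self, reports_termination: bool) -> None:
        # reports_termination=False models the bug: the destroy succeeds,
        # but the agent is never told that the executor is gone.
        self.reports_termination = reports_termination
        self.container_alive = True
        self.task_state = TaskState.STAGING

    def executor_terminated(self) -> None:
        # Once the agent learns the executor is gone, a task still in
        # STAGING is moved to a terminal state.
        if self.task_state is TaskState.STAGING:
            self.task_state = TaskState.TERMINAL

    def registration_timeout(self) -> None:
        # The fetcher is still hung; the containerizer destroys the
        # container, which also kills the executor.
        self.container_alive = False
        if self.reports_termination:
            self.executor_terminated()


buggy = ToyAgent(reports_termination=False)
buggy.registration_timeout()
print(buggy.container_alive, buggy.task_state.value)  # False STAGING

fixed = ToyAgent(reports_termination=True)
fixed.registration_timeout()
print(fixed.container_alive, fixed.task_state.value)  # False TERMINAL
```

In both runs the container is gone; only the run that propagates the termination back to the agent moves the task out of STAGING, which is the gap the report describes.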



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
