mesos-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Qian Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-8125) Agent should properly handle recovering an executor when its pid is reused
Date Tue, 30 Jan 2018 15:39:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345240#comment-16345240
] 

Qian Zhang commented on MESOS-8125:
-----------------------------------

I have another proposal: In `DockerContainerizerProcess::_recover`, try to connect to the
executor with a TCP connection, if succeeds, reap the executor process, otherwise, set `container->status`
to `None()`.

Here is RR: https://reviews.apache.org/r/65382/

> Agent should properly handle recovering an executor when its pid is reused
> --------------------------------------------------------------------------
>
>                 Key: MESOS-8125
>                 URL: https://issues.apache.org/jira/browse/MESOS-8125
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Gastón Kleiman
>            Assignee: Qian Zhang
>            Priority: Critical
>
> Here's how to reproduce this issue:
> # Start a task using the Docker containerizer (the same will probably happen with the
command executor).
> # Stop the corresponding Mesos agent while the task is running.
> # Change the executor's checkpointed forked pid, which is located in the meta directory,
e.g., {{/var/lib/mesos/slave/meta/slaves/latest/frameworks/19faf6e0-3917-48ab-8b8e-97ec4f9ed41e-0001/executors/foo.13faee90-b5f0-11e7-8032-e607d2b4348c/runs/latest/pids/forked.pid}}.
I used pid 2, which is normally used by {{kthreadd}}.
> # Reboot the host



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message