mesos-issues mailing list archives

From "Neil Conway (JIRA)" <>
Subject [jira] [Assigned] (MESOS-6676) Always re-link with scheduler during re-registration
Date Wed, 07 Dec 2016 19:48:58 GMT


Neil Conway reassigned MESOS-6676:

    Assignee: Neil Conway

> Always re-link with scheduler during re-registration
> ----------------------------------------------------
>                 Key: MESOS-6676
>                 URL:
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>            Reporter: Neil Conway
>            Assignee: Neil Conway
>              Labels: mesosphere
> Scenario:
> # Framework registers with the master using a non-zero {{failover_timeout}} and is assigned a FrameworkID.
> # The master sees an {{ExitedEvent}} for the master -> scheduler link. This could happen due to a transient network error, e.g., a one-way partition. The master sends a {{FrameworkErrorMessage}} to the framework and marks the framework as disconnected, but keeps its {{Framework*}} in {{frameworks.registered}}.
> # The framework doesn't receive the {{FrameworkErrorMessage}} because it is dropped by the network.
> # The scheduler might receive an {{ExitedEvent}} for the scheduler -> master link, but it ignores this anyway (see MESOS-887).
> # The scheduler sees a new-master-detected event and re-registers with the master. It does _not_ set the {{force}} flag, so the master follows [this code path|], which does _not_ relink with the scheduler.
> The result is that scheduler re-registration succeeds, but the master -> scheduler link is never re-established.

This message was sent by Atlassian JIRA
