mesos-issues mailing list archives

From "Anand Mazumdar (JIRA)" <>
Subject [jira] [Issue Comment Deleted] (MESOS-3302) Scheduler API v1 improvements
Date Wed, 25 May 2016 16:03:13 GMT


Anand Mazumdar updated MESOS-3302:
    Comment: was deleted

(was: [~guoger] Thanks for testing out the new API. Here are the answers to your queries:

- Restart leading master
  - For a non-HA cluster, this behavior is expected. The scheduler library does not currently
follow a redirect; it relies solely on the {{detector}} to let it know of a new master. As
you pointed out, the behavior is correct for an HA cluster.
  - We do want to fix the behavior, i.e., ensure that there is a delay upon (re-)connection.
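The comment only says that a delay upon (re-)connection is wanted, not how it should be computed. One common approach is capped exponential backoff with "full jitter", sketched below under that assumption. `reconnect_delay`, `base`, and `cap` are hypothetical names and values chosen for illustration; they are not taken from the Mesos scheduler library.

```python
import random
from typing import Optional


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                    rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before (re-)connection attempt `attempt` (0-based).

    Hypothetical helper: capped exponential backoff with full jitter. The
    delay is drawn uniformly from [0, min(cap, base * 2**attempt)], so many
    schedulers that see the same master failover do not all reconnect at once.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    if rng is None:
        rng = random.Random()
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    upper = min(cap, base * 2 ** min(attempt, 32))
    return rng.uniform(0.0, upper)


if __name__ == "__main__":
    seeded = random.Random(42)  # seeded only to make the demo repeatable
    for n in range(6):
        print(f"attempt {n}: wait {reconnect_delay(n, rng=seeded):.2f}s")
```

A caller would sleep for `reconnect_delay(n)` before the n-th attempt and reset `n` to 0 once a SUBSCRIBE succeeds. The jitter matters more than the exact constants: it spreads reconnections out when a new master is elected.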

- Restart agent
  - Currently, the long-lived framework does not support moving existing tasks across agents.
However, it would be good to test that the executor is correctly recovered upon agent restart
with checkpointing enabled. If checkpointing is disabled, the executor should kill itself.
  - Also, restarting the agent with {{--http_command_executor}} enabled or disabled should still
successfully recover all the executors.
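The two agent-restart checks above form a small test matrix. The sketch below encodes the expected outcome of each combination, based on one reading of the comment: checkpointing decides whether the executor is recovered, and `--http_command_executor` should not change that. `expected_executor_state` and the `RECOVERED`/`TERMINATED` labels are hypothetical; they are not Mesos task or executor states.

```python
from itertools import product


def expected_executor_state(checkpoint: bool, http_command_executor: bool) -> str:
    """Expected executor outcome after an agent restart (hypothetical labels).

    With framework checkpointing enabled, the agent should recover the
    executor; with it disabled, the executor should kill itself. The
    http_command_executor flag is accepted only to document the test matrix;
    per the comment, it should not change the outcome.
    """
    return "RECOVERED" if checkpoint else "TERMINATED"


if __name__ == "__main__":
    # Enumerate the four combinations worth exercising in a manual test run.
    for checkpoint, http in product([True, False], repeat=2):
        state = expected_executor_state(checkpoint, http)
        print(f"checkpoint={checkpoint}, http_command_executor={http} -> {state}")
```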

- Emulate network partitions
  - I am assuming that when you say "the framework hangs", you just mean that it does not
have anything to do?
  - "However there was once that agent keeps launching new tasks without framework being aware
of it during partition."
      This is expected. If a framework is partitioned from the master after sending {{LAUNCH}}
messages, the agent still goes ahead and launches the tasks. The framework receives the
status updates for those tasks upon re-registering, since the agent keeps retrying the
updates every 10 mins. We currently do not implement any reconciliation in the long-running
framework.
  - Also, it would be good to test the other one-way partition, i.e., the framework is partitioned
away from the master.
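Because the long-running framework does no reconciliation, it only learns about tasks launched during a partition when the agent's status-update retries get through. A framework could instead ask the master for task state after re-subscribing by sending a RECONCILE call. The sketch below builds the JSON body of such a call, as it would be POSTed to the master's `/api/v1/scheduler` endpoint; the field names follow the v1 `scheduler.proto` `Call` message as I understand it and should be checked against the proto. `reconcile_call` is a hypothetical helper, not part of any Mesos library.

```python
import json
from typing import Iterable, Optional, Tuple


def reconcile_call(framework_id: str,
                   tasks: Iterable[Tuple[str, Optional[str]]] = ()) -> str:
    """Build the JSON body of a v1 scheduler RECONCILE call (hypothetical helper).

    `tasks` holds (task_id, agent_id) pairs; agent_id may be None. An empty
    list requests implicit reconciliation, where the master reports the latest
    state of all non-terminal tasks it knows about for this framework.
    """
    entries = []
    for task_id, agent_id in tasks:
        entry = {"task_id": {"value": task_id}}
        if agent_id is not None:
            entry["agent_id"] = {"value": agent_id}
        entries.append(entry)
    call = {
        "framework_id": {"value": framework_id},
        "type": "RECONCILE",
        "reconcile": {"tasks": entries},
    }
    return json.dumps(call)


if __name__ == "__main__":
    # Implicit reconciliation right after re-subscribing.
    print(reconcile_call("example-framework-id"))
    # Explicit reconciliation for tasks the framework already tracks.
    print(reconcile_call("example-framework-id", [("task-1", "agent-1"), ("task-2", None)]))
```

Implicit reconciliation is the simpler choice right after re-subscribing, since it also surfaces tasks the framework never heard about, such as ones launched during a partition.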

Also, to reduce noise on this improvement JIRA, should we create a Google doc with the
testing details and link it to the JIRA? I would also add the details of my own testing to
that doc so that everything is consolidated in one place. If it's easier for you, I can create
the doc myself and you can then add your details to it. Let me know what works for you.)

> Scheduler API v1 improvements
> -----------------------------
>                 Key: MESOS-3302
>                 URL:
>             Project: Mesos
>          Issue Type: Epic
>            Reporter: Marco Massenzio
>              Labels: mesosphere, twitter
> This Epic covers all the refinements that we may want to build on top of the {{HTTP API}}
> MVP epic (MESOS-2288) which was released initially with Mesos {{0.24.0}}.
> The tasks/stories here cover the necessary work to bring the API v1 to what we would
> regard as "Production-ready" state in preparation for the {{1.0.0}} release.

This message was sent by Atlassian JIRA
