mesos-dev mailing list archives

From Benjamin Mahler <benjamin.mah...@gmail.com>
Subject Re: MESOS-3545: Investigate restoring tasks/executors after machine reboot.
Date Fri, 06 Nov 2015 21:47:27 GMT
Any reason that this doesn't mention the executor failing during the steady
state? I assume that the desire here is more generally to restart executors
according to a policy, without having the round-trip back to the scheduler
which may not be successful in many circumstances.

Also, any reason that this is focused on tasks instead of executors? It's
not clear to me what the semantics around restarting tasks are. Currently
we only persist a stripped version of TaskInfo called Task, which makes
task re-delivery impossible. Even if we persisted the potentially large
TaskInfos, does it make sense to re-deliver them? That seems to suggest
tasks are idempotent in the executor? If we don't re-deliver, are the
executors expected to checkpoint task state themselves across their own
restarts? When the executor restarts, are all the tasks considered
restarted but still not terminal?
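
To make the re-delivery concern concrete, here is a minimal sketch. The field sets are hypothetical and heavily simplified stand-ins, not the actual Mesos protobuf definitions. It only illustrates that once a launch message is stripped down to bookkeeping fields for checkpointing, whatever was dropped cannot be recovered from the checkpoint alone.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical, simplified stand-in for the full launch message.
# The real message carries many more fields.
@dataclass
class TaskInfo:
    task_id: str
    name: str
    resources: dict
    command: Optional[str] = None  # what to run (assumed field for illustration)
    data: bytes = b""              # opaque payload for the executor (assumed)


# Hypothetical stand-in for the stripped, checkpointed form.
@dataclass
class Task:
    task_id: str
    name: str
    resources: dict
    state: str = "TASK_STAGING"


def strip(info: TaskInfo) -> Task:
    """Keep only the bookkeeping fields, as a checkpoint might."""
    return Task(task_id=info.task_id, name=info.name, resources=info.resources)


def lost_fields(info_cls, task_cls) -> set:
    """Names of fields in the launch message that the checkpoint lacks."""
    return {f.name for f in fields(info_cls)} - {f.name for f in fields(task_cls)}


info = TaskInfo("t1", "web", {"cpus": 1.0}, command="./server", data=b"cfg")
checkpointed = strip(info)

# Under these assumed field sets, the launch-only fields are gone after stripping.
print(sorted(lost_fields(TaskInfo, Task)))
```

In this sketch a restarted executor handed only `checkpointed` would know which task existed and what resources it held, but not how to launch it again, which is why the choice is between persisting the full message and having executors checkpoint task state themselves.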

Have you explored whether it makes sense to have the executor be
restartable vs the notion of a "persistent task"?

On Fri, Nov 6, 2015 at 12:10 PM, Anindya Sinha <anindya.sinha@gmail.com>
wrote:

> As discussed with a couple of folks yesterday, I just wanted to surface this
> thread to the top of the dev@mesos list. I would really appreciate it if we
> could have some attention on this proposal so that we can make progress on
> this JIRA.
>
> Thanks
> Anindya/Megha
>
> On Mon, Nov 2, 2015 at 10:37 AM, Megha Sharma <megha.hitesh.sharma@gmail.com>
> wrote:
>
> > Hi All,
> > I was wondering if you got a chance to look at the design doc for
> > MESOS-3545, which covers restarting tasks/executors in the event of a
> > slave reboot or disconnection from the master. Please take the time to
> > comment.
> >
> > https://issues.apache.org/jira/browse/MESOS-3545
> >
> > Design doc:
> >
> > https://docs.google.com/document/d/1l7goeISpYmCjM03l20lmjZ6_BMfdxBs31znEBRtzsuU/edit#heading=h.1i2fqek1ko3e
> >
> > Thanks
> > Megha Sharma
> >
> > On Fri, Oct 23, 2015 at 10:17 AM, Megha Sharma <
> > megha.hitesh.sharma@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I have posted the initial design draft for MESOS-3545, which covers
> > > restarting tasks/executors in the event of a slave reboot or
> > > disconnection from the master. Please take the time to comment or
> > > provide feedback.
> > >
> > > https://issues.apache.org/jira/browse/MESOS-3545
> > >
> > > Thanks
> > > Megha Sharma
> > >
> > >
> > >
> >
>
