mesos-user mailing list archives

From Aaron Carey <aca...@ilm.com>
Subject Re: Non-checkpointing frameworks
Date Mon, 17 Oct 2016 08:36:22 GMT
+1 to A and B

Aaron Carey
Production Engineer - Cloud Pipeline
Industrial Light & Magic
London
020 3751 9150


On 17 October 2016 at 00:38, Qian Zhang <zhq527725@gmail.com> wrote:

>> and requires operators to enable checkpointing on the slaves.
>
> Just curious why operators need to enable checkpointing on the slaves (I
> do not see an agent flag for that). I think checkpointing should be enabled
> at the framework level rather than on the slave.
>
>
> Thanks,
> Qian Zhang
>
> On Sun, Oct 16, 2016 at 10:18 AM, Zameer Manji <zmanji@apache.org> wrote:
>
>> +1 to A and B
>>
>> Aurora has enabled checkpointing for years and requires operators to
>> enable checkpointing on the slaves.
>>
>> On Sat, Oct 15, 2016 at 11:57 AM, Joris Van Remoortere <joris@mesosphere.io>
>> wrote:
>>
>> > I'm in favor of A & B. I find it provides a better "first experience" to
>> > users.
>> > From my experience you usually have to have an explicit reason to not
>> > want to checkpoint. Most people assume the semantics provided by the
>> > checkpoint behavior are the default, and it can be a frustrating
>> > experience for them to find out that is not the case.
>> >
>> > --
>> > *Joris Van Remoortere*
>> > Mesosphere
>> >
>> > On Fri, Oct 14, 2016 at 3:11 PM, Neil Conway <neil.conway@gmail.com>
>> > wrote:
>> >
>> >> Hi folks,
>> >>
>> >> I'd like input from individuals who currently use frameworks but do
>> >> not enable checkpointing.
>> >>
>> >> Background: "checkpointing" is a parameter that can be enabled in
>> >> FrameworkInfo; if enabled, the agent will write the framework PID,
>> >> executor PIDs, and status updates to disk for any tasks started by
>> >> that framework. This checkpointed information means that these tasks
>> >> can survive an agent crash: if the agent exits (whether due to
>> >> crashing or as part of an upgrade procedure), a restarted agent can
>> >> use this information to reconnect to executors started by the previous
>> >> instance of the agent. The downside is that checkpointing requires
>> >> some additional disk I/O at the agent.
>> >>
>> >> Checkpointing is not currently the default, but in my experience it is
>> >> often enabled for production frameworks. As part of the work on
>> >> supporting partition-aware Mesos frameworks (see MESOS-4049), we are
>> >> considering:
>> >>
>> >> (a) requiring that partition-aware frameworks must also enable
>> >> checkpointing, and/or
>> >> (b) enabling checkpointing by default
>> >>
>> >> If you have intentionally decided to disable checkpointing for your
>> >> Mesos framework, I'd be curious to hear more about your use-case and
>> >> why you haven't enabled it.
>> >>
>> >> Thanks!
>> >>
>> >> Neil
>> >>
>> >
>>
>> --
>> Zameer Manji
>
>
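For readers less familiar with the two proposals, here is a minimal Python sketch, assuming a framework that registers through the Mesos v1 scheduler HTTP API with a JSON body. `checkpoint` is the FrameworkInfo field Neil describes; the helper names `build_framework_info` and `subscribe_body` are hypothetical, and the capability name `PARTITION_AWARE` is assumed from later Mesos releases. The defaulting and validation logic only models proposals (a) and (b) as stated in the thread; it is not actual Mesos master behavior.

```python
import json


def build_framework_info(name, user, partition_aware=False, checkpoint=None):
    """Return a FrameworkInfo-shaped dict (hypothetical helper).

    Models the thread's proposals, not real Mesos validation:
      (b) checkpointing defaults to enabled when left unspecified;
      (a) a partition-aware framework must also enable checkpointing.
    """
    if checkpoint is None:
        # Proposal (b): enabled unless the framework explicitly opts out.
        checkpoint = True
    if partition_aware and not checkpoint:
        # Proposal (a): reject partition-aware frameworks that opt out.
        raise ValueError("partition-aware frameworks must enable checkpointing")

    info = {"user": user, "name": name, "checkpoint": checkpoint}
    if partition_aware:
        # Assumed capability name for partition-aware frameworks.
        info["capabilities"] = [{"type": "PARTITION_AWARE"}]
    return info


def subscribe_body(framework_info):
    """Wrap a FrameworkInfo dict in a v1 scheduler SUBSCRIBE call, as JSON."""
    return json.dumps({"type": "SUBSCRIBE",
                       "subscribe": {"framework_info": framework_info}})


if __name__ == "__main__":
    # A partition-aware framework that leaves `checkpoint` unset gets it enabled.
    info = build_framework_info("example", "nobody", partition_aware=True)
    print(subscribe_body(info))
```

Keeping the default and the partition-aware check in one place makes the interaction between (a) and (b) explicit: with (b) in effect, only a framework that deliberately passes `checkpoint=False` can trip (a).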
