mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chun-Hung Hsiao <chhs...@mesosphere.io>
Subject Re: Convention for Backward Compatibility for New Operations in Mesos 1.6
Date Mon, 16 Apr 2018 21:33:13 GMT
If we do option 1, then there will be no agent crash since the master won't
send any unknown operation to an old agent,
so option 2 is not a necessity.

On Mon, Apr 16, 2018 at 2:12 PM, Silas Snider <swsnider@apple.com> wrote:

> I think we should definitely do option 2 regardless of whether we do
> option 1 as well, since although in this case it will still crash 1.5.0, at
> least in the future we won't have to have this worry again.
>
> ´╗┐On 4/16/18, 2:04 PM, "Chun-Hung Hsiao" <chhsiao@apache.org> wrote:
>
>     Hi all,
>
>     As some might have already known, we are currently working on patches
> to
>     implement the new GROW_VOLUME and SHRINK_VOLUME operations [1].
>
>     One problem surfaces is that, since the new operations are not
> supported in
>     Mesos 1.5, they will lead to an agent crash during the operation
> application
>     cycle if a Mesos 1.6 master send these operations to a Mesos 1.5 agent
> [2].
>
>     We are now consider two possibilities to address this compatibility
> problem:
>
>     1) The Mesos 1.6 master should check the agent's Mesos version in
>     `Master::accept` [3]. Moving forward, if we add new operations in
> future
>     Mesos
>     releases, we would have code like the following:
>
>     ```
>     Version slaveVersion = ...; // Get the Mesos version of the slave of
> the
>     offer.
>     switch (operation.type()) {
>       ...
>       case SOME_NEW_OPERATION: {
>         if (slaveVersion < minVersionForSomeNewOperation) {
>           ... // Drop the operation.
>         }
>         break;
>       }
>       ...
>     }
>     ```
>
>     Pros and cons:
>     + The new operation won't go into the operation application cycle
> since it
>     is
>       rejected in the very beginning. This means no resource metadata is
>     touched.
>     - Explicit slave version checks at master side make the code look not
> very
>     clean,
>       and we will need to update this list every time we add a new
> operation.
>
>     2) Treat this issue as an agent crash bug. The Mesos master would
> forward
>     the operation to the agent, regardless of the agent's Mesos version.
> In the
>     agent,
>     we deploy and backport the following logic in `Slave::applyOperation`
> [4]:
>
>     ```
>     if (message.operation_info().type() == OPERATION_UNKNOWN) {
>       ... // Drop the operation and trigger a re-registration or send an
>           // `UpdateSlaveMessage` to force the master to update the total
>     resource of
>           // the slave.
>     }
>     ```
>
>     Pros and cons:
>     + Easier to add new operations since no new logic needs to be added for
>     backward
>       Compability.
>     - Since the old agent won't know whether the new operations are
> speculative
>     or not,
>       a re-registration or an `UpdateSlaveMessage` is required.
>     - Mesos 1.5.0 agents will still have the bug and crash when a new
> master
>     sends a
>       new operation to them.
>
>     Since both options are viable and there seems to be no clear winner,
> we'd
>     like to
>     check with the community to see which convention is preferable. Please
> let
>     us know
>     what you think. Thanks!
>
>     Best,
>     Chun-Hung
>
>
>     [1] https://issues.apache.org/jira/browse/MESOS-4965
>     [2]
>     https://github.com/apache/mesos/blob/1.5.x/src/common/
> protobuf_utils.cpp#L851
>     [3] https://github.com/apache/mesos/blob/master/src/master/
> master.cpp#L3899
>     [4] https://github.com/apache/mesos/blob/1.5.x/src/slave/
> slave.cpp#L4359
>
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message