mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ben Mahler" <benjamin.mah...@gmail.com>
Subject Re: Review Request 18675: Added a Recovery Operation to the Registrar to force a Registry version change during recovery.
Date Fri, 07 Mar 2014 01:41:11 GMT


> On March 4, 2014, 12:49 a.m., Vinod Kone wrote:
> > src/master/registrar.cpp, lines 222-227
> > <https://reviews.apache.org/r/18675/diff/1/?file=507859#file507859line222>
> >
> >     The semantics here are a bit hard to follow.
> >     
> >     How about instead of "recovering" variable you make 'recovered' Option<Promise<Registry>
>? One side benefit is that when some one calls admit() etc with calling recover() first
you could immediately return a failure if 'recovered.isNone()'. I think this explicitness
is better than having those calls waiting forever?
> >     
> >     Also, why does it have to be MasterInfo? IIUC, all you want is someone to update
the version of the registry. Can we just use a "bool" or "UUID" instead of MasterInfo?
> 
> Ben Mahler wrote:
>     1. Promises are not copyable or assignable, so you cannot use a Promise inside an
Option, unless you make it a pointer.
>     
>     2. The calls wait forever only if a call to recover() does not occur. We could, as
you said, fail any operations that are initiated by introducing another check in the admit/readmit/remove
methods:
>     
>     Future<bool> RegistrarProcess::admit(const SlaveInfo& info)
>     {
>       if (!info.has_id()) {
>         return Failure("SlaveInfo is missing the 'id' field");
>       }
>     
>       if (recovered.isPending() && !recovering) {
>         return Failure("Attempted to admit prior to recovering");
>       }
>     
>       return recovered.future()
>         .then(defer(self(), &Self::_admit, info));
>     }
>     
>     However, the old semantics were to provide "auto-recovery". We could still provide
auto-recovery if we had a factory function that allows the Master to construct a Registrar
by passing MasterInfo.
>     
>     3. You are correct, we technically do not need MasterInfo, but flipping a boolean
seems hacky, and a UUID is a bit out of place. The UUID would change with each leader election,
but what also changes with each leader election? The MasterInfo. This was why I went with
MasterInfo.
>     
>     Let me know your thoughts on these points!
> 
> Ben Mahler wrote:
>     We could use an Option<Owned<Promise>> to eliminate the need for a 'recovering'
boolean:
>     
>     if (!recovering && recovered.future.isPending()) // Before.
>     if (recovered.isNone()) // After.
>     
>     I'll look at cleaning this up.
> 
> Vinod Kone wrote:
>     The argument for 3) seems a bit backwards. You actually had to complicate the code
(needing factoring functions and 2)) because you are using MasterInfo. Why use it in the first
place when you don't have to and go through hoops? I don't see why a UUID is out of place?
>     
>     Looking at it in a different way, a registry is a place to store persistent information
that other masters can use to recover state. A new master doesn't need to recovery an old
master's MasterInfo. The MasterInfo is just present to resolve conflicts and seems like an
overkill.

Thanks for bringing this up, I just realized that it is sufficient to simply 'store' what
we 'fetch' in order to cause a UUID change on the underlying state entry, note the comment
here:

inline process::Future<Option<Variable> > State::store(const Variable& variable)
{
  // Note that we try and swap an entry even if the value didn't change!
  UUID uuid = UUID::fromBytes(variable.entry.uuid());

  // Create a new entry to replace the existing entry provided the
  // UUID matches.
  Entry entry;
  entry.set_name(variable.entry.name());
  entry.set_uuid(UUID::random().toBytes());
  entry.set_value(variable.entry.value());

  return storage->set(entry, uuid)
    .then(lambda::bind(&State::_store, entry, lambda::_1));
}

Surprised we didn't notice this earlier! Perhaps we should make this more clear in the state
API.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18675/#review36069
-----------------------------------------------------------


On March 6, 2014, 8:12 p.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18675/
> -----------------------------------------------------------
> 
> (Updated March 6, 2014, 8:12 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Bugs: MESOS-764
>     https://issues.apache.org/jira/browse/MESOS-764
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> As a safety measure, we would like to force a version change on the Registry as a result
of the recovery process. This helps prevent a "rogue" Master (that still believes it is leading)
from performing writes to the Registry.
> 
> This change adds a 'Recover' operation which adds the latest MasterInfo to the Registry.
Unfortunately, since the Registrar is injected to the Master, I had to add 'MasterInfo' as
an argument to the 'recover' function, thus altering the design somewhat.
> 
> The current semantics are that there one must call 'recover' before calling other operations,
otherwise these other operations will fail.
> 
> 
> Diffs
> -----
> 
>   src/master/registrar.hpp 20734afc69055197e9ab90d42253c56e4af4b97c 
>   src/master/registrar.cpp 37337c07b24a96e71910b7c83085d159361a1188 
>   src/tests/registrar_tests.cpp 3bf42bd77a10470a2afc6fd8e1da30d6134e792c 
> 
> Diff: https://reviews.apache.org/r/18675/diff/
> 
> 
> Testing
> -------
> 
> Added a small test.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message