mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 60105: Added helper method recoverSlaveState.
Date Wed, 21 Jun 2017 16:39:18 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178388
-----------------------------------------------------------



PTAL at these comments but I realize for this review it's probably easier if we sync on the
high-level strategy first.


src/slave/slave.cpp
Lines 5798-5867 (original), 5798-5867 (patched)
<https://reviews.apache.org/r/60105/#comment252425>

    So all of this work is only useful for recovering frameworks, it looks like we don't need
to do them before we decide we actually are going to recover the frameworks (i.e., not continue
as a new agent)?
    
    This reasoning certainly also applies to the current HEAD but since we are donig refactor
here, we should probably address this (in a separate preceding review perhaps)



src/slave/slave.cpp
Lines 5945 (patched)
<https://reviews.apache.org/r/60105/#comment252322>

    An empty blank line between multi-line blocks of code like this. Here and elsewhere.



src/slave/slave.cpp
Lines 5945-5946 (patched)
<https://reviews.apache.org/r/60105/#comment252512>

    This comment seems to be misplaced, why does the code here cares? It just processes the
`slaveState` which could be optional regardless of what we have done for the reboot case.



src/slave/slave.cpp
Lines 5959 (patched)
<https://reviews.apache.org/r/60105/#comment252326>

    Open `{` on the next line.



src/slave/slave.cpp
Line 5941 (original), 5960 (patched)
<https://reviews.apache.org/r/60105/#comment252327>

    The indentation of the whole methods seems off?



src/slave/slave.cpp
Line 5950 (original), 5969-5974 (patched)
<https://reviews.apache.org/r/60105/#comment252429>

    As suggested by related comments, we are not removing the symlink here so we shouldn't
need to comment about it here. 
    
    "This is being done ...": here the comment describes the low-level mechanics but perhaps
we need a high level comment about "this" instead, which is that:
    
    Prior to Mesos 1.4 we bypass the state recovery and start as a new agent upon reboot (introduced
in MESOS-844); starting in Mesos 1.4 we'll attempt to recover the slave state even after reboot
(so we don't discard the agent ID unnecessarily) but in case of SlaveInfo mismatch we'll still
continue as a new agent so we don't introduce a regression for the reboot scenario.
    
    We can go on to explain the rationale: agent resource and attribute changes are more often
associated with host reboots and we don't want to agent to flap in such cases.
    
    But it's fine if we focus on improving the comments after we first get the code rigth.



src/slave/slave.cpp
Lines 5975 (patched)
<https://reviews.apache.org/r/60105/#comment252422>

    So at this point you still have the `slaveState` object and it has a `rebooted` field,
why do we need another `rebooted` field in `recoveryInfo` if this is the only place it's used?



src/slave/slave.cpp
Lines 5976-5983 (patched)
<https://reviews.apache.org/r/60105/#comment252427>

    We shouldn't need to construct the same message twice. If the same message can be used
either for LOG(INFO) or Failure then we can construct the message in a single place first
and decide how to use it / whether to add to it.



src/slave/slave.cpp
Lines 5987-5989 (patched)
<https://reviews.apache.org/r/60105/#comment252347>

    You don't need to do this here. If you continue as a new agent, it's going to update the
latest symlink when it's registered:
    
    https://github.com/apache/mesos/blob/3bea3fcb4eef00ce469ba4f27fcfd2d3eec05aea/src/slave/paths.cpp#L621-L622



src/slave/slave.cpp
Lines 5992 (patched)
<https://reviews.apache.org/r/60105/#comment252435>

    Would a None() not work?
    
    Also, I know I originally suggested a helper method but given how it looks right now,
perhaps it's less of a hassle if we keep it inline?
    
    Even if we keep it as a helper, as a one-use helper whose semantics is so tightly coupled
with the internals of `Slave::recover()`, perhaps it could just be done as a lambda.
    
    ```
    Try<Nothing> adjustSlaveState = [&slaveState]() { ... }();
    ```
    
    Let's try to inline it first?



src/slave/slave.cpp
Lines 5993-5995 (patched)
<https://reviews.apache.org/r/60105/#comment252526>

    You have to clear it here because it's assigned above prematurely. As the comment above
suggests, it's a hack for comparing the slave infos.
    
    Pehaps we should just make a copy of the SlaveInfo above for comparision. If we are going
to continue as a new agent, we are just **not** going to execute `info = slaveState->info.get();`
below?



src/slave/slave.cpp
Line 5965 (original)
<https://reviews.apache.org/r/60105/#comment252444>

    Why remove it?



src/slave/slave.cpp
Lines 6122-6125 (original)
<https://reviews.apache.org/r/60105/#comment252527>

    Why remove it?


- Jiang Yan Xu


On June 15, 2017, 12:31 p.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> -----------------------------------------------------------
> 
> (Updated June 15, 2017, 12:31 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
>     https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Here, a helper method recoverSlaveState is added to
> determine if the slave state is to be recovered or not.
> During agent recovery, if the SlaveInfo compatibility
> check fails and the agent is rebooted then to be
> backwards compatible with MESOS <= 1.3, since a
> a rebooted agent didn't recover state, we clear
> slave state and slaveId so the agent registers with the
> master as a new agent.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 7ffaed14035a05259ec72c70532ee4f0affa1f5d 
>   src/slave/slave.cpp 7d147ac6609933ac884bfc29032dba572a0952c6 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message