mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vinod Kone" <vinodk...@gmail.com>
Subject Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.
Date Wed, 04 Nov 2015 17:01:19 GMT


> On Nov. 4, 2015, 2:43 a.m., Ben Mahler wrote:
> > src/master/master.cpp, lines 6004-6013
> > <https://reviews.apache.org/r/39873/diff/2/?file=1114216#file1114216line6004>
> >
> >     Just looking at the patch that introduced this bug, why are we removing the
out-of-order update prevention? Don't see any mention of this in the stuff around https://issues.apache.org/jira/browse/MESOS-2864.

There are couple reasons for this

1) it's hard to figure out if an update is an out of order update. for example, if TASK_STARTING,
TASK_RUNNING and TASK_FINISHED are all enqued on the slave, the master might receive non-terminal
(TASK_RUNNING) and terminal (TASK_FINISHED) after it transitioned the task to terminal state
(TASK_STARTING update w/ latest state as TASK_FINISHED).

2) when updateTask() is called, typically an update is being forwarded to the scheduler. so
while it might be ok to not transition the (latest) state of the task, it is important to
set the task.status_update_state and task.status_update_uuid correctly. so we shouldn't short-circuit
the function without setting these.

makes sense?


> On Nov. 4, 2015, 2:43 a.m., Ben Mahler wrote:
> > src/master/master.cpp, line 6039
> > <https://reviews.apache.org/r/39873/diff/2/?file=1114216#file1114216line6039>
> >
> >     Why not have the same logic here to be defensive? Or do you intend to guard
against it? Just seems weird to allow it in one of the cases but not the other.

i didn't do it in the master's case because it would be a bug in the code if the master is
trying to call updateTask() for an already terminated task! we have to do it for updates from
slave because we don't control the executor's updates. i could make it a CHECK for the master
case though.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/#review105027
-----------------------------------------------------------


On Nov. 2, 2015, 10:34 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39873/
> -----------------------------------------------------------
> 
> (Updated Nov. 2, 2015, 10:34 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-2864
>     https://issues.apache.org/jira/browse/MESOS-2864
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The master now doesn't change the latest state of a task if it has already terminated.
But it still updates the status update state and uuid.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 2bc5a97a5b50c8a8a9902c47b2e9e3b5216d97ea 
> 
> Diff: https://reviews.apache.org/r/39873/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of the time
without this fix.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message