mesos-dev mailing list archives

From "Vinod Kone" <vinodk...@gmail.com>
Subject Re: Review Request 23912: Fix MESOS-947: Slave should properly handle a killTask() that arrives between runTask() and _runTask()
Date Tue, 05 Aug 2014 23:20:24 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23912/#review49658
-----------------------------------------------------------



File Attachment: stout patch - MESOS-947-stout.patch2
<https://reviews.apache.org//r/23912/#fcomment22>
    Instead of attaching the stout patch, you should set the "depends on" field to the corresponding
review. The review bot will then apply the reviews recursively.

- Vinod Kone


On Aug. 5, 2014, 10:45 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23912/
> -----------------------------------------------------------
> 
> (Updated Aug. 5, 2014, 10:45 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Bugs: MESOS-947
>     https://issues.apache.org/jira/browse/MESOS-947
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Fixes MESOS-947 "Slave should properly handle a killTask() that arrives between runTask()
and _runTask()".
> 
> Slave::killTask() did not check whether the task in question was "pending" (i.e.
Slave::runTask() had happened, but Slave::_runTask() had not yet), and so it erroneously assumed
that Slave::runTask() had not been executed. The task was then marked "LOST" instead of "KILLED",
even though Slave::runTask() had already scheduled Slave::_runTask() to follow. With this fix,
killTask() removes the "pending" entry and marks the task "KILLED". _runTask() in turn checks
whether the task in question is still "pending"; if it is not, it infers that the task has been
killed and does not erroneously try to complete launching it.
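The interplay described above can be illustrated with a heavily simplified sketch. This is not the code from the patch: ToySlave, its members, and the string task IDs are hypothetical stand-ins, and the asynchronous dispatch from runTask() to _runTask() is reduced to plain method calls made in sequence by the caller.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model; none of these types come from the Mesos sources.
enum class TaskState { RUNNING, KILLED, LOST };

struct StatusUpdate {
  std::string taskId;
  TaskState state;
};

struct ToySlave {
  std::set<std::string> pending;      // runTask() seen, _runTask() not yet
  std::set<std::string> running;      // launch completed
  std::vector<StatusUpdate> updates;  // status updates that would be sent

  // First phase: remember the task as pending. In the real slave this
  // phase also schedules _runTask() to follow later (assumed here).
  void runTask(const std::string& taskId) { pending.insert(taskId); }

  void killTask(const std::string& taskId) {
    // The fix: a pending task is removed from the pending set and
    // reported as KILLED rather than LOST.
    if (pending.erase(taskId) > 0) {
      updates.push_back({taskId, TaskState::KILLED});
      return;
    }
    if (running.erase(taskId) > 0) {
      updates.push_back({taskId, TaskState::KILLED});
      return;
    }
    // Task unknown to this toy slave: reported as LOST (assumption).
    updates.push_back({taskId, TaskState::LOST});
  }

  // Second phase: only complete the launch if the task is still pending.
  void _runTask(const std::string& taskId) {
    if (pending.erase(taskId) == 0) {
      return;  // No longer pending, so it was killed in between.
    }
    running.insert(taskId);
    updates.push_back({taskId, TaskState::RUNNING});
  }
};
```

Calling runTask("t1"), then killTask("t1"), then _runTask("t1") on a ToySlave leaves a single KILLED update and nothing running, which is the ordering from the ticket.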
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp a896bb66db5d8cd27ef02b6498c9db93cb0d525f 
>   src/slave/slave.cpp 1d5691836822c8587e1aa8ed24860a8012c67a6e 
>   src/tests/mesos.hpp 75c66fda2485afa0d4541e710780d90b3411839a 
>   src/tests/mesos.cpp 35c94fa908ad728ea92a7d1bfcbe90d57b1b83d9 
>   src/tests/slave_tests.cpp e45255a6f699e51bf09397da95a5a11edbabe591 
> 
> Diff: https://reviews.apache.org/r/23912/diff/
> 
> 
> Testing
> -------
> 
> Wrote a unit test that reliably creates the situation described in the ticket. Observed
that TASK_LOST and the listed log output occurred. This pointed directly to the lines in killTask()
where the problem is rooted. Ran the test again after the fix, and it succeeded. Checked the log:
it looks like a "clean kill" now :-)
> 
> 
> File Attachments
> ----------------
> 
> stout patch
>   https://reviews.apache.org/media/uploaded/files/2014/08/05/5f4b6886-9a60-4ceb-ad99-6b5a2f69c870__MESOS-947-stout.patch2
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>

