mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vinod Kone" <vinodk...@gmail.com>
Subject Re: Review Request 22313: MESOS-886: Prevented slave from launching tasks before containerize's update completes.
Date Thu, 03 Jul 2014 19:37:50 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22313/#review47311
-----------------------------------------------------------



src/slave/slave.cpp
<https://reviews.apache.org/r/22313/#comment82975>

    include the executor id.



src/slave/slave.cpp
<https://reviews.apache.org/r/22313/#comment82979>

    s/cancelling/ignoring/



src/slave/slave.cpp
<https://reviews.apache.org/r/22313/#comment82976>

    Just say "Failed to update resources of the container"



src/slave/slave.cpp
<https://reviews.apache.org/r/22313/#comment82977>

    kill this. sending a "discarded" message to frameworks is weird because they wouldnt know
what that means.



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82990>

    since you are controlling the clocks, why do you need to do this?



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82992>

    Move the expectation for offers2 down to where you expect this to happen.



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82982>

    I'm confused. Why are you creating a new offer with fewer resources? Is it because you
want to set the resources in the task below? Why not just create a Resources object?
    
    Also, you should use createTask().



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82980>

    s/status0/status1/
    s/status1/status2/



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82986>

    Set the expectation for the second task below where you launch it.



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82983>

    ditto. see above.



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82985>

    s/containerizer->update()/'Containerizer::update()'/



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82984>

    Where did "42" hours come from? IIUC, you want to ensure the task wasn't launched because
the update is pending?
    
    One way to do it is to catch the _runTask() dispatch and do a clock::settle(). If the
slave were to not wait for update() to finish (which it won't since you fixed it) then scheduler
should have got the status update after settling the clock. You can ensure this by ASSERT(status2.isPending()).
Does that make sense?



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82987>

    This is testing the failure case. What about a test for the successful update case? Can
you add that one?



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82988>

    s/is/are/



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82989>

    s/__runTask/'__runTask()'/



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82991>

    ditto. why?



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82996>

    ditto. why can't advance the clock instead?



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82993>

    ditto. move the expectation for offers2 down.



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82994>

    ditto. use createTask().



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82997>

    s/so//
    
    s/send out/receive/



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82998>

    How about
    
    "Besides, the TASK_LOST update for task1 will not received because the framework is shutdown."



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82995>

    ditto. use createTask().



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/22313/#comment82999>

    Make sure you do a clock::settle() on __runTask() to ensure that it doesn't throw any
errors.


Also, please make sure to run the new tests repeatedly for at least a thousand iterations
to make sure they are not flaky.

- Vinod Kone


On July 1, 2014, 8:33 p.m., Yifan Gu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22313/
> -----------------------------------------------------------
> 
> (Updated July 1, 2014, 8:33 p.m.)
> 
> 
> Review request for mesos, Ian Downes and Vinod Kone.
> 
> 
> Bugs: MESOS-886
>     https://issues.apache.org/jira/browse/MESOS-886
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added __runTask() to wait for the completion of containerizer->update() and check
the result before sending RunTaskMessage.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 605ee4a 
>   src/slave/slave.cpp f42ab60 
>   src/tests/slave_tests.cpp 371a5b8 
> 
> Diff: https://reviews.apache.org/r/22313/diff/
> 
> 
> Testing
> -------
> 
> SlaveTest, CancelTaskIfContainerizerFails
> 
> Which tests that if the containerizer->update() return a Failure, the task won't be
launched and the scheduler will get TASK_LOST.
> 
> make check
> 
> 
> File Attachments
> ----------------
> 
> framework will exit
>   https://reviews.apache.org/media/uploaded/files/2014/06/18/fbe73273-7aa9-4faa-b1c5-003ab03042a9__issue-886.diff
> log
>   https://reviews.apache.org/media/uploaded/files/2014/06/18/84d801a0-5c2a-4bb9-901b-e1962031461c__log
> 
> 
> Thanks,
> 
> Yifan Gu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message