mesos-dev mailing list archives

From "Vinod Kone" <vinodk...@gmail.com>
Subject Re: Review Request: Terminate correct tasks when a slave disconnects.
Date Mon, 06 May 2013 18:35:01 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10951/#review20210
-----------------------------------------------------------



src/common/type_utils.hpp
<https://reviews.apache.org/r/10951/#comment41452>

    We typically don't overload the "!=" operator for protobufs; instead we write "!(protobuf1 == protobuf2)".
    
    I know that's annoying, but we would like to keep type_utils as short as possible.
    
    Also, we only overload the "==" operator for a specific protobuf when the default is not good enough.
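    A minimal sketch of this convention, assuming a hypothetical `FrameworkID` stand-in (a plain struct with a `value()` accessor, not the real generated protobuf class) and a hypothetical helper `belongsToOtherFramework`: only `operator==` is defined, and call sites negate it instead of relying on an `operator!=` overload.

```cpp
#include <string>

// Hypothetical stand-in for the generated FrameworkID protobuf class,
// reduced to the single field the comparison needs.
struct FrameworkID {
  std::string value_;
  const std::string& value() const { return value_; }
};

// Only "==" is overloaded, because the type has no usable default
// comparison; there is deliberately no operator!= overload.
inline bool operator==(const FrameworkID& left, const FrameworkID& right) {
  return left.value() == right.value();
}

// Hypothetical call site: negate "==" rather than using "!=".
inline bool belongsToOtherFramework(const FrameworkID& taskFramework,
                                    const FrameworkID& framework) {
  return !(taskFramework == framework);
}
```

    With this shape, type_utils only grows by one operator per type that needs custom equality, and every inequality check is spelled out at the call site.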



src/master/master.cpp
<https://reviews.apache.org/r/10951/#comment41453>

    You could do:
    
    if (!(task->framework_id() == framework->id)) {
      ...
    }

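    The actual diff is only available on Review Board, so the following is a hypothetical sketch of the filtering this suggestion implies, not the change made to master.cpp: of the tasks on the disconnected slave, only those belonging to the given framework are selected for removal, and the rest are skipped with `!(a == b)`. `Task`, `FrameworkID` and `tasksToRemove` are simplified stand-ins, not the Mesos types.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the Mesos types.
struct FrameworkID {
  std::string value;
};

inline bool operator==(const FrameworkID& left, const FrameworkID& right) {
  return left.value == right.value;
}

struct Task {
  std::string name;
  FrameworkID framework_id;
};

// Returns the names of the tasks on a disconnected slave that belong to
// the given framework. Tasks of other frameworks on the same slave are
// skipped, and tasks on other slaves are never considered because they
// are not in slaveTasks at all.
std::vector<std::string> tasksToRemove(const std::vector<Task>& slaveTasks,
                                       const FrameworkID& frameworkId) {
  std::vector<std::string> removed;
  for (const Task& task : slaveTasks) {
    if (!(task.framework_id == frameworkId)) {
      continue;  // Belongs to a different framework; leave it running.
    }
    removed.push_back(task.name);
  }
  return removed;
}
```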

- Vinod Kone


On May 6, 2013, 6:04 p.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10951/
> -----------------------------------------------------------
> 
> (Updated May 6, 2013, 6:04 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> From d01482457f02acc1e19195995db7a14dfc2a89b9 Mon Sep 17 00:00:00 2001
> From: Brenden Matthews <brenden.matthews@airbnb.com>
> Date: Mon, 6 May 2013 09:54:03 -0700
> Subject: [PATCH] Terminate correct tasks when a slave disconnects.
> 
> Previously, when a slave disconnected, the master removed every task of
> each non-checkpointing framework on that slave, including the framework's
> tasks running on other slaves. This left the framework in a bad state;
> with Hadoop, it left many zombie tasks running on the slaves that never
> terminated.
> 
> Added some `operator !=' type utilities.
> ---
>  src/common/type_utils.hpp |   66 +++++++++++++++++++++++++++++++++++++++++++++
>  src/master/master.cpp     |    8 ++++--
>  2 files changed, 72 insertions(+), 2 deletions(-)
> 
> 
> Below is a sample of the Mesos master log after a slave disconnects:
> 
> 
> I0506 03:01:21.188874  2639 master.cpp:445] Slave 201305040040-3141079306-5050-1068-21(i-ced4aba2) disconnected
> I0506 03:01:21.189184  2639 master.cpp:464] Removing non-checkpointing framework 201305040040-4196536586-5050-1124-0000 from disconnected slave 201305040040-3141079306-5050-1068-21(i-ced4aba2)
> I0506 03:01:21.190471  2639 master.hpp:295] Removing task Task_Tracker_46 with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] on slave 201305040040-4196536586-5050-1124-3
> I0506 03:01:21.190891  2632 hierarchical_allocator_process.hpp:544] Recovered cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=763224) on slave 201305040040-4196536586-5050-1124-3 from framework 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.191614  2639 master.hpp:295] Removing task Task_Tracker_154 with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] on slave 201305040040-3141079306-5050-1068-38
> I0506 03:01:21.192049  2634 hierarchical_allocator_process.hpp:544] Recovered cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=761189) on slave 201305040040-3141079306-5050-1068-38 from framework 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.192828  2639 master.hpp:295] Removing task Task_Tracker_195 with resources cpus=6.5; mem=13312; disk=53248; ports=[31999-31999, 31001-31001] on slave 201305040040-3141079306-5050-1068-85
> I0506 03:01:21.193270  2640 hierarchical_allocator_process.hpp:544] Recovered cpus=6.5; mem=13312; disk=53248; ports=[31999-31999, 31001-31001] (total allocatable: cpus=10; mem=13408.8; ports=[31001-31999]; disk=596893) on slave 201305040040-3141079306-5050-1068-85 from framework 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.194039  2639 master.hpp:295] Removing task Task_Tracker_182 with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] on slave 201305040040-3141079306-5050-1068-45
> I0506 03:01:21.194425  2638 hierarchical_allocator_process.hpp:544] Recovered cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=760196) on slave 201305040040-3141079306-5050-1068-45 from framework 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.195190  2639 master.hpp:295] Removing task Task_Tracker_58 with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] on slave 201305040040-3141079306-5050-1068-76
> I0506 03:01:21.195636  2636 hierarchical_allocator_process.hpp:544] Recovered cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=761175) on slave 201305040040-3141079306-5050-1068-76 from framework 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.196455  2639 master.hpp:295] Removing task Task_Tracker_160 with resources cpus=20; mem=40960; disk=163840; ports=[31000-31000, 32000-32000] on slave 201305040040-3141079306-5050-1068-85
> I0506 03:01:21.196883  2631 hierarchical_allocator_process.hpp:544] Recovered cpus=20; mem=40960; disk=163840; ports=[31000-31000, 32000-32000] (total allocatable: cpus=30; mem=54368.8; ports=[31000-32000]; disk=760733) on slave 201305040040-3141079306-5050-1068-85 from framework 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.197710  2639 master.hpp:295] Removing task Task_Tracker_96 with resources cpus=3.5; mem=7168; disk=28672; ports=[31000-31000, 32000-32000] on slave 201305040040-3141079306-5050-1068-80
> <...log continues...>
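> In the sample above, the disconnected slave is 201305040040-3141079306-5050-1068-21, yet every "Removing task" entry names a different slave. A small sketch for spotting this symptom in a master log, assuming one entry per string in exactly the format shown; the function name `tasksRemovedFromOtherSlaves` is hypothetical and not part of Mesos.

```cpp
#include <string>
#include <vector>

// Returns the names of tasks that the log reports as removed from a slave
// other than the most recently disconnected one. Assumes one log entry per
// string, formatted like the sample master log (hypothetical helper).
std::vector<std::string> tasksRemovedFromOtherSlaves(
    const std::vector<std::string>& lines) {
  const std::string slaveTag = "] Slave ";
  const std::string taskTag = "Removing task ";
  const std::string onSlave = " on slave ";
  std::string disconnected;
  std::vector<std::string> suspicious;
  for (const std::string& line : lines) {
    auto pos = line.find(slaveTag);
    if (pos != std::string::npos &&
        line.find(" disconnected", pos) != std::string::npos) {
      auto start = pos + slaveTag.size();
      // The slave ID ends where the "(hostname)" suffix begins.
      disconnected = line.substr(start, line.find('(', start) - start);
      continue;
    }
    auto t = line.find(taskTag);
    auto s = line.rfind(onSlave);
    if (t == std::string::npos || s == std::string::npos) {
      continue;  // Not a "Removing task ... on slave ..." entry.
    }
    auto nameStart = t + taskTag.size();
    std::string task =
        line.substr(nameStart, line.find(' ', nameStart) - nameStart);
    std::string slave = line.substr(s + onSlave.size());
    if (!(slave == disconnected)) {
      suspicious.push_back(task);
    }
  }
  return suspicious;
}
```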
> 
> 
> Diffs
> -----
> 
>   src/common/type_utils.hpp 377b65f 
>   src/master/master.cpp 3207157 
> 
> Diff: https://reviews.apache.org/r/10951/diff/
> 
> 
> Testing
> -------
> 
> Used in production at Airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>

