mesos-dev mailing list archives

From "Brenden Matthews" <bren...@diddyinc.com>
Subject Re: Review Request: Slave feature: maximum system load.
Date Fri, 03 May 2013 18:57:50 GMT


> On May 3, 2013, 6:45 p.m., Ben Mahler wrote:
> > Were you running into this issue when using process isolation, or cgroups isolation?
> 
> Brenden Matthews wrote:
>     Using cgroups isolation.
>     
>     I'm still having a major issue where the JVM occasionally 'runs away' and the load
>     averages go through the roof. Without a simple check like this, the slave will keep
>     accepting tasks, which then hang forever.
>     
>     I still haven't figured out the root cause of the JVM getting stuck. Between strace
>     and jstack (which usually hangs forever), there aren't any good indicators of what's
>     going on.
> 
> Ben Mahler wrote:
>     We've seen several issues when there's heavy disk I/O on a machine as well, since
>     there's currently no disk isolation in place.

Yeah, I figured I wasn't the only one having this problem.

I think CFQ (https://lwn.net/Articles/427961/) might be the way to go for disk I/O. Before
that, though, I need to go after the low-hanging fruit.


- Brenden


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10928/#review20128
-----------------------------------------------------------


On May 3, 2013, 6:39 p.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10928/
> -----------------------------------------------------------
> 
> (Updated May 3, 2013, 6:39 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> From 69b4dc2e1fc778b2d8377eb4ec03f793c33e8061 Mon Sep 17 00:00:00 2001
> From: Brenden Matthews <brenden.matthews@airbnb.com>
> Date: Mon, 29 Apr 2013 11:35:53 -0700
> Subject: [PATCH 5/9] Slave feature: maximum system load.
> 
> When the load exceeds a specified value, don't accept tasks.  Some nodes
> may become unstable under excessive load (e.g., heavy disk I/O), and
> this helps prevent further tasks from being assigned to busy slaves.
> ---
>  src/slave/flags.hpp |   11 ++++++++++-
>  src/slave/slave.cpp |   43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 53 insertions(+), 1 deletion(-)
> 
> 
> Diffs
> -----
> 
>   src/slave/flags.hpp f3cbe3d 
>   src/slave/slave.cpp 86a15fc 
> 
> Diff: https://reviews.apache.org/r/10928/diff/
> 
> 
> Testing
> -------
> 
> Used in production at Airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>
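
The patch description boils down to a threshold check on the system load. The diff itself isn't reproduced here, so the following is only a minimal C++ sketch of that idea, not the code from the patch. It assumes the `getloadavg()` call from `<stdlib.h>` (available on Linux/glibc, the BSDs and macOS, but not part of standard C++), uses the 1-minute average (an assumption; the patch may use another window), and uses a hypothetical `LoadFlags::maxLoad` setting standing in for whatever flag the patch adds to `src/slave/flags.hpp`. On Linux, the load average also counts tasks stuck in uninterruptible (disk) sleep, which is why heavy disk I/O can push it up even when the CPUs are mostly idle.

```cpp
#include <optional>
#include <stdlib.h>  // getloadavg(): a BSD/glibc extension, not standard C++

// Hypothetical stand-in for a slave flag such as "maximum system load";
// the real flag name and type in the patch may differ.
struct LoadFlags {
  std::optional<double> maxLoad;  // unset means "no limit"
};

// Reads the 1-minute load average, or returns std::nullopt if the
// platform cannot provide it (getloadavg() returns -1 on failure).
std::optional<double> oneMinuteLoad() {
  double loads[3];
  if (getloadavg(loads, 3) < 1) {
    return std::nullopt;
  }
  return loads[0];
}

// Decides whether the slave should accept new tasks. Kept separate from
// oneMinuteLoad() so the policy can be tested without depending on the
// host's actual load.
bool shouldAcceptTasks(const LoadFlags& flags, std::optional<double> load) {
  if (!flags.maxLoad.has_value()) {
    return true;  // feature disabled: behave as before
  }
  if (!load.has_value()) {
    return true;  // load unknown: fail open rather than starve the slave
  }
  // "When the load exceeds a specified value, don't accept tasks."
  return *load <= *flags.maxLoad;
}
```

A slave could call `shouldAcceptTasks(flags, oneMinuteLoad())` before launching a task and decline it when the result is `false`. Failing open when the load can't be read is a design choice in this sketch. Failing closed would be stricter for nodes whose JVMs have run away, but it would also stop a healthy slave from running anything on a platform where `getloadavg()` is unavailable.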

