Mailing-List: contact mesos-dev-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mesos-dev@incubator.apache.org
Content-Type: multipart/alternative; boundary="===============5162556105188885944=="
MIME-Version: 1.0
Subject: Re: Review Request: Slave feature: maximum system load.
From: "Brenden Matthews"
To: "Ben Mahler", "mesos", "Brenden Matthews"
Date: Fri, 03 May 2013 18:57:50 -0000
Message-ID: <20130503185750.17258.16681@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org
Auto-Submitted: auto-generated
Sender: "Brenden Matthews"
X-ReviewGroup: mesos
X-ReviewRequest-URL: https://reviews.apache.org/r/10928/
X-Sender: "Brenden Matthews"
References: <20130503184545.17260.37868@reviews.apache.org>
In-Reply-To: <20130503184545.17260.37868@reviews.apache.org>
Reply-To: "Brenden Matthews"

--===============5162556105188885944==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0

> On May 3, 2013, 6:45 p.m., Ben Mahler wrote:
> > Were you running into this issue when using process isolation, or cgroups isolation?
>
> Brenden Matthews wrote:
>     Using cgroups isolation.
>
>     I'm still having a major issue where the JVM occasionally 'runs away' and the load averages go through the roof. Without a simple check like this, the slave will keep accepting tasks which hang forever.
>
>     I still haven't figured out the root cause of the JVM getting stuck. Between strace and jstack (which usually hangs forever) there aren't any good indicators of what's going on.
>
> Ben Mahler wrote:
>     We've seen several issues when there's heavy disk I/O on a machine as well, since there's currently no disk isolation in place.

Yeah, I figured I wasn't the only one having this problem.

I think CFQ (https://lwn.net/Articles/427961/) might be the way to go. For this, I need to go after the low-hanging fruit first.


- Brenden


-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/10928/#review20128
-----------------------------------------------------------


On May 3, 2013, 6:39 p.m., Brenden Matthews wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10928/
> -----------------------------------------------------------
>
> (Updated May 3, 2013, 6:39 p.m.)
>
>
> Review request for mesos.
>
>
> Description
> -------
>
> From 69b4dc2e1fc778b2d8377eb4ec03f793c33e8061 Mon Sep 17 00:00:00 2001
> From: Brenden Matthews
> Date: Mon, 29 Apr 2013 11:35:53 -0700
> Subject: [PATCH 5/9] Slave feature: maximum system load.
>
> When the load exceeds a specified value, don't accept tasks. Some nodes
> may become unstable under excessive load (i.e., heavy disk I/O), and
> this helps prevent the assigning of further tasks to busy slaves.
> ---
>  src/slave/flags.hpp | 11 ++++++++++-
>  src/slave/slave.cpp | 43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 53 insertions(+), 1 deletion(-)
>
>
> Diffs
> -----
>
>   src/slave/flags.hpp f3cbe3d
>   src/slave/slave.cpp 86a15fc
>
> Diff: https://reviews.apache.org/r/10928/diff/
>
>
> Testing
> -------
>
> Used in production at airbnb.
>
>
> Thanks,
>
> Brenden Matthews
>
>

--===============5162556105188885944==--
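
The patch description quoted above says the slave stops accepting tasks once the system load exceeds a configured value. The diff itself is not included in this message, so the following is only a rough sketch of that kind of check, not the code under review. The function name `should_accept_tasks`, the `max_load` parameter, the choice of the 1-minute load average, and the "fail open" behaviour when the load cannot be read are all assumptions made for illustration. It uses Python's `os.getloadavg()`, which is available on Unix-like systems.

```python
import os
from typing import Optional, Tuple


def should_accept_tasks(
    max_load: Optional[float],
    load_averages: Optional[Tuple[float, float, float]] = None,
) -> bool:
    """Return False when the 1-minute load average exceeds max_load.

    max_load=None means no limit is configured (hypothetical default).
    load_averages may be injected for testing; otherwise the values
    are read from the operating system.
    """
    if max_load is None:
        return True

    if load_averages is None:
        try:
            # Tuple of (1 min, 5 min, 15 min) load averages.
            load_averages = os.getloadavg()
        except OSError:
            # Assumption: if the load cannot be read, keep accepting tasks.
            return True

    one_minute = load_averages[0]
    return one_minute <= max_load
```

A caller would run something like `should_accept_tasks(8.0)` before accepting a new task and decline the task when it returns `False`. Comparing against the 1-minute average reacts quickly to a runaway process; a real implementation might prefer the 5- or 15-minute value to avoid flapping.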