Date: Tue, 28 Jan 2014 23:30:41 +0000 (UTC)
From: "Jie Yu (JIRA)"
To: dev@mesos.apache.org
Reply-To: dev@mesos.apache.org
Subject: [jira] [Updated] (MESOS-759) The cgroups TaskKiller should skip freezing the cgroup if it is already empty.

     [ https://issues.apache.org/jira/browse/MESOS-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-759:
-------------------------
    Fix Version/s:     (was: 0.17.0)
                   0.18.0

> The cgroups TaskKiller should skip freezing the cgroup if it is already empty.
> ------------------------------------------------------------------------------
>
>                 Key: MESOS-759
>                 URL: https://issues.apache.org/jira/browse/MESOS-759
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.13.0, 0.14.0, 0.14.1, 0.14.2, 0.16.0, 0.15.0
>            Reporter: Benjamin Mahler
>            Assignee: Ian Downes
>            Priority: Critical
>              Labels: twitter
>             Fix For: 0.18.0
>
>
> The current TasksKiller code always freezes the cgroup when trying to kill the tasks in the cgroup:
>   void killTasks() {
>     // Chain together the steps needed to kill the tasks. Note that we
>     // ignore the return values of freeze, kill, and thaw because,
>     // provided there are no errors, we'll just retry the chain as
>     // long as tasks still exist.
>     chain = kill(SIGSTOP)                         // Send stop signal to all tasks.
>       .then(defer(self(), &Self::kill, SIGKILL))  // Now send kill signal.
>       .then(defer(self(), &Self::empty))          // Wait until cgroup is empty.
>       .then(defer(self(), &Self::freeze))         // Freeze cgroup.
>       .then(defer(self(), &Self::kill, SIGKILL))  // Send kill signal to any remaining tasks.
>       .then(defer(self(), &Self::thaw))           // Thaw cgroup to deliver signals.
>       .then(defer(self(), &Self::empty));         // Wait until cgroup is empty.
> This should avoid freezing the cgroup when it is already empty. We've seen instances where the cgroup is unfreezable, and because we retry the whole procedure on failure, the killer enters a loop attempting to freeze the cgroup.
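
A minimal sketch of the behavior requested above, assuming a simplified synchronous model rather than the libprocess future chain. FakeCgroup, its members, and the maxAttempts parameter are hypothetical names introduced only for illustration; none of them are part of the Mesos API. The idea is to check for emptiness before touching the freezer, so an empty but unfreezable cgroup cannot trigger the retry loop.

```cpp
#include <cassert>
#include <vector>

// Hypothetical in-memory stand-in for a cgroup; NOT the Mesos cgroups API.
struct FakeCgroup {
  std::vector<int> pids;     // Tasks currently in the cgroup.
  bool freezable = true;     // false models a cgroup that cannot be frozen.
  bool frozen = false;
  bool killPending = false;  // A kill signal was sent while frozen.
  int freezeAttempts = 0;    // Number of freeze() calls, for illustration.

  bool empty() const { return pids.empty(); }

  // Try to freeze; fails every time for an unfreezable cgroup.
  bool freeze() {
    ++freezeAttempts;
    if (!freezable) return false;
    frozen = true;
    return true;
  }

  // Record the kill signal; in this model it is delivered on thaw.
  void kill() { killPending = true; }

  // Thaw the cgroup and deliver any pending kill signal.
  void thaw() {
    frozen = false;
    if (killPending) {
      pids.clear();
      killPending = false;
    }
  }
};

// Returns true once the cgroup is empty, false if the attempts run out.
bool killTasks(FakeCgroup& cgroup, int maxAttempts) {
  assert(maxAttempts > 0);
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    // Guard: nothing left to kill, so skip the freezer entirely.
    if (cgroup.empty()) return true;
    // Freeze failed: retry the whole chain, as the description says.
    if (!cgroup.freeze()) continue;
    cgroup.kill();
    cgroup.thaw();
  }
  return cgroup.empty();
}
```

With this guard, calling killTasks on an empty cgroup returns immediately without a single freeze attempt, even when the cgroup is unfreezable. A non-empty unfreezable cgroup still exhausts its attempts, which is outside the scope of this issue.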