Subject: Re: Review Request: Fix for slave segfault on framework exit
From: "Vinod Kone"
To: "Benjamin Hindman", "John Sirois"
Cc: "mesos", "Vinod Kone"
Reply-To: mesos-dev@incubator.apache.org
Date: Tue, 08 May 2012 17:05:18 -0000
Message-ID: <20120508170518.1511.37278@reviews.apache.org>
In-Reply-To: <20120507215001.1539.37166@reviews.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/5057/


> On 2012-05-07 21:50:01, John Sirois wrote:
> > src/slave/slave.cpp, line 1487
> >
> > Is there a test that could be tweaked to ensure this is happening? Presumably it wasn't before via executorExited?

Added a test.


> On 2012-05-07 21:50:01, John Sirois wrote:
> > src/slave/slave.cpp, line 1483
> >
> > Does this new api call still transition live tasks to LOST/FAILED?

This is a bit nuanced. When a framework is shut down, the slave sends a shutdown to the executor. One of two things can happen:

1) EXECUTOR_SHUTDOWN_TIMEOUT_SECONDS elapses before the isolation module informs the slave about the lost executor. The slave sends a TASK_LOST to the master, but the master drops it on the floor because the framework is dead.

2) The isolation module informs the slave about the lost executor before EXECUTOR_SHUTDOWN_TIMEOUT_SECONDS elapses. The slave doesn't send a TASK_LOST.

In either case, the master never sends the TASK_LOST to the dead framework, which is the right thing to do.

This might be different once slave recovery is implemented, but the logic there for handling status updates is very different. In other words, this fix will probably go away when we merge the slave recovery work.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5057/#review7657
-----------------------------------------------------------


On 2012-05-07 21:11:34, Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5057/
> -----------------------------------------------------------
>
> (Updated 2012-05-07 21:11:34)
>
>
> Review request for mesos, Benjamin Hindman and John Sirois.
>
>
> Summary
> -------
>
> Fix for: https://issues.apache.org/jira/browse/MESOS-190
>
> Also prevents slave from infinitely re-trying status updates to a dead framework.
>
>
> This addresses bug MESOS-190.
>     https://issues.apache.org/jira/browse/MESOS-190
>
>
> Diffs
> -----
>
>   src/slave/slave.cpp 09a8396
>
> Diff: https://reviews.apache.org/r/5057/diff
>
>
> Testing
> -------
>
> Checked with long lived framework.
>
> $ ./bin/mesos-master.sh
> $ ./bin/mesos-slave.sh --master=localhost:5050
> $ ./src/long-lived-framework localhost:5050
>
>
> Thanks,
>
> Vinod
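
Footnote on the two shutdown cases described in the reply: they amount to a small decision table, sketched below. This is only an illustrative model of the reasoning, not code from src/slave/slave.cpp. The names FirstEvent, Outcome, and shutdownOutcome are hypothetical and do not exist in Mesos. It assumes the framework is already dead when the slave shuts the executor down.

```cpp
// Hypothetical model (not Mesos code): which event happens first after the
// slave has asked the executor of a dead framework to shut down.
enum class FirstEvent {
  ShutdownTimeoutElapsed,  // case 1: EXECUTOR_SHUTDOWN_TIMEOUT_SECONDS elapses first
  ExecutorLostReported     // case 2: the isolation module reports the lost executor first
};

struct Outcome {
  bool slaveSendsTaskLost;         // does the slave send TASK_LOST to the master?
  bool frameworkReceivesTaskLost;  // does the dead framework ever receive it?
};

// Encodes the two cases from the reply as a lookup.
Outcome shutdownOutcome(FirstEvent first) {
  Outcome outcome;
  // Only the timeout path (case 1) makes the slave send TASK_LOST.
  outcome.slaveSendsTaskLost = (first == FirstEvent::ShutdownTimeoutElapsed);
  // The master drops updates for a dead framework, so nothing is forwarded
  // in either case.
  outcome.frameworkReceivesTaskLost = false;
  return outcome;
}
```

The point of the table is that the two cases differ only on the slave-to-master hop. From the framework's side the result is the same.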