Date: Mon, 20 Oct 2014 16:28:36 +0000 (UTC)
From: "Timothy Chen (JIRA)"
To: issues@mesos.apache.org
Reply-To: dev@mesos.apache.org
Subject: [jira] [Resolved] (MESOS-1824) when "docker ps -a" returns 400+ lines enabling docker containerizer results in all executors dying

     [ https://issues.apache.org/jira/browse/MESOS-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Chen resolved MESOS-1824.
---------------------------------

    Resolution: Fixed

> when "docker ps -a" returns 400+ lines enabling docker containerizer results in all executors dying
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-1824
>                 URL: https://issues.apache.org/jira/browse/MESOS-1824
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Jay Buffington
>            Assignee: Timothy Chen
>
> To reproduce:
> # run this one-liner on your slave to create 400 exited docker containers:
> {noformat}
> for i in `seq 1 400`; do docker run busybox:latest echo "hello" ; done;
> {noformat}
> # Start mesos-slave with only mesos containerizer enabled
> # Launch tasks that use an executor (which uses libmesos)
> # Restart mesos-slave process with --containerizer=docker,mesos
> # See mesos-slave fork "docker ps -a" and never return
> # Note that this mesos-slave never reregisters with master
> # Wait at least 10 minutes and see executors commit suicide, which kills all of the tasks on your system. From executor log:
> {noformat}
> I0919 21:24:14.018127 21778 exec.cpp:379] Executor asked to shutdown
> I0919 21:24:14.018812 21771 exec.cpp:78] Scheduling shutdown of the executor
> I0919 21:24:14.020514 21778 exec.cpp:394] Executor::shutdown took 1.866382ms
> I0919 21:24:16.000500 21771 exec.cpp:525] Executor sending status update TASK_KILLED (UUID: bfd3969c-ad0a-455a-93fe-06c37bdee513) for task 1411160025479-another-task-0-b5e24381-3353-43d4-9587-ffef9ccf2f38 of framework 20140814-221057-1208029356-5050-10525-0000
> I0919 21:24:16.030253 21772 exec.cpp:332] Ignoring status update acknowledgement bfd3969c-ad0a-455a-93fe-06c37bdee513 for task 1411160025479-another-task-0-b5e24381-3353-43d4-9587-ffef9ccf2f38 of framework 20140814-221057-1208029356-5050-10525-0000 because the driver is aborted!
> I0919 21:24:19.021966 21778 exec.cpp:86] Committing suicide by killing the process group
> {noformat}
> # mesos-slave fails to tell the master about the tasks being killed, with this message in the log:
> {noformat}
> W0918 01:02:57.252231 11725 status_update_manager.cpp:381] Not forwarding status update TASK_KILLED (UUID: 6fbacbcf-ad0f-4e89-89ee-e9f88a618573) for task 1410298578043-some-task-30-29279377-fdf2-4bb7-b862-852adddea09c of framework 20140522-213145-1749004561-5050-29512-0000 because no master is elected yet
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
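
Before restarting mesos-slave with the docker containerizer enabled, it may help to check whether a host is already in the state the reproduction steps create. The sketch below is illustrative only: `listing_is_large` is a hypothetical helper, not part of Mesos or Docker, and the default of 400 lines comes from the issue title rather than from any documented limit. It reads the listing from stdin, so it can be tried without Docker installed.

```shell
# Hypothetical helper (not part of Mesos or Docker): succeeds when the
# listing on stdin has at least the given number of lines.
# The default of 400 is taken from the issue title, not a documented limit.
listing_is_large() {
  threshold="${1:-400}"
  # wc -l counts every line, including the header row that
  # "docker ps -a" prints above the containers.
  lines=$(wc -l | tr -d ' ')
  [ "$lines" -ge "$threshold" ]
}

# Usage on a slave host (assumes the docker CLI is installed):
#   docker ps -a | listing_is_large 400 && echo "docker ps -a returns 400+ lines"
```

Because the header row is counted, 400 lines corresponds to 399 containers plus the header; adjust the threshold if the container count is what matters.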