Return-Path:
X-Original-To: apmail-aurora-issues-archive@minotaur.apache.org
Delivered-To: apmail-aurora-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 23F901866C for ; Mon, 22 Jun 2015 21:17:02 +0000 (UTC)
Received: (qmail 60608 invoked by uid 500); 22 Jun 2015 21:17:02 -0000
Delivered-To: apmail-aurora-issues-archive@aurora.apache.org
Received: (qmail 60567 invoked by uid 500); 22 Jun 2015 21:17:02 -0000
Mailing-List: contact issues-help@aurora.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@aurora.apache.org
Delivered-To: mailing list issues@aurora.apache.org
Received: (qmail 60558 invoked by uid 99); 22 Jun 2015 21:17:02 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 22 Jun 2015 21:17:02 +0000
Date: Mon, 22 Jun 2015 21:17:01 +0000 (UTC)
From: "Chris Lambert (JIRA)"
To: issues@aurora.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Updated] (AURORA-698) aurora executor _shutdown deadline calls should be daemonized
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

     [ https://issues.apache.org/jira/browse/AURORA-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Lambert updated AURORA-698:
---------------------------------
    Sprint: Twitter Aurora Q2'15 Sprint 3, Twitter Aurora Q2'15 Sprint 6  (was: Twitter Aurora Q2'15 Sprint 3)

> aurora executor _shutdown deadline calls should be daemonized
> -------------------------------------------------------------
>
>                 Key: AURORA-698
>                 URL: https://issues.apache.org/jira/browse/AURORA-698
>             Project: Aurora
>          Issue Type: Bug
>          Components: Executor
>            Reporter: brian wickman
>            Assignee: brian wickman
>
> In the aurora executor shutdown method, we have deadline() calls:
> {noformat}
>   def _shutdown(self, status_result):
>     runner_status = self._runner.status
>     try:
>       deadline(self._runner.stop, timeout=self.STOP_TIMEOUT)
>     except Timeout:
>       log.error('Failed to stop runner within deadline.')
>     try:
>       deadline(self._chained_checker.stop, timeout=self.STOP_TIMEOUT)
>     except Timeout:
>       log.error('Failed to stop all checkers within deadline.')
>     # If the runner was alive when _shutdown was called, defer to the status_result,
>     # otherwise the runner's terminal state is the preferred state.
>     exit_status = runner_status or status_result
>     self.send_update(
>         self._driver,
>         self._task_id,
>         exit_status.status,
>         status_result.reason)
>     self.terminated.set()
>     defer(self._driver.stop, delay=self.PERSISTENCE_WAIT)
> {noformat}
> However if runner.stop fails with a Timeout exception, the spawned AnonymousThread is not daemonized and causes the executor to fail to exit. This means that the cgroup will not be torn down and if the runner.stop actually failed, the process can stay alive even if TASK_KILLED was delivered.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
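
The failure mode described in the issue can be illustrated with a minimal, standard-library-only sketch. It is not the executor's code and not the helper the executor imports: the `deadline` function, the `Timeout` class, the `daemon` parameter, and the `stuck_stop` and `quick_stop` callables below are hypothetical stand-ins written for this example. The sketch assumes that a deadline helper runs the call in a worker thread and gives up waiting after `timeout` seconds. If that worker is not a daemon thread, a call that never returns keeps the interpreter alive after the timeout has been reported.

```python
import threading
import time


class Timeout(Exception):
    """Hypothetical stand-in: raised when the wrapped call misses its deadline."""


def deadline(fn, timeout, daemon=True):
    """Run fn in a worker thread and wait at most `timeout` seconds for it.

    With daemon=True an abandoned worker cannot keep the process alive.
    With daemon=False a hung fn blocks interpreter exit even after Timeout
    has been raised, which is the behavior the issue describes.
    """
    result = {}

    def target():
        result["value"] = fn()

    worker = threading.Thread(target=target)
    worker.daemon = daemon
    worker.start()
    worker.join(timeout)  # returns after `timeout` seconds even if fn is still running
    if worker.is_alive():
        raise Timeout("call did not finish within %.2fs" % timeout)
    return result.get("value")


def stuck_stop():
    # Hypothetical stand-in for a stop call that hangs.
    time.sleep(1.0)


def quick_stop():
    # Hypothetical stand-in for a stop call that returns promptly.
    return "stopped"


try:
    deadline(stuck_stop, timeout=0.1)
    timed_out = False
except Timeout:
    timed_out = True
```

In this sketch the abandoned worker is a daemon thread, so the process can exit as soon as the main thread finishes. Passing `daemon=False` would make the interpreter wait for `stuck_stop` to return before exiting, even though `Timeout` had already been raised and logged.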