Date: Wed, 16 Aug 2017 19:20:00 +0000 (UTC)
From: "Benjamin Mahler (JIRA)"
To: issues@mesos.apache.org
Reply-To: dev@mesos.apache.org
Subject: [jira] [Updated] (MESOS-7783) Framework might not receive status update when a just launched task is killed immediately

     [ https://issues.apache.org/jira/browse/MESOS-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-7783:
-----------------------------------
    Priority: Critical  (was: Blocker)

> Framework might not receive status update when a just launched task is killed immediately
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-7783
>                 URL: https://issues.apache.org/jira/browse/MESOS-7783
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.2.0
>            Reporter: Benjamin Bannier
>            Assignee: Benjamin Mahler
>            Priority: Critical
>              Labels: reliability
>         Attachments: GroupDeployIntegrationTest.log.zip, logs
>
>
> Our Marathon team are seeing issues in their integration test suite where Marathon gets stuck in an infinite loop trying to kill a just-launched task. In their test, a task is launched and then immediately killed; the framework does not, for example, wait for any task status update first.
> In this case the launch and kill messages arrive at the agent in the correct order, but neither the launch path nor the kill path in the agent reaches the point where a status update is sent to the framework. Since the framework has seen no status update for the task, it re-triggers the kill, causing an infinite loop.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
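
The failure mode described in the report can be illustrated with a toy model. This is only a sketch under stated assumptions, not Mesos or Marathon code: `ToyAgent`, `send_update_on_early_kill`, and `kill_until_acknowledged` are hypothetical names, the agent is reduced to a set of pending tasks, and the framework's retry loop is capped at `max_attempts` so the example terminates. `TASK_KILLED` is used as the terminal state the framework would be waiting for.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent; NOT the Mesos implementation."""

    # Assumption: a switch between the reported behavior (False) and a
    # behavior where an early kill still produces a status update (True).
    send_update_on_early_kill: bool
    pending: set = field(default_factory=set)    # launched, not yet running
    updates: list = field(default_factory=list)  # updates sent to the framework

    def launch(self, task_id: str) -> None:
        # The launch is only queued here; no status update is sent yet.
        self.pending.add(task_id)

    def kill(self, task_id: str) -> None:
        if task_id not in self.pending:
            # Assumption: a kill for a task the agent no longer tracks is
            # silently ignored in this toy model.
            return
        # The kill arrives while the launch is still in flight.
        self.pending.discard(task_id)
        if self.send_update_on_early_kill:
            self.updates.append((task_id, "TASK_KILLED"))
        # Otherwise neither path reports anything back to the framework.


def kill_until_acknowledged(agent: ToyAgent, task_id: str, max_attempts: int = 5):
    """Launch a task, then retry the kill until a status update is seen.

    Returns the number of kill attempts used, or None if no update arrived
    within max_attempts (the capped equivalent of the infinite loop).
    """
    agent.launch(task_id)
    for attempt in range(1, max_attempts + 1):
        agent.kill(task_id)
        if any(seen_id == task_id for seen_id, _state in agent.updates):
            return attempt
    return None
```

With `send_update_on_early_kill=False` the loop never observes an update and gives up after `max_attempts`, which corresponds to the unbounded retry seen in the integration tests. With `True`, the first kill is acknowledged. How the real agent should produce that update is outside what this sketch claims.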