Date: Thu, 9 Feb 2017 23:05:41 +0000 (UTC)
From: "Billie Rinaldi (JIRA)"
To: yarn-dev@hadoop.apache.org
Subject: [jira] [Created] (YARN-6168) Restarted RM may not inform AM about all existing containers

Billie Rinaldi created YARN-6168:
------------------------------------

             Summary: Restarted RM may not inform AM about all existing containers
                 Key: YARN-6168
                 URL: https://issues.apache.org/jira/browse/YARN-6168
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Billie Rinaldi

There appears to be a race condition when an RM is restarted. I had a situation where the RMs and AM were down, but NMs and app containers were still running. When I restarted the RM, the AM restarted, registered with the RM, and received its list of existing containers before the NMs had reported all of their containers to the RM. The AM was only told about some of the app's existing containers.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)