Date: Thu, 9 Nov 2017 23:10:00 +0000 (UTC)
From: "Chandni Singh (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-6168) Restarted RM may not inform AM about all existing containers

    [ https://issues.apache.org/jira/browse/YARN-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246744#comment-16246744 ]

Chandni Singh commented on YARN-6168:
-------------------------------------

The default value of {{nmExpiryInterval}} is 10 minutes. That will be too long for apps to recover and also this time cannot be influenced by any app setting. So, I prefer the solution proposed by [~jianhe]. Please let me know your thoughts.

> Restarted RM may not inform AM about all existing containers
> ------------------------------------------------------------
>
> Key: YARN-6168
> URL: https://issues.apache.org/jira/browse/YARN-6168
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Billie Rinaldi
> Assignee: Chandni Singh
>
> There appears to be a race condition when an RM is restarted. I had a situation where the RMs and AM were down, but NMs and app containers were still running. When I restarted the RM, the AM restarted, registered with the RM, and received its list of existing containers before the NMs had reported all of their containers to the RM. The AM was only told about some of the app's existing containers.
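
The race described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyResourceManager`, `node_heartbeat`, `register_application_master`, and `simulate` are hypothetical names invented for illustration and are not YARN APIs. The model assumes the restarted RM knows only the containers that NMs have reported since the restart, and that the AM receives a one-time snapshot when it registers.

```python
# Toy model of the race in YARN-6168. All class and function names here are
# hypothetical; none of them are YARN APIs.


class ToyResourceManager:
    """Knows only what NodeManagers have reported since the last restart."""

    def __init__(self):
        self.known_containers = {}  # app_id -> set of container ids

    def node_heartbeat(self, containers_by_app):
        # An NM reports the containers it is still running.
        for app_id, containers in containers_by_app.items():
            self.known_containers.setdefault(app_id, set()).update(containers)

    def register_application_master(self, app_id):
        # The AM receives a snapshot of whatever the RM knows right now.
        return set(self.known_containers.get(app_id, set()))


def simulate(events):
    """Replay events in order and return the container set handed to the AM."""
    rm = ToyResourceManager()
    am_view = None
    for kind, payload in events:
        if kind == "nm_report":
            rm.node_heartbeat(payload)
        elif kind == "am_register":
            am_view = rm.register_application_master(payload)
    return am_view


nm1 = {"app_1": {"c1", "c2"}}
nm2 = {"app_1": {"c3"}}

# The AM registers after only one NM has reported, so it never hears of c3.
racy = simulate([("nm_report", nm1), ("am_register", "app_1"), ("nm_report", nm2)])

# The AM registers after both NMs have reported, so it sees every container.
settled = simulate([("nm_report", nm1), ("nm_report", nm2), ("am_register", "app_1")])

print(sorted(racy))
print(sorted(settled))
```

In this model the only difference between the two runs is the ordering of the AM registration relative to the NM reports. Waiting for a fixed interval before answering the AM would close the gap, but as the comment above notes, a 10-minute wait is too long for apps to recover.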