Date: Tue, 22 Dec 2015 07:56:46 +0000 (UTC)
From: "Rohith Sharma K S (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-4373) Jobs can be temporarily forgotten during recovery

    [ https://issues.apache.org/jira/browse/YARN-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067712#comment-15067712 ]

Rohith Sharma K S commented on YARN-4373:
-----------------------------------------

I am seriously surprised by this issue, in which applications cannot be found in {{rmcontext}} during recovery. Currently, the active services are not started until recovery finishes, which means none of the ports are open to listen.
Once applications are recovered, whether completed or running, they are added to {{rmcontext}}. Could you provide the full RM logs for this issue?

> Jobs can be temporarily forgotten during recovery
> -------------------------------------------------
>
>                 Key: YARN-4373
>                 URL: https://issues.apache.org/jira/browse/YARN-4373
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>
> The RM becomes available to service requests before state store recovery is started. Before and during recovery, a client can request an application report for a running application, and the RM will respond that the application is unknown.
> I'm seeing this issue with Oozie during an RM failover. Until the active RM finishes recovery, it reports erroneous information to Oozie, which doesn't have the context to know that it should just try again later.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)