Date: Tue, 3 Nov 2015 17:28:27 +0000 (UTC)
From: "Vinod Kumar Vavilapalli (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-4325) purge app state from NM state-store should be independent of log aggregation

    [ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987678#comment-14987678 ]

Vinod Kumar Vavilapalli commented on YARN-4325:
-----------------------------------------------

[~djp], the JIRA is a little light on details; it would help if you could paste exception / log messages etc. Also, does this only happen with misconfiguration? And are you planning to work on this soon?
If not, I'd not hold 2.7.2 off for this.

> purge app state from NM state-store should be independent of log aggregation
> ----------------------------------------------------------------------------
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
>
> From a long-running cluster, we found tens of thousands of stale apps still being recovered during NM restart recovery. The cause is a misconfigured log aggregation setting: the end-of-log-aggregation events are never received, so stale apps are not purged properly. We should make sure the removal of app state is independent of the log aggregation life cycle.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)