Date: Mon, 2 Nov 2015 18:57:27 +0000 (UTC)
From: "Junping Du (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Created] (YARN-4325) purge app state from NM state-store should be independent of log aggregation

Junping Du created YARN-4325:
--------------------------------

             Summary: purge app state from NM state-store should be independent of log aggregation
                 Key: YARN-4325
                 URL: https://issues.apache.org/jira/browse/YARN-4325
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Junping Du
            Assignee: Junping Du
            Priority: Critical

On a long-running cluster, we found tens of thousands of stale apps still being recovered during NM restart recovery.
The cause is a misconfigured log aggregation setting: the end-of-log-aggregation events are never received, so the stale apps are never purged. We should make the removal of app state independent of the log aggregation life cycle.
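
The proposed decoupling can be illustrated with a minimal sketch. This is not the NodeManager code and not the actual patch: `InMemoryAppStateStore`, `AppEvent`, and `AppLifecycleHandler` are hypothetical names invented for illustration, and the sketch assumes that purging on application finish is acceptable (it ignores whether pending log aggregation would still need the app state after a restart).

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the NM state store; not the real YARN API. */
class InMemoryAppStateStore {
    private final Map<String, String> apps = new ConcurrentHashMap<>();

    void storeApplication(String appId, String state) {
        apps.put(appId, state);
    }

    void removeApplication(String appId) {
        apps.remove(appId);
    }

    /** Apps that would be recovered on a restart. */
    Set<String> recoverableApps() {
        return new HashSet<>(apps.keySet());
    }
}

/** Hypothetical event types, named for illustration only. */
enum AppEvent { APP_FINISHED, LOG_AGGREGATION_FINISHED }

class AppLifecycleHandler {
    private final InMemoryAppStateStore store;

    AppLifecycleHandler(InMemoryAppStateStore store) {
        this.store = store;
    }

    void handle(String appId, AppEvent event) {
        switch (event) {
            case APP_FINISHED:
                // Purge here, so a missing log aggregation event
                // cannot leave stale state behind.
                store.removeApplication(appId);
                break;
            case LOG_AGGREGATION_FINISHED:
                // Nothing to purge: the state store no longer depends on this event.
                break;
        }
    }
}

class PurgeSketch {
    public static void main(String[] args) {
        InMemoryAppStateStore store = new InMemoryAppStateStore();
        AppLifecycleHandler handler = new AppLifecycleHandler(store);

        store.storeApplication("application_0001", "RUNNING");
        // The log aggregation event never arrives (e.g. misconfiguration),
        // but the app state is still purged when the app finishes.
        handler.handle("application_0001", AppEvent.APP_FINISHED);

        System.out.println("Recoverable apps: " + store.recoverableApps().size());
    }
}
```

In this sketch the state store is touched only by the application-finished path, so a log aggregation event that never arrives has no effect on what is recovered after a restart.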