Date: Tue, 12 Apr 2016 02:07:25 +0000 (UTC)
From: "Robert Kanter (JIRA)"
To: yarn-dev@hadoop.apache.org
Subject: [jira] [Created] (YARN-4946) RM should write out Aggregated Log Completion file flag next to logs

Robert Kanter created YARN-4946:
-----------------------------------

Summary: RM should write out Aggregated Log Completion file flag next to logs
Key: YARN-4946
URL: https://issues.apache.org/jira/browse/YARN-4946
Project: Hadoop YARN
Issue Type: Improvement
Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Haibo Chen

MAPREDUCE-6415 added a tool that combines the aggregated log files for each Yarn App into a HAR file.
When run, the tool seeds its list of candidate apps by scanning the aggregated logs directory, and then filters out ineligible apps. One of the criteria is a check with the RM that an Application's log aggregation is neither still running nor failed. When the RM "forgets" about an older completed Application (e.g. after an RM failover, or once enough time has passed), the tool won't find the Application in the RM and will simply assume that its log aggregation succeeded, even if it actually failed or is still running.

We can solve this problem by doing the following:
# When the RM sees that an Application has successfully finished aggregating its logs, it will write a flag file next to that Application's log files.
# The tool no longer talks to the RM at all. When scanning the FileSystem, it uses that flag file to decide whether to process an Application's log files: if the file is there, it archives them; otherwise, it skips them.
# As part of the archiving process, the tool deletes the flag file.
# (If you don't run the tool, the flag file will eventually be cleaned up by the JHS when it cleans up the aggregated logs, because it lives in the same directory.)

This improvement has the following advantages:
# The edge case about "forgotten" Applications is fixed.
# The tool no longer has to talk to the RM; it only has to consult HDFS, which is simpler.
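The proposed flag-file protocol can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual YARN or archiving-tool code: it uses java.nio.file on a local directory in place of Hadoop's FileSystem/HDFS, assumes each Application's logs live in their own subdirectory under a single log root, and the flag name LOG_AGGREGATION_SUCCEEDED, the class FlagFileSketch, and the archiver callback (standing in for building the HAR) are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

final class FlagFileSketch {
    // Hypothetical flag file name; the issue does not specify one.
    static final String FLAG_NAME = "LOG_AGGREGATION_SUCCEEDED";

    // RM side: once an app's logs are fully aggregated, drop an empty
    // flag file into that app's log directory. Repeated calls are harmless.
    static void markAggregationSucceeded(Path appLogDir) throws IOException {
        try {
            Files.createFile(appLogDir.resolve(FLAG_NAME));
        } catch (FileAlreadyExistsException e) {
            // Already marked; nothing to do.
        }
    }

    // Tool side: seed candidates from the file system only (no RM call),
    // keeping just the app directories that contain the flag file.
    static List<Path> findEligibleApps(Path logRoot) throws IOException {
        List<Path> eligible = new ArrayList<>();
        try (Stream<Path> apps = Files.list(logRoot)) {
            apps.filter(Files::isDirectory)
                .filter(dir -> Files.exists(dir.resolve(FLAG_NAME)))
                .forEach(eligible::add);
        }
        Collections.sort(eligible);
        return eligible;
    }

    // Tool side: archive the app's logs first, then delete the flag so the
    // app is not picked up again. If the archiver throws, the flag stays.
    static void archiveAndClearFlag(Path appLogDir, Consumer<Path> archiver)
            throws IOException {
        archiver.accept(appLogDir);
        Files.delete(appLogDir.resolve(FLAG_NAME));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("agg-logs");
        // Hypothetical app directories: one finished, one still aggregating.
        Path finished = Files.createDirectory(root.resolve("application_1_0001"));
        Files.createDirectory(root.resolve("application_1_0002"));
        markAggregationSucceeded(finished);

        for (Path app : findEligibleApps(root)) {
            archiveAndClearFlag(app, dir -> System.out.println("archiving " + dir.getFileName()));
        }
        System.out.println("eligible after run: " + findEligibleApps(root).size());
    }
}
```

Running main prints "archiving application_1_0001" followed by "eligible after run: 0"; the unflagged application is never touched. Deleting the flag only after the archiver returns means an archiving failure leaves the flag in place, so a later run can retry that Application.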