Date: Wed, 27 Mar 2013 00:39:16 +0000 (UTC)
From: "Sandy Ryza (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-499) On container failure, include last n lines of logs in diagnostics

    [ https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614738#comment-13614738 ]

Sandy Ryza commented on YARN-499:
---------------------------------

Ravi,
The idea of putting the app master in a big try/catch seems good to me, but I was envisioning this JIRA to encompass something more general that would handle non-AM container logs, containers that OOM before getting into the main function, and containers that don't run Java.
It's true that the approach I outlined doesn't deterministically report exceptions, but it at least gets us back to parity with MR1, and I believe that in most cases (and in all cases that I've seen), the end of the log contains the helpful information.

> On container failure, include last n lines of logs in diagnostics
> -----------------------------------------------------------------
>
>                 Key: YARN-499
>                 URL: https://issues.apache.org/jira/browse/YARN-499
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> When a container fails, the only way to diagnose it is to look at the logs. ContainerStatuses include a diagnostic string that is reported back to the resource manager by the node manager.
> Currently in MR2 I believe whatever is sent to the task's standard out is added to the diagnostics string, but for MR, standard out is redirected to a file called stdout. In MR1, this string was populated with the last few lines of the task's stdout file, and got printed to the console, allowing for easy debugging.
> Handling this would help to soothe the infuriating problem of an AM dying for a mysterious reason before setting a tracking URL (MAPREDUCE-3688).
> This could be done in one of two ways.
> * Use tee to send MR's standard out to both the stdout file and standard out. This requires modifying ShellCmdExecutor to roll what it reads in, as we wouldn't want to be storing the entire task log in NM memory.
> * Read the task's log files. This would require standardizing the container log files or making them configurable. Right now the log files are determined in userland, and all that YARN is aware of is the log directory.
> Does this present any issues I'm not considering? If so, this might only be needed for AMs?
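The two options in the issue description (rolling what is read from a tee'd stream so only the tail stays in memory, or reading the end of the task's log file) can be sketched roughly as follows. This is only an illustration in Python, not NodeManager code: the function names `tail_of_stream` and `tail_of_file`, the `block_size` parameter, and the file name used in the usage note are all hypothetical, and a real change would presumably live in the Java code paths mentioned above.

```python
import os
from collections import deque
from typing import Deque, Iterable, List


def tail_of_stream(lines: Iterable[str], n: int) -> List[str]:
    """Option 1 (hypothetical sketch): roll the stream, keeping at most n lines."""
    if n <= 0:
        return []
    # A deque with maxlen drops the oldest line on every append past n,
    # so memory stays bounded no matter how long the task log is.
    window: Deque[str] = deque(maxlen=n)
    for line in lines:
        window.append(line.rstrip("\n"))
    return list(window)


def tail_of_file(path: str, n: int, block_size: int = 4096) -> List[str]:
    """Option 2 (hypothetical sketch): read the last n lines of a log file."""
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Walk backwards in blocks until more than n newlines have been seen
        # (so the first of the last n lines is complete) or the start is hit.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # Undecodable bytes are replaced rather than raising, since the result
    # is only meant for a human-readable diagnostics string.
    return data.decode("utf-8", errors="replace").splitlines()[-n:]
```

For example, `"\n".join(tail_of_file("stdout", 20))` would produce a string that could be appended to the diagnostics, assuming the container's log file is named `stdout` and the caller knows its path. The second option never holds more than a few blocks in memory, but as the description notes, it depends on YARN knowing which file to read.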