Date: Wed, 28 Sep 2016 14:32:20 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event

[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529829#comment-15529829 ]

Jason Lowe commented on MAPREDUCE-6771:
---------------------------------------

+1 for the latest patch.
This seems like an important fix to get into 2.8 as well. Could you provide a patch for branch-2.8?

> RMContainerAllocator sends container diagnostics event after corresponding completion event
> -------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6771
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.7.3
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: TaUnsuccessfullyEventEmission.jpg, mapreduce6771.001.patch, mapreduce6771.002.patch, mapreduce6771.003.patch, mapreduce6771.004.patch
>
>
> Task containers can exceed their resource limits and be killed by the NodeManager. The MR AM is then notified of the container status and diagnostics through its heartbeat with the RM. However, the diagnostics may never reach the .jhist file, so when the job completes, the diagnostics for the failed task attempts are empty. This makes it hard for users to root-cause job failures, which are often caused by memory leaks.
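The ordering problem named in the issue title can be illustrated with a small, self-contained Java model. This is a hypothetical simplification, not Hadoop's `RMContainerAllocator` or `TaskAttemptImpl` code: `Attempt`, `simulate`, and the in-memory `historyLog` standing in for the .jhist file are invented for illustration, and the sketch assumes the history record for a failed attempt is written exactly once, when the completion event is handled.

```java
import java.util.ArrayList;
import java.util.List;

public class DiagnosticsOrderingSketch {

    // Hypothetical stand-in for a task attempt; NOT Hadoop's TaskAttemptImpl.
    static final class Attempt {
        private final StringBuilder diagnostics = new StringBuilder();
        private final List<String> historyLog; // stands in for the .jhist file
        private boolean completed = false;

        Attempt(List<String> historyLog) {
            this.historyLog = historyLog;
        }

        // Diagnostics are accumulated in memory whenever they arrive.
        void onDiagnosticsUpdate(String text) {
            diagnostics.append(text);
        }

        // Assumption: on completion, one history record is written that
        // snapshots whatever diagnostics have been received so far.
        void onContainerCompleted() {
            if (completed) {
                return;
            }
            completed = true;
            historyLog.add("ATTEMPT_FAILED diagnostics=[" + diagnostics + "]");
        }
    }

    // Delivers the two events in the chosen order and returns the single
    // history record that was written for the attempt.
    static String simulate(boolean diagnosticsFirst, String diag) {
        List<String> history = new ArrayList<>();
        Attempt attempt = new Attempt(history);
        if (diagnosticsFirst) {
            attempt.onDiagnosticsUpdate(diag);
            attempt.onContainerCompleted();
        } else {
            attempt.onContainerCompleted();
            attempt.onDiagnosticsUpdate(diag); // arrives too late for history
        }
        return history.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical diagnostics text, not a real NodeManager message.
        String diag = "example: container killed for exceeding its memory limit";
        System.out.println("completion first:  " + simulate(false, diag));
        System.out.println("diagnostics first: " + simulate(true, diag));
    }
}
```

In this model, handling the completion event first leaves the recorded diagnostics empty, while delivering the diagnostics event first carries them into the history record, which suggests why emitting diagnostics before the completion event would avoid the loss described above.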