Date: Thu, 14 Apr 2016 19:05:25 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up

[ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241734#comment-15241734 ]

Jason Lowe commented on YARN-4924:
----------------------------------

bq. leveldbIterator may also throws DBException, yes?

Yes, if the constructor throws. That's a bug in LeveldbIterator, since the whole point of that class is to wrap the underlying iterators and translate the runtime DBExceptions into IOExceptions.
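
To illustrate the wrapping idea, here is a minimal sketch. It is not the actual LeveldbIterator code: StoreException, RawIteratorSource and TranslatingIterator are hypothetical stand-ins, with StoreException playing the role of the unchecked DBException. The point is that the constructor needs the same translation as hasNext() and next(), because opening the underlying iterator can throw too.

```java
import java.io.IOException;
import java.util.Iterator;

// Hypothetical stand-in for an unchecked store exception such as DBException.
class StoreException extends RuntimeException {
    StoreException(String message) {
        super(message);
    }
}

// Hypothetical source of the underlying iterator; opening it may itself throw.
interface RawIteratorSource<T> {
    Iterator<T> open();
}

// Wrapper that turns unchecked StoreExceptions into checked IOExceptions,
// including failures that happen while the wrapper is being constructed.
class TranslatingIterator<T> {
    private final Iterator<T> delegate;

    TranslatingIterator(RawIteratorSource<T> source) throws IOException {
        Iterator<T> opened;
        try {
            opened = source.open();
        } catch (StoreException e) {
            // Without this catch, the runtime exception would leak to callers.
            throw new IOException(e.getMessage(), e);
        }
        this.delegate = opened;
    }

    boolean hasNext() throws IOException {
        try {
            return delegate.hasNext();
        } catch (StoreException e) {
            throw new IOException(e.getMessage(), e);
        }
    }

    T next() throws IOException {
        try {
            return delegate.next();
        } catch (StoreException e) {
            throw new IOException(e.getMessage(), e);
        }
    }
}
```

With a wrapper like this, callers only need to handle IOException, whether the failure happens when the iterator is opened or while it is being advanced.
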
Arguably we should do the same for DB so clients don't have to keep catching and translating DBException, but that's for another JIRA.

+1 for the latest patch. Committing this.

> NM recovery race can lead to container not cleaned up
> -----------------------------------------------------
>
>                 Key: YARN-4924
>                 URL: https://issues.apache.org/jira/browse/YARN-4924
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: sandflee
>         Attachments: YARN-4924.01.patch, YARN-4924.02.patch, YARN-4924.03.patch, YARN-4924.04.patch, YARN-4924.05.patch
>
>
> It's probably a small window but we observed a case where the NM crashed and then a container was not properly cleaned up during recovery.
> I will add details in first comment.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)