Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Thu, 14 Apr 2016 19:05:25 +0000 (UTC)
From: "Jason Lowe (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12956247.1459883508000.233592.1460660725567@Atlassian.JIRA>
In-Reply-To: <JIRA.12956247.1459883508000@Atlassian.JIRA>
References: <JIRA.12956247.1459883508000@Atlassian.JIRA>
 <JIRA.12956247.1459883508807@arcas>
Subject: [jira] [Commented] (YARN-4924) NM recovery race can lead to
 container not cleaned up
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241734#comment-15241734 ] 

Jason Lowe commented on YARN-4924:
----------------------------------

bq. leveldbIterator may also throw DBException, yes?
Yes, if the constructor throws.  That's a bug in LeveldbIterator, since the whole point of that class is to wrap the underlying iterators and translate the runtime DBExceptions into IOExceptions.  Arguably we should do the same for DB so clients don't have to keep catching and translating DBException, but that's for another JIRA.
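
To illustrate the point about the constructor, here is a minimal sketch of a wrapper that translates exceptions both while opening the underlying iterator and while iterating. It is not the actual LeveldbIterator code: the class names, the Supplier-based constructor, and the local DBException are hypothetical stand-ins so the example compiles without leveldb on the classpath.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class TranslatingIteratorSketch {

  /** Hypothetical stand-in for leveldb's unchecked DBException. */
  static class DBException extends RuntimeException {
    DBException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical wrapper: callers only ever see checked IOExceptions. */
  static class TranslatingIterator<T> {
    private final Iterator<T> delegate;

    // The delegate is opened inside the constructor's try block, so a
    // failure while creating it is translated as well, not only failures
    // raised later by hasNext() or next().
    TranslatingIterator(Supplier<Iterator<T>> opener) throws IOException {
      try {
        this.delegate = opener.get();
      } catch (DBException e) {
        throw new IOException(e.getMessage(), e);
      }
    }

    boolean hasNext() throws IOException {
      try {
        return delegate.hasNext();
      } catch (DBException e) {
        throw new IOException(e.getMessage(), e);
      }
    }

    T next() throws IOException {
      try {
        return delegate.next();
      } catch (DBException e) {
        throw new IOException(e.getMessage(), e);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Normal case: the wrapper simply forwards to the delegate.
    TranslatingIterator<String> ok =
        new TranslatingIterator<>(() -> Arrays.asList("a", "b").iterator());
    while (ok.hasNext()) {
      System.out.println(ok.next());
    }

    // Failure while opening: the caller catches IOException only.
    try {
      new TranslatingIterator<String>(() -> {
        throw new DBException("simulated open failure");
      });
    } catch (IOException e) {
      System.out.println("translated: " + e.getMessage());
    }
  }
}
```

With this shape, recovery code that iterates the state store needs only a single catch for IOException, whether the failure happens at open time or mid-iteration.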

+1 for the latest patch.  Committing this.

> NM recovery race can lead to container not cleaned up
> -----------------------------------------------------
>
>                 Key: YARN-4924
>                 URL: https://issues.apache.org/jira/browse/YARN-4924
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: sandflee
>         Attachments: YARN-4924.01.patch, YARN-4924.02.patch, YARN-4924.03.patch, YARN-4924.04.patch, YARN-4924.05.patch
>
>
> It's probably a small window but we observed a case where the NM crashed and then a container was not properly cleaned up during recovery.
> I will add details in the first comment.

