hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up
Date Thu, 14 Apr 2016 16:11:25 GMT

    [ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241431#comment-15241431 ]

Jason Lowe commented on YARN-4924:

Thanks for updating the patch!

If createWriteBatch ever throws the runtime DBException, we want it translated to an
IOException so the exception does not bubble up and become fatal to the NM. Therefore the
createWriteBatch call needs to be inside the inner try block that translates DBException to
IOException. The sample code I wrote above should cover these cases.
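A minimal sketch of the pattern described above, in Java. It is not the patch or the sample code referred to in the comment: `DB`, `WriteBatch` and `DBException` are hypothetical stand-ins modeled loosely on the org.iq80.leveldb API so the example compiles on its own, and `storeContainer` and its key names are invented for illustration. The point is that `createWriteBatch()` sits inside the same try that converts the unchecked `DBException` into a checked `IOException`, so a failure while creating the batch is handled like any other store failure instead of escaping to the NM.

```java
import java.io.IOException;

class RecoveryStoreSketch {

  /** Hypothetical stand-in for org.iq80.leveldb.DBException, which is unchecked. */
  static class DBException extends RuntimeException {
    DBException(String msg) { super(msg); }
  }

  /** Hypothetical minimal write batch; the real one must be closed to free native memory. */
  interface WriteBatch extends AutoCloseable {
    void put(String key, String value);
    @Override void close();
  }

  /** Hypothetical minimal DB interface, loosely modeled on org.iq80.leveldb.DB. */
  interface DB {
    WriteBatch createWriteBatch();
    void write(WriteBatch batch);
  }

  /**
   * Hypothetical store method: writes two keys for a container in one batch.
   * Every leveldb call, including createWriteBatch(), is inside the try that
   * translates DBException to IOException.
   */
  static void storeContainer(DB db, String containerId) throws IOException {
    try {
      WriteBatch batch = db.createWriteBatch(); // may throw DBException
      try {
        batch.put(containerId + "/request", "request-bytes");
        batch.put(containerId + "/state", "launched");
        db.write(batch);
      } finally {
        batch.close(); // always release the batch, even if write() failed
      }
    } catch (DBException e) {
      // Surface a checked IOException that callers handle, instead of a fatal runtime error
      throw new IOException(e);
    }
  }
}
```

If `createWriteBatch()` were called before the outer `try`, a `DBException` thrown there would skip the `catch` entirely and propagate as an unchecked exception, which is exactly the case the comment asks to close.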

> NM recovery race can lead to container not cleaned up
> -----------------------------------------------------
>                 Key: YARN-4924
>                 URL: https://issues.apache.org/jira/browse/YARN-4924
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: sandflee
>         Attachments: YARN-4924.01.patch, YARN-4924.02.patch, YARN-4924.03.patch, YARN-4924.04.patch
> It's probably a small window, but we observed a case where the NM crashed and then a container was not properly cleaned up during recovery.
> I will add details in first comment.

This message was sent by Atlassian JIRA
