Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Tue, 8 Mar 2016 01:42:40 +0000 (UTC)
From: "Jun Gong (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12947782.1457381946000.27133.1457401360740@Atlassian.JIRA>
In-Reply-To: <JIRA.12947782.1457381946000@Atlassian.JIRA>
References: <JIRA.12947782.1457381946000@Atlassian.JIRA>
 <JIRA.12947782.1457381946708@arcas>
Subject: [jira] [Commented] (YARN-4770) Auto-restart of containers should
 work across NM restarts.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184195#comment-15184195 ] 

Jun Gong commented on YARN-4770:
--------------------------------

Thanks [~vinodkv] for reporting the issue. The patch in YARN-3998 should have handled this case.

{quote}
The relaunch feature needs to work across NM restarts, so we should save the retry-context and policy per container into the state-store and reload it for continue relaunching after NM restart.
{quote}
As [~vvasudev] said, "The container retry policy details are already stored in the state-store as part of the ContainerLaunchContext", so we do not need to handle that separately.
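
To make the quoted requirement concrete, here is a minimal sketch of saving a per-container retry context and reloading it after a restart. Every name in it ({{RetryContextSketch}}, {{RetryContext}}, {{StateStore}}) is hypothetical, and the in-memory map only stands in for a persistent store. It is not the NM state-store API and not the YARN-3998 patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are real YARN classes.
final class RetryContextSketch {

    /** Illustrative per-container retry state: the policy limit and the retries still allowed. */
    static final class RetryContext {
        final int maxRetries;
        final int remainingRetries;

        RetryContext(int maxRetries, int remainingRetries) {
            this.maxRetries = maxRetries;
            this.remainingRetries = remainingRetries;
        }

        /** Encode as "max:remaining" so the value can be written to a store. */
        String serialize() {
            return maxRetries + ":" + remainingRetries;
        }

        static RetryContext deserialize(String raw) {
            String[] parts = raw.split(":");
            return new RetryContext(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
        }

        boolean shouldRetry() {
            return remainingRetries > 0;
        }

        /** Return the context after one relaunch attempt has been consumed. */
        RetryContext afterRetry() {
            return new RetryContext(maxRetries, remainingRetries - 1);
        }
    }

    /** Stand-in for a persistent state store; a real one would write to disk. */
    static final class StateStore {
        private final Map<String, String> entries = new HashMap<>();

        void store(String containerId, RetryContext context) {
            entries.put(containerId, context.serialize());
        }

        /** Returns null when nothing was stored for the container. */
        RetryContext load(String containerId) {
            String raw = entries.get(containerId);
            return raw == null ? null : RetryContext.deserialize(raw);
        }
    }
}
```

The point of the sketch is that the remaining retry count, not only the policy, is what has to survive the restart; otherwise a reloaded container would start counting from the policy limit again.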

{quote}
We should also handle restarting of any containers that may have crashed during the NM reboot.
{quote}
If a container crashed during the NM reboot, it would transition to the RELAUNCHING state. I will check it again.
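
A rough sketch of the behaviour I expect on recovery, still to be verified against the actual code. Only the RELAUNCHING state name comes from the discussion above. The class, the method, the other enum values, and the rule that a non-zero exit code with retries left triggers a relaunch are assumptions for illustration.

```java
// Hypothetical sketch: not the NM container state machine.
final class RecoverySketch {

    // RELAUNCHING is the state named in this comment; RUNNING and DONE are illustrative.
    enum State { RUNNING, RELAUNCHING, DONE }

    /**
     * Decide the state of a recovered container after an NM restart.
     * Assumption: a container that exited with a non-zero code while the NM
     * was down is relaunched only if it still has retries left.
     */
    static State stateAfterNmRestart(boolean processAlive, int exitCode, int remainingRetries) {
        if (processAlive) {
            return State.RUNNING;
        }
        if (exitCode != 0 && remainingRetries > 0) {
            return State.RELAUNCHING;
        }
        return State.DONE;
    }
}
```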

> Auto-restart of containers should work across NM restarts.
> ----------------------------------------------------------
>
>                 Key: YARN-4770
>                 URL: https://issues.apache.org/jira/browse/YARN-4770
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>
> See my comment [here|https://issues.apache.org/jira/browse/YARN-3998?focusedCommentId=15133367&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133367] on YARN-3998. Need to take care of two things:
>  - The relaunch feature needs to work across NM restarts, so we should save the retry-context and policy per container into the state-store and reload it for continue relaunching after NM restart.
>  - We should also handle restarting of any containers that may have crashed during the NM reboot.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)