hadoop-yarn-issues mailing list archives

From "Jun Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4770) Auto-restart of containers should work across NM restarts.
Date Tue, 08 Mar 2016 01:42:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184195#comment-15184195 ]

Jun Gong commented on YARN-4770:
--------------------------------

Thanks [~vinodkv] for reporting the issue. The patch in YARN-3998 should have handled this
case.

{quote}
The relaunch feature needs to work across NM restarts, so we should save the retry-context
and policy per container into the state-store and reload it for continue relaunching after
NM restart.
{quote}
As [~vvasudev] said, "The container retry policy details are already stored in the state-store
as part of the ContainerLaunchContext", so we do not need to handle it separately.
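
To illustrate the point, here is a minimal sketch of why a policy embedded in the persisted launch context survives a restart. It is not YARN code: `FileStateStore`, `recover_retry_policies`, the JSON file, and the field names (`retry_policy`, `max_retries`, `retry_interval_ms`) are all hypothetical stand-ins chosen for the example.

```python
import json
import os
import tempfile


class FileStateStore:
    """Hypothetical stand-in for the NM state-store (not YARN's API)."""

    def __init__(self, path):
        self.path = path

    def save(self, container_id, launch_context):
        # Persist the whole launch context, retry policy included.
        state = self.load_all()
        state[container_id] = launch_context
        with open(self.path, "w") as f:
            json.dump(state, f)

    def load_all(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)


def recover_retry_policies(store):
    """Rebuild the per-container retry policies from the persisted contexts."""
    return {cid: ctx["retry_policy"] for cid, ctx in store.load_all().items()}


path = os.path.join(tempfile.mkdtemp(), "nm-state.json")
FileStateStore(path).save(
    "container_1",
    {
        "command": "run.sh",
        "retry_policy": {"max_retries": 3, "retry_interval_ms": 1000},
    },
)

# A fresh store object over the same file models the NM after a restart.
recovered = recover_retry_policies(FileStateStore(path))
```

Because the policy travels with the launch context, no separate write is needed for it in this sketch. Whether the remaining retry count also needs to be persisted is a separate question that the sketch does not address.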

{quote}
We should also handle restarting of any containers that may have crashed during the NM reboot.
{quote}
If a container crashed during the NM reboot, it would transition to the RELAUNCHING state. I
will check this again.
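
The expected behaviour can be sketched as a small decision function. This is only an assumption about how recovery might choose a state, not the NM's actual state machine: `state_after_recovery`, its parameters, and every state name other than RELAUNCHING are illustrative.

```python
def state_after_recovery(process_alive, exit_code, retries_left):
    """Pick a container state once the NM comes back up (hypothetical logic)."""
    if process_alive:
        # The container kept running while the NM was down.
        return "RUNNING"
    if exit_code == 0:
        return "EXITED_WITH_SUCCESS"
    if retries_left > 0:
        # Crashed during the NM reboot and the retry policy still allows a relaunch.
        return "RELAUNCHING"
    return "EXITED_WITH_FAILURE"


crashed_state = state_after_recovery(process_alive=False, exit_code=1, retries_left=2)
```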

> Auto-restart of containers should work across NM restarts.
> ----------------------------------------------------------
>
>                 Key: YARN-4770
>                 URL: https://issues.apache.org/jira/browse/YARN-4770
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>
> See my comment [here|https://issues.apache.org/jira/browse/YARN-3998?focusedCommentId=15133367&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133367] on YARN-3998. Need to take care of two things:
>  - The relaunch feature needs to work across NM restarts, so we should save the retry-context and policy per container into the state-store and reload it for continue relaunching after NM restart.
>  - We should also handle restarting of any containers that may have crashed during the NM reboot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
