hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
Date Wed, 31 Jul 2013 01:12:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724704#comment-13724704 ]

Zhijie Shen commented on YARN-966:
----------------------------------

bq. Also I think we are mixing YARN-966 with YARN-906. I don't see why we should return null... we can return an empty map if no resources are localized. Thoughts?

One more consideration: an empty map can also mean that the container is at LOCALIZED but
actually has no localized resources. Returning null distinguishes that case from the case
where the localized resources are fetched while the container is not at LOCALIZED.
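A minimal sketch of the null-versus-empty-map contract argued for above. It assumes simplified, hypothetical types (`SketchContainer`, `SketchContainerState`, and a `String`-to-`String` resource map) in place of the real NodeManager classes, and it is not the YARN-966 patch itself.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for ContainerState; the real enum differs.
enum SketchContainerState { NEW, LOCALIZING, LOCALIZED, RUNNING, KILLING }

// Hypothetical, simplified stand-in for ContainerImpl. The resource map uses
// String keys and values purely for illustration.
class SketchContainer {
    private final SketchContainerState state;
    private final Map<String, String> localized = new HashMap<>();

    SketchContainer(SketchContainerState state) {
        this.state = state;
    }

    void addLocalizedResource(String localPath, String linkName) {
        localized.put(localPath, linkName);
    }

    // Instead of asserting on the state, return null when the container is not
    // at LOCALIZED. An empty map then means "localized, but no resources",
    // while null means "asked while the container was in the wrong state".
    Map<String, String> getLocalizedResources() {
        if (state != SketchContainerState.LOCALIZED) {
            return null;
        }
        return Collections.unmodifiableMap(localized);
    }

    public static void main(String[] args) {
        SketchContainer killing = new SketchContainer(SketchContainerState.KILLING);
        SketchContainer noResources = new SketchContainer(SketchContainerState.LOCALIZED);
        System.out.println(killing.getLocalizedResources());     // prints "null"
        System.out.println(noResources.getLocalizedResources()); // prints "{}"
    }
}
```

Under this contract a caller such as `ContainerLaunch.call()` could check for `null` and report a failure, instead of relying on an assertion that may or may not be enabled.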
                
> The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-966
>                 URL: https://issues.apache.org/jira/browse/YARN-966
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>             Fix For: 2.1.1-beta
>
>         Attachments: YARN-966.1.patch
>
>
> In ContainerImpl.getLocalizedResources(), there's:
> {code}
> assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!!
> {code}
> ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled
> on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906),
> an AssertionError will be thrown, failing the thread without notifying the NM. As a result, the
> container never receives the further events that ContainerLaunch.call() is supposed to send, and
> cannot move towards completion.
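The silent failure described above can be reproduced with plain `java.util.concurrent`: an `Error` thrown inside a `Callable` submitted to an `ExecutorService` is captured in the returned `Future` and only surfaces if someone calls `get()`. The sketch assumes a hypothetical `launchBody` method standing in for `ContainerLaunch.call()` and a hypothetical `Consumer<String>` "dispatcher" standing in for the NM event dispatcher. The guarded variant is one possible way to turn the failure into an explicit signal, not necessarily how YARN-966 was resolved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class LaunchFailureSketch {

    // Hypothetical stand-in for the body of ContainerLaunch#call. The state
    // check throws AssertionError explicitly, because a plain `assert`
    // statement only fires when the JVM runs with -ea.
    static Integer launchBody(boolean localized) {
        if (!localized) {
            throw new AssertionError("container is not at LOCALIZED");
        }
        return 0;
    }

    // Guarded variant: any Throwable is reported through the (hypothetical)
    // dispatcher callback instead of silently ending the task.
    static Callable<Integer> guarded(boolean localized, Consumer<String> dispatcher) {
        return () -> {
            try {
                return launchBody(localized);
            } catch (Throwable t) {
                dispatcher.accept("LAUNCH_FAILED: " + t.getMessage());
                return -1;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Unguarded: the AssertionError is stored in the Future. Nothing is
            // logged and nobody is notified unless get() is called.
            Future<Integer> silent = pool.submit(() -> launchBody(false));
            try {
                silent.get();
            } catch (ExecutionException e) {
                System.out.println("only visible via get(): " + e.getCause());
            }

            // Guarded: the failure is turned into an explicit event.
            List<String> events = Collections.synchronizedList(new ArrayList<>());
            int code = pool.submit(guarded(false, events::add)).get();
            System.out.println(code + " " + events);
        } finally {
            pool.shutdown();
        }
    }
}
```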

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
