hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
Date Thu, 04 May 2017 15:27:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996925#comment-15996925
] 

Vrushali C commented on YARN-6555:
----------------------------------

I am seeing this in our code that's backported to branch-2 (2.6 with some backports).
But I believe this will happen on any branch since we have not yet written the code to read/write
flow context during NM restart/container recovery.

For example here is the relevant place where we would need to add it on branch YARN-5355:
https://github.com/apache/hadoop/blob/YARN-5355/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L387

Similarly on trunk:
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L387


> Enable flow context read (& corresponding write) for recovering application with
NM restart 
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-6555
>                 URL: https://issues.apache.org/jira/browse/YARN-6555
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>
> If timeline service v2 is enabled and NM is restarted with recovery enabled, then NM
fails to start and throws an error as  "flow context can't be null".
> This is happening because the flow context did not exist before but now that timeline
service v2 is enabled, ApplicationImpl expects it to exist. 
> This would also happen even if flow context existed before but since we are not persisting
it / reading it during ContainerManagerImpl#recoverApplication, it does not get passed in
to ApplicationImpl.
> full stack trace
> {code}
> 2017-05-03 21:51:52,178 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager:
Error starting NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:104)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:90)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message