hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
Date Thu, 04 May 2017 00:03:04 GMT

     [ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Vrushali C updated YARN-6555:
-----------------------------
    Description: 
If timeline service v2 is enabled and NM is restarted with recovery enabled, then NM fails
to start and throws an error as  "flow context can't be null".

This is happening because the flow context did not exist before but now that timeline service
v2 is enabled, ApplicationImpl expects it to exist. 

This would also happen even if flow context existed before but since we are not persisting
it / reading it during ContainerManagerImpl#recoverApplication, it does not get passed in
to ApplicationImpl.


full stack trace
{code}
2017-05-03 21:51:52,178 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error
starting NodeManager
java.lang.IllegalArgumentException: flow context cannot be null
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:104)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:90)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
{code}




  was:

If timeline service v2 is enabled and NM is restarted with recovery enabled, then NM fails
to start and throws an error as  "flow context can't be null".

This is happening because the flow context did not exist before but now that timeline service
v2 is enabled, ApplicationImpl expects it to exist. 

This would also happen even if flow context existed before but since we are not persisting
it / reading it during ContainerManagerImpl#recoverApplication, it does not get passed in
to ApplicationImpl.


{code}

full stack trace
{code}
2017-05-03 21:51:52,178 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error
starting NodeManager
java.lang.IllegalArgumentException: flow context cannot be null
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:104)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:90)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
{code}

{code}


> Enable flow context read (& corresponding write) for recovering application with
NM restart 
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-6555
>                 URL: https://issues.apache.org/jira/browse/YARN-6555
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>
> If timeline service v2 is enabled and NM is restarted with recovery enabled, then NM
fails to start and throws an error as  "flow context can't be null".
> This is happening because the flow context did not exist before but now that timeline
service v2 is enabled, ApplicationImpl expects it to exist. 
> This would also happen even if flow context existed before but since we are not persisting
it / reading it during ContainerManagerImpl#recoverApplication, it does not get passed in
to ApplicationImpl.
> full stack trace
> {code}
> 2017-05-03 21:51:52,178 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager:
Error starting NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:104)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:90)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message