hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
Date Wed, 24 May 2017 02:05:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022203#comment-16022203 ]

Rohith Sharma K S commented on YARN-6555:

bq. Do you think we should preserve as much flow context information as possible? The patch
only stores flow context in the state store only if all three fields of flow context is present.
We could sanitize the flow context and fill in default values for whatever field is missing
and then just check if flowcontext !=null before storing application state
Here are my 2 cents.
# IMO, we should NOT set default values for flow context. There are 2 cases:
## Master container launched: the RM sets the flow context in the container launch context and
starts the container. This flow context must be recovered during NM restart.
## AM launches containers: flow context details are not set, so there is nothing to store
or recover during NM restart, and storing default values would serve no purpose.
# The additional null check for strings before creating the proto is there because the proto
setter methods for strings throw an NPE if flowName or flowVersion is null.
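
The two points above can be sketched roughly as follows. This is only an illustration, not the code in the patch: `FlowContextSketch`, `FlowContextRecordSketch` and `FlowContextStoreSketch` are hypothetical stand-ins for the NM flow context, the generated proto message and the state-store write path, and the `flowRunId` field is an assumption. The builder's string setters reject null the way generated protobuf setters do, so the caller checks for null first and stores nothing when the flow context is absent or incomplete.

```java
import java.util.Objects;
import java.util.Optional;

/** Hypothetical stand-in for the NM-side flow context (not the real YARN class). */
final class FlowContextSketch {
    final String flowName;
    final String flowVersion;
    final long flowRunId; // assumed third field

    FlowContextSketch(String flowName, String flowVersion, long flowRunId) {
        this.flowName = flowName;
        this.flowVersion = flowVersion;
        this.flowRunId = flowRunId;
    }
}

/** Hypothetical stand-in for a generated proto message and its builder. */
final class FlowContextRecordSketch {
    final String flowName;
    final String flowVersion;
    final long flowRunId;

    private FlowContextRecordSketch(Builder b) {
        this.flowName = b.flowName;
        this.flowVersion = b.flowVersion;
        this.flowRunId = b.flowRunId;
    }

    static final class Builder {
        private String flowName;
        private String flowVersion;
        private long flowRunId;

        // String setters throw NullPointerException on null, as proto setters do.
        Builder setFlowName(String value) {
            this.flowName = Objects.requireNonNull(value);
            return this;
        }

        Builder setFlowVersion(String value) {
            this.flowVersion = Objects.requireNonNull(value);
            return this;
        }

        Builder setFlowRunId(long value) {
            this.flowRunId = value;
            return this;
        }

        FlowContextRecordSketch build() {
            return new FlowContextRecordSketch(this);
        }
    }
}

/** Hypothetical write path: store the flow context only when it is usable. */
final class FlowContextStoreSketch {
    static Optional<FlowContextRecordSketch> toRecord(FlowContextSketch ctx) {
        // No defaults are filled in: a missing or partial context is simply not stored.
        if (ctx == null || ctx.flowName == null || ctx.flowVersion == null) {
            return Optional.empty();
        }
        return Optional.of(new FlowContextRecordSketch.Builder()
            .setFlowName(ctx.flowName)
            .setFlowVersion(ctx.flowVersion)
            .setFlowRunId(ctx.flowRunId)
            .build());
    }
}
```

With this shape, an application whose containers were launched by the AM (no flow context) yields an empty `Optional`, and the recovery side would read back a flow context only for applications that actually had one.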

bq. FlowContext.toString(). Can we do something like {k1=v1, k2=v2, k3=v3} for better readability
in the log?
Makes sense. I will change it in the next patch after Vrushali reviews it.
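
A minimal sketch of the suggested log format, assuming the three fields are named flowName, flowVersion and flowRunId. `FlowContextToStringSketch` is a hypothetical class used only for illustration; it is not the FlowContext class from the patch.

```java
/** Hypothetical class showing a {k1=v1, k2=v2, k3=v3} style toString(). */
final class FlowContextToStringSketch {
    private final String flowName;
    private final String flowVersion;
    private final long flowRunId; // assumed third field

    FlowContextToStringSketch(String flowName, String flowVersion, long flowRunId) {
        this.flowName = flowName;
        this.flowVersion = flowVersion;
        this.flowRunId = flowRunId;
    }

    @Override
    public String toString() {
        // Key=value pairs make each field easy to spot in NM logs.
        return "{flowName=" + flowName
            + ", flowVersion=" + flowVersion
            + ", flowRunId=" + flowRunId + "}";
    }
}
```

For example, `new FlowContextToStringSketch("wordcount", "1", 7L).toString()` gives `{flowName=wordcount, flowVersion=1, flowRunId=7}`.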

> Enable flow context read (& corresponding write) for recovering application with
NM restart 
> --------------------------------------------------------------------------------------------
>                 Key: YARN-6555
>                 URL: https://issues.apache.org/jira/browse/YARN-6555
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>            Reporter: Vrushali C
>            Assignee: Rohith Sharma K S
>         Attachments: YARN-6555.001.patch, YARN-6555.002.patch
> If timeline service v2 is enabled and the NM is restarted with recovery enabled, the NM
fails to start and throws the error "flow context cannot be null".
> This happens because the flow context did not exist before, but now that timeline
service v2 is enabled, ApplicationImpl expects it to exist.
> It would also happen even if the flow context had existed before: since we neither persist
it nor read it back during ContainerManagerImpl#recoverApplication, it is not passed in
to ApplicationImpl.
> full stack trace
> {code}
> 2017-05-03 21:51:52,178 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager:
Error starting NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:104)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:90)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
