hadoop-yarn-issues mailing list archives

From "Mayank Bansal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses
Date Fri, 19 Jul 2013 18:32:49 GMT

    [ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713912#comment-13713912 ]

Mayank Bansal commented on YARN-245:

+      conf.setBoolean(YarnConfiguration.LOG_AGGREGATION_ENABLED, true);

Agreed, it is not needed. Removed.

+      NodeStatus nodeStatus = request.getNodeStatus();
+      nodeStatus.setResponseId(heartBeatID++);

We need it to send the heartbeat response to the NM, because I am tracking the heartbeat number
outside the NM and RM classes. It is done this way throughout this test class.
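
To illustrate the idea, here is a minimal standalone sketch of a test-side tracker that owns the heartbeat counter and stamps it onto each incoming status. It is not the code from the patch: FakeNodeStatus and FakeTracker are hypothetical stand-ins for the real YARN records, and returning the incremented counter as the response id is an assumption about how the test builds its response.

```java
/** Hypothetical sketch; none of these types are real YARN classes. */
class HeartbeatIdSketch {

    /** Stand-in for a node status record that carries a response id. */
    static final class FakeNodeStatus {
        private int responseId;

        void setResponseId(int id) {
            responseId = id;
        }

        int getResponseId() {
            return responseId;
        }
    }

    /** Stand-in for the test's tracker; it owns the heartbeat counter. */
    static final class FakeTracker {
        private int heartBeatID = 0;

        /**
         * Stamps the current counter onto the status, then advances it.
         * Returns the advanced counter, which this sketch assumes is the
         * id sent back in the heartbeat response.
         */
        int nodeHeartbeat(FakeNodeStatus status) {
            status.setResponseId(heartBeatID++);
            return heartBeatID;
        }
    }

    public static void main(String[] args) {
        FakeTracker tracker = new FakeTracker();
        FakeNodeStatus status = new FakeNodeStatus();
        int responseId = tracker.nodeHeartbeat(status);
        System.out.println(status.getResponseId() + " -> " + responseId);
    }
}
```

Because the counter lives in the tracker, neither the NM nor the RM implementation has to be modified for the test to control which response id the NM sees.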

> There is one issue at present with NodeStatusUpdaterImpl.java: imagine we get such
> a heartbeat; we will then not wait but try again (check the finally {} code, which won't get executed)
> and will keep pinging the RM until we get a correct response with the response-id. Should we wait or
> immediately request? Thoughts?

The finally block will get executed; I actually ran a test just now :) and verified that.
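
The general Java behavior can be reproduced with a small standalone sketch. This is not the NodeStatusUpdaterImpl code: the class, method, and log strings are hypothetical, and it only assumes a heartbeat loop shaped like try { ... continue ... } finally { wait }. It shows that a `continue` issued inside the try block still runs the finally block before the next iteration starts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo; not code from NodeStatusUpdaterImpl. */
class FinallyOnContinueDemo {

    /**
     * Simulates a heartbeat loop receiving the given response ids in order
     * and returns a log of what happened on each iteration.
     */
    static List<String> run(int[] responseIds) {
        List<String> log = new ArrayList<>();
        int lastSeenId = 0;
        for (int id : responseIds) {
            try {
                if (id <= lastSeenId) {
                    // Duplicate (or stale) response: skip processing.
                    log.add("duplicate " + id);
                    continue; // leaves the try block early
                }
                lastSeenId = id;
                log.add("processed " + id);
            } finally {
                // Runs on normal completion and on 'continue' alike;
                // stands in for the wait between heartbeats.
                log.add("wait");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(new int[] {1, 1, 2}));
    }
}
```

In this sketch, every iteration records "wait", including the one that hits the duplicate id, so the loop does not spin without pausing.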

I also removed all application-specific stuff from the patch and added timeouts.

> Node Manager can not handle duplicate responses
> -----------------------------------------------
>                 Key: YARN-245
>                 URL: https://issues.apache.org/jira/browse/YARN-245
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>            Reporter: Devaraj K
>            Assignee: Mayank Bansal
>         Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, YARN-245-trunk-3.patch
> {code:xml}
> 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null
> {code}

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
