Date: Fri, 19 Jul 2013 18:32:49 +0000 (UTC)
From: "Mayank Bansal (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses

[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713912#comment-13713912 ]

Mayank Bansal commented on YARN-245:
------------------------------------

{code}
+ conf.setBoolean(YarnConfiguration.LOG_AGGREGATION_ENABLED, true);
{code}
Agreed, not needed. Removed.

{code}
+ NodeStatus nodeStatus = request.getNodeStatus();
+ nodeStatus.setResponseId(heartBeatID++);
{code}
We need it to send the heartbeat response to the NM, because I am tracking the heartbeat number outside the NM and RM classes. It's done this way throughout this test class.
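A minimal, hypothetical Java sketch of the response-id bookkeeping under discussion; it is not code from the YARN-245 patch. The names `DuplicateHeartbeatFilter`, `run` and `lastResponseId` are invented for illustration, and a plain `int[]` stands in for the sequence of RM heartbeat responses that the test drives through its own `heartBeatID` counter. A response whose id equals the last processed id is treated as a duplicate and skipped with `continue`, so its actions (for example a FINISH_APPLICATION for an application that has already finished) are not delivered twice. The `finally` block, where a real heartbeat loop would wait before the next request, still runs on the `continue` path; that is standard Java `try`/`finally` semantics, and it bears on the question of whether the loop waits or retries immediately.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the YARN-245 patch): skips heartbeat responses whose
 * response id has already been processed.
 */
class DuplicateHeartbeatFilter {

    /** Outcome of feeding a sequence of response ids through the loop. */
    static final class Result {
        final List<Integer> processed = new ArrayList<>();
        int skippedDuplicates = 0;
        int finallyRuns = 0;
    }

    /**
     * Feeds each response id through a try/finally loop body. Duplicates are
     * skipped with 'continue'; the finally block runs on every iteration.
     */
    static Result run(int[] responseIds) {
        Result result = new Result();
        int lastResponseId = -1; // assumption: -1 means "nothing processed yet"
        for (int id : responseIds) {
            try {
                if (id == lastResponseId) {
                    // Duplicate response: do not re-process its actions.
                    result.skippedDuplicates++;
                    continue;
                }
                lastResponseId = id;
                result.processed.add(id);
            } finally {
                // Executes on both the normal path and the 'continue' path;
                // a real loop would wait for the heartbeat interval here.
                result.finallyRuns++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = run(new int[] {0, 1, 1, 2});
        System.out.println("processed=" + r.processed
                + " skipped=" + r.skippedDuplicates
                + " finallyRuns=" + r.finallyRuns);
    }
}
```

Running `main` prints `processed=[0, 1, 2] skipped=1 finallyRuns=4`: the repeated id 1 is ignored, yet the `finally` block runs once for each of the four iterations.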
There is one issue at present with NodeStatusUpdaterImpl.java... imagine we get such a heartbeat: then we will not wait but try again. Check the finally {} code, which won't get executed, so we will keep pinging the RM until we get a response with the correct response-id. Should we wait or request immediately? Thoughts?

The finally block will get executed; I actually ran a test just now :) and verified that.

I also removed all application-specific stuff from the patch and added timeouts.

> Node Manager can not handle duplicate responses
> -----------------------------------------------
>
>                 Key: YARN-245
>                 URL: https://issues.apache.org/jira/browse/YARN-245
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>            Reporter: Devaraj K
>            Assignee: Mayank Bansal
>         Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, YARN-245-trunk-3.patch
>
>
> {code:xml}
> 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>     at java.lang.Thread.run(Thread.java:662)
> 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira