Date: Sun, 3 Mar 2013 09:15:13 +0000 (UTC)
From: "Abhishek Kapoor (JIRA)"
To: yarn-issues@hadoop.apache.org
Reply-To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered

    [ https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591687#comment-13591687 ]

Abhishek Kapoor commented on YARN-369:
--------------------------------------

I investigated the stack trace above and was able to reproduce the issue. As I understand it (I am relatively new to this code), the flow is as follows:

1) An application master (AM) is expected to register with the RM before making an allocate request. This is not enforced: an AM can call allocate without registering with the RM.
2) If allocate is called before the application master has registered with the RM, we get the stack trace above.

Possible solution:

1) ApplicationMasterService needs to determine whether an application master has registered for a given application attempt ID. Example:

    public boolean hasApplicationMaster(ApplicationAttemptId appAttemptId)

2) We can keep a map, appIDtoAMResponse, that tells us whether an application master is registered.

3) The map is populated when registerApplicationMaster is called, and the entry is removed when finishApplicationMaster is called.

Below is a rough sketch of the code:

    public AllocateResponse allocate(AllocateRequest request)
        throws YarnRemoteException {
      ApplicationAttemptId appAttemptId = request.getApplicationAttemptId();
      authorizeRequest(appAttemptId);

      // Reject allocate calls from an application master that never registered.
      if (!hasApplicationMaster(appAttemptId)) {
        String message = "Application Master does not exist for: "
            + appAttemptId.getApplicationId();
        LOG.error(message);
        // Record the failure in the application's diagnostics.
        this.rmContext.getRMApps().get(appAttemptId.getApplicationId())
            .getDiagnostics().append("Application Master does not exist ");
        // Write the failure to the audit log, then surface it to the caller.
        RMAuditLogger.logFailure(
            this.rmContext.getRMApps().get(appAttemptId.getApplicationId()).getUser(),
            AuditConstants.REGISTER_AM, message, "ApplicationMasterService",
            "Application master does not exist",
            appAttemptId.getApplicationId(), appAttemptId);
        throw RPCUtil.getRemoteException(message);
      }
      ............................

Please suggest a better solution if there is one.
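To make the bookkeeping in points 1) to 3) concrete, here is a minimal standalone sketch. It is not the ApplicationMasterService code and makes several assumptions: the class name AmRegistrationTracker and its method names (other than hasApplicationMaster) are hypothetical, attempt IDs are plain strings rather than ApplicationAttemptId, a concurrent set is used in place of the proposed appIDtoAMResponse map because only membership is needed here, and IllegalStateException stands in for the remote exception the real service would throw.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical stand-in for the registration bookkeeping proposed above.
 * Attempt IDs are plain strings here; the real service would key on
 * ApplicationAttemptId.
 */
class AmRegistrationTracker {
    // Thread-safe set, since RPC handler threads may call these concurrently.
    private final Set<String> registered = ConcurrentHashMap.newKeySet();

    /** Register path: returns false if the attempt was already registered. */
    boolean register(String attemptId) {
        return registered.add(attemptId);
    }

    /** Finish path: returns false if the attempt was never registered. */
    boolean finish(String attemptId) {
        return registered.remove(attemptId);
    }

    /** True only between a successful register and the matching finish. */
    boolean hasApplicationMaster(String attemptId) {
        return registered.contains(attemptId);
    }

    /** Allocate-path guard: fail fast with a descriptive error. */
    void checkRegistered(String attemptId) {
        if (!hasApplicationMaster(attemptId)) {
            throw new IllegalStateException(
                "Application Master is not registered for: " + attemptId);
        }
    }
}
```

With a guard like checkRegistered at the top of allocate, an unregistered caller gets an explicit error instead of triggering an invalid state transition later in the dispatcher.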

> Handle ( or throw a proper error when receiving) status updates from application masters that have not registered
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-369
>                 URL: https://issues.apache.org/jira/browse/YARN-369
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Abhishek Kapoor
>
> Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped.
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
>        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
>        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588)
>        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
>        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471)
>        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452)
>        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
>        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
>        at java.lang.Thread.run(Thread.java:680)
> ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira