hadoop-yarn-issues mailing list archives

From "Abhishek Kapoor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered
Date Sun, 03 Mar 2013 09:15:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591687#comment-13591687 ]

Abhishek Kapoor commented on YARN-369:
--------------------------------------

I investigated the stack trace above and was able to reproduce the issue.

As per my understanding (I am relatively new to this code), the flow is as follows.

1) An application master needs to register with the RM before making an allocate request.
This is not enforced, so an AM can call allocate without registering with the RM.

2) If allocate is called before the application master has registered with the RM, we get
the above stack trace.

Possible solution

1) ApplicationMasterService needs to determine whether an application master has registered
for a given application attempt ID.
Example: public boolean hasApplicationMaster(ApplicationAttemptId appAttemptId)

2) We can keep a map, appIDtoAMResponse<ApplicationAttemptId,RegisterApplicationMasterResponse>,
which tells us whether the application master is registered.

3) The map is populated when registerApplicationMaster is called, and the entry is removed
when finishApplicationMaster is called.
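
The bookkeeping described in points 1) to 3) can be sketched in isolation as follows. This is
only an illustration, not Hadoop code: the class AmRegistry and the method checkRegistered are
hypothetical names, attempt IDs are plain strings standing in for ApplicationAttemptId, and a
concurrent set is used instead of the proposed map because only membership is checked here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the registration tracking proposed above; not Hadoop code. */
class AmRegistry {
    // Assumption: attempt IDs are plain strings; the real key would be ApplicationAttemptId.
    private final Set<String> registered = ConcurrentHashMap.newKeySet();

    /** Would be called from registerApplicationMaster; false if already registered. */
    boolean register(String appAttemptId) {
        return registered.add(appAttemptId);
    }

    /** Would be called from finishApplicationMaster; false if the attempt was not registered. */
    boolean unregister(String appAttemptId) {
        return registered.remove(appAttemptId);
    }

    /** Mirrors the hasApplicationMaster check suggested in point 1). */
    boolean hasApplicationMaster(String appAttemptId) {
        return registered.contains(appAttemptId);
    }

    /** Guard for allocate-style calls: rejects attempts that never registered. */
    void checkRegistered(String appAttemptId) {
        if (!hasApplicationMaster(appAttemptId)) {
            throw new IllegalStateException(
                "Application Master is not registered for: " + appAttemptId);
        }
    }
}
```

A thread-safe collection seems appropriate because register, allocate and finish calls can
arrive on different RPC handler threads. Whether the real fix should store the full
RegisterApplicationMasterResponse, as proposed in point 2), is left open.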

Below is a sketch of the code:

public AllocateResponse allocate(AllocateRequest request)
      throws YarnRemoteException {

    ApplicationAttemptId appAttemptId = request.getApplicationAttemptId();
    authorizeRequest(appAttemptId);

    // Reject allocate calls from attempts whose AM never registered.
    if (!hasApplicationMaster(appAttemptId)) {
      String message = "Application Master does not exist for: "
          + appAttemptId.getApplicationId();
      LOG.error(message);

      // Record the failure in the application's diagnostics and the audit log.
      RMApp app = this.rmContext.getRMApps().get(appAttemptId.getApplicationId());
      app.getDiagnostics().append("Application Master does not exist ");
      RMAuditLogger.logFailure(app.getUser(),
          AuditConstants.REGISTER_AM, message, "ApplicationMasterService",
          "Application master does not exist", appAttemptId.getApplicationId(), appAttemptId);
      throw RPCUtil.getRemoteException(message);
    }
............................


Please suggest if there is a better solution.
                
> Handle ( or throw a proper error when receiving) status updates from application masters that have not registered
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-369
>                 URL: https://issues.apache.org/jira/browse/YARN-369
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Abhishek Kapoor
>
> Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped.
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
>        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
>        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588)
>        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
>        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471)
>        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452)
>        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
>        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
>        at java.lang.Thread.run(Thread.java:680)
> ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
