hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
Date Thu, 23 Feb 2017 06:27:45 GMT

    [ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879964#comment-15879964 ]

Sunil G commented on YARN-6207:
-------------------------------

[~bibinchundatt], thanks for the patch. I had an offline discussion with [~rohithsharma], and
we were expecting something like the following.

{code}
// Fragment of the scheduler's move logic: application, source, dest, user, appId,
// sourceQueueName, destQueueName and targetQueueName are defined earlier in the method.
FiCaSchedulerApp app = application.getCurrentAppAttempt();
if (app != null) {
  // Move all live containers even when stopped.
  // Required for transferStateFromPreviousAttempt
  for (RMContainer rmContainer : app.getLiveContainers()) {
    source.detachContainer(getClusterResource(), app, rmContainer);
    // attach the Container to another queue
    dest.attachContainer(getClusterResource(), app, rmContainer);
  }
  if (!app.isStopped()) {
    source.finishApplicationAttempt(app, sourceQueueName);
    // Submit to a new queue
    dest.submitApplicationAttempt(app, user);
    // Finish app & update metrics
    app.move(dest);
  }
  source.appFinished();
  source.getParent().finishApplication(appId, user);
}

application.setQueue(dest);
LOG.info("App: " + appId + " successfully moved from " + sourceQueueName
    + " to: " + destQueueName);
return targetQueueName;
{code}

Reasons behind this proposal:
# {{source.finishApplication(appId, user);}} is not needed, as {{AppSchedulingInfo.move}} already
calls {{abstractUsersManager.deactivateApplication(user, applicationId);}}. So we just need to
invoke {{appFinished}} and inform the parent queue; hence the two lines {{source.appFinished();}}
and {{source.getParent().finishApplication(appId, user);}} are added.
# {{app.move}} needs to be inside the {{!app.isStopped()}} check, because if the app is stopped,
we already ensure that {{completedContainer}} is called for all of its running and reserved containers.
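To make the ordering in the proposal concrete, here is a minimal Java sketch of the control flow only. {{ToyQueue}}, {{ToyAttempt}} and {{moveAttempt}} are hypothetical stand-ins, not YARN classes: live containers are always transferred, while the attempt-level bookkeeping (the analogue of {{finishApplicationAttempt}}, {{submitApplicationAttempt}} and {{app.move}}) runs only when the attempt is not stopped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the proposed move ordering; none of these
// classes or methods are real YARN APIs.
class MoveOrderingSketch {

    static class ToyQueue {
        final String name;
        final List<String> containers = new ArrayList<>();
        int activeAttempts = 0;    // stand-in for attempt-level queue metrics
        int appFinishedCalls = 0;  // counts calls to the appFinished() analogue

        ToyQueue(String name) { this.name = name; }
    }

    static class ToyAttempt {
        final List<String> liveContainers = new ArrayList<>();
        final boolean stopped;
        ToyQueue queue;

        ToyAttempt(ToyQueue queue, boolean stopped, String... containers) {
            this.queue = queue;
            this.stopped = stopped;
            for (String c : containers) {
                liveContainers.add(c);
                queue.containers.add(c);
            }
            if (!stopped) {
                queue.activeAttempts++;
            }
        }
    }

    static void moveAttempt(ToyAttempt app, ToyQueue source, ToyQueue dest) {
        if (app == null) {
            return; // no attempt registered yet: nothing attempt-level to transfer
        }
        // Transfer live containers even when the attempt is stopped.
        for (String c : app.liveContainers) {
            source.containers.remove(c); // analogue of detachContainer
            dest.containers.add(c);      // analogue of attachContainer
        }
        if (!app.stopped) {
            source.activeAttempts--;     // analogue of finishApplicationAttempt
            dest.activeAttempts++;       // analogue of submitApplicationAttempt
            app.queue = dest;            // analogue of app.move(dest)
        }
        source.appFinishedCalls++;       // analogue of source.appFinished()
    }

    public static void main(String[] args) {
        ToyQueue a = new ToyQueue("a");
        ToyQueue b = new ToyQueue("b");
        ToyAttempt stopped = new ToyAttempt(a, true, "c1", "c2");
        moveAttempt(stopped, a, b);
        // prints: [c1, c2] 0 a
        System.out.println(b.containers + " " + b.activeAttempts + " " + stopped.queue.name);
    }
}
```

For a stopped attempt the containers end up in the destination queue, but no attempt-level metrics are moved, which mirrors why {{app.move}} sits inside the {{!app.isStopped()}} branch.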

Apart from this, the {{app != null}} check does not need to throw an exception. The app is done
anyway, so do we really need to fail the request back to the client?
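The scenario in this issue (a move request arriving before the attempt-add event has been processed) can be modeled with a minimal Java sketch. Everything here is hypothetical: {{DelayedAttemptSketch}}, {{submit}}, {{onAttemptAdded}} and {{move}} are illustrative names, not YARN APIs. The point is only that a missing attempt is skipped rather than dereferenced, and the move itself still succeeds without an error reaching the client.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: applications are registered immediately, but the
// attempt is attached only when a (possibly delayed) "attempt added" event
// is processed. None of these names are YARN APIs.
class DelayedAttemptSketch {

    static class ToyApplication {
        String queue;
        String currentAttempt;   // null until the attempt-added event is handled
        boolean attemptMoved;    // true once attempt-level state was transferred

        ToyApplication(String queue) { this.queue = queue; }
    }

    final Map<String, ToyApplication> applications = new HashMap<>();

    void submit(String appId, String queue) {
        applications.put(appId, new ToyApplication(queue));
    }

    void onAttemptAdded(String appId, String attemptId) {
        applications.get(appId).currentAttempt = attemptId;
    }

    // Moves the application; a missing attempt is tolerated instead of
    // being dereferenced, so no NullPointerException can arise here.
    String move(String appId, String destQueue) {
        ToyApplication app = applications.get(appId);
        if (app == null) {
            // an unknown application is still an error in this toy model
            throw new IllegalArgumentException("App " + appId + " not found");
        }
        if (app.currentAttempt != null) {
            app.attemptMoved = true; // attempt-level transfer would happen here
        }
        app.queue = destQueue;
        return destQueue;
    }

    public static void main(String[] args) {
        DelayedAttemptSketch s = new DelayedAttemptSketch();
        s.submit("app_1", "a");
        // The move arrives before the attempt-added event is processed.
        System.out.println(s.move("app_1", "b")); // prints: b
        s.onAttemptAdded("app_1", "attempt_1");
    }
}
```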

> Move application can fail when attempt add event is delayed
> ------------------------------------------------------------
>
>                 Key: YARN-6207
>                 URL: https://issues.apache.org/jira/browse/YARN-6207
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>         Attachments: YARN-6207.001.patch, YARN-6207.002.patch, YARN-6207.003.patch, YARN-6207.004.patch
>
>
> *Steps to reproduce*
> 1. Submit an application and delay the attempt-add event to the scheduler
> (simulate this by pausing in a debugger at EventDispatcher for SchedulerEventDispatcher).
> 2. Call move application to the destination queue.
> {noformat}
> Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086)
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388)
> 	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659)
> 	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1429)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1339)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115)
> 	at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source)
> 	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398)
> 	... 16 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


