hadoop-yarn-issues mailing list archives

From "lujie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-9238) Avoid allocating opportunistic containers to previous/removed/non-exist application attempt
Date Fri, 22 Feb 2019 13:32:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775139#comment-16775139
] 

lujie commented on YARN-9238:
-----------------------------

Hi [~cheersyang],

One more thing: could you please review the patch that fixes YARN-9248? That bug also affects
opportunistic containers.

> Avoid allocating opportunistic containers to previous/removed/non-exist application attempt
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-9238
>                 URL: https://issues.apache.org/jira/browse/YARN-9238
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Critical
>         Attachments: YARN-9238_1.patch, YARN-9238_2.patch, YARN-9238_3.patch, hadoop-test-resourcemanager-hadoop11.log
>
>
> See org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.OpportunisticAMSProcessor.allocate
> {code:java}
>      // Allocate OPPORTUNISTIC containers.
> 171.  SchedulerApplicationAttempt appAttempt =
> 172.    ((AbstractYarnScheduler)rmContext.getScheduler())
> 173.      .getApplicationAttempt(appAttemptId);
> 174.
> 175.  OpportunisticContainerContext oppCtx =
> 176.    appAttempt.getOpportunisticContainerContext();
> 177.  oppCtx.updateNodeList(getLeastLoadedNodes());
> {code}
> If the MRAppMaster crashes before allocate#171, the ResourceManager starts a new appAttempt and calls
> {code:java}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplication.setCurrentAppAttempt(T currentAttempt){
>     this.currentAttempt = currentAttempt;
> }{code}
> Hence allocate#171 gets the new appAttempt, whose OpportunisticContainerContext field has not been initialized yet,
> so oppCtx == null and a NullPointerException is thrown at line 177:
> {code:java}
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService$OpportunisticAMSProcessor.allocate(OpportunisticContainerAllocatorAMService.java:177)
> at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830) {code}
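
To make the race described above easier to follow, here is a minimal, self-contained sketch of a guarded allocate path. It is not the attached patch and it does not use any Hadoop classes: OppContext, Attempt, App and the Optional-returning allocate are hypothetical stand-ins invented for illustration. The only behaviour taken from the description is that the lookup returns the application's current attempt, and that a freshly created attempt has no opportunistic container context yet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class OppAllocateGuardSketch {

    /** Hypothetical stand-in for OpportunisticContainerContext; holds only a node list. */
    public static final class OppContext {
        private List<String> nodes = new ArrayList<>();
        void updateNodeList(List<String> leastLoaded) { nodes = new ArrayList<>(leastLoaded); }
        public List<String> getNodes() { return nodes; }
    }

    /** Hypothetical stand-in for an application attempt; the context is null until initialized. */
    public static final class Attempt {
        final int attemptId;
        private OppContext oppCtx; // null for a freshly created attempt
        public Attempt(int attemptId) { this.attemptId = attemptId; }
        public void initContext() { oppCtx = new OppContext(); }
        OppContext getOpportunisticContainerContext() { return oppCtx; }
    }

    /** Hypothetical stand-in for the scheduler's per-application state. */
    public static final class App {
        private Attempt currentAttempt;
        public void setCurrentAppAttempt(Attempt a) { currentAttempt = a; }
        Attempt getCurrentAppAttempt() { return currentAttempt; }
    }

    /**
     * Guarded allocate: returns Optional.empty() instead of dereferencing null when
     * the attempt is missing, the caller's attempt id is stale, or the context of
     * the current attempt has not been initialized.
     */
    public static Optional<OppContext> allocate(App app, int callerAttemptId,
                                                List<String> leastLoadedNodes) {
        Attempt attempt = app.getCurrentAppAttempt();
        if (attempt == null || attempt.attemptId != callerAttemptId) {
            return Optional.empty(); // previous, removed, or non-existent attempt
        }
        OppContext ctx = attempt.getOpportunisticContainerContext();
        if (ctx == null) {
            return Optional.empty(); // new attempt, context not set up yet
        }
        ctx.updateNodeList(leastLoadedNodes);
        return Optional.of(ctx);
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("n1", "n2");
        App app = new App();
        Attempt first = new Attempt(1);
        first.initContext();
        app.setCurrentAppAttempt(first);
        System.out.println(allocate(app, 1, nodes).isPresent()); // true

        // The AM of attempt 1 crashes; the RM installs attempt 2 before its context exists.
        Attempt second = new Attempt(2);
        app.setCurrentAppAttempt(second);
        System.out.println(allocate(app, 1, nodes).isPresent()); // false (stale attempt id)
        System.out.println(allocate(app, 2, nodes).isPresent()); // false (context is null)

        second.initContext();
        System.out.println(allocate(app, 2, nodes).isPresent()); // true
    }
}
```

In the real service a rejected call would more likely surface as an error to the ApplicationMaster than as an empty result; the Optional is only a compact way to show the two checks (attempt identity, then context) that keep line 177 from dereferencing null.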



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

