hadoop-yarn-issues mailing list archives

From "Weiwei Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8222) NPE when calling rmContext.getRMApps().get(...).getCurrentAppAttempt()
Date Tue, 01 May 2018 02:28:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459397#comment-16459397 ]

Weiwei Yang commented on YARN-8222:
-----------------------------------

Hi [~Tao Yang]

Thanks for the patch. Generally, it is always good to fix such NPEs. But please allow me some time
to look into why it happens and to pick up all the context. I will find some time to review
this week.

Thanks

> NPE when calling rmContext.getRMApps().get(...).getCurrentAppAttempt()
> ----------------------------------------------------------------------
>
>                 Key: YARN-8222
>                 URL: https://issues.apache.org/jira/browse/YARN-8222
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.2.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>         Attachments: YARN-8222.001.patch
>
>
> Recently we did some performance tests and found two NPE problems when calling rmContext.getRMApps().get(appId).get...
> These NPEs occurred occasionally during performance tests with a large number of fast-finishing applications. We have checked the other places that call rmContext.getRMApps().get(...): most of them have a null check, and some do not need one (the process guarantees that the returned result is not null).
> To fix these problems, we can add a null check for the application before getting the attempt from it.
> (1) NPE in RMContainerImpl$FinishedTransition#updateAttemptMetrics
> {noformat}
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.updateAttemptMetrics(RMContainerImpl.java:742)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.transition(RMContainerImpl.java:715)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.transition(RMContainerImpl.java:699)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:482)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:64)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.containerCompleted(FiCaSchedulerApp.java:195)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1793)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:2624)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:663)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1514)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:2396)
>         at org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:205)
>         at org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:60)
>         at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>         at java.lang.Thread.run(Thread.java:834)
> {noformat}
> This NPE appears to happen when a node heartbeat is delayed and the transition tries to update attempt metrics for an app that does not exist.
> Reference code of RMContainerImpl$FinishedTransition#updateAttemptMetrics:
> {code:java}
> private static void updateAttemptMetrics(RMContainerImpl container) {
>       Resource resource = container.getContainer().getResource();
>       RMAppAttempt rmAttempt = container.rmContext.getRMApps()
>           .get(container.getApplicationAttemptId().getApplicationId())
>           .getCurrentAppAttempt();
>       if (rmAttempt != null) {
>          //....
>       }
> }
> {code}
> (2) NPE in SchedulerApplicationAttempt#incNumAllocatedContainers
> {noformat}
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.incNumAllocatedContainers(SchedulerApplicationAttempt.java:1268)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:638)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:3589)
>         at org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.tryCommit(SLSCapacityScheduler.java:142)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:962)
> {noformat}
> This NPE should happen when applying an outdated proposal for an application that does not exist in rmContext.
> Reference code:
> {code:java}
>     RMAppAttempt attempt =
>         rmContext.getRMApps().get(attemptId.getApplicationId())
>           .getCurrentAppAttempt();
>     if (attempt != null) {
>       attempt.getRMAppAttemptMetrics().incNumAllocatedContainers(containerType,
>         requestType);
>     }
> {code}
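
For reference while reviewing, the null check proposed in the description could look roughly like the sketch below. This is not the content of YARN-8222.001.patch. The classes App and Attempt and the helper currentAttemptOrNull are hypothetical stand-ins for RMApp, RMAppAttempt and the lookup on rmContext.getRMApps(), and the map is keyed by a plain String instead of an ApplicationId so that the example compiles on its own.

```java
import java.util.Map;

// Hypothetical stand-ins only; these are NOT the real YARN classes.
final class NullSafeAttemptLookup {

  /** Stand-in for RMAppAttempt: only counts allocated containers. */
  static final class Attempt {
    private int allocatedContainers;

    void incNumAllocatedContainers() {
      allocatedContainers++;
    }

    int getAllocatedContainers() {
      return allocatedContainers;
    }
  }

  /** Stand-in for RMApp: holds a current attempt, which may be null. */
  static final class App {
    private final Attempt currentAttempt;

    App(Attempt currentAttempt) {
      this.currentAttempt = currentAttempt;
    }

    Attempt getCurrentAppAttempt() {
      return currentAttempt;
    }
  }

  /**
   * Looks up the application first and only then asks for its attempt,
   * so a missing application yields null instead of an NPE.
   */
  static Attempt currentAttemptOrNull(Map<String, App> apps, String appId) {
    App app = apps.get(appId);
    if (app == null) {
      // The app has already been removed, e.g. it finished quickly.
      return null;
    }
    return app.getCurrentAppAttempt();
  }

  /** Returns true if the metric was updated, false if it was skipped. */
  static boolean incNumAllocatedContainers(Map<String, App> apps,
      String appId) {
    Attempt attempt = currentAttemptOrNull(apps, appId);
    if (attempt == null) {
      return false;
    }
    attempt.incNumAllocatedContainers();
    return true;
  }
}
```

With this shape, both call sites in the description would skip the metrics update when the application is missing. Whether skipping silently is the right behavior, or whether the missing app should be logged, is something to decide in the review.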



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

