hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4355) NPE while processing localizer heartbeat
Date Fri, 13 Nov 2015 18:43:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004480#comment-15004480 ]

Jason Lowe commented on YARN-4355:
----------------------------------

Stacktrace:
{noformat}
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1089)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1054)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:681)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:330)
        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server.call(Server.java:2297)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:654)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:621)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1680)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2247)
{noformat}

The nodemanager was in the process of tearing down, so applications were being cleaned up.
It looks like localizer heartbeats can still come in during cleanup, and we can lose the
localizer tracker just as the heartbeat handler tries to use it.
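
To illustrate the kind of race described above, here is a minimal sketch. It is not the NodeManager code: the class, its methods, and the LIVE/DIE actions are all hypothetical stand-ins chosen for this example. It assumes the tracker lives in a shared map that cleanup can empty at any time, and shows a handler that reads the tracker once and checks for null instead of dereferencing the lookup directly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; this is an illustration, not the NodeManager source.
class LocalizerHeartbeatSketch {
    enum Action { LIVE, DIE }

    /** Stand-in for a per-application local resource tracker. */
    static final class AppTracker {
        private final String localDir;
        AppTracker(String localDir) { this.localDir = localDir; }
        String pathFor(String resource) { return localDir + "/" + resource; }
    }

    private final Map<String, AppTracker> trackers = new ConcurrentHashMap<>();

    void registerApp(String appId, String localDir) {
        trackers.put(appId, new AppTracker(localDir));
    }

    /** Cleanup path; may run concurrently with heartbeat handling. */
    void cleanupApp(String appId) {
        trackers.remove(appId);
    }

    /** Unguarded lookup: throws NullPointerException if cleanup already ran. */
    String unsafePath(String appId, String resource) {
        return trackers.get(appId).pathFor(resource);
    }

    /** Guarded lookup: read once into a local, then check before use. */
    Action processHeartbeat(String appId, String resource) {
        AppTracker tracker = trackers.get(appId);
        if (tracker == null) {
            // The application was cleaned up first; tell the localizer to stop.
            return Action.DIE;
        }
        tracker.pathFor(resource);
        return Action.LIVE;
    }

    public static void main(String[] args) {
        LocalizerHeartbeatSketch svc = new LocalizerHeartbeatSketch();
        svc.registerApp("app_1", "/tmp/nm-local/app_1");
        System.out.println(svc.processHeartbeat("app_1", "job.jar")); // tracker present

        svc.cleanupApp("app_1"); // simulate cleanup finishing before the next heartbeat
        System.out.println(svc.processHeartbeat("app_1", "job.jar")); // tracker gone

        try {
            svc.unsafePath("app_1", "job.jar");
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup threw NullPointerException");
        }
    }
}
```

The sketch replays the interleaving sequentially (cleanup, then heartbeat) so the outcome is deterministic; in a live process the same ordering would come from two threads. Whether returning a "die" action is the right response for the real service is a design question for the actual fix.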

> NPE while processing localizer heartbeat
> ----------------------------------------
>
>                 Key: YARN-4355
>                 URL: https://issues.apache.org/jira/browse/YARN-4355
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.2
>            Reporter: Jason Lowe
>
> While analyzing YARN-4354 I noticed a nodemanager was getting NPEs while processing a
> private localizer heartbeat.  I think there's a race where we can clean up resources for an
> application, and therefore remove the app local resource tracker, just as we are trying to
> handle the localizer heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
