hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
Date Wed, 14 May 2014 15:01:29 GMT

    [ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997621#comment-13997621 ]

Wangda Tan commented on YARN-2053:
----------------------------------

I took a look at the related code, and I think the problem is caused by the following.

ApplicationMasterService.registerApplicationMaster() adds the NMTokens for the previous
attempt's containers in a loop:
{code}
      List<Container> transferredContainers =
          ((AbstractYarnScheduler) rScheduler)
            .getTransferredContainers(applicationAttemptId);
      if (!transferredContainers.isEmpty()) {
        response.setContainersFromPreviousAttempts(transferredContainers);
        List<NMToken> nmTokens = new ArrayList<NMToken>();
        for (Container container : transferredContainers) {
          try {
            nmTokens.add(rmContext.getNMTokenSecretManager()
                .createAndGetNMToken(app.getUser(), applicationAttemptId,
                    container));
          }
{code}

And NMTokenSecretManager.createAndGetNMToken() does the following:
{code}
      NMToken nmToken = null;
      if (nodeSet != null) {
        if (!nodeSet.contains(container.getNodeId())) {
           ...
           // set nmToken
           ...
        }
      }
      return nmToken;
{code}

So if multiple containers come from the same NM (with the same NodeId), createAndGetNMToken()
returns null for any container whose NodeId is already in nodeSet, and that null NMToken is
added to the NMToken list. Then RegisterApplicationMasterResponsePBImpl.getTokenProtoIterable
tries to convert the null NMToken to proto:
{code}
          @Override
          public NMTokenProto next() {
            return convertToProtoFormat(iter.next());
          }
{code}
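
To make the sequence above concrete, here is a minimal, self-contained sketch of how the nulls end up in the list. It uses plain strings and a hypothetical NullTokenDemo class as stand-ins, not the real YARN types, and it assumes that the elided branch of createAndGetNMToken() records the NodeId in nodeSet when it sets the token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the token manager; NOT the real YARN class.
class NullTokenDemo {
    // Nodes that have already been handed a token (assumed behaviour).
    private final Set<String> nodeSet = new HashSet<>();

    // Same shape as the excerpt: the token stays null when the node
    // is already present in nodeSet.
    String createAndGetToken(String nodeId) {
        String token = null;
        if (!nodeSet.contains(nodeId)) {
            nodeSet.add(nodeId); // assumption: the elided code does this
            token = "token-for-" + nodeId;
        }
        return token;
    }

    // Same shape as the registration loop: every result is added,
    // including nulls.
    static List<String> collectTokens(List<String> containerNodeIds) {
        NullTokenDemo manager = new NullTokenDemo();
        List<String> tokens = new ArrayList<>();
        for (String nodeId : containerNodeIds) {
            tokens.add(manager.createAndGetToken(nodeId));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two containers on node1 and one on node2.
        List<String> tokens =
            collectTokens(Arrays.asList("node1", "node1", "node2"));
        // The entry for the second container on node1 is null.
        System.out.println(tokens);
    }
}
```

Any later step that dereferences each list element, such as a conversion to proto, would then hit the null entry.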

I think this is the root cause of the problem. I have uploaded a patch.
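
As an illustration only (this is not necessarily what the uploaded patch does), one way to keep null tokens out of the response is to check the result before adding it to the list. The sketch below uses a hypothetical NullGuardSketch class and a tokenSource function standing in for createAndGetNMToken(); neither name exists in YARN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not part of YARN.
class NullGuardSketch {
    // Collects tokens for the given containers, skipping any null result.
    // tokenSource stands in for createAndGetNMToken().
    static <C, T> List<T> collectNonNullTokens(
            List<C> containers, Function<C, T> tokenSource) {
        List<T> tokens = new ArrayList<>();
        for (C container : containers) {
            T token = tokenSource.apply(container);
            if (token != null) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        // Returns a token the first time a node is seen, null afterwards.
        List<String> tokens = collectNonNullTokens(
            Arrays.asList("node1", "node1", "node2"),
            node -> seen.add(node) ? "token-for-" + node : null);
        System.out.println(tokens); // one token per distinct node
    }
}
```

With this guard the list holds at most one token per distinct node, so nothing null reaches the proto conversion.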

> Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2053
>                 URL: https://issues.apache.org/jira/browse/YARN-2053
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Sumit Mohanty
>         Attachments: yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak
>
>
> Slider AppMaster restart fails with the following:
> {code}
> org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
