Date: Thu, 17 May 2018 19:31:00 +0000 (UTC)
From: "Robert Kanter (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats


    [ https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479590#comment-16479590 ]

Robert Kanter commented on YARN-8310:
-------------------------------------

The 002 patches fix the asflicense warning (caused by a formatting issue that existed before the patch; not sure why it wasn't caught until now), the relevant checkstyle warnings, and the javac warnings.

> Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-8310
>                 URL: https://issues.apache.org/jira/browse/YARN-8310
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>            Priority: Major
>         Attachments: YARN-8310.001.patch, YARN-8310.002.patch, YARN-8310.branch-2.001.patch, YARN-8310.branch-2.002.patch
>
>
> In some recent upgrade testing, we saw this error, which caused the NodeManager to fail to start up afterwards:
> {noformat}
> org.apache.hadoop.service.ServiceStateException: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
>     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>     at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
>     at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
>     at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
>     at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1860)
>     at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1824)
>     at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016)
>     at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011)
>     at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>     at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>     at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>     at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>     at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686)
>     at org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254)
>     at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177)
>     at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>     ... 5 more
> {noformat}
> The NodeManager fails because it's trying to read a {{ContainerTokenIdentifier}} in the "old" format, from before we changed these identifiers to protobufs (YARN-668). This is very similar to YARN-5594, where we ran into the same kind of problem with the ResourceManager and RM Delegation Tokens.
> To provide a better experience, we should make the code fall back to reading the old format when it's unable to read a token using the new format. We didn't run into any errors with the other two types of tokens that YARN-668 incompatibly changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix those while we're at it.
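
The fallback proposed in the issue description (try the new format first, and read the old format only if that fails) can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the class {{TokenFormatFallback}}, the two-field {{Identifier}}, and both byte layouts are hypothetical stand-ins, not the real YARN token encodings and not the code in the attached patches. The "new" layout starts with a marker byte and the "old" layout starts with a big-endian long, so old data typically begins with a zero byte, loosely mirroring the "invalid tag (zero)" failure above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Sketch only: both formats below are hypothetical, not the YARN wire formats. */
final class TokenFormatFallback {

  /** Hypothetical identifier with two fields. */
  static final class Identifier {
    final String user;
    final long expiry;

    Identifier(String user, long expiry) {
      this.user = user;
      this.expiry = expiry;
    }
  }

  /** Hypothetical marker byte that opens the "new" layout. */
  static final int NEW_FORMAT_MARKER = 0x0A;

  /** "New" layout (assumed): marker byte, user, expiry. */
  static byte[] writeNew(Identifier id) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeByte(NEW_FORMAT_MARKER);
    out.writeUTF(id.user);
    out.writeLong(id.expiry);
    out.flush();
    return bytes.toByteArray();
  }

  /** "Old" layout (assumed): expiry, user, with no marker. */
  static byte[] writeOld(Identifier id) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeLong(id.expiry);
    out.writeUTF(id.user);
    out.flush();
    return bytes.toByteArray();
  }

  /** Parses the "new" layout; throws IOException if the marker is wrong or data is truncated. */
  static Identifier parseNew(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int marker = in.read();
    if (marker != NEW_FORMAT_MARKER) {
      throw new IOException("invalid marker (" + marker + ")");
    }
    String user = in.readUTF();
    long expiry = in.readLong();
    return new Identifier(user, expiry);
  }

  /** Parses the "old" layout. */
  static Identifier parseOld(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    long expiry = in.readLong();
    String user = in.readUTF();
    return new Identifier(user, expiry);
  }

  /** Tries the new layout first and re-reads the same bytes as the old layout on failure. */
  static Identifier read(byte[] data) throws IOException {
    try {
      return parseNew(data);
    } catch (IOException newFormatFailure) {
      return parseOld(data);
    }
  }

  public static void main(String[] args) throws IOException {
    Identifier id = new Identifier("alice", 1526585460000L);
    Identifier fromNew = read(writeNew(id));
    Identifier fromOld = read(writeOld(id));
    System.out.println(fromNew.user + " " + fromNew.expiry);
    System.out.println(fromOld.user + " " + fromOld.expiry);
  }
}
```

The sketch works on a fully buffered byte array rather than a stream, so that a failed first attempt does not consume input the second attempt needs. It also shows the main caveat of this approach: old data whose first byte happens to equal the marker would be misread as the new layout, so a real implementation needs formats that cannot be confused in that way.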