From: "Jason Lowe (JIRA)"
To: yarn-issues@hadoop.apache.org
Date: Fri, 9 Sep 2016 12:43:21 +0000 (UTC)
Subject: [jira] [Commented] (YARN-5630) NM fails to start after downgrade from 2.8 to 2.7

[ https://issues.apache.org/jira/browse/YARN-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15476980#comment-15476980 ]

Jason Lowe commented on YARN-5630:
----------------------------------

This was introduced by YARN-5221.
Sample stack trace:

{noformat}
2016-09-06 17:24:19,258 [main] INFO service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state INITED; cause: java.io.IOException: Unexpected container state key: ContainerManager/containers/container_e44_1472715025911_0001_01_000002/version
java.io.IOException: Unexpected container state key: ContainerManager/containers/container_e44_1472715025911_0001_01_000002/version
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:243)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:182)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:267)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:251)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:263)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:507)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:555)
{noformat}

Ideally we should not store a version key unless we are using the feature that requires it. In other words, if the container version is 0, we don't store the key, and on recovery we infer that a missing version key means the container version is zero. I assume we already do this when upgrading from 2.7 to 2.8. This would preserve the ability to downgrade as long as nothing uses the increase container resource feature that needs the new version key.
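To make the proposal concrete, here is a minimal Java sketch, assuming a sorted in-memory map stands in for the NM's leveldb state store. Only the `ContainerManager/containers/<id>/version` key layout comes from the stack trace above; the class `ContainerVersionKeySketch`, its methods, and the `/request` suffix are hypothetical and do not reflect the actual `NMLeveldbStateStoreService` code. The writer omits the version key when the version is 0, the newer reader treats a missing key as version 0, and a legacy-style reader that rejects unknown suffixes (mimicking the 2.7 failure) only trips over containers whose version is non-zero.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch, not the real NM state store: a TreeMap stands in
 * for leveldb, and only the ".../containers/<id>/version" key layout is
 * taken from the stack trace. The "/request" suffix is an assumption.
 */
public class ContainerVersionKeySketch {
    static final String CONTAINERS_PREFIX = "ContainerManager/containers/";
    static final String REQUEST_SUFFIX = "/request"; // hypothetical suffix
    static final String VERSION_SUFFIX = "/version";

    /** Writer: store the version key only when the version is non-zero. */
    public static void storeContainer(Map<String, String> db, String containerId,
                                      String request, int version) {
        String keyPrefix = CONTAINERS_PREFIX + containerId;
        db.put(keyPrefix + REQUEST_SUFFIX, request);
        if (version != 0) {
            db.put(keyPrefix + VERSION_SUFFIX, Integer.toString(version));
        }
    }

    /** Newer reader: a missing version key is inferred to mean version 0. */
    public static int loadVersion(Map<String, String> db, String containerId) {
        String v = db.get(CONTAINERS_PREFIX + containerId + VERSION_SUFFIX);
        return v == null ? 0 : Integer.parseInt(v);
    }

    /** Legacy-style reader: rejects any container key suffix it does not know. */
    public static void loadLegacy(Map<String, String> db, String containerId)
            throws IOException {
        String keyPrefix = CONTAINERS_PREFIX + containerId;
        for (String key : db.keySet()) {
            // The trailing "/" keeps "c1" from matching keys of "c10".
            if (!key.startsWith(keyPrefix + "/")) {
                continue;
            }
            String suffix = key.substring(keyPrefix.length());
            if (!suffix.equals(REQUEST_SUFFIX)) {
                throw new IOException("Unexpected container state key: " + key);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> db = new TreeMap<>();
        storeContainer(db, "container_a", "req-a", 0); // no version key written
        storeContainer(db, "container_b", "req-b", 1); // version key written
        System.out.println(loadVersion(db, "container_a"));
        loadLegacy(db, "container_a"); // succeeds: nothing unfamiliar stored
        try {
            loadLegacy(db, "container_b");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `0` and then the legacy reader's error for `container_b`, the only container that carries a version key. Under this scheme, a downgrade only breaks once some container actually needs a non-zero version.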
Only if something uses the increase container resource capability do we lose the ability to downgrade.

> NM fails to start after downgrade from 2.8 to 2.7
> -------------------------------------------------
>
>                 Key: YARN-5630
>                 URL: https://issues.apache.org/jira/browse/YARN-5630
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Jason Lowe
>            Priority: Blocker
>
> A downgrade from 2.8 to 2.7 causes nodemanagers to fail to start due to an unrecognized "version" container key on startup. This breaks downgrades from 2.8 to 2.7.