hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch
Date Wed, 12 Jul 2017 21:46:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084766#comment-16084766 ]

Jason Lowe commented on YARN-6798:
----------------------------------

IMHO we should only need to bump the major version if any of the following are true:
* Older NM software will explode when it tries to recover the state store
* Older NM software fails to do something crucial during recovery due to ignoring something
in the state store

Otherwise we can keep the major version the same and simply bump the minor version.  It looks
like the two features were added to the state store in a way that lets us remain on 1.x, but I
haven't dug into it deeply enough to be sure.
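
To make the major/minor rule above concrete, here is a minimal, self-contained sketch of a check in which only a major-version difference is fatal. The names StateVersionSketch, SchemaVersion and checkVersion are hypothetical and are not Hadoop's actual classes; the error text simply mirrors the message in the log quoted in the issue description.

```java
import java.io.IOException;

// Hypothetical sketch only: not the NodeManager's real state store code.
class StateVersionSketch {

  /** Hypothetical stand-in for a (major, minor) state store schema version. */
  static final class SchemaVersion {
    final int major;
    final int minor;

    SchemaVersion(int major, int minor) {
      this.major = major;
      this.minor = minor;
    }

    /** Assumed rule: versions are compatible when the major numbers match. */
    boolean isCompatibleTo(SchemaVersion other) {
      return major == other.major;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof SchemaVersion)) {
        return false;
      }
      SchemaVersion v = (SchemaVersion) o;
      return major == v.major && minor == v.minor;
    }

    @Override
    public int hashCode() {
      return 31 * major + minor;
    }

    @Override
    public String toString() {
      return major + "." + minor;
    }
  }

  /**
   * Returns the version that should be persisted after the check.
   * Throws if the loaded store has a different major version.
   */
  static SchemaVersion checkVersion(SchemaVersion current, SchemaVersion loaded)
      throws IOException {
    if (loaded.equals(current)) {
      return loaded; // nothing to do
    }
    if (loaded.isCompatibleTo(current)) {
      return current; // minor bump only: keep the data, record the new version
    }
    throw new IOException("Incompatible version for NM state: expecting NM state version "
        + current + ", but loading version " + loaded);
  }
}
```

Under this sketch, a store written as 1.0 and read by software at 1.2 is accepted and re-stamped as 1.2, while a store written as 2.0 and read by software at 3.0 is rejected, which is the failure reported in this issue.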

bq. This will be incompatible with the previous alphas and anyone running directly from branch-2
builds.

True, but that's the risk of running on unreleased software (as is the case with branch-2).
Anyone could check in something that isn't backwards-compatible and needs to be subsequently
fixed, and that could break users who happened to deploy in between.  AFAIK we don't make
any commitments to compatibility except for official Apache Hadoop releases.

I would argue the same applies to alpha releases.  The whole point of calling it alpha is
to convey that APIs may be unstable and could disappear or change in an incompatible way in
the next release.  It will be annoying to users who expect to do a rolling upgrade from 3.0-alphaX,
but given the "alpha" tag I would not expect anyone to have deployed this in a production
environment such that they cannot live with downtime when upgrading to a subsequent release.

It would be helpful to have a release note that calls out the incompatibility with 3.0-alpha
releases and that users who are upgrading from one of those releases will need to erase the
NM state store on each node before upgrading.
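
As an illustration of the "erase the NM state store" step, the following sketch recursively deletes a directory using only the JDK. It assumes the NM is stopped and that the caller passes the directory configured as yarn.nodemanager.recovery.dir on that node; the class and method names are hypothetical, and this is not an official upgrade tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: removes a directory tree such as an NM recovery directory.
class EraseStateStoreSketch {

  /** Deletes dir and everything under it; does nothing if dir is absent. */
  static void eraseDirectory(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    List<Path> toDelete;
    try (Stream<Path> paths = Files.walk(dir)) {
      // Reverse order puts children before their parent directories.
      toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : toDelete) {
      Files.delete(p);
    }
  }
}
```

A caller would invoke EraseStateStoreSketch.eraseDirectory with the recovery path for the node being upgraded, after stopping the NM and before starting the new version.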

> NM startup failure with old state store due to version mismatch
> ---------------------------------------------------------------
>
>                 Key: YARN-6798
>                 URL: https://issues.apache.org/jira/browse/YARN-6798
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>         Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 1.4.
> YARN-6127 bumped the version for the NM to 3.0
>     private static final Version CURRENT_VERSION_INFO = Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
>     private static final Version CURRENT_VERSION_INFO = Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Incompatible version for NM state: expecting NM state version 3.0, but loading version 2.0
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting NM state version 3.0, but loading version 2.0
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 5 more
> 2017-07-07 15:48:17,277 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}


