hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch
Date Mon, 10 Jul 2017 21:32:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081191#comment-16081191 ]

Jason Lowe commented on YARN-6798:
----------------------------------

I don't know the full story behind these various version bumps, but we need to stop the habit
of bumping the major version in the state store without providing a migration path for older
versions.

If we really need to bump the state store major version to support a new feature, my preference
would be to do this as lazily as possible, i.e., the major version should
not be updated in the state store until the new feature is enabled or used.  That way we don't
lose the ability to roll back the release to the old version if something goes terribly wrong
after the upgrade but before the new feature is used.  If that's not possible for some reason,
then the code needs to recognize the older state store versions on startup and either do a
one-time pass over the data to migrate it to the new schema or otherwise handle the old
format on the fly while reading it for recovery.
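
As a rough illustration of the lazy approach described above, here is a minimal sketch.
It is not the actual NMLeveldbStateStoreService code: the class name LazyVersionedStore,
the key names, the version constants, and the in-memory Map standing in for leveldb are
all assumptions made for the example.  The startup check accepts an older major version
without rewriting it, and the stored version is only bumped when a record belonging to
the new feature is first written.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical sketch of a lazily bumped state store version.
// A plain Map stands in for the real leveldb-backed store.
class LazyVersionedStore {
    static final String VERSION_KEY = "schema-version"; // hypothetical key name
    static final String FEATURE_PREFIX = "feature/";    // hypothetical key prefix
    static final int OLD_MAJOR = 2; // schema written by the previous release
    static final int NEW_MAJOR = 3; // schema needed only by the new feature

    private final Map<String, String> db;

    LazyVersionedStore(Map<String, String> db) {
        this.db = db;
    }

    // Major version currently recorded in the store.
    // A store with no version record is treated as the old schema.
    int storedMajor() {
        String v = db.get(VERSION_KEY);
        return v == null ? OLD_MAJOR : Integer.parseInt(v);
    }

    // Startup check: older stores are accepted as-is, only stores written
    // by a newer release are rejected.  The version is NOT rewritten here,
    // so rolling back to the old release remains possible.
    void checkVersion() throws IOException {
        int loaded = storedMajor();
        if (loaded > NEW_MAJOR) {
            throw new IOException("Incompatible version for state: expecting major version "
                + NEW_MAJOR + " or older, but loading version " + loaded);
        }
    }

    // Writing a record for the new feature is the point of no return:
    // bump the stored major version first, then write the record.
    void storeFeatureRecord(String id, String value) {
        if (storedMajor() < NEW_MAJOR) {
            db.put(VERSION_KEY, Integer.toString(NEW_MAJOR));
        }
        db.put(FEATURE_PREFIX + id, value);
    }
}
```

With a store last written at major version 2, checkVersion() succeeds and leaves the
stored version at 2; only the first storeFeatureRecord() call moves it to 3.  The
alternative mentioned above (a one-time migration pass, or handling the old format while
reading) is not shown here.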

> NM startup failure with old state store due to version mismatch
> ---------------------------------------------------------------
>
>                 Key: YARN-6798
>                 URL: https://issues.apache.org/jira/browse/YARN-6798
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Ray Chiang
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 1.4.
> YARN-6127 bumped the version for the NM to 3.0
>     private static final Version CURRENT_VERSION_INFO = Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
>     private static final Version CURRENT_VERSION_INFO = Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Incompatible version for NM state: expecting NM state version 3.0, but loading version 2.0
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting NM state version 3.0, but loading version 2.0
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 5 more
> 2017-07-07 15:48:17,277 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

