hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5630) NM fails to start after downgrade from 2.8 to 2.7
Date Mon, 12 Sep 2016 14:33:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484295#comment-15484295 ]

Jason Lowe commented on YARN-5630:

I'm not a fan of the "prepare for rollback" approach if we can avoid it.  It adds another
user-visible phase to the rollback procedure and places the burden on admins, requiring them
to either know what keys are valid/appropriate to specify for the command or that they need
to run a special script which embeds this knowledge.  Also simply removing the keys from the
database is not going to be a proper downgrade procedure.  Those keys represent state that
is important to preserve on a restart, and if we ignore it then we are dropping a user request
for a container.  That's not going to be OK in the general case, as that may prevent a container
from launching properly or having the proper properties when it is launched.  Depending upon
the nature of the feature that added the new store keys, we may not be able to support the
downgrade at all short of failing the container because we can't execute it as requested.

In the short term I think we should commit something similar to this patch to unblock the
2.8 release.  IMHO we should be OK supporting downgrades from 2.8 to 2.7 only if the user does
not leverage the new features in 2.8 (e.g., container increase/decrease, queuing).
Once those features are used then a downgrade may not work.  This mirrors what was done for
the epoch number in container IDs between 2.5 and 2.6.  Downgrades worked as long as the new
work-preserving RM restart wasn't performed after upgrading to 2.6.  In general if we are
careful only to use new store keys when they are absolutely necessary then we can support
rollbacks as long as users don't use the new features added in the new release.  
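
A minimal sketch of the idea in the paragraph above, not the YARN-5630 patch or the real
NodeManager state store.  It assumes a hypothetical writer that emits a new store key only
when a new feature is in use, and a hypothetical older reader that rejects any key suffix
it does not know.  The class name, the key layout and the "/newfeature" suffix are all
made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Hadoop.
class StoreKeySketch {
    // Key suffixes that the (hypothetical) older release knows how to recover.
    static final Set<String> OLD_KNOWN_SUFFIXES =
        new HashSet<>(Arrays.asList("/request", "/launched"));

    // Newer-release writer: the extra key is stored only when the new feature is used.
    static Map<String, String> store(String containerId, boolean usesNewFeature) {
        Map<String, String> db = new LinkedHashMap<>();
        String prefix = "containers/" + containerId;
        db.put(prefix + "/request", "serialized-start-request");
        if (usesNewFeature) {
            db.put(prefix + "/newfeature", "feature-state"); // made-up key name
        }
        return db;
    }

    // Older-release reader: strict, so any unknown suffix makes recovery fail.
    static boolean oldReleaseCanRecover(Map<String, String> db) {
        for (String key : db.keySet()) {
            String suffix = key.substring(key.lastIndexOf('/'));
            if (!OLD_KNOWN_SUFFIXES.contains(suffix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Feature never used: the store holds only keys the older release understands.
        System.out.println(oldReleaseCanRecover(store("container_1", false))); // true
        // Feature used: the new key is present and the older release rejects it.
        System.out.println(oldReleaseCanRecover(store("container_1", true)));  // false
    }
}
```

Under these assumptions the downgrade keeps working for exactly the users who never
touched the new feature, which is the property described above.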

After unblocking 2.8 we can then work on the data-driven key ignoring in YARN-5547.  That
will help cover another set of features where a simple delete of the keys is sufficient to
perform the downgrade.  That would then leave the features where we can't just ignore keys,
and we'll have to come up with some other approach or state to users that downgrades do not
necessarily work once that new feature is being used.
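
One possible reading of "data-driven key ignoring", sketched under assumptions: the store
itself carries a list of key suffixes that are safe to drop, and an older reader skips
unknown keys on that list while still failing on any other unknown key.  This is not the
YARN-5547 design, which is not described in this message.  The class, the method and the
representation of the ignorable set are hypothetical; only the "version" key name comes
from the issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the NodeManager state store API.
class IgnorableKeySketch {
    // Key suffixes that the (hypothetical) older release knows how to recover.
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList("request", "launched"));

    // Recover one container's keys.  Unknown keys are skipped only when the store
    // has declared them ignorable; any other unknown key still fails recovery.
    static List<String> recover(Map<String, String> containerKeys, Set<String> ignorable) {
        List<String> recovered = new ArrayList<>();
        for (String suffix : containerKeys.keySet()) {
            if (KNOWN.contains(suffix)) {
                recovered.add(suffix);
            } else if (!ignorable.contains(suffix)) {
                throw new IllegalStateException("Unrecognized container key: " + suffix);
            }
            // Otherwise the key is unknown but declared ignorable, so it is dropped.
        }
        return recovered;
    }

    public static void main(String[] args) {
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put("request", "serialized-start-request");
        keys.put("version", "1");
        Set<String> ignorable = new HashSet<>(Arrays.asList("version"));
        System.out.println(recover(keys, ignorable)); // [request]
    }
}
```

Features whose keys cannot simply be dropped would stay off the ignorable list, so
recovery would still fail for them, matching the remaining case described above.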

> NM fails to start after downgrade from 2.8 to 2.7
> -------------------------------------------------
>                 Key: YARN-5630
>                 URL: https://issues.apache.org/jira/browse/YARN-5630
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: YARN-5630.001.patch, YARN-5630.002.patch
> A downgrade from 2.8 to 2.7 causes nodemanagers to fail to start due to an unrecognized
> "version" container key on startup.  This breaks downgrades from 2.8 to 2.7.
