hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5049) Extend NMStateStore to save queued container information
Date Mon, 22 Aug 2016 13:57:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430799#comment-15430799 ]

Jason Lowe commented on YARN-5049:

The major version should change when an older version of the software should not try to use
the state store.  If we only bump the minor version then the old software will happily use
the state store because all schemas with the same major version are "compatible."
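
To make the compatibility rule concrete, here is a minimal sketch in Java. The class and method names are hypothetical and chosen only for illustration; this is not the actual state store code. It assumes only what is described above: two schema versions count as compatible whenever their major versions match.

```java
/** Hypothetical schema version, for illustration only. */
final class SchemaVersion {
    final int major;
    final int minor;

    SchemaVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /**
     * The running software may use a stored schema only when the major
     * versions match. The minor version is deliberately ignored.
     */
    boolean isCompatibleTo(SchemaVersion stored) {
        return this.major == stored.major;
    }
}
```

Under this rule, software written for 1.0 would accept a store marked 1.1 but refuse one marked 2.0, which is why a minor bump alone cannot keep old software away from the new keys.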

So we need to think about two scenarios:
# What happens if we upgrade to a newer version of software that sees the old schema without
these keys?
# What happens if we downgrade from a newer version of software with these keys to an older
one that doesn't know about them?

For #1 I think it's easy.  Old software doesn't support queued containers, so those keys won't
be there.  No queued containers means nothing to restore for that subsystem, so we should
be fine during recovery.
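
As a rough illustration of why the upgrade case is harmless, the sketch below scans a sorted key/value map for queued container keys. The "queued/" prefix and all names here are assumptions made for the example, not the real key layout. A store written by old software simply has no such keys, so recovery yields an empty list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;

/** Hypothetical recovery helper, for illustration only. */
final class QueuedContainerRecovery {
    // Assumed key prefix; the real state store layout may differ.
    static final String QUEUED_PREFIX = "queued/";

    /** Returns the IDs of all queued containers found in the store. */
    static List<String> recoverQueued(SortedMap<String, String> store) {
        List<String> ids = new ArrayList<>();
        // Keys are sorted, so every key with the prefix is contiguous
        // starting at the first key >= the prefix itself.
        for (String key : store.tailMap(QUEUED_PREFIX).keySet()) {
            if (!key.startsWith(QUEUED_PREFIX)) {
                break; // walked past the last queued-container key
            }
            ids.add(key.substring(QUEUED_PREFIX.length()));
        }
        return ids;
    }
}
```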

For #2 it's more complicated.  If we have queued containers and then do a rolling downgrade,
we could end up losing those containers because the old software doesn't support them.  Therefore
I think we can't support rolling downgrades once queued containers are in use.

So it looks like the proper way forward is to bump the major version because of the lack of
rolling downgrade support.  IMHO the version number should be updated "lazily," meaning if
we're currently on schema version 1 but never use queued containers then it stays at version
1.  If we're on version 1 when a queued container needs to be saved in the state store then
we update the major version at that time.  This has a number of important benefits to the
end user:
- No need for a "migration script" that needs to be run manually
- Users don't lose the ability to do a rolling downgrade until they leverage the functionality
that broke the ability to downgrade.
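
The lazy update described above could look roughly like the following sketch. It uses an in-memory map as a stand-in for the state store, and every name, key, and version constant in it is hypothetical. The point is only the ordering: the major version stays at 1 until the first queued container is stored.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the NM state store. */
final class LazyVersionedStore {
    // Assumed key name and version numbers, for illustration only.
    static final String VERSION_KEY = "schema-major-version";
    static final int BASE_MAJOR = 1;
    static final int QUEUED_CONTAINERS_MAJOR = 2;

    private final Map<String, String> db = new HashMap<>();

    LazyVersionedStore() {
        db.put(VERSION_KEY, Integer.toString(BASE_MAJOR));
    }

    int majorVersion() {
        return Integer.parseInt(db.get(VERSION_KEY));
    }

    /** Stores a regular container; never touches the schema version. */
    void storeContainer(String containerId) {
        db.put("containers/" + containerId, "running");
    }

    /** Stores a queued container, bumping the major version on first use. */
    void storeQueuedContainer(String containerId) {
        if (majorVersion() < QUEUED_CONTAINERS_MAJOR) {
            db.put(VERSION_KEY, Integer.toString(QUEUED_CONTAINERS_MAJOR));
        }
        db.put("queued/" + containerId, "queued");
    }
}
```

In a persistent store, the version bump and the first queued container record would presumably need to be written atomically, so that a crash between the two cannot leave new keys behind an old version number.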

This matches the precedent set by the container ID epoch change for RM work-preserving restart
in 2.6.  2.5 apps were supported on 2.6 until the user did a work-preserving RM restart, since
that's what caused the epoch ID to be added to the container ID, breaking any 2.5 app that
tried to parse a container ID.

> Extend NMStateStore to save queued container information
> --------------------------------------------------------
>                 Key: YARN-5049
>                 URL: https://issues.apache.org/jira/browse/YARN-5049
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Konstantinos Karanasos
>            Assignee: Konstantinos Karanasos
>             Fix For: 2.9.0
>         Attachments: YARN-5049.001.patch, YARN-5049.002.patch, YARN-5049.003.patch
> This JIRA is about extending the NMStateStore to save queued container information whenever
> a new container is added to the NM queue.
> It also removes the information from the state store when the queued container starts
> its execution.
