hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5637) Changes in NodeManager to support Container upgrade and rollback/commit
Date Wed, 14 Sep 2016 00:48:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15488712#comment-15488712 ]

Arun Suresh edited comment on YARN-5637 at 9/14/16 12:47 AM:
-------------------------------------------------------------

Updating the patch based on [~jianhe]'s suggestions and rebasing on the latest YARN-5620 patch.

bq. Do we need to do something for this condition? Else, it can be removed.
Yeah, it can be removed. I had put that there to remind me of something and forgot to remove
it :)

bq. In RollbackContainerTransition: the container.getResourceSet() will return all resources
including current and previous version. We should re-request only the previous version's resources,
rather than the union of both?
In the latest patch, the resourceSet is reverted to its previous state as well.
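
To illustrate the difference between reverting and taking the union, here is a minimal sketch.
It is not the patch's code: the class name ResourceSetRollbackSketch, its methods, and the use of
plain strings for resources are all assumptions made for illustration. On rollback, only the
snapshot taken before the upgrade is restored and re-requested.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; not the NodeManager's actual ResourceSet handling. */
final class ResourceSetRollbackSketch {
    private Set<String> current;
    // Snapshot of the pre-upgrade resources; null when there is nothing to roll back to.
    private Set<String> previous;

    ResourceSetRollbackSketch(Set<String> initial) {
        this.current = new HashSet<>(initial);
    }

    /** Replaces the current resources and keeps the old ones as a snapshot. */
    void upgrade(Set<String> newResources) {
        previous = current;
        current = new HashSet<>(newResources);
    }

    /** Restores the snapshot and returns exactly the resources to re-request. */
    Set<String> rollback() {
        if (previous == null) {
            throw new IllegalStateException("no previous resource set to roll back to");
        }
        current = previous;
        previous = null;
        return new HashSet<>(current);
    }

    Set<String> current() {
        return new HashSet<>(current);
    }
}
```

With this shape, a resource that exists only in the upgraded version is never re-requested
after a rollback.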

bq. I still have a question on the commit API: how does the AM use this API in practice?
Commit is just a way for the AM to tell the NM that it is fine with the upgrade (perhaps after
it performs some upgrade diagnostics check on the container) and that the container is working
as it should. After the AM does a commit, the container cannot be rolled back, and any bookkeeping
required to roll back (the reInitContext, for example) is deleted by the NM.

Prior to a commit, if the upgraded Container fails, NM can choose to automatically rollback.
After the AM issues a commit, NM will not be able to rollback.

Of course the AM is still free to call 'upgrade' again, with an old launch context.

By default, autoCommit is 'true', which means that as soon as the container is upgraded, it is
also committed.
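
The upgrade/commit/rollback rules described above can be sketched roughly as follows. This is
only an illustration, not the code in the patch: UpgradeLifecycleSketch and its methods are
hypothetical names, launch contexts are reduced to strings, and a second upgrade before a commit
simply overwrites the saved context, which is a simplification.

```java
import java.util.Optional;

/** Hypothetical sketch of the semantics; not the NodeManager's actual classes. */
final class UpgradeLifecycleSketch {
    private String currentContext;
    // Bookkeeping needed to roll back; empty once the upgrade has been committed.
    private Optional<String> previousContext = Optional.empty();

    UpgradeLifecycleSketch(String initialContext) {
        this.currentContext = initialContext;
    }

    /** Switches to a new launch context; with autoCommit, no rollback state is kept. */
    void upgrade(String newContext, boolean autoCommit) {
        previousContext = autoCommit ? Optional.empty() : Optional.of(currentContext);
        currentContext = newContext;
    }

    /** The AM accepts the upgrade, so the rollback bookkeeping is dropped. */
    void commit() {
        previousContext = Optional.empty();
    }

    boolean canRollback() {
        return previousContext.isPresent();
    }

    /** Only possible before a commit. */
    void rollback() {
        if (!previousContext.isPresent()) {
            throw new IllegalStateException("upgrade already committed; cannot roll back");
        }
        currentContext = previousContext.get();
        previousContext = Optional.empty();
    }

    String currentContext() {
        return currentContext;
    }
}
```

After a commit, going back to the old version means calling upgrade() again with the old
context, which matches the behaviour described above.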

bq. ..one implication for this API is that we'll have to persist the commit state for NM
recovery later on.
Yes, we would. I plan to open a JIRA to address NMStateStore changes for this as well as for
YARN-5620.

bq. Also, should the rollback API always be able to roll back?
Once Commit has been called, you cannot roll back. The AM would have to explicitly call the
upgrade API again with the previous launchContext.

bq. ContainerLaunchContext already has the ContainerRetryContext ? can we reuse that retryContext?
I wanted to distinguish between the retry policy used to retry a failed container and the
policy used to decide failure retries during upgrades. It is possible both can be the same.
I just put that argument there in the _upgrade()_ API to make it explicit.
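
As a rough sketch of why the two policies are kept apart: the class below is hypothetical (it
is not ContainerRetryContext, and the patch may model this differently) and reduces each policy
to a maximum retry count. The two budgets can differ, or can be set to the same value.

```java
/** Hypothetical sketch; each policy is reduced to a simple retry limit. */
final class RetryBudgetSketch {
    private final int containerMaxRetries;
    private final int upgradeMaxRetries;

    RetryBudgetSketch(int containerMaxRetries, int upgradeMaxRetries) {
        this.containerMaxRetries = containerMaxRetries;
        this.upgradeMaxRetries = upgradeMaxRetries;
    }

    /** Uses the same limit for ordinary failures and for failures during an upgrade. */
    static RetryBudgetSketch shared(int maxRetries) {
        return new RetryBudgetSketch(maxRetries, maxRetries);
    }

    /**
     * failuresSoFar counts the failure that was just observed.
     * A retry is allowed while that count does not exceed the applicable limit.
     */
    boolean shouldRetry(boolean duringUpgrade, int failuresSoFar) {
        int limit = duringUpgrade ? upgradeMaxRetries : containerMaxRetries;
        return failuresSoFar <= limit;
    }
}
```

For example, new RetryBudgetSketch(3, 1) would retry an ordinary failure up to three times but
give up on an upgraded container after a single retry.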

bq. The ContainerImpl#ContainerRetryContext is not updated to new value on upgrade.
This is fixed in the latest YARN-5620 patch.

bq. RetryFailureTransition: it's a bit complicated.. is it possible to simplify it to something
like below:
I refactored it a bit. Let me know if it's OK.






> Changes in NodeManager to support Container upgrade and rollback/commit
> -----------------------------------------------------------------------
>
>                 Key: YARN-5637
>                 URL: https://issues.apache.org/jira/browse/YARN-5637
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-5637.001.patch, YARN-5637.002.patch
>
>
> YARN-5620 added support for re-initialization of Containers using a new launch Context.
> This JIRA proposes to use the above feature to support upgrade and subsequent rollback
> or commit of the upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


