hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5637) Changes in NodeManager to support Container upgrade and rollback/commit
Date Wed, 14 Sep 2016 07:54:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15489728#comment-15489728 ]

Jian He commented on YARN-5637:

Thanks Arun, some more comments:
- Here, we could merge reInitEvent.getResourceSet() with the existing resourceSet.localizedResource
upfront, so that both oldResourceSet and newResourceSet contain a full copy of the resources rather
than a delta. With that, the logic of {{container.resourceSet = container.reInitContext.mergedResourceSet();}}
will no longer be needed. We can simply set {{container.resourceSet = reInitContext.newResourceSet}},
similar to what’s being done for {{container.launchContext = reInitContext.newLaunchContext}}:
return new ReInitializationContext(reInitEvent.getReInitLaunchContext(),
    reInitEvent.getResourceSet(), container.getLaunchContext(),
    container.resourceSet, reInitEvent.getRetryFailureContext(), 
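
A minimal sketch of the "merge upfront" idea, assuming a resource set can be modelled as a plain map from resource name to localized path. The class `ResourceSetMergeSketch` and the method `mergeUpfront` are hypothetical names for illustration only and are not part of the patch or of the NodeManager code:

```java
import java.util.HashMap;
import java.util.Map;

public class ResourceSetMergeSketch {

  // Hypothetical stand-in for a resource set: resource name -> localized path.
  // Returns a full copy (existing + delta) and leaves the existing set untouched,
  // so the old set can still be used as-is for a rollback.
  static Map<String, String> mergeUpfront(Map<String, String> existing,
      Map<String, String> delta) {
    Map<String, String> merged = new HashMap<>(existing);
    merged.putAll(delta); // entries from the re-init request override old ones
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> oldSet = new HashMap<>();
    oldSet.put("app.jar", "/cache/10/app.jar");
    oldSet.put("conf.xml", "/cache/11/conf.xml");

    Map<String, String> delta = new HashMap<>();
    delta.put("app.jar", "/cache/20/app.jar");

    Map<String, String> newSet = mergeUpfront(oldSet, delta);
    System.out.println("new app.jar: " + newSet.get("app.jar"));
    System.out.println("old app.jar: " + oldSet.get("app.jar"));
  }
}
```

With both sets complete, switching between them on upgrade or rollback becomes a plain assignment, and no separate merge step is needed at that point.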

- nit: the {{container.reInitContext != null}} check is not needed:
if (container.reInitContext != null 
    && container.reInitContext.autoCommit) {

- I found that the resourceSet is also not updated on rollback in RetryFailureTransition. I also
tried some refactoring, maybe something like below:
      ContainerRetryContext retryContext = container.containerRetryContext;
      int remainingAttempts = container.remainingRetryAttempts;
      if (container.reInitContext != null) {
        retryContext = container.reInitContext.retryOnFailueContext;
        remainingAttempts = container.reInitContext.retryAttemptsRemaining;
      }

      if (shouldRetry(container.exitCode, retryContext,remainingAttempts)) {
        // TODO state-store operation
        doRelaunch(container, container.remainingRetryAttempts,
      } else if (container.canRollback()) {
        // rollback
        container.reInitContext = new ReInitializationContext(
            container.reInitContext.oldResourceSet, null, null,
            container.containerRetryContext, true);
        new KilledExternallyForReInitTransition().transition(container, event);
      } else {
        // fail
        new ExitedWithFailureTransition(true).transition(container, event);
        return ContainerState.EXITED_WITH_FAILURE;

  public static boolean shouldRetry(int errorCode,
      ContainerRetryContext retryContext, int remainingRetryAttempts) {
    if (retryContext == null) {
      return false;
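
The retry / rollback / fail ordering suggested above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions: `RetryContext`, `Outcome`, and `decide` are hypothetical stand-ins, and the retry rule here (non-zero exit code, retries enabled, attempts remaining) is deliberately simpler than whatever the real `shouldRetry` checks:

```java
public class RetryDecisionSketch {

  enum Outcome { RELAUNCH, ROLLBACK, FAIL }

  // Hypothetical stand-in for a retry context; carries only what the sketch needs.
  static final class RetryContext {
    final boolean retryOnErrors;

    RetryContext(boolean retryOnErrors) {
      this.retryOnErrors = retryOnErrors;
    }
  }

  // A null context means "no retry policy", which avoids a null dereference
  // when the caller picks the context from either the container or the re-init state.
  static boolean shouldRetry(int exitCode, RetryContext retryContext,
      int remainingRetryAttempts) {
    if (retryContext == null) {
      return false;
    }
    return exitCode != 0 && retryContext.retryOnErrors
        && remainingRetryAttempts > 0;
  }

  // Retry first, then roll back if possible, otherwise fail.
  static Outcome decide(int exitCode, RetryContext retryContext,
      int remainingRetryAttempts, boolean canRollback) {
    if (shouldRetry(exitCode, retryContext, remainingRetryAttempts)) {
      return Outcome.RELAUNCH;
    } else if (canRollback) {
      return Outcome.ROLLBACK;
    }
    return Outcome.FAIL;
  }

  public static void main(String[] args) {
    System.out.println(decide(1, new RetryContext(true), 2, true));
    System.out.println(decide(1, new RetryContext(true), 0, true));
    System.out.println(decide(1, null, 0, false));
  }
}
```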

- testContainerUpgradeRollbackDueToFailure: the comment does not match the code (it mentions the
new start file, but the loop waits on oldStartFile):
    // Wait for new processStartfile to be created
    while (!oldStartFile.exists() && timeoutSecs++ < 20) {
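
One way to keep such a wait loop and its comment from drifting apart is to move the polling into a small helper whose argument names the file being waited on. The sketch below is an assumption for illustration; `WaitForFileSketch` and `waitForFile` are hypothetical and not part of the test in the patch:

```java
import java.io.File;
import java.io.IOException;

public class WaitForFileSketch {

  // Polls until the file exists or maxPolls sleeps have elapsed.
  // Returns whether the file exists at the end of the wait.
  static boolean waitForFile(File file, int maxPolls, long pollMillis)
      throws InterruptedException {
    int polls = 0;
    while (!file.exists() && polls++ < maxPolls) {
      Thread.sleep(pollMillis);
    }
    return file.exists();
  }

  public static void main(String[] args)
      throws IOException, InterruptedException {
    File present = File.createTempFile("start", ".txt");
    present.deleteOnExit();
    File missing = new File(present.getParentFile(), "no-such-start-file.txt");

    System.out.println(waitForFile(present, 20, 10));
    System.out.println(waitForFile(missing, 3, 10));
  }
}
```

A call such as `waitForFile(oldStartFile, 20, 1000)` then states directly which file is expected to appear.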

> Changes in NodeManager to support Container upgrade and rollback/commit
> -----------------------------------------------------------------------
>                 Key: YARN-5637
>                 URL: https://issues.apache.org/jira/browse/YARN-5637
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-5637.001.patch, YARN-5637.002.patch
> YARN-5620 added support for re-initialization of Containers using a new launch Context.
> This JIRA proposes to use the above feature to support upgrade and subsequent rollback
> or commit of the upgrade.
