hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5637) Changes in NodeManager to support Container upgrade and rollback/commit
Date Wed, 14 Sep 2016 07:54:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15489728#comment-15489728 ]

Jian He commented on YARN-5637:
-------------------------------

Thanks Arun, some more comments:
- Here, we could merge reInitEvent.getResourceSet() with the existing resourceSet.localizedResource
upfront, so that both oldResourceSet and newResourceSet contain a full copy of the resources rather
than a delta. With that, the {{container.resourceSet = container.reInitContext.mergedResourceSet();}}
logic is no longer needed. We can simply set {{container.resourceSet = reInitContext.newResourceSet}},
similar to what's being done for {{container.launchContext = reInitContext.newLaunchContext}}:
{code}
return new ReInitializationContext(reInitEvent.getReInitLaunchContext(),
    reInitEvent.getResourceSet(), container.getLaunchContext(),
    container.resourceSet, reInitEvent.getRetryFailureContext(), 
    reInitEvent.isAutoCommit());

{code}
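To illustrate the merge-upfront idea, here is a minimal sketch. {{ResourceSetSketch}} and {{ReInitContextSketch}} are hypothetical stand-ins, not the real NodeManager classes, and the resource map is simplified to String keys and values. The point is only that the merge happens once, when the context is built, so the new set is already a full copy:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the NodeManager's ResourceSet; only a simplified
// localized-resource map is modelled here.
final class ResourceSetSketch {
  // Maps a localized path to a link name (assumed, simplified types).
  final Map<String, String> localizedResources = new HashMap<>();

  // Returns a full copy: everything in 'existing', overlaid with 'delta'.
  static ResourceSetSketch merge(ResourceSetSketch existing,
      ResourceSetSketch delta) {
    ResourceSetSketch merged = new ResourceSetSketch();
    merged.localizedResources.putAll(existing.localizedResources);
    merged.localizedResources.putAll(delta.localizedResources);
    return merged;
  }
}

// Hypothetical stand-in for ReInitializationContext.
final class ReInitContextSketch {
  final ResourceSetSketch oldResourceSet;
  final ResourceSetSketch newResourceSet;

  ReInitContextSketch(ResourceSetSketch current, ResourceSetSketch delta) {
    this.oldResourceSet = current;
    // Merge once, at construction time, so later transitions can assign
    // newResourceSet directly without a separate merge step.
    this.newResourceSet = ResourceSetSketch.merge(current, delta);
  }
}
```

With this shape, both upgrade and rollback reduce to a plain assignment of either newResourceSet or oldResourceSet.
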
- nit: the {{container.reInitContext != null}} check is not needed.
{code}
if (container.reInitContext != null 
    && container.reInitContext.autoCommit) {
{code}

- I found that the resourceSet is also not updated on rollback in RetryFailureTransition. I also
tried some refactoring, maybe something like the below:
{code}
      ContainerRetryContext retryContext = container.containerRetryContext;
      int remainingAttempts = container.remainingRetryAttempts;
      if (container.reInitContext != null) {
        retryContext = container.reInitContext.retryOnFailueContext;
        remainingAttempts = container.reInitContext.retryAttemptsRemaining;
      }

      if (shouldRetry(container.exitCode, retryContext, remainingAttempts)) {
        // TODO state-store operation
        doRelaunch(container, container.remainingRetryAttempts,
            container.containerRetryContext.getRetryInterval());
      } else if (container.canRollback()) {
        // rollback
        container.reInitContext = new ReInitializationContext(
            container.reInitContext.oldLaunchContext,
            container.reInitContext.oldResourceSet, null, null,
            container.containerRetryContext, true);
        new KilledExternallyForReInitTransition().transition(container, event);
      } else {
        // fail
        new ExitedWithFailureTransition(true).transition(container, event);
        return ContainerState.EXITED_WITH_FAILURE;
      }
    }

  public static boolean shouldRetry(int errorCode,
      ContainerRetryContext retryContext, int remainingRetryAttempts) {
    if (retryContext == null) {
      return false;
    }
  .....
{code}
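
The ordering in the snippet above (retry first, then rollback, then fail) can be sketched in isolation as follows. All names here are hypothetical, and the retry rule is a deliberate simplification (retry only while attempts remain), not the actual NodeManager policy:

```java
// Hypothetical, simplified model of the three-way decision; none of these
// types are the real NodeManager classes.
final class RetryDecisionSketch {
  enum Outcome { RELAUNCH, ROLLBACK, FAIL }

  // Stand-in for a retry policy; a null policy means "no retry configured".
  static final class RetryPolicy {
    final boolean retryOnError;

    RetryPolicy(boolean retryOnError) {
      this.retryOnError = retryOnError;
    }
  }

  // Assumed rule: retry only if a policy exists, it allows retries, and
  // at least one attempt remains.
  static boolean shouldRetry(RetryPolicy policy, int remainingAttempts) {
    if (policy == null) {
      return false;
    }
    return policy.retryOnError && remainingAttempts > 0;
  }

  // Retry takes precedence; rollback is considered only once retries are
  // exhausted or not configured; otherwise the container fails.
  static Outcome decide(RetryPolicy policy, int remainingAttempts,
      boolean canRollback) {
    if (shouldRetry(policy, remainingAttempts)) {
      return Outcome.RELAUNCH;
    }
    if (canRollback) {
      return Outcome.ROLLBACK;
    }
    return Outcome.FAIL;
  }
}
```

Keeping the null check inside shouldRetry means the caller can pass whichever retry context applies without guarding it first.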

- testContainerUpgradeRollbackDueToFailure: the comment does not match the code (the comment
mentions the new process start file, but the loop waits on oldStartFile):
{code}
    // Wait for new processStartfile to be created
    while (!oldStartFile.exists() && timeoutSecs++ < 20) {
{code}
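
One way to keep the comment and the condition in sync is to pull the polling loop into a helper that takes the file as a parameter, so the call site names the file being waited on. This is only a sketch; {{WaitForFileSketch}} and {{waitForFile}} are hypothetical names, not part of the patch or the test utilities:

```java
import java.io.File;

// Hypothetical test helper: bounded polling for a file to appear.
final class WaitForFileSketch {
  // Polls until 'file' exists or 'maxPolls' sleeps have elapsed.
  // Returns whether the file exists when polling stops.
  static boolean waitForFile(File file, int maxPolls, long sleepMillis) {
    int polls = 0;
    while (!file.exists() && polls++ < maxPolls) {
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and stop waiting early.
        Thread.currentThread().interrupt();
        break;
      }
    }
    return file.exists();
  }
}
```

A call such as {{waitForFile(oldStartFile, 20, 1000)}} then documents itself, and the test can assert on the returned boolean instead of relying on a comment.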

> Changes in NodeManager to support Container upgrade and rollback/commit
> -----------------------------------------------------------------------
>
>                 Key: YARN-5637
>                 URL: https://issues.apache.org/jira/browse/YARN-5637
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-5637.001.patch, YARN-5637.002.patch
>
>
> YARN-5620 added support for re-initialization of Containers using a new launch Context.
> This JIRA proposes to use the above feature to support upgrade and subsequent rollback
> or commit of the upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

