hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-433) When RM is catching up with node updates then it should not expire acquired containers
Date Sat, 18 Jul 2015 07:39:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632320#comment-14632320

zhihai xu commented on YARN-433:

Thanks for the patch [~xgong]! The patch looks good to me except two nits:
# It may be better to use {{containerId}} instead of {{remoteContainer.getContainerId()}}
as the parameter of {{containerAllocationExpirer.unregister}} since they are equivalent.
# Since we do {{containerAllocationExpirer.unregister}} for the completed containers in {{RMNodeImpl#handleContainerStatus}}
earlier, Maybe we can remove the redundant code {{containerAllocationExpirer.unregister}}
 at {{RMContainerImpl#ContainerFinishedAtAcquiredState}}. After we remove this code at {{RMContainerImpl#ContainerFinishedAtAcquiredState}},
 ContainerFinishedAtAcquiredState will be the same as FinishedTransition, So we can replace
ContainerFinishedAtAcquiredState with FinishedTransition and remove ContainerFinishedAtAcquiredState.
    .addTransition(RMContainerState.ACQUIRED, RMContainerState.COMPLETED,
        RMContainerEventType.FINISHED, new FinishedTransition())

> When RM is catching up with node updates then it should not expire acquired containers
> --------------------------------------------------------------------------------------
>                 Key: YARN-433
>                 URL: https://issues.apache.org/jira/browse/YARN-433
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-433.1.patch, YARN-433.2.patch, YARN-433.3.patch
> RM expires containers that are not launched within some time of being allocated. The
default is 10mins. When an RM is not keeping up with node updates then it may not be aware
of new launched containers. If the expire thread fires for such containers then the RM can
expire them even though they may have launched.

This message was sent by Atlassian JIRA

View raw message