hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5279) Potential Container leak in NM in preemption flow
Date Thu, 30 Jun 2016 07:54:10 GMT

    [ https://issues.apache.org/jira/browse/YARN-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356693#comment-15356693 ]

Sunil G commented on YARN-5279:
-------------------------------

Thanks [~rohithsharma] for the patch and approach.
Ideally this helps find those untracked finished containers and asks the NM to remove them from
its context. Since we are trying to fix the real issue in the preemption flow in YARN-4148, as
mentioned by [~jlowe] in this [comment|https://issues.apache.org/jira/browse/YARN-4862?focusedCommentId=15345069&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15345069],
this new tracking can also cover such corner cases. However, it would be better to log
such activity at INFO or WARN. The chance of hitting this is very low, but it is still better to
know when such cases happen and, if possible, to track how they happened.
A few more comments on the patch:
1. {{RMNodeFinishedContainersPulledByAMEvent}}: I guess we can rename this, since the event
is used by schedulers to report untracked containers.
2. Since the scheduler reports such untracked containers in an event back to RMNode, it is possible
that this information reaches the NM only after a heartbeat interval. So in the worst case the scheduler may hit the
same scenario again and fire {{RMNodeFinishedContainersPulledByAMEvent}}
again. If possible, we should try to avoid this.
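
One way to avoid the repeated event, and to get the logging asked for above, is to remember which untracked containers have already been reported. The following is only a sketch under assumptions: {{UntrackedContainerReporter}}, {{shouldReport}} and {{forget}} are hypothetical names, not part of the patch or of YARN, and container IDs are modelled as plain strings.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Hypothetical helper, not part of the patch: remembers which untracked
// containers were already reported so the event is fired only once per container.
class UntrackedContainerReporter {
    private static final Logger LOG =
        Logger.getLogger(UntrackedContainerReporter.class.getName());

    // Thread-safe set of container IDs already reported to the node.
    private final Set<String> alreadyReported = ConcurrentHashMap.newKeySet();

    // Returns true only the first time a given container ID is seen.
    boolean shouldReport(String containerId) {
        boolean firstTime = alreadyReported.add(containerId);
        if (firstTime) {
            // Logged at WARN so that the rare case is visible to operators.
            LOG.warning("Status received for untracked container " + containerId);
        }
        return firstTime;
    }

    // Called once the NM no longer reports the container.
    void forget(String containerId) {
        alreadyReported.remove(containerId);
    }
}
```

With something like this, the scheduler would fire the event only when {{shouldReport}} returns true, and would call {{forget}} once the NM stops sending status for that container.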

> Potential Container leak in NM in preemption flow
> -------------------------------------------------
>
>                 Key: YARN-5279
>                 URL: https://issues.apache.org/jira/browse/YARN-5279
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager, resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-5279.patch
>
>
> In the YARN-4862 discussion [comment|https://issues.apache.org/jira/browse/YARN-4862?focusedCommentId=15341538&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15341538],
> it was observed that there could be a container leak in the NodeManager whenever a container is preempted
> by the RM.
> Basically, if the NM receives the same containerId in both {{containersToCleanUp}} and {{containersToBeRemovedFromNM}}
> in the same heartbeat, the container is never removed from the NMContext. Instead, the NM kills
> the container listed in containersToCleanup and sends its status back to the RM. But the RM blindly rejects
> the status, since the RMContainer has already been removed and is null.
> I think whenever the RMContainer is null, RMNode should be informed to send {{containersToBeRemovedFromNM}}
> so that the NM removes the container from its context.
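
The sequence in the description can be reproduced with a small model. This is a sketch, not the NodeManager code: {{NmContextModel}}, its methods and the two-state enum are hypothetical, container IDs are plain strings, and it assumes (as the description implies) that a removal request for a container that is still running has no lasting effect.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of how the NM applies one heartbeat response.
class NmContextModel {
    enum State { RUNNING, DONE }

    private final Map<String, State> containers = new HashMap<>();

    void launch(String containerId) {
        containers.put(containerId, State.RUNNING);
    }

    // Assumption: removal only drops containers that are already DONE,
    // and it is applied before the cleanup (kill) list.
    void onHeartbeatResponse(List<String> containersToBeRemovedFromNM,
                             List<String> containersToCleanup) {
        for (String id : containersToBeRemovedFromNM) {
            if (containers.get(id) == State.DONE) {
                containers.remove(id);
            }
        }
        for (String id : containersToCleanup) {
            // Kill the container; it stays in the context as DONE.
            containers.computeIfPresent(id, (key, state) -> State.DONE);
        }
    }

    boolean isTracked(String containerId) {
        return containers.containsKey(containerId);
    }
}
```

In this model, a heartbeat carrying the same ID in both lists leaves the container in the context as DONE. It is dropped only if a later heartbeat carries the ID in {{containersToBeRemovedFromNM}} again, which is what the proposal asks RMNode to send.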



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


