hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4862) Handle duplicate completed containers in RMNodeImpl
Date Wed, 22 Jun 2016 05:26:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343725#comment-15343725 ]

Rohith Sharma K S commented on YARN-4862:

Thanks, Jason, for your suggestion :-)

I see two different scenarios in which a container leak can occur after this JIRA's patch.
# The NM forgets the completed-container status. -> An approach similar to YARN-5197 can be
used to handle this leak.
# (The RM forgets) The YarnScheduler clears the RMContainer details because of preemption. As a
result, the scheduler (RMContainer) informs RMNodeImpl to add the container to the
{{containersToCleanUp}} list. RMAppAttempt also informs RMNodeImpl to add it to
{{containersToBeRemovedFromNM}} after the AM pulls the finished containers. If the NM-RM heartbeat
interval is longer than the AM-RM heartbeat interval, then it is certain that both can go together
in the same nodeHeartbeat response. In that case, the YARN-5279 issue occurs, and the NM keeps
sending these container statuses as completed to the RM. At this point, the RM starts tracking
them in completedContainers, but they never get purged from the completedContainers set.
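
To make the second scenario concrete, here is a rough toy model of the sequence, not the actual RMNodeImpl code. The class name, method names, and in particular the purge rule (tracked IDs are dropped only when a removal request goes out in the heartbeat response) are assumptions made for illustration; only the two collection names come from the discussion above.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the bookkeeping described above. Hypothetical names; the purge rule is an assumption.
final class LeakTimelineSketch {
    private final Set<String> completedContainers = new HashSet<>();
    private final Set<String> containersToBeRemovedFromNM = new HashSet<>();

    // The NM heartbeat reports a container as completed.
    void onCompletedStatusFromNM(String containerId) {
        completedContainers.add(containerId);
    }

    // The AM has pulled the finished container, so the NM may forget it.
    void onAmPulledFinishedContainer(String containerId) {
        containersToBeRemovedFromNM.add(containerId);
    }

    // Assumed purge rule: tracked IDs are dropped only when the removal request is sent.
    void sendHeartbeatResponse() {
        completedContainers.removeAll(containersToBeRemovedFromNM);
        containersToBeRemovedFromNM.clear();
    }

    int trackedCompleted() {
        return completedContainers.size();
    }

    public static void main(String[] args) {
        LeakTimelineSketch node = new LeakTimelineSketch();
        node.onCompletedStatusFromNM("container_1");
        node.onAmPulledFinishedContainer("container_1");
        node.sendHeartbeatResponse();                 // tracked set is empty again
        node.onCompletedStatusFromNM("container_1");  // the NM re-reports the same container
        node.sendHeartbeatResponse();                 // no removal request is pending any more
        System.out.println(node.trackedCompleted());  // prints 1: the entry is never purged
    }
}
```

Under these assumptions, any completed status that arrives after the removal request has already been sent stays in the set for good, which is the leak described in the second scenario.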

> Handle duplicate completed containers in RMNodeImpl
> ---------------------------------------------------
>                 Key: YARN-4862
>                 URL: https://issues.apache.org/jira/browse/YARN-4862
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch
> As per [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689]
from [~sharadag], there should be a safeguard against duplicated container statuses in RMNodeImpl
before creating UpdatedContainerInfo. 
> Otherwise, in a heavily loaded cluster where event processing gradually slows down, if any duplicated
containers are sent to the RM (possibly due to a bug in the NM as well), there is a significant impact:
RMNodeImpl always creates an UpdatedContainerInfo for the duplicated containers. This increases
heap memory usage and causes problems like YARN-4852.
> This is an optimization for issues like YARN-4852.
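
The safeguard described in the issue can be sketched as a simple "seen before?" check applied before any update object is built. This is a minimal illustration and not the attached patch: the class and method names are hypothetical, and container IDs are plain strings here rather than YARN's own types.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for the node-side check; not the real RMNodeImpl.
final class DuplicateStatusFilter {
    // IDs of containers this node has already reported as completed.
    private final Set<String> completedContainers = new HashSet<>();

    /** Returns only the completed-container IDs that were not reported before. */
    List<String> filterNewlyCompleted(List<String> reportedCompleted) {
        List<String> fresh = new ArrayList<>();
        for (String id : reportedCompleted) {
            // Set.add returns false when the ID is already present, i.e. a duplicate report.
            if (completedContainers.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }
}
```

With a check like this, a caller would build an update only when the returned list is non-empty, so repeated reports of the same container do not allocate new objects on every heartbeat.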

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
