hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4862) Handle duplicate completed containers in RMNodeImpl
Date Tue, 28 Jun 2016 05:55:57 GMT

     [ https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated YARN-4862:
    Attachment: 0003-YARN-4862.patch

Uploading the rebased patch, which addresses the review comments from Jason.

This JIRA depends on YARN-5279 to avoid a leak in the new set completedContainers.

I rechecked the 2 scenarios mainly discussed in earlier comments:
# RM (scheduler) forgets container details (YARN-5279). In this case, for any unknown completed
container reported by the NodeManager, the scheduler notifies RMNodeImpl that the container is
no longer tracked by the scheduler, so RMNodeImpl informs the NodeManager to remove it from NMContext.
This event avoids a leak in the new set completedContainers, which drops containers once they are
acknowledged to the NM in the heartbeat response.
# NM forgets to send the completed container
## NM never sends the completed container at all. This is simply YARN-5197.
## NM sends the completed container in one heartbeat and then omits it from the next heartbeat.
In this case, {{RMNodeImpl#completedContainers}} need not worry about a leak, because once a
completed container has been sent, the RM keeps track of it. Even if the NM omits it
later, the entry is cleared when the RM acknowledges it to the NM in the heartbeat response
so that the NM removes it from its context.
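
To make the two scenarios concrete, here is a minimal sketch of the bookkeeping, not the actual patch. The class {{CompletedContainerTracker}}, its method names, and the use of plain strings as container IDs are all assumptions made for illustration; only the idea of a completedContainers set that is cleared on acknowledgement comes from the discussion above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the bookkeeping described above; NOT the real RMNodeImpl.
class CompletedContainerTracker {
    // Completed containers reported by the NM and not yet acknowledged back to it.
    private final Set<String> completedContainers = new HashSet<>();

    // Called when an NM heartbeat reports a completed container.
    // Returns true only for the first report of a given ID.
    boolean recordCompleted(String containerId) {
        return completedContainers.add(containerId);
    }

    // Called when the RM acknowledges containers to the NM in the heartbeat
    // response. In scenario 1 the scheduler triggers this for containers it
    // does not know; in scenario 2 it happens even if the NM stopped reporting them.
    void acknowledgeToNodeManager(Collection<String> containerIds) {
        completedContainers.removeAll(containerIds);
    }

    int trackedCount() {
        return completedContainers.size();
    }

    public static void main(String[] args) {
        CompletedContainerTracker tracker = new CompletedContainerTracker();
        tracker.recordCompleted("container_1"); // heartbeat 1 reports it
        // Heartbeat 2 omits container_1; the entry stays until it is acknowledged.
        tracker.acknowledgeToNodeManager(Arrays.asList("container_1"));
        System.out.println(tracker.trackedCount()); // nothing left to leak
    }
}
```

In both scenarios the entry leaves the set through the acknowledgement path, which is why the set does not grow without bound as long as acknowledgements are sent.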

> Handle duplicate completed containers in RMNodeImpl
> ---------------------------------------------------
>                 Key: YARN-4862
>                 URL: https://issues.apache.org/jira/browse/YARN-4862
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch, 0003-YARN-4862.patch
> As per [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689]
from [~sharadag], there should be a safeguard against duplicated container statuses in RMNodeImpl
before creating UpdatedContainerInfo. 
> Otherwise, in a heavily loaded cluster where event processing gradually slows down, if any duplicated
container statuses are sent to the RM (possibly due to a bug in the NM), RMNodeImpl creates an
UpdatedContainerInfo for every duplicate. This increases
heap memory usage and causes problems like YARN-4852.
> This is an optimization for issues like YARN-4852.
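
The safeguard in the description could look roughly like the sketch below. It is an illustration under stated assumptions, not the attached patch: {{DuplicateStatusFilter}} and {{newlyCompleted}} are hypothetical names, container IDs are plain strings, and the caller is assumed to build an UpdatedContainerInfo only when the returned list is non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical duplicate filter; NOT the real RMNodeImpl code.
class DuplicateStatusFilter {
    // IDs of completed containers that have already been passed on once.
    private final Set<String> completedContainers = new HashSet<>();

    // Returns only the IDs that have not been seen before, in reported order.
    // Duplicates within one heartbeat and across heartbeats are dropped.
    List<String> newlyCompleted(List<String> reportedCompleted) {
        List<String> fresh = new ArrayList<>();
        for (String id : reportedCompleted) {
            // Set.add returns false when the ID is already present.
            if (completedContainers.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        DuplicateStatusFilter filter = new DuplicateStatusFilter();
        List<String> first = filter.newlyCompleted(Arrays.asList("c1", "c2", "c1"));
        List<String> second = filter.newlyCompleted(Arrays.asList("c1"));
        // An update object would be created for 'first' but skipped for 'second'.
        System.out.println(first + " " + second.isEmpty());
    }
}
```

With a filter of this kind, a repeated status costs one set lookup instead of a new queued object, which is the heap saving the description is after.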

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
