hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4862) Handle duplicate completed containers in RMNodeImpl
Date Wed, 22 Jun 2016 18:16:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344921#comment-15344921 ]

Sunil G commented on YARN-4862:

I agree with the discussion here.

Basically, it is better to always do a sanity check like the one in YARN-5197. It helps minimize
the risk of leaks and keeps the NM and RM in sync much more reliably. I also do not see this as a
performance bottleneck, since we operate on a small set of running vs. finished containers for a
node per heartbeat.
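
To illustrate why the cost stays small, here is a minimal sketch of such a per-heartbeat check. It assumes the check compares the containers the RM tracks for a node against the running and finished sets the NM reports; the class and method names are hypothetical and are not taken from the YARN-5197 patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: models a per-node heartbeat sanity check.
class HeartbeatSanityCheck {

    /**
     * Returns the containers the RM tracks for a node that the NM reported
     * neither as running nor as finished in this heartbeat.
     * The input sets are not modified.
     */
    static Set<String> findUnaccounted(Set<String> trackedByRm,
                                       Set<String> reportedRunning,
                                       Set<String> reportedFinished) {
        // Work on a copy so the RM-side bookkeeping stays untouched.
        Set<String> unaccounted = new HashSet<>(trackedByRm);
        unaccounted.removeAll(reportedRunning);
        unaccounted.removeAll(reportedFinished);
        return unaccounted;
    }
}
```

With hash sets the work is roughly proportional to the number of containers on one node, which is the "small set" referred to above.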

Regarding YARN-5279: interestingly, preemption was trying to use the KILL_CONTAINER event to
forcefully kill a container from the RM. Even though the preemption module has informed the AM that
a container is to be preempted, for AMs that do not handle these preemption messages the RM is
forced to kill the container with KILL_CONTAINER.
So I think we need not inform the attempt immediately on KILL_CONTAINER. Instead, we can add the
container to RMNodeImpl's {{containersToCleanUp}} list and wait for the NM to report back with the
completed container list. This will slow down the cleanup if we preempt an AM container, but it
may be cleaner. Will this be fine for the preemption scenario? Thoughts?
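
A rough sketch of the deferred flow proposed above, as a simplified stand-alone model rather than the real RMNodeImpl. Only {{containersToCleanUp}} comes from the discussion; the class, the methods, and the use of plain strings as container IDs are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposal; not the actual RMNodeImpl.
class DeferredKillNode {
    // Containers the RM wants the NM to clean up (named after the list in RMNodeImpl).
    private final Set<String> containersToCleanUp = new LinkedHashSet<>();
    // Stand-in for "the attempt has been informed that this container completed".
    private final List<String> notifiedToAttempt = new ArrayList<>();

    /** KILL_CONTAINER: only queue the container, do not inform the attempt yet. */
    void killContainer(String containerId) {
        containersToCleanUp.add(containerId);
    }

    /**
     * Node heartbeat: inform the attempt only about containers the NM reports
     * as completed, then return the containers still pending cleanup so they
     * can be sent back to the NM.
     */
    List<String> handleHeartbeat(List<String> completedFromNm) {
        for (String id : completedFromNm) {
            containersToCleanUp.remove(id);
            notifiedToAttempt.add(id);
        }
        return new ArrayList<>(containersToCleanUp);
    }

    List<String> notifiedToAttempt() {
        return notifiedToAttempt;
    }
}
```

In this model the attempt hears about a killed container at least one heartbeat later than it would with an immediate notification, which is the slowdown mentioned for a preempted AM container.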

> Handle duplicate completed containers in RMNodeImpl
> ---------------------------------------------------
>                 Key: YARN-4862
>                 URL: https://issues.apache.org/jira/browse/YARN-4862
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch
> As per [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689]
from [~sharadag], there should be a safeguard against duplicated container statuses in RMNodeImpl
before creating UpdatedContainerInfo.
> Otherwise, in a heavily loaded cluster where event processing gradually slows down, if any
duplicated containers are sent to the RM (possibly due to a bug in the NM), there is a significant
impact because RMNodeImpl always creates UpdatedContainerInfo for the duplicated containers. This
results in increased heap memory usage and causes problems like YARN-4852.
> This is an optimization for issues like YARN-4852.
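
The safeguard described in the issue could look roughly like the sketch below: remember which completed containers a node has already reported and pass on only the new ones. This is a simplified assumption about the approach, not the attached patch; the class and method names are hypothetical, and plain strings stand in for container IDs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the duplicate check in RMNodeImpl; not the actual Hadoop code.
class CompletedContainerFilter {
    // IDs of containers this node has already reported as completed.
    private final Set<String> completedContainers = new HashSet<>();

    /** Returns only the completed container IDs that were not reported before. */
    List<String> filterNewlyCompleted(List<String> reportedCompleted) {
        List<String> newlyCompleted = new ArrayList<>();
        for (String id : reportedCompleted) {
            // Set.add returns false when the ID is already present, i.e. a duplicate.
            if (completedContainers.add(id)) {
                newlyCompleted.add(id);
            }
        }
        return newlyCompleted;
    }

    /** Drops an ID once it no longer needs tracking, so the set itself does not grow unbounded. */
    void forget(String containerId) {
        completedContainers.remove(containerId);
    }
}
```

Only the filtered list would then be used to build an update for the scheduler, so repeated reports of the same container do not create further objects on the heap.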

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org