hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4862) Handle duplicate completed containers in RMNodeImpl
Date Wed, 22 Jun 2016 20:00:19 GMT

    [ https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345069#comment-15345069 ]

Jason Lowe commented on YARN-4862:
----------------------------------

bq. I also do not see this as a performance bottleneck, as we are operating on a small set
of running vs finished for a node per heartbeat.

The performance impact is essentially zero because we're not doing that small-set comparison
most of the time.  The only thing we do on every heartbeat is increment an integer during a
scan of the node report that was already happening, then compare that integer to the size of
a hash set, which is also very cheap.  Only when those numbers differ do we compute the diff
between the set and the report.  With YARN-5197, those numbers will always be the same _unless_
the NM failed to report a container completion, which should be a rare event.   The performance
hit will be very hard to detect in practice because of the cheap conditional check up front
before doing the full diff.
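The check described above might look like the following sketch. All class, method, and field names here are illustrative assumptions, not the actual RMNodeImpl code; it only demonstrates the counter-vs-set-size fast path with the diff as a rare fallback.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompletedContainerCheck {
  // Hypothetical bookkeeping: completions the RM expects the NM to
  // confirm (illustrative name, not an actual RMNodeImpl field).
  private final Set<String> pendingConfirmations = new HashSet<>();

  public void expectCompletion(String containerId) {
    pendingConfirmations.add(containerId);
  }

  /**
   * Returns the pending completions the NM failed to report.  The
   * common case is just a counter incremented during a scan of the
   * report, compared against the set size; the full diff runs only
   * when the numbers disagree.
   */
  public Set<String> findUnreported(List<String> reportedCompleted) {
    int matched = 0;
    for (String id : reportedCompleted) {
      if (pendingConfirmations.contains(id)) {
        matched++;
      }
    }
    if (matched == pendingConfirmations.size()) {
      return Collections.emptySet(); // cheap up-front check passes
    }
    // Rare path: the NM missed a completion, so compute the diff.
    Set<String> missing = new HashSet<>(pendingConfirmations);
    reportedCompleted.forEach(missing::remove);
    return missing;
  }
}
```

The point of the structure is that the expensive set copy and diff only run when a completion actually went unreported, which the comment argues is rare.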

bq. This will slowup the cleanup in case if we preempt AM container, but may be more cleaner.

It won't slow down how fast the container is killed, if that's what you mean by "cleanup case."
 Only the NM can kill it anyway, and it won't know to do so until it next heartbeats.  It
will slow down how quickly the RM re-schedules the resources associated with the preempted
container, since it will wait until the NM confirms the container completion before releasing
those resources in the scheduler bookkeeping and re-allocating them.  This means that today
the RM can, and does, accidentally overcommit nodes because it considers the resources free
before they actually are.  Filed YARN-5290, as we've recently seen this in practice on some
of our clusters.

> Handle duplicate completed containers in RMNodeImpl
> ---------------------------------------------------
>
>                 Key: YARN-4862
>                 URL: https://issues.apache.org/jira/browse/YARN-4862
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch
>
>
> As per [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689]
from [~sharadag], there should be a safeguard against duplicated container statuses in RMNodeImpl
before creating UpdatedContainerInfo. 
> Otherwise, in a heavily loaded cluster where event processing gradually slows down, if any
duplicated container statuses are sent to the RM (possibly also a bug in the NM), RMNodeImpl
always creates an UpdatedContainerInfo for the duplicated containers. This results in increased
heap memory usage and causes problems like YARN-4852.
> This is an optimization for issues of the YARN-4852 kind.
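The safeguard described in the issue could be sketched as follows. The class and field names are hypothetical (not the actual RMNodeImpl implementation); the sketch only shows the idea of remembering already-reported completions so duplicates never produce a new UpdatedContainerInfo.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCompletionGuard {
  // Hypothetical field: completed container IDs for which an
  // UpdatedContainerInfo was already created (illustrative only).
  private final Set<String> alreadyReported = new HashSet<>();

  /**
   * Filters a heartbeat's completed-container list down to first-time
   * completions; duplicates from a slow or buggy NM are dropped before
   * any UpdatedContainerInfo would be created.
   */
  public List<String> firstTimeCompletions(List<String> completedInHeartbeat) {
    List<String> fresh = new ArrayList<>();
    for (String id : completedInHeartbeat) {
      if (alreadyReported.add(id)) { // add() returns false on duplicates
        fresh.add(id);
      }
    }
    return fresh;
  }
}
```

Under this scheme, re-sent completions cost only a set lookup instead of a heap allocation per duplicate, which is the memory-growth problem the issue describes.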



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


