hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
Date Wed, 04 Feb 2015 20:28:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305891#comment-14305891 ]

Jason Lowe commented on YARN-3136:
----------------------------------

It appears getTransferredContainers grabs the scheduler lock because it isn't sure it is safe
to look up the SchedulerApplication from the applications map, yet in practice that map is
always a ConcurrentHashMap. Similarly, the RMApp lookup also goes through a concurrent hash
map and does not require a lock. After that we're simply walking the containers of the
SchedulerApplication, which at most should require locking the app and not the entire
scheduler. Or am I missing a critical point where we really need the scheduler lock?
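
To make the suggestion concrete, here is a minimal sketch of the locking pattern described above. It is not the actual YARN code: SchedulerSketch, SchedulerAppSketch, snapshotContainers and the String ids are hypothetical stand-ins, and any filtering the real method applies to the container list is left out. It assumes only what the comment states, namely that the applications map is a ConcurrentHashMap and that walking the containers needs nothing wider than the app's own lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a per-application scheduler object (not a YARN class).
class SchedulerAppSketch {
    private final List<String> liveContainers = new ArrayList<>();

    // Guarded by this app's own monitor, not by a scheduler-wide lock.
    synchronized void addContainer(String containerId) {
        liveContainers.add(containerId);
    }

    // Copies under the app lock so callers never observe a list mid-update.
    synchronized List<String> snapshotContainers() {
        return new ArrayList<>(liveContainers);
    }
}

// Hypothetical stand-in for the scheduler (not a YARN class).
class SchedulerSketch {
    // Concurrent map: get() is thread-safe without external locking.
    private final ConcurrentMap<String, SchedulerAppSketch> applications =
        new ConcurrentHashMap<>();

    void addApplication(String appId, SchedulerAppSketch app) {
        applications.put(appId, app);
    }

    // Deliberately NOT synchronized on the scheduler: the lookup relies on the
    // concurrent map, and the container walk takes only the app's lock.
    List<String> getTransferredContainers(String appId) {
        SchedulerAppSketch app = applications.get(appId);
        if (app == null) {
            return Collections.emptyList();
        }
        return app.snapshotContainers();
    }
}
```

With this shape, a registering AM contends only with threads touching the same application rather than with every node heartbeat that holds the scheduler lock. Whether the real method has another reason to hold the scheduler lock is exactly the open question above.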

> getTransferredContainers can be a bottleneck during AM registration
> -------------------------------------------------------------------
>
>                 Key: YARN-3136
>                 URL: https://issues.apache.org/jira/browse/YARN-3136
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: scheduler
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting
> for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly
> contended, especially on a large cluster with many nodes heartbeating, and it would be nice
> if we could find a way to eliminate the need to grab this lock during this call. We've already
> done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler
> lock, and it would be good to do so here as well, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
