hadoop-yarn-issues mailing list archives

From "Yufei Gu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6407) Improve and fix locks of RM scheduler
Date Wed, 29 Mar 2017 17:47:42 GMT

    [ https://issues.apache.org/jira/browse/YARN-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947594#comment-15947594 ]

Yufei Gu commented on YARN-6407:

Hi [~zhengchenyu], thanks for filing this jira. IIUC, you reduced the frequency of NM node updates
to avoid flooding the network in a 5k-node cluster, but Continuous Scheduling is not necessary
when there are still enough node update events in the cluster. Besides improving the
locking in FS, we can always balance the continuous scheduling interval against the frequency of
NM node updates to get better scheduling latency.

> Improve and fix locks of RM scheduler
> -------------------------------------
>                 Key: YARN-6407
>                 URL: https://issues.apache.org/jira/browse/YARN-6407
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.1
>         Environment: CentOS 7, 1 Gigabit Ethernet
>            Reporter: zhengchenyu
>             Fix For: 2.7.1
>   Original Estimate: 2m
>  Remaining Estimate: 2m
> First, this issue does not duplicate YARN-3091.
> In our cluster we have 5k nodes, and the servers are configured with 1 Gigabit Ethernet, so the network is the bottleneck.
> We must distcp data from the warehouse. Because of the 1 Gigabit Ethernet, we must set yarn.scheduler.fair.max.assign to 5; otherwise hotspots result.
> Setting max.assign to 5 reduces the assignment capacity, so we started the ContinuousSchedulingThread.
> As more applications run in our cluster, and with the ContinuousSchedulingThread, the lock contention problem becomes more serious.
> In our cluster, the call queue of the ApplicationMasterService's RPC is occasionally high. We worry that more problems will occur in the future as more applications run.
> Here is our logical graph:
> "1 Gigabit Ethernet" and "data hot spot" ==> "set yarn.scheduler.fair.max.assign to 5" ==> "ContinuousSchedulingThread is started" and "more applications" ==> "lock contention"
> I know YARN-3091 solved this problem, but that patch aims to change the object lock to a read-write lock. That change is still coarse-grained. So I think we should lock the resources, or at least not lock large sections of code.
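
For illustration only, the idea of guarding just the shared resource with a read-write lock, rather than holding an object lock across a large section of scheduler code, could look roughly like the sketch below. The class NodeResourceTable and its methods are hypothetical names invented for this example; they are not taken from the FairScheduler source or from the YARN-3091 patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: not FairScheduler code, just the locking pattern.
class NodeResourceTable {
    // Free memory per node, in MB (an assumed, simplified resource model).
    private final Map<String, Integer> availableMb = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // A node update is a write: it takes the exclusive write lock.
    void updateNode(String nodeId, int freeMb) {
        lock.writeLock().lock();
        try {
            availableMb.put(nodeId, freeMb);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Lookups share the read lock, so concurrent readers do not block each other.
    int getAvailableMb(String nodeId) {
        lock.readLock().lock();
        try {
            return availableMb.getOrDefault(nodeId, 0);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Only the check-and-decrement sits inside the critical section;
    // any surrounding scheduling logic would run without holding the lock.
    boolean tryAllocate(String nodeId, int requestMb) {
        lock.writeLock().lock();
        try {
            int free = availableMb.getOrDefault(nodeId, 0);
            if (free < requestMb) {
                return false;
            }
            availableMb.put(nodeId, free - requestMb);
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

A caller such as a continuous scheduling thread would invoke tryAllocate per node, while heartbeat handling calls updateNode; the two contend only for the short time the table itself is being modified.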

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
