hadoop-yarn-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1010) FairScheduler: decouple container scheduling from nodemanager heartbeats
Date Tue, 01 Oct 2013 00:27:24 GMT

    [ https://issues.apache.org/jira/browse/YARN-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782473#comment-13782473 ]

Sandy Ryza commented on YARN-1010:

This looks almost there to me.  A few nits:
+        LOG.warn("Error while doing sleep in continuous scheduling: " +
+        e.toString(), e);
There should be indentation on the second line here.

+  private void continuousScheduling() {
Better to have method names be verbs.  Maybe "scheduleContinuously".

Most of the Fair Scheduler properties use dashes at the end instead of dots and I think this
is a good convention.  We should change yarn.scheduler.fair.locality.threshold.node.time.ms
to yarn.scheduler.fair.locality-delay-node-ms. (And the same for rack).  We should also change
yarn.scheduler.fair.continuous.scheduling.enabled to yarn.scheduler.fair.continuous-scheduling-enabled
and yarn.scheduler.fair.continuous.scheduling.sleep.time.ms to yarn.scheduler.fair.continuous-scheduling-sleep-ms.
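
To make the proposed renames concrete, here is a minimal sketch that lists the three old/new pairs spelled out above and applies them to a plain java.util.Properties object. The class name and the applyRenames helper are hypothetical and exist only for illustration; they are not part of the patch or of Hadoop's Configuration API. The rack property is left out because its exact name is not spelled out above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class FairSchedulerPropertyRenames {

    /** Old dotted name -> proposed dashed name, as listed in the comment above. */
    static final Map<String, String> RENAMES = new LinkedHashMap<>();
    static {
        RENAMES.put("yarn.scheduler.fair.locality.threshold.node.time.ms",
                    "yarn.scheduler.fair.locality-delay-node-ms");
        RENAMES.put("yarn.scheduler.fair.continuous.scheduling.enabled",
                    "yarn.scheduler.fair.continuous-scheduling-enabled");
        RENAMES.put("yarn.scheduler.fair.continuous.scheduling.sleep.time.ms",
                    "yarn.scheduler.fair.continuous-scheduling-sleep-ms");
    }

    /**
     * Hypothetical helper: returns a copy of the given properties in which
     * every key found in RENAMES is replaced by its proposed name.
     * Keys that are not in RENAMES are copied unchanged.
     */
    static Properties applyRenames(Properties in) {
        Properties out = new Properties();
        for (String key : in.stringPropertyNames()) {
            out.setProperty(RENAMES.getOrDefault(key, key), in.getProperty(key));
        }
        return out;
    }

    public static void main(String[] args) {
        Properties old = new Properties();
        old.setProperty("yarn.scheduler.fair.continuous.scheduling.sleep.time.ms", "5");
        Properties renamed = applyRenames(old);
        // The value is now stored under the dashed name.
        System.out.println(
            renamed.getProperty("yarn.scheduler.fair.continuous-scheduling-sleep-ms"));
    }
}
```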

Adding multi-second sleeps in the unit tests will slow down build times and is still theoretically
open to races if the OS pauses.  Better would be to use the clock interface.  In the test
you can use a MockClock like in TestFairScheduler#testChoiceOfPreemptedContainers, and you
can change the start time in AppSchedulable to come from scheduler.getClock().getTime(). 
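
A rough sketch of the clock idea, to show why it removes the need for sleeps in the test. Everything here is a self-contained stand-in: the Clock interface only assumes a getTime() method as mentioned above, and MockClock, DelayTracker, and the 5000 ms delay are hypothetical names and values for illustration, not the actual Hadoop classes or the code in the patch.

```java
public class LocalityDelayClockSketch {

    /** Stand-in for the scheduler's clock abstraction; only getTime() is assumed. */
    interface Clock {
        long getTime();
    }

    /** Clock backed by wall-clock time, as production code would use. */
    static final class SystemClock implements Clock {
        @Override
        public long getTime() {
            return System.currentTimeMillis();
        }
    }

    /** Test clock that starts at 0 and advances only when tick() is called. */
    static final class MockClock implements Clock {
        private long now = 0;

        @Override
        public long getTime() {
            return now;
        }

        void tick(long millis) {
            now += millis;
        }
    }

    /**
     * Hypothetical stand-in for the part of AppSchedulable that tracks a
     * locality delay. The start time comes from the injected clock rather
     * than from System.currentTimeMillis().
     */
    static final class DelayTracker {
        private final Clock clock;
        private final long startTime;
        private final long nodeLocalityDelayMs;

        DelayTracker(Clock clock, long nodeLocalityDelayMs) {
            this.clock = clock;
            this.startTime = clock.getTime();
            this.nodeLocalityDelayMs = nodeLocalityDelayMs;
        }

        /** True once at least nodeLocalityDelayMs has elapsed on the clock. */
        boolean nodeLocalityDelayExpired() {
            return clock.getTime() - startTime >= nodeLocalityDelayMs;
        }
    }

    public static void main(String[] args) {
        MockClock clock = new MockClock();
        DelayTracker tracker = new DelayTracker(clock, 5000);
        System.out.println(tracker.nodeLocalityDelayExpired()); // prints false
        clock.tick(5000); // advance time instantly, no Thread.sleep needed
        System.out.println(tracker.nodeLocalityDelayExpired()); // prints true
    }
}
```

Because the test controls the clock, it runs instantly and cannot be thrown off by an OS pause between the sleep and the assertion.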

> FairScheduler: decouple container scheduling from nodemanager heartbeats
> ------------------------------------------------------------------------
>                 Key: YARN-1010
>                 URL: https://issues.apache.org/jira/browse/YARN-1010
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 2.1.0-beta
>            Reporter: Alejandro Abdelnur
>            Assignee: Wei Yan
>            Priority: Critical
>         Attachments: YARN-1010.patch
> Currently scheduling for a node is done when a node heartbeats.
> For large cluster where the heartbeat interval is set to several seconds this delays
> scheduling of incoming allocations significantly.
> We could have a continuous loop scanning all nodes and doing scheduling. If there is
> availability AMs will get the allocation in the next heartbeat after the one that placed the

