hadoop-yarn-issues mailing list archives

From "Konstantinos Karanasos (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
Date Thu, 03 Dec 2015 06:17:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037335#comment-15037335 ]

Konstantinos Karanasos commented on YARN-2877:

Thank you for the detailed comments, [~leftnoteasy].

Regarding #1:
- Indeed, the AM-LocalRM communication should be much more frequent than the LocalRM-RM (and
subsequently AM-RM) communication, in order to achieve millisecond-latency allocations.
We are planning to address this by using a smaller heartbeat interval for the AM-LocalRM communication
than for the LocalRM-RM one. For instance, the AM-LocalRM heartbeat interval can be set
to 50ms, while the LocalRM-RM interval to 200ms (in other words, we will propagate to
the RM only one in every four heartbeats).
We will soon create a sub-JIRA for this.
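To make the ratio concrete, here is a minimal sketch of the forwarding decision. The class and method names are illustrative only (not actual YARN classes); it just shows a LocalRM counting AM heartbeats and forwarding every fourth one to the RM, matching the 50ms/200ms example above.

```java
// Hypothetical sketch: a LocalRM forwards only one in every RATIO
// AM heartbeats to the central RM. Names are illustrative, not
// actual YARN classes.
public class HeartbeatForwarder {
    // 200ms LocalRM-RM interval / 50ms AM-LocalRM interval
    static final int RATIO = 4;
    private int count = 0;

    /** Returns true when this AM heartbeat should also be
     *  propagated to the central RM. */
    public boolean shouldForwardToRM() {
        return (count++ % RATIO) == 0;
    }
}
```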
- Each NM will periodically estimate its expected queue wait time (YARN-2886). This can simply
be based on the number of tasks currently in its queue, or (even better) based on the estimated
execution times of those tasks (in case they are available). Then, this expected queue wait
time is pushed through the NM-RM heartbeats to the ClusterMonitor (YARN-4412) that is running
as a service in the RM. The ClusterMonitor gathers this information from all nodes, periodically
computes the least loaded nodes (i.e., with the smallest queue wait times), and adds that
list to the heartbeat response, so that all nodes (and in turn LocalRMs) get the list. This
list is then used during scheduling in the LocalRM.
Note that simpler solutions (such as the power of two choices used in Sparrow) could be employed,
but our experiments have shown that the above "top-k node list" leads to considerably better
placement (and thus load balancing), especially when task durations are heterogeneous.
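As a rough sketch of the "top-k node list" computation described above: the ClusterMonitor can simply order nodes by their reported queue wait times and keep the k smallest. The class below is illustrative only, not the actual YARN-4412 code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the "top-k node list": order nodes by their
// estimated queue wait time (pushed via NM-RM heartbeats) and keep
// the k least-loaded ones. Illustrative only, not YARN-4412 code.
public class TopKNodes {
    public static List<String> leastLoaded(Map<String, Long> waitTimesMs, int k) {
        return waitTimesMs.entrySet().stream()
                .sorted(Map.Entry.comparingByValue()) // smallest wait time first
                .limit(k)                             // keep the k least loaded
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

In the actual design this list would be recomputed periodically and shipped to all nodes in the heartbeat response, so each LocalRM places QUEUEABLE containers using a slightly stale but cheap global view.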

Regarding #2:
This is a valid concern. The best way to minimize preemption is through the "top-k node list"
technique described above: since the LocalRM will be placing the QUEUEABLE containers on the
least loaded nodes, preemption will be minimized.
More techniques can be used to further mitigate the problem. For instance, we can "promote"
a QUEUEABLE container to a GUARANTEED one in case it has been preempted more than k times.
Moreover, we can dynamically set limits to the number of QUEUEABLE containers accepted by
a node in case of excessive load due to GUARANTEED containers.
That said, as you also mention, QUEUEABLE containers are more suitable for short-running tasks,
where the probability of a container being preempted is smaller.
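The promotion technique mentioned above can be sketched as a simple threshold policy. The class, enum values, and the threshold k=3 below are illustrative assumptions, not actual YARN code or a decided value.

```java
// Hypothetical sketch: promote a QUEUEABLE container to GUARANTEED
// once it has been preempted more than MAX_PREEMPTIONS times, so a
// repeatedly unlucky container eventually stops being preemptible.
// Names and the threshold are illustrative, not actual YARN code.
public class PreemptionPolicy {
    enum ExecutionType { QUEUEABLE, GUARANTEED }

    static final int MAX_PREEMPTIONS = 3; // illustrative value of k

    /** Decide the container's type after it has been preempted
     *  preemptionCount times so far. */
    public static ExecutionType onPreempted(int preemptionCount) {
        return preemptionCount > MAX_PREEMPTIONS
                ? ExecutionType.GUARANTEED
                : ExecutionType.QUEUEABLE;
    }
}
```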

> Extend YARN to support distributed scheduling
> ---------------------------------------------
>                 Key: YARN-2877
>                 URL: https://issues.apache.org/jira/browse/YARN-2877
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
>            Assignee: Konstantinos Karanasos
>         Attachments: distributed-scheduling-design-doc_v1.pdf
> This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling.
 Briefly, some of the motivations for distributed scheduling are the following:
> 1. Improve cluster utilization by opportunistically executing tasks on otherwise idle resources
on individual machines.
> 2. Reduce allocation latency.  Tasks where the scheduling time dominates (i.e., task
execution time is much less compared to the time required for obtaining a container from the

This message was sent by Atlassian JIRA
