hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler
Date Sun, 30 Oct 2016 15:32:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620108#comment-15620108
] 

Arun Suresh edited comment on YARN-5139 at 10/30/16 3:32 PM:
-------------------------------------------------------------

[~leftnoteasy], I was just going through the design.

I was wondering how you tackle uniform distribution of allocations.
This is one nice thing the existing node-heartbeat-based implementation gives you for free.

For example, assume you have just a single default queue and a cluster of, say, 10000 nodes,
with around 100 apps running. Since the ClusterNodeTracker will always give the same ordering
of the 10000 nodes, it is possible that this new scheduling logic would 'front-load' all
allocations onto the nodes that appear at the front of the PlacementSet (since the placement
set provided to each application would be fundamentally the same). In the node-heartbeat-driven
case, the node that has just heartbeated is preferred for allocation, and since heartbeats
from all nodes are distributed uniformly over time, you will generally never see this issue.
This is probably not much of an issue in a fully pegged cluster, but for a cluster running
at around 50% utilization, you would probably see half the nodes fully pegged and the other
half mostly sitting idle.
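A minimal sketch of the skew concern, assuming first-fit allocation over a candidate node
list (the class and method names here are illustrative, not actual YARN classes): with a
fixed node ordering, containers pile onto the front of the list; re-shuffling the candidate
ordering per request spreads them out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical demo: first-fit over a fixed ordering front-loads the
// first nodes; shuffling the ordering per request distributes load.
public class PlacementSkewDemo {
    /** Allocate one container per request onto the first node with free capacity. */
    static int[] allocateFirstFit(int numNodes, int capacityPerNode,
                                  int numRequests, boolean shuffle, long seed) {
        int[] used = new int[numNodes];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numNodes; i++) order.add(i);
        Random rnd = new Random(seed);
        for (int r = 0; r < numRequests; r++) {
            if (shuffle) Collections.shuffle(order, rnd); // vary the candidate ordering
            for (int idx : order) {
                if (used[idx] < capacityPerNode) { used[idx]++; break; }
            }
        }
        return used;
    }
}
```

With 10 nodes of capacity 10 and 50 requests (50% utilization), the fixed ordering fully
pegs nodes 0-4 and leaves nodes 5-9 idle, which is exactly the half-pegged/half-idle shape
described above.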

Another thing that came to mind: given that you are, in effect, 'late-binding' the request
to a group of nodes, and that in large clusters of > 10K nodes it is very common to have
around 5% of nodes going up and down, you might have to re-do your allocation if the node
you had selected had gone down in the meantime. In a node-heartbeat-driven scheme, the
chances of that happening are lower, since you are allocating on a node that has just
heartbeated, so you can be fairly certain that the node is healthy.
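One way to cope with that, sketched under assumed names (nothing here is an actual YARN
API): have the commit path re-validate liveness against the ranked candidates from the
placement decision, falling back to the next candidate instead of failing the allocation
outright.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch: under late binding, the chosen node may have gone
// down between the scheduling decision and the commit, so the commit
// re-checks liveness and falls back to the next ranked candidate.
public class LateBindingCommit {
    /** Pick the first candidate that is still alive at commit time. */
    static Optional<String> commit(List<String> rankedCandidates,
                                   Predicate<String> isAlive) {
        for (String node : rankedCandidates) {
            if (isAlive.test(node)) {
                return Optional.of(node); // commit succeeds on a live node
            }
            // node went down since placement; try the next candidate
        }
        return Optional.empty(); // all candidates lost; reschedule from scratch
    }
}
```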

Let me know what you think.










> [Umbrella] Move YARN scheduler towards global scheduler
> -------------------------------------------------------
>
>                 Key: YARN-5139
>                 URL: https://issues.apache.org/jira/browse/YARN-5139
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: Explanantions of Global Scheduling (YARN-5139) Implementation.pdf,
YARN-5139-Concurrent-scheduling-performance-report.pdf, YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf,
YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, YARN-5139.000.patch, wip-1.YARN-5139.patch,
wip-2.YARN-5139.patch, wip-3.YARN-5139.patch, wip-4.YARN-5139.patch, wip-5.YARN-5139.patch
>
>
> The existing YARN scheduler is based on node heartbeats. This can lead to sub-optimal decisions
because the scheduler can only look at one node at a time when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>    Go to parentQueue
>       Go to leafQueue
>         for application in leafQueue.applications:
>            for resource-request in application.resource-requests
>               try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node constraints
(give me "a && b || c") or anti-affinity (do not allocate HBase regionservers and Storm
workers on the same host), we may need to consider moving the YARN scheduler towards global scheduling.
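The per-node scheduling loop quoted in the pseudo code above can be rendered as a small
Java sketch (Request and App here are illustrative stand-ins, not actual YARN classes): on
each heartbeat the scheduler walks queue -> applications -> requests and tries to place
each request on that single node only, which is the root of the one-node-at-a-time limitation.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the heartbeat-driven loop: every request is matched
// against one node's free capacity only, never against the whole cluster.
public class HeartbeatLoopSketch {
    record Request(int memMb) {}
    record App(List<Request> requests) {}

    /** Try to satisfy requests against one node's free memory; return placed requests. */
    static List<Request> scheduleOnNode(int nodeFreeMb, List<App> leafQueueApps) {
        List<Request> placed = new ArrayList<>();
        for (App app : leafQueueApps) {          // for application in leafQueue.applications
            for (Request req : app.requests()) { // for resource-request in application
                if (req.memMb() <= nodeFreeMb) { // try to schedule on this node only
                    nodeFreeMb -= req.memMb();
                    placed.add(req);
                }
            }
        }
        return placed;
    }
}
```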



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


