hadoop-yarn-issues mailing list archives

From "Xianyin Xin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4189) Capacity Scheduler : Improve location preference waiting mechanism
Date Tue, 22 Sep 2015 01:31:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901736#comment-14901736 ]

Xianyin Xin commented on YARN-4189:

[~leftnoteasy], convincing analysis. It's fine if X << Y and X is close to the heartbeat
interval, so should we limit X to prevent users from setting it freely?

> Capacity Scheduler : Improve location preference waiting mechanism
> ------------------------------------------------------------------
>                 Key: YARN-4189
>                 URL: https://issues.apache.org/jira/browse/YARN-4189
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-4189 design v1.pdf
> There are some issues with the current Capacity Scheduler implementation of delay scheduling:
> *1) Waiting time to allocate each container highly depends on cluster availability*
> Currently, an app can only increase its missed-opportunity count when a node has available resources AND the app gets traversed by the scheduler. There are many situations in which an app does not get traversed by the scheduler, for example:
> A cluster has 2 racks (rack1/rack2), each with 40 nodes. Node-locality-delay=40. An application prefers rack1. Node-heartbeat-interval=1s.
> If there are 2 nodes available on rack1, the delay to allocate one container is 40 sec.
> If there are 20 nodes available on rack1, the delay to allocate one container is 2 sec.
> *2) It could violate scheduling policies (FIFO/Priority/Fair)*
> Assume a cluster is highly utilized. An app (app1) has higher priority and wants locality, and another app (app2) has lower priority but doesn't care about locality. When a node heartbeats with available resources, app1 decides to wait, so app2 gets the available slot. This should be considered a bug that we need to fix.
> The same problem can happen when we use FIFO/Fair queue policies.
> A similar problem is related to preemption: the preemption policy preempts some resources from queue-A for queue-B (queue-A is over-satisfied and queue-B is under-satisfied), but queue-B is waiting for the node-locality-delay, so queue-A gets the resources back. In the next round, the preemption policy could preempt these resources from queue-A again.
> This JIRA aims to solve these problems.
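
The availability dependence in issue 1 can be illustrated with a back-of-the-envelope Java sketch. It assumes that every heartbeat from a node with available resources adds exactly one missed opportunity to the waiting app, and that the app stops waiting once the count reaches node-locality-delay. `LocalityWaitEstimate` and `estimatedWaitSeconds` are hypothetical names, not CapacityScheduler APIs. Under these assumptions the wait is inversely proportional to the number of available nodes.

```java
/**
 * Back-of-the-envelope model of count-based delay scheduling (hypothetical
 * names, not a CapacityScheduler API). Assumes every heartbeat from a node
 * with available resources adds exactly one missed opportunity.
 */
public class LocalityWaitEstimate {

    /** Estimated seconds until missed opportunities reach the delay threshold. */
    static double estimatedWaitSeconds(int nodeLocalityDelay,
                                       int availableNodes,
                                       double heartbeatIntervalSec) {
        if (availableNodes <= 0) {
            return Double.POSITIVE_INFINITY; // no opportunities: the app never stops waiting
        }
        // Opportunities arrive at availableNodes per heartbeat interval.
        double opportunitiesPerSec = availableNodes / heartbeatIntervalSec;
        return nodeLocalityDelay / opportunitiesPerSec;
    }

    public static void main(String[] args) {
        int nodeLocalityDelay = 40;         // Node-locality-delay=40 from the example
        double heartbeatIntervalSec = 1.0;  // Node-heartbeat-interval=1s
        for (int nodes : new int[] {20, 40}) {
            System.out.printf("%d available nodes -> ~%.1f s%n",
                nodes, estimatedWaitSeconds(nodeLocalityDelay, nodes, heartbeatIntervalSec));
        }
    }
}
```

With the example's Node-locality-delay=40 and a 1 s heartbeat, 20 available nodes give roughly 2 s, in line with the description. The model ignores other scheduler details, so it illustrates the trend rather than predicting exact delays.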

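The priority violation in issue 2 can be reproduced with a toy Java model. Its assumptions are hypothetical and do not reflect the actual CapacityScheduler code: apps are visited in descending priority on every node heartbeat, an app that wants locality skips a non-preferred node and increments its missed-opportunity count until that count reaches node-locality-delay, and the free slot then passes to the next app. The preemption case follows the same pattern, with queue-A's app in the role of app2.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Toy model of issue 2 (hypothetical names, not CapacityScheduler code):
 * apps are visited in priority order on each node heartbeat, and an app
 * that wants locality skips a non-preferred node until its missed
 * opportunities reach nodeLocalityDelay.
 */
public class PriorityInversionDemo {

    static final class App {
        final String name;
        final int priority;          // higher value = served first
        final String preferredNode;  // null = does not care about locality
        int missedOpportunities = 0;

        App(String name, int priority, String preferredNode) {
            this.name = name;
            this.priority = priority;
            this.preferredNode = preferredNode;
        }
    }

    /** Returns the name of the app that receives the single free slot on node, or null. */
    static String allocate(List<App> apps, String node, int nodeLocalityDelay) {
        List<App> ordered = new ArrayList<>(apps);
        ordered.sort(Comparator.comparingInt((App a) -> a.priority).reversed());
        for (App app : ordered) {
            boolean local = app.preferredNode == null || app.preferredNode.equals(node);
            if (local || app.missedOpportunities >= nodeLocalityDelay) {
                app.missedOpportunities = 0; // reset after a successful allocation
                return app.name;
            }
            app.missedOpportunities++; // app decides to wait; the slot moves on
        }
        return null;
    }

    public static void main(String[] args) {
        App app1 = new App("app1", 10, "node-a"); // high priority, wants locality
        App app2 = new App("app2", 1, null);      // low priority, no locality preference
        List<App> apps = List.of(app1, app2);
        // A busy cluster: node-b is the only node that frees a slot.
        System.out.println("slot on node-b goes to " + allocate(apps, "node-b", 40));
    }
}
```

Running `main` prints `slot on node-b goes to app2`: app1 waits even though it has the higher priority, which is the policy violation described in issue 2.
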
This message was sent by Atlassian JIRA