hadoop-yarn-issues mailing list archives

From "Konstantinos Karanasos (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
Date Fri, 04 Dec 2015 07:02:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041169#comment-15041169 ]

Konstantinos Karanasos commented on YARN-2877:

Hi [~wangda],

Thanks for pointing out HADOOP-11552. It seems it can also be used for the same purpose.
I would suggest following the technique of frequent AM-LocalRM heartbeats and less frequent
LocalRM-RM heartbeats to start with. Once HADOOP-11552 gets resolved, we can consider using

bq. I think the top-k node list technique cannot completely solve the over-subscription issue. In
a production cluster, applications come in waves, and it is possible that a few large applications
can exhaust all resources in a cluster within a few seconds. Maybe another possible approach
to mitigate the issue is propagating queue-able containers from NM to RM periodically, so the
NM can still make decisions but the RM is also aware of these queue-able containers.

As long as k is sufficiently big, the phenomenon you describe should not be very pronounced.

Moreover, corrective mechanisms (YARN-2888) will lead to moving tasks from highly-loaded nodes
to less busy ones.
Going further, what you are suggesting would also make sense.
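
To make the point about k concrete, here is a minimal sketch in Python. It is not YARN code: the function names, the queue-length map, and the cluster of 20 nodes are all hypothetical, and it assumes the scheduler places a wave of tasks against a top-k list that is not refreshed in between (as would happen between infrequent heartbeats).

```python
import heapq
import random


def top_k_nodes(queue_lengths, k):
    """Return the k node names with the shortest queues (ties broken by name)."""
    return heapq.nsmallest(k, queue_lengths, key=lambda n: (queue_lengths[n], n))


def place_tasks(queue_lengths, k, num_tasks, rng):
    """Place each task on a random node from a top-k list computed once up front.

    The candidate list is deliberately left stale during the wave, which is the
    situation where over-subscription of the listed nodes can occur.
    """
    loads = dict(queue_lengths)          # copy so the caller's map is untouched
    candidates = top_k_nodes(loads, k)   # hypothetical "top-k least loaded" list
    for _ in range(num_tasks):
        node = rng.choice(candidates)
        loads[node] += 1
    return loads


# Hypothetical idle cluster of 20 nodes receiving a wave of 40 tasks.
cluster = {f"node{i:02d}": 0 for i in range(20)}
small_k = place_tasks(cluster, k=2, num_tasks=40, rng=random.Random(0))
large_k = place_tasks(cluster, k=20, num_tasks=40, rng=random.Random(0))
print("max queue with k=2: ", max(small_k.values()))
print("max queue with k=20:", max(large_k.values()))
```

With k=2, all 40 tasks land on two nodes, so the busiest node necessarily receives at least 20 of them. With k=20 the same wave is spread over the whole cluster, so the peak queue length is typically much lower.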

> Extend YARN to support distributed scheduling
> ---------------------------------------------
>                 Key: YARN-2877
>                 URL: https://issues.apache.org/jira/browse/YARN-2877
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
>            Assignee: Konstantinos Karanasos
>         Attachments: distributed-scheduling-design-doc_v1.pdf
> This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling.
 Briefly, some of the motivations for distributed scheduling are the following:
> 1. Improve cluster utilization by opportunistically executing tasks on otherwise idle resources
on individual machines.
> 2. Reduce allocation latency.  Tasks where the scheduling time dominates (i.e., task
execution time is much less compared to the time required for obtaining a container from the

This message was sent by Atlassian JIRA
