hadoop-yarn-issues mailing list archives

From "Wei Shao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3806) Proposal of Generic Scheduling Framework for YARN
Date Wed, 17 Jun 2015 02:32:02 GMT

    [ https://issues.apache.org/jira/browse/YARN-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589201#comment-14589201 ]

Wei Shao commented on YARN-3806:

Hi Wangda Tan,

Regarding #4 in your comments (decoupling applications/nodes from the scheduler):

This proposal suggests a new object, ResourceManager (or we could call it SchedulerNodeManager),
to manage SchedulerNodes and handle all events from cluster nodes. SchedulerManager doesn't
respond to these events directly.
In the current implementation of FiCaSchedulerNode, it looks to me like the container reservation
feature doesn't need to be bound to fair/capacity scheduling. Fair/capacity scheduling can use it,
but other scheduling policies can choose whether to use it as well.
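To make the decoupling concrete, here is a rough sketch of what I mean. All class and method names here are just placeholders for illustration, not the actual YARN classes:

```java
import java.util.HashMap;
import java.util.Map;

// Optional feature interface: reservation is not tied to fair/capacity
// scheduling; a policy implements or ignores it as it chooses.
interface ReservationSupport {
    boolean reserve(String containerId);
    boolean unreserve(String containerId);
}

class SchedulerNode {
    final String nodeId;
    SchedulerNode(String nodeId) { this.nodeId = nodeId; }
}

// Owns the SchedulerNodes and absorbs all cluster node events, so the
// scheduler (SchedulerManager) never handles node events directly.
class SchedulerNodeManager {
    private final Map<String, SchedulerNode> nodes = new HashMap<>();

    void onNodeAdded(String nodeId)   { nodes.put(nodeId, new SchedulerNode(nodeId)); }
    void onNodeRemoved(String nodeId) { nodes.remove(nodeId); }
    int nodeCount()                   { return nodes.size(); }
}
```

The point is only the separation of responsibilities: node bookkeeping lives in one place, and reservation is a pluggable capability rather than something baked into a specific scheduler.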

And with the single application queue introduced in the proposal, the scheduler-specific features
of FiCaSchedulerApp, such as resource limits, could be moved into the implementation of the
specific scheduler queue. Parent queues and single application queues can then implement these
features consistently. Also, it looks to me like the delayed scheduling feature doesn't need
to be bound to fair/capacity scheduling; any scheduler can choose whether to use it.
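A rough sketch of the "application is also a queue" idea (again, names are placeholders, not real YARN classes): if an application is just a leaf queue, a feature like a resource limit is written once in a shared base class and applies uniformly to parent queues and applications alike.

```java
import java.util.ArrayList;
import java.util.List;

// Common limit logic shared by every queue type, including applications.
abstract class AbstractQueue {
    private final int maxResource;   // resource limit enforced uniformly
    private int used = 0;

    AbstractQueue(int maxResource) { this.maxResource = maxResource; }

    // Same enforcement for any queue type: parent queue or application.
    boolean tryAllocate(int amount) {
        if (used + amount > maxResource) return false;
        used += amount;
        return true;
    }

    int getUsed() { return used; }
}

class ParentQueue extends AbstractQueue {
    final List<AbstractQueue> children = new ArrayList<>();
    ParentQueue(int max) { super(max); }
}

// A single-application queue: the app itself is a leaf queue, so features
// that used to live in FiCaSchedulerApp (e.g. limits) come from the base class.
class SingleAppQueue extends AbstractQueue {
    final String appId;
    SingleAppQueue(String appId, int max) { super(max); this.appId = appId; }
}
```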

By the proposal, in each scheduling cycle the SchedulerManager reads the status of cluster
resources from the ResourceManager, updates scheduling parameters (fairShare, resource limits,
and so on) consistently for all queues (an application is also a queue), and sends resource
preemption/allocation events to SchedulerApps. A SchedulerApp can implement the container-warning
feature in preemptResource() and the delayed scheduling feature in acquireResource(), which are
applicable to all schedulers. Also, the scheduler doesn't specify which resources a SchedulerApp
can get; SchedulerApp.acquireResource() asks the ResourceManager directly for available resources.
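Here's a minimal sketch of one such scheduling cycle, assuming the pull model described above. All names (ClusterState, SchedulerApp, SchedulerManager, the equal-share formula) are hypothetical placeholders for illustration:

```java
import java.util.List;

// Stand-in for the resource view the ResourceManager exposes.
class ClusterState {
    int available;
    ClusterState(int available) { this.available = available; }

    // Pull model: apps grab resources from the cluster state directly,
    // rather than the scheduler pushing specific resources to them.
    synchronized int grab(int wanted) {
        int granted = Math.min(Math.max(0, wanted), available);
        available -= granted;
        return granted;
    }
}

class SchedulerApp {
    int fairShare;   // scheduling parameter set by the SchedulerManager each cycle
    int acquired;

    // Delayed scheduling could be implemented inside this hook.
    void acquireResource(ClusterState cluster) {
        acquired += cluster.grab(fairShare - acquired);
    }

    // Container-warning / preemption hook.
    void preemptResource(int amount) {
        acquired = Math.max(0, acquired - amount);
    }
}

class SchedulerManager {
    void runCycle(ClusterState cluster, List<SchedulerApp> apps) {
        // 1) Read cluster status, 2) update parameters consistently for all
        // queues, 3) send allocation events; apps then pull for themselves.
        int share = cluster.available / Math.max(1, apps.size());
        for (SchedulerApp a : apps) a.fairShare = share;
        for (SchedulerApp a : apps) a.acquireResource(cluster);
    }
}
```

Note that the scheduler only computes parameters; the actual resource handover happens in SchedulerApp.acquireResource(), which is what keeps the policy generic.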

Also by the proposal, the procedures that update scheduling parameters are scalable (through
parallelism), idempotent, and transactional. See the proposal for details on why these properties
can be helpful.
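A small sketch of why those properties fall out naturally (hypothetical names; the weight-based share formula is just an example): if each parameter update is a pure function of an immutable snapshot of cluster state, re-running it gives the same result (idempotent), and the per-queue computations are independent, so they parallelize safely.

```java
import java.util.List;
import java.util.stream.Collectors;

// Immutable view of cluster state taken at the start of a cycle.
class Snapshot {
    final int totalResource;
    final List<Integer> weights;   // one weight per queue
    Snapshot(int totalResource, List<Integer> weights) {
        this.totalResource = totalResource;
        this.weights = weights;
    }
}

class ShareCalculator {
    // Pure function of the snapshot: running it twice yields identical shares
    // (idempotence), and each queue's share is computed independently, so a
    // parallel stream is safe (scalability through parallelism).
    static List<Integer> computeShares(Snapshot s) {
        int weightSum = s.weights.stream().mapToInt(Integer::intValue).sum();
        return s.weights.parallelStream()
                .map(w -> s.totalResource * w / weightSum)
                .collect(Collectors.toList());
    }
}
```

Transactionality would then amount to swapping the old parameter set for the newly computed one atomically, which is easy precisely because the computation never mutates shared state.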

Since both YARN-3306 and this proposal try to address similar issues, if some ideas in this
proposal are useful, maybe the efforts can be combined.

Thoughts? Thanks!

> Proposal of Generic Scheduling Framework for YARN
> -------------------------------------------------
>                 Key: YARN-3806
>                 URL: https://issues.apache.org/jira/browse/YARN-3806
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Wei Shao
>         Attachments: ProposalOfGenericSchedulingFrameworkForYARN-V1.0.pdf, ProposalOfGenericSchedulingFrameworkForYARN-V1.1.pdf
> Currently, a typical YARN cluster runs many different kinds of applications: production
applications, ad hoc user applications, long running services, and so on. Different YARN scheduling
policies may be suitable for different applications. For example, capacity scheduling can
manage production applications well since applications get a guaranteed resource share, and fair
scheduling can manage ad hoc user applications well since it enforces fairness among users.
However, the current YARN scheduling framework doesn’t have a mechanism that lets multiple
scheduling policies work hierarchically in one cluster.
> YARN-3306 discussed many issues with today’s YARN scheduling framework and proposed
a per-queue policy-driven framework. Specifically, it supports different scheduling policies
for leaf queues. However, support for different scheduling policies at upper-level queues
has not been seriously considered yet.
> A generic scheduling framework is proposed here to address these limitations. It supports
different policies for any queue consistently. The proposal also tries to solve many other
issues in the current YARN scheduling framework.
