hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Konstantinos Karanasos (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
Date Fri, 21 Nov 2014 15:10:36 GMT

    [ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221012#comment-14221012

Konstantinos Karanasos commented on YARN-2877:

Adding some more details, now that we have added the first sub-tasks.

In YARN-2882 we introduce two *types of containers*: guaranteed-start and queueable. The former
are the ones existing in YARN today (are allocated from the central RM, and once allocated,
are guaranteed to start). The latter make it possible to queue container requests in the NMs
and will be used for distributed scheduling.
The *queuing of (queueable) container requests* in the NMs is proposed in YARN-2883.

Each NM will now also have a *LocalRM* (Local ResourceManager) that will receive all container
requests from the AMs running on the same machine:
- For the guaranteed-start container requests, the LocalRM acts as a proxy (YARN-2884), forwarding
them to the central RM. 
- For the queueable container requests, the LocalRM is responsible for sending them directly
to the NM queues (bypassing the central RM). Deciding the NMs where these requests are queued
is based on the estimated waiting time in the NM queues, as discussed in YARN-2886.

Based on some policy (YARN-2887), each AM will determine *what type of containers to ask*:
only guaranteed-start, only queueable, or a mix thereof. 
For instance, an AM may request guaranteed-start containers for its tasks that are expected
to be long-running, whereas it may ask for queueable containers for its short tasks (in which
the back-and-forth with the central RM may be longer than the task execution time). This way
we reduce the scheduling latency, while increasing the utilization of the cluster (if we had
to go to the central RM for all these short tasks, some resources of the cluster might remain
idle in the meanwhile).

To ensure the NM queues remain balanced, we propose *corrective mechanisms for NM queue rebalancing*
in YARN-2888.
Moreover, to ensure no AM is abusing the system by asking too many queueable containers, we
can impose a limit in the *number of queueable containers* that each AM can receive (YARN-2889).

> Extend YARN to support distributed scheduling
> ---------------------------------------------
>                 Key: YARN-2877
>                 URL: https://issues.apache.org/jira/browse/YARN-2877
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
> This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling.
 Briefly, some of the motivations for distributed scheduling are the following:
> 1. Improve cluster utilization by opportunistically executing tasks otherwise idle resources
on individual machines.
> 2. Reduce allocation latency.  Tasks where the scheduling time dominates (i.e., task
execution time is much less compared to the time required for obtaining a container from the

This message was sent by Atlassian JIRA

View raw message