hadoop-yarn-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
Date Wed, 10 Feb 2016 17:36:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141266#comment-15141266 ]

Carlo Curino commented on YARN-1051:

[~grey], I suggest you read the attached tech report for full context, but let me try to
summarize the ideas here.

*General Idea*
The reservation system receives reservation requests from users over a period of time. Note
that each reservation can request resources well ahead of time (e.g., I need 10 containers
for 1 hour tomorrow, sometime between 3pm and 6pm). The planner tries to "fit" all these
reservations into the plan agenda while respecting the user constraints (e.g., amount of resources
and start_time/deadline) and the physical constraints of the plan (which is a "queue", and
thus has access to a portion of the cluster capacity). The APIs exposed to users allow
them to express their flexibility (e.g., for a map-only job I can state that I can
run with up to 10 parallel containers, but also with 1 container at a time). This allows the plan
to fit more jobs by "deforming" them. A side effect is that we can also support
gang semantics (e.g., I need 10 concurrent containers for 1 h).
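
To make the "deforming" idea concrete, here is a minimal sketch. It is an illustration only, not the YARN reservation API: the `ReservationRequest` fields and the `candidate_shapes` helper are hypothetical names, time is in whole hours, and all containers are assumed to be the same size. It enumerates the (parallelism, duration) rectangles that would satisfy a request within its window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReservationRequest:
    """Hypothetical model of a reservation request (not the YARN API)."""
    arrival: int          # earliest start, in hours
    deadline: int         # latest finish, in hours
    container_hours: int  # total amount of work requested
    min_parallel: int     # smallest useful concurrency
    max_parallel: int     # largest useful concurrency


def candidate_shapes(req):
    """Return every (parallelism, duration) pair that fits in the window."""
    window = req.deadline - req.arrival
    shapes = []
    for parallelism in range(req.max_parallel, req.min_parallel - 1, -1):
        # Ceiling division: hours needed at this level of parallelism.
        duration = -(-req.container_hours // parallelism)
        if duration <= window:
            shapes.append((parallelism, duration))
    return shapes


# Gang request: exactly 10 concurrent containers for 1 h, between 3pm and 6pm.
gang = ReservationRequest(15, 18, 10, min_parallel=10, max_parallel=10)

# Flexible map-only job: anywhere from 1 to 10 containers at a time.
flexible = ReservationRequest(8, 18, 10, min_parallel=1, max_parallel=10)

gang_shapes = candidate_shapes(gang)
flexible_shapes = candidate_shapes(flexible)
```

In this sketch the gang request has a single admissible shape, while the flexible request has many, which is what gives a planner room to fit it around other reservations.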

The key intuition is that each job might temporarily use a large amount of resources, but
we control very explicitly when it must yield resources back to other jobs. This explicit
time-multiplexing gives very strong guarantees to each job (i.e., if the reservation was accepted,
you will get your resources), while still allowing us to densely pack the cluster agenda (and thus get
high utilization / high ROI). Moreover, best-effort jobs can run on separate queues with
the standard set of scheduling invariants provided by FairScheduler/CapacityScheduler.

Another interesting area in which enterprise settings can extend/innovate is the choice of
"SharingPolicy". The SharingPolicy is how we determine (besides physical resource
availability) how many resources a tenant/reservation can ask for in the Plan. This applies both
per-reservation and across reservations from a user (or group). So far we have contributed a couple
of simple policies that enforce instantaneous and over-time limits (e.g., each user
can grab up to 30% of the plan instantaneously, but no more than an average of 5%
over a 24h period). Internally at MS, we are developing other policies that are specific
to business rules we want to enforce in our clusters. By design, creating a new SharingPolicy
that matches your business settings is fairly easy (narrow API and easy configuration
mechanics). Since the Plan stores past (up to a window of time), present, and future reservations,
the policy can be very sophisticated and explicit. Also, given the run-length-encoded representation
of the allocations, the algorithms can be quite efficient.
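
As a rough illustration of such a policy check (a sketch under stated assumptions, not the SharingPolicy interface that ships with YARN), the snippet below validates one user's allocations against an instantaneous cap and a 24h average cap. The function names and the `(start, end, containers)` segment format are assumptions standing in for the run-length-encoded representation; time is in hours and segments are assumed not to overlap.

```python
def usage_between(segments, lo, hi):
    """Container-hours consumed in [lo, hi) by run-length-encoded segments."""
    total = 0
    for start, end, amount in segments:
        overlap = min(end, hi) - max(start, lo)
        if overlap > 0:
            total += overlap * amount
    return total


def within_limits(segments, capacity, max_inst=0.30, max_avg=0.05, window=24):
    """Check an instantaneous cap and a sliding-window average cap.

    segments: non-overlapping (start, end, containers) tuples for one user.
    capacity: total containers available to the plan.
    """
    # Instantaneous limit: no segment may exceed max_inst of the plan.
    if any(amount > max_inst * capacity for _, _, amount in segments):
        return False

    # Average limit: usage is piecewise constant, so the worst sliding
    # window has one of its edges on a segment boundary. Only those
    # windows need to be examined, not every point in time.
    budget = max_avg * capacity * window
    boundaries = {t for start, end, _ in segments for t in (start, end)}
    for t in boundaries:
        for lo in (t, t - window):
            if usage_between(segments, lo, lo + window) > budget:
                return False
    return True


# A short burst at 25% of a 100-container plan stays under both limits.
ok = within_limits([(0, 2, 25)], capacity=100)

# Holding 25% for six hours breaks the 5% average over 24h.
too_long = within_limits([(0, 6, 25)], capacity=100)
```

The cost of the check grows with the number of segments rather than with the length of the time horizon, which is the efficiency benefit of a run-length encoding.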

The reservation agents are the core of the placement logic. We developed a few, which optimize
for different things (e.g., minimizing the cost of the allocation by smoothing out the plan, or
placing it as late/early as possible in the window of feasibility). Again, this is an area for
enhancement, where business logic can kick in and choose to prioritize certain types of allocations.
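
A minimal sketch of the "as early/late as possible" placement could look like the following. It is a simplified greedy stand-in, not one of the agents contributed to YARN: the `place` and `commit` helpers are hypothetical, the plan is modeled as a list of free containers per hourly slot, and the request is a fixed rectangle of `parallelism` containers for `duration` slots.

```python
def place(free, arrival, deadline, parallelism, duration, prefer="early"):
    """Return the start slot for the request, or None if it does not fit.

    free: free containers per slot; arrival/deadline bound the window
    [arrival, deadline), which is assumed to lie within the plan.
    """
    starts = range(arrival, deadline - duration + 1)
    if prefer == "late":
        starts = reversed(starts)
    for start in starts:
        slots = range(start, start + duration)
        if all(free[t] >= parallelism for t in slots):
            return start
    return None


def commit(free, start, parallelism, duration):
    """Subtract an accepted reservation from the plan's free capacity."""
    for t in range(start, start + duration):
        free[t] -= parallelism


# 24 hourly slots with 10 free containers each, except 3pm which has only 5.
free = [10] * 24
free[15] = 5

# "10 containers for 1 hour between 3pm and 6pm"
early = place(free, 15, 18, parallelism=10, duration=1, prefer="early")
late = place(free, 15, 18, parallelism=10, duration=1, prefer="late")
```

A cost-based agent would replace the first-fit loop with a search over all feasible starts (and shapes) that minimizes some cost, such as the peak of the resulting plan.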

*Enforcement mechanics*
Finally, in order to "enforce" these planned decisions, we use dynamically created and resized
queues (each reservation can contain one or more jobs, so the queue mechanism is a useful
one to reuse). Note that [~acmurthy]'s comment was fairly technical, and related to this
last point. He was proposing to leverage application priorities instead of queues as the enforcement
mechanism. Both are feasible, and each has pros and cons. Overall, using queues allowed us
to reuse more of the existing mechanisms (e.g., we rely on the preemption
policy, and on all of the advancements people are contributing there).
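
To illustrate the queue-based enforcement, here is a hedged sketch of how a plan could be translated into per-reservation queue sizes at a given instant. The `queue_capacities` function and its data layout are hypothetical and are not the scheduler's actual code; the sketch only assumes that each reservation maps to one dynamic queue whose share of the plan tracks the current allocation.

```python
def queue_capacities(allocations, now, plan_capacity):
    """Map each active reservation to the fraction of the plan its queue gets.

    allocations: {reservation_id: [(start, end, containers), ...]}
    Reservations with no allocation at `now` get no queue at all.
    """
    capacities = {}
    for reservation_id, segments in allocations.items():
        active = sum(amount for start, end, amount in segments
                     if start <= now < end)
        if active > 0:
            capacities[reservation_id] = active / plan_capacity
    return capacities


plan = {
    "res-1": [(15, 16, 25)],
    "res-2": [(15, 17, 50), (17, 18, 25)],
}

at_3pm = queue_capacities(plan, now=15, plan_capacity=100)
at_5pm = queue_capacities(plan, now=17, plan_capacity=100)
```

Re-evaluating this mapping periodically and creating, resizing, or removing queues to match is one way to picture the enforcement loop; jobs submitted under a reservation then simply run in its queue.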

> YARN Admission Control/Planner: enhancing the resource allocation model with time.
> ----------------------------------------------------------------------------------
>                 Key: YARN-1051
>                 URL: https://issues.apache.org/jira/browse/YARN-1051
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler, resourcemanager, scheduler
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>             Fix For: 2.6.0
>         Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, YARN-1051.patch, curino_MSR-TR-2013-108.pdf,
> socc14-paper15.pdf, techreport.pdf
>
> In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing
> users to "reserve" capacity over time. This is an important step towards SLAs, long-running
> services, workflows, and helps for gang scheduling.

