hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4692) [Umbrella] Simplified and first-class support for services in YARN
Date Sat, 13 Feb 2016 19:46:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-4692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146157#comment-15146157 ]

Arun Suresh commented on YARN-4692:

Thank you for starting this, [~vinodkv]. The document itself looks thorough and well
thought out.

A couple of thoughts:

# Preemption and Reservation:
## The document (3.2.1) says that Long Running (LR) containers should be started
on assured capacity (not on resources over fair share). I posit that LR containers should *primarily*
be started on over-committed resources (probably as {{OPPORTUNISTIC}} containers, see YARN-2882
and YARN-1011). The point of LR services is that the service as a whole should remain available;
individual container deaths or restarts should not affect it.
## On a related note, we can give applications the ability to specify the *preemptability* of
containers in a particular role. A low value would mean that preemption is very costly, while a
high value would imply that the service remains available even if some containers die. For example, when deploying
HBase on YARN, the HBase Master can have a *low* preemptability value while HBase Region Servers
can probably have a *higher* one.
## Allow LR applications to specify the *peak*, *min* and *variance*/*mean* (and possibly transient
and steady-state values) of a resource request so that schedulers can make better allocation decisions.
Also allow users to specify the *min*/*max* number of containers required for a particular service role.
This can be used as a hint for preemption if other short-running tasks are starved.
## Currently, schedulers create a reservation for a container on a node that has free resources
but not enough to fit the container. The document suggests that nodes on which LR containers
are already running should not accept reservations. I feel we should leverage the peak/min/mean/variance/transient/steady-state
resource demands to loosen this. For example, even if a node cannot satisfy the peak demand, if the steady-state
demand is satisfiable, the peak demand can probably be met by a combination of
YARN-2877 / YARN-1011 and YARN-4597 (described below).
# Handling Low-latency resource Spikes in LR Containers:
## In YARN-4597, [~chris.douglas] proposed 1) a new {{SCHEDULING}} container state, 2) a local
*ContainerScheduler* that handles the scheduling (essentially in charge of moving a container
from {{SCHEDULING}} to {{RUNNING}} state), and 3) allowing the *ContainerScheduler* and *Localizer*
to be directly accessible to containers running on the node.
## An LR container should be able to ask for more resources when required and shed excess resources
when idling. YARN-1197 tried to add support for changing the resources of an allocated container,
but the design doc describes the request making a round trip from AM to RM and back, and
then to the containers. Low-latency elasticity can probably be achieved using a combination
of YARN-2877 and the NM-local ContainerScheduler.
# Queue Modeling:
## When LR tasks are mixed with short-running tasks, resources might always be tied up,
since LR tasks may never end. I foresee some alleviation of this by ensuring that some % of
queue capacity is always available for non-LR tasks. Also, more intelligent resource
accounting using the Reservation system (YARN-1051) would probably help?
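
To make the per-role *preemptability* and *min* containers ideas from point 1 more concrete, here is a rough sketch of how a scheduler might use them as a preemption hint. It is purely illustrative: the {{Role}} class, its fields, the 0.0 to 1.0 scale and {{pick_victims}} are all hypothetical names, not existing YARN APIs, and the numbers for the HBase roles are made up.

```python
from dataclasses import dataclass


@dataclass
class Role:
    """Hypothetical per-role settings; none of these names are YARN APIs."""
    name: str
    preemptability: float  # assumed scale: 0.0 = very costly, 1.0 = cheap
    min_containers: int    # service is considered unavailable below this
    running: int           # containers currently running for this role


def pick_victims(roles, needed):
    """Return one role name per container to preempt, cheapest roles first.

    A role is never taken below min_containers, so fewer than `needed`
    victims may be returned.
    """
    victims = []
    # Visit roles from highest to lowest preemptability.
    for role in sorted(roles, key=lambda r: r.preemptability, reverse=True):
        spare = max(0, role.running - role.min_containers)
        take = min(spare, needed - len(victims))
        victims.extend([role.name] * take)
        if len(victims) == needed:
            break
    return victims


roles = [
    Role("hbase-master", preemptability=0.1, min_containers=1, running=1),
    Role("hbase-regionserver", preemptability=0.8, min_containers=3, running=5),
]
# Three containers are requested, but only two Region Servers are spare.
print(pick_victims(roles, needed=3))
```

With these made-up values the Master is never selected, and only the two Region Servers above the role's minimum are offered up, so the scheduler would have to look elsewhere for the third container.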

> [Umbrella] Simplified and first-class support for services in YARN
> ------------------------------------------------------------------
>                 Key: YARN-4692
>                 URL: https://issues.apache.org/jira/browse/YARN-4692
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: YARN-First-Class-And-Simplified-Support-For-Services-v0.pdf
> YARN-896 focused on getting the ball rolling on the support for services (long running
applications) on YARN.
> I’d like to propose the next stage of this effort: _Simplified and first-class support
for services in YARN_.
> The chief rationale for filing a separate new JIRA is threefold:
>  - Do a fresh survey of all the things that are already implemented in the project
>  - Weave a comprehensive story around what we further need and attempt to rally the community
around a concrete end-goal, and
>  - Additionally focus on functionality that YARN-896 and friends left for higher layers
to take care of and see how much of that is better integrated into the YARN platform itself.
