hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4692) [Umbrella] Simplified and first-class support for services in YARN
Date Sat, 13 Feb 2016 19:46:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-4692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146157#comment-15146157 ]

Arun Suresh commented on YARN-4692:
-----------------------------------

Thank you for starting this, [~vinodkv]. The document itself looks pretty thorough and well
thought out.

A couple of thoughts:

# Preemption and Reservation:
## The document (3.2.1) says that Long Running (LR) containers should be started
on assured capacity (not on resources over fair share). I posit that LR containers should *primarily*
be started on over-committed resources (probably as {{OPPORTUNISTIC}} containers, see YARN-2882
and YARN-1011). The point of an LR service is that the service as a whole should stay available;
individual container deaths/restarts should not affect the service.
## On a related note, we can give applications the ability to specify the *Preemptability* of
containers in a particular role. A low value would mean that preemption is very costly, while a
high value would imply that the service remains available even if some containers die. For example, when deploying
HBase on YARN, the HBase Master can have a *low* preemptability value, while HBase Region Servers
can probably have a *higher* one.
## Allow LR applications to specify the *peak*, *min* and *variance*/*mean* (and possibly the transient
and steady-state values) of a resource request, so that schedulers can make better allocation decisions.
Also allow users to specify the *min*/*max* number of containers required for a particular service role.
This can be used as a hint for preemption if other short-running tasks are starved.
## Currently, schedulers create a reservation for a container on a node that has free resources
but not enough to fit the request. The document suggests that nodes on which LR containers
are already running should not accept reservations. I feel we should leverage the peak/min/mean/variance/transient/steady-state
resource demands to loosen this. For example, even if a node cannot satisfy the peak demand, if the steady-state
demand is satisfiable, the peak demand can probably be met by a combination of
YARN-2877 / YARN-1011 and YARN-4597 (I'll describe this below).
# Handling Low-latency resource Spikes in LR Containers:
## In YARN-4597, [~chris.douglas] proposed 1) a new {{SCHEDULING}} container state, 2) a local
*ContainerScheduler* that handles the scheduling (essentially in charge of moving a container
from the {{SCHEDULING}} to the {{RUNNING}} state), and 3) allowing the *ContainerScheduler* and *Localizer*
to be directly accessible to containers running on the node.
## An LR container should be able to ask for more resources when required and shed excess resources
when idling. YARN-1197 tried to add support for changing the resources of an allocated container,
but the design doc describes the request making a round trip from the AM to the RM and back, and
then to the containers. Low-latency elasticity can probably be achieved using a combination
of YARN-2877 and the NM-local ContainerScheduler.
# Queue Modeling:
## When LR tasks are mixed with short-running tasks, resources might always be tied up,
since LR tasks may never end. I foresee some alleviation of this by ensuring that some % of
queue capacity is always available for non-LR tasks. Also, would some more intelligent resource
accounting using the Reservation system (YARN-1051) help?
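
To make the preemptability and peak/steady-state suggestions above a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and not an existing YARN API: the names {{ResourceProfile}}, {{Container}}, {{pick_preemption_victims}} and {{can_place}}, the 0.0 to 1.0 preemptability scale, and the use of memory as the only resource are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ResourceProfile:
    """Hypothetical memory demand of one container, in MB."""
    min_mb: int
    steady_mb: int
    peak_mb: int


@dataclass(frozen=True)
class Container:
    """Hypothetical view of a running container that belongs to a service role."""
    container_id: str
    role: str
    preemptability: float  # assumed scale: 0.0 = very costly to preempt, 1.0 = cheap
    profile: ResourceProfile


def pick_preemption_victims(containers: List[Container], needed_mb: int) -> List[Container]:
    """Pick containers to preempt, most preemptable first.

    Stops as soon as the steady-state memory of the chosen containers covers
    needed_mb. Returns an empty list if the demand cannot be covered at all.
    """
    victims: List[Container] = []
    freed_mb = 0
    # sorted() is stable, so containers with equal preemptability keep input order.
    for c in sorted(containers, key=lambda c: c.preemptability, reverse=True):
        if freed_mb >= needed_mb:
            break
        victims.append(c)
        freed_mb += c.profile.steady_mb
    return victims if freed_mb >= needed_mb else []


def can_place(profile: ResourceProfile, node_free_mb: int, node_opportunistic_mb: int) -> bool:
    """Loosened placement check for an LR container.

    The steady-state demand must fit in the node's guaranteed free memory; the
    peak only has to fit once over-committed (opportunistic) headroom is added.
    """
    fits_steady = profile.steady_mb <= node_free_mb
    fits_peak = profile.peak_mb <= node_free_mb + node_opportunistic_mb
    return fits_steady and fits_peak
```

With an HBase Master at preemptability 0.1 and two Region Servers at 0.8, {{pick_preemption_victims}} would select the Region Servers first and touch the Master only if they do not free enough memory. {{can_place}} shows one possible way a node could accept a container whose peak demand exceeds its guaranteed free memory, provided the steady-state demand fits.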





> [Umbrella] Simplified and first-class support for services in YARN
> ------------------------------------------------------------------
>
>                 Key: YARN-4692
>                 URL: https://issues.apache.org/jira/browse/YARN-4692
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: YARN-First-Class-And-Simplified-Support-For-Services-v0.pdf
>
>
> YARN-896 focused on getting the ball rolling on the support for services (long running
applications) on YARN.
> I’d like to propose the next stage of this effort: _Simplified and first-class support
for services in YARN_.
> The chief rationale for filing a separate new JIRA is threefold:
>  - Do a fresh survey of all the things that are already implemented in the project
>  - Weave a comprehensive story around what we further need and attempt to rally the community
around a concrete end-goal, and
>  - Additionally focus on functionality that YARN-896 and friends left for higher layers
to take care of and see how much of that is better integrated into the YARN platform itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
