hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1197) Support changing resources of an allocated container
Date Mon, 15 Jun 2015 23:04:06 GMT

    [ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587098#comment-14587098 ]

Wangda Tan commented on YARN-1197:

I think increasing via AM<->NM and increasing via RM<->NM have a very similar delay
(multiple seconds for now).

a. AM<->NM needs 3 stages
1) AM gets the increase token from RM
2) AM sends the increase token to NM
3) AM polls NM for the increase status (because we cannot assume the increase can be done
very quickly on the NM side)

b. RM->NM needs 4 stages
1) RM sends the increase token to NM
2) NM performs the increase locally
3) NM reports back to RM when the increase is done
4) RM notifies AM that the increase is done

Solution b. incurs one additional RM->NM heartbeat interval.
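
To make the comparison concrete, here is a rough sketch that adds up the worst-case wait per stage for both options. All names and numbers are hypothetical assumptions for illustration, not measured values or YARN configuration defaults: each stage that rides a heartbeat is assumed to wait up to one full interval, the direct AM->NM call is assumed negligible, and the AM's poll interval is assumed to be about the same as one NM heartbeat interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intervals:
    """Hypothetical timing parameters in milliseconds (assumptions, not measurements)."""
    am_rm_heartbeat: int = 1000  # AM<->RM heartbeat interval (assumed)
    nm_rm_heartbeat: int = 1000  # NM<->RM heartbeat interval (assumed)
    am_nm_poll: int = 1000       # how often the AM polls the NM (assumed)
    nm_increase: int = 200       # time the NM needs to apply the increase (assumed)


def worst_case_a(t: Intervals) -> int:
    """Option a: AM gets the token from RM, sends it to NM, then polls NM."""
    return (
        t.am_rm_heartbeat  # 1) AM gets the increase token from RM
        + 0                # 2) AM sends the token to NM (direct call, assumed negligible)
        + t.nm_increase    #    NM applies the increase
        + t.am_nm_poll     # 3) AM polls NM for the increase status
    )


def worst_case_b(t: Intervals) -> int:
    """Option b: RM drives the increase through NM heartbeats, then tells the AM."""
    return (
        t.nm_rm_heartbeat    # 1) RM sends the increase token to NM
        + t.nm_increase      # 2) NM performs the increase locally
        + t.nm_rm_heartbeat  # 3) NM reports back to RM
        + t.am_rm_heartbeat  # 4) RM notifies AM that the increase is done
    )


if __name__ == "__main__":
    t = Intervals()
    print("option a worst case (ms):", worst_case_a(t))
    print("option b worst case (ms):", worst_case_b(t))
    print("extra delay for b (ms):", worst_case_b(t) - worst_case_a(t))
```

Under these assumptions both options land in the multi-second range, and the gap between them is one NM heartbeat interval. With a different poll interval the gap changes accordingly.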

Benefits of b. (some of them were also mentioned by Meng):
- Simpler for the AM: it only needs to know when the increase is done, and doesn't need to receive the token and submit/poll
- Creates a consistent way for applications to increase/decrease containers
- Recovery is simpler: the AM only learns of the increase when it's finished, so we only need to handle
recovery for 2 components (NM/RM) instead of 3 components (NM/RM/AM)

Until we have a fast scheduling design/plan (I don't think we can support millisecond-level scheduling
for now; too-frequent AM heartbeats would overload the RM), I don't think adding an additional NM->RM
heartbeat interval is a big problem.
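
As a back-of-the-envelope illustration of the overload concern, the heartbeat load on the RM grows inversely with the AM heartbeat interval. The function name and the cluster size below are hypothetical, chosen only to show the arithmetic:

```python
def rm_heartbeats_per_second(num_ams: int, interval_ms: float) -> float:
    """Approximate AM heartbeats the RM must handle per second.

    Assumes every AM heartbeats at the same fixed interval (a simplification).
    """
    return num_ams * 1000.0 / interval_ms


if __name__ == "__main__":
    num_ams = 10_000  # hypothetical number of running AMs
    for interval_ms in (1000, 100, 10):
        rate = rm_heartbeats_per_second(num_ams, interval_ms)
        print(f"interval {interval_ms} ms: {rate:.0f} heartbeats/s")
```

Shrinking the interval from 1000 ms to 10 ms multiplies the request rate by 100 for the same number of AMs, which is the kind of load the comment above is worried about.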

> Support changing resources of an allocated container
> ----------------------------------------------------
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf
> The current YARN resource management logic assumes resource allocated to a container
> is fixed during the lifetime of it. When users want to change a resource
> of an allocated container the only way is releasing it and allocating a new container
> with expected size.
> Allowing run-time changing resources of an allocated container will give us better control
> of resource usage in application side

This message was sent by Atlassian JIRA
