hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1197) Support changing resources of an allocated container
Date Tue, 16 Jun 2015 22:14:07 GMT

    [ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588889#comment-14588889 ]

Wangda Tan commented on YARN-1197:
----------------------------------

Thanks for the comments, [~sseth]/[~sandyr].

I'm now convinced by the view of two downstream developers. +1 to doing the AM-RM-AM-NM
flow (a) for increase, as in the original doc, before (b). I'm not sure (b) is really
required; we can do (b) if any real use cases come up.

bq. More broadly, just because YARN is not good at hitting sub-second latencies doesn't mean
that it isn't a design goal. I strongly oppose any argument that uses the current slowness
of YARN as a justification for why we should make architectural decisions that could compromise
latencies.
Makes sense to me.

bq. I.e. that an AM can receive an increase from the RM, then issue a decrease to the NM,
and then use its increase to get resources it doesn't deserve?
Yes. If we send the increase request to the RM but the decrease request to the NM, we need
to handle complex inconsistencies on the RM side. You can take a look at the latest design
doc for more details.
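
To make the inconsistency concrete, here is a toy replay of one possible reading of the
quoted scenario. It is only a sketch: the class names, the dict used as an "increase
token", and the immediate NM-to-RM decrease report are all assumptions made for
illustration, not YARN APIs or the design in the doc.

```python
# Toy model only: every name here is hypothetical, not a YARN class or RPC.

class ResourceManager:
    def __init__(self, size_gb):
        self.accounted_gb = size_gb  # what the scheduler believes the container holds

    def grant_increase(self, target_gb):
        # RM accounts for the larger size and hands the AM a token for it.
        self.accounted_gb = target_gb
        return {"target_gb": target_gb}  # hypothetical increase token

    def on_nm_decrease_report(self, new_gb):
        # NM tells the RM the container shrank; RM frees the difference.
        self.accounted_gb = new_gb


class NodeManager:
    def __init__(self, size_gb, rm):
        self.actual_gb = size_gb  # what the node really enforces
        self.rm = rm

    def decrease(self, new_gb):
        # Decrease sent directly AM -> NM, then reported to the RM.
        self.actual_gb = new_gb
        self.rm.on_nm_decrease_report(new_gb)

    def apply_increase(self, token):
        # NM trusts the token without knowing it predates the decrease.
        self.actual_gb = token["target_gb"]


def replay_race():
    rm = ResourceManager(5)
    nm = NodeManager(5, rm)
    token = rm.grant_increase(10)  # step 1: AM gets an increase from the RM
    nm.decrease(2)                 # step 2: AM issues a decrease to the NM
    nm.apply_increase(token)       # step 3: AM redeems the older increase
    return rm.accounted_gb, nm.actual_gb


if __name__ == "__main__":
    print(replay_race())
```

In this model the RM ends up accounting 2G while the NM enforces 10G, so the two views
disagree by 8G. Any real fix (token versioning, routing both directions through the RM,
and so on) is outside what this sketch shows.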

bq. I don't think it's possible for the AM to start using the additional allocation till the
NM has updated all it's state - including writing out recovery information for work preserving
restart (Thanks Vinod for pointing this out). Seems like that poll/callback will be required
- unless the plan is to route this information via the RM.
We probably need to wait for all increase steps (monitor/cgroup/state-store) to finish before
the app uses the additional allocation. Suppose a container is 5G and is being increased to
10G, the RM/NM crashes before the write to the state store, and the app starts using 10G.
After RM restart/recovery, the NM/RM will think the container is 5G, which will be problematic.
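
The ordering concern in the 5G to 10G example can be sketched as follows. This is a
minimal model under stated assumptions: `NodeManagerSketch`, `am_usable_size`, and the
`crash_before_persist` flag are hypothetical names invented here, and the in-memory dict
merely stands in for the NM state store.

```python
# Toy model only: names are hypothetical, and a dict stands in for the state store.

class NodeManagerSketch:
    def __init__(self, size_gb):
        self.state_store = {"size_gb": size_gb}  # survives a restart
        self.enforced_gb = size_gb               # in-memory view, lost on restart

    def increase(self, target_gb, crash_before_persist=False):
        # Steps 1 and 2: update monitoring and cgroup limits (in memory here).
        self.enforced_gb = target_gb
        if crash_before_persist:
            raise RuntimeError("NM crashed before writing the state store")
        # Step 3: persist the new size; only now is the increase durable.
        self.state_store["size_gb"] = target_gb
        return True  # acknowledgement that the AM may use the new size

    def recover(self):
        # After a restart, the NM only knows what was persisted.
        self.enforced_gb = self.state_store["size_gb"]
        return self.enforced_gb


def am_usable_size(nm, current_gb, target_gb, crash_before_persist=False):
    # The AM keeps using the old size until the NM acknowledges the increase.
    try:
        nm.increase(target_gb, crash_before_persist)
    except RuntimeError:
        nm.recover()
        return current_gb
    return target_gb


if __name__ == "__main__":
    print(am_usable_size(NodeManagerSketch(5), 5, 10, crash_before_persist=True))
    print(am_usable_size(NodeManagerSketch(5), 5, 10))
```

Because the AM waits for the acknowledgement, a crash before the state-store write
leaves both the AM and the recovered NM at 5G. If the AM started using 10G earlier, the
recovered state would disagree with actual usage, which is the problem described above.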

[~mding], do you agree with doing (a)?

> Support changing resources of an allocated container
> ----------------------------------------------------
>
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes resource allocated to a container
> is fixed during the lifetime of it. When users want to change a resource
> of an allocated container the only way is releasing it and allocating a new container
> with expected size.
> Allowing run-time changing resources of an allocated container will give us better control
> of resource usage in application side



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
