hadoop-yarn-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1197) Support changing resources of an allocated container
Date Tue, 16 Jun 2015 21:50:06 GMT

    [ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588847#comment-14588847 ]

Siddharth Seth commented on YARN-1197:

bq. I would argue that waiting for an NM-RM heartbeat is much worse than waiting for an AM-RM
heartbeat. With continuous scheduling, the RM can make decisions in millisecond time, and
the AM can regulate its heartbeats according to the application's needs to get fast responses.
If an NM-RM heartbeat is involved, the application is at the mercy of the cluster settings,
which should be in the multi-second range for large clusters.

I tend to agree with Sandy's arguments about option a being better in terms of latency, and
that we shouldn't architect this in a way that limits it to the seconds range
rather than milliseconds / hundreds of milliseconds when possible.

It's already possible to get fast allocations, in the low hundreds of milliseconds, via a scheduler
loop that is decoupled from NM heartbeats and a variable AM-RM heartbeat interval that is under
user control rather than being a cluster property.
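
To illustrate what "variable AM-RM heartbeat interval under user control" could look like, here is
a minimal sketch of an AM-side policy. The function name, the 100 ms / 1000 ms bounds and the
doubling back-off are all assumptions made for illustration; none of this is an existing YARN API.

```python
def next_heartbeat_interval_ms(pending_requests, current_ms,
                               min_ms=100, max_ms=1000):
    """Pick the next AM-RM heartbeat interval (hypothetical policy).

    pending_requests: number of outstanding resource asks at the AM.
    current_ms: the interval used for the previous heartbeat.
    min_ms / max_ms: illustrative bounds, not cluster defaults.
    """
    if pending_requests > 0:
        # Outstanding asks: heartbeat at the fastest allowed rate so a
        # scheduler decision is picked up quickly.
        return min_ms
    # Nothing pending: back off by doubling, capped at max_ms, to keep
    # load on the RM low.
    return min(max_ms, max(min_ms, current_ms * 2))
```

The point is only that the AM, not a cluster-wide setting, decides when to poll quickly.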

There are going to be improvements to the performance of various protocols in YARN. HADOOP-11552
opens up one such option, which allows AMs to know about allocations as soon as the scheduler
has made the decision, without a requirement to poll. Of course, there's plenty of work
to be done before that can actually be used :)

That said, callbacks on the RPC can be applied at various levels - including NM-RM communication,
which can make option b work fast as well. However, it will incur the cost of additional RPC
roundtrips. Option a, however, can be fast from the get go with tuning, and also gets better
with future enhancements.
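
A rough way to compare these paths is a back-of-the-envelope bound: a polled path can wait up to
one full heartbeat interval plus its RPC round trips, while a callback path pays only the round
trips. The sketch below assumes that simple model; the intervals, round-trip counts and the 5 ms
RTT are made-up illustrative numbers, not measurements.

```python
def worst_case_delay_ms(heartbeat_ms, round_trips, rtt_ms):
    """Upper bound on notification delay under a simple model.

    heartbeat_ms: polling interval on the path (0 models a push/callback).
    round_trips: number of RPC round trips on the path (assumed).
    rtt_ms: cost of one RPC round trip (assumed).
    """
    return heartbeat_ms + round_trips * rtt_ms


# Illustrative inputs only.
am_rm_polled = worst_case_delay_ms(heartbeat_ms=100, round_trips=1, rtt_ms=5)
nm_rm_polled = worst_case_delay_ms(heartbeat_ms=3000, round_trips=1, rtt_ms=5)
nm_rm_callback = worst_case_delay_ms(heartbeat_ms=0, round_trips=2, rtt_ms=5)
```

Under these assumptions a tuned AM-RM heartbeat is already in the 100 ms range, a multi-second
NM-RM heartbeat dominates everything else on its path, and callbacks remove the polling term at
the price of extra round trips.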

I don't think it's possible for the AM to start using the additional allocation until the NM
has updated all its state, including writing out recovery information for work-preserving
restart (thanks Vinod for pointing this out). It seems a poll/callback will be required,
unless the plan is to route this information via the RM.

> Support changing resources of an allocated container
> ----------------------------------------------------
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf
>
> The current YARN resource management logic assumes resource allocated to a container
> is fixed during the lifetime of it. When users want to change a resource
> of an allocated container the only way is releasing it and allocating a new container
> with expected size.
> Allowing run-time changing resources of an allocated container will give us better control
> of resource usage in application side

This message was sent by Atlassian JIRA