hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
Date Fri, 12 Apr 2013 23:52:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630769#comment-13630769 ]

Bikas Saha commented on YARN-45:
--------------------------------

I like the idea of the RM giving the AM information about actions it might take that
will affect the AM. However, I am wary of having the action taken in different places, e.g.
the KILL to the containers should come from either the RM or the AM exclusively, not from both.
Otherwise we open ourselves up to race conditions, unnecessary kills, and complex logic in
the RM.

Preemption is something that, IMO, the RM needs to do at the very last moment, when there is
no other way for resources to be freed up. If we decide to preempt at time T1 and then
actually preempt at time T2, the cluster conditions may have changed between T1 and T2,
which may invalidate the decisions taken at T1. New resources may have freed up that reduce
the number of containers to be killed. This sub-optimality grows with the length
of time between T1 and T2, so ideally we want to keep T1=T2. One can argue that things can
also change after the preemption in ways that would have made it unnecessary, and that the
argument for T1=T2 is therefore fallacious. However, preemption policies are usually based on
deadlines, such as "the allocation of queue1 must be met within X seconds". So the RM does not
have the luxury of waiting for X+1 seconds. The best it can do is wait up to X seconds in the
hope that things will work out and, at X, redistribute resources to meet the deficit.
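
As a rough illustration of the T1 versus T2 point, here is a minimal sketch. The function
names, the container counts, and the 30 second deadline are all hypothetical and are not
part of YARN or of the attached patch; they only show that a deficit computed early can
be larger than the one computed at the deadline.

```python
def preemption_deficit(guaranteed, allocated, free):
    """Containers that must still be reclaimed to meet a queue's guarantee."""
    return max(0, guaranteed - allocated - free)


def should_preempt_now(waited_s, deadline_s):
    """Act only once the deadline X is reached, never earlier."""
    return waited_s >= deadline_s


# Hypothetical numbers: queue1 is guaranteed 10 containers and currently holds 4.
# Decision taken early (T1): nothing is free, so 6 containers would be killed.
early = preemption_deficit(guaranteed=10, allocated=4, free=0)

# Decision taken at the deadline (T2): 3 containers finished on their own meanwhile.
late = preemption_deficit(guaranteed=10, allocated=4, free=3)

print(early, late, should_preempt_now(waited_s=29, deadline_s=30))
```

Deferring the computation until the deadline means that only the containers still missing
at that point are candidates for a kill.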

At the same time, I can see the argument that the AM knows best how to free up
its resources. It is worth remembering that the AM has already informed the RM about the
importance of all its containers when it made the requests at different priorities. So the
RM knows the order of importance of the containers, and it also knows how long
each container has been allocated. Assuming container runtime as a proxy for container work
done, the RM can use this data to preempt in a work-preserving manner without having
to talk to the AM.
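
A minimal sketch of such an RM-side selection, under stated assumptions: the `Container`
record and `pick_victims` function are hypothetical names, and the sketch assumes that a
larger `priority` number means a less important container. It ranks the least important
containers first and, within the same priority, the ones that have run for the shortest
time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    container_id: str
    priority: int      # assumption of this sketch: larger number = less important
    runtime_s: float   # time since allocation, used as a proxy for work done


def pick_victims(containers, count):
    """Pick `count` containers: least important first, then shortest runtime."""
    ranked = sorted(containers, key=lambda c: (-c.priority, c.runtime_s))
    return ranked[:count]


running = [
    Container("a", priority=1, runtime_s=500.0),
    Container("b", priority=5, runtime_s=300.0),
    Container("c", priority=5, runtime_s=20.0),
]
victims = [c.container_id for c in pick_victims(running, 2)]
print(victims)
```

With these made-up inputs the two priority-5 containers are chosen, and the one with only
20 seconds of runtime is ranked ahead of the one with 300 seconds.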

Notifying the AM is useful because it allows the AM to take actions that preserve work,
such as checkpointing. However, IMO, the AM should only do checkpointing operations and not
kill the containers. The kill should still happen at the RM, as the very last option at the last
moment. If the situation changes in the grace period and the containers do not need to be
killed, then there is no point in the AM having killed them up front. This also lets us increase
the grace period, because checkpointing and preserving work usually means
persisting data in a stable store and may be slow in practical scenarios.

To summarize, I would propose an API in which the RM tells the AM exactly which containers
it might imminently preempt, with the contract being that the AM can take actions to preserve
the work done in those containers. The AM can continue to run those containers until the RM
actually preempts them, if needed. If we really think that the choice of containers needs to
be made at the AM, then the AM needs to checkpoint those containers and inform the RM about
the containers it has chosen. But the final decision to kill must be made, and the kill sent,
by the RM.
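
The proposed contract could be sketched roughly as follows. This is not the YARN API or the
attached patch: the class and method names (`on_preemption_warning`,
`on_grace_period_expired`) are hypothetical, and checkpointing is reduced to recording a
container id. The point is only the division of labor: the AM preserves work, and the RM
alone issues kills after re-evaluating at the end of the grace period.

```python
class ApplicationMaster:
    def __init__(self):
        self.checkpointed = set()

    def on_preemption_warning(self, container_ids):
        # Preserve work, but keep the containers running; the AM never kills here.
        self.checkpointed.update(container_ids)


class ResourceManager:
    def __init__(self, am):
        self.am = am
        self.warned = set()
        self.killed = []

    def warn(self, container_ids):
        # Tell the AM exactly which containers might be preempted soon.
        self.warned = set(container_ids)
        self.am.on_preemption_warning(self.warned)

    def on_grace_period_expired(self, still_needed):
        # Re-evaluate at the last moment and kill only what is still required.
        for cid in sorted(self.warned & set(still_needed)):
            self.killed.append(cid)
        self.warned.clear()


am = ApplicationMaster()
rm = ResourceManager(am)
rm.warn(["c1", "c2", "c3"])
rm.on_grace_period_expired(still_needed=["c2"])
print(sorted(am.checkpointed), rm.killed)
```

In this run all three warned containers are checkpointed, but only `c2` is killed, because
the other two were no longer needed when the grace period ended.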
                
> Scheduler feedback to AM to release containers
> ----------------------------------------------
>
>                 Key: YARN-45
>                 URL: https://issues.apache.org/jira/browse/YARN-45
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Chris Douglas
>            Assignee: Carlo Curino
>         Attachments: YARN-45.patch, YARN-45.patch
>
>
> The ResourceManager strikes a balance between cluster utilization and strict enforcement
> of resource invariants in the cluster. Individual allocations of containers must be reclaimed,
> or reserved, to restore the global invariants when cluster load shifts. In some cases, the
> ApplicationMaster can respond to fluctuations in resource availability without losing the
> work already completed by that task (MAPREDUCE-4584). Supplying it with this information would
> be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol
> for the RM to ask the AM to release containers.
> [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira