hadoop-yarn-issues mailing list archives

From "Jian Fang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
Date Fri, 03 Oct 2014 18:23:37 GMT

    [ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158289#comment-14158289 ]

Jian Fang commented on YARN-1198:
---------------------------------

Craig, thanks for your effort. I have already merged in your YARN-1857 and YARN-1198 patches.

For blacklisting, I think there are both pros and cons to letting different applications share
blacklisting information, and each approach has valid use cases. For example, if multiple nodes
have difficulty accessing one node, it is probably better to share this information among all
nodes because, in my experience, it usually takes quite a long time to hit a socket timeout and
exhaust the retry logic. That way, the Hadoop system can react faster to a problematic node.
Certainly, there are other use cases where the blacklisting should apply to only one application.
I am fine with the current design, but I hope Hadoop becomes smarter at handling different
scenarios, or at least provides options that let users customize this behavior.
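The two options discussed above (per-application versus shared blacklisting) could be exposed as a user-facing switch. The sketch below is hypothetical: `BlacklistRegistry`, `report_bad_node`, `blacklist_for`, and the `shared` flag are illustrative names, not part of YARN's API. It only shows how a single setting could decide whether one application's report about a bad node becomes visible to other applications.

```python
class BlacklistRegistry:
    """Hypothetical blacklist store (not YARN's real API).

    shared=False: each application sees only the nodes it reported itself.
    shared=True:  a report from any application blacklists the node for all.
    """

    def __init__(self, shared=False):
        self.shared = shared
        self._per_app = {}    # app_id -> set of node_ids reported by that app
        self._cluster = set() # node_ids reported by any app (shared mode)

    def report_bad_node(self, app_id, node_id):
        if self.shared:
            self._cluster.add(node_id)
        else:
            self._per_app.setdefault(app_id, set()).add(node_id)

    def blacklist_for(self, app_id):
        # Return an immutable snapshot so callers cannot mutate internal state.
        if self.shared:
            return frozenset(self._cluster)
        return frozenset(self._per_app.get(app_id, set()))
```

In shared mode, a second application avoids a node that another application already found unreachable, instead of waiting out its own socket timeouts and retries. The per-application mode keeps one application's failures from affecting others.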

When a node is removed from the cluster because it is unhealthy, decommissioned, or lost, the
blacklisted resources should be updated accordingly. Otherwise, new issues will arise.
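One way to read the point above: if headroom excludes the resources of blacklisted nodes, then removing a node must also remove it from every blacklist. Otherwise the headroom calculation keeps referring to resources that have already left the cluster total. The sketch below is a hypothetical, memory-only model (`ClusterView` and its methods are illustrative names, not YARN classes) that keeps the two in sync.

```python
class ClusterView:
    """Hypothetical, simplified cluster model (memory only, in MB).

    Not YARN's real API: it only illustrates keeping per-application
    blacklists consistent with node removals.
    """

    def __init__(self):
        self.free_mb = {}     # node_id -> free memory on that node
        self.blacklists = {}  # app_id -> set of blacklisted node_ids

    def add_node(self, node_id, free_mb):
        self.free_mb[node_id] = free_mb

    def blacklist(self, app_id, node_id):
        self.blacklists.setdefault(app_id, set()).add(node_id)

    def remove_node(self, node_id):
        # Called when a node becomes unhealthy, is decommissioned, or is lost.
        # Its capacity leaves the cluster total, so it must also leave every
        # blacklist. A stale entry would make headroom() look up a node that
        # no longer exists (a KeyError in this sketch; in a design that caches
        # blacklisted totals, the same resources would be subtracted twice).
        self.free_mb.pop(node_id, None)
        for nodes in self.blacklists.values():
            nodes.discard(node_id)

    def headroom(self, app_id):
        # Free cluster memory minus memory on nodes this app has blacklisted.
        total = sum(self.free_mb.values())
        blocked = sum(self.free_mb[n] for n in self.blacklists.get(app_id, ()))
        return total - blocked
```

For example, with nodes of 8192 MB and 4096 MB and the second node blacklisted by `app1`, `app1`'s headroom is 8192 MB. After that node is lost, the headroom stays 8192 MB rather than dropping to 4096 MB through a double subtraction.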

> Capacity Scheduler headroom calculation does not work as expected
> -----------------------------------------------------------------
>
>                 Key: YARN-1198
>                 URL: https://issues.apache.org/jira/browse/YARN-1198
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Omkar Vinit Joshi
>            Assignee: Craig Welch
>         Attachments: YARN-1198.1.patch, YARN-1198.10.patch, YARN-1198.11-with-1857.patch,
>                      YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch,
>                      YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch,
>                      YARN-1198.9.patch
>
>
> Today, headroom calculation (for the app) takes place only when
> * A new node is added to or removed from the cluster.
> * A new container is assigned to the application.
> However, there are potentially many situations that are not considered in this calculation:
> * If a container finishes, then the headroom for that application will change, and the AM
>   should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the same queue, then
> ** If app1's container finishes, then not only app1's AM but also app2's AM should be notified
>    about the change in headroom.
> ** Similarly, if a container is assigned to either application (app1/app2), then both AMs
>    should be notified about their headroom.
> ** To simplify the whole communication process, it is ideal to keep headroom per user per
>    LeafQueue so that everyone (apps belonging to the same user and submitted to the same
>    queue) gets the same picture.
> * If a new user submits an application to the queue, then all applications submitted by all
>   users in that queue should be notified of the headroom change.
> * Also, today headroom is an absolute number (I think it should be normalized, but then this
>   is not going to be backward compatible).
> * Also, when an admin user refreshes the queues, headroom has to be updated.
> These are all potential bugs in the headroom calculations.
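The per-user, per-LeafQueue approach proposed in the description can be sketched as follows. This is a hypothetical model, not the CapacityScheduler's implementation. `LeafQueueModel` and its methods are illustrative names. It tracks memory only, and it uses a fixed per-user limit, whereas the real scheduler derives user limits dynamically. The model recomputes headroom on every event listed above (container allocation or completion, a new application or user, an admin queue refresh) and pushes the same value to every application owned by the same user in the queue.

```python
from collections import defaultdict


class LeafQueueModel:
    """Hypothetical leaf queue (memory only, in MB; fixed per-user limit)."""

    def __init__(self, capacity_mb, user_limit_mb):
        self.capacity_mb = capacity_mb
        self.user_limit_mb = user_limit_mb
        self.used_by_user = defaultdict(int)  # user -> MB currently allocated
        self.apps_by_user = defaultdict(set)  # user -> app_ids in this queue
        self.last_headroom = {}               # app_id -> value last sent to its AM

    def submit(self, user, app_id):
        # A new app (possibly from a new user) changes every user's picture.
        self.apps_by_user[user].add(app_id)
        self._refresh_all()

    def allocate(self, user, mb):
        self.used_by_user[user] += mb
        self._refresh_all()

    def release(self, user, mb):
        # A finished container changes headroom for all of the user's apps.
        self.used_by_user[user] -= mb
        self._refresh_all()

    def refresh_queue(self, capacity_mb, user_limit_mb):
        # Models an admin queue refresh that changes the limits.
        self.capacity_mb = capacity_mb
        self.user_limit_mb = user_limit_mb
        self._refresh_all()

    def headroom(self, user):
        # One value per (user, queue): bounded by queue and user limits.
        queue_free = self.capacity_mb - sum(self.used_by_user.values())
        user_free = self.user_limit_mb - self.used_by_user[user]
        return max(0, min(queue_free, user_free))

    def _refresh_all(self):
        # Recompute for every user so that all AMs in the queue are updated.
        for user, apps in self.apps_by_user.items():
            value = self.headroom(user)
            for app_id in apps:
                self.last_headroom[app_id] = value
```

For example, in a 10240 MB queue with a 6144 MB user limit, if alice runs app1 and app2 and allocates 4096 MB, both of her applications see 2048 MB of headroom, while bob's app3 sees 6144 MB.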



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
